Communications in Computer and Information Science 151

Tai-hoon Kim, Hojjat Adeli, Rosslin John Robles, and Maricel Balitanas (Eds.)

Ubiquitous Computing and Multimedia Applications
Second International Conference, UCMA 2011
Daejeon, Korea, April 13-15, 2011
Proceedings, Part II
Volume Editors

Tai-hoon Kim, Hannam University, Daejeon, Korea. E-mail: [email protected]
Hojjat Adeli, The Ohio State University, Columbus, OH, USA. E-mail: [email protected]
Rosslin John Robles, Hannam University, Daejeon, Korea. E-mail: [email protected]
Maricel Balitanas, Hannam University, Daejeon, Korea. E-mail: [email protected]
ISSN 1865-0929 / e-ISSN 1865-0937
ISBN 978-3-642-20997-0 / e-ISBN 978-3-642-20998-7
DOI 10.1007/978-3-642-20998-7
Springer Heidelberg Dordrecht London New York
Library of Congress Control Number: 2011926692
CR Subject Classification (1998): H.5.1, I.5, I.2, I.4, F.1, H.3, H.4
© Springer-Verlag Berlin Heidelberg 2011 This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India Printed on acid-free paper Springer is part of Springer Science+Business Media (www.springer.com)
Foreword
Ubiquitous computing and multimedia applications are areas that attract many academic and industry professionals. The goal of the International Conference on Ubiquitous Computing and Multimedia Applications is to bring together researchers from academia and industry as well as practitioners to share ideas, problems and solutions relating to the multifaceted aspects of ubiquitous computing and multimedia applications.

We would like to express our gratitude to all of the authors of submitted papers and to all attendees, for their contributions to and participation in UCMA 2011. We believe in the need for continuing this undertaking in the future.

We acknowledge the great effort of all the Chairs and the members of the advisory boards and Program Committees of the above-listed event, who selected 15% of over 570 submissions, following a rigorous peer-review process. Special thanks go to SERSC (Science and Engineering Research Support Society) for supporting this conference.

We are grateful in particular to the following speakers who kindly accepted our invitation and, in this way, helped to meet the objectives of the conference: Sabah Mohammed of Lakehead University and Peter Baranyi of the Budapest University of Technology and Economics (BME).

March 2011
Chairs of UCMA 2011
Preface
We would like to welcome you to the proceedings of the 2011 International Conference on Ubiquitous Computing and Multimedia Applications (UCMA 2011), which was held during April 13-15, 2011, at Hannam University, Daejeon, Korea.

UCMA 2011 focused on various aspects of advances in multimedia applications and ubiquitous computing with computational sciences, mathematics and information technology. It provided a chance for academic and industry professionals to discuss recent progress in the related areas. We expect that the conference and its publications will be a trigger for further related research and technology improvements in this important subject.

We would like to acknowledge the great effort of all the Chairs and members of the Program Committee. Out of around 570 submissions to UCMA 2011, we accepted 86 papers to be included in the proceedings and presented during the conference. This gives an acceptance ratio firmly below 20%. Regular conference papers can be found in this volume, while Special Session papers can be found in CCIS 151.

We would like to express our gratitude to all of the authors of submitted papers and to all the attendees for their contributions and participation. We believe in the need for continuing this undertaking in the future. Once more, we would like to thank all the organizations and individuals who supported this event as a whole and, in particular, helped in the success of UCMA 2011.

March 2011
Tai-hoon Kim
Hojjat Adeli
Rosslin John Robles
Maricel Balitanas
Organization
Organizing Committee

Honorary Co-chairs, General Co-chairs, Program Co-chairs, and Workshop Co-chairs:
Hyung-tae Kim (Hannam University, Korea)
Hojjat Adeli (The Ohio State University, USA)
Wai-chi Fang (National Chiao Tung University, Taiwan)
Carlos Ramos (GECAD/ISEP, Portugal)
Haeng-kon Kim (Catholic University of Daegu, Korea)
Tai-hoon Kim (Hannam University, Korea)
Sabah Mohammed (Lakehead University, Canada)
Muhammad Khurram Khan (King Saud University, Saudi Arabia)
Seok-soo Kim (Hannam University, Korea)
Timothy K. Shih (Asia University, Taiwan)
International Advisory Board:
Cao Jiannong (The Hong Kong Polytechnic University, Hong Kong)
Frode Eika Sandnes (Oslo University College, Norway)
Schahram Dustdar (Vienna University of Technology, Austria)
Andrea Omicini (Università di Bologna, Italy)
Lionel Ni (The Hong Kong University of Science and Technology, Hong Kong)
Rajkumar Buyya (University of Melbourne, Australia)
Hai Jin (Huazhong University of Science and Technology, China)
N. Jaisankar (VIT University, India)
Gil-cheol Park (Hannam University, Korea)
Ha Jin Hwang (Kazakhstan Institute of Management, Economics, and Strategic Research, Republic of Kazakhstan)
Publicity Co-chairs and Publication Co-chairs:
Paolo Bellavista (Università di Bologna, Italy)
Ing-Ray Chen (Virginia Polytechnic Institute and State University, USA)
Yang Xiao (University of Alabama, USA)
J.H. Abawajy (Deakin University, Australia)
Ching-Hsien Hsu (Chung Hua University, Taiwan)
Deepak Laxmi Narasimha (University of Malaya, Malaysia)
Prabhat K. Mahanti (University of New Brunswick, Canada)
Soumya Banerjee (Birla Institute of Technology, India)
Byungjoo Park (Hannam University, Korea)
Debnath Bhattacharyya (MPCT, India)
Program Committee:
Alexander Loui, Biplab K. Sarker, Brian King, Chantana Chantrapornchai, Claudia Linnhoff-Popien, D. Manivannan, Dan Liu, Eung Nam Ko, Georgios Kambourakis, Gerard Damm, Han-Chieh Chao, Hongli Luo, Igor Kotenko, J.H. Abawajy, Jalal Al-Muhtadi, Javier Garcia-Villalba, Khaled El-Maleh, Khalil Drira, Larbi Esmahi, Liang Fan, Mahmut Kandemir, Malrey Lee, Marco Roccetti, Mei-Ling Shyu, Ming Li, Pao-Ann Hsiung, Paolo Bellavista, Rami Yared, Rainer Malaka, Robert C. Hsu, Robert G. Reynolds, Rodrigo Mello, Schahram Dustdar, Seung-Hyun Seo, Seunglim Yong, Stefano Ferretti, Stuart J. Barnes, Su Myeon Kim, Swapna S. Gokhale, Taenam Cho, Tony Shan, Toshihiro Yamauchi, Wanquan Liu, Wenjing Jia, Yao-Chung Chang
Table of Contents – Part II
A Smart Error Protection Scheme Based on Estimation of Perceived Speech Quality for Portable Digital Speech Streaming Systems
   Jin Ah Kang and Hong Kook Kim
MDCT-Domain Packet Loss Concealment for Scalable Wideband Speech Coding
   Nam In Park and Hong Kook Kim
High-Quality and Low-Complexity Real-Time Voice Changing with Seamless Switching for Digital Imaging Devices
   Sung Dong Jo, Young Han Lee, Ji Hun Park, Hong Kook Kim, Ji Woon Kim, and Myeong Bo Kim
Complexity Reduction of Virtual Reverberation Filtering Based on Index-Based Convolution for Resource-Constrained Devices
   Kwang Myung Jeon, Nam In Park, Hong Kook Kim, Ji Woon Kim, and Myeong Bo Kim
Audio Effect for Highlighting Speaker's Voice Corrupted by Background Noise on Portable Digital Imaging Devices
   Jin Ah Kang, Chan Jun Chun, Hong Kook Kim, Ji Woon Kim, and Myeong Bo Kim
Detection of Howling Frequency Using Temporal Variations in Power Spectrum
   Jae-Won Lee and Seung Ho Choi
Differential Brain Activity in Reading Hangul and Hanja in Korean Language
   Hyo Woon Yoon and Ji-Hyang Lim
Computational Neural Model of the Bilingual Stroop Effect: An fMRI Study
   Hyo Woon Yoon
The Eye Movement and Data Processing Due to Obtained BOLD Signal in V1: A Study of Simultaneous Measurement of EOG and fMRI
   Hyo Woon Yoon, Dong-Hwa Kim, Young Jae Lee, Hyun-Chang Lee, and Ji-Hyang Lim
Face Tracking for Augmented Reality Game Interface and Brand Placement
   Yong Jae Lee and Young Jae Lee
On-line and Mobile Delivery Data Management for Enhancing Customer Services
   Hyun-Chang Lee, Seong Yoon Shin, and Yang Won Rhee
Design of LCL Filter Using Hybrid Intelligent Optimization for Photovoltaic System
   Jae Hoon Cho, Dong-Hwa Kim, Mária Virčíková, and Peter Sinčák
Distributed Energy Management for Stand-Alone Photovoltaic System with Storages
   Jae Hoon Cho and Dong-Hwa Kim
Framework for Performance Metrics and Service Class for Providing End-to-End Services across Multiple Provider Domains
   Chin-Chol Kim, Jaesung Park, and Yujin Lim
Design of a Transmission Simulator Based on Hierarchical Model
   Sang Hyuck Han and Young Kuk Kim
Recommendation System of IPTV TV Program Using Ontology and K-means Clustering
   Jongwoo Kim, Eungju Kwon, Yongsuk Cho, and Sanggil Kang
A Novel Interactive Virtual Training System
   Yoon Sang Kim and Hak-Man Kim
Recommendation Algorithm of the App Store by Using Semantic Relations between Apps
   Yujin Lim, Hak-Man Kim, Sanggil Kang, and Tai-hoon Kim
A Comparative Study of Bankruptcy Rules for Load-shedding Scheme in Agent-Based Microgrid Operation
   Hak-Man Kim and Tetsuo Kinoshita
The Visualization Tool of the Open-Source Based for Flight Waypoint Tracking
   Myeong-Chul Park and Seok-Wun Ha
Space-Efficient On-the-fly Race Detection Using Loop Splitting
   Yong-Cheol Kim, Sang-Soo Jun, and Yong-Kee Jun
A Comparative Study on Responding Methods for TCP's Fast Recovery in Wireless Networks
   Mi-Young Park, Sang-Hwa Chung, Kyeong-Ae Shin, and Guangjie Han
The Feasibility Study of Attacker Localization in Wireless Sensor Networks
   Young-Joo Kim and Sejun Song
Efficiency of e-NR Labeling for On-the-fly Race Detection of Programs with Nested Parallelism
   Sun-Sook Kim, Ok-Kyoon Ha, and Yong-Kee Jun
Lightweight Labeling Scheme for On-the-fly Race Detection of Signal Handlers
   Guy Martin Tchamgoue, Ok-Kyoon Ha, Kyong-Hoon Kim, and Yong-Kee Jun
Automatic Building of Real-Time Multicore Systems Based on Simulink Applications
   Minji Cha and Kyong Hoon Kim
A Study on the RFID/USN Integrated Middleware for Effective Event Processing in Ubiquitous Livestock Barn
   Jeonghwan Hwang and Hyun Yoe
Design of Android-Based Integrated Management System for Livestock Barns
   JiWoong Lee and Hyun Yoe
A Study of Energy Efficient MAC Based on Contextual Information for Ubiquitous Agriculture
   Hochul Lee and Hyun Yoe
A Service Scenario Based on a Context-Aware Workflow Language in u-Agriculture
   Yongyun Cho and Hyun Yoe
A Sensor Data Processing System for Mobile Application Based Wetland Environment Context-aware
   Yoon-Cheol Hwang, Ryum-Duck Oh, and Gwi-Hwan Ji
An Intelligent Context-Aware Learning System Based on Mobile Augmented Reality
   Jin-Il Kim, Inn-woo Park, and Hee-Hyol Lee
Inference Engine Design for USN Based Wetland Context Inference
   So-Young Im, Ryum-Duck Oh, and Yoon-Cheol Hwang
Metadata Management System for Wetland Environment Context-Aware
   Jun-Yong Park, Joon-Mo Yang, and Ryum-Duck Oh
Toward Open World and Multimodal Situation Models for Sensor-Aware Web Platforms
   Yoshitaka Sakurai, Paolo Ceravolo, Ernesto Damiani, and Setsuo Tsuruta
Design of Intelligence Mobile Cloud Service Platform Based Context-Aware
   Hyokyung Chang and Euiin Choi
Similarity Checking of Handwritten Signature Using Binary Dotplot Analysis
   Debnath Bhattacharyya and Tai-hoon Kim
Brain Tumor Detection Using MRI Image Analysis
   Debnath Bhattacharyya and Tai-hoon Kim
Image Data Hiding Technique Using Discrete Fourier Transformation
   Debnath Bhattacharyya and Tai-hoon Kim
Design of Guaranteeing System of Service Quality through the Verifying Users of Hash Chain
   Yoon-Su Jeong, Yong-Tae Kim, and Gil-Cheol Park
Design of RSSI Signal Based Transmit-Receiving Device for Preventing from Wasting Electric Power of Transmit in Sensor Network
   Yong-Tae Kim, Yoon-Su Jeong, and Gil-Cheol Park
User Authentication in Cloud Computing
   Hyokyung Chang and Euiin Choi
D-PACs for Access Control and Profile Storage on Ubiquitous Computing
   Changbok Jang and Euiin Choi
An Approach to Roust Control Management in Mobile IPv6 for Ubiquitous Integrate Systems
   Giovanni Cagalaban and Byungjoo Park
A Software Framework to Associate Multiple FPMN MSISDNs with a HPMN IMSI
   Dongcheul Lee and Byung Ho Rhe
Uplink Interference Adjustment for Mobile Satellite Service in Multi-beam Environments
   Ill-Keun Rhee, Sang-Am Kim, Keewan Jung, Erchin Serpedin, Jong-Min Park, and Young-Hun Lee
Protecting Computer Network with Encryption Technique: A Study
   Kamaljit I. Lakhtaria
A Study on the Strategies of Foreign Market Expansion for Korean IT Venture Company
   Woong Eun and Yong-Seok Seo
Binding Update Schemes for Inter-domain Mobility Management in Hierarchical Mobile IPv6: A Survey
   Afshan Ahmed, Jawad Hassan, Mata-Ur-Rehman, and Farrukh Aslam Khan
A Pilot Study to Analyze the Effects of User Experience and Device Characteristics on the Customer Satisfaction of Smartphone Users
   Bong-Won Park and Kun Chang Lee
Exploring the Optimal Path to Online Game Loyalty: Bayesian Networks versus Theory-Based Approaches
   Nam Yong Jo, Kun Chang Lee, and Bong-Won Park
The Effect of Users' Characteristics and Experiential Factors on the Compulsive Usage of the Smartphone
   Bong-Won Park and Kun Chang Lee
Leadership Styles, Web-Based Commitment and Their Subsequent Impacts on e-Learning Performance in Virtual Community
   Dae Sung Lee, Nam Young Jo, and Kun Chang Lee
Test Case Generation for Formal Concept Analysis
   Ha Jin Hwang and Joo Ik Tak
Mobile Specification Using Semantic Networks
   Haeng-Kon Kim
A Novel Image Classification Algorithm Using Swarm-Based Technique for Image Database
   Noorhaniza Wahid
An Optimal Mesh Algorithm for Remote Protein Homology Detection
   Firdaus M. Abdullah, Razib M. Othman, Shahreen Kasim, and Rathiah Hashim
Definition of Consistency Rules between UML Use Case and Activity Diagram
   Noraini Ibrahim, Rosziati Ibrahim, Mohd Zainuri Saringat, Dzahar Mansor, and Tutut Herawan
Rough Set Approach for Attributes Selection of Traditional Malay Musical Instruments Sounds Classification
   Norhalina Senan, Rosziati Ibrahim, Nazri Mohd Nawi, Iwan Tri Riyadi Yanto, and Tutut Herawan
Pairwise Protein Substring Alignment with Latent Semantic Analysis and Support Vector Machines to Detect Remote Protein Homology
   Surayati Ismail, Razib M. Othman, and Shahreen Kasim
Jordan Pi-Sigma Neural Network for Temperature Prediction
   Noor Aida Husaini, Rozaida Ghazali, Nazri Mohd Nawi, and Lokman Hakim Ismail
Accelerating Learning Performance of Back Propagation Algorithm by Using Adaptive Gain Together with Adaptive Momentum and Adaptive Learning Rate on Classification Problems
   Norhamreeza Abdul Hamid, Nazri Mohd Nawi, Rozaida Ghazali, and Mohd Najib Mohd Salleh
Developing an HCI Model: An Exploratory Study of Featuring Collaborative System Design
   Saidatina Fatimah Ismail, Rathiah Hashim, and Siti Zaleha Zainal Abidin
Author Index
Table of Contents – Part I
A SOA-Based Service Composition for Interactive Ubiquitous Entertainment Applications
   Giovanni Cagalaban and Seoksoo Kim
A Study on Safe Reproduction of Reference Points for Recognition on Screen
   Sungmo Jung and Seoksoo Kim
Maximized Energy Saving Sleep Mode in IEEE 802.16e/m
   Van Cuong Nguyen, Van Thuan Pham, and Bong-Kyo Moon
Propose New Structure for the Buildings Model
   Tuan Anh Nguyen gia
Solving Incomplete Datasets in Soft Set Using Parity Bits of Supported Sets
   Ahmad Nazari Mohd. Rose, Hasni Hassan, Mohd Isa Awang, Tutut Herawan, and Mustafa Mat Deris
An Algorithm for Mining Decision Rules Based on Decision Network and Rough Set Theory
   Hossam Abd Elmaksoud Mohamed
A Probabilistic Rough Set Approach to Rule Discovery
   Hossam Abd Elmaksoud Mohamed
A Robust Video Streaming Based on Primary-Shadow Fault-Tolerance Mechanism
   Bok-Hee Ryu, Dongwoon Jeon, and Doo-Hyun Kim
Oversampled Perfect Reconstruction FIR Filter Bank Implementation by Removal of Noise and Reducing Redundancy
   Sangeeta Chougule and Rekha P. Patil
Designing a Video Control System for a Traffic Monitoring and Controlling System of Intelligent Traffic Systems
   Il-Kwon Lim, Young-Hyuk Kim, Jae-Kwang Lee, and Woo-Jun Park
Approximate Reasoning and Conceptual Structures
   Sylvia Encheva
Edge Detection in Grayscale Images Using Grid Smoothing
   Guillaume Noel, Karim Djouani, and Yskandar Hamam
Energy-Based Re-transmission Algorithm of the Leader Node's Neighbor Node for Reliable Transmission in the PEGASIS
   Se-Jung Lim, A.K. Bashir, So-Yeon Rhee, and Myong-Soon Park
Grid-Based and Outlier Detection-Based Data Clustering and Classification
   Kyu Cheol Cho and Jong Sik Lee
Performance Comparison of PSO-Based CLEAN and EP-Based CLEAN for Scattering Center Extraction
   In-Sik Choi
A Framework of Federated 3rd Party and Personalized IPTV Services Using Network Virtualization
   Md. Motaharul Islam, Mohammad Mehedi Hassan, and Eui-Nam Huh
Faults and Adaptation Policy Modeling Method for Self-adaptive Robots
   Ingeol Chun, Jinmyoung Kim, Haeyoung Lee, Wontae Kim, Seungmin Park, and Eunseok Lee
Reducing Error Propagation on Anchor Node-Based Distributed Localization in Wireless Sensor Networks
   Taeyoung Kim, Minhan Shon, Mihui Kim, Dongsoo S. Kim, and Hyunseung Choo
Attention Modeling of Game Excitement for Sports Videos
   Huang-Chia Shih
Discovering Art in Robotic Motion: From Imitation to Innovation via Interactive Evolution
   Mária Virčíková and Peter Sinčák
I/O Performance and Power Consumption Analysis of HDD and DRAM-SSD
   Hyun-Ju Song and Young-Hun Lee
A Measurement Study of the Linux Kernel for Android Mobile Computer
   Raheel Ahmed Memon and Yeonseung Ryu
RF Touch for Wireless Control of Intelligent Houses
   David Kubat, Martin Drahansky, and Jiri Konecny
Facing Reality: Using ICT to Go Green in Education
   Robert C. Meurant
A Network Coding Based Geocasting Mechanism in Vehicle Ad Hoc Networks
   Tz-Heng Hsu, Ying-Chen Lo, and Meng-Shu Chiang
An In-network Forwarding Index for Processing Historical Location Query in Object-Tracking Sensor Networks
   Chao-Chun Chen and Chung-Bin Lo
Embedding Methods for Bubble-Sort, Pancake, and Matrix-Star Graphs
   Jong-Seok Kim, Mihye Kim, Hyun Sim, and Hyeong-Ok Lee
Development of a 3D Virtual Laboratory with Motion Sensor for Physics Education
   Ji-Seong Jeong, Chan Park, Mihye Kim, Won-Keun Oh, and Kwan-Hee Yoo
An Adaptive Embedded Multi-core Real-Time System Scheduling
   Liang-Teh Lee, Hung-Yuan Chang, and Wai-Min Luk
Query Processing Systems for Wireless Sensor Networks
   Humaira Ehsan and Farrukh Aslam Khan
Educational Principles in Constructivism for Ubiquitous Based Learning
   Sung-Hyun Cha, Kum-Taek Seo, and Gi-Wang Shin
A Study on Utilization of Export Assistance Programs for SMEs and Their Exportation Performance in Korea
   Woong Eun, Sangchun Lee, Yong-Seok Seo, and Eun-Young Kim
The Entrance Authentication and Tracking Systems Using Object Extraction and the RFID Tag
   Dae-Gi Min, Jae-Woo Kim, and Moon-Seog Jun
A Privacy Technique for Providing Anonymity to Sensor Nodes in a Sensor Network
   Jeong-Hyo Park, Yong-Hoon Jung, Hoon Ko, Jeong-Jai Kim, and Moon-Seog Jun
Study on Group Key Agreement Using Two-Dimensional Array in Sensor Network Environment
   Seung-Jae Jang, Young-Gu Lee, Hoon Ko, and Moon-Seog Jun
Design and Materialization of Location Based Motion Detection System in USN Environments
   Joo-Kwan Lee, Jeong-Jai Kim, and Moon-Seog Jun
Realization of Integrated Service for Zoos Using RFID/USN Based Location Tracking Technology
   Jae-Yong Kim, Young-Gu Lee, Kwang-Hyong Lee, and Moon-Seok Jun
Real-Time Monitoring System Using Location Based Service
   Jae-Hwe You, Young-Gu Lee, and Moon-Seog Jun
Author Index
A Smart Error Protection Scheme Based on Estimation of Perceived Speech Quality for Portable Digital Speech Streaming Systems

Jin Ah Kang and Hong Kook Kim
School of Information and Communications
Gwangju Institute of Science and Technology (GIST), Gwangju 500-712, Korea
{jinari,hongkook}@gist.ac.kr
Abstract. In this paper, a smart error protection (SEP) scheme is proposed to improve the speech quality of a portable digital speech streaming (PDSS) system operating over a lossy transmission channel. To this end, the proposed SEP scheme estimates the perceived speech quality (PSQ) of the received speech data and then transmits redundant speech data (RSD) to assist the speech decoder in reconstructing lost speech signals at high packet loss rates. According to the estimated PSQ, the proposed SEP scheme controls the RSD transmission and adjusts the bitrate of the speech coding used to encode the current speech data (CSD) against the amount of RSD, without increasing the transmission bandwidth. The effectiveness of the proposed SEP scheme is demonstrated using the adaptive multirate-narrowband (AMR-NB) codec and ITU-T Recommendation P.563 as a scalable speech codec and a PSQ estimator, respectively. It is shown from experiments that a PDSS system employing the proposed SEP scheme significantly improves speech quality under packet loss conditions.

Keywords: Portable digital speech streaming systems, packet loss, error protection, perceived speech quality, redundant speech transmission.
1 Introduction

Due to the rapid development of Internet protocol (IP) networks over the past few decades, audio and video streaming services are increasingly available via the Internet. Moreover, as these services are extended to wireless networks, the quality of service (QoS) of audio and video streaming is becoming even more critical. Specifically, portable digital speech streaming (PDSS) systems require a minimum level of speech communication quality, where the speech quality is largely determined by network conditions such as packet losses or end-to-end packet delays [1]. When speech streaming is performed over user datagram protocol/IP (UDP/IP) networks, however, packets may be lost or arrive too late for playback due to unavoidable delays. In this case, a typical PDSS system can tolerate only a few packet losses in real-time services, and such packet losses occur frequently in wireless networks due to bandwidth fluctuations [2].
Several packet loss recovery methods, implemented over the Internet and wireless networks, have been proposed for speech streaming systems. For instance, the techniques proposed in [3] and [4] are sender-based packet loss recovery methods using forward error correction (FEC). For wireless networks, the techniques proposed in [5] and [6] are based on unequal error protection (UEP). In addition, the modified discrete cosine transform (MDCT) coefficients of audio signals have been used as redundant data to assist an MP3 audio decoder in reconstructing lost audio signals [7]. However, these methods do not take into account time-varying network conditions, i.e., the packet loss rate (PLR). That is, with the conventional FEC methods the redundant data must be transmitted constantly even when the network exhibits no packet losses. Therefore, a smart error protection (SEP) scheme is needed that recovers packet losses efficiently according to the time-varying characteristic of the PLR.

Toward this goal, this paper proposes an SEP scheme that transmits redundant speech data (RSD) adaptively according to an estimate of the perceived speech quality (PSQ). To this end, the PSQ is estimated in real time for the received speech data by using a single-ended speech quality assessment. Simultaneously, the PLR is estimated by a moving average method. In addition, a real-time transport protocol (RTP) payload format is newly suggested to support the proposed SEP scheme. In other words, a speech packet combines the bitstreams of the current speech data (CSD) and the RSD when the estimated PSQ and PLR indicate a high loss rate. Thus, even if a speech packet is lost, the speech decoder can reconstruct the lost speech signal from the RSD bitstreams of the previous packet. On the other hand, when the estimated PSQ and PLR indicate a low loss rate, a speech packet carries the CSD bitstreams alone, encoded at a higher bitrate. The effectiveness of the proposed SEP scheme is demonstrated using the adaptive multirate-narrowband (AMR-NB) speech codec [8] and ITU-T Recommendation P.563 [9] as a scalable speech codec and a single-ended speech quality assessment, respectively.

The remainder of this paper is organized as follows. Following this introduction, Section 2 presents the structure of a PDSS system based on the proposed SEP scheme and the RTP payload format for the proposed SEP scheme. Next, Section 3 describes the proposed SEP scheme in detail, and the performance of the proposed SEP scheme is discussed in Section 4. Finally, Section 5 concludes this paper.
2 A Portable Digital Speech Streaming System

2.1 Overview

A PDSS system extends traditional speech communication services over a public switched telephone network (PSTN) to wireless networks in order to provide various mobile communication services. To this end, the PDSS system samples a continuous speech signal into discrete speech frames and encodes the speech frames into bitstreams at a low bitrate by using a compression algorithm. Then, it transmits the bitstreams using a real-time streaming protocol after packetizing them.
Fig. 1. Packet flow for a PDSS system employing the proposed SEP scheme, where Subsystems A and B represent the two communication parties
Meanwhile, at the opposite PDSS system, the arriving packets are depacketized into bitstreams, and the bitstreams are decoded into speech frames. Finally, these speech frames are sent to an output device.

Fig. 1 shows the packet flow for the PDSS system implemented in this paper, where Subsystems A and B represent the two parties of the speech stream communication employing the proposed SEP scheme. First, the sender side of Subsystem A performs scalable speech encoding on the input speech frame. Next, the sender side generates a packet according to an RTP payload format, where the packet includes the CSD bitstreams together with the decision result indicating whether the RSD transmission is needed. Note here that the RSD bitstreams are incorporated in this payload when the RSD transmission is requested by Subsystem B. After that, the formatted RTP packet is transmitted. Meanwhile, as the RTP packet arrives at the receiver side of Subsystem B, the receiver side analyzes the received packet according to the RTP payload format and then extracts the CSD bitstreams and the decision result. In the case that the RTP payload includes the RSD bitstreams, the RSD bitstreams are used to recover a
lost packet in the future. Next, the extracted CSD bitstreams are decoded using a scalable speech decoder, and the decoded speech frames are stored in a speech buffer to be used for the PSQ estimation. Finally, the decision result regarding the RSD transmission is inserted into an RTP packet before a speech frame is sent to Subsystem A.

2.2 RTP Payload Format

As mentioned in Section 2.1, a PDSS system employing the proposed SEP scheme must carry an indicator for the scalable bitrate of speech coding. Moreover, in order to deliver the feedback information from Subsystem A to Subsystem B, and vice versa, fields should be reserved in the format to accommodate the transmission of the RSD bitstreams and the feedback information. Thus, we first select the RTP payload format defined in IETF RFC 3267 for the AMR-NB speech codec [10], as shown in Fig. 2.
Fig. 2. Example of the RTP payload format for AMR-NB speech codec defined in RFC 3267
In the payload format, an 'F|FT|Q' sequence of control fields is used to describe each speech frame, while a codec mode request (CMR) field applies to the entire payload. The one-bit F field indicates whether this frame is followed by another speech frame (F=1) or is the final speech frame (F=0). The FT field, comprised of 4 bits, indicates whether this frame is actually coded by the speech encoder or carries comfort noise. That is, a number in this field is assigned from 0 to 7, corresponding to encoding bitrates of 4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, and 12.2 kbit/s, respectively; if comfort noise is encoded, the assigned number ranges from 8 to 11. Note that the number 15 indicates that there is no data to be transmitted, and that the numbers 12 to 14 are reserved for future use. Next, the one-bit Q field, indicating the speech quality, is set to 0 when the speech frame data is severely damaged; otherwise, it is set to 1. Finally, the CMR field, comprised of 4 bits, is used to deliver a mode change signal to the speech encoder; for example, it is set to one of the eight encoding modes corresponding to the different bitrates of the AMR-NB speech codec. At the end of the payload, P fields are used to ensure octet alignment. In order to realize the proposed SEP scheme in this payload format, two new frame indices for the RSD bitstreams and the feedback information are incorporated into the FT field, denoted by the numbers 12 and 13, respectively.

The use of the RTP payload format described above has several advantages. First, the control ability for a speech encoder, such as the CMR field, is retained by using the
RTP payload format for the speech codec employed in the implemented PDSS system. Next, the overhead of the control fields for each RSD bitstream is as small as the 6 bits of 'F|FT|Q'. Finally, no additional transport protocol for the RSD transmission request is needed, since this feedback is conducted using the RTP packets that deliver the speech bitstreams. Therefore, the transmission overhead for the RSD transmission request is significantly reduced compared to existing transport protocols designed for feedback, such as the RTP control protocol (RTCP) [11].
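To make the bit layout concrete, the sketch below packs the 4-bit CMR and the 6-bit 'F|FT|Q' entries into payload octets, following the layout of Fig. 2 with P bits padding the final octet. It is a minimal re-implementation written for illustration, not code from the paper or from the RFC 3267 reference implementation, and it omits the speech data bits that follow the table of contents.

```python
def pack_amr_toc(cmr, frames):
    """Pack the 4-bit CMR and one 6-bit F|FT|Q entry per frame into bytes.
    frames is a list of (ft, q) pairs; FT=12 (RSD) and FT=13 (feedback)
    are the two indices added for the proposed SEP scheme."""
    bits = [(cmr >> i) & 1 for i in (3, 2, 1, 0)]        # CMR, 4 bits
    for n, (ft, q) in enumerate(frames):
        bits.append(1 if n < len(frames) - 1 else 0)     # F: another frame follows?
        bits += [(ft >> i) & 1 for i in (3, 2, 1, 0)]    # FT, 4 bits
        bits.append(q & 1)                               # Q, 1 bit
    bits += [0] * (-len(bits) % 8)                       # P padding bits
    return bytes(
        sum(b << (7 - j) for j, b in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )

# For instance, one CSD frame at 4.75 kbit/s (FT=0) followed by one RSD frame
# (FT=12), both marked as good quality (Q=1):
header = pack_amr_toc(cmr=0, frames=[(0, 1), (12, 1)])
```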
3 Proposed Smart Error Protection Scheme

3.1 Packet Loss Recovery and PSQ Estimation at the Receiver Side

Fig. 3 presents the procedure of packet loss recovery with the PSQ estimation at the receiver side of a PDSS system employing the proposed SEP scheme. First, a packet loss occurrence is verified through RTP packet analysis. If there is no packet loss, the received CSD bitstreams are decoded. On the other hand, if a packet loss is detected, the lost speech signals are recovered either from the RSD bitstreams or by the packet loss concealment (PLC) algorithm in the speech decoder, depending on the availability of the RSD bitstreams. Finally, the speech decoder reconstructs the speech frame data from the CSD bitstreams, and the PSQ and PLR are estimated once enough speech frames have been buffered to produce a PSQ score.
Fig. 3. Procedure of the packet loss recovery with the PSQ estimation at the receiver side
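The branching of Fig. 3 can be summarized in the following sketch. The packet dictionary keys and the decode and conceal callables are hypothetical stand-ins for the actual codec interface, and feedback handling (the bitrate control branch of Fig. 3) is noted in a comment only.

```python
N = 200  # frames per PSQ estimate (4 s of 20-ms frames; see Section 4)

def on_received(packet, rsd_buffer, speech_buffer, decode, conceal):
    """packet: None when the RTP analysis reports a loss, otherwise a dict
    with a 'csd' entry and an optional 'rsd' entry (feedback, if present,
    would additionally drive the encoder's bitrate control)."""
    if packet is None:                       # packet loss verified
        if rsd_buffer:                       # RSD kept from an earlier packet
            frame = decode(rsd_buffer.pop())
        else:
            frame = conceal()                # fall back to the codec's PLC
    else:
        if packet.get("rsd") is not None:
            rsd_buffer.append(packet["rsd"])  # keep for a possible future loss
        frame = decode(packet["csd"])
    speech_buffer.append(frame)
    return len(speech_buffer) >= N           # True: run the PSQ/PLR estimation
```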
Fig. 4. Overlap of speech frames for the PSQ estimation at the receiver side
For the PSQ estimation, the speech data in the speech buffer are used with overlapping, as shown in Fig. 4. In the figure, $\hat{s}(m)$ is the $m$-th speech frame input to the speech buffer, $N$ is the total number of frames used for the PSQ estimation, and $P$ is the number of frames overlapped with the next PSQ estimation. In other words, the PSQ estimation is conducted whenever $(N-P)$ new frames are received from the opposite PDSS system. In addition, the estimated PLR, $\hat{L}(k)$, is obtained as a moving average of the previous PLR, $L(k-1)$, and the average PLR, $L(0:k-1)$, as

$$\hat{L}(k) = (1-\alpha)\,L(0:k-1) + \alpha\,L(k-1) \qquad (1)$$
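A sketch of this update, with the P.563 estimator left as an injected placeholder and $N$, $P$, and $\alpha$ set to the values reported in Section 4:

```python
N, P, ALPHA = 200, 150, 0.4   # values reported in Section 4

def update_estimates(speech_frames, plr_history, psq_estimator):
    """Call once every N-P newly received frames. psq_estimator stands in
    for the P.563 tool; plr_history[i] is the PLR L(i) of interval i."""
    q_hat = psq_estimator(speech_frames[-N:])               # PSQ in MOS
    # Average PLR over intervals 0..k-2 (0 when only one interval exists):
    l_avg = sum(plr_history[:-1]) / max(len(plr_history) - 1, 1)
    l_hat = (1 - ALPHA) * l_avg + ALPHA * plr_history[-1]   # Eq. (1)
    return q_hat, l_hat
```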
Finally, whether to request the RSD transmission is decided by comparing the estimated PSQ and PLR with their respective thresholds. That is, the request for the RSD transmission, $RSD(k)$, is set to true or false according to

$$RSD(k) = \begin{cases} \text{true}, & \text{if } \hat{Q}(k) \le Thres_1 \text{ and } \hat{L}(k) \ge Thres_2 \\ \text{false}, & \text{otherwise} \end{cases} \qquad (2)$$
where $\hat{Q}(k)$ is the estimated PSQ score, and $Thres_1$ and $Thres_2$ are the thresholds for $\hat{Q}(k)$ and $\hat{L}(k)$, respectively.

3.2 Scalable Speech Coding and RSD Transmission at the Sender Side

Fig. 5 shows how the scalable speech coding bitstreams and the RSD bitstreams are transmitted at the sender side in the proposed SEP scheme. First, given the feedback information transmitted from the opposite PDSS system, the sender side verifies the request for the RSD transmission and changes the bitrate of the scalable speech coding accordingly. In other words, when the RSD transmission is not requested, the bitrate is set to the highest value and the CSD bitstreams are encoded alone, with no additional RSD bitstreams. On the other hand, when the RSD transmission is requested, the bitrate is set lower than the current bitrate in order to free up bitrate for the RSD transmission; thus, both the CSD and RSD bitstreams are encoded. Finally, after the RTP payload format described in Section 2.2 is configured according to this adaptive RSD transmission, the RTP packets are transmitted to the opposite PDSS system. A compact sketch of the decision of Eq. (2) and the resulting bitrate control follows below.
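The following sketch combines the decision of Eq. (2) with the resulting sender-side bitrate selection; the thresholds and bitrates are the values given in Section 4, and the function names are illustrative.

```python
THRES1_MOS = 4.0   # Thres_1 (PSQ threshold, MOS)
THRES2_PLR = 5.0   # Thres_2 (PLR threshold, %)

def request_rsd(q_hat, l_hat):
    """Eq. (2): request RSD only when quality is low AND losses are high."""
    return q_hat <= THRES1_MOS and l_hat >= THRES2_PLR

def select_bitrates(rsd_requested):
    """Split the fixed bit budget instead of adding bandwidth."""
    if rsd_requested:
        return {"csd_kbps": 4.75, "rsd_kbps": 4.75}   # CSD + RSD
    return {"csd_kbps": 10.2, "rsd_kbps": None}       # CSD only, highest mode
```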
Fig. 5. Procedure of the scalable speech coding and the adaptive RSD transmission at the sender side
As described above, the proposed SEP scheme offers several advantages. First, adapting the packet loss recovery to the network conditions is effective, since burst packet losses generally occur when the network is congested due to a sudden increase in the amount of data entering the network [4]. Second, compared to conventional redundant data transmission (RDT) methods that require additional network overhead, the proposed SEP scheme generates redundant data without increasing the transmission bandwidth by controlling the bitrate of a scalable speech codec. Third, in order to estimate the network conditions, the proposed SEP scheme estimates the PSQ. This is motivated by the fact that the PSQ, measured as a mean opinion score (MOS), can be considered a clearer indicator of speech quality than other parameters in the PDSS system.
4 Performance Evaluation

In order to demonstrate the effectiveness of the proposed SEP scheme, the PDSS system was first implemented using the AMR-NB speech codec and ITU-T Recommendation P.563 as a scalable speech codec and a PSQ estimator, respectively. Here, the speech signals were sampled at 8 kHz and then encoded using the AMR-NB speech codec operated at 10.2 kbit/s. Thus, when the RSD transmission was needed, the bitrate of the CSD and RSD was set to 4.75 kbit/s each, i.e., almost half the bitrate of 10.2 kbit/s. Considering the requirements of ITU-T Recommendation P.563, N in Fig. 4 was set to 200 frames for the PSQ estimation, which corresponded to 4 seconds. Moreover, P was set to 150 frames; thus, the PSQ estimation was conducted whenever 50 new frames were received. For the PLR estimation, we evaluated the proposed SEP scheme with different values of α in Eq. (1) and then set α to 0.4. Similarly, the thresholds for the estimated PSQ and PLR in Eq. (2), Thres_1 and Thres_2, were set to 4.0 MOS and 5%, respectively.
In the test, 48 speech files from the NTT-AT speech database [12] were used, where each speech file was about 4 seconds long and sampled at a rate of 16 kHz. These speech signals were first filtered using a modified intermediate reference system (IRS) filter followed by an automatic level adjustment [13]. Then, the speech signals were down-sampled from 16 to 8 kHz. In order to show the effectiveness of the proposed SEP scheme under different PLRs including burst loss characteristics, we generated five different PLR patterns of 3, 5, 7, 9 and 11% by using the Gilbert-Elliott channel model defined in ITU-T Recommendation G.191 [13]. Here, the burstiness of the packet losses was set to 0.5, and the mean and maximum numbers of consecutive packet losses were measured as 1.5 and 4.0 frames, respectively.

The speech quality of the PDSS system with the proposed SEP scheme was compared to that of the PDSS system with regular RDT. The regular RDT was designed to transmit the RSD for every speech frame by encoding the CSD and RSD bitstreams at a bitrate of 4.75 kbit/s each, so that the comparison was made without increasing the transmission bandwidth. In addition, the speech quality of the PDSS system using the PLC algorithm alone, without the proposed SEP scheme or the regular RDT, was also evaluated, where the PLC algorithm was operated at the highest bitrate of 10.2 kbit/s. Note here that the PLC algorithm embedded in the AMR-NB speech decoder was always applied, regardless of the RSD transmission. As the evaluation method for the recovered speech quality, the perceptual evaluation of speech quality (PESQ) defined in ITU-T Recommendation P.862 [14] was used.

Table 1. Speech quality measured in MOS using PESQ for the different packet loss recovery methods, under PLRs ranging from 3 to 11%

                             MOS score per PLR (%)
Method                       0     3     5     7     9     11    Average
Without the regular RDT      3.70  3.16  2.97  2.83  2.61  2.52  2.96
With the regular RDT         3.14  3.07  2.97  2.91  2.80  2.78  2.94
With the proposed scheme     3.70  3.16  2.93  2.87  2.71  2.72  3.01
Table 1 compares the speech quality measured in MOS using PESQ for the different packet loss recovery methods under PLRs ranging from 3 to 11%. As shown in the table, at low PLRs the proposed SEP scheme preserved the speech quality of the PLC-only system without the regular RDT, while at high PLRs it provided better performance than the system without the regular RDT, as the regular RDT did. Consequently, the proposed SEP scheme yielded an average speech quality of 3.01 MOS, which was 0.07 MOS higher than that of the regular RDT.
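For reference, a loss pattern with a prescribed mean PLR and burstiness can be generated with a two-state Gilbert-Elliott model such as the one below. This is a generic re-implementation written for illustration, not the G.191 error insertion tool itself, and its parameterization (burstiness as one minus the sum of the two transition probabilities) is an assumption of this sketch.

```python
import random

def gilbert_elliott(n_frames, plr=0.05, gamma=0.5, seed=1):
    """Return a list of booleans (True = frame lost) whose stationary loss
    rate is `plr` and whose burstiness is `gamma` (gamma = 1 - p - q)."""
    rng = random.Random(seed)
    p = plr * (1.0 - gamma)          # P(good -> bad)
    q = (1.0 - plr) * (1.0 - gamma)  # P(bad -> good)
    bad, pattern = False, []
    for _ in range(n_frames):
        bad = (rng.random() >= q) if bad else (rng.random() < p)
        pattern.append(bad)
    return pattern
```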
5 Conclusion

In this paper, we proposed a new smart error protection (SEP) scheme that guarantees speech quality without increasing the transmission bandwidth of a portable digital speech streaming (PDSS) system. To this end, the proposed SEP scheme was
designed to transmit redundant speech data (RSD) according to the estimation results for the perceived speech quality (PSQ) and packet loss rate (PLR), where a single-ended speech quality assessment and a moving average method were used to estimate the PSQ and PLR, respectively. The proposed SEP scheme was applied to the receiver and sender sides of a PDSS system. In other words, the receiver side of the PDSS system first decided on the RSD transmission based on the estimated PSQ and PLR, and then sent feedback information on the decision result to the opposite PDSS system via the real-time transport protocol (RTP) packets used for the speech bitstreams. On the other hand, the sender side of the PDSS system controlled the RSD transmission according to the received feedback and subsequently adjusted the speech coding bitrate in order to maintain the same transmission bandwidth despite the RSD bitstreams. Finally, we evaluated the speech quality recovered by the proposed SEP scheme under packet losses and compared it with that of the conventional redundant data transmission (RDT) method. From the results, the proposed SEP scheme improved the average speech quality from 2.94 to 3.01 MOS compared with the conventional method for PLRs ranging from 3% to 11%. Consequently, the proposed SEP scheme can be applied to PDSS systems to efficiently mitigate the speech quality degradation caused by packet losses.

Acknowledgments. This work was supported in part by the "Fusion-Tech Developments for THz Information & Communications" Program of the Gwangju Institute of Science and Technology (GIST) in 2011, by the Mid-career Researcher Program through the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (No. 2010-0000135), and by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2010-C1090-1021-0007).
References

1. Wu, C.-F., Lee, C.-L., Chang, W.-W.: Perceptual-based playout mechanisms for multistream voice over IP networks. In: Proceedings of Interspeech, Antwerp, Belgium, pp. 1673–1676 (September 2007)
2. Zhang, Q., Wang, G., Xiong, Z., Zhou, J., Zhu, W.: Error robust scalable audio streaming over wireless IP networks. IEEE Transactions on Multimedia 6(6), 897–909 (2004)
3. Bolot, J.-C., Fosse-Parisis, S., Towsley, D.: Adaptive FEC-based error control for Internet telephony. In: Proceedings of IEEE International Conference on Computer Communications (INFOCOM), New York, NY, pp. 1453–1460 (March 1999)
4. Jiang, W., Schulzrinne, H.: Comparison and optimization of packet loss repair methods on VoIP perceived quality under bursty loss. In: Proceedings of 12th International Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV), Miami, FL, pp. 73–81 (May 2002)
5. Yung, C., Fu, H., Tsui, C., Cheng, R.S., George, D.: Unequal error protection for wireless transmission of MPEG audio. In: Proceedings of IEEE International Symposium on Circuits and Systems (ISCAS), Orlando, FL, pp. 342–345 (May 1999)
6. Hagenauer, J., Stockhammer, T.: Channel coding and transmission aspects for wireless multimedia. Proceedings of the IEEE 87, 1764–1777 (1999)
7. Ito, A., Konno, K., Makino, S.: Packet loss concealment for MDCT-based audio codec using correlation-based side information. International Journal of Innovative Computing, Information and Control 6, 3(B), 1347–1361 (2010)
8. ETSI 3GPP TS 26.101: Adaptive Multi-Rate (AMR) Speech Codec Frame Structure (January 2010)
9. ITU-T Recommendation P.563: Single-Ended Method for Objective Audio Quality Assessment in Narrow-Band Telephony Applications (May 2004)
10. IETF RFC 3267: Real-Time Transport Protocol (RTP) Payload Format and File Storage Format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs (June 2002)
11. IETF RFC 1889: RTP: A Transport Protocol for Real-Time Applications (January 1996)
12. NTT-AT: Multi-Lingual Speech Database for Telephonometry (1994)
13. ITU-T Recommendation G.191: Software Tools for Speech and Audio Coding Standardization (November 1996)
14. ITU-T Recommendation P.862: Perceptual Evaluation of Speech Quality (PESQ), an Objective Method for End-to-End Speech Quality Assessment of Narrowband Telephone Networks and Speech Codecs (February 2001)
MDCT-Domain Packet Loss Concealment for Scalable Wideband Speech Coding

Nam In Park and Hong Kook Kim
School of Information and Communications
Gwangju Institute of Science and Technology (GIST), Gwangju 500-712, Korea
{naminpark,hongkook}@gist.ac.kr
Abstract. In this paper, we propose a modified discrete cosine transform (MDCT) based packet loss concealment (PLC) algorithm to improve the quality of decoded speech when a packet loss occurs in scalable wideband speech coders that use MDCT coefficients as spectral parameters. The proposed PLC algorithm works by smoothing the MDCT coefficients between the low and high bands. In G.729.1, a typical scalable wideband speech coder standardized by ITU-T, two different PLC algorithms are applied to the low band and the high band, in the time and frequency domains, respectively. Thus, the MDCT coefficients around the boundary between the low and high bands can be mismatched. The proposed PLC algorithm replaces the PLC algorithm applied to the high band and compensates for this mismatch in the MDCT domain at the boundary. Finally, we compare the performance of the proposed PLC algorithm with that of the PLC algorithm employed in G.729.1 by means of perceptual evaluation of speech quality (PESQ), an A-B preference test, and a waveform comparison under different random and burst packet loss conditions. It is shown from the experiments that the proposed PLC algorithm provides significantly better speech quality than the PLC of G.729.1.

Keywords: Packet loss concealment (PLC), wideband speech coding, modified discrete cosine transform (MDCT), G.729.1.
1 Introduction

With the increasingly popular use of the Internet, IP telephony devices such as voice over IP (VoIP) and voice over WiFi (VoWiFi) phones have attracted wide attention for speech communications. In IP phone services, speech packets are typically transmitted using a real-time transport protocol/user datagram protocol (RTP/UDP), though RTP/UDP does not verify whether the transmitted packets are correctly received [1]. Due to the nature of this type of transmission, the packet loss rate becomes higher as the network becomes congested. In addition, depending on the network resources, the possibility of burst packet losses also increases, potentially resulting in severe quality degradation of the reconstructed speech [2].

Most speech coders in use today are based on telephone-bandwidth narrowband speech, nominally limited to about 200-3400 Hz and sampled at a rate of 8 kHz. On
the contrary, wideband speech coders have been developed for the purpose of smoothly migrating from narrowband quality to wideband quality (50-7,000 Hz) at a sampling rate of 16 kHz in order to improve the speech quality of services. That is, voice services using wideband speech not only increase the intelligibility and naturalness of speech, but also add a feeling of transparent communication. In particular, G.729.1, a scalable wideband speech coder, improves the quality of speech by encoding the frequency bands left out by the narrowband speech coder, G.729. Therefore, the encoding of wideband speech in G.729.1 is performed with two different approaches, applied to the low band and the high band in the time and frequency domains, respectively. In particular, when a frame loss occurs, the low-band and high-band PLC algorithms work separately. In other words, the low-band PLC algorithm reconstructs the excitation and spectral parameters of the lost frame from the last good frame. Likewise, the high-band PLC algorithm reconstructs the spectral parameters of the lost frame, typically modified discrete cosine transform (MDCT) coefficients, from the last good frame. Of course, the excitation of the low band could be used for reconstructing the excitation of the high band if bandwidth extension from the low band to the high band is employed in wideband speech decoding [3][4].

In this paper, we propose an MDCT-based packet loss concealment (PLC) algorithm in order to improve the quality of decoded speech when a packet loss occurs in a scalable wideband speech coder using MDCT coefficients as spectral parameters. Here, we select ITU-T Recommendation G.729.1, a typical MDCT-based scalable wideband speech coder employing G.729 as its narrowband speech coder. Since the two PLC algorithms in G.729.1 are applied to the low band and the high band in the time and frequency domains, respectively, the MDCT coefficients around the boundary between the low and high bands can be mismatched. The proposed PLC algorithm replaces the PLC algorithm applied to the high band and compensates for this mismatch in the MDCT domain at the boundary.

The remainder of this paper is organized as follows. Following this introduction, Section 2 describes the conventional PLC algorithm currently employed in the G.729.1 decoder [5]. After that, Section 3 proposes an MDCT-based PLC algorithm that replaces the conventional PLC algorithm in G.729.1. Section 4 then demonstrates the performance of the proposed PLC algorithm, and this paper is concluded in Section 5.
2 Conventional PLC Algorithm

The PLC algorithm employed in the G.729.1 standard reconstructs the speech signal of the current frame based on previously received speech parameters, such as the excitation in the low band and the MDCT coefficients in the high band, as shown in Fig. 1 [5]. In other words, the PLC algorithm replaces the missing excitation and MDCT coefficients with equivalent characteristics from a previously received frame, while the excitation energy gradually decays. In addition, for frame error concealment it uses a voicing classifier based on the parameter C, which classifies the signal as voiced, unvoiced, transition, or silence. During the frame error concealment process for the low band, the gain or the energy parameter, E, is adjusted according to C. Next, the synthesis filter for the lost frame uses the linear predictive coding (LPC) coefficients of the last good frame. Similarly, the pitch period of the lost frame uses the integer part of the pitch period from the previous frame. To avoid desynchronization, the phase parameter, Presync, is used for recovery after a lost voice onset period.
Fig. 1. Overview of the G.729.1 PLC algorithm
During the frame error concealment process for the high band, the high-band signal of the previous good frame is regenerated by the time-domain bandwidth extension (TDBWE) using the excitation generated by the low-band PLC. Next, the MDCT coefficients of the previous good frame are used to generate the high-band signals. Finally, the decoded speech for the lost frame is generated through quadrature mirror filter (QMF) synthesis, which uses 64-tap filter coefficients.
3 Proposed PLC Algorithm

As shown in Fig. 1, the PLC algorithm of the G.729.1 speech coder consists of low-band and high-band PLCs separated at a 4-kHz boundary frequency. Note that the PLC algorithm regenerating the low-band signal is executed in the time domain, whereas the PLC algorithm for the high-band signal is executed in the frequency domain. Since different PLC algorithms are applied to the low and high bands, a frequency mismatch [7] could occur.
Contrary to this conventional PLC algorithm, the proposed algorithm is designed as shown in Fig. 2. In the figure, the PLC of the low-band signal is equivalent to the conventional PLC algorithm. On the other hand, in the high-band PLC, the synthesized low-band signal is transformed into the frequency domain by using the MDCT in order to smooth the boundary frequency between the low and high bands. In this case, the weighting filter [8] for smoothing is given by

$$S'_{high}(k) = 0.6 \cdot S_{high}(k) + 0.4 \cdot S_{avg}, \quad k = 0, 1, \ldots, 39 \qquad (1)$$

where $S_{low}(k)$, $S_{high}(k)$, and $S'_{high}(k)$ are the low-band signal, the high-band signal, and the smoothed high-band signal in the MDCT frequency domain, respectively, and $k$ is the MDCT frequency bin with a range of 0 to 159. $S_{avg}$ is the frequency average at the boundary frequency (4 kHz), defined as $S_{avg} = \frac{1}{2}\left(S_{low}(159) + S_{high}(0)\right)$. After passing through the weighting filter, the smoothed signal is transformed back into the time domain by applying an inverse MDCT (IMDCT), as shown in the figure. Finally, the decoded speech for the lost frame is recovered through QMF synthesis, which uses 64-tap filter coefficients.
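To make Eq. (1) concrete, a minimal Python sketch of the boundary smoothing is given below; the 160-bin band size follows the text, while the function itself is an illustrative assumption rather than the G.729.1 reference implementation.

```python
import numpy as np

def smooth_highband(s_low, s_high, n_smooth=40):
    """Smooth concealed high-band MDCT coefficients at the band boundary (Eq. (1)).

    s_low, s_high: the 160 low-band and 160 high-band MDCT coefficients
    of the concealed frame.
    """
    s_avg = 0.5 * (s_low[-1] + s_high[0])        # average at the 4-kHz boundary
    s_out = np.asarray(s_high, dtype=float).copy()
    # Eq. (1): S'_high(k) = 0.6*S_high(k) + 0.4*S_avg for k = 0..39
    s_out[:n_smooth] = 0.6 * s_out[:n_smooth] + 0.4 * s_avg
    return s_out
```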
Fig. 2. Overview of the proposed PLC algorithm
4 Performance Evaluation
To evaluate the performance of the proposed PLC algorithm, we replaced the PLC algorithm currently employed in G.729.1 [5] with the proposed one and then obtained perceptual evaluation of speech quality (PESQ) scores according to ITU-T Recommendation P.862 [8]. For the PESQ test, 76 audio files were taken from the SQAM audio database [9] and processed by G.729.1 using the proposed PLC algorithm under different packet loss conditions. The performance was also compared to that of the PLC algorithm employed in G.729.1, referred to here as G.729.1-PLC. We simulated two different packet loss conditions: random and burst packet losses. During these simulations, packet loss rates of 3, 5, and 8% were generated by the Gilbert-Elliott model defined in ITU-T Recommendation G.191 [10]. Under the burst packet loss condition, the burstiness of the packet losses was set to 0.66; the mean and maximum numbers of consecutive packet losses were then measured at 1.5 and 3.7 frames, respectively.
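The loss patterns themselves were produced with the G.191 software tools; as a rough illustration of how such patterns can be generated, the sketch below implements a two-state Gilbert-Elliott model, where the interpretation of the burstiness parameter as the extra probability of remaining in the loss state is an assumption, not the G.191 definition.

```python
import numpy as np

def gilbert_elliott_losses(n_frames, loss_rate, burstiness, seed=0):
    """Generate a frame-loss pattern with a two-state Markov (Gilbert) model.

    loss_rate:  target average loss probability (e.g., 0.03 for 3%).
    burstiness: correlation of consecutive losses; 0 yields random
                (Bernoulli) losses, larger values yield longer bursts.
    """
    rng = np.random.default_rng(seed)
    p_loss_after_loss = burstiness + (1.0 - burstiness) * loss_rate
    p_loss_after_good = (1.0 - burstiness) * loss_rate
    lost = np.zeros(n_frames, dtype=bool)
    for n in range(1, n_frames):
        p = p_loss_after_loss if lost[n - 1] else p_loss_after_good
        lost[n] = rng.random() < p
    return lost

# e.g., 3% losses with burstiness 0.66, as in the simulations above
pattern = gilbert_elliott_losses(10000, 0.03, 0.66)
```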
Fig. 3. Comparison of PESQ scores of the proposed PLC and G.729.1-PLC for (a) speech data and (b) music data under single packet loss conditions, and for (c) speech data and (d) music data under burst packet loss conditions
Fig. 3 compares the PESQ scores when the proposed PLC and G.729.1-PLC were employed in G.729.1 under single packet loss conditions and under burst packet loss conditions at a packet loss rate of 3%. The figure shows that the proposed PLC algorithm had PESQ scores comparable to those of the G.729.1-PLC algorithm for all conditions in the case of the speech data. However, the
effectiveness of the proposed PLC algorithm became apparent when packet losses occurred in audio data such as music. In order to evaluate the subjective performance, we performed an A-B preference listening test in which 6 speech files (3 male and 3 female) and 2 music files were processed by both G.729.1-PLC and the proposed PLC under random and burst packet loss conditions. Table 1 shows the A-B preference test results. As shown in the table, the proposed PLC was significantly preferred over G.729.1-PLC.

Table 1. A-B preference test results
Burstiness / Packet loss rate         Preference Score (%)
                                 G.729.1-PLC   No difference   Proposed PLC
γ = 0.0 (random)    3%              15.62          46.88          37.50
                    5%              12.08          56.67          31.25
                    8%              21.88          45.31          32.81
γ = 0.66            3%              18.75          51.56          29.69
                    5%              14.06          54.69          31.25
                    8%              15.63          57.81          26.56
Average                             16.34          52.15          31.51
Fig. 4. Waveform comparison: (a) original waveform, (b) decoded speech signal with no packet loss, and reconstructed speech signals using (c) packet error patterns, (d) G.729.1-PLC, and (e) proposed PLC
Finally, Fig. 4 compares the waveforms of speech reconstructed by the different PLC algorithms. Figs. 4(a) and 4(b) show the original speech waveform and the decoded speech waveform with no packet loss, respectively. After applying the packet error pattern (expressed as a solid box in Fig. 4(c)), the proposed PLC (Fig. 4(e)) reconstructed the speech signal better than G.729.1-PLC (Fig. 4(d)).
5 Conclusion

In this paper, we proposed a packet loss concealment algorithm for the G.729.1 speech coder to improve speech quality when frame erasures or packet losses occur. To this end, we proposed an MDCT-based approach in which the MDCT coefficients between the low and high bands were smoothed in order to improve the quality of decoded speech. Next, we evaluated the performance of the proposed PLC algorithm on G.729.1 under random and burst packet loss rates of 3, 5, and 8%, and compared it with that of the PLC algorithm already employed in G.729.1 (G.729.1-PLC). The comparison of PESQ scores, A-B preferences, and waveforms showed that the proposed PLC algorithm provided similar or better speech quality than G.729.1-PLC for all the simulated conditions.

Acknowledgments. This work was supported in part by the “Fusion-Tech Developments for THz Information & Communications” Program of the Gwangju Institute of Science and Technology (GIST) in 2011, by the Mid-Career Researcher Program through an NRF grant funded by MEST, Korea (No. 2010-0000135), and by the Ministry of Knowledge Economy (MKE), Korea, under the Information Technology Research Center (ITRC) support program supervised by the National IT Industry Promotion Agency (NIPA) (NIPA-2010-C1090-1021-0007).
References

1. Goode, B.: Voice over internet protocol (VoIP). Proceedings of the IEEE 90(9), 1495–1517 (2002)
2. Jian, W., Schulzrinne, H.: Comparison and optimization of packet loss repair methods on VoIP perceived quality under bursty loss. In: Proceedings of NOSSDAV, pp. 73–81 (2002)
3. Gournay, P., Rousseau, F., Lefebvre, R.: Improved packet loss recovery using late frames for prediction-based speech coders. In: Proceedings of ICASSP, pp. 108–111 (2003)
4. Vaillancourt, T., Jelinek, M., Salami, R., Lefebvre, R.: Efficient frame erasure concealment in predictive speech codecs using glottal pulse resynchronisation. In: Proceedings of ICASSP, pp. 1113–1116 (2007)
5. Ragot, S., Kovesi, B., Trilling, R., Virette, D., Duc, N., Massaloux, D., Proust, S., Geiser, B., Gartner, M., Schandl, S., Taddei, H., Yang, G., Shlomot, E., Ehara, H., Yoshida, K., Vaillancourt, T., Salami, R., Lee, M.S., Kim, D.Y.: ITU-T G.729.1: an 8-32 kbit/s scalable coder interoperable with G.729 for wideband telephony and voice over IP. In: Proceedings of ICASSP, pp. 529–532 (2007)
6. Taleb, A., Sandgren, P., Johansson, I., Enstrom, D., Bruhn, S.: Partial spectral loss concealment in transform coders. In: Proceedings of ICASSP, pp. 185–188 (2005)
7. ETSI ES 202 050, v1.1.3: Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Advanced Front-End Feature Extraction Algorithm; Compression Algorithm (2003)
8. ITU-T Recommendation P.862: Perceptual Evaluation of Speech Quality (PESQ), an Objective Method for End-to-End Speech Quality Assessment of Narrowband Telephone Networks and Speech Coders (2001)
9. EBU Tech Document 3253: Sound Quality Assessment Material, SQAM (1998)
10. ITU-T Recommendation G.191: Software Tools for Speech and Audio Coding Standardization (2000)
High-Quality and Low-Complexity Real-Time Voice Changing with Seamless Switching for Digital Imaging Devices Sung Dong Jo1, Young Han Lee1, Ji Hun Park1, Hong Kook Kim1, Ji Woon Kim2, and Myeong Bo Kim2 1
School of Information and Communications Gwangju Institute of Science and Technology (GIST), Gwangju 500-712, Korea {sdjo,cpumaker,jh_park,hongkook}@gist.ac.kr 2 Camcorder Business Team, Digital Imaging Business Samsung Electronics, Suwon-si, Gyeonggi-do 443-742, Korea {jiwoon.kim,kmbo.kim}@samsung.com
Abstract. In this paper, we propose a voice changing method that provides a seamless switchable function with low computational complexity for digital imaging devices. The proposed method combines a waveform similarity overlap-and-add (WSOLA) algorithm with a sampling rate changing technique that operates in the time domain. In addition, the proposed method includes a noise reduction technique in the region where the voice changing mode switches from on to off, and vice versa. We finally compare the performance of the proposed method with that of a conventional one in terms of processing time and speech quality. The experiments show that the proposed voice changing method gives a relative complexity reduction of 84.5% on a resource-constrained device having an ARM processor and is preferred over the conventional method by 76% of listeners. Keywords: Voice changing, time-scale modification, waveform similarity overlap-and-add (WSOLA), sampling rate change, digital imaging device.
1 Introduction

Voice changing is a process of transforming the characteristics of speech uttered by a speaker such that a listener would not recognize the speaker [1]. This technique is useful for disguising the speaker's voice. Applications of voice changing are numerous, including text-to-speech synthesis based on acoustical unit concatenation, transformation of voice characteristics to disguise the speaker's voice, foreign language learning, audio monitoring, film/soundtrack post-synchronization, and so on [2]. The basic mechanisms behind the voice changing process consist of source modification, filter modification, and the combination of source and filter modification [3]. Source modification modifies the prosody of the speech signal and is thus usually referred to as prosodic modification. Source modification is classified into three types: time-scale modification (TSM), pitch-scale modification (PSM),
and energy modification [3]. On the other hand, filter modification modifies the magnitude spectrum of the frequency response of the vocal tract system [4]. The fundamental frequency of the vocal fold vibration is denoted as F0, and the perceptual feature of speech corresponding to F0 is often called “pitch” [5]. Because the pitch periods generally associated with male speakers are quite different from those associated with female speakers, a modification of speech is related to controlling the pitch periods, i.e., PSM. However, a PSM algorithm that changes the apparent gender of the speaker also affects the articulation rate of the original signal [5]. Therefore, this can be compensated by matching the original articulation rate if we use TSM instead of PSM for the voice changing. TSM is a technique used to modify the duration of speech or audio signals while minimizing the distortion of other important characteristics such as pitch and timbre [2]. TSM algorithms have been widely used in the fields of speech and audio signal processing. For example, TSM has been used as preprocessing in speech recognition systems to improve the recognition rate [6]. Also, TSM can be applied to speech synthesis systems in order to produce more natural sounds [7].

In this paper, we propose a voice changing method for real-time implementation on a digital imaging device. The proposed voice changing method combines a waveform similarity overlap-and-add (WSOLA) TSM algorithm [8] with a sampling rate changing technique that operates in the time domain. The main concern of the proposed method is how to minimize the computational complexity, since the resources of a digital imaging device are highly limited. In addition, since voice changing can be switched on or off in the device, a seamless on/off mode control with low computational complexity should be available. To this end, we focus on reducing the complexity of the WSOLA algorithm as well as on an efficient control of the voice changing mode. Next, we incorporate a noise reduction technique for when the voice changing mode is changed from on to off or vice versa.

The organization of this paper is as follows. Following this introduction, we briefly review conventional voice changing methods in Section 2. After that, we propose a voice changing method in Section 3 and discuss how to reduce its computational complexity in Section 4. In Section 5, we compare the performance of the proposed method with that of a conventional one in terms of computational complexity and speech quality. Finally, we conclude this paper in Section 6.
2 Basic Mechanisms of Voice Changing Methods

Voice changing methods modify the sound that is produced when a person speaks or sings. In this section, we briefly review the basic mechanisms, which consist of source modification, filter modification, and the combination of source and filter modifications.

2.1 Source Modification

Source modification deals with the modification of prosodic aspects of speech such as rhythm, intonation, and stress, and is classified into TSM, PSM, or energy modification, as mentioned in Section 1.
First of all, TSM enables us to change the apparent rate of articulation without affecting the perceptual quality of the original speech [2]. This means that the formant structure is changed at a slower or faster rate than that of the input speech. Thus, TSM has been used to improve the compression rate in speech and audio coding [7]. Over the last several decades, various forms of TSM algorithms have been developed. Among them, synchronized overlap-and-add (SOLA) [9], pitch synchronous overlap-and-add (PSOLA) [10], and waveform similarity overlap-and-add (WSOLA) [8] show relatively good performance in terms of output quality. Second, PSM alters the fundamental frequency in order to compress or expand the harmonic components in the spectrum while preserving the short-time spectral envelope as well as the time evolution [3]. However, it changes the local pitch, which affects the original articulation rate, whereas TSM maintains the original articulation rate. Third, energy modification modifies the perceived loudness of the input speech [3]. It is considered the simplest source modification; for example, signals are simply multiplied by a scale factor to amplify or attenuate them.

2.2 Filter Modification

It is widely accepted that the magnitude spectrum carries information about a speaker's individuality. Therefore, by modifying the magnitude spectrum of the vocal tract, speaker identity can be controlled [3]. There are two types of filter modification: one without a specific target and the other with a specific target [3]. In the case where there is no specific target, the magnitude spectrum is modified in a general way without a specific target speaker; for example, we may want to modify the overall quality of a speech signal produced by a female voice so that it sounds as if it were produced by an older female speaker. In the case where there is a specific target, the filter of a source speaker is modified such that the modified filter approximates the characteristics of the filter of the targeted speaker. Usually, we refer to this type of modification as voice conversion. To obtain the transformed spectrum, a learning process uses the source and target spectra to make the transformed spectrum equal to the average spectrum of the target during training [3][4].

2.3 Combination of Source and Filter Modification

The prosodic characteristics of a speaker are a critical cue for the identification of the speaker, and the vocal tract characteristics are also important for identification. Therefore, if we want to modify the voice of a speaker so that it sounds like the voice of another speaker, prosody and vocal tract modifications should be combined. If a specific target speaker is provided, this becomes another type of voice conversion; if no specific target is provided, it is usually referred to as voice modification [3]. Voice morphing is one example of combined source and filter modification. For example, the same sentences are uttered by two source speakers, and then a third speaker having characteristics of both source speakers can be generated by applying dynamic time warping (DTW) [11] between the two sentences.
3 Proposed Voice Changing Method

In this section, we propose a voice changing method using WSOLA with reduced computational complexity. In addition, the proposed voice changing method includes a noise reduction technique, which is useful for the transient region where voice changing switches from on to off or vice versa. Fig. 1 shows the procedure of the proposed method, where the input signals are stereo, sampled at a rate of 32 kHz with 16-bit resolution.
Fig. 1. Procedure of the proposed voice changing method
As shown in the figure, the proposed voice changing method mainly consists of a sampling rate changing block and a WSOLA-based TSM block. The sampling rate changing block changes the pitch of the input signals by increasing or decreasing the number of input samples according to a given voice changing level. However, this block affects the articulation rate of the original signal, as PSM does. In order to compensate for the mismatched articulation rate, the WSOLA-based TSM algorithm is applied to the signals from the sampling rate changing block, as explained in Section 3.1.

The proposed voice changing method is designed to provide a real-time seamless voice changing on/off mode. When the mode is on, the WSOLA algorithm is fully applied. On the other hand, when the mode is off, we simply apply an overlap-and-add technique instead of the WSOLA algorithm. When the mode switches from on to off or vice versa, a seamless on/off mode control is required due to noise in such a transient region in the time domain, which will be discussed in Section 3.2.

3.1 WSOLA-Based Time-Scale Modification

The WSOLA-based TSM algorithm, or simply the WSOLA algorithm, uses a waveform similarity measure to eliminate the phase distortion of overlap-and-add [8]. The
WSOLA algorithm determines the best spot at which an input frame and a reference frame should be overlapped in order to maintain the natural continuity of the input signal. The synthesis equation of the WSOLA algorithm is defined as [8]

$$y(n) = \frac{\sum_k v(n - k \cdot D)\, x(n + k \cdot D \cdot \alpha - k \cdot D + \Delta_k)}{\sum_k v(n - k \cdot D)} \qquad (1)$$

where $x(n)$, $y(n)$, and $v(n)$ are an input signal, its corresponding time-scaled output signal, and a window signal, respectively. In addition, $D$ indicates the overlap-and-add (OLA) length and $\alpha$ indicates a time-scale factor. If $\alpha$ is less than 1, the output signal is time-expanded; otherwise, the output signal is time-compressed. In Eq. (1), $\Delta_k$ represents the optimal shift of the $k$-th frame, measured as

$$\Delta_k = \arg\max_{\Delta} \left[ corr(R(n), C_\Delta(n)) \right] \qquad (2)$$

where $corr(R(n), C_\Delta(n))$ represents the normalized cross-correlation between the reference signal, $R(n)$, and a candidate signal, $C_\Delta(n)$, over a search range $-\Delta_{max} \leq \Delta \leq +\Delta_{max}$. That is, the normalized cross-correlation is represented as

$$corr(R(n), C_\Delta(n)) = \frac{\sum_{n=0}^{2L-1} R(n)\, C_\Delta(n)}{\sqrt{\sum_{n=0}^{2L-1} R^2(n) \sum_{n=0}^{2L-1} C_\Delta^2(n)}} \qquad (3)$$

where $R(n) = v(n - (k-1)D)\, x(n + (k-1) \cdot D \cdot \alpha + \Delta_{k-1} + D)$ and $C_\Delta(n) = v(n - k \cdot D)\, x(n + k \cdot D \cdot \alpha + \Delta)$.
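As an illustration of Eqs. (1)-(3), the following simplified Python sketch searches, for each synthesis frame, the shift that maximizes the normalized cross-correlation and then overlap-adds the selected segment; the frame, overlap, and search-range values are assumptions for illustration rather than the parameters of the actual implementation.

```python
import numpy as np

def wsola(x, alpha, frame=1024, overlap=512, search=160):
    """Simplified WSOLA time-scale modification (cf. Eqs. (1)-(3))."""
    x = np.asarray(x, dtype=float)
    win = np.hanning(2 * overlap)
    out = list(x[:frame])
    k = 1
    while True:
        center = int(k * (frame - overlap) * alpha)   # nominal analysis position
        lo, hi = max(center - search, 0), min(center + search, len(x) - frame)
        if lo >= hi:
            break
        ref = np.asarray(out[-overlap:])              # reference: tail of the output
        # Eq. (2)/(3): pick the shift that maximizes normalized cross-correlation
        best, best_corr = lo, -np.inf
        for cand in range(lo, hi):
            seg = x[cand:cand + overlap]
            corr = float(seg @ ref) / (np.linalg.norm(seg) * np.linalg.norm(ref) + 1e-12)
            if corr > best_corr:
                best, best_corr = cand, corr
        seg = x[best:best + frame]
        for i in range(overlap):                      # windowed overlap-add (Eq. (1))
            out[-overlap + i] = out[-overlap + i] * win[overlap + i] + seg[i] * win[i]
        out.extend(seg[overlap:])
        k += 1
    return np.asarray(out)
```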
3.2 Noise Reduction between On/Off Transitions
In order to remove the noise that arises when the voice changing mode is switched from on to off or vice versa, a Hanning window is applied to both the original signal stored in the buffer and the voice-changed signal produced by the WSOLA algorithm. The two signals are then merged by the following equation:

$$s'(n) = \begin{cases} s_{VE}(n)\, w(N/2 + n) + s_{BUF}(n)\, w(n), & n < N/2 \\ s_{BUF}(n), & n \geq N/2 \end{cases} \qquad (4)$$

where $w(n) = 0.5\left(1 - \cos\left(\frac{2\pi n}{N-1}\right)\right)$ and the window length, $N$, is set to half the output frame length. In Eq. (4), $s_{VE}(n)$ is the $n$-th sample of the voice-changed signal processed by the WSOLA algorithm, $s_{BUF}(n)$ is the $n$-th sample of the signal stored in the buffer, and $s'(n)$ is the $n$-th sample of the output signal.
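A direct Python transcription of Eq. (4) might look as follows; the assumption that the crossfade spans exactly the first half of the window follows the case split above, and the surrounding buffer management is simplified.

```python
import numpy as np

def merge_on_off(s_ve, s_buf):
    """Merge the voice-changed and buffered signals at a mode switch (Eq. (4)).

    s_ve:  output frame of the WSOLA-based voice changer.
    s_buf: the corresponding original frame stored in the buffer.
    """
    n_win = len(s_buf) // 2                                   # window length N: half the frame
    n = np.arange(n_win)
    w = 0.5 * (1.0 - np.cos(2.0 * np.pi * n / (n_win - 1)))   # Hanning window w(n)
    half = n_win // 2
    out = np.asarray(s_buf, dtype=float).copy()
    # n < N/2: fade the voice-changed signal out while the buffered one fades in
    out[:half] = s_ve[:half] * w[half:2 * half] + s_buf[:half] * w[:half]
    return out
```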
Fig. 2 compares the spectrograms of a voice-changed speech signal processed by a conventional method and by the proposed method with the noise reduction technique in the transient regions. As shown in Fig. 2(a), noise can be observed in the spectrum at the boundary where the voice changing mode changes from on to off; in Fig. 2(b), this noise has disappeared.
Fig. 2. Comparison of spectrograms processed by (a) the conventional method and (b) the proposed method
4 Complexity Reduction for Efficient Implementation on a Resource-Constrained Device

In this section, we discuss a method of reducing the computational complexity for real-time implementation of the proposed voice changing method on a resource-constrained device. In particular, we control the processing steps in the WSOLA algorithm when the voice changing mode is switched on or off. Fig. 3 shows a flowchart of the complexity reduction method. First of all, input signals are segmented into frames, and the proposed voice changing method is initialized for every frame. After that, one frame of the signal is stored in a buffer, where the buffer size is equal to the algorithmic delay. When the voice changing mode is on, the WSOLA algorithm is fully applied to the input signal: the cross-correlation between the signals from the present frame and the next frame is calculated, and the optimal shift spot is found to maintain natural continuity. However, when the voice changing mode is off, we skip the computations for both the cross-correlation calculation and the optimal shift search in the WSOLA algorithm, and only the overlap-and-add technique is applied to the input frame signal stored in the buffer. This procedure enables us to reduce the computational complexity when voice changing is off, as sketched below.
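In code form, the control amounts to a per-frame branch; the sketch below is only an assumed outline of the flowchart, not the implemented firmware, and the WSOLA processing is passed in as a callable so the cheap path stays self-contained.

```python
import numpy as np

def overlap_add_frame(prev_tail, frame, overlap=512):
    """Plain overlap-add applied when the voice changing mode is off.
    prev_tail holds the last `overlap` samples of the previous output frame."""
    win = np.hanning(2 * overlap)
    out = np.asarray(frame, dtype=float).copy()
    out[:overlap] = prev_tail * win[overlap:] + frame[:overlap] * win[:overlap]
    return out

def process_frame(frame, prev_tail, mode_on, voice_change_frame):
    """Per-frame on/off control (cf. Fig. 3).

    voice_change_frame: a callable running the full WSOLA processing,
    i.e., the cross-correlation search, the optimal shift, and overlap-add.
    When the mode is off, those computations are skipped entirely.
    """
    if mode_on:
        return voice_change_frame(frame, prev_tail)    # full WSOLA path
    return overlap_add_frame(prev_tail, frame)         # cheap OLA-only path
```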
Fig. 3. Flowchart for reducing the computational complexity of the proposed voice changing method with an on/off control
5 Performance Evaluation

In this section, we evaluate the performance of the proposed voice changing method in terms of processing time and speech quality after implementing it on a resource-constrained device. Table 1 describes the specifications of the device: it is equipped with an embedded operating system (OS) and an ARM processor with a CPU clock of 133 MHz and a RAM size of 17 MB. Table 2 compares the computational complexity of the conventional method and the proposed method for each voice changing mode. Here, the average processing time was measured using 300 seconds of stereo speech data sampled at a rate of 32 kHz. As shown in the table, the proposed method yielded a processing time reduction of 84.5% when the mode was off, compared to when the mode was on. Table 3 shows the preference test results of the proposed method over the conventional method. For this test, five people with no hearing disabilities participated. Two speech files processed by the conventional method and the proposed
method were presented to the participants, and the participants were asked to choose their preference. If they felt no difference between the two files, they were guided to select ‘no difference.’ As shown in the table, the proposed method was preferred by 76% over the conventional method.

Table 1. Specifications of the resource-constrained device on which the proposed voice changing method is implemented

Item          Specification
Embedded OS   ARM
CPU Clock     133 MHz
RAM           17 MB
Table 2. Computational complexity comparison between the conventional method and the proposed method when the voice changing mode is on or off

Item                                   Conventional method   Proposed method
Average processing time at on mode     11.82 ms              11.81 ms
Average processing time at off mode    11.81 ms              1.83 ms
Percentage of complexity reduction     0.08%                 84.5%
Table 3. Preference test results

Preference   Conventional method   No difference   Proposed method
             4%                    20%             76%
6 Conclusion

In this paper, we proposed a voice changing method that provides a seamless switchable function with low computational complexity for digital imaging devices. The proposed method combined the WSOLA algorithm with a sampling rate changing technique that operated in the time domain. In order to reduce the computational complexity, we controlled the processing steps in the WSOLA algorithm according to the voice changing mode. In addition, the proposed method was designed to include a noise reduction technique in the region where the voice changing mode changed from on to off or vice versa. We compared the performance of the proposed method with that of a conventional one in terms of processing time and speech quality. The experiments showed that the proposed voice changing method gave a relative complexity reduction of 84.5% on an ARM processor and was preferred over the conventional method by 76%.

Acknowledgments. This work was supported in part by Samsung Electronics Co., and by the Global Frontier project (No. 2010-0029751) of MEST in Korea.
References

1. Salor, Ö., Demirekler, M.: Dynamic programming approach to voice transformation. Speech Communication 48(10), 1262–1272 (2006)
2. Moulines, E., Laroche, J.: Non-parametric techniques for pitch-scale and time-scale modification of speech. Speech Communication 16(2), 175–205 (1995)
3. Stylianou, Y.: Voice transformation: a survey. In: Proceedings of ICASSP, pp. 3585–3588 (2009)
4. Benesty, J., Sondhi, M., Huang, Y.: Handbook of Speech Processing. Springer, Heidelberg (2007)
5. Vergin, R., O’Shaughnessy, D., Farhat, A.: Time domain technique for pitch modification and robust voice transformation. In: Proceedings of ICASSP, pp. 947–950 (1997)
6. Roucos, S., Wilgus, A.: High quality time-scale modification for speech. In: Proceedings of ICASSP, pp. 493–496 (1986)
7. Wayman, J., Wilson, D.: Some improvements on the synchronized-overlap-add method of time scale modification for use in real-time speech compression and noise filtering. IEEE Transactions on Acoustics, Speech, and Signal Processing 36(1), 139–140 (1988)
8. Verhelst, W., Roelands, M.: An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech. In: Proceedings of ICASSP, pp. 554–557 (1993)
9. Hardam, E.: High quality time scale modification of speech signals using fast synchronized overlap add algorithms. In: Proceedings of ICASSP, pp. 409–412 (1990)
10. Moulines, E., Charpentier, F.: Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Communication 9(5-6), 453–467 (1990)
11. Keogh, E., Pazzani, M.: Derivative dynamic time warping. In: Proceedings of the 1st SIAM International Conference on Data Mining, pp. 1–11 (2001)
Complexity Reduction of Virtual Reverberation Filtering Based on Index-Based Convolution for Resource-Constrained Devices Kwang Myung Jeon1, Nam In Park1, Hong Kook Kim1, Ji Woon Kim2, and Myeong Bo Kim2 1
School of Information and Communications Gwangju Institute of Science and Technology (GIST), Gwangju 500-712, Korea {kmjeon,naminpark,hongkook}@gist.ac.kr 2 Camcorder Business Team, Digital Imaging Business Samsung Electronics, Suwon-si, Gyeonggi-do 443-742, Korea {jiwoon.kim,kmbo.kim}@samsung.com
Abstract. Virtual reverberation effects are a vital part of virtual audio reality. Reverberation effects can be applied directly by implementing a convolution process between the input audio and a reverberation filter response that characterizes a virtual space. However, applying reverberation effects in practice requires additional or dedicated processors due to the excessively long impulse response of the reverberation filter. In this paper, we propose a fast method for applying virtual reverberation effects based on a reverberation filter approximation and an index-based convolution process. Through exhaustive experiments, we optimize the proposed method in terms of the satisfaction of the reverberation effect and its computational requirements. We then implement three different types of virtual reverberation functions in a resource-constrained digital imaging device. It is shown that the virtual reverberation effects implemented by the proposed approach are able to operate in real time with less than 5 ms of latency, with an over 80% overall satisfaction score in the subjective preference test. Keywords: Virtual reverberation, sparseness of impulse response, index-based convolution, audio effects.
1 Introduction

Reverberation is a very common phenomenon in our lives. Whether we speak in a classroom or listen to musical performances in a concert hall, the sounds we hear contain delayed reflections from many different directions, determined by the characteristics of the room. Hence, virtual reverberation effects, which reflect physical room characteristics, are a vital part of virtual audio reality. Using reverberation, an unaffected recorded sound can be transformed into sound as if it were recorded in a large room, a music hall, or a wet bathroom. This ability to apply the reverberation effects of desired rooms is especially useful in audio productions
for movies, PC games, and virtual reality applications, where most source sounds are recorded in a studio with no inherent reverberation effects. Reverberation can be generated by multiple feedback delay circuits that create echo signals and thus produce artificial reverberation. In digital signal processing, the multiple feedback delay circuits are realized using a reverberation filter whose impulse response contains delayed responses determined by the positions of the sound sources, the listening spots, and the characteristics of the room to be realized. Then, a conventional convolution between the input audio and that reverberation filter response is performed [1]. Many digital signal processing algorithms, including the image method [2], have focused on the design of the reverberation filter response. However, they cannot be directly implemented in most resource-constrained devices due to the high computational requirement, which comes from the convolution with the excessively long impulse response of the reverberation filter. For this reason, reverberation effects are commonly applied via additional processors or by using hardware dedicated to the fast convolution of reverberation processes [3][4]. It should be noted, however, that the implementation of these strategies is still limited in resource-constrained devices due to both cost and implementation issues. To resolve the computational problem of applying reverberation effects using the convolution process, we propose an approximation approach for the reverberation filter and a new convolution process, the so-called index-based convolution, in order to reduce the overall computational requirements.
2 Review of Virtual Reverberation Filtering

The application of virtual reverberation effects is based on two major steps: reverberation filter generation and a convolution process between the input audio and the generated reverberation filter. The reverberation filter generation is based on the simulation of room acoustics [2][6]. A more sophisticated way of modeling reverberation has previously been proposed that considers several factors such as the positional information of sound sources in a room, the acoustic absorption of the room surfaces, the humidity, and the air temperature [5]. However, the reverberation characteristics of a well-modeled filter can be distorted by the subsequent approximation step, and the degree of approximation can be varied depending on the computational capability of the target device. Since implementing reverberation effects on a resource-constrained device requires a high approximation ratio, the well-modeled reverberation filter is approximated in a way that preserves its strong characteristics. Fig. 1 shows an example of the virtual sound sources in a 2-dimensional space derived by the image method. The cross and the asterisk in the figure represent the source sound position and the listening spot of the virtual listener, respectively. In addition, the black circles represent the virtual sound positions perceived as reverberation. In this process, a main issue in reverberation filter generation is how to calculate the virtual sound positions in order to reflect the positional information in the filter.
Fig. 1. Example of the virtual sound source in 2-dimensional space derived by the image method
Fig. 2. Illustration of calculating the virtual sound position in a 1-dimensional space
Fig. 2 shows a simplified method of calculating the virtual sound positions in a 1-dimensional space. In the figure, the circle, $x_s$, and $x_r$ represent the recording position, the distance between the recording position and the source sound position, and the distance between the recording position and the wall of the modeled room, respectively. Then, the $i$-th virtual sound position in a 1-dimensional space is denoted as

$$x_i = (-1)^i x_s + \left( i + \frac{1 - (-1)^i}{2} \right) x_r - x_m \qquad (1)$$

where $x_m$ is the distance between the recording and listening positions. Note that the concept described above can also be extended to find the virtual sound positions in the y- and z-directions; in other words, we can obtain the $j$-th and $k$-th virtual sound positions in the y- and z-directions, $y_j$ and $z_k$, respectively, by using Eq. (1). Thus, the distance between the recording position and the $ijk$-th virtual sound source in a 3-dimensional space, $d_{ijk}$, is represented as

$$d_{ijk} = \sqrt{x_i^2 + y_j^2 + z_k^2}. \qquad (2)$$

In Eq. (2), $d_{ijk}$ is used to derive the unit impulse response for the reverberation filter. First of all, the impulse response of each virtual sound position is represented as

$$a_{ijk} = \begin{cases} 1, & \text{if } d_{ijk} = tc \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

where $t$ is the time delay of the echo and $c$ is the speed of sound, which is given by the conditions of the room's medium; $a_{ijk}$ is the unit impulse response at time $t$. Second, the magnitude of each unit impulse response is computed by taking into account the wall's reflectivity and $d_{ijk}$. For the given distance, $d_{ijk}$, and position $(i, j, k)$, the magnitude of each impulse response, $e_{ijk}$, is calculated as

$$e_{ijk} = r_{ijk}\, b_{ijk} \qquad (4)$$

where $r_{ijk} = r_w^{\,i+j+k}$ and $b_{ijk} \propto 1/d_{ijk}$. In other words, $b_{ijk}$ is a distance coefficient that is inversely proportional to $d_{ijk}$, and $r_{ijk}$ is the room's reflection coefficient, assuming that every wall surrounding the room has the same reflection coefficient, $r_w$. Finally, the reverberation filter containing the characteristics of the virtual room is represented as

$$h(t) = \sum_i \sum_j \sum_k a_{ijk}\, e_{ijk}. \qquad (5)$$
After the reverberation filter is generated, a linear convolution process between the input data and the reverberation filter is carried out to generate the reverberation effects for the output data. However, the major problem in this simple process is the excessive computational requirement, which comes from the long filter response range. To overcome this problem, a fast method that reduces the computational requirements is proposed in the following section.
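As a compact illustration of Eqs. (1)-(5), the minimal Python sketch below builds a 1-dimensional variant of the reverberation filter; the sampling rate, the number of images, and the treatment of the distance coefficient b as exactly 1/d are assumptions for illustration, and the 3-dimensional filter follows by combining x-, y-, and z-positions through Eq. (2).

```python
import numpy as np

def image_method_filter_1d(x_s, x_r, x_m, r_w, fs=32000, c=343.0, n_images=60):
    """Generate a 1-dimensional image-method reverberation filter (Eqs. (1)-(5)).

    x_s: distance from the recording position to the source,
    x_r: distance from the recording position to the wall,
    x_m: distance from the recording position to the listening position,
    r_w: reflection coefficient shared by all walls.
    """
    h = np.zeros(fs)                                   # one second of impulse response
    for i in range(1, n_images + 1):
        # Eq. (1): position of the i-th virtual sound source
        x_i = (-1) ** i * x_s + (i + (1 - (-1) ** i) / 2) * x_r - x_m
        d = abs(x_i)                                   # Eq. (2) reduces to |x_i| in 1-D
        t = int(round(d / c * fs))                     # Eq. (3): echo delay in samples
        if t < len(h):
            h[t] += (r_w ** i) / max(d, 1e-6)          # Eq. (4): reflection and distance decay
    return h                                           # Eq. (5): superposed weighted impulses
```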
3 Fast Implementation of Virtual Reverberation

The proposed method for applying reverberation effects consists of two steps: a filter generation step and a filter application step. Fig. 3 shows the overall procedure of applying reverberation effects using the proposed method. Note that the filter generation step includes reverberation filter generation and its approximation. In this section, we assume that a reverberation filter is generated using the procedure described in Section 2.
Fig. 3. Procedure of applying reverberation effects using the proposed method
3.1 Reverberation Filter Approximation
The impulse response of the generated reverberation filter has a significantly long duration; typically, an impulse response of around a hundred thousand samples or more is required to achieve the desired reverberation effects. Thus, the conventional convolution between the input data and the reverberation filter causes a significant computational burden. To reduce this burden while maintaining the desired reverberation effects, we propose a new convolution process for sparse data, called an index-based convolution in this paper, which will be discussed in the next subsection. The index-based convolution originates from the fact that the convolution actually needs to be computed only for non-zero data. Therefore, we approximate the generated reverberation filter such that the number of non-zero values in the response is as small as possible. The reverberation filter approximation is performed by clipping small values to zero using a predefined threshold. In other words, the approximated reverberation filter, $H_a(z)$, is obtained as

$$H_a(z) = \sum_{i=0}^{N-1} h(i)\, w(i)\, z^{-i} = \sum_{i=0}^{N-1} h_a(i)\, z^{-i} \qquad (6)$$

where $h(i)$ is the $i$-th value of the impulse response, or the $i$-th filter coefficient, of the generated reverberation filter, and $N$ is the duration of the filter. In addition, $h_a(i)$ is the filter coefficient obtained from $h(i)$ by applying the weighting value defined as

$$w(i) = \begin{cases} 1, & \text{if } |h(i)| \geq Thr \\ 0, & \text{else} \end{cases}, \quad \text{for } i = 0, 1, \ldots, N-1 \qquad (7)$$
where $Thr$ is a predefined threshold used to control the approximation ratio. Fig. 4(a) shows the impulse response of the generated reverberation filter. For the filter generation, the room size was set to 135 m × 180 m × 16 m, and the source spot and the listening spot were set at positions (67 m, 90 m, 10 m) and (67 m, 45 m, 10 m), respectively. For these positions, $(x_r=0, x_s=0, x_m=0)$, $(y_r=90, y_s=45, y_m=0)$, and $(z_r=0, z_s=0, z_m=0)$ in Eq. (1); in addition, $r_w=0.9$ in Eq. (4). The figure shows that the generated reverberation filter consists of densely distributed responses with small amplitudes and sparsely distributed responses with distinctively large amplitudes in between the small ones. Due to these distribution characteristics of the generated reverberation filter, applying the filter approximation method described in Eqs. (6) and (7) can dramatically reduce the number of non-zero filter coefficients. Fig. 4(b) shows the impulse response of an approximated reverberation filter when $Thr=0.18$. As mentioned earlier, the approximation does not hurt the performance of the generated reverberation filter; that is, the effectiveness of the virtual reverberation using the generated reverberation filter is perceptually identical to that using the approximated filter. We performed exhaustive informal listening tests and found that the approximated filter actually provided somewhat better reverberation effects than the generated filter, which will be discussed in Section 4.1.
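In code, the approximation of Eqs. (6)-(7) is a single thresholding pass; the magnitude comparison below follows the reading of Eq. (7) adopted above, and the function also returns the sparse representation used later by the index-based convolution.

```python
import numpy as np

def approximate_filter(h, thr):
    """Approximate the reverberation filter (Eqs. (6)-(7)) and return the
    sparse form used by the index-based convolution."""
    w = np.abs(h) >= thr           # Eq. (7): weighting values w(i)
    h_a = np.where(w, h, 0.0)      # Eq. (6): approximated coefficients h_a(i)
    idx = np.flatnonzero(w)        # lookup table I(n) of non-zero positions
    return h_a, h_a[idx], idx      # (h_a, h_I, I); M = len(idx)
```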
Fig. 4. Comparison of impulse responses; (a) impulse response of the generated reverberation filter and (b) impulse response of an approximated reverberation filter
l := 0; k := 0
repeat
  if w(k) != 0 then
    hI(l) := ha(k); I(l) := k; l := l + 1
  k := k + 1
until k = N
M := l

Fig. 5. Pseudo-code for obtaining hI(n) from ha(n)
3.2 Index-Based Convolution
In this subsection, we propose a modified linear convolution designed to reduce the complexity of applying a filter whose non-zero impulse response values are sparsely distributed; it is referred to here as index-based convolution. A key idea of the
index-based convolution is to skip the computation at the times when the impulse response of the filter is equal to zero. Therefore, the index-based convolution is very computationally efficient, and it can be applied to the approximated reverberation filter described in Section 3.1. The index-based convolution can be derived from the linear convolution with the approximated reverberation filter. To begin with, the linear convolution with the approximated filter, $h_a(n)$, from Eqs. (6) and (7), is

$$y(n) = \sum_{k=0}^{N-1} h_a(k)\, x(n-k) = \sum_{k=0}^{N-1} h(k)\, w(k)\, x(n-k) \qquad (8)$$

where $h(n)$ is the impulse response of the generated reverberation filter and $N$ is the duration of the filter. Also, $x(n)$ and $y(n)$ are the input and output audio, respectively. By properly selecting the threshold, the actual summation in Eq. (8) can be performed much fewer than $N$ times. If we ignore the terms in Eq. (8) for which $w(n) = 0$, then we obtain $h_I(n)$ from $h_a(n)$ as shown in Fig. 5. In the figure, $M$ is the duration of $h_I(n)$, and $I(n)$ is the position of the $n$-th non-zero value of the impulse response, such that $h_I(n) = h(I(n))$. Thus, Eq. (8) can be rewritten as

$$y(n) = \sum_{l=0}^{M-1} h_I(l)\, x(n - I(l)). \qquad (9)$$

In Eq. (9), we need $(M-1)$ additions and $M$ multiplications for each $y(n)$; consequently, the computation of the index-based convolution is dominated by $M$. In total, we need $N(M-1)$ additions and $NM$ multiplications for the index-based convolution, which are smaller than the $N(N-1)$ additions and $N^2$ multiplications of the conventional convolution. Since $M \ll N$, the index-based convolution costs much less than the conventional convolution, even though it requires memory to store the lookup table, $I(n)$, and additional operations for looking up the indices from the table.
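A minimal sketch of Eq. (9) operating on the sparse representation is given below; it produces the same output as a dense convolution with h_a while touching only the M non-zero taps.

```python
import numpy as np

def index_based_convolution(x, h_i, idx):
    """Index-based convolution (Eq. (9)): y(n) = sum_l h_I(l) x(n - I(l)).

    h_i: non-zero filter coefficients h_I, idx: their positions I(l).
    Only M multiply-adds per output sample instead of N.
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros(len(x))
    for coef, delay in zip(h_i, idx):
        y[delay:] += coef * x[:len(x) - delay]   # shift-and-add one sparse tap
    return y

# Sanity check against the dense convolution with the approximated filter:
# np.allclose(index_based_convolution(x, h_i, idx),
#             np.convolve(x, h_a)[:len(x)])
```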
4 Performance Evaluation

Two virtual reverberation systems were set up on a PC platform to evaluate both the effectiveness of the output data with reverberation and the efficiency in terms of hardware requirements. One system used the conventional convolution between the input data and the reverberation filter without approximation, while the other used the proposed index-based convolution between the input data and the approximated reverberation filter. In both systems, the filter was generated by the image method with a rectangular space of 135 m × 180 m × 16 m. The virtual positions of the source and listening spots were set to (67 m, 90 m, 10 m) and (67 m, 45 m, 10 m), respectively, and the model parameters were set as mentioned in Section 3.1.
4.1 Effect of the Filter Approximation on Reverberation Effects
To compare the reverberation effects of the index-based convolution with those of the conventional convolution, we carried out an informal listening preference test that asked listeners to choose their preferred output between the two systems. Ten people, seven males and three females, participated in the test, and three different types of input audio were considered for each of the three reverberation models defined in Table 2. Fig. 6 shows the preference test results between the reverberation effects realized by the generated reverberation filter with the conventional convolution and by the approximated reverberation filter with the index-based convolution. The figure shows that the reverberation effects of the approximated reverberation filter with the index-based convolution were preferred during the test, a result beyond our expectation. This implies that the filter approximation did not hurt the perceptual effectiveness of the reverberation filter.

4.2 Effect of the Proposed Index-Based Convolution on Computational Complexity
Table 1 compares the computational complexity of the reverberation filters realized by the conventional and the index-based convolution. Here, $N$ was measured as 150,737 for the generated reverberation filter depicted in Fig. 4(a). After setting $Thr$ in Eq. (7) to 0.18, $M$ became 43, as shown in Fig. 4(b), which implies that the computational complexity of the index-based convolution with the approximated filter is significantly lower than that of the conventional convolution with the generated filter. To compare the actual computational complexity of the two convolutions, we counted the numbers of additions and multiplications per frame, where the frame size was 1024.
Fig. 6. Preference test results regarding reverberation effects realized by the generated reverberation filter with the conventional convolution and by the approximated reverberation filter with the index-based convolution
As shown in Table 1, the computational complexity of the index-based convolution applied to the approximated reverberation filter is much smaller than that of the conventional convolution applied to the generated reverberation filter. This complexity
reduction was expected in that $M$ is much smaller than $N$. Notice here that the index-based convolution requires a lookup table for $I(n)$, but this requirement is a marginal burden compared to the huge reduction in complexity.

Table 1. Comparison of computational complexity of the reverberation filters realized by the conventional and index-based convolution

Convolution method   No. of filter coefficients used in convolution   No. of additions/frame   No. of multiplications/frame
Conventional         150,737                                          1.542 × 10^8             1.543 × 10^8
Index-based          43                                               4.398 × 10^4             4.403 × 10^4
5 Implementation of Reverberation Effects on Resource-Constrained Devices

We implemented three different reverberation effects with the index-based convolution method on a resource-constrained digital imaging device having an ARM processor with a CPU clock of 133 MHz and a RAM size of 17 MB. Three reverberation filters were generated by the image method using the parameters shown in Table 2, and each generated reverberation filter was approximated using a different threshold in Eq. (7). In the target device, input and output audio signals were sampled at a rate of 32 kHz, and we applied each filter to the audio signals once every frame, where the frame size was 1024, as mentioned in Section 4.2. Hence, for a real-time implementation, all operations must be completed within the 32 ms corresponding to the frame size. In order to verify that the reverberation filtering could be implemented in real time, we measured the processing time per frame for all three reverberation filters using the TRACE32 In-Circuit Debugger (ICD) [7]. The first column of Table 3 shows the processing time per frame measured on the ARM processor for the three different reverberation effects. As shown in the column, the ‘Large Hall’ effect required the most computation, but its computation occupied only 12.8% of the ARM CPU clock. Thus, we conclude that the index-based convolution with the filter approximation enables reverberation effects to be realized in real time on a resource-constrained device.

Table 2. Model parameters of the image method applied to the reverberation filter generation, where three different reverberation effects are simulated

Reverberation model   Room size [x,y,z] (m)   Source spot [x,y,z] (m)   Listening spot [x,y,z] (m)   Reflection coefficient   Approximation threshold
Room                  [40,50,10]              [20,25,5]                 [20,15,5]                    0.2                      0.03
Concert Stage         [80,100,15]             [40,50,7]                 [40,25,7]                    0.7                      0.15
Large Hall            [135,180,20]            [67,90,10]                [67,45,10]                   0.9                      0.18
Finally, to demonstrate the perceptual effectiveness of the proposed approach, we carried out an informal listening test. Ten people, seven males and three females, participated in this test, and each listener voted for a preferred output between the audio with reverberation effects and the audio without any reverberation effect. The second column of Table 3 shows the preference scores for the three different reverberation effects. As can be seen from the table, the output audio with the reverberation effects was significantly preferred over that without any effects.

Table 3. Computational complexity measured in processing time per frame and the preference score of the reverberation effects

Reverberation model   Processing time/frame (ms)   Preference score (%)
Room                  3.0                          75
Concert Stage         3.8                          100
Large Hall            4.1                          100
6 Conclusion

In this paper, we discussed issues associated with applying virtual reverberation effects based on a reverberation filter approximation and an index-based convolution process. The number of reverberation filter coefficients generated by the image method was too large to apply in real time on a resource-constrained device, so the filter approximation enabled us to reduce the number of non-zero filter coefficients without any perceptual degradation of the reverberation effects. Next, an index-based convolution was proposed to efficiently perform convolution with the approximated reverberation filter. To show the effectiveness of the proposed approach, we compared the computational complexity and the preference between the conventional convolution with the originally generated reverberation filter and the index-based convolution with the approximated one. The experiments showed that the processing time of the proposed approach was much lower than that of the conventional one, and that the output audio processed by the proposed approach was preferred over that of the conventional one. In particular, the computation of the proposed approach occupied only 12.8% of the device's CPU clock, which confirms that the proposed approach is an efficient way of implementing reverberation effects on resource-constrained devices.

Acknowledgments. This work was supported in part by Samsung Electronics Co., and by the Global Frontier project (No. 2010-0029751) of MEST in Korea.
References

1. Reilly, A., McGrath, D.: Convolution processing for realistic reverberation. In: Proceedings of the 98th AES Convention (1995) (preprint 3977)
2. Allen, J.B., Berkley, D.A.: Image method for efficiently simulating small-room acoustics. Journal of the Acoustical Society of America 65(4), 943–951 (1979)
3. Cowan, B., Kapralos, B.: GPU-based real-time acoustical occlusion modeling. Virtual Reality 14(3), 183–196 (2010)
4. Rush, M.: Convolution Engine Utilizing NVIDIA's G80 Processor, ECC 277 Project (2007), http://www.ece.ucdavis.edu/~mmrush
5. Campbell, D.R., Palomäki, K.J., Brown, G.J.: Roomsim, a MATLAB simulation of “Shoebox” room acoustics for use in teaching and research. Computing and Information Systems 9(3), 48–51 (2005)
6. McGovern, S.G.: Fast image method for impulse response calculations in box-shaped rooms. Applied Acoustics 70(1), 182–189 (2009)
7. Lauterbach Datentechnik GmbH: TRACE32 Debugger and In-Circuit Emulator (2006)
Audio Effect for Highlighting Speaker’s Voice Corrupted by Background Noise on Portable Digital Imaging Devices Jin Ah Kang1, Chan Jun Chun1, Hong Kook Kim1, Ji Woon Kim2, and Myeong Bo Kim2 1
School of Information and Communications Gwangju Institute of Science and Technology (GIST), Gwangju 500-712, Korea {jinari,cjchun,hongkook}@gist.ac.kr 2 Camcorder Business Team, Digital Imaging Business Samsung Electronics, Suwon-si, Gyeonggi-do 443-742, Korea {jiwoon.kim,kmbo.kim}@samsung.com
Abstract. In this paper, an audio effect (AE) algorithm is proposed that can be applied to portable digital imaging devices so that video content can be enjoyed effectively. The proposed AE algorithm enhances speech signals corrupted by background noise in audio content, based on audio content classification (ACC) and signal-to-noise ratio (SNR) estimation, in order to highlight the speaker's voice. The ACC classifies each short segment of audio content as speech, non-speech, or mixed signal by using parameters such as the signal energy, sub-band energies, and the residual signal energy obtained from linear prediction analysis. Then, we adaptively scale the signals according to the classification and the estimated SNR. To show the effectiveness of the proposed AE algorithm, we perform an informal listening test between the original audio contents and their versions processed by the proposed AE algorithm. The results show that the proposed AE algorithm significantly improves audio quality. Keywords: Audio effect, audio content classification, adaptive scaling, speech enhancement, portable digital imaging devices.
1 Introduction

A wide range of portable digital imaging devices, i.e., digital camcorders or smart phones, have been developed to generate or enjoy video content. Furthermore, as video content sharing platforms, e.g., YouTube [1], become more popular, suppliers of portable digital imaging devices should provide useful services that assist users in enjoying video content easily and effectively. In this paper, an audio effect (AE) algorithm is presented that enhances speech signals corrupted by background noise in audio content, in order to provide such a service on portable digital imaging devices. A number of research works have been reported on speech enhancement in noise environments. Spectral subtraction [2] and Wiener filtering [3] are among the useful speech enhancement methods, though these
methods introduce speech quality distortion in the form of unnatural artifacts. Thus, the proposed AE algorithm aims to enhance speech signals without distorting speech quality. Towards this goal, the proposed AE algorithm classifies each short segment of the audio signal as speech, non-speech, or mixed signal, and also estimates the signal-to-noise ratio (SNR) of the audio segment. By utilizing parameters such as the signal energy, the sub-band energies, and the residual signal energy obtained from linear prediction analysis, the proposed AE algorithm adaptively scales audio signals down or up according to the classification result and the estimated SNR. Following this introduction, Section 2 proposes an AE algorithm that enhances speech signals to highlight the speaker's voice. After that, Section 3 discusses a service scenario using the proposed AE algorithm for portable digital imaging devices. Section 4 evaluates the performance of the proposed AE algorithm with an informal listening test. Finally, we conclude our findings in Section 5.
2 Proposed Audio Effect Algorithm for Highlighting Speaker's Voice
In general, audio content in a video file is compressed by MPEG advanced audio coding (AAC) [4] and stored in a portable digital imaging device. In other words, audio signals are segmented into a series of frames, where a typical frame size is 1024 samples for MPEG AAC. After that, each frame is processed by an MPEG AAC encoder and stored into the internal memory or a memory card in the device. The proposed AE algorithm is applied to the decoded audio signal that is obtained from the stored bitstream by an MPEG AAC decoder.
[Fig. 1 block diagram: input audio signal → energy analysis, LP analysis, and QMF analysis → ACC and scale factor decision → sub-band scaling → QMF synthesis → adaptive scaling → output audio signal]
Fig. 1. Procedure of the proposed AE algorithm for highlighting speaker's voice
Fig. 1 shows the procedure of the proposed AE algorithm. The first step of the proposed algorithm is to compute several parameters to be used for audio content classification (ACC). First, we compute the signal energy of the n-th frame, E(n). Second, we apply linear prediction (LP) analysis to the audio data of the n-th frame
and obtain residual signals. Then, we also compute the signal energy from the residual signals, E_R(n). The reason why we compute two different signal energies is that the amplitude of the LP residual signal for speech is much smaller than that for non-speech; thus, the residual signal energy is useful for signal classification. Third, through 2-band quadrature mirror filter (QMF) analysis, the input signal is decomposed into a low band and a high band signal. Here, the low and the high band are regarded as the speech band and the non-speech band, respectively, since the speech signal is generally represented by frequencies up to 4 kHz. After the QMF analysis, the speech band energy, E_L(n), and the non-speech band energy, E_H(n), are computed.
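As a concrete illustration, the following sketch computes the four per-frame parameters E(n), E_R(n), E_L(n), and E_H(n) in Python, using the frame size given above and the sampling rate reported in Section 4. The LP order and the half-band FIR pair standing in for the 2-band QMF analysis filters are our own illustrative choices, not the authors' implementation.

import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import firwin, lfilter

FS = 32000     # sampling rate (Hz), as in Section 4
FRAME = 1024   # MPEG AAC frame size (samples)
ORDER = 10     # illustrative LP order (not specified in the paper)

# Half-band FIR pair as a stand-in for the 2-band QMF analysis filters:
# modulating the lowpass taps by (-1)^n gives the mirrored highpass.
lp_taps = firwin(63, 0.5)                           # low (speech) band
hp_taps = lp_taps * np.cos(np.pi * np.arange(63))   # high (non-speech) band

def frame_parameters(x):
    """Compute E(n), E_R(n), E_L(n), E_H(n) for one frame x."""
    E = np.sum(x ** 2)                               # signal energy E(n)
    # LP analysis via the Yule-Walker equations; residual = prediction error
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    a = solve_toeplitz(r[:ORDER], r[1:ORDER + 1])    # LP coefficients
    resid = lfilter(np.r_[1.0, -a], [1.0], x)        # prediction error filter
    E_R = np.sum(resid ** 2)                         # residual energy E_R(n)
    low = lfilter(lp_taps, [1.0], x)                 # band 0 .. FS/4
    high = lfilter(hp_taps, [1.0], x)                # band FS/4 .. FS/2
    E_L, E_H = np.sum(low ** 2), np.sum(high ** 2)
    return E, E_R, E_L, E_H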
As a second step of the proposed AE algorithm, we perform audio content classification (ACC). In other words, we classify each frame as speech, non-speech, or mixed signal by using the four parameters: the signal energy E(n), the residual signal energy E_R(n), the low band energy E_L(n), and the high band energy E_H(n). The ACC result for the n-th frame, ACC(n), is decided as

ACC(n) = speech,        if (SRER < θ1 and LHER > θ2) and (SNER > θ3 and SSER > θ4)
       = mixed signal,  if (SRER < θ1 and LHER > θ2) and (SNER > θ3 or SSER > θ4)     (1)
       = non-speech,    otherwise
where the signal-to-residual energy ratio (SRER), low-to-high band energy ratio (LHER), signal-to-non-speech energy ratio (SNER), and signal-to-speech energy ratio (SSER) are defined as E_R(n)/E(n), E_L(n)/E_H(n), E(n)/Ē_N(n), and E(n)/Ē_S(n), respectively. Here, Ē_N(n) and Ē_S(n) are the average energies of the frames classified as non-speech and speech, respectively. In other words, if the n-th frame is classified as speech, then Ē_S(n) is updated as
Ē_S(n) = α Ē_S(n − 1) + (1 − α) E(n)     (2)
where 0 < α < 1. However, Ē_S(n) = Ē_S(n − 1) if the n-th frame is non-speech or mixed signal. Similarly, Ē_N(n) is updated as Ē_N(n) = α Ē_N(n − 1) + (1 − α) E(n) if the n-th frame is non-speech, but Ē_N(n) = Ē_N(n − 1) if the n-th frame is speech or mixed signal. Next, sub-band scaling is conducted to enhance the speech band signals as well as suppress the non-speech band signals. That is, the low band and high band signals obtained from the 2-band QMF analysis are scaled by g_L(n) and g_H(n), respectively. These gains are estimated using Ē_S(n) and Ē_N(n) as
g_H(n) = 1 − β · Ē_S(n)/Ē_N(n)     (3)

g_L(n) = 2 − g_H(n)     (4)
where β is a constant used to control the degree of the gain variation. Subsequently, the scaled sub-band signals are synthesized through 2-band QMF synthesis. Finally, a scale factor in the time domain is obtained to further enhance or suppress the synthesized signals according to the ACC result. For frames classified as mixed signal or non-speech, this scale factor decreases the energy of the audio signals in order to further suppress the noise; for frames classified as speech, it matches the energy of the synthesized audio signals to that of the original audio signals, so that the energy of the signals enhanced by sub-band scaling is preserved. Thus, we re-compute the signal energy from the synthesized signal after applying the sub-band scale factors of Eqs. (3) and (4), which is denoted here as Ê(n). The scale factor is then computed as

g(n) = E(n)/Ê(n),           if ACC(n) = speech
     = Ē_S(n)/(k_m Ê(n)),   if ACC(n) = mixed signal     (5)
     = Ē_S(n)/(k_n Ê(n)),   if ACC(n) = non-speech
where k_m and k_n are the SNRs estimated from the frames declared as mixed signal and non-speech, respectively. In other words, when the n-th frame is classified as mixed signal or non-speech, g(n) is determined so that the energy of the output signals decreases to Ē_S(n)/k_m or Ē_S(n)/k_n, respectively. On the other hand, when the n-th frame is classified as speech, g(n) is determined so that the energy of the output signals becomes equal to that of the input signals. Consequently, by multiplying the signals synthesized by the 2-band QMF by g(n), we generate output signals that highlight the speaker's voice against the background noise.
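Putting Eqs. (1)-(5) together, the per-frame decision logic can be summarized in a short sketch that continues the parameter-extraction example above. The thresholds and constants are the values the authors report in Section 4; the initial values of the running averages Ē_S and Ē_N are our own assumption.

TH1, TH2, TH3, TH4 = 0.25, 2.0, 1.8, 0.35   # thresholds of Eq. (1), Section 4
ALPHA, BETA, KM, KN = 0.5, 0.05, 1.5, 3.5   # constants of Eqs. (2), (3), (5)

class SpeechHighlighter:
    def __init__(self):
        self.Es = 1.0   # running speech-frame energy  (Ē_S), assumed init
        self.En = 1.0   # running non-speech energy    (Ē_N), assumed init

    def classify(self, E, E_R, E_L, E_H):
        """ACC(n) of Eq. (1)."""
        srer, lher = E_R / E, E_L / E_H
        sner, sser = E / self.En, E / self.Es
        speechlike = srer < TH1 and lher > TH2
        if speechlike and sner > TH3 and sser > TH4:
            return "speech"
        if speechlike and (sner > TH3 or sser > TH4):
            return "mixed signal"
        return "non-speech"

    def update_averages(self, acc, E):
        """Running-average updates, Eq. (2) and its non-speech analogue."""
        if acc == "speech":
            self.Es = ALPHA * self.Es + (1 - ALPHA) * E
        elif acc == "non-speech":
            self.En = ALPHA * self.En + (1 - ALPHA) * E

    def subband_gains(self):
        """Sub-band gains of Eqs. (3) and (4); returns (g_L, g_H)."""
        g_H = 1.0 - BETA * self.Es / self.En
        return 2.0 - g_H, g_H

    def scale_factor(self, acc, E, E_hat):
        """Time-domain scale factor of Eq. (5); E_hat is Ê(n)."""
        if acc == "speech":
            return E / E_hat
        if acc == "mixed signal":
            return self.Es / (KM * E_hat)
        return self.Es / (KN * E_hat)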
3 Service Scenario Using the Proposed Audio Effect Algorithm for Portable Digital Imaging Devices
Fig. 2 shows the procedure of a service scenario using the proposed AE algorithm on portable digital imaging devices. In order to activate the service, a user selects a video file and commands the portable digital imaging device to play back the selected file. During the service, the user can also optionally enable the proposed AE algorithm through an on/off function implemented on the device. In other words, if the function is on, the proposed AE algorithm operates once every frame and the processed audio data are sent to the output device. Otherwise, the audio data are brought directly to the output device.
[Fig. 2 flowchart — User: A. select video file; B. command to play back; C. enable on/off function. System: a. read the unit frame data from the audio content in the video file; b. verify the on/off function; c. (on) process the audio data by the proposed AE algorithm; d. send the processed audio data to the output device; e. (off) send the audio data to the output device]
Fig. 2. Service scenario using the proposed AE algorithm for portable digital imaging devices
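The per-frame dispatch in Fig. 2 amounts to a simple loop; in the sketch below, decode_frame, effect_on, and audio_out are hypothetical stand-ins for the device's AAC decoder, the on/off switch, and the output path.

def play_back(aac_frames, effect):
    """Per-frame dispatch of the Fig. 2 service scenario (sketch)."""
    for frame in aac_frames:
        pcm = decode_frame(frame)       # step a: read and decode one frame
        if effect_on():                 # step b: verify the on/off function
            pcm = effect.process(pcm)   # step c: proposed AE algorithm
        audio_out(pcm)                  # steps d / e: send to the output device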
4 Performance Evaluation
In this section, we first discuss how the proposed AE algorithm works by showing the waveforms and parameters for each processing step, and then perform an informal listening test on the audio contents before and after applying the proposed AE algorithm. We prepared audio contents recorded by a digital camcorder in stereo MPEG AAC format at a sampling rate of 32 kHz with 16-bit resolution. In order to set the thresholds in Eq. (1), we performed exhaustive preliminary experiments with a number of audio contents recorded in noise environments, and we found that setting θ1, θ2, θ3, and θ4 in Eq. (1) to 0.25, 2.0, 1.8, and 0.35, respectively, provided the best performance of the proposed AE algorithm. In addition, the other constants were set as α = 0.5 in Eq. (2), β = 0.05 in Eq. (3), and k_m = 1.5 and k_n = 3.5 in Eq. (5). Fig. 3 shows an example of how the proposed AE algorithm highlights the speaker's voice. Fig. 3(a) displays the input audio waveform, where the speaker's voice was corrupted by babble and music noises. Then, we performed the ACC; the result is shown in Fig. 3(b), where 0, 1, and 2 indicate non-speech, mixed signal, and speech, respectively. Next, we estimated the sub-band scale factors, g_L(n) and g_H(n), depicted as a solid line and a dotted line in Fig. 3(c), respectively. By multiplying the low and high band signals by the sub-band scale factors and applying the 2-band QMF synthesis, we obtained the signal displayed in Fig. 3(d). After that, we estimated the time-domain scale factor, g(n), defined in Eq. (5), from the waveform of Fig. 3(d); this is shown in Fig. 3(e). Finally, we obtained the output signal, shown in Fig. 3(f), after multiplying the waveform of Fig. 3(d) by g(n). The figure shows that the proposed AE algorithm classified each frame of the input audio signal as speech, non-speech, or mixed signal as expected, and thus background noise was suppressed effectively in the output audio signal.
Fig. 3. Example of the experimental results of the proposed AE algorithm: (a) input audio signal, (b) the classification result, (c) sub-band scale factors (solid line: low band scale factor, dotted line: high band scale factor), (d) synthesized audio signal after multiplying by the sub-band scale factors, (e) time-domain scale factor, and (f) output audio signal
Finally, to demonstrate the perceptual effectiveness of the proposed AE algorithm, we carried out an informal listening test. Ten participants, seven males and three females, took part in this test, and each listener voted for the preferred audio content between the original audio content and its version processed by the proposed AE algorithm. Table 1 shows the average preference score over 10 audio contents. As shown in the table, the audio contents processed by the proposed AE algorithm were significantly preferred over the original audio contents.

Table 1. Average preference score for the audio contents processed by the proposed AE algorithm

                  Original audio contents    Audio contents processed by the proposed AE algorithm
Preference (%)    5                          95
5 Conclusion
In this paper, an audio effect (AE) algorithm was proposed to enhance speech signals corrupted by background noise in audio content, and it was applied to a portable digital imaging device for highlighting the speaker's voice. First, the proposed AE algorithm classified each short segment of the audio signal as speech, non-speech, or mixed signal on the basis of parameters such as the signal energy, the sub-band energies, and the linear prediction residual signal energy. Then, a scale factor for the output signal was adaptively determined depending on the classification result and the estimated signal-to-noise ratio. An informal listening test showed that the audio contents processed by the proposed AE algorithm were clearly preferred over the original audio contents.
Acknowledgments. This work was supported in part by the Global Frontier project (No. 2010-0029751) of MEST in Korea, and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the MEST in Korea (2010-0023888).
References 1. http://www.youtube.com/ 2. Boll, S.: Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech and Signal Processing 27(2), 113–120 (1979) 3. Lim, J.S., Oppenheim, A.V.: Enhancement and bandwidth compression of noisy speech. Proceedings of the IEEE 67(12), 1587–1604 (1979) 4. ISO/IEC 13818-7: Information Technology - Generic Coding of Moving Pictures and Associated Audio Information - Part 7: Advanced Audio Coding, AAC (December 2004)
Detection of Howling Frequency Using Temporal Variations in Power Spectrum Jae-Won Lee1 and Seung Ho Choi2 1 Graduate School of NID Fusion Technology Seoul National University of Science and Technology, Seoul 139-743, Korea
[email protected] 2 Department of Electronic and Information Engineering Seoul National University of Science and Technology, Seoul 139-743, Korea
[email protected]
Abstract. Indoor audio feedback is a common problem in many audio amplification systems. The audio feedback can go out of control under some indoor conditions, which results in howling. Moreover, the howling frequency varies with changes in the indoor environment. Most conventional methods attempt to eliminate the howling, but they do not provide a way to predict its occurrence. This paper presents a novel method to predict the howling frequency using temporal variations in the power spectrum. Keywords: Howling, acoustic feedback circuit, moving average, power spectrum.
1 Introduction
Audio amplification systems such as mobile devices and hearing aids include both a loudspeaker and a microphone, and howling can be generated at some specific frequency due to the acoustic feedback circuit (AFC) shown in Fig. 1 [1-3]. As a positive feedback circuit, the AFC can diverge at a particular frequency under certain phase conditions, and this howling phenomenon restricts the amplitude gain of the audio system [2-4]. Conventional methods suppress the howling by controlling the frequency gain after the howling is detected [1]. The least mean square method, which is generally used for howling suppression, likewise operates only after the howling occurs [5-8]. The howling may vary depending on the overall environment, such as the positions of microphones and loudspeakers, the room shape and arrangement, the position and movement of the talker, the room temperature, etc. Therefore, it is difficult to predict the howling [9]. This paper presents a novel method to predict the howling frequency using temporal variations in the power spectrum. This paper is organized as follows. Howling phenomena and transfer function characteristics are described in Section 2, together with the relation between the indoor environment and the howling frequency. In Section 3, we describe the proposed method, which detects the howling frequency from temporal variations in the power spectrum before the howling occurs. We conclude this article in Section 4.
2 Howling Phenomena
2.1 Effects of Indoor Reflections
In an electric amplifier system, as shown in Fig. 1(a), the microphone input signal x(t) is amplified by the gain g, and the loudspeaker output signal y_0(t) is generated. The signal y_0(t) produces the multiple reflection signals {y_i(t)} [2], which are summed into the feedback signal y(t) that re-enters the microphone, where τ_i is the delay time of y_i(t) and α_i is the attenuation factor of the indoor wall [10]. The AFC of the single reflection path is shown in Fig. 1(b), and its transfer function H(ω) is given by Eq. (1). From the magnitude response |H(ω)| in Eq. (2), we can see that the transfer function is a comb filter that changes periodically [9]. As in Eq. (3), a frequency at which ∠H(ω) = 2πm, where m is an integer, becomes a potential howling frequency (PHF) [10]. In other words, howling is likely to occur at a PHF.
[Fig. 1 diagram: the reflections y_i(t) = α_i y_0(t − τ_i) sum to the feedback signal y(t) = Σ_{i=0}^{n} α_i y_0(t − τ_i); the single-path model relates Y(ω) to X(ω) through the feedback term αe^{−jτω}]
Fig. 1. AFC by indoor reflections; (a) generation of indoor reflections and (b) transfer function model
H(ω) = Y(ω)/X(ω) = g / (1 − αg cos(τω) + jαg sin(τω))     (1)

|H(ω)| = g / √(1 − 2αg cos(τω) + α²g²)     (2)

∠H(ω) = tan⁻¹( αg sin(τω) / (αg cos(τω) − 1) )     (3)
The transfer function for multiple reflections can be represented by Eq. (4). However, it is difficult to determine its parameters, since the transfer function is affected by the environment, such as the indoor conditions, the positions of the microphone and loudspeaker, and the directivity characteristics [10].

H(ω) = Y(ω)/X(ω) = g / (1 − g Σ_{i=0}^{n} α_i e^{−jτ_i ω})     (4)
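To make the PHF condition concrete, Eqs. (1)-(3) can be evaluated numerically for the single-reflection case; the gain, attenuation, and delay values below are assumed for illustration only. For a single reflection, ∠H(ω) = 2πm reduces to τω = 2πm, so the PHFs fall at multiples of 1/τ.

import numpy as np

g, alpha, tau = 0.8, 0.6, 0.005        # assumed gain, attenuation, 5 ms delay
f = np.linspace(1.0, 2000.0, 20000)    # frequency grid (Hz)
w = 2 * np.pi * f

# Eq. (1): complex transfer function of the single-reflection AFC
H = g / (1 - alpha * g * np.cos(tau * w) + 1j * alpha * g * np.sin(tau * w))
mag = np.abs(H)        # Eq. (2): comb-filter magnitude response
phase = np.angle(H)    # Eq. (3): phase response (wrapped)

# PHFs where tau * w = 2*pi*m, i.e. every 1/tau = 200 Hz here
phfs = np.arange(1, 11) / tau
print(phfs)                            # 200, 400, ..., 2000 Hz
print(f[np.argmax(mag)])               # the magnitude peaks at a PHF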
In fact, the howling frequency changes continuously according to device movements and changes in the surroundings, so it is difficult to predict a howling occurrence in advance. As shown in Fig. 2, which presents a PHF simulation result with 4 reflections, the PHF changes with time.
Fig. 2. A PHF simulation result with 4 reflections
2.2 Magnitude Change of Howling Signal
The AFC can easily diverge since it is a positive feedback circuit. When howling occurs, the signal magnitude at a specific frequency increases continuously over time, as shown in Fig. 3(a) [9]. The spectrogram in Fig. 3(b) shows the howling phenomenon on an actual music signal.
[Fig. 3(a) annotation: the howling signal builds up recursively as y_0(t) = x(t) * g_s(t), y_1(t) = x(t − Δt) * g_s(t) * g_f(t) + y_0(t), and y_n(t) = x(t − nΔt) * g_s(t) * g_f(t) + Σ_{m=0}^{n−1} y_m(t)]
Fig. 3. An example of howling phenomenon; (a) magnitude of a howling signal and (b) spectrogram
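The recursive buildup annotated in Fig. 3(a) can be reproduced by simulating a positive feedback loop directly; the sampling rate, delay, and loop gain below are assumed values, and any loop gain above one makes the magnitude grow without bound, as in Fig. 3.

import numpy as np

FS, DELAY, LOOP_GAIN = 8000, 40, 1.02      # assumed: 5 ms delay, gain > 1
rng = np.random.default_rng(0)
x = rng.normal(scale=1e-3, size=FS)        # 1 s of low-level excitation

y = np.zeros_like(x)
for n in range(len(x)):
    fb = y[n - DELAY] if n >= DELAY else 0.0   # delayed feedback path
    y[n] = x[n] + LOOP_GAIN * fb               # positive feedback loop

# The RMS envelope grows over time, as in Fig. 3(a)
for t in (0.2, 0.5, 0.9):
    seg = y[int(t * FS): int(t * FS) + 256]
    print(f"t = {t:.1f} s, rms = {np.sqrt(np.mean(seg ** 2)):.3g}")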
2.3 Experiments on Indoor Condition Changes
As shown in Fig. 4, after the surrounding conditions were changed, the howling frequency shifted from 380 Hz to 747 Hz. Therefore, a transition of the reflected sound leads to a change of the howling frequency. Also, the room temperature is related to the sound velocity and thus to the phase shift, so the howling frequency is affected by the room temperature. Fig. 5 shows that the howling frequency shifted from 380 Hz at 21°C to 384 Hz at 23°C.
Fig. 4. An example of howling frequency change under different surrounding conditions; (a) the howling at 380 Hz and (b) the howling at 747 Hz
[Fig. 5(a) plot: phase [rad] versus temperature [°C] for 0.5 kHz and 10 kHz signals]
Fig. 5. An example of howling frequency change at different temperatures; (a) phase shift and (b) howling at 384 Hz
3 Proposed Howling Frequency Detection Method
3.1 Observation of Spectral Change Using Moving Average
Detecting a howling occurrence is difficult since acoustic signals change in both the time and frequency domains. In this work, we used a moving average (MA) filter to examine spectral changes, as in Eq. (5), where X(ω, n) is the Fourier transform of x(n) at ω rad/sec and MA(n, ω) is the MA value of X(ω, n) over the last p frames:

MA(n, ω) = (1/p) Σ_{i=0}^{p−1} X(ω, n − i)     (5)
Figs. 6(a) and 6(b) show the MA values with p = 50 and p = 100, respectively. As shown in Fig. 6(b), the spectral change is minimized except at the howling frequency.
Fig. 6. Examples of MA filtering in the spectral domain; (a) p = 50 and (b) p = 100
Fig. 7. Decision for howling occurrences; (a) MA and dMA, (b) VA, and (c) DH
3.2 Detection of Howling Using Moving Average Filtering
Since the signal energy at the howling frequency increases continuously, the MA value also increases over time. Therefore, if the MA value continuously increases at a specific frequency, we can predict that howling is likely to occur at that frequency, as in Fig. 7. The difference dMA(n, ω) in Eq. (6) is used to decide whether the MA value increases or not. The value VA(n, ω) in Eq. (7) is 1 if dMA(n, ω) is positive and 0 otherwise. If the VA(n, ω) values are 1 for m consecutive frames, DH(n, ω, m) in Eq. (8) is set to 1, as shown in Fig. 7. Therefore, we can detect the howling using the DH values. Fig. 8 shows the block diagram of the howling detection method.

dMA(n, ω) = MA(n, ω) − MA(n − 1, ω)     (6)

VA(n, ω) = 1, if dMA(n, ω) > 0
         = 0, if dMA(n, ω) ≤ 0     (7)

DH(n, ω, m) = 1, if Σ_{i=0}^{m−1} VA(n − i, ω) = m
            = 0, otherwise     (8)
Fig. 8. Block diagram for the proposed howling detection method
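A compact sketch of the decision rule in Eqs. (5)-(8), operating on a magnitude spectrogram with frames along the first axis; the window lengths p and m are the tunable parameters discussed above, and their values here are only examples.

import numpy as np

def detect_howling(spec, p=100, m=20):
    """spec: magnitude spectrogram |X(omega, n)|, shape (frames, bins).

    Returns DH of Eq. (8): 1 wherever the MA of Eq. (5) has increased
    for m consecutive frames, flagging a potential howling frequency.
    """
    frames, _ = spec.shape
    # Eq. (5): causal moving average over the last p frames, per bin
    kernel = np.ones(p) / p
    ma = np.apply_along_axis(
        lambda s: np.convolve(s, kernel)[:frames], 0, spec)
    # Eq. (6): first difference of the MA along time
    dma = np.diff(ma, axis=0, prepend=ma[:1])
    va = (dma > 0).astype(int)                 # Eq. (7)
    # Eq. (8): 1 iff VA has been 1 for the last m frames
    run = np.zeros_like(va)
    for n in range(1, frames):
        run[n] = (run[n - 1] + 1) * va[n]      # length of the current 1-run
    return (run >= m).astype(int)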
4 Conclusion
In this paper, we explained the howling phenomenon caused by the acoustic feedback circuit and showed experimentally that the potential howling frequency is related to the indoor conditions and environmental changes. We presented a novel method to predict the howling frequency using temporal variations in the power spectrum, and we showed that the proposed moving average filtering method can be successfully used for howling detection.
References 1. Loetwassana, W., Punchalard, R., Lorsawatsiri, A., Koseeyaporn, J.: Adaptive howling suppressor in audio amplifier system. In: Proceedings of Asia-Pacific Conference on Communications, pp. 445–448 (2007) 2. Lee, J.W., Kang, S.H., Choi, S.H.: Prediction of potential acoustic gain considering directivity of microphone and loudspeaker. In: Proceedings of Spring Meeting of the Acoustical Society of Korea, vol. 26, pp. 275–278 (2007) 3. Schroeder, M.R.: Improvement of acoustic feedback stability by frequency shifting. Journal of the Acoustical Society of America 36, 1718–1724 (1964) 4. Nyquist, H.: Regeneration theory. Bell Syst. Tech. J. 11, 126–147 (1932) 5. Gil-Cacho, P.: Regularized adaptive notch filters for acoustic howling suppression. In: Proceedings of EUSIPCO, pp. 2574–2578 (2009) 6. Antman, H.S.: Extension to the theory of howlback in reverberant rooms. Journal of Acoustical Society of America (Letters to the Editor) 39(2), 399 (1966) 7. Yasukawa, H., Furukawa, I., Ishiyama, Y.: Acoustic echo control for high quality audio teleconferencing. In: Proceedings of ICASSP, pp. 2041–2044 (1989) 8. Ibaraki, I., Furukawa, H., Noono, H.: Pre-howling howlback detection method. In: Proceedings of ICASSP, pp. 941–944 (1986) 9. Lee, J.W., Kang, S.H., Choi, S.H.: The effect of room environments on howling frequency. In: Proceedings of Spring Meeting of the Acoustical Society of Korea, pp. 53–56 (2010) 10. Davis, D., Davis, C.: Sound System Engineering, 2nd edn., pp. 424–426. Focal Press (1997)
Differential Brain Activity in Reading Hangul and Hanja in Korean Language Hyo Woon Yoon and Ji-Hyang Lim Department of Art Therapy, Daegu Cyber University, Daegu, Korea
[email protected]
Abstract. Although Korean words (Hangul) are composed of phonemes like other alphabetic languages, their visual form more closely resembles morpheme-based Chinese characters (Hanja). Functional magnetic resonance imaging was used to collect brain activation patterns while native Korean speakers (12 subjects) read Hangul and Hanja. The Korean language uses both alphabetic Hangul and logographic Hanja in its writing system. Our experimental results show that the activation patterns obtained for reading Hanja by native Korean speakers involve neural mechanisms similar to those of native Chinese speakers, i.e., strong left-lateralized middle frontal cortex activation. For Korean word reading, activation was observed in the bilateral fusiform gyrus, left middle frontal gyrus, left superior temporal gyrus, right middle temporal gyrus, precentral gyrus, and insula. Keywords: Word perception, brain activity, frontal cortex.
1 Introduction
It is generally known that perceiving or reading visually presented words encompasses many processes that collectively engage several specialized neural systems working in concert. Functional imaging techniques such as positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) have provided meaningful insights into the neural systems that underlie word recognition and the reading process in the human brain. Models of written word perception [1,2] propose that a large-scale distributed cortical network, including the left frontal, temporal, and occipital cortices, mediates the processing of the visuo-orthographic, phonologic, semantic, and syntactic constituents of alphabetic words. For example, the posterior fusiform gyri are relevant to visual processing, whereas the inferior frontal lobe is emphasized for its role in semantic processing [3,4]. Regarding different written languages and writing systems, the question of how the surface form of words influences the neural mechanisms of the brain during word recognition is of interest. One of the writing systems most different from alphabetic words is the Chinese character. Alphabetic systems are based on the association of phonemes with graphemic symbols and a linear structure, whereas Chinese characters are based on the association of meaningful morphemes with graphic units, the configuration of which is square and nonlinear. Previous studies using visual hemifield paradigms demonstrated
that the right cerebral hemisphere is more effective in processing Chinese characters than the left cerebral hemisphere [5]. This led to a Chinese character-word dissociation hypothesis for the lateralization pattern, since word perception is regarded as left-lateralized. This conclusion has since been challenged, because more recent fMRI results suggest that the reading of Chinese characters is bilateralized. In particular, activation of the left inferior frontal cortex (BA 9/45/46) emphasized the importance of the semantic generation or processing of Chinese characters [6,7]. Chinese characters are used not only in the writing system of the Chinese language, but also widely in Japanese and Korean. The Korean writing system consists of a mixture of pure Korean words and Chinese characters. Korean words are composed of phonemic components similar to the alphabetic words used in English or German. However, the shape of Korean words is nonlinear: the symbols are composed into a square-like block, arranged left to right and top to bottom. This overall shape makes Korean more similar to Chinese than to other alphabetic orthographies [8]. Furthermore, unlike alphabetic words, these phonemic symbols are not arranged in a serial order, but are combined into a single form to represent a syllable. These syllabic units are spatially separated from each other. Each Korean syllable is constructed of two to four symbols that in various combinations represent each of 24 phonemes. Thus, in a sense, Korean words, Hangul, can also be regarded as syllabograms (Figure 1). Hangul is the name of the Korean alphabet. In addition, the Korean vocabulary consists of pure Korean words (24.4%), Chinese-derivative words (69.32%), and other foreign words (6.28%). Chinese-derivative words can be written either in the form of the Chinese ideogram or as the corresponding Korean words [9]. In current Korean writing practice, e.g., daily newspapers or popular magazines in South Korea, the use of Chinese characters is relatively sparse. According to statistics from 1994 [10], the proportion of Chinese characters in the body of daily newspapers is about 10%, and it has continuously diminished since then. Surprisingly, although these are unique and interesting characteristics, the neural mechanisms involved in reading Korean words have rarely been studied, at least with modern functional imaging techniques. Using functional magnetic resonance imaging, we investigated the neural mechanisms involved in reading these two different writing systems by native Korean speakers. In doing so, we hope to identify specific neural mechanisms that are involved in reading Korean words (phonemes) and Chinese characters (morphemes).
2 Materials and Methods
2.1 Subjects
Seven male and five female right-handed subjects (mean age: 22 years, S.D. 1.5 years) participated in the study. All were native Korean speakers who had been educated in Chinese characters for more than 6 years in school. They had no medical, neurological, or psychiatric illness, past or present, and took no medication. All subjects consented to the protocol approved by the Institutional Ethics and Radiation Safety Committee.
2.2 Experimental Design
As stimuli, two-character Chinese words and Korean words with equivalent phonetic as well as semantic components were chosen (Figure 1). There were 60 words for each category. All words were nouns; half had abstract meanings and the other half concrete meanings. Stimuli were presented using custom-made software on a PC and projected via an LCD projector onto a screen at the feet of the subjects. The subjects viewed the screen via a homemade reflection mirror attached to the head RF coil. Each stimulus was presented for 1.5 seconds, followed by a blank screen for 500 ms. Ten different items of this stimulus pattern were presented within a block, preceded by a blank screen for one second before the first stimulus. These stimulus blocks were alternated with the baseline task, during which a fixation point was projected in the middle of the screen for 21 seconds. The two kinds of stimulus blocks (Korean words and Chinese characters) and the baseline task blocks each lasted 21 seconds. A total of six blocks of Korean words and six blocks of Chinese characters were presented, intermixed at random. During the experiment, the subjects were instructed to press the right button for nouns with an abstract meaning and the left button for those with a concrete meaning. Simultaneously, they were to respond covertly to the presented stimuli.
2.3 Data Acquisition and Analysis
Images were acquired using a 1.5 Tesla MRI scanner (Avanto, Siemens, Erlangen, Germany) with a quadrature head coil. Following a T1-weighted scout image, high-resolution anatomic images were acquired using an MPRAGE (Magnetization-Prepared RApid Gradient Echo) sequence with TE = 3.7 ms, TR = 8.1 ms, flip angle = 8°, and an image size of 256 x 256. T2*-weighted functional data were acquired using echo planar imaging (EPI) with TE = 37 ms, flip angle = 80°, TR = 3000 ms, and an image size of 64 x 64. We obtained 30-slice EPI images with a slice thickness of 5 mm and no gaps between slices for the whole brain. A total of 172 volumes were acquired per experimental run. For each participant, the first four volumes in each scan series, collected before magnetization reached the equilibrium state, were discarded. Image data were analyzed using SPM2 (Wellcome Department of Cognitive Neurology, London). The images of each subject were corrected for motion and realigned using the first scan of the block as a reference. T1 anatomical images were coregistered with the mean of the functional scans and then aligned to the SPM T1 template in the atlas space of Talairach and Tournoux [11]. Finally, the images were smoothed by applying a Gaussian filter of 7 mm full-width at half-maximum (FWHM). In order to calculate contrasts, the SOA (stimulus onset asynchrony) from the protocol was defined as events and convolved with the hemodynamic response function (HRF) to specify the appropriate design matrix. The general linear model was used to analyze the smoothed signal at each voxel in the brain. Significant changes in hemodynamic response for each subject and condition were assessed using t-statistics. For the group analysis, single-subject contrast images were analyzed using a random effects model. Activations were reported if they exceeded a threshold of P < 0.05, corrected at the cluster level (P < 0.0001 uncorrected at the single voxel level). Significance at the cluster level was calculated in consideration of the peak activation and the extent of the cluster.
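As an illustration of the design-matrix construction described above, the sketch below convolves a 21 s block regressor with a canonical double-gamma HRF (an illustrative approximation, not SPM2's exact kernel); the block timing here is hypothetical, since the actual block order was randomized.

import numpy as np
from scipy.stats import gamma

TR, N_VOLS = 3.0, 172                   # scan parameters from Section 2.3

def hrf(t):
    # Illustrative canonical double-gamma HRF (peak ~5 s, undershoot ~15 s)
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

t = np.arange(N_VOLS) * TR
# Hypothetical timing: 21 s task blocks alternating with 21 s baseline
boxcar = ((t % 42) < 21).astype(float)

kernel = hrf(np.arange(0, 32, TR))      # sampled HRF kernel
regressor = np.convolve(boxcar, kernel)[:N_VOLS]

# One-condition design matrix with a constant term; for a voxel time
# series y, the GLM estimate minimizes ||y - X beta||^2:
X = np.column_stack([regressor, np.ones(N_VOLS)])
# beta, *_ = np.linalg.lstsq(X, y, rcond=None)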
Fig. 1. a) The activation map of "Korean word reading" minus "Chinese character reading" in 12 subjects (thresholded at p < 0.0005, uncorrected at the single voxel level). b) The activation maps of Korean words minus baseline (left two images) and Chinese characters minus baseline (right two images). The threshold p-value for b) is 0.0001 (uncorrected at the single voxel level).
Fig. 2. Brain areas showing the repetition suppression effect. The anterior portion of the left fusiform gyrus showed reduced activation in the cross-script condition a), whereas a more posterior portion of the fusiform gyrus is responsible for the repetition suppression in the same-script condition b).
3 Results
The mean reaction time during Korean word reading was 1.01 sec (S.D.: 325 ms), whereas for Chinese character reading it was 1.24 sec (S.D.: 367 ms). A paired t-test verified the significance of the difference between these two reaction times (p < 0.00001). Significant signal changes for Korean word reading vs. baseline were detected bilaterally in the fusiform gyrus (BA 19/37) and in the left middle frontal area (BA 46/6). In addition, right hemispheric activation was observed in the medial frontal gyrus (BA 8). For Chinese characters vs. baseline, the activation patterns appeared to be slightly different. In the region responsible for the visual stimuli per se, we observed bi-hemispheric activation for Chinese character reading vs. baseline. In the frontal (superior, BA 8, and inferior, BA 9) and parietal (superior, BA 7) cortices, only left hemispheric activation was significant in contrast with the baseline task. Brain areas responsible for the repetition suppression are summarized in Table 2. The repetition suppression effect for prime Hanja and target Hanja was clearly seen in the bilateral middle fusiform gyrus. In contrast, the suppression effect of priming was found in the left medial fusiform gyrus (x = -36, y = -40, z = -12) for the prime-Hanja, target-Hangul condition. Further analysis showed that this area also exhibited repetition suppression in the cross-script condition in general; this was done by computing a linear combination of the prime Hanja-target Hangul and prime Hangul-target Hanja conditions (inclusively masked by both contrasts at p < .05). This suggests that the left medial portion of the fusiform gyrus exhibits a significant repetition suppression effect irrespective of the direction of script alternation. By contrast, the repetition suppression effect for the same-script conditions (a linear combination of prime Hanja-target Hanja and prime Hangul-target Hangul) was also found in the left fusiform gyrus, but more posterior in the coronal plane and more lateral in the sagittal plane (x = -42, y = -46, z = -16).
4 Discussion
In the contrast of Korean words minus Chinese characters, significant positive signal changes were observed in the right superior frontal gyrus (BA 8), the left superior temporal lobe (BA 41), the right middle temporal lobe (BA 21), the precentral gyrus (BA 6), and the insula (BA 13); for the condition of Chinese characters minus Korean words, activation was observed in the bi-hemispheric visual area (BA 19). In terms of behavioral data, significantly longer reaction times were observed for Chinese character reading compared to Korean word reading. Since very simple characters were used as stimuli, it would appear that the reaction time advantage for reading Korean words does not derive from a familiarity effect. Rather, it might rely on differences in the characteristics of phonological processing between these two writing systems. Phonological processing in Chinese character recognition operates at the syllable-morpheme phonology level. This is the fundamental difference regarding the role of phonology between Chinese and alphabetic writing systems. The concept of
pre-lexical phonology is misleading for Chinese character reading [8]. However, in processing Korean words, pre-lexical phonology is activated rapidly and automatically; reading Korean words for meaning involves pre-lexical information processing [12]. In the functional imaging data, the activated area for Chinese character reading vs. baseline was found in the left hemispheric inferior and superior gyri of the frontal lobe (BA 6/9). This demonstrates the left-lateralized pattern of the frontal cortex during Chinese character reading. This activation can be attributed to the unique square configuration of Chinese characters [13,14]. Chinese characters consist of a number of strokes that are packed into a square shape according to stroke assembly rules, and this requires a fine-grained analysis of the visual-spatial locations of the strokes and subcharacter components [15]. In addition, the left middle frontal cortex (BA 6/9) is known to be an area of spatial and verbal working memory, by which the subject maintains a limited amount of spatial and verbal information in an active state for a brief period of time [16,17]. More precisely, this area may act as a central executive system for working memory, responsible for the coordination of cognitive resources [18]. In our experiment, even though a working memory process was not involved in the subjects' decision, they did need to coordinate the semantic (or phonological) processing of the Chinese characters. These two processes, coordination of cognitive resources and semantic processing, were explicitly required by the experimental task and the intensive visuospatial processing of the Chinese characters, and the activation of the left middle frontal gyrus appears to be involved in both. This left frontal activation pattern, especially the activation of BA 9, is consistent with other functional imaging studies of Chinese character reading by native Chinese speakers [7,19,20]. Left hemispheric middle frontal activations (BA 46/6) were also observed for Korean words vs. baseline, and these appear to be correlated with mechanisms similar to those associated with reading other alphabetic words [21]. Since our subjects were asked to respond after seeing and covertly speaking the Korean words (a forced-choice option), which involves semantic processing, the activation of the middle frontal area seems to underlie this cognitive process. We propose that this is why the left frontal area is strongly activated during this experimental task. Occipital lobe activation was observed for Chinese character reading in contrast with the baseline as well as in the direct comparison with Korean words (Table 1). The activated occipital areas, such as the fusiform gyrus, are thought to be relevant to the visual processing of Chinese characters. Interestingly, we observed right-hemisphere-dominant occipital activation, even though two-character Chinese words were presented as stimuli. There were some indications that the reading of two-character Chinese words is left-lateralized [19,20], but our results did not support the dissociation hypothesis of single- and two-character Chinese word perception. Bilateral activation of the occipito-temporal area was also observed for Korean word reading. It is generally thought that this area is relevant to the processing of the visual properties of Korean words. The activation pattern is bilateral, but the left hemispheric activity was relatively weaker (Table 1). This is not in agreement with previous studies of alphabetic words [22].
Another interesting imaging result of the present study is that a more anterior, medial area (x = -36, y = -40, z = -12) of the left fusiform gyrus is involved in the repetition suppression in the cross-script condition. This cross-script-related site lies about 14 mm further along the posterior temporal region. Thompson-Schill et al. (1999) [23] reported reduced activation of the left temporal region (averaged coordinates x = -40, y = -28, z = -12, with a standard deviation of 17 mm on the y-axis) involved in the repeated retrieval of semantic knowledge. The decreased activity in this area during semantic retrieval of primed items may be related to the phenomenon of repetition suppression observed in neurons in the anterior-ventral inferotemporal cortex of nonhuman primates [23]. Furthermore, Devlin et al. (2004) [24] reported reduced activation of the left middle temporal gyrus for word pairs with a high degree of semantic overlap. Moreover, in a previous study of cross-script masked priming effects in Japanese by Nakamura et al. (2005) [10], the left middle temporal cortex (x = -48, y = -43, z = -2) was found to be involved in the repetition suppression. They suggested that this activation may correspond to a progressive abstraction process, as has also been proposed for object recognition, whereby the raw visual features of stimuli are progressively transformed from perceptual to conceptual. In fact, this part of the left temporal gyrus has been reported to be associated with the semantic network. In addition, in the behavioral results of the masked priming experiment of Kim and Davis (2002) [25], a priming effect of Hangul-Hanja prime-target relations was shown, and it was suggested that this priming effect occurred at the semantic level. They further postulated that the facilitation due to priming depends on a semantic process that takes place simultaneously with the lexical processing (Kim and Davis, 2002). Taken together, the cross-script repetition suppression in the anterior part of the fusiform gyrus found in the present study seems to be related to semantic representations influenced by subliminal primes, even though the activity reduction area in the present study lies closer to the occipital region than in the aforementioned previous studies. This region appears to be related to the anterior-posterior progression in word processing. One point should be considered with respect to this interpretation: as Nakamura et al. (2005) [10] mentioned in their study, the posterior temporal activity reduction for repeated items could include the effect of motor congruity as a potential confounding variable, inflated by response associations learned through repeated exposure to the same items. The finding of the present study that the priming effect was stronger in the prime-Hanja, target-Hangul condition may lead to a similar interpretation, since the subliminal primes in Hanja and their orthography-to-lexicon route seem to be activated ultimately via the phonological route of the visible target words in Hangul.
References 1. Price, C.J.: The anatomy of language: contributions from functional neuroimaging. J. Anat. 3, 335–359 (2000) 2. Demonet, J.E., Chollet, F., Ramsay, S., Cardebat, D., Nespoulus, J.N., Wise, R., Rascol, A., Frackowiak, R.: The anatomy of phonological and semantic processing in normal subjects. Brain 115, 1753–1768 (1992)
3. Bookheimer, S.: Functional MRI of language: New approaches to understanding the cortical organization of semantic processing. Ann. Rev. Neurosci. 25, 151–188 (2002) 4. de Zubicaray, G.I., Wilson, S.J., McMahon, K.L., Muthiah, S.: The semantic interference effect in the picture-word paradigm: An event-related fMRI study employing overt responses. Human Brain Mapping 14, 218–227 (2001) 5. Tzeng, O., Hung, D., Cotton, B., Wang, W.S.-Y.: Visual lateralisation effect in reading Chinese characters. Nature 282, 499–501 (1979) 6. Ding, G., Perry, C., Peng, D., Ma, L., Li, D., Xu, S., Luo, Q., Xu, D., Yang, J.: Neural mechanisms underlying semantic and orthographic processing in Chinese-English bilinguals. NeuroReport 14, 1557–1562 (2003) 7. Tan, L.H., Spinks, J.A., Gao, J.-H., Liu, H.-L., Perfetti, C.A., Xiong, J., Stofer, K.A., Pu, Y., Liu, Y., Fox, P.T.: Brain activation in the processing of Chinese characters and words: a functional MRI study. Human Brain Mapping 10, 16–27 (2000) 8. Wang, M., Koda, K., Perfetti, C.A.: Alphabetic and nonalphabetic L1 effects in English word identification: a comparison of Korean and Chinese English L2 learners. Cognition 87, 129–149 (2003) 9. Kim, H., Na, D.: Dissociation of pure Korean words and Chinese-derivative words in phonological dysgraphia. Brain and Language 74, 134–137 (2000) 10. Nakamura, K., Dehaene, S., Jobert, A., Le Bihan, D., Kouider, S.: Subliminal Convergence of Kanji and Kana Words: Further Evidence for Functional Parcellation of the Posterior Temporal Cortex in Visual Word Perception. J. Cog. Neurosci. 17(6), 954– 968 (2005) 11. Gusnard, D., Raichle, M.: Searching for a baseline: functional imaging and the resting human brain. Nat. Rev., Neurosci. 2, 685–694 (2001) 12. Kuo, W., Yeh, T.C., Duann, J.-R., Wu, Y.-T., Ho, L.-W., Hung, D., Tzeng, O.J.L., Hsieh, J.-C.: A left-lateralized network for reading Chinese words: a 3 T fMRI study. NeuroReport 12, 3997–4001 (2001) 13. Tan, L.H., Liu, H.-L., Perfetti, C.A., Spinks, J.A., Fox, P.T., Gao, J.-H.: The neural system underlying Chinese logograph reading. NeuroImage 13, 836–846 (2001) 14. Chee, M., Tan, E., Thiel, T.: Mandarin and English single word processing studies with functional magnetic resonance imaging. J. Neurosci. 19, 3050–3056 (1999) 15. Chee, M.W., Weekes, B., Lee, K.M., Soon, C.S., Schreiber, A., Hoon, J.J., Chee, M.: Overlap and dissociation of semantic processing of Chinese characters, English words, and pictures: evidence from fMRI. NeuroImage 12, 392–403 (2000) 16. Zhang, W., Feng, L.: Interhemispheric interaction affected by identification of Chinese characters. Brain and Cognition 39, 93–99 (1999) 17. Mathews, P.M., Adcock, J., Chen, Y., Fu, S., Devlin, J.T., Rushworth, M.F.S., Smith, S., Beckmann, C., Iversen, S.: Towards understanding language organization in the brain using fMRI. Human Brain Mapping 18, 239–247 (2003) 18. Courtney, S.M., Petit, L., Maisog, J.M., Ungeleider, L.G., Haxby, J.V.: An area specialized for spatial working memory in human frontal cortex. Science 279, 1347–1351 (1998) 19. Owen, A.M., Doyon, J., Petrides, M., Evans, A.C.: Planning and spatial-working memory: A positron emission tomography study in humans. Eur. J. Neurosci. 8, 353–364 (1996)
Computational Neural Model of the Bilingual Stroop Effect: An fMRI Study Hyo Woon Yoon Department of Art Therapy, Daegu Cyber University, Daegu, Korea
[email protected]
Abstract. Functional MRI was used to investigate a computational neuronal model of the differential processing of two languages (Korean as the mother language, L1, and English as L2) in late Korean-English bilingual subjects performing a Stroop task with overt production of words. The Stroop paradigm experiment was conducted separately in L1 and L2, and the imaging results of these conditions were compared. For L1, activation of the bilateral anterior cingulate gyrus was observed, among others; L1-related activation was also observed in the middle frontal gyrus and inferior parietal lobule. For L2, frontal, parietal, and superior temporal activation was observed, but ACC activation was absent. This difference suggests differential information processing (automatization, inhibition control) mechanisms between L1 and L2. Keywords: automatization; inhibition control; anterior cingulate gyrus.
1 Introduction
A number of studies on bilingual subjects have investigated the neural mechanisms associated with the processing of L1 and L2, especially with the aid of modern imaging techniques such as PET (positron emission tomography) and fMRI (functional magnetic resonance imaging). The focus of most of these studies was on whether the same or different locations of the brain are activated during the processing of L1 and L2. Their assumption was that activation of the same site for L1 and L2 indicates that the same module is shared for the processing of both languages [1,2], while another line of studies provides evidence for different cortical organization of L1 and L2 in late bilinguals [3,4,5]. If the use of two languages is explained in terms of the concepts of implicit and explicit memory, this model can also explain why bilingual aphasic patients show paradoxical recovery: even though the damage interferes with implicit memory, the explicit knowledge, which relies on a different neural mechanism, can still be used. In cases where L2 recovered better than L1, the broader regions involved in the explicit knowledge of L2 may have compensated for the damaged language area [6,7]. The loss of L2, which has been repeatedly reported among patients with Alzheimer's disease, can generally be explained by the fact that they lose their declarative memory mechanism first. In the present study, functional MRI was used to investigate the neural correlates of the language processing differences in
L1 and L2 during the performance of the Stroop task in late Korean-English bilingual subjects. The main focus of the study was to identify differences in the processing mechanisms between L1 and L2 during the performance of the Stroop task. It was expected that inhibition of automatic processing is needed for the Stroop task in L1, whereas L2 processing should show less activity in such inhibition-related cortical areas. Rather, L2 processing should correspond more to activity in memory-related cortical regions, or in those responsible for color perception.
2 Materials and Methods
2.1 Subjects
Eighteen undergraduate students (9 males and 9 females with normal vision; mean age 21.3 years, S.D. = 2.1, range 19-26) participated in this study. All were native Korean speakers who had received English education in school for more than 6 years. They were right-handed, had no psychiatric history, and did not regularly take any medicine. All subjects consented to the protocol approved by the Institutional Ethics and Radiation Safety Committee. They were paid for participating in the study.
2.2 Experimental Design
The independent variables of the experiment were the kind of task (congruent, incongruent), the stimulus language (Korean, English), and the response language (Korean, English). A box-car design was used. The materials were presented using E-Prime. As stimuli, three Korean color words and three English color words (RED, GREEN, BLUE) were used. During the experiment, the participants were asked to speak aloud the colors of the presented words in L1 and L2 as accurately and quickly as possible. There were four types of experimental tasks, and each was repeated three times. In the experimental tasks ICKK and ICEE, subjects spoke the color of the presented word using the same language (Korean-Korean, English-English), where the meaning of the word was incongruent with the color of the word.
2.3 Data Acquisition and Analysis
Images were acquired using a 1.5 Tesla MRI scanner (Avanto, Siemens, Erlangen, Germany) with a quadrature head coil. Following a T1-weighted scout image, high-resolution anatomic images were acquired using an MPRAGE (Magnetization-Prepared RApid Gradient Echo) sequence with TE = 3.7 ms, TR = 8.1 ms, flip angle = 8°, and an image size of 256 x 256. T2*-weighted functional data were acquired using echo planar imaging (EPI) with TE = 37 ms, flip angle = 80°, TR = 3000 ms, and an image size of 64 x 64. We obtained 30-slice EPI images with a slice thickness of 5 mm and no gaps between slices for the whole brain. A total of 172 volumes were acquired per experimental run. For each participant, the first four volumes in each scan series, collected before magnetization reached the equilibrium state, were discarded.
Image data were analyzed using SPM2 (Wellcome Department of Cognitive Neurology, London). The images of each subject were corrected for motion and realigned using the first scan of the block as a reference. T1 anatomical images were coregistered with the mean of the functional scans and then aligned to the SPM T1 template in the atlas space of Talairach and Tournoux [8]. Finally, the images were smoothed by applying a Gaussian filter of 7 mm full-width at half-maximum (FWHM). In order to calculate contrasts, the SOA (stimulus onset asynchrony) from the protocol was defined as events and convolved with the hemodynamic response function (HRF) to specify the appropriate design matrix. The general linear model was used to analyze the smoothed signal at each voxel in the brain. Significant changes in hemodynamic response for each subject and condition were assessed using t-statistics. For the group analysis, single-subject contrast images were analyzed using a random effects model. Activations were reported if they exceeded a threshold of P < 0.05, corrected at the cluster level (P < 0.0001 uncorrected at the single voxel level). Significance at the cluster level was calculated in consideration of the peak activation and the extent of the cluster.
3 Results
The imaging data of 18 subjects were analyzed. In order to observe the typical Stroop effect, the congruent condition (CKK) was subtracted from the incongruent condition (ICKK). Activation of the bilateral anterior cingulate (BA 24/32), middle frontal gyrus (BA 9/10), left lentiform nucleus (putamen), inferior parietal lobule (BA 40), caudate, right precentral gyrus (BA 6), superior frontal gyrus (BA 6), and thalamus was observed (Table 2 & Figure 1). The direct comparison of L1 and L2 during Stroop task performance shows the following cortical activations; this comparison was made in order to identify the differential pattern of language processing in L1 and L2 (automatization, inhibition control). ICKK minus ICEE shows the activation of the bilateral putamen, thalamus, anterior cingulate (BA 32), middle frontal gyrus (BA 6/8/9), caudate, left posterior cingulate (BA 23), right inferior frontal gyrus (BA 47), and middle temporal gyrus (BA 21). The reverse comparison, ICEE minus ICKK, shows the activation of the bilateral superior temporal gyrus (BA 38), parahippocampal gyrus (BA 30/36), precentral gyrus (BA 6), caudate, left fusiform gyrus (BA 36), cuneus (BA 17), right inferior parietal lobule (BA 40), medial frontal gyrus (BA 8), and insula (BA 13). The comparison of ICEK minus ICKK should confirm the hypothesis that L2 word processing depends on controlled processes, since the subjects had to respond verbally in Korean in the ICEK condition despite the words being presented in English. This contrast shows the activation of the bilateral parahippocampal gyrus (BA 36), right insula (BA 13), superior temporal gyrus (BA 22), left posterior cingulate (BA 23), and caudate.
4 Discussion
In order to identify the typical cortical activation areas during performance of the Stroop task, the congruent condition was subtracted from the incongruent condition
Fig. 1. Activation maps of the contrast ICKK minus CKK (upper three images) and ICEK minus ICKK (lower three images). The differential activation patterns can be seen, especially in the ACC.
for Korean use (ICKK minus CKK). Activation of the bilateral anterior cingulate (BA 24/32), middle frontal gyrus (BA 9/10), left lentiform nucleus (putamen), inferior parietal lobule (BA 40), caudate, right precentral gyrus (BA 6), superior frontal gyrus (BA 6), and thalamus was observed in this contrast. In the Stroop task, the recognition of the real color meaning is interfered with by the word, which does not correspond to the presented color [9]. In general, word reading is regarded as more automatized than color recognition [3]; thus, the successful performance of color recognition in the Stroop task requires inhibition of the automatized processing of the accompanying word reading. The capability for such inhibition is one of the characteristics of working memory executive functions. The executive function is known to be mostly related to the involvement of the dorsolateral prefrontal cortex (DLPFC) and the anterior cingulate cortex [10,11]. Anterior cingulate activation in particular is responsible for the monitoring of such inhibition functions in the Stroop task [12]. A number of neuroimaging studies have commonly reported the activation of the ACC during Stroop task performance [13,14,15]. The activation of the ACC and DLPFC in the present study is consistent with the results of previous studies, which indicated that these areas are important for task-relevant control of conflicting information or of competing sources of information. On the basis of the literature, including the present results, the DLPFC and ACC are the typical brain regions involved in Stroop task performance, but involvement of the parietal region in performing the task has also been reported [14,15,17,18]. Bench et al. (1993) [14] and Peterson et al. (2002) [17] reported inferior parietal activation during Stroop task performance, and they interpreted this activation as being related to sustained
attention needed for task performance. Adleman et al. (2002) [16] reported that inferior parietal activation is also connected with attending to relevant features while inhibiting irrelevant features of the stimulus. The left inferior parietal activation in the present study seems to be related to one of the general attention processing mechanisms demanded by other tasks such as the Simon task, which involves the inhibition of incongruent spatial features [17,19]. According to the previous literature, activations of the thalamus, pulvinar, superior colliculus, posterior parietal regions, prefrontal regions, anterior cingulate cortex, and basal ganglia are known to be neural correlates of attention processes [13,20]. The involvement of the thalamus in attention processing is particularly meaningful, since this brain area seems to be related to the selection of input information for further attentional processing. In a PET study by LaBerge & Buchsbaum (1990) [20], the thalamus was more involved during a task of searching for a letter among 8 different items than during the simple recognition of one. They interpreted this to mean that thalamic activity was related to the process of selecting input information to attend to. In the present study, the cognitive load was larger in the incongruent condition than in the congruent one. Therefore, the activation of the thalamus as well as the basal ganglia (putamen, caudate) seems to serve this attention process during the incongruent condition of Stroop task performance. The precentral gyrus plays a role in sensory and motor processes. Pinel et al. (1999) [21] have postulated that the activation of this area is related to visual identification and appropriate response. Schneider & Chein (2003) [12] have postulated that the precentral gyrus is responsible for suppressing movement, which is needed for the reaction delay during interference tasks such as the Stroop. The results of the present study seem to be consistent with this previous evidence. The behavioral data in this study indicated that there were significant differences between the congruent and incongruent conditions. This means that the incongruent task was more difficult than the congruent one, and this difficulty should include the interference effect of task performance as well.
References
1. Klein, D., Milner, B., Zatorre, R.J., Zhao, V., Nikelski, J.: Cerebral organization in bilinguals: A PET study of Chinese–English verb generation. NeuroReport 10(13), 2841–2846 (1999)
2. Hernandez, A.E., Martinez, A., Kohnert, K.: In search of the language switch: An fMRI study of picture naming in Spanish–English bilinguals. Brain and Language 73(3), 421–431 (2000)
3. Cohen, J.D., Dunbar, K., McClelland, J.L.: On the control of automatic processes: A parallel distributed processing account of the Stroop effect. Psychological Review 97(3), 332–361 (1990)
4. Perani, D., Dehaene, S., Grassi, F., Cohen, L., Cappa, S., Paulesu, E., Dupoux, E., Fazio, F., Mehler, J.: Brain processing of native and foreign languages. NeuroReport 7(15-17), 2439–2444 (1996)
5. Pallier, C., Dehaene, S., Poline, J.-B., LeBihan, D., Argenti, A.-M., Dupoux, E., Mehler, J.: Brain imaging of language plasticity in adopted adults: Can a second language replace the first? Cerebral Cortex 13(2), 155–161 (2003)
6. Aglioti, S., Fabbro, F.: Paradoxical selective recovery in a bilingual aphasic following subcortical lesions. NeuroReport 4(12), 1359–1362 (1993)
7. Fabbro, F., Paradis, M.: Acquired aphasia in a bilingual child. In: Paradis, M. (ed.) Aspects of Bilingual Aphasia, pp. 67–83. Pergamon Press, Oxford (1995)
8. Talairach, J., Tournoux, P.: Co-planar Stereotaxic Atlas of the Human Brain. Thieme, New York (1988)
9. MacLeod, C.M.: Half a century of research on the Stroop effect: An integrative review. Psychological Bulletin 109(2), 163–203 (1991)
10. Roberts, A.C., Robbins, T.W., Weiskrantz, L.: The Prefrontal Cortex: Executive and Cognitive Functions. Oxford University Press, Oxford (1998)
11. Stuss, D.T., Shallice, T., Alexander, M.P., Picton, T.W.: A multidisciplinary approach to anterior attentional functions. Annals of the New York Academy of Sciences 769(11), 191–211 (1995)
12. Schneider, W., Chein, J.M.: Controlled & automatic processing: Behavior, theory, and biological mechanisms. Cognitive Science 27(3), 525–559 (2003)
13. Banich, M.T., Milham, M.P., Atchley, R., Cohen, N.J., Webb, A., Wszalek, T., Kramer, A.F., Liang, Z.P., Wright, A., Shenker, J., Magin, R.: fMRI studies of Stroop tasks reveal unique roles of anterior and posterior brain systems in attentional selection. Journal of Cognitive Neuroscience 12(6), 988–1000 (2000)
14. Bench, C.J., Frith, C.D., Grasby, P.M., Friston, K.J., Paulesu, E., Frackowiak, R.S.J., Dolan, R.J.: Investigations of the functional anatomy of attention using the Stroop test. Neuropsychologia 31(9), 907–922 (1993)
15. Pardo, J.V., Pardo, P.J., Janer, K.W., Raichle, M.E.: The anterior cingulate cortex mediates processing selection in the Stroop attentional conflict paradigm. Proceedings of the National Academy of Sciences of the USA 87(1), 256–259 (1990)
16. Adleman, N.E., Menon, V., Blasey, C.M., White, C.D., Warsofsky, I.S., Glover, G.H., Reiss, A.L.: A developmental fMRI study of the Stroop color-word task. NeuroImage 16(1), 61–75 (2002)
17. Peterson, B.S., Kane, M.J., Alexander, G.M., Lacadie, C., Skudlarski, P., Leung, H.C., May, J., Gore, J.C.: An event-related functional MRI study comparing interference effects in the Simon and Stroop tasks. Cognitive Brain Research 13(3), 427–440 (2002)
18. Taylor, S.F., Kornblum, S., Lauber, E.J., Minoshima, S., Koeppe, R.A.: Isolation of specific interference processing in the Stroop task: PET activation studies. NeuroImage 6(2), 81–92 (1997)
19. Fink, G.R., Dolan, R.J., Halligan, P.W., Marshall, J.C., Frith, C.D.: Space-based and object-based visual attention: Shared and specific neural domains. Brain 120(11), 2013–2028 (1997)
20. LaBerge, D., Buchsbaum, M.S.: Positron emission tomographic measurements of pulvinar activity during an attention task. Journal of Neuroscience 10(2), 613–619 (1990)
21. Pinel, P., Le Clec'H, G., van de Moortele, P.F., Naccache, L., Le Bihan, D., Dehaene, S.: Event-related fMRI analysis of the cerebral circuit for number comparison. NeuroReport 10(7), 1473–1479 (1999)
The Eye Movement and Data Processing Due to Obtained BOLD Signal in V1: A Study of Simultaneous Measurement of EOG and fMRI Hyo Woon Yoon1,*, Dong-Hwa Kim2, Young Jae Lee3, Hyun-Chang Lee4, and Ji-Hyang Lim1 1
Department of Art Therapy, Daegu Cyber University, Daegu, Republic of Korea Tel.: +82-53-850-4081, Fax: +82-53-850-4019
[email protected] 2 Department of Electrical Engineering, Hanbat University, Daejeon, Republic of Korea 3 Department of Multimedia, Jeonju University, Jeonju, Republic of Korea 4 Division of Information and Electronic Commerce, Wonkwang University, Iksan, Republic of Korea
Abstract. The simultaneous measurement of EOG and fMRI was performed in the present study in order to investigate the influence of eye movement (the blinking mechanism) on the functional magnetic resonance imaging (fMRI) signal response in the primary visual cortex. Conventional Echo-Planar Imaging (EPI, T2*-weighted) with concurrent electrooculogram (EOG) was obtained in four subjects while they viewed a fixation point and a checkerboard with a flickering rate of 8 Hz. With the help of the EOG information, the experimental blocks can be divided into two conditions: fixation and moving eye. The fMRI data of these two conditions were compared. The results of this study indicate that there are no differential signal changes between these two conditions. This suggests that eye blinking does not affect BOLD signal changes in the primary visual cortex and that, in terms of data processing, eye blinking can be ignored. Keywords: simultaneous measurement of fMRI and EEG, artifact correction.
1 Introduction

For measuring correlates of neuronal activation in the brain, several methodological approaches are currently used, e.g., Positron Emission Tomography (PET) or functional magnetic resonance imaging (fMRI). One of the preferred methods for this purpose is functional MRI, which can detect the increased blood flow related to neuronal activity with relatively high spatial resolution. T2*-weighted MR imaging can reveal changes in blood oxygenation in activated brain areas [1]. Fast imaging sequences such as echo-planar imaging (EPI) can capture stimulus-evoked transient *
Corresponding author.
changes in blood oxygenation level-dependent (BOLD) contrast, which likely reflects hemodynamic responses [2-4]. The majority of fMRI studies are based on visual stimuli, and eye movement could influence the signal intensity of functional MR data, especially in the primary visual cortex. It has been reported that local eye movements influence whole-head motion correction procedures, resulting in inaccurate movement parameters and potentially lowering the activation detection sensitivity [5]. Eye movement also affects the magnitude of the fMRI response in the extrastriate cortex during visual motion perception [6], and eye blinking in a dark environment evokes activation of the primary visual cortex [7]. In this study, we measured electrooculogram (EOG) data simultaneously with EPI data acquisition to detect the eye blinking signal while visual stimuli were being presented. The aim of our study is to find out whether there are differences or correlations in the activation patterns of the primary visual cortex between high and low frequencies of eye blinking. The hemodynamic response, however, is only indirectly linked to the energy consumption of the neural population and takes place on a timescale of more than 3 seconds after stimulus onset. In order to compensate for the limited temporal resolution and the indirect measurement of neural activity, a direct method of measuring the electrical activity of neurons has been suggested. This can be done using electroencephalography (EEG), which features millisecond temporal resolution and might thus represent underlying cognitive processes. The combination of EEG and fMRI, i.e., the simultaneous recording of EEG and fMRI, might be a promising tool for functional brain mapping, providing physiological characteristics with both high spatial and high temporal resolution. In the present study, we aimed to complete the experiment using this combination. In particular, the EOG was acquired to measure the frequency of eye blinking in each subject.
2 Materials and Methods

2.1 Subjects

Four right-handed healthy volunteers (mean age of 26 years, SD of 1.2 years, all males) participated as subjects in this study after giving their written informed consent. They were free of any neurological antecedents and had good vision. Before starting the scanning session, subjects put on the cap for recording EEG.

2.2 EEG (Electroencephalogram) Recording

A commercially available MR-compatible system (Brain Amp MR, Brain Products GmbH, Germany) was used for EEG recording, along with a specially designed electrode cap (BrainCap-MRI). The electrode cap contains 32 EEG channels and three additional channels dedicated to electrocardiogram (ECG) and electrooculogram (EOG) acquisition. The amplifier was designed to be placed inside the magnet bore of the scanner and was connected to the host computer outside the MR room via a fiber optic cable. The resolution and dynamic range of the amplifier were 100 nV and ±3.2 mV, respectively. The EEG and EOG waveforms were recorded
with a sampling rate of 500 samples/s. Band-pass filtering from 0.5 to 80 Hz was applied along with 60 Hz notch filtering. We recorded the EOG in order to monitor the subjects' blinking and eye movement. The gradient-induced artifacts on the EEG signal were removed by digital filtering [8-10]. Eye movements were monitored by detecting the blink signal in the recording.

2.3 Experiment Design

Visual stimuli were produced using custom-designed software. Stimuli were presented via a video projector onto a screen, which the subjects viewed through a mirror. In the experimental sessions, the visual stimuli were concentric circles. The fixation point at the center was static, while the surrounding field flickered. These were circular checkerboards with a flickering rate of 8 Hz. Stimuli were presented to subjects with a viewing angle of 13° vertically and 17° horizontally and a viewing distance of 5 m. During the experiment, subjects performed 24 blocks of 30 s each, alternating with central fixation. Images were acquired during three experimental conditions (natural eye movement, fixed eye, and rest). During the rest condition, subjects were asked to look at the fixation point. During the natural eye movement condition, they likewise looked at the fixation point while the flickering checkerboard pattern was presented. During the fixed eye condition, subjects were asked to keep their eyes open and try not to blink during stimulus presentation; checkerboard stimuli were presented as in the natural eye movement condition.

2.4 fMRI Scanning

fMRI data were acquired with a 3 Tesla MR scanner (Varian console, built by ISOL Tech., Korea). Echo-planar images (128 × 128 matrix over a 160 mm field of view) consisted of 15 consecutive axial sections (3 mm thickness, no gap, repetition time/echo time = 3000/37 ms, flip angle 70°). An experiment session consisted of 24 blocks of 30 seconds, and the remaining 240 s (40 volumes) were analyzed. Using a quadrature head coil, high-resolution anatomic images were acquired with an MPRAGE sequence (echo time TE = 3.7 ms, TR = 8.1 ms, flip angle = 8°, FOV = 256 × 256 mm).

2.5 Data Analysis

Before scanning, we recorded each subject's EOG reference data and measured the amplitude, shape, and rate of the reference data for detecting eye movement and blinking. During functional MR imaging, the switching of the gradient magnetic field induces a series of artifacts in the EEG whose amplitudes are 10 to 100 times larger than those of EEG data measured outside the MR room. This makes the monitoring of EOG waveforms difficult while MR imaging is being performed. The correction method for the gradient-induced artifact is to average the intervals in which the gradient switching of the scanner has taken place. The averaged gradient-induced artifact is then subtracted from the original EEG data in the affected intervals. The gradient-induced artifacts are generally not completely eliminated in
this way. For this reason, different filters were then applied to the data in the corrected ranges. After the gradient-induced artifacts on the recorded EOG were removed by these methods [2], we detected blink signals resembling each subject's reference EOG signal, with amplitudes above 150 μV and frequencies of about 2–4 Hz. The EOG signals typically showed a sustained spike wave during eye blinking, as shown in Figure 1(C).
Fig. 1. The electrooculography. The top trace is the reference data, the middle trace was acquired during the fMRI scan, and the bottom trace is the middle trace with the gradient-induced artifacts removed. Calibration = 1 second, 1500 μV.
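As a concrete illustration of the correction just described, the sketch below implements epoch-averaged artifact subtraction followed by the stated band-pass and notch filtering in Python/NumPy/SciPy. It is a minimal sketch only: the epoch length, the 150 μV blink threshold, and all function names are illustrative stand-ins for the actual recording pipeline (cf. Allen et al. [10]).

```python
# Minimal sketch of average-artifact subtraction: the gradient artifact is
# assumed to repeat with every volume acquisition (TR = 3 s at 500 samples/s),
# so averaging the artifact epochs yields a template to subtract.
import numpy as np
from scipy import signal

FS = 500                  # EOG sampling rate (samples/s)
TR_SAMPLES = 3 * FS       # one volume acquisition = 3 s of samples

def remove_gradient_artifact(eog, tr_samples=TR_SAMPLES):
    """Subtract the epoch-averaged gradient artifact template from the EOG."""
    n_epochs = len(eog) // tr_samples
    epochs = eog[:n_epochs * tr_samples].reshape(n_epochs, tr_samples)
    template = epochs.mean(axis=0)      # averaged gradient-induced artifact
    return (epochs - template).ravel()  # subtract template in each interval

def bandpass_notch(eog, fs=FS):
    """0.5-80 Hz band-pass plus 60 Hz notch, as described in Sect. 2.2."""
    b, a = signal.butter(4, [0.5, 80.0], btype="band", fs=fs)
    x = signal.filtfilt(b, a, eog)
    bn, an = signal.iirnotch(60.0, Q=30.0, fs=fs)
    return signal.filtfilt(bn, an, x)

def detect_blinks(eog, threshold_uv=150.0):
    """Flag samples exceeding the per-subject reference amplitude (>150 uV)."""
    return np.flatnonzero(np.abs(eog) > threshold_uv)
```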
To analyze the fMRI data in adequate blocks, the first slice scan time within a volume was used as a reference for alignment, and the following slices of that volume were linearly interpolated to correct for the temporal slice scan time shifts. Data analysis and visualization were performed with the fMRI software package BrainVoyager (Brain Innovation, Maastricht, The Netherlands) [11]. Before the main analysis, the following processing steps were carried out: 1) motion correction, 2) spatial smoothing of EPI images with a full width at half maximum of 4 mm, and 3) transformation into Talairach [12] coordinate space. The cortical sheets of individual subjects and a template brain were reconstructed as polygon meshes based on high-resolution T1-weighted structural three-dimensional recordings. The white–gray matter boundary was segmented, reconstructed, smoothed, and morphed. A morphed surface always possesses a link to the folded reference mesh, so that functional data can be correctly projected onto partially inflated representations. The ON and OFF phases were used for linear regression analysis of the BOLD signal time course. Using an empirically founded model [13] of the temporal dynamics of the fMRI signal, hemodynamic predictors were computed from the ON and OFF phases, and a general linear model (GLM) was computed for every voxel.
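The predictor-plus-GLM step can be sketched as follows, assuming a simple gamma-shaped hemodynamic impulse response in the spirit of the Boynton et al. model [13]; the parameter values and helper names are illustrative and do not reproduce BrainVoyager's implementation.

```python
# Minimal sketch: convolve the ON/OFF block vector with a gamma-shaped HRF to
# obtain the hemodynamic predictor, then fit a per-voxel GLM by least squares.
import numpy as np

TR = 3.0                                   # repetition time (s)

def gamma_hrf(tr=TR, duration=24.0, n=3.0, tau=1.5, delay=2.0):
    t = np.arange(0.0, duration, tr) - delay
    h = np.where(t > 0, (t / tau) ** (n - 1) * np.exp(-t / tau), 0.0)
    return h / h.sum()

def glm_beta(voxel_ts, on_off):
    """Fit y = X b for one voxel; on_off is the 0/1 block vector per volume."""
    pred = np.convolve(on_off, gamma_hrf())[: len(on_off)]  # hemodynamic predictor
    X = np.column_stack([pred, np.ones_like(pred)])         # predictor + baseline
    beta, *_ = np.linalg.lstsq(X, voxel_ts, rcond=None)
    return beta[0]                                          # activation estimate
```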
3 Results

As shown in Figure 2, the results of one of the four subjects indicate that activation of the primary visual cortex was observed for both the fixed eye (FE) and natural eye movement (ME) conditions. The same activation area was observed in the other three subjects. The averaged blinking frequency detected by EOG was 0.8 for the FE condition and 2.9 for the ME condition. Even though a larger blinking frequency was observed in the ME condition than in the FE condition, the contrasts 'ME minus FE' and 'FE minus ME' show very few significantly activated voxels.
Fig. 2. Processing results for the blinking eye (BE) condition. (A) Activated voxels for the 'FE minus FP' contrast (p < 3.66E-34). (B) Time course of MR signal intensity in the activated area. Green, blue, and red columns indicate the FE, BE, and FP stimulation periods, respectively. The signal shows the typical BOLD response in all BE (red columns) and FE (blue columns) stimulation periods.
We proceeded to a correlation analysis, considering each data point to consist of two measurements: the averaged blink count and the BOLD signal time course in each block of the FE and ME conditions. The correlation between these two values is given by their covariance normalized by the variances over the two conditions' blocks. Correlation is dimensionless and takes values limited to the range of –1 to +1. As we observed, the correlation between the number of blinks and the averaged MR time course data was small. Therefore, eye blinking does not appear to affect primary visual cortex activity in our experiment.
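A minimal sketch of this block-wise correlation follows, with placeholder numbers standing in for the measured blink counts and BOLD averages:

```python
# Each block contributes one (blink count, mean BOLD signal) pair; Pearson's
# r = cov(x, y) / (std(x) * std(y)) is dimensionless and lies in [-1, +1].
# The arrays below are hypothetical, not the measured values.
import numpy as np

blinks_per_block = np.array([1.0, 0.6, 2.8, 3.1, 0.9, 2.7])
mean_bold_per_block = np.array([0.8, 0.7, 0.9, 0.8, 0.7, 0.9])

r = np.corrcoef(blinks_per_block, mean_bold_per_block)[0, 1]
print(f"Pearson r = {r:+.2f}")  # small |r| -> blinking does not track BOLD
```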
4 Discussion

Eye movements, both voluntary and involuntary, are the main focus of this study. Due to their intrinsic nature, they can be possible sources of artifact and confounds when interpreting functional MRI data. The blinking process is reported to be controlled in the orbitofrontal and visual cortices, including the anterior portion of the visual cortex and the primary visual cortex [7]. Local eye movements influence whole-head motion correction procedures, resulting in inaccurate movement parameters and potentially lowering the sensitivity to detect activations [5]. For this reason, during fMRI
experimentation, infrared or visible-light video cameras or fMRI echo-planar image data can be used for monitoring eye movement [14, 15]. While these methods appear simple at first view, they are unfortunately not so simple to realize for technical reasons (i.e., synchronization of eye movement recording with pulse sequences and limited echo-planar imaging time). For this reason, an MR-compatible EEG recorder might be a promising tool for monitoring eye movement. Our results indicate that eye blinking does not affect the primary visual cortex and show that general fMRI studies need not consider the effect of the eye blinking mechanism in data processing. It appears to be enough to instruct the subject to fixate on the stimuli in a general fMRI study.
References
1. Ogawa, S., Lee, T.M., Kay, A.R., Tank, D.W.: Brain magnetic resonance imaging with contrast dependent on blood oxygenation. Proc. Natl. Acad. Sci. 87(24), 9868–9872 (1990)
2. Bandettini, P.A., Wong, E.C., Hinks, R.S., Tikofsky, R.S., Hyde, J.S.: Time course EPI of human brain function during task activation. Magn. Reson. Med. 25(2), 390–397 (1992)
3. Ernst, T., Hennig, J.: Observation of a fast response in functional MR. Magn. Reson. Med. 32(1), 146–149 (1994)
4. Menon, R.S., Ogawa, S., Hu, X., Strupp, J.P., Anderson, P., Uğurbil, K.: BOLD based functional MRI at 4 Tesla includes a capillary bed contribution: echo-planar imaging correlates with previous optical imaging using intrinsic signals. Magn. Reson. Med. 33(3), 453–459 (1995)
5. Tregellas, J.R., Tanabe, J.L., Miller, D.E., Freedman, R.: Monitoring eye movements during fMRI tasks with echo planar images. Hum. Brain Mapp. 17(4), 237–243 (2002)
6. Freitag, P., Greenlee, M.W., Lacina, T., Scheffler, K., Radü, E.W.: Effect of eye movements on the magnitude of functional magnetic resonance imaging responses in extrastriate cortex during visual motion perception. Exp. Brain Res. 119(4), 409–414 (1998)
7. Tsubota, K., Kwong, K.K., Lee, T.Y., Nakamura, J., Cheng, H.M.: Functional MRI of brain activation by eye blinking. Exp. Eye Res. 69(1), 1–7 (1999)
8. Allen, P.J., Polizzi, G., Krakow, K., Fish, D.R., Lemieux, L.: Identification of EEG events in the MR scanner: the problem of pulse artifact and a method for its subtraction. NeuroImage 8(3), 229–239 (1998)
9. Kruggel, F., Wiggins, C.J., Herrmann, C.S., von Cramon, D.Y.: Recording of the event-related potentials during functional MRI at 3.0 Tesla field strength. Magn. Reson. Med. 44(2), 277–282 (2000)
10. Allen, P.J., Josephs, O., Turner, R.: A method for removing imaging artifact from continuous EEG recorded during functional MRI. NeuroImage 12(2), 230–239 (2000)
11. Muckli, L., Kriegeskorte, N., Lanfermann, H., Zanella, F.E., Singer, W., Goebel, R.: Apparent motion: event-related functional magnetic resonance imaging of perceptual switches and states. J. Neurosci. 22(RC219), 1–5 (2002)
12. Talairach, J., Tournoux, P.: Co-Planar Stereotaxic Atlas of the Human Brain. Thieme Medical Publishers, New York (1988)
13. Boynton, G.M., Engel, S.A., Glover, G.H., Heeger, D.J.: Linear systems analysis of functional magnetic resonance imaging in human V1. J. Neurosci. 16(13), 4207–4221 (1996)
14. Kimmig, H., Greenlee, M.W., Huethe, F., Mergner, T.: MR-Eyetracker: a new method for eye movement recording in functional magnetic resonance imaging. Exp. Brain Res. 126(3), 443–449 (1999)
15. Beauchamp, M.S.: Detection of eye movements from fMRI data. Magn. Reson. Med. 49(2), 376–380 (2003)
Face Tracking for Augmented Reality Game Interface and Brand Placement Yong Jae Lee1 and Young Jae Lee2,* 1
Tongmyong University
[email protected] 2 College of Culture and Creative Industry, Jeonju University, Korea
[email protected]
Abstract. This paper proposes an AR game interface that is faster and more emotive through the use of an intelligent autonomous agent. Since the operation of the AR game is designed around tracking of the gamer's face, this study applies the face movements to the game. Since the nature of the game requires real-time interaction, the CBCH algorithm was selected for face recognition. In case of failed face tracking, the interface agent provides the gamer with sound information to aid situation perception. Furthermore, re-tracking is enabled so that the proposed algorithm can help the gamer react effectively to an attack. This paper also looks at the design of a new beneficiary model for the game industry through interdisciplinary research between games and advertising. In conclusion, application to the 3D ping-pong game brought about effective and powerful results. The proposed algorithm might be used as fundamental data for developing AR game interfaces. Keywords: AR game interface, face tracking, agent, 3D ping-pong game.
1 Introduction

With advanced technologies in computer hardware and software, technologies for game production and game interfaces are rapidly developing. Recently, various natural interfaces that do not use a keyboard or mouse have been developed and are in use. Among these, experience-based interfaces using the gamer's motions are widely applied in AR and functional games [1-11]. An AR game allows the gamer to enjoy an exercising effect, since it is based on the motions of the body. This kind of game has the advantage that the gamer can stretch a stiff body after hours of computer use while enjoying the game [1]. However, most games developed so far for PC use are designed around limited hardware such as keyboard, mouse, and joystick, which is the main cause of confinement to a limited space and a reduced sense of reality. Other interface *
Corresponding author.
gadgets such as haptic gloves or glasses and HMD interfaces also have their own drawbacks in several ways: they cause discomfort, since they have to be worn on the user's body; they use wires, which can hinder free movement of the user; and they are expensive to buy for the purpose of playing a game [1]. If we could use only human movements to control the game, without the need for a keyboard or sensor-mounted gadgets, we could make the computer more user-friendly with a more human interface, not to mention enhancing the sense of reality of the game [2]. Thus, for the purpose of this research, a 3D ping-pong game has been developed with no interface other than a webcam, and this paper proposes a game interface using recognition of the gamer's face. When face recognition fails, the interface agent appears and tells the gamer about the current situation with an accompanying message. An algorithm for re-recognizing the face is also proposed, and the following experiments demonstrate its performance and validity.
2 AR Games and Agent

AR (Augmented Reality) focuses on enhancing fun through interactive experiences within an environment that combines elements of virtual reality and the real world. There have been steady attempts to introduce augmented reality into game technology, as we can see in the examples of a bowling game and car racing on a table. 'Eye of the Judgement', the card game for PlayStation 3 by Sony, is an AR game released in 2008 that shows 3D virtual characters on the table by using card recognition technology [2-6]. There is also a game that is played interactively in a limited space, with many markers installed on the ceiling, recognizing the hands of the user [1]. An interface agent is also called a user interface agent. It is an autonomous agent with reinforced learning ability, designed to provide a more convenient computing environment. An interface agent also presents more familiarity to the computer user by introducing human or animal images. Through technologies such as 3D animation, sound synthesis, and lip animation, the interface agent provides an environment in which users might feel as if they were conversing with other people or personified animals while using the computer. Peedy of Microsoft and Genie of Argo are examples. Peedy, the Microsoft agent character, can understand human speech and react to it. Multimodal agent interfaces provide the user with an environment enabling easy input [7-8].
Fig. 1. Eye of Judgement by Sony Corp.
3 Proposition of the Algorithm and Experiments

3.1 Proposition of the Algorithm

CBCH (Cascade of Boosted Classifiers Working with Haar-like Features) [9-11], which is adopted here, works as follows. The image input through the webcam is recognized using CBCH, and the coordinate values of the face are converted into coordinate values for the game interface. In the 3D ping-pong game presented here, the bars for offense and defense are matched with these coordinates, and the bar movements are controlled by the face movements. The collision sensing algorithm detects the collision of the ball and the bar, whose action and reaction are realized in consideration of the directions of the movement vectors. If face recognition fails in the next frame, the agent notifies the gamer of the situation with sound and text, more than once if necessary.

3.2 Ping-Pong Game

For the purpose of this research, the 3D ping-pong game was custom-produced using the Irrlicht engine, Visual C++, and SAPI (speech engine). Note that the agent is used to notify the gamer about the situation regarding face recognition. The background of the game is 800 × 600 pixels with 32-bit graphics. The two bars for offense are controlled by the faces of gamer 1 and gamer 2, respectively. To make the ball three-dimensional, we used the cube scene node and sphere scene node functions of the Irrlicht engine. We made the ball and the bars spin around the y axis to enhance the sense of reality and to attract the gamer's attention, respectively. The square collision method was selected for the collision algorithm of the bars and the ball, since it ensures rapid detection. Background music and sound effects were added for game immersion. The moment of collision was given a special auditory effect to emphasize that the 'spinning' ball hits the bar. The agent used for the game is Genie of Argo (Fig. 3).

3.3 Experiment 1: Frame Design

Experiment 1 concerns the frame design for the game image using the webcam image, and the resulting image was checked. In Figure 2, we modified the coordinates of the webcam and game images by using the constant values alpha and beta and tested the locations on the Experiment 1 image. To apply the designed frame in the game, Experiment 1 tested the correspondence between the bar and the face interface. Fig. 3(b) shows the resulting movement of the bar to the lower left of the game scene after we moved the face (game interface) to the lower left in Fig. 3(a). In Fig. 3(c), the face (game interface) moves to the upper right, which makes the bar in Fig. 3(d) move to the corresponding upper right of the game scene. The two cases confirm that the movement of the face triggers a reaction of the bar, which moves to the corresponding location in the game scene. Therefore, Experiment 1 verifies that the gamer's face can be used as a valid game interface.
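The tracking loop of Sect. 3.1 can be sketched as follows using OpenCV's Haar-cascade detector (the Viola–Jones detector [9] underlying CBCH) in place of the authors' Visual C++ implementation; the scaling constants, the square-collision helper, and notify_agent() are hypothetical stand-ins.

```python
# Minimal sketch: detect the face, map its centre into the 800x600 game frame,
# and fall back to an agent notification (sound + text) when detection fails,
# retrying on the next frame (re-recognition).
import cv2

GAME_W, GAME_H = 800, 600
ALPHA, BETA = GAME_W / 640.0, GAME_H / 480.0   # webcam -> game scaling (assumed)

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def track_bar_position(frame):
    """Map the detected face centre to game coordinates, or None on failure."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None                        # triggers the agent notification
    x, y, w, h = faces[0]
    return int((x + w / 2) * ALPHA), int((y + h / 2) * BETA)

def bars_collide(ball, bar):
    """Square (axis-aligned bounding box) collision test for ball and bar."""
    bx, by, bw, bh = ball
    px, py, pw, ph = bar
    return bx < px + pw and px < bx + bw and by < py + ph and py < by + bh

def on_frame(frame, notify_agent):
    pos = track_bar_position(frame)
    if pos is None:
        notify_agent("Face lost - please move back into view")  # sound + text
    return pos
```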
Fig. 2. Frame Image
(a)
(b)
(c)
(d) Fig. 3. Experiment 1 Image
3.4 Experiment 2

Experiment 2 identifies the function of the agent in notifying the gamer when the CBCH face recognition algorithm fails. In Fig. 3(a), the face recognition fails, and the Genie in Fig. 3(b) appears and, with accompanying text, tells the gamer to adjust. The gamer, upon hearing the message, adjusts the face (the game interface) and moves it (Fig. 3(c)) to defend, so as not to lose a point (Fig. 3(d)). Fig. 3(e) shows defense and offense, and Fig. 3(f) is the resulting image. As shown, the role of the agent is very important in scoring and defending.

3.5 Experiment 3: Brand Placement in the Game

Experiment 3 shows the possibilities of interdisciplinary studies between games and advertising by placing a particular brand in the AR game interface. Recently, interdisciplinary studies have been drawing scholars' attention in many fields.
(a)
(b)
(c)
(d)
(e)
(f) Fig. 3. Experiment 2 Image
That is why convergence is becoming a more important concept in multimedia. In the game industry, a new beneficiary model such as brand placement can be a successful way to sustain the business. Prior research has also found that synergy effects can be maximized by integrating these two academic fields. In other words, advertising tools including brand placement can be developed into an innovative beneficiary model for the game industry. In summary, brand exposure in a game can serve as an opportunity to raise brand awareness and preference for the sponsor, and the game company can reap profits. It can also raise the enjoyment level for the game user, making its realization highly probable.
Fig. 4. Experiment 3 Image
4 Conclusion

In this paper, we attempted through experiments to apply information from the gamer's face recognition to the 3D ping-pong game and to evaluate the performance of the interface functions. In case of failed face recognition, we used the agent to notify the gamer of the current situation with speech and text. The results of the experiments showed that effective reaction by the gamer was possible. We could also realize various roles of the agent through speech and text, which include ordering tactics, expressing emotions, cheering, etc. This paper also looked at the design of a new beneficiary model for the game industry through interdisciplinary research between games and advertising. In conclusion, this research verifies that this kind of approach is conducive to increasing the fun factor as well as providing extra assistance to the gamer.
References
1. Kim, K.Y., et al.: ARPushPush: Augmented Reality Game in Indoor Environment. In: KHCI, pp. 354–359 (2005)
2. Kang, W.H.: Handheld Augmented Reality Game System Using Dynamic Environment. KAIST (2007)
3. http://www.comp.dit.ie/bmacnamee/papers/MixedRealityGames.pdf
4. http://www.eyeofjudgement.com
5. http://www.google.co.kr/search?hl=ko&newwindow=1&complete=1&q=Eye+of+Judgement
6. Kang, W.H.: Handheld Augmented Reality Game System Using Dynamic Environment. KAIST, Thesis for Master's Degree (2007)
7. Lee, Y.J., Lee, Y.J.: Interface of Augmented Reality Game Using Face Tracking and Its Application to Advertising. In: Security-Enriched Urban Computing and Smart Grid, Communications in Computer and Information Science 78, 614–620 (2010)
8. Lee, Y.J., Lee, Y.J.: The Application of Agent and Advertising in 3D Sports and Game. Journal of the Korea Institute of Marine Information and Communication Sciences 14(10), 2269–2276 (2010)
9. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: IEEE Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, USA (2001)
10. http://www.lienhart.de/ICIP2002.pdf
11. http://cafe.naver.com/opencv.cafe?iframe_url=/ArticleRead.nhn%3Farticleid=1328
On-line and Mobile Delivery Data Management for Enhancing Customer Services Hyun-Chang Lee1, Seong Yoon Shin2, and Yang Won Rhee2 1
Division of Information and e-Commerce, Wonkwang University 344-5, Sinyoung-Dong, Iksan-Si, Jeonbuk, Korea
[email protected] http://www.wku.ac.kr 2 Department of Computer Information Eng., Kunsan University {s3397220,ywrhee}@kunsan.ac.kr http://www.kunsan.ac.kr
Abstract. With the growing trend toward the use of supply chains and e-commerce, logistic service providers for product warehousing, transportation, and delivery are placing great emphasis on information technology (IT) to be globally competitive. Realizing that current service tracking systems merely support order status tracking within a single service provider, this study applies mobile agent technology to raise the customer satisfaction index through online delivery tracking across logistic alliances. In this paper, we therefore propose a system that utilizes a three-tier architecture for mobile agent technology and develop a prototype system for logistic delivery service tracking to satisfy customers who want to know the location of their goods in detail. We also demonstrate the proposed concept and technology with mobile devices and the Internet. Online service tracking enables customers to monitor the real-time status of their service requests through the Internet and therefore becomes a key tool for modern enterprises to compete successfully in a global marketplace.
1 Introduction

In a context of growing market globalization, rapid diffusion of information and communication technologies, and increasingly widespread de-localization of manufacturing activities, product mobility is preferred to accumulation in warehouses, involving a continuous flow of materials through the supply chain [1]. Therefore, firms have to supply distant markets from their own warehouses and plants with ever-greater frequency. This also involves relevant expenditures: [2], for instance, states that transportation costs on average account for about 50 percent of total logistic costs. This requirement will become even more critical with the progressive diffusion of e-commerce activities [4]. Techniques for enhancing customer contentment are also needed from the perspective of e-commerce activation. Such factors make distribution logistics increasingly important and often critical for the competitiveness of companies [7], justifying major efforts to reduce logistic
expenditures to the greatest extent possible in order to deliver goods at a reduced price and to enhance customer contentment. According to statistics from the Council of Logistics Management, 20 to 30 percent of total production costs are directly attributed to distribution and logistics management [8]. Investment in efficient distribution and logistics management systems can considerably improve enterprise competitiveness. Further, logistics management can significantly affect the efficiency of production and distribution and the quality of total customer services. Distribution and logistics management consist of a series of activities including warehousing, order manipulation, goods picking and dispatching, transportation, and inventory control [9]. Since a vast amount of information is generated across the various logistic activities and participants, the efficiency and quality of enterprise logistic services are hard to monitor and control, and it is difficult to provide efficient, high-quality customer services. Moreover, since many participants such as freighters, distribution centers, manufacturers, and distributors are involved in the logistic service industry, in this paper we present a model that enhances customer services through a real-time monitoring function and show a prototype system implemented for service enhancement. The rest of this paper is organized as follows. In Section 2, a short review of online service tracking and logistics management is given. We explain the logistic service system analysis and design in Section 3. In Section 4, we present our prototype system for enhancing customer services. Finally, we conclude with some comments in Section 5.
2 Logistics Management from the Customer Service Perspective

This paper proposes an approach for customer service enhancement through online global logistic services using mobile agents. In this section, the literature covering online service tracking, agent-based techniques, and logistics management is reviewed to formulate the research questions and the basis for the approach.

2.1 Online Service Tracking

Along with the tremendous development of industrial engineering and management applications, including supply chain management (SCM), customer relationship management (CRM), and global logistics, various computer-aided applications have been developed to assist implementing enterprises. The online logistic service tracking system is one of the more effective solutions that have been developed to support efficient customer service response. In the competitive global market, enterprises should respond efficiently to customer requests to gain market advantage. Service status tracking is the fundamental offering that provides customers with a means to see the status of their requests and to anticipate and plan actions. For a manufacturer downstream in the supply chain, this service provides real-time information that enhances the effectiveness of raw material planning and scheduling.
Since the service tracking system provides the order and delivery status of products and services, users of the system can make decisions based on the actual status. Unlike the traditional approach, the Internet-based technique, including the wireless environment, has the advantage that information exchange and transmission are not geographically restricted. Realizing the importance of efficient response to customers, traditional, non-Internet-based approaches to business transactions and communication have gradually been replaced by Internet means [10]. With the development of the Internet and wireless techniques, numerous service tracking systems have gone online. Though users can easily access real-time status information via Web-based service tracking systems, most tracking systems cannot confirm the exact location of a customer's goods after departure. Therefore, this research aims at developing an Internet- and wireless-based logistic service tracking system for efficient feedback of service status.

2.2 Logistics Management

The distribution center not only serves as a hub for storage and for transportation to customers and logistic service providers, but also plays the part of a control center for distribution as well as for the management of all logistics in successful operations [13]. Well-known logistics companies such as UPS and FedEx have better organized distribution centers than others. Furthermore, because efficient delivery from the supplier to the customer relies on the information management of transportation, warehousing, and so on, a real-time service system like RSSOL must have a control algorithm for the data from the diversified logistic activities and show the accurate position of customer items to enhance customer services.
3 Design and Analysis for Real-Time Delivery Service

In this section, the business analysis, architecture, and functions of the proposed logistic service tracking system are discussed. The objective of the proposed approach is to provide customers with an intelligent mechanism to track the service. The system is designed for the following two participants, to meet their operational requirements and to enhance their business competitiveness: (1) Customers, who expect their requests for products or services from distributors and manufacturers to be fulfilled on time. (2) Logistic service providers, who provide logistic services with a range of information support for buyers in order to enhance the business efficiency of the customers.

3.1 Coordination Model Analysis

The general procurement process within a supply chain is described in Figure 1 [13]. First, a consumer places an order with the retailer, and the retailer may then request the logistic service provider to deliver the goods; the delivery process runs in the reverse direction, from the logistic service provider to the retailer, and finally to the consumer.
As shown in Figure 1, transportation plays an essential role in distributing and delivering merchandise to consumers, retailers, manufacturers, and raw material suppliers. However, in the case of international commerce, including e-commerce, the delivery must take advantage of intermediary distribution centers and overseas logistic service providers between suppliers and foreign customers, in addition to domestic logistic services.
Fig. 1. Configuration of the Supply Chain
Fig. 2. Three-tier architecture for the supply chain
In the delivery process from order placement to the customer, real-time localization of the customer's delivery items is critical for satisfactory and more efficient logistic services. Therefore, in this paper, we propose an RSSOL prototype system using the logistics facility.

3.2 System Design

The configuration of RSSOL in Figure 2 shows a three-tier architecture with three parts: the agent center as a service-tracking kernel, the interactive real-time logistic service for the user, and the delivery operation center as a carrier. RSSOL consists of the three parts proposed in Figure 2, and we explain two points of view for enhancing customer services as follows.

At the customer side:
1. Ordering phase: the customer can order through the Internet after logging on.
2. Delivery status tracking phase:
   - Log on to the Internet; then
   - lists of the serviced items, with the deliverers in charge of the items, are displayed on the web page.
   - A customer can search for the location of a deliverer by clicking the specific hyperlink text in the last field.
   - The delivery status of the items is displayed at the bottom of the screen.
   - The customer can estimate the arrival time by confirming the location of the deliverer on the Internet, which is one of the enhanced services, supported by a shortest-path finding algorithm.
At the Internet service provider (ISP) side:
1. A deliverer decides which item to deliver next based on the displayed data.
2. The deliverer logs on to the server using a mobile device and updates the item data after delivering, which is also one of the services from which the customer benefits.
3. The deliverer's location and the item status are updated simultaneously by sending the mobile information, with the customer's signature, to the server after step 2.
4. If any items remain to be delivered, continue from step 2 to step 3; otherwise, the LSP-side activity is finished (a minimal sketch of this loop is given below).
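The sketch below illustrates this provider-side loop; the data model and the update_server() call are hypothetical stand-ins for the PDA-to-RSSOL exchange shown later in Fig. 4.

```python
# Minimal sketch of the LSP-side delivery loop (steps 1-4 above): pick the
# next item, deliver and collect the signature, then push the update so
# customers can see the status on the web in real time.
from dataclasses import dataclass

@dataclass
class DeliveryItem:
    item_id: str
    customer: str
    address: str
    delivered: bool = False
    signature: str = ""          # customer's handwritten signature data

def update_server(server_db, item):
    """Step 3: push delivered status, signature and deliverer location."""
    server_db[item.item_id] = item            # visible to customers via the web

def delivery_round(items, server_db, get_signature):
    for item in items:                        # step 1: next item from the list
        item.signature = get_signature(item)  # step 2: deliver, take signature
        item.delivered = True
        update_server(server_db, item)        # step 3: simultaneous update
    # step 4: no items remain -> LSP-side activity finished

server_db = {}
items = [DeliveryItem("A001", "hclee", "Ilwon-dong, Gangnam-gu, Seoul")]
delivery_round(items, server_db, get_signature=lambda it: "signed")
```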
4 Implementation and Evaluation of RSSOL

In this section, we verify the merits of the proposed method by implementation and evaluation.

4.1 Environment for Delivering

To implement RSSOL to enhance customer services, we first look into the delivery environment of postal operations. The addressing system of
Korea is rather more complex than that of other countries and is configured as shown in Figure 3(a). We can also note that the addressing system is similar to those of other countries. For example, four or at most five items (nation, state, city, street, address) among the items in Figure 3(a) are enough information for delivery in the addressing system of America. Nonetheless, ISPs only use information down to the city or station level when delivering products to customers. This means that customers on the Internet can get delivery information about their products only down to the city or station level, although street information exists. Therefore, in this paper, we emphasize and use that information to enhance customer services; this is also one of the main contributions of the paper. For instance, a customer who has not yet received his or her product cannot otherwise estimate the time of arrival, whereas a customer using the proposed system, RSSOL, can estimate it, because the customer can know roughly the distance between the customer's location and the deliverer's last delivery location through the Internet.
Nation – State – City – District – Street – Address
a) Components of the addressing system
100-101 Sangrok Apartment, Ilwon-dong, Gangnam-gu, Seoul, Korea
b) Example of an address in the Korean postal system
Fig. 3. Korean record for the addressing system
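The arrival-time estimation that street-level tracking makes possible (Sect. 4.1) can be sketched as follows; the route and timing values are hypothetical placeholders.

```python
# Minimal sketch: the customer sees which stop on the deliverer's route was
# delivered last and extrapolates an ETA from the average per-stop time.
route = ["stop-1", "stop-2", "stop-3", "hclee"]   # deliverer's ordered stops
last_delivered_index = 1                          # "stop-2" just confirmed
avg_minutes_per_stop = 12.0                       # observed so far (assumed)

remaining_stops = len(route) - 1 - last_delivered_index
eta_minutes = remaining_stops * avg_minutes_per_stop
print(f"Estimated arrival for '{route[-1]}': about {eta_minutes:.0f} min")
```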
For RSSOL, we present the design and the implementation result. We implemented RSSOL by dividing it into two parts: the customer side and the service provider side, as discussed in Section 3. We first consider the service provider side.

4.2 Service Provider Side to Deliver

ISPs take responsibility for delivering customers' products safely, and the deliverer collects the customer's handwritten signature information in real time using a PDA after delivering. Moreover, because the address record contains detailed content down to the street address, they can provide an exact localization as well as an estimate of the arrival time for the items through customers' Internet accesses. Figure 4 shows a delivery example of a deliverer using a PDA that accesses RSSOL. In the first step, the deliverer logs in to connect to the server, RSSOL, as in Figure 4(a). Then, in Figure 4(b), the receivers' names and addresses for the day's deliveries are displayed on the PDA screen after clicking the lower-left button, "Bring"
("가져오기" in Korean; this fetches the data from the server). If the deliverer wants to know more information about a chosen customer, then by clicking the center button, "Detail" ("자세히" in Korean), the deliverer gets the result shown in Figure 4(c). At step 2, e.g., after delivering the customer's product, the deliverer clicks the lower-right button, "Send" ("보내기" in Korean), to RSSOL as the last step, confirming and storing the delivery information. The PDA screenshot in Figure 4(d) then shows the last customer's information deleted.
(a) deliverer login
(b) item list to be delivered
(c) customer information (d) update after delivering
Fig. 4. A series of operations at the deliverer side using a mobile device
4.3 Customer Side

Any customer who wants to localize items or know the distribution status can take advantage of RSSOL, confirming the deliverer's location and estimating the arrival time of the items easily through the Internet. Figure 5 shows the processes of the customer
side: Figure 5(a) shows the ordering phase of a customer, and Figure 5(b) shows the neighboring delivery list of the deliverer within the ordering information of customers, including whether the customers' items have been delivered or not. If a customer connects to RSSOL through a web browser, the customer can see the deliverer's position by confirming whether the previous customers (those to be delivered first) have received their items. If the previous customers have not yet received their items, the customer can estimate the time until receiving his or her item, as in Figures 6(a) to 6(c), simply by clicking the last field and confirming the status at the bottom. The status of the delivered items is displayed at the bottom, as shown in Figure 6(a) for the first delivery; the next order is shown in Figure 6(b) for the second delivery, and so on. For example, in the case of "hclee" (in Korean "이현창"), whose item will be delivered next, no sign is displayed at the bottom, and he can only estimate the arrival time by the previously explained procedure. Figure 7 shows a comparison of RSSOL with the conventional logistics methodology in terms of the delay in logistic service. We see that the more customers there are, the better the resulting delay figure.
a) ordering phase
b) delivery list of deliverer Fig. 5. Steps through the internet at a customer side
a) 1st delivery state
b) 3rd delivery state
c) state for a specific customer “hclee” Fig. 6. Monitoring states of being delivered
Fig. 7. Real-time service delay according to the number of customers
5 Conclusion

With the growing popularity of supply chains and e-commerce, ISPs recognize the importance of information technology (IT) in their business as a real-time monitoring tool. Customer satisfaction has become more important than before and is critical to leading company competitiveness through IT. Hence, in this paper we proposed RSSOL for logistics, implemented it with a PDA combined with web pages, and showed the usefulness of RSSOL in real-time localization. In particular, ISPs can cut down logistic expenditures by monitoring material flows in real time and scheduling their products, and clients can get enhanced services through real-time localization and estimation of when they will receive their items.
References
1. Zografos, K.G., Giannouli, I.M.: Emerging trends in logistics and their impact on freight transportation systems: a European perspective. Transportation Research Record (1790), 36–44 (2002)
2. Weil, M.: Moving more for less. Manufacturing Systems 16, 90–94 (1998)
3. Frank, L.: Architecture for integration of distributed ERP systems and e-commerce systems. Industrial Management & Data Systems 104(5), 418–429 (2004)
4. Caputo, A.C., Cucchiella, F., Fratocchi, L., Pelagagge, P.M., Scacchia, F.: Analysis and evaluation of e-supply chain performances. Industrial Management & Data Systems 104(7), 546–557 (2004)
5. Ho, D.C.K., Au, K.F., Newton, E.: The process and consequences of supply chain virtualization. Industrial Management & Data Systems 103(6), 423–433 (2003)
6. Soliman, F., Youssef, M.A.: Internet-based e-commerce and its impact on manufacturing and business operations. Industrial Management & Data Systems 103(8), 546–552 (2003)
7. Lai, K.H., Ngai, E.W.T., Cheng, T.C.E.: An empirical study of supply chain performance in transport logistics. International Journal of Production Economics 87, 321–331 (2004)
8. Ministry of Economic Affairs (MoEA): Yearbook of Logistic Service Industry in Taiwan, Taipei (2000)
9. Aitken, J.: Supply chain integration within the context of a supplier association. PhD thesis, Cranfield University, Cranfield (1998)
10. AIMO: Artificial Intelligence Modelling Organisation, AIMO (1999), http://aimo.kareltek.fi/
11. Anton, J.: Call Center Benchmark Report. Purdue University, West Lafayette, IN (2000)
12. Trappey, A.J.C., Trappey, C.V., Hou, J.-L., Chen, B.J.G.: Mobile agent technology and application for online global logistic services. Industrial Management & Data Systems 104(2), 169–183 (2004)
13. Huang, S.-M., Kwan, I.S.Y., Hung, Y.-C.: Planning enterprise resources by use of a re-engineering approach to build a global logistics management system. Industrial Management & Data Systems 101(9), 483–491 (2001)
Design of LCL Filter Using Hybrid Intelligent Optimization for Photovoltaic System Jae Hoon Cho1, Dong-Hwa Kim2, Mária Virčíková3, and Peter Sinčák3 1
Chungbuk National University, Cheongju Chungbuk, Korea 2 Hanbat National University, Daejeon, Korea 3 Dept. of Cybernetics and Artificial Intelligence, FEI TU of Košice, Slovak Republic {jhcho1975,koreahucare}@gmail.com, {maria.vircikova,peter.sincak}@tuke.sk
Abstract. This paper proposes a new design method for LCL filter parameters using hybrid intelligent optimization. Voltage-source inverters (VSI) are usually used for control of both the link voltage and the power factor in grid-connected topologies. However, since a VSI can cause high-order harmonics due to its high switching frequency, it is important to choose the filter parameters so as to achieve a good filtering effect. Compared with the traditional L and LC filters, the LCL filter is known as an effective means of reducing harmonic distortion at the switching frequency of the VSI. However, the design of the LCL filter is complex, and the selection of the initial values of the inductors is difficult at the start of the design process. This paper proposes an approach for designing the LCL filter parameters by a hybrid optimization method combining a genetic algorithm and clonal selection. Simulation results are provided to show that the proposed method is more effective and simpler than conventional methods. Keywords: LCL filter, voltage-source inverters, genetic algorithm, clonal selection.
1 Introduction

Renewable energy sources such as photovoltaics, micro turbines, small wind turbines, and fuel cells have received plenty of global attention due to the depletion of fossil fuels and environmental pollution. In particular, the photovoltaic system has received great attention because it has many advantages, such as simplicity of allocation, non-polluting production, silent operation, long lifetime, low maintenance, and absence of fuel cost [1-2]. Photovoltaic systems, however, need to adopt a means of storage to meet sustained load demands, since the output power of a photovoltaic system usually depends on the weather conditions and other external factors. A generation system with a photovoltaic array can be operated in stand-alone mode or grid-connected mode. In grid-connected mode, this system needs power conditioning units for interfacing with the grid and load. Since PV sources are dc power supplies, an inverter is required for dc/ac conversion. Voltage-source inverters (VSI) are now commonly used in many power conversion applications, including renewable source and distributed generation systems. The VSI has many advantages, such as
bidirectional power flow, controllable power factor, and sinusoidal input current. An inverter with an L or LC filter is simple to control, but its ability to suppress high-frequency harmonic currents is limited. A PWM inverter with a higher switching frequency allows a smaller LC filter; however, the switching frequency is generally limited in high-power applications. To solve this problem, the LCL filter was proposed to decrease the inductance of L without degrading the filtering effect [3]. Compared with the LC filter, the LCL filter achieves much greater attenuation of high-frequency harmonics and can meet harmonic attenuation requirements even with a lower switching frequency and smaller inductors [4-5]. In spite of such advantages, the LCL filter has its own drawbacks. A recursive trial-and-error method with some constraints has been suggested to decide the LCL parameters of a grid-connected voltage source inverter, but the method is complicated and the selection of the initial inductor values is difficult at the start of the design process [6-7]. To obtain the best performance of an LCL filter, designs of the filter parameters by intelligent optimization methods have been developed [8-9]. Artificial intelligence methods have become popular search techniques for finding exact or approximate solutions to optimization and search problems. Genetic algorithms (GA) are a class of evolutionary algorithms inspired by biological mechanisms such as inheritance, mutation, selection, and crossover. To improve the performance of the simple GA, various hybrid methods have been proposed, and these obtain better performance than the simple GA in most cases [10-11]. On the other hand, De Castro and Von Zuben [12] presented a clonal selection algorithm to solve complex problems such as learning and multi-modal optimization. The clonal selection algorithm, inspired by the immune system, has the ability to escape local minima; it operates on a population of points in the search space simultaneously, not on just one point, and does not use derivatives or any other auxiliary information. This paper proposes an approach for designing LCL filter parameters by a hybrid optimization method combining GA and clonal selection. For the simulation of the proposed method, the PV system and LCL filter are modeled with Matlab/SimPowerSystems. Simulation results are provided to show that the proposed method is more effective and simpler than both traditional methods and other intelligent optimization methods.
2 Grid-Connected PV System and Simplified Design Principle of the LCL Filter
2.1 Grid-Connected PV System with VSI
The power circuit of the grid-connected PV system, shown in Fig. 1, consists of a PV array, an ultracapacitor bank (UCB) as the storage unit, a dc-link capacitor that keeps the dc voltage constant, a three-phase voltage source inverter (VSI), an LCL filter, and the utility grid. The LCL-filtered VSI topology has been reported for utility interfacing of renewable sources or distributed generation and provides higher harmonic attenuation than conventional filters such as the L and LC filters. The LCL filter, however, requires proper filter parameters to avoid the resonance problem, which deteriorates the stability of the system [13].
Fig. 1. Topology of three phase VSC with LCL filter
The three-phase voltage source inverter connected to the grid through the LCL filter is shown in Fig. 1, where i1 is the inverter-side current, i2 is the grid-side current, and iC is the filter-capacitor current. The traditional LCL filter design is calculated by the following equations [8]:

L1 = Vg / (2√6 · fsw · iripple,peak)    (1)

C2 ≤ 0.05 Cb    (2)

Cb = 1 / (ωn · Zb)    (3)

Zb = VgLL² / Pn    (4)

where Vg is the RMS value of the grid voltage, fsw the inverter switching frequency, Cb the base capacitance, Zb the base impedance, VgLL the grid line voltage, and Pn the inverter rated power. iripple,peak denotes the peak value of the harmonic ripple current; it is usually taken as 15% of the peak value of the fundamental current. In this design the filter capacitance is set to half of the limit in (2):

C2 = 0.025 Cb    (5)

For a further attenuation of the current ripple of 20 dB at the switching frequency, the grid-side inductance of the LCL filter, L2, is computed as follows:

L2 = 0.8 L1    (6)
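As a worked example, the following Python sketch (our illustration, not part of the original paper) evaluates Eqs. (1)-(6) with the system ratings used later in Table 1 (12 kW, 380 V, 60 Hz, 10 kHz); treating Vg as the per-phase RMS voltage and using the 15% ripple rule follow the definitions above.

import math

def lcl_initial_design(P_n, V_gLL, f_n, f_sw, ripple=0.15):
    # Conventional LCL starting values per Eqs. (1)-(6).
    V_g = V_gLL / math.sqrt(3)                               # per-phase RMS grid voltage
    i_fund_peak = math.sqrt(2) * P_n / (math.sqrt(3) * V_gLL)
    i_ripple_peak = ripple * i_fund_peak                     # 15% of fundamental peak
    L1 = V_g / (2.0 * math.sqrt(6) * f_sw * i_ripple_peak)   # Eq. (1)
    Z_b = V_gLL ** 2 / P_n                                   # Eq. (4)
    C_b = 1.0 / (2.0 * math.pi * f_n * Z_b)                  # Eq. (3), with omega_n = 2*pi*f_n
    C2 = 0.025 * C_b                                         # Eq. (5)
    L2 = 0.8 * L1                                            # Eq. (6)
    return L1, L2, C2

L1, L2, C2 = lcl_initial_design(P_n=12e3, V_gLL=380.0, f_n=60.0, f_sw=10e3)
print(f"L1={L1*1e3:.2f} mH, L2={L2*1e3:.2f} mH, C2={C2*1e6:.2f} uF")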
3 Design of LCL Filter Parameters by Hybrid Intelligent Optimization
3.1 Clonal Selection Algorithm
De Castro and Von Zuben [12] presented a clonal selection algorithm to solve complex problems such as learning and multi-modal optimization. A clonal selection algorithm primarily mimics the clonal selection principle, which is composed of the mechanisms of clonal selection, clonal expansion, and affinity maturation via somatic hypermutation [15], and is used to describe the basic features of an immune response to an antigenic stimulus. When an antigen encounters the immune system, its epitopes eventually react with a B-lymphocyte whose surface B-cell receptors more or less fit, and this activates that B-lymphocyte. This process is known as clonal selection. The main features of the clonal selection theory are as follows [12]: 1) generation of new random genetic changes, subsequently expressed as diverse antibody patterns, by a form of accelerated somatic mutation; 2) phenotypic restriction and retention of one pattern by one differentiated cell (clone); 3) proliferation and differentiation on contact of cells with antigens. The overall procedure of clonal selection is shown schematically in Fig. 2.
Fig. 2. Clonal selection theory
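A minimal CLONALG-style sketch of these mechanisms in Python might look as follows; the population size, cloning count, and Gaussian hypermutation schedule are illustrative choices, not the paper's exact settings.

import random

def clonal_selection(fitness, dim, pop_size=20, n_clones=5, beta=0.2, iters=100):
    # Select, clone, hypermutate, re-select: the core loop of clonal selection.
    pop = [[random.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(iters):
        pop.sort(key=fitness, reverse=True)          # rank antibodies by affinity
        clones = []
        for rank, ab in enumerate(pop):
            rate = beta * (rank + 1) / pop_size      # worse affinity -> stronger mutation
            for _ in range(n_clones):
                clones.append([x + random.gauss(0.0, rate) for x in ab])
        pop = sorted(pop + clones, key=fitness, reverse=True)[:pop_size]
    return max(pop, key=fitness)

# maximize a toy affinity function over 3 variables
best = clonal_selection(lambda v: -sum(x * x for x in v), dim=3)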
3.2 Constraints for Designing the Parameters of the LCL Filter
Some constraints are considered to ensure the performance of the LCL filter [8]:

a. To ensure a high power factor, a constraint on the capacitor is needed: C2 ≤ 0.05 Cb.
b. Considering the stability of the control system and the losses of the power switching devices, the total inductance should be lower than 0.1 pu (per unit), i.e., L1 + L2 < 0.1 Lb.
c. To avoid resonance at low-order or high-order harmonic frequencies, the resonance frequency should satisfy 10 fn ≤ fres ≤ 0.5 fsw.

The harmonic attenuation rate at the switching frequency is

| ig(hsw) / i(hsw) | = ZLC² / | ωres² − ωsw² |    (7)

where ωres² = (LT / L1) · ZLC², LT = L1 + L2, ZLC² = 1 / (L2 · Cf), ωsw = 2π fsw, and ig(hsw)/i(hsw) is the harmonic attenuation rate.

These constraints can be used in the objective function of the intelligent optimization algorithm to achieve the required filtering effect with inductor and capacitor values as small as possible. In this paper, the objective function of the proposed algorithm is calculated by Eq. (8) [16]:

Object_F = w · Har + (1 − w) · Total_L + penalty_func    (8)

penalty_func = 0 if 10 fn ≤ fres ≤ 0.5 fsw, otherwise penalty_func = 10    (9)

The first term in (8) is the harmonic attenuation rate (Har) with a weight w, and the second term is the total inductance. The penalty function enforces the above-mentioned constraint c.
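A direct transcription of Eqs. (7)-(9) into an objective function for the hybrid GA/clonal-selection loop might look like the Python below; note that in practice Har and the total inductance would need to be normalized to comparable scales, a detail not spelled out here.

import math

def lcl_objective(L1, L2, Cf, f_n, f_sw, w=0.5):
    L_T = L1 + L2
    Z_LC_sq = 1.0 / (L2 * Cf)                        # Z_LC^2
    w_res_sq = (L_T / L1) * Z_LC_sq                  # omega_res^2
    w_sw_sq = (2.0 * math.pi * f_sw) ** 2            # omega_sw^2
    har = Z_LC_sq / abs(w_res_sq - w_sw_sq)          # Eq. (7): harmonic attenuation rate
    f_res = math.sqrt(w_res_sq) / (2.0 * math.pi)
    penalty = 0.0 if 10.0 * f_n <= f_res <= 0.5 * f_sw else 10.0   # Eq. (9)
    return w * har + (1.0 - w) * L_T + penalty       # Eq. (8), to be minimized

print(lcl_objective(L1=1.19e-3, L2=0.77e-3, Cf=4.65e-6, f_n=60.0, f_sw=10e3))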
4 Simulation and Results
To verify the proposed design method, the LCL filter parameters evaluated by the proposed method are applied to a simulation model based on MATLAB/SimPowerSystems. Table 1 shows the system parameters and the initial parameters of the proposed method. Figure 3 shows the current output of the VSI with LCL filter parameters selected by the general method and by the proposed method, and Figure 4 provides a zoomed-in view of part A of Figure 3. As shown in Table 2, the total inductance of the proposed method is lower than the others, and the fitness of the proposed method is higher, which means its harmonics are lower. As shown in Figures 3 and 4, an LCL filter with low harmonic distortion can be obtained by the proposed method.
Table 1. The system parameters and initial parameters of the proposed method

PV array                                 Values
PV Power (Pn)                            12 kW
Utility line voltage (VgLL)              380 V
Utility frequency (fn)                   60 Hz
Switching frequency (fsw)                10 kHz
DC rated voltage (Vdc)                   700 V

Ultracapacitor bank                      Values
Capacitance                              165 [F]
Internal series resistance               6.1 [mΩ]
Voltage                                  48.6 [V]
Short-circuit current                    4800 [A]

The proposed algorithm                   Values
Population size                          100
Probability of crossover of gene         0.65
Probability of mutation                  0.05
Iterations for the proposed algorithm    200
Number of clones                         5
Probability of crossover of clone        0.25
Weight for objective function            0.5
Table 2. Simulation results for the proposed method and others

Methods               L1 (mH)   L2 (mH)   Cf (uF)   fres (kHz)   fitness
Normal                1.28      1.12      3.89      3.28         0.78
GA                    1.22      0.98      4.65      4.66         0.83
The proposed method   1.19      0.77      4.65      4.87         0.87

Fig. 3. Current waves by the general method and the proposed method
Fig. 4. Zoomed plot of current waves in figure 3
5 Conclusion
Usually, voltage-source inverters (VSI) are used for full control of both the dc-link voltage and the power factor in grid-connected topologies, but the VSI can cause high-order harmonics that disturb other sensitive loads and equipment on the grid. This paper proposed an approach for designing LCL filter parameters by a hybrid optimization method combining GA and clonal selection. To verify the proposed design method, the LCL filter parameters evaluated by the proposed method were applied to a simulation model based on MATLAB/SimPowerSystems. Simulation results show that the proposed method is more effective and simpler than both traditional methods and other intelligent optimization methods.

Acknowledgments. The work developed in this paper has been supported by the DAEDEOK INNOPOLIS ("R&D Hub Cluster project").
References
1. Molina, M.G., Mercado, P.E.: Modeling and control of grid-connected photovoltaic energy conversion system used as a dispersed generator. In: 2008 IEEE PES Transmission and Distribution Conference and Exposition, pp. 1–8 (2008)
2. Fakham, H., Degobert, P., Francois, B.: Control system and power management for a PV based generation unit including batteries. In: International Aegean Conference on Electrical Machines and Power Electronics, pp. 141–146 (2007)
3. Lindgren, M., Svensson, J.: Connecting Fast Switching Voltage Source Converters to the Grid - Harmonic Distortion and its Reduction. In: IEEE Stockholm Power Tech Conference, pp. 18–22 (1995)
4. Halimi, B., Dahono, P.A.: A Current Control Method for Phase-Controlled Rectifier That Has an LCL Filter. In: 4th IEEE International Conference on Power Electronics and Drive Systems, pp. 20–25 (2001)
5. Yin, J.J., Duan, S., Zhou, Y., Liu, F., Chen, C.: A novel parameter design method of dual-loop control strategy for grid-connected inverters with LCL filter. In: IPEMC 2009, pp. 712–715 (2009)
6. Liserre, M., Blaabjerg, F., Hansen, S.: Design and Control of an LCL-filter based Three-phase Active Rectifier. IEEE Trans. Industry Applications 41, 1281–1291 (2005)
7. Tavakoli Bina, M., Pashajavid, E.: An efficient procedure to design passive LCL-filters for active power filters. Electric Power Systems Research 79, 606–614 (2009)
8. Sun, W., Chen, Z., Wu, X.: Intelligent optimize design of LCL filter for three-phase voltage-source PWM rectifier. In: IEEE 6th IPEMC 2009, pp. 970–974 (2009)
9. Chen, Y.M.: Passive filter design using genetic algorithms. IEEE Trans. Industrial Electronics 50, 202–207 (2003)
10. Grimaccia, F., Mussetta, M., Zich, R.E.: Genetical Swarm Optimization: Self-Adaptive Hybrid Evolutionary Algorithm for Electromagnetics. IEEE Trans. on Antennas and Propagation 55, 781–785 (2007)
11. Kao, Y.T., Zahara, E.: A hybrid genetic algorithm and particle swarm optimization for multimodal functions. Applied Soft Computing 8, 849–857 (2008)
12. De Castro, L.N., Von Zuben, F.J.: Learning and optimization using the clonal selection principle. IEEE Trans. Evol. Comput. 6, 239–251 (2002)
13. Lee, K.J., Park, N.J., Kim, R.Y., Ha, D.H., Hyun, D.S.: Design of an LCL filter employing a symmetric geometry and its control in grid-connected inverter applications. In: 39th IEEE Power Electronics Specialists Conference, pp. 963–966 (2008)
14. Wang, T.C.Y., Ye, Z., Sinha, G., Yuan, X.: Output filter design for a grid-interconnected three-phase inverter. In: 34th IEEE Power Electronics Specialists Conference, pp. 779–784 (2003)
15. Brownlee, J.: Clonal Selection Algorithms. Technical Report 070209A, Complex Intelligent Systems Laboratory (CIS), Centre for Information Technology Research (CITR), Faculty of Information and Communication Technologies (ICT), Swinburne University of Technology, Victoria, Australia (2007)
16. Cho, J.H., Yoon, J.H., Cho, Y.I., Chun, M.G.: LCL Filter Design for Renewable Energy Source using Advanced Bacterial Foraging Optimization. In: SCIS & ISIS 2010, pp. 618–623 (2010)
Distributed Energy Management for Stand-Alone Photovoltaic System with Storages Jae Hoon Cho1 and Dong-Hwa Kim2 1
Chungbuk National University, Cheongju Chungbuk, Korea 2 Hanbat National University, Daejeon, Korea {jhcho1975,koreahucare}@gmail.com
Abstract. This paper presents a distributed energy management system (DEMS) for a stand-alone photovoltaic (PV) system with storage devices. Usually, batteries are used as the storage device of a PV system to compensate for PV output power changes caused by irradiation and temperature. Recently, ultracapacitors have been adopted to improve the dynamic characteristics of the whole system. The control of such systems, called hybrid systems, is often complicated and requires a properly designed energy management system (EMS). DEMSs are more effective than a conventional centralized EMS, but it is difficult to design a DEMS for a power system because of the various states the system passes through during stand-alone operation. To verify the performance of the proposed DEMS, the stand-alone photovoltaic (PV) system is modeled in Matlab/SimPowerSystems, and the Matlab/Stateflow tool is used to design the proposed DEMS. Keywords: distributed energy management systems (DEMSs), centralized EMS, stand-alone photovoltaic (PV) system.
1 Introduction
Because of ever-increasing energy consumption, soaring costs, and the exhaustible nature of fossil fuels, much research focuses on renewable sources such as wind turbines, photovoltaic (PV) energy, fuel cells, and micro-turbines. In particular, because of advantages such as abundance, pollution-free operation, simplicity of allocation, and the absence of fuel cost, PV systems are becoming popular as a promising alternative source [1-4]. However, a stand-alone PV system needs additional energy storage devices, since the output power generated by a PV system is highly dependent on weather conditions; for example, during cloudy periods and at night, a PV system does not generate any power. In addition, it is difficult to store the power generated by a PV system for future use. To overcome this problem, a PV system can be integrated with other alternative power sources and/or storage systems, such as a battery and a UC bank [5-6]. For an integrated system with more than two storage systems, a proper energy management system (EMS) is usually required [7]. Energy management systems can be categorized into two types: centralized EMS (CEMS) and distributed EMS (DEMS). A DEMS is more effective than a conventional CEMS, but it is difficult to design a DEMS for a power system because of the complex control strategy spanning several distributed controllers.
This paper proposes a distributed energy management system (DEMS) for a stand-alone PV system with storages. To verify the performance of the proposed DEMS, the stand-alone PV system is modeled in Matlab/SimPowerSystems, and the Matlab/Stateflow tool is used to design the proposed DEMS.
2 System Description and Modeling
The system model used for the proposed DEMS is shown in Fig. 1. Here, to sustain the power demand and solve the energy storage problem, a battery and an ultracapacitor are connected in parallel through bi-directional DC-DC converters. The system is composed of the PV system, an ultracapacitor bank (UCB), and a battery, and each component is connected to the same DC bus through a DC-DC converter with a distributed controller. Both the UCB and the battery have bi-directional DC-DC converters for charging and discharging. The boost DC-DC converter connected to the PV system is used for maximum power point tracking (MPPT); in this system, the IncCond MPPT algorithm is adopted [8].
Fig. 1. The stand-alone PV system with storages
2.1 PV System
A PV system consists of many cells connected in series and parallel to provide the desired output terminal voltage and current. The PV system can be modeled by various mathematical methods [9-10]. In this paper, a PV system model with a 2-D lookup table and a controlled current source is used for the sake of simulation time and computational efficiency. The irradiance data and the I-V characteristic curve of the PV array are used to model the PV output power. Fig. 2 shows the PV system Matlab/Simulink model using the 2-D lookup table and the controlled current source; the inputs to the lookup table are the irradiance data and the output voltage of the PV system.
Fig. 2. Matlab/Simulink model for the PV array

Table 1. Parameters of the PV array

PV system parameters                        Values
Number of series cells per string           105
Number of parallel cells per string         148
Ideality or completion factor               1.9
Boltzmann's constant                        1.3805e-23 [J/K]
PV cell temperature                         298 [K]
Electron charge                             1.6e-19 [C]
Short-circuit cell current (A)              2.926
PV cell reverse saturation current (A)      0.00005
Series resistance of PV cell (Ω)            0.0277
2.2 Ultracapacitor Bank (UCB) Model
The ultracapacitor is an energy storage device that is able to handle fast fluctuations in energy level. Compared with batteries, ultracapacitors have a significantly lower energy density but a higher power density [11]. For practical applications, the terminal voltage determines the number of capacitors that must be connected in series to form a bank, and the total capacitance determines the number of capacitors that must be connected in parallel in the bank. A detailed model including the various characteristics of the ultracapacitor could be used; however, for simplicity of simulation, a simplified equivalent model intended only for principle verification has been used in many papers [9, 12]. The parameters used in the mathematical modeling of the UC bank are as follows [9]:

C            capacitance [F]
C_UC-total   total UC system capacitance [F]
ESR, R       equivalent series internal resistance [Ω]
E_UC         amount of energy released or captured by the UC bank [W·s]
n_s          number of capacitors connected in series
n_p          number of series strings in parallel
R_UC-total   total UC system resistance [Ω]
V_i          initial voltage before discharging starts [V]
V_f          final voltage after discharging ends [V]
The amount of energy drawn from the UC bank is directly proportional to the capacitance and to the change in the terminal voltage, given by

E_UC = (1/2) · C · (V_i² − V_f²)    (1)

The total resistance and the total capacitance of the UC bank may be calculated as

R_UC-total = n_s · ESR / n_p    (2)

C_UC-total = n_p · C / n_s    (3)
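For instance, Eqs. (1)-(3) transcribe directly into Python; the module values below follow Table 3, while the series/parallel counts in the example are hypothetical, not taken from the paper.

def uc_bank(C, esr, n_s, n_p, V_i, V_f):
    E_uc = 0.5 * C * (V_i ** 2 - V_f ** 2)   # Eq. (1): deliverable energy per capacitor [J]
    R_total = n_s * esr / n_p                # Eq. (2): total bank resistance [Ohm]
    C_total = n_p * C / n_s                  # Eq. (3): total bank capacitance [F]
    return E_uc, R_total, C_total

# 165 F, 6.1 mOhm module (Table 3); 2 modules in series, 2 strings in parallel (hypothetical)
print(uc_bank(C=165.0, esr=6.1e-3, n_s=2, n_p=2, V_i=48.6, V_f=24.3))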
Table 3. Ultracapacitor module parameters

UC parameters                           Value
Capacitance [F]                         165
Internal series resistance (dc) [mΩ]    6.1
Leakage current [A]                     0.0052 (72 h, 25 °C)
Operating temperature                   −40 °C to 65 °C
Voltage [V]                             48.6
Short-circuit current [A]               4800
Power density [W/kg]                    7900
Energy density [Wh/kg]                  3.81
2.3 Battery Model
In this paper, an electric-circuit-based battery model is used to represent state-of-charge (SOC) estimation of battery packs. The battery is modeled using a simple controlled voltage source in series with a constant resistance. This model assumes the same characteristics for the charge and discharge cycles, with an open-circuit voltage source calculated by a non-linear equation based on the actual SOC of the battery [13]. Fig. 3 shows the Matlab/SimPowerSystems model of the battery. The controlled voltage source is described by equations (4) and (5):

E = E0 − K · Q / (Q − ∫i dt) + A · exp(−B · ∫i dt)    (4)

Vbatt = E − R · i    (5)

where
E      = no-load voltage (V)
E0     = battery constant voltage (V)
K      = polarization voltage (V)
Q      = battery capacity (Ah)
∫i dt  = actual battery charge (Ah)
A      = exponential zone amplitude (V)
B      = exponential zone time constant inverse (Ah^-1)
Vbatt  = battery voltage (V)
R      = internal resistance (Ω)
i      = battery current (A)
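The discharge characteristic of Eqs. (4)-(5) can be evaluated directly, as in the Python sketch below; the numeric parameter values are placeholders for illustration, not values identified for the actual battery used in this paper.

import math

def battery_voltage(i, q_drawn, E0=212.0, K=1.4, Q=6.5, A=15.0, B=2.6, R=1.0):
    # Eq. (4): open-circuit voltage from the charge drawn so far (q_drawn = integral of i dt, in Ah)
    E = E0 - K * Q / (Q - q_drawn) + A * math.exp(-B * q_drawn)
    # Eq. (5): terminal voltage under load current i
    return E - R * i

# terminal voltage at 6.5 A after 2 Ah have been drawn (placeholder parameters)
print(battery_voltage(i=6.5, q_drawn=2.0))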
Fig. 3. The SimPowerSystems model of battery
3 Distributed Energy Management System (DEMS) for the PV System
In a stand-alone PV system, since the demanded load power must be satisfied solely by the PV system and the storage devices, proper energy management is essential. Storage devices such as the battery and the ultracapacitor can be introduced into the PV system as an energy buffer to balance the input power and the output power. To draw the best performance out of such hybrid systems, a DEMS is known to provide better reliability than a CEMS [14]. Fig. 4 represents the proposed distributed energy management system. In this system, the rated voltages of the battery and the UCB are considered when designing the DEMS strategies, so the battery and UCB voltages are maintained within a safe range that can be represented by their state of charge (SOC). The proposed DEMS consists of three distributed controllers: the PV controller, the battery controller, and the UCB controller. The battery controller and the UCB controller each operate in one of two modes, a DC-voltage control mode and a power control mode, but they cannot both be in the same mode at the same time; only one controller at a time regulates the DC-link voltage. In the DC-voltage control mode, the UCB has priority because it is capable of supplying short-term peak load demand. The PV system executes two strategies: permission for charging and allocation of excess power. The PV system tracks the maximum power point (MPP) by controlling a dc/dc converter to ensure efficient operation and allocates the available power to recharge the battery and the UCB. If the total output power of the PV system is higher than the demanded power, the remaining power is used to charge the battery
and the UCB. Each controller has the right to permit or refuse the requests of the others. When the battery or the UCB requests power to charge its SOC, the PV system decides whether or not to accept the request according to its own state; the interaction between the battery controller and the UCB controller is similar.

Fig. 4. The proposed distributed energy management system
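The arbitration described above can be summarized in a few lines of Python; the SOC bounds follow Section 4 (80%/40% for the UCB, 80%/30% for the battery), while the fallback when both storages are outside their favorable ranges is our assumption.

def assign_control_modes(soc_ucb, soc_batt):
    # Exactly one storage controller regulates the DC link at a time;
    # the UCB has priority while its SOC stays in the favorable range.
    ucb_ok = 0.40 <= soc_ucb <= 0.80
    batt_ok = 0.30 <= soc_batt <= 0.80
    if ucb_ok:
        return {"UCB": "DC-voltage control", "Battery": "power control"}
    if batt_ok:
        return {"UCB": "power control", "Battery": "DC-voltage control"}
    # both outside their ranges: both fall back to power control and
    # request charging from the PV controller (assumption)
    return {"UCB": "power control", "Battery": "power control"}

print(assign_control_modes(soc_ucb=0.35, soc_batt=0.60))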
4 Simulation and Results
The PV system developed in this paper is capable of delivering 12 kW of power under the best radiation conditions, and the maximum UC current is 750 A. The rated voltage and current of the battery model are 200 V and 6.5 A, respectively. To show the response of the EMS as clearly as possible, the simulation time was set to 20 s and the external power demand is assumed to be 10 kW. Table 4 shows the power allocation by the PV controller while the UCB is in DC-link voltage control mode; when the UCB controller executes the DC-link voltage control mode, the UCB is working between its maximum and minimum SOC. In this paper, the favorable SOC range for the UCB is set between 80% and 40%; similarly, the suitable SOC range of the battery is set between 80% and 30%. In Table 4, PPV denotes the remaining power of the PV system after providing the demanded power. Fig. 5 shows the PV output power under MPPT and the demanded load power.

Table 4. Power allocation during DC-link voltage control mode of the UCB

State       UC SOC         Battery SOC   UCB charge power   Battery charge power
State 1_1   DC-link Mode   SOC = Ok      -                  -
State 1_2   DC-link Mode   SOC = Ok      -                  -
State 1_3   DC-link Mode   SOC = Not     -                  PPV
State 1_4   DC-link Mode   SOC = Not     -                  0.8 PPV
Fig. 5. PV MPPT output power and load power
Fig. 6. The current and voltage of UCB
Figs. 6 and 7 show the current and voltage of the UCB and the battery, respectively. During the low-power-demand period (from 0 to 2 s), the UCB operates in charge mode, so the UCB current is negative. During the high-power-demand period (from 2 to 6 s), both the UCB and the battery operate in discharge mode to sustain the demanded load power, and their currents are positive. As shown in Figs. 6 and 7, the UCB and the battery operate properly according to the constantly varying load power and PV output power.
Fig. 7. The current and voltage of battery
5 Conclusion
This paper presented a distributed energy management system (DEMS) for a stand-alone photovoltaic (PV) system with storage devices. A DEMS is more effective than a conventional CEMS, but it is difficult to design a DEMS for a power system because of the complex control strategy spanning several distributed controllers. To verify the performance of the proposed DEMS, the stand-alone PV system was modeled in Matlab/SimPowerSystems, and the Matlab/Stateflow tool was used to design the proposed DEMS.

Acknowledgments. The work developed in this paper has been supported by the DAEDEOK INNOPOLIS ("R&D Hub Cluster project").
References
1. Borowy, B.S., Salameh, Z.M.: Optimum photovoltaic array size for a hybrid wind/PV system. IEEE Transactions on Energy Conversion 9, 482–488 (1994)
2. Markvart, T.: Sizing of hybrid photovoltaic-wind energy systems. Solar Energy 57, 277–281 (1996)
3. Martínez, J., Medina, A.: A state space model for the dynamic operation representation of small-scale wind-photovoltaic hybrid systems. Renewable Energy 35, 1159–1168 (2010)
4. Hwang, J.J., Lai, L.K., Wu, W., Chang, W.R.: Dynamic modeling of a photovoltaic hydrogen fuel cell hybrid system. International Journal of Hydrogen Energy 34, 9531–9542 (2009)
5. Xie, J., Zhang, X., Zhang, C., Wang, C.: Research on Bi-Directional DC-DC Converter for a Stand-Alone Photovoltaic Hybrid Energy Storage System. In: 2010 Asia-Pacific Power and Energy Engineering Conference (APPEEC), pp. 1–4 (2010)
6. Gee, A.M., Dunn, R.W.: Novel battery/supercapacitor hybrid energy storage control strategy for battery life extension in isolated wind energy conversion systems. In: 2010 45th International Universities Power Engineering Conference (UPEC), pp. 1–6 (2010)
7. Katiraei, F., Iravani, R., Hatziargyriou, N., Dimeas, A.: Microgrids management. IEEE Power and Energy Magazine 6, 54–65 (2008)
8. Kim, S.K., Jeon, J.H., Cho, C.H., Ahn, J.-B., Kwon, S.H.: Dynamic Modeling and Control of a Grid-Connected Hybrid Generation System With Versatile Power Transfer. IEEE Trans. on Industrial Electronics 55, 1677–1688 (2008)
9. Zue, A.O., Chandra, A.: Simulation and stability analysis of a 100 kW grid connected LCL photovoltaic inverter for industry. In: IEEE 2006 Power Engineering Society General Meeting, pp. 1–6 (2006)
10. Uzunoglu, M., Onar, O.C., Alam, M.S.: Modeling, control and simulation of a PV/FC/UC based hybrid power generation system for stand-alone applications. Renewable Energy 34, 509–520 (2009)
11. Johansson, P.: Comparison of Simulation Programs for Supercapacitor Modeling. Master of Science thesis, Chalmers University of Technology (2008)
12. El-Sharkh, M.Y., Rahman, A., Alam, M.S., Byrne, P.C., Sakla, A.A., Thomas, T.A.: A dynamic model for a stand-alone PEM fuel cell power plant for residential applications. Journal of Power Sources 138, 199–204 (2004)
13. Tremblay, O., Dessaint, L.-A., Dekkiche, A.-I.: A Generic Battery Model for the Dynamic Simulation of Hybrid Electric Vehicles. In: Vehicle Power and Propulsion Conference, pp. 284–289 (2007)
14. The application of Multi Agent System in Microgrid coordination control. In: International Conference on Sustainable Power Generation and Supply, pp. 1–6 (2009)
Framework for Performance Metrics and Service Class for Providing End-to-End Services across Multiple Provider Domains Chin-Chol Kim1, Jaesung Park2,*, and Yujin Lim3 1
Digital Infrastructure Division, National Information Society Agency, 77 Mugyo-Dong, Jung-Gu, Seoul, 100-775, Korea
[email protected] 2 Department of Internet Information Engineering, University of Suwon, 2-2 San, Wau-ri, Bongdam-eup, Hwaseong-si, Gyeonggi-do, 445-743, Korea 3 Department of Information Media, University of Suwon, 2-2 San, Wau-ri, Bongdam-eup, Hwaseong-si, Gyeonggi-do, 445-743, Korea {jaesungpark,yujin}@suwon.ac.kr
Abstract. Developing a unified solution that enables the end-to-end delivery of services over multiple provider domains at a guaranteed quality level is challenging. International standard organizations offer their own definition of performance metrics and service classes. In this paper, we define the unified performance metrics and service classes for interworking of various types of networks. Keywords: Quality-of-service, service-level agreement, performance metric, service class.
* Corresponding author.

1 Introduction
The Internet is moving from being a simple monolithic data service network to a ubiquitous multi-service network in which different stakeholders, including content providers, service providers, and network providers, need to cooperate to offer value-added services and applications to content consumers [1]. The problem of how to extend QoS (Quality-of-Service) capabilities across multiple provider domains to provide end-to-end services has not been solved satisfactorily to date. Furthermore, developing a unified solution that enables the end-to-end delivery of services over various types of networks at a guaranteed quality level is even more challenging. The current practice in service offering relies on Service Level Agreements (SLAs). This paper presents a solution for end-to-end QoS-enabled service delivery over heterogeneous networks. To do this, we compare the performance metrics defined by international standard organizations and re-define unified metrics. Then we analyze
the service classes presented by the organizations and re-define the unified service class. Besides, we consider how to allocate performance to multiple provider domains. This paper is organized as follows. Section 2 highlights related works and our proposed performance metrics. Section 3 describes how to achieve the interworking of heterogeneous networks. Finally, conclusions are presented in Section 4.
2 Performance Metrics
2.1 Related Works
Generally, performance metrics are defined to evaluate end-to-end QoS provisioning, and international standard organizations lay out several criteria for these metrics. We introduce the performance metrics defined by three organizations: IETF, ITU-T, and GSMA. First, the IETF develops and promotes Internet standards, and it proposes 5 metrics for maximizing the service quality and reliability of the end-to-end path, as below [2-7].
Packet Delay For a real number dT, “the delay from Src (source) to Dst (destination) at T is dT” means that Src sent the first bit of a packet to Dst at time T and that Dst received the last bit of that packet at time T+dT.
Packet Delay Variation The variation in packet delay is sometimes called "jitter”. The packet delay variation is defined for two packets from Src to Dst as the difference between the value of the delay from Src to Dst at T2 and the value of the delay from Src to Dst at T1. T1 is the time at which Src sent the first bit of the first packet, and T2 is the time at which Src sent the first bit of the second packet.
Packet Loss “The loss from Src to Dst at T is 0” means that Src sent the first bit of a packet to Dst at time T and that Dst received that packet. “The loss from Src to Dst at T is 1” means that Src sent the first bit of a packet to Dst at time T and that Dst did not receive that packet.
Packet Reordering
If a packet s is found to be reordered by comparison with the NextExp value, its "Packet-Reordered" = True; otherwise, "Packet-Reordered" = False:

    if s >= NextExp then    /* s is in-order */
        NextExp = s + 1;
        Packet-Reordered = False;
    else                    /* when s < NextExp */
        Packet-Reordered = True;
Packet Duplication The packet duplication is a positive integer number indicating the number of (uncorrupted and identical) copies received by Dst in the interval [T, T+T0] for a packet sent by Src at time T.
Second, ITU-T coordinates standards for telecommunications. It provides 8 metrics for QoS monitoring across heterogeneous provider domains to provide end-to-end services as below [8].
IP Packet Transfer Delay (IPTD) IPTD is the time, (t2 – t1) between the occurrence of two corresponding IP packet reference events, ingress event at time t1 and egress event at time t2, where (t2 > t1) and (t2 – t1) ≤ Tmax.
IP Packet Error Rate (IPER) IPER is the ratio of total errored IP packet outcomes to the total of successful IP packet transfer outcomes plus errored IP packet outcomes.
IP Packet Loss Rate (IPLR) IPLR is the ratio of total lost IP packet outcomes to total transmitted IP packets.
Spurious IP Packet Rate Spurious IP Packet Rate is the total number of spurious IP packets observed at that egress point during a specified time interval.
IP Packet Reordered Ratio (IPRR) IPRR is the ratio of the total reordered packet outcomes to the total of successful IP packet transfer outcomes.
IP Packet Severe Loss Block Ratio (IPSLBR) IPSLBR is the ratio of the IP packet severe loss block outcomes to total blocks.
IP Packet Duplicate Ratio (IPDR) IPDR is the ratio of total duplicate IP packet outcomes to the total of successful IP packet transfer outcomes minus the duplicate IP packet outcomes.
Replicated IP Packet Ratio (RIPR) RIPR is the ratio of total replicated IP packet outcomes to the total of successful IP packet transfer outcomes minus the duplicate IP packet outcomes.
Third, GSMA is an association of mobile operators and related companies devoted to supporting the standardization, deployment, and promotion of the GSM mobile telephone system. GSMA defines the metrics requested of IPX providers by wireless service providers, as below [9].
Max Delay Max Delay is the maximum value of the one-way transit delays across an IP transport network.
Max Jitter Max Jitter is the maximum value of delay variations across an IP transport network.
110
C.-C. Kim, J. Park, and Y. Lim
Packet Loss Packet Loss is the ratio of total lost packets to total transmitted packets via an IP transport network.
SDU Error Ratio SDU Error Ratio is the ratio of total errored packets to total transmitted packets via an IP transport network.
Service Availability Service availability is a proportion of the time that the service is considered available to service providers on a monthly average basis.
Fig. 1. The comparison of performance metrics
2.2 Definition of Performance Metrics
We summarize the performance metrics mentioned in the previous subsection in Fig. 1 and select the metrics common to all the standard organizations: delay, jitter, and packet loss. Delay and jitter seriously affect the quality of real-time streaming multimedia applications such as voice over IP, online games, and IPTV; in some cases, excessive delay can render the application unusable. Some network transport protocols, such as TCP, provide reliable delivery of packets: in the event of packet loss, the receiver asks for retransmission or the sender automatically resends the missing segments. Although TCP can recover from packet loss, retransmitting missing packets decreases the throughput of the connection, and the retransmission may cause severe delay in the overall transmission. Thus, when end-to-end QoS is evaluated across multiple providers, at least these three metrics should be considered. However, the definitions of the metrics differ among the organizations, so we re-define them consistently as below.
Delay The arithmetic mean of one-way packet transit delays between source and destination.
Jitter The arithmetic mean of differences between successive packet delays.
Packet Loss The ratio of total lost packets to total transmitted packets.
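A minimal computation of these three unified metrics from per-packet timestamps might look as follows in Python; taking the absolute value of successive delay differences for the jitter is our reading, since the sign convention is not stated above.

def delay_jitter_loss(send_times, recv_times):
    # send_times[i]/recv_times[i] in seconds; recv_times[i] is None if packet i was lost
    delays = [r - s for s, r in zip(send_times, recv_times) if r is not None]
    delay = sum(delays) / len(delays)                    # mean one-way delay
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    jitter = sum(diffs) / len(diffs) if diffs else 0.0   # mean successive delay difference
    loss = recv_times.count(None) / len(send_times)      # lost / transmitted
    return delay, jitter, loss

print(delay_jitter_loss([0.00, 0.02, 0.04, 0.06], [0.030, 0.055, None, 0.095]))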
3 Interworking of Multiple Providers
3.1 Definition of Service Class
Service providers offer their own service classes, and the quality of these services differs. When traffic enters a provider domain and is mapped into a higher level of service class than the requested level, network resources are wasted; conversely, when the traffic is mapped into a lower level of service class than the requested level, the service quality is not guaranteed. Thus we define unified service classes to solve the mapping problem between the service classes of different providers.
Fig. 2. The comparison of service classes
Fig. 3. Definition of service classes
Fig. 2 summarizes the service classes proposed by IETF, ITU-T, and GSMA [10-12]. IETF divides the service classes based on the services offered, ITU-T divides the classes based on the service characteristics, and GSMA focuses on the services used. We re-define the service classes for the interworking of heterogeneous networks, as shown in Fig. 3.

3.2 Allocation of Performance
Even though the performance metrics and service classes are re-defined consistently, another challenge to achieving end-to-end QoS remains: how can QoS classes, e.g., network performance, be assured for users? There are two basic approaches to this problem [13]. One involves allocating performance to a limited number of network segments, which allows operators to contribute known levels of impairments per segment but restricts the number of operators that can participate in the path. The other approach is impairment accumulation, which allows any number of operators to participate in a path; on the surface this may appear too relaxed, but it assumes that operators in a competitive environment will actively manage and improve performance.

The static allocation approach divides the end-to-end path into a fixed number of segments and budgets the impairments such that the total objective is met in principle. It requires that individual segments have knowledge of the distance and traffic characteristics between the edges of their domains, as these properties of the segment affect the resulting allocations. For example, the delay budget allocated to a network segment depends on whether it is access or transit, and whether the transit distance is metro or regional. Similarly, packet loss and delay variation have to be allocated according to whether the segment is access or transit, as the traffic aspects can differ significantly. An important aspect of static allocation is its dependence on the number of providers, as the allocation has to be done accordingly; this can result in undershooting or overshooting the objective, because paths can have a different number of network segments than designed for.

The accumulation approach involves requests for the performance level each provider can offer, followed by decisions based on a calculated estimate of the end-to-end performance. The requester may be the customer-facing provider only or may include all the providers along a path, and the responder may be a provider or its proxy. However, the approach has several weaknesses. First, the users' segment impairments are not taken into account. Second, if the initial process fails, multiple passes of the request/estimation cycle may be required. Third, it requires customer or customer-proxy involvement. Finally, commitments for each network segment must be pre-calculated taking distance into account.
4 Conclusions
In this paper, we focused on how to extend QoS capabilities across multiple provider domains for providing end-to-end services. To solve the problem, we defined unified performance metrics and service classes for the interworking of multiple provider domains. As future work, we need to solve the remaining problem of how to allocate performance to the domains, using the IETF PDB (Per-Domain Behavior) concept.
References
1. Ahmed, T., Asgari, A., Mehaoua, A., Borcoci, E., Berti-Equille, L., Georgios, K.: End-to-End Quality of Service Provisioning through an Integrated Management System for Multimedia Content Delivery. Elsevier Computer Communications 30, 638–651 (2007)
2. Paxson, V., Almes, G., Mahdavi, J., Mathis, M.: Framework for IP Performance Metrics. IETF internet-standard: RFC2330 (1998)
3. Almes, G., Kalidindi, S., Zekauskas, M.: A One-way Delay Metric for IPPM. IETF internet-standard: RFC2679 (1999)
4. Almes, G., Kalidindi, S., Zekauskas, M.: A One-way Packet Loss Metric for IPPM. IETF internet-standard: RFC2680 (1999)
5. Demichelis, C., Chimento, P.: IP Packet Delay Variation Metric for IP Performance Metrics (IPPM). IETF internet-standard: RFC3393 (2002)
6. Morton, A., Ciavattone, L., Ramachandran, G., Shalunov, S., Perser, J.: Packet Reordering Metrics. IETF internet-standard: RFC4737 (2006)
7. Uijterwaal, H.: A One-Way Packet Duplication Metric. IETF internet-standard: RFC5560 (2009)
8. ITU-T Recommendation Y.1540: Internet Protocol Data Communication Service - IP Packet Transfer and Availability Performance Parameters (2007)
9. GSMA IR.34: Inter-Service Provider IP Backbone Guidelines (2008)
10. Babiarz, J., Chan, K., Baker, F.: Configuration Guidelines for DiffServ Service Classes. IETF internet-standard: RFC4594 (2006)
11. ITU-T Recommendation Y.1541: Network Performance Objectives for IP-based Services (2006)
12. 3GPP TS 23.107: Quality of Service (QoS) Concept and Architecture (2010)
13. ITU-T Recommendation Y.1542: Framework for achieving end-to-end IP performance objectives (2006)
Design of a Transmission Simulator Based on Hierarchical Model Sang Hyuck Han and Young Kuk Kim* Department of Computer Science & Engineering, Chungnam National University 220 Gung-dong, Yuseong-Gu, Daejeon 305-764, South Korea
[email protected],
[email protected]
Abstract. Recently, there have been increasing efforts to apply IT technologies in power domains, such as power automation systems and simulation systems. In particular, simulation technologies are necessary for improving system quality and verifying system errors at low cost. However, their use is limited because most power simulation systems focus on simulating small substations, while commercial solutions require expert knowledge and are very expensive. In this paper, we describe the design of HBTS (Hierarchical Based Transmission Simulator), which simulates Korea's substations. HBTS outputs each substation's currents, voltages, and phase angles. Keywords: Transmission simulator, industrial process control.
* Corresponding author.

1 Introduction
Recently, IT technologies play a key role in the risk management and quality control of power automation systems, and requests for the convergence of the IT and power industries are increasing at a rapid rate. Real-time data processing and power simulation technologies are among the most essential of these technologies. Power simulation technologies are useful for improving system quality, verifying system errors at low cost, and educating inexperienced operators. The data generated by simulation can also be used by power automation systems such as SCADA (Supervisory Control And Data Acquisition) and an RTDB (Real-Time DataBase) to verify their functionality and enhance their performance. However, their use is limited because most power simulation systems focus on simulating small substations, while commercial solutions require expert knowledge and are very expensive [1-2]. In this paper, we describe the design of HBTS (Hierarchical Based Transmission Simulator), which closely simulates Korea's substations. HBTS models Korea's substations with the IEEE 4 Node Test Feeder and a hierarchical tree structure. Accordingly, using the real power usage data provided by KPX (Korea Power eXchange), HBTS can periodically output each substation's currents, voltages, and phase angles. HBTS can be used in various ways. First, it can both check and enhance the
performance of the RTDB that collects and manages data from many substations. Second, with the detailed description of HBTS, development time can be considerably reduced when building similar simulators. This paper is organized as follows: Section 2 describes related work on Korea's transmission system and the main electrical theories used in HBTS, Section 3 describes the design of HBTS in detail, and Section 4 presents conclusions and future work.
2 Related Works
2.1 Korea Transmission Overview
Fig. 1 is a conceptual diagram of Korea's transmission power system, which mainly consists of generators, ultra-high-voltage substations (765 kV), high-voltage substations (345 kV), medium-voltage substations (154 kV), and distribution substations. Each generator produces power, steps it up to high voltage, and sends it to the high-voltage substations. The high-voltage substations distribute some of the power to factories or railway substations, and transmit the remaining power to the medium-voltage substations. The medium-voltage substations distribute some of the power to broadcasters, big buildings, and stadiums, and transmit the rest to the distribution substations. The distribution substations supply some factories and buildings, and pole transformers step the power down to low voltage (22.9 kV) and distribute it to small factories, markets, and customers [3].
Fig. 1. Conceptual diagram of transmission power system
Fig. 2 shows Korea's power system [4], which consists of generators and substations. Korea's generators have some notable features. First, most generators produce power from water, steam, or nuclear sources. Second, most hydro power plants are located on the Han river, while steam and nuclear plants are located in forest and coastal areas because their raw materials are imported from abroad. Third, the generation capacity of each power plant is larger than before because of the difficulty of selecting suitable sites; for the same reasons, transmission lines are longer and carry massive capacity. Most substations are located in customer areas or large industrial sectors. In particular, customer areas near the metropolitan region have difficulty with power transfer because the construction of large-capacity plants near the metropolitan region is restricted; for power transfer in the metropolitan region, 765 kV transmission lines are used to decrease power losses [4-5].

Fig. 2. Korea Power System

2.2 The Main Technologies
The main technologies used in the development of HBTS are the PU (per-unit system) [4], KCL (Kirchhoff's Current Law) [6], and the IEEE 4 Node Test Feeder [7]. A per-unit system expresses system quantities such as voltage, current, power, reactive power, real power, impedance, and admittance as fractions of a defined base unit quantity; PU is generally used as the symbol of the per-unit system.

Table 1. PU (per-unit system)

Items     Formula
Voltage   V_PU = V / V_base                                                 (1)
Power     P_PU = P / W_base,  Q_PU = Q / W_base,  W_PU = W / W_base         (2)
Current   I_base = (W_base × 10³) / (√3 · V_base)  [A]                      (3)
          I_PU = I / I_base = (√3 · V_base · I) / (W_base × 10³)            (4)
The line-to-line voltage V [kV] of a power system is represented in per unit as (1), where the base line-to-line voltage is V_base [kV]; real power P [MW], reactive power Q [MVar], and apparent power W [MVA] are represented in per unit as (2), where the base capacity is W_base [MVA]. The base current I_base [A] is obtained from the base capacity and the base voltage as in (3), and (4) then gives the per-unit current from the base current.
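As a numerical illustration (our sketch, with V_base in kV and W_base in MVA, consistent with Table 1):

import math

def per_unit(V_kV, P_MW, I_A, V_base_kV, W_base_MVA):
    V_pu = V_kV / V_base_kV                                        # Eq. (1)
    P_pu = P_MW / W_base_MVA                                       # Eq. (2)
    I_base = W_base_MVA * 1e6 / (math.sqrt(3) * V_base_kV * 1e3)   # Eq. (3) [A]
    I_pu = I_A / I_base                                            # Eq. (4)
    return V_pu, P_pu, I_pu

# e.g. a 345 kV bus carrying 100 MW at 150 A on a 345 kV / 100 MVA base
print(per_unit(345.0, 100.0, 150.0, V_base_kV=345.0, W_base_MVA=100.0))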
KCL states that, at any node in an electrical circuit, the sum of currents flowing into that node is equal to the sum of currents flowing out of that node. In Fig. 4, by KCL, the feeder current I is equal to the sum of the three branch currents I1, I2, and I3, which can be expressed with the branch voltages E and admittances y:

I = I1 + I2 + I3 = E1·y1 + E2·y2 + E3·y3    (5)

(5) can be transformed, in terms of the bus admittances Y, as

I = E1·Y1 + E2·Y2 + E3·Y3    (6)
Fig. 4. An example of power circuit
Fig. 5 shows the IEEE 4 Node Test Feeder, which consists of generators G1 and G2 and loads L1 and L2, with 4 buses between the generators and the loads [10].

Fig. 5. IEEE 4 Node Test Feeder
Fig. 6 lists representative values of the line parameters, active power, and reactive power. Because real parameters are difficult to obtain, the virtual power system uses the representative values in Fig. 6.
Fig. 6. P, Q, i, V, ζ values in IEEE 4 Node Test Feeder
3 Hierarchical Based Transmission Simulator
Fig. 7 shows the architecture of HBTS (Hierarchical Based Transmission Simulator). HBTS has three main components: the BDG (Base Data Generator), which creates the base data of transmission substations modeled on the architecture of Korea's transmission substations; the TDG (Trend Data Generator), which generates periodic data using the trend data of KPX (Korea Power eXchange); and the RDG (Result Data Generator), which outputs the simulation results in an adequate format.
Fig. 7. An Architecture of HBTS
3.1 BDG (Base Data Generator)
BDG is the main component that generates each substation's base data, such as voltage, current, and power, based on a transmission structure similar to Korea's transmission power system. Fig. 8 shows the conceptual architecture of HBTS's transmission substations, which has 4 levels. On level 1, 4 nodes referenced from the IEEE 4 Node Test Feeder are connected to each other, and levels 2-4 have a hierarchical structure based on KCL. HBTS can be configured to represent Korea's transmission power system by setting both the number of levels and the node count of each level.
Fig. 8. A Conceptual Architecture of HBTS’s transmission substations
Table 2 shows the relations between HBTS and Korea's power system. Level 1 has 2 generators and 2 loads; in Korea's power system, level 1 is the starting point, with the generators and the 765 kV substations. We assume that the generators on level 1 produce total power equal to the total load, and that the produced power P is not lost in the power-providing process from the high level to the lower levels. Level 2 has 30 loads, the sum of the children of the level-1 nodes (15 each); in Korea's power system, level 2 consists of the 345 kV substations. Likewise, level 3 has 300 loads and level 4 has 1,500 loads, arranged hierarchically; in Korea's power system, levels 3 and 4 consist of the 154 kV and 22.9 kV substations, respectively.

Table 2. The relations between HBTS and Korea's power system

Level   Structure            Node type   Count   Korea Power System
1       4 node feeder        generator   2       generator
                             load        2       765 kV
2       Tree, hierarchical   load        30      345 kV
3                            load        300     154 kV
4                            load        1,500   22.9 kV
To generate the base data in BDG, the first step is to generate the base data of level 1 (power, voltage, and current values), obtained from the power-flow formulas with the converged values of the 4 Node Test Feeder. The second step is to generate the base data of the other levels, which can be derived with KCL from the level-1 power and current values.
Fig. 9. Flow Diagram for obtaining Slack feeder power and line current
Fig. 9 shows the flow diagram for obtaining the slack feeder power and the line currents. It can be summarized as follows. First, build the YBus matrix for each line. Second, obtain P and Q to form the power-flow equations. Third, obtain the real-power mismatch ΔP and the reactive-power mismatch ΔQ with the Newton-Raphson method, a non-linear power-flow method. Fourth, if the method has not converged, steps two to four are repeated, with improved phase angles and voltages obtained through the Jacobian matrix, until convergence. Fifth, once the method converges, calculate the line currents based on the slack feeder.

For generating the current (i), power (p), and voltage (v) at levels 1-4, the values are obtained differently for level 1 and for levels 2-4. At level 1, the values are obtained from equation (7), which takes the power P as input. Each voltage value is assigned the substation voltage of its level; the substation voltages within a level are not always identical, but vary within a range of 3%. Each phase angle θ has the value 45° by default.
I = P / (V · cos θ)  [A]    (7)
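For concreteness, the Newton-Raphson loop of Fig. 9 can be sketched in Python as follows; this is our illustration for a network with one slack bus and PQ buses only, and the Jacobian is formed numerically for brevity rather than analytically.

import numpy as np

def newton_raphson_pf(Ybus, P_sched, Q_sched, slack=0, tol=1e-6, max_iter=30):
    # Ybus: complex bus admittance matrix; P_sched, Q_sched: numpy arrays of
    # scheduled injections in per unit (loads negative).
    n = Ybus.shape[0]
    pq = [i for i in range(n) if i != slack]
    th = np.zeros(n)                 # bus voltage angles [rad], flat start
    vm = np.ones(n)                  # bus voltage magnitudes [pu]

    def mismatch(th, vm):            # [dP; dQ] at the PQ buses
        V = vm * np.exp(1j * th)
        S = V * np.conj(Ybus @ V)    # injected complex power at each bus
        return np.concatenate([P_sched[pq] - S.real[pq],
                               Q_sched[pq] - S.imag[pq]])

    m = len(pq)
    for _ in range(max_iter):
        f = mismatch(th, vm)
        if np.max(np.abs(f)) < tol:              # converged
            break
        J = np.zeros((2 * m, 2 * m))             # numerical Jacobian
        eps = 1e-7
        for col in range(2 * m):
            th2, vm2 = th.copy(), vm.copy()
            if col < m:
                th2[pq[col]] += eps              # perturb an angle
            else:
                vm2[pq[col - m]] += eps          # perturb a magnitude
            J[:, col] = (f - mismatch(th2, vm2)) / eps
        dx = np.linalg.solve(J, f)               # improved angles and voltages
        th[pq] = th[pq] + dx[:m]
        vm[pq] = vm[pq] + dx[m:]

    V = vm * np.exp(1j * th)
    I_slack = (Ybus @ V)[slack]      # slack feeder current (step five)
    return V, I_slack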
At levels 2-4, the power value is obtained with equation (8), where the current I is derived by KCL from the feeder current of the upper level, and the voltage and cos θ are the same as the level-1 values:

P = V · I · cos θ  [W]    (8)

With (7) and (8), we can obtain the base values of every substation at all levels.
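The top-down pass of BDG can then be sketched as below; the children-per-node counts (15, 10, 5) reproduce the node counts of Table 2, while the equal split of feeder current among siblings and the single representative branch are our simplifying assumptions (transformer turns-ratio effects are not modeled, as in the simplified description above).

import math

def propagate_base_data(P_total, V_levels, children_per_node,
                        cos_theta=math.cos(math.radians(45))):
    # Follows one representative branch of the HBTS tree from level 1 down.
    I = P_total / (V_levels[0] * cos_theta)    # Eq. (7): level-1 feeder current [A]
    levels = [{"level": 1, "V": V_levels[0], "I": I, "P": P_total}]
    for k, n_child in enumerate(children_per_node, start=2):
        I = I / n_child                        # KCL: split feeder current over children
        V = V_levels[k - 1]
        P = V * I * cos_theta                  # Eq. (8): node power [W]
        levels.append({"level": k, "V": V, "I": I, "P": P})
    return levels

# 765 kV -> 345 kV -> 154 kV -> 22.9 kV; total power chosen arbitrarily
for row in propagate_base_data(1.2e9, [765e3, 345e3, 154e3, 22.9e3], [15, 10, 5]):
    print(row)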
3.2 TDG (Trend Data Generator)
TDG creates the power trend data of each substation by transforming the base data of BDG according to the real-time power supply-and-demand status provided by KPX (Korea Power eXchange) [8].
Fig. 10. Real time power usage status of KPX
KPX provides a power usage graph like the one in Fig. 10. To use it in HBTS, the graph had to be converted into digital values, so we extracted values at 1-minute intervals programmatically. Fig. 11 shows a graph of the transformed values.
Fig. 11. Transformed power usage status
TDG can generate the trend data of each substation from both the base data from BDG and the transformed power usage from the KPX data.

3.3 RDG (Result Data Generator)
RDG (Result Data Generator) is the component that periodically outputs each substation's values after the execution of BDG and TDG. The result format has the 4 attributes shown in Table 3: time T, voltage v, current i, and power p.

Table 3. Attributes of the RDG result

Attribute   Unit   Description
time        T      1-minute time interval
voltage     v      v of the current node
current     i      i of the current node
power       p      p of the current node
4 Conclusion and Future Work
In this paper, we described the design of HBTS, which is modeled as a hierarchical architecture similar to Korea's transmission power system, and we described its result format. With HBTS's simulation data, we hope to enhance the performance and functionality of main components such as the RTDB, HMI, and data compression; moreover, HBTS can easily be adapted to represent Korea's power system because its main architecture is similar. In the near future, we will implement HBTS and generate transmission simulation data.
References
1. Lee, K.H., Kil, I.J., Choi, J.Y., Lee, S.K.: Design of Real-Time Power System Simulator for Education using LabVIEW. Journal of the Korean Institute of Illuminating and Electrical Installation Engineers 24(6), 177–182 (2010)
2. Baik, S.D., Kim, S.K., Lee, J.H., Lee, S.C.: A study on the development of substation power system simulator for education and training. In: The Korean Institute of Electrical Engineers Summer Annual Conference, pp. 67–69 (2004)
3. KEPCO: Electricity & Energy (2010), http://www.kepco.co.kr/museum/e_energy/energy_1_3.html
4. KPX: Figure of power system in 2009 (2009), http://www.kpx.or.kr/KOREAN/servlet/action?index=233
5. Song, K.Y.: Power System Engineering. Dongilbook (2009)
6. Song, K.Y.: The Newest Distribution Engineering. Dongilbook (2010)
7. IEEE: IEEE 4 Node Test Feeder. IEEE Power Engineering Society Power System Analysis, Computing and Economics Committee (2006)
8. KPX: Real-time Snapshot of Power Supply and Demand (2011), http://www.kpx.or.kr/KOREAN/htdocs/main/sub/ems_info.jsp
9. KERI: Development of Power System Simulator. Final report of the second year, KERI (1996)
10. Bergen, A.R., Vittal, V.: Power System Analysis. Prentice Hall, Englewood Cliffs (1986)
11. Saadat, H.: Power System Analysis. McGraw-Hill, New York (2007)
12. Glover, J.D., Sarma, M.S., Overbye, T.J.: Power System Analysis and Design. Thomson (2008)
13. Im, J.H., No, C.K., Yeum, J.E., Jun, J.: Load Flow for Smart Grid. Graduation thesis, Chungnam National University (2011)
Recommendation System of IPTV TV Program Using Ontology and K-means Clustering

Jongwoo Kim1, Eungju Kwon1, Yongsuk Cho2, and Sanggil Kang1

1 Department of Computer Science and Information Engineering, Inha University, 253 Yonghyun-dong, Nam-gu, Incheon, 402-751, Korea
{bestkjw,ora1126}@inha.edu, [email protected]
2 Department of Electronic Engineering, Konyang University, Nea-dong, Nonsan-si, Chungcheongnam-do, Korea
[email protected]
Abstract. In this paper we introduce a recommendation system that recommends preferred TV genres to each viewer using an ontology technique and the K-means clustering algorithm. The algorithm is developed from each viewer's personal VOD viewing history. First, the viewing history is built into an ontology that supports inference through queries. From the users' viewing lists, the preference probability of each VOD item and class is obtained and used to build the ontology. From the ontology we then select each user's preferred VODs using the K-means algorithm. In the experimental section, we show the feasibility of our algorithm using real TV viewing data.

Keywords: clustering, K-means, ontology, recommendation of genres.
1 Introduction
IPTV service is a kind of VOD service in which the user selects a TV program from a menu. Once a user turns on an IPTV, he/she has to click through the menu five or six times to find a preferred program. To help users select programs, information about each user's preferred TV programs is needed to reduce search time and effort; therefore, personal VOD recommendation has become a hot research issue. To develop recommendation systems, mining techniques and ontologies have been widely applied in recent years. For example, Benjamin Adrian et al. [2] proposed ConTag, which extracts the topics of documents using WordNet [9] and then recommends tags related to the concepts of those topics. Mining methods have the problem of being biased by individual tendencies. To address this, K-means algorithms have been combined with leader clustering and the IAFC (Integrated Adaptive Fuzzy Clustering) artificial neural network algorithm [5]; leader clustering determines the number of clusters from the similarity of the data without an initial cluster count. In this paper we develop a recommendation system based on the personal VOD viewing history. The viewing history is built into an ontology that supports inference through queries. From the users' viewing lists, each VOD item and class obtains a
preference probability, and this information is used to build the ontology. From the ontology we select each user's preferred VODs using the K-means algorithm. The remainder of this paper is organized as follows: Section 2 introduces work related to our proposed system. Section 3 describes our proposed recommendation algorithm. Section 4 shows the implementation of our system through a simple test. Finally, we conclude in Section 5.
2 System Architecture of Recommendation
Fig. 1 shows the overall architecture of our proposed recommendation system on the IPTV content provider side. The system consists of four modules: Data Collection, Knowledge Management, Preference Computing, and Reasoning. The Data Collection module collects each viewer's content consumption information from the set-top box connected to the IPTV on the viewer side. Based on this information, the Knowledge Management module builds ontology-based semantic relations among IPTV contents using description keywords (DKs) extracted with WordNet [9]. The Preference Computing module computes content preferences from the ontology built by Knowledge Management. The Reasoning module recommends each user's preferred TV programs using K-means clustering.

Fig. 1. Overall architecture of the PTA system in iTV
3 Our Recommendation Algorithm
In this section, we show how to compute the similarity between TV programs by measuring the semantic distance in their program ontology, as in Eq. (1):

sim(i.p(C), j.p(C)) = Σc EWS(i.p(c) ∩ synset(j.p(c))) / EWS(i.p(C))   (1)

where i.p(C) and j.p(C) are the context C of the description property p in programs i and j. Also, EWS(i.p(C)) is the total number of description keywords in the union of context C between the
hypersets and hyposets of context c ∈ C, and Σc EWS(i.p(c) ∩ synset(j.p(c))) is the total similarity for context C of programs i and j. Eq. (2) gives the similarity over all contexts C:

sim(i, j) = ΣC sim(i.p(C), j.p(C)) / (number of properties of i)   (2)
A case that must not be overlooked in Eq. (2) is when two or more programs have the same similarity measure. In that case, we weight each program according to the viewer's relative viewing times, as in Eq. (3):

w(Item) = number of viewings of the item   (3)
By clustering the various viewing tendencies, we can recommend each viewer's favorite genre and its programs. The center point of the coordinates of the viewer's preference list is determined from the previous equations. If a viewer prefers two kinds of programs, two clusters are likely to be generated. To handle this, we use the K-means algorithm as an effective technique to analyze the data and generate the clusters. The number of clusters differs from viewer to viewer and is not easy to predict, so it needs to be adjusted dynamically. Also, when two clusters are large, as in Fig. 2, the centers of the two clusters will appear as the viewer's preferences; the programs at those centers are then recommended.
Fig. 2. Example of clustering
Fig. 3 shows the clustering process. The number of clusters starts at one in the initial state, as seen in Fig. 3-(a). If the diameter of the cluster is greater than the predetermined maximum cluster diameter, Rmax, we divide the cluster into two clusters (C1 and C2), as seen in Fig. 3-(b). If C2 is still greater than Rmax, we divide that cluster into two again, as seen in Fig. 3-(c). This process continues until the diameters of all clusters are less than Rmax. The value of Rmax is set by the system administrator; the smaller the value, the larger the number of clusters tends to be. Once the clustering process is finished, we can recommend a viewer's preferred TV programs based on the distance of each program to the cluster centers. A sketch of this procedure follows Fig. 3.
Fig. 3. K-means clustering process
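A minimal sketch of this diameter-bounded clustering follows; the Euclidean metric, the restart-per-k strategy, and all names are assumptions rather than the paper's implementation.

    import numpy as np

    def diameter(points):
        # Largest pairwise distance within one cluster.
        if len(points) < 2:
            return 0.0
        d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
        return float(d.max())

    def kmeans(points, k, iters=20, seed=0):
        rng = np.random.default_rng(seed)
        centers = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(iters):
            labels = np.argmin(
                np.linalg.norm(points[:, None] - centers[None], axis=-1), axis=1)
            centers = np.array([
                points[labels == c].mean(axis=0) if np.any(labels == c)
                else centers[c]
                for c in range(k)])
        return labels, centers

    def cluster_until_small(points, r_max):
        # Grow k until every cluster's diameter is at most Rmax.
        for k in range(1, len(points) + 1):
            labels, centers = kmeans(points, k)
            if all(diameter(points[labels == c]) <= r_max for c in range(k)):
                return labels, centers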
A case that must not be overlooked is when more than one cluster is equally closest to the viewer, as seen in Fig. 4. In this case, we need to determine a priority for recommending the programs. For example, in Fig. 4, C2 has more viewer members than C1. As seen in the figure, there are two candidate programs because the target viewer is at the same distance from both cluster centers. The candidate program belonging to C1 will have a lower value than the one belonging to C2 because C1 has fewer members. To take this into account, we assign a weight to each cluster that sets the priority for recommending programs; a sketch follows Fig. 4.
Fig. 4. Two clusters equally closest to the viewer
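A minimal sketch of this tie-breaking by cluster size follows; the names and the sort-based priority are assumptions.

    import numpy as np

    # Hypothetical sketch: prefer the closest cluster, and break distance
    # ties in favor of the cluster with more members.
    def pick_cluster(viewer, centers, sizes):
        distances = np.linalg.norm(centers - viewer, axis=1)
        order = sorted(range(len(centers)),
                       key=lambda c: (distances[c], -sizes[c]))
        return order[0]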
4 Experiments

The proposed method was applied to predict the preferred TV genres of each viewer using 200 viewers' TV watching histories collected from December 1, 2008 to May 31, 2009. The viewers were clustered into 22 groups by the algorithm explained in the previous section. Based on these groups, we tested our recommendation performance on five test viewers: User1, User2, User3, User4, and User5. For convenience, we divided TV programs into five genres: Sports, Drama, Entertainment, Education, and Movie. Table 1 gives the recommendation counts and the actual viewing counts per genre for each test viewer. As seen in the table, User1 watched the genre "Sports" 6 times out of 8 recommendations for that genre, which gives 75% recommendation accuracy. For the genres "Drama", "Entertainment", "Education", and "Movie", the recommendation accuracies are 93%, 90%, 33%, and 100%. The average recommendation accuracy of User1 is 80%. By the same process, the average recommendation accuracies of User2, User3, User4, and User5 are 80%, 73%, 80%, and 94%, respectively.

Table 1. Recommendation accuracy for test viewers
Test viewer                       Sports  Drama  Entertainment  Education  Movie
User 1   recommended genres          8      15        8             3        3
         viewed genres               6      14        7             1        3
User 2   recommended genres          0      16        9             0       12
         viewed genres               0      18        7             0        8
User 3   recommended genres          1      13        7             4        1
         viewed genres               0      11        5             3        0
User 4   recommended genres          0      11        3             0        1
         viewed genres               0      10        2             0        0
User 5   recommended genres         10      11        8             0        7
         viewed genres               5      14        5             0       10
5 Conclusion
In this paper, we proposed a new IPTV program recommendation system using an ontology and the K-means algorithm. As shown in the experimental section, we demonstrated the feasibility of the recommendation system using real TV viewing histories. Despite this feasibility, a couple of further steps remain to complete the work. As the experimental results show, our recommendation uses only the semantic relations of TV programs and viewers' TV viewing histories; if we also use viewers' profile information such as occupation, age, and gender, the recommendation performance should improve. We also need to develop a prototype in which our algorithm is embedded, in order to test it under realistic conditions.
References

1. Ansari, A., Skander, E., Rajeev, K.: Internet Recommendation System. Journal of Marketing Research 37 (August 2000)
2. Chumki, B., Haym, H., William, W., Craig, N.: Recommending Papers by Mining the Web
3. Greg, L., Brent, S., Jeremy, Y.: Amazon.com Recommendations. IEEE Computer Society Industry Report (January 2003)
4. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience, New York (2001)
5. Kim, Y.S., Mitra, S.: Integrated Adaptive Fuzzy Clustering (IAFC) Algorithm. In: Proc. of the Second IEEE International Conference on Fuzzy Systems, San Francisco, vol. 2, pp. 1264–1268 (1993)
6. Kim, J.W., Yoon, T.B., Kim, D.M., Lee, J.H.: A Personalized Music Recommendation System with a Time-weighted Clustering, vol. 19(4), pp. 504–510 (2009)
7. Benjamin, A., Leo, S., Thomas, R.: ConTag: A Semantic Tag Recommendation System. In: I-SEMANTICS 2007 Papers, pp. 297–304 (2007)
8. Martín, L.N., Yolanda, B.F., José, J.P.A., Jorge, G.D., Manuel, R.C., Alberto, G.S., Rebeca, P.D.R., Ana, F.V.: Receiver-side Semantic Reasoning for Digital TV Personalization in the Absence of Return Channels. Multimedia Tools and Applications 41(3), 407–436 (2009)
9. WordNet, http://wordnet.princeton.edu
A Novel Interactive Virtual Training System

Yoon Sang Kim1 and Hak-Man Kim2

1 School of Computer Engineering, Korea Univ. of Technology and Education, Korea
[email protected]
2 Dept. of Electrical Engineering, Univ. of Incheon, Korea
[email protected]
Abstract. This paper proposes a novel interactive virtual training system that provides PLC trainees with a virtual environment identical to handling the various types of equipment they could not otherwise access during training programs. The proposed system is applied to an actual technical training program, and the results are analyzed to examine its propriety and applicability.

Keywords: virtual training, interactive virtual training, PLC, virtual PLC, virtual conveyor, virtual ladder.
1 Introduction

Virtual reality (VR) uses computers to simulate a specific environment and its surrounding circumstances, delivering information about that environment through the five human senses (sight, smell, hearing, taste, and touch) and allowing people in a virtual space to have experiences identical to those of the real world. Beyond simulating the real world, VR also allows people to simulate experiences that are not feasible in the actual world. Today, VR technology is used in various fields, including the military, medicine, construction, design, experience, training, and entertainment [1]. Korea has some of the world's highest levels of technology in industrial areas such as electric power, automobiles, semiconductors, displays, mobile handsets, steel, energy, and shipbuilding. The most important group of technical experts in cutting-edge manufacturing and facility industries are PLC automation professionals. Due to the nature of these industries, PLC automation training requires very expensive, up-to-date equipment. However, since improper manipulation or programming errors by trainees can damage the equipment, and continuously replacing and providing the latest equipment is difficult, most expensive equipment is either not used at all or provided only with restricted, limited use during training sessions. Accordingly, most trainees listen to explanations rather than engage in hands-on manipulation, which is limiting compared to actually operating the equipment. In order to overcome the constraints that the latest automation facilities place on the training environment, and given the importance of training, this paper proposes an interactive virtual training system based on 3D stereoscopic imaging that allows expensive equipment to be replaced with virtual equipment. There has been hardly any research and development on training systems for PLC automation
equipment using VR. To allow PLC trainees to handle virtual equipment that models expensive real machinery such as elevators and conveyors, giving the effect of hands-on training, the virtual training system developed in this study was deployed in an actual training program, and the results were analyzed to verify the feasibility of the proposed system.
2 Proposed Interactive Virtual Training System

2.1 System Overview

The proposed system consists of the object-based sensor input unit that processes sensor input, the virtual PLC composer unit that receives the user's arbitrary wiring input, the virtual ladder tool unit for writing PLC ladder programs, the virtual PLC unit that replaces the actual PLC, and the object-based virtual input/output model unit that provides 3D visualization of the results corresponding to arbitrary input data. The overall block diagram of the proposed system is shown in Figure 1.
Fig. 1. Block diagram of the proposed system
2.2 Virtual PLC Composer

This unit receives the initial input from the user (trainee) and supports arbitrary wiring. Whereas most input wiring units accept only pre-assigned (correct) wiring, the virtual PLC composer unit in the proposed system accepts any wiring configuration (even incorrect wiring) from the user. The trainee can study materials related to a specific session and practice wiring using the virtual PLC composer shown in Figure 2. To cover all of the user's arbitrary wiring inputs, the SAI (Scene Access Interface) [2,3] is used for the implementation.
Fig. 2. The virtual PLC composer
- Screen shot of practice wiring using the virtual PLC composer

2.3 Object-Based Virtual Input/Output Model Unit That Provides 3D Visualization
Fig. 3. The object-based virtual input/output model unit
- Screen shot of an object-based 3D input/output model and simulation of a motor sequence control

This unit converts the user's wiring configuration from the virtual PLC composer into the corresponding 3D visualization and allows the user to verify the result through simulation. Accurate wiring simulates correct action, whereas an incorrect
wiring configuration produces a simulation of faulty action (Figure 3). All the virtual input/output models are implemented based on X3D, the international standard Web3D graphics format [4-7].

2.4 Virtual Ladder Tool

We need a virtual ladder tool that provides a usage environment similar to the actual one while serving as the dedicated ladder tool for the virtual PLC. Based on this requirement, we developed a ladder tool for programming ladders that can be loaded into the virtual PLC. Ladder diagrams (LD) can be transformed into instruction lists (IL) based on [8]. The specifications of the functions supported by the ladder tool follow the international standard IEC 61131-3. Figure 4 displays the generic virtual ladder tool implemented in this study.
Fig. 4. (a) Screen shot of the generic virtual tool implemented by this study (b) Screen shot of virtual ladder tool simulating GMWIN, the actual GLOFA ladder tool
3 Application in Practice and Result Analysis

3.1 Application in Training Program - Experiment

After deploying the proposed system in a technical training program (GLOFA-PLC control) conducted at our university's training center, the applicability and feasibility of the system were evaluated based on the results of a survey of the program's trainees and instructors. The training session involved controlling a virtual conveyor implemented in a virtual environment to simulate a high-level conveyor, a large and expensive machine that is difficult for trainees to operate for training purposes. The practice session was conducted in a sequence identical to practicing on an actual machine, as shown in Figure 5 and Figure 6. Fourteen trainees participated in the experiment, including professional training school teachers, industrial high school teachers, and large companies' in-house instructors. After being introduced to the proposed virtual training system, the participants were
Fig. 5. Sequence of the conveyor control practice
Fig. 6. Screen shot of virtual conveyor control training session
trained in the conveyor control practice session. Upon completing the session, they were asked to respond to a survey consisting of 10 questions, shown in Figure 7, and to indicate their level of satisfaction on a Likert scale: highly satisfied (5 points), satisfied (4 points), average (3 points), unsatisfied (2 points), highly unsatisfied (1 point).

3.2 Analysis of the Results from the Training Program

Table 1 shows the statistics for the survey responses from the trainees after the practice session using the proposed system. The overall average was close to "satisfied" (4 points), confirming the applicability of a training program based on the proposed system. With improvements to aspects such as user-friendliness and the interface, and with expanded functionality, the system should be fully applicable in virtual PLC training programs.
Fig. 7. Questionnaire used for evaluation of the virtual conveyor session using the proposed system
Table 1. Statistics regarding the questionnaire responses (every item: N = 14 effective, 0 missing)

Item                            Average
Motivation                      4.0
Training Effectiveness 1        4.0
Training Effectiveness 2        4.0
Training Effectiveness 3        4.0
Adequacy                        4.0
Reality                         4.0
Distinction                     4.0
User-friendliness               4.0
Effectiveness as replacement    4.0
Applicability                   4.0
The following figures display all the results of the questionnaire survey.
Fig. 8. (a) Question 1 - learning motivation (b) Question 2 - training effectiveness I
Fig. 9. (a) Question 3 - training effectiveness II (b) Question 4 - training effectiveness III
Fig. 10. (a) Question 5 - Adequacy of training (b) Question 6 - Reality of training
Fig. 11. (a) Question 7 - Distinction from other training (b) Question 8 - User-friendliness
Fig. 12. (a) Question 9 - Effectiveness in replacing actual equipment (b) Question 10 - Applicability in technical training
4 Conclusions

This paper proposed a novel interactive virtual training system. The proposed system was developed to allow PLC trainees to operate virtual equipment modeled after expensive real machinery such as elevators and conveyors, providing an experience identical to operating the actual equipment. Using the virtual equipment, the proposed system offers a training environment that allows free and unlimited repetition in virtual space, without concerns about safety or equipment damage from malfunctions and program errors. The proposed system was applied in an actual training environment for students and teachers, and their survey responses were analyzed to confirm that the system is applicable in training programs.
References

1. Burdea, G., et al.: Virtual Reality Technology. IEEE Computer Society Press, Los Alamitos (2003)
2. Apache Ant Project, http://ant.apache.org/
3. SAI Tutorial, http://www.xj3d.org/tutorials/general_sai.html
4. Brutzman, D., Daly, L.: X3D: Extensible 3D Graphics for Web Authors. Morgan Kaufmann Publishers, San Francisco (2007)
5. Web3D Consortium, http://www.web3d.org/
6. Xj3D, http://www.xj3d.org/
7. Xj3D 2.0 Code Library, http://www.xj3d.org/javadoc2/index.html
8. Fen, G., Ning, W.: A Transformation Algorithm of Ladder Diagram into Instruction List Based on AOV Digraph and Binary Tree. Journal of Nanjing University of Aeronautics & Astronautics 38(6), 754–758 (2006)
Recommendation Algorithm of the App Store by Using Semantic Relations between Apps

Yujin Lim1, Hak-Man Kim2, Sanggil Kang3, and Tai-hoon Kim4

1 Department of Information Media, University of Suwon, 2-2 San, Wau-ri, Bongdam-eup, Hwaseong-si, Gyeonggi-do 445-743, Korea
[email protected]
2 Department of Electrical Engineering, University of Incheon, 12-1 Songdo-dong, Yeonsu-gu, Incheon 406-772, Korea
[email protected]
3 Department of Computer Science and Information Engineering, Inha University, 253 Yonghyun-dong, Nam-gu, Incheon 402-751, Korea
[email protected]
4 Department of Multimedia Engineering, Hannam University, 133 Ojeong-dong, Daedeok-gu, Daejeon, 306-791, Korea
[email protected]
Abstract. In this paper, we propose an algorithm for recommending mobile application software (apps) to a mobile user using the semantic relations among the apps consumed by users. To do this, we define semantic relations between the apps consumed by a specific member and those consumed by his/her social members using an ontology. Based on these relations, we identify the most similar social members through a reasoning process. In the experimental section, we show the feasibility of our algorithm using a specific scenario.

Keywords: app, recommendation, mobile, semantic relation, social members.
1 Introduction

A report commissioned by the mobile application store GetJar in 2010 [1] said that the mobile app (application software) market will reach $17.5 billion by 2012, having grown to 50 billion downloads from just 7 billion in 2009. The mobile app market clearly has tremendous room to grow, extending the mobile app paradigm onto larger tablet devices. Mobile app recommendation services have sprung out of a growing need to filter, rank, and recommend the best apps from the hundreds of thousands now available for download onto mobile phones or tablet PCs. With iTunes now carrying 225,000 apps and Android up to 100,000, it is no wonder users have turned to resources beyond the search box and category listings found in the official vendor-specific app stores. For end users, recommendation services like these prove useful, even necessary at times. Personalization is defined as the ability to provide content and services tailored to individuals based on knowledge about their preferences and behavior [2]. The main
goal of personalization is to help users find the information they are interested in, which can significantly enhance their mobile experience. Most personalization systems try to filter the available content by the user's preferences and recommend only content potentially interesting to that particular user. Personalized app recommendation services analyze the app purchases made by users and then recommend similar apps that users may find useful for their mobile devices. The same approaches are used to recommend music and video purchases. Each service offers its own feature set and ranking algorithm. Some, for example, take into account not only an app's popularity but also its media coverage when ranking the apps. Others use a combination of signals, including the apps you own and members' reviews, in their algorithms. AppStoreHQ [3] is a site featuring apps for Android, iPhone, and iPad, and even Web-based HTML apps. There, a user can find which apps are hottest on the Web and on Twitter, and the site offers app reviews. Appolicious [4] ranks and recommends iPhone, iPad, Android, and Yahoo applications using a number of mechanisms, including reviews, likes, and friend recommendations. Users can follow their friends on the site and train the recommendation engine by sharing which apps they already have installed. Then, when signed in, the site can recommend new apps to try based on your preferences, the apps you own, and other signals. Smokin Apps [5] features the top mobile apps for iPhone, Android, Blackberry, Nokia, Palm, and Windows Mobile. Its recommendation engine matches a user with apps he would like, based on apps similar to those he owns, and then combines that information with other members' recommendations using an algorithm that tracks every way users rate apps on the site. Apple has added a Genius recommendation tab to the iPad App Store [6], a mobile version of Apple's Genius feature that suggests applications to users based on their account activity. The most difficult aspect of a recommendation service is understanding the user's preferences and using them intelligently for app filtering. To solve this problem, we define semantic relations between the apps consumed by a specific member and those of his/her social members using an ontology [7]. Based on these relations, we identify the most similar social members through a reasoning process. The reasoning measures the common attributes between the apps consumed by the target member and those of his/her social members: the more attributes they share, the more similar their preferences for consuming apps. Once similar members are identified, the apps consumed by those members are recommended to the target member. The remainder of this paper is organized as follows. Section 2 introduces work related to ours. Section 3 explains our proposed algorithm. We then demonstrate our algorithm with a scenario in Section 4. Finally, we conclude in Section 5.
2 Related Work

According to [8] and [9], personalization techniques are classified into four classes: content based filtering, collaborative filtering, model based techniques, and hybrid
techniques. Content based filtering uses an individual approach that relies on the user's ratings and item descriptions: items with properties similar to items the user rated positively are recommended to the user. The most common problem of content based filtering is the new user problem, which occurs when a new user is added to the system and, having an empty profile, cannot receive recommendations. Collaborative filtering recommends to a target user the preferred content of the group whose content consumption mindset is similar to the user's. Problems in collaborative filtering occur when a new content item is added to the system, because an item cannot take part in personalization without having been rated. It has been attractive for various preference prediction problems such as net-news, e-commerce, digital TV, and digital libraries. Model based techniques are usually implemented using a predetermined model. They improve scalability because part of the data is pre-processed and stored as a model used in the personalization process. Hybrid personalization techniques combine two or more personalization techniques to improve the personalization process; in most cases, content based filtering is combined with collaborative filtering. Traditional personalization techniques can provide suitable solutions for tailoring apps to a user's preferences; on the other hand, they have limited accuracy in modeling user behavior. In this paper, we use the semantic relations among apps based on the ontology concept. In the following section, we demonstrate our reasoning algorithm in detail.
3 Reasoning Algorithm Using Semantic Relations

In this section, we describe the reasoning algorithm that provides personalized apps to each social member in a collaborative manner. The approach rests on the assumption that each member's interests in apps tend to agree with those of his/her social members. We first define the semantic relations between the apps used by a specific (target) member and those used by his/her social members, together with the attributes of each app consumed by each member of the social network. Fig. 1 shows part of the semantic relationships established by interrelating the target member's apps with the apps of his/her social members. The semantic relations are defined by extracting the common instances of each attribute of the apps using WordNet. In the figure, the target member's apps and the apps of the social members are represented on the left and right sides, respectively. Also, the yellow ellipses, blue ellipses, and gray squares denote sub-classes, attribute classes, and attributes, respectively. Due to the diversity of general social concepts, our ontology is built as a multiple hierarchy of classes representing domain concepts such as Food and Social network; each class contains sub-classes such as Recipe. The classes and instances of the apps are linked through their semantic attributes according to their semantic similarity. Based on these links, we can infer the semantic similarity between the target's apps and the apps of the social members. If classes or instances in the target's apps and in the apps of the social members are connected by a link, we consider the apps semantically similar. The more common links
Fig. 1. A part of the semantic relationships established by interrelating the target member's apps and the apps of his/her social members
there are between them, the more similar they are in terms of the semantic relation. Consequently, the more relations we can infer between them (and the more of those relations are established through common attributes, such as Life style of the instances in Fig. 1), the greater their semantic similarity. The same semantic metric also holds through sibling attributes when they have attributes belonging to the same class, such as the class Recipe. For example, assume four apps are consumed: Cookcook TV, Touchring Mobile, Weather Desk, and Word Break. The app Cookcook TV has the instances Life style, Food category, Individual, Recipe, and Korean. The instance Life style is shared with Food&Cafe in a member's app. Also, Recipe is connected to Food&Café and Kimchi, because Recipe belongs to the sub-class of the class Food, which is connected to the instance Food in Kimchi. The instance Food categories is connected to the instance Search, and Search is connected to the instance Company of Smart 114; therefore, Cookcook TV and Smart 114 have an equal-function relationship. Once the semantic relations among the apps consumed by social members are defined as above, then for each target member we can identify the social members with similar preferences for consuming apps by counting the links in the semantic relations: the more links are shared between the target member and a social member, the more similar their app consumption. A sketch of this counting step follows.
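A minimal sketch of this link counting over attribute sets follows; the member names and attributes below are illustrative, loosely echoing Table 1, and attribute sets stand in for the ontology links.

    # Hypothetical sketch: rank social members by the number of app
    # attributes they share with the target member; the best member's
    # apps are then recommended to the target.
    def most_similar_member(target_attrs, members):
        # members: dict mapping member id -> set of app attributes
        scores = {m: len(target_attrs & attrs) for m, attrs in members.items()}
        return max(scores, key=scores.get)

    members = {
        "B": {"study", "vocabulary", "English", "life style"},
        "D": {"life style", "food categories", "cooking method", "coupon service"},
    }
    target = {"life style", "food categories", "cooking method", "English"}
    print(most_similar_member(target, members))  # -> "D" (3 shared vs. 2)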
4 Experiment

In this section, we describe the reasoning process of our recommendation system using a specific scenario. To demonstrate the reasoning process, we first collected 24 apps from a social network of five members, as seen in Table 1.
Table 1. Attributes of apps consumed by members in the social network

Member A
  Apps: Cookcook TV, Touchring Mobile, Weather Desk, Word Break
  Attributes: life style, food categories, cooking method, individual, SMS, call, weather, real-time, temperature, song, rainfall, wordbook, video, education, Korean, Japanese, English

Member B
  Apps: Chinese-character Study, Driver's License Pretest, Sign Language Dictionary, YBM English Dictionary, TOEIC Speaking
  Attributes: Chinese character, a qualifying examination, study, vocabulary, writing, driver's license, pretest, an item pool, sign language, life style, TOEIC, English, Korean

Member C
  Apps: Smart Dial, Kimchi, Clip English, Kakaotalk, Film Lab, sGeoNotes
  Attributes: phone, study, slow food, cooking method, photo, social network, memo, schedules, English

Member D
  Apps: Food&Café, Konan VoiceStar, Kimchi, Smart114
  Attributes: life style, good food restaurant, navigation service, coupon service, food, slow food, side dish, cooking method, vocal mimicry, entertainment, apply, company, productivity, Korean, English

Member E
  Apps: Weather Star, mFAX, One Lock, My Photo Album, iSharing
  Attributes: weather, a weather forecast, smart-card, security, economy, fax, real-time, GPS tracking, location, call, photo, social network, English
From the table, member A shares 3 common attributes with member B, 2 with member C, 7 with member D, and 2 with member E. From this result, we consider member D the most similar to member A, because the number of common attributes shared between members A and D is the largest. Therefore, we recommend the apps consumed by member D to member A, as shown in Fig. 2. Table 2 shows the identified most similar member for each member.
Fig. 2. The recommended apps shown on member A's mobile
Table 2. Identified similar member for consuming apps

Member   Identified similar member
A        D
B        C
C        E
D        A
E        C
5 Conclusion

In this paper, we proposed a recommendation algorithm that uses the semantic relations between apps consumed by social members. To show the feasibility of the algorithm, we developed a prototype personalized app recommendation system using OWL, defining ontology-based semantic relations among the mobile apps consumed by social members. As further work, we need to conduct additional experimental analyses using more data, because the reasoning performance depends on the richness of the data: if the population of a social network is large, the identification performance should improve.
References

1. ReadWriteWeb, http://www.readwriteweb.com/archives/mobile_app_marketplace_175_billion_by_2012.php
2. Liang, T.-P., Yang, Y.-F., Chen, D.-N., Ku, Y.-C.: A Semantic Expansion Approach to Personalized Knowledge Recommendation. ACM Decision Support Systems 45, 401–412 (2008)
3. AppStore HQ, http://www.appstorehq.com/
4. Appolicious, http://www.appolicious.com/
5. Smokin Apps, http://smokinapps.com/
6. Genius Recommendations, http://www.appleinsider.com/articles/10/08/06/apple_adds_genius_recommendation_tab_to_ipad_app_store.html
7. Maedche, A., Staab, S.: Learning Ontologies for the Semantic Web. In: Semantic Web Workshop (2001)
8. Blanco-Fernández, Y., Pazos-Arias, J.J., Gil-Solla, A., Ramos-Cabrer, M., López-Nores, M., García-Duque, J., Fernández-Vilas, A., Díaz-Redondo, R.P.: Exploiting Synergies between Semantic Reasoning and Personalization Strategies in Intelligent Recommender Systems: A Case Study. Elsevier The Journal of Systems and Software 81, 2371–2385 (2008)
9. Vuljanic, D., Rovan, L., Baranovic, M.: Semantically Enhanced Web Personalization Approaches and Techniques. In: Proc. of Intl. Conf. on Information Technology Interfaces, pp. 217–222. IEEE Press, Croatia (2010)
A Comparative Study of Bankruptcy Rules for Load-Shedding Scheme in Agent-Based Microgrid Operation

Hak-Man Kim1 and Tetsuo Kinoshita2

1 Dept. of Electrical Engineering, Univ. of Incheon, Korea
[email protected]
2 Graduate School of Information Science, Tohoku Univ., Japan
[email protected]
Abstract. A microgrid is a small-scale power system composed of distributed generation systems (DGs), such as solar power, wind power, and fuel cells, distributed storage systems (DSs), and loads. It is expected that many microgrids will be introduced into power grids in the near future as clean energy grids. Meeting the rated frequency, 50 or 60 Hz, is an important requirement of microgrid operation. In particular, in the case of a supply shortage in the islanded operation mode, load shedding, an intentional load reduction, is used to maintain the rated frequency. Recently, a load-shedding scheme using bankruptcy rules has been proposed as a reasonable method. In this paper, five well-known bankruptcy rules are compared and discussed for load shedding in islanded microgrid operation.

Keywords: microgrid, microgrid operation, multiagent-based microgrid operation, load-shedding scheme, bankruptcy problem.
1 Introduction

A microgrid, proposed by Prof. Lasseter in 2001 [1], is a small-scale power system composed of distributed generation systems (DGs), such as solar power, wind power, and fuel cells, distributed storage systems (DSs), and loads, as shown in Fig. 1 [2]. The microgrid provides electricity and/or heat to customers such as residential buildings, commercial buildings, public offices, and industrial compounds; in Fig. 1, CHP means combined heat and power and PCC stands for the point of common coupling. Meeting the rated frequency, 50 or 60 Hz, is an important requirement of microgrid operation. In the grid-connected mode, a microgrid can resolve an imbalance through power trades with the upstream power grid; in islanded operation, however, a microgrid must resolve an imbalance without trading power with any grid. In particular, load shedding, an intentional load reduction, is generally used in the case of a supply shortage. Recently, a load-shedding scheme using bankruptcy rules has been proposed by Kim et al. [3-5] as a reasonable method for load shedding. In those works, well-known bankruptcy rules such as the constrained equal awards (CEA) rule, the constrained
Fig. 1. Typical configuration of a microgrid [2]
equal losses (CEL) rule, the Talmud rule, and the random arrival (RA) rule have been applied. In this paper, these bankruptcy rules, together with the proportional rule, are compared and discussed for load shedding.
2 Load Shedding in Islanded Microgrid Operation

An important requirement of microgrid operation is to meet the rated frequency, 50 or 60 Hz, which is closely related to the balance between power supply and demand. A microgrid has two operation modes: grid-connected and islanded. In the grid-connected mode, an imbalance is resolved by trading power with the upstream power grid; in the islanded mode, an imbalance must be resolved without trading power with any grid. In general, an imbalance due to a slight supply shortage can be resolved by discharging the distributed storage systems (DSs). A critical imbalance, however, cannot be resolved by DS discharge alone. In this case, an intentional reduction of load, i.e. load shedding, is used to resolve the critical imbalance. Although load shedding makes consumers uncomfortable, it is the only practical option for meeting the rated frequency, a critical requirement of microgrids. A conventional load-shedding scheme intentionally reduces load amounts in ascending order of load priority; however, this leaves the problem of how to deal with loads having the same priority. Recently, an approach applying bankruptcy rules to load shedding has been proposed by Kim et al. [3-5]. The main idea of the approach is to treat the load-shedding
problem as a bankruptcy problem, which deals with dividing a short estate among claimants. In the approach, well-known bankruptcy rules such as the CEA rule, the CEL rule, the Talmud rule, and the RA rule have been applied, and the results showed the feasibility of load shedding using bankruptcy rules. In this approach, the load-shedding problem is defined as a pair (l, P), where P is the available power and l = (l1, ···, ln) is the vector of claims of loads, such that

0 ≤ l1 ≤ ··· ≤ ln and 0 ≤ P ≤ l1 + ··· + ln.   (1)

The vector of power allocated to each load (l*) is defined by (2),

l* = (l1*, ···, ln*),   (2)

where the available power is allocated by a bankruptcy rule. The vector of load-shedding amounts of each load (s*) is calculated by (3):

s* = (s1*, ···, sn*) = l − l*.   (3)
3 Comparison of Bankruptcy Rules for Load-Shedding Scheme

3.1 Bankruptcy Rules

In this paper, five bankruptcy rules are considered: the proportional rule, the Talmud rule, the CEA rule, the CEL rule, and the RA rule. The details of the rules are described in [6]. In (4)-(8), P is the available power and l = (l1, ···, ln) is the vector of claims of loads. Allocating the short power to loads using the proportional rule is defined as

Pi(l, P) = li* = λ li,   (4)

where λ is chosen so that Σ λ li = P. Allocating the short power to loads using the CEA rule is defined as

CEAi(l, P) = li* = min{li, λ},   (5)

where λ is chosen so that Σ min{li, λ} = P. Allocating the short power to loads using the CEL rule is defined as

CELi(l, P) = li* = max{0, li − λ},   (6)

where λ is chosen so that Σ max{0, li − λ} = P. Allocating the short power to loads using the Talmud rule is defined as

Ti(l, P) = li* = min{li/2, λ},          if Σj (lj/2) ≥ P
Ti(l, P) = li* = li − min{li/2, μ},     otherwise   (7)

where λ is chosen so that Σj min{lj/2, λ} = P and μ is chosen so that Σj [lj − min{lj/2, μ}] = P, respectively.
Allocating the short power to loads using the RA rule is defined as

RAi(l, P) = li* = (1/n!) Σπ min{ li, max{ P − Σ{j∈N: π(j)<π(i)} lj, 0 } },   (8)

where the outer sum runs over all n! orderings π of the loads.
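For illustration, a minimal sketch of the proportional, CEA, and CEL allocations follows; the simple bisection search for λ is an assumption, not the paper's implementation, and the Talmud and RA rules can be built analogously.

    # Hypothetical sketch: compute l* for the proportional, CEA, and CEL
    # rules; the load-shedding vector is then s* = l - l*, as in Eq. (3).

    def proportional(claims, P):
        lam = P / sum(claims)
        return [lam * l for l in claims]

    def _find_lambda(award, claims, P, increasing):
        # Bisection for the lambda that makes the awards sum to P.
        lo, hi = 0.0, max(claims)
        for _ in range(60):
            mid = (lo + hi) / 2.0
            total = sum(award(l, mid) for l in claims)
            if (total < P) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    def cea(claims, P):
        # Eq. (5): min{l_i, lambda}; the total grows as lambda grows.
        lam = _find_lambda(lambda l, x: min(l, x), claims, P, increasing=True)
        return [min(l, lam) for l in claims]

    def cel(claims, P):
        # Eq. (6): max{0, l_i - lambda}; the total shrinks as lambda grows.
        lam = _find_lambda(lambda l, x: max(0.0, l - x), claims, P, increasing=False)
        return [max(0.0, l - lam) for l in claims]

    claims = [100.0, 200.0, 300.0]
    print(cea(claims, 450.0))  # ~ [100.0, 175.0, 175.0]
    print(cel(claims, 450.0))  # ~ [50.0, 150.0, 250.0]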
3.2 Test Conditions and Results

The test results for the four rules other than the proportional rule come from previous works [3-5]; the proportional rule is tested additionally in this paper in order to compare the features of all five bankruptcy rules. The multiagent-based islanded microgrid constructed for the tests is shown in Fig. 2, where IMG means the islanded microgrid. The multiagent system is composed of an MGOCC agent (AgMGOCC), three DG agents (AgDG1, AgDG2, AgDG3), a DS agent (AgDS), and ten load agents (AgL1, ..., AgL10). Detailed information on the multiagent-system-based microgrid is described in references [3-5].
Fig. 2. Configuration of multiagent-based islanded microgrid for load-shedding tests
Figures 3-7 show the results of load shedding using the proportional rule, the CEA rule, the CEL rule, the Talmud rule, and the RA rule, respectively. Under the proportional rule, as shown in Fig. 3, load shedding is performed proportionally within each load agent's power claim. Under the CEA rule, as shown in Fig. 4, equal power is allocated to all load agents in a serious power shortage of 1,100 kWh; in small supply shortages, on the other hand, load agents claiming a lot of power take charge of the load shedding while load agents claiming little power are excluded from it. Under the CEL rule, as shown in Fig. 5, equal load shedding is applied to the load agents not limited by their power claims, while the load agents limited by their claims shed their entire claimed amounts, for power shortages from 1,100 kWh down to 300 kWh in the figure. For power shortages from 300 kWh down to 100 kWh, equal load shedding is applied to each load agent, with the amount imposed evenly on each agent being less than or equal to the load amount of L1, the smallest load. Under the Talmud rule, as shown in Fig. 6, the shortage is allocated to all load agents almost proportionally within their load claims in serious power shortages; in small supply shortages, the shortage is allocated to all load agents evenly.
Fig. 3. Load-shedding based on the proportional rule
Fig. 4. Load-shedding based on the CEA rule
Fig. 5. Load-shedding based on the CEL rule
Under the RA rule, as shown in Fig. 7, load shedding is performed proportionally within each load agent's power claim; this result is similar to that of the proportional rule.
Fig. 6. Load-shedding based on the Talmud rule
Fig. 7. Load-shedding based on the RA rule
3.3 Comparison of Test Results

From the results of the load-shedding tests using the five bankruptcy rules, shown in Figs. 3-7, the following features are summarized in Table 1.

Table 1. Features of load-shedding according to bankruptcy rules

Scheme             Serious supply shortage                                      Small supply shortage
CEA rule           Load-shedding allocating the same amount within load claim   Small loads are ruled out
CEL rule           Equal load-shedding within load claim                        Equal load-shedding
Talmud rule        Almost proportional load-shedding within load claim          Equal load-shedding
Prop. & RA rules   Proportional load-shedding within load claim                 Proportional load-shedding within load claim
Additionally, the processing time of the load-shedding schemes using the five bankruptcy rules is considered. The results are summarized in Table 2, where M and N are the sizes of the for-loops in the code for the five rules. In particular, the proportional rule takes a short time, while the RA rule can take a very long time as the number of loads increases. Therefore, the RA rule is considered unsuitable for load-shedding applications in practice.

Table 2. Results of processing time analysis
Rules              Processing time
Proportional rule  O(N)
CEA rule           O(M) × O(N)
CEL rule           O(M) × O(N)
Talmud rule        O(M) × O(N)
RA rule            O(N!)
It is anticipated that supply shortages will not be serious in an islanded microgrid, because the total amount and pattern of loads are considered when deciding the total capacity of the DGs. Therefore, we take small supply shortages into account when comparing features across the bankruptcy rules. For the comparison, we consider the following indices:

- The number of loads involved in load shedding and the number of distinct load-shedding amounts are considered as an index for operation.
- Proportional distribution is considered as an index for fairness.
- Processing time is considered for real applications.

Table 3 shows the result of the comparison by these indices.

Table 3. Features of bankruptcy rules for load-shedding applications
Scheme             Operation  Fairness  Processing time
Proportional rule  Complex    High      Short
CEA rule           Simple     Low       Acceptable
CEL rule           Simple     Medium    Acceptable
Talmud rule        Medium     Medium    Acceptable
RA rule            Complex    High      Very long
From Table 3, we can see that the proportional rule, the CEA rule, the CEL rule, and the Talmud rule can all be acceptable, although their features differ, because their index ratings are fairly even. However, the proportional rule is complex on the
operation side. In addition, the RA rule is not acceptable, as mentioned before. Although there is some room for improvement on the indices, especially fairness, the CEA rule, the CEL rule, and the Talmud rule can be recommended as acceptable for operation.
4 Conclusion

In this paper, we compared the features of load-shedding schemes based on five bankruptcy rules from the viewpoints of load-shedding pattern, microgrid operation, fairness, and processing time. Fairness was defined roughly as proportional distribution, but this concept leaves room for improvement. Although defining fairness for load shedding is not simple, since the topic touches many areas such as philosophy and economics, an effective fairness index for load shedding is required.
References

1. Lasseter, R.H.: Microgrids. IEEE Power Engineering Society Winter Meeting 1, 146–149 (2001)
2. Katiraei, F., Iravani, R., Hatziargyriou, N., Dimeas, A.: Microgrids Management - Controls and Operation Aspects of Microgrids. IEEE Power Energy 6(3), 54–65 (2008)
3. Kim, H.-M., Kinoshita, T., Lim, Y., Kim, T.-H.: A Bankruptcy Problem Approach to Load-shedding in Multiagent-based Microgrid Operation. Sensors 10(10), 8888–8898 (2010)
4. Kim, H.-M., Kinoshita, T., Lim, Y., Kim, T.-H.: Bankruptcy Problem Approach to Load-shedding in Agent-based Microgrid Operation. In: Kim, T.-H., Stoica, A., Chang, R.-S. (eds.) SUComS 2010. Communications in Computer and Information Science, vol. 78, pp. 621–628. Springer, Heidelberg (2010)
5. Kim, H.-M., Kinoshita, T., Lim, Y.: Talmudic Approach to Load-shedding of Islanded Microgrid Operation based on Multiagent System. Journal of Electrical Engineering & Technology 6(2), 284–292 (2011)
6. Thomson, W.: Axiomatic and Game-theoretic Analysis of Bankruptcy and Taxation Problems: A Survey. Mathematical Social Science 45, 249–297 (2003)
7. Kim, H.-M., Kinoshita, T., Shin, M.-C.: A Multiagent System for Autonomous Operation of Islanded Microgrids Based on a Power Market Environment. Energies 3(12), 1972–1990 (2010)
8. Kim, H.-M., Kinoshita, T.: A Multiagent System for Microgrid Operation in the Grid-interconnected Mode. Journal of Electrical Engineering & Technology 5(2), 246–254 (2010)
9. Kim, H.-M., Kinoshita, T.: Multiagent System for Microgrid Operation based on a Power Market Environment. In: INTELEC 2009, Incheon, Korea (October 2009)
10. Aumann, R.J., Maschler, M.: Game Theoretic Analysis of a Bankruptcy Problem from the Talmud. Journal of Economic Theory 36, 195–213 (1985)
11. Kinoshita, T. (ed.): Building Agent-based Systems. The Institute of Electronics, Information and Communication Engineers (IEICE), Japan (2001) (in Japanese)
The Visualization Tool of the Open-Source Based for Flight Waypoint Tracking

Myeong-Chul Park1 and Seok-Wun Ha2

1 Songho College, Namsanri, HoengseongEup, HoengseongGun 225-704, Republic of Korea
[email protected]
2 Dept. of Informatics, Gyeongsang National University, 900 Gajwadong, Jinju 660-701, Republic of Korea
[email protected]
Abstract. Flight waypoint visualization is widely used to address the threats posed by low-altitude tasks and terrain altitude. However, building such a system around GPS is difficult and restrictive because huge amounts of terrain information must be constructed. In this paper, we propose an economical open-source-based moving-map system for flight waypoint visualization. First, synthetic flight path information is acquired from X-Plane via UDP, and terrain altitude information for the corresponding locations is obtained from Google Earth. The moving-map system maps the aircraft's altitude and current location onto map data retrieved from a map server and visualizes the waypoints. In addition, by comparing the terrain altitude and the aircraft's altitude over time, the monitoring screen marks terrain locations on the projected course that pose a collision risk. The outcome of this paper can be used as an economical flight waypoint visualization tool for purposes such as flight algorithm research and flight visualization.

Keywords: Flight Waypoint, Flight Visualization, Open Source, Moving Map.
1 Introduction

The aim of flight visualization is to confront more effectively the various threats from the air and sudden changes of situation [1-2]. In related studies, research has been conducted on automatically creating waypoints that take terrain altitude information into account, and on optimization techniques for avoiding multiple threats [3-4]. Most of this research, however, concentrates on creating the aircraft's course and waypoints. To present the created waypoints realistically to the user, a moving-map system built from map and terrain information is used as the tool that guides the flight path. Most moving-map systems must construct huge amounts of terrain information in a database, a restriction that makes them hard to apply to games or to the unit systems used for research purposes.
This study proposes constructing an economical moving map for flight waypoint visualization using various open-source software packages. A flight simulator is used because the information necessary for flight visualization is difficult to obtain in actual flight. The flight simulator data is received from X-Plane in UDP form. Location information is then passed to Google Earth, which displays it and returns the terrain altitude of the corresponding location to the moving-map system. The system acquires the aircraft's altitude and vertical velocity (ascent and descent rate) and uses them, together with the forecast direction information, to obtain the terrain altitude along the projected course. The final information is rendered on the map through a visualization module based on the OpenGL libraries, with all map data transmitted in real time from the MapServer of ArcGIS [5]. Because the map information is delivered through a web database, some time delay occurs; however, this is not a serious problem, since the implemented tool is not a specialist system demanding real-time operation.
2 Background

A previous work announced flight-path visualization using state transition information without terrain information [6], as Fig. 1 shows. However, that system is limited: it presents only the aircraft's location information as a flight path and cannot render the flight realistically.
Fig. 1. Visual Information of Flight Path
Fig. 2 (A) shows a tool that visualizes actual flight information through Google Earth alongside flight-condition and instrument information [1]. Fig. 2 (B) shows a three-dimensional visualization tool introduced for flight situational awareness [2]. These tools, however, only display a simple moving course on a unit map and do not indicate the threatening elements implied by terrain altitude information. In brief, existing tools express the result of a flight visually but do not provide data an actual pilot can refer to, whereas the flight waypoints proposed in this paper can be applied both to post-flight evaluation and to reference during flight.
Fig. 2. Implementation of the Flight Information Visualization System (A) and 3D Visualization for Flight Situational Awareness (B)
3 A Visualization Tool With whole constructional drawing Fig. 3 of the system which it proposes from the present paper it is same. nGlGpGt
Fig. 3. System Architecture
First, we describe the design of the UDP packet engine that acquires simulated flight data from X-Plane, and then explain the structure of the Moving Map system. Finally, we verify the tool's effectiveness and operating conditions by comparing the displayed screen with actual data in a practical example.
3.1 Tool Design
The simulated flight data is transmitted to the outside by an X-Plane plug-in; the UDP module that acquires this data is designed as shown in Fig. 4.
Fig. 4. Design of UDP Engine
The UDP engine consists of a module that starts the UDP server and a module that analyzes the received packets. Each packet transmitted from X-Plane carries data identified by an index and is received in UDP form (Table 1). One packet contains up to eight independent data items, and each item is defined as a 4-byte real-type variable.

Table 1. UDP packet items of X-Plane 9

Packet No.   Information                Index
UDP 01       Times                      0
UDP 03       Aircraft Speeds            0
UDP 04       Vertical Speed Indicator   1
UDP 18       Pitch, Roll, Headings      0,1,2
UDP 20       Lat, Lon, Altitude         0,1,2
UDP 37       Engine RPM                 0
UDP 41,42    N1, N2                     0
The structure used for the received UDP packet format is as follows.

Structure type for the UDP packet format

typedef struct data_struct {
    int   index;     /* identifies which UDP packet item this is */
    float data[8];   /* up to eight 4-byte real values per packet */
} X9_UDP;

The UDP packet information stored in this structure is delivered to the flight path creation module for visualization, as shown in Fig. 5. This module compares the altitude of the aircraft with the terrain, using the map information brought from MapServer and the vertical speed indicator. The vertical velocity can be computed as (Aircraft Speed * 6075.62963 / 60) * tan θ, where θ is the desired climb or descent angle and the constant converts speed in knots to feet per minute; in practice, however, the tool uses the VSI (Vertical Speed Indicator) value delivered by the UDP engine.
Fig. 5. Design of Visualization Engine
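To make the packet-analysis step concrete, the following sketch (ours, not the paper's code) shows how one received datagram could be copied into the X9_UDP structure defined above. The 5-byte "DATA" header assumed here follows X-Plane's commonly documented UDP output format, and receive_packet and sock are hypothetical names; a real datagram may carry several 36-byte groups, of which only the first is read.

#include <string.h>
#include <sys/socket.h>

void receive_packet(int sock, X9_UDP *out)
{
    unsigned char buf[1024];
    ssize_t n = recvfrom(sock, buf, sizeof(buf), 0, NULL, NULL);
    /* "DATA" + 1 internal byte = 5-byte header, then a 4-byte index
       followed by eight 4-byte floats (36 bytes per group) */
    if (n >= 5 + (ssize_t)sizeof(X9_UDP))
        memcpy(out, buf + 5, sizeof(X9_UDP));
}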
The flight waypoints are displayed as quadrangular boxes, as shown in Fig. 6, representing both the course the aircraft has passed and the predicted course it will pass, using the Heading, Pitch, and Roll values. Heading provides the direction of travel, Pitch the up-and-down motion, and Roll the left-and-right rotation. The blue dotted line predicts the progress course of the aircraft using the vertical speed indicator information.
Fig. 6. Flight Waypoint Tracking
The instruments are drawn with the OpenGL library, using texture mapping and blending to achieve a realistic appearance. Texture mapping is a useful technique for expressing simple figures realistically: the user maps a desired image onto a polygon. If a specific image is simply drawn on the screen, however, it is piled on top of the original image and erases it. In most flight instruments only the inner part moves, while the outer image is fixed. To achieve this, a blending method is used in which the flight instrument is a circular image surrounded by black. When this image is overlapped onto the background image, its RGB values are added to those of the background, so the black part has no effect on the background image. But when the surroundings are not perfectly black, the colors of the circular image and the background image bleed into each other. To handle this problem, a mask image is added so that the affected part of the background is first turned black: the mask image has a white background and a black circular region, and it is combined with the background by a logical AND operation; the circular instrument image is then combined by an OR operation, producing the desired instrument display. Fig. 7 (A) and (B) show the background image and the foreground image used for blending, and Fig. 7 (C) shows the image created by the final blending method.
(A) Background Image
(B) Foreground Image
(C) Result
Fig. 7. Images for Blending
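As an aside (not taken from the paper's code), this AND-then-OR mask compositing is commonly realized in legacy fixed-function OpenGL with two blended draws; mask_tex, instrument_tex, and draw_instrument_quad() below are hypothetical names.

/* Pass 1 (logical AND): multiply the background by the mask image,
   which is white outside and black inside the circular region, so the
   destination pixels under the instrument become black. */
glEnable(GL_BLEND);
glBlendFunc(GL_ZERO, GL_SRC_COLOR);      /* dst = dst * mask */
glBindTexture(GL_TEXTURE_2D, mask_tex);
draw_instrument_quad();

/* Pass 2 (logical OR): add the instrument image, which is black outside
   the circular region, so the background is preserved there and only
   the instrument itself is written inside. */
glBlendFunc(GL_ONE, GL_ONE);             /* dst = dst + instrument */
glBindTexture(GL_TEXTURE_2D, instrument_tex);
draw_instrument_quad();

glDisable(GL_BLEND);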
3.2 Tool Implementation
Fig. 8 shows the whole cockpit and instrument panel of the implemented aircraft. It is composed of an MFD (Multi-Function Display) on the left, map information on the right, and a PFD (Primary Flight Display) in the center, each driving its own instrument panel. Actual images are used to enhance the realism.
Fig. 8. Integration of Simulation Instrument in Cockpit and Google Earth
Texture mapping is performed by computing rotated coordinates around the central coordinate according to the Roll and Pitch values and mapping the corresponding region of the image. Because the rotation function provided by OpenGL affects other image regions as well, the rotated coordinates are computed separately and the corresponding region is extracted from the image for mapping. The arrangement of the mapping coordinates is shown in Fig. 9. The following code, part of the texture mapping implementation, renders the MFD at the top left of the instrument panel.

Implementation of the MFD using rotated coordinates

glBindTexture(GL_TEXTURE_2D, texture[0]);
glBegin(GL_QUADS);
/* bottom-left corner: rotate (sm_X, sm_Y) around the center (Cen, Cen)
   by roll, then shift vertically by pitch */
x1 = (sm_X - Cen); y1 = (sm_Y - Cen);
x2 = x1 * cos(roll) - y1 * sin(roll) + Cen;
y2 = x1 * sin(roll) + y1 * cos(roll) + Cen - pitch;
glTexCoord2f(x2, y2); glVertex3f(x, y, 0.0f);
/* bottom-right corner */
x1 = (la_X - Cen); y1 = (sm_Y - Cen);
x2 = x1 * cos(roll) - y1 * sin(roll) + Cen;
y2 = x1 * sin(roll) + y1 * cos(roll) + Cen - pitch;
glTexCoord2f(x2, y2); glVertex3f(x + x_s, y, 0.0f);
/* top-right corner */
x1 = (la_X - Cen); y1 = (la_Y - Cen);
x2 = x1 * cos(roll) - y1 * sin(roll) + Cen;
y2 = x1 * sin(roll) + y1 * cos(roll) + Cen - pitch;
glTexCoord2f(x2, y2); glVertex3f(x + x_s, y + y_s, 0.0f);
/* top-left corner */
x1 = (sm_X - Cen); y1 = (la_Y - Cen);
x2 = x1 * cos(roll) - y1 * sin(roll) + Cen;
y2 = x1 * sin(roll) + y1 * cos(roll) + Cen - pitch;
glTexCoord2f(x2, y2); glVertex3f(x, y + y_s, 0.0f);
glEnd();
The real-time map shown on the right side is divided into 10 levels, with an on-line information display. The map information is brought from the MapServer of ArcGIS and supports representative scales from 1:73,874,399 down to 1:144,285. Pressing the up and down direction keys automatically changes the scale level, and the corresponding level is indicated on the right side of the map. The longitude and latitude of the current flight position are placed at the center of the map; at the maximum level (level 10) the map is divided into 2,097,152 parts, which are arranged on the screen so that the map image is positioned correctly at the corresponding longitude and latitude.
Fig. 9. Texture Mapping for Rotary Coordinate
The following source code performs the texture mapping of the real-time map.

Function for texture mapping of the map

void textures_run() {
    int i, j, no = 1;
    /* tile indices of the current longitude/latitude */
    x_st = abs((int)(((-180) - lon) / zoom_ra));
    y_st = abs((int)((lat - 90) / zoom_ra));
    /* load the 3x3 block of tiles around the current tile */
    for (i = -1; i <= 1; i++)
        for (j = -1; j <= 1; j++, no++) {
            sprintf(str1, "%d/%d/%d.jpg", zoom, y_st + i, x_st + j);
            BuildTexture(str1, texture[no]);
        }
}

As the code shows, a total of nine tile images are loaded from the web around the tile containing the map's central coordinate: three in the longitude direction and three in the latitude direction. It is impossible to load all the images of the maximum level at once, since there are 2,097,152 of them. Therefore, whenever the central coordinate leaves the territory of the current image, a new set of nine images is loaded and mapped. Assuming nine images are mapped and the tile containing the current longitude and latitude is always (i, j) (Fig. 10 A), the corresponding coordinate is placed at the center of the map and the remaining images are each mapped, partially, to their corresponding territories. For example, if the current flight position lies in the lower-right part of tile (i, j), the arrangement of the map becomes that of Fig. 10 (B).
i-1, j-1    i-1, j    i-1, j+1
i, j-1      i, j      i, j+1
i+1, j-1    i+1, j    i+1, j+1
(A) Territory of image
(B) Example of image arrangement
Fig. 10. Arrangement of Map Image
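A hedged sketch (ours, not the paper's code) of the re-centering check described above: the 3x3 tile block is reloaded only when the aircraft's position moves into a different center tile. It reuses the globals of textures_run(); update_map_if_needed, lat_now, and lon_now are hypothetical names.

void update_map_if_needed(double lat_now, double lon_now)
{
    int nx = abs((int)(((-180) - lon_now) / zoom_ra));
    int ny = abs((int)((lat_now - 90) / zoom_ra));
    if (nx != x_st || ny != y_st) {   /* central coordinate left tile (i, j) */
        lon = lon_now;                /* update the globals textures_run reads */
        lat = lat_now;
        textures_run();               /* fetch and map the new 3x3 block */
    }
}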
Fig. 11 shows the resulting Moving Map screen for a flight over the Yosu area of Kyongsang-Namdo, together with the flight path. The black line traces the flight waypoints, the blue area shows the predicted direction of travel, and the lower part of the display shows the altitude information of the terrain below the aircraft. Altitude regions in the aircraft's direction of travel that carry a danger of collision are displayed in red to warn of the hazard.
Fig. 11. Moving Map and Flight Waypoint Tracking
4 Conclusion
This paper implements an open-source-based Moving Map system. Through flight waypoint tracking, it provides a visualization tool for course search and collision avoidance of the aircraft. It also provides an instrument panel that expresses the aircraft's instrument information intuitively. The system can serve as an economical tool, and in future research we plan to continue this work toward a flight simulation tool that integrates the existing research results.
References 1. Park, M., Hur, H.: Implementation of the Flight Information Visualization System using Google Earth. Journal of the Korea Society of Computer and Information 15(10), 79–86 (2010) 2. Park, S., Park, M.: 3D Visualization for Flight Situational Awareness using Google Earth. Journal of the Korea Society of Computer and Information 15(12), 181–188 (2010) 3. Park, J., Park, S., Ryoo, C., Shin, S.: A Study on the Algorithm for Automatic Generation of Optimal Waypoint with Terrain Avoidance. Journal of the Korea Society for Aeronautical and Space Sciences 37(11), 1104–1111 (2009) 4. Kim, B., Ryoo, C., Bang, H., Chung, E.: Optimal Path Planning for UAVs under Multiple Ground Threats. Journal of the Korea Society for Aeronautical and Space Sciences 34(1), 74–80 (2006) 5. World Street Map, http://www.arcgis.com 6. Song, J., Park, T., Kim, J., Choy, Y.: Flight Path Visualization Using State Transition Information of Track. In: Proc. of the 34th KIISE Fall Conference, KIISE, vol. 34(2), pp. 172–177 (2007)
Space-Efficient On-the-fly Race Detection Using Loop Splitting

Yong-Cheol Kim 1, Sang-Soo Jun 2, and Yong-Kee Jun 3

1 Dept. of Computer Engineering, International Univ. of Korea, Jinju, South Korea
2 Dept. of Computer Engineering, Kyung Hee Univ., Seoul, South Korea
3 Dept. of Informatics, Gyeongsang National Univ., Jinju, South Korea
[email protected],
[email protected],
[email protected]
Abstract. Detecting races is important for debugging shared-memory parallel programs, because the races result in unintended nondeterministic executions of the programs. Previous on-the-fly techniques to detect races in parallel programs with general inter-thread coordination show serious space overhead, which is dependent on the maximum parallelism of the program. This paper proposes a two-pass algorithm which splits a parallel loop with just one event variable into a series of two serializable loops, preserving the semantics of the original program. The first serializable loop contains all the original dynamic blocks which are executed before the first wait operation in every thread, and the second serializable loop contains all the original dynamic blocks which are executed after the first wait operation in every thread.
Keywords: parallel program, inter-thread coordination, on-the-fly race detection, space efficiency, two-pass loop splitting, serializable loop.
1 Introduction
It is inherently more difficult to write and debug parallel programs [3, 16] than sequential ones. Because the instruction streams in a parallel execution can run simultaneously, it is difficult to obtain precise information on the execution state and flow of the parallel program. In particular, shared-memory parallel programs may have a special kind of bug called data races or access anomalies, in short races. A race appears when two instructions in different parallel threads access a shared variable without proper inter-thread coordination and at least one of the accesses is a write to the variable.

This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2010-(C1090-1031-0007)).

Detecting races is important for debugging shared-memory parallel programs, because the races result in unintended non-deterministic executions of the program, which makes debugging more difficult. To detect races, on-the-fly detection [5, 8, 15, 18] instruments the program to be debugged and monitors an execution of the program. The monitoring process reports the races which occur during the monitored execution. This approach can be a complement to static analysis [1, 4, 6, 17], because on-the-fly detection can be used to identify realistic races among the potential races reported by static analysis approaches. Although on-the-fly detection may not report as many races as post-mortem detection [2, 12-14], it guarantees to report at least one race for each variable involved in a race. On-the-fly detection requires less storage space than post-mortem detection, because much of the information collected by the monitoring process can be discarded as the execution progresses. However, this technique still shows serious space overhead for parallel programs with inter-thread coordination, because in the worst case it must maintain information as large as the maximum parallelism of the execution. Our primary interest is to restructure parallel programs into serializable programs for sequential monitoring, preserving the semantics of the original program. This paper develops the principle we suggested previously [10] by proposing a two-pass algorithm which splits a parallel loop with just one event variable into a series of two serializable loops, preserving the semantics of the original program. The first serializable loop contains all the original dynamic blocks which are executed before the first wait operation in every thread, and the second serializable loop contains all the original dynamic blocks which are executed after the first wait operation in every thread. We assume that the parallel program is already instrumented with additional code, including the labelling functions for on-the-fly race detection. We formulate our problem and motivation, including the related work, in the next section, describe the algorithm of our solution in Section 3, and conclude in the final section.
2 Background
Shared-memory parallel programs may have two parallel constructs [3, 16]: parallel loops and parallel sections. Although we use parallel loops such as the one shown in Figure 1, due to the restricted space of this paper, our technique can also be applied to parallel sections. In an execution of a parallel loop, multiple threads of control are created at a parallel do statement and terminated at the corresponding end parallel do statement. Parallel constructs may be nested inside parallel constructs. Branches are not allowed from within a parallel construct to outside the construct or vice versa. We assume that all inter-thread coordination is provided via event variables. An event variable is always in one of two states: clear or posted.
0: parallel do i = 1, 3
1:   B1
2:   if C1 then wait E
3:   B2
4:   if C2 then wait E
5:   B3
6: end parallel do

Fig. 1. A parallel loop
The initial value of an event variable is always clear. The value of an event variable can be set to posted with a post statement. The value of an event variable can be tested with a wait statement. A wait statement suspends execution of the thread which executes it until the specified event variable's value is set to posted. On the other hand, the posting thread may proceed immediately. A clear statement resets the value of an event variable to clear. We assume that there is just one event variable for inter-thread coordination and no clear statements in the programs considered in this paper. A static block is a structured block of statements which must not contain any wait statement or any external control flow into or out of the static block. Note that a static block may contain post statements. A dynamic block is a sequence of instructions from a static block executed by a single thread. Figure 1 shows an example parallel loop considered in this paper, where E is an event variable, Bi is the i-th static block, and Ci is the boolean condition of the logical IF statement which contains a wait operation. In an execution of a parallel program, two accesses to a shared variable are conflicting if at least one of them is a write. If two accesses executed concurrently in two different dynamic blocks are conflicting, then these accesses constitute a race. On-the-fly race detection instruments additional code into the debugged program to monitor and detect races during an execution of the program. In order to detect races on the fly, one needs to determine if a dynamic block's access to a shared variable is logically concurrent and conflicting with any previous access to the same variable and thus results in a race. This requires the detection protocol to monitor the dynamic blocks that perform accesses to shared variables and to maintain an access history for each shared variable during the execution of the program. Whenever an access to a shared variable occurs, the logical concurrency should be determined between the current access and every previous access in the access history of the variable. The logical concurrency between two accesses is determined from the concurrency information on the two dynamic blocks that perform the accesses, which is called block labels [5]. The labels are generated on each creation or termination of dynamic blocks, and may be stored in the access histories of shared variables. The on-the-fly analysis, however, still has a large space overhead.
The storage space required consists of two components: one is the space to maintain access histories for all shared variables, and the other is the space to maintain block labels of simultaneously active dynamic blocks. For example, the worst-case space complexity of Task Recycling [5] is O(VT + T^2), where V is the number of monitored variables and T is the maximum parallelism of the debugged program. To eliminate one of these two components, our technique resorts to sequential monitoring of the program execution, because on-the-fly race detection reduces to checking the logical concurrency between two blocks regardless of thread scheduling. Under sequential monitoring, the worst-case space complexity of Task Recycling reduces to O(VT), because labels are not required to be maintained for simultaneously active dynamic blocks. The sequential execution of the program is undefined, or deadlocked, if a wait statement is executed and the corresponding event variable has not already been posted. If a program's sequential execution is defined for all input data sets, the program is serializable. Our primary interest is to restructure parallel programs into serializable programs for sequential monitoring, preserving the semantics of the original program.
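As an illustration beyond the paper's own notation, the post/wait/clear semantics of an event variable described in this section could be realized in C with POSIX threads roughly as follows; the event_t type and all function names are ours, not part of the paper.

#include <pthread.h>
#include <stdbool.h>

/* An event variable is always 'clear' or 'posted'; initially clear. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            posted;
} event_t;

void event_init(event_t *e) {
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->posted = false;                  /* initial value is clear */
}

/* post: set the event to posted; the posting thread proceeds at once */
void event_post(event_t *e) {
    pthread_mutex_lock(&e->lock);
    e->posted = true;
    pthread_cond_broadcast(&e->cond);   /* wake every waiting thread */
    pthread_mutex_unlock(&e->lock);
}

/* wait: suspend the calling thread until the event is posted */
void event_wait(event_t *e) {
    pthread_mutex_lock(&e->lock);
    while (!e->posted)
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
}

/* clear: reset the event (excluded from the programs considered here) */
void event_clear(event_t *e) {
    pthread_mutex_lock(&e->lock);
    e->posted = false;
    pthread_mutex_unlock(&e->lock);
}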
3 The Loop Splitting Algorithm
To make a parallel loop serializable, that is, not deadlocked, we should make all the post operations execute before any wait operation on the same event variable. The main idea of our technique is to split every original parallel loop with inter-thread coordination into two serializable loops: one loop only for post operations and the other only for wait operations. The first serializable loop is called the before-wait loop and contains all the dynamic blocks executed before the first wait operation in every thread; the other is called the after-wait loop and contains all the dynamic blocks executed after the first wait operation in every thread. This technique therefore partitions the set of all dynamic blocks appearing in the execution of the original loop into two sets of dynamic blocks which are defined in sequential execution. In the before-wait loop, each thread is immediately terminated whenever it arrives at the first wait operation of the original loop. The remaining part of the thread continues to execute in the corresponding after-wait loop. For example,
0: parallel do i = 1, 3
1:   B1
2:   if C1 then goto 100
3:   B2
4:   if C2 then goto 100
5:   B3
6: 100 continue
7: end parallel do

Fig. 2. The before-wait loop
consider the before-wait loop shown in Figure 2, which is constructed directly from the original loop shown in Figure 1 by replacing each wait statement with the predefined special goto statement and adding a continue statement for the gotos. This restructuring technique requires every wait statement in the original loop to be in one logical if statement; we delegate this to the instrumentation phase that transforms the program into a canonical form. Note that the before-wait loop does not contain any wait statement, but may contain post statements in its static blocks. This means that the before-wait loop does not perform any wait operation, but performs all post operations that appear before the first wait operation in each thread. We construct this before-wait loop in the first pass of our algorithm as follows.

read a Term from the original program
while (Get-Term(T) == not null) {
    if (T == WAIT) {
        replace the wait statement with a goto statement
        replace the event variable with a number
        write the modified Term
    }
    if (T == LOOP-END) {
        get the event number
        write "<event number> continue"
    }
    read a Term from the original program
}

In the after-wait loop, each thread is created to continue immediately from the first wait operation of the original loop. For example, consider the after-wait loop shown in Figure 3, which is constructed directly from the original loop shown in Figure 1 by eliminating the first static block and restructuring each wait statement so that it is executed only if it is the first wait operation in its thread. The restructured code sets a boolean variable waited if the wait operation is the first in the thread, which makes the thread continue to execute the next static block.
0: parallel do i = 1, 3
1:   if C1 then
2:     wait E
3:     waited = .true.
4:   endif
5:   if .not. waited then goto 10
6:   B2
7:   10 continue
8:   if C2 then
9:     wait E
10:     waited = .true.
11:   endif
12:   if .not. waited then goto 20
13:   B3
14:   20 continue
15: end parallel do

Fig. 3. The after-wait loop
Otherwise, it performs the predefined special goto statement to execute the corresponding continue statement, which is inserted immediately before the next wait statement. We construct this after-wait loop in the second pass of our algorithm as follows.

read a Term from the original program
while (Get-Term(T) == not null) {
    if (T == WAIT-IF) {
        set the After-wait flag to true
        add "<number> continue"
        write the Term with "waited = .true." added
        add "if .not. waited then goto <number>"
    }
    if (T == LOOP-END) {
        determine a statement number
        add "<number> continue"
        write the Term
    }
    if (After-wait == true) {
        write the Term
    }
    read a Term from the original program
}

Although the partial order of the restructured program's execution is different from that of the original program's execution, this poses no problem for monitoring the sequential execution to detect races in the original program. Note that we construct a serializable program by restructuring a parallel program which is already instrumented; the instrumented program is equipped with additional code, including the labelling functions for on-the-fly race detection. In this approach, neither the end parallel do statement of the before-wait loop nor the parallel do statement of the after-wait loop is instrumented. This implies that the monitored sequential execution of the restructured program generates the same block labels that are generated in the monitored parallel execution of the original program, preserving the semantics of the original program for sequential monitoring. For on-the-fly race detection of the restructured program, therefore, we can apply any existing scheme for not only block labelling but also the detection protocol, which makes our technique general.
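As a schematic rendering (ours, not from the paper) of why the split program cannot deadlock under sequential monitoring, assuming the original loop is deadlock-free and uses a single event variable: every iteration's before-wait part runs to completion, performing all early post operations, before any after-wait part runs, so each wait finds its event already posted. The function names below are hypothetical stand-ins for the two loop bodies.

void run_split_sequentially(int iterations)
{
    /* before-wait loop: each iteration stops at its first wait */
    for (int i = 0; i < iterations; i++)
        before_wait_body(i);   /* may post events, never waits */

    /* after-wait loop: each iteration resumes from its first wait */
    for (int i = 0; i < iterations; i++)
        after_wait_body(i);    /* every wait now succeeds immediately */
}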
4 Conclusion
On-the-fly techniques to detect races still show serious space overhead for parallel programs with inter-thread coordination, because in the worst case they must maintain information as large as the maximum parallelism of the execution. This paper proposes a two-pass algorithm which restructures a parallel program with just one event variable to be serializable for sequential
monitoring by splitting a parallel loop into a series of two serializable loops, preserving the semantics of the original program. A previous work [10] presents a principle for extending this technique to programs with multiple event variables. This technique especially allows us to use a two-pass on-the-fly algorithm [15] also for detecting the first races [7, 8, 15] in shared-memory parallel programs with general inter-thread coordination. This is because that on-the-fly algorithm works for a special class of parallel programs which may have ordered synchronization, in which any pair of corresponding coordination points is executed in an ordered sequence, including a sequential execution of these points. The first races are important in debugging, because the removal of such races may make other races disappear.
References 1. Callahan, D., Kennedy, K., Subhlok, J.: Analysis of Event Synchronization in a Parallel Programming Tool. In: 2nd Symposium on Principles and Practice of Parallel Programming, pp. 21–30. ACM, New York (1990) 2. Chen, F., Serbanuta, T.F., Rosu, G.: jPredictor: A Predictive Runtime Analysis Tool for Java. In: 30th International Conference on Software Engineering (ICSE), pp. 221–230. ACM, New York (2008) 3. Dagum, L., Menon, R.: OpenMP: An Industry-Standard API for Shared-Memory Programming. Computational Science and Engineering 5(1), 46–55 (1998) 4. Dabrowski, F., Pichardie, D.: A Certified Data Race Analysis for a Java-like Language. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) TPHOLs 2009. LNCS, vol. 5674, pp. 212–227. Springer, Heidelberg (2009) 5. Dinning, A., Schonberg, E.: An Empirical Comparison of Monitoring Algorithms for Access Anomaly Detection. In: 2nd Symp. on Principles and Practice of Parallel Programming, pp. 1–10. ACM, New York (1990) 6. Grunwald, D., Srinivasan, H.: Efficient Computation of Precedence Information in Parallel Programs. In: 6th Workshop on Languages and Compilers for Parallel Computing, pp. 602–616. Springer, Heidelberg (1993) 7. Ha, K., Jun, Y., Yoo, K.: Efficient On-the-fly Detection of First Races in Nested Parallel Programs. In: Workshop on State-of-the-Art in Scientific Computing (PARA), Copenhagen, Denmark, pp. 75–84. Springer, Heidelberg (2004) 8. Jun, Y., McDowell, C.E.: On-the-fly Detection of the First Races in Programs with Nested Parallelism. In: 2nd International Conference on Parallel and Distributed Processing Techniques and Applications, pp. 1549–1560. CSREA, USA (1996) 9. Kim, D., Jun, Y.: An Effective Tool for Debugging Races in Parallel Programs. In: 3rd International Conference on Parallel and Distributed Processing Techniques and Applications, pp. 117–126. CSREA, USA (1997) 10. Kim, Y., Jun, Y.: Restructuring Parallel Programs for On-the-fly Race Detection. In: 5th International Conference Parallel Computing Technologies, pp. 446–452. Russian Academy of Science, Russia (1999) 11. Lamport, L.: Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM 21(7), 558–565 (1978) 12. Marino, D., Musuvathi, M., Narayanasamy, S.: LiteRace: Effective Sampling for Lightweight Data-Race Detection. In: ACM Sigplan Confernece on Programming Language Design and Implementation (PLDI), Dublin, Ireland. ACM, New York (2009)
13. Netzer, R.H.B., Ghosh, S.: Efficient Race Condition Detection for Shared-Memory Programs with Post/Wait Synchronization. In: International Conference on Parallel Processing, pp. 242–246. Pennsylvannia State University, USA (1992) 14. Netzer, R.H.B., Miller, B.P.: Improving the Accuracy of Data Race Detection. In: 3rd Symp. on Principles and Practice of Parallel Programming, pp. 133–144. ACM, New York (1991) 15. Park, H., Jun, Y.: Detecting the First Races in Parallel Programs with Ordered Synchronization. In: 6th International Conference on Parallel and Distributed Systems, pp. 201–208. IEEE, Alamitos (1998) 16. Parallel Computing Forum: PCF Parallel Fortran Extensions. Fortran Forum 10(3) (1991) 17. Qadeer, S., Wu, D.: KISS: Keep It Simple and Sequential. In: Sigplan Conference on Programming Language Design and Implementation (PLDI). ACM, New York (2004); Washington D.C. Sigplan Notices 39(6), 14–24 (2004) 18. Sack, P., Bliss, B.E., Ma, Z., Petersen, P., Torrellas, J.: Accurate and Efficient Filtering for the Intel Thread Checker Race Detector. In: 1st Workshop on Architectural and System Support for Improving Software Dependability (ASID), San Jose, California, pp. 34–41. ACM, New York (2006)
A Comparative Study on Responding Methods for TCP’s Fast Recovery in Wireless Networks

Mi-Young Park 1, Sang-Hwa Chung 1, Kyeong-Ae Shin 2, and Guangjie Han 3

1 Dept. of Computer Engineering, Pusan National University
2 Dept. of Medical Information System, Dongju College University, South Korea
3 Dept. of Information & Communication System, Hohai University, P.R. China
[email protected], [email protected], [email protected], [email protected]
Abstract. When TCP operates in wireless networks, it suffers severe performance degradation because TCP's fast retransmit/fast recovery (FR/FR) is often triggered regardless of congestion when a packet is lost due to wireless transmission errors. Although several loss differentiation schemes have been proposed to distinguish the cause of triggered FR/FRs, little effort has been made to find an appropriate responding method that utilizes the detection information of an LDA. In this paper, we observe the relationship between a responding method and TCP's performance, and investigate the best way to respond to FR/FRs non-related to congestion in order to improve TCP's performance to the maximum. Our experiment results emphasize the importance of the responding method, which is overlooked in previous works, and suggest the best response for FR/FRs non-related to congestion.
1 Introduction
One of the objectives of TCP [8,1] is to use the available bandwidth to the maximum and to prevent congestion in the network by appropriately controlling its transmission rate. When a TCP sender receives three successive duplicate acknowledgements (3DUPACK) indicating a packet loss, it assumes that the packet was lost due to congestion. TCP then triggers a fast retransmit/fast recovery (FR/FR) to retransmit the lost packet as well as to reduce its transmission rate [1]. Such a mechanism works very well in wired networks, where most packets are lost due to congestion. However, in wireless networks, where packets are lost due to wireless transmission errors, TCP's FR/FRs are often triggered regardless of congestion [10,4]. We call the FR/FRs non-related to congestion wireless FR/FRs, and we call the FR/FRs triggered by congestion congestion FR/FRs. Due to wireless FR/FRs, TCP unnecessarily reduces its transmission rate and then underutilizes the available bandwidth. Consequently, TCP's performance degrades severely in wireless networks. Since TCP has no ability to distinguish whether an FR/FR is triggered due to congestion or due to wireless transmission errors, several loss differentiation schemes (LDAs) have been proposed to distinguish the cause of triggered FR/FRs. These LDA schemes can detect wireless FR/FRs by distinguishing congestion losses from wireless losses when a TCP sender receives 3DUPACK.
Fig. 1. Relation between accuracy of LDA and TCP’s performance enhancement
To improve TCP's performance in wireless networks, these LDAs assume that the detected wireless FR/FRs will be responded to by setting TCP's congestion control state to the previous state instead of by halving TCP's transmission rate. Unfortunately, our previous work [7] showed that restoring TCP's congestion control state to the previous state does not always improve TCP's performance; TCP's performance degrades even with the best LDA in some simulation scenarios. In this paper, we observe why TCP's performance decreases in some cases, and investigate the best way to respond to wireless FR/FRs in order to improve TCP's performance to the maximum. For this, we designed about 120 different simulation scenarios by setting different values for network parameters such as the buffer size, the number of hops, and the loss rate, and then we grouped all the scenarios into two groups (W and C) according to our intention. In each group, we measure how much each responding method can improve TCP's performance when all wireless FR/FRs are detected, and we measure how much each responding method degrades TCP's performance when a congestion FR/FR is mistakenly detected as a wireless FR/FR. Our extensive experiment results emphasize the importance of the responding method in improving TCP's performance, and suggest the best response for the detected wireless FR/FRs.
2 Related Works
To avoid the TCP performance degradation caused by wireless FR/FRs in wireless networks, several loss differentiation schemes (LDAs) have been proposed to detect wireless FR/FRs by distinguishing congestion losses from wireless losses when a TCP sender receives 3DUPACK.
When any wireless loss is detected by an LDA, the LDA assumes that TCP's FR/FR was triggered regardless of congestion and treats it as a wireless FR/FR; otherwise, it treats it as a congestion FR/FR, assuming that the FR/FR was triggered due to congestion. Here, we introduce five end-to-end loss differentiation algorithms (LDAs): NCPLD [9], Veno [3], West [12], JTCP [11], and RELDS [5]. Samaraweera [9] proposed a non-congestion packet loss detection (NCPLD) scheme to implicitly detect the type of packet loss using the variation of the delay experienced by TCP packets. On detection of a packet loss, the scheme compares the currently measured round trip time (RTT) with a calculated delay threshold. If the RTT is less than the threshold, the scheme treats the packet loss as a wireless loss, and it assumes the triggered FR/FR is a wireless FR/FR. TCP Veno [3] estimates the backlog packets (N) in the buffer using Vegas's mechanism [2]. When a TCP sender receives 3DUPACK, Veno compares N with a threshold of 3. If N < 3, Veno assumes the packet loss is a wireless loss and treats the triggered FR/FR as a wireless FR/FR. Yang [12] computes two thresholds, Bspikestart and Bspikeend, using the RTT in order to identify the spike state of the current connection. Any packet losses in the spike state are considered congestion losses, and the triggered FR/FRs are assumed to be congestion FR/FRs; otherwise, all the triggered FR/FRs are assumed to be wireless FR/FRs. JTCP [11] calculates a threshold (Jr) which is the average of the inter-arrival jitter during one round-trip time. When a TCP sender receives 3DUPACK, it checks whether the time of receiving the 3DUPACK exceeds one RTT, as well as whether Jr is larger than the inverse of the current congestion window size. If the two conditions are not satisfied, it ascribes the packet loss to wireless transmission errors and assumes the triggered FR/FR is a wireless FR/FR. Lim and Jang [5] suggested a robust end-to-end loss differentiation scheme (RELDS) to precisely discriminate between congestion losses and wireless losses. This scheme employs a moving threshold, which is defined as a function of the minimum and sample RTT. If the moving threshold is not satisfied when a TCP sender receives 3DUPACK, it assumes the packet loss is a wireless loss and treats the triggered FR/FR as a wireless FR/FR.
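As a concrete illustration of the Veno-style test just described, the following hedged C sketch estimates the backlog N and flags a wireless FR/FR when N falls below the threshold of 3; all names are ours and do not come from an actual TCP stack.

#define VENO_BETA 3.0   /* Veno's backlog threshold */

/* base_rtt: minimum RTT observed; rtt: latest RTT sample (seconds);
   cwnd: congestion window in packets.  Using Vegas's mechanism,
   N = (expected - actual) * base_rtt, with rates in packets/second. */
static int is_wireless_frfr(double cwnd, double base_rtt, double rtt)
{
    double expected = cwnd / base_rtt;  /* rate with no queuing */
    double actual   = cwnd / rtt;       /* currently measured rate */
    double backlog  = (expected - actual) * base_rtt;

    /* On 3DUPACK: a small backlog suggests the loss was caused by
       wireless transmission errors rather than congestion. */
    return backlog < VENO_BETA;
}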
3 Motivation and Observation
In our previous work, we suggested an end-to-end loss differentiation scheme (LDA_EQ) [6] to distinguish the cause of a triggered FR/FR by estimating the rate of queue usage. If the estimated queue usage is smaller than a certain threshold when 3DUPACK is received, our scheme attributes the packet loss to wireless losses and treats the triggered FR/FR as a wireless FR/FR. In that work, we measured three accuracies: (1) how accurately it can detect wireless FR/FRs (Aw), (2) how accurately it can detect congestion FR/FRs (Ac), and (3) the average (At) of Aw and Ac. Figure 1(a) shows the result. In the figure, Aw of our scheme (LDA_EQ) is 67% and its Ac is 93%, so its average accuracy (At) is 80%, which is the highest among the LDAs. In the case of TCP, which has no ability to detect wireless FR/FRs, its Aw, Ac, and At are 0%, 100%, and 50%, respectively.
In another work of ours [7], we analyzed the effect of LDAs on improving TCP performance. In the experiments of that work, whenever a wireless FR/FR was triggered, it was responded to by restoring TCP's congestion control state to the previous state instead of by reducing the transmission rate. That is because the existing LDAs basically assume that TCP's performance will be improved by not reducing the transmission rate when a wireless FR/FR is detected. Figure 1(b) shows TCP's performance improvement when each LDA is applied to TCP. In the figure, each bar graph for each LDA shows the lowest, the highest, and the average performance enhancement when that LDA is applied. For example, when Veno is applied to TCP, the lowest performance enhancement is -6%, the highest is 6%, and the average is 0%. With our scheme (LDA_EQ), the lowest enhancement is -8%, the highest is 7%, and the average is 0%. This result shows that TCP's performance is not always improved with any of the LDAs. In some scenarios, TCP's performance degraded even with the best LDA. To find out why, we traced and analyzed all the trace files of the simulation scenarios carefully, and we found the reason in the responding method. How to respond is critical to improving TCP's performance, because there will be no performance enhancement if all wireless FR/FRs are responded to in the same way as TCP's default. From now on, let us explain our observations. TCP uses two variables, the congestion window size (cwnd) and the slow start threshold (ssthresh), to control its transmission rate according to the network traffic. For example, when it is assumed that the network is congested, TCP reduces its transmission rate by decreasing the values of cwnd and ssthresh, while TCP increases its transmission rate by increasing cwnd when the network is not congested. Firstly, let us suppose that a wireless FR/FR is detected by an LDA when cwnd and ssthresh have very small values. Just recovering cwnd and ssthresh to their previous values does not make any big difference between the performance of TCP with and without an LDA. If the wireless FR/FR is detected when cwnd and ssthresh have large values, however, TCP's performance will be improved significantly. This is the reason why one LDA improves TCP's performance significantly in some scenarios while it hardly improves the performance in others. In the meantime, with larger values of cwnd and ssthresh, TCP's performance with an LDA can be degraded if the bandwidth is not sufficient or if the packet loss rate is high; the larger values of cwnd and ssthresh might cause serious congestion or burst losses. In such cases, TCP's performance can be reduced even with the best LDA. Secondly, let us suppose that one LDA (A) detects 10 wireless FR/FRs when cwnd and ssthresh have very small values, and another LDA (B) detects 1 wireless FR/FR when cwnd and ssthresh have large values in the same scenario. In this case, algorithm B may improve TCP's performance much more than algorithm A, even though the accuracy of algorithm A is higher. This indicates that the accuracy of an LDA can be meaningless without a good responding method. All existing LDA schemes assume that the responding method of recovering TCP's congestion control state to the previous state will improve TCP's performance, and this assumption seems reasonable in common sense.
Table 1. Responding Methods

Method   TCP's Congestion Control State
0        TCP's original response
1        restoring to the previous congestion control state

         Slow Start Part
Method   cwnd                          ssthresh
11       the size of one segment       65535
12       TCP's response (cwnd)         65535
13       the previous value of cwnd    65535
14       the average of cwnd           65535
15       the maximum of cwnd           65535
16       the size of sender's buffer   65535
17       bandwidth-delay product       65535

         Congestion Avoidance Part
Method   cwnd                          ssthresh
21       the size of one segment
22       TCP's response (cwnd)
23       the previous value of cwnd
24       the average of cwnd
25       the maximum of cwnd
26       the size of sender's buffer
27       bandwidth-delay product
But our observations inform us that recovering to the previous state does not always improve TCP's performance. Furthermore, little effort has been made to find the best responding method to utilize the detection information of an LDA. Thus, in this paper, we observe the relationship between responding methods and TCP's performance, and investigate the best way to respond to wireless FR/FRs in order to improve TCP's performance to the maximum.
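As a concrete illustration (ours, not part of the original evaluation), the two responses from Table 1 that the experiments below single out might look as follows in a simplified sender; the tcp_state type, the ssthresh choice for method 21, and all names are assumptions.

typedef struct {
    unsigned cwnd;      /* congestion window, in segments */
    unsigned ssthresh;  /* slow start threshold */
} tcp_state;

/* Method 1: restore the congestion control state saved just before
   the wireless FR/FR was triggered. */
void respond_method_1(tcp_state *tcp, const tcp_state *saved)
{
    *tcp = *saved;
}

/* Method 21: shrink cwnd to one segment but enter congestion
   avoidance (cwnd >= ssthresh), so growth is linear, not slow start. */
void respond_method_21(tcp_state *tcp)
{
    tcp->cwnd = 1;
    tcp->ssthresh = tcp->cwnd;
}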
4 A Comparative Study on Responding Methods
4.1 Responding Methods
To find the best response, we made 14 different responding methods for wireless FR/FRs based on our observations. Table 1 shows all the responding methods used in our experiments. First of all, we classified the responding methods into two groups: the slow start part and the congestion avoidance part. This is to check whether TCP should enter the slow start phase or the congestion avoidance phase when a wireless FR/FR is detected. Although TCP enters the congestion avoidance phase when an FR/FR is triggered, it might be necessary to make TCP enter the slow start phase to quickly utilize the available bandwidth to the maximum if the FR/FR was triggered regardless of congestion. This is why we classified the methods into the two groups. Then, each group is again classified into seven methods according to cwnd's size. This is to check whether cwnd should be set to a small or a large value when a wireless FR/FR is detected.
Fig. 2. 5-hops wireless chain topology
Deciding the size of cwnd is a difficult problem because there is always a trade-off. With a small cwnd, TCP's transmission rate is low, but there is little possibility of burst losses. With a big cwnd, TCP's transmission rate is high, but there is a high possibility of burst losses and additional congestion. By testing various sizes of cwnd, we check the appropriate size of cwnd for the detected wireless FR/FRs. In Table 1, method 0 is TCP's response to the triggered FR/FR when a TCP sender receives 3DUPACK. This method is used as a baseline to check whether the other methods are better than TCP's response. With method 1, a TCP sender recovers its transmission rate to the state just before the wireless FR/FR was detected; all LDAs assume that method 1 is used when a wireless FR/FR is triggered. With methods 11 to 17, TCP enters the slow start phase when a wireless FR/FR is detected. Thus, ssthresh is set to the maximum value (65535) and cwnd is set to one of the values shown in the "cwnd" column of Table 1. For cwnd, seven different sizes are used: the size of one segment, TCP's response for cwnd, the previous value just before the wireless FR/FR was detected, the average of cwnd, the maximum value of cwnd, the size of the sender's buffer, and the bandwidth-delay product. For example, when method 11 is used as the responding method for wireless FR/FRs, TCP's cwnd and ssthresh are set to "the size of one segment" and "65535", respectively, instead of halving cwnd and ssthresh. With methods 21 to 27, TCP enters the congestion avoidance phase when a wireless FR/FR is detected. For example, when method 21 is used, TCP's cwnd is set to "the size of one segment" when a wireless FR/FR is detected.
4.2 Simulation Scenarios
To evaluate and compare TCP's performance enhancement according to the different responding methods, we conducted simulation-based experiments using QualNet 4.5 [13]. Figure 2 shows the 5-hops chain topology of IEEE 802.11 wireless nodes used in our experiment. The bandwidth of the wireless channel is 11 Mbps, and we set DropTail as its queuing policy. The maximum segment size of TCP is 1K bytes, and the packet size is 1K bytes. The congestion window size cwnd is limited to 48 packets, and all TCP flows originate at the first node (node 1) and are destined to one of the nodes on the right side of Figure 2. In our experiment, each simulation scenario lasts about 200 seconds, and TCP data packets are transmitted continually during the simulation time after the warm-up period (35 seconds). In this experiment, we aimed to find the best response for the wireless FR/FRs. We assume that the best responding method should improve TCP's performance as much as possible when a wireless FR/FR is detected and should also cause little performance degradation when an LDA misjudges congestion FR/FRs as wireless FR/FRs.
Fig. 3. TCP’s performance enhancements according to the responding methods in W group
Thus, to find the best response, we measure how much each responding method can improve TCP's performance when all wireless FR/FRs are detected, and we also measure how much each responding method degrades TCP's performance when all congestion FR/FRs are mistakenly detected as wireless FR/FRs. For this, we designed about 120 different simulation scenarios by setting different values for network parameters such as the buffer size, the number of hops, and the loss rate, and then we grouped all the scenarios into two groups: the W and C groups. The W group, which consists of 60 different simulation scenarios, is designed to measure how much each responding method can improve TCP's performance when all wireless FR/FRs are detected. All simulation scenarios in this group have packet losses caused only by wireless transmission errors, and only one TCP flow is used to avoid causing congestion. Thus, all TCP FR/FRs triggered in this group are wireless FR/FRs. The C group, which consists of 60 different simulation scenarios, is designed to measure the performance degradation of TCP when each responding method is applied to congestion FR/FRs. Each scenario in this group has packet losses caused only by congestion, and we increased the number of TCP flows gradually to create different levels of congestion. Thus, all the FR/FRs triggered in this group are congestion FR/FRs.
4.3 Results and Analysis
Figure 3 shows TCP's performance enhancement according to the different responding methods in the W group. In this group, we checked how much each responding method can improve TCP's performance when it can detect all wireless FR/FRs. Whenever we applied each responding method in Table 1 to the same scenario, we modified TCP's fast recovery to react like that responding method when FR/FRs are triggered. In the figure, the x axis represents the responding methods, and the y axis represents TCP's performance enhancement when each responding method is applied. To check whether each responding method improves TCP's performance, we set the performance of TCP as the criterion for comparison. Thus, when method 0 (TCP's original response) is used, we set the performance of TCP as 100%, and then we compared it with the enhancements obtained when the other methods are used.
Fig. 4. TCP’s performance degradation according to the responding methods in C group
For example, in Figure 3(a), we can see that the methods (1, 21, 24, and 27) improve TCP's performance compared to TCP's original response (responding method 0), while the other responding methods degrade TCP's performance. Figure 3(a) shows TCP's performance enhancement according to the packet loss rate when each responding method is used in a 1-hop TCP communication. Each bar graph for each responding method shows TCP's performance enhancement when the loss rate is 1%, 3%, and 5%, respectively. For example, with responding method 1, TCP's performance is improved when the loss rate is 1%, but the performance degrades as the loss rate increases (3% and 5%). While responding method 1 tends to degrade TCP's performance as the loss rate increases, method 21 tends to increase TCP's performance as the loss rate increases. Among the four methods (1, 21, 24, and 27), responding method 21 is the best when the loss rate is high. Figure 3(b) shows TCP's performance enhancement according to the number of hops for each responding method. Each bar graph for each responding method shows TCP's performance enhancement in a 1-hop, 3-hops, and 5-hops TCP communication, respectively. In this graph, we can see that responding method 1 enhances TCP's performance significantly as the number of hops increases. Although method 21 improves TCP's performance, the improvement tends to decrease as the number of hops increases. Thus, responding method 1 is the best when the number of hops increases. From Figure 3, we can see that none of the responding methods in the slow start part (11 to 17) improve TCP's performance, while some of the methods in the congestion avoidance part (21 to 27) enhance TCP's performance. Additionally, we can see that as the size of cwnd increases (especially methods 15, 16, 25, and 26), TCP's performance degrades severely. This is because, as cwnd becomes larger, there is a high probability not only of burst losses when the packet loss rate is high, but also of additional congestion when the bandwidth is not sufficient. This observation informs us that it is better to set cwnd to a small value and to let the TCP sender enter the congestion avoidance phase. Figure 4 shows TCP's performance according to the different responding methods in the C group.
In this group, we aimed to measure how each responding method degrades TCP's performance when all congestion FR/FRs are mistakenly detected as wireless FR/FRs. This is necessary because there is no perfect LDA, and every LDA has a possibility of misjudging congestion FR/FRs as wireless FR/FRs. When an LDA misjudges a congestion FR/FR as a wireless FR/FR, the best responding method should hardly degrade TCP's performance. Figure 4(a) shows TCP's performance according to the number of TCP flows when each responding method is used in a 1-hop TCP communication. Each bar graph for each responding method shows the performance when the number of TCP flows is 5, 15, and 20 flows, respectively. In this figure, we can see that the responding methods (1, 22, and 24) show similar performance degradation, and responding method 21 shows the least performance degradation. The other methods degrade TCP's performance severely. Figure 4(b) shows TCP's performance according to the number of hops for each responding method. Each bar graph for each responding method shows the performance in a 1-hop, 3-hops, and 5-hops TCP communication, respectively. This graph shows a result similar to that in Figure 4(a). In the graph, we can see that responding method 21 hardly degrades TCP's performance compared to the other responding methods. From the two graphs in Figure 4, we can see that responding method 21 is the best response for the case when congestion FR/FRs are mistakenly detected as wireless FR/FRs.
5 Conclusions
To find the best response for the detected wireless FR/FRs, in this paper we compared TCP's performance under 15 different responding methods as the loss rate increases or the number of hops increases. Our experiment results show that responding method 1 is the best as the number of hops increases, and responding method 21 is the best as the loss rate increases. They also show that, when an LDA mistakenly detects congestion FR/FRs as wireless FR/FRs, responding method 21 is the best at avoiding TCP's performance degradation. Among the 15 different methods, there is no single responding method which is the best for all network conditions. This informs us that the responding method needs to react dynamically to wireless FR/FRs according to the network conditions. In the near future we will develop a dynamic responding method for the detected wireless FR/FRs in order to improve TCP's performance in wireless networks.
References 1. Allman, M., Paxson, V., Stevens, W.: TCP Congestion Control. RFC 2581 (1999) 2. Brakmo, L., Peterson, L.: TCP Vegas: end to end congestion avoidance on a global internet. IEEE Journal on Selected Areas in Communications 13(8), 1465–1480 (1995) 3. Fu, C.P., Liew, S.: TCP Veno: TCP enhancement for transmission over wireless access networks. IEEE Journal on Selected Areas in Communications 21(2), 216–228 (2003) 4. Fu, Z., Zerfos, P., Luo, H., Lu, S., Zhang, L., Gerla, M.: The impact of multihop wireless channel on TCP throughput and loss, pp. 1744–1753 (2003)
A Comparative Study on Responding Methods for TCP’s Fast Recovery
179
5. Lim, C.H., Jang, J.W.: Robust end-to-end loss differentiation scheme for transport control protocol over wired/wireless networks. Communications, IET 2(2), 284–291 (2008) 6. Park, M., Chung, S., Sreekumari, P.: End-to-end loss differentiation algorithm based on estimation of queue usage in multi-hop wireless networks, vol. E92-D(10), pp. 2082–2093 (October 2009) 7. Park, M.Y., Chung, S.H.: Analyzing effect of loss differentiation algorithms on improving tcp performance. In: Proceedings of the 12th International Conference on Advanced Communication Technology, ICACT 2010, pp. 737–742. IEEE Press, Piscataway (2010) 8. Postel, J.: Transmission control protocol. RFC 793 (1981) 9. Samaraweera, N.: Non-congestion packet loss detection for tcp error recovery using wireless links. IEE Proceedings Communications 146(4), 222–230 (1999) 10. Tung, L.P., Shih, W.K., Cho, T.C., Sun, Y., Chen, M.C.: Tcp throughput enhancement over wireless mesh networks. IEEE Communications Magazine 45(11), 64–70 (2007) 11. Wu, E.K., Chen, M.Z.: Jtcp: jitter-based tcp for heterogeneous wireless networks. IEEE Journal on Selected Areas in Communications 22(4), 757–766 (2004) 12. Yang, G., Wang, R., Gerla, M., Sanadidi, M.: Tcp bulk repeat. Computer Communications 28(5), 507–518 (2005) 13. http://www.scalable-networks.com/index.php
The Feasibility Study of Attacker Localization in Wireless Sensor Networks

Young-Joo Kim1 and Sejun Song2,*

1 ETID Department, Texas A&M University, College Station, Texas 77843, U.S.A.
2 Assistant Professor, ETID Department, Texas A&M University
{akates,song}@tamu.edu
Abstract. Wireless sensor networks are highly vulnerable from a security perspective because sensor nodes are limited in power, computational capability, and memory, and they share a common wireless medium. For this reason, wireless sensor networks can easily be attacked by adversaries, and such attacks are mainly carried out over the channels or frequencies used by the nodes, employing various attack mechanisms. A representative class of these attacks is the DoS (Denial-of-Service) attack, which includes jamming, collision, and flooding attacks. Since a DoS attack diminishes or eliminates a network’s capacity, sensor nodes must be able to prevent or detect it, and corresponding mechanisms have been studied. No single work, however, can cover all of these attacks because there are many attack patterns and trade-offs between performance and accuracy. In this paper, we present a simple approach that can detect an attack’s location as fast as possible using the CCA (Clear Channel Assessment) value. We conduct a feasibility study of this approach through experiments using TinyOS mica2 nodes, and based on this study we suggest some new alternatives. As a result, we can speculate that the false-alarm rate is high but the detection time may be reduced when detecting an attack’s location. Keywords: Wireless Sensor Networks, Attacks, Denial-of-Service, CCA.
1 Introduction

The original idea of the Internet was to build a network that was well protected against nuclear attacks during the Cold War and always available to its users; in those days, however, the inventors did not anticipate hackers sitting with their laptops in basements, connected to the network itself. The result is that today we are under attack from many kinds of threats (e.g., connection flooding), and servers in particular are highly vulnerable to them. Such hacker attacks have spread toward whole fields of computing systems such as distributed computing [1]. A representative field of distributed computing is the Wireless Sensor Network [2-4], called
* Corresponding author.
a WSN because a large number of sensor nodes are deployed over a monitored region and each node performs its own computation. Sensor nodes, however, have limited power, computational capability, and storage/communication capability. For this reason, WSNs make it easy for adversaries to cause radio interference, transmission failures, battery depletion, etc. using various attack methods such as DoS (Denial-of-Service) [5-6] attacks. A DoS attack is an explicit attempt to prevent the legitimate use of a service, and this kind of attack can be categorized into jamming attacks [7-10], collision attacks [10-12], and flooding attacks [10, 13, 14, 15]. Mechanisms [16-20] to detect them have been studied, but they always involve trade-offs between performance and accuracy. In this paper, we present a simple approach that can detect an attack’s location as fast as possible using the CCA (Clear Channel Assessment) [21-23] value. The basic concept of this approach is to compare each node’s CCA value against a threshold: if a node’s CCA value exceeds the threshold, the mechanism judges that the node is under attack by adversaries. For this, we conduct a feasibility study of our approach through experiments using TinyOS [24] mica2 nodes. In addition, we suggest some new alternatives based on this study. As a result, we can speculate that the false-alarm rate is high but the detection time is reduced when detecting an attack’s location. The rest of the paper is structured as follows. Section 2 describes various attack methods such as the DoS (Denial-of-Service) attack and illustrates CCA (Clear Channel Assessment). We address our proposed attack detection approach in Section 3. In Section 4, we show the empirical results for our approach. Finally, we conclude our feasibility study in Section 5.
2 Background

Sensor nodes are usually scattered in a sensor field. Each of these scattered sensor nodes has the capability to collect data and route it back to the sink and the end users. This communication architecture makes sensor networks vulnerable to various attack methods such as DoS attacks, because their primary weakness, shared by all wireless networking devices, is the inability to secure the wireless medium: any adversary in radio range can overhear traffic, transmit spurious data, or jam the network. This section describes DoS, the representative communication attack, and illustrates CCA from the perspective of attacks.

2.1 Communication Attacks

WSNs use a distributed wireless computing technology for monitoring a variety of environments such as remote environmental sites, hostile battlefields, health-care facilities, etc. A major benefit of WSNs is that they perform in-network processing to reduce large streams of raw data into useful aggregated information. However, unlike the data and control mechanisms of general computing, it is realistically difficult to provide integrity and authenticity guarantees for the sensitive data of WSNs because sensor nodes are limited in resources. Consequently, WSNs are inherently vulnerable to attacks such as DoS (Denial-of-Service). A DoS attack, an explicit attempt to prevent the legitimate use of a service, is any event that diminishes or
eliminates a network’s capacity to perform its expected function. Hardware failures, software bugs, resource exhaustion, environmental conditions, or any complicated interaction between these factors can cause DoS. Each layer is vulnerable to different types of DoS attacks and has different options available for its defense. In this subsection, we concentrate only on Denial-of-Service attacks on the communication within sensor networks. There are three kinds of these attacks: jamming attacks [7-10], collision attacks [10-12], and flooding attacks [10, 13, 14, 15]. (1) A jamming attack, a standard DoS attack, simply jams a node or a set of nodes. Jamming is the transmission of a radio signal that interferes with the radio frequencies being used by the sensor network, so a jamming attack can result in no messages being sent or received. However, because attackers are also limited in energy, constant transmission of a jamming signal is an expensive use of energy, so they may use sporadic or burst jamming instead. Jamming attacks follow many different strategies [5, 7, 8]: constant jammer, deceptive jammer, random jammer, and reactive jammer. (2) In a collision attack, an attacker intentionally violates the communication protocol and continually transmits messages in an attempt to generate collisions. By detecting and parsing radio transmissions near the victim, the attacker can disrupt key elements of packets, such as fields that contribute to checksums or the checksums themselves. Such disruption may trigger back-offs in contention-control mechanisms that delay other messages. Unfortunately, in WSNs the detection of a collision with one’s own transmission is not easy because collision attacks are cooperative by nature. (3) A flooding attack overwhelms a victim’s limited resources, whether memory, processing cycles, or bandwidth. For example, if a malicious node sends many connection requests to a susceptible node, resources must be allocated to handle each request; eventually the node’s resources will be exhausted, rendering the node useless. In sensor networks, various attacks and the corresponding security mechanisms have been studied [16], [17], [18], [19], [20]. However, no work can cover all attacks on wireless sensor networks because there are so many attacks and trade-offs between performance and accuracy. In this paper, in order to focus on performance rather than accuracy, we propose a simple detection mechanism using the CCA value of each node.

2.2 Clear Channel Assessment (CCA)

The CCA (Clear Channel Assessment) value is used in networks as an indication of the availability of a wire or, in the case of wireless sensor networks, of a specific channel/frequency. If no node is using the channel, a node will usually start to send a message immediately. Otherwise, the channel is blocked by another node sending messages; in this case the CCA value is increased, and the node retries to listen for a free channel. If this fails several times, the node (in our case a TinyOS mica2 node) waits for a randomly created break time in the expectation that the channel will be free after that time. We illustrate the CCA value to help understand network congestion related to attacks. Congestion appears in the case of failed retries to get a free channel, and such congestion is a quite normal occurrence in a network. For example, as shown in Fig. 1, there might be a lot of congestion at node 3 if two nodes (1 and 2) exchange data every 30 ms. If so, the CCA value of node 3 will be increased due to the interference of nodes 1 and 2.
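To make this behavior concrete, the following C sketch shows a CCA-based send loop of the kind described above. This is our own illustration: the retry limit and all function names (channel_is_clear, radio_send, random_backoff, wait_ms) are assumed stubs, not TinyOS APIs.

#include <stdbool.h>

struct packet;                         /* opaque packet type (assumed) */
extern bool channel_is_clear(void);    /* assumed CCA primitive */
extern bool radio_send(struct packet *p);
extern unsigned random_backoff(void);  /* random break time, in ms (assumed) */
extern void wait_ms(unsigned ms);

#define MAX_CCA_RETRIES 8              /* assumed retry limit */

static int cca_value = 0;              /* per-node congestion indicator */

bool try_send(struct packet *p) {
    for (int attempt = 0; attempt < MAX_CCA_RETRIES; attempt++) {
        if (channel_is_clear())        /* CCA check: no carrier sensed */
            return radio_send(p);      /* channel is free: send immediately */
        cca_value++;                   /* channel busy: record the congestion */
        wait_ms(random_backoff());     /* retry after a random break time */
    }
    return false;                      /* persistent congestion: give up */
}

A node whose cca_value keeps growing in such a loop is exactly a node whose channel is persistently occupied, which is the observation the detection approach in Section 3 exploits.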
Fig. 1. Two nodes (1 and 2) exchanging data increase the CCA value of node 3
Such congestion might cause problems in a network. There are many sources of such interference, ranging from somebody using a microwave oven in the best case to, in the worst case, an attacker trying to break down the network, as in Fig. 2. If node 4 persistently sends packets to somebody, nodes 1 and 3 will be unable to send their own packets because of node 4. That is, if node 4 is an attacker, communication between nodes may become difficult due to the congestion caused by node 4. Under these circumstances, we can observe the CCA values of the nodes: nodes 1 and 3 are attacked by node 4, and their CCA values will be increased.
Fig. 2. Two nodes are attacked by node 4
3 Feasibility Study: Simple Attacker Localization Approach

In this section, we describe a simple attacker localization algorithm using the CCA value and account for the factors that influence false alarms. We also examine the suggested approach carefully and present additional solutions for improving it.

3.1 Simple Attack Detection

The purpose of the suggested simple attack detection approach is to detect attackers as fast as possible using the CCA value, but there may be many false alarms because it is not easy
to determine exactly whether signals on the air are due to an attack or to noise. This problem will be discussed in future work. However, detection performance will be good because the cost of the CCA duration (8 symbol periods, or 128 μs [23]) is low. Fig. 3 shows pseudocode for an algorithm that can simply detect an attack’s location. This algorithm is executed on each node, and any node that finds an attacker sends detection information to the sink node. In this algorithm, criticalCountvalue (line 09) and criticalSignalvalue (line 10) are important factors because these values determine the number of false alarms. criticalCountvalue is the critical value for the CCA congestion count and is set by the user. criticalSignalvalue is a signal strength such as RSSI (Received Signal Strength Indication), and it must be able to distinguish an interference signal from a real signal; this value is also set by the user. The code between line 14 and line 18 determines that there is an attacker around the node, in which case detectLocation( ) is called. If the node judges that there is no attacker, the code at line 23 is executed. And if there is suspicion of an attack, the node’s congestion count CCACount is increased (lines 26 to 27).
01: simpleAttackDetection ( ) {
02:     int CCACount = 0;
03:     int CCAValue = 0;
04:     int criticalSignalvalue = 0;
05:     int criticalCountvalue = 0;
06:     char *msg;
07:     Node nodeInfo;
08:     ....
09:     criticalCountvalue = getCountValue( );
10:     criticalSignalvalue = getCriticalSignal( );
11:
12:     CCACount = getCurrentCount(nodeInfo);
13:
14:     if (CCACount >= criticalCountvalue) {
15:         detectLocation(nodeInfo);
16:         return 0;
17:     }
18:
19:     CCAValue = getCCAsignal(nodeInfo);
20:
21:
22:     if (CCAValue < criticalSignalvalue)
23:         sendPacket(nodeInfo, msg);
24:     else {  /* CCAValue >= criticalSignalvalue */
25:
26:         CCACount = CCACount + 1;
27:         putCurrentCount(CCACount, nodeInfo);
28:     }
29: }

Fig. 3. Algorithm of Simple Attack Location Detection
Our original idea was to learn the position of an attacker by collecting CCA values and using them to guess the approximate position. For this, we need to know which nodes have a problem. Since an attacker follows no rules and simply jams the network, we considered the possibility of turning our nodes into a mode that ignores the CCA and tries to broadcast a help message, which we defined as the CCA value 6 (= a lot of congestion). The problem is: how do we define an attacker from a client’s perspective? After some tests, we decided to define a count of 50 congestions, reset every 30 seconds, as a problem, and to turn the node into this "help message" mode. Later we figured out that no matter what value we set, it is sometimes reached even when there is no attacker, so the chance of a false alarm is very high, although the detection time may be efficient. Indeed, it is still quite normal that, say, 50 congestions appear, and then the node turns itself into an attacker jamming the network with help messages. The effect is clear: all other nodes detect some kind of attack as well, since their channel is being used, and start to send help messages themselves. So the whole network goes down like dominoes, and self-healing is not possible: even if some nodes end their phase of sending help messages (after 30 seconds), they immediately restart it, as they are still being "attacked" by the surrounding nodes, which started at different times. We concluded that it is not easy to use the CCA value alone as an indicator of an attacker. However, if we set the CCA threshold using a model obtained through experiments and the communication methods used (e.g., routing, channels, devices), the suggested approach can be efficient.

3.2 Consideration of This Approach

Since an attacker follows no rules and simply jams the network, it is not easy to identify the attacker in WSNs because of the shared communication channel, limited resources, and so on. For example, all nodes including the attacker use the same channel, and the nodes try to detect the attacking node over this very channel. If the detection process continues, the whole network may go down like dominoes. Fig. 4 shows node 4 sending interference signals with a short cycle, so that two nodes (1 and 3) are attacked on the same frequency and completely blocked. In other words, even if one of the two nodes detects node 4 as the attacker, it is difficult to send a detection message to the sink node because the network may already have broken down. Therefore, additional solutions may be necessary to cope with these serious problems.
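The trigger logic we experimented with can be sketched as follows, using the constants quoted above (50 congestions, a 30-second reset, and the special CCA value 6); the function names are our own assumptions, not TinyOS APIs.

#define CONGESTION_LIMIT 50        /* congestions treated as a problem */
#define RESET_PERIOD_S   30        /* counter reset interval (seconds) */
#define CCA_HELP_VALUE   6         /* marker for "a lot of congestions" */

extern void enter_help_mode(int cca_marker);  /* assumed helper */

static int congestion_count = 0;

/* Called whenever the node observes a congested channel (CCA value 5). */
void on_congestion(void) {
    if (++congestion_count >= CONGESTION_LIMIT)
        enter_help_mode(CCA_HELP_VALUE);  /* ignore CCA and broadcast help */
}

/* Fired by a timer every RESET_PERIOD_S seconds. */
void on_reset_timer(void) {
    congestion_count = 0;
}

As described above, this naive trigger is exactly what caused the domino effect: a burst of ordinary congestion can push a healthy node into help mode, whose broadcasts then push its neighbors over the limit as well.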
Fig. 4. Two nodes (1 and 3) are attacked on the same frequency
Solution "A"
Solution "A" might be to have a list of frequencies on all nodes and to switch to the next frequency of the list after a specified time, independent of any problem. This solution should work and will allow the nodes to transfer data to the base node most of the time. The disadvantage is that on some frequencies the nodes will still fail (if attacked), so we have to decide, for our application, whether it is important to get every value or only some. We also have to keep the same time on all nodes (a synchronization problem) so that they switch frequency at the same moment.

Solution "B"
Solution "B" is a possible improvement of Solution "A": we can try to find the frequencies that are attacked and blacklist them on all nodes, exchanging the blacklist over a free frequency. It should then be possible to transfer the values nearly all the time until all frequencies are blacklisted. Periodic checks might be a way to remove frequencies from the blacklist. Furthermore, we need to be careful when putting a frequency on the blacklist, as a high CCA value is only an indicator of a current problem, which might be gone after some seconds. A node should therefore check a frequency quite often before blacklisting it.

Solution "C"
Unlike Solutions "A" and "B", Solution "C" is to use an additional channel or device. For example, each node has an exclusive communication channel that can transfer detection information and is not affected by interference signals during its transmission. Each node is also assumed to have rich resources. For reliable communication, each node is connected to a reliable wired medium (e.g., a TCP connection). If so, regardless of the attacker’s interference signals, detection information is sent to the sink node independently.
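As a concrete illustration of Solution "B", the following sketch maintains a per-frequency blacklist with repeated checks before blacklisting and periodic clearing. All names and constants here are our own assumptions, not the paper’s implementation.

#include <stdbool.h>
#include <stdint.h>

#define NUM_FREQS 8
#define CHECKS_BEFORE_BLACKLIST 3   /* check a frequency several times first */

typedef struct {
    uint8_t suspect_count;          /* consecutive high-CCA observations */
    bool    blacklisted;
} FreqEntry;

static FreqEntry freq_table[NUM_FREQS];
static uint8_t current_freq = 0;

/* Called periodically with the CCA reading taken on the current frequency. */
void update_blacklist(uint8_t cca_value, uint8_t cca_threshold) {
    FreqEntry *f = &freq_table[current_freq];
    if (cca_value >= cca_threshold) {
        if (++f->suspect_count >= CHECKS_BEFORE_BLACKLIST)
            f->blacklisted = true;  /* repeatedly jammed: avoid this frequency */
    } else {
        f->suspect_count = 0;       /* the high CCA was transient */
        f->blacklisted = false;     /* periodic re-check clears the entry */
    }
    /* hop to the next frequency that is not blacklisted */
    for (uint8_t i = 1; i <= NUM_FREQS; i++) {
        uint8_t next = (uint8_t)((current_freq + i) % NUM_FREQS);
        if (!freq_table[next].blacklisted) { current_freq = next; break; }
    }
}

The repeated-check counter implements the caution urged above: a single high CCA reading never blacklists a frequency by itself.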
4 Implementation

Our idea was first to find the 'normal' CCA values in a usual environment and to compare them with the values measured while a node is being attacked, so that we can react to a real attack. We searched for a way to present the data visually and found the Octopus software [2] developed at the University of Dublin, Ireland; the software is available for free under the GPL, and we made some changes for our purposes. The next problem was that the newest version of the software is not designed for mica2 nodes with the CC1000 radio, but we found an old version of it on the Internet and adapted it to our needs, as the implementation for the CC1000 chip used by the mica2 nodes was at least already available. The part we needed to work through was TinyOS itself, as it is based on a special C dialect called NesC. Our first goal was to get the CCA value of a node into the Octopus GUI, so we replaced the sensor values that Octopus normally presents to the user as a line chart. We also changed the appearance of this line chart and adjusted the values of the axes to fit the range. Fig. 5 shows the Octopus application displaying CCA values. On mica2 nodes, the CCA value is defined from 0 to 5, where 5 means congestion; we use 6 to indicate a lot of congestion.
Fig. 5. Octopus Application displaying CCA values
Fig. 6. Network for the first test case
4.1 The 1st Measurement

For 15 minutes, we measured the CCA value every second using a test case consisting of one base node and four end nodes. Fig. 6 shows the network formed by the base node and the four end nodes: the base node acts as the sink node, and the remaining nodes are end nodes (node 1, node 2, node 3, and node 4). We measured the CCA value of each node, starting at a random time. Fig. 7 shows the five CCA traces (base/sink node and nodes 1-4) measured at each node every second. The measured values mainly lie between 1 and 4; in this case there is no attack on any node. We can also see that congestion, defined as the value 5, appears quite often in this experiment. This is an important observation, because congestion occurring in a network does not necessarily indicate an attack. This value is related to criticalCountvalue in Fig. 3, and we could empirically obtain it to determine whether an attack is occurring on the mica2 experimental devices. A remaining question was how to define an attacker. In our case we only consider attackers that try to harm or bring down the wireless sensor network. The worst-case attacker is therefore a node jamming the network all the time on a used frequency, so that some nodes are not, or at least mostly not, reachable anymore. To simulate this attacker, we performed a simple experiment: we took the oscilloscope example of TinyOS, changed the delivery rate to 10 ms, and disabled the CCA check. The result is a node jamming the frequency (we used 916.4 MHz) while ignoring all other nodes in the area: a node that behaves very much like an attacker.

4.2 The 2nd Measurement

This experiment measures how the sensor network behaves when an attack happens on it. Its test case consists of one base node, which is always needed to connect Octopus, plus four sensor nodes. Three nodes again transfer their CCA value every second regularly, and one node, acting as the attacker, turns on at a random time and sends an interference signal periodically. What we observed was even more extreme than the expected situation: some of the nodes were completely disconnected and disappeared from the Octopus control GUI, as shown in Fig. 8, while the attacker was in action. In this figure, node 1 (the base node) is connected only with node 2, and the remaining nodes are disconnected. We had to reset the attacker from time to time to obtain some of the values during this knock-out period. As we were able to get the CCA value from time to time, we can assume that the CCA value was not always 5; the value itself is requested and sent by each node every second.
Fig. 7. Measured CCA value of each node (Mote 0: sink node; Mote 1: node 1; Mote 2: node 2; Mote 3: node 3; Mote 4: node 4; x-axis: time in ms, y-axis: CCA value; plots not reproduced)

Fig. 8. An attacker brings the network down
5 Conclusion

We presented a simple approach that can detect an attack’s location as fast as possible using the CCA value. Our original idea was to learn the position of an attacker by collecting CCA values and using them to guess the approximate position. Since an attacker follows no rules and simply jams the network, we considered turning our nodes into a mode that ignores the CCA and tries to broadcast a help message, which we defined as the CCA value 6 (= a lot of congestion). For the feasibility study of the suggested approach, we performed several experiments. As a result, in the absence of an attacker, we found that the measured CCA value mainly lies between 1 and 4 on mica2 nodes, and that the value 5 appears quite often. We also found that communication between nodes was completely cut off when an attacker was present. The CCA pattern we had hoped for, however, could not be measured. In the future, we will analyze an accurate pattern of CCA values to reduce false alarms and will suggest an improved approach.
References
[1] ITU-T: Distributed Computing: Utilities, Grids & Clouds. ITU-T Technology Watch Reports, International Telecommunication Union (2009)
[2] Akyildiz, I., Su, W., Sankarasubramaniam, Y., Cayirci, E.: A Survey on Sensor Networks. IEEE Communications Magazine 40(8), 102–114 (2002)
[3] Kuorilehto, M., Hannikainen, M., Hamalainen, T.: A Survey of Application Distribution in Wireless Sensor Networks. EURASIP Journal on Wireless Communications and Networking 5(5), 774–788 (2005)
[4] Yick, J., Mukherjee, B., Ghosal, D.: Wireless sensor network survey. Computer Networks: Int’l Journal of Computer and Telecommunications Networking 52(12), 2292–2330 (2008)
[5] Wood, A., Stankovic, J.: Denial of Service in Sensor Networks. IEEE Computer 35(10), 54–62 (2002)
[6] Raymond, D., Midkiff, S.: Denial-of-Service in Wireless Sensor Networks: Attacks and Defenses. IEEE Pervasive Computing, 74–81 (2008)
[7] Xu, W., et al.: The Feasibility of Launching and Detecting Jamming Attacks in Wireless Networks. In: MobiHoc 2005: Proc. 6th ACM Int’l. Symp. Mobile Ad Hoc Net. and Comp., pp. 46–57 (2005)
[8] Law, Y., et al.: Link-Layer Jamming Attacks on S-MAC. In: Proc. 2nd Euro. Wksp. Wireless Sensor Networks, pp. 217–225 (2005)
[9] Xu, W., Ma, K., Trappe, W., Zhang, Y.: Jamming Sensor Networks: Attack and Defense Strategies. IEEE Network, 41–47 (2006)
[10] Han, S., Chang, E., Gao, L., Dillon, T.: Taxonomy of Attacks on Wireless Sensor Networks. In: Proc. of the First European Conference on Computer Network Defence (EC2ND 2005), pp. 97–105. Springer, Heidelberg (2005)
[11] Reindl, P., Nygard, K., Du, X.: Defending Malicious Collision Attacks in Wireless Sensor Networks. In: IEEE/IFIP International Conference on Embedded and Ubiquitous Computing, pp. 771–776. IEEE, Los Alamitos (2010)
[12] Kulkarni, R., Venayagamoorthy, G., Thakur, A., Madria, S.: Generalized Neuron Based Secure Media Access Control Protocol for Wireless Sensor Networks. In: Proc. of Computational Intelligence in Multi-criteria Decision-making, pp. 16–22. IEEE, Los Alamitos (2009)
[13] CERT Coordination Center: Smurf IP denial-of-service attacks. Technical report, CERT Advisory CA-98:01 (1998)
[14] Aura, T., Nikander, P.: Stateless connections. In: Han, Y., Qing, S. (eds.) ICICS 1997. LNCS, vol. 1334, pp. 87–97. Springer, Heidelberg (1997)
[15] Singh, V., Jain, S., Singhai, J.: Hello Flood Attack and its Countermeasures in Wireless Sensor Networks. International Journal of Computer Science Issues 11, 23–27 (2010)
[16] Perrig, A., Stankovic, J., Wagner, D.: Security in Wireless Sensor Networks. Communications of the ACM 47(6), 53–57 (2004)
[17] Hu, N., Smith, R., Bradford, P.: Security for fixed sensor networks. In: ACM Southeast Regional Conference, pp. 212–213 (2004)
[18] Yang, H., Ye, F., Yuan, Y., Lu, S., Arbaugh, W.: Toward resilient security in wireless sensor networks. In: Proc. of ACM Symp. on Mobile Ad Hoc Networking and Computing, pp. 34–45 (2005)
[19] Li, Z., Trappe, W., Zhang, Y., Nath, B.: Robust Statistical Methods for Securing Wireless Localization in Sensor Networks. In: Proc. International Symposium on Information Processing in Sensor Networks, pp. 91–98. IEEE, Los Alamitos (2005)
[20] Olariu, S., Xu, Q.: Information assurance in wireless sensor networks. In: Proc. of the 19th IEEE International Parallel and Distributed Processing Symposium. IEEE, Los Alamitos (2005)
[21] Engstrom, J., Gray, C.: Clear Channel Assessment in Wireless Sensor Networks. In: Proc. of the 46th Annual Southeast Regional Conference (ACM-SE 2008), pp. 464–468. ACM, New York (2008)
[22] Ramachandran, I., Roy, S.: Clear Channel Assessment in Energy-Constrained Wideband Wireless Networks. IEEE Wireless Communications, 70–78 (2007)
[23] Ramachandran, I., Roy, S.: On the Impact of Clear Channel Assessment on MAC Performance. In: Proc. of Global Telecommunications Conference, pp. 1–5. IEEE, Los Alamitos (2006)
[24] Culler, D.: TinyOS: Operating System Design for Wireless Sensor Networks. Sensors (2006)
Efficiency of e-NR Labeling for On-the-fly Race Detection of Programs with Nested Parallelism

Sun-Sook Kim1, Ok-Kyoon Ha2, and Yong-Kee Jun2,*

1 Specialized Graduate School for Aerospace Engineering,
2 Department of Informatics, Gyeongsang National University, Jinju 660-701, South Korea
{sskim,jassmin,jun}@gnu.ac.kr
Abstract. On-the-fly race detectors using Lamport’s happened-before relation need a thread labeling scheme for generating unique identifiers that maintain the logical concurrency information of parallel threads. e-NR labeling creates NR labels by inheriting from the parent thread a pointer to their shared information, so it requires a constant amount of time and space on every fork/join operation. Previous work compared e-NR labeling only with OS labeling, using a set of synthetic programs, to support the claim that e-NR labeling is efficient in detecting races during an execution of programs with nested parallelism. This paper empirically compares e-NR labeling with BD labeling, which has the same space and time complexity as OS labeling. The results using OpenMP benchmarks with nested parallelism show that e-NR labeling is more efficient than T-BD labeling by at least 2 times in total monitoring time for race detection, and by at least 3 times in average space for maintaining thread labels. Keywords: Empirical comparison, thread labeling, race detection, nested parallelism, e-NR labeling.
1 Introduction
Data races [2, 11] in parallel programs are a kind of concurrency bug that can occur when two parallel threads access a shared memory location without proper inter-thread coordination and at least one of these accesses is a write. Races must be detected for debugging, because they may lead to unpredictable results. However, it is difficult and cumbersome to detect data races in an execution of a parallel program. Any of the on-the-fly race detection [7, 10]
“This research was supported by Basic Science Research Program through the National Research Foundation of Korea(NRF) funded by the Ministry of Education, Science and Technology(2010-0024778)”. Corresponding author: In Gyeongsang National University, he is also involved in the Research Institute of Computer and Information Communication (RICIC).
techniques need a representation of Lamport’s happened-before relation [9] for generating unique identifiers that maintain the logical concurrency information of parallel threads. In general, the time consumed by on-the-fly race detection techniques consists of two factors: one is determining races and maintaining access histories on each access to a shared variable, and the other is generating a label for each thread to determine races during an execution of the parallel program. The space also consists of two factors: one is storing the access histories of all shared variables, and the other is maintaining the labels of concurrent threads. The time and space complexity of the thread labeling scheme thus contributes directly to the efficiency of on-the-fly race detection, so we can reduce the time and space cost of race detection by improving the efficiency of the thread labeling scheme. e-NR labeling [6], which improves NR labeling, creates NR labels [8] using shared ancestor information, and it does not depend on the nesting depth of a parallel program; it therefore requires only a constant amount of time and space. This approach uses a pointer in each thread label to refer to its ancestor list by inheritance or update. For the reference, it maintains only the pointer to the ancestor list instead of the list structure of the original NR labeling. The time to generate a unique identifier for each thread is O(1), and the storage space for the concurrency information is O(V + T) in the worst case. Previous work compared e-NR labeling only with OS labeling, using a set of synthetic programs, to support the claim that e-NR labeling is efficient in detecting races during an execution of programs with nested parallelism [3]. This paper empirically compares e-NR labeling with BD labeling [1, 12], which has the same space and time complexity as OS labeling [6]. Our experiments were performed on OpenMP benchmark programs [5] with nesting depths of two or three and maximum parallelism varying from 2 to 64. The results using OpenMP benchmarks with nested parallelism show that e-NR labeling is more efficient than T-BD labeling by at least 2 times in total monitoring time for race detection, and by at least 3 times in average space for maintaining thread labels. We organize this paper as follows. Section 2 describes the notion of nested parallel programs and on-the-fly race detection. Section 3 introduces e-NR labeling and T-BD labeling for detecting races in programs with nested parallelism. In Section 4, we empirically compare and analyze the efficiency of the two labeling schemes used for on-the-fly race detection with OpenMP benchmark programs. In the last section, we conclude our results.
2 Background
Parallel or multi-threaded programs are a natural consequence of the fact that multi-processor and multi-core systems are already common. This section illustrates the notion of nested parallelism and introduces on-the-fly race detection.
#pragma omp parallel for private( i )
for (i = 1; i <= 2; i++) {
    do something
    if (i % 2 == 0) {
        #pragma omp parallel for private( j )
        for ( j = 1; j <= i; j++) {
            do something
        }
        #pragma omp parallel for private( j )
        for ( j = 1; j <= i; j++) {
            do something
        }
    }
}

Fig. 1. An example of the nestable fork/join model with parallel execution and its POEG (the POEG graph is not reproduced)
2.1 Program Model
Since parallel programs were introduced, various models of them have been presented. OpenMP [13, 14] is the de facto standard for portable and scalable parallel programming, and it uses a fork/join model with nested parallelism [3]. OpenMP programs contain loop-level parallelism that employs structured fork/join execution of threads, which makes parallel execution efficient with low overhead. The speedup of a parallelized program increases with the fraction of parallelized regions in the program [3]; in general, a nearly ideal speedup is obtained when the rate of parallelized regions is larger than 90%. It is difficult to obtain sufficient parallelism using a fork/join model of parallel execution without nested parallelism, because a parallel loop would execute its nested loops sequentially on each parallel thread. The main benefits of nested parallelism on a multi-core system are thus the portable extension of parallel regions and the speedup of program executions. Nested parallelism [3] is defined as multiple levels of parallelism. A loop is called an inner-most loop if there is no other loop contained in its body; otherwise, it is called an outer loop. Any loop in the program may contain one or more other loops, called nested loops, in its body. In other words, an individual loop can be enclosed by many other loops in a nested parallel loop. The nesting depth N of a program is the maximum value of the nesting levels in the program, and there exists at least one loop nested to level N. If there is exactly one loop at each nesting level, it is called a one-way nested loop. A loop is a multi-way nested loop
if there exist two or more disjoint loops at the same nesting level. This paper considers nested parallel loops without inter-thread coordination in OpenMP programs. Fig. 1 shows a multi-way nested parallel program of nesting depth two. If we remove the first nested loop indexed by j in the figure, the program becomes a one-way nested parallel program with nesting depth two. Let Ii denote the loop index of a loop Li, let li and Ui denote the lower and upper bound of Ii respectively, and let ai denote the increment of Ii. A loop Li is normalized if the values of both li and ai are one; ai is optional when it is equal to one. To make our presentation simple, we assume that all parallel loops are normalized. Fig. 1 shows a normalized loop with index j for which the lower bound is one, the upper bound is i, and the increment is one. The graph in Fig. 1 represents an execution of the program shown in the figure as a directed acyclic graph called a POEG (Partial Order Execution Graph) [10]. In this graph, a vertex means a fork or join operation for parallel threads, and an arc starting from a vertex represents a thread started by the vertex. Using such a graph, we can easily figure out the happened-before relation between any pair of threads. For example, the POEG in Fig. 1 shows a partial order on the threads in an execution instance of the two-way nested loop.
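For readers unfamiliar with nested parallelism in OpenMP, the following minimal C sketch (ours, not taken from the benchmarks) shows that nested parallel regions must typically be enabled explicitly; omp_get_level() reports the nesting level at runtime.

#include <omp.h>
#include <stdio.h>

int main(void) {
    omp_set_nested(1);   /* enable nested parallelism (disabled by default) */
    #pragma omp parallel for num_threads(2)
    for (int i = 1; i <= 2; i++) {
        #pragma omp parallel for num_threads(2)
        for (int j = 1; j <= i; j++) {
            printf("outer i=%d, inner j=%d, level=%d\n",
                   i, j, omp_get_level());
        }
    }
    return 0;
}

Without omp_set_nested(1) (or OMP_NESTED=TRUE in the environment), the inner loops of Fig. 1 would each run on a single thread, and the execution would exhibit only one level of parallelism.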
2.2 On-the-fly Race Detection
Data races [2, 11] in parallel programs are a kind of concurrency bug that may occur when two parallel threads access a shared memory location without proper inter-thread coordination and at least one of the accesses is a write. A parallel program may not have the same execution order for the same input, because the thread execution order is timing-dependent, so it is difficult for a programmer to figure out when the program runs into a race; unpredictable and mysterious results due to data races may be reported to the programmer. Thus races must be detected for debugging, but this is difficult and cumbersome. Much previous work has developed techniques for race detection, focusing on static and dynamic analysis. Static analysis techniques may lead to too many false positives, because they report infeasible races by analyzing only the source code without any execution; moreover, using these techniques for detecting feasible races in a program with dynamically allocated data is NP-hard [11]. Dynamic analysis includes trace-based post-mortem techniques and protocol-based on-the-fly techniques, which report only feasible races that occurred in an execution of a program. Post-mortem techniques collect information on the accesses that occurred into a trace file during an execution of a program, and then analyze the traced information or re-execute the program to detect feasible races. However, this approach is inefficient because it requires large time and space costs for tracing accesses and threads. On-the-fly techniques are based on two different methods: lock-set analysis and happened-before analysis. Lock-set analysis reports races in the monitored
program by checking violations of locking discipline. This technique is simple and can be implemented with low overhead. However, lock-set analysis may lead to many false positives, because it ignores synchronization primitives other than common locks, such as signal/wait, fork/join, and barriers [7]. Happened-before analysis [9] is based on logical timestamps [1, 6, 8-10] and a protocol for race detection. This method reports races between the current access and maintained previous accesses by comparing their happened-before relation, so it reports more precise results than the lock-set analysis method. Mellor-Crummey’s protocol [10] and Dinning’s protocol [4] are commonly used for race detection. This paper uses only the former protocol, in consideration of our program model.
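As a minimal concrete illustration of such a race (our own example, not from the paper), the nested OpenMP loops below update a shared counter without coordination; the final value is nondeterministic, and a happened-before detector would flag the unsynchronized accesses.

#include <stdio.h>

int main(void) {
    int shared = 0;                  /* shared memory location */
    #pragma omp parallel for
    for (int i = 0; i < 2; i++) {
        #pragma omp parallel for
        for (int j = 0; j < 2; j++) {
            shared++;                /* unsynchronized read-modify-write */
        }
    }
    printf("shared = %d\n", shared); /* lost updates: result varies by run */
    return 0;
}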
3 Thread Labeling Schemes
Thread labeling schemes generate unique identifiers that maintain the logical concurrency information of parallel threads. Any on-the-fly race detection technique based on happened-before analysis needs a thread labeling scheme for monitoring parallel threads. This section briefly introduces T-BD labeling and e-NR labeling, which are representative efficient thread labeling schemes for monitoring programs with nested parallelism.

3.1 T-BD Labeling
The original BD labeling generates a label on every fork event or thread coordination event, such as message send and receive, in addition to every logical thread. To monitor an execution of a structured fork/join program such as an OpenMP program, it is sufficient for each logical thread in the POEG to have just one label. We therefore use a modified BD labeling, called Thread-based BD (T-BD) labeling [12], which is more efficient because it generates only one label per thread. A T-BD label consists of a pair of breadth (B) and depth (D) values, which is maintained using a list structure. The value B indicates the position of the thread among its sibling threads, and the value D counts the number of threads that happened before the thread on the critical path. The list of a thread consists of the information of its parent threads on the critical path. For the current thread Ti at nesting level i, we denote BDi = (Bi, Di), and the list Li of Ti consists of entries of three fields, denoted (ti−1, Si, Di−1), where ti−1 is the loop index of the parent thread (which is less than or equal to Si), Si is the number of siblings of the current thread created by the fork operation, and Di−1 is the depth value of the parent thread. The breadth value of Ti is defined as Bi = Bi−1 + (ti − 1) · ∏(k=1..i−1) Li(k, S), where B0 = 0 and Li(k, S) stands for the Si field of Li(k). From this equation, we have B1 = t1 − 1, which is the remainder of dividing Bi by Li(1, S). For example, consider the thread T7 in Fig. 2(a), whose BD label is (3,5) at the second nesting level. The B2 value of the thread is three, because B0 = 0, B1 = B0 + 1 = 1, and B2 = B1 + 1 × 2 = 3. L3(3, S) of the thread is zero, because the thread does not yet know the number of threads to be forked next.
Fig. 2. Example of the two labeling schemes for the program in Fig. 1: (a) T-BD labeling; (b) e-NR labeling with its OH-list of one-way roots (label graphs not reproduced)
Whenever a thread at nesting level i is forked, the algorithm computes {Bi, Di, ti} and Li(i, S) to generate the BD label of the current thread. We implemented T-BD labeling with the four algorithms T-BD Init(), T-BD Fork(), T-BD Join(), and T-BD Ordered() introduced in [12].
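To illustrate the breadth formula above, the following C sketch (our reconstruction, not the authors' code) computes Bi from the ancestor list, treating the product of sibling counts as a running radix.

typedef struct { int t; int S; int D; } ListEntry;  /* one (t, S, D) entry per level */

/* Compute the breadth value of a thread at nesting level i from its list L,
   where L[k] holds the (t, S, D) entry for level k (1-based, as in the paper). */
int breadth(const ListEntry *L, int i) {
    int B = 0;       /* B0 = 0 */
    int radix = 1;   /* running product of L(k, S) for k below the current level */
    for (int k = 1; k <= i; k++) {
        B += (L[k].t - 1) * radix;   /* Bk = Bk-1 + (tk - 1) * product */
        radix *= L[k].S;
    }
    return B;
}

For the thread T7 of Fig. 2(a), with t1 = 2, S1 = 2, and t2 = 2, this yields B1 = 1 and B2 = 1 + 1 × 2 = 3, matching the worked example above.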
3.2 e-NR Labeling
An NR label [8] consists of a pair of a join counter and a nest region of the thread, maintained using a list structure. The nest region is a range of the number space created by thread operations, and the join counter counts the number of joined threads on the critical path. The list structure maintains information on the parent threads associated with fork operations on the critical path. The size of the list depends on the nesting depth, so the efficiency of NR labeling depends on the nesting depth. e-NR labeling [6] was designed to remove this dependency on the nesting depth. The approach focuses on improving the list of ancestor information, called the one-way history OH. The basic idea is to use a pointer in a thread label to refer to its OH, obtained from the parent threads by inheritance or update, rather than regenerating it. e-NR labeling creates NR labels, which consist of the join counter and the nest region, called the one-way region OR, in the same way as the original NR labeling [8, 12]. e-NR labeling is designed around a single OH-list that maintains the one-way roots for all thread labels; for the OH of each thread label, this labeling uses a pointer that refers selectively into the OH-list. Fig. 2(b) shows an example of e-NR labeling using an OH-list and the pointers for the program shown in Fig. 1. In this figure, a “∗” symbol marks a thread OH referring to the OH-list for a joined thread label, and “∗∗” symbols mean that
forked threads only refer to the thread OH of their one-way root. For the OH of a thread label, e-NR labeling employs a one-way root counter ξ in the OR of a thread, denoted by [ξ, λ, <α, β>]. Thus, a joined thread Ti has a label consisting of OR(Ti) and OH(Ti), denoted by OR(Ti) ∗ OH(Ti). Similarly, a forked thread Tj has a label denoted by OR(Tj) ∗∗ OH(Tj). This labeling manages the OH-list only on join operations. If the OR(Ti) of a joined thread Ti already exists in the OH-list, the λ of the one-way root is changed to λi; otherwise, OR(Ti) is added as a new one-way root in the OH-list. For example, consider the two threads T5 and T8 in Fig. 2(b). The OR of the joined thread T5 is added into the OH-list because OR(T5) does not yet exist in the OH-list, and the pointer to the new one-way root is added to the thread OH of T5. In the case of T8, OR(T8) already exists in the OH-list, so the join count value two is changed to three, the join count value of T8.
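The constant-time property can be seen from a data-structure sketch like the following (our illustration of the description above; all field and function names are our own assumptions): each label stores its own one-way region and a single pointer into the shared OH-list, so a fork copies a pointer instead of an ancestor list.

typedef struct {
    int xi;          /* one-way root (join) counter */
    int lambda;      /* join counter */
    int alpha, beta; /* one-way (nest) region <alpha, beta> */
} OneWayRegion;

typedef struct {
    OneWayRegion  or_;  /* the thread's own one-way region */
    OneWayRegion *oh;   /* pointer into the shared OH-list (ancestor info) */
} ENRLabel;

/* O(1) fork sketch: the child inherits the parent's OH pointer; only the
   child's own region is filled in, regardless of the nesting depth. */
ENRLabel enr_fork(const ENRLabel *parent, OneWayRegion child_region) {
    ENRLabel child;
    child.or_ = child_region;
    child.oh  = parent->oh;   /* no list copy, no dependence on depth */
    return child;
}

Because neither structure grows with the nesting depth, generating a label takes O(1) time, in line with the complexity claimed in Section 1.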
4 Empirical Comparison
In this section, we empirically compare e-NR labeling with T-BD labeling using OpenMP benchmark programs and analyze the experimental results on the efficiency of the two labeling schemes.

4.1 Experimentation
To compare the efficiency of the two labeling schemes, our experiments were carried out on a system with two Intel quad-core CPUs and 4 GB of memory under a Linux operating system with kernel 2.6. We installed gcc 4.4.4 for OpenMP 3.0 on the system. e-NR labeling and T-BD labeling are implemented as run-time libraries written in C, which are inserted into the target program; we then compile and execute the instrumented program for labeling and race detection. For on-the-fly race detection with the two labeling schemes, we employ Mellor-Crummey’s protocol [10], because the protocol exploits a simple technique, called the left-of relation, to compare any two threads, and requires small overhead for detecting races during an execution of a parallel program. The protocol is also implemented as a run-time library. We use OmpSCR (the OpenMP Source Code Repository) applications [5], which allow us to measure the performance of OpenMP programs. OmpSCR applications are coded in ANSI C/C++ or standard Fortran 90/95 and comprise both kernel applications and complete applications with simple code under 5000 lines. We use three applications of the OmpSCR; Table 1 specifies their features.

Table 1. The features of the three benchmarks

              Mandelbrot   FFT6   Jacobi 1   Jacobi 2
Multi-way     1            7      2          2
Nested loop   2            3      2          2
Iteration     100K         10K    1000K      1000K

The details of the applications are as follows:
– Mandelbrot computes an estimate of the Mandelbrot set area using Monte Carlo sampling. The Mandelbrot set is a fractal defined as a set of points in the complex plane. The input parameter for this program is the number of points in the complex plane to be explored. This application consists of a one-way nested parallel loop with nesting depth two. In our experiments, we use a constant of 100,000 points to explore.
– FFT6 is an FFT algorithm that uses Bailey’s 6-step scheme. FFTs are used to solve digital signal processing problems, partial differential equations, algorithms for quick multiplication of large integers, and so on. The 6-step FFT consists of a seven-way nested parallel loop with nesting depth three. We insert 27 input signals to store the vectors and complex arrays as the first parameter of the application.
– Jacobi solves a finite difference discretization of the Helmholtz equation using the well-known Jacobi iterative method, implemented with two parallel loops with nested parallelism. We use two versions of the Jacobi program: one implements each parallel loop in its own parallel region, and the other covers the two parallel loops with one parallel region. These applications consist of two-way nested parallel loops with nesting depth two. We set the total iteration count of the loops to 1,000,000.
The iteration counts of the parallel loops are shown in Table 1; each application uses a fixed total iteration count between 10,000 and 1,000,000. We measure the time and space overheads of the two labeling schemes using these benchmarks while increasing the number of available parallel threads from 2 to 64.

4.2 Results and Analysis
We compare the efficiency of e-NR labeling with that of T-BD labeling by analyzing the time and space overheads. Table 2 shows the measured time and space overhead of both labeling schemes with FFT6.

Table 2. The measured time and space overhead with FFT6

                           Threads
                         2      4      8      16     32     64
Original   Time (Sec)    20.7   10.2   5.7    5.3    5.4    5.5
           Memory (MB)   1.2    2.2    4.8    8.2    14.9   33.1
e-NR       Time (Sec)    29.4   32.9   32.4   26.9   28.5   30.2
           Memory (MB)   750.3  749.3  749.2  750.5  749.4  791.5
T-BD       Time (Sec)    34.9   37.5   35.7   34.2   39.3   40.4
           Memory (GB)   2.47   2.44   2.47   2.43   2.43   2.44
Fig. 3. Run-time overhead (%) of the two labeling schemes as the number of threads increases from 2 to 64: (a) Mandelbrot (e-NR labeling vs. T-BD labeling); (b) Jacobi (e-NR and T-BD labeling on Jacobi 1 and Jacobi 2) (plots not reproduced)
In the results, e-NR labeling shows a 3.4-times slowdown compared with the original execution but is 1.2 times faster than T-BD labeling in average execution time. The space required by e-NR labeling is 70 times larger than the original, while T-BD labeling requires 230 times more space than the original execution on average. The space overhead of e-NR labeling is moderate enough for large parallel applications, but the space overhead of T-BD labeling increases dramatically. Fig. 3 shows the time overhead of monitoring the parallel execution of two applications, Mandelbrot and Jacobi, under the two labeling schemes. Fig. 3(a) shows the time overhead of the two labeling schemes for Mandelbrot: the slowdown of e-NR labeling is 19.5% on average, while T-BD labeling slows the program down by 59.9% on average. In Fig. 3(b), the slowdowns of e-NR labeling are 28.9% and 34.2% when monitoring Jacobi 1 and Jacobi 2 respectively, whereas the slowdowns of T-BD labeling for the two Jacobi programs are 63.6% and 70.2% on average. These empirical results in Table 2 and Fig. 3 show that e-NR labeling is more efficient than T-BD labeling by at least 2 times in total monitoring time for race detection, and by at least 3 times in average space for maintaining thread labels. Thus, the efficiency of e-NR labeling makes on-the-fly race detection more practical.
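These averages follow directly from Table 2. Averaging over the six thread counts:

    Original time:  (20.7 + 10.2 + 5.7 + 5.3 + 5.4 + 5.5) / 6 ≈ 8.8 s
    e-NR time:      (29.4 + 32.9 + 32.4 + 26.9 + 28.5 + 30.2) / 6 ≈ 30.1 s
    T-BD time:      (34.9 + 37.5 + 35.7 + 34.2 + 39.3 + 40.4) / 6 ≈ 37.0 s

which gives 30.1 / 8.8 ≈ 3.4 for the e-NR slowdown and 37.0 / 30.1 ≈ 1.2 for its speed advantage over T-BD. Similarly, the average memory figures (about 10.7 MB for the original, 757 MB for e-NR, and 2.45 GB for T-BD) yield the roughly 70-fold and 230-fold space factors quoted above.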
5 Conclusion
To support the claim that e-NR labeling is efficient in detecting races during an execution of programs with nested parallelism, this paper empirically compared e-NR labeling with a modified BD labeling, called T-BD labeling, which has the same time and space complexity as OS labeling. The results using well-known OpenMP benchmarks with nested parallelism show that e-NR labeling is more efficient than T-BD labeling by at least 2 times in total monitoring time for race detection, and by at least 3 times in average space for maintaining thread labels.
References

1. Audenaert, K.: Clock trees: logical clocks for programs with nested parallelism. IEEE Transactions on Software Engineering 23(10), 646–658 (1997)
2. Banerjee, U., Bliss, B., Ma, Z., Petersen, P.: A theory of data race detection. In: Proceedings of the 2006 Workshop on Parallel and Distributed Systems: Testing and Debugging, PADTAD 2006, pp. 69–78. ACM, New York (2006)
3. Bücker, H.M., Rasch, A., Wolf, A.: A class of OpenMP applications involving nested parallelism. In: Proceedings of the 2004 ACM Symposium on Applied Computing, SAC 2004, pp. 220–224. ACM, New York (2004)
4. Dinning, A., Schonberg, E.: Detecting access anomalies in programs with critical sections. SIGPLAN Not. 26, 85–96 (1991)
5. Dorta, A.J., Rodriguez, C., de Sande, F., Gonzalez-Escribano, A.: The OpenMP source code repository. In: Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-Based Processing, pp. 244–250. IEEE Computer Society, Washington, DC (2005)
6. Ha, O.K., Kim, S.S., Jun, Y.K.: Efficient thread labeling for monitoring programs with nested parallelism. In: Kim, T.H., Vasilakos, T., Sakurai, K., Xiao, Y., Zhao, G., Ślęzak, D. (eds.) Communication and Networking. CCIS, vol. 120, pp. 227–237. Springer, Heidelberg (2010)
7. Jannesari, A., Tichy, W.F.: On-the-fly race detection in multi-threaded programs. In: Proceedings of the 6th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging, PADTAD 2008, p. 6. ACM, New York (2008)
8. Jun, Y.K., Koh, K.: On-the-fly detection of access anomalies in nested parallel loops. SIGPLAN Not. 28, 107–117 (1993)
9. Lamport, L.: Time, clocks, and the ordering of events in a distributed system. Commun. ACM 21, 558–565 (1978)
10. Mellor-Crummey, J.: On-the-fly detection of data races for programs with nested fork-join parallelism. In: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing, pp. 24–33. ACM, New York (1991)
11. Netzer, R.H.B., Miller, B.P.: What are race conditions?: Some issues and formalizations. ACM Lett. Program. Lang. Syst. 1, 74–88 (1992)
12. Park, S.H., Park, M.Y., Jun, Y.K.: A comparison of scalable labeling schemes for detecting races in OpenMP programs. In: Eigenmann, R., Voss, M. (eds.) WOMPAT 2001. LNCS, vol. 2104, pp. 68–80. Springer, Heidelberg (2001)
13. Petersen, P., Shah, S.: OpenMP support in the Intel Thread Checker. In: Voss, M.J. (ed.) WOMPAT 2003. LNCS, vol. 2716, pp. 1–12. Springer, Heidelberg (2003)
14. The OpenMP API specification for parallel programming, http://www.openmp.org
Lightweight Labeling Scheme for On-the-fly Race Detection of Signal Handlers

Guy Martin Tchamgoue, Ok-Kyoon Ha, Kyong-Hoon Kim, and Yong-Kee Jun

Department of Informatics, Gyeongsang National University, Jinju 660-701, South Korea
[email protected], {jassmin,khkim,jun}@gnu.ac.kr
Abstract. Data races represent one of the most notorious classes of bugs in shared-memory parallel programs, since they are hard to reproduce and can lead programs into unintended nondeterministic executions. However, data races can also show up in sequential programs that use signals, when a signal handler and the sequential program both access a shared variable without proper synchronization and at least one of the accesses is a write. Unfortunately, existing tools for race detection in such programs incur high time and space overhead that drastically slows down the monitored program. This overhead is partly due to the use of costly and inappropriate timestamps originally designed for ordering threads in multithreaded programs. This paper presents a lightweight labeling scheme for efficient on-the-fly race detection in sequential programs that use signal handlers. The scheme generates constant-sized labels for the program and its signal handlers and determines the logical concurrency between instructions accessing the same shared memory. Generating and comparing concurrency information are both done in constant time. The efficiency of the technique and its limited footprint on program execution make on-the-fly race detection more practical for sequential programs. Keywords: Data races, sequential programs, signal, signal handlers, labeling scheme, on-the-fly race detection.
1 Introduction
Data races [3,10], a well-known class of bugs in shared-memory parallel programs, can also occur in sequential programs that use signal handlers. As in the parallel environment, a data race may happen when a signal handler and the sequential program both access a shared variable without proper synchronization, with at least one write access. Data races in sequential programs
This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency), NIPA-2010(C1090-1031-0007). Corresponding Author: In Gyeongsang National University, he is also involved in the Research Institute of Computer and Information Communication (RICIC).
are due to the asynchronous nature of signals and to the use of non-reentrant functions in signal handlers. Detecting data races in sequential programs is therefore crucial, since many useful programs such as web browsers, web servers, text editors, command interpreters, and graphical user interfaces heavily use signals and may contain harmful data races. Zalewski [15] reported that many well-known UNIX programs (Sendmail, Screen, pppd) and several daemons (ftpd, httpd, MTA) were vulnerable and exposed to serious security problems due to data races in signal handlers. Moreover, a data race in a signal handler that could have been exploited to perform remote authentication on a server was reported and fixed in OpenSSH [5, Chapter 13]. To dynamically detect data races in sequential programs that use signals, existing solutions [12] propose to adapt tools for multithreaded programs to sequential programs. However, these tools generally use logical timestamps such as vector clocks [1,2,6,7] to provide the happens-before relationship between concurrent signal handlers and the sequential program. This approach incurs high space and time overhead, partly due to the size and number of vector clocks, which may increase severely with the number of signals received during the program execution. To reduce this overhead, a new approach based on the /proc file system (for Solaris 10) or the debug registers (for IA32 Linux) is presented in [13], using watchpoints to monitor accesses to shared variables. This approach is certainly more scalable than the previous one, but it still incurs high time overhead and is strongly tied to the operating system. Moreover, other existing on-the-fly race detection tools [4,8,9] generally generate labels whose size depends on the maximum number of concurrent threads at runtime; they are therefore also inappropriate for signal handlers, since they incur high overhead as the number of signals grows. In order to efficiently detect data races on-the-fly in sequential programs that use signal handlers, this paper presents our lightweight labeling scheme tailored to this class of programs. This technique reduces the time and space overhead of practical race detection in sequential programs by proposing a constant-sized label for each task. Contrary to logical timestamps borrowed from multithreaded programming, the number of labels generated by our scheme does not increase with the number of signals received, and the time to compare two labels is also constant. This makes our technique more scalable and appropriate for on-the-fly race detection in sequential programs. For the remainder of this paper, Section 2 gives an overview of signal handling in Unix-like environments. Section 3 presents our lightweight labeling scheme. Section 4 theoretically compares the complexity of our labeling scheme with the vector clocks used in [12]. Section 5 finally concludes the paper.
2 Signal Handling Overview
Most Unix programmers have dealt with signals at least once. A signal is an interrupt that is used to notify a process that an event has occurred, so that the process can respond to that event accordingly. A signal is thus a
message that can be sent to a process by the kernel or by another process using the kill system call. A signal can also be sent by a process to itself. A signal is represented by a symbolic name and a unique integer number as identifier. For example, when a user resizes a program window, the signal number 28, called SIGWINCH, is sent to the corresponding process. The process then responds to the signal by updating the size of its window to fit the needs of the user. Before using a signal, a signal handler must be registered with the kernel using the signal() or the sigaction() system call. A signal handler is simply a function that is automatically invoked when the corresponding signal is received. It is worth noticing that each signal has a default handler or action. Depending on the signal, the default action may be to terminate the receiving process, to suspend the process, or just to ignore the signal. Signals can be blocked or unblocked, but some signals like SIGSTOP or SIGKILL cannot be caught, blocked, or ignored. Signals may be generated synchronously or asynchronously. A synchronous signal pertains to a specific action in the program and is generally generated by some error in the program (e.g., SIGFPE and SIGILL). Asynchronous signals are generated by events outside the control of the process that receives them and may arrive at unpredictable times during execution (e.g., SIGINT and SIGWINCH). Signals are not as inoffensive as they may appear. Misusing signals may, in fact, expose the program to serious vulnerability problems due to data races [5,12,13,15]. When a signal is delivered to a process, the normal execution flow of the process is preempted by the registered signal handler. Control is returned to the process only at the end of the signal handler execution. This mechanism introduces logical parallelism into the preempted process. Thus, shared variables between the program and the signal handlers may be subject to data races. There is an asymmetric preemption relation between the signal handler and the sequential program: the signal handler can preempt the sequential program, but not the inverse. Only a few library functions and system calls, referred to as async-signal-safe [14,15], are safe and recommended for use in signal handlers. For example, printf() is classified as non-async-signal-safe, whereas getuid() and rmdir() are async-signal-safe. Signal handlers must therefore be written efficiently so as not to monopolize the processor, since they can also preempt each other.
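To make the hazard concrete, consider the following minimal C program (our own illustrative sketch, not taken from the paper): the handler and the main loop both access the shared variable counter without synchronization, and the read-modify-write in the loop can be preempted by the handler between its load and its store, which is precisely the kind of race discussed here.

#include <signal.h>
#include <stdio.h>

volatile long counter = 0;            /* shared between main() and the handler */

/* The handler writes the shared variable: one side of the race. */
static void on_sigint(int signum)
{
    (void)signum;
    counter = 0;                      /* unsynchronized write */
}

int main(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGINT, &sa, NULL);     /* register the handler */

    for (;;) {
        counter = counter + 1;        /* read-modify-write: the other side  */
        if (counter % 100000000 == 0) /* printf is not async-signal-safe,   */
            printf("%ld\n", counter); /* but here it runs outside handlers  */
    }
}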
3 A Lightweight Labeling Scheme for Signal Handlers
This section describes our lightweight labeling scheme, which is used to capture the logical concurrency introduced by signal handlers in sequential programs. Though the programming model of signals seems simple, their execution and concurrency models may be very complex due to their asynchronous nature.

3.1 General Concepts
For efficient on-the-fly race detection, it is essential to determine whether a task or thread accessing a shared memory location is logically concurrent with any prior
accesses to this variable. The effectiveness of the technique relies mostly on the cost and the accuracy of this test. To achieve this, we divide our program into blocks and assign a label to each of them. The label of each block is a unique identifier that makes it possible to determine the logical concurrency at runtime when a shared variable is accessed. A block is defined as a sequence of instructions executed by a single thread that does not include signal concurrency primitives. In this definition, thread refers to either the sequential program or a signal handler. The signal concurrency primitives we are interested in are signal registration (e.g., the signal() or sigaction() system calls) and signal blocking and unblocking (e.g., the sigprocmask() system call) primitives. A new block is created whenever a signal is registered with the kernel, blocked, or unblocked. Each block receives a label comprising a block identifier and a signal list. The block identifier is a single number which is automatically increased by one whenever a new block is created. Since the program runs sequentially, this simple numbering mechanism is strong enough to keep the happens-before relationship between blocks of the sequential program. The signal list of a block represents the list of all signals that may be concurrent with that block. Unless explicitly blocked, a signal is concurrent with the whole program from its registration point. Each signal list entry is composed of the following three elements:

– Signal Number: a unique integer identifying each signal.
– Signal Handler Address: the runtime memory address of the signal handler.
– Signal Status: the state of the signal in the corresponding block; a signal can be either blocked or enabled.

When a signal is first registered, a new block is created whose signal list contains the registered signal with its status set to "enabled". If the signal is later blocked, the status is turned to "disabled" in the corresponding block. Consecutive blocks that do not contain any instruction are merged into one single block (an example is shown in Fig. 2). The maximum size of a signal list is fixed to 64, which is the total number of available signals; in practice, programs handle only a reduced subset of these signals. Each new block inherits and updates the signal list of the preceding block. Each signal handler is considered as a single block and, as such, is also associated with a unique label. When a signal handler is invoked, it receives a label consisting of the signal number and the signal handler memory address. Sometimes, the same signal may change its handler at runtime; the signal handler memory address is useful to distinguish between different handlers of the same signal (e.g., Fig. 2). With this technique, the label of a given signal handler remains unchanged regardless of the number of invocations. This is an important property, since the number of signals received by a program at runtime can grow exponentially.

Example 1. Consider Fig. 1, which shows an example program that creates and registers four signal handlers. A SIGWINCH and a SIGALRM handler are first registered (cf. lines 14 and 15, Fig. 1); then, later in the code, SIGWINCH is
 1: #include <signal.h>
 2: #include <stdio.h>
 3:
 4: volatile int x=0;
 5:
 6: void sh1(int signum){x=...; ...; if()x=...; =x;}
 7: void sh2(int signum){=x...; ...; x=...;}
 8: void sh3(int signum){x=...; ...;}
 9: void sh4(int signum){=x...; ...;}
10:
11: int main(int argc, char* argv[]){
12:   sigset_t nmask, omask;
13:   ...;
14:   signal(SIGWINCH, sh2);
15:   signal(SIGALRM, sh3);
16:   ...;
17:   sigemptyset(&nmask);
18:   sigaddset(&nmask, SIGWINCH);
19:   sigprocmask(SIG_BLOCK, &nmask, &omask);
20:   signal(SIGINT, sh1);
21:   ...;
22:   sigemptyset(&nmask);
23:   sigaddset(&nmask, SIGWINCH);
24:   sigprocmask(SIG_UNBLOCK, &nmask, &omask);
25:   signal(SIGINT, sh4);
26:   ...;
27:   return 0;
28: }
Fig. 1. An Example Program with Signal Handlers
blocked while a SIGINT handler is registered (cf. lines 19 and 20, Fig. 1). Finally, the program unblocks SIGWINCH (cf. line 24, Fig. 1) and registers SIGINT again (cf. line 25, Fig. 1) with a new signal handler. Fig. 2 presents an example of labeling for this program. The circles on the sequential program execution flow represent the block creation points, which correspond to the registration, blocking, or unblocking points of signals. Block 1 has no signal in its list, so none of the signals is concurrent with this block. Block 2 contains the signals SIGWINCH (28) and SIGALRM (14). Each signal handler is labeled by both its corresponding signal number and its memory address. A dashed line means that the given signal handler is blocked or not yet registered in the corresponding block.
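To make the label contents concrete, the information described above could be represented in C as follows (a minimal sketch of our own; the paper does not prescribe a concrete data layout, so all names here are illustrative):

#include <stdint.h>

#define MAX_SIGNALS 64   /* total number of available signals */

enum sig_status { SIG_DISABLED = 0, SIG_ENABLED = 1 };

/* One entry of a block's signal list. */
struct sig_entry {
    int             signum;   /* unique signal number, e.g. 28 for SIGWINCH */
    uintptr_t       handler;  /* runtime address of the registered handler  */
    enum sig_status status;   /* enabled or blocked in this block           */
};

/* Label of a block in the sequential program. */
struct block_label {
    unsigned         block_id;  /* incremented by one at each new block     */
    int              nsignals;  /* number of entries currently in use       */
    struct sig_entry sigs[MAX_SIGNALS];
};

/* Label of a signal handler: constant across all its invocations. */
struct handler_label {
    int       signum;
    uintptr_t handler;
};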
3.2 Detecting Concurrent Blocks
Detecting concurrent blocks becomes simple and immediate with our labeling. All generated labels are constant-sized and composed only of numbers, which facilitates their comparison. A signal handler is concurrent with a block in the sequential program if the signal list of the block contains the label of this signal handler with its state set to enabled (i.e., one in the example of Fig. 2). If a signal handler is concurrent with a given block, it is also concurrent with all other signals that are concurrent with that block. A signal handler can preempt the execution of another signal handler, or of the same signal handler if the given signal is reentrant. Since blocks in the sequential program are ordered by the happens-before relation, they are not concurrent with each other.

Fig. 2. Example of Labeling for the Program in Fig. 1

Example 2. In Fig. 2, the signal handler sh2 with label (28, 0x2) is concurrent with blocks 2 and 4, but not with block 3, since it is blocked in this block. Similarly, sh1 with label (2, 0x3) is concurrent only with block 3, since signal SIGINT is registered in block 4 with a new handler sh4. Also, block 4 is concurrent with the signal handlers (14, 0x1), (28, 0x2), and (2, 0x4) but not with (2, 0x3). However, there is no concurrency relation between blocks 1, 2, 3, and 4; they are ordered by the happens-before relation.
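With a representation like the one sketched earlier (again our own illustrative code), the concurrency test described in this section reduces to a bounded scan of the block's signal list, which is why label comparison takes constant time:

/* Returns 1 if the handler identified by `h` is logically concurrent
 * with the sequential-program block labeled `b`, and 0 otherwise.
 * The loop is bounded by MAX_SIGNALS (64), so the test runs in O(1). */
int is_concurrent(const struct block_label *b, const struct handler_label *h)
{
    for (int i = 0; i < b->nsignals; i++) {
        if (b->sigs[i].signum  == h->signum &&
            b->sigs[i].handler == h->handler &&
            b->sigs[i].status  == SIG_ENABLED)
            return 1;
    }
    return 0;
}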
4 Theoretical Analysis
Vector clocks [1,2,6,7] have the interesting property that they are strongly consistent, which means that two vector clocks that are not ordered must belong to concurrent tasks. The size of vector clocks is bounded by the number of concurrent tasks in a system; updating vector clocks therefore becomes time consuming. The race detection tool for sequential programs presented in [12] relies on vector clocks to detect concurrent tasks. In this work, two vector clocks are associated with every invocation of a signal handler at runtime. The problem with this is that the number of signals received by a program at runtime can increase exponentially, making the vector clocks difficult to manage, since their size increases with the number of signals received.

In this section, we examine and compare the efficiency of our labeling scheme with the technique presented in [12]. We analyze only the space required to store the concurrency information for all concurrent blocks and the time needed to create and compare labels. These complexities depend on (a) the number of signals received by the program, S, and (b) the number of blocks created in the program, H. We did not include the work done in [13], since their technique is not based on logical timestamps. The efficiency comparison of the two approaches is presented in Table 1.

Table 1. Efficiencies of Logical Timestamps for Race Detection in Signal Handlers

Technique            Space          Time (Creation)   Time (Comparison)
Ronsse et al. [12]   O((S + H)^2)   O(S + H)          O(S + H)
Our Labeling         O(H)           O(1)              O(1)

The analysis results presented in Table 1 show that our labeling scheme is effective in terms of space and time. The space needed to store the vector clocks of [12] depends on both the number of blocks and the number of signals received: two vector clocks are created for every signal received by the program, and the size of each vector clock is S + H. Thus, the time needed to create and to compare two vector clocks [12] is O(S + H). With our labeling technique, this time remains constant, since labels are composed only of numbers. Moreover, each signal handler in our scheme keeps the same label for all its invocations; thus, the total number of labels generated for signal handlers is at most 64, the total number of signals.
5 Conclusion
Handling signals introduces logical parallelism into sequential programs. Consequently, data races can occur when a signal handler and the sequential program both access a shared variable without proper synchronization and with at least one write-access operation. In this paper, we presented a lightweight labeling scheme that generates constant-sized labels for efficient on-the-fly race detection in sequential programs that use signal handlers. With our labeling scheme, contrary to existing approaches that use vector clocks, the number of generated labels does not increase with the number of signals received at runtime. Moreover, by generating and comparing labels in constant time, our labeling technique is more scalable than existing solutions. In our future work,
we will provide a race detection framework for sequential programs based on this labeling scheme and compare its efficiency with existing work on real-world programs.
References

1. Agarwal, A., Garg, V.K.: Efficient Dependency Tracking for Relevant Events in Shared-Memory Systems. In: The 24th Symposium on Principles of Distributed Computing (PODC 2005), pp. 19–28. ACM, Las Vegas (2005)
2. Baldoni, R., Raynal, M.: Fundamentals of Distributed Computing: A Practical Tour of Vector Clock Systems. IEEE Distributed Systems Online 3(2) (2002)
3. Banerjee, U., Bliss, B., Ma, Z., Petersen, P.: A Theory of Data Race Detection. In: Workshop on Parallel and Distributed Systems: Testing and Debugging, pp. 69–78. ACM, New York (2006)
4. Dinning, A., Schonberg, E.: An Empirical Comparison of Monitoring Algorithms for Access Anomaly Detection. In: The Second ACM SIGPLAN Symposium on Principles & Practice of Parallel Programming, pp. 1–10. ACM, New York (1990)
5. Dowd, M., McDonald, J., Schuh, J.: The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities, 1st edn. Addison-Wesley Professional, Massachusetts (2006)
6. Fidge, C.J.: Logical Time in Distributed Computing Systems. Computer 24(8), 28–33 (1991)
7. Flanagan, C., Freund, S.N.: FastTrack: Efficient and Precise Dynamic Race Detection. Communications of the ACM 53(11), 93–101 (2010)
8. Jun, Y., Koh, K.: On-The-Fly Detection of Access Anomalies in Nested Parallel Loops. In: The 1993 ACM/ONR Workshop on Parallel and Distributed Debugging, pp. 107–117. ACM, California (1993)
9. Mellor-Crummey, J.: On-The-Fly Detection of Data Races for Programs with Nested Fork-Join Parallelism. In: The 1991 ACM/IEEE Conference on Supercomputing (SC 1991), pp. 24–33. ACM, New York (1991)
10. Netzer, R.H.B., Miller, B.P.: What Are Race Conditions? Some Issues and Formalizations. ACM Letters on Programming Languages and Systems 1(1), 74–88 (1992)
11. Ousterhout, J.K.: Why Threads Are a Bad Idea (for Most Purposes). Invited talk at the 1996 USENIX Technical Conference, USENIX (1996)
12. Ronsse, M., Maebe, J., De Bosschere, K.: Detecting Data Races in Sequential Programs with DIOTA. In: Danelutto, M., Vanneschi, M., Laforenza, D. (eds.) Euro-Par 2004. LNCS, vol. 3149, pp. 82–89. Springer, Heidelberg (2004)
13. Tahara, T., Gondow, K., Ohsuga, S.: Dracula: Detector of Data Races in Signal Handlers. In: The 15th IEEE Asia-Pacific Software Engineering Conference (APSEC 2008), pp. 17–24. IEEE, Los Alamitos (2008)
14. The Open Group and IEEE: The Open Group Technical Standard Base Specifications, Issue 7, POSIX.1-2008 (2008), http://www.opengroup.org/onlinepubs/9699919799/
15. Zalewski, M.: Delivering Signals for Fun and Profit (2001), http://lcamtuf.coredump.cx/signals.txt
Automatic Building of Real-Time Multicore Systems Based on Simulink Applications Minji Cha and Kyong Hoon Kim Department of Informatics, Gyeongsang National University Gajwadong 900, Jinju 660-701, South Korea {cha7355,khkim}@gnu.ac.kr
Abstract. MATLAB/Simulink is commonly used for designing model-based dynamic embedded systems. Through the Real-Time Workshop toolkit, it can generate C or C++ programs for various target platforms, which is useful for developing embedded systems. However, the current toolkit generates only a single program, so it does not leverage multicore technology for performance improvement. In this paper, we provide an automatic code generation scheme for multicore real-time systems by inserting user-defined S-Functions into Simulink applications. Users can thus easily develop multiple subtasks of a Simulink application on multicore systems. We develop the automatic code generation for RTAI real-time systems and evaluate its performance through experiments. Keywords: Auto-code, Multicore, Real-Time, Simulink, RTAI.
1 Introduction
Matlab/Simulink [1] is a useful tool for block model-based design of dynamic embedded systems and is widely used in many fields such as marine, automotive, and aviation. A Simulink application combines multiple simple block diagrams, which consist of pre-defined blocks, in order to build complex systems. One of its key features is the automatic code generation of the Real-Time Workshop tool [1]. Matlab/Simulink generates C and C++ programs using the Real-Time Workshop tool [1] for a target platform. Thus, users can easily generate real-time program source code for their platforms by converting their Simulink applications with the Real-Time Workshop tool rather than by manual coding.
This research was supported in part by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2010-(C1090-1031-0007)), and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (No. 2010-0015770). Corresponding author: at Gyeongsang National University, he is also affiliated with the Research Institute of Computer and Information Communication (RICIC).
Although the automatic code generation function of Simulink can reduce the time and cost of building real-time systems, the current Simulink provides auto-generated code in the form of a single program. Thus, it is hard to expect performance improvement on multicore-based systems. In this paper, we provide user-defined Simulink blocks in order to generate parallel code for multicore real-time systems based on RTAI Linux. The remainder of this paper is organized as follows. Section 2 describes related work and the research motivation. The proposed scheme is explained in Section 3, and the implementation details are provided in Section 4. We evaluate the proposed scheme through experiments in Section 5. Finally, Section 6 concludes the paper.
2 Related Work and Research Motivation

2.1 Simulink
Simulink supports an environment for multi-domain simulation, model-based dynamic systems, and embedded systems. A complex system can be expressed simply using many pre-defined blocks in block diagrams, and users can define the functions they need through user-defined blocks. Users thus build systems hierarchically with subsystems; each subsystem presents a single sub-function and is verified independently. One important aspect of Simulink is auto-code generation using Real-Time Workshop: after a user models a Simulink program, he or she can easily obtain a C or C++ source program for a pre-defined target system, in the form of a single program.

2.2 RTAI Linux
RTAI (Real-Time Application Interface) [3] was developed at DIAPM, Politecnico di Milano, based on the concept of RT Linux. RTAI provides schedulers for real-time tasks in many different processor environments, such as uniprocessor, symmetric multiprocessor, and multi-uniprocessor systems.
Fig. 1. Interrupt control of Linux and RTAI
Fig. 1 shows the RTAI interrupt handling structure. While interrupts for processors are handled by the Linux kernel, interrupts from other peripherals are handled by RTAI; therefore, peripheral interrupts of real-time tasks are serviced first. When real-time tasks are not active, peripheral interrupts are handled by the Linux kernel. RTAI supports many functions such as LXRT, inter-process communication, POSIX threads, and RTAI-Lab. LXRT enables users to create real-time tasks in user space and to access many real-time services in kernel space. RTAI-Lab shows the block diagrams of programs written in MATLAB/Simulink and provides patches and blocks for users who want to generate C code for the RTAI target. Therefore, users can easily build real-time systems by using RTAI Linux.

2.3 Research Motivation
MATLAB/Simulink can generate real-time program code using the automatic code generation function of Simulink. However, the generated code is provided in the form of a single program; when the program is executed on a multicore real-time system, it uses only a single core. This research provides user-defined blocks and programs for automatic code generation on multicore-based real-time systems. User-defined blocks in Simulink specify the parts to be parallelized, and the converted programs can run on multicore real-time systems based on RTAI Linux. The proposed scheme is useful for high-performance real-time systems because it fully utilizes multicore performance.
3 The Proposed Method
This research proposes a way to automatically generate parallel code in Simulink, so that users can execute programs on multicore real-time systems. The proposed method consists of five steps, as shown in Fig. 2.

Step 1. Programming a Simulink application: First, a user develops a Simulink application based on block diagrams. The function blocks to be parallelized should be specified as subsystems. For example, in Fig. 3, there are two subsystems (Subsystem1, Subsystem2) to be parallelized.
Fig. 2. Five steps of the proposed scheme
Fig. 3. Example of a Simulink program
Fig. 4. Adding user-defined functions for parallel execution of Fig. 3
Step 2. Inserting user-defined blocks for parallelization: The user inserts user-defined blocks to mark the parts to be run concurrently. We provide the RT_TASK_START block for indicating the start of parallel execution; similarly, the user-defined block RT_TASK_END marks the end of parallel execution. For example, Fig. 4 shows Subsystem1 and Subsystem2 parallelized by inserting the two user-defined blocks.

Step 3. Auto-code generation: Simulink automatically generates C programs using the Real-Time Workshop tool. The proposed user-defined blocks also emit preprocessing directives in the C code, which are used to generate real-time tasks. RT_TASK_START blocks generate
void Subsystem1() { /*...*/ }
void Subsystem2() { /*...*/ }

static void mdlOutputs() {
  #rt_task_start Subsystem1
  Subsystem1(); /*...*/
  #rt_task_start Subsystem2
  Subsystem2(); /*...*/
  #rt_task_end Subsystem1, Subsystem2
}

Fig. 5. A part of the code generated by Simulink
void Subsystem1() {
  rt_task_init_schmod(rt_Subsystem1);
  /*...*/
  rt_task_delete(rt_Subsystem1);
}
void Subsystem2() {
  rt_task_init_schmod(rt_Subsystem2);
  /*...*/
  rt_task_delete(rt_Subsystem2);
}
static void mdlOutputs() {
  pthread_create(Subsystem1); /*...*/
  pthread_create(Subsystem2); /*...*/
  pthread_join(Subsystem1);
  pthread_join(Subsystem2);
}

Fig. 6. Code conversion of Fig. 5 for multicore systems
'#rt_task_start' directives, and RT_TASK_END blocks generate '#rt_task_end' directives. Fig. 5 shows a part of the source program including those directives.

Step 4. Code conversion for multicore real-time systems: The source code of Step 3 is converted again by replacing the inserted directives with thread-related functions. The converter program searches for '#rt_task_start' and '#rt_task_end' directives and replaces them with pthread_create() and pthread_join(), respectively. For example, '#rt_task_start Subsystem1' in Fig. 5 is converted into pthread_create(Subsystem1), as shown in Fig. 6. Similarly, the '#rt_task_end Subsystem1, Subsystem2' directive in Fig. 5 is converted into pthread_join(Subsystem1) and pthread_join(Subsystem2) in Fig. 6. In addition, the converter inserts additional code for the RTAI task into each sub-function definition, as shown in Fig. 6.

Step 5. Real-time task execution: Finally, the user compiles the converted program source from Step 4 in order to execute it on the real-time target system. The real-time task for the Simulink application runs periodically in RTAI Linux. Fig. 7 shows the task flow regarding subtask creation and termination in each period. First, the main task is assigned to a single core, as shown in Fig. 7(a). The main task creates subtasks which execute the RT_TASK_START block code. Each subtask is assigned to a different core by the scheduler, as shown in Fig. 7(b). They run in parallel until they execute the RT_TASK_END block code. While the subtasks wait for the main task, the main task terminates them. After that, the main task executes the remaining part on a single core, as shown in Fig. 7(c). This process is repeated until the end of the program.
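The fork-join structure produced by Steps 3-5 can be illustrated by the following self-contained POSIX-threads sketch (our own simplification: the actual converted code additionally inserts RTAI LXRT calls such as rt_task_init_schmod() and rt_task_delete() inside each subsystem function, which are only indicated in comments here):

#include <pthread.h>

/* Each subsystem body runs as one subtask per period. In the RTAI-converted
 * code, the first and last statements would be rt_task_init_schmod(...)
 * and rt_task_delete(...). */
static void *subsystem1(void *arg) { (void)arg; /* update M/N units */ return NULL; }
static void *subsystem2(void *arg) { (void)arg; /* update M/N units */ return NULL; }

/* One sampling period of the converted mdlOutputs(). */
static void mdl_outputs(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, subsystem1, NULL);  /* from #rt_task_start */
    pthread_create(&t2, NULL, subsystem2, NULL);
    /* ... sequential part of the model ... */
    pthread_join(t1, NULL);                       /* from #rt_task_end   */
    pthread_join(t2, NULL);
    /* ... remaining sequential part runs on a single core ... */
}

int main(void)
{
    for (int period = 0; period < 10; period++)   /* periodic execution  */
        mdl_outputs();
    return 0;
}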
Fig. 7. Task flow of subtask creation and termination
4 Implementation

4.1 S-Function Overview
An S-Function [2] is a user-defined block that can be added to a model in Simulink. It can be written in C, C++, Fortran, or MATLAB. Users should write three files in order to define a new S-Function. Fig. 8 shows the S-Function configuration and the structure of each file when written in the C language.

– C Description: Users write, in C, functions with a pre-defined format, such as initialize, output, input, and terminate. For example, a user can define the functions he or she needs in the mdlOutputs() function, or can define initialization functions related to block I/O, parameters, and ports.
– MEX: The C description is converted into a MEX file for Simulink simulation; only the description's functionality is converted, using a MATLAB command.
– TLC (Target Language Compiler): TLC is used for writing additional code (e.g., global variables, global functions, etc.) into the generated C description. Users can inline code during automatic code generation; therefore, programs can be built without restrictions on global functions and global variables.
Fig. 8. S-function structure
4.2 User-Defined S-Functions for Generating Tasks
In this paper, we implement two S-Functions (RT_TASK_START, RT_TASK_END) for generating parallel code. Table 1 shows the shape and the parameters of each S-Function block in Simulink. The RT_TASK_START block consists of a single I/O port and takes a subsystem name as its parameter. The RT_TASK_END block has two parameters and multiple ports: the first parameter defines the number of ports, and the other parameter takes the subsystem names.

Table 1. User-defined blocks of Simulink

Block Name       Simulink Block Shape      Parameters
RT_TASK_START    (block image omitted)     Subsystem name
RT_TASK_END      (block image omitted)     Number of ports; subsystem names
We implement the C description of each S-Function. Fig. 9 shows the description code of RT_TASK_START. The function mdlInitializeSizes initializes each I/O port and parameter, and sets the option for implementing the TLC file. The function mdlRTW stores the subsystem name (SUBSYSTEM_NAME) into a temporary .rtw file, since the TLC file and the C description need to share the subsystem name. In addition, the mdlOutputs function passes the input values through to the output values for Simulink simulation.

static void mdlInitializeSizes(SimStruct *S)
{
    ssSetSFcnParamTunable(S, 0, false);
    if (!ssSetNumInputPorts(S, 1)) return;
    ssSetInputPortWidth(S, 0, DYNAMICALLY_SIZED);
    ssSetInputPortDirectFeedThrough(S, 0, true);
    ssSetInputPortOptimOpts(S, 0, SS_NOT_REUSABLE_AND_GLOBAL);
    if (!ssSetNumOutputPorts(S, 1)) return;
    ssSetOutputPortWidth(S, 0, DYNAMICALLY_SIZED);
    ssSetOutputPortOptimOpts(S, 0, SS_NOT_REUSABLE_AND_GLOBAL);
    ssSetNumSampleTimes(S, 1);
    ssSetSimStateCompliance(S, USE_DEFAULT_SIM_STATE);
    ssSetOptions(S, /* ... */);
}

static void mdlRTW(SimStruct *S)
{
    /* ... */
    mxGetString(SUBSYSTEM_NAME, operStr, noper+1);
    ssWriteRTWParamSettings(S, 1, SSWRITE_VALUE_STR,
                            "SubSystem_Name", operStr);
    /* ... */
}
Fig. 9. C description for RT_TASK_START
We also implement the TLC file for auto-code generation, as shown in Fig. 10. The Outputs function of the TLC file replaces the mdlOutputs function of the C description during automatic code generation. It loads the subsystem name (SubSystem_Name) from the temporary .rtw file and inserts a '#rt_task_start SubSystem_Name' directive. Then, it generates C code that passes the input values through to the output values.

4.3 Code Converter for Real-Time Task Generation
The code converter program converts the source code from Simulink to the target platform, RTAI Linux. It consists of two passes: the first for information retrieval and the second for replacement.
%function Outputs(block, system) Output
  %assign rollVars = ["U", "Y"]
  %roll idx = RollRegions, lcv = RollThreshold, block, "Roller", rollVars
    %<LibBlockOutputSignal(0, "", lcv, idx)> = \
    %<LibBlockInputSignal(0, "", lcv, idx)>;
  %endroll
  %assign Sub_Name = SFcnParamSettings.SubSystem_Name
  #rt_task_start %<Sub_Name>
%endfunction %% Outputs
Fig. 10. TLC code for RT_TASK_START
In the first pass, the program finds the '#rt_task_start' and '#rt_task_end' directives in the source code and saves the subsystem names, the number of subsystems, and so on. In the second pass, the program inserts RTAI task creation and termination code into each subsystem function using the saved subsystem names. In addition, it converts the '#rt_task_start' and '#rt_task_end' directives into POSIX thread creation and termination code, respectively: the '#rt_task_start' directive is converted into pthread_create, and the '#rt_task_end' directive into pthread_join, as shown in Fig. 6.
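The replacement pass can be pictured with the following line-oriented sketch (our own minimal illustration; the actual converter also emits the RTAI initialization code, which is omitted, and the helper names here are hypothetical):

#include <stdio.h>
#include <string.h>

/* Second pass: rewrite one line of generated code, turning the
 * '#rt_task_start NAME' and '#rt_task_end NAME[, NAME...]' directives
 * into pthread calls. All other lines are copied unchanged. */
static void convert_line(const char *line, FILE *out)
{
    char name[128] = "";

    if (sscanf(line, " #rt_task_start %127s", name) == 1) {
        fprintf(out, "pthread_create(&tid_%s, NULL, %s, NULL);\n", name, name);
    } else if (strstr(line, "#rt_task_end") != NULL) {
        /* one pthread_join per subsystem listed after the directive */
        const char *p = strstr(line, "#rt_task_end") + strlen("#rt_task_end");
        while (sscanf(p, " %127[^, \n]", name) == 1) {
            fprintf(out, "pthread_join(tid_%s, NULL);\n", name);
            p = strchr(p, ',');
            if (p == NULL) break;
            p++;
        }
    } else {
        fputs(line, out);
    }
}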
5 Performance Evaluation
We used the Simulink program of Fig. 11 for the performance evaluation. The program calculates the falling points of many floating submunitions in the air, taking angles, wind, and so on into consideration. Each submunition calculates its altitude and angle in every sampling period, and outputs its X, Y, and Z coordinates. The internal calculation block consists of multiple sub-blocks, as shown in Fig. 12: when M is the total number of submunitions and N is the number of sub-blocks, each subsystem updates M/N submunitions.

Fig. 11. Simulink program of floating submunitions

Fig. 12. Composition of the main block of Fig. 11

The experimental real-time system uses RTAI 3.8 patched into Linux kernel 2.6.24, and the hardware platform is based on an Intel Xeon Quad-Core 1.66 GHz processor with 3 GB of memory. The number of submunitions in the experiments is varied from 100 to 300, and the sampling period is set to 2.5 msec. Since the system has four cores, the Simulink program is parallelized into at most four subsystems. Fig. 13 shows the average of the worst execution times over 20 runs. As shown in Fig. 13, the parallelization effect grows as the number of submunitions increases: for example, in the case of 300 submunitions, four threads reduce the execution time by 69%, and three threads by 60%, compared with single-core execution. With a sampling period of 1 msec, running the program on a single core cannot satisfy the deadline from 200 submunitions on, and using two cores cannot meet the deadline when the number of submunitions is greater than 240. Thus, the proposed scheme utilizes the processing capacity of multicore real-time systems.

Fig. 13. Worst-case execution time

We compare the execution times of the single program and the 1-thread version in order to measure the overhead of the proposed scheme. The single program is the unmodified source code from Simulink, while the 1-thread version includes the overhead of creating and terminating one thread. As shown in Fig. 14, the overhead is between 50 µsec and 75 µsec, which corresponds to the thread creation and termination time in each cycle.

Fig. 14. Overhead of the proposed method
6 Conclusions
The current Simulink provides auto-generated code in the form of a single program; thus, when the program is executed on a multicore real-time system, it uses only a single core. In this paper, we provide user-defined Simulink blocks in order to generate parallel code for multicore real-time systems. The proposed scheme also includes a code converter that inserts thread creation and termination code. Throughout the experiments, the proposed method shows an enhanced parallelism effect as the computational load increases, and it is useful for high-performance real-time systems because it fully utilizes multicore performance. Our ongoing research includes reducing the overhead and applying the scheme to distributed real-time systems using real-time communication protocols.
References

1. Mathworks: Simulink, http://www.mathworks.com/products/simulink/
2. Mathworks: Simulink Model-Based and System-Based Design (Writing S-Functions) (1998–2001)
3. Lineo, Inc.: DIAPM RTAI Programming Guide (2000), http://www.rtai.org
4. Bucher, R., Balemi, S.: Rapid Controller Prototyping with Matlab/Simulink and Linux. Control Engineering Practice 14, 185–192 (2003)
5. Bucher, R., Dozio, L.: CACSD under RTAI Linux with RTAI-LAB. In: The Fifth Real-Time Linux Workshop (2003)
6. Canedo, A., Yoshizawa, T., Komatsu, H.: Automatic Parallelization of Simulink Applications. In: IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2010), pp. 151–159 (2010)
7. Sarolahti, P.: Real-Time Application Interface, http://www.cs.helsinki.fi/u/sarolaht/papers/rtai.pdf
8. Cho, S., Choi, K.: A Study on Validation of OFP for UAV using Auto Code Generation. The Korean Society for Aeronautical & Space Sciences 37(4), 359–366 (2009)
9. Cho, S., Park, J., Kim, S., Choi, K., Park, C.: Development of Software for UAV using Automatic Code Generation of MATLAB. The Korean Society for Aeronautical & Space Sciences 4, 571–574 (2007)
A Study on the RFID/USN Integrated Middleware for Effective Event Processing in Ubiquitous Livestock Barn Jeonghwan Hwang and Hyun Yoe* School of Information and Communication Engineering, Sunchon National University, Korea {jhwang,yhyun}@sunchon.ac.kr
Abstract. RFID/USN technology is one of the important technologies for implementing the ubiquitous society, and it could bring many changes to the existing agricultural environment, including livestock rearing and the cultivation and harvest of agricultural products, if applied to the agricultural sector. In particular, for livestock barns, RFID/USN technology has been applied to manage livestock history and monitor barn environments, securing increased productivity and transparency of distribution channels. In livestock barn systems applying RFID/USN technology, an integrated RFID/USN middleware is required to provide a combined service through effective processing of the generated data and organic connections between application services, because the enormous amount of data collected via the RFID/USN must be processed in real-time. This paper proposes an integrated RFID/USN middleware that implements such a combined service by efficiently processing the data collected from the RFID, which is used to identify livestock, and the USN, which is used to monitor the barn environment, and by organically connecting the data. The proposed middleware reduces the applications' load and provides the data requested by applications by filtering the data generated in real-time in the RFID and USN environment of livestock barns. Keywords: RFID, USN, Middleware, Ubiquitous, Agriculture.
1 Introduction

Ubiquitous agriculture aims at introducing information technology into agriculture to increase its productivity, and at systematically managing the distribution and consumption of agricultural products to verify their safety and promote transparency [1]; RFID/USN (Radio Frequency Identification/Ubiquitous Sensor Networks) technology is regarded as a core technology for implementing it [2]. Currently, RFID/USN technology has been introduced into various agricultural fields, such as greenhouses and livestock barns, to construct monitoring systems spanning cultivation environments, production management, and distribution, for producing
Corresponding author.
agricultural and livestock products and securing the transparency of distribution channels [3][4][5][6]. In particular, the livestock sector is utilizing RFID/USN technology for the feeding management of livestock, the management of the barn environment, and the management of breeding traces [7][8][9]. Since a system applying RFID/USN technology must process an enormous amount of data in real-time, a middleware is needed to abstract significant information by filtering the large amount of data collected from several heterogeneous RFID/USN sources and processing the event data, and to efficiently transfer and process the many contexts and data generated in the ubiquitous environment [10][11]. Even though RFID/USN technology has been applied to diverse studies in the agricultural sector, studies on RFID/USN middleware focusing on application services in agricultural environments are insufficient, due to the absence of standards, technical limits, and the difficulties of data sharing and fusion in a special environment like agriculture [12]. Therefore, this paper proposes an integrated middleware platform between heterogeneous service systems that implements a combined service by efficiently processing the data collected from ubiquitous livestock barns applying RFID/USN technology and by organically connecting the data. The proposed middleware can reduce the application programs' load by filtering the data collected in real-time from the RFID and USN environment of livestock barns, maximize the scalability and usability of systems by providing event services through the configuration of events suitable to contexts and through event processing, and provide a combined service through application services and organic connections between them. This paper is organized as follows. Section 2 describes the design of the proposed integrated RFID/USN middleware, Section 3 describes the implementation of the proposed middleware and a system applying it, and finally Section 4 concludes this paper.
2 Design of the Proposed Middleware

2.1 Ubiquitous Livestock Barn System

The ubiquitous livestock barn system monitors the barn environment and controls barn facilities by applying ubiquitous technology to livestock barns; the system in this paper is built on the RFID/USN, one of the ubiquitous technologies. It installs USN nodes and CCTV in a livestock barn to collect environmental and image data, which allow users not only to monitor the barn externally via the Web but also to control barn facilities manually, even from a remote site. In addition, the system can automatically control facilities based on prescribed breeding environment values, offers users the convenience of an SMS messaging service when a dangerous condition arises, and efficiently manages the history of pigs by collecting and managing their identification and entity information with RFID. Figure 1 shows the structure of the ubiquitous livestock barn system.
Fig. 1. System structure of ubiquitous livestock barn
2.2 Design of the Proposed RFID/USN Integrated Middleware

The middleware proposed in this paper reduces the system's load by collecting livestock identification data and barn environmental data from the heterogeneous RFIDs and sensors installed for monitoring the barn and by filtering the data generated in real-time, and it provides the data requested by the application programs. In order to satisfy these requirements, the integrated RFID/USN middleware supports functions to analyze, filter, and convert the collected data. The proposed middleware is structured as shown in Figure 2, and each layer carries out the following role. The RFID/USN interface layer provides a common interface to heterogeneous RFIDs and sensors, and offers continuous monitoring and control functions for the status of RFIDs and sensor networks. The data process layer provides functions to process various queries on the data collected from the RFID/USN infrastructure and to manage RFID and sensor information in real-time. In addition, the data process layer employs a data management component, which delivers valid messages to the upper layer and supports queries of various forms by filtering data, reducing the system load that could be caused by the enormous amount of data collected via RFIDs and sensors.
Fig. 2. Structure and data flow chart of the proposed RFID/USN integrated middleware
The event process layer is composed of an event construction component and an event processing component: the event construction component builds the RFID and USN data received from the lower layer into simple events, while the event processing component specifies reference values through a complex event language defined by the upper application layer and builds the constructed simple events into significant events according to those reference values. The database controller stores the data generated in real-time into the database, supports queries from the application layer so that the data can be used when needed, and stores and updates the reference data for the automatic control and condition notification of livestock facilities. The database stores, in separate tables, the environmental data collected from sensors installed inside and outside the barn, the image data collected from CCTVs, the livestock entity and identification information collected via RFIDs, the status, operating time, and number of operations of the control facilities, and the environmental reference values for automatic control and condition notification. The application service interface delivers and establishes the range of services specified by the application; data stored in the DB can be searched and stored through the DB controller on demand from the application. In addition, it supports various queries to constitute flexible connections between applications and hardware.
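As a sketch of the filtering and event-promotion logic described above (our own illustration in C, whereas the actual middleware is implemented in Java and C#; the structure and the reference values are hypothetical), a simple event is forwarded to the application layer only when it leaves the normal range:

#include <stdio.h>

/* Hypothetical simple event built from one sensor reading. */
struct simple_event {
    int    node_id;
    double temperature;   /* degrees Celsius */
};

/* Reference values set by the application layer. */
static const double REF_MIN = 10.0;
static const double REF_MAX = 30.0;

/* Filter: drop readings inside the normal range, promote the rest
 * to a "significant" event delivered to the upper layer. */
static void process_reading(struct simple_event e)
{
    if (e.temperature < REF_MIN || e.temperature > REF_MAX)
        printf("ALERT node %d: temperature %.1f outside [%.1f, %.1f]\n",
               e.node_id, e.temperature, REF_MIN, REF_MAX);
    /* otherwise: filtered out, not forwarded */
}

int main(void)
{
    struct simple_event samples[] = { {1, 22.5}, {2, 31.2}, {1, 9.4} };
    for (int i = 0; i < 3; i++)
        process_reading(samples[i]);
    return 0;
}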
3 Implementation of the Proposed Middleware

3.1 Implementation Environment

The environment for implementing the RFID/USN integrated middleware for the ubiquitous livestock barn system is divided into hardware and software environments, detailed as follows. The hardware environment used to develop the proposed middleware is the server-side PC development environment, detailed in Table 1.

Table 1. Hardware Environment

Server PC Environment
Type        Details
CPU         Intel Xeon 3.2 GHz
RAM         1 GB
OS          Fedora Linux 2.6.27
DATABASE    MySQL 5.0
WAS         Tomcat 6.0.20
In the hardware environment, the PC development environment corresponding to the server is a general PC environment, and the sensor node environment in charge of sensing is constructed as an environment suitable for Zigbee sensors. Next, the software environment used to develop the proposed middleware is shown in Table 2. In the software environment, the PC development environment uses an operating system of the Microsoft Windows series, and the programming languages are Java and C#. The sensor node environment responsible for sensing is constructed as a Linux environment inside Windows using Cygwin; it uses TinyOS as the operating system, and the sensor program is written in NesC.

Table 2. Software Environment

                             Type                           Details
PC Development Environment   OS                             Microsoft Windows XP
                             Programming Languages          JAVA (JDK 6), C#
Sensor Node Environment      OS                             TinyOS 1.0
                             Linux Environment              Cygwin
                             Sensor Programming Languages   NesC
                             JAVA                           JDK 1.4.1
3.2 Application of the Proposed RFID/USN Integrated Middleware

The proposed middleware was implemented on the integrated management system for ubiquitous livestock barns in an agricultural environment, and experiments were conducted on the conversion of data collected via RFID and USN into simple events and on the extraction of events by given conditions. In addition, the system was built so that the barn environment can be monitored and livestock history information checked via the Web, even externally.
Fig. 3. Ubiquitous livestock barn management system GUI
Fig. 4. Ubiquitous livestock traceability system GUI
Figure 3 shows the livestock barn monitoring system implemented on the Web, which allows checking the real-time environmental and image information and controlling the barn facilities. Figure 4 shows a screen displaying the history and biometric information of livestock; when a livestock identification number is entered, the animal's identification number, birth date, birth place, age, weight, grade, and disease history can be checked.
4 Conclusions

This paper proposed and developed an integrated RFID/USN middleware to implement a combined service by efficiently processing the data collected from a ubiquitous livestock barn applying RFID/USN technology and by organically connecting the data. The proposed middleware carries out functions such as data refinement, data filtering, and data conversion on the data collected via the RFID and USN installed in the barn, for system efficiency and for reducing the application programs' load, and it supports a function for processing events suitable to the context, so that it maximizes the scalability and usability of the system and provides a combined service through organic connections between application services. In addition, the proposed middleware was actually applied to an RFID/USN-based ubiquitous livestock barn to construct and operate an integrated management system that monitors the barn in real-time, controls the facilities, and checks livestock history information. If the management system for ubiquitous livestock barns is constructed on the proposed middleware platform, it is expected to increase the efficiency and reliability of the management of livestock barns and livestock history.
Acknowledgements. This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2010-(C1090-1021-0009)).
References

1. Shin, Y.-S.: A Study on Informatization Model for Agriculture in Ubiquitous Era. MKE Research Report, National IT Industry Promotion Agency, Seoul (2006)
2. Pyo, C.-S., Chea, J.-S.: Next-Generation RFID/USN Technology Development Prospects. Korea Inf. Commun. Soc. 24, 7–13 (2007)
3. Jeong, B.-M.: Foreign u-Farm Service Model Casebook. Issues and Analysis Report of Korea National Information Society Agency, NCA V-RER-06005, Seoul, Korea (2006)
4. RFID Journal, http://www.rfidjournal.com/article/articleview/2229/1/1/
5. Lee, M.-H., Shin, C.-S., Jo, Y.-Y., Yoe, H.: Implementation of Green House Integrated Management System in Ubiquitous Agricultural Environments. J. KIISE 27, 21–26 (2009)
6. Hwang, J.H., Shin, C.S., Yoe, H.: A Wireless Sensor Network-Based Ubiquitous Paprika Growth Management System. Sensors 10, 11566–11589 (2010)
7. Hwang, J.H., Yoe, H.: Study of the Ubiquitous Hog Farm System Using Wireless Sensor Networks for Environmental Monitoring and Facilities Control. Sensors 10, 10752–10777 (2010)
8. Lee, S.-C., Kwon, H., Kim, H.-C., Kwak, H.-Y.: A Design and Implementation of Traceability System of Black Pigs Using RFID. J. Korea C. Assoc. 8, 33–40 (2008)
9. Kim, M., Son, B., Kim, D.K., Kim, J.: Agricultural Products Traceability Management System based on RFID/USN. J. KIISE 15, 331–343 (2009)
10. Kim, M.S., Lee, Y.J., Park, J.H.: Trend of USN Middleware Technology. ETRI Electronic Communications Trend Report 22(3), 67–79. Electronics and Telecommunications Research Institute, Daejeon, Korea (2007)
11. Kim, Y.-M., Han, J.-I.: Middleware Technology for Ubiquitous Sensor Network. J. KIISE 25(12), 35–48 (2007)
12. Kung, S.: The Design of Fungus Cultivating System based on USN. Journal of Korean Institute of Information Technology 5(3), 34–41 (2007)
Design of Android-Based Integrated Management System for Livestock Barns JiWoong Lee and Hyun Yoe* School of Information and Communication Engineering, Sunchon National University, Korea {leejiwoong,yhyun}@sunchon.ac.kr
Abstract. Smartphones can search and process information in real-time via wireless Internet, such as 3G and WiFi networks, and users can directly add and remove the application programs they want. Using these features of smartphones to monitor the environment of livestock barns in real-time and to respond actively when problems arise in the barn, this paper proposes an Android-based integrated management system for livestock barns. The proposed system collects the environmental information of livestock barns in real-time through sensors and image processing equipment installed in the barn, and stores this information in a DB on an environmental server. It then provides the barn information to smartphone users by processing the stored information through an XML parser. Keywords: Android, Smartphone, livestock barns.
1 Introduction

Thanks to the recent development of mobile technologies, the smartphone, often called a small PC in the hand, offers additional functions, such as listening to music, watching TV and video, games, a diary, a dictionary, a digital camera, and Web browsing, on top of the phone function that is its original purpose, and its range of applications keeps expanding [3]. The best feature of the smartphone, which evolves day by day, is that it provides numerous functions, such as Internet information search and scheduling, by adding the functions of a PDA and a computer to a mobile phone. In addition, it can be used as a client terminal for a company's enterprise applications, since various software can be installed and operated on it. And since the mobile device provides mobility, as well as functions recognizing the surrounding context, such as positioning and acceleration measurement, the development and practice of various applications can be expected [4]. An integrated management system for livestock barns is designed using the Android platform, developed by Google, among the smartphone operating systems with such varied application possibilities. The existing released platform has the disadvantage
Corresponding author.
that the platform itself is closed and depends on a single platform; therefore, its range of applications is narrow, and there is a limitation on providing various application services [5]. On the contrary, the Android mobile platform developed by Google allows the development of various application programs with open sources, including major application programs, thanks to its open OS features, and it can be installed on set-top boxes and household electronic appliances as well as mobile devices, so its range of applications can be developed more widely than those of existing platforms [5]. The Android-based integrated management system for livestock barns proposed in this paper is designed to check the environmental information of the barn in real-time by utilizing the sensors and imaging equipment installed in the barn, and to optimally control the barn environment. If a large amount of data were constantly received in real-time on a smartphone to check the environmental information of the barn, the high battery consumption of the smartphone would become a problem. In order to solve this problem, the system is designed so that an environmental server is constructed, the environmental information, such as temperature and humidity collected from the sensors, is stored there, the data is parsed as XML on the server, and only then is the data checked on the smartphone; the system also controls the environmental control equipment of the barn when an abnormal condition arises while the smartphone is monitoring the inside of the barn in real-time.
2 Design of System Architecture

2.1 System Structure

The integrated management system for livestock barns proposed in this paper is divided into three parts for managing the barn. The first is the physical layer, which is composed of a sensor part collecting environmental information of the barn and a control part. Zigbee sensors and image processing equipment are installed in the sensor part to monitor the barn environment in real-time, and a micro-controller for fans is installed to maintain ventilation and humidity in the barn. The second is the middleware layer, which stores the sensor and image information collected by the physical layer in a database on the environmental server and through which users control the barn environment via smartphones and the Web. The third is the user layer, which implements application programs to monitor and control the inside of the barn in real-time on the Web and smartphones. Information collected from the temperature/humidity and other sensors installed in the barn is transferred to the environmental server and stored in the database. The Android-based integrated management system fetches the data stored on the environmental server to smartphones and the Web via the XML parser. In addition, the working condition of the barn control facilities can be inquired and controlled remotely via smartphones: when a smartphone sends a control command to the environmental server, the server updates the working condition and sends a control signal to the micro-controller to control the livestock
barn facilities. The image processing equipment installed in the barn sends images to smartphones via the WiFi and 3G networks. Fig. 1 represents the structure of the system proposed in this paper.
Fig. 1. System Structure
2.2 Processing of Sensor Data

Zigbee sensors manufactured by Hanbaek Electronics are used as the sensors installed in the livestock barn. The Zigbee sensors use the Atmega128L microcontroller, and the CC2420 is used as the RF modem. The measurable sensor information is digital temperature/humidity, brightness, and illuminance. The measured data is collected in a format like the packet of Fig. 2, and a data packet parser is implemented to store the collected information, such as temperature/humidity, into the database on the environmental server. When a user decides whether or not the barn condition is abnormal and sends a control signal from a smartphone, the control table of the database on the environmental server is consulted to apply a control signal to the micro-controller, which drives the environmental control devices, such as the fan, heater, air conditioner, and humidifier, installed in the barn.
Fig. 2. Packet format
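Since the exact layout of the packet in Fig. 2 is not reproduced here, the following C sketch only illustrates the kind of packet parser described above; the 6-byte layout, the field names, and the scaling are hypothetical assumptions of ours:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical 6-byte payload:
 * [node_id][temp_hi][temp_lo][hum_hi][hum_lo][checksum] */
struct reading { uint8_t node_id; double temp_c; double hum_pct; };

/* Parse one payload; returns 0 on success, -1 on checksum failure. */
static int parse_packet(const uint8_t *p, struct reading *r)
{
    uint8_t sum = (uint8_t)(p[0] + p[1] + p[2] + p[3] + p[4]);
    if (sum != p[5])
        return -1;                              /* corrupted packet */
    r->node_id = p[0];
    r->temp_c  = ((p[1] << 8) | p[2]) / 100.0;  /* hundredths of a degree */
    r->hum_pct = ((p[3] << 8) | p[4]) / 100.0;  /* hundredths of a percent */
    return 0;
}

int main(void)
{
    uint8_t pkt[6] = { 1, 0x09, 0x29, 0x19, 0x64, 0 };
    pkt[5] = (uint8_t)(pkt[0] + pkt[1] + pkt[2] + pkt[3] + pkt[4]);
    struct reading r;
    if (parse_packet(pkt, &r) == 0)
        printf("node %u: %.2f C, %.2f %%RH\n", r.node_id, r.temp_c, r.hum_pct);
    return 0;
}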
2.3 Environmental Server

The environmental server is divided into repositories for the environmental information of livestock barns, image information, and control information. The repository stores the environmental information for the inside/outside of the barn received from the physical layer,
the working condition information for the devices of the control equipment group, control information, and so on. The image repository stores the image information obtained from the image processing part, which sends images to the smartphone in real-time when a request for image information is received from a user.

2.4 XML Parser

XML is used as the data format between smartphones and the database. XML can easily handle the complicated data of client systems and is compatible with the Web and other application programs without special middleware, because it is standardized across application programs and the Web. However, it is difficult to extract exactly the desired information, because the format rules of XML are quite strict; the help of a dedicated parser is therefore required to find the desired information correctly and filter out unwanted characters. For this purpose, a parser is implemented with open sources; publicly available parsers are divided into DOM and SAX. The DOM parser is very fast because it reads the entire contents of a document into memory as a tree, and it can read an arbitrary node repeatedly. However, since it must read the whole document to complete the tree before anything can be accessed, it has the disadvantages of a slow start and of using much memory as the document grows larger [6]. The SAX parser reads a document in order and generates events; it rarely uses memory and may stop work in the middle [6]. Comparing the advantages and disadvantages of these two parsers, the DOM uses much memory but is fast, while the SAX has the advantage of rarely using memory. Therefore, this paper uses a parser of the SAX type for processing the large amount of sensor information generated in the livestock barn easily.

2.5 Image Processing

A TCP/IP socket program is implemented to send the image information generated in the livestock barn to smartphones. When a smartphone user requests to view image information of the inside of the barn, the server accepts the connection request and converts the image information into the JPEG format to send it to the user, who receives the converted information in real-time [2]. Fig. 3 is the flow chart of the image data.
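To illustrate the event-driven style of the SAX approach chosen in Sect. 2.4 above (our own sketch in C using libxml2, rather than the Java parser used on the Android side; the element name "reading" and its attributes are hypothetical):

#include <stdio.h>
#include <string.h>
#include <libxml/parser.h>

/* Called once per opening tag; no document tree is built, so memory
 * use stays small and constant regardless of the document size. */
static void on_start(void *ctx, const xmlChar *name, const xmlChar **attrs)
{
    (void)ctx;
    if (strcmp((const char *)name, "reading") != 0 || attrs == NULL)
        return;
    for (int i = 0; attrs[i] != NULL && attrs[i + 1] != NULL; i += 2)
        printf("%s=%s ", (const char *)attrs[i], (const char *)attrs[i + 1]);
    printf("\n");
}

int main(void)
{
    xmlSAXHandler h;
    memset(&h, 0, sizeof(h));
    h.startElement = on_start;               /* register the event callback */
    xmlSAXUserParseFile(&h, NULL, "barn.xml");
    xmlCleanupParser();
    return 0;
}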
3 Conclusion This paper designed an Android-based integrated management system for livestock barns that uses a smartphone to monitor and remotely control the inside of the barn. The system stores environmental information such as temperature and humidity collected from the barns in a database via the environmental server, and lets a user monitor that information in real time on request. In addition, it is designed to remotely control the barn environment when an abnormal condition arises.
Fig. 3. Flow chart of the image processor
It is also designed to check the state of the barn interior and the livestock through the image processing equipment. Future work will implement the integrated management system for livestock barns based on this system design. Acknowledgements. This research was supported by the MKE(The Ministry of Knowledge Economy), Korea, under the ITRC(Information Technology Research Center) support program supervised by the NIPA(National IT Industry Promotion Agency) (NIPA-2010-(C1090-1021-0009)).
References
1. Park, J., Seo, S., Kim, Y., Lee, K.H.: A Mobile Service Model for Ubiquitous Computing Environment. The Korean Institute of Information Scientists and Engineers 13(3), 170–182 (2006)
2. Choi, J.Y., Lee, S.-J., Jeon, B.-C.: Home Network Application using Android Mobile Platform, vol. 10(4), pp. 7–15 (2010)
3. Bae, S., Kim, W.: Mobile Information Sharing System Based on Android Mobile Platform. The Institute of Electronics Engineers of Korea 46(2), 58–64 (2009)
4. Kim, S.D., La, H.J.: An Architecture for Android-based Mobile Service Applications. The Korean Institute of Information Scientists and Engineers 28(6), 25–34 (2006)
5. http://gojump0713.blog.me/140056424428
6. http://blog.naver.com/schweine7/40112744930
A Study of Energy Efficient MAC Based on Contextual Information for Ubiquitous Agriculture Hochul Lee and Hyun Yoe* School of Information and Communication Engineering, Sunchon National University, Korea {hclee,yhyun}@sunchon.ac.kr
Abstract. Various technologies are being used in the agricultural field; in particular, sensor network technology has been receiving much attention in recent years. The efficiency of the MAC protocol of a WSN (Wireless Sensor Network) is being studied from various angles, and methods of applying such MAC protocols in the agricultural field are also needed. In this paper, a context-information-based MAC protocol that can be applied in the agricultural field is proposed. Unlike existing MAC protocols, the proposed protocol decides whether to transmit by analyzing the given situation. It assesses whether the sensed data reflects an unchanged situation, or a change too small to matter, and transmits the changed amount, compared against the previously transmitted data, only when the critical value mutually set with the server is exceeded. It increases energy efficiency by reducing the data size, delivering only the changed amount rather than the entire data, and by reducing the transmission frequency, transmitting only when the critical value has been exceeded. To analyze the performance of the proposed protocol, a simulation was conducted based on data measured in a greenhouse. Keywords: WSN, Ubiquitous, u-IT, Energy Efficient MAC, Context Aware.
1 Introduction The recent technological innovation in information and communication is accelerating the convergence of industries. The convergence of IT with traditional industries is continuing, and convergence technology is expected to increase the added value and productivity of agriculture by applying ubiquitous technology to this primary industry[1]. To successfully establish such a u-agriculture environment, it is essential to develop core ubiquitous technologies optimized for agriculture, such as sensor hardware, middleware platforms, routing protocols, and agricultural environment application services[2]. Unlike existing MAC protocols, the proposed protocol decides whether to transmit by analyzing the given situation. It assesses whether the sensed data reflects an unchanged situation, or a change too small to matter, and it allows the transmission *
Corresponding author.
of the changed amount, compared against the previously transmitted data, only when the critical value mutually set with the server is exceeded. It increases energy efficiency by reducing the data size, delivering only the changed amount rather than the entire data, and by reducing the transmission frequency, transmitting only when the critical value has been exceeded. The composition of this paper is as follows. In Chapter 2, the proposed MAC protocol using context information is designed, and an energy consumption model is developed from it. In Chapter 3, the performance of the proposed MAC is evaluated through simulation and compared with SMAC[3]. In Chapter 4, conclusions on the proposed MAC are drawn and future research directions are described.
2 Protocol Design The proposed MAC protocol operates in an energy-efficient way according to the particular situation, based on the context information given by the operating environment. The control message structure of the MAC is based on SMAC. A sensor node collects the surrounding environment information regardless of the wake period of the MAC and performs the basic operation of transmitting the values to the server, during which it conserves transmission power by reducing the size of the transmitted data. The sensor node initially transmits the detected environment information and its sensor type to the server, which saves them according to the sensor type that last detected the environment information. The server determines the critical value required for monitoring according to the transmitted environment information and the received sensor type, and sends the designated critical value and the maximum limit time to the sensor node. The maximum limit time is the longest period during which the sensed medium is not affected even when changes occur in the environment information. The node saves and manages the per-sensor critical value and maximum limit time sent from the server. The node collects environment information according to the cycle determined by the server, and does not transmit when the value stays within the critical value of the environment information transmitted previously. For each collected sample, the measured change amount is calculated using Formula (1):

ΔD = Dcur − Dpre                                  (1)

If the absolute value of the calculated change amount exceeds the critical value (the maximum change amount), the change is transmitted to the server. If it is less than the critical value, the data is discarded and the discarded-measurement count is incremented. When communication between the sensor node and the server no longer occurs, because every change amount stays below the critical value and is discarded, the server cannot tell whether the sensor node is still operating. To notify the server that the node is operating normally, the measured change amount is sent whenever the per-sensor limit time designated during server registration is exceeded,
even when there are no data changes. The maximum number of discarded measurements follows Formula (2), and the discarded-measurement counter is reset to zero whenever a data transmission occurs:

Nmax = TMAX / tcycle                              (2)

In Formula (2), TMAX is the maximum limit time and tcycle is the duration of one cycle, including listen and sleep. Table 1. Parameters
Parameter    Description
Dpre         Data transmitted last
Dcur         Data collected most recently
ΔD           Change amount of the currently collected data
TMAX         Maximum time allowed without a transmission
tactive      Active time within a cycle with data transmission
tlisten      Active (listen) time within a cycle without data transmission
Pnodata      Power consumed in one cycle without transmission
Pdata        Power consumed in one cycle with transmission
Plisten      Power consumption in the listen state during data transmission and reception
Psleep       Power consumption in the sleep state
P            Total power consumption
Pevent       Probability of data generation
Next, the parameters required for the energy consumption model of the proposed MAC are defined as shown in Table 1. They cover the probability of event occurrence; the power consumed in listen mode and sleep mode, both when data is transmitted and when it is not; the length of one cycle; the listen duration with and without data transmission; and the total power consumption of a node under each MAC protocol. The energy consumption model is given by Formulas (3) to (5):

Pnodata = Plisten × tlisten + Psleep × (tcycle − tlisten)          (3)

Pdata = Plisten × tactive + Psleep × (tcycle − tactive)            (4)

P = m × (Pevent × Pdata + (1 − Pevent) × Pnodata)                  (5)

Formula (3) gives the energy consumption of a cycle with no data transmission, and Formula (4) gives the energy consumption of a cycle with a data transmission. Formula (5) gives the total energy consumption over all m cycles; Pevent is the probability that data to be transmitted occurs.
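The transmission rule and the energy model can be condensed into the following sketch; the formula bodies follow the reconstructions of Formulas (1) to (5) given above, so this should be read as one plausible reading of the model rather than the authors' exact implementation:

    // Sketch of the proposed MAC's transmit decision and energy model.
    public class ContextAwareMac {
        double dPre;            // last transmitted data value (Dpre)
        double threshold;       // critical value set with the server
        int discardCount = 0;   // discarded-measurement counter
        int nMax;               // TMAX / tcycle, from Formula (2)

        // Decide per cycle whether the new sample must be sent.
        boolean shouldTransmit(double dCur) {
            double deltaD = dCur - dPre;                  // Formula (1)
            if (Math.abs(deltaD) > threshold || ++discardCount >= nMax) {
                dPre = dCur;          // only the change amount deltaD is sent
                discardCount = 0;     // counter resets on every transmission
                return true;
            }
            return false;             // within the critical value: discard
        }

        // Total consumption over m cycles, Formulas (3) to (5) as reconstructed above.
        static double totalPower(int m, double pEvent, double pListen, double pSleep,
                                 double tActive, double tListen, double tCycle) {
            double pNoData = pListen * tListen + pSleep * (tCycle - tListen); // (3)
            double pData   = pListen * tActive + pSleep * (tCycle - tActive); // (4)
            return m * (pEvent * pData + (1 - pEvent) * pNoData);             // (5)
        }
    }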
3 Simulation and Result A simulation was conducted to measure the energy performance of the MAC under consideration. The performance of S-MAC and that of the proposed MAC (labeled X-MAC below) were compared in a simulation environment constructed with NS-2[4]. The system parameters for the simulation are shown in Table 2. Table 2. Simulation Parameters
Parameter           S-MAC              X-MAC
NS-2 version        NS-2.34            NS-2.34
Simulation time     24 hours           24 hours
Data size           Fixed              Variable
Packet interval     1 minute / node    1 minute / node
Node count          3                  3
Routing protocol    DSDV               DSDV
Duty cycle          5%                 5%
Bandwidth           250 Kbps           250 Kbps
Initial energy      30,000 J           30,000 J
In the simulation, temperature changes measured in an actual paprika greenhouse were used as the context information. The simulation used a model in which temperature changes over time are transmitted to the server: SMAC transmitted the measured information as it was, while the proposed MAC transmitted the changed amount of the measured information. The data collection cycle was set to 1 minute; SMAC transmits data every minute, whereas the proposed MAC decides whether to transmit by comparing the collected data with the data previously transmitted. The critical value of data change was set to 0.5 ℃. The number of nodes used in the experiment was limited to three in order to simplify the test environment; they were designated as the source node, relay node, and sink node.
Fig. 1. Measured temperature in the Greenhouse
Figure 1 shows the temperature changes measured in the paprika greenhouse. The red line shows the temperature changes measured without environment control, and the blue line shows the changes measured while conducting environment control optimized for crop cultivation. These data were used in the MAC simulation; the temperature data measured without environment control and the data measured with environment control were each tested once for each protocol.
Fig. 2. Energy consumption per Minute
As shown in Figure 2, the energy consumption of the proposed MAC was found to be less than that of the representative MAC protocol SMAC; the proposed MAC showed about a 23% reduction in energy consumption compared to SMAC, confirming that its energy performance is improved. It can be inferred from the simulation that the proposed MAC protocol will show high efficiency when a sensor network is applied in a facility-agriculture environment.
4 Conclusion In this paper, an energy-efficient, context-information-based MAC protocol that can be used in the facility-agriculture field was proposed. The facility-agriculture field is very static compared to other, more dynamic industrial fields, and the changes observed through real-time monitoring are extremely small. Contrary to existing MAC protocols, the proposed protocol decides whether to transmit by analyzing the given situation. It assesses whether the sensed data reflects an unchanged situation, or a change too small to matter, and transmits the changed amount, compared against the previously transmitted data, only when the critical value mutually set with the server is exceeded. It enables rational communication by increasing energy efficiency: the data size is reduced by delivering only the changed amount rather than the entire data, and the transmission frequency is reduced by transmitting only when the critical value has been exceeded.
To analyze the performance of the proposed protocol, a simulation was conducted based on data measured in a greenhouse. The result showed a 23% improvement in energy performance for the proposed MAC compared to SMAC. The context-information-based MAC protocol proposed in this study will be applied in the facility-agriculture field in the future. Acknowledgements. This research was supported by the MKE(The Ministry of Knowledge Economy), Korea, under the ITRC(Information Technology Research Center) support program supervised by the NIPA(National IT Industry Promotion Agency) (NIPA-2010-(C1090-1021-0009)).
References
1. Lee, K.H., Ahn, C.M., Park, G.M.: Characteristics of the Convergence among Traditional Industries and IT Industry. Electronic Communications Trend Analysis 23(2), 13–22 (2008)
2. Lee, M.-H., Shin, C.-S., Jo, Y.-Y., Yoe, H.: Implementation of Green House Integrated Management System in Ubiquitous Agricultural Environments. Journal of KIISE 27(6), 21–26 (2009)
3. Ye, W., Heidemann, J., Estrin, D.: An Energy-Efficient MAC Protocol for Wireless Sensor Networks. In: 21st Conf. of the IEEE Computer and Communications Societies (INFOCOM), vol. 3, pp. 1567–1576 (2002)
4. Network Simulator NS-2, http://www.isi.edu/nsnam/ns
A Service Scenario Based on a Context-Aware Workflow Language in u-Agriculture* Yongyun Cho and Hyun Yoe Information and Communication Engineering, Sunchon National University, 413 Jungangno, Suncheon, Jeonnam 540-742, Korea {yycho,yhyun}@sunchon.ac.kr
Abstract. A u-agricultural environment based on greenhouses contains various sensors and network devices, so it is reasonable for services in such an environment to take situation information into account. uWDL and CAWL are workflow languages that can treat contexts as service transition conditions; to build a context-aware service application in any service domain, a developer can directly describe contexts as service execution conditions in a scenario written in these languages. In this paper, we propose a service scenario that may plausibly arise at any time in a real u-agricultural environment. The suggested scenario is based on context-aware workflow languages, written in uWDL and CAWL. It demonstrates how a developer can easily compose a context-aware service application with these workflow languages according to situation information from a real u-agricultural environment, and it may be adapted to develop various context-aware applications in u-agricultural environments. Keywords: context-aware service, context-aware workflow languages, u-agricultural environment.
1 Introduction In u-agricultural environments, a service that controls the growth of agricultural products according to sensed data is more intelligent and autonomous, and a service that decides the expected supply time or quantity of agricultural products according to profiled data makes agricultural work more predictive and precise. To implement what may be called a context-aware or smart agricultural service, we need a method for combining the individual services in a u-agricultural environment into a flow of agricultural services that can be executed automatically according to the sensed and profiled data. Recently, several studies have tried to adopt workflow to implement smart and autonomous services in u-agricultural environments [1, 2, 3]. However, when many service demands occur from real u-agricultural environments at the same time, *
This research was supported by the MKE(The Ministry of Knowledge Economy), Korea, under the ITRC(Information Technology Research Center) support program supervised by the NIPA(National IT Industry Promotion Agency) (NIPA-2010-(C1090-1021-0009)).
they are not enough to resolve the demands. In this paper, we propose a new workflow service scenario that can implement context-aware agricultural services. It includes various agricultural service cases and uses the sensed or profiled data as contexts for their execution; the scenario is composed in CAWL, an experimental context-aware workflow language designed to handle multiple concurrent service flows. Through the suggested scenario, we show that context-aware workflow technology can efficiently make agricultural services more autonomous and smart, and demonstrate how it can be utilized to develop various agricultural context-aware applications.
2 Related Work uWDL [4] is a context-aware workflow language that supports the implementation of smart services in various service domains by describing a context as a transition condition of a service. However, the uWDL schema can describe only one service flow at a time, which is not enough for an application developer composing a service scenario for context-aware services in real u-agricultural environments networked with many sensors. CAWL [5] is another context-aware workflow language, which can simultaneously support flows for many services according to contexts transmitted from the sensors. Fig. 1 shows the schema of CAWL1.0.
Fig. 1. A part of the schema of CAWL1.0
Unlike uWDL, CAWL1.0 includes node tags that support multi-flows and sub-flows; as shown in Figure 1, three tags of the schema cooperate to provide these functions [5]. CAWL1.0 can also use an ontology to support context-aware workflow services for a specific service domain such as u-agriculture. For example, through the corresponding element in Figure 1, a developer may include an OWL-based agricultural ontology dictionary in order to compose a context-aware agricultural workflow application more easily.
3 A Context-Aware Workflow Service for u-Agriculture To describe contexts and services with a context-aware workflow for u-agriculture, it is necessary to define the agricultural services and their conditions, and to connect the defined service nodes into a u-agricultural flow. Figure 2 describes part of a context-aware service workflow composed of possible u-agricultural services.
Fig. 2. A context-aware temperature control service as a possible u-agricultural service
Figure 2 shows part of a service that may occur in a real u-agricultural environment equipped with various sensors. It illustrates a service situation in which the temperature of a greenhouse is controlled automatically; we call it the TemperatureControlService. As Figure 2 shows, the service consists of sub-services, each with its own service execution conditions.
4 Experiments and Results In this section we show how a context-aware workflow service can be expressed efficiently and easily in the context-aware workflow language CAWL1.0 in u-agricultural environments. Figure 3 shows part of a context-aware agricultural service workflow in CAWL1.0 for the possible services described in Figure 2.
Fig. 3. A part of a possible u-agricultural service scenario in CAWL1.0
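Since both the CAWL1.0 schema in Figure 1 and the scenario text in Figure 3 are reproduced only as images here, the fragment below is a purely illustrative sketch of what such a scenario might look like; every element and attribute name in it is hypothetical rather than taken from the actual CAWL schema:

    <workflow name="TemperatureControlService">
      <flow id="main">
        <node id="checkTemperature" service="GreenhouseTemperatureSensor">
          <condition context="temperature" operator="greaterThan" value="30"/>
          <transition to="startCooling"/>
        </node>
        <node id="startCooling" service="VentilationFanControl"/>
      </flow>
    </workflow>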
5 Conclusion This paper introduced a reasonable and promising possibility: a developer can easily and efficiently develop a context-aware agricultural service application using the context-aware workflow language CAWL. Utilizing the introduced method, a developer can handle contexts derived from the situation information of the various agricultural sensors around the environment, monitoring and controlling crop growth in real time. Especially in u-agricultural environments such as greenhouses or vertical farms, the suggested method will be helpful in developing various u-agricultural services that increase crop productivity and work effectiveness, minimizing human intervention by making agricultural services more autonomous.
References
1. Cho, Y., Yoe, H., Kim, H.: CAS4UA: A Context-Aware Service System Based on Workflow Model for Ubiquitous Agriculture. In: Kim, T.-h., Adeli, H. (eds.) AST/UCMA/ISA/ACN 2010. LNCS, vol. 6059, pp. 572–585. Springer, Heidelberg (2010)
2. Fileto, R., Liu, L., Pu, C., Assad, E.D., Medeiros, C.B.: POESIA: An Ontological Workflow Approach for Composing Web Services in Agriculture. The VLDB J. 12(4), 352–367 (2003)
3. Cho, Y., Moon, J., Yoe, H.: A Context-Aware Service Model Based on Workflows for u-Agriculture. In: Taniar, D., Gervasi, O., Murgante, B., Pardede, E., Apduhan, B.O. (eds.) ICCSA 2010. LNCS, vol. 6018, pp. 258–268. Springer, Heidelberg (2010)
4. Han, J., Cho, Y., Choi, J.: Context-Aware Workflow Language based on Web Services for Ubiquitous Computing. In: Gervasi, O., Gavrilova, M.L., Kumar, V., Laganà, A., Lee, H.P., Mun, Y., Taniar, D., Tan, C.J.K. (eds.) ICCSA 2005. LNCS, vol. 3481, pp. 1008–1017. Springer, Heidelberg (2005)
5. Choi, J., Cho, Y., Choi, J.: The Design of a Context-Aware Workflow Language for Supporting Multiple Workflows. Journal of Korean Society for Internet Information 11(1), 145–158 (2009)
A Sensor Data Processing System for Mobile Application Based Wetland Environment Context-aware Yoon-Cheol Hwang, Ryum-Duck Oh, and Gwi-Hwan Ji* Dept. of Computer Science and Information Engineering Chung-Ju National University Chungju-si, South Korea [email protected], [email protected], [email protected]
Abstract. In recent years, USN (Ubiquitous Sensor Network) technology has spread across society and evolved in fields such as the environment, human health, buildings, and construction sites. It is now used for situation-awareness services that can recognize what is happening in the region where the USN is installed. However, in a USN environment, data is obtained from very small sensor nodes via wireless communication and sensing, and the nodes' computing resources are limited; to provide a situation-awareness service, QoS (Quality of Service) must therefore be guaranteed and energy efficiency maximized. This paper designs and implements CADPS (Context-aware Data Processing System), a sensor data processing system that guarantees QoS and maximizes energy efficiency in order to apply USN-based situation-awareness services to the wetland environment. The system analyzes the overall situation reported by individual sensor nodes for intelligent personalized services, and provides data analysis and processing. Compared with existing USN middleware systems, it offers a better interface from the user's perspective and gains energy efficiency through mobile-based efficient query processing. Keywords: USN, mobile-based Application, Wetland, Context-aware, CADPS.
1 Introduction In recent years, with the development of USN (Ubiquitous Sensor Network) technology, the data needed for environmental monitoring, logistics monitoring, and building management can be transmitted from various small sensors over wireless networks[1]. Thanks to such technologies, USN services have appeared that can monitor disaster situations in real time across industry and the environment[2]. A USN consists of a sensor network, middleware, and an application platform[3,4]; through its automatic exchange with sensors that can be attached *
Corresponding author.
anywhere, environmental information is acquired, stored, and processed, so that knowledge and information services can be provided anytime, anywhere, and to anyone. However, because a USN obtains data from very small sensor nodes through wireless communication and sensing, and resources are limited, situation-awareness services such as rainfall detection must mostly work with small amounts of data transmitted over short periods. It is therefore necessary to determine the data-sensing cycle for the wetland environment with the energy efficiency of the sensor network in mind. In addition, the simple stream data collected from a USN service is not, by itself, enough to efficiently process and manage the various disaster situations that can occur; techniques are required to systematically manage water-quality situations through context recognition, generation, and management technology appropriate for the USN service domain and type. The wetland environment is a rich repository of natural resources in which moisture is always present, surrounded by rivers, ponds, and swamps; it is also an important part of the environmental domain, providing benefits such as flood control, slope erosion control, natural habitats for wild animals, groundwater supply and water-quality improvement, and mitigation of global warming[5]. An information service that systematically uses knowledge of environmental changes such as eutrophication, algae alerts, and algal blooms, and that tells a user what is happening in the wetland, is urgently needed at the application level[6]. Therefore, this paper designs a context-aware data processing system (CADPS) that recognizes the wetland situation and transmits the information a user wants to mobile devices. The system sits between mobile applications and the USN, increasing QoS through energy-efficient processing and application of USN data. For the mobile-based wetland-environment situation-awareness system, smartphone support modules are provided, along with an In-Network module that addresses the inefficient query processing of existing USN middleware and improves user convenience and accessibility. This paper is organized as follows. Chapter 2 describes USN middleware and situation-awareness systems in related studies. Chapter 3 designs the data processing system for wetland-environment situation awareness, explains its components, and describes how sensor data flows through the middleware. Chapter 4 describes the implementation environment of the sensor data processing system and evaluates the results of applying it to the wetland environment. Finally, Chapter 5 presents conclusions and future research.
2 Related Works A USN is an advanced intelligent infrastructure that detects, stores, processes, and integrates object and environmental information from tags and sensor nodes attached anywhere, so that anyone can freely use customized USN services anytime and anywhere through situation-awareness information and knowledge generation. USNs have been widely used in disaster forecasting, environmental monitoring and management, u-Home services, u-hospital services, and situation-aware computing.
Previously developed USN middleware systems include TinyDB[7], Cougar[8], Milan[9], and COSMOS[10]. TinyDB (Berkeley) is a middleware that treats the wireless sensor network as a virtually distributed database. The server and In-Network middleware cooperatively process queries written in an SQL-like language. In particular, the In-Network middleware provides TAG techniques that support data aggregation, significantly reducing the amount of data transmitted during query processing. However, it runs only on TinyOS-based sensor nodes, and it is difficult to add sensing properties or new query-processing capabilities to a sensor node[7]. Cougar (Cornell) is a middleware that stores all the sensing data of the sensor nodes and processes queries by accessing a server, without performing query processing in the sensors themselves; because the collected information must first be sent to the server, it causes higher energy consumption[8]. Milan (Rochester) manages application-service-oriented sensor networks, maximizes the lifetime of the sensor network, and provides QoS for application services. Milan does not support mobile sensor nodes and, being strongly tied to USN application services, does not support the abstraction of sensor nodes. COSMOS (ETRI) supports various types of queries, can process queries simultaneously in large sensor network environments, and supports the abstraction of heterogeneous sensor networks; however, lacking In-Network middleware features such as in-network data aggregation, its energy efficiency is poor.
3 Sensor Data Processing System for Wetland Environmental Context-Aware 3.1 Structure of CADPS CADPS was designed to efficiently manage the sensing information collected from sensor nodes and to respond quickly to queries issued by USN application services. CADPS consists of three parts: the Application Layer, which requests services and receives and interprets the recognized situation; the Middleware Layer, which processes the services requested by the Application Layer and analyzes the data collected from the sensor network; and the In-Network Layer, which collects the data necessary for situation awareness. The Application Layer comprises smartphone-based USN situation-awareness applications implemented on the Android platform. The Middleware Layer consists of (a) the Query Integration Module, which generates queries for the sensor control commands required by USN services, issues control requests to sensors or receives DB information requests from clients, searches the database, and returns result values; (b) the Intelligent Transaction Module, which creates, modifies, and deletes meta-information to analyze the situation; and (c) the Interface Manager, which processes large volumes of sensing information and, when a particular event occurs at a sensor, forwards that sensor's information to the Intelligent Transaction Module. The In-Network Layer consists of (a) the sensors that collect the information needed for situation awareness and (b) the network that transmits the collected information. [Fig. 1] shows the overall structure of CADPS.
Fig. 1. The overall structure of CADPS
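Expressed as code, the module boundaries of Fig. 1 might look roughly like the following Java interface sketch; all of the names are our own shorthand for the modules described above, not identifiers from the actual CADPS implementation:

    // Hypothetical component boundaries mirroring the CADPS structure.
    class SensorReading {
        int nodeId;
        String type;     // e.g., "T-P", "T-N", "Chl-a", "transparency"
        double value;
    }

    interface InterfaceManager {                        // In-Network boundary
        SensorReading standardize(byte[] rawPacket);    // normalizes incoming packets
    }

    interface IntelligentTransactionModule {            // situation-analysis side
        void analyze(SensorReading reading);            // metadata handling + context inference
    }

    interface QueryIntegrationModule {                  // application boundary
        String executeQuery(String query);              // control/DB queries, results as XML
    }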
3.2 Smartphone Application The USN situation-awareness service application, equipped with the smartphone support module, was implemented on the Android platform and designed so that it can be demonstrated in the Android simulator. The smartphone application consists of a background service, a viewer, a communication module, a sensor network manager, and a situation-awareness processor, as shown in [Fig. 2].
Fig. 2. Structure of the Smartphone Application
• Background Service: informs users of dangerous situations via alarm and vibration when one occurs. The situation-awareness processor classifies as dangerous both the sensor network danger information registered through the sensor network manager and the information from the sensors embedded in the smartphone. It runs as an Android background service and notifies the user only when a dangerous situation occurs.
• Application Viewer: shows information as a GUI on the smartphone so that the sensor network lists registered in the middleware, the lists the user has marked as interesting, and the status of sensors can be examined; the sensor network status can be viewed as a list or on a map.
• Communication Module: transmits the sensor network information of interest over HTTP, converting the transmitted XML appropriately and passing it to the sensor network manager and the situation-awareness processor.
• Sensor Network Manager: manages the registered sensor networks. Registered sensors are stored in a lightweight database on the smartphone itself; the stored data is combined with the data arriving from the communication module and delivered to the application viewer. The lightweight database is implemented with SQLite, which is widely used on smartphones. A feature that converts the longitude and latitude measured by the GPS sensor into an address is also built in.
• Situation-Awareness Processor: analyzes the data transmitted from the USN middleware together with the GPS and acceleration data sensed by the smartphone itself, and reports dangerous situations to the background service, enabling user-centric situation awareness.

3.3 CADPS Middleware The CADPS middleware consists, as shown in [Fig. 3], of the Query Integration Module, the Intelligent Transaction Module, and the Interface Manager.
Fig. 3. Middleware Structure of CADPS
The Query Integration Module creates queries for the sensor control commands required by USN services, requests control of a sensor or receives a DB information request from a client, searches the database, and returns the result values. Query processing in a USN flexibly links user queries and the database depending on where the sensing information from the sensor network is stored. It provides a suitable interface to applications, including smartphones: linked to the mobile support module, the application interface of the Query Integration Module can control and monitor the sensor network. The Intelligent Transaction Module consists of the Situation Analysis Manager, the Sensor Data Manager, and the Metadata Manager. The Metadata Manager combines the packets sent from the sensor network with metadata to produce complete information, and reduces frequent database accesses by applying a technique called the metadata set. It acquires, keeps, and maintains various metadata from the USN environment, focusing mainly on data acquisition for the upper level of situation awareness. The Situation Analysis Manager recognizes the wetland status through the sensor
network data and analyzes dangerous situations. The Sensor Data Manager processes large volumes of sensing information and, when a particular event occurs at a sensor, delivers that sensor's information to the Situation Analysis Manager. The Interface Manager is a component that provides a standardized interface for the sensor networks; each sensor network module complies with common message specifications for sensor data acquisition, monitoring, and control when linking to the common sensor network interface. In other words, when various sensor nodes and networks are configured for In-Network data acquisition, the Interface Manager guarantees the independence of sensor network and middleware and preserves the scalability of the sensor network.

3.4 Wetland Environment Context Data Processing [Fig. 4] shows the data processing flow, using eutrophication as the example. The In-Network groupings are the sensor nodes placed in the wetlands, which collect sensing information for T-P, T-N, Chl-a, transparency, and other indices. The collected data is sent to the Interface Manager and standardized so that it can be used by the upper modules. The standardized data, a large stream, is converted to ordinary data through the sensor network groupings and sent to the Situation Analysis Manager, while the current situation is simultaneously shown to users. Before it passes through the Situation Analysis Manager, the converted packet data is turned into complete data, including metadata, by the Metadata Manager and stored in the DB. The Situation Analysis Manager compares the converted data against the sensor baselines, detects anomalous sensors, and infers the occurrence of eutrophication. The progression and risk of eutrophication are then shown on the users' smartphones through the application interface. In general, situation-awareness services may also query sensor nodes directly; this capability is expected to be added later and will improve the quality of the system.
Fig. 4. Data Processing Flow
4 Implementation and Application 4.1 Wetland Environmental Context-Aware Data Analysis The situation-awareness data processing system is applied to eutrophication, which has the most significant impact on the wetland environment, in order to check its operation. Situations in wetlands such as eutrophication, tidal-current prediction, algal blooms, and sewage inflow are not independent; they are interrelated, influence each other directly, and act in complex ways. Among them, eutrophication is the most strongly related to the other situations, so this paper applies the situation-awareness data processing system to eutrophication. The criteria for eutrophication depend on the nation, the target lake, and the researcher; the indices used here are T-P, T-N, chlorophyll-a, and transparency, which serve as the essential data for the situation-awareness service. In general, when a lake is first formed, the water does not hold enough nutrients and the lake is oligotrophic, with low algal productivity. As nutrients increasingly flow in from the surrounding area, algae gradually proliferate, the zooplankton and other aquatic organisms that feed on them increase, and the dead bodies of these organisms accumulate on the lake bottom, turning it into a eutrophic lake. [Table 1] below shows the three trophic classes used as the situation-analysis baselines for the four representative eutrophication indices. Table 1. Eutrophication Standard
Index               Oligotrophy     Mesotrophy      Eutrophy
T-P (mg/m3)         below 15        15 ~ 25         25 ~ 100
T-N (mg/m3)         below 400       400 ~ 600       600 ~ 1500
Chl-a (mg/m3)       below 3         3 ~ 7           7 ~ 40
Transparency (m)    4.0 or more     2.5 ~ 4.0       below 2.5
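The thresholds in Table 1 translate directly into rule-based checks; below is a minimal sketch of how the Situation Analysis Manager might apply two of the indices (the class and method names are assumptions, and T-N and transparency follow the same pattern):

    // Trophic-state rules taken from Table 1; eutrophy is checked first.
    enum TrophicState { OLIGOTROPHY, MESOTROPHY, EUTROPHY }

    class EutrophicationClassifier {
        static TrophicState byTotalPhosphorus(double tp) {  // T-P in mg/m3
            if (tp >= 25) return TrophicState.EUTROPHY;     // 25 ~ 100
            if (tp >= 15) return TrophicState.MESOTROPHY;   // 15 ~ 25
            return TrophicState.OLIGOTROPHY;                // below 15
        }

        static TrophicState byChlorophyllA(double chla) {   // Chl-a in mg/m3
            if (chla >= 7) return TrophicState.EUTROPHY;    // 7 ~ 40
            if (chla >= 3) return TrophicState.MESOTROPHY;  // 3 ~ 7
            return TrophicState.OLIGOTROPHY;                // below 3
        }
    }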
4.2 Implementation of Middleware and the Results The query processing application was developed with the Java Development Kit (JDK 6.0); queries were issued from the application, and the data returned for each query from the sensor nodes was checked. The implementation environment of the CADPS middleware is shown in [Table 2]. [Fig. 5] shows the CADPS user interface. On the screen, a user can see the location and state of the sensor nodes installed throughout the wetlands. Each circle is an installed sensor node: red and yellow indicate danger and caution, respectively, and the stable state, shown in green, is also easily identified. The danger and caution levels are set as separate critical values relative to the eutrophication baselines configured in the Situation Analysis Manager: when a sensor node exceeds the danger critical value it is shown in red, and when it exceeds the caution critical value it is shown in yellow.
Table 2. CADPS Middleware Implementation Environment

Item                        Content
Development language        JDK SE 6.0
Development environment     Eclipse 3.5
Development platform        Windows 7
System specification        Intel Core2 2.4GHz, RAM 2GB
Fig. 5. CADPS User Interface
Fig. 6. Processing Data Format of CADPS Middleware
Clicking on a sensor node displays the rate of change of that node's incoming data, and showing each eutrophication critical value as a line helps the user analyze the situation accurately. [Fig. 6] shows the sensing data produced by the sensor nodes when a query is issued from the application. More than ten kinds of sensing data can be collected from a sensor; they can be consulted when the sensor status must be determined or when exact historical values are needed.
4.3 Application Viewer The Application Viewer is a module that presents information as a GUI on the smartphone, so that the sensor network lists and statuses registered in the middleware can be checked; the sensor network state can be viewed as a list or on a map. The map viewer uses the Google Maps API. [Fig. 7] shows the viewer running on the actual Android simulator. The list view is sorted by sensor number and allows the sensing data to be checked, while the map view shows the overall sensor status across the wetlands and the details of the installed sensor nodes.
Fig. 7. Smartphone Application Viewer
5 Conclusions The kinds of disaster that can be forecast using a ubiquitous sensor network include tornadoes, El Niño, hurricanes, tsunamis, dust storms, floods, earthquakes, volcanic eruptions, wildfires, landslides, avalanches, and other natural disasters, as well as abnormal group behavior of animal pests, the epidemic spread of micro-organisms such as bacteria, and large regional disasters and accidents in populated areas. Domestic and international USN services may differ somewhat in magnitude, operation, and service depending on the circumstances of each country, but in the end they are mostly used for social safety services aimed at disaster forecasting or prevention. Among the types of disaster that can be detected and managed with a ubiquitous sensor network, this paper addresses the man-made disaster of eutrophication, one form of water pollution, and designs a system for detecting conditions such as wastewater discharge, the presence of toxic substances, and deliberate contamination. By implementing the mobile support module, it improves not only user accessibility but also the immediacy of the situation-awareness system. This study applies the ubiquitous concept to the wetland environment, collects wetland data from the sensor network, designs a system to predict eutrophication based on it, and lays a foundation for environmental management across whole wetlands through the future expansion of
sensor networks. In the future it will be necessary to diversify the rules, manage the data efficiently, and add the relevant factors that allow management across the whole wetlands. Acknowledgments. This research was financially supported by the Ministry of Education, Science Technology (MEST) and Korea Institute for Advancement of Technology (KIAT) through the Human Resource Training Project for Regional Innovation.
References
1. Weiser, M.: Some Computer Science Issues in Ubiquitous Computing. Comm. ACM 36(7), 75–84 (1993)
2. Szewczyk, R., Mainwaring, A., Polastre, J., Culler, D.: An Analysis of a Large Scale Habitat Monitoring Application. In: Proc. ACM Conf. Embedded Networked Sensor Systems, pp. 214–226 (2004)
3. Chong, C.Y., Kumar, S.P.: Sensor Networks: Evolution, Opportunities, and Challenges. Proc. of the IEEE 91(8), 1247–1256 (2003)
4. Kahn, J.M., Katz, R.H., Pister, K.S.J.: Next Century Challenges: Mobile Networking for Smart Dust. In: Proc. of the 5th Annual ACM/IEEE International Conference on Mobile Computing and Networking, Seattle, WA, pp. 271–278 (1999)
5. http://www.kwater.or.kr/0678301250_43602.pdf, 8010471250_39398.pdf, kwater Environmental Kit
6. Choi, J.-H., Park, Y.-T.: A Dynamic Service Supporting Model for Semantic Web-based Situation Awareness Service. KIISE Information Science 36(9), 732–748 (2009)
7. Madden, S.R., Franklin, M.J., Hellerstein, J.M.: TinyDB: An Acquisitional Query Processing System for Sensor Networks. ACM Trans. Database Systems 30(1), 122–173 (2005)
8. Bonnet, P., Gehrke, J., Seshadri, P.: Towards Sensor Database Systems. In: Tan, K.-L., Franklin, M.J., Lui, J.C.-S. (eds.) MDM 2001. LNCS, vol. 1987, pp. 3–14. Springer, Heidelberg (2000)
9. Heinzelman, W.B., et al.: Middleware to Support Sensor Network Applications. IEEE Network 18(1), 6–14 (2004)
10. Kim, M., Lee, J.W., Lee, Y.J., Ryou, J.-C.: COSMOS: A Middleware for Integrated Data Processing over Heterogeneous Sensor Networks. ETRI Journal 30(5), 696–706 (2008)
An Intelligent Context-Aware Learning System Based on Mobile Augmented Reality Jin-Il Kim1, Inn-woo Park2, and Hee-Hyol Lee3 1
Dept. of Electronic Eng., Hannam University, 133 Ojeong-dong, Daedeok-gu, Daejon 306-791, Korea [email protected] 2 Dept. of Education, Korea University, Anam-dong Seongbuk-gu, Seoul, 136-701, Korea [email protected] 3 The Graduate School of Information, Production and Systems, Waseda University, 2-7 Hibikino, Wakamatsu-ku, Kitakyunshu, Fukuoka, 808-0135, Japan [email protected]
Abstract. Learning content delivered through context-aware mobile technology, whether manual or interactive, can hardly be expected to arouse learners' interest or immersion, because most of the real-life environment is disconnected from the mobile content. For this reason, Augmented Reality is used to remedy the drawback and to provide learners with an educational environment suited to putting the theory of situated learning into practice. Increasing interest in Augmented Reality in recent years has led to multiple efforts to build AR-based applications, but most require additional hardware or software, making it difficult to establish a proper learning environment in the field of education. Therefore, in this paper we propose an intelligent context-aware learning system based on mobile Augmented Reality that provides a hassle-free learning environment requiring nothing but a common mobile device. Keywords: Mobile, Situated Learning, Context-Aware, Mobile Augmented Reality, Intelligent Agent.
1 Introduction Rapid development in information and communication technology and mobile-device infrastructure has driven the ubiquitous computing concept, a user-oriented computing paradigm guaranteeing that computing resources are accessible wherever and whenever needed. At the same time, active efforts have been made to link the ubiquitous computing paradigm with education. As smaller yet smarter and faster computers have arrived, mobile and wearable computing technology has come into the spotlight for its location and context awareness. In terms of learner-content interaction, Augmented Reality (AR) differs from Virtual Reality (VR) in that AR enables a learner living in the real world to engage in direct and intuitive interactions with virtual content or services[1].
Further, studies on mobile AR have aimed to integrate the interactive aspects of AR with mobile computing so that learners can be supported regardless of time and place and provided with necessary information effectively[2]. However, to build a ubiquitous computing environment, the technology must also personalize learner-oriented services and content, support content sharing and collaboration between learners through dynamic communities, and make use of diverse contexts. In short, context-aware mobile AR needs to be studied in order to overcome the existing technical limitations and converge with the ubiquitous computing environment[3]. Context-aware mobile Augmented Reality is a new computing concept in which, in a ubiquitous smart space, learners use portable personal mobile devices to collect, manage, and apply information in line with their needs and contexts; to augment services or content according to their contexts in real space; and to selectively share information for seamless interaction and collaboration[3]. Currently, in school education, contextual learning is being investigated actively as a means of providing learners with diverse and creative education. In response to demands for new, effective services that meet learners' interests rather than one-way uniform services, intelligent agents enabling situated learning according to each learner's situation and condition have become a subject of educational research. However, most studies of situated learning have so far focused on web-based virtual space. To go beyond the limitations of web-based content learning, investigators have turned their attention to the mobile environment, where learners can download content fitting their contexts directly from web servers, or search the data stored on their mobile devices for content fitting their needs, in order to practice contextual learning. Still, such approaches fail to arouse interest and immersion in learning, because learners' real-life environment is disconnected from the content they use on mobile devices, whether that content is manual or contextual[4]. Although studies on Augmented Reality are far from universal, there are a few AR-based applications; unfortunately, they require additional hardware or software, which makes it rather infeasible to establish a desirable learning environment in the field of education. Given this impracticable reality, the present study attempts to solve the problem using a vision-based AR technology that allows such a learning environment to be provided with nothing but common mobile devices. Using this technology, the study proposes an intelligent context-aware learning system based on mobile AR that can automatically recommend learning content appropriate to each learner's context. English conversation is selected here as an example of effective learning with the proposed system: because the system provides learners with English sentences appropriate to the real situation they are in, it promotes connections between knowledge and the real-life experiences in which that knowledge is practically used.
2 Related Study 2.1 Situated Learning According to situated learning theory, knowledge acquired in the context of real life is more practically applicable to problem solving in reality, maximizing learning effects compared with knowledge gained through direct teaching[5]. Learning in context is therefore one of the new learning methods drawing the most attention in education in recent times. So far, studies of learning in context have mostly concerned learning in virtual space on the web. However, as mobile technology has brought innovative changes to every field of society and culture, and as the limits of web-based contextual learning have become apparent, research on learning in context has advanced further. Existing situated learning using mobile devices, however, relies on cell phones characterized by small screens, difficulties in using multimedia features, and limited storage for learning content. Also, even though knowledge can be delivered in real time over the wireless internet, learners themselves must carry out situated learning and obtain the content needed for the current context by downloading it from the web server or searching for it on their mobile devices. Given the passivity of the content, the time and effort taken to download it from the web server or to search for it on the device, and learners' limited skill in handling mobile devices, motivation for learning can be hampered. Even interactive content can hardly be expected to stimulate learners' interest and immersion, because most of the real-life environment and the content on mobile devices are disconnected. Thus, an educational environment that can translate situated learning into reasonable practice needs to be implemented.

2.2 Intelligent Agent An intelligent agent is a system that pursues its goals rather autonomously in a complex and changing environment; it has long been studied in the field of artificial intelligence. A manually designed intelligent agent requires sufficient knowledge of the application domain and has the disadvantage that system performance is fixed at the very beginning. To cope with this, machine learning, which improves computer algorithms automatically through experience, is used to design intelligent agents[6][7]. Machine learning is a technology that generates programs by extracting rules or patterns from large-scale data sets, using neural networks, decision trees, genetic algorithms, SVMs (Support Vector Machines), and Bayesian networks[8]. Because an agent designed with machine learning is built from information collected from the surrounding environment, the approach is applicable to diverse domains and maintains the quality of work by adapting the agent to changes in the environment or in learners' characteristics[6][9]. A few applications of machine learning in intelligent agents are as follows. Mitchell et al. applied machine learning to an agent managing meeting schedules by modelling learners' characteristics with a decision tree[6], and Lee & Pan used the
genetic algorithm to learn learners' characteristics by optimizing a fuzzy model for designing meeting schedules[9]. Schiaffino and Amandi used reinforcement learning to learn learners' characteristics and habits and to provide an appropriate notification service. In intelligent agents performing information search, machine learning has been adopted to understand learners' intentions more accurately: Hsinchun, Yi-Ming, Ramsey, and Yang analyzed learners' search preferences using a genetic algorithm and proposed a personalized information search agent[10]; Horvitz, Breese, Heckerman, Hovel, and Rommelse used a Bayesian network in an interactive agent for MS Office that infers learners' intentions and provides corresponding information through natural language[11]; and expert knowledge was modelled with a neural network in a personalized information system based on the Hamdi multi-agent[12]. In the present study, an intelligent context-aware learning system is implemented for a mobile environment requiring real-time data processing, based on the Bayesian network, a model-based collaborative filtering method.

2.3 Augmented Reality Augmented Reality (AR) technology, which can combine actual reality with learning information, can implement a teaching method closely conforming to contextual learning, which is why AR draws much attention as a new learning medium for maximizing learning effects. An AR-based learning environment can serve many useful ends, including increased sensory immersion, experience-oriented learning through direct manipulation, promotion of contextual awareness, and the building of cooperative learning environments. Examples of AR applied to education are as follows. 'Invisible Train', implemented by Daniel Lederman et al., is the first example of using PDAs to build an AR in which two learners can interact: a small wooden railway represents the real context, and the virtual train running on the railway is manipulated as part of the game. As it works on PDAs, the system can be used by diverse users and applied to different fields[13]. 'Archaeology and Culture Heritage', developed by M. White, F. Liarokapis et al., presents information on cultural heritage using AR techniques; learners can watch the 3D images implemented with AR alongside the physical heritage objects on the web[14]. The University of Tokyo's Mono Gatahari project applies AR to science class: on touching a trilobite fossil, learners see images of its fossilization process and inner structure through a wearable spectacle-like display and hear an explanation, creating a story-telling learning environment[15]. In Austria, D. Wagner developed the 'Kanji Teaching' system for PDAs, which teaches Chinese characters through AR games: paper cards printed with Chinese characters are placed on the table, users find the character cards corresponding to the icons presented on the PDA, and once a found card is flipped, the corresponding picture is displayed in 3D[16]. 'AR-Memo', developed by the U-VR Lab at the Gwangju Institute of Science and Technology, lets learners wearing an HMD take notes on real images with a virtual pen augmented at their fingertip and view the memos on their palm while in motion; unfortunately, owing to the costly devices and complicated environment setup, including the HMD, the system is far from widely usable[17]. This study suggests a
realistic alternative to the problems mentioned above. The mobile AR-based intelligent context-aware learning system proposed here provides a learning environment that requires nothing but common mobile devices.
3 Intelligent Context-Aware Learning System Based on Mobile AR

The proposed learning system operates in a ubiquitous computing environment, where information on learners and their surroundings is delivered to portable information terminals so that mobile learners' context can be recognized. Each mobile device analyzes the acquired information and user requirements and provides service content matching the learner's preference. The proposed intelligent learning agent consists of the learner-oriented context-aware information module, the learner profile information inference module and the learning content recommendation module. Fig. 1 shows the modules the proposed system interacts with to perform mobile personalization.
Fig. 1. The Proposed System
3.1 Learner-Oriented Context-Aware Analysis Information Module

The learner-oriented context-aware analysis information module models learner-oriented context-aware information to express diverse context-aware information for a learner. As shown in Fig. 2, the module acquires information from diverse external sensors and analyzes the learner's context-aware information.
Fig. 2. Learner-oriented context-aware information module
3.2 Learner Profile Information Inference Module

Learner-related context-aware information, among the diverse types of context-aware information scattered around a learner, is an essential element for providing personalized service. In general, to determine personal preference for a service, basic
personal profile information and personal tendencies or characteristics in using services over time must be collected and analyzed. The following formula yields a learner's preference.
\( P_{learner} = (LO_{e1} \times \omega_1) + (LO_{e2} \times \omega_2) + \cdots + (LO_{en} \times \omega_n) \)    (1)
Here, LOe1, LOe2, ..., LOen are the factors influencing the preference for learning objects; ω_i is the weight of factor e_i; and ω1, ω2, ..., ωn add up to 1. The mobile devices learners carry collect and learn the learners' service usage and feedback information, so that the contextual conditions related to each service used can be analyzed easily. In this way services can be optimized without learners explicitly specifying contextual conditions, which in turn saves the server the burden of managing and updating each learner's service preferences. Also, by learning the information on the services each learner uses, the system can provide services reflecting each learner's personal preference information. The learner profile information inference module applies each learner's interests and behavior patterns in using services and compares this information with the predetermined learner profile to determine, through the inference engine, the information best suited to the learner's context. Namely, based on the learner profile, the context-aware learning system uses the Bayesian classifier to recommend the contextual service most suitable for each learner without human intervention. The Bayesian classifier is easy to implement and allows a reasonable explanation of the derived results, which is why it is used in many learning systems. It also learns the context of each learner using the learner profile history. In the proposed system, the module's output is used to automatically predict and provide the services each learner needs. Fig. 3 shows the overall operation process of the learner profile information inference module.
Fig. 3. Learner profile information inference module
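A minimal sketch of how formula (1) can be computed is shown below; the factor names, values and weights are hypothetical illustrations, not taken from the paper.

# Sketch of formula (1): a learner's preference as a weighted sum of
# learning-object factors LO_ei with weights w_i that add up to 1.
def learner_preference(factors, weights):
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must add up to 1"
    return sum(factors[name] * weights[name] for name in factors)

# Hypothetical factors influencing preference for one learning object.
factors = {"topic_match": 0.9, "media_type": 0.6, "difficulty": 0.4}
weights = {"topic_match": 0.5, "media_type": 0.3, "difficulty": 0.2}
print(learner_preference(factors, weights))  # 0.71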
First, in the Context Preprocessor, the diverse information on each learner gained from each sensor is processed into typed context-aware information. Next, the Preference of Learner component, consisting of the Rule Engines and the Naive Bayesian classifier, learns each learner's preference. The Rule Engines define and manage the rules on each learner's preference. Defined rules pass through the Bayesian classifier to provide the services most suitable for each learner. The Naive Bayesian technique is the simplest form of Bayesian network: it assumes that all factors are independent, so probabilities can be estimated relatively easily and simply, which makes it very useful for implementing a system that learns each learner's preference.
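To make the Naive Bayesian step concrete, the following sketch picks the service class with the highest posterior under the independence assumption. The feature names, classes and training data are invented for illustration and are not from the paper.

from collections import Counter, defaultdict

class NaiveBayes:
    def fit(self, samples, labels):
        self.prior = Counter(labels)
        self.cond = defaultdict(Counter)  # (class, feature index) -> value counts
        for x, y in zip(samples, labels):
            for i, v in enumerate(x):
                self.cond[(y, i)][v] += 1
        self.n = len(labels)

    def predict(self, x):
        def score(y):
            p = self.prior[y] / self.n
            for i, v in enumerate(x):
                # Add-one smoothing keeps unseen values from zeroing the product.
                p *= (self.cond[(y, i)][v] + 1) / (self.prior[y] + 2)
            return p
        return max(self.prior, key=score)

# Context = (location, time-of-day); label = recommended service.
clf = NaiveBayes()
clf.fit([("hospital", "morning"), ("shop", "evening"), ("hospital", "evening")],
        ["medical_phrases", "shopping_phrases", "medical_phrases"])
print(clf.predict(("hospital", "morning")))  # medical_phrases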
3.3 Learning Content Recommendation Module

The learning content recommendation module is the very core of the proposed system: information to be serviced is recommended based on the context-aware information fitting each learner's preference, and intelligent recommendation service for learning is guaranteed in the ubiquitous environment. Based on the learner profile and learning content data, this module recommends the content each learner prefers. A plethora of learning content is stored on the personalized mobile devices, and learners prefer personalized content lists estimated from their personal preferences to random lists of much content. The learner's preference is kept in the learner profile. The preference profile is variable and contains each learner's current preference for media content. In this learning content recommendation module, data mining techniques are used to extract information useful to each learner out of a pile of scattered data. The overall schematic of the module is shown in Fig. 4.
Fig. 4. Learning content recommendation module
The system's data sources are the learner profile and content metadata. The learner profile keeps the learner's preference for the metadata, and the content consists of metadata describing the content of photos. The two data sources undergo a conversion process in which the data are transformed into the types applicable to the system. The processed data are divided into contextual data and content information. Values are then estimated for the transformed data with the similarity measurement, and learning content is recommended, considering the learner's preference, in descending order of the estimated values.
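The paper does not name the similarity measurement; as one plausible instantiation, the sketch below ranks content items by cosine similarity between the learner profile and each item's metadata. All names and values are invented.

import math

def cosine(a, b):
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

profile = {"english": 0.8, "hospital": 0.6, "shopping": 0.1}   # learner preference
contents = {
    "clinic_dialogue": {"english": 0.9, "hospital": 0.9},
    "mall_dialogue":   {"english": 0.7, "shopping": 0.9},
}
ranked = sorted(contents, key=lambda c: cosine(profile, contents[c]), reverse=True)
print(ranked)  # ['clinic_dialogue', 'mall_dialogue']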
4 Results and Implementation of the System

To test the performance of the proposed system, its effectiveness and usability are measured[18]. The effectiveness of retrieval is usually measured by the following two quantities, recall and precision:
\( \mathrm{recall} = \frac{|relevant\_set \cap retrieved\_set|}{|relevant\_set|}, \qquad \mathrm{precision} = \frac{|relevant\_set \cap retrieved\_set|}{|retrieved\_set|} \)    (2)
Here, relevant_set denotes the set of relevant learning objects and retrieved_set denotes the set of retrieved learning objects.
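Formula (2) can be computed directly on sets of learning-object identifiers, as in this small sketch; the example sets are hypothetical.

def recall(relevant, retrieved):
    return len(relevant & retrieved) / len(relevant)

def precision(relevant, retrieved):
    return len(relevant & retrieved) / len(retrieved)

relevant_set = {1, 2, 3, 4, 5}
retrieved_set = {2, 3, 5, 8}
print(precision(relevant_set, retrieved_set))  # 0.75
print(recall(relevant_set, retrieved_set))     # 0.6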
Table 1. Environment of System Development

OS             Android 2.2 (Froyo)
Language       Java JDK, Android SDK
Database       SQLite
Mobile Device  Galaxy Tab
Table 1 shows the environment used to develop the intelligent context-aware learning system based on mobile AR. To test system performance, an experimental group of 30 learners was formed. The learners were asked to move a certain distance at random while using the service, and the extent to which information suitable for the learner was recommended in the recognized context was measured and evaluated. As the experimental results in Table 2 show, the proposed method, which uses learner information and personal preference, performs better in precision than the conventional approaches.

Table 2. Experiment result
Method                   Precision  Recall
Collaborative Filtering  70         70
Content-Based Filtering  80         76
Proposed Method          82         66
According to Nielsen's definition of usability, high usability means a low mental burden on account of ease of learning and remembering, high effectiveness in use, high subjective satisfaction and low error rates. Among the 10 heuristic assessment items Nielsen suggested, the items meaningful for comparison with a conventional system[19] were chosen, and a 5-point scale was applied to each assessment question.

Table 3. Comparison of two systems

Item                                     MLLAS[19]  Proposed System
Visibility of system status              3.17       3.75
Match between system and the real world  3.20       4.35
User control and freedom                 3.15       3.80
Consistency and standards                3.12       3.78
Recognition rather than recall           3.07       3.84
Flexibility and efficiency of use        3.14       3.40
Aesthetic and minimalist design          3.21       4.05
The significant difference in the assessment of the item "Match between system and the real world" implies that AR technology keeps the real-life environment and the content on the mobile device from being disconnected, inducing further interest and immersion in learners. Compared with previous learning systems using mobile devices, the proposed device has a relatively large screen and superior system performance, resulting in higher satisfaction in general. Fig. 5 shows a screen on a learner's device displaying relevant shops around the learner's current location. Here, business types are shown instead of shop names. Horizontally, the center corresponds to roughly 50 m from the learner's current location, and the higher a location appears on the screen, the greater its real distance. Relevant business types behind the learner are marked below the horizontal center. General business types are marked in white, favorites in green, and frequently visited places in red. The compass at the upper left corner of the screen indicates a heading of 134 degrees (SE), at latitude 36.20.54.78 and longitude 127.24.58.68. Fig. 6 shows the screen that appears on pressing 'clinics and hospitals'; it presents English conversation used in that sector.
Fig. 5. Result Screen
Fig. 6. English Conversation Screen
5 Conclusion

The present study designs and implements an intelligent context-aware learning system based on mobile AR that can provide a learning environment using only common mobile devices. The proposed learning system overcomes limitations of mobile devices, including small screen size, by using the Galaxy Tab. It is an intelligent context-aware learning system that uses learners' context-aware information obtained from mobile devices as well as users' profile information. To extend the application of the system developed here, the following studies are needed. First, to enable learners to use the developed system more easily and conveniently, a program should be developed for registering and managing the locations they want. Second, research should address re-using existing mobile content on the proposed system. Third, further studies must be performed to build diverse learning content applicable to the proposed system.

Acknowledgments. This paper has been supported by the 2011 Hannam University Research Fund.
References

[1] Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., MacIntyre, B.: Recent advances in augmented reality. IEEE Computer Graphics and Applications 25(6), 34–47 (2001)
[2] Höllerer, T., Feiner, S., Terauchi, T., Rashid, G., Hallaway, D.: Exploring MARS: Developing indoor and outdoor user interfaces to a mobile augmented reality system. Computers & Graphics 23(6), 779–785 (1999)
[3] Woo, W., Jeon, M.-G., Nam, T.-J., Lee, S.-G., Cho, W.-D.: CAMAR. Jinhan M&B
[4] KERIS: Research on Using Augmented Reality for Interactive Educational Digital Contents (2005)
[5] Cognitive and Technology Group at Vanderbilt: Anchored instruction and situated cognition revisited. Educational Technology 33(3), 52–70 (1993)
[6] Mitchell, T., Caruana, R., Freitag, D., McDermott, J., Zabowski, D.: Experience with a learning personal assistant. Communications of the ACM 37(7), 80–91 (1994)
[7] Thomas, E.R., John, H.G., Henrik, E., Angel, R.P., Samson, W.T., Mark, A.M.: Reusable ontologies, knowledge-acquisition tools, and performance systems: PROTÉGÉ-II solutions to Sisyphus-2. International Journal of Human-Computer Studies 44(3-4), 303–332 (1996)
[8] Hong, J.H., Cho, S.B.: Machine Learning and Intelligent Agents. The Korean Institute of Information Scientists and Engineers 25(3), 64–69 (2007)
[9] Lee, C.S., Pan, C.Y.: An intelligent fuzzy agent for meeting scheduling decision support system. Fuzzy Sets and Systems 142(3), 467–488 (2004)
[10] Hsinchun, C., Yi-Ming, C., Ramsey, M., Yang, C.: An intelligent personal spider (agent) for dynamic internet/intranet searching. Decision Support Systems 23(1), 41–58 (1998)
[11] Horvitz, E.J., Breese, D., Heckerman, D., Hovel, D., Rommelse, K.: The Lumiere project: Bayesian user modeling for inferring the goals and needs of software users. In: Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 256–265 (1998)
[12] Hamdi, M.S.: MASACAD: A multi-agent-based approach to information customization. IEEE Intelligent Systems 21(1), 60–67 (2006)
[13] The Invisible Train, http://studierstube.icg.tu-graz.ac.at/invisible_train/
[14] ARCO Project, http://www.soi.city.ac.uk/~fotisl/AREL/archaeology.htm
[15] MonoGatahari Project, http://www.beatiii.jp/projects.html
[16] Wagner, D., Barakonyi, I.: Augmented Reality Kanji Learning. In: Proceedings of the 2nd IEEE/ACM Symposium on Mixed and Augmented Reality (ISMAR 2003), pp. 335–336 (2003)
[17] Ha, T., Woo, W.: Video see-through HMD based Hand Interface for Augmented Reality, pp. 169–174 (2006)
[18] Khan, L., McLeod, D., Hovy, E.H.: Retrieval effectiveness of an ontology-based model for information selection. VLDB J. 13(1), 71–85 (2004)
[19] Kim, J.-i., Lee, Y.-H., Lee, H.-H.: Development of a mobile language learning assistant system based on smartphone. In: Kim, T.-h., Vasilakos, T., Sakurai, K., Xiao, Y., Zhao, G., Ślęzak, D. (eds.) FGCN 2010. CCIS, vol. 120, pp. 321–329. Springer, Heidelberg (2010)
Inference Engine Design for USN Based Wetland Context Inference So-Young Im, Ryum-Duck Oh, and Yoon-Cheol Hwang* Chung-ju National University, Computer Science and Information Engineering, 72, Daehak-ro, Chungju-si, South Korea [email protected], [email protected], [email protected]
Abstract. With the gradually increasing movement to live together with nature, artificial wetlands are increasing as well, and rivers and streams are changing accordingly, which increases the need for management systems. This study therefore proposes a design based on context inference over USN data and production rules, written in JESS, for the context inference engine of a wetland management system. The rules produced in this paper can decide the grade of eutrophication of a wetland environment and then predict the status of the wetland based on facts collected from sensor networks. Keywords: Wetlands, Eutrophication, Context-Inference, JESS, Sensor Network.
1 Introduction

A wetland is wet ground around rivers, ponds, and swamps; it is a reserve of natural resources where water remains for long periods under natural conditions[1][2]. Ongoing maintenance is therefore required to better manage important wetlands, whose condition needs constant checking. With conventional measurement tools alone, however, it is difficult to know exactly how a wetland changes over time, and the time available to prepare may be insufficient. Accordingly, this study prepares for such changes through the application of context inference techniques for USN (Ubiquitous Sensor Network) and the design of production rules for a wetland control system. The rules produced in this paper can decide the grade of eutrophication of a wetland environment and then predict the status of the wetland. This paper is organized as follows. Chapter 2 discusses the basic knowledge of situation reasoning and rule-based languages needed in designing reasoning rules. Chapter 3 proposes a reasoning engine to predict the state of water quality in the wetland environment. Chapter 4 applies the created rules to wetland environment situation data and evaluates their performance. Chapter 5 summarizes the study results, suggests future research and concludes.
* Corresponding author.
2 Related Works

Context inference is the inference of new facts from already-known facts and knowledge. Context inference methods are largely divided into two kinds: rule-based reasoning and case-based reasoning[3]. Rule-based reasoning operates equipment whenever a sensor condition is met, regardless of place and user. By contrast, case-based reasoning infers from previous cases; compared with rule-based reasoning, it is easy to learn and convenient for adapting to exceptional cases. Rule-based inference is generally applied in expert systems. Rules make it possible to infer further facts based on facts, the data collected from sensors.
Fig. 1. Representation method of rules
The production memory is a set of productions (rules). A production is specified as a set of conditions, collectively called the left-hand side (LHS), and a set of actions, collectively called the right-hand side (RHS). One or more conditions make up the LHS, and the actions on the RHS follow when the conditions are met. Such rules should be set up to take a specific action when a datum collected by a sensor meets a condition or rule that a user has defined, or to predict that a certain situation may occur. This is called rule-based context inference[4]. For example, specified actions are performed when a water-quality factor collected from a sensor has a particular value. The Rete match algorithm is a method for comparing a set of patterns with a set of objects in order to determine all possible matches; it is an efficient algorithm with many possible applications[5]. Jess is an abbreviation of Java Expert System Shell, a rule-based expert system built on the Java language. Jess has a rule-based system syntax similar to LISP, makes it easy to define rules and code, and offers a powerful Java API support environment such as networking, graphics and database connectivity through its integration with Java[6][7][8].
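As an illustration of the match-act cycle described above (a toy sketch, not Jess or the Rete algorithm itself), the following forward-chaining loop fires every rule whose LHS holds over working memory and asserts the facts produced by its RHS until a fixed point is reached:

def forward_chain(working_memory, rules):
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            new_facts = rhs(working_memory) if lhs(working_memory) else set()
            if not new_facts <= working_memory:
                working_memory |= new_facts
                changed = True
    return working_memory

# Hypothetical rules: if sensed DO is low, assert a water-quality warning fact.
rules = [
    (lambda wm: ("DO", 3.7) in wm and "low_DO" not in wm,
     lambda wm: {"low_DO"}),
    (lambda wm: "low_DO" in wm,
     lambda wm: {"quality_warning"}),
]
print(forward_chain({("DO", 3.7)}, rules))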
3 Inference Engine for Wetland Context-Awareness

The main purpose is to design rules for a context inference engine that determines the current status of water quality from collected sensor data, using the JESS language and the rule-based inference method.
3.1 System Model

The system model uses a standard interface for the sensor network and consists of incomplete data processing, stream data processing, and situation awareness and reasoning support modules.
Fig. 2. System Model of Context-aware Inference Engine
1) Interface Manager: a component that provides a standardized interface for USN sensor networks. Each sensor network is a module complying with common message specifications for sensor data acquisition, monitoring and control in internetworking with the sensor network common interface.
2) IDPM (Incomplete Data Processing Mechanism) module: a module that analyzes the patterns of incomplete data arising when power is insufficient or when communication distance increases, determines the relationships between sensor attributes, and discovers and handles imperfections.
3) Metadata Manager: combines packets transmitted from a sensor network with metadata, changes them into complete information, and reduces frequent database access through the application of a specific technique, the Metadata Set.
4) Rule Manager: receives environmental information of the sensor network handled by the Stream Data Processor and matches it in the database against the stored inference rules. If necessary, facts are re-loaded into the WM to go through matching, and a particular action is performed depending on the reasoning results.
5) Application Interface: controls the sensor network and provides an appropriate interface to applications so that the network can be monitored.

3.2 Context Inference Engine

First, an event occurs in the network, and data values from a sensor come in (Input Data). Such raw data values are stored in the Working Memory (WM) set. Users' predefined rules are stored in the database. Through comparison between the sensed values stored in the WM and these rules, facts-level values are determined. These facts-level values are then loaded into the WM and go through a matching process with the rules that finally determine the water quality.
Fig. 3. Inference Engine Structure
This process is performed by the Pattern Matcher of the inference engine. The Pattern Matcher compares the fact values stored in the WM with the rules stored in the database and stores the matched values in the Conflict Set. The Conflict Set computes reliability according to the priority of the matching values, sends the matching result considered most accurate to the execution engine, and a corresponding action is then performed by the Service Module.

3.3 Production Rule of Inference Engine

To predict the status of water quality in the wetland using factors that can be measured by sensors, it is necessary to establish the correlations between those factors and the criteria for determining eutrophication[9][10]. To set up such standards, the eutrophication assessment criteria and relations commonly used in evaluating water quality and measurements from the actual wetland were synthesized and correlated. Once the correlation between water-quality items is made clear and the degree of eutrophication determined from it, one item can be estimated from another[11]. The factors that can be measured in the sensor network are DO, BOD, SS and PH. As eutrophication must be determined from values obtainable from a sensor, the correlations between these facts were established. DO can be used as one criterion for eutrophication, from which eutrophication can be determined independently. The actual data show increased levels of DO as a lake transits from being oligotrophic to being eutrophic. BOD is typically used as an indicator of water quality and is generally inversely proportional to DO; as the state of the lake transits from being oligotrophic to being eutrophic, the level increases.
Table 1. Water quality data measured and recorded on a regular basis in various areas of Lake Chungju

No.  DO   BOD  T-N    T-P    SD
1    3.7  1    3.459  0.168  3.5
2    6.7  0.9  2.679  0.007  4.3
3    7.4  7    1      1      2.77
4    8.7  0.9  2.717  0.011  4.7
5    9.6  0.9  2.643  0.008  4.5
6    7.5  0.7  2.632  0.023  3.5
7    11   0.6  2.57   0.006  3.2
8    5.9  0.8  1.836  0.027  2.7
9    8.6  0.9  2.127  0.019  2.6
10   8.9  0.9  2.538  0.018  2.7
SS measures the amount of suspended solids contained in the lake; it is closely related, and inversely proportional, to SD (transparency), which is a criterion for determining eutrophication. Transparency is closely related to the concentrations of N and P and to the concentration of Chlorophyll-a, which changes with the growth of algae. In a lake containing many suspended matters other than phytoplankton, however, it is not an important indicator for the determination of eutrophication. PH remains nearly constant regardless of water quality, but it is used as a criterion for detecting abnormal conditions when its level is too low or too high. The rules are designed using the rule-based language JESS[6].

Definition of facts. deftemplate is the Jess function used to define unordered facts. Facts defined with deftemplate are independent of slot order because each value is named by a slot. As the facts used here do not depend on order, they are defined as follows:

(deftemplate IF "Input-Facts"
  (slot BOD)
  (slot SS)
  (slot DO)
  (slot PH))

(deftemplate FL "facts-level"
  (slot BOD-level)
  (slot SS-level)
  (slot DO-level)
  (slot PH-level))
The IF template holds the facts collectible from the wetland environment sensor network: the four indicators used as water quality management standards. The facts represented as "facts-level" are needed to design the rules that determine the final water quality. Based on the water quality management standard, DO and BOD each have three standard points and are divided into three stages; SS is divided into two stages.
The design of rules depending on the correlation with DO. Rules are defined in Jess with defrule.

(defrule DO1                              ; DO of 10 mg/L or more -> level 1
  ?d <- (DO ?x&:(>= ?x 10))
  =>
  (Do-level1)
  (retract ?d)
  (assert (DO-level 1)))

(defrule DO2                              ; DO between 7 and 10 mg/L -> level 2
  ?d <- (DO ?x&:(and (< ?x 10) (>= ?x 7)))
  =>
  (Do-level2)
  (retract ?d)
  (assert (DO-level 2)))

(defrule DO3                              ; DO below 7 mg/L -> level 3
  ?d <- (DO ?x&:(< ?x 7))
  =>
  (Do-level3)
  (retract ?d)
  (assert (DO-level 3)))
The water quality standard rules divide DO and BOD into three steps and SS into two steps, based on their relationship with the eutrophication criteria. As DO plays a decisive role in determining eutrophication in the wetland, it is considered with high priority; its level is defined in three steps. Each rule is divided into the LHS, the if-part on the left, and the RHS, the then-part on the right, separated by the "=>" symbol. Taking DO1 as an example, the fact 'DO-level 1' is added to the WM when the value coming from the sensor is 10 mg/L or more. DO-level ranges from 1 to 3: the closer to DO-level 1, the higher the level of DO and, by the correlation with the eutrophication criteria, the less likely eutrophication is. Conversely, level 3 means bad water quality and likely eutrophication. This importance was considered in the design of the rules, and the priority among the rules was set accordingly. The rules below determine the state of water quality from the measured BOD values.

(defrule BOD1                             ; BOD of 1 mg/L or more -> level 1
  ?b <- (BOD ?x&:(>= ?x 1))
  =>
  (Bod-level1)
  (retract ?b)
  (assert (BOD-level 1)))

(defrule BOD2                             ; BOD between 0.7 and 1 mg/L -> level 2
  ?b <- (BOD ?x&:(and (< ?x 1) (>= ?x 0.7)))
  =>
  (Bod-level2)
  (retract ?b)
  (assert (BOD-level 2)))
(defrule BOD3                             ; BOD below 0.7 mg/L -> level 3
  ?b <- (BOD ?x&:(< ?x 0.7))
  =>
  (Bod-level3)
  (retract ?b)
  (assert (BOD-level 3)))
As with DO, BOD consists of three steps, and the level is set according to the range of the BOD value. As seen from the correlation above, BOD is inversely proportional to DO: the closer to BOD-level 1, the higher the BOD value, meaning that the DO level is low and the water quality is not good. Conversely, toward level 3 the BOD level decreases, from which better water quality can be presumed.

Detection of dangerous situations depending on PH.

(defrule PH                               ; PH above 9 or below 6 -> danger
  ?ph <- (PH ?x&:(or (> ?x 9) (< ?x 6)))
  =>
  (Ph-level)
  (retract ?ph))
PH does not affect the determination of eutrophication, but when PH is too high or too low it indicates a dangerous situation, so a warning message is set to be printed when PH is above 9 or below 6. Below is part of the final rule set synthesizing the BOD, SS and DO level facts: if DO-level is 2, BOD-level is less than 2, and SS-level is 2 or less, the grade is determined as D (warning) and a warning message is printed.

(defglobal ?*grade* = A)

(deffunction gradeD()
  (printout t "grade D " crlf))

(defrule GRADED1
  ?gradedo <- (DO-level 2)
  ?gradebod <- (BOD-level ?x&:(< ?x 2))
  ?gradess <- (SS-level ?y&:(<= ?y 2))
  =>
  (gradeD)
  (retract ?gradedo)
  (retract ?gradebod)
  (retract ?gradess)
  (bind ?*grade* D))
The risk of eutrophication is divided into six steps: stable (A), under observation (B), cautious (C), warning (D), dangerous (E) and urgent (F). The criteria for determining the risk are set based on the importance of the rules. DO is an important factor in determining the status of eutrophication, and its priority is set much higher than that of the other factors, such as BOD and SS, in setting the rating. To determine the ratings of eutrophication,
18 rules are defined, from which the status of the wetland can be predicted. The level of eutrophication is determined from the synthesized facts-levels derived from the facts coming from the sensor network in the wetland environment.
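A hedged sketch of how such grading rules can be expressed procedurally is given below; only the GRADED1 case is taken from the text, while the remaining branches and thresholds are illustrative placeholders for the other rules.

def eutrophication_grade(do_level, bod_level, ss_level):
    if do_level == 2 and bod_level < 2 and ss_level <= 2:
        return "D"                            # GRADED1 rule from the paper
    if do_level == 1 and bod_level >= 2:
        return "A"                            # illustrative placeholder
    if do_level == 3:
        return "F" if bod_level == 1 else "E" # illustrative: low DO dominates
    return "C"                                # illustrative default

print(eutrophication_grade(do_level=2, bod_level=1, ss_level=2))  # D (warning)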
4 Experiment Result

The evaluation criteria for eutrophication are divided into (a) evaluation by a single item and (b) evaluation by multiple items; to determine the exact status, it is desirable to use multiple items together. With this method, the general correlations that exist between water-quality items can be estimated in consideration of physical and chemical characteristics, and using two or more items allows a more accurate assessment [12]. Table 2 presents the criteria for determining the degree of eutrophication in a lake. Eutrophication is divided into three major steps: a lake is called oligotrophic when the degree of eutrophication is not severe but relatively mild, mesotrophic when eutrophication has progressed to some extent, and eutrophic when eutrophication has progressed far and needs to be managed.

Table 2. Eutrophication criteria

Criterion      Oligotrophic  Mesotrophic  Eutrophic
T-P (mg/m3)    < 15          15~25        > 25
Chl-a (mg/m3)  < 3           3~7          > 7
SD (m)         > 4           4~2.5        < 2.5
The evaluation items fall into three major parts, among which the item with the biggest impact on assessment is T-P (Total Phosphorus). Chl-a, an abbreviation of chlorophyll-a, indicates the concentration of phytoplankton. SD (Secchi Depth) is the simplest indicator for evaluating eutrophication, being inversely proportional to the concentration of phytoplankton; its disadvantage is lower accuracy when many suspended matters other than phytoplankton are present.

Table 3. Ten randomly selected records from the water quality data of Lake Chungju

No.  PH   DO   BOD  SS
1    7.6  3.7  1    3
2    8.1  5.8  1.1  2
3    8    6.3  1.1  1
4    7.4  7    1    1
5    8    6.7  0.9  1
6    7.1  8.7  0.9  0.3
7    7.4  9.6  0.9  0.5
8    7.9  9.9  0.7  0.7
9    7    7.8  1    1.3
10   8.5  11   0.6  2
For the design and testing of the rules that determine the status of eutrophication in the wetland, data measured in various areas of Lake Chungju were used. An experiment was conducted to verify that the designed rules operate correctly on the 10 input data facts. To check whether the degree of eutrophication is defined correctly, the degree of eutrophication was also determined through the eutrophication criteria (T-P, T-N, SD, Chl-a) on the test data and then compared.
Fig. 4. The result obtained after entering one test datum into Jess. The experimental datum has the values DO 7.8, SS 1.3, BOD 1 and PH 7. The results produced by the rules are BOD-level 1, SS-level 2 and DO-level 2; Grade D (warning) is set and the warning message is printed.
The degree of eutrophication is determined from the results obtained through this reasoning process and from the actual data, and the two are then compared.

Table 4. Comparison between the test result and the criteria for the actual degree of eutrophication

No.  PH   DO   BOD  SS   Test Result  Eutrophication  Match
1    7.6  3.7  1    3    F            E,F             O
2    8.1  5.8  1.1  2    F            E,F             O
3    8    6.3  1.1  1    E            E,F             O
4    7.4  7    1    1    C            C,D             O
5    8    6.7  0.9  1    E            A,B             X
6    7.1  8.7  0.9  0.3  C            C,D             O
7    7.4  9.6  0.9  0.5  C            A,B             X
8    7.9  9.9  0.7  0.7  B            A,B             O
9    7    7.8  1    1.3  D            C,D             O
10   8.5  11   0.6  2    A            A,B             O
The inference result agrees with the degree of eutrophication in the lake in 80% of the cases. This shows high agreement between the reasoning result using facts measured from the sensor network and the result measured in the actual field.
5 Conclusion

In the past, to determine the ratings of water quality in wetlands such as streams and rivers exactly, a person had to go out to the field regularly to measure, compare and judge. This paper applies the concept of ubiquitous computing to the wetland environment: based on facts collectible from a sensor network, it determines the correlations with eutrophication criteria established from actual field data and designs situation reasoning rules that can determine the ratings of eutrophication. With such rules, the status of water quality in the wetland can be predicted much faster, without manual effort. This paper uses data collected from an actual wetland environment, applies the designed situation reasoning rules for wetland water quality, and demonstrates their accuracy. Based on these rules, further research is needed to improve accuracy through the design of the various situations that may occur in the wetland and of detailed rules applicable to more water-quality data. These rules should be reflected in the design of the situation reasoning engine, which should be developed continuously so that users can obtain more accurate reasoning results.

Acknowledgments. This research was financially supported by the Ministry of Education, Science and Technology (MEST) and the Korea Institute for Advancement of Technology (KIAT) through the Human Resource Training Project for Regional Innovation.
References

1. Water Environmental Kit, http://www.kwater.or.kr/0678301250_43602.pdf, 8010471250_39398.pdf
2. National Institute of Environmental Research, Environmental Kit (1990), http://library.nier.go.kr/SearchResultjsp/
3. Lee, H.-S., Lee, J.-H.: A Study on the conceptual model of the ubiquitous residential environment based on the context-aware inference (2009)
4. Doorenbos, R.B.: Production Matching for Large Learning Systems. Computer Science Department, Carnegie Mellon University (1995)
5. Forgy, C.: Rete: A Fast Algorithm for the Many Pattern/Many Object Pattern Match Problem. Artificial Intelligence 19(1), 17–37 (1982)
6. Jess, the Rule Engine for the Java Platform, http://www.jessrules.com/jess/docs/index.shtml
7. Friedman-Hill, E.J.: Jess, The Java Expert System Shell, December 3 (1998)
8. University of Alberta Press: Over-fertilization of the World's Freshwaters and Estuaries
9. Bartram, J., Carmichael, W.W., Chorus, I., Jones, G., Skulberg, O.M.: Chapter 1. Introduction. In: Toxic Cyanobacteria in Water: A Guide to Their Public Health Consequences, Monitoring and Management. World Health Organization (1999)
10. Horrigan, L., Lawrence, R.S., Walker, P.: How sustainable agriculture can address the environmental and human health harms of industrial agriculture (2002)
11. Park, S.-H.: Report for water quality of an artificial lake (2003)
12. DreamTest, Kinds of sensors, http://www.dreamtest.co.kr/
Metadata Management System for Wetland Environment Context-Aware Jun-Yong Park, Joon-Mo Yang, and Ryum-Duck Oh* Dept. of Computer Science and Information Engineering Chung-Ju National University Chungju-si, South Korea {pjy1418,greatyjm}@gmail.com, [email protected]
Abstract. In recent years, the USN environment has been developed and used in various fields. In these fields, context-aware technology enables a computer to make decisions actively and perform services without explicit user requests, making USN more intelligent. However, the lack of standardized metadata management and of context-aware technology in sensor networks, where many kinds of object resources exist, has delayed the development of context-aware technology. This study therefore proposes a metadata management system that provides context data suitable for context awareness and reasoning after efficiently gathering and managing information on USN environmental resources and processing the gathered information. With the metadata management system proposed in this paper, the load of the USN context-aware system can be reduced and more accurate context awareness can be realized. Keywords: Metadata Management, Context-aware, USN (Ubiquitous Sensor Network).
1 Introduction

In recent years, society has been transforming into a ubiquitous network society in which not only human beings but also the things in their surrounding environment are organically combined and utilized as subjects that generate information. A Ubiquitous Sensor Network (USN) is a technology that attaches various sensors and tags to the natural environment and to things in order to provide information about a region in real time. A representative example applying such USN technologies is the wetland environment context-aware system. Such systems belong to "context-aware computing", a field that aims to make computers smarter by exploiting information useful to the system even when the user has not entered it, one of the technologies supporting intelligent computing as devices become increasingly intelligent. Devices or computers embedded with such situational computing have the ability to provide relevant and useful services by detecting the surrounding situation.
* Corresponding author.
For a computer to provide a service appropriate to the situation, accurate context awareness using the sensor network is essential, and this requires accurate situational information. Such situational information is derived from the sensing information of sensor nodes and from the metadata of the sensor networks. Metadata comprise information on the sensors in a sensor network, location information, and application service information, so the USN middleware should maintain sensor networks and sensor nodes efficiently and provide this information to the context-aware system. Using such meta-information, the context-aware system can easily extract or obtain exactly the information it wants from USN middleware to which many heterogeneous sensor networks are connected at the same time. The metadata management system heavily influences ontology composition and context awareness, since the semantic information of the sensor network is extracted through metadata; if the metadata are not correct, the ontology and the accuracy of context awareness also face many problems. Existing metadata models, however, focus on the representation of sensor network information and show little interest in effective context awareness. This paper proposes and implements a metadata management system that provides not the fragmentary information of existing models but high-level situational information in XML form through the processing of metadata, applied to wetland context awareness. This study largely consists of four parts: (a) the structure of the metadata management system, (b) the types of situational information, (c) the metadata management scenario and system implementation, and (d) conclusions and future directions [1,2,3].
2 Related Research

2.1 SensorML

SensorML (Sensor Model Language) is a technology promoted by the SWE (Sensor Web Enablement) working group of the OGC (Open Geospatial Consortium) to represent features and services in heterogeneous sensor networks. SensorML provides not only data measured by sensors but also commands to obtain specific sensor-related information, in the form of an XML encoding. The figure below shows the schema structure of SensorML. SensorML defines parameters, inputs and outputs, and methods using elements named process, and models Actuators and Detectors through processes. SensorML can thus deliver not only the measured data of sensor networks but also high-level information such as various services. However, while the technology is faithful to its aim, the combination of various sensor data, its definitions of context awareness and metadata are in most cases more complex than a system needs. This paper therefore defines metadata according to the national metadata format standard, in a form more appropriate for internal use [4,5].
Fig. 1. SensorML
2.2 Context-Aware Computing

The various objects present in the real world are always placed in a particular situation, and the dictionary definition of 'situation' encompasses a very wide range of information. Although there are various definitions, a situation is commonly understood as the conditions around users, between the user and the USN environment, or as information about objects. Context awareness aims to detect the changing situation and provide appropriate information and services to users, or to make the system itself change its status. Depending on the form of response, context awareness is classified as active or passive. Active context awareness automatically performs a configured operation when a new situation or a change of situation occurs; passive context awareness instead provides the related information to users or saves it for future searches. The perspective on context awareness may also differ with the degree of user intervention: in the first case the system is configured to keep user intervention to a minimum, while in the second case users' input is regarded as part of the situation and the subject issuing orders for specific actions lies outside the system. At present, mainly passive context-aware systems are used; active context awareness requires the development of reasoning technology, but techniques for high-level situation reasoning are still insufficient. Various techniques are applied for actual context-aware services, for example situation modeling, situation reasoning, inference engines and service management. For these technologies, situational information is collected on the basis of an accurate data model; such information, primarily the one-dimensional identification of objects, place and time, becomes the basic material for reasoning, and delivering it efficiently has been considered the most important part of exact context awareness. This paper therefore proposes a system that gathers and delivers metadata efficiently in the USN environment and provides primary context information for efficient context awareness [6,7,8].
3 Metadata Management System

3.1 Structure of the Metadata Management System

The metadata management system for context awareness proposed in this paper can largely be divided into two parts: (a) server-side metadata management, which manages the static and dynamic metadata of the sensor network, and (b) in-network metadata management, which stores and manages metadata in the network and delivers it to the server-side middleware.

3.1.1 Configuration of Sensor Nodes and Embedded Metadata Management Design

In-network metadata management largely consists of the Task Manager, Embedded Metadata Manager, Interface, Data Collector and Flash Memory. The Embedded Metadata Manager periodically converts the metadata of sensor nodes into wireless message packets so that they can be transmitted to the server through the Interface. As shown in [Fig. 2], the in-network metadata management components are as follows.
Fig. 2. In-Network Metadata Management
Task Manager: the basic module for driving sensor nodes. It manages the process tasks of sensor nodes as well as their operation.
Data Collector: a module that periodically obtains data from the sensors of sensor nodes and stores the data in Flash Memory. It determines the acquisition cycle and the form of the sensing data.
Interface: configures the message packets transmitted from a sensor node to the server, and sets the communication protocol and packet standards so that the node can communicate organically with other nodes.
Flash Memory: stores the sensing data obtained by the Data Collector, the metadata configured by the Embedded Metadata Manager, and other data the sensor node needs.
Embedded Metadata Manager: collects dynamic metadata such as the location of sensor nodes, the services provided, and internal voltage, and on the server's request delivers it from the sensor node through the Interface. This paper adds the Embedded Metadata Manager in order to collect and manage the metadata of sensor nodes accurately.
3.1.2 Server-Side Metadata Management

The server-side metadata management module primarily consists of two modules: one for the storage and management of static metadata and the other for the provision of metadata to the context-aware system. Modules to manage the form of the metadata are also available. Each component operates independently, as in [Fig. 3].
Fig. 3. Server-side Metadata Management
Metadata Structure Editor: modifies the structure of metadata and edits the existing form or creates a new structure. This module represents the structure of sensor network metadata as an intuitive tree so that administrators can easily add and delete elements.
Static Metadata Console: the administrator inputs the metadata of the sensor network and modifies or deletes existing static metadata. Static metadata can also be stored through administrator entry or through collection in the network.
Query Processor: provides the appropriate metadata when data entered from the sensor network are converted or when metadata are requested by an upper-level service. This module answers queries from upper-level services with data in XML form so as to provide the relevant sensor network metadata.
Metadata Storage Manager: stores the metadata provided by the Metadata Collector, Static Metadata Console and Metadata Structure Editor in the server-side relational database. This role is necessary because the metadata structure is XML, but an XML database is inconvenient to use, whereas the existing relational database offers high convenience; a component is therefore required to store metadata effectively in the metadata database.
Metadata Collector: effectively collects the dynamic data of the sensor network and provides metadata. Its main purpose is to issue metadata collection queries to the sensor network periodically so as to obtain the most recent metadata.
XML Generator: when metadata stored in the relational metadata database are requested by an upper-level module, the Context Provider or the Query Processor, it converts them into the XML form defined by the Metadata Structure Editor.
Context Provider: when the sensor network's situational information is requested by the context-aware system, it provides situational information in XML according to the specification, using the XML Generator. The most basic
situational information includes Identity, Location, Status, Time, etc. When the sensor network's situational information is requested by the context-aware system, it is provided immediately.
Metadatabase: stores the standardized sensor data and the metadata specified for the sensor network, using a relational database.

The structure of server-side metadata management, as in Fig. 3, has three layers in total. The first layer collects and edits static and dynamic metadata; the second layer stores metadata in the database and, on request from the upper layer, provides the data in XML for processing. Finally, when a query is raised by the context-aware system or an upper-level user, the third layer both conducts the query and delivers the response of the lower system.
3.3 Wetland Environment Situational Information

In the wetland environment, various environmental circumstances such as eutrophication, algae alerts and algal blooms are present in each situation. To recognize these environmental situations precisely, basic situational information is required; in general, situational information is composed from metadata and sensing data in the context-aware system, and from it the system proceeds to upper-level reasoning about, for example, eutrophication or red tide. In this paper, the complexity of the context-aware system is reduced, and specific information on the wetland environment is used, by having the metadata management system provide the basic configuration of primary situational information. One-dimensional situational information consists of location-based information, individual identification information, the status information of sensor nodes, and local state information. The system uses these four major kinds of situational information, configures them, and provides them to the upper-level context-aware system for fast search and reasoning. In this system, the basic format of the metadata is organized according to national standards, and the situational information is provided additionally.

Table 1. Situational information

Situational information  Description
ID                       Object identifier
Location                 Object's location information
Status                   Object's sensing values and object information
Time                     Time at which the situational information was provided
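As an illustration, the four fields of Table 1 could be assembled into XML roughly as follows; the tag names and sample values are assumptions, since the actual schema is defined by the system's Metadata Structure Editor.

import xml.etree.ElementTree as ET

def situational_info_xml(obj_id, location, status, time):
    # Assemble the four fields of Table 1 into one XML document.
    root = ET.Element("SituationalInformation")
    for tag, value in [("ID", obj_id), ("Location", location),
                       ("Status", status), ("Time", time)]:
        ET.SubElement(root, tag).text = str(value)
    return ET.tostring(root, encoding="unicode")

print(situational_info_xml("node-17", "36.97N 127.95E", "DO=7.8,PH=7.0",
                           "2011-04-13T10:30:00"))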
3.4 Context Awareness through Metadata Management

The flow of data and commands, from metadata collection to query requests by the upper-level context-aware system, is shown in [Fig. 4]. Three main commands are carried out by the system: first, the command for periodic metadata collection; second, the response to queries for basic metadata; and finally, the query for situational information.
Fig. 4. Metadata Management System Flow Chart
Metadata Acquisition Flow. To collect the metadata of sensor nodes in the network, the metadata management system uses the Metadata Collector to issue periodic in-network queries. Each queried sensor node transmits its metadata to the server's metadata management system. The administrator defines the structure of the collected metadata for the XML Generator, and each piece of metadata is stored in the database through the Metadata Storage Manager.

Metadata Query Flow. A basic metadata query is the processing of a query for individual metadata values or information that the context-aware system basically requires. The metadata query processor answers queries from the context-aware system with the requested information in XML, using the XML Generator.

Situational Information Query Flow. For a situation query request, situational information for each object is requested by the context-aware system. Objects in the USN environment exist in various units, such as sensor nodes or sensor networks. The situational information consists of four kinds of information and is provided in XML. First, the context-aware system issues a situational information request to the situational query processor. The situational query processor requests the current status in XML from the XML Generator. The XML Generator assembles the situational information from the metadata database and the sensor database and sends it to the Context Provider, which checks the requesting context-aware system and provides the situational information.
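A toy sketch of the acquisition flow is given below; the class and method names are invented for illustration, and the real system exchanges TinyOS radio messages rather than local calls.

import time

class FakeNode:
    # Stand-in for a sensor node answering a metadata collection query.
    def __init__(self, node_id):
        self.node_id = node_id
    def request_metadata(self):
        return {"id": self.node_id, "voltage": 2.9, "location": "lake-east"}

class MetadataCollector:
    def __init__(self, nodes, store, period_s=60):
        self.nodes, self.store, self.period_s = nodes, store, period_s

    def collect_once(self):
        for node in self.nodes:
            reply = node.request_metadata()   # radio round-trip in reality
            self.store[reply["id"]] = reply   # Metadata Storage Manager role

    def run(self, cycles):
        for _ in range(cycles):
            self.collect_once()
            time.sleep(self.period_s)

store = {}
MetadataCollector([FakeNode("n1"), FakeNode("n2")], store, period_s=0).run(cycles=1)
print(store)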
4 Implementation and Application

To implement the metadata management system for wetland environment context awareness proposed in this paper, TinyOS-based Telos platform motes were used as sensor nodes. The software development platform was implemented using JDK 1.6 and MySQL. In the metadata management system for wetland context awareness, a Metadata Structure Editor was implemented to edit the structure of the metadata efficiently, as
shown in [Fig. 5]. It also supports a situational information editor, as in [Fig. 6], to add data or parameters to the situational information as well as to change the basic structure of the metadata.
Fig. 5. Metadata Structure Editor (Metadata)
Fig. 6. Metadata Structure Editor (Context)
To verify the correct operation of the metadata management system implemented in this paper, data for Chungju Lake were used as the basic data needed for context awareness. The operational process is as follows. First, the sensor nodes installed in Chungju Lake perform sensing periodically and transmit the sensing data to the server. The server periodically sends messages requesting metadata, and a sensor node that receives such a message sends the dynamic metadata it holds to the server by radio. The metadata sent are stored in the relational database by the metadata management system. When the stored metadata are requested by the wetland context-aware system, the metadata query processor first analyzes the type of identifier and the type of metadata requested by the context-aware system and extracts the data from the metadata database using the analyzed information. The extracted information is converted and stored in XML form, using the XML Generator, so that it can be used efficiently on the web
or by the context-aware system. The converted metadata information is then sent to the web or the context-aware system. The metadata management system also provides one-dimensional information as situational information to help the context-aware system recognize the situation immediately. The one-dimensional situational information carries the object identifier, location and status values, and the most recent time values, provided in predefined XML form. Providing situational information in this way reduces unnecessary operations in the context-aware system, which can then focus on upper-level ontology and rule-based reasoning. [Fig. 7] shows the GUI supporting the Static Metadata Console, through which metadata statically typed in the USN environment can be edited or added. This module searches input data using the ID of each identifier; when a new identifier is entered, the New Identifier button is used to define the identifier type and then generate it.
Fig. 7. Static Metadata Console
[Fig. 8] shows the metadata provided in XML form when metadata are requested by the context-aware system. This file carries a variety of information for each node or sensor network and sends compacted information to the context-aware system.
Fig. 8. Results XML
5 Conclusion

This study implemented a metadata management system applying standardized metadata in order to reduce the inefficient use of sensor network resources caused by the non-standardized interfaces and opaque identifiers of heterogeneous sensor networks in the USN environment. For efficient data processing by the context-aware system, the metadata management system provides basic situational information, allowing context awareness to be processed much faster, in real time. In the future, further studies are needed to increase user convenience in the existing metadata system and to refine the situational information. For efficient collection of metadata, which is the aim of the metadata management system, further studies should also address metadata collection techniques in the sensor network.

Acknowledgments. This research was financially supported by the Ministry of Education, Science and Technology (MEST) and the Korea Institute for Advancement of Technology (KIAT) through the Human Resource Training Project for Regional Innovation.
References

1. Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: A survey on sensor networks. Computer Networks (2002)
2. Chen, G., Kotz, D.: A Survey of Context-Aware Mobile Computing Research. Dartmouth Computer Science Technical Report TR2000-38 (2000)
3. Kawashima, H., Hirota, Y., Satake, S., Imai, M.: Met: A Real World Oriented Metadata Management System for Semantic Sensor Networks. In: 3rd International Workshop on Data Management for Sensor Networks (DMSN), pp. 13–18 (2006)
4. OGC: SensorML (Sensor Model Language) Implementation Specification v1.0 (2007)
5. Robin, A., Havens, S., Cox, S., Ricker, J., Lake, R., Niedzwiadek, H.: OpenGIS Sensor Model Language (SensorML) implementation specification. Technical report, Open Geospatial Consortium Inc. (2006)
6. TTAK.KO-06.0168/R1 (USN Metadata Specification) (2009)
7. Dey, A.K., et al.: A Conceptual Framework and a Toolkit for Supporting the Rapid Prototyping of Context-Aware Applications. Anchor article of a special issue on Context-Aware Computing. Human-Computer Interaction (HCI) Journal 16 (2001)
8. Schilit, B.N., Adams, N., Want, R.: Context-aware computing applications. In: Proceedings of the Workshop on Mobile Computing Systems and Applications, pp. 85–90 (1994)
9. Dey, A.K.: Supporting the Construction of Context-Aware Applications. Dagstuhl Seminar on Ubiquitous Computing (2001)
Toward Open World and Multimodal Situation Models for Sensor-Aware Web Platforms

Yoshitaka Sakurai2, Paolo Ceravolo1, Ernesto Damiani1, and Setsuo Tsuruta2

1 Dipartimento di Tecnologie dell'Informazione, Università degli Studi di Milano, Via Bramante 65, 26013 Crema (CR), Italia
[email protected]
2 School of Information Environment, Tokyo Denki University, Inzai (Chiba), Japan
[email protected]
Abstract. Context-aware computing is a well-established approach for customizing event-based reasoning to specific business or other operational situations. In this position paper we present an approach to enrich the expressive power of context-based systems by including modalities, supporting hybrid reasoning. Also, our approach supports trust-based intelligent generation of sensor events.
1 Introduction
Context-Aware Computing is a computing paradigm where applications have the ability to interact with the user, learn from the user's past behavior and actions and, autonomously, adapt to the current situational context. One of the emerging requirements for context-aware applications is the ability to perceive and comprehend their environment, thanks to the acquisition of multiple flows of sensor data. Early multimodal systems were based on joint recognition of active modes, such as speech and handwriting, for which there is now a large body of research work [1]. Context-based systems are based on statically defined operational contexts; at each time, the system behavior is determined by the context which is active at that moment. Modeling these contexts is a time-honored problem; different types of logic-based models have been proposed throughout the years [2]. Today, a new notion of context is emerging, where context changes are triggered by events generated on the basis of sensor data, like location, time and communication involving people other than the user. The easy availability of sensor information via the Web is making context-aware computing applicable to model the way service-oriented applications must operate. Complex Web-based services can now receive a huge amount of event notifications based on sensor readings. Based on this information, Web service composition and operation must be continuously adapted to the situation at hand; but event reliability cannot always be accurately determined due to several kinds of imperfections (e.g., missing information, imprecision, uncertainty, unreliability of the source,
etc.). In this position paper we outline our ongoing research program for improving situation modeling in context-aware systems by using a syntax inspired by the OMG SBVR standard, supporting modalities, hybrid reasoning and intelligent sensor fusion.
2 Research Goals and Requirements
Much of the seminal work on context-aware computing relied on information made available to context-aware systems by applications residing on the same host, e.g., user profile, host location, time of day, etc. Later, a new notion of context came about, including information made available via the global network, such as emails, chat messages and the like. Today, context selection is also affected by events generated by sensors and expressing concepts like location, distance, movement, identity, and others. In this new setting, the active context plus sensor-generated events define situations where decisions must be taken in a timely manner. Often, events have an associated uncertainty model, expressing approximation in the corresponding sensors' readings or, more frequently, in their interpretation. Several ontology-based approaches have been developed [2] for supporting this new generation of context-aware systems, e.g., by describing places, moments in time and their associated properties. Here, we argue that in this new setting ontology-based modeling and Prolog-style logic programming are not enough: a new hybrid declarative technique is needed for modeling situation awareness, identifying situation changes and triggering the appropriate context-based reasoning. Our research objective is then twofold: our first goal is to provide a model where each situation corresponds to a set of rules, generated by the human designer (henceforth modeler) via a (possibly visual) Rule Editor. Our second goal is providing a declarative format for rules suitable for powerful reasoning, satisfying the following three basic requirements [2]:
– R1. High expressive power. A high expressive power is needed in order to provide most of the description capabilities of natural language. Business users are accustomed to using natural language syntax and would not accept any too severe limitation of the modeling language.
– R2. Low complexity and completeness. The underlying expressive power must be carefully balanced against the tractability and complexity of rule-based inference. Also, completeness must be ensured to guarantee that no potentially valid inferences are missed.
– R3. Non-falsifiability. Situation selection should rely whenever possible on monotonic reasoning, i.e., on inference whose conclusions cannot be contradicted by simply adding new knowledge, in order to guarantee efficiency and avoid reasoning loops.
The above three requirements have often been emphasized by business analysts as guidelines toward providing a correct interpretation of business models, but they are only partially satisfied by current approaches [2].
3 Our Running Example
Let us consider the case of a Web-based car rental service as our example throughout this position paper. The car rental service receives information from customers, e.g., via a Web interface suitable for mobile terminals, as well as from sensors mounted at parking lots and on rental cars. It is easy to see that the Web procedures for interacting with the car rental company service to get help in the case of a serious car accident must be substantially different (hopefully, much leaner and faster) from the ones needed to extend the rental. Still, estimating a car accident's impact just by looking at untrusted information sources (e.g., an MMS message with a photo coming from a third party) will involve a certain amount of uncertainty.
4 Research Approach
In our research approach, different business situations will correspond to different sets of business rules. A default situation (e.g., called Normal) will be active initially, and a specific set of meta-rules, called transition rules, will specify under what conditions another situation (e.g., Emergency) should be activated. As our starting point for the rule syntax, we shall consider Business Rules (BRs). Business rules have often been used in the literature [3] as a starting point to develop business processes or as a way to validate them. Generally speaking, BRs define relevant domain entities, describe their properties/attributes and express the constraints on such properties that must hold in a given business situation. To make this concrete, let us consider a sample BR in natural language for our car rental service example. For someone to rent a car, U.S. law prescribes that she must show a driving license valid for at least 3 months. So, in the U.S. car rental business domain, a "driver" is someone whose license expiry date must occur at least 3 months after the rental date. Note that in a different situation (e.g., if the designated driver cannot drive the car due to an emergency) the (emergency) driver is just someone holding a valid driving license, regardless of its expiry date. The international standards organization Object Management Group (OMG) has long since released its standard Specification on Semantics of Business Vocabulary and Business Rules (SBVR) (http://www.omg.org), inserted in its wider Model Driven Architecture (MDA) vision. The purpose of SBVR is to describe the semantics of a business model formally and without ambiguities. The reason why we have chosen SBVR as a rule format suitable for our situational computing is twofold: 1. SBVR can be written as a Natural Formal Language, i.e., a formally specified subset of English, Italian or Japanese; the declarative format of SBVR rules lends itself to easy translation into First-Order Logic (FOL). For example, the rule expressed in natural language as follows: "For class C car rentals, if the rental pick-up branch does not coincide with the drop-off one, then the rental
duration cannot be less than two days" becomes, in parsed NFL, "For each rental X, if X.class = C and X.pickup ≠ X.dropoff, then days(X.duration) ≥ 2", which can be readily translated into a FOL formula like:

∀x ∈ R : (x.class = C) ∧ (x.pickup ≠ x.dropoff) → days(x.duration) ≥ 2    (1)

(An executable sketch of this check is given after this list.)
2. SBVR is an international OMG standard accepted worldwide. OMG members include many major U.S., European and Japanese companies and research institutions.
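As an illustration of how directly such parsed rules can be operationalized, the following sketch evaluates rule (1) over rental records. The field names mirror the parsed-NFL attributes and are hypothetical; this is not part of an actual SBVR toolchain.

# A sketch of evaluating rule (1) over rental records. Field names are
# hypothetical, mirroring the parsed-NFL attributes of the rule.
from dataclasses import dataclass

@dataclass
class Rental:
    rental_class: str
    pickup: str
    dropoff: str
    duration_days: int

def satisfies_rule_1(r: Rental) -> bool:
    # If the antecedent does not apply, the implication holds vacuously.
    if r.rental_class == "C" and r.pickup != r.dropoff:
        return r.duration_days >= 2
    return True

assert satisfies_rule_1(Rental("C", "Milano", "Crema", 3))
assert not satisfies_rule_1(Rental("C", "Milano", "Crema", 1))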
4.1 First Research Contribution: Adding Modalities
This contribution will address requirement R1 about expressive power, adding to our language an important aspect of BRs: their modality [4]. In modal logics, modals qualify the truth of a judgment. For example, if it is true that "Yuki is happy", we might qualify this statement by saying that "Yuki is very happy," in which case the term "very" would be a modality. Traditionally, there are three "modes" or "moods" or "modalities" represented by modal logic, namely possibility, probability, and necessity. In our case, SBVR business rules can have three modalities:
– Definition (also called alethic modality: a rule which is true by definition or design). In terms of our example: The sales tax rate for a rental is the sales tax rate at the pick-up branch of the rental on the drop-off date of the rental.
– Business Rule (also called deontic modality: a rule which is true by obligation or prohibition). In terms of our example: It is required that the drop-off date of a rental precedes the expiration date on the driver's license of the customer reserving the rental.
– Authorization (a rule which is true by special permission). Example: A customer reserving a rental may exceptionally provide a credit card in the name of another person if the credit card is charged on a corporate account authorized for use by the customer.
Currently, much research work is being done internationally to provide a complete logic-based model for BRs including these three modalities and supporting the partial knowledge environments typical of Web applications. We plan to address these modalities in a straightforward way by mapping the BR deontic modalities Required and Prohibited into three classic modal operators: obligatory (O), permitted (P) (used when no modality is specified in the rule) and forbidden (F). In principle, such modalities can be dealt with by adding customized meta-rules to enable reasoning. Modal rule evaluation can thus be carried out in the framework of inference engines like MOLOG [11], a general inference machine for building a large class of meta-interpreters. MOLOG handles Horn clauses (with conjunctions ∧ and implications →) qualified by modal operators. The user can define his own modal operators, defining a multi-modal language. In terms of our running example, here is a multi-modal rule in NFL: A customer reserving a rental must provide a driving licence and a credit card in his name.
The customer may exceptionally provide a credit card in the name of another person if the credit card is charged on a corporate account authorized for use by the customer. Using a multimodal engine like MOLOG, classical resolution is extended with modal resolution rules that define the operations that can be performed on the modal operators. The MOLOG user can either use an already existing set of resolution rules, or create his own set of rules. We also plan to explore an interesting alternative to modal reasoning: imposing restrictions on the language's expressive power to "factor out" modalities at evaluation time. The simplest restriction one can think of is imposing that conditions in the antecedent of the same rule have uniform modality. This way, modal operators can be factored out, obtaining chunks of rules of uniform modality, and reasoning can be performed by using a conventional reasoner multiple times, incrementally taking into account chunks of different modality. To clarify this notion, let us consider two rules in terms of our example:
– Rule A, modality (O): A customer reserving a rental must provide a driving license and a credit card in his name.
– Rule B, modality (P): A customer may provide a credit card in the name of another person if the credit card is charged on a corporate account authorized for use by the customer.
These rules satisfy our restriction, i.e., their antecedents contain conditions of uniform modality (respectively, (O) for rule A and (P) for rule B). Reasoning based on (O) rules only, we consider only rule A, which will lead to denying the rental in the event of a reservation request coming with the driver's driving license and a corporate card in the name of another person. Repeating the reasoning taking all rules into account, we will consider both rule A and rule B, leading us to grant the reservation (see the sketch below). It is possible to model a situation like this as a policy evaluation conflict. Our research will provide a formalization of this notion and provide a policy-conflict handling model [12].
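The following minimal sketch illustrates the two-pass, uniform-modality evaluation on rules A and B. The encoding is our own simplification, not MOLOG syntax.

# A sketch of uniform-modality chunked evaluation for rules A (O) and B (P).
# Rules are plain predicates over a reservation request.
def rule_a(req):   # (O): licence and credit card must be in the customer's name
    return req["has_licence"] and req["card_holder"] == req["customer"]

def rule_b(req):   # (P): another person's corporate card may be accepted
    return req["card_is_corporate"] and req["corporate_use_authorized"]

def grant(req, obligations, permissions=()):
    # First pass uses the (O) chunk alone; repeating the evaluation with the
    # (P) chunk added lets a permission override an obligation violation,
    # surfacing the policy-evaluation conflict discussed in the text.
    return all(o(req) for o in obligations) or any(p(req) for p in permissions)

req = {"has_licence": True, "customer": "Yuki", "card_holder": "Acme Corp.",
       "card_is_corporate": True, "corporate_use_authorized": True}

print(grant(req, [rule_a]))            # (O) chunk only: False (deny)
print(grant(req, [rule_a], [rule_b]))  # (O) + (P) chunks: True (grant)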
4.2 Second Research Contribution: Dealing with Partial Knowledge
This contribution will address requirement R2 about improving reasoning efficiency and ensuring completeness. This requirement is twofold, as it addresses the two basic features of knowledge creation and sharing on the open Web we mentioned at the beginning, i.e., (a) missing knowledge (on the Web, no agent performing an inference can be sure of holding all relevant information) and (b) uncertainty (sensors have diverse reliability). Let us focus first on feature (a), which is very important for some classes of Web applications. We do not want a service to rely on available information considering it complete while in fact it is not. By avoiding inferences based on incomplete knowledge, we will also improve system efficiency. Inferences based on incomplete knowledge often happen when the information sources used by the service are not trusted company databases, but the results of Web queries to foreign sites, which may not be up-to-date or may simply be unavailable. Also, this effect is amplified by modalities. In terms of our car rental example,
suppose that the law prescribes that known offenders of traffic laws and regulations who were heavily fined for speeding in the last two years cannot rent a car. This will become a rule with obligatory (O) modality in our system. We propose that the business modeler can decide whether he wants this rule processed under the Closed World (CW) or the Open World (OW) assumption [5]. Generally speaking, the OW assumption is introduced to prevent deriving negation from lack of knowledge. In turn, CW means that, if a person's name is not in the available list of offenders, that person will be considered clean and the rental granted, regardless of how the list was put together. Under the OW assumption, instead, a person will be considered clean only if the list of offenders is provenly complete; incomplete lists will lead to rejecting the rental request until all data has been gathered. Note that choosing CW can have consequences for the rental company's liability to damages claims (if the rented car has an accident, the insurance may not pay damages if the driver is found to be a known offender, and the car rental company will remain liable for them). When the modeler chooses to annotate a rule as OW, we will represent it using Description Logics (DL), which support the Open World assumption and guarantee monotonic reasoning. DL is based on a subset of FOL providing i) a declarative formalism for the representation and expression of knowledge and ii) sound, efficient reasoning methods. While using DL ensures tractability and completeness, and can improve efficiency, we will not use DL in all situations. Indeed, DL is not expressive enough to model all the possible semantics of business vocabularies. For this reason we are not entirely replacing ordinary IF-THEN rules with DL: when needed, some rules will be modeled using extended IF-THEN clauses with negation (Horn rules) under the CW assumption. So, our situation models will be hybrid in nature, including both OW and CW rules. Our work will contribute to identifying which aspects of business vocabularies and rules can be effectively modeled with DL, and which ones require closed-world theorem provers or native Java code to enforce general FOL constraints. There are basically two ways to deploy our hybrid model [5]: (1) partitioning the rules and using multiple reasoners in parallel, and (2) using a single reasoner and adding meta-rules to handle the different reasoning types. The first technique is of course preferable when all the rules of each context or situation follow either the CW or the OW assumption: this way, when a context is active, its corresponding reasoning assumption (CW/OW) will be selected. When individual rules of the same context need to be evaluated differently, the generation of meta-rules is a more practical solution. Our research will provide an algorithm for meta-rule generation based on the CW/OW reasoning specified by the business modeler for each BR.
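To make the CW/OW distinction concrete, here is a minimal sketch of the offender-list check in Python. The encoding and names are ours, not a DL or rule-engine implementation.

# A sketch of the offender-list check under CW vs. OW. Under CW, absence from
# the list means "clean"; under OW, absence is only conclusive if the list is
# provenly complete, otherwise the decision is deferred.
def may_rent(name, offenders, list_is_complete, open_world):
    if name in offenders:
        return False                    # known offender: deny in either mode
    if open_world and not list_is_complete:
        return None                     # unknown: defer, do not infer "clean"
    return True                         # CW, or OW with a complete list

offenders = {"J. Doe"}
print(may_rent("A. Smith", offenders, list_is_complete=False, open_world=False))  # True
print(may_rent("A. Smith", offenders, list_is_complete=False, open_world=True))   # None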
4.3 Third Research Contribution: Trustworthiness of Events Based on Sensor Readings
This contribution will address, to the maximum extent, our requirement R3 on non-falsifiability. We will associate a level of trustworthiness (called Trust Value or
TV) to each event flow. The top level (TV = 1) will be the event of a signed message from the user; the TV will decrease to lower values when events are generated as inferences from sensor readings [6]; for instance, the event "user is scared" has a low TV when it is based only on pulse and blood pressure readings. TVs are assigned by the business modeler, but can be dynamically adjusted based on user feedback, for instance via an Implicit Voting Mechanism [7]. This will make it possible to address situations where users have different degrees of understanding and attitude [9], [10]. In our research, TVs will be used by a Situation Assessment module (SA) that will provide perceptual identification [8] of the sensor evidence-to-event mapping. The SA will be based on an accretion/resolution approach. The information received from sensors in the system will be stored as evidence without interpretation - a step called accretion. Then, when predicates based on events need to be evaluated, the stored evidence is examined and resolved to generate such events. In some cases, interpretation does not involve any uncertainty. For instance, when a rental car is returned, the time and location of the return are stored as evidence. When evaluating rule (1), such timestamped evidence can be used to evaluate (crisply, with no uncertainty involved) a condition like (X.pickup = X.dropoff). The advantage of this approach is that it saves the system from having to put effort into interpreting sensor data as it enters the system. In other cases, evidence interpretation will involve uncertainty; in other words, our evidence-to-events mapping will involve a probability. For instance, if a video sensor is mounted on the rental car, the driver's face can be easily grabbed and stored as evidence; but the event of the driver being the person designated in the rental contract (rather than, say, his twin brother) will only be known with a degree of uncertainty. In this case, we will use Bezier curves to estimate interpretation probabilities. The Bezier curves technique is well attested and widely used in computer security. In short, it consists in asking human experts at which stake (for instance 1:10) they would be prepared to bet a small but not irrelevant amount of money (say, 100 Euro) on a given outcome, knowing the initial situation of the system. Their answer is then taken as a subjective estimate of the probability of that outcome. In the case of our example, we would ask a medical doctor at which stake he would be prepared to bet 100 Euro that an adult person with a heart rate of 110 is scared, and we would assign the answer as the probability. This procedure can be repeated for 2 or more points, to be later interpolated using low-degree Bezier curves. Such curves are then used to convert events into the probability of the system being in a given state. Such methods are well attested in our previous research and, we argue, can support both evidence interpretation and context identification. In the latter case, our SA will guarantee that, given a 90% probability threshold for context identification, the system will identify a context (e.g., the car accident one) only if the available information sources provide values that correspond to 90% probability via the current Bezier mapping. Note that our TVs will also guarantee a weak version
of the non-falsifiability property, and contribute to satisfying requirement R3; namely, no backtracking can be caused by a lower-probability event coming after a higher-probability one.
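As an illustration of the elicitation-and-interpolation step, the following sketch maps a sensor reading to a subjective probability with a quadratic Bezier curve. The control points are made-up values for illustration, not figures from our experiments.

# A sketch of mapping a sensor reading (e.g., heart rate) to a subjective
# probability via a quadratic Bezier curve through expert-elicited points.
def bezier(p0, p1, p2, t):
    # Quadratic Bezier point for t in [0, 1]; points are (reading, prob).
    x = (1 - t) ** 2 * p0[0] + 2 * (1 - t) * t * p1[0] + t ** 2 * p2[0]
    y = (1 - t) ** 2 * p0[1] + 2 * (1 - t) * t * p1[1] + t ** 2 * p2[1]
    return x, y

def prob_at(reading, p0, p1, p2, samples=1000):
    # Approximate P(event | reading) by sampling the curve and taking the
    # point whose reading coordinate is closest to the observed value.
    pts = [bezier(p0, p1, p2, i / samples) for i in range(samples + 1)]
    return min(pts, key=lambda p: abs(p[0] - reading))[1]

# Hypothetical expert bets: ~0.05 at 70 bpm, ~0.6 at 110 bpm, ~0.95 at 150 bpm.
print(prob_at(110, (70, 0.05), (110, 0.6), (150, 0.95)))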
4.4 Perspectives and Validation
Tackling the issues introduced above, it is expected that our research will provide a novel procedure for writing and processing context-aware systems. Namely, at deployment time and periodically, our system will:
1. Factor out modalities.
2. Express as much of each model as possible using a monotonic logic supporting the Open World Assumption, like Description Logic (DL).
3. Express the rest as a conventional IF-THEN rule system (e.g., using Horn rules).
At run time, our system will:
1. Identify the situation based on sensors at the desired level of uncertainty.
2. Carry out context-specific reasoning using (a) context-based systems with native Java code implementing the rules or (b) a monotonic DL reasoner and a Prolog engine in an integrated fashion.
The body of descriptions and examples of EU-Rent, as well as other available models like EU-Stay and EU-Travel, are fully-fledged SBVR models that may be freely used, provided their source is clearly acknowledged. So the EU set of SBVR models is now a reference and a common benchmark for the business rules research community. This gives us an opportunity for validating our approach using these industrial-strength, internationally acknowledged models.
5 Conclusions and Future Work
In this position paper we outlined a research program for improving situation modeling in context-aware systems by using a highly expressive syntax inspired by the SBVR standard, supporting modalities, hybrid reasoning and intelligent sensor fusion. This work is being carried out in the framework of joint research between Prof. Tsuruta's Lab at SEI, Tokyo Denki University, Japan and the Sesar Lab at DTI, Università Statale di Milano, Italy.
Acknowledgements. The authors acknowledge the support of KAKENHI (21700244), the Research Center for Advanced Technologies of Tokyo Denki University, and the Japan Society for the Promotion of Science.
References
1. Colombo, A., Damiani, E., Gianini, G.: Discovering the software process by means of stochastic workflow analysis. Journal of Systems Architecture 52(11), 684–692 (2006)
2. Khosla, R., Damiani, E., Grosky, W.I.: Human-centered E-business. Kluwer, Dordrecht (2003)
3. Damiani, E., Ceravolo, P., Fugazza, C., Reed, K.: Representing and Validating Digital Business Processes. In: Proc. of the 3rd Intl. Conference on Web Information Systems and Technologies (WEBIST 2007). LNBIP, vol. 8. Springer, Heidelberg (2007)
4. Damiani, E., De Capitani, S., Fugazza, C., Samarati, P.: Modality conflicts in semantics aware access control. In: Proceedings of the 6th International Conference on Web Engineering (ICWE). ACM, New York (2006)
5. Ceravolo, P., Damiani, E., Fugazza, C.: Trustworthiness-related Uncertainty of Semantic Web-style Metadata: A Possibilistic Approach. In: Proceedings of the Third ISWC Workshop on Uncertainty Reasoning for the Semantic Web, Busan, Korea (2008)
6. Ceravolo, P., Damiani, E., Viviani, M.: Bottom-Up Extraction and Trust-Based Refinement of Ontology Metadata. IEEE Transactions on Knowledge and Data Engineering, TKDE 23 (2007)
7. Damiani, E., De Capitani Di Vimercati, S., Paraboschi, S., Samarati, P.: Managing and sharing servants' reputations in P2P systems. IEEE Transactions on Knowledge and Data Engineering, TKDE 10 (2003)
8. Ceravolo, P., Damiani, E., Leida, M.: A framework for generic uncertainty management. In: Proceedings of IFSA-EUSFLAT, Lisbon, Portugal (2009)
9. Sakurai, Y., Takada, K., Hashida, S., Tsuruta, S.: Augmented Cyberspace Exploiting Real-time Biological Sensor Fusion. In: FLAIRS Conference (2009)
10. Sakurai, Y., Kinshuk, Graf, S., Zarypolla, A., Takada, K., Tsuruta, S.: Enriching Web Based Computer Supported Collaborative Learning Systems by Considering Misunderstandings among Learners during Interactions. In: ICALT 2009, pp. 306–310 (2009)
11. Fariñas del Cerro, L., Herzig, A.: Metaprogramming through intensional deduction: some examples. In: Pettorossi, A. (ed.) META 1992. LNCS, vol. 649. Springer, Heidelberg (1992)
12. Dursun, T.: A generic policy-conflict handling model. In: Yolum, P., Güngör, T., Gürgen, F., Özturan, C. (eds.) ISCIS 2005. LNCS, vol. 3733, pp. 193–204. Springer, Heidelberg (2005)
Design of Intelligence Mobile Cloud Service Platform Based Context-Aware

Hyokyung Chang and Euiin Choi*

Dept. of Computer Engineering, Hannam University, Daejeon, Korea
[email protected], [email protected]

* Corresponding author.
Abstract. Mobile Cloud Computing has become a new IT paradigm because of the growth of mobile devices like smartphones and the appearance of the Cloud Computing environment. This Mobile Cloud environment provides various services and IT resources according to users' requests, so effective management of services and IT resources is required. Hence, this paper designs a context-aware knowledge Mobile Cloud service platform in order to provide distributed IT resources and services to users based on context-awareness information. As the context-aware knowledge Mobile Cloud service platform proposed in this paper uses context-aware information, it can provide more accurate personalized services and manage distributed IT resources. Keywords: Cloud computing, Context-aware, Provisioning, Intelligence Service, Mobile cloud computing.
1 Introduction The mobile market has recently been evolving rapidly, and Cloud Computing is spreading into mobile as well. That is why Mobile Cloud Computing is becoming a new issue today. Cloud Computing is computing that provides virtualized IT resources as a service by using Internet technology. In Cloud Computing, a user leases IT resources (software, storage, servers, network) as needed, uses them, gets real-time scalability support according to service load, and pays as he/she goes. In particular, the Cloud Computing environment distributes IT resources and allocates them according to users' requests, so technology that manages these resources and handles them effectively should be studied [1]. Mobile Cloud Computing creates a new opportunity for the IT industry because it allows the superiority and economics of Cloud Computing to meet the mobility and convenience of mobile, drawing a synergy effect from both. Mobile Cloud Computing refers to an infrastructure in which data storage and data processing are done outside the mobile device by using Cloud Computing, regardless of the kind of mobile device. Mobile devices used in the mobile environment hold personal information and make it possible to collect a variety of context-aware information. Users' demand for service types suitable for their individual situation has been increasing.
Therefore, context-aware reasoning techniques have been studied to provide services suitable for the user by using the user's context and personal profile information in the mobile environment [2-9]. In such a context-aware system, a formal context model has to be provided to offer the information needed by applications as well as to store and manage context. However, this context-aware model has some technical constraints to overcome, because it cannot itself be applied to a mobile platform due to limited device resources, so the study of intelligent mobile services on mobile platforms is still insufficient. Recent interest related to the Mobile Cloud centers on personal smartphones. Studies on physical support, such as connecting a smartphone to a personal virtual system on the Cloud and using computing resources without limit, are quite active; but how to manage distributed IT resources effectively and provide intelligent mobile services through reasoning over collected information, with the mobile device acting as a medium for collecting context, has been neglected. Therefore, this paper designs a context-aware knowledge Mobile Cloud service platform that provides an optimized Mobile Cloud service by recognizing the conditions of the user and the Cloud server and reasoning on the basis of the external context obtained from the mobile device, the user's personal information, resource information from the Cloud server, and service usage information.
2 Related Works Mobile platform mainly refers to the mobile middleware that lets users run optimized contents or services on mobile devices; it provides a uniform interface to the UI and services by using an RTOS (Real-time OS) and hardware functions. Examples of such mobile platforms are Windows Mobile, iPhone, Android, and Symbian. Context-aware information modeling techniques used in the existing ubiquitous and Web environments include the key-value model, markup scheme model, graphical model, object-oriented model, and ontology-based model. The ontology model, the context-aware model most studied recently, makes it easy to express concepts and interactions. Recently, the ontology model has been studied actively in connection with Semantic Web research based on OWL (Web Ontology Language), and there is a movement to adopt ontology-based models in a variety of context-aware frameworks. One of the early methods of context modeling using ontology was proposed by Öztürk and Aamodt. Provisioning technology refers to the technology, activities, and procedures that prepare the needed knowledge in advance so as to find the optimal resources among multiple resources and supply them on request. It enables users and enterprises to use the system by allocating, arranging, and distributing IT infrastructure resources according to the needs of the user or business. In particular, storage provisioning identifies wasted or unused storage and returns it to a common pool; when a demand for storage is received, the administrator takes storage out of the common pool and provides it for use. This makes it possible to build infrastructure that heightens the efficiency of storage. The thin provisioning method and the Hadoop provisioning method are representative storage provisioning methods. Thin provisioning, a management method for storage resources, was first announced by 3PAR Storage, a joint company of Sun, Oracle, and Symantec; it lets IT administrators allocate physical storage through virtualization. There are studies
on Hadoop provisioning techniques: a technique that simply sets up the required software or generates basic configuration files by loading data in advance and runs Hadoop jobs automatically, and a technique, proposed by Karthik Kambatla et al., that compares the similarity among resource groups and performs provisioning according to the optimal resource group with the closest similarity. Kambatla et al. proposed a provisioning technique based on signatures, divided into RS Maximizer and RS Sizer. RS Maximizer calculates the optimal parameters for a Hadoop job to use each resource group completely, and RS Sizer decides the number of resource groups needed to maximize performance and minimize cost when each resource group is available. The signature-based technique generates signatures of optimized resource groups in advance, compares them with the signature of the new group of resources required by the job, and allocates resources according to the most similar signature information (a sketch of this matching idea is given below). However, these provisioning techniques have the disadvantage that they do not consider important personal information such as context information, because they provision using simple resource information alone. Therefore, this paper proposes a context-aware knowledge Mobile Cloud service platform to manage resources more effectively by using personal context information and to model and reason over context-aware information on the mobile platform.
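A minimal sketch of the signature-matching idea follows. The metric names and values are invented for illustration and do not reproduce the RS Maximizer/RS Sizer implementations.

# A sketch of signature-based provisioning: pick the pre-profiled resource
# group whose signature is closest to the signature of the incoming job.
import math

def distance(sig_a, sig_b):
    return math.sqrt(sum((sig_a[k] - sig_b[k]) ** 2 for k in sig_a))

def closest_group(job_signature, group_signatures):
    return min(group_signatures,
               key=lambda g: distance(job_signature, group_signatures[g]))

groups = {
    "cpu-heavy": {"cpu": 0.9, "io": 0.2, "mem": 0.4},
    "io-heavy":  {"cpu": 0.3, "io": 0.9, "mem": 0.3},
}
job = {"cpu": 0.8, "io": 0.25, "mem": 0.5}
print(closest_group(job, groups))   # -> "cpu-heavy"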
3 System Architecture In this paper, we propose a context-aware intelligent Mobile Cloud service platform that uses context-aware information to manage resources efficiently.
Fig. 1. Suggested platform architecture
As shown in Fig. 1, the proposed system consists of an intelligence agent and intelligence middleware. The intelligence agent is responsible for understanding and inferring a variety of context-aware information, and consists of sub-modules such as a service module, a context-aware preprocessor, a personal profile, and a context-aware information modeling database. The intelligence middleware is responsible for providing services and efficiently managing IT resources according to users' requests in Mobile Cloud Computing. The context-aware preprocessor in the intelligence agent includes processes for collecting and modeling context-aware information and for inferring over it, and is responsible for understanding the user's situation. The service module is responsible for sending context-aware information to the intelligence middleware and for providing services suitable to the user. The personal profile is a repository that stores personal information, such as the services used by the user, the user's ID, and password. The context-aware modeling database stores information modeled by using ontology. The intelligence middleware consists of an interaction interface for communicating with the agent, a resource manager, a service manager, a service catalog, and a provisioning rule database. The resource manager, responsible for effectively allocating and managing the service information required to process the user's requested service, consists of a monitoring module, a provisioning module, and a scheduler. The monitoring module crawls information on IT resource utilization. The provisioning module sets up a plan for providing the best service by analyzing the context-aware information transferred from the user and the utilization information of IT resources. The scheduler schedules the utilization of services and resources according to the plan established by the provisioning module. The service catalog stores information on the services the user has used, and the provisioning rule database stores rules for the best provisioning process given the context-aware information and resource utilization. Also, the service module of the middleware is responsible for executing services and using distributed IT resources to provide services to the user, and contains sub-modules such as a synchronization module, which is responsible for synchronizing the resources the user is using on Cloud Computing. A structural sketch of this flow is given below.
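The following structural sketch illustrates the described flow from context inference to provisioning. All class and method names are ours, chosen for illustration only.

# A structural sketch of the described flow: the intelligence agent infers
# the user's situation and the middleware's resource manager provisions
# accordingly. Names are illustrative only.
class ContextAwarePreprocessor:
    def infer_situation(self, raw_context):
        # Collect, model, and reason over context-aware information.
        return {"user": raw_context.get("user"), "situation": "commuting"}

class ResourceManager:
    def __init__(self, provisioning_rules):
        self.rules = provisioning_rules          # provisioning rule database

    def provision(self, situation, utilization):
        # Choose a provisioning plan from the rules, the inferred situation,
        # and the monitored IT-resource utilization.
        rule = self.rules.get(situation["situation"], "default-plan")
        return {"plan": rule, "utilization": utilization}

preprocessor = ContextAwarePreprocessor()
manager = ResourceManager({"commuting": "low-latency-edge-plan"})
situation = preprocessor.infer_situation({"user": "u01", "gps": "moving"})
print(manager.provision(situation, utilization={"cpu": 0.35}))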
4 Conclusion and Future Work In this paper, a context-aware knowledge Mobile Cloud service platform was proposed to provide users with suitable services and to manage resources effectively by using context information in the Mobile Cloud environment. The proposed platform is expected to enable optimized personalized services and effective IT resource management in the Mobile Cloud environment. As further research toward realizing the proposed platform, studies are needed on techniques for extracting and modeling context information, on resource management techniques that manage distributed IT resources effectively by using context information, and on examining and testing the performance of the actual platform once implemented. Acknowledgments. This work was supported by Hannam University Research Fund, 2011.
References
1. Goyal, A., Dadizadeh, S.: A Survey on Cloud Computing. University of British Columbia Technical Report for CS 508 (2009)
2. Hess, C., Campbell, R.H.: An application of a context-aware file system. Pervasive Ubiquitous Computing 7(6) (2003)
3. Khungar, S., Riekki, J.: A Context Based Storage for Ubiquitous Computing Applications. In: Proceedings of the 2nd European Union Symposium on Ambient Intelligence, pp. 55–58 (2004)
4. Mayrhofer, R.: An Architecture for Context Prediction. Trauner Verlag, Schriften der Johannes-Kepler-Universität Linz, vol. C45 (2005)
5. Byun, H.E., Cheverst, K.: Exploiting User Models and Context-Awareness to Support Personal Daily Activities. In: Workshop in UM 2001 on User Modelling for Context-Aware Applications (2001)
6. Byun, H.E., Cheverst, K.: Utilising context history to support proactive adaptation. Journal of Applied Artificial Intelligence 18(6), 513–532 (2004)
7. Sur, G.M., Hammer, J.: Management of user profile information in UbiData. University of Florida Technical Report TR03-001 (2003)
8. Biegel, G., Cahill, V.: A Framework for Developing Mobile, Context-aware Applications. In: IEEE International Conference on Pervasive Computing and Communications, PerCom (2004)
9. Gu, T., Pung, H.K., Zhang, D.Q.: A Middleware for Building Context-Aware Mobile Services. In: Proceedings of IEEE Vehicular Technology Conference, VTC (2004)
10. Asaro, T.: Report Thin Provisioning - Focus on 3PAR (2006), http://www.3par.com
11. http://code.google.com/intl/ko/googleapps/domain/gdata_provisioning_api_v2.0_reference.html
12. http://www.ibm.com/developerworks/library/ws-wsht/?n-ws-1102
13. Amazon Elastic MapReduce, http://aws.amazon.com/elasticmapreduce/
14. Cloudera, http://www.cloudera.com/
15. Hadoop on Demand, http://hadoop.apache.org/core/docs/current/hod.html
16. Soundararajan, G., Amza, C.: Autonomic Provisioning of Backend Databases in Dynamic Content Web Servers. In: Proceedings of the 3rd IEEE International Conference on Autonomic Computing (2005)
17. Stein, S., Jennings, N.R., Payne, T.R.: Flexible Provisioning of Service Workflows. In: 17th European Conference on Artificial Intelligence (ECAI 2006), Riva del Garda, Italy, August 28-September 1, pp. 295–299 (2006)
18. Keller, A., Badonnel, R.: Automating the Provisioning of Application Services with the BPEL4WS Workflow Language. In: Proceedings of the 15th IFIP/IEEE International Workshop on Distributed Systems: Operations & Management (2004)
19. Gottlieb, J., Greenberg, A., Rexford, J., Wang, J.: Automated Provisioning of BGP Customers. IEEE Network (2003)
20. Sanchez-Pi, N., Molina, J.M.: A multi-agent approach for provisioning of e-services in u-commerce environments. Internet Research 20(3), 276–295 (2010)
21. Kambatla, K., Pathak, A., Pucha, H.: Towards Optimizing Hadoop Provisioning in the Cloud. In: Proceedings of the 2009 Conference on Hot Topics in Cloud Computing (2009)
Similarity Checking of Handwritten Signature Using Binary Dotplot Analysis

Debnath Bhattacharyya1 and Tai-hoon Kim2,*

1 Computer Science and Engineering Department, MPCT, Gwalior, India
[email protected]
2 Department of Multimedia, Hannam University, Daejeon, Korea
[email protected]

* Corresponding author.
Abstract. In this paper we propose a statistical similarity checking method to identify the similarity between two handwritten signatures. Initially, we process the handwritten signature images with various image processing techniques and store them on a magnetic storage device. At similarity-checking time, our algorithm generates a binary dotplot matrix between the input signature image of a particular user domain and each signature image in storage. These binary dotplot matrices guide us in identifying the degree of similarity, homology, or duplication of any user's handwritten signature. In this paper we deal only with the binary dotplot similarity technique. Keywords: Pattern Recognition, bio-informatics, similarity, Image processing, binary dotplot.
1 Introduction Automatic handwritten signature verification is an active research area in the field of pattern recognition, because it has potential use in many areas concerned with security and access control. Normally, signature verification is done by manual visual inspection. But in the majority of applications where a handwritten signature is used, little or no verification takes place at all. Thus, automating this verification process in a reliable manner will help to reduce fraud caused by forged signatures. A dot plot is a graphical method that allows the comparison of two biological sequences and identifies regions of close similarity between them [9]. It is a kind of recurrence plot. The simplest way to visualize the similarity between two protein sequences is to use a similarity matrix, known as a dot plot. These were introduced by Philips in the 1970s and are two-dimensional matrices which have the sequences of the proteins being compared along the vertical and horizontal axes. For a simple visual representation of the similarity between two sequences, individual cells in the matrix
can be shaded black if residues are identical, so that matching sequence segments appear as runs of diagonal lines across the matrix. Some idea of the similarity of the two sequences can be gained from the number and length of matching segments shown in the matrix. Identical proteins will obviously have a diagonal line in the center of the matrix. In Table 1, two strings are plotted with the dotplot technique; here, both strings are the same. If we mark the main diagonal of the matrix (Table 1), we see that all its elements are "1". Each horizontal element is matched against each vertical element; when they are the same, "1" is put in the cell, else "0". Here we can say that the string "KOLKATA" is identical to the string "KOLKATA". In some cases this concept is called a diagonal match matrix. It is used generally in bio-informatics applications, e.g., for sequence alignment of proteins and genes. A small sketch of computing such a dotplot appears after Table 1. Table 1. A Binary Dotplot
    K   O   L   K   A   T   A
K   1   0   0   1   0   0   0
O   0   1   0   0   0   0   0
L   0   0   1   0   0   0   0
K   1   0   0   1   0   0   0
A   0   0   0   0   1   0   1
T   0   0   0   0   0   1   0
A   0   0   0   0   1   0   1
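A binary dotplot such as Table 1 can be computed in a few lines; the following sketch reproduces Table 1 for the string "KOLKATA".

# A minimal sketch of building a binary dotplot for two strings, as in
# Tables 1-3: cell (i, j) is 1 when the i-th and j-th characters match.
def dotplot(s, t):
    return [[1 if a == b else 0 for b in t] for a in s]

for row_label, row in zip("KOLKATA", dotplot("KOLKATA", "KOLKATA")):
    print(row_label, *row)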
Another example is shown in Table 2, where two strings are partially similar. Here, "TECHNICAL" and "TECHNOLOGY" are partially similar; they are similar up to "TECHN", as follows:

TECHNICAL
TECHNOLOGY    (similar sequences: the common prefix "TECHN")

Table 2. A Binary Dotplot not exactly similar
    T   E   C   H   N   I   C   A   L
T   1   0   0   0   0   0   0   0   0
E   0   1   0   0   0   0   0   0   0
C   0   0   1   0   0   0   1   0   0
H   0   0   0   1   0   0   0   0   0
N   0   0   0   0   1   0   0   0   0
O   0   0   0   0   0   0   0   0   0
L   0   0   0   0   0   0   0   0   1
O   0   0   0   0   0   0   0   0   0
G   0   0   0   0   0   0   0   0   0
Y   0   0   0   0   0   0   0   0   0
Table 3. A Binary Dotplot for palindrome string
    h   a   n   n   a   h
h   1   0   0   0   0   1
a   0   1   0   0   1   0
n   0   0   1   1   0   0
n   0   0   1   1   0   0
a   0   1   0   0   1   0
h   1   0   0   0   0   1
In Table 3, the dotplot shows identities between the palindromic sequence "hannah" and itself. The palindrome reveals itself as a stretch of matches perpendicular to the main diagonal in Table 3. This is not just word play – regions in DNA or enzymes recognized by transcription regulators have sequences related to palindromes, crossing from one strand to the other. Because a DNA sequence is double stranded, we read the base pairs, not just the bases on one strand, to determine a palindrome [11]. The restriction enzyme EcoRI recognizes the following palindromic sequence:

5'- G A A T T C -3'
3'- C T T A A G -5'
The top strand reads 5'-GAATTC-3', while the bottom strand reads 3'-CTTAAG-5'. If you flip the DNA strand over, the sequences are exactly the same (5'-GAATTC-3' and 3'-CTTAAG-5'). Palindromic sequences play an important role in molecular biology. Many restriction endonucleases recognize specific palindromic sequences and cut them. Palindromic sequences may also be methylation sites. In this paper, the concept is taken from molecular biology, and the experimental results support this choice. We analyze the nature of handwritten signatures; this can be treated as biometric authentication and recognition. By human nature, two signatures of the same individual cannot be exactly alike: 100% similarity cannot be obtained for genuine handwritten signatures. If a comparison shows 100% similarity, the chance that the signature is a copy increases, even for the same user. That experimental result will also be shown in this paper.
2 Previous Works Herdagdelen et al. [1] investigated an on-line signature verification method in which special hardware, such as a digitizing tablet or a touch-sensitive pad, is used to capture the movement of the pen. The reliability of a signature verification technique is measured by two factors whose relative weights vary with the area of application: the false rejection rate (FRR) and the false acceptance rate (FAR). Baker and Nayar [2] developed a number of algorithms to cope with the demands of pattern rejection conditions. The algorithms achieved high efficiency by using pattern rejecters. A pattern rejecter is a generalization of a
classifier that quickly eliminates a large fraction of the candidate classes or inputs. After applying a rejecter, the recognition algorithms concentrate their computational efforts on verifying the small number of remaining possibilities. The generality of their algorithms was established through a close relationship with the Karhunen-Loève expansion [10]. Sequence matching techniques are effective for comparing two images or videos. Yeh and Cheng [3] viewed video copy detection as a local alignment problem between two frame sequences and proposed a two-level filtration approach that significantly accelerates the matching process. First, they proposed using an adaptive vocabulary tree to index all frame descriptors extracted from the video database, treating each video as a "bag of frames". Such an indexing structure not only offers a rich vocabulary for representing videos, but also enables efficient computation of a pyramid matching kernel between videos. They also proposed a fast edit-distance-based sequence matching method that avoids unnecessary comparisons between dissimilar frame pairs. Stephen Few [4] explained in a 2006 article that seeing patterns is a natural human ability: human eyes and brains work as a team to discover meaningful patterns that help us make sense of the world. Visual perception is a process that consists of multiple stages, pattern perception being one of them. Colin Ware [5] of the University of New Hampshire explains the process as three sequential stages: a. parallel processing to extract low-level properties of the visual scene; b. pattern perception; and c. sequential goal-directed processing. Desira and Camilleri [6] reviewed and explored a method that learns the image structure directly from the image ensemble, in contrast to methods where the relevant structure is determined in advance and extracted using hand-engineered techniques. In tasks involving the analysis of image ensembles, important information is often found in the higher-order relationships among image pixels. Independent Component Analysis (ICA) is a method that learns such high-order dependencies in the input. ICA has been used extensively in several applications, but its potential for the unsupervised extraction of features for handwritten signature verification had not been explored; they investigated the suitability of features extracted from images of handwritten signatures using the unsupervised ICA method to discriminate successfully between different classes of signatures. Helfman [7] described the dotplot technique for visualizing patterns of string matches in millions of lines of text and code; patterns are explored interactively or detected automatically, with applications including text analysis, author identification, plagiarism detection, and translation alignment. Yankov et al. [8] demonstrated how dot plots can be used for the analysis and mining of real-valued time series data; they designed a tool that creates highly descriptive dot plots which make it easy to detect similarities, anomalies, reverse similarities, and periodicities, as well as changes in the frequencies of repetitions. Arya and Inamdar [12] studied the available handwritten signature matching methods and compared their performance.
3 Our Work We divided our work into two distinct parts. First, we created the monochrome bi-color image database for storing optimized signature images after the following processes:
• 24-bit color image to gray image;
• Gray image to bi-color image;
• Scaling;
• Thinning; and
• Digitization.
These processes are not presented in this paper. Here we propose the similarity algorithm applied after converting the bi-color image to a corresponding 2-dimensional array. The rule set is the same for gray-value-coded images too. Generation of the binary dotplot: The dotplot is a table or matrix; the rows correspond to the residues of one sequence and the columns to the residues of the other sequence. We fix the rows to the input image (which is to be matched) and change the columns each time for every stored image. We consider images of the same size for similarity checking; for that, scaling of all images to the same size was performed in the first part of our work. The algorithm is stated as follows:
Step01: Start.
Step02: Open the input image in read mode (the image to be matched).
Step03: Open a training image in read mode from storage.
Step04: Extract the size of the images (width and height; the size will be the same for both images as they are scaled to the same size).
Step05: Compute k = width x height.
Step06: Create two single-dimensional arrays, 'Array1' and 'Array2', of size 'k' each.
Step07: Create a 2-dimensional (array) matrix, 'Matrix1', of size 'k x k'.
Step08: Read pixel values from the input image and the training image one by one and store them into the corresponding locations of 'Array1' and 'Array2', respectively, until end-of-file.
Step09: Check all elements of 'Array2' against each element of 'Array1'; when similar, store '1' in the corresponding location of 'Matrix1', else store '0'.
Step10: Count the total number of locations in the main diagonal of 'Matrix1' (it should be k).
Step11: Count 'n', the pairs of matching pixels in the main diagonal (elements with value '1' in the main diagonal) of 'Matrix1'.
Step12: Count 'm', the pairs of un-matched pixels in the main diagonal (elements with value '0' in the main diagonal) of 'Matrix1'.
Step13: Compute 'f' as the percentage of matching, that is, (n/k) x 100.
Step14: Count 'cm', the continuous matching pairs.
Step15: Count 'um', the continuous un-matched pairs.
Step16: Stop.
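The following Python sketch condenses Steps 01-16. It assumes the two images are already loaded as flat lists of pixel values of equal length k (Steps 01-08 cover the image I/O), computes only the main diagonal that Steps 10-15 actually use, and adopts one plausible reading of the "continuous" counts in Steps 14-15; all names are ours.

# A minimal sketch of the binary-dotplot similarity check (Steps 01-16).
def dotplot_similarity(input_pixels, training_pixels):
    k = len(input_pixels)                       # Step 05: k = width x height
    assert k == len(training_pixels), "images must be scaled to the same size"

    # Step 09 builds a full k x k binary matrix, but Steps 10-15 use only its
    # main diagonal (cell (i, i) compares pixel i of one image with pixel i
    # of the other), so we compute just the diagonal here.
    diagonal = [1 if a == b else 0
                for a, b in zip(input_pixels, training_pixels)]

    n = sum(diagonal)                           # Step 11: matching pairs
    m = k - n                                   # Step 12: un-matched pairs
    f = (n / k) * 100.0                         # Step 13: percentage match

    # Steps 14-15 (one plausible reading): count adjacent diagonal positions
    # that are both matches (cm) or both mismatches (um).
    cm = sum(1 for i in range(k - 1) if diagonal[i] == 1 and diagonal[i + 1] == 1)
    um = sum(1 for i in range(k - 1) if diagonal[i] == 0 and diagonal[i + 1] == 0)
    return f, n, m, cm, um

f, n, m, cm, um = dotplot_similarity([105, 168, 210, 68], [105, 168, 210, 68])
print(f)   # 100.0 -> flagged as a possible copy and rejected (see Section 4)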
4 Result and Discussion The stated algorithm generates a binary dotplot matrix from two input sequences, one for every training image. Sequences are generated from the corresponding images. Each sequence can be thought of as a binary chromosome with values '0'/'1', or values '0-255' (8-bit gray pixel values), embedded on it; these can also be thought of as 'genes'. As a result, each image yields a long data sequence that resembles a chromosome. Figure 1 shows the concept of image-to-sequence conversion.
Gray Image Example matrix (sampling):

105  168  210   68   50
168    0  254  251  178
210  220  128  128  128
 68   47   68   26  211
 50   30  167  168  178

Converted to Single Dimensional Matrix:

105 | 168 | 210 | ---- | 167 | 168 | 178

Fig. 1. Image to sequence conversion
The score of similarity is obtained from the binary dotplot matrix (for the input image against all training images, one by one) by the following formula:

f = (n / k) x 100

where f is the score (a double value), n is the number of matched pairs of pixels from the 2 sequences, and k is the number of elements in the main diagonal of the matrix.

Continuous matching = Σ cm, counted where elements of the main diagonal of the matrix are '1'.

Continuous dissimilarity = Σ um, counted where elements of the main diagonal of the matrix are '0'.

Ratio of similarity = Σ cm / Σ um
Table 4 shows the scores for 15 stored images matched against one input image. Size of images: 44 x 63 = 2772 pixels each. Total number of locations in the main diagonal of the binary dotplot: 2772. Each row of Table 4 shows the similarity result of the fixed input image with one training image. Table 4. Output after implementation
Image         Pair of matching pixels   Pair of unmatched pixels   Similarity (%)
              in main diagonal          in main diagonal
Sign1.bmp     2772                      000                        100.00
Sign2.bmp     2764                      008                         99.71
Sign3.bmp     2760                      012                         99.56
Sign4.bmp     2759                      013                         99.53
Sign5.bmp     2763                      009                         99.67
Sign6.bmp     2754                      018                         99.00
Sign7.bmp     2754                      018                         99.00
Sign8.bmp     2607                      165                         94.00
Sign9.bmp     2746                      016                         99.00
Sign10.bmp    2743                      029                         98.95
Sign11.bmp    2772                      000                        100.00
Sign12.bmp    2764                      008                         99.71
Sign13.bmp    2757                      015                         99.45
Sign14.bmp    2744                      028                         98.98
Sign15.bmp    2731                      041                         98.52
From Table 4 and the experimental results it is clear that only a copy of a signature can produce 100% similarity, given the behavior of human signatures. In this experiment, Sign1.bmp and Sign11.bmp were copies of the input image, and in both cases the result shows 100% similarity. So it can be concluded that copy detection can be achieved: if the similarity is 100%, reject the signature. The performance of this technique is given in Table 5 [12]. Table 5. Performance analysis
Technique                                     Success Rate
Hidden Markov models approach                 95.15%
Neural networks approach (Hough transform)    95.24%
Statistical approach                          96% - 97%
Structural or syntactic approach              97%
Wavelet-based approach                        93% - 94%
Principal Component Analysis                  98.50%
Our Approach                                  98% - 99%
5 Conclusion In this paper we propose a new technique for similarity matching of handwritten signatures. Our experiment here is based on off-line signature matching; however, the technique is also well suited to on-line signature matching. The concept has been adopted from bio-informatics, and the results of this experiment support the performance of the technique we propose.
References
1. Herdagdelen, A., Alpaydın, E.: Dynamic Alignment Distance Based Online Signature Verification. In: The 13th Turkish Symposium on Artificial Intelligence & Artificial Neural Networks, Izmir, Turkey, June 10-11, pp. 119–127 (2004)
2. Baker, S., Nayar, S.K.: Algorithms for Pattern Rejection. In: The 13th International Conference on Pattern Recognition, Vienna, August 25-29, vol. 2, pp. 869–874 (1996)
3. Yeh, M.-C., Cheng, K.-T.: Video Copy Detection by Fast Sequence Matching. In: The ACM International Conference on Image and Video Retrieval (CIVR 2009), Santorini, Fira, Greece, July 8-10, Article No. 45 (2009)
4. http://www.perceptualedge.com/articles/Whitepapers/Visual_Pattern_Rec.pdf (last accessed on September 17, 2010)
5. Ware, C.: Information Visualization: Perception for Design, 2nd edn., pp. 20–22. Morgan Kaufmann Publishers, San Francisco (2004)
6. Desira, M., Camilleri, K.P.: Handwritten signature verification by independent component analysis. In: First National ICT Workshop, WICT 2008, Malta, November 17-18 (2008)
7. Helfman, J.I.: Similarity Patterns in Language. In: IEEE Symposium on Visual Languages, St. Louis, MO, October 4-7, pp. 173–175 (1994)
8. Yankov, D., Keogh, E., Lonardi, S., Fu, A.W.: Dot Plots for Time Series Analysis. In: 17th IEEE International Conference on Tools with Artificial Intelligence, Hong Kong, November 16, pp. 159–168 (2005)
9. http://en.wikipedia.org/wiki/Dot_plot_bioinformatics (last accessed on September 17, 2010)
10. http://en.wikipedia.org/wiki/Karhunen%E2%80%93Lo%C3%A8ve_theorem (last accessed on September 17, 2010)
11. http://en.wikipedia.org/wiki/Palindromic_sequence (last accessed on September 18, 2010)
12. Arya, M.S., Inamdar, V.S.: A Preliminary Study on Various Off-line Hand Written Signature Verification Approaches. International Journal of Computer Applications (0975-8887) 1(9), 55–60 (2010)
Brain Tumor Detection Using MRI Image Analysis

Debnath Bhattacharyya1 and Tai-hoon Kim2,*

1 Computer Science and Engineering Department, MPCT, Gwalior, India
[email protected]
2 Department of Multimedia, Hannam University, Daejeon, Korea
[email protected]
Abstract. Brain tumors are among the most commonly occurring malignancies in human beings, so the study of brain tumors is important. In this paper, we propose an image segmentation method to identify and detect tumors in brain magnetic resonance imaging (MRI). Many thresholding methods have been developed, but each gives a different result on each image, so a method is needed by which tumors can be detected consistently. In this paper we propose a set of image segmentation algorithms which give satisfactory results on brain tumor images.

Keywords: Segmentation, edge detection, threshold, image processing.
1 Introduction

Brain tumors are one of the major causes of increased mortality among children and adults. A tumor is a mass of tissue that grows out of the control of the normal forces that regulate growth (Pal and Pal, 1993). Brain tumors can be separated into two general categories depending on their origin, growth pattern and malignancy. Primary brain tumors arise from cells in the brain or from the covering of the brain. A secondary, or metastatic, brain tumor occurs when cancer cells spread to the brain from a primary cancer in another part of the body. Most research in developed countries shows that the number of people who develop brain tumors and die from them has increased, perhaps by as much as 300%, over the past three decades. The National Brain Tumor Foundation (NBTF) estimates that 29,000 people in the U.S. are diagnosed with primary brain tumors each year, and nearly 13,000 people die. In children, brain tumors are the cause of one quarter of all cancer deaths. The overall annual incidence of primary brain tumors in the U.S. is 11 - 12 per 100,000 people; for primary malignant brain tumors the rate is 6 - 7 per 100,000. In the UK, over 4,200 people are diagnosed with a brain tumor every year (2007 estimates); about 200 other types of tumor are diagnosed in the UK each year, and about 16 out of every 1,000 cancers diagnosed in the UK (1.6%) are in the brain. In India, a total of 80,271 people are affected by various types of tumor (2007 estimates).
* Corresponding author.
Artificial Neural Networks (ANNs) are mathematical analogues of biological neural systems, in the sense that they are made up of a parallel interconnected system of nodes, called neurons; this parallelism is a key difference between von Neumann computers and ANNs. Combining ANN architectures with different learning schemes results in a variety of ANN systems. The proper ANN is obtained by taking the requirements of the specific application into consideration, as no single ANN topology yields satisfactory results in all practical cases.
The evolution of digital computers, as well as the development of modern theories of learning and information processing, led to the emergence of Computational Intelligence (CI) engineering. Artificial Neural Networks, Genetic Algorithms (GAs) and Fuzzy Logic are CI non-symbolic learning approaches for solving problems. The huge number of applications in which ANNs have been used with satisfactory results has supported their rapid growth; fields in which ANNs have been used include image processing, environmental problems, climate study, and financial analysis. In this paper, a new technique is implemented to extract the suspicious region in the segmentation of brain tumor MRI images, so that this detection technique can be useful for further consideration by medical practitioners.
2 Previous Works

In 2004, M. C. de Andrade introduced an interactive algorithm for image smoothing and segmentation. A non-linear partial differential equation was employed to smooth the image while preserving contours. The segmentation was a region-growing and merging process initiated around image minima (seeds), which were automatically detected, labeled and eventually merged; the user places one marker per region of interest. Accurate and fast segmentation results were achieved for gray and color images using this simple method [1].
Rajeev Ratan, Sanjay Sharma and S. K. Sharma, 2009, developed [2] a brain tumor segmentation method and validated the segmentation on 2D and 3D MRI data. This method did not require any initialization, while others require an initialization inside the tumor. The visualization and quantitative evaluations of the segmentation results demonstrate the effectiveness of this approach. In that study, after a manual segmentation procedure for tumor identification, the potential use of MRI data was investigated for improving brain tumor shape approximation and 2D and 3D visualization for surgical planning and tumor assessment. Surgical planning now uses both 2D and 3D models that integrate data from multiple imaging modalities, each highlighting one or more aspects of morphology or function. The work first calculated the area of the tumor in a single slice of the MRI data set and was then extended to calculate the volume of the tumor from a multi-image MRI data set.
S. Shen et al., 2003, proposed [3] a new brain tumour diagnostic procedure using magnetic-resonance imaging (MRI) only. First, the MR images were preprocessed using standardization, non-brain removal and enhancement. Second, an improved fuzzy clustering algorithm was applied to segment the brain into different tissues. Finally, brain tumour diagnosis was performed using fuzzy-logic-based genetic programming (GP) to search for classification rules. Classification results on a variety of MR images for different pathologies indicated that the technique was promising.
Carles Arus et al., 2006, introduced [4] HealthAgents, an EC-funded research project to improve the classification of brain tumours through multi-agent decision
support over a distributed network of local databases or data marts. HealthAgents developed new pattern recognition methods for distributed classification and analysis of in vivo MRS and ex vivo/in vitro HRMAS and DNA data, and also defined a method to assess the quality and usability of a new candidate local database containing a set of new cases, based on a compatibility score.
Cigdem Demir et al., 2005, presented [5] a graph-based representation (a.k.a. cell-graph) of histopathological images for automated cancer diagnosis, probabilistically assigning a link between a pair of cells (or cell clusters). Since the node set of a cell-graph can include a cluster of cells as well as individual cells, it enables working with low-cost, low-magnification photomicrographs. The contributions of that work were twofold. First, it was shown that without establishing a pairwise spatial relation between the cells (i.e., the edges of a cell-graph), neither the spatial distribution of the cells nor the texture analysis of the images yields accurate results for tissue-level diagnosis of the brain cancer called malignant glioma. Second, the work defined a set of global metrics by processing the entire cell-graph to capture tissue-level information coded into the histopathological images. In that work, the results were obtained on the photomicrographs of 646 archival brain biopsy samples from 60 different patients. It was shown that the global metrics of cell-graphs distinguish cancerous tissues from noncancerous ones with high accuracy (at least 99 percent accuracy for healthy tissues with a lower cellular density level, and at least 92 percent accuracy for benign tissues with a similarly high cellular density level, such as non-neoplastic reactive/inflammatory conditions).
G. Farias et al., 2008, proposed [6] a synergy of signal processing techniques and intelligent strategies applied to identify different types of human brain tumours, in order to help confirm the histological diagnosis. The wavelet-SVM (support vector machine) classifier merged the wavelet transform, to reduce the size of the biomedical spectra and to extract the main features, with an SVM to classify them. The influence of some of the configuration parameters of each of those techniques on the clustering was analysed. The classification results were promising, especially taking into account that medical knowledge was not considered.
3 Our Work

Three separate algorithms are developed, implemented and tested with the available brain tumor MRI images. These algorithms are stated hereunder:

3.1 24-Bit Color Image to 256 Gray Color Image

open 24-bit color image in read mode
store image height to m and width to n, respectively
declare two 2-D matrices, namely, matrix[m][n], matrix1[m][n]
read pixel one by one until end of file {
    matrix1[i][j] = bi.getRGB(i,j)
    b1[i][j] = (bi.getRGB(i,j)) & 0x000000FF
    g1[i][j] = (bi.getRGB(i,j) & 0x0000FF00) >> 8
    r1[i][j] = (bi.getRGB(i,j) & 0x00FF0000) >> 16
    int avg = (r1[i][j] + g1[i][j] + b1[i][j]) / 3
    int newRGB = 0xFF000000 + (avg << 16) + (avg << 8) + avg
    matrix[i][j] = newRGB
}
write matrix[m][n] to another image file
close all the files

In this algorithm, the 24-bit pixel is separated into its 3 bytes. The new value is then computed and stored in such a way that the red, green and blue values are given equal weight for each 24-bit pixel.

3.2 Image Segmentation Algorithm

open 256 gray color image in read mode
store image height to m and width to n, respectively
declare a 2-D matrix, matrix[m][n]
read pixel one by one until end of file {
    matrix[i][j] = bi.getRGB(i,j)
    if matrix[i][j] < 128 then matrix[i][j] = 0
}
write matrix[m][n] to another image file
close all the files

This algorithm depends entirely on Algorithm A (Section 3.1): an average-based threshold is appropriate because the gray value is calculated as an average. Here the threshold is taken as 128.

3.3 Edge Detection Algorithm

open resultant image file from Algorithm B
store image height to m and width to n, respectively
declare two 2-D matrices, namely, matrix[m][n], matrix1[m][n]
declare integers a1=0, a2=0, a3=0, a4=0, a5=0
read pixel one by one until end of file
for(int i = 0; i < m; ++i) {
    for(int j = 0; j < n; ++j)
        matrix[i][j] = bi.getRGB(i,j)
}
for(int r = 1; r < m-1; r++) {
    for(int c = 1; c < n-1; c++) {
        if((((matrix[r-1][c] == a1) && (matrix[r][c-1] == a2)) &&
            ((matrix[r-1][c] == a4) && (matrix[r][c+1] == a2))) &&
           (((matrix[r+1][c] == a4) && (matrix[r][c-1] == a2)) &&
            ((matrix[r+1][c] == a4) && (matrix[r][c+1] == a2))) &&
           (matrix[r][c] == a5))
            matrix1[r][c] = 255;
    }
}
write matrix1[m][n] to another image file

In this algorithm, each pixel is checked against its neighbor pixels. If the pixel and all of its four neighbors are black (0), the corresponding output pixel is made white (255); we can say that the pixel concerned dies.
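Since the pseudocode above closely follows the Java image API, the three algorithms can be consolidated into one small runnable sketch. This is one plausible rendering under our reading of the pseudocode; the file names and the output convention are assumptions, not the verbatim implementation:

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class TumorPipeline {
    public static void main(String[] args) throws Exception {
        BufferedImage in = ImageIO.read(new File("I.bmp")); // 24-bit input image
        int m = in.getHeight(), n = in.getWidth();
        int[][] gray = new int[m][n];

        // Algorithm A: 24-bit color to gray (equal-weight average of R, G, B).
        for (int y = 0; y < m; y++)
            for (int x = 0; x < n; x++) {
                int rgb = in.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                gray[y][x] = (r + g + b) / 3;
            }

        // Algorithm B: segmentation by thresholding at 128.
        for (int y = 0; y < m; y++)
            for (int x = 0; x < n; x++)
                if (gray[y][x] < 128) gray[y][x] = 0;

        // Algorithm C: mark a pixel white when it and its four neighbors
        // are all black, as in the edge detection pseudocode.
        BufferedImage out = new BufferedImage(n, m, BufferedImage.TYPE_INT_RGB);
        for (int r = 1; r < m - 1; r++)
            for (int c = 1; c < n - 1; c++)
                if (gray[r][c] == 0
                        && gray[r - 1][c] == 0 && gray[r + 1][c] == 0
                        && gray[r][c - 1] == 0 && gray[r][c + 1] == 0)
                    out.setRGB(c, r, 0xFFFFFFFF);

        ImageIO.write(out, "bmp", new File("Result4.bmp"));
    }
}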
4 Result

The output of Algorithm A is shown in Fig. 2, which is extracted from Fig. 1. Fig. 1 and Fig. 2 are visually identical, but their histograms (Fig. 6 and Fig. 7, respectively) show the differences clearly. Fig. 3 shows the result of image thresholding, where the tumor in the MRI image is clearly detected, though only visually. For further processing we convert this thresholded image into a monochrome image; here, too, the difference can be seen by comparing the histograms in Fig. 8 and Fig. 9 for the images in Fig. 3 and Fig. 4, respectively. Note that in Fig. 3 most of the remaining pixel values are only 252 and 255, i.e., above the threshold value of 128 that we fixed. The resulting output of the edge detection algorithm is shown in Fig. 5, where the location and shape of the tumor are clearly detected.
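The gray-level histograms discussed above (Figs. 6-9) are straightforward to reproduce; a minimal sketch, assuming the gray level is stored uniformly in all three channels so the low byte suffices, and with the file name assumed, is:

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class GrayHistogram {
    public static void main(String[] args) throws Exception {
        BufferedImage img = ImageIO.read(new File("Result2.bmp"));
        int[] hist = new int[256];
        for (int y = 0; y < img.getHeight(); y++)
            for (int x = 0; x < img.getWidth(); x++)
                hist[img.getRGB(x, y) & 0xFF]++; // low (blue) byte = gray level

        // Count distinct gray levels, e.g., the 3 levels reported for Result2.bmp.
        int distinct = 0;
        for (int count : hist) if (count > 0) distinct++;
        System.out.println("distinct gray levels: " + distinct);
    }
}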
Fig. 1. 24-bit Color Gray Image (I.bmp)
Fig. 2. 256-Color Gray Image (Result1.bmp)
Fig. 3. Image Thresholding (Result2.bmp)
Fig. 4. Monochrome Image (Result3.bmp)
Fig. 5. Edge Detection: Clearly Identified Tumor Only (Result4.bmp)
Fig. 6. Histogram of 24-bit Color Image
Fig. 7. Histogram of 256 Color Image
Fig. 8. Histogram analysis identifies only 3 colors in Result2.bmp image
Fig. 9. Histogram analysis identifies only 2 colors in Result3.bmp image
5 Conclusion

In this paper, we propose a tumor detection algorithm for further consideration by medical practitioners. The results show that automatic detection of brain tumors from MRI images can be done more reliably than with other tumor detection systems available in the market. Further improvements can also be proposed if a large amount of MRI image data becomes available. We used 12 brain MRI images in our experiment; one example is presented in this paper to make the process and technique understandable.
References 1. de Andrade, M.C.: An Interactive Algorithm for Image Smoothing and Segmentation. Electronic Letters on Computer Vision and Image Analysis 4(1), 32–48 (2004) 2. Ratan, R., Sharma, S., Sharma, S.K.: Multiparameter Segmentation & Quantization of Brain Tumor from MRI images. ISEE-IJST Journal 3(1) (March 2009) 3. Shen, S., Sandham, W.A., Granat, M.H., Dempsey, M.F., Patterson, J.: A new approach to brain tumour diagnosis using fuzzy logic based genetic programming. In: International Conference of the IEEE Engineering in Medicine and Biology Society, Piscataway, NJ, September 17-21, pp. 870–873 (2003) 4. Arus, C., Celda, B., Dasmahaptra, S., Dupplaw, D., Gonzalez-Velez, H., Huffel, S.V., Lewis, P., Lluch i Ariet, M., Mier, M., Peet, A., Robles, M.: On the Design of a Web-Based Decision Support System for Brain Tumour Diagnosis Using Distributed Agents. In: Web Intelligence and Intelligent Agent Technology Workshops, Hong Kong, December 18-22, pp. 208–211 (2006) 5. Demir, C., Humayun Gultekin, S., Yener, B.: Learning the Topological Properties of Brain Tumors. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2(3), 262– 270 (2005) 6. Farias, G., Santos, M., Lopez, V.: Brain tumour diagnosis with Wavelets and Support Vector Machines. In: International Conference on Intelligent System and Knowledge Engineering, Xiamen, November 17-19, pp. 1453–1459 (2008)
Image Data Hiding Technique Using Discrete Fourier Transformation

Debnath Bhattacharyya1 and Tai-hoon Kim2,*

1 Computer Science and Engineering Department, MPCT, Gwalior, India
[email protected]
2 Department of Multimedia, Hannam University, Daejeon, Korea
[email protected]
Abstract. In this paper a novel technique, Discrete Fourier Transformation based Image Authentication (DFTIAT), is proposed to authenticate an image; with the same scheme one can also transmit a secret message or image over the network. Instead of embedding a message or image directly within the source image, a 2 x 2 window of the source image is chosen in sliding-window manner and converted from the spatial domain to the frequency domain using the Discrete Fourier Transform (DFT). The bits of the authenticating message or image are then embedded at the LSB within the real part of the transformed block. As the final step of encoding, the inverse DFT transforms the block from the frequency domain back to the spatial domain. Decoding is done through the reverse procedure. The experimental results have been discussed and compared with the existing steganography tool S-Tools. Histogram analysis and the Chi-Square test of the source image against the embedded image show better results in comparison with S-Tools.

Keywords: Data Hiding, Authentication, Frequency Domain, Discrete Fourier Transformation (DFT), Inverse Discrete Fourier Transform (IDFT), S-Tools.
1 Introduction

The most popular technique for image authentication, or steganographic technique, is embedding a message or image within a source image, generally termed data hiding, which also provides secret message transmission over the communication channel. Several techniques are available for secret message transmission that hide a message inside an image without changing its visible properties, but the source image may still be changed. Instead of embedding the message or image directly within the source image, here the embedding is done in the frequency domain. The presented work deals with information and image protection against unauthorized access in the frequency domain. The frequency domain is the domain in which the analog picture of a continuous signal resides. A picture in the spatial domain can be described
* Corresponding author.
as a collection of pixel values describing the intensity values. The DFT changes an N point input signal into two N/2 + 1 point output signals. The input signal contains the signal being decomposed, while the two output signals contain the amplitudes of the component sine and cosine waves. The input signal is said to be in the time domain, because the most common type of signal entering the Discrete Fourier Transformation (DFT) is composed of samples taken at regular intervals of time; however, any kind of sampled data can be fed into the DFT, regardless of how it was acquired. The frequency domain signal is represented by a vector F[u,v] and consists of two parts for each of the samples: the real part of F[u,v], written ReF[u,v], and the imaginary part of F[u,v], written ImF[u,v]. Here the 'real part' means the cosine wave amplitudes, while the 'imaginary part' means the sine wave amplitudes. The basic formula of the DFT for a function f(x, y) of size M x N, transforming to the frequency domain, is given in equation (1):

F(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) \, e^{-j 2\pi (ux/M + vy/N)}    (1)
For the purposes of the proposed algorithm, the simpler form of equation (1) is as given in equation (2):

F(u,v) = ReF(u,v) + j \, ImF(u,v)    (2)
where ReF(u,v) and ImF(u,v) are given in equations (3) and (4):

ReF(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) \cos\bigl( 2\pi (ux/M + vy/N) \bigr)    (3)

ImF(u,v) = -\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y) \sin\bigl( 2\pi (ux/M + vy/N) \bigr)    (4)
Similarly, the inverse discrete Fourier transformation, by which the frequency domain is converted back to the spatial-domain digital image, may be written as in equation (5):

f(x,y) = \frac{1}{MN} \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u,v) \, e^{j 2\pi (ux/M + vy/N)}    (5)
Considering the real parts of the transformed block, the bits of the authenticating message or image are embedded at the LSB of each coefficient (excluding the first one). After embedding, the embedded image is converted back into the spatial domain using the IDFT for transmission over the network. The technique provides more security because the message or image is embedded by taking a window of the source image in sliding-window manner and then transforming it into the frequency domain.
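For the 2 x 2 windows used here, equation (1) takes a particularly simple form; the following worked expansion is a direct consequence of equations (1)-(4), with the kernel reducing to a sign, e^{-j 2\pi (ux/2 + vy/2)} = (-1)^{ux + vy}:

F(0,0) = f(0,0) + f(0,1) + f(1,0) + f(1,1)
F(0,1) = f(0,0) - f(0,1) + f(1,0) - f(1,1)
F(1,0) = f(0,0) + f(0,1) - f(1,0) - f(1,1)
F(1,1) = f(0,0) - f(0,1) - f(1,0) + f(1,1)

Since \sin(\pi k) = 0 for integer k, ImF(u,v) = 0 throughout, which is why the embedding can operate on the real parts alone.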
2 Earlier Works

Nameer N. EL-Eman, in April 2007, implemented an algorithm based on hiding a large amount of data (image, audio, text) in a color BMP image, using adaptive image filtering and adaptive image segmentation with bit replacement on the appropriate pixels. These pixels are selected randomly rather than sequentially, using a new concept defined by main cases with sub-cases for each byte in one pixel [1]. Palak K. Amin, Ning Liu and K. P. Subbalakshmi, in 2005, described a discrete cosine transform (DCT) based spread-spectrum data-hiding algorithm that provides statistical security [2]. R. Chandramouli and N. Memon, in 2001, considered some specific image-based steganography techniques and showed that an observer can indeed distinguish between images carrying a hidden message and images which do not [3]. S. Dumitrescu, Xiaolin Wu and Zhe Wang, in 2003, introduced an approach to detecting LSB steganography in digital signals; they showed that the length of hidden messages embedded in the LSBs of signal samples can be estimated with relatively high precision. That approach was based on statistical measures of sample pairs that are highly sensitive to LSB embedding operations [4]. Brian Chen and Gregory W. Wornell, in 2001, described the problem of embedding one signal (e.g., a digital watermark) within another "host" signal to form a third, "composite" signal [5]. P. Moulin and J. A. O'Sullivan, in 2000, analyzed information hiding as a communication game between an information hider and an attacker, in which side information is available to the information hider and to the decoder; they derived several capacity formulas [6].
Pierre Moulin and M. Kıvanç Mıhçak, in 2002, described an information-theoretic model for image watermarking and data hiding. Recent theoretical results were used to characterize the fundamental capacity limits of image watermarking and data-hiding systems. Capacity was determined by the statistical model used for the host image, by the distortion constraints on the data hider and the attacker, and by the information available to the data hider, to the attacker, and to the decoder. They considered autoregressive, block-DCT and wavelet statistical models for images and computed the data-hiding capacity for compressed and uncompressed host-image sources [7]. Ching-yung Lin and Shih-fu Chang, in 1998, pursued a different goal from that of image watermarking, which embeds into the image a signature surviving most manipulations: they described an effective technique for image authentication which can prevent malicious manipulations but allow JPEG lossy compression. The authentication signature was based on the invariance of the relationship between DCT coefficients at the same position in separate blocks of an image [8]. S. Pavan, G. Sridhar and V. Sridhar, in 2005, proposed a hybrid image registration algorithm to identify the spatial or intensity variations between two color images. The approach extracts salient descriptors from the two images using a multivariate entropy-based detector; the transformation parameters are obtained after establishing the correspondence between the salient descriptors of the two images [9]. H. H. Pang, K. L. Tan and X. Zhou, in 2004, introduced StegFD, a steganographic file driver that securely hides user-selected files in a file system so that, without the corresponding access keys, an attacker would not be able to deduce their existence; they proposed two schemes for implementing steganographic B-trees within a StegFD volume [10].
3 Our Work

The presented work emphasizes information and image protection against unauthorized access in the frequency domain. DFTIAT uses a gray-scale image of size M x N to be authenticated. The technique inserts an authenticating message or image Xm,n of size at most (M/2 * N/2 * 3) - 16 bits, as the first 16 bits hold the dimensions of the file. The DFT given in equation (1) is used to transform the image from the spatial domain to the frequency domain. The encoding and decoding schemes are given in Fig. 1 and Fig. 2, respectively.
Fig. 1. Encoding scheme using DFTIAT
Fig. 2. Decoding scheme using DFTIAT
The detailed algorithms for insertion and extraction are described in Sections 3.1 and 3.2.

3.1 Insertion Algorithm

a. Take a message file or image whose size is less than or equal to (M/2 * N/2 * 3) - 16 bits, where M x N is the size of the cover image.
b. Take a 2 x 2 window of the cover image in sliding-window manner and repeat steps c and d until the end of the cover image.
c. Apply the Discrete Fourier Transformation.
d. Consider the real part of the frequency components and do the following:
   • Take the three frequency component values other than the first one.
   • Consider the Least Significant Bit position of each DFT component.
   • Replace the bit by one authenticating bit.
e. Apply the Inverse Fourier Transformation.
f. Stop.

3.2 Extraction Algorithm

a. Take the authenticated image as input.
b. Consider a 2 x 2 mask of the input image at a time and repeat steps c and d until the end of the embedded image.
c. Apply the Discrete Fourier Transformation.
d. Consider the real part of the frequency components and do the following:
   • Take the three frequency component values other than the first one.
   • Extract the Least Significant Bit.
   • Replace this bit position by '1' or by '0'.
e. Apply the Inverse Fourier Transformation.
f. Stop.
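A minimal runnable sketch of one insertion step, under our reading of the algorithm above, is given below. It uses the closed-form 2 x 2 DFT (all coefficients real, as shown in Section 1); the rounding and clamping back to 8-bit pixels is our assumption, since the algorithm leaves this quantization step open:

public class Dft2x2Embed {
    // Forward 2 x 2 DFT (equation (1) with M = N = 2): the kernel reduces to
    // (-1)^(ux+vy), so all four coefficients are real. f = {f00, f01, f10, f11}.
    static int[] dft(int[] f) {
        return new int[] {
            f[0] + f[1] + f[2] + f[3],
            f[0] - f[1] + f[2] - f[3],
            f[0] + f[1] - f[2] - f[3],
            f[0] - f[1] - f[2] + f[3]
        };
    }

    // Step d of the insertion algorithm: put three authenticating bits into the
    // LSBs of the real coefficients, skipping the first one, F(0,0).
    static int[] embed(int[] F, int[] bits) {
        for (int k = 1; k < 4; k++) F[k] = (F[k] & ~1) | bits[k - 1];
        return F;
    }

    // Inverse DFT (equation (5) with M = N = 2). Rounding and clamping back to
    // 8-bit pixels is an assumption; note that rounding can perturb the
    // embedded LSBs, a point the algorithm does not address.
    static int[] idft(int[] F) {
        int[][] s = {{1, 1, 1, 1}, {1, -1, 1, -1}, {1, 1, -1, -1}, {1, -1, -1, 1}};
        int[] f = new int[4];
        for (int i = 0; i < 4; i++) {
            int acc = 0;
            for (int k = 0; k < 4; k++) acc += s[i][k] * F[k];
            f[i] = Math.min(255, Math.max(0, Math.round(acc / 4.0f)));
        }
        return f;
    }

    public static void main(String[] args) {
        int[] F = embed(dft(new int[] {120, 121, 119, 122}), new int[] {1, 0, 1});
        System.out.println(java.util.Arrays.toString(idft(F)));
    }
}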
4 Result

In this section the results are analyzed, and comparative studies are made between the proposed technique and S-Tools in terms of a test for homogeneity (the Chi-Square test) and histogram analysis. Section 4.1 illustrates the Chi-Square test; Section 4.2 deals with the histogram analysis.
Fig. 3a. Hill
Fig. 3b. Lotus
Fig. 3c. DFTIAT
Fig. 3d. S-Tools
The comparison of visual fidelity when embedding 'Lotus' using DFTIAT and S-Tools is shown in Fig. 3a-3d. Fig. 3a shows the source image 'Hill', Fig. 3b the authenticating image 'Lotus', and Fig. 3c and Fig. 3d the embedded images produced by the proposed algorithm and by S-Tools, respectively. The authenticating image 'Lotus' has been embedded into the source image 'Hill'. Some differences may be observed between the source image and the image embedded by S-Tools [11], but no such differences are observed between the source image and the image embedded by the proposed technique. Fig. 4 shows the comparison of visual changes for another source image, 'Rashmancha' (Fig. 4a), embedded with the same 'Lotus' image (Fig. 4b); the results are the embedded images using the proposed algorithm (Fig. 4c) and S-Tools (Fig. 4d). Here, too, some variations may be observed between the source image and the image embedded by S-Tools, but no such differences are observed between the source image and the image embedded by DFTIAT.
Fig. 4a. Rashmancha
Fig. 4b. Lotus
Fig. 4c. DFTIAT
Fig. 4d. S-Tools
Comparison of visual fidelity in embedding ‘Lotus’ using DFTIAT and S-Tools are shown in Fig.4a-4d. 4.1 Chi-Square Test The Chi-Square test has been performed for the source image and authenticated image, and also for the Authenticating Image & Extracted Image. The values of chi-squares
are given in Table 1 for different images. They show that the calculated Chi-Square value is less than the tabulated Chi-Square value for the given level of significance, which indicates the homogeneity of the images; the results are more significant at the 1% level of uncertainty. For the authenticating and extracted images the Chi-Square value is zero, that is, we are able to extract the original image without any noise.

Table 1. Comparison of Chi-Square values in DFTIAT

Images                                         File Size   Uncertainty   Degree of   Calculated    Tabulated
                                                                         freedom     Chi-Square    Chi-Square
Source and authenticated Hill image by Lotus   98 x 130    0.01          255         264.219       310.457
Authenticated and Extracted Hill Image         98 x 130    0.001         255         315.089       347.650
Source and Authenticated Rashmancha Image      98 x 130    0.01          255         241.284       310.457
Authenticating Image & Extracted Image         33 x 26     0.01          255         0.00          310.457
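The homogeneity statistic of Table 1 can be computed from the two 256-bin gray-level histograms. The exact binning used is not stated, so the following is one plausible reading, with the source-image histogram as the expected distribution:

public class ChiSquareHomogeneity {
    // Pearson chi-square over 256 gray levels (255 degrees of freedom).
    static double chiSquare(long[] observed, long[] expected) {
        double chi = 0;
        for (int i = 0; i < observed.length; i++) {
            if (expected[i] == 0) continue; // skip empty expected bins
            double d = observed[i] - expected[i];
            chi += d * d / expected[i];
        }
        return chi;
    }

    public static void main(String[] args) {
        long[] src = new long[256], emb = new long[256];
        java.util.Arrays.fill(src, 50);          // toy histograms
        java.util.Arrays.fill(emb, 50);
        emb[0] = 55; emb[255] = 45;
        System.out.println(chiSquare(emb, src)); // compare with 310.457 (1% level)
    }
}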
4.2 Histogram Analysis

Histogram analysis has also been performed between the source image 'Hill' and the images embedded with 'Lotus' using the proposed technique and S-Tools. Noticeable differences are observed in the frequency distribution of pixel values between the source image and the image embedded using S-Tools, but only very small variations are observed between the source image and the image embedded using the proposed technique. Fig. 5 shows the histograms for the source image 'Hill' and its embedded versions: Fig. 5a is the histogram of the source image 'Hill', Fig. 5b shows the histogram of the image embedded with the 'Lotus' image using the proposed technique, and Fig. 5c is the histogram of the image embedded with the 'Lotus' image using S-Tools. It is clearly seen that with the proposed technique the histogram remains almost identical to that of the source image even after embedding the 'Lotus' image, whereas with S-Tools there is a noticeable change in the histogram compared to that of the source image 'Hill'. From these observations it may be inferred that the proposed technique obtains better embedding performance. Histogram analyses have also been done for the other source image, 'Rashmancha', as depicted in Fig. 6.
Fig. 5a. Hill
Fig. 5b. DFTIAT
Fig. 5c. S-Tools
Histograms for the source image 'Hill' and the images embedded using DFTIAT and S-Tools are shown in Fig. 5a-c.
Fig. 6a. Rashmancha
Fig. 6b. DFTIAT
Fig. 6c. S-Tools
Histograms for the source image 'Rashmancha' and the images embedded using DFTIAT and S-Tools are shown in Fig. 6a-c.
5 Conclusion

The technique proposed in this paper implements an image authentication and secret message transmission scheme. The algorithm performs bit-level message or image insertion and extraction in the frequency domain. Using DFTIAT we are also able to recover the source image. In this technique a 2 x 2 window is selected for a better authentication result, and insertion and extraction are done in the frequency domain instead of the spatial domain for more security. From the Chi-Square test, the histogram analysis and the comparison with S-Tools, the proposed technique may be expected to obtain better results.
References 1. EL-Eman, N.N.: Hiding a large Amount of data with High Security Using Steganography Algorithm. Journal of Computer Sciences, 223–232 (April 2007) 2. Amin, P.K., Liu, N., Subbalakshmi, K.P.: Statistically Secure Digital Image Data Hiding. In: IEEE 7th Workshop Multimedia Signal Processing, Shanghai, October 2005, pp. 1–4 (2005)
3. Chandramouli, R., Memon, N.: Analysis of LSB based image steganography techniques. In: International Conference on Image Processing, Thessaloniki, Greece, pp. 1019–1022 (2001) 4. Dumitrescu, S., Wu, X., Wang, Z.: Detection of LSB steganography via sample pair analysis. IEEE Transactions on Signal Processing 51(7), 1995–2007 (2003) 5. Chen, B., Wornell, G.W.: Quantization Index Modulation: A Class of Provably Good Methods for Digital Watermarking and Information Embedding. IEEE Trans. on Information Theory 47, 1423–1443 (2001) 6. Moulin, P., O’Sullivan, J.A.: Information-theoretic analysis of information hiding. In: IEEE International Symposium on Information Theory, Sorrento, Italy, p. 19 (June 2000) 7. Moulin, P., Kıvanc Mıhcak, M.: A Framework for Evaluating the Data-Hiding Capacity of Image Sources. IEEE Trans. on Image Processing 11, 1029–1042 (2002) 8. Lin, C.-Y., Chang, S.-F.: A robust image authentication method surviving JPEG lossy compression. In: SPIE, pp. 296–307 (1998) 9. Pavan, S., Sridhar, G., Sridhar, V.: Multivariate entropy detector based hybrid image registration. In: IEEE ICASSP 2005, March 18-23, vol. 2, pp. ii/873–ii/876 (2005) 10. Pang, H.H., Tan, K.L., Zhou, X.: Steganographic schemes for file system and B-tree. IEEE Transaction on Knowledge and Data Engineering 16(6), 701–713 (2004) 11. http://digitalforensics.Champlain.Edu/downloads/s-tools4.zip
Design of Guaranteeing System of Service Quality through the Verifying Users of Hash Chain*

Yoon-Su Jeong1,**, Yong-Tae Kim2, and Gil-Cheol Park2

1 Department of Computer Science, Chungbuk National University,
410 Seongbong-ro, Heungdeok-gu, Cheongju, Chungbuk 361-763, Korea
[email protected]
2 Department of Multimedia Engineering, Hannam University,
133 Ojeong-dong, Daeduk-gu, Daejeon, Korea
{ky7762,gcpark}@hnu.kr
Abstract. Nowadays cloud computing, a technique that lets users consume services effectively whenever and wherever needed and removes the divisions between individual resources, is soaring. However, in the cloud computing environment, where all information is centralized, there is no sufficient method for users to access data safely. In this paper, a service quality guaranteeing technique is proposed in which, before receiving the data a company provides, a user checks the service quality information evaluated by users who have previously accessed that data, and then decides whether or not to use it. The proposed technique guarantees data integrity when a user accesses data, by tying the users who access the data into a hash chain, and guarantees trust in data access by making the users, not the companies, manage the security of the data.

Keywords: Cloud Computing, Hash Chain, Service Quality, Guaranteeing System.
1 Introduction

Recently cloud computing, a computing environment that provides data storage, networking, content use and other IT-related services through Internet servers, has become a new trend. Cloud computing saves users' information permanently on Internet servers and temporarily on clients such as desktop and tablet computers, notebooks, netbooks and smart phones, and users can use that information whenever and wherever through their IT equipment [1,2]. But there are many issues to deal with when the existing computing environment is converted into a cloud environment, above all the security problem. Because a cloud computing service allocates and manages separate resources to protect data, the stability is higher than with personal management; but users must give up the right to control sensitive data, and if any accident happens the damage can be serious. Therefore there are problems of company secrets and personal privacy [3,4].
* This work was supported by the Security Engineering Research Center, granted by the Korea Ministry of Knowledge Economy.
** Corresponding author.
Generally, companies and individuals using cloud computing can greatly reduce costs such as updates, server set-up and purchase, software purchase, system management, time and manpower, and it can also contribute to reducing energy use. Data saved on a PC can be lost, whereas in cloud computing data is saved on an outside server, so it can be stored securely, the storage-limit problem is overcome, and personal data can be viewed or modified whenever and wherever. But if the server is hacked, personal information can be leaked, and if a server failure happens, users cannot access their data. Especially in the case of a company, information leakage or damage causes monetary problems, so some people avoid cloud computing.
In this paper, a service quality guaranteeing technique is proposed: before receiving the data a company provides, a user checks the service quality information evaluated by users who have previously accessed that data and decides whether or not to use it. The technique ties the users who access the data into a hash chain to guarantee data integrity, and puts the security of data access in the hands of the users rather than the companies to guarantee trust.
The composition of this paper is as follows. In Chapter 2, we review the mainstream of cloud computing and its security demands. In Chapter 3, we propose a service quality guaranteeing system that allows secure access to the data companies provide, by adapting hash chains to user verification. In Chapter 4, we analyze the proposed system and draw conclusions.
2 Related Works

2.1 Cloud Computing

Cloud computing is a technique which provides plural datacenters located in different places after unifying them by virtualization technology [1]. A datacenter is a place that gathers various computing resources, such as users' PCs or server farms. Cloud computing is a user-centered computing environment in which work can be performed by driving applications such as a web browser through multiple devices, such as personal computers and mobile devices, after saving programs or documents on huge computers accessible over the Internet [2]. Datacenters showing great computing capacity can also be found in grid computing, but there is a difference between grid computing and cloud computing: grid computing processes tasks by tying distributed computers together as one computer, whereas cloud computing is more personalized, as a method to provide the gathered computing resources properly to the people who need them [5]. At present, Google Docs (Google), Work Space (Microsoft) and Acrobat.com (Adobe) provide cloud computing services.
Users request a service through the service catalog provided by the service provider in cloud computing. The system management module of the service provider supplies resources through a virtual server network upon this request. Users do not know details such as how the service is provided, where their data and information are saved, or which server is used; they only use the service. Users can use high quality services and perform tasks that need high-capacity computing resources and mass storage through the Internet if they have a device that can compute and access the Internet [2].
Fig. 1. Cloud Service Network
Cloud computing is characterized by service orientation, scalability and elasticity, sharing, measured real consumption, and Internet technology. First, 'service orientation' means that a person or company consumes and creates IT resources as a service rather than owning them as a personal asset and operating them. Second, scalability means the solidity of the service construction in reacting to changes of scale, and elasticity means the agility of service provision in reacting to changes in usage patterns and the amount of service. Third, sharing means a form of consumption in which a person or an organization owns IT assets not personally but jointly; this leads to the standardization and optimization of asset utilization by unifying isolated and various computational resources into logically single resources. Fourth, measurement of real consumption means measuring the actual consumption of IT resources as accurately as possible, separating the charge for the necessary IT resources from the investment cost in the facilities producing them, so that IT resources can be used like consumer goods. Fifth, Internet technology provides the standardized infrastructure, connected whenever and wherever, that is needed for IT resources to be used like consumer goods; the most effective way to realize this is to adopt the technical standards related to the Internet infrastructure that prevail all over the world [6-10].

2.2 Security Issues of Cloud Computing

Cloud computing does not own IT resources but takes the form of partial or full outsourcing, and it therefore has security problems. The security issues of cloud computing can be separated into the personal view and the company view [2].

2.2.1 Personal Users' View
Personal users typically prefer free services and use e-mail, blogs, communities, uploading of pictures and files, and sharing services. The security issues from the personal user's view are:
· Leaking of personal information
· Surveillance of individuals
· Processing of personal information for economic purposes
2.2.2 Company Users' View
Company users want the IT assets they own to be provided in the form of a cloud, but they do not want to share their data with others. Company users are willing to pay if security and stability are assured; sometimes they run the cloud directly as a private cloud. The security issues that company users worry about are:
· Suspension of service
· Damage to company information
· Leaking of company information
· Leaking of clients' information
· Compliance with laws and regulations
· Response to e-discovery
As shown above, the demands of personal users and company users on the security of cloud computing differ: personal users emphasize anonymity, while company users emphasize compliance.
3 Design of Guaranteeing System of Service Quality through the Verifying Users

In this chapter, a guaranteeing system of data access service quality is proposed which lets users, not the company, manage the authentication of data quality and which guarantees the stability of data access, by gathering the users into a hash chain so that the data services a company provides can be accessed securely.

3.1 System Structure

The management server saves and manages the quality information about the data in the database when a user accesses data, and the service provider guarantees the security of the data access. The management server also saves data access information and authentication information about users. A user accesses data after requesting it from the company through a user device such as a PC (Personal Computer). A WAP proxy gateway provides WTLS in the wireless segment and SSL in the wired segment to secure traffic based on WAP (Wireless Application Protocol). The contents server provides contents management and serves the data users select. To be served data, users authenticate their devices and themselves through the data quality information saved in the database and the authentication information saved in the authentication server. Once the user information and data quality information have been saved and renewed in the database, the device authentication process is not performed again when the user obtains data from the contents server. The contents service prevents illegal users from using a user's authentication number through the communication device authentication server. A user is supported with the service after being authenticated by the contents server upon requesting the service.
Fig. 2. System structure of proposed system
3.2 Notations

The notations used in the proposed technique are defined in Table 1.

Table 1. Parameters

Notation     Explanation
U            User
PK           Public Key
NX           Random number created by X
Cert         Certification
KSH          Public key pre-shared between user and content server
KUS          Secret key of user
h()          One-way hash chain
E(X, D)      Encryption of D by X
MACX(D)      MAC of D by X
3.3 User Access Control by Hash Chain

The proposed technique to control users' data access using a hash chain performs an analysis and comparison between the user information served by the authentication server, via a random number, and a pseudo-random number at the content server. A public key algorithm is used to guarantee the security of the information transmitted to the content server. The proposed technique assumes that the user and the content server share the pre-agreed public key KSH. The user sends message (1), which includes the user's random number NU and certification Cert, to the content server using the public key KSH:

E(PKCS, NU), MAC(KSH, NU, Cert)    (1)
The content server obtains permission information from the authentication server by delivering the random number NCS, which it creates itself, and the Authentication State Information (ASI) to the authentication server. In the information that is delivered to
the authentication server, the user's secret key KUS, enrolled with the authentication server beforehand, and the random numbers (NU, NCS) are applied to the one-way hash function, so that the user can obtain data access:

E(PKCS, NU, NCS), h(MAC(KUS, NU, NCS, ASI), Cert)    (2)
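A hedged sketch of how the verifier value in (2) might be computed follows. The choice of HMAC-SHA256 for the MAC, SHA-256 for h(), and the byte-level encodings are our assumptions, since concrete primitives are not fixed here:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class HashChainVerifier {
    // Computes h(MAC(KUS, NU || NCS || ASI), Cert) as in expression (2).
    static byte[] verifier(byte[] kUS, byte[] nU, byte[] nCS,
                           byte[] asi, byte[] cert) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");            // assumed MAC
        mac.init(new SecretKeySpec(kUS, "HmacSHA256"));
        mac.update(nU);
        mac.update(nCS);
        mac.update(asi);
        byte[] tag = mac.doFinal();

        MessageDigest h = MessageDigest.getInstance("SHA-256"); // assumed h()
        h.update(tag);
        h.update(cert);
        return h.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] v = verifier("kUS-secret".getBytes(StandardCharsets.UTF_8),
                new byte[] {1, 2}, new byte[] {3, 4}, new byte[] {0},
                "cert".getBytes(StandardCharsets.UTF_8));
        System.out.println(java.util.Arrays.toString(v));
    }
}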
The authentication server renews the user's Authentication State Information (ASI) at the content server by delivering the ASI to the content server after verifying the user's authentication state information saved in the database. The content server checks the user's service request state based on the renewed ASI; if the user's service request and the user's ASI at the content server do not match, the user's request is ignored.

3.4 Data Quality Information Management

The content server, which confirms the data quality information about to be saved in the management server from the certification server, performs the user's access confirmation. To get data, the user enrolls a certification number and a device serial number with the management server beforehand; after this pre-enrolled information is confirmed, the service is provided through the content server. If the user's state information is changed, it must be approved by the server, otherwise the service is not available. The user receives a service approval/denial message after delivering a 2-byte (16-bit) authentication packet along with the service request message to the authentication server and the content server, as shown in Figure 3.
Fig. 3. Authentication Packet Information
The authentication packet of Figure 3 consists of a 2-bit authentication state, a 1-bit service quality state, a 1-bit server approval flag, a 4-bit user authentication number, an 8-bit device serial number, etc. The 2-bit authentication state represents the access approval information for the data line from the management server to the content server. For example, if the authentication state is 10 or 01, the user's information enrolled in the authentication server is searched by keyword, and then, if the server approval flag is 1, the user's information is requested in order to confirm its reliability. The 1-bit service quality state is the information used to judge the quality of the data served by the content server: if it is 1, the data service is good; if it is 0, the data service is bad. This information is managed by the users who received the data service, and a user decides whether to access the data by checking it when accessing the content server; the content manager has no authority to access this information while managing the contents. The server approval flag, 1 bit, decides whether to approve after obtaining the user's information. The user authentication number and device serial number are pre-enrolled in the authentication server and only the user knows them; the device serial number is unique to each device.
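A hedged sketch of packing and unpacking this 16-bit packet is given below; the field order follows Figure 3 as read above, but the exact bit layout is our assumption:

public class AuthPacket {
    // Layout (MSB to LSB), 16 bits total:
    // [2: auth state][1: service quality][1: server approval]
    // [4: user auth number][8: device serial number]
    static int pack(int auth, int quality, int approval, int userNo, int serial) {
        return ((auth & 0x3) << 14) | ((quality & 0x1) << 13)
             | ((approval & 0x1) << 12) | ((userNo & 0xF) << 8)
             | (serial & 0xFF);
    }

    static int authState(int p)    { return (p >> 14) & 0x3; }
    static int quality(int p)      { return (p >> 13) & 0x1; }
    static int approval(int p)     { return (p >> 12) & 0x1; }
    static int userNumber(int p)   { return (p >> 8) & 0xF; }
    static int serialNumber(int p) { return p & 0xFF; }

    public static void main(String[] args) {
        int p = pack(0b10, 1, 1, 0xA, 0x5C);
        System.out.printf("%16s -> quality=%d%n",
                Integer.toBinaryString(p), quality(p));
    }
}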
4 Conclusion

In this paper we proposed a hash-chain-based service quality guaranteeing technique with which users can access data securely and with which companies using cloud computing can control users' data access efficiently. Although the initial authentication for accessing the cloud system is normally performed by the company which provides the service, the service quality information for data access is produced by users who have already been provided the data access service, so the decision to use the data can be made by users. The proposed technique has the advantages of guaranteeing reliability of data access, by having users rather than the company manage the security of the data, and integrity of the data when users access it, by tying the users who access the data into a hash chain. In further research, the capacity of the system will be evaluated by adapting the proposed technique to a real system.
References 1. Zhang, S.A., Zhang, S.F., Chen, X.B., Huo, X.Z.: Cloud Computing Research and Development Trend. In: Second International Conference on Future Networks (ICFN 2010), pp. 93–97 (2010) 2. Gong, C.Y., Liu, J., Zhang, Q., Chen, H.T., Gong, Z.H.: The Characteristics of Cloud Computing. In: 39th International Conference on Parallel Processing Workshops (ICPPW 2010), pp. 275–279 (2010) 3. Cloud computing, http://en.wikipedia.org/wiki/Clound_computing 4. Cloud computing with Linux, http://www.ibm.com/developerworks/linux/library/ I-cloud-computing 5. Ghemawat, S., Gobioff, H., Leung, S.-T.: The Google File System. In: Proc. of ACM Symp. on Operating Systems Principles, pp. 20–43 (2003) 6. HDFS, http://hadoop.apache.org/core/docs 7. FUSE, http://fuse.sourceforge.net 8. Who uses Hadoop, http://wiki.apache.org/hadoop/PoweredBy 9. Non-standard RAID levels, http://en.wikipedia.org/wiki/Non-standard_RAID_levels 10. Data deduplication, http://en.wikipedia.org/wiki/Data_deduplication
Design of RSSI Signal Based Transmit-Receiving Device for Preventing from Wasting Electric Power of Transmit in Sensor Network*

Yong-Tae Kim1, Yoon-Su Jeong2,**, and Gil-Cheol Park1

1 Department of Multimedia Engineering, Hannam University,
133 Ojeong-dong, Daeduk-gu, Daejeon, Korea
{ky7762,gcpark}@hnu.kr
2 Department of Computer Science, Chungbuk National University,
410 Seongbong-ro, Heungdeok-gu, Cheongju, Chungbuk 361-763, Korea
[email protected]
Abstract. In a WPAN environment using the existing ZigBee communication, the transmit power output of a sensor is determined when the system is first configured. The devices composing the WPAN therefore transmit with a fixed output value, irrespective of the distance to the other device, and this causes excessive transmit power consumption; as a result, the life expectancy of the battery is reduced and interference occurs among devices. In this paper, a transmit-receiving system is proposed which measures the RSSI of the signal each device receives and then transmits with the transmit power appropriate to the receiving device, based on the measured signal strength, solving the problems of interference among devices and wasted transmit power.

Keywords: WPAN, WLAN, ZigBee communication, RSSI, WSN, Low Power Transmit.
1 Introduction

Wireless sensor networks are a technology under rapid development in recent years; they provide mobility and flexibility, are easy to compose into active networks, are easy to deploy, and have low cost. Wireless sensor networks provide low electric power consumption by relaxing the throughput requirements for sensors in industrial fields, vehicles, homes, etc., and they attempt low-cost wireless networking through simple wireless connections at the device level [1]. As wireless sensor networks developed, WPAN (Wireless Personal Area Network) appeared because of the demand for close-range wireless communication arising from the development of computer-related and information and communication technology, and the demand for wireless-based communication among various and
* This work was supported by the Security Engineering Research Center, granted by the Korea Ministry of Knowledge Economy.
** Corresponding author.
different devices; WPAN-style devices used in wireless sensor networks appeared out of the necessity for a small, low-power, low-price, environment-friendly transmission method. Forms of WPAN include 60 GHz WPAN, WiMedia UWB, mobile Bluetooth, ZigBee telecom applications, smart grid, low-speed WPAN, etc. WPAN enables communication with devices such as remote computers, peripherals, cell phones and appliances; it connects the devices wirelessly using communication methods like ZigBee, Bluetooth and UWB (Ultra Wide Band). The devices composing a WPAN consist of various sensors that detect temperature, humidity, vibration, illuminance, etc., and remote control/surveillance modules. They do not use mains electric power like AC 110~220 V but small batteries; therefore the devices composing a WPAN are required to reduce power consumption so as to guarantee electric power for months or years. So, in this paper, a transmit-receiving system is proposed which measures the intensity of the Received Signal Strength Indication (RSSI) each device receives and transmits its signal with the power appropriate to the receiving device, based on the estimated received signal strength, to solve the problems of interference among devices and wasted electric power.
This paper consists of the following: Chapter 2 analyzes techniques for increasing energy efficiency and gives an outline of wireless sensor networks. Chapter 3 proposes a technique that guarantees the scalability of the network while maximizing the energy efficiency of the wireless sensor network. Chapter 4 analyzes and evaluates the efficiency of the proposed framework. Finally, Chapter 5 describes directions for further research related to these results.
2 Related Research

2.1 Wireless Sensor Network

Though wireless sensor networks were first studied and used for military surveillance and tracing, recent applications lie in environment/ecology surveillance, energy management, distribution/stock management, and medical monitoring; wireless sensor networks are now being researched in various and widespread fields [2]. A wireless sensor network is composed of various different sensors, and the failure of some sensors does not affect the whole network; it can also be deployed freely. On the other hand, wireless sensor networks have problems such as the low transmission speed of the devices, transmission errors, the impossibility of replacing sensor nodes, and a restricted electric power supply [3].
The most important element of a wireless sensor network is reducing the electric power consumed to deliver data, because the sensor nodes, powered by small batteries, are distributed randomly over a region and can transmit information only within the capacity of the battery. It is important to design protocols suited to reducing the power consumption of sensor nodes in a wireless sensor network. Existing research on wireless sensor networks has focused on the composition of the network and
the mobility of nodes, and mostly proposes MAC (Medium Access Control) and routing protocols to increase the energy efficiency of the sensor nodes [4]. Because a wireless sensor network is a cooperative system of sensor nodes, energy efficiency requires, when the MAC and routing protocols are designed, a technique that maximizes the scalability and life expectancy of the whole network consisting of many nodes, and that also extends the lifespan of each of the nodes composing the sensor network.

2.2 WPAN and WLAN

Nowadays, in the field of personal wireless networking solutions, the WPAN (Wireless Personal Area Network) technique, used at close range up to about 10 m, is getting attention in comparison with existing wired networking techniques. WPAN offers the convenience of avoiding the cumbersome connection of several devices with cables, performs data communication wirelessly at distances of 0~50 m, and includes element technologies, such as Bluetooth, ZigBee and UWB (Ultra Wide Band), which interlink sensor and gateway, and sensor and sensor. WPAN is for wireless connections among personal, small peripherals, and its purpose is distinguished from that of WLAN [5]. A WLAN (Wireless Local Area Network) is a network accessed wirelessly from a computer to an Access Point instead of through a LAN cable. A wireless LAN lets devices in a restricted area communicate with each other, so the user can access the network while moving around within the wireless LAN's coverage area. WLAN is highly secure, but its protocol standard is complicated, its radio-wave interference is severe, and it consumes much electric power, so batteries have to be replaced often. Table 1 compares the specifications of WPAN and WLAN.

Table 1. Comparison of the specifications of WPAN and WLAN
                         WLAN                             WPAN
                         WLAN 11b       WLAN 11n          Bluetooth        ZigBee            UWB
IEEE Standard            IEEE802.11b    IEEE802.11n       IEEE802.15.1     IEEE802.15.4      IEEE802.15.3
Frequency Band           2.4GHz         5GHz              2.4GHz           915M/2.4GHz       3.1~10.6GHz
Transfer Rate            11Mbps         500Mbps           1Mbps            40/250Kbps        54Mbps
Communication Distance   50m            1Km               10m              30m               2~10m
Features                 Data           High Speed        Support Various  Low Price/Power   Short Range
                         Transmission   Transmit, Strong  Communications   Transmit;         High Speed
                                        at Interference   (Voice, Fax)     Stability         Communication
Applications             PC             Data              PC, Mobile;      Control of Low    -
                                        Transmission      Wire             Power/Price
                                                          Substitution of  Devices
                                                          Short Range
WPAN is recognized as a key technology for ubiquitous sensor networks in the next-generation digital home environment and for constructing a massive ubiquitous network connected to existing WLAN or CDMA networks. WPAN is thus expected to play an important role in the local-area wireless technology business, and various applications that can be adapted to WPAN should be developed to vitalize it.
2.3 Overview of IEEE 802.15.4 Based ZigBee

ZigBee, part of the IEEE 802.15.4 standard, supports close-range communication; like mobile phones or wireless LAN, it is a technology for close-range (10~20 m) communication and ubiquitous computing in wireless home and office networking. Compared with existing techniques, it transmits small amounts of information while minimizing electric power. It uses a dual PHY in the 2.4 GHz and 868/915 MHz frequency bands, its modulation method is direct-sequence spread spectrum (DS-SS), and its data transmission rate is 20~250 kbps [6].
ZigBee-based WPAN is a close-range wireless communication standard featuring low coverage, low-rate communication, low cost and low power; it is a technology with low power consumption, inexpensive chips and high communication stability. ZigBee is suitable for remote control, management and monitoring, and it creates demand in factory and home automation, healthcare and home networking by being combined with other IT technologies such as fingerprint, voice and biometrics. In the future, support for the ubiquitous networking environment will be necessary, and realizing sensor networking technology is the most important requirement for this. Though ZigBee has a low data transmission rate, its use will widen in the low-power WPAN business (especially in the field of automatic control), because it connects a large number of nodes for surveillance and control and uses power well compared with other techniques. In particular, it is expected to be the technique best suited to the ubiquitous environment, because it consumes less power and its chip cost is lower than Bluetooth or UWB.
3 Design of the RSSI Signal Based Transmit-Receiving Device

In the existing WPAN environment, a device transmits data with a fixed output value that is set without regard to the distance to the other device, so it uses excessive transmit power; this shortens battery life and causes interference among sensors. To solve these problems, this paper proposes a low-power transmission method that reduces electric power consumption by controlling the transmit power according to the receiving strength at each sensor.

3.1 Components of the WPAN for the Low-Power Transmit-Receiving System

The WPAN here is a ZigBee network; it composes a tree-shaped wireless network by distributing a coordinator (C), routers (R), and end devices (ED) over a defined CELL. The coordinator manages several routers and collects data from several end devices. Figure 1 shows the structure of the defined CELL that composes the WPAN. The coordinator saves the information transmitted by the routers and end devices. A router relays data communication between the coordinator and end devices, and among end devices (ED). An end device (ED) requests network access from the coordinator directly or via a router, and once admitted, it transmits and receives data packets to and from other end devices via the coordinator or a router. The devices that compose the WPAN, namely the coordinator, routers, and end devices (ED), each transmit data after setting the transmit output appropriately.
Fig. 1. Structure of WPAN CELL
The output is set according to the transmission distance, the radio environment, and the RSSI (Received Signal Strength Indication); through this, each device reduces power consumption by preventing unnecessary expenditure of transmit power when sending data.

3.2 Receiving Module Function and the Structure of the Devices Composing the WPAN

The coordinator, routers, and end devices that compose the WPAN share the same basic structure, including the antenna and antenna-related components. Each device consists of a signal receiving module, a signal transmit module, and a transmit power controller. Figure 2 shows the structure of a device composing the WPAN.
Fig. 2. Structure of Device Composing WPAN
The receiving part of the device consists of the signal receiving module with an antenna, a receiving signal processor, an Analog-to-Digital Converter (ADC), a digital signal processing module, and an RSSI making module. The signal received through the antenna is amplified by
the amplifier; the receiving signal processing module then performs frequency conversion on the I-channel and Q-channel signals, the ADC converts them into digital data, and the digital signal processing module outputs the processed received data to the RSSI making module. To maintain receiving performance, the digital signal processing module compensates the frequency deviation of the received data by as much as 80 ppm before output. The RSSI making module calculates the signal level of the relevant channel from the received data delivered by the digital signal processing module and saves the calculated RSSI value to a register. By outputting the saved RSSI value to the transmit power controller, the RSSI making module lets the controller adjust the transmit power of the power amplifier based on the RSSI of the received signal.

3.3 Transmit Module Function of the Devices Composing the WPAN

The transmit part of the device has a transmit buffer and processes the data in that buffer; conversely to the receiver, it transmits the digital data saved in the transmit buffer after converting it into an analog signal. The transmit module consists of a digital data processing module connected to the transmit buffer, a Digital-to-Analog Converter (DAC), a transmit power control module, a power amplifier, and an antenna. The digital data processing module automatically creates and adds the Preamble and Start-of-Frame fields, converts the message data saved in the transmit buffer into source data, and spreads it by mapping each 4-bit symbol onto the 32-chip PN spreading sequence proposed by IEEE 802.15.4. The spreading chip rate is 2 Mcps, and each of the sixteen 4-bit symbols, 0 (0000b) through F (1111b), is allocated a 32-chip PN code. The PN code is separated into the I (In-Phase) and Q (Quadrature-Phase) channels, and the half-sine pulse shape proposed by IEEE 802.15.4 is created; the DAC then converts the created half-sine pulses into an analog signal. The transmit power control module removes the noise present in the analog signal and performs frequency up-conversion. The power amplifier amplifies the transmit data according to the power setting of the transmit output controller, and the amplified data is transmitted to the other device through the antenna. The transmit output controller thus prevents the waste of excessive transmit power, because it adjusts the power amplifier appropriately by setting the transmit power from the RSSI signal reported by the receiver.

3.4 Function of the Transmit Power Search Module

To solve the problem of wasted transmit power in the devices composing the WPAN environment, the transmit power requires proper management. Therefore, in this paper, the coordinator, routers (R), and end devices that compose the WPAN (the ZigBee network) search for the proper transmit power toward each device and save the detected transmit power value into a transmit power management table. The work the transmit power search module performs to calculate the proper transmit power value is given by the algorithm below.
Algorithm: Calculating the transmit power value
1) Receive a data packet from the peer device that wants to transfer data.
2) The RSSI making module of the signal receiving module calculates the RSSI value from the received data and saves it to a register.
3) The data transmit module decides the transmit power from the saved RSSI value.
4) The transmit power should be adjusted so that the received level falls within -85 to -70 dBm, the normal standard level for data transmission and reception, and then used for transmission.
5) Request the receiving RSSI value from the peer device using the adjusted transmit power, to confirm that the peer device receives the transmission properly.
6) Detect the RSSI value from the received packet when the requested answer arrives from the peer device.
7) If the detected RSSI value is in the normal range, save the decided transmit power value as the transmit power value for the peer device in the power management table and end the transmit power search procedure.
8) If the detected RSSI value is out of the normal range, repeat the transmit power search procedure until a transmit power in the normal range is found.
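The following Python sketch condenses this procedure; it is illustrative only, and the device object with its set_transmit_power, request_rssi_report, and power_table members, as well as the power limits and step size, are our assumptions rather than values taken from the design:

    # Illustrative sketch of the transmit power search; all device APIs hypothetical.
    NORMAL_RSSI_RANGE = (-85, -70)  # dBm; the normal receiving level from step 4

    def rssi_in_normal_range(rssi_dbm):
        return NORMAL_RSSI_RANGE[0] <= rssi_dbm <= NORMAL_RSSI_RANGE[1]

    def search_transmit_power(device, peer, p_min=-25, p_max=0, step=1, max_tries=32):
        """Adjust the transmit power (dBm) until the peer reports an RSSI in the
        normal range, then record the value in the power management table."""
        power = p_max                                # start from the maximum setting
        for _ in range(max_tries):
            device.set_transmit_power(power)         # step 4: adjust the output
            rssi = device.request_rssi_report(peer)  # steps 5-6: peer reports its RSSI
            if rssi_in_normal_range(rssi):
                device.power_table[peer] = power     # step 7: save and finish
                return power
            if rssi > NORMAL_RSSI_RANGE[1]:          # too strong: lower the output
                power = max(p_min, power - step)
            else:                                    # too weak: raise the output
                power = min(p_max, power + step)
        return None                                  # search budget exhausted (step 8)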
4 Conclusion

The proposed RSSI signal based transmit-receiving device prevents the waste of transmit power: following the proposed algorithm, it uses the RSSI value of the received data to search for and calculate the proper transmit output toward each device in the network, and it saves the result to the transmit power management table. It prevents excessive transmit power by controlling the transmit output according to the frequency environment, and by controlling the transmission distance and receiving intensity toward each peer device using the saved transmit power value. Whenever the network is updated, the transmit power management table can also be updated by searching the proper transmit power for each device again with the proposed algorithm.
References
1. Ferro, E., Potorti, F.: Bluetooth and Wi-Fi wireless protocols: a survey and a comparison. IEEE Wireless Communications 12(1), 12–16 (2005)
2. Akyildiz, I.F., Su, W., Sankarasubramaniam, Y., Cayirci, E.: A survey on sensor networks. IEEE Communications Magazine 40, 102–114 (2002)
3. Liang, S.H.L., Croitoru, A., Vincent Tao, C.: A distributed geospatial infrastructure for Sensor Web. Computers & Geosciences 31(2), 221–231 (2005)
4. Raymond, D.R., Marchany, R.: Effects of Denial-of-Sleep Attacks on Wireless Sensor Network MAC Protocols. IEEE Transactions on Vehicular Technology 58 (2009)
5. Xia, Q., Hamdi, M.: Smart sender: a practical rate adaptation algorithm for multirate IEEE 802.11 WLANs. IEEE Transactions on Wireless Communications 7 (2008)
6. Kim, H.S., Song, J.-H., Lee, S.: Energy-Efficient Traffic Scheduling in IEEE 802.15.4 for Home Automation Networks. IEEE Transactions on Consumer Electronics 53(2) (2007)
User Authentication in Cloud Computing

Hyokyung Chang and Euiin Choi∗

Dept. of Computer Engineering, Hannam University, Daejeon, Korea
[email protected], [email protected]
Abstract. As Cloud Computing spreads widely, users and service providers can use resources or services cheaply and easily without owning all the resources needed. However, Cloud Computing has some security issues, such as virtualization technology security, massive distributed processing technology, service availability, massive traffic handling, application security, access control, and authentication and passwords. Among them, user authentication requires a highly guaranteed level of security. Hence, this paper briefly discusses the technologies of access control and user authentication and looks at their problems.
Keywords: Cloud Computing, Security, Access Control, User Authentication.
1 Introduction

In October 2009, Gartner announced its Top 10 strategic IT technologies for 2010. The strategic technologies Gartner mentioned are those that will importantly affect enterprises for the next three years and have a powerful effect on IT and business. They may affect long-term plans, programs, and major projects of enterprises, and they can give enterprises strategic advantages if adopted with a head start. Cloud Computing got the top rank (2nd rank in 2009)[1][2]. A Cloud Computing system makes all information shared in the virtual environment and lets users utilize software, platforms, and infrastructure environments in real time as they want; it can therefore be the future computing system that enables information to be controlled, managed, retrieved, and adjusted, and it has potentially unlimited availability[3]. There are security issues such as virtualization technology security, massive distributed processing technology, service availability, massive traffic handling, application security, access control, and authentication and passwords in the cloud computing service environment [4]. This paper looks at general user authentication services in the Cloud Computing environment and discusses the related problems. The composition of this paper is as follows: Chapter 2, Related Works, looks at the background of the appearance of Cloud Computing and its technical features; Chapter 3 describes user authentication services in Cloud Computing and their problems; and Chapter 4 draws the conclusions.

∗ Corresponding author.
2 Related Works

2.1 Definition of Cloud Computing and the Background of Its Appearance

Gartner defines cloud computing as "a style of computing where scalable and elastic IT-related capabilities are provided 'as a service' to external customers using Internet technologies [14][11]." Cloud Computing is a kind of on-demand computing method that lets users use IT resources such as network, server, storage, service, application, and so on via the Internet when they need them, rather than owning them[5]. Cloud Computing can be considered the sum of SaaS (Software as a Service) and utility computing, and Figure 1 shows the roles of users and providers in Cloud Computing under this concept[11]. The first appearance of the concept of Cloud Computing was John McCarthy's remark in the 1960s that computation might someday be organized as a public utility[6][7]. The term "Cloud" originated in the telecommunication world of the 1990s, when providers started using virtual private networks (VPN) for data communication[8]. Since then, Cloud Computing has been developing rapidly because of IT infrastructure such as the spread and high speed of wired/wireless communication networks, a variety of tool sets, the spread of free software, and so on[9].
Fig. 1. Users and Providers of Cloud Computing
2.2 Technical Features of Cloud Computing and Challenges

Cloud Computing provides three representative virtualized services by using Internet technology like Web 2.0: SaaS (Software as a Service), PaaS (Platform as a Service), and IaaS (Infrastructure as a Service). SaaS is the service that provides the functions of application software through the Internet; PaaS is the service that provides the platforms on which applications run, or development environments and tools, in the Web environment of the Internet; and IaaS is the service that provides the computing capability of servers or hardware such as storage[10]. The Cloud Computing challenges that M. D. Dikaiakos et al. mentioned are software/hardware architecture, data management, Cloud interoperability, security and privacy, and service provisioning and Cloud economics[7]. Cloud Computing is a way of outsourcing some or all of the IT resources rather than owning them all. Therefore, the issue of the security of Cloud Computing cannot help being raised, and it
can be divided into two consumer areas: the first is security from the point of view of the individual user, and the second is security from the point of view of the enterprise user[12]. This paper deals with security from the point of view of the individual user. All Cloud Computing should provide a proper security method that includes user authentication, authorization, confidentiality, and a certain level of availability in order to confront the security problem of leakage of personal information [10].

2.3 Security Technology in the Cloud Computing Environment

There are no concrete security technologies specific to Cloud Computing; however, if we regard Cloud Computing as an extension of existing IT technologies, it is possible to divide some of them by each component of Cloud Computing and apply them[12]. Access control and user authentication are representative of the security technologies used for platforms. Access control is the technology that keeps a process in the operating system from approaching the area of another process. There are DAC (Discretionary Access Control), MAC (Mandatory Access Control), and RBAC (Role-Based Access Control). DAC lets a user establish, as he/she pleases, the access authority to the resources that he/she owns. MAC establishes and uses vertical/horizontal access rules in the system according to the security level and area of the resources. RBAC gives access authority to a user group based on the role the group plays in the organization, not to an individual user; RBAC is widely used because it fits commercial organizations well[12]. Technologies used to authenticate a user are Id/password, Public Key Infrastructure, multi-factor authentication, SSO (Single Sign On), MTM (Mobile Trusted Module), and i-Pin[12].
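As a minimal illustration of the role-based model just described, the following sketch grants access only through roles; the role names, permissions, and assignments are invented for the example:

    # Minimal RBAC sketch; roles, permissions, and assignments are illustrative only.
    ROLE_PERMISSIONS = {
        "doctor": {"read_record", "write_record"},
        "nurse":  {"read_record"},
    }
    USER_ROLES = {"alice": {"doctor"}, "bob": {"nurse"}}

    def is_allowed(user, permission):
        # Access authority is granted through roles, never directly to the user.
        return any(permission in ROLE_PERMISSIONS.get(role, set())
                   for role in USER_ROLES.get(user, set()))

    assert is_allowed("alice", "write_record")
    assert not is_allowed("bob", "write_record")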
3 User Authentication in the Cloud Computing Environment

3.1 User Authentication Technology in the Cloud Computing Environment

Users in the Cloud Computing environment have to complete the user authentication process required by the service provider whenever they use a new Cloud service. Generally, a user registers by offering personal information, and the service provider issues the user's own ID (identification) and an authentication method once registration is done. The user then uses this ID and authentication method to perform user authentication whenever accessing a Cloud Computing service[13]. Unfortunately, the characteristics and safety of the authentication method can be compromised by an attack during the authentication process, which could cause severe damage. Hence, user authentication for Cloud Computing must provide not only security but also interoperability. User authentication in Cloud Computing is generally done in the PaaS layer, and the representative authentication security technologies are Id/password, Public Key Infrastructure, multi-factor authentication, SSO (Single Sign On), MTM (Mobile Trusted Module), and i-Pin, as mentioned in Chapter 2[12].
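In outline, the register-then-authenticate cycle described above might look like the sketch below; the in-memory user store and function names are hypothetical, not part of any particular provider's API:

    # Hypothetical outline of the register/authenticate cycle described above.
    import hashlib, os, secrets

    users = {}  # user_id -> (salt, password_hash); stands in for the provider's store

    def register(user_id, password):
        salt = os.urandom(16)
        users[user_id] = (salt, hashlib.sha256(salt + password.encode()).hexdigest())

    def authenticate(user_id, password):
        """Run before each access to a new Cloud service, as described above."""
        if user_id not in users:
            return False
        salt, stored = users[user_id]
        candidate = hashlib.sha256(salt + password.encode()).hexdigest()
        return secrets.compare_digest(candidate, stored)

    register("user01", "pa55word")
    assert authenticate("user01", "pa55word")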
3.2 Weaknesses of User Authentication Technology in the Cloud Computing Environment

The representative user authentication security technologies described above have some weaknesses. This section looks briefly at each of them[12][14].
- Id/password: the representative user authentication method. It is simple and easy to use, but it needs a certain level of complexity and regular renewal to stay secure.
- PKI (Public Key Infrastructure): an authentication means using public-key cryptography. It enables authentication of the other party based on a certificate, without shared secret information. In the PKI structure, it is impossible to manage and inspect the processing on the client side.
- Multi-factor: a method to heighten the security intensity by combining a few means of authentication: Id, password, biometrics such as fingerprint and iris, certificate, OTP (One-Time Password), and so on. OTP information, like Id/password, can still be disclosed to an attacker.
- SSO (Single Sign On): a kind of passport; once it obtains authentication from one site, the user can go through to other sites with an assertion and no further authentication process. The representative standard for assertions is SAML.
- MTM (Mobile Trusted Module): a hardware-based security module, a standard proposed by the TCG (Trusted Computing Group), in which Nokia, Samsung, France Telecom, Ericsson, and others take part. It is mainly applied to authenticate terminals in telecommunications; however, it is being considered as a Cloud Computing authentication method together with the SIM (Subscriber Identity Module) because of the generalization of smartphones[15].
- i-Pin: a technique currently used in Korea to confirm a user's identity when the user uses the Internet. It operates in such a way that the organization that itself performed the identification of the user issues the assertion.
Among the security technologies mentioned above, Id/password and OTP may suffer a key-logging attack or an SSL strip attack and can be disclosed to the attacker.
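To make the multi-factor idea concrete, the sketch below pairs a password check with a time-based one-time password in the style of RFC 6238; the shared secret and the way the password result is obtained are assumptions of the example:

    # Illustrative second factor: a time-based one-time password (RFC 6238 style).
    import hmac, hashlib, struct, time

    def totp(secret: bytes, period=30, digits=6, now=None):
        counter = int((now if now is not None else time.time()) // period)
        mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
        offset = mac[-1] & 0x0F                       # dynamic truncation (RFC 4226)
        code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
        return str(code % 10 ** digits).zfill(digits)

    def two_factor_login(password_ok: bool, secret: bytes, submitted_code: str):
        # Both factors must pass; a stolen password alone is not enough.
        return password_ok and hmac.compare_digest(totp(secret), submitted_code)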
4 Conclusions

This paper took a close look at access control and user authentication, the security technologies used in the platform in the Cloud Computing environment. In the Cloud Computing environment, misuse of access authority to resources and leakage of the personal information used to authenticate a user can have a faster and more powerful effect than in a mono-system. For effective user authentication in the Cloud Computing environment, the authentication technologies described above should be combined suitably, or a secure user authentication method fit for the purpose of Cloud Computing should be developed. As further research, a proper user authentication service model and protocol for Cloud Computing should be designed and developed.

Acknowledgments. This work was supported by the Hannam University Research Fund, 2011.
References
1. Lee, T.: Features of Cloud Computing and Offered Service State by Individual Vendor. Broadcasting and Communication Policy 22, 18–22 (2000)
2. http://www.gartner.com/technology/symposium/2009/sym19/about.jsp
3. Lee, J., Choi, D.: New Trend of IT Led by Cloud Computing. In: LG Economic Research Institute (2010)
4. Lee, H., Chung, M.: Context-Aware Security for Cloud Computing Environment. IEEK 47, 561–568 (2010)
5. Kim, J., Kim, H.: Cloud Computing Industry Trend and Introduction Effect. In: IT Insight, National IT Industry Promotion Agency (2010)
6. http://en.wikipedia.org/wiki/Cloud_computing#History
7. Dikaiakos, M.D., et al.: Cloud Computing: Distributed Internet Computing for IT and Scientific Research. IEEE Internet Computing, 10–13 (September/October 2009)
8. Harauz, J., et al.: Data Security in the World of Cloud Computing. IEEE Security & Privacy, 61–64 (2009)
9. Lee, J.: Cloud Computing Changes IT Industry Paradigm. LG Business Insight, 40–46 (2009)
10. Kim, H., Park, C.: Cloud Computing and Personal Authentication Service. KIISC 20, 11–19 (2010)
11. Armbrust, M., et al.: Above the Clouds: A Berkeley View of Cloud Computing. Technical Report (2009), http://www.eeec.berkeley.edu/Pubs/TechRpts/2009/EEEC-2009-28.html
12. Un, S., et al.: Cloud Computing Security Technology. ETRI 24(4), 79–88 (2009)
13. Bertino, E., et al.: Privacy-preserving Digital Identity Management for Cloud Computing. Bulletin of the Technical Committee on Data Engineering 31(1) (January 2009)
14. Gartner Says Cloud Computing Will Be As Influential As E-business (2008), http://www.gartner.com/it/page.jsp?id=707508
15. TCG, http://www.trustedcomputinggroup.org
D-PACs for Access Control and Profile Storage on Ubiquitous Computing

Changbok Jang and Euiin Choi*

Dept. of Computer Engineering, Hannam University, Daejeon, Korea
[email protected], [email protected]

* Corresponding author.
Abstract. Recently, studies on context information in ubiquitous networks have been conducted, and techniques for reducing service time and increasing the accuracy of recommendations are also being studied. But it is difficult to provide many services quickly and appropriately, because only basic context information is stored. Moreover, applications in a context-aware computing environment are connected to wireless networks and various devices, and reckless access to information resources can cause trouble in the system. Managing access authority is therefore a very important issue, both for the information resources and for the system's adaptation, through the establishment of the security policy the system needs. For this reason, we suggest D-PACs, which stores the location, time, and frequency information of often-used services, and we propose a model of automated context-aware access control using an ontology that can control resources more efficiently through inference and judgment of context information.
Keywords: Ubiquitous computing, Recommendation System, Personal Profile, Access control.
1 Introduction

Recently, with the growth of IT techniques, environments are being converted to ubiquitous environments, in which information can be accessed anywhere and anytime, and the number of users demanding information services without time and location constraints is increasing[1,2]. For this reason, various studies on context awareness in ubiquitous networks have been conducted. Context information is collected by sensors located around users in the ubiquitous environment and is used to infer the user's current situation together with personal information such as the user's profile[3,4]. Because the inferred context information is used to provide services appropriate for users, many academics are studying context-aware modeling, streaming data processing, service provision, context prediction, and context histories[5,6,7,8,9,10,11]. Also, recommendation techniques such as case-based reasoning, collaborative filtering, content-based filtering, and item-item filtering have been studied in ubiquitous computing, on the web, and in other fields [12,13]. These techniques provide services using the personal profile and the preference ratings of users. However, previous techniques in ubiquitous computing have not considered privacy
problems or the correctness of recommendations by context. Also, current ubiquitous computing services do not consider a security policy for resources. This affects the field of context-aware security, and researchers have been actively studying context-aware access control models. Context-aware access control models have to provide suitable services or resources for users based on context information and personal profiles. This differs from existing security services, which authenticate authority using simple user information: it can restrict resource access by environment using synthetic data such as location, the current user profile, time, and devices. When a user who has access authority and a user who does not use the service at the same time, even the user who has authority can be restricted. Resource security can thus be maintained much more strongly according to context information and other characteristics. Since ubiquitous computing appeared, there have been many studies domestically and internationally, but security research using context-aware techniques started only recently; and the more ubiquitous and pervasive computing systems are constructed, the more important context-aware security research becomes. Therefore, we suggest D-PACs (Dimension Profile and Access Control system), which stores the location, time, and frequency information of often-used services and places the services expected to be used at a given location and time into storage, and we propose an automated context-aware access control model that models, using an ontology, the user information and the surrounding context of a user accessing resources. D-PACs can thus provide more accurate service than previous techniques, and it can adapt the security level of resources actively using an inference engine.
2 Related Works

"Ubiquitous" comes from a Latin word meaning "anywhere and anytime." In this computing environment, a user can access the network regardless of wired or wireless environment and device. Unlike in previous computing environments, the object of awareness in ubiquitous computing is not the human but the device[14]. This means that the information gathered through awareness must be personalized, because the inference power of computers is not enough to understand human actions and thoughts directly. Since systems have to provide various services and information actively by being aware of human behavior and thought, it must be defined how human behavior and thought are expressed as context information. Also, because behavior and thought differ from individual to individual, understanding personal inclination is an important problem. To solve this problem in ubiquitous computing, context that captures human behavior is inferred from situation information collected from various sensors, and services and information are provided to the user through the inferred context. Generally, the situation arising around a human can be collected from sensors, but personal inclination and thought cannot. Therefore, methods that store personal information, such as personal profiles, histories, and diaries, are used to analyze inclination[15,16]. As mentioned above, a ubiquitous user is provided various services by ubiquitous devices anywhere and anytime, without explicit human intervention, so context must be inferred to provide the services to users correctly. Therefore, context inference techniques that use personal inclination and information are being studied.
As context inference demands personal inclination and information, user profiles are used to store them; research on profile techniques is as follows. The UbiData project addressed data processing and synchronization in ubiquitous environments using an architecture with sophisticated hoarding, synchronization, and transcoding algorithms to enable continuous availability of data regardless of user mobility and disconnection, and regardless of the mobile device and its data viewing/processing applications[17,18]. Annika Hinze describes the TIP (Tourism Information Provider) system, which delivers various types of information to mobile devices based on location, time, the profiles of end users, and their "history," i.e., their accumulated knowledge; the system hinges on a hierarchical semantic geospatial model as well as on an Event Notification System (ENS)[19]. Recommendation techniques such as case-based reasoning, collaborative filtering, content-based filtering, and item-item filtering have also been suggested[12,13]. Annie Chen proposed a context-aware collaborative filtering system that can predict user preferences in different context situations based on past user experiences; the system uses what other like-minded users have done in similar contexts to predict a user's preference for an item in the current context[20]. Manuele Kirsch-Pinheiro proposed a context-based filtering process aimed at adapting the awareness information delivered to mobile users by collaborative web systems. This filtering process relies on a model of context that integrates both physical and organizational dimensions and allows representation of the user's current context as well as general profiles. These profiles are descriptions of potential user contexts and express awareness-information filtering rules to apply when the user's current context matches one of these rules; given a context, these rules reflect user preferences. The filtering process performs in two steps, the first identifying the general profiles that apply and the second selecting the awareness information[21]. Methods of context-aware access control, such as RBAC and GRBAC, have also been studied. RBAC is the access control model most popular in commercial areas as an alternative to MAC (Mandatory Access Control) or DAC (Discretionary Access Control)[22,23,24]. The best feature of RBAC is that the user is not directly allowed to perform operations on information; instead, the user obtains access authority through the role assigned to the user. The GRBAC (Generalized RBAC) model uses subject roles, object roles, and environment roles in access control decisions, adding context information to existing role-based access control[23,25]. Middleware using context-aware security methods, such as CASA (Context-Aware Security Architecture) and SOCAM, has been studied as well. CASA, suggested by the Georgia Institute of Technology, is a middleware-level security platform that associates security with user biometric or location information[26,27,28]. SOCAM proposes OWL for context-information modeling in middleware and consists of several components; its Context Providers, for example, abstract various context information[29].
3 System Architecture

We have to infer which services fit the user well in order to serve various services based on the context information arising in the ubiquitous environment. Generally, a profile is stored to capture the user's inclination and information. Because often-used services have a high probability of being used again, storing them in the profile lets us reduce the time needed to use a service. Therefore, a previous technique was suggested that stores the information and usage time of often-used services in the profile. But information on the user's location and time is needed to provide more accurate services. For example, assume a service was used 10 times in a day; if the times of use cluster around 3 P.M., we should infer that the service is mostly used in the afternoon. Time and frequency information is thus important in ubiquitous computing, but location information is also very important: even for the same service, the frequency of use differs from location to location. Therefore, we suggest a technique that stores in the profile the location, together with the time and frequency, of each service demanded by the user, and that places the services expected to be used at a given location and time into storage. The system we suggest consists of an agent, D-PACs, users, and service providers. Figure 1 shows how we structured D-PACs.
Fig. 1. D-PACs Architecture
The agent is responsible for communication between the user and D-PACs and for providing services to the user. D-PACs is responsible for finding and predicting services, analyzing profile information, and collecting information from sensors. The service provider is responsible for providing services to the user, and the user uses the services in the ubiquitous environment.
D-PACs consists of three modules, the service manager, the agent manager, and the inference engine, and each module consists of sub-modules. The service manager is responsible for finding, from the service providers, both requested services and the predicted services a user will use in other places, and it stores the predicted services in the service storage. If a user requests a predicted service from D-PACs, it is retrieved directly from the service storage without a search, so the requested service can be found and provided to the user more quickly. The analyzer within the inference engine analyzes the context using profile and sensor information to provide services suitable for the user, and the predictor estimates the services the user is going to use in other places. The context knowledge model database stores location information for the places a user may stay, together with the context model information describing how context is inferred; the sensor information database stores the sensor data received from the D-PACs manager. The agent manager receives information from the D-PACs manager on the agent and sends it to the inference engine; it also sends services found from the service providers to the service handler on the agent. In the service module, which comprises the authentication and authorization services, the authentication service manages and processes the context information of subjects and confirms the identification of subjects that may access the context-aware access control system. The authorization service dynamically assigns roles to users by analyzing the access policy and acquiring additional information, such as access location, access time, and spatial area, from the context information of the subject accessing the resources; it performs access control by comparing and analyzing the security policy against the user's currently activated user role and context role. The authentication service also monitors user access: besides the access information of the approaching subject, it acquires context information from surrounding sensors and devices, performs pre-processing of the authority level of the user who wants access by comparing and analyzing the context information about the user's surroundings, and, through the authorization service, provides data about the authority of the user requesting access. The context knowledge repository stores both the context information analyzed from the authorization service data and the resources the user wants to approach. The User & Role, Constraint Policy, and Context Knowledge Model repositories represent approval or disapproval for each transaction in the access request list and store the approval information in the form of rules. The context-aware access control model uses OWL (Web Ontology Language) to collect and analyze context information about the user's surrounding environment.
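A minimal sketch of the predictor's core idea, counting service use per location and hour and prefetching the most frequent service, is given below; the data layout and function names are our assumptions, not the paper's implementation:

    # Sketch of frequency-based service prediction by (location, hour); illustrative.
    from collections import Counter, defaultdict

    usage = defaultdict(Counter)  # (location, hour) -> Counter of service names

    def record_use(location, hour, service):
        usage[(location, hour)][service] += 1

    def predict(location, hour):
        """Return the service most often used at this location and hour, if any."""
        counts = usage.get((location, hour))
        return counts.most_common(1)[0][0] if counts else None

    record_use("office", 15, "printing")  # e.g., a service used at 3 P.M.
    record_use("office", 15, "printing")
    record_use("home", 20, "lighting")
    assert predict("office", 15) == "printing"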
4 Profile and Context Modeling

4.1 Definition of the User Profile and the Context Model

A user's profile specifies information of interest for an end user, so the profile is structured into a user information part and a service information part. The user information part stores the user's information, such as the user's name, inclination, and hobby, and the service information part stores the services that have been used, such as the service name, service provider, and so on. The structure of the user profile is as follows:
information part was stored services which we were used such as service name, service provider etc. structure of user profile was follow: - User Information: User name, User ID, Personal inclination, hobby, etc - Service Information: Service Name, Service Provider, Service context, Service frequency value, etc - Role: Person(Doctor, Nurse, Patient), Place or Location(Room) - Schedule: work time, visit time Because profile stored how much the service information used, stored not only used service, but also information when, how, where used. Also, there are stored the information about what context used. In this paper propose to automated context-aware access control model of concept such as figure 2 at below. Authentication service provides to authorization service that is user’s ID after authentication through various context-aware information of attempted user about information access. Authorization service performs not only supervise for requirements about information access of user’s but also decision of context information for all constraints by role. And then that provide contextknowledge model repository for location of resources that is accessible of user. User&Role repository is storing role about user's ID and authority policy repository is describing security policy. Context knowledge model is the map of knowledge that provides locations of all information resources which is accessible according to role and context information. This method is limited by resources access along the surrounding environment in case of accessible user request to approach.
Fig. 2. Concept of context-aware access control
The context-aware access control model is divided into the authentication service, the authorization service, and the context knowledge repository. The model proposed in this paper defines the user's basic information, location, time, and device using OWL.
Fig. 3. User-based context model using OWL
We analyze the user's role through location and time information, and we suggest a user-based context model that includes location and time information for understanding the relationships among context information using OWL. Figure 4 shows the OWL source code, and Figure 5 shows how that source appears in Protégé.
Fig. 4. OWL source code
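The OWL source of Figure 4 is not reproduced here; the sketch below shows, with rdflib, roughly how such a user-based context model could be assembled programmatically, with all class, property, and individual names assumed for illustration:

    # Illustrative user-context ontology built with rdflib; names are assumptions.
    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    CTX = Namespace("http://example.org/context#")
    g = Graph()
    g.bind("ctx", CTX)

    for cls in ("User", "Location", "Time", "Device"):
        g.add((CTX[cls], RDF.type, OWL.Class))

    g.add((CTX.locatedIn, RDF.type, OWL.ObjectProperty))
    g.add((CTX.locatedIn, RDFS.domain, CTX.User))
    g.add((CTX.locatedIn, RDFS.range, CTX.Location))

    g.add((CTX.kim, RDF.type, CTX.User))
    g.add((CTX.room301, RDF.type, CTX.Location))
    g.add((CTX.kim, CTX.locatedIn, CTX.room301))

    # RDF/XML output, roughly the form shown in Figure 4 (rdflib >= 6 returns str).
    print(g.serialize(format="xml"))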
350
C. Jang and E. Choi
Fig. 5. Ontology modeling attribute in Protégé application
4.2 Access Control of D-PACs

We also define the sequence of resource access below; a code sketch following the numbered steps condenses the sequence.
Fig. 6. Performance architecture of Context-aware access control model
① The user approaches the authorization service to obtain authentication of the authority to access the resources, using an application in order to access the resources.
② The authorization service calls the authentication service to authorize the authority of the current user.
③ The authentication service collects context information about the user's surroundings concerning the current approach to the resources, and it asks the context-aware service for the context information needed to determine the user's role for the requested resource access.
④ The acquired context information about the user's surroundings is transferred to the authorization service module, and the authorization service module notifies the authentication service module that the acquired information has been received.
⑤ Having acquired the context information about the user's surroundings, the authorization service module approaches the context knowledge repository to perform access control and role assignment for the user.
⑥ It requests the access policy data and the user's role-assignment information from the context knowledge repository.
⑦ The authorization service grants access authorization according to the access policy and the role of the user who currently wants to approach the resources.
⑧ The user requests the service with the access authority acquired for the assigned role.
⑨ The authorization service module forwards the request to the service and approaches the resources suitable to the user's level of access authority, through the authority level and role assigned for the resources the user requires; the context knowledge repository grants approach to the resources suitable to the assigned authority level, the security policy, and the user's current context.
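Read as code, the sequence above reduces to something like the following sketch; the repositories, policy entries, and context values are invented for illustration:

    # Condensed sketch of steps 1-9; policies, roles, and context are illustrative.
    USER_ROLES = {"kim": "doctor"}                     # User & Role repository
    POLICY = {("doctor", "ward"): {"patient_record"}}  # authority policy repository

    def collect_context(user):
        # Step 3: context from surrounding sensors (stubbed here).
        return {"location": "ward", "time": "14:00"}

    def authorize(user, resource):
        ctx = collect_context(user)                    # steps 3-4
        role = USER_ROLES.get(user)                    # steps 5-6: role assignment
        allowed = POLICY.get((role, ctx["location"]), set())
        return resource in allowed                     # step 7: grant or deny

    if authorize("kim", "patient_record"):             # steps 8-9: access resource
        print("access granted under current role and context")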
5 Conclusion

In this paper, we suggested D-PACs, which stores the location, time, and frequency information of often-used services among the various services and places the services expected to be used at any given location. The system uses a profile in the form of an XML document and classifies the information used when a context arises by elements such as location, time, date (day of week), and frequency. When the user is located at a specific place, our system provides a service to the user through the location, time, date (day of week), and frequency information stored in the user's profile. The similarity assessment between the current and past contexts is very simple and fast, and more accurate than previous techniques; therefore, we can reduce the time to recommend a service and provide a more accurate service. Ubiquitous computing means that services are available for users to use computers conveniently and naturally in everyday life without constraints of location or time. Thus, in a distributed computing environment such as a ubiquitous environment, users can efficiently use and share resources among themselves. We also need an access control model that controls user access to any shared resources and blocks the approach of users without authority, so that resources are used efficiently. Therefore, this paper proposed a model whose advantage over existing access control models such as RBAC and xoRBAC is that active authorization is possible, achieved by adding an authorization function for collaborative control of resources involving other subjects. The proposed automated context-aware access control model will make the system an active access control system based on context awareness suitable for the ubiquitous environment. We grant access authority for information resources by role and assign a suitable role to the user; then we provide the service, making the information resources available through the valid access authority of the suitable user. Also, for
active access control based on context awareness, we use a context role with a quantified expression that shows the relationships among context information. For the use of information resources, we will implement active access control based on context awareness that can evaluate the validity of the acquired access authority by checking the satisfaction of the security policy for the current context role (even when the user has an assigned role). And, to adapt to context changes, we will provide the services that should be provided to the user in a certain context, in accordance with the security policy, by automatically recognizing transitions of the context role.
References
1. Lyytinen, K., Yoo, Y.: Issues and challenges in ubiquitous computing. Communications of the ACM 45, 62–96 (2003)
2. Schilit, B.N., Adams, N., Want, R.: Context-aware computing applications. In: Proc. IEEE Workshop on Mobile Computing Systems and Applications, pp. 85–90 (1994)
3. Potonniée, O.: A decentralized privacy-enabling TV personalization framework. In: 2nd European Conference on Interactive Television: Enhancing the Experience, euroITV 2004 (2004)
4. Klyne, G., Reynolds, F., Woodrow, C., Ohto, H., Hjelm, J., Butler, M.H., Tran, L.: Composite Capability/Preference Profiles (CC/PP): Structure and Vocabularies 1.0. W3C Recommendation. W3C (2004)
5. Hess, C.K., Campbell, R.H.: An application of a context-aware file system. Pervasive Ubiquitous Computing 7(6), 339–352 (2003)
6. Khungar, S., Riekki, J.: A Context Based Storage for Ubiquitous Computing Applications. In: Proceedings of the 2nd European Union Symposium on Ambient Intelligence, pp. 55–58 (2004)
7. Chakraborty, D., Chen, H.: Service Discovery in the Future for Mobile Commerce. ACM Crossroads (2000)
8. Mayrhofer, R.: An Architecture for Context Prediction, vol. C45. Trauner Verlag, Schriften der Johannes-Kepler-Universität Linz (2005)
9. Byun, H.E., Cheverst, K.: Exploiting User Models and Context-Awareness to Support Personal Daily Activities. In: Bauer, M., Gmytrasiewicz, P.J., Vassileva, J. (eds.) UM 2001. LNCS (LNAI), vol. 2109. Springer, Heidelberg (2001)
10. Byun, H.E., Cheverst, K.: Utilising context history to support proactive adaptation. Journal of Applied Artificial Intelligence 18(6), 513–532 (2004)
11. Agostini, A., Bettini, C., Cesa-Bianchi, N., Maggiorini, D., Riboni, D., Ruberl, M., Sala, C., Vitali, D.: Towards highly adaptive services for mobile computing. In: Proceedings of IFIP TC8 Working Conference on Mobile Information Systems, MOBIS (2004)
12. Burke, R.: Hybrid Recommender Systems: Survey and Experiments. User Modeling and User-Adapted Interaction 12(4), 331–370 (2002)
13. Lemire, D., Boley, H., McGrath, S., Ball, M.: Collaborative Filtering and Inference Rules for Context-Aware Learning Object Recommendation. International Journal of Interactive Technology & Smart Education 2(3) (2005)
14. Weiser, M.: Hot topics: ubiquitous computing. IEEE Computer 26(10), 71–72 (1993)
15. Barkhuus, L., Dey, A.: Is Context-Aware Computing Taking Control away from the User? Three Levels of Interactivity Examined. In: Dey, A.K., Schmidt, A., McCarthy, J.F. (eds.) UbiComp 2003. LNCS, vol. 2864, pp. 149–156. Springer, Heidelberg (2003)
16. Elfeky, M.G., Aref, W.G., Elmagarmid, A.K.: Using Convolution to Mine Obscure Periodic Patterns in One Pass. In: Hwang, J., Christodoulakis, S., Plexousakis, D., Christophides, V., Koubarakis, M., Böhm, K. (eds.) EDBT 2004. LNCS, vol. 2992, pp. 605–620. Springer, Heidelberg (2004)
17. Sur, G.M., Hammer, J.: Management of User Profile Information in UbiData. Dept. of CISE, University of Florida, Gainesville, Technical Report TR03-001 (2003)
18. Zhang, J., Helal, A.S., Hammer, J.: UbiData: Ubiquitous Mobile File Service. In: Eighteenth ACM Symposium on Applied Computing (2003)
19. Hinze, A., Voisard, A.: Location- and time-based information delivery in tourism. In: Hadzilacos, T., Manolopoulos, Y., Roddick, J., Theodoridis, Y. (eds.) SSTD 2003. LNCS, vol. 2750. Springer, Heidelberg (2003)
20. Chen, A.: Context-Aware Collaborative Filtering System: Predicting the User's Preference in the Ubiquitous Computing Environment. In: Strang, T., Linnhoff-Popien, C. (eds.) LoCA 2005. LNCS, vol. 3479, pp. 244–253. Springer, Heidelberg (2005)
21. Kirsch-Pinheiro, M., Villanova-Oliver, M., Gensel, J., Martin, H.: Context-Aware Filtering for Collaborative Web Systems: Adapting the Awareness Information to the User's Context. In: ACM Symposium on Applied Computing, SAC 2005 (2005)
22. Ferraiolo, D.F., Cugini, J.A., Kuhn, D.R.: Role-Based Access Control (RBAC): Features and Motivations. In: 11th Annual Computer Security Applications Conference (November 1995)
23. Sandhu, R.S., Coyne, E.J.: Role-Based Access Control Models. IEEE Computer 29(2), 38–47 (1996)
24. Sandhu, R.S., Ferraiolo, D., Kuhn, R.: The NIST Model for Role-Based Access Control: Towards a Unified Standard. In: 5th ACM Workshop on RBAC (August 2000)
25. Neumann, G., Strembeck, M.: An Approach to Engineer and Enforce Context Constraints in an RBAC Environment. In: 8th ACM Symposium on Access Control Models and Technologies (SACMAT 2003), pp. 65–79 (June 2003)
26. Covington, M.J., Moyer, M.J., Ahamad, M.: Generalized Role-Based Access Control for Securing Future Applications. In: NISSC, pp. 40–51 (October 2000)
27. Covington, M.J., Fogla, P., Zhan, Z., Ahamad, M.: A Context-Aware Security Architecture for Emerging Applications. In: Annual Computer Security Applications Conference, ACSAC (2002)
28. Biegel, G., Cahill, V.: A Framework for Developing Mobile, Context-aware Applications. In: IEEE International Conference on Pervasive Computing and Communications, PerCom (2004)
29. Gu, T., Pung, H.K., Zhang, D.Q.: A Middleware for Building Context-Aware Mobile Services. In: Proceedings of IEEE Vehicular Technology Conference, VTC (2004)
An Approach to Robust Control Management in Mobile IPv6 for Ubiquitous Integrated Systems

Giovanni Cagalaban and Byungjoo Park*

Department of Multimedia, Hannam University, Ojeong-dong, Daedeok-gu, Daejeon 306-791, Korea
[email protected], [email protected]

* Corresponding author.
Abstract. Ubiquitous healthcare systems offer a dynamic collection of services for patient monitoring from a mobile and diverse range of sources, such as huge volumes of sensor data. This is enabled by the rapid advancement of Internet technologies such as the mobile IPv6 protocol, which integrates real-time monitoring of people with health illnesses and improves situation and location awareness even when people are on the move. This research proposes patient control management with the support of mobile IPv6 networks. In particular, this paper integrates ubiquitous healthcare and wireless networks for efficient and reliable access and transmission of information to healthcare centers. Heterogeneous devices such as mobile devices and wireless sensors can continuously monitor patient status and location, and they have the capability to communicate remotely with healthcare centers, dedicated servers, physicians, and even family members to deliver healthcare services effectively.
Keywords: Ubiquitous healthcare, Mobile IPv6, Patient control.
1 Introduction

Ubiquitous healthcare systems are increasingly growing to be useful anytime and anyplace due to the paradigm shift from health supervision to health preservation, the development of ubiquitous technology, and the state-of-the-art technological infrastructure available virtually everywhere [1]. This is made possible by recent technological advances in wireless networking, microelectronics integration and miniaturization, sensors, and the Internet, which allow people to fundamentally modernize and change the way health care services are deployed and delivered. These enabling technologies, ranging from biosensors, context-aware sensors, and handheld computing devices to wireless communication technologies and information processing technologies, remove the spatial and time barriers in information services and allow the seamless exchange of information online. The proliferation of mobile devices and wireless sensors has become key in helping the transition to more proactive and affordable healthcare services. They are capable of helping elderly people check their health conditions, arranging healthcare schedules, and administering diagnosis of their diseases [2][3][4][5]. Additionally,
they can be implemented for health monitoring of patients in ambulatory settings [6]. Even in motion, patients can stay in touch and improve their independence through personalized healthcare services delivered at home regardless of location. Such devices also aid physicians and personal healthcare consultants in caring for their patients and clients, for example through remote access to patients' medical records and remote medical recommendations. The mobile IPv6 protocol allows mobile devices to communicate with other devices even while a mobile device is in motion. IP-layer protocols that enable terminal mobility in IPv6 [7] networks have been defined by the Internet Engineering Task Force (IETF). The mobility support in IPv6 allows an entire network, which can include wireless body area networks (WBAN), personal area networks (PAN), and networks of sensors, to migrate in the Internet topology. Internet access and connectivity using IPv6 are gradually being incorporated into various mobile computing devices. However, up to this time, only a limited number of studies have focused on integrating the IPv6 protocol with ubiquitous monitoring, due to the heterogeneity of wireless sensors and distributed resources. One such study is a collaborative monitoring project [8] for people of all ages that utilized IPv6 equipment and data communication methods; despite introducing IPv6 protocols to support reliable communication, the project does not include IPv6 mobility support mechanisms such as mobile IP, fast mobile IPv6 (FMIP), and network mobility (NEMO) protocols. In this paper, we propose an approach to patient control management for delivering ubiquitous healthcare services by integrating mobile IPv6 networks. We consider the integration of mobile devices and wireless sensors with awareness of their environment and situation as context, to exploit the functionality of the mobile IPv6 protocol, which allows the seamless exchange of information online. Additionally, context information about location, health, and the home environment collected from heterogeneous devices can be useful for improving user interaction and supporting prompt and reliable healthcare services.
2 Patient Control Management

In the increasingly diverse healthcare environment, deriving and combining the requirements of patient monitoring for fixed and mobile patients in indoor and outdoor environments has become a difficult task. Numerous healthcare applications require reliable monitoring of patients, such as those in a hospital or nursing home. The use of wearable devices and short-range communications has brought some progress in patient monitoring. As the number of patients who wear or use communication devices increases, the need for comprehensive patient monitoring solutions built on wireless intelligent sensors also increases, and the design and deployment of such wireless sensor networks can be a cost-effective alternative to the growing number of sensor networks. Patient control management can be applied in a situation where a patient is monitored regularly by a caregiver or medical staff at home. Consider Mr. Kim, an elder who has systemic arterial hypertension and needs to check his blood pressure from time to time. This can be done by continuously monitoring and logging
his vital parameters. Monitoring can be done using mobile and wireless sensor networks, so the patient can move outside the house while being monitored. This capability allows the monitoring of mobile and stationary patients in indoor and outdoor environments, as shown in Figure 1.
Fig. 1. Patient control management
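A toy version of the continuous monitoring loop in this scenario could look like the sketch below; the sensor stand-in and the alert threshold are assumptions for illustration, not medical guidance:

    # Toy monitoring loop for the hypertension scenario; all names hypothetical.
    import random, time

    ALERT_SYSTOLIC = 140  # mmHg; illustrative threshold only

    def read_blood_pressure():
        # Stand-in for a wearable sensor; returns (systolic, diastolic) in mmHg.
        return random.randint(100, 180), random.randint(60, 110)

    def monitor(samples=5, interval_s=1):
        for _ in range(samples):
            systolic, diastolic = read_blood_pressure()
            print(f"logged {systolic}/{diastolic} mmHg")
            if systolic >= ALERT_SYSTOLIC:
                print("alert: notify caregiver via healthcare center")
            time.sleep(interval_s)

    monitor()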
3 Ubiquitous Healthcare System

The ubiquitous healthcare system, which supports intelligent healthcare with network mobility support, is shown in Figure 2. It aims to provide efficient infrastructure support for building healthcare services in ubiquitous computing environments. A platform deploys the architecture on heterogeneous hardware and integrates software components with web service technologies. As a result, the flexible system is capable of attending to elderly persons by providing appropriate services, such as interactions and healthcare activities, in an independent and efficient way.

3.1 Integrating the UHS with Mobile IPv6

IPv6 [9] has built-in features that support more effectively the new services requested by new, demanding applications, including support for mobility, multicast, traffic reservation, and security. IP is growing into the base technology of future networks, providing all kinds of services through different access technologies, both fixed and mobile. Thus, future devices will eventually be connected to the Internet through IPv6.
Connection to the network must be available anywhere and anytime to facilitate faster access and delivery of ubiquitous healthcare services even when devices are on the move. Since IP networks were not originally designed for mobile environments, a mechanism such as mobile IPv6 is required to manage the mobility of the equipment topologies. The drawback of this approach, however, is that all embedded devices must be designed to operate this protocol. The IETF NEMO Working Group [10] has recently put effort into addressing issues related to network mobility, where entire IPv6 networks change their point of attachment to the Internet topology due to mobility.
Fig. 2. Ubiquitous Healthcare System
For efficient patient control management, a testbed is designed as shown in Figure 3. It is composed of mobile devices and a network of sensors that obtain connectivity through a mobile router (MR). A Mobile Node (MN) is a device that implements the mobile IP protocol, has its home network elsewhere, and is roaming in the visited network. A bidirectional tunnel is configured between the MR and its Home Agent (HA), which is located in the home network. The HA is responsible for receiving all traffic addressed to the mobile network and forwarding it to the MR through the tunnel. The MR removes the tunnel header and forwards the traffic to its destination within the mobile network. In the same manner, traffic originating in the mobile network is sent by the MR toward the HA through the tunnel; the HA removes the tunnel header and forwards the packets to their destination. As Internet access becomes ubiquitous, the demands for mobility are no longer restricted to single terminals. The testbed provides host mobility support for all devices used in the network, as shown in Figure 3. In this scenario, the mobile network has at least one MR that connects to the fixed infrastructure, and the devices of the network connect to the MR.
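To make the tunneling step concrete, here is a minimal Python sketch of the HA/MR bidirectional tunnel logic described above. It is not the testbed implementation; the addresses, class, and helper names are all illustrative.

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Packet:
    src: str
    dst: str
    payload: str
    tunnel: Optional[Tuple[str, str]] = None  # (outer src, outer dst) when encapsulated

HA_ADDR = "2001:db8:home::1"     # hypothetical home agent address
MR_COA = "2001:db8:visit::2"     # hypothetical mobile router care-of address
MOBILE_PREFIX = "2001:db8:mnet"  # prefix of the mobile network behind the MR

def ha_forward(pkt: Packet) -> Packet:
    """HA intercepts traffic addressed to the mobile network and tunnels it to the MR."""
    if pkt.dst.startswith(MOBILE_PREFIX):
        pkt.tunnel = (HA_ADDR, MR_COA)  # add the outer IPv6-in-IPv6 header
    return pkt

def mr_deliver(pkt: Packet) -> Packet:
    """MR strips the tunnel header and delivers the inner packet inside the mobile network."""
    if pkt.tunnel and pkt.tunnel[1] == MR_COA:
        pkt.tunnel = None  # decapsulate
    return pkt

# A correspondent queries a body sensor on the mobile network:
p = Packet(src="2001:db8:corr::9", dst=MOBILE_PREFIX + "::55", payload="vitals-query")
p = mr_deliver(ha_forward(p))  # tunneled HA -> MR, then decapsulated
print(p)

Traffic in the reverse direction mirrors this: the MR encapsulates outbound packets toward the HA, which decapsulates and routes them normally.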
3.2 Support for Multihoming and Multicasting

Continuous connectivity to the Internet must be provided anywhere and anytime to ensure safety and reliability. This is best achieved when the network is connected via several interfaces, several access technologies, and distinct access networks, which is referred to as a multihomed network. The testbed must be able to deal with handovers, which may occur between distinct domains and topologically distant parts of the Internet. Having multiple points of attachment helps avoid service disruption when a particular technology is not available in the geographic area or when devices fail.
Fig. 3. Testbed setup
Mobility in the network requires managing multicast group membership and establishing the necessary routes for data transmission. Home agents are static but know the locations of the mobile nodes they serve. Hence, it is logical for a multicast HA to forward multicast traffic to the mobile node through the Mobile IP tunnel via the foreign agent (FA). Multicast support for mobile hosts can be implemented through the use of a bidirectional tunnel (BT) [11] maintained by an MR between the HA and the MR's care-of address (CoA) in the foreign network. Using bidirectionally tunneled multicast, the MR on the home network must also be a multicast router. In this approach, subscriptions are made through the home agent. When the mobile node is away from home, a bidirectional tunnel to the home agent is set up, allowing both sending and receiving of multicast packets with the same performance expected of static nodes [12].
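The following is a rough sketch of this home-agent-based multicast relay, with invented names and a print statement standing in for the actual tunnel transmission; it only illustrates the subscription and forwarding roles described above.

class MulticastHomeAgent:
    """Toy home agent that relays multicast traffic down bidirectional tunnels."""
    def __init__(self):
        self.subscriptions = {}  # group address -> set of away-from-home nodes

    def join(self, group: str, mobile_node: str):
        # While away from home, the node subscribes through its HA.
        self.subscriptions.setdefault(group, set()).add(mobile_node)

    def on_multicast(self, group: str, data: bytes):
        # Relay each group packet through the tunnel to every subscriber.
        for node in self.subscriptions.get(group, ()):
            self.send_tunneled(node, group, data)

    def send_tunneled(self, node: str, group: str, data: bytes):
        print(f"tunnel -> {node}: multicast {group}, {len(data)} bytes")

ha = MulticastHomeAgent()
ha.join("ff3e::4242", "mn-1")            # MN joins a group via its HA
ha.on_multicast("ff3e::4242", b"ecg-42") # HA relays traffic down the tunnel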
4 Conclusion

Ubiquitous healthcare systems are increasingly useful anytime and anywhere, owing to the paradigm shift from health supervision to health preservation, the growing number of elderly people, the development of ubiquitous technology, and the cutting-edge IT infrastructure available virtually everywhere. This trend is aided by the rapid growth of mobile communications networks and recent advances in wireless sensor networks and embedded computing technologies. These networks are evolving to provide not only communication services but also the data services essential for delivering a ubiquitous healthcare system. This paper demonstrated how mobile IPv6 and multicast support can improve ubiquitous healthcare monitoring, mobility, and diagnosis of patient health status. In future work, security-related problems will be addressed, especially in the transmission and exchange of sensitive healthcare information.

Acknowledgments. This work has been supported by the 2011 Hannam University Research Fund.
References 1. Weiser, M.: Some Computer Science Problems in Ubiquitous Computing. Communications of the ACM (1993) 2. Lin, J.: Applying telecommunication technology to healthcare delivery. IEEE Eng. Med. Biol. Mag. 18, 28–31 (1999) 3. Istepanian, R.: Telemedicine in the United Kingdom: current status and future prospects. IEEE Trans. Inf. Technol. Biomed. 3, 158–159 (1999) 4. MobiHealth Project, http://www.mobihealth.org 5. UbiMon, http://www.ubimon.org 6. Istepanian, R., Jovanov, E., Zhang, Y.: Guest Editorial Introduction to the Special Section on M-Health: Beyond Seamless Mobility and Global Wireless HealthCare Connectivity. IEEE Trans. on Info. Tech. in Biomedicine 8, 405–414 (2004) 7. Johnson, D., Perkins, C., Arkko, J.: Mobility Support in IPv6; RFC 3775, Internet Engineering Task Force (2004) 8. E-Care Town Fujisawa Project Consortium, http://www.e-careproject.jp/english/index.html 9. Deering, S., Hinden, R.: Internet Protocol Version 6 (IPv6) Specification. Request For Comments 2460 (December 1998) 10. IETF Network Mobility (NEMO) Working Group, http://www.ietf.org/html.charters/nemo-charter.html 11. Chikarmane, V., Williamson, C., Bunt, R., Mackrell, W.: Multicast support for mobile hosts using Mobile IP: Design issues and proposed architecture. Mobile Networks and Applications 3, 365–379 (1998) 12. Generic Packet Tunneling in IPv6 - Bi-directional Tunneling Specification, http://tools.ietf.org/id/draft-conta-extensionsipv6-tunnel-01.txt
A Software Framework to Associate Multiple FPMN MSISDNs with a HPMN IMSI Dongcheul Lee and Byung Ho Rhe* Department of Electronics Computer Engineering, Hanyang University, 17, Seoul, 133-791, Korea {jackdclee,bhrhee}@hanyang.ac.kr
Abstract. Because the UMTS network provides an automatic roaming function, mobile phone users can use their phones while they are abroad. Some of them may want to use their HPMN MSISDNs in foreign countries, while others may want to use FPMN MSISDNs on their own mobile phones. We present a software framework that associates multiple FPMN MSISDNs with a HPMN IMSI, so that mobile phone users can use FPMN MSISDNs while visiting the FPMN without changing their USIM cards or phones. The MAP Location Update procedure, the mobile-originated call flow, and the mobile-terminated call flow were modified using the software framework. In simulation, we verified that the proposed scheme is more efficient than the 3GPP standard scheme in generating the ISUP signaling path. Therefore, mobile network service providers can not only provide the multiple FPMN MSISDNs service but also reduce the foreign network access cost. Keywords: FPMN MSISDN, IMSI, ISUP, software framework.
1 Introduction

Mobile phone users in the Universal Mobile Telecommunications System (UMTS) [1] network can use their phones in any other UMTS network in foreign countries, because the UMTS network provides an automatic roaming function. In this environment, some mobile phone users may want to use phone numbers that belong to other countries. If they call someone in a foreign country using their home country's number, the receiver will notice that it is an international call and may decline it. However, if the caller uses the corresponding foreign country's phone number, the receiver cannot tell that it is an international call and is more likely to accept it. Furthermore, if someone in a foreign country wants to call a person in another country, the caller may feel more comfortable if the dialed number is a local number. For example, if Jack in Korea has a local phone number 010-1111-2222 and a USA phone number 630-111-2222 on one mobile phone, anyone in the USA can call Jack using 630-111-2222.
* Corresponding author.
Since the caller in the USA would not think of the number as an international one, they will feel comfortable placing a call to it.
2 Related Works There are two types of methods to provide multiple FPMN MSISDNs to a single mobile station: a mobile station based method and a network based method. [4] suggested a mobile station based method by using a dual IMSI USIM card. Also, [5] suggested a network based method by modifying the Mobile Application Part (MAP) [6] messages. 2.1 Mobile Station Based Method The simplest way to provide multiple FPMN MSISDNs to a single mobile station is using a dual-IMSI USIM card. This method stores two pairs of an IMSI and a MSISDN, which are used in different networks, in a dual-IMSI USIM card. If a mobile station transferred in a network, a STK application [7] that is stored in the USIM card can automatically chooses the suitable IMSI and MSISDN that corresponds to the network. Then the user can use the IMSI and the MSISDN while sending or receiving a call. If the user wants to select another IMSI and MSISDN manually, the automatic selection function stops working. However, since this method uses multiple IMSIs, multiple Short Message Service Center addresses and authentication keys should be stored in a USIM card. Also, a new type of USIM card is needed since it should store the STK application. Therefore, additional costs shall be added since network providers should make new contracts with the USIM card manufacturers. Furthermore, the power consumption of a mobile station will be increased because the STK application should be initiated every time the location update procedure occurs.
2.2 Network Based Method

This method modifies the Insert Subscriber Data (ISD) message of MAP to provide multiple FPMN MSISDNs to a single mobile station. When a mobile station sends an Update Location (UL) message to the Visitor Location Register (VLR) of a FPMN, the VLR transfers the UL message to the Home Location Register (HLR) of the HPMN through a Gateway Mobile Switching Center (GMSC). The HLR then checks whether the mobile station is registered for the multiple FPMN MSISDNs service and, if so, whether it has a MSISDN belonging to the FPMN. If the FPMN MSISDN is found, the HLR sends it to the VLR of the FPMN using a CAMEL Application Part (CAP) [8] Insert Subscriber Data (ISD) message. The VLR fetches and stores the IMSI and the MSISDN from the message and uses them when the mobile station initiates a call. However, the user can then use only the FPMN MSISDN, even if the user wants to use the HPMN MSISDN in the FPMN. Furthermore, when the user receives a call, the user cannot know which MSISDN was used as the dialed number.
3 Proposed Software Framework

This paper presents a software framework that associates multiple FPMN MSISDNs with a HPMN IMSI. To provide the service, the location update procedure, used when a user moves into a FPMN, is modified: the SCP sends a FPMN MSISDN to the VLR of the FPMN. The mobile-originated call procedure is also modified so that users can select the calling number from among the HPMN MSISDN and the FPMN MSISDNs. Furthermore, the mobile-terminated call procedure is modified for calls dialed to a FPMN MSISDN.

3.1 Location Update Procedure

If a mobile station moves into a FPMN and has a FPMN MSISDN corresponding to that FPMN, the MAP location update procedure occurs as shown in Fig. 1. First, the outbound mobile station sends an attach request message to the Serving GPRS Support Node (SGSN) of the FPMN. The SGSN sends an Update GPRS Location (UGL) message to the HPMN HLR through the Gateway Location Register (GLR) in the FPMN. As a result, the mobile station's profile is transferred to the SGSN through CAP ISD messages, and the Packet Switching (PS) update location procedure completes. Since the network performs the combined-mode [9] location update procedure, the SGSN requests that the MSC perform the update location procedure. The MSC then sends an UL message to the HPMN HLR through the FPMN GLR to perform the Circuit Switching (CS) update location procedure. After the GLR receives the UL ACK message, it sends a Send Subscriber Information (SSI) message to the SCP if the HPMN and FPMN service providers have a contract to provide the multiple MSISDNs service. This message contains the mobile station's profile, which includes the IMSI of the mobile station, the HPMN MSISDN, the global title of the originating MSC/VLR, and the global title of the HPMN HLR.
The SCP then analyzes the SSI message and searches for the FPMN MSISDN of the mobile station. If the MSISDN is found, the SCP sends the FPMN MSISDN to the GLR in the SSI ACK message. The GLR fetches the information from the message and sends it to the MSC/VLR through the ISD message with the Originating CAMEL Subscription Information (O-CSI). In the O-CSI, the GLR sets the service key parameter for the multiple MSISDNs service to 1 and the SCF address parameter to the SCP's address. The location update procedure is then complete.
Fig. 1. This is the modified MAP update location procedure when the mobile station moves into a FPMN
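As a rough illustration of the SCP's role in this exchange, the Python sketch below resolves an SSI request to the FPMN MSISDN provisioned for the roamer's IMSI. The table entries, field names, and return structure are hypothetical, not the actual MAP encoding.

# Hypothetical SCP-side association table: (IMSI, visited FPMN) -> FPMN MSISDN.
FPMN_MSISDNS = {
    ("450081234567890", "FPMN-US"): "16301112222",  # illustrative entries
    ("450081234567890", "FPMN-JP"): "819011112222",
}

def handle_ssi(imsi: str, fpmn_id: str, hpmn_msisdn: str) -> dict:
    """Build an SSI ACK body: the FPMN MSISDN if one is provisioned."""
    msisdn = FPMN_MSISDNS.get((imsi, fpmn_id))
    if msisdn is None:
        return {"result": "no-service", "msisdn": hpmn_msisdn}
    # The GLR will carry this into the ISD message, with the multiple-MSISDNs
    # service key set to 1 and the SCF address pointing back to this SCP.
    return {"result": "ok", "msisdn": msisdn, "service_key": 1}

print(handle_ssi("450081234567890", "FPMN-US", "821011112222"))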
3.2 Mobile-Originated Call Procedure

When an outbound mobile station originates a call and the caller wants to use his FPMN MSISDN in the FPMN, the proposed mobile-originated call procedure occurs as shown in Fig. 2. First, the mobile station sends the call origination request to the Originating MSC (O-MSC) in the FPMN. Since the MSC has the O-CSI corresponding to the mobile station and the service key for the multiple MSISDNs service is set, the O-MSC sends the CAP IDP message to the address identified by the SCP address parameter in the O-CSI. The SCP searches for the corresponding FPMN MSISDN and returns it to the MSC in the generic number parameter of the CAP CONNECT message. The MSC sets the calling number to the FPMN MSISDN and initiates the ISDN User Part (ISUP) [10] procedure. However, when the caller wants to use his HPMN MSISDN when originating a call, another procedure is used, as depicted in Fig. 3. The user sends the called MSISDN with a predefined feature code, and the MSC sends the IDP message including the feature code. When the SCP receives the feature code, it sets the generic number in the CAP CONNECT message to the HPMN MSISDN and sends the message to the MSC. The MSC then sets the calling number to the HPMN MSISDN and initiates the ISUP procedure.
Fig. 2. This is the modified mobile-originated call procedure when the mobile station originates a call with the FPMN MSISDN in the FPMN
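A sketch of this SCP decision follows: the CONNECT generic number carries the FPMN MSISDN by default, or the HPMN MSISDN when the caller dialed the feature code. The feature code value and message structures are invented for illustration.

FEATURE_CODE = "*135"  # hypothetical prefix meaning "present my HPMN number"

def handle_idp(dialed: str, hpmn: str, fpmn: str) -> dict:
    """Return the CONNECT parameters for a mobile-originated call."""
    if dialed.startswith(FEATURE_CODE):
        # Caller explicitly asked for the HPMN MSISDN as the calling number.
        return {"connect_to": dialed[len(FEATURE_CODE):], "generic_number": hpmn}
    # Default while roaming: present the FPMN MSISDN.
    return {"connect_to": dialed, "generic_number": fpmn}

# Default: the FPMN MSISDN is used as the calling number.
print(handle_idp("6305556789", hpmn="821011112222", fpmn="16301112222"))
# With the feature code, the HPMN MSISDN is presented instead.
print(handle_idp("*1356305556789", hpmn="821011112222", fpmn="16301112222"))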
3.3 Mobile-Terminated Call Procedure

When a call originates in a FPMN and terminates at an outbound roamer in the same FPMN, 3GPP standards prescribe the procedure in Fig. 4. When a caller originates a call at an O-MSC in the FPMN, the O-MSC sends the Initial Address Message (IAM) to the GMSC in the same network.
Fig. 3. This is the modified mobile-originated call procedure when the mobile station originates a call with the HPMN MSISDN in the FPMN
Since the dialed number is of international type, the GMSC relays the IAM message to an international ISUP carrier, which relays it to the GMSC in the HPMN. The GMSC then sends a MAP Send Routing Information (SRI) message to the HLR in the HPMN to query the location of the terminating mobile station. The HLR sends a Provide Roaming Number (PRN) message to the GLR in the FPMN to query the Mobile Station Roaming Number (MSRN) of the mobile station. The GLR relays the PRN message to the Terminating MSC (T-MSC), and the T-MSC sends the MSRN to the HLR through the GLR in the PRN ACK message. After the GMSC in the HPMN receives the MSRN from the HLR in the SRI ACK message, it changes the destination address of the IAM message to the MSRN and sends the message to the international ISUP carrier. The carrier relays the message to the GMSC in the FPMN, which relays it to the T-MSC. As soon as the T-MSC receives the IAM message, it checks whether the MSRN belongs to its managed range. If it does, the MSC pages the terminating mobile station and completes the ISUP signaling path. In the proposed scheme, however, the mobile-terminated call procedure is modified as shown in Fig. 5. This call flow applies when an outbound mobile station has performed the location update procedure in a FPMN, has the corresponding FPMN MSISDN, and a caller in the FPMN dials that FPMN MSISDN. When the originating mobile station requests a call at an O-MSC, the O-MSC sends an IAM message to a GMSC. The GMSC is configured to route the call to the SCP if the dialed number falls within the FPMN MSISDN range; this means the service providers must reserve fixed MSISDN ranges for the multiple MSISDNs service and exchange them before launching the service. In this scenario, the SCP has Sub-System Number 6, i.e., it appears as a HLR. Therefore, the GMSC sends a SRI message to the SCP with the FPMN MSISDN.
Fig. 4. This is the standard mobile-terminated call procedure when the mobile station terminates a call in the FPMN
The SCP then relays the message to the HPMN HLR after replacing the called number with the HPMN MSISDN and adding the Calling Name Presentation (CNAP) [11] information. The CNAP information is included in the DISPTEXT parameter and can be set, for example, to "This call was dialed to 630-111-2222", so that the receiver can identify whether the call was dialed to the HPMN MSISDN or to one of the FPMN MSISDNs. The SRI message also contains the Generic Number argument in the Additional Signal Information (ASI), set to the FPMN MSISDN. After the HLR receives the SRI message, it sends the PRN message with the ASI and the DISPTEXT to the T-MSC through the GLR to find the MSRN of the receiver. The T-MSC fetches and stores the Generic Number in the ASI and the DISPTEXT so that it can use them when it receives the IAM message.
Fig. 5. This is the proposed mobile-terminated call procedure when the mobile station terminates a call with the FPMN MSISDN in the FPMN
The T-MSC sends the MSRN to the HLR in the PRN ACK message, and the HLR sends the MSRN to the GMSC in the SRI ACK message. After the GMSC receives the SRI ACK message, it sets the called address to the MSRN and sends the IAM message to the T-MSC. The T-MSC pages the mobile station, sends the Redirecting Party BCD Number set to the FPMN MSISDN together with the CNAP information, and initiates the ISUP procedure. Comparing the standard scheme and the proposed scheme in this scenario, the ISUP path must pass through the international ISUP carrier's network in the standard scheme, while it is constructed within the FPMN in the proposed scheme. That is, the ISUP path in the standard scheme consists of the following nodes: O-MSC, GMSC in the FPMN, international ISUP carrier, GMSC in the HPMN, and T-MSC. Even though the caller and the receiver are located in the same network, the voice bearer passes through the international ISUP carrier's network. In the proposed scheme, the ISUP path is generated within the FPMN and consists only of the O-MSC, GMSC, and T-MSC. This simplifies the voice bearer path and reduces the amount of traffic in the signaling gateways. Since the voice bearer does not pass through the international ISUP carrier's network, the network provider can reduce the full-circuit cost and the network access cost of providing the service.
Comparing the proposed scheme with the methods in Section 2, the proposed scheme is more efficient than [4] since it needs neither multiple IMSIs on a USIM card nor a new type of USIM card. Also, when the mobile phone user receives a call, [5] cannot indicate the dialed number to the receiver, whereas the proposed method indicates the dialed number before the user answers; the receiver can therefore identify whether the HPMN MSISDN or one of the FPMN MSISDNs was used. Furthermore, since the SCP can distinguish the dialed numbers, there are many opportunities to provide various additional services based on them in the future.
4 Performance Evaluation

In this section, we compare the standard scheme and the proposed scheme with respect to transmission cost. As seen in Section 3.3, the two schemes differ greatly in how the ISUP signaling path is built when a call originates in a FPMN and terminates at an outbound roamer in the same FPMN. In this evaluation, the scheme that builds the longer ISUP path incurs the higher transmission cost. To measure it, we define a cost function for transmitting voice traffic:

f(x) ω [(1 − α) βIT + α βLC]    (1)

where ω is the total number of outbound roamers, x is a random variable for the voice call duration, α is the ratio of roamers who have FPMN MSISDNs among ω, βIT is the average cost of using the international ISUP carrier's network, and βLC is the average cost of using the local ISUP network. We assume that if roamers are registered in the multiple MSISDNs service and are roaming in the FPMN, all incoming calls to them are dialed to their FPMN MSISDNs and the proposed scheme is used; if roamers are not registered, the standard scheme is used. We also assume that the random variable x has an exponential probability density function

f(x) = (1/λ) e^(−x/λ)    (2)

where λ is the average value. Since βIT is generally greater than βLC, the cost decreases as α increases. To examine the relation among the cost, α, and x, we conducted a simulation with the following conditions: ω = 5000, λ = 10, βIT = 350, βLC = 11. Fig. 6 shows that as the proportion of roamers who have the FPMN MSISDN increases, the cost of transmitting ISUP traffic decreases. α = 0 indicates that only the standard scheme is used; α = 1 indicates that only the proposed scheme is used, so that no traffic has to pass through the costly international ISUP carrier's network.
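A minimal Python re-creation of this evaluation under the stated assumptions is sketched below; the duration grid follows the x-axis of Fig. 6, and the absolute scale may differ from the figure depending on the units the authors used.

import math

OMEGA, LAM, B_IT, B_LC = 5000, 10, 350, 11  # simulation conditions from the text

def f(x: float) -> float:
    """Exponential pdf of the call duration, Eq. (2)."""
    return (1 / LAM) * math.exp(-x / LAM)

def cost(x: float, alpha: float) -> float:
    """Transmission cost for duration x at FPMN-MSISDN ratio alpha, Eq. (1)."""
    return f(x) * OMEGA * ((1 - alpha) * B_IT + alpha * B_LC)

for alpha in (0.0, 0.33, 0.66, 1.0):
    accumulated = sum(cost(x, alpha) for x in range(1, 38))
    print(f"alpha={alpha:.2f}  accumulated cost ~ {accumulated:,.0f}")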
[Figure 6: y-axis, accumulated transmission cost; x-axis, call duration 1-37; four series, α = 0, 0.33, 0.66, and 1.]
Fig. 6. This graph compares the transmission cost of the standard scheme and the proposed scheme
5 Conclusion This work provides a software framework for associating multiple FPMN MSISDNs with a single HPMN IMSI. If network service providers use this scheme, the users can use not only their HPMN MSISDNs in the HPMN but also their FPMN MSISDNs while they are roaming in foreign countries. Furthermore, when users receive calls which were dialed to the FPMN MSISDNs while they are roaming in the FPMN, the traffic transmission cost is reduced by using the proposed scheme.
References 1. Oudelaar, J.: Evolution towards UMTS. Personal, Indoor and Mobile Radio Communications 3, 852–856 (1994) 2. Koien, G.M.: An introduction to access security in UMTS. IEEE Wireless Communications 11(1), 8–18 (2004) 3. 3GPP, http://www.3gpp.org/Specification-Numbering 4. Jiang, B.T., Mao, J.F.: Design of a PIFA-IFA-monopole in dual-SIM mobile phone for GSM/DCS/Bluetooth operations. In: ICMMT, pp. 1050–1053 (2008) 5. Jiang, Y.J.: Providing multiple MSISDN numbers in a mobile device with a single IMSI. U.S. Patent 7577431 (2009) 6. Chatras, B., Vernhes, C.: Mobile application part design principles. In: XIII International Switching Symposium, vol. 1, pp. 35–42 (1990) 7. Chin, L., Chen, J.: SIM card based e-cash applications in the mobile communication system using OTA and STK technology. In: IET WMMN, pp. 1–3 (2006)
8. Lopez, M., Barba, A.: Evolution of CAMEL Service Platform beyond 3G networks. IEEE Latin America Transactions 4(5), 315–425 (2006) 9. Plehn, J.: The design of location areas in a GSM-network. In: Vehicular Technology Conference, pp. 871–875 (1995) 10. Maranhao, M.: ISDN-ISUP interworking-a critical view. In: SBT International Telecommunications Symposium, pp. 355–359 (1990) 11. Wietfeld, C., Seger, J.: User-oriented Addressing in Wireless Networks: Advanced Strategies and New Technical Solutions. In: 3rd International Symposium on Wireless Communication Systems, pp. 174–178 (2006)
Uplink Interference Adjustment for Mobile Satellite Service in Multi-beam Environments* Ill-Keun Rhee1, Sang-Am Kim2, Keewan Jung3, Erchin Serpedin4, Jong-Min Park5, and Young-Hun Lee6 1
Department of Electronic Engineering, Hannam University, 133 Ojung-dong, Daedeok-gu, Daejeon, Korea [email protected] 2 Department of Electronic Engineering, Hannam University, 133 Ojung-dong, Daedeok-gu, Daejeon, Korea [email protected] 3 University of Waterloo, 200 University Avenue West, Waterloo, ON, Canada [email protected] 4 Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77840-3128, USA [email protected] 5 ETRI, 138 Gajeongno, Yuseong-gu, Daejeon, 305-700, Korea [email protected] 6 Department of Electronic Engineering, Hannam University, 133 Ojung-dong, Daedeok-gu, Daejeon, Korea [email protected]
Abstract. In this paper, uplink interference, which can occur when frequencies are reused, and countermeasures against it are discussed. First, scenarios for uplink interference in the Mobile Satellite Service are designed for both normal and reverse modes. Then a simulator is designed that analyzes the uplink interference of a user link, among the interferences that can occur when frequencies are reused to make efficient use of limited frequency resources in multi-beam satellite environments. The simulator provides the amount of interference in MSS and guidance on reusing frequency efficiently. The results of the interference analysis are given as I/N. Based on these data, resolutions are proposed to minimize the interference by modifying the satellite cells, the diameter of the terrestrial cells, the frequency reuse, and the protective area. Keywords: Uplink interference, Mobile Satellite System, Multi-beam Satellite Service.
1 Introduction

Wireless communications networks, which use radio waves, are usually divided into satellite networks and terrestrial networks. A satellite network uses a satellite as the medium to form a communication network, while a terrestrial network
* This work was partially supported by ETRI (KI002072) in 2009.
forms a network between terrestrial transmitting and receiving stations. A satellite network can efficiently provide mobile communication and other high-performance services where terrestrial networks cannot reach, such as to airplanes, ships, or trains; the use of satellite communication services via satellite networks is therefore increasing. A satellite network can also complement a terrestrial network to extend service areas and to provide broadcast or multicast services efficiently [1], [2]. There are two typical ways in which satellites operate in orbit. In the single-beam method, the satellite uses one beam to cover the whole service area. In the multi-beam method, the coverage area is divided into small separate areas served by multiple beams. The benefit of the single-beam method is that a geosynchronous satellite covers one third of the earth's surface. The multi-beam method uses a large antenna to produce many beams of small radius; because it uses multiple beams, it can reuse frequencies in beams that are far apart. Frequency reuse increases transmission efficiency and received signal strength, so it can expand transmission capacity while allowing smaller receiving antennas, which is an economic benefit [3]. To enjoy the benefits of both satellite and terrestrial networks, a device that can use both is needed. To provide high-speed satellite multimedia services through personal mobile devices, the current single-beam satellite transponder needs to be replaced with a new transponder that offers higher frequency efficiency, smaller size, and broadcast multi-beam capability [4]. However, research shows that co-channel interference at such a device will become a critical issue and eventually cause serious problems. Thus, in this paper, the co-channel interference that may occur at the device is investigated, concentrating on the Mobile Satellite Service (MSS). Interference scenarios are set up as follows: interference on the uplink in normal and reverse modes, and interference on the downlink in normal and reverse modes. To make the scenarios realistic, parameters of the signal source, noise, and interference are set prior to the simulation. After the simulations of each scenario, the results are shown in diagrams, and where the results do not satisfy the expected values, new measures to meet the standards are presented. The remainder of this paper is organized as follows: Section 2 analyzes uplink interference and proposes countermeasures to minimize it, and Section 3 concludes.
2 Uplink Interference Analysis

2.1 Uplink Interference Scenarios

For satellite and terrestrial networks, there are a frequency sharing method and a frequency separation method. The frequency sharing method lets the terrestrial system use the frequency range except for the frequencies that the satellite cell uses. It increases frequency efficiency, but there is a high chance of interference. In contrast, the separation method separates the frequency ranges of the satellite system and the terrestrial system; interference between the systems does not occur, but frequency
Fig. 1. Uplink Interference Scenario of Normal mode
efficiency is very low. For each method, there are two modes: normal mode and reverse mode. In normal mode, the uplink and downlink frequency assignments of the satellite system and the terrestrial system are the same, while in reverse mode they are swapped. Figure 1 illustrates the interference in normal mode, where the up and down frequencies of the satellite system and those of the terrestrial system are the same. Figure 2 shows the interference of the two systems in reverse mode; in other words, the up and down frequencies of the satellite system are the opposite of those of the terrestrial system. As described in Figures 1 and 2, interference can be expected when the same frequency is used in both the satellite and terrestrial systems in a nearby area. In the diagrams, dotted lines represent interference signals and straight lines represent wanted signals. The numbers in Figures 1 and 2 mark the uplink interference cases that can occur between the satellite network and the terrestrial network when they share a frequency range; the cases are listed in Table 1.
Fig. 2. Uplink Interference Scenario of Reverse mode
Table 1. Uplink interference parameters

Mode          Number  Source of Interference                        Wanted Receiver  Wanted Transmitter
Normal Mode   1       Other satellite device + terrestrial device  Satellite        Satellite device
Normal Mode   2       Other terrestrial device + satellite device  Base station     Terrestrial device
Reverse Mode  3       Other satellite device + base station        Satellite        Satellite device
Reverse Mode  4       Other terrestrial device + satellite         Base station     Terrestrial device
Table 1 lists the source of interference, the wanted receiver, and the wanted transmitter in normal mode and reverse mode. The interference varies depending on the antenna pattern, the propagation loss, the transmitted/received power, and so on in each system. A few assumptions are made in this analysis. First, the specifications of the satellite device and the terrestrial device are the same. Second, the frequency used by a satellite cell cannot be reused in a base cell placed within that satellite cell. Third, the base station is located in the center of the base cell. Last, all cases are considered to be worst cases. The interference scenarios in Figures 1 and 2 assume one satellite when considering the ratio of the number of satellite devices, terrestrial devices, and base stations.
Fig. 3. Uplink - Satellite – Normal mode
Figure 3 is a detailed diagram of the uplink interference when the wanted receiver is the satellite in normal mode. The sources of interference are other satellite cells of the mobile satellite that use the same frequency, and terrestrial devices within nearby satellite cells that also use the same frequency. In Figure 3, the black straight line represents the wanted signal, the red dotted line the interference from terrestrial devices, and the green dotted line the interference signal from satellite devices.
Figure 4 illustrates the uplink interference where the wanted receiver is the satellite in reverse mode. In this case, the sources of interference are other satellite cells of the mobile satellite that use the same frequency, and base stations within surrounding satellite cells that also use the same frequency. The black straight line in Figure 4 represents the wanted signal, the red dotted lines the interference from base stations, and the green dotted line the interference signals from satellite devices.
Fig. 4. Uplink-Satellite–Reverse mode
Fig. 5. Uplink-BS–Normal mode
Figure 5 shows the uplink interference when the wanted receiver is the base station in normal mode. The sources of interference are terrestrial devices within the same and other satellite cells, and satellite devices in other satellite cells using the same frequency. In Figure 5, the red dotted line represents interference from terrestrial devices within other satellite cells, and the green dotted line the interference signal from the mobile satellite. The interference signal from mobile devices within the same satellite cell is ignored.
Fig. 6. Uplink–BS Reverse mode
Figure 6 shows the uplink interference when the wanted receiver is the base station in reverse mode. In this case, terrestrial devices within its own satellite cell, terrestrial devices from other satellite cells, and the side lobes of other satellite cells are the sources of interference. In Figure 6, the red dotted line represents interference from terrestrial devices, and the green dotted line interference from the satellite.
Table 2. Channel capacity parameters

Parameter                           Set Value
BWT (whole bandwidth)               4.3 MHz
BWU (user bandwidth)                180 kHz
ES/N0 (signal-to-noise rate)        0.6
va (voice activity rate)            0.5
S (sector rate)                     2.55
FR (coefficient of frequency use)   1
In order to analyze the uplink interference, it is first necessary to calculate the numbers of satellite devices and terrestrial devices. Substituting the values from Table 2 into Equation (1) gives the value 203; multiplying this by the number of base cells, obtained from the base cell plan, gives the number of mobiles that cause interference. The channel capacity C is computed from the parameters in Table 2 as

C = (BWT / BWU) / ((ES/N0) × va) × (S / FR)    (1)
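The snippet below verifies the reconstructed Equation (1) against the Table 2 parameters; it is a sketch of the stated calculation, not the authors' simulator.

BWT, BWU = 4.3e6, 180e3     # whole / user bandwidth (Hz)
ES_N0, VA = 0.6, 0.5        # signal-to-noise rate, voice activity rate
S, FR = 2.55, 1             # sector rate, frequency-reuse coefficient

C = (BWT / BWU) / (ES_N0 * VA) * (S / FR)
print(round(C))  # -> 203, matching the value quoted in the text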
Assume that the terrestrial system uses 3GPP LTE [5], [6]. The whole bandwidth is taken as the 30 MHz assigned to the uplink divided by the reuse factor of 7. The voice activity rate is based on voice communication being active 35-40% of the time, but since the number of active users in a sector can exceed the average, it is set to 0.5. A typical base station uses three antennas, each covering a 120-degree beam, so each sector receives one third of the interference signal and system capacity roughly triples; in practice, antenna patterns overlap and sectors interfere with each other, so the sector rate is set to 2.55 [3]. The frequency reuse coefficient is set to 1 because high-speed data transfer must be provided within limited frequency resources.

2.2 Analyzed Uplink Interference and Countermeasures

In this section, the analysis assumes that satellite devices and terrestrial devices have the same parameters. Equation (2) gives the output performance of the transmitting system, the effective isotropically radiated power (EIRP):

EIRP = PT + GT [dBW]    (2)

where PT is the transmitted power and GT the transmitting antenna gain. The received power of the antenna is given by Equation (3):

PR = PT + GT + GR − LBS [dBW]    (3)

where GR is the receiving antenna gain. Lastly, LBS is the free-space loss, given by Equation (4):

LBS = 92.45 + 20log(f) + 20log(d) [dB]    (4)

where d is the distance to the satellite in km and f is the uplink frequency in MHz.
Equation (5) gives the noise power:

N = kBT [dBW]    (5)

where k is Boltzmann's constant, 1.379 × 10^-23 (−228.6 dB), B is the noise bandwidth, and T is the absolute temperature. Taking the wanted receiver's thermal noise power as the reference, if the allowed amount of interference is 6%, then I/N should be below −12.2 dB; if 10%, below −10 dB; and if 20%, below −7 dB [7].
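To illustrate the link-budget arithmetic of Equations (2)-(5), here is a small Python sketch using the Table 3 values; it computes a single-entry interference level only, whereas the simulator aggregates many interferers, so the printed number is purely illustrative.

import math

def free_space_loss_db(f_mhz: float, d_km: float) -> float:
    return 92.45 + 20 * math.log10(f_mhz) + 20 * math.log10(d_km)  # Eq. (4)

def received_dbw(pt_dbw, gt_dbi, gr_dbi, f_mhz, d_km):
    # Eq. (3); the first two terms are the EIRP of Eq. (2).
    return pt_dbw + gt_dbi + gr_dbi - free_space_loss_db(f_mhz, d_km)

# Table 3 values: device Tx 26 dBm = -4 dBW, device gain 0 dBi,
# satellite Rx gain 54.4 dBi, uplink 1995 MHz, GEO altitude 35786 km.
i_dbw = received_dbw(-4.0, 0.0, 54.4, 1995.0, 35786.0)
n_dbw = -228.6 + 27.97 + 52.55  # Eq. (5): k + T (dB.K) + B (dB.Hz)
print(f"single-entry I = {i_dbw:.1f} dBW, N = {n_dbw:.1f} dBW, "
      f"I/N = {i_dbw - n_dbw:.1f} dB")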
Fig. 7. Flowchart for Uplink Interference Analysis
Figure 7 is a flowchart of the simulation; the I/N value is calculated from the interference power and the noise power using the parameters in Table 3. If the calculated value does not satisfy the protection ratio, the protective areas are re-measured until it does. Four cases of uplink interference are analyzed using the simulation parameters and the satellite antenna pattern from Tables 2 and 3. The results assume the frequency sharing method, in which the whole frequency range is shared between the terrestrial system and the satellite system. MS_FR is the maximum number of frequency reuses within one satellite cell, and it is one less than the number of satellite network frequency reuses. Figure 8 shows the amount of interference to a single channel as the diameter of the base cell and MS_FR change; as both increase, the interference clearly decreases. In Figure 9, to set the protective area in practice, the diameter of the base cell is set to 5 km and MS_FR to 6, and the protective area is varied to observe the interference from 20 cells and from 7 cells.
Table 3. Parameters for uplink interference analysis

Parameter                        Value
Altitude of satellite            35,786 km
Satellite antenna pattern        Rec. ITU-R S.672-4 [8]
Maximum satellite antenna gain   54.4 dBi
Satellite transmitted power      30 dBm
Maximum device antenna gain      0 dBi
Device transmitted power         26 dBm
Height of device                 1.5 m
Uplink frequency                 1995 MHz
Whole bandwidth                  30 MHz
User bandwidth                   180 kHz
Noise temperature                27.97 dB.K
Noise bandwidth                  52.55 dB.Hz
Satellite network loss           Free-space loss
Terrestrial network loss         COST-231 Hata model
The results show that the minimum protection ratio of −7 dB is satisfied when the diameter of the base cell is 5 km, MS_FR is 6, the protective area is greater than 85 km, and only 10% of the mobiles in the base cells of the 7 cells are in use.
Fig. 8. Uplink-Satellite–Normal mode
Fig. 9. Uplink -Satellite–Normal mode
However, these scenarios are not realistic and have low frequency efficiency, so the reverse mode interference is considered next. Figure 10 shows the results in I/N when the base cell diameter is 2 km, 5 km, and 10 km while MS_FR is varied; the amount of interference is about 10 dB lower than in normal mode. From Figure 11, the protection ratio of −7 dB (interference equal to 20% of the noise) is satisfied when the diameter of the base cell is 10 km, MS_FR is 6, and the protective area is about 55 km. Figure 12 shows the results in I/N as the diameter of the base cell changes to 2 km, 5 km, and 10 km; even when MS_FR and the cell size are at their minimum, the minimum protection ratio of −7 dB is not satisfied, so the protective area must be reset. Figure 13 shows the results in I/N when MS_FR is set to 6 and the diameter of the base cell is 5 km and 10 km, as the protective area is changed. The minimum protection ratio of −7 dB is satisfied when the base cell size is 10 km and the protective area is greater than 60 km.
Fig. 10. Uplink - Satellite – Reverse mode
Fig. 12. Uplink - BS – Normal mode
Fig. 11. Uplink - Satellite – Reverse mode
Fig. 13. Uplink - BS – Normal mode
Figure 14 shows the results in I/N as the diameter of the base cell is set to 2 km, 5 km, and 10 km with varying MS_FR. When MS_FR is greater than 5 and the diameter of the base cell is greater than 5 km, the minimum protection ratio of −7 dB is satisfied. The main factor behind the difference in interference between normal mode and reverse mode is the interference source: in normal mode it is the satellite mobiles, while in reverse mode it is the side lobe of the satellite. Because the sources differ, the interference is about 10 dB greater in normal mode. In Figure 15, MS_FR is set to 6, and the protective area is varied with the base cell size set to 5 km and 10 km, respectively. In this case, regardless of the protective area, when MS_FR is 6 and the base cell is 10 km, the minimum protection ratio of −12.2 dB (interference equal to 6% of the noise) is satisfied.
Fig. 14. Uplink - BS – Reverse mode
Fig. 15. Uplink - BS – Reverse mode
3 Conclusion

The use of a multi-beam satellite system can provide broadband service, reduce the cost of satellite construction, and increase the efficiency of frequency reuse. Because of these benefits, the use of multi-beam satellites is increasing, but the problem of co-channel interference is increasing at the same time, and specific ways of minimizing the interference have not yet been categorized. Therefore, in this paper, uplink interference in normal and reverse modes for MSS in a multi-beam environment was analyzed by simulating each of the scenarios designed for this research; as stated above, uplink interference in both reverse mode and normal mode was considered. The main purpose of this simulation and research was to find a way to share frequencies efficiently. To this end, the numbers of satellite cells and base cells, the frequency reuse, and the protective areas were modified to obtain the best result. The results showed that interference was lower in reverse mode. Methods to satisfy the protection standard by adjusting the protective area or the degree of frequency reuse were also proposed. These results support full-duplex communication in multi-beam satellite environments and in MSS. It is hoped that this paper contributes to the development of other MSS and to the commercialization of future-generation high-speed multimedia satellite services.
References 1. Roddy, D.: Satellite Communication, 3rd edn. McGraw-Hill, New York (2001) 2. Elbert, B.R.: Introduction to Satellite Communication, 3rd edn. Artech House, Boston (2008) 3. Egami, S.: A power-sharing multiple-beam mobile satellite in Ka band. IEEE J. of Select. Areas Comm. 17(2), 145–152 (1999) 4. Lim, K., et al.: Adaptive Radio Resource Allocation for a Mobile Packet Service in Multibeam Satellite System. ETRI Journal 27(1), 44–52 (2005) 5. Rojat, B.J., de Santis, P.V., Yeung, P.S.: Intel Sat’S Broadband System Studies. IEE (1998) 6. Lavncic, W.D., Brooks, D., Frantz, B., Holder, D., Shell, D., Beering, D.: NASA’s broadband satellite networking research. IEEE 37(7), 40–47 (1999) 7. Recommendation ITU-R M.1183, Permissible levels of interference in a digital channel of a geostationary network in mobile-satellite service in 1-3GHz caused by other networks of this service and fixed-satellite service (1995) 8. Recommendation ITU-R S. 672-4, Satellite antenna radiation pattern for use as a design objective in the fixed-satellite service employing geostationary satellite. pp. 1–15
Protecting Computer Network with Encryption Technique: A Study Kamaljit I. Lakhtaria MCA Department, Atmiya Institute of Technology & Science, Yogidham, Rajkot, Gujarat India [email protected]
Abstract. In today's world, networking plays a very important role in our lives, and most activities occur through the Internet. For the safe and secure exchange of information, we need security. Encryption has very wide application in securing data, and the latest authentication methods use biometrics such as fingerprint and retina scans. The different types of encryption, their strengths, and their standards are discussed. Keywords: Encryption, Network Security, Authentication, Biometric.
1 Introduction

Today's global village concept has brought many unacquainted people together via electronic media and information technology. Most people in today's world are familiar with the Internet and World Wide Web applications, yet 40% of them [1] still browse unsafely. In a global village, many transactions happen between people every second. To make sure they are safe every time, there must be some technology that assures their safety: this is encryption. The concept of encryption is very old. A Greek general would send messages through his messengers using a scytale, a thin cylinder made of wood [2]; the message was written on a parchment that appeared to be nonsense if someone else tried to read it, but the receiving general could read it by wrapping the parchment around a similar scytale. As time passed, the techniques changed; decoding a modern encrypted message can take millions of years. Why do we really worry about our transactions, and do we conduct them securely all the time? Because the Internet has so many connections, we have no control over activities such as hacking, sniffing, and breaches of conduct. The solution to this problem is to come up with a robust system, a system that is difficult to break. Modern cryptography was born in the 1970s. "Cryptographic systems are characterized along three independent dimensions, the type of operations used transforming plaintext to cipher-text, the number of keys used and the way in which the plaintext is processed" [3]. On the other hand, we have two methods to analyze cryptography: cryptanalysis and the brute-force attack. These are independent techniques for determining which cryptographic method was adopted.
2 What Is Encryption?

Encryption refers to a set of algorithms used to convert plain text into code, an unreadable form of text, thereby providing privacy. To decrypt the text, the receiver uses the "key" for the encrypted text. Encryption has long been the method of securing data vital to military and government operations, and it has now stepped into civilians' day-to-day lives as well. Online bank transactions, data transfer via networks, and the exchange of vital personal information all require encryption for security reasons. We may not fully understand what encryption does in our day-to-day lives, but we use it often. Most businesses now depend on the Internet to buy, sell, transfer money over e-banking systems, organize, teleconference, and provide various services, all of which need encryption for safe connections and privacy. Earlier computers did not have networking facilities. With the growth of the networking industry, software became more readily available to individuals, the advantages of doing business over the Internet were realized, and encryption took on its own importance in keeping unauthorized people away from the network.
3 Basic Model of Network Security

Network security requires the following basic structure:

a) an algorithm that performs the security-related transformation;
b) secret data that is used by the algorithm;
c) methods for sharing and distributing the secret data; and
d) a protocol that makes use of both the algorithm and the secret data to provide the security service.
As the model below shows, the network model has this basic structure. Opponents try to breach the security barrier to obtain the information. The message travelling through the information channel is not in plain form; it is blended with the secret data, yielding ciphertext under keys provided by a trusted third party. Unless one knows the key, the message cannot be decrypted properly. Encryption keeps the message safe and secure until it reaches the desired recipient. [4]
Fig. 1. Block diagram for Network Security
4 Classification of Encryption

4.1 Symmetric Key

This is also known as single-key encryption: the same key is used for both encryption and decryption of the text. Types of symmetric-key algorithms:

Stream Cipher: The plain text is encrypted one unit at a time, each unit being converted into successively varying digits. Examples: RC4, SEAL. Sample example: "We are spartans" is written as "ZH DUH VSDUWDQV"
Fig. 2. Caesar Cipher
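The sample above is a Caesar shift of three (strictly a classical substitution rather than a modern stream cipher); a short Python sketch that reproduces it:

def caesar(text: str, shift: int = 3) -> str:
    """Shift each letter by `shift` positions, preserving spaces."""
    out = []
    for ch in text.upper():
        if ch.isalpha():
            out.append(chr((ord(ch) - ord('A') + shift) % 26 + ord('A')))
        else:
            out.append(ch)
    return ''.join(out)

print(caesar("We are spartans"))      # -> ZH DUH VSDUWDQV
print(caesar("ZH DUH VSDUWDQV", -3))  # decryption reverses the shift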
Block Cipher: Blocks of plain text are encrypted; each block has a fixed length and unvarying digits. Examples: Rijndael, IDEA (International Data Encryption Algorithm). Sample example: "We are spartans" is written as "25 51 11 24 51 34 53 11 24 44 11 33 34"
Fig. 3. Block Cipher
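The numeric sample above corresponds to a Polybius-square substitution read as the column digit followed by the row digit, with I and J sharing a cell (again a classical cipher rather than a modern block cipher); a sketch that reproduces it, assuming that reading:

# 5x5 Polybius square (I/J share a cell); each letter is encoded as its
# column digit followed by its row digit.
SQUARE = ["ABCDE", "FGHIK", "LMNOP", "QRSTU", "VWXYZ"]

def polybius(text: str) -> str:
    codes = []
    for ch in text.upper().replace("J", "I"):
        if not ch.isalpha():
            continue
        for row, letters in enumerate(SQUARE, start=1):
            col = letters.find(ch) + 1
            if col:
                codes.append(f"{col}{row}")
                break
    return " ".join(codes)

print(polybius("We are spartans"))
# -> 25 51 11 24 51 34 53 11 24 44 11 33 34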
4.2 Asymmetric Key

This uses two different keys: a public key for encryption and a private key for decryption. Since symmetric key encryption alone does not provide sufficient security, the asymmetric key is all the more important. It is also known as public key encryption. It uses a combination of a public key and a private key: the private key is known only to your computer, while the public key is given to any computer that wants to communicate with it securely. Everyone may have the public key, but only the private key can decode the message. The key pair is based on prime numbers, which makes it highly secure: there are as many possible keys as there are prime numbers. Pretty Good Privacy (PGP) is one of the most popular public key encryption programs. [5] Public key encryption can be adopted on a large scale, for example to secure a web server and its applications. A digital certificate or digital signature provides authentication between users; these certificates can be obtained from a Certificate Authority, which plays the role of a middleman for both users.
Fig. 4. Public Key Encryption Model
4.2.1 Public Key Infrastructure (PKI)
To make the most of encryption, public keys must be created, maintained, used, and distributed; for this we need the organization known as a Public Key Infrastructure.

4.2.2 Certificate Authority (CA)
Without a CA, one cannot issue the digital certificate that contains both the public and private keys used to encrypt and decrypt data. Depending on the depth of identity verification, a Certificate Authority can issue digital certificates at different levels of trust. A CA identifies individuals rather than going by company: to verify an individual, it can ask for a driver's license or a notarized letter as proof of identity. This applies only to the initial level of trust; for a high level of trust, it can require biometric information such as a fingerprint or iris scan. [6]

4.2.3 Registration Authorities (RAs)
These have functionality similar to a CA, but RAs sit one level down in the hierarchy. They work under the CA, mainly to reduce the workload of the Certificate Authority. An RA can issue temporary digital certificates, which have limited validity and are not fully trusted until the CA verifies them completely. [6]

4.2.4 Digital Certificates
These certificates are used to verify the identity of a person or a company through a CA. They can also be used to convey rights and authority; some grant limited access, such as encrypt and decrypt only. Digital certificates can be issued for particular laptops, computers, routers, etc., and computers and web browsers can store them in dedicated memory. [6]

4.2.5 RSA
The most recognized asymmetric algorithm, RSA, is named after the last names of its inventors, Ron Rivest, Adi Shamir, and Leonard Adleman, who developed it in 1978; it has been widely used ever since. Other algorithms generate asymmetric keys, such as ElGamal and Rabin, but they are not as popular as RSA, because the large corporation RSA Data Security stands behind it.
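To show the structure RSA builds on, here is a toy Python key generation and round trip with tiny primes; real deployments use primes hundreds of digits long, so this is illustrative only.

p, q = 61, 53
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent: modular inverse of e (Python 3.8+)

m = 42                     # message encoded as a number < n
c = pow(m, e, n)           # encrypt with the public key (e, n)
assert pow(c, d, n) == m   # decrypt with the private key (d, n)
print(f"n={n}, e={e}, d={d}, cipher={c}")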
5 Secure Sockets Layer (SSL) and Transport Layer Security (TLS) [7]

Encryption is employed in the Secure Sockets Layer, an Internet security protocol used by web servers and browsers that was first developed by Netscape. It became the main part of the security layer known as Transport Layer Security. SSL and TLS make use of a CA (Certificate Authority); whenever a browser retrieves a secure web page, there is an additional 's' after the 'http'. The browser checks three things when receiving the public key and certificate:

1. the certificate is valid;
2. the certificate comes from a trusted party; and
3. the certificate has a proper relation to the web site it is coming from.
When initiating a secure connection, one of the two computers generates a symmetric key and sends it to the other using public key encryption. The two computers can then communicate safely using symmetric key encryption. Once the session is over, each discards the symmetric key used for that session; the next session requires fresh keys, and the cycle is repeated.
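For a concrete flavor of the client side of this handshake, here is a brief sketch using Python's standard ssl module; it assumes network access and that the chosen host serves TLS, and the handshake internally performs the certificate checks and key exchange described above.

import socket
import ssl

ctx = ssl.create_default_context()  # loads the trusted CA certificates
with socket.create_connection(("www.example.com", 443)) as sock:
    # wrap_socket performs the handshake: certificate validation, hostname
    # matching, and the key exchange that yields the session keys.
    with ctx.wrap_socket(sock, server_hostname="www.example.com") as tls:
        print(tls.version())                 # negotiated protocol, e.g. TLSv1.3
        print(tls.getpeercert()["subject"])  # identity asserted by the certificate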
6 Security Attacks [8]
The greater the benefits, the greater the risk: as more people rely on the Internet and on networking, security attacks multiply as well. Attacks often have a clear purpose, such as stealing user names, passwords, credit card details, social security numbers, personal identification numbers or any other details that can be converted into benefits and services. There are two main types: 1. passive attacks and 2. active attacks.

6.1 Passive Attack
A passive attack does no harm to system resources but tries to learn and make use of system information. An unauthorized party gains access to the system but does not modify the content or the data. The attacker observes the pattern of communication and uses this information for a later attack. This is also known as traffic analysis.

6.2 Active Attack
An active attack tries to alter system resources and adversely affect their operation. Here the unauthorized person successfully gets into the system and is able to modify a message, a data stream or a file. Such attacks can take several forms: replay, masquerading, message modification and denial of service (DoS).
7 Encryption Standards [9]
7.1 Data Encryption Standard (DES)
The most commonly used encryption programs have long been based on the Data Encryption Standard, adopted in 1977 by the National Bureau of Standards (NBS), now known as the National Institute of Standards and Technology (NIST). The algorithm applied to the data is known as the Data Encryption Algorithm. It has a key length of 56 bits: it takes a 64-bit block of data and encrypts it using the 56-bit key, giving 2^56 possible keys, over 72,000,000,000,000,000. DES was long considered secure, but as time passed the speed of computers increased tremendously, and today a computer can break the key in a very short time (see Table 1). The standard has appeared in several versions: Federal Information Processing Standard FIPS PUB 46, FIPS PUB 46-2 and FIPS PUB 46-3; FIPS PUB 46-3 is the revised version, introduced in 1999.
The strength of DES depends on two things: the key length and the nature of the algorithm used.

7.2 Advanced Encryption Standard (AES)
Because of its small key length, DES is no longer considered safe for today's applications. AES is a symmetric block cipher with a key length of 128 bits, and it also supports key sizes of 192 and 256 bits. The algorithm adopted in AES is Rijndael, developed by Vincent Rijmen and Joan Daemen. It was chosen from among 15 contestants in a competition organized by NIST (National Institute of Standards and Technology) in 1997; it took three years to arrive at the final winner, which survived all the tests. With 128-bit keys the number of possible keys is 2^128, over 3.4 x 10^38, which is far stronger than DES: the time a computer would need to crack the key is measured in years.

Table 1. Average time required for exhaustive key search
Key Size (Bits) | Number of Alternative Keys | Time Required at 1 Encryption/μs | Time Required at 10^6 Encryptions/μs
56 (DES) | 2^56 = 7.2 x 10^16 | 2^55 μs = 1141 years | 10.01 hours
128 (AES) | 2^128 = 3.4 x 10^38 | 2^127 μs = 5.4 x 10^24 years | 5.4 x 10^18 years
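The figures in Table 1 follow from simple arithmetic. The short Python calculation below is our own illustration; it reproduces the order of magnitude of the last column, assuming a machine that tries 10^6 keys per microsecond and, on average, has to search half the key space.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
RATE = 10**6 * 10**6   # 10^6 keys per microsecond = 10^12 keys per second

for bits, name in ((56, "DES"), (128, "AES")):
    avg_seconds = (2 ** bits / 2) / RATE   # average case: half the key space
    print(f"{name} ({bits}-bit): {avg_seconds / 3600:.2f} hours, "
          f"{avg_seconds / SECONDS_PER_YEAR:.2e} years")
# Prints roughly 10 hours for 56-bit DES and about 5.4e+18 years for 128-bit AES.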
8 What Does Encrypted Text Look Like?
Below is a sample of encrypted text:
OT8Hj8uA7inuG2tq5YjID5mESlm2F7exsc0X3fnZRFZhk8YCEpbnDLtOfSAGdK9+cO7/Db3MFlPJwCcAzlCJMElczhvsf2qVAcTDmsMEt3TVQFqolayND7cj4ldksCL3JDu83wup6wuYZnw05QekQ3MGd3heMpqk4Yx3211yfV7SEQQoUAlz3Y6TwyzC5UWehb0a2dIWils1v6ZGZ1aVzih3AmrK53+JVQ0pBMj6wbRq/LRtZvoPNA2qLUZE4o4UTKH5G9ElnqrnxBvt3WukDcm1BxdwEtTCY9K/7Qq6X8=
The actual plain text is: "Encryption refers to a set of algorithms which are used to convert the plain text to code, or an unreadable form of text, and provide privacy. To decrypt the text the receiver uses the 'key' for the encrypted text." [10]
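Ciphertext of this kind can be produced in a couple of lines. The sketch below is our illustration using the Fernet recipe from the pyca/cryptography package, which, like the sample above, emits URL-safe base64 text; we make no claim that it is the tool behind the sample at [10].

from cryptography.fernet import Fernet

key = Fernet.generate_key()   # a random symmetric key, itself base64-encoded
f = Fernet(key)
token = f.encrypt(b"Encryption refers to a set of algorithms ...")
print(token)                  # base64 text resembling the sample, e.g. b'gAAAAAB...'
assert f.decrypt(token).startswith(b"Encryption")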
9 Applications of Encryption
9.1 Data Protection
The widest application of encryption is in the field of data protection. The data held in computers is invaluable to the person or the company that owns it, and encryption is necessary for safeguarding that data or information. Common applications include file and email encryption. Encryption protects
the data stored on the hard disk in situations such as the theft of a computer or an attacker successfully hacking into a system. Managing encryption becomes more difficult as access to the system increases: the more people working in a particular company, the more widely the key must be shared, and hence the lower the effectiveness of the encryption.

9.2 Authentication
Authentication means proof of identity. When a user logs in or signs in to a system with a user name or ID and a password, the details are sent over the network using encryption. This can be verified as soon as the login page appears in the browser: an additional 's' appears after 'http', meaning the connection is secured. Whenever we use online banking for transactions, we should confirm that the connection is secure by looking for the padlock sign at the bottom of the browser or in the address bar itself (see Fig. 5).
Fig. 5. HTTPS and the padlock in the address bar
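The checks the browser performs can also be reproduced from a script. As a minimal sketch (ours, using Python's standard ssl module; www.python.org is an arbitrary example host), the default context below verifies the certificate chain against trusted CAs and matches the certificate to the host name, raising an error if any check fails.

import socket
import ssl

host = "www.python.org"              # example host, chosen arbitrarily
ctx = ssl.create_default_context()   # loads the system's trusted CA certificates
with socket.create_connection((host, 443)) as sock:
    # wrap_socket raises ssl.SSLCertVerificationError for an invalid,
    # untrusted or mismatched certificate: the three checks listed in Sect. 5.
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        cert = tls.getpeercert()
print(cert["subject"], cert["notAfter"])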
10 Biometric Authentication [11]
Some advanced systems require biometric authentication; logging in to a sophisticated laptop, for example, may involve face recognition or fingerprint scanning. Biometric authentication methods include: 1. Iris scan (retina scan) 2. Face recognition 3. Voice recognition 4. Fingerprint

10.1 Biometric Encryption (BE)
Biometric encryption is a process that binds a PIN or key to a biometric. It is not possible to recover either the key or the biometric from the stored master file or template; the key can be re-created only by presenting a live biometric sample at verification. The digital key is randomly generated at enrollment, and the user has no knowledge of it; the key is completely independent of the biometric. Once the biometric is captured, the BE algorithm binds the key to it securely and stores the result as a private template. When enrollment is over, both the key and the biometric
are discarded. At verification the applicant provides a fresh biometric sample, which, when applied to a legitimate BE template, releases the PIN or key; if the biometric is not correct, the PIN is not released for further application or use. The BE algorithm itself provides the decryption key. The advantages of Biometric Encryption are:
1. No storage of personal data required.
2. No retention of the biometric information or template.
3. Multiple, cancelable, revocable identifiers.
4. No tampering or substitution attacks.
5. Improved authentication security.
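To make the key-binding idea concrete, the toy sketch below (ours, not the scheme of [11]) XORs a random key with an encoded biometric sample so that only the combination is stored: an identical fresh sample releases the key, while anything else yields garbage. A real BE scheme additionally uses error-correcting codes so that naturally noisy biometric samples can still release the key.

import secrets

def enroll(key: int, biometric: int) -> int:
    # Store only key XOR biometric; neither value is recoverable on its own.
    return key ^ biometric

def release(template: int, fresh_biometric: int) -> int:
    # A matching fresh sample releases the key; a different one yields noise.
    return template ^ fresh_biometric

key = secrets.randbits(128)        # digital key, never shown to the user
sample = secrets.randbits(128)     # stand-in for an encoded biometric template
template = enroll(key, sample)     # the stored "private template"
assert release(template, sample) == key
assert release(template, secrets.randbits(128)) != key  # true with overwhelming probability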
11 Future Scope of Encryption
In today's world the protection of sensitive data is one of the most critical concerns for organizations and their customers. This, coupled with growing regulatory pressure, is forcing businesses to protect the integrity, privacy and security of critical information. As a result, cryptography is emerging as the foundation of enterprise data security and compliance and is quickly becoming central to security best practice. Cryptography, once seen as a specialized, esoteric discipline of information security, is finally coming of age. No one would argue that cryptography and encryption are new technologies: it was true decades ago, and it is still true today, that encryption is the most reliable way to secure data. National security agencies and major financial institutions have long protected their sensitive data using cryptography and encryption. Today the use of encryption is growing rapidly, deployed in a much wider set of industry sectors and across an increasing range of applications and platforms. Put simply, cryptography and encryption have become one of the hottest technologies in the IT security industry; the challenge now is to ensure that IT organizations are equipped to handle this shift and are laying the groundwork today to satisfy their future needs [12].
12 Conclusion
The advantages of networking are tremendous and its applications are vast: it saves a great deal of time and energy, and it has touched almost every field, from education and commerce to life-saving telemedicine, so that there is hardly an area where one could say the network is no longer needed. For all of this to work, the services must be reliable and secure, and encryption serves that purpose at all times.
13 Glossary
Authentication: A process used to verify the integrity of transmitted data, mainly messages.
Block Cipher: A type of symmetric encryption algorithm that takes a block of plain text bits and converts it into a cipher text block of the same length; a typical block size is 64 bits.
Cipher: A method that replaces the plain text with another object, mainly to conceal its meaning; the replacement is controlled by a key.
Cipher text: The encrypted form of the text; the output of the encryption algorithm.
Code: The replacement of the original data by other data, with or without fixed rules.
Cryptanalysis: The branch of cryptology that deals with breaking ciphers to recover the data.
Cryptography: The branch of cryptology dealing with the design of algorithms for encryption and decryption, intended to secure data or messages.
Decryption: The transformation of encrypted text back into the original readable text; also known as deciphering.
Encryption: The conversion of plain text into an unreadable form; also known as enciphering.
Key: The value with which the algorithm encrypts or decrypts the text.
Private Key: The key known only to its owner, used in asymmetric encryption.
Public Key: The openly shared key used in asymmetric encryption for securing the data.
Secret Key: The single key shared by both users to encrypt and decrypt the data in the symmetric encryption method.
References
1. Krebs, B.: Forty Percent of Web Users Surf with Unsafe Browsers. The Washington Post, July 1 (2008)
2. Tyson, J.: How Encryption Works: Introduction to How Encryption Works. HowStuffWorks website
3. Stallings, W.: Cryptography and Network Security: Principles and Practices, 3rd edn., p. 45. Pearson Education, Inc. and Dorling Kindersley Publishing Inc., USA (2006)
4. Stallings, W.: Cryptography and Network Security: Principles and Practices, 3rd edn., pp. 32–33. Pearson Education, Inc., USA (2006)
5. Callas, J.: An Introduction to Cryptography (September 2006)
6. Cobb, C.: Cryptography For Dummies: The PKI Primer. John Wiley and Sons, Chichester (2004)
7. Tyson, J.: How Encryption Works: SSL and TLS. HowStuffWorks website
8. Stallings, W.: Cryptography and Network Security: Principles and Practices, 3rd edn., p. 29. Pearson Education, Inc., Dorling Kindersley Publishing Inc., USA (2006)
9. Stallings, W.: Cryptography and Network Security: Principles and Practices, 3rd edn., pp. 90–150. Pearson Education, Inc., Dorling Kindersley Publishing Inc., USA (2006)
10. Online encryption software, http://infoencrypt.com/
11. Cavoukian, A., Stoianov, A.: Biometric Encryption: A Positive-Sum Technology that Achieves Strong Authentication, Security and Privacy, p. 15 (March 2006)
12. Moulds, R.: The Future of Encryption. nCipher website (2008)
A Study on the Strategies of Foreign Market Expansion for Korean IT Venture Company*

Woong Eun1 and Yong-Seok Seo2,**

1 Hannam University
2 Kookmin University
Abstract. This study examines, from a social capital perspective, the efforts of domestic IT ventures preparing to go abroad and the effort and knowledge required to glocalize on the part of companies that have already done so. Drawing on internationalization theory, the study sets up a model in which social capital drives performance and accomplishment, and sums up the outcomes through cases of companies that expanded overseas. The result is that structural social capital influences relational social capital, and this relationship bears fruit by accelerating a company's own glocalization efforts. The findings highlight the importance of relationship building and of the local corporation chief.
Keywords: Social capital, Establishment of trust, Strategy for expanding overseas, IT Venture Company.
1 Introduction
The global economy has been changing continuously in two directions at once: toward globalization and toward the development of local economies. Accordingly, strategies for overseas expansion have evolved in varied and complicated ways as companies globalize. Korea holds world-class IT infrastructure; however, it excels only in the hardware side of the IT industry and remains weak in software and content, the industry's other axis. In the late 1990s the venture investment craze brought on by the venture boom pushed small and medium-sized businesses to go abroad, and many small and medium venture companies expanded overseas without sufficient preparation, which made them vulnerable to failure. The number of ventures, which declined when the IT bubble burst, has since increased with worldwide changes in the business environment, but neither quantitative expansion nor qualitative improvement has been achieved; this can be recognized from the fact that the number of ventures listed on KOSDAQ gradually decreased. Although overseas expansion is a necessary strategy for ventures in Korea's small domestic market, the absence or failure of such strategies is the reason Korean ventures have not developed qualitatively. Many theories exist, from the stage theory of the Uppsala school to the network approach, international new venture theory and born-global theory.
* This research was supported by the 2010 Korea San-Hak Foundation (KSF).
** Corresponding author.
However, the results of those studies cannot directly explain the globalization of Korean IT ventures, because most of them were conducted in America and Northern Europe. In this study, we analyze, from the perspective of social capital theory, the efforts of domestic IT ventures to expand overseas and the glocalization (adapting one's goods to a particular region's tastes and preferences) achieved by the ventures that succeeded in expanding. In particular, we analyze the success factors of IT companies that succeeded in expanding into the Japanese market and identify the competences necessary for expanding overseas. After setting up a model in which social capital leads to performance, in line with globalization theory, we summarize the products and performance observed in the cases of IT ventures that succeeded in expanding overseas. We pattern-matched a group of companies that reached the accumulated break-even point within a specific period after going abroad against a group that did not, and systematically analyzed the process of reaching the stage of developed local management and the process of gaining social capital, glocalizing and producing results.
2 Precedent Studies
2.1 Studies on Glocalization
The stage theory among studies of globalization derives from research on modes of foreign expansion based on the globalization stages of small manufacturing companies in Northern Europe. It shows that a company's globalization process starts with exports triggered by overseas orders and proceeds toward progressively deeper international commitment (Johanson and Wiedersheim-Paul, 1975). Moreover, knowledge of a specific market can only be obtained by direct involvement in that market, because it cannot be transferred from abroad (Anderson, 1990). Because of a lack of knowledge, firms first accumulate knowledge of overseas markets through indirect export to psychically close markets, and this knowledge in turn increases market commitment. Here, psychic distance is the sum of the factors that interrupt the flow of information in a market, as distinct from geographic distance. However, as a theory of a gradual, stepwise expansion process, stage theory has limits in explaining early overseas expansion.
The network view holds that all company operations, including globalization, take place within relationships between organizations and individuals; it emerged to compensate for stage theory's limited ability to account for the rapid overseas expansion of new small ventures. Hara and Kanai (1994) argued that a company needs not only its own competence but also external resources in order to avoid the uncertainty of the business environment and to expand. Blankenburg and Johanson (1992) found that as knowledge and credit increase, market commitment grows, and that as relationships between companies develop through a complicated process, it is very important to exchange not only each organization's abilities but also its knowledge. Bell (1995) observed the client-follower phenomenon, in which a small company does not necessarily seek psychically close overseas markets but enters new nations by following its existing customers, and thereby explained how relationships with existing customers drive globalization. However, while the network view can be useful in explaining the factors behind IT
ventures' overseas expansion, it has limits in accounting for internal factors, since it focuses on how the external network relationship affects product and performance.
The globalization pattern of brand-new small and medium-sized venture companies shows them expanding into many nations at the same time, starting to globalize soon after founding, unlike traditional companies constrained by psychic distance. International New Venture theory accordingly came to the fore, holding that the needed knowledge is acquired through the network rather than built up directly. Grounded in the network view, it explains how venture companies go overseas as early as their founding through factors such as the founder's position, the level of goods and services, competitive strategy, technology and the industrial environment (Oviatt and McDougall, 1994, 1997; Madsen and Servais, 1997; Zahra et al., 2000; Autio et al., 2000; Sharma and Blomstermo, 2003). This theory regards the network and the board of directors' overseas market experience as the major determinants of globalization. Researchers refer to such firms not only as INVs but also as 'born globals' (Rennie, 1993; Madsen and Servais, 1997). INV theory is useful for explaining early overseas expansion and both the internal and the external factors of IT companies. However, because it explains expansion mainly through glocalization factors such as the CEO's past experience or network, it cannot fully capture the traits of Korean venture companies, which typically expand overseas only after securing funds and competitiveness for existing goods in the domestic market.

2.2 Social Capital
Social capital was introduced into sociology by Coleman (1988, 1990) and Putnam (1993) and has since been studied in many fields, including economics, business administration and politics. Coleman (1988) held that social capital is not a single factor but a complex made up of two or more entities and organizations; it forms a society in many ways, and within the societal frame thus constructed it can induce certain behaviors in society's members. Putnam (1993) described social capital as the network, based on inter-organizational trust and norms, that promotes cooperation and mediation for the benefit of all individuals and organizations. The social capital view shows that the network with partners outside the company can be a major factor, unlike the resource-based view, which regards a company's own competitiveness, rather than external factors such as economic conditions, as the major determinant of performance. Social capital is divided into structural, cognitive and relational dimensions, and studies have been conducted to verify these (Tsai & Ghoshal, 1996; Yli-Renko et al., 2002; Li Ling-yee, 2004; Presutti et al., 2007).
The structural dimension refers to the nature of relations between organizations, the form of links between persons and organizations, and the pattern of relations between members and individuals (Granovetter, 1985). It can be described in terms of network centralization, tie strength and multiplicity. Centralization is the degree to which the links of the network center on a certain individual. Tie strength means how stably and strongly an actor is linked with other actors in the network. Multiplicity concerns how diversely the network is used.
The cognitive dimension is the resource formed through the values and cognitive systems shared by individuals in the network; these shared values and cognitive systems are created through informal symbolic processes such as formal actions, shared language and behavior styles, and norms (Nahapiet & Ghoshal, 1998).
The relational dimension is the asset created by the relationships among members and individuals in the network; it is the set of software that operates the network, while the structural dimension is the network's hardware (Nahapiet and Ghoshal, 1998). The relational dimension can be measured by the duration of personal interaction, emotional closeness and friendship within the network (Granovetter, 1985).
3 Methodological Approaches
3.1 The Study Method
This study uses social capital theory to ask what effect social capital has on the management performance of overseas expansion. The case analysis method is adopted to secure the validity of this study. More specifically, we confirm that social capital affects glocalization, and that good financial performance then follows when a company enters the Japanese market. We employ the pattern-matching technique, one of the case analysis methods, to draw out the social capital factors needed for stable growth in an overseas market. The logic of pattern matching is to compare empirically observed patterns with predicted patterns (Trochim, 1989); internal validity is gained when the two match. To strengthen internal validity further, if the result from one case matches the predicted pattern, the finding becomes more powerful once it is also confirmed against a pattern with an independent variable from another case. This is called a 'theoretical replication study across cases', and rival explanation means proving the directly opposed result by comparing the pattern of the independent variable. Through these processes we can secure logical validity for the elementary propositions (Yin, 1989).
The research procedure was to analyze literature data on each company, to draft survey and interview questions, and to review and revise the questionnaires through advance interviews with experienced experts and businesspersons. Interviews were then conducted, using the revised questionnaires, with the overseas delegate or local manager of each selected company. On-site interviews were conducted directly with the local companies in Japan, and on the basis of the on-site survey a second round of interviews was carried out after visits to the headquarters in Korea and to related companies, in order to consider differences between the opinions of the local company, the head office and the related companies.

3.2 Objects of Study
This research studied IT ventures that expanded into Japan, selected from companies surveyed through the Korea Business Center (KBC) of KOTRA. The basic qualifications of a target company are: 1) to have expanded overseas
with its own developed IT goods and services; 2) to have survived at least 5 years after expanding; and 3) to have basic development capability in the home country. Considering the sales figures of head offices and local corporations, we set aside both behemoths and small development studios. Following the IT market overview summarized by Hoch (2000), products were divided into hardware products, hardware maintenance services, software products and services, and processing and internet services; our survey covered software product and service companies. The hardware products segment includes hardware manufacturers such as the old IBM; the hardware maintenance service market, which provides the services needed to maintain and manage hardware, does not differ much from a traditional manufacturing market; and the processing service and internet markets are difficult to analyze effectively because they have strong individual market characteristics, as with embedded software built into goods or SI (system integration). Therefore, this study considers the software goods market and the packaged software market in particular. The profiles of the companies studied are given in Table 1.

Table 1. The profile of the companies
(Columns: A Co. | B Co. | C Co. | D Co.)
Foundation year: 1999 | 1995 | 1991 | 2001
Field of product: Contents management | Security solution | Groupware, BPM | Remote control
Year of expanding: 2000 | 2001 | 1997 | 2002
Establishment of corporation: 2002 | 2002 | 1997 | 2006
Corporate or not: Corporation | Corporation | Corporation | Office
Expanding form: Corporation | Corporation | Corporation | Office
Product character: For service | For package | For service | For package
Partnership: Exclusive distributor, strategic partnership | Diverse channel marketing | Partnership | Sole distributor / big-company OEM / JV
3.3 Models of Study
3.3.1 Derivation of the Model from the Study Propositions
This study asks what effect social capital has on glocalization and on performance. First, we account for the relationship between the structural and relational dimensions of social capital; we then derive the relationship between social capital and glocalization performance as a study proposition.
Trust, the key element of relational social capital, once established, helps to develop and maintain relationships, to build more diverse cooperation and
to improve trust still further (Coleman, 1988). That is, when a person or organization engages in mutual exchange within a network, trust can be built up, and trust in turn improves the mutual exchange (Burt and Knez, 1996). Structural social capital is expressed as inter-organizational relations, link forms or relation patterns (Granovetter, 1992), and, as earlier studies show, close cooperation and a lasting relationship can be maintained in a network in which network centrality, tie strength and multiplicity are high. Such close and continuous interactions keep developing trust through knowledge of and experience with a partner (Gulati, 1995). Therefore, a close and continuous structural network has a positive effect on improving the trust that constitutes relational social capital, by developing trust centered on the social relation.

Proposition 1. Structural social capital encourages the relational dimension to be created.

In this study, the social capital relevant to glocalization is analyzed in terms of the structural and relational dimensions. Cognitive social capital was excluded because, as an abstract notion, it is more difficult to analyze and compare than the other notions. The structural dimension is the competence that can be drawn from the structure linking the head office, the local corporation and the partner. This competence is heightened when there is consonance with the partner's culture and strategy. It also matters whether an environment is provided in which the local corporation can conduct its foreign operation well; for that reason, not only the trustworthiness of the head office but also the autonomy of the local corporation is important. Structural social capital is further governed by how much the local corporation uses the head office's capabilities. When it is plentiful, the company can select the optimal product, glocalize products using the head office's assets, and make best use of the local corporation.
The relational dimension is the social capital created in relationships between companies; a company with close ties to local corporations gains the ability to absorb knowledge and obtain information about the market. In particular, the knowledge needed to glocalize marketing is tacit knowledge that has never been written down, and it is so implicit that helpful information cannot be obtained without such close ties. Since ventures have little capability for collecting information, the way a company can rapidly internalize the information needed for glocalization is to build up relational social capital, unlike in FDI theory. Under these conditions, a company with a lot of social capital can glocalize and adapt to a new country easily. Dyer (1996) found a positive (+) relation between relational social capital and performance in networks of buyers and suppliers of automotive goods: companies with a great deal of face-to-face contact had short product development periods and high-quality products. The more relational factors and social capital companies have, the greater the trust built up between them, so that a company can exchange knowledge, draw out performance and operate effectively by itself.

Proposition 2. The more social capital companies have, the faster they can glocalize and achieve performance.
The study model may be simplified, given the above propositions, as shown in Figure 1.
[Figure 1: a diagram in which structural social capital and relational social capital lead, through glocalization, to performance.]

Fig. 1. The relation between social capital and performance
3.3.2 The Measurement of the Propositions
3.3.2.1 The measurement of structural social capital. Of the three dimensions of social capital, the structural dimension generally refers to the structure of the network and the form of links between members; the exchange of knowledge and resources occurs through this network (Tsai and Ghoshal, 1998). As in the precedent studies, the structural dimension is described by network centralization, tie strength and multiplicity.
First, network centralization refers to tie centralization, by which an actor's centrality grows through relationships with a focal person among the network members (Ibarra, 1992). This study measured network centralization through the degree to which the network, and the branch manager's authority as the major source of face-to-face contact, was concentrated after expansion into the Japanese market. Network tie strength concerns how stable and strong the tie with a partner in the network is; it results from the frequency of contact, the maintenance of the relation, social homogeneity, emotional strength and intimacy (Marsden and Campbell, 1984). This study measured tie strength through social homogeneity with the partner company and the size of the local corporation in Japan. Multiplicity is the diversity of uses of the network. This study measured multiplicity through the degree to which the network serves diverse aims, as well as the exclusivity of the sales agency contract in the relationship with the partner company.
3.3.2.2 The measurement of relational social capital. The relational dimension is built through partner relationships based on trust between members and organizations in the network. Trust stimulates the exchange of knowledge through cooperation, strengthens the organization's problem-solving ability through that exchange, and as a result has a positive effect on company performance (Nahapiet & Ghoshal, 1998). This trust can be formed, and measured, by the duration of exchange, emotional intimacy and the closeness of relationships between individuals. Trust is
divided into rational trust and relational trust (真鍋誠司, 2002). Rational trust is formed on objective grounds such as the size of a company and its past record, whereas relational trust is a social relation accumulated through lasting connection. This study measures the relational dimension with both.
On the supposition that rational trust divides into fair intention and basic competence, rational trust is measured by product power, evaluating the superiority of the product, and by past sales figures; relational trust is measured by the personal relationship and emotional intimacy between the cooperation partner and contractors in Japan, with a focus on lasting relations. The measurement variables of social capital are summarized in Table 2.

Table 2. The measurement variables of social capital
(Columns: Major factor | Details | Analysis)
Structural dimension:
Network centralization | Centralization | Capability and role of the corporation chief; stability of the corporation chief
Tie strength | Linking size, linking homogeneity | Size and existence (or not) of the corporation; social homogeneity
Multiplicity | Aims of the tie | Range of cooperation; granting of exclusive rights or not
Relational dimension:
Trust | Rational | Objective product competence; performance of the past
Trust | Relational | Period of cooperation with the partner; emotional intimacy; close relationship
4 The Analysis of the Cases
4.1 Forming and Accumulating Social Capital
4.1.1 Forming and Accumulating Structural Social Capital
Among the elements of the structural dimension, network centralization is decided by the number and quality of the network ties connecting a given actor within the larger network (Tsai & Ghoshal, 1998). Because the analysis takes the viewpoint of the Korean IT ventures, network centralization here concerns the local corporation or office in Japan acting as the link with the Japanese partner company; it can thus be gauged from the competence, past experience, role and stability of the corporation chief. Companies A, B and C have local corporations, whereas company D operates an office without a local corporation, so for company D this study measured network centralization by treating the vice-president who handles everything in the Japanese local business as the center of the network, in place of a corporation chief.
Because the corporation chief of company A cannot play a major role in the management of the local company and a manager from the head office serves as the de facto local chief, the local chief is not a decision-maker but only an executor and interlocutor with the head office, without the authority to make decisions or select a cooperation
partner. The local chief also has no particular business experience in Japan. An executive from the head office, however, performs the chief's role and maintains a close, ongoing tie with the partner company, which helps trust last. The network centralization of company A is therefore considered medium.
The corporation chief of company B changed continually, making a stable strategy difficult. At first the overseas business manager took on the role of corporation chief, but the vice-president of the head office managed the corporation directly because effective management proved difficult. As this arrangement persisted, the effective absence of a local corporation chief, who could not stay long on site, caused a great deal of trouble in external relations. Eventually a corporation chief was appointed for efficient management, but he had no experience of Japanese business or of IT companies, so his ability to use the network was limited. Later the head office's management and strategy changed, and the post changed hands again, to a Japanese chief. Accordingly, network centralization is considered low.
Company C also changed its corporation chief repeatedly. Because the corporation was first established to perform a project for a large Japanese company, it was managed directly by a manager from the head office, and later by a board member of the head office. The post then went to a Japanese chief, the intention being to take advantage of a native's network to boost sales in the BPM solution market. But nothing happened to the sales figures, because of troubles in communicating with the head office. The post then went to the current Korean chief, who has ten years' experience of business in Japan and no problem communicating with the head office. After this change, the company replaced BPM with groupware as its anchor product strategy. In the end, although the corporation chief is highly competent, network centralization appears low, because the company had problems choosing its product group and operating the corporation through many changes of chief.
Company D has no difficulty with external relations or communication with the head office, because a vice-president and co-founder of the head office directly manages everything in the Japanese market, even though there is only an office. The vice-president has extensive experience and networks in Japanese business, built up while working in the Japanese corporation of a large Korean company and living in Japan. Through this experience he steadily maintains and creates relationships by managing business and cooperation directly. The chief's competence and role fit company D's operating system, and as vice-president he can run the operation stably. As a result, network centralization is high.
Tie strength is decided by the number and traits of the links connected to the network. Measuring tie strength directly is difficult, but in these cases this study could measure it through social homogeneity with the partner company and the size and existence of the corporation.
The local corporation of company A is very small, and there is no reason it should be big: the corporation chief is at once the head office's manager and the local chief, serving as the window of communication with the head office. The local corporation matters little because the owner of the main company has directly managed the search for partners and the adaptation to the local market since the beginning of the expansion into Japan. The company in exclusive partnership with company A is a
small and medium-sized business channel, comparable in size to company A and willing to handle its products, rather than a big company group. Company A and its exclusive partner, socially homogeneous with each other, made a one-year contract at first, and company A got the partner to take an active part in glocalization. Accordingly, the two companies have high social homogeneity, and each pursued its own interest through the exchange of knowledge and mutual support. This is revealed by the exclusive distribution contract being extended from 3 years to 5 years and then to 20 years.
Company B began its expansion into Japan by exporting products through a general trading company, and then established a corporation in Japan for a full-scale expansion of sales in the Japanese market. The first corporation had a staff of 12, covering the call center, marketing, support and management departments. This was large for a venture expanding overseas, but it was no match for the support staff and call centers of its rival companies. The partners affiliated with company B's B2B and B2C products are very large general trading companies; they merely carried the full range of products and found no particular attraction in company B's. Given the unsuitable relative size and the choice of partners, company B's tie strength is low.
Company C went abroad with a large staff from the start, in order to carry out a client's project: its members numbered up to 30, including marketing managers and engineers, and 7 people work there now. Many members were needed to perform the client company's project, but after the project ended the corporation kept its staff and business going without a strategy for the Japanese market, which became a burden on it. Because of its product characteristics, company C had to work with partners, and the nature of partner business is to meet the right partner in order to reach the final client. As company C had low product recognition and little established trust in Japan, it could not meet the right partners. The Japanese market, whether in business or public office, is unwilling to deal with a company that has not established some trust, so it is difficult to enter the market directly without it. Although the company is now modifying its strategy, its homogeneity was low.
Company D went to Japan with only an office, and it did not need much additional support service because of the strength of its product from the beginning of the business, so a vice-president operated and managed the company with a staff of two on site. After the company's first contract made its counterpart an exclusive distributor, it contracted with three agencies as sole distributors and later with secondary agencies; it takes at least 3 years to reach contracts with secondary agencies. Although company D was smaller than its partner companies, it overcame the difference with outstanding products. Homogeneity between company D and its partner companies is likewise high. A joint venture was later established at the suggestion of a past client company, which trusted the vice-president of company D on the basis of past experience; this too reflects high social homogeneity.
Relational multiplicity is expressed as the diversity of the aims of the links in the network. This study measures it through the range of cooperation and the exclusive rights of the Korea-Japan partner companies.
First, the range of cooperation between company A and its exclusive partner is broad. Company A, in partnership for product sales in the Japanese market, confines itself to product development and support for the exclusive partner, which mostly handles customer acquisition, product glocalization, on-site marketing and support. Company A must consult the exclusive partner, which holds the right of first negotiation, before selling new products or expanding deeper into the Japanese market. This is a case in which multiplicity is heightened by granting the exclusive right, helping the partner company broaden the range of cooperation.
Company B has different channel companies for each product group. Its range of cooperation is narrow because, from the beginning of the expansion, the B2C and B2B product groups were sold under channel contracts with general trading companies. That is, because the general trading companies were responsible only for selling the products while company B remained responsible for after-sales management and product glocalization, knowledge could not be shared effectively. The ASP product performs well through an ISP supply arrangement with a cable provider, but it is not a long-term revenue source; the range of cooperation is narrow because a security solution offered by a cable provider as an additional service lacks strategic consonance.
Company C ran into partnership problems when it changed its main product group from groupware to BPM, and it was hard to broaden the range of cooperation without established trust in the Japanese market, given that the product group itself is strongly service-oriented. Company C could not grant exclusive rights, because it had to operate mainly through partner companies.
Company D's range of cooperation spans exclusive distribution on terms close to an exclusive contract, OEM cooperation with a large Japanese company, and the foundation of a joint venture, begun as the business expanded with many partners. Regarding the background of the joint venture's foundation, company D said that it could quickly use a partner company's network and knowledge to overcome its limits as a foreign company. It maintains partnerships by granting the exclusive right to the partner company from the beginning. (See Table 3.)

4.1.2 The Creation and Accumulation of Relational Social Capital (Trust Development)
Trust is the core of relational social capital, and relational social capital based on trust directly increases the exchange of information. It is a main element in building an effective network with Japanese companies after
expanding into Japan. Trust is divided into rational trust and relational trust (真鍋誠司, 2002); the rational trust of a Korean IT venture can be measured by its existing product power (including product assimilation on site) or by past performance (references). Product power is a notion that includes both technical completeness and functional stability; when objective product power is high, it helps a trading partner rely on the company and do business with it. Relational trust was measured by the cooperation period, emotional intimacy and the closeness of the relationship with the partner company.
Table 3. Accumulation of structural social capital
(Columns: A co. | B co. | C co. | D co.)
Competence and role of the chief: Medium | Medium | Medium | High
Stability of the chief: High | Low | Low | Low
Network centralization: Medium | Low | Low | High
Comparison of local corporations' size: Alike | Different | Alike | Different
Social homogeneity: High | Low | Low | High
Tie strength: High | Low | Low | High
The range of cooperation: Broad | Narrow | Narrow | Broad
Granting an exclusive right: Sole distribution | Partially exclusive | Impossible | Partially exclusive
Multiplicity: High | Low | Low | High
The process of trust creation took place in different ways. The companies studied followed these steps: first, accumulation after expanding; second, employment of local expertise; third, strategic alliance with a cooperating company; fourth, a joint venture with a cooperating company. The last method is the best way to build relational trust quickly and deeply.
Company A recognized that the quality of the product itself, as a source of rational trust, made it a leading Korean company, and this was the main selling point for the Japanese market. Its product superiority could also be demonstrated through objective comparison based on its many references in Korea. Glocalization was still needed despite this superiority, but company A's products were thoroughly absorbed into the market by a good partner, and this absorption, through shared knowledge, created rational trust. Company A's relational trust was constructed rapidly through the strategic alliance with its exclusive distributor from the beginning of the expansion. In particular, it kept the relationship with one company steadily, which allowed company A, short of social capital of its own, to use the partner company's social capital and establish trust faster.
Company A did not meet a good partner from the start. It met its eventual partner after meetings with at least 50 companies, made an agency contract with one of them, and completed its first sale with that first partner. But sales figures scarcely rose after the first sale, because the first reference had been achieved with difficulty as a first attempt and remained a burden involving only one
customer. As a result, the success of the first reference meant little: the company did not sell a single additional copy of the product for 3 years afterward. It had many partners during that time, but none played the right role. The exclusive contract, first signed for one year, was kept throughout that year, during which there were repeated reviews of, and preparations around, the products. The contract made during this period drew out a measure of trust in the products and the company; this is rational trust, while relational trust had not yet been created. Product sales by company A then materialized through a sales contract between the partner and a large Japanese company, concluded after the exclusive partnership contract was signed. A client company under contract with the partner had a problem that the partner could not solve, and company A's president supported the client's project directly in Japan for more than a month; this earned the company a reputation for reliability and goodwill among local clients and the partner company, and became the occasion for creating trust with partners and clients alike. Taking that opportunity, they extended the first 3-year contract to 5 years, and they have now built firm relational trust with a long-term contract of 20 years.
Company B is constructing relational social capital through continual local activity. It could not gain trust in the basic competence that establishes rational trust, that is, in the brand and quality of the product itself. Company B sold products through the channels of general trading companies. With partners as large and highly recognized as Japanese general trading companies, the decision-making process is complex and unfavorable to a Korean company. Japanese general trading companies do not take a positive attitude toward a single software product from a Korean company, because they handle diverse items and already cooperate with Japanese and foreign suppliers. In the end, this became a major obstacle to building trust for the whole company: because trust with this channel was not created and glocalization knowledge was not shared, product glocalization was delayed. That is, since knowledge cannot be shared where rational trust has not been established, the lateness of the product fed a vicious circle that also obstructed relational trust. Nevertheless, company B keeps striving, through sustained local experience, to establish relational trust in this situation. The ASP model, which accounts for a large share of company B's revenue, was sold by providing products to large Japanese cable TV providers through a contract with a channel that had such clients. It became a cash cow of the company by forming a good network and a contract benefiting both the company and the partner, and relational trust was achieved there by thoroughly customizing the Korean product line to the partner company's needs. For the other channels, the channel contracts produced rational trust but not relational trust: without a strategic alliance and an exclusive relationship, the relationship is merely maintained and rational trust does not grow into relational trust. The ASP model, in the end, remains the cash cow.
Company C went through many difficulties in the beginning because its entry was not a deliberate strategy for Japan. It went to Japan far earlier (1997) than other domestic software companies, in order to carry out a project, and this was good experience for building relational trust on a base of initial trust.
But its strategic approach to the Japanese market was insufficient, since in company C's early days the American market, rather than the Japanese one, stood at the center of its global strategy. It also shifted from the groupware
developed in the project to BPM, which was focused on the American market, and appointed a Japanese national as corporation chief, intending to establish trust through an expert with experience in the relevant industry. This did not work, owing less to any failure of the Japanese chief's personal network to merge into the company network than to the local corporation's risk avoidance and communication troubles. The Japanese market for BPM had not yet formed at that time; BPM, being highly service-oriented, should have been sold with a partner company, but the Japanese chief sold it through channels. Channel marketing makes contracts easy and helps a company keep going, but it cannot support active market expansion or market occupation. Partner marketing is an approach that depends not on technology or communication skill but on relational trust; Japanese companies and public offices are unwilling to break existing relationships, which makes partner marketing and new relations difficult, and this is the reason Korean SI companies find it hard to enter the Japanese market. Company C has now appointed a chief with local experience and networks who communicates well, and its cash cow has shifted from BPM to groupware. Groupware had many references in Korea, especially from public offices, which became an opportunity to create rational trust in the Japanese market, and it carries a reasonable price. Above all, groupware must be sold by partner marketing rather than channel marketing, so forming relations with the right partner is crucial. In sum, company C lacked relational trust and a strategy for choosing its product group, and it is now creating rational trust through strategic effort and the adjustment of its product group.
Company D creates relational social capital through product superiority, relations with a sole distributor and a joint venture. One of its traits is that it entered the market with rational trust already constructed through product superiority, which saved time and cost. Company D applied its remote control technology for the first time in the world; the technology holds up even under the security requirements of advanced countries and has been supplied to the Pentagon, and the product was developed in many languages suited to the local market. Rational trust was thus created on the basis of product superiority, and the company pursues multiple marketing methods with its cooperation partners. Company D entered the Japanese market as soon as its head office was founded; this was possible because social capital had already formed on the rational trust in the product itself. The exclusive distribution right was effective in creating rational trust, but because the exclusive contract did not meet either company's expectations, company D went on to make ASP service contracts with three middle-sized agencies; in the end, the exclusive arrangement alone did not create relational trust. The company was able to create relational trust by transferring a generous share of profits to the partner company, which led the partner to share its knowledge actively. It also established a joint venture to provide remote control management services as a marketing route for projects, an opportunity to expand the business that arose from lasting relationships with past clients. The rationale for the establishment was that a similar Korean service could serve as a good reference and that the joint venture, the most active method of all, would let the company absorb the knowledge and experience of Japanese companies; that is, it was a way to create relational trust through cooperation with a local company.
Table 4. Accumulation of relational social capital
(Columns: A co. | B co. | C co. | D co.)
Rational trust: Made | Partially made | Partially made | Made
Cooperation period / expanding period*: 6 years / 8 years | 6 years / 7 years | 3 years / 11 years | 3 years / 6 years
Emotional intimacy: High | Low | Low | High
Accumulating method: Sole distribution, strategic alliance | Lasting local activities | Employment of expert | Sole contract, joint venture
Accumulation: Rational trust → relational trust | Rational trust | Partially rational | Rational trust → relational trust

* The cooperation period given is the period for which the relationship with the current partner company has lasted.
4.2 The Effect of Structural Social Capital on Relational Social Capital
We analyzed what effect the centralization, tie strength and multiplicity of structural social capital each have on rational and relational trust.

4.2.1 The Effect of Network Centralization on Building Trust
Where the chief's competence and role are strong, he can play a positive role as a conduit between the head office and the cooperating company, carrying the head office's decisions and training in one direction and product needs and knowledge in the other. Frequent changes of strategy and of chief are factors that loosen network centralization and trust in external relations. In companies A and D, the appointee serving as corporation chief or manager had a close tie with the company and no trouble communicating with the head office, so a consistent strategy was possible. A lasting relationship without changes of chief is meaningful for creating trust. The vice-president of company D created the network and maintained continuous external relations by managing the office in Japan himself. Accordingly, where network centralization is high, as in companies A and D, the chief's role shows no direct effect on rational trust, but it can have a positive effect on relational trust through the chief.
Companies B and C changed their product group selections many times and had difficulties in external relations through the change or absence of a chief, which did not help trust form in the relationship with the cooperating company. Company B, run by a head office manager in the beginning, appointed Korean and Japanese chiefs by turns, which also caused problems. Company C had difficulty maintaining long-term cooperative relations because of the changes of chief and product group.
In this way, when network centralization centered on the corporation chief is maintained continuously, communication with the cooperation company is good and the cooperation period becomes long. We could not show that the role of the chief affects rational trust directly.

4.2.2 The Effect of Tie Strength on Creating Trust

Tie strength, measured through size and homogeneity, affects the creation of trust. Companies A and D were venture companies that went overseas while still small. They contracted with small and medium-sized channel agencies or exclusive distributors able to sell their products, a relationship in which tie strength is high in both size and homogeneity. That is, high tie strength affects both rational and relational trust positively. Companies B and C did not have high tie strength in the size and homogeneity of their partner companies. Company B at first chose partner companies that were large and well recognized, such as general trading companies. But these partners' homogeneity with company B was low, and they did not sell the products actively because they deal with diverse products and have a complex decision-making process. This delayed the glocalization of the products and made the relations hard to maintain; in the end, company B changed the channel, and neither rational nor relational trust was built. Company C could not build trust after struggling to select partner companies while pursuing channel and partner marketing at the same time, owing to the service character of its product. For both rational and relational trust, it is positive to secure homogeneity by concentrating fully on the supplied products and on the relationship with a suitably sized partner.

4.2.3 The Effect of Multiplicity on Creating Trust

Multiplicity is high when the range of cooperation is wide and an exclusive right has been granted, and this affects trust. Company A succeeded in maintaining a long-term relationship and establishing a strategic alliance with its exclusive distributor. It created rational trust by granting the exclusive right to the partner company, and the lasting relations and mutual profits from the exclusive distribution drew out relational trust, as can be seen in company A's 20-year contract. Company D also built trust through an exclusive relationship. It entered the Japanese market as soon as its head office was founded, made an exclusive contract with a channel, and was glocalized through it. It could enter the market quickly because social capital had been formed on the rational trust in the product itself; accordingly, an exclusive distribution right is useful for constructing rational trust. However, because the contract with the exclusive distributor did not meet expectations, company D instead made ASP service contracts with three medium-sized distributors. In the end, the first exclusive distribution right helped create rational trust but did not necessarily have a useful effect on relational trust. Company B made an exclusive contract with a general trading company because of its diverse product groups, but the products were not actively glocalized owing to the character of general trading companies. As a result, high multiplicity such as an exclusive right has a positive effect on rational and relational trust, but trust can still fail to form when other factors intervene; an exclusive right alone cannot necessarily create trust.
4.3 The Relation between Social Capital and Performance

To analyze what effect social capital has, we examine the overall pattern by synthesizing the structural and relational dimensions. It is difficult to measure social capital accurately and directly and to connect it to company performance. Accordingly, we argue that the accumulation of social capital leads to the financial performance of the supply chain by analyzing its relation to performance from an overall point of view, using the amount of the three dimensions and whether annual BEP was achieved.

Table 5. Relationship between social capital and performance

                      |                        | A co.       | B co.              | C co.              | D co.
Structural dimension  | Network centralization | Medium      | Low                | Low                | High
                      | Tie strength           | High        | Low                | Low                | High
                      | Multiplicity           | High        | Low                | Low                | High
Relational dimension  | Rational trust         | Built up    | Partially built up | Partially built up | Built up
                      | Relational trust       | Built up    | Not built up       | Not built up       | Built up
Product character     |                        | For service | For product        | For service        | For product
Annual BEP*           |                        | Achieved    | Achieved           | Unachieved         | Achieved

* Annual BEP achievement is measured by comparing sales figures and net profit, based on each company's interview material.
The social capital of companies A and D is high in the structural dimension, while that of companies B and C is not. Companies A and D achieved annual BEP, our performance indicator, whereas company C did not, owing to its lack of accumulated structural social capital. Company B, however, achieved annual BEP despite not accumulating structural social capital, because factors other than the structural dimension also feed into performance; structural social capital thus contributes quantitatively but is not a strict requirement. All the companies accumulated some rational trust through steady, knowledge-building effort over at least five years on site. Rational trust alone, however, does not translate into performance; performance is achieved only when relational trust is also accumulated. It is a characteristic of Japanese companies that, once relational trust is created, they are willing to endure short-term losses arising from a partner company's irrational actions, based on the belief that this will bring mutual benefit in the long term (真鍋誠司, 2002). Companies A and D achieved good performance with both rational and relational trust. As analyzed, the structural dimension affects the relational one, and through the relational dimension it can again lead to good performance. Company B's annual BEP despite its lack of structural and relational social capital might seem to question the importance of social capital; however, company B has not been successful in cumulative BEP since expanding, and our analysis used the annual BEP. That is, the case of company B alone cannot show that social capital does not affect performance. In short, social capital affects performance: both the structural and the cognitive dimensions affect performance directly, but before that they affect the creation of trust, the relational dimension, and this in turn leads to performance.
5 Conclusion

The goal of this study was to analyze the social capital a venture needs to expand overseas and operate successfully. We showed that social capital affects a company's glocalization and performance, and that accumulating social capital accelerates glocalization and can draw out performance. The results of this study raise the following issues. For an IT venture expanding into Japan, entering the market through direct business is not impossible. Affiliation groupings (keiretsu, けいれつ) still exist in the Japanese market, which push an outside company to join such an affiliation. Accordingly, it is important to appoint a reliable corporation chief and to found a local corporation for a long-lasting relationship. Founding a corporation, however, is not a necessary element of foreign relations: thanks to the superiority of its products and to a vice-president with Japanese business experience and competence, company D started with exports through an office and went as far as founding a joint venture with a Japanese company. Even without a corporation, a company can establish trustworthiness through a support center or office. The corporation chief is a major factor in building trust. The chief's competence matters in many ways, from selecting the product group to finding a partner through his or her network, and especially in communicating with headquarters. Although not examined in this study, the reason companies keep changing chiefs appears to be difficulty in communicating with the head office; a company should therefore appoint a chief who knows and understands the product group and strategy. Partner selection is absolutely important for entering the Japanese market, because it is difficult to do business there alone. When a problem occurs, a Japanese client wants help from another reliable company, so in many cases a company without a partner simply cannot enter Japanese business. Some of the case companies in this study lacked the knowledge to select a suitable first partner and therefore struggled; for most companies at the beginning of expansion this is a major concern. The right partner is a company of suitable size, that is, one compatible enough that each company can expect the other's devotion. Because Korean software ventures are usually medium-sized rather than large, pairing with a big Japanese channel company may not be a good match. It is important, then, to select as a suitably sized partner a company with the will to sell the supplied products actively. It is also a good approach, from a long-term perspective, to return generous profits to the partner company, as seen in company D's profit structure.
Granting an exclusive right is also a good way to make the relationship with a partner last. In this study, granting the exclusive right affected both rational and relational trust: when a company grants the exclusive right and also makes the effort to sustain and improve the relationship, the relationship can rise to relational trust. The relational dimension divides into rational trust and relational trust. Rational trust rests on the basic competence of the company concerned, and the software industry, as a knowledge service industry, demands absolute product technology. As this study's results show, relational trust needs to be established after rational trust has been accumulated: a company should enter the market with at least some technical superiority and then find a partner company. Company D constructed rational trust on the basis of product superiority and expects long-term performance on the back of its effort to establish relational trust and strengthen relations with its partner company through the corporation chief's network and experience. Accordingly, long-run success in long-term cooperative performance can be achieved only through investment and effort to improve the relationship, grounded in a technically sound product.
References

1. 真鍋誠司: 企業間信頼の構築：トヨタのケース. Discussion Paper J42, 神戸大学経済経営研究所, 2002年3月, 2-6頁 (2002)
2. Anderson, J.C., Narus, J.A.: A Model of Distributor Firm and Manufacturer Firm Working Partnerships. Journal of Marketing 54(1), 42–58 (1990)
3. Autio, E., Sapienza, H.J., Almeida, J.G.: Effects of Age at Entry, Knowledge Intensity, and Imitability on International Growth. Academy of Management Journal 43(5), 909 (2000)
4. Bell, J.: The Internationalization of Small Computer Software Firms: A Further Challenge to 'Stage' Theories. European Journal of Marketing 29(8), 60–75 (1995)
5. Blankenburg, D., Johanson, J.: Managing Network Connections in International Business. Scandinavian International Review 1(1), 5–19 (1992)
6. Burt, R.S., Knez, M.: Trust and Third-Party Gossip. In: Kramer, R.M., Tyler, T.R. (eds.) Trust in Organizations: Frontiers of Theory and Research, pp. 302–330. Sage, Thousand Oaks (1996)
7. Coleman, J.: Social Capital in the Creation of Human Capital. American Journal of Sociology 94, 95–120 (1988)
8. Coleman, J.: The Foundations of Social Theory. Harvard University Press, Cambridge (1990)
9. Dyer, J.H.: Specialized Supplier Networks as a Competitive Advantage: Evidence from the Auto Industry. Strategic Management Journal 17(4), 271–291 (1996)
10. Granovetter, M.: Economic Action and Social Structure: The Problem of Embeddedness. American Journal of Sociology 91, 481–510 (1985)
11. Gulati, R.: Does Familiarity Breed Trust? The Implications of Repeated Ties for Contractual Choice in Alliances. Academy of Management Journal 38(1), 85–112 (1995)
12. Hara, G., Kanai, T.: Entrepreneurial Networks across Oceans to Promote International Strategic Alliances for Small Business. Journal of Business Venturing 9, 489–507 (1994)
13. Hoch, D.J., Roeding, C.R., Purkert, G., Lindner, S.K., Müller, R.: Secrets of Software Success. Harvard Business School Press, Boston (2000)
14. Ibarra, H.: Homophily and Differential Returns: Sex Differences in Network Structure and Access in an Advertising Firm. Administrative Science Quarterly 37, 422–447 (1992)
15. Johanson, J., Wiedersheim-Paul, F.: The Internationalization of the Firm: Four Swedish Cases. Journal of Management Studies 12(3), 305–322 (1975)
16. Ling-yee, L.: An Examination of the Foreign Market Knowledge of Exporting Firms Based in the People's Republic of China: Its Determinants and Effect on Export Intensity. Industrial Marketing Management 33, 561–572 (2004)
17. Marsden, P.V., Campbell, K.E.: Measuring Tie Strength. Social Forces 63(2), 482–501 (1984)
18. Madsen, S.: The Internationalization of Born Globals: An Evolutionary Process. International Business Review 6(6) (1997)
19. Nahapiet, J., Ghoshal, S.: Social Capital, Intellectual Capital, and the Organizational Advantage. Academy of Management Review 23(2), 242–285 (1998)
20. Oviatt, B.M., McDougall, P.P.: Toward a Theory of International New Ventures. Journal of International Business Studies 23, 45–64 (1994)
21. Oviatt, B.M., McDougall, P.P.: Challenges for Internationalization Process Theory: The Case of International New Ventures. Management International Review 37(2), 85–99 (1997)
22. Presutti, M., Boari, C., Fratocchi, L.: Knowledge Acquisition and the Foreign Development of High-tech Start-ups: A Social Capital Approach. International Business Review 16, 23–46 (2007)
23. Putnam, R.: Making Democracy Work: Civic Traditions in Modern Italy. Princeton University Press, Princeton (1993)
24. Rennie, M.W.: Global Competitiveness: Born Global. The McKinsey Quarterly (4), 45–52 (1993)
25. Sharma, B.: A Critical Review of Time in the Internationalization Process of Firms. Journal of Global Marketing 16(4), 53 (2003)
26. Trochim, W.: Outcome Pattern Matching and Program Theory. Evaluation and Program Planning 12, 355–366 (1989)
27. Tsai, W., Ghoshal, S.: Social Capital and Value Creation: The Role of Intrafirm Networks. Academy of Management Journal 41(4), 464–476 (1998)
28. Yli-Renko, H., Autio, E., Tontti, V.: Social Capital, Knowledge, and the International Growth of Technology-based New Firms. International Business Review 11(3), 279–304 (2002)
29. Yin, R.: Case Study Research: Design and Methods. Sage Publications, Beverly Hills (1989)
30. Zahra, S.A., Ireland, D.R., Hitt, M.: International Expansion by New Venture Firms: International Diversity, Mode of Market Entry, Technological Learning, and Performance. Academy of Management Journal 43(5), 925 (2000)
Binding Update Schemes for Inter-domain Mobility Management in Hierarchical Mobile IPv6: A Survey Afshan Ahmed1, Jawad Hassan2, Mata-Ur-Rehman1, and Farrukh Aslam Khan2 1
Department of Computer Sciences, International Islamic University, Islamabad, Pakistan 2 Department of Computer Sciences, National University of Computer and Emerging Sciences, Islamabad, Pakistan {afshan.mscs523,mata}@iiu.edu.pk, {jawad.hassan,farrukh.aslam}@nu.edu.pk
Abstract. As the number of mobile users increases day by day, supporting mobility and managing it deserve close attention. Mobile IP is used to maintain mobility information when a mobile user moves from one location to another while its permanent address remains unchanged. Mobile IPv6 (MIPv6) was introduced to overcome the problems of MIPv4 and to improve the protocol's performance. MIPv6 performs well for micro-mobility but suffers for macro-mobility. Hierarchical Mobile IPv6 (HMIPv6) was developed to separate micro-mobility from macro-mobility, but it still performs poorly for macro-mobility when a mobile node moves from one MAP domain to another. Many schemes have been proposed to reduce handover latency and improve efficiency for macro-mobility, but this problem still affects the performance of HMIPv6. In this paper, we survey different binding update schemes for inter-domain mobility management in HMIPv6 and conclude with a detailed comparison of all the discussed schemes. Keywords: Hierarchical Mobile IP, Macro and Micro mobility, Binding Update Cost.
1 Introduction

With the invention of new technologies, mobile computing is getting more and more attention, and the number of mobile users increases day by day. To facilitate this, the Internet Engineering Task Force (IETF) introduced Mobile IP, which allows mobile nodes to change their location quickly while continuing their ongoing sessions. Mobile IP is an enhancement of IP that allows a mobile node to roam freely anywhere while maintaining its current IP address. Mobile IP (MIP) is a standard, location-independent communication protocol proposed by the IETF that lets IP datagrams be routed to mobile users across the Internet. MIP is mostly found in wireless WANs, where users take their mobile devices across several LANs with different IP addresses. MIPv4 [3] is the protocol for IPv4 networks; it needs two addresses and two agents to communicate with the Corresponding Node (CN). Registration of the Care-of Address
(CoA) with the Home Agent (HA) is done through Registration Request and Registration Reply messages. The CN does not communicate directly with the Mobile Node (MN); this task is accomplished through the HA. MIPv4 faced the problems of triangle routing, double crossing, the duplicated IP-within-IP field, reliability, and various security issues, as discussed in [10]. MIPv6 [5] was introduced to deal with the problems of MIPv4. Its features are similar to those of MIPv4, with improvements in route optimization, registration, security, authentication, address configuration, and agent and neighbour discovery. For registration of the primary CoA, Binding Update (BU) and Binding Acknowledgement (BA) messages are exchanged between the HA and the MN. Route optimization provides direct communication between the CN and the MN and solves the triangle-routing problem, and multiple HAs reduce the impact of HA failure, so MIPv6 offers more security and authentication. MIPv6 nevertheless suffers signalling and processing overhead and packet loss, especially in intra-domain mobility when an MN alters its point of attachment frequently [2]. Hierarchical Mobile IPv6 (HMIPv6) was therefore introduced, separating inter-site from intra-site mobility management to reduce the problems of MIPv6 [12]. HMIPv6 has two main advantages over MIPv6. First, the movement of the MN is completely transparent to the CNs, which removes the signalling overhead and some security issues. Second, it separates inter-site and intra-site mobility, provides more scalability, and has a hierarchical, flexible, and customizable architecture [1]. HMIPv6, however, is not efficient for inter-domain mobility, when the MN hands off from one Mobility Anchor Point (MAP) domain to another: it faces higher handover latency and packet loss, which decrease its overall performance. Many schemes have been proposed to make inter-domain mobility efficient, but no improvement seen so far fully satisfies the protocol's efficiency needs. In this paper, we discuss schemes proposed for improving the inter/macro handover binding update (BU) procedure and packet delivery, together with their benefits and weaknesses, and we carry out a performance analysis of all the discussed schemes. The rest of the paper is organized as follows. Section 2 introduces HMIPv6. Section 3 presents the handover mechanism in HMIPv6. Handover issues and inter-domain mobility issues are discussed in Section 4. Schemes for inter-domain mobility are presented in Section 5. Finally, Section 6 concludes the paper.
2 Hierarchical Mobile IPv6

Hierarchical Mobile IPv6 [13] is similar to MIPv6 with some extensions. To decrease the delays of MIPv6, a new Mobile IPv6 node called the Mobility Anchor Point (MAP) is used, which can be placed at any level in a hierarchical network of routers alongside the access routers. The MAP is designed to reduce the MIPv6 signalling cost among the MN, its HA, and the CN; MIPv6 signalling outside the local domain is limited by the MAP. HMIPv6 mobility is categorized into local (intra/micro) mobility and global (inter/macro) mobility. When a mobile node moves within a single domain from one Access Router (AR) to another, this is intra/micro mobility; when it moves from one access point to another across the Internet, this is inter/macro mobility. Fig. 1 shows intra- and inter-domain mobility when an MN moves from MAP1 to MAP2.
Fig. 1. Intra and Inter Domain Mobility
An MN enters a MAP domain on receiving a Router Advertisement (RA) message containing information about one or more local MAPs. HMIPv6 uses two addresses: the Regional Care-of Address (RCoA) and the On-link Care-of Address (LCoA). The MN binds its on-link CoA, which reflects its current location, to the RCoA. The MAP behaves like a local HA: it receives all packets addressed to the RCoA and forwards them to the MN. When the MN changes its point of attachment within the local MAP domain, only its LCoA changes; the RCoA remains the same until the MN moves to another MAP domain. Therefore, only the RCoA needs to be registered with the CN and the HA, which makes the MN's local mobility transparent to the CNs it is communicating with. MAP boundaries are defined by the ARs, which advertise the MAP information to the attached MNs. HMIPv6 is simply an extension of MIPv6: if an HMIPv6-aware mobile node does not discover MAP capability in the visited network, it falls back to plain MIPv6.
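To make the two address roles concrete, the following minimal Python sketch (our own illustration, not code from the HMIPv6 specification; the class names and IPv6 addresses are hypothetical) models the MAP's RCoA-to-LCoA cache and the HA's home-address-to-RCoA cache:

# Illustrative model of HMIPv6 two-level bindings; all addresses are made up.
class MapBindingCache:
    """MAP-side cache: RCoA -> current LCoA of the mobile node."""
    def __init__(self):
        self.bindings = {}

    def local_binding_update(self, rcoa, lcoa):
        # Intra-domain movement: only the LCoA changes; HA/CN are not told.
        self.bindings[rcoa] = lcoa

    def tunnel_endpoint(self, rcoa):
        # Packets arriving for the RCoA are tunnelled to this LCoA.
        return self.bindings[rcoa]

class HomeAgentCache:
    """HA-side cache: home address -> RCoA (stable within one MAP domain)."""
    def __init__(self):
        self.bindings = {}

    def binding_update(self, home_addr, rcoa):
        # Needed only on inter-domain movement, when the RCoA changes.
        self.bindings[home_addr] = rcoa

ha, map1 = HomeAgentCache(), MapBindingCache()
ha.binding_update("2001:db8:home::10", "2001:db8:map1::cafe")        # enter MAP1 domain
map1.local_binding_update("2001:db8:map1::cafe", "2001:db8:ar1::1")
map1.local_binding_update("2001:db8:map1::cafe", "2001:db8:ar2::1")  # micro handover
print(map1.tunnel_endpoint("2001:db8:map1::cafe"))                   # 2001:db8:ar2::1

Note that the second local_binding_update touches only the MAP's cache; the HA entry is unchanged, which is exactly why local movement stays transparent to the HA and CNs.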
3 Handover in Hierarchical Mobile IPv6

Handover (or handoff) is the process that occurs when an MN moves from one domain or location to another. The MN has to perform several tasks to continue its ongoing communication. The two main operations that must be performed are: (1) the registration process, including location updating, in which the MN registers and updates its new location through the BU procedure; and (2) sending and receiving packets to and from CNs.

3.1 Registration Process

When the MN moves from MAP1 to MAP2 (inter/macro handover) and detects the movement, it performs the following BU operations:

1. The MN first configures two addresses: an LCoA and an RCoA.
2. To bind the LCoA to the RCoA, the MN sends a Local Binding Update (LBU) message to the MAP via the currently attached AR.
3. The MAP receives the BU message and performs Duplicate Address Detection (DAD) to confirm the uniqueness of the newly configured address.
4. After successful completion of DAD, the MAP sends a Binding Acknowledgement (BA) message back to the MN.
5. The MN then sends BU messages to the HA and the CN via the MAP.
6. The HA/CN stores the MN's address in its binding cache and, if binding succeeds, starts sending packets to the MN using its RCoA (the HA/CN knows only the RCoA).
7. MAP2 receives the packets and tunnels them to the MN using its LCoA.
8. The MN receives the packets, decapsulates them, and processes them in the normal way.

When the MN enters a new MAP domain, the BU procedure is as shown in Fig. 2; a small code trace of the same sequence follows the figure.
Fig. 2. Binding Update Procedure
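As a compact restatement of this procedure, the toy Python trace below (ours, not protocol code; the step strings paraphrase the numbered list above) prints the message flow of the inter-domain binding update:

# Hypothetical trace of the macro-handover BU sequence of Section 3.1.
STEPS = [
    ("MN",    "MN",    "configure new LCoA and RCoA"),
    ("MN",    "MAP2",  "Local Binding Update binding LCoA to RCoA (via new AR)"),
    ("MAP2",  "MAP2",  "run DAD on the newly configured address"),
    ("MAP2",  "MN",    "Binding Acknowledgement"),
    ("MN",    "HA/CN", "Binding Update carrying the new RCoA (via MAP2)"),
    ("HA/CN", "HA/CN", "store RCoA in binding cache; send packets to RCoA"),
    ("MAP2",  "MN",    "tunnel arriving packets to the MN's LCoA"),
    ("MN",    "MN",    "decapsulate packets and process them normally"),
]

for i, (src, dst, action) in enumerate(STEPS, 1):
    print(f"{i}. {src:>5} -> {dst:<5}: {action}")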
When the MN moves from one AR to another within the same MAP domain (intra/micro handover), the BU process is:

1. The MN moves to the new AR, within the same MAP domain, on receiving an RA message.
2. The MN configures a new LCoA and sends a BU message to the MAP via the new AR.
3. The MAP receives the LCoA and performs DAD.
4. After successful completion of DAD, the MAP sends a BA, with packets, back to the MN via the new AR.

The MAP maintains its cache from the binding updates it receives from the MN periodically or whenever the MN changes its LCoA. The HA and CN are not informed when the MN changes its LCoA; they receive binding updates from the MN only periodically.

3.2 Packet Delivery Process

There are two possible modes of communication between the MN and the CN: the bi-directional tunnel and route optimization. The first mode is available even when the MN has not registered its current location with the CN. The CN sends a packet to the HA using the MN's home address; the HA encapsulates the packet and sends it to the MAP using the RCoA, since it knows only the RCoA. The MAP receives the packet from the HA, encapsulates it, and sends it to the MN using its
LCoA. The MN decapsulates the packet and processes it in the normal way. When the MN transmits a packet to the CN, it first sends it to the MAP, the MAP sends it to the HA, and the HA sends it to the CN. The second mode, route optimization, is established after the MN starts receiving packets from the CN. It requires registering the MN's current location with the CN by sending binding updates; the CN can then communicate with the MN directly. The CN sends a packet to the MAP, and the MAP tunnels it to the MN; similarly, when the MN sends a packet to the CN, it transmits it to the MAP and the MAP sends it to the CN. When the MN moves to the next AR (e.g., from AR1 to AR2), packet transmission is done via a new tunnel established between the MAP and the MN at AR2. This action is hidden from the HA and the CN.
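The two delivery modes can be summarized by the hop sequences below (an illustrative sketch only; the node labels follow the text):

# Sketch of HMIPv6 packet delivery paths toward the MN (Section 3.2).
def bidirectional_tunnel_path():
    # CN has no binding for the MN: packets are relayed via the HA.
    return ["CN", "HA (encapsulates to RCoA)", "MAP (encapsulates to LCoA)", "MN"]

def route_optimized_path():
    # CN holds a binding for the MN: the HA is bypassed.
    return ["CN", "MAP (tunnels to LCoA)", "MN"]

print(" -> ".join(bidirectional_tunnel_path()))
print(" -> ".join(route_optimized_path()))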
4 Handover Issues in Inter-domain Mobility

MIPv6 is inefficient for local (micro) mobility, where it does not perform outstandingly. To resolve this problem, HMIPv6 was introduced, adding the new MAP entity, but it could not improve performance for global mobility, when the MN hands off from one MAP domain to another. The MN has to send a BU to the new MAP to register its LCoA and, in addition, register the new MAP's RCoA with the HA and the CN; it takes a long time to resume proper communication, so a transmission delay occurs between the MN and the HA/CN. As a result, HMIPv6 faces longer handoff latency and more packet loss than MIPv6, and its location and packet delivery costs in inter-domain mobility are high compared with MIPv6 and other protocols such as Proxy Mobile IPv6 [16]. Many schemes have been proposed to reduce the inter-domain mobility cost, but this problem still affects the overall performance of HMIPv6.
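A back-of-the-envelope decomposition makes the latency gap visible. In the sketch below, all delay values are invented placeholders (not measurements from the surveyed papers); the point is only that a macro handover adds the MN-HA and MN-CN registration round trips on top of the micro-handover cost:

def macro_handover_delay(t_l2, rtt_mn_map, t_dad, rtt_mn_ha, rtt_mn_cn):
    # L2 handover, MAP registration (with DAD), then HA and CN registration.
    return t_l2 + (rtt_mn_map + t_dad) + rtt_mn_ha + rtt_mn_cn

def micro_handover_delay(t_l2, rtt_mn_map, t_dad):
    # Intra-domain: only the MAP must be updated.
    return t_l2 + rtt_mn_map + t_dad

# Placeholder values in milliseconds.
print(macro_handover_delay(50, 10, 500, 120, 150))  # 830
print(micro_handover_delay(50, 10, 500))            # 560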
5 Inter-domain Mobility Management Schemes

Many schemes have been proposed to improve the efficiency of inter-domain mobility, addressing factors such as handover latency, the binding update procedure, load balancing, signalling, and location management. In this paper we discuss the schemes that attempt to improve the binding update procedure for inter-domain mobility, reducing handover latency and packet loss.

5.1 Pre-binding Update Scheme

A neighbour discovery scheme and a handoff procedure using neighbour information are proposed in [8]. In this scheme, when the MN senses that it is about to perform a handoff, it transmits a handoff solicitation message to the currently attached AR, and the AR transmits an advertisement message with options. The MN receives this message and performs the Address Auto-configuration (AA) process for the new CoA in advance of the actual handoff. It then transmits a pre-BU message to the AR to which it will move. The AR and the MAP (in the case of an inter-domain handoff) perform Duplicate Address Detection (DAD) and store the MN's address in an incomplete state. When the MN moves to the new AR and sends the real BU message to the AR and the MAP, the new AR and MAP finish the handoff process and directly start routing by
changing the incomplete state of the MN's address into the complete state, without performing DAD again. When the MN moves to another regional network, it sends a BU to the new MAP (MAP2) via the AR, and MAP2 sends it back to MAP1, the old MAP. MAP1 already holds a table for managing the neighbouring MAPs and generates another table to manage the MN's information; it also updates all CNs attached to the HA and the MN using its own RCoA. When the MN registers with the MAPs, it sends MAP1's RCoA and the initial MAP's RCoA to MAP2. MAP2 then sends its own RCoA and the initial RCoA to the MAP, and the MAP compares the initial MAP's RCoA with its own. Whenever the HA transmits data, the data are routed through this route. Similarly, when the MN moves from MAP2 to MAP3, the same procedure is followed by MAP3 and MAP1; in this case, MAP1 deletes MAP2's RCoA from the MN's list and stores and updates MAP3's RCoA. This scheme therefore does not require sending binding information to the HA and the CN.

5.2 An Efficient Binding Update Scheme

An efficient binding update scheme has been proposed in [9] that allows the MN to originate only half of the signalling messages in the home registration process. In the proposed scheme, the local binding update message piggybacks the home binding update message. The MN generates an LCoA and an RCoA using information from the AR and the MAP, and then sends the LBU message to the MAP. The MAP receives this BU message and binds the LCoA to the RCoA. The MAP then performs DAD to check the uniqueness of the newly configured address and, after its successful completion, sends an acknowledgement message back to the MN. A bi-directional tunnel is established between the MN and the MAP. The MAP now forwards the piggybacked BU message to the HA; on receiving it, the HA verifies and updates its binding cache and sends an acknowledgement message back to the MAP. In the original binding process, the MN first sends a local BU to the MAP, the MAP binds the MN's RCoA to the LCoA and performs DAD, an acknowledgement is returned to the MN, and only then does the MN send binding updates to the HA and the CN, which takes a lot of extra time.

5.3 Geographically Distributed Multiple HAs (GDMHA) Scheme

In the Geographically Distributed Multiple HAs (GDMHA) mechanism [6], the MN performs the BU with the nearest home agent, found using GIA (Global IP Anycast). GIA is used to find the nearest HA among several geographically distributed HAs; these HAs belong to the same anycast group and are thus chosen easily. When a packet is sent to an anycast address, it is delivered to the closest member of the GIA group, so when the MN performs a BU, it is performed with the nearest HA. Mobility management of the MN in HMIPv6 is a notable issue when it moves to another MAP domain: the MN must carry out BUs with the AR, the MAP, and the HA, and if the HA is far from the MN, a long delay may occur during the BU from the MN to the HA. If, however, the HAs are distributed and belong to the same anycast group, this issue can be handled: the BU is performed with the HA that is geographically closest to the MN.
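A toy stand-in for the anycast-based HA selection (in reality the routing system, not the host, delivers to the nearest anycast group member; the HA names and distances below are invented) is simply a nearest-neighbour choice:

def nearest_home_agent(distance_to_ha):
    # GIA delivers to the topologically closest member of the anycast group.
    return min(distance_to_ha, key=distance_to_ha.get)

print(nearest_home_agent({"HA_Seoul": 12, "HA_Tokyo": 35, "HA_NewYork": 180}))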
5.4 Fast Macro Handover Scheme (FMHS)

In the Fast Macro Handover Scheme (FMHS) [14], when the MN detects movement to a different MAP domain, it sends an FBU message to MAP1 that contains MAP2's network prefix information and informs MAP1 of its movement to the different MAP domain. On receiving the message, MAP1 performs MAP discovery and exchanges HI/HAck messages to establish a tunnel between MAP1 and MAP2. MAP1 stores RCoA2 in its proxy binding cache, updates it, and forwards the packets addressed to RCoA1 to MAP2 through the established tunnel. As the MAP is elected as an agent for the MN, MAP2 obtains all packets on the MN's behalf and buffers them until the MN finishes its L2 handover. After the MN's L2 handover, MAP2 encapsulates and forwards all packets directly to the MN's current address, LCoA2. The basic idea of this scheme is to localize the macro handover in HMIPv6.

5.5 A Macro-Mobility Handover Performance Improvement Scheme

A macro-mobility handover performance improvement scheme [4] is based on FHMIPv6. When the MN wants to move from MAP1 to MAP2, it creates a new RCoA and LCoA on the basis of the MAP2 and AR3 (new AR) prefixes in the RA message. The MN then sends an FBU message including the new addresses to AR2 (the current AR), and AR2 sends an HI message to AR3. AR3 performs DAD on the new LCoA and sends an HI message to MAP2 to verify the new RCoA. On successful completion of DAD, an HAck message is sent to AR2 indicating that the new addresses are stored and the fast macro-mobility handover is ready; AR3 stores the new RCoA and LCoA in its proxy neighbour cache. A tunnel is created between AR2 and AR3, through which AR2 sends all packets destined for the MN to AR3. An FBack message is sent to AR3 and the MN when the fast macro-mobility handover is ready. During the L2 handover, all packets sent by the CN are intercepted by MAP1, which sends them to AR2 via the tunnel; AR2 sends them to AR3 via the tunnel, where AR3 buffers them until the MN's handover ends. After completion of the handover, an FNA message is sent to AR3 announcing that the MN is on-link in AR3's subnet, and AR3, on receiving it, sends all buffered packets to the MN. The MN then generates an FNA message including an LBU and sends it to MAP2, which binds the MN's new RCoA and new LCoA on receiving it. After successful registration with MAP2, the MN can perform location updates with its HA and CN.

5.6 An Efficient Scheme for Fast Handover over HMIPv6 (ES-FHMIPv6)

The Efficient Scheme for Fast Handover over HMIPv6 (ES-FHMIPv6) [15] is designed to be efficient in its data transport. The scheme combines FMIPv6 and HMIPv6 to remedy their deficiencies and to reduce the handover latency of macro mobility in HMIPv6. The authors explain their scheme with a scenario in which the MN wants to move from the previous access router (PAR) to a new access router (NAR) in the previous access point (PAP) region; the NAR already has the link-layer address and network prefix information and sends the router advertisement (RA) to the New Access Point (NAP). The NAP has a dual buffer that stores the received AR message sent by the NAR, and it sends a beacon message to the MN every 100 ms. The MN sends an L2 Association Request to the NAP, and
the NAP responds with an association response containing the stored router advertisement. The MN sends the LCoA and RCoA to the NAR, where address configuration confirmation with a 1000 ms timer has been established for the RCoA. A New Enhanced Binding Update for the LCoA and RCoA is then performed, and the HA/CN responds to the MN with a New Enhanced Binding Acknowledgement.

5.7 Cost-Reduced Inter-MAP Binding Update Scheme

A Cost-Reduced Robust Hierarchical Mobile IPv6 (RH-MIPv6) scheme is presented in [11], in which two different MAPs, a primary and a secondary MAP, are used.

Table 1. Performance comparison of binding update schemes

Scheme | Signaling cost | Latency | Overhead
Pre-Binding Update Scheme [8] | Saves signaling cost | Decreased inter-domain handoff latency | More overhead due to the larger number of messages
An Efficient Binding Update Scheme [9] | Reduced signaling cost by sending a single BU message | Decreased packet delivery latency | Overhead for the MAP to hold and manage both BU messages at the same time
GDMHA [6] | The number of BU messages may cause more signaling cost | Decreased BU and handoff latency | Maintaining multiple HAs is the overhead; BCRA message overhead as well
FMHS [14] | Moderate | Decreased macro handover latency; does not improve wireless latency | Little overhead for MAP2 to buffer all the MN's packets until it finishes the L2 handover
Macro-Mobility Handover Performance Improvement Scheme [4] | More signaling cost from sending more messages | Reduced packet transmission latency as well as handover latency | Buffering all packets at the AR and tunneling between PAR and NAR is the overhead of this scheme
ES-FHMIPv6 [15] | Moderate | Minimizes message transmission latency; reduced total handover latency | Overhead for managing old and new MAPs; maintaining the dual buffer table is additional overhead for the PAR
Cost-Reduced RH-MIPv6 [11] | Signaling cost incurred to send more messages in registration | Handover latency reduced in macro handover; faster recovery of the MAP | Maintains P-MAP and S-MAP, plus the overhead of an additional cache entry
The aim of this scheme is to enhance the HMIPv6 method by providing a fault-tolerant mobile service and by reducing the binding update cost as well as the communication overhead. In cost-reduced RH-MIPv6, the first-try MAP registration is the same as in IRH-MIPv6. When the MN enters a new MAP domain, it receives several router advertisement (RA) messages and chooses two interchangeable MAPs. Only one BU message is sent by the MN to register with both MAPs, the primary (P-MAP) and the secondary (S-MAP), and with the CNs, which reduces the signaling cost in the distributed MAP environment. The two MAPs send binding acknowledgement messages back to the MN to complete the binding update procedure. The MN sends a binding update message to the CN for registration, including the P-RCoA and S-RCoA. After the MN moves to another MAP, the scheme can skip the S-MAP registration when the MN registers the new P-MAP; that is, the old P-MAP serves as the S-MAP. The MN keeps the location of the previous P-MAP and sends it a BU, so it can communicate with the previous MAP if the current MAP fails. Cost-reduced RH-MIPv6 also needs an additional entry in the binding cache and a new flag in the binding update message. Table 1 summarizes the performance comparison of all the techniques discussed above, briefly showing their strengths and weaknesses.
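To illustrate one of the qualitative claims in Table 1, the toy counter below (our own simplification; real costs depend on topology, retransmissions, and security exchanges) contrasts the number of home-registration messages the MN originates under plain HMIPv6 with the piggybacked scheme of [9]:

def hmipv6_home_registration():
    # MN->MAP (LBU), MAP->MN (BA), MN->HA (BU), HA->MN (BA).
    return {"originated_by_MN": 2, "total_messages": 4}

def piggybacked_home_registration():
    # Scheme of [9]: the LBU piggybacks the home BU, so the MAP forwards it
    # to the HA and the MN originates a single message.
    return {"originated_by_MN": 1, "total_messages": 4}

for name, counts in [("plain HMIPv6", hmipv6_home_registration()),
                     ("piggybacked BU [9]", piggybacked_home_registration())]:
    print(name, counts)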
6 Conclusion

HMIPv6 introduced a solution for handling micro-mobility and improved performance by decreasing the handover latency and signaling overhead faced by MIPv6. However, HMIPv6 does not perform well for inter/macro-domain mobility, which affects its overall performance. This issue is receiving active attention, and many schemes have been proposed that try to overcome this limitation of HMIPv6. In this paper, we discussed different binding update (BU) schemes for inter-domain mobility in which the registration cost is minimized by improving the binding update procedure, and we compared the performance of the existing schemes. As future work, we will design our own scheme for inter-domain mobility that attempts to improve on existing schemes by reducing the handover latency of the binding update procedure.
References

1. Castelluccia, C.: A Hierarchical Mobile IPv6 Proposal. INRIA Technical Report 226 (1998)
2. Perez-Costa, X., Torrent-Moreno, M., Hartenstein, H.: A Performance Comparison of Mobile IPv6, Hierarchical Mobile IPv6, Fast Handovers for Mobile IPv6 and their Combination. ACM SIGMOBILE Mobile Computing and Communications (2003)
3. Perkins, C.: IP Mobility Support for IPv4. IETF RFC 3344 (2002)
4. Lee, K., Lim, Y., Ahn, S.-J., Mun, Y.: A Macro Mobility Handover Performance Improvement Scheme for HMIPv6. In: Gavrilova, M.L., Gervasi, O., Kumar, V., Tan, C.J.K., Taniar, D., Laganá, A., Mun, Y., Choo, H. (eds.) ICCSA 2006. LNCS, vol. 3981, pp. 410–419. Springer, Heidelberg (2006)
5. Johnson, D., Perkins, C., Arkko, J.: Mobility Support in IPv6. IETF RFC 3775 (2004)
6. Lee, J.-H., Han, Y.-J., Lim, H.-J., Chung, T.-M.: New Binding Update Method using GDMHA in Hierarchical Mobile IPv6. In: Gervasi, O., Gavrilova, M.L., Kumar, V., Laganá, A., Lee, H.P., Mun, Y., Taniar, D., Tan, C.J.K. (eds.) ICCSA 2005. LNCS, vol. 3480, pp. 146–155. Springer, Heidelberg (2005)
7. Soliman, H., Castelluccia, C., El Malki, K., Bellier, L.: Hierarchical Mobile IPv6 Mobility Management. IETF RFC 4140 (2005)
8. Jeong, J., Chung, M.Y., Choo, H.: Improved Handoff Performance Based on Pre-binding Update in HMIPv6. In: Kim, Y.-T., Takano, M. (eds.) APNOMS 2006. LNCS, vol. 4238, pp. 525–529. Springer, Heidelberg (2006)
9. Oh, J., Mun, Y.: An Efficient Binding Update Scheme in HMIPv6. In: Gavrilova, M.L., Gervasi, O., Kumar, V., Tan, C.J.K., Taniar, D., Laganá, A., Mun, Y., Choo, H. (eds.) ICCSA 2006. LNCS, vol. 3981, pp. 471–479. Springer, Heidelberg (2006)
10. Nada, F.: Performance Analysis of Mobile IPv4 and Mobile IPv6. The International Arab Journal of Information Technology 4(2) (2007)
11. Jeong, J., Chung, M.Y., Choo, H.: Cost-Reduced Inter-MAP Binding Update Scheme in Robust HMIPv6. In: Proc. of the Third International Conference on Broadband Communications, Information Technology & Biomedical Applications, Gauteng (2008)
12. Han, Y., Min, S.: Performance Analysis of Hierarchical Mobile IPv6: Does It Improve Mobile IPv6 in Terms of Handover Speed? Wireless Personal Communications 48, 463–483 (2009)
13. Soliman, H., Castelluccia, C., El Malki, K., Bellier, L.: Hierarchical Mobile IPv6 (HMIPv6) Mobility Management. IETF RFC 5380 (2008)
14. Nam, S., Hwang, H., Kim, J.: Fast Macro Handover in HMIPv6. In: 8th IEEE International Symposium on Network Computing and Applications, Cambridge, Massachusetts (2009)
15. Yoo, H., Tolentino, R.S., Park, B., Chang, B., Kim, S.: ES-FHMIPv6: An Efficient Scheme for Fast Handover over HMIPv6 Networks. International Journal of Future Generation Communication and Networking 2(2) (2009)
16. Lee, J., Han, Y., Gundavelli, S., Chung, T.: A Comparative Performance Analysis on Hierarchical Mobile IPv6 and Proxy Mobile IPv6. Telecommunication Systems 41, 279–292 (2009)
A Pilot Study to Analyze the Effects of User Experience and Device Characteristics on the Customer Satisfaction of Smartphone Users Bong-Won Park1 and Kun Chang Lee2,* 1
Department of Interaction Science, Sungkyunkwan University Seoul 110-745, Republic of Korea [email protected] 2 SKK Business School and Department of Interaction Science, Sungkyunkwan University Seoul 110-745, Republic of Korea [email protected]
Abstract. People no longer want merely to call other people with a cellular phone; they want to connect to the Internet and use various applications as they do on a personal computer. Hence, cellular phones have had to become smart. A smartphone has an operating system and many applications and is therefore more difficult to use than earlier cellular phones, and smartphone users get stressed when they confront problems while using them. In this study, we analyzed the effects of the smartphone experience, including phone stress and enjoyment, and of device characteristics on customer satisfaction. Questionnaires from thirty-three valid respondents in South Korea were analyzed with regression analysis. The results show that smartphone stress and enjoyment do not affect customer satisfaction, whereas instant connectivity is an important factor in customer satisfaction; in addition, female customers are less satisfied with smartphones than male customers. These results are based on a pilot study and will be corroborated with more data. Keywords: Smartphone stress, Enjoyment, Instant connectivity, Satisfaction.
1 Introduction

A few years ago, people only sent and received calls and messages using cellular phones. With the application of new technology to cell phones, however, people can now watch movies, listen to music, and watch TV programs anywhere, anytime. In addition, with Wi-Fi (Wireless Fidelity) functions embedded in smartphones, users can use the Internet on the move: they can connect to the Internet free of charge at Wi-Fi zones, check their email, and manage social networking sites. With so many functions and new features embedded, users derive enjoyment from these phones. However, smartphones also cause users stress, because they are difficult to use and users do not know how to deal with phone-related problems.

* Corresponding author.
As a smartphone's power and capability increase, these problems are expected to grow, and this stress decreases a smartphone user's satisfaction. It is therefore necessary to know what stress occurs in the smartphone environment and what impact this factor and related factors have on smartphone users' satisfaction. Many researchers [1-9] have studied smartphone users' behaviors, such as adoption and addiction, but there is no literature on smartphone stress, even though research exists on computer-related stress [10]. This study focuses on perceived smartphone stress and the related factors that affect customer satisfaction. Regression analysis is applied because it is the standard tool for analyzing such questions.
2 Previous Studies

Studies on cellular phones, including smartphones, fall into three major strands. The first strand is the adoption of and satisfaction with cellular phones and mobile applications; the second is addiction to cellular phones; the third is the mobile business (applications) for mobile phones.

2.1 Adoption of Mobile Phones

The adoption of new devices is a continuing issue in the IT domain, and the TAM (Technology Acceptance Model) is generally used to explain it [11]. Regarding the adoption of smartphones, Kim [1] found that job relevance moderates the relation between perceived usefulness and behavioral intentions, and that experience moderates the relation between the company's willingness to fund and behavioral intentions. According to data surveyed in 2004, three of six mobile phone features, namely the color screen, voice-activated dialing, and Internet browsing, affect users' overall satisfaction [2].

2.2 Addiction to Mobile Phones

Roos [3] explained that addicted mobile phone users always switch their phones on, always carry and use them, and experience economic and social difficulties due to excessive mobile phone use. Chen [4] found that female undergraduate cellular phone users are more addicted than male users, and that mobile phone addiction is positively related to the respondent's depression. To measure mobile phone dependency, Japanese researchers developed a cellular phone dependence questionnaire [5]. Using this questionnaire, Kawasaki et al. [6] found that older students (university students) are more dependent on mobile phones than younger students (high school students), and that female high school students are the group with the highest mobile phone dependency.

2.3 Applications of Mobile Phones (Mobile Games)

Online games are popular around the world. People can play games on the subway or bus using portable game devices such as Sony's PSP (PlayStation Portable) and the Nintendo DS. As the capacity of mobile phones increases, users can play mobile
games on their cellular phones. However, as the age and education level of mobile game players increase, interest in mobile games decreases [7]. To make games appealing on mobile phones, mobile games exploit the user's location, one of the distinguishing characteristics of mobile phones [8, 9].
3 Hypothesis Development

People have recently begun to use smartphones and want to use them smartly. Satisfied customers communicate their positive experiences to others, so to expand the smartphone market a company should try to satisfy its existing smartphone customers. In this study, we analyze the factors that affect customer satisfaction with respect to user experience and device characteristics.

3.1 User Experience

Smartphone stress. While using a personal computer, a person gets stressed when the system goes down or when applications for word processing, spreadsheets, and so on stop working [10]. The inner structure of a smartphone is similar to that of a personal computer: it has an operating system and applications. Because a smartphone has complex functions, it is associated with many of the same problems as a personal computer. In other words, as a smartphone's power and capability increase, it becomes difficult for the average person to use, and its user gets stressed from handling it. When this stress decreases, the customer is more satisfied with the device. In accordance with this reasoning, we propose the following hypothesis.

Hypothesis 1: Perceived smartphone stress negatively influences customer satisfaction.

Perceived enjoyment. Davis et al. [12] defined perceived enjoyment as "the extent to which the activity of using the computer is perceived to be enjoyable in its own right, apart from any performance consequences that may be anticipated" (p. 1113). Enjoyment is an important factor in adopting and continuously using Web-based information systems, computer software, and entertainment devices [13-15]. Smartphones have now become multimedia and entertainment devices: a user can listen to music, watch TV and movies, and chat with friends over Internet messenger services. People therefore derive enjoyment from using smartphones. From these facts, we develop the following hypothesis.

Hypothesis 2: Perceived enjoyment positively influences customer satisfaction.

3.2 Device Characteristics (Instant Connectivity)

Users can access the Internet and their office systems through wireless devices anytime, anywhere. On a subway or bus, a person can connect to portals or news sites and read the latest news and gossip. This feature is called instant connectivity or ubiquitous access [16, 17]. This device characteristic affects the user's mood, making him or her happy.
As a smartphone has Wi-Fi functionality, its user can access the Internet free of charge, check email, and connect to his or her desktop to copy a file from it. Users therefore feel that their decision to buy the smartphone was a wise one. In accordance with this reasoning, we propose the following hypothesis.

Hypothesis 3: Instant connectivity positively influences customer satisfaction.
Fig. 1. The Research Model: user experience (stress, H1; perceived enjoyment, H2) and the device characteristic instant connectivity (H3) are linked to satisfaction, with gender and usage length as control variables.
4 Empirical Analysis

4.1 Questionnaire Survey and Sample Statistics

The survey items were adapted from various studies in the literature. To measure smartphone stress, we modified Hudiburg's [10] computer hassles scale, which consists of 37 items. For instant connectivity, three items were constituted from two sources [16, 17]. The survey items for customer satisfaction and enjoyment were drawn from well-known literature. After developing the survey items, we randomly selected smartphone users in South Korea. The initial number of respondents was 43; after assessing response quality, we retained 33 suitable respondents (male: 21, female: 12). All respondents but four were in their twenties (30s: 1, 40s: 3).

4.2 Reliability and Confirmatory Factor Analyses

Six constructs, pertaining to phone stress (Stress 1, Stress 2, and Stress 3), satisfaction, instant connectivity, and enjoyment, were used in this analysis. The survey items of each construct were reliable, as Cronbach's alpha was higher than 0.8 in each case. The validity of the survey items was tested using principal component analysis with varimax rotation. The first factor, Stress 1, explained 37.5% of the total variance; the second factor explained 14.8%. The total variance explained by the six factors was 81.4%. From these results, we concluded that the survey items were statistically valid.
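For readers who want to reproduce the reliability check, the following sketch computes Cronbach's alpha for one construct (the five-point responses are fabricated for illustration; the study's own data are not reproduced here):

import numpy as np

def cronbach_alpha(items):
    items = np.asarray(items, dtype=float)       # shape: (respondents, items)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of scale totals
    return k / (k - 1) * (1 - item_vars / total_var)

sample = [[4, 5, 4], [3, 3, 4], [5, 5, 5], [2, 3, 2], [4, 4, 5]]
print(round(cronbach_alpha(sample), 3))          # 0.918; reliable if > 0.8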
4.3 Results Using hierarchical regression analysis in SPSS 17.0, we tested which factors affected customer satisfaction in smartphones. At first, we controlled the gender and usage length of smartphones. Then, we added the variables one by one. As the sample size was small, we included results at the 90% significance level. As a result, gender was significant at the 90% level in all models except Model 4 in Table 1. Instant connectivity was significant at the 95% confidence level only in Model 2, but at the 90% level in Models 3 and 4. Table 1. Results of regression analysis Variable Gender (male=0, female=1) Usage length of smartphone Instant connectivity Enjoyment Stress1_Device Stress2_AppData Stress3_NewLearn R Square R Square change F **<0.05, *<0.01
Model 1
Model 2
Model 3
Model 4
-0.331(*)
-0.301(*)
-0.330(*)
-0.264
-0.301
-0.176
-0.167
-0.246
0.431(**)
0.359(*) 0.141
0.388(*) 0.087 -0.247 0.275 -0.158
0.313 0.172 4.394
0.326 0.013 3.380
0.403 0.177 2.409
0.141 2.455
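The hierarchical procedure itself is easy to replicate outside SPSS. The sketch below (synthetic data; the column names mirror Table 1, but the coefficients will not match the paper's) adds the predictor blocks one at a time and reports R-squared, its change, and F:

import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 33
df = pd.DataFrame({
    "gender":       rng.integers(0, 2, n),  # male=0, female=1
    "usage_length": rng.normal(6, 2, n),
    "instant_conn": rng.normal(4, 1, n),
    "enjoyment":    rng.normal(4, 1, n),
    "stress1":      rng.normal(3, 1, n),
    "stress2":      rng.normal(3, 1, n),
    "stress3":      rng.normal(3, 1, n),
})
df["satisfaction"] = (0.4 * df["instant_conn"] - 0.3 * df["gender"]
                      + rng.normal(0, 1, n))

blocks = [["gender", "usage_length"],         # Model 1: controls
          ["instant_conn"],                   # Model 2
          ["enjoyment"],                      # Model 3
          ["stress1", "stress2", "stress3"]]  # Model 4

predictors, prev_r2 = [], 0.0
for i, block in enumerate(blocks, 1):
    predictors += block
    fit = sm.OLS(df["satisfaction"], sm.add_constant(df[predictors])).fit()
    print(f"Model {i}: R2={fit.rsquared:.3f}  "
          f"dR2={fit.rsquared - prev_r2:.3f}  F={fit.fvalue:.3f}")
    prev_r2 = fit.rsquared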
4.4 Discussion

This paper deals with smartphone stress and the related factors that affect customer satisfaction. The analysis yields the following findings. First, smartphone stress falls into three components: device-related stress, data/application stress, and new-learning stress. Device-related stress occurs when the smartphone is not working or goes down; data/application stress occurs when stored data crash or an application stops working; new-learning stress arises because, as new applications and features are included, users have to learn them. However, smartphone stress is not an important factor in user satisfaction, presumably because early adopters are tech-savvy and find new devices less difficult to use than average persons do, so they do not get stressed from using smartphones. Second, instant connectivity, which connects users to the Internet anywhere, anytime, is an important factor in smartphone users' satisfaction. Many users are familiar with the Internet and want to use it anytime, anywhere,
and free of charge. This need is fulfilled by smartphones embedded with Wi-Fi functionality. Telecom operators should therefore try to install more Wi-Fi zones than their competitors and to speed up access. Third, as controls, we considered gender and length of experience with smartphones. In this study, female customers are less satisfied with smartphones, perhaps because male customers are more knowledgeable about new devices and more willing to try them; to increase the market size of smartphones, makers should facilitate their use. The length of smartphone usage does not affect customer satisfaction, because most users are still beginners at using smartphones.
5 Concluding Remarks

In daily life, we cannot live without wireless phones; we carry a cellular phone anywhere, anytime. The cellular phone is no longer merely a phone for calling a person: it is a multimedia device and a small computer. Average persons therefore find smartphones difficult to deal with and can get stressed from using them. In this study, we sought to find out what stresses arise in the use of smartphones and what effect this factor and related factors have on customer satisfaction. To analyze this issue, we collected responses from university students and part-time MBA students in South Korea. From the regression analysis, we found, first, that smartphone stress is not a problem at the present time; second, that instant connectivity is an important factor in customer satisfaction; and third, that female customers are less satisfied with smartphones than male customers. These results not only help us to understand smartphone users but should also be considered by smartphone makers and telecom operators that want to grow the smartphone market rapidly. This study reports the first approach, a pilot study, that considers smartphone stress with regard to customer satisfaction. Many other factors affect customer satisfaction, however; in future work, we will add related factors and analyze them in detail.

Acknowledgments. This study was supported by the WCU (World Class University) program through the National Research Foundation of Korea funded by the Ministry of Education, Science and Technology (Grant No. R31-2008-000-10062-0).
References

1. Kim, S.H.: Moderating Effects of Job Relevance and Experience on Mobile Wireless Technology Acceptance: Adoption of a Smartphone by Individuals. Information & Management 45, 387–393 (2008)
2. Ling, C., Hwang, W., Salvendy, G.: Diversified Users' Satisfaction with Advanced Mobile Phone Features. Universal Access in the Information Society 5(2), 239–249 (2006)
3. Roos, J.P.: Postmodernity and Mobile Communications. In: The 5th Conference of the European Sociological Association (2001), http://www.valt.helsinki.fi/staff/jproos/mobilezation.htm
4. Chen, Y.-F.: The Relationship of Mobile Phone Use to Addiction and Depression among American College Students. In: 2004 Seoul Conference on Mobile Communication (2004)
5. Toda, M., Monden, K., Kubo, K., Morimoto, K.: Cellular Phone Dependence Tendency of Female University Students. Nippon Eiseigaku Zasshi 59, 383–386 (2004)
6. Kawasaki, N., Tanei, S., Ogata, F., Burapadaja, S., Loetkham, C., Nakamura, T., Tanada, S.: Survey on Cellular Phone Usage on Students in Thailand. Journal of Physiological Anthropology 25, 377–382 (2006)
7. Hashim, H.A., Hamid, S.H.A., Rozali, W.A.W.: A Survey on Mobile Games Usage among the Institute of Higher Learning (IHL) Students in Malaysia. In: The 1st International Symposium on Information Technologies and Applications in Education, pp. 40–44 (2007)
8. Nova, N., Girardin, F., Dillenbourg, P.: A Mobile Game to Explore the Use of Location Awareness on Collaboration. HCI International 2005 (2005), http://craftwww.epfl.ch/research/publications/hci2005_nova.pdf
9. Licoppe, C., Inada, Y.: Emergent Uses of a Multiplayer Location-aware Mobile Game: the Interactional Consequences of Mediated Encounters. Mobilities 1(1), 39–61 (2006)
10. Hudiburg, R.A.: Psychology of Computer Use XXXIV: The Computer Hassles Scale: Subscales, Norms and Reliability. Psychological Reports 77, 779–782 (1995)
11. Davis, F.D.: Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Quarterly 13, 319–340 (1989)
12. Davis, F.D., Bagozzi, R.P., Warshaw, P.R.: Extrinsic and Intrinsic Motivation to Use Computers in the Workplace. Journal of Applied Social Psychology 22, 1111–1132 (1992)
13. Lin, C.S., Wu, S., Tsai, R.J.: Integrating Perceived Playfulness into Expectation–Confirmation Model for Web Portal Context. Information and Management 42(5), 683–693 (2005)
14. Thong, J.Y.L., Hong, S.-J., Tam, K.Y.: The Effects of Post-Adoption Beliefs on the Expectation–Confirmation Model for Information Technology Continuance. International Journal of Human-Computer Studies 64(9), 799–810 (2006)
15. Yi, M.Y., Hwang, Y.: Predicting the Use of Web-based Information Systems: Self-Efficacy, Enjoyment, Learning Goal Orientation, and the Technology Acceptance Model. International Journal of Human-Computer Studies 59, 431–449 (2003)
16. Durlacher Research: Mobile Commerce Report (1999) (2002)
17. Chae, M., Kim, J.: What’s So Different About the Mobile Internet? Communications of the ACM 46(12), 240–247 (2003)
Exploring the Optimal Path to Online Game Loyalty: Bayesian Networks versus Theory-Based Approaches

Nam Yong Jo1, Kun Chang Lee2,*, and Bong-Won Park3,*

1 SKK Business School, Sungkyunkwan University, Seoul 110-745, Republic of Korea, [email protected]
2 Professor at SKK Business School and WCU Professor at Department of Interaction Science, Sungkyunkwan University, Seoul 110-745, Republic of Korea, [email protected]
3 Department of Interaction Science, Sungkyunkwan University, Seoul 110-745, Republic of Korea, [email protected]

* Corresponding authors.
Abstract. Online games have become a prevailing form of entertainment due to the ease of accessing the Internet through home networks, WiBro, Wi-Fi, and various mobile devices. With the expansion of the online game industry and its number of users, researchers have begun to explore the factors affecting user loyalty. Intuitively, online game playing can be thought of solely as a way to provide pleasure, but the characteristics of online game playing, such as interactions with other users and a real-time approach, introduce the need to explore other psychological factors affecting user loyalty. Thus, this study proposes a new research model incorporating users' perceived loneliness and stress to investigate whether these variables have relationships and mediating roles with enjoyment and flow. To prove the validity of our proposed research model, we first perform an empirical analysis using PLS with 187 valid questionnaires and find that the proposed research model is statistically significant. Second, we adapt the Bayesian network model to explore new models and compare them with the theory-based SEM. Moreover, we assemble the best-performing structures to explore an alternative optimized structural model. The empirical results indicate that the theory-based model, which is driven by prior research, performs better than the Bayesian network models. The implications and future research are also discussed.

Keywords: Online games, perceived stress, loneliness, character identification, enjoyment, flow, Bayesian network.
1 Introduction

Online games are a prevailing form of entertainment, regardless of user age, because it is so easy to access the Internet with the introduction of mobile devices and home
networks. With the market volume of the online game industry and the number of online gamers increasing, the global revenues of online gaming are expected to reach more than $24 billion by 2013 [1]. Thus, the expansion of the online game industry is expected to create new jobs and emerging markets. On the other hand, online game use can also have negative effects on users, such as divorce, job loss, and school absence. Developers and researchers of online games are studying the motivations behind online game use and are analyzing the positive and negative effects of such use. However, most research has focused on only a partial set of aspects, such as the psychological factors or the experiential factors related to online game use. In addition, much research has focused on young students and adolescents, although a large number of older people also enjoy online games [2, 3]. In this study, we discuss the structure of factors that affect loyalty in online game use. These include psychological factors, such as loneliness and perceived stress, and experiential factors, such as character identification, flow, and perceived enjoyment. Prior research indicates that these factors have cause-and-effect relationships with one another. We first conduct a confirmatory analysis with a PLS model and an empirical test with survey data. Second, the Bayesian network (BN) approach is introduced to explore the best structures for describing each construct factor. We then compare them with the PLS model to investigate the validity and usefulness of the findings. The current paper is organized as follows: Section 2 contains a brief review of the literature on online games and the well-known BN structure algorithms used in the study. In Section 3, we develop the research hypotheses and theoretical structure, after which the analysis is performed and the results with both PLS and BNs are discussed. Finally, concluding remarks, implications, and future research are presented in Section 4.
2 Theoretical Background

2.1 Online Games

Studies of online games can be broadly classified into three categories. The first is online game adoption, which includes the characteristics that affect game playing. The second is game addiction, which occurs when a game user feels a compulsion to play the online game. Finally, there is a body of literature on the business models of online games. These three streams of prior research are briefly introduced in this section. Using and extending the technology acceptance model (TAM) or the theory of reasoned action (TRA), many researchers have attempted to explain why individuals play online games. For example, Hsu and Lu [4] found that the flow experience is an important factor in the intention to play an online game. Wu and Liu [5] found that trust in online game websites and online game enjoyment are positively related to one’s attitude toward online games. Wang and Wang [6] found that male game users have a higher intention to play online games than do female game users; furthermore, older individuals are increasingly playing online games. As the Internet becomes more popular and is used by more people, Internet addiction is becoming more prevalent. Young [7] defined Internet addiction as “an impulse-control disorder which does not involve an intoxicant” and developed a questionnaire to measure
Internet addiction. This scale is used worldwide to measure Internet addiction [8, 9, 10]. In addition, the measure can be applied to online game addiction after modifying some of the item meanings [11, 12]. Online game business models are classified into a subscription model and a free-to-play model. In the subscription model, game users subscribe on a monthly basis in order to play a specific game. In the free-to-play model, a person can play the online game free of charge, but to adorn or increase the power of one’s game character (i.e., avatar), online game items (virtual products) must be purchased. In this sense, Lin and Sun [13] argued that game users are not just players but also consumers. The motivations for purchasing game items are perceived playfulness, character competency, and the requirements of the quest system [14].

2.2 Bayesian Network (BN)

The Bayesian network (BN), which can be defined as a directed acyclic graph (DAG), has been used effectively to support decision making. The qualitative part of a BN describes the relationships among the variables in a graphical structure, where each variable is represented as a node. In a DAG (and therefore a BN), the arcs represent direct influential or causal relationships among the variables. The quantitative part of a BN consists of the numerical quantities of the probability distributions of the variables: each variable has an associated probability assessment function that describes the influence of its predecessors. In the following, we briefly introduce the structures needed to develop the BN classifiers in this study.

Naïve Bayes Network (NBN)
In an NBN, which is discussed in depth in [15], a class node is linked with all of the children nodes, which are explanatory variables for the class variable. Although all of the explanatory variables depend on the class node, they are independent of each other. This assumption of the NBN model has been regarded as too rigid and an incorrect reflection of reality.

Tree Augmented Naïve Bayes Network (TAN)
The TAN, introduced by [16], compensates for the weaknesses of the NBN by expanding it into a tree structure. The idea is that there are correlations among variables: the class variable has no parents, and each variable has the class variable and (at most) one other attribute as parents [16]. This tree-like structure is built by adapting the procedure of [17].

General Bayesian Network – K2 (GBN-K2)
The GBN-K2 learning procedure [17] requires a node ordering. It initially assumes that a node has no parents and then incrementally adds the parent whose addition most increases the probability of the resulting network, until no further parent addition can increase the structure's probability. This is repeated for all nodes, specified sequentially by the node ordering, which is taken arbitrarily from the order of variables in the training data set, with the class node always placed first in the order [19].
General Bayesian Network – Hill Climber (GBN-HC)
The GBN-HC algorithm starts learning by initializing the structure to an NBN. Next, each possible arc between any pair of nodes other than the class node [Xi to Xj (Xi ≠ Xj)] is evaluated using leave-one-out cross-validation to estimate the accuracy of the network with that arc added. If no arc produces an improvement in accuracy, the current structure is kept; otherwise, the arc that gave the greatest improvement is retained. This is repeated until no further arcs can be added to improve the classification accuracy [18].
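To make the quantitative part of these classifiers concrete, the following is a minimal sketch of an NBN: conditional probability tables are estimated from discretized records with Laplace smoothing, and the class posterior is computed from them. This is an illustrative assumption for exposition, not the paper's implementation (the study used WEKA); the variable names and toy records are hypothetical.

```python
from collections import Counter, defaultdict

def fit_nbn(rows, class_var, alpha=1.0):
    """rows: list of dicts mapping variable name -> discrete value."""
    class_counts = Counter(r[class_var] for r in rows)
    # cpt[var][class_value] is a Counter over the variable's values.
    cpt = defaultdict(lambda: defaultdict(Counter))
    for r in rows:
        for var, val in r.items():
            if var != class_var:
                cpt[var][r[class_var]][val] += 1

    def posterior(evidence):
        scores = {}
        for c, n_c in class_counts.items():
            p = n_c / len(rows)                      # prior P(C = c)
            for var, val in evidence.items():        # times P(X_i = x_i | C = c)
                n_vals = len({x[var] for x in rows})
                p *= (cpt[var][c][val] + alpha) / (n_c + alpha * n_vals)
            scores[c] = p
        z = sum(scores.values())
        return {c: s / z for c, s in scores.items()}  # normalized posterior
    return posterior

# Hypothetical discretized survey responses, binned low/high.
data = [{"flow": "high", "enjoyment": "high", "loyalty": "high"},
        {"flow": "low",  "enjoyment": "low",  "loyalty": "low"},
        {"flow": "high", "enjoyment": "low",  "loyalty": "high"},
        {"flow": "low",  "enjoyment": "high", "loyalty": "low"}]
predict = fit_nbn(data, class_var="loyalty")
print(predict({"flow": "high", "enjoyment": "high"}))
```

A TAN or GBN learner differs only in the structure it searches over; the estimation of the conditional probability tables proceeds in the same way.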
3 Model Construction and Analysis

3.1 SEM Model Construction

Based on the prior literature and theories discussed in the previous section, we established the following hypotheses and a structural equation model with six conceptual constructs. Hypotheses 1a and 1b concern game users' personal psychological emotions. Hypotheses 2a, 2b, 2c, and 2d focus on the structure of the influence of users' character identification and perceived enjoyment on flow, which describes addiction to online game playing. The last hypothesis (Hypothesis 3) concerns the influence of flow on game loyalty. The overall structure of the hypotheses and paths is described in Figure 1.

Hypothesis 1a: Loneliness is positively related to perceived stress.
Hypothesis 1b: Loneliness is positively related to character identification (CI).
Hypothesis 2a: Perceived stress is positively related to flow.
Hypothesis 2b: CI is positively related to flow.
Hypothesis 2c: CI is positively related to perceived enjoyment.
Hypothesis 2d: Perceived enjoyment is positively related to flow.
Hypothesis 3: Flow is positively related to online game loyalty.

3.2 Empirical Analysis of the Survey

3.2.1 Questionnaire Survey and Sample Statistics
The sample was collected using an online survey in Korea. The respondents ranged in age from 18 to 50, and almost equal numbers of male and female respondents participated in the survey. The sample included a total of 187 respondents. The survey items were adopted from the prior literature. First, to measure loneliness, the short-form UCLA Loneliness Scale (ULS-8) [20], consisting of eight items, was used. This scale is as reliable and valid as the long-form UCLA Loneliness Scale. Second, to measure perceived stress, we used the four-item Perceived Stress Scale (PSS) [21, 22]. Third, to measure character identification, we used questionnaire items from the studies of Hefner et al. and Fornell and Larcker [23, 24]. All of the questionnaire items used a seven-point scale, ranging from completely disagree to completely agree.

3.2.2 Reliability and Confirmatory Factor Analyses
The survey items of each construct were reliable, with all of the Cronbach’s alpha values greater than 0.7. The validity of the survey items was also tested using a
principal component analysis with varimax rotation. The first factor, loneliness, explained 26.73% of the total variance, and the second factor, loyalty, accounted for 22.04%. The total variance explained by the six factors was 73.7%. From these results, we concluded that the survey items were statistically valid.
Fig. 1. Results of the structural model
To confirm the reliability and validity of the measurement data, we performed a confirmatory factor analysis. In the reliability test, all of the composite reliabilities were greater than 0.7, and the average variance extracted (AVE) values were greater than 0.5 [25]. For the validity test, the correlations between any two factors were less than the square root of the AVE of each factor [24]. Therefore, the data were reliable and valid. The hypotheses were tested using SmartPLS 2.0 with the bootstrapping procedure, and all of the hypotheses were accepted (at a 99% confidence level).

3.3 Bayesian Network Approach

In the prior section, we analyzed the SEM model structured by prior research and theories. Now, we explore whether there is a more rigorous model and whether there are causal relations between the constructs. These questions are investigated using BN approaches.

3.3.1 Various BN Algorithm Adoptions
For the purpose of exploration, we performed a number of experiments to find the best description model for each construct (i.e., node in the BN) by setting each node as the class. Table 1 shows the performance of every model we estimated. The models are combinations of well-known BN search algorithms, such as Naïve Bayes, TAN, HC, and K2, with scoring types such as BAYES, BDeu, MDL, ENTROPY, and AIC. The best performance of each model for the same class node is parenthesized. All experiments were performed with WEKA 3.7.2. We compared the computational performances with the SEM structure model and found that the theory-based SEM model is never outperformed by the computational algorithm models learned in WEKA.
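The composite reliability, AVE, and Fornell-Larcker checks described above can be sketched as follows from standardized loadings. The loadings and the latent correlation below are hypothetical placeholders, not the estimates reported in this study.

```python
import numpy as np

def composite_reliability(loadings):
    lam = np.asarray(loadings, dtype=float)
    err = 1.0 - lam ** 2                     # error variance of standardized items
    return lam.sum() ** 2 / (lam.sum() ** 2 + err.sum())

def ave(loadings):
    lam = np.asarray(loadings, dtype=float)
    return float(np.mean(lam ** 2))          # average variance extracted

# Hypothetical standardized loadings for two constructs.
flow_loads  = [0.82, 0.79, 0.85, 0.76]
enjoy_loads = [0.88, 0.84, 0.80]
r_flow_enjoy = 0.55                          # hypothetical inter-construct correlation

for name, loads in [("flow", flow_loads), ("enjoyment", enjoy_loads)]:
    print(name, "CR = %.3f" % composite_reliability(loads), "AVE = %.3f" % ave(loads))

# Fornell-Larcker criterion: sqrt(AVE) should exceed the correlation.
print("discriminant validity:",
      all(np.sqrt(ave(l)) > r_flow_enjoy for l in (flow_loads, enjoy_loads)))
```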
Table 1. Performance comparison of BN algorithms and the PLS structure

Model               | Identification | Stress        | Enjoyment     | Flow          | Loyalty       | Average
Naïve               | 36.77 (10.21)  | 50.69 (9.08)  | 59.08 (9.81)  | 50.22 (9.18)  | 51.27 (10.67) | 49.61
TAN-BAYES           | 37.17 (11.09)  | 49.64 (8.34)  | 59.84 (10.29) | 47.13 (8.72)  | 49.36 (10.66) | 48.63
TAN-BDeu            | 40.03 (9.77)   | 48.04 (8.12)  | 57.52 (9.71)  | 48.37 (8.66)  | 52.63 (10.99) | 49.32
TAN-MDL             | 36.57 (10.20)  | 50.19 (8.70)  | 59.02 (10.54) | 46.68 (9.08)  | 45.82 (10.70) | 47.66
TAN-ENTROPY         | 35.97 (10.14)  | 49.93 (8.86)  | 59.56 (10.94) | 46.57 (9.18)  | 45.29 (10.27) | 47.46
TAN-AIC             | 35.71 (10.13)  | 50.15 (8.96)  | 59.61 (10.72) | 46.84 (9.44)  | 45.29 (10.59) | 47.52
HC-BAYES            | 39.35 (9.80)   | 52.68 (6.62)  | 56.07 (10.26) | 47.36 (9.65)  | 52.17 (10.76) | 49.53
HC-BDeu             | 27.81 (2.09)   | 55.11 (2.77)  | 51.31 (7.89)  | 41.22 (1.98)  | 45.62 (7.31)  | 44.21
HC-MDL              | 27.81 (2.09)   | 55.11 (2.77)  | 46.01 (2.78)  | 41.16 (1.96)  | 44.93 (2.56)  | 43.01
HC-ENTROPY          | 32.85 (9.61)   | 47.34 (9.28)  | 33.90 (13.73) | 30.99 (12.23) | 32.98 (10.11) | 35.61
HC-AIC              | 39.08 (9.62)   | 53.59 (5.54)  | 57.05 (10.21) | 46.99 (9.64)  | 53.38 (11.33) | 50.02
K2-BAYES            | 34.43 (10.16)  | 53.99 (5.00)  | 56.74 (8.94)  | 50.47 (9.49)  | 53.18 (11.35) | 49.76
K2-BDeu             | 27.81 (2.09)   | 55.11 (2.77)  | 51.73 (7.52)  | 41.16 (1.96)  | 45.63 (7.36)  | 44.29
K2-MDL              | 27.81 (2.09)   | 55.11 (2.77)  | 46.01 (2.78)  | 41.16 (1.96)  | 44.93 (2.56)  | 43.01
K2-ENTROPY          | 34.14 (10.06)  | 38.43 (13.16) | 45.34 (15.12) | 39.53 (13.92) | 38.73 (13.99) | 39.23
K2-AIC              | 36.21 (10.12)  | 53.49 (5.57)  | 57.69 (9.68)  | 47.94 (9.78)  | 53.58 (11.23) | 49.78
Fixed-assembled     | 39.56 (9.74)   | 59.91 (8.77)  |               | 51.50 (9.81)  | 56.84 (9.81)  |
Fixed-PLS Structure | 41.12 (10.64)  | 60.69 (9.07)  | 50.39 (9.58)  | 50.40 (8.79)  | 47.71 (8.83)  | 50.06

Figures in parentheses indicate the models used to construct the fixed-assembled model.
3.3.2 Fixed-Assembled Structures of the Best-Performing Computational Models
To explore a better BN structure, we examined whether the prediction performance could be improved by assembling the best-performing computational models. The procedure of assembly is presented in Fig. 2. During assembly, when an arc appeared in both directions, we adopted the direction used in the SEM model. The performance of this model is described in the second line from the bottom of Table 1. Most of these models performed better than any other computational combination model, with the exception of the enjoyment prediction accuracy in the SEM model.

3.4 Discussion

This paper addressed the psychological and experiential factors that affect customer loyalty in online games using PLS (partial least squares) regression. From this analysis, we reached the following conclusions. First, lonely people experience greater stress. In addition, if a game user feels lonely, s/he tends to attach to her/his game character through character identification. Second, a person who is feeling stress tends to enter the state of flow more easily, and if the gamer identifies with the game character, s/he focuses more intently on the game. Therefore, character identification positively impacts the flow of the game. Third, if a person identifies with the game character, s/he finds the online game more enjoyable. Therefore, character identification is an important factor in enhancing online gaming enjoyment. Finally, a game user who experiences flow will want to play the online game again and will recommend the game to other people.
(a) Class: Identification; (b) Class: Flow; (c) Class: Enjoyment; (d) Class: Loyalty; (e) Fixed-assembled
Fig. 2. BN constructs used for combining the fixed-assembled structures
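A minimal sketch of the arc-assembly rule behind Fig. 2: the arcs of the best-performing per-class models are merged, and any arc that appears in both directions is resolved in favor of the direction used in the SEM model. The arc sets below are hypothetical examples, not the networks fitted in this study.

```python
def assemble(models, sem_arcs):
    """models: iterable of arc sets; each arc is a (parent, child) tuple."""
    merged = set()
    for arcs in models:
        merged |= set(arcs)
    fixed = set()
    for a, b in merged:
        if (b, a) in merged:          # arc present in both directions
            if (a, b) in sem_arcs:    # keep only the SEM direction
                fixed.add((a, b))
        else:
            fixed.add((a, b))
    return fixed

sem = {("enjoyment", "flow"), ("flow", "loyalty")}
m1 = {("enjoyment", "flow"), ("flow", "loyalty")}
m2 = {("flow", "enjoyment"), ("stress", "flow")}
print(sorted(assemble([m1, m2], sem)))
# [('enjoyment', 'flow'), ('flow', 'loyalty'), ('stress', 'flow')]
```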
(a) Initial distribution
(b) Bayesian inference
Fig. 3. Bayesian inference with the PLS structure
In the second study, using BN approaches, the SEM model performed better than every other BN-based model. This result indicates that prior research based on theory is rigorous and well-constructed. Even the algorithm-based computational models, which are popular and validated in prior research, could not perform better than the SEM model. Nevertheless, the BN approaches provide us with useful implications. As shown in Fig. 3, we can use Bayesian inference to find the most influential factors through a WEKA what-if simulation test. We can recognize that flow mainly
impacts loyalty and that perceived enjoyment has a slight effect on loyalty, but the other constructs do not have meaningful effects. Moreover, this BN model can be used in further experiments to explore how much impact the child node of a construct has in the whole model of constructs and relations. However, if we simply need a partially optimized prediction model, we can choose the best model generated by setting each node as the class.
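A what-if query of the kind described above can be reproduced with exact inference by enumeration on a small discrete network. The sketch below uses an assumed Enjoyment → Flow → Loyalty chain with made-up probabilities purely for illustration; it is not the network fitted in this study.

```python
from itertools import product

# Hypothetical CPTs for a three-node chain: enjoyment -> flow -> loyalty.
P_enjoy = {"high": 0.5, "low": 0.5}
P_flow_given_enjoy = {"high": {"high": 0.8, "low": 0.2},
                      "low":  {"high": 0.3, "low": 0.7}}
P_loyal_given_flow = {"high": {"high": 0.75, "low": 0.25},
                      "low":  {"high": 0.35, "low": 0.65}}

def joint(e, f, l):
    return P_enjoy[e] * P_flow_given_enjoy[e][f] * P_loyal_given_flow[f][l]

def query_loyalty(evidence):
    """P(loyalty | evidence), computed by enumerating all worlds."""
    dist = {"high": 0.0, "low": 0.0}
    for e, f, l in product(["high", "low"], repeat=3):
        world = {"enjoyment": e, "flow": f, "loyalty": l}
        if all(world[k] == v for k, v in evidence.items()):
            dist[l] += joint(e, f, l)
    z = sum(dist.values())
    return {k: v / z for k, v in dist.items()}

print(query_loyalty({}))                     # prior over loyalty
print(query_loyalty({"flow": "high"}))       # what if flow is observed high?
print(query_loyalty({"enjoyment": "high"}))  # weaker, mediated effect
```

Setting evidence on flow shifts the loyalty distribution the most, mirroring the pattern read off Fig. 3.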
4 Conclusion and Future Work

Advances in high-speed Internet devices have helped the online gaming industry flourish. Without meeting in the real world, users can meet other individuals in the cyber world. By playing online games, game users relieve stress and experience enjoyment. There has been limited research considering the psychological and experiential factors related to online game use, and studies of users from various age groups have also been limited. Our results confirm the results of previous research and suggest that game development companies should create online games that ensure a feeling of flow during game play. The present study contributes to the literature in several ways. First, the theory-based SEM model reveals that loneliness, character identification, enjoyment, and stress affect online game loyalty, mediated by flow. This finding indicates that we should consider game addiction and adoption at the same time. In addition, this structure shows valid results. Second, adopting four kinds of structure learning algorithms and several scoring types (a total of 16 combinations, as listed in Table 1) to find the best structure for predicting the class nodes, we found that the PLS-based model performed better than any combination of computational algorithm-based models. This result supports the validity of the SEM and PLS models. We conducted further analyses to find the best structure by combining the optimized models, and found the results to be as good as the SEM model. However, that method cannot describe the full structure of the constructs. Thus, we suggest that the SEM model is the best choice for the purpose of this study. Third, the SEM-based model is useful in that it can be used for dynamic simulation to find the amount of influence of each node-to-node relationship by Bayesian inference. This simulation can help online game developers and researchers find the strength of influence of each construct and the best path to online game loyalty. Several issues remain for further study. For example, many factors may also depend on the game users' favorite game genres and age. A recommendation system could be developed to suggest online games for users to choose from, which could increase their loyalty. These topics will be analyzed in detail in future work.

Acknowledgments. This research was supported by the WCU (World Class University) program through the National Research Foundation of Korea funded by the Ministry of Education, Science and Technology (Grant No. R31-2008-000-10062-0).
References 1. Wu, J.: Global Video Game Market Forecast (2010), http://www.strategyanalytics.com/default.aspx?mod= PressReleaseViewer&a0=4862 2. Abraham, L.B., Mörn, M.P., Vollman, A.: Women on the Web: How Women are Shaping the Internet, comScore (2010) 3. Williams, D., Yee, N., Caplan, S.E.: Who Plays, How Much, and Why? Debunking the Stereotypical Gamer Profile. Journal of Computer-Mediated Communication 13, 993–1018 (2008) 4. Hsu, C.L., Lu, H.P.: Why do People Play On-Line Games? An Extended TAM with Social Influences and Flow Experience. Information & Management 41, 853–868 (2004) 5. Wu, J., Liu, D.: The Effects of Trust and Enjoyment on Intention to Play Online Games. Journal of Electronic Commerce Research 8, 128–140 (2007) 6. Wang, H.Y., Wang, Y.S.: Gender Differences in the Perception and Acceptance of Online Games. British Journal of Educational Technology 39, 787–806 (2008) 7. Young, K.: Caught in the Net. John Wiley & Sons, New York (1998) 8. Kim, K., Ryu, E., Chon, M., Yeun, E., Choi, S., Seo, J., Nam, B.: Internet Addiction in Korean Adolescents and its Relation to Depression and Suicidal Ideation: A Questionnaire Survey. International Journal of Nursing Studies 43, 185–192 (2006) 9. Leung, L.: Net-Generation Attributes and Seductive Properties of the Internet as Predictors of Online Activities and Internet Addiction. Cyberpsychology & Behavior 7, 333–348 (2004) 10. Yoo, H., Cho, S., Ha, J., Yune, S., Kim, S., Hwang, J., Chung, A., Sung, Y., Lyoo, A.: Attention Deficit Hyperactivity Symptoms and Internet Addiction. Psychiatry and Clinical Neurosciences 58, 487–494 (2004) 11. Johansson, A., Götestam, K.G.: Problems with Computer Games without Monetary Reward: Similarity to Pathological Gambling. Psychological Reports 95, 641–650 (2004) 12. Lu, H.-P., Wang, S.-M.: The Role of Internet Addiction in Online Game Loyalty: An Exploratory Study. Internet Research 18, 499–519 (2008) 13. Lin, H., Sun, C.T.: Cash Trade within the Magic Circle: Free-to-play Game Challenges and Massively Multiplayer Online Game Player Responses. In: DiGRA 2007: Situated Play, pp. 335–343 (2007) 14. Guo, Y., Barnes, S.: Virtual Item Purchase Behavior in Virtual Worlds: an Exploratory Investigation. Electronic Commerce Research 9, 77–96 (2009) 15. IEEE Std. 1219: Standard for Software Maintenance. IEEE Computer Society Press, Los Alamitos (1993) 16. Jørgensen, M.: A Review of Studies on Expert Estimation of Software Development Effort. Journal of Systems and Software 70, 37–60 (2004) 17. Fiol, M., Huff, A.S.: Maps for Managers: Where Are We? Where Do We Go From Here? Journal of Management Studies 29, 269–285 (1992) 18. Lientz, B.P., Swanson, E.B., Tompkins, G.E.: Characteristics of Application software maintenance. Communication of the ACM 21, 466–471 (1987) 19. Zelkowitz, M., Shaw, A., Gannon, J.: Principles of Software Engineering and Design. Prentice-Hall, New Jersey (1979) 20. Hays, R.D., DiMatteo, M.R.: A Short-form Measure of Loneliness. Journal of Personality Assessment 51, 69–81 (1987) 21. Cohen, S., Kamarck, T., Mermelstein, R.: A Global Measure of Perceived Stress. Journal of Health and Social Behavior 24, 385–396 (1983)
22. Cohen, S., Williamson, G.: Perceived Stress in a Probability Sample of the U.S. In: Spacapan, S., Oskamp, S. (eds.) The Social Psychology of Health: Claremont Symposium on Applied Social Psychology. Sage, Newbury Park (1988)
23. Hefner, D., Klimmt, C., Vorderer, P.: Identification with the Player Character as Determinant of Video Game Enjoyment. In: Ma, L., Rauterberg, M., Nakatsu, R. (eds.) ICEC 2007. LNCS, vol. 4740, pp. 39–48. Springer, Heidelberg (2007)
24. Fornell, C., Larcker, D.: Structural Equation Models with Unobservable Variables and Measurement Error. Journal of Marketing Research 18, 39–50 (1981)
The Effect of Users’ Characteristics and Experiential Factors on the Compulsive Usage of the Smartphone

Bong-Won Park1 and Kun Chang Lee2,*

1 Department of Interaction Science, Sungkyunkwan University, Seoul 110-745, Republic of Korea, [email protected]
2 Professor at SKK Business School and WCU Professor at Department of Interaction Science, Sungkyunkwan University, Seoul 110-745, Republic of Korea, Tel.: +82 7600505, Fax: +82 2 7600440, [email protected]

* Corresponding author.
Abstract. In recent times, the mobile phone has become more than a device for making phone calls. Because mobile phones now offer many functions of a personal computer, they have become indispensable, and many people use them all the time. This compulsive usage of mobile phones may hinder human relationships. In this study, we analyzed how users' characteristics and experiential factors impact the compulsive usage of the smartphone. To test our research model, an empirical analysis was performed with 183 valid questionnaires using partial least squares (PLS). From this analysis, we found that perceived enjoyment, satisfaction with smartphones, and personal innovativeness positively impact the compulsive usage of smartphones.

Keywords: smartphone, loneliness, satisfaction, enjoyment, innovativeness, compulsive usage.
1 Introduction

In earlier times, people simply made and received phone calls using cellular phones. Nowadays, however, as technology has evolved, cellular phones are often small computers (i.e., smartphones). For example, people can watch movies and listen to music on one of these devices. Because smartphones now come with embedded Wi-Fi functionality, people can connect to the Internet free of charge in Wi-Fi zones. Since they can easily connect to the Internet, they can check their email and manage social networking sites at any place and at any time on their smartphones. Accordingly, the global smartphone market grew rapidly in 2010 compared with the previous year [1]. In reality, almost all smartphone users carry their smartphones at all times. On subways or buses, we can see many people reading the latest news and managing their social
networking sites. In addition, some people check their smartphones continuously while eating out or otherwise socializing with friends. This compulsive usage may negatively impact their social and personal lives. For example, workers can process their work at home or on vacation through instant and ubiquitous connectivity; therefore, the border between work and family life is disappearing [2]. In understanding addiction, a major perspective is research on psychological differences, examining factors such as self-esteem and loneliness [3, 4]. However, in the case of smartphones, innovators and those who are considered tech-savvy use them much more than the average person. Therefore, experiential factors offer another perspective for understanding smartphone users. As a result, integrated research considering both personal characteristics and experiential factors is important for understanding a smartphone user's addiction. In addition, smartphone addiction is not a problem of the same kind as game or alcohol addiction. Therefore, instead of an addiction perspective, a compulsive-usage perspective is more suitable for analyzing smartphone users. In this paper, we analyze how users' characteristics and experiences impact the compulsive usage of smartphones.
2 Previous Studies

Research on mobile phones can be classified into three parts. The first is the adoption of the mobile phone and mobile phone-related applications. The second is addiction to mobile phones. The third is mobile business based on cellular phones.

2.1 Adoption of the Mobile Phone

A wide variety of smartphones are being launched. Therefore, the companies that develop them want to know which factors are important to customers. One approach to this issue is to use the TAM (Technology Acceptance Model). In this model, ease of use and usefulness are important factors explaining the intention to adopt new devices [5]. In the medical context, Park and Chen [6] found that perceived usefulness and attitude toward use positively impact behavioral intention among medical doctors and nurses. In addition to ease of use and usefulness, many other factors, such as enjoyment, flow, and value, have been added in various contexts. For example, Teo and Pok [7] found that reference groups (social influence) positively affect the adoption of WAP-enabled mobile phones among Internet users. Another line of research asks which form factors are important in buying and using a mobile phone. Ling et al. [8] found that male users have phones embedded with cameras, Internet browsing, and wireless connectivity features more often than female users. In addition, Peaw and Mustafa [9] found that a smartphone's battery life and size are considered more important than other attributes.

2.2 Addiction to Mobile Phones

In addiction research, one line of work develops questionnaires to measure dependency/addiction. For example, Toda et al. [10] developed a cellular phone dependence questionnaire (CPDQ, 20 items) using Japanese university students. In
addition, Koo [11] also developed a questionnaire (20 items) to measure cellular phone addiction in adolescents, finding three subfactors: withdrawal/tolerance, life dysfunction, and compulsion/persistence. Another line of work compares addicted users with average users. For example, Roos [12] explained that addicted mobile phone users have economic and social difficulties because of too much time spent on the phone; in addition, they always switch their phones on and carry them in order to use them anytime, anywhere. Chen [13] found that male undergraduate cellular phone users are less addicted than female users and that users' depression is positively related to mobile phone addiction.

2.3 Mobile Business (Mobile Games)

As mobile phones' processing power increases and their screens get larger, users can enjoy mobile gaming more than ever before, and many more mobile games have been developed recently. However, the attraction or curiosity of mobile gaming diminishes as the age and education level of users increase [14]. To make mobile games interesting and enjoyable, game companies are building games that use players' locations [15]. Furthermore, a user may play a mobile game on the bus or subway using a mobile phone and then play the same game on the Internet using a desktop PC after arriving home. In addition, like online games, mobile games have also been applied to teaching driving skills and social values [16, 17]. Besides being used for education, mobile games can serve as a medium for advertising. For example, game development companies promote the brands of real cars through mobile car-racing games [18].
3 Hypothesis Development

3.1 User Experience

Perceived Enjoyment
Davis et al. [5] defined perceived enjoyment as “the extent to which the activity of using the computer is perceived to be enjoyable in its own right, apart from any performance consequences that may be anticipated.” Enjoyment has been used as a factor in the adoption of many new products and services; for example, it positively impacts the intention to play online games [19]. Smartphone users want to use smartphones to relieve stress or be entertained. They kill time playing mobile games, and they chat with friends. As smartphones are fun and enjoyable, users download apps from app stores and use them again and again. Hence, we formed the following hypothesis:

Hypothesis 1: Perceived enjoyment is positively related to compulsive usage of smartphones.

Satisfaction
Customer satisfaction is a major concern for products and services. In general, a person may praise or criticize a product or service after purchasing or experiencing it [20]. Satisfied customers want to buy the same product again or visit the same restaurants and cafes again. With smartphones, a user can download various applications from app stores and can connect to the Internet anytime, anywhere. Satisfied users want to use their smartphones again and again: they play mobile games continuously on subways or buses, and even while walking they read the latest news and check e-mail. This leads to the following hypothesis:

Hypothesis 2: Satisfaction is positively related to compulsive usage of smartphones.

3.2 User Characteristics

Loneliness
As our society modernizes, married couples are having fewer babies and the number of single persons is increasing. Therefore, more and more people are eating alone in their homes and elsewhere, and they do not have enough people to talk with, so they feel lonely. Lonely persons express themselves better in an online environment than in an offline environment [21]. Therefore, to meet and talk with unknown people, lonely individuals compulsively use the Internet [22, 23]. In addition, smartphone users may feel boredom or loneliness on subways or buses; to feel connected with others, they search the Internet and talk to others online. Therefore, as smartphone users feel lonely, they may compulsively use their smartphones. In accordance with this reasoning, we propose the following hypothesis:

Hypothesis 3: Loneliness is positively related to compulsive usage of smartphones.

Personal Innovativeness
Agarwal and Karahanna [24] defined PIIT (personal innovativeness in the domain of information technology) as "the willingness of an individual to try out any new information technology," and this factor is related to new media adoption and use [25, 26]. Smartphones are new products; therefore, those who have IT knowledge and are sensitive to fashion trends tend to own them. Thus, those with innovative characteristics may use smartphones compulsively. Hence, the following hypothesis is presented:

Hypothesis 4: Personal innovativeness is positively related to the compulsive usage of the smartphone.
With smartphones, a user can download various applications from app stores. In addition, a user can connect to the Internet anytime, anywhere. Satisfied users want to use their smartphones again and again. In addition, users play mobile games continuously on subways or buses. Even while walking, users read the latest news and check e-mail. This leads to the following hypothesis: Hypothesis 1: Satisfaction is positively related to compulsive usage of smartphones. 3.2 User Characteristics Loneliness As our society modernizes, married persons are having fewer babies and the number of single persons is increasing. Therefore, more and more people are eating alone in their homes and other places. In addition, they don’t have enough people to talk with. Therefore, they feel lonely. These lonely persons express themselves well in an online environment better than an offline environment [21]. Therefore, to meet and talk with unknown people, these lonely individuals compulsively use the Internet [22, 23]. In addition, smartphone users feel boredom/loneliness on subways or buses. To feel connected with others, they search the Internet and talk to others online. Therefore, as smartphone users feel lonely, they compulsively use their smartphones. In accordance with this concept, we propose the following hypothesis. Hypothesis 3: Loneliness is positively related to compulsive usage of smartphones. Personal Innovativeness Agarwal and Karahanna [24] defined PIIT (personal innovation in the domain of information technology) as "the willingness of an individual to try out any new information technology" and this factor is related to new media adoption and use [25, 26]. In the smartphone environment, smartphones are new products and therefore, those who have IT knowledge and are sensitive to fashion trends tend to have them. Therefore, those who have innovative characteristics may use smartphones compulsively. Henceforth, the following hypothesis is presented. Hypothesis 2: Personal innovativeness is positively related to the compulsive usage of the smartphone. User Experience Perceived Enjoyment
H1
Satisfaction H2
Loneliness
H3
H4 Personal Innovativeness
User Characteristics
Fig. 1. Research Model
Compulsive Usage of Smartphone
4 Empirical Analysis

4.1 Questionnaire Survey and Sample Statistics

The survey items were adapted from the literature. Perceived enjoyment was measured with 4 items based on works such as Davis et al. [5] and van der Heijden [27]. Loneliness was measured with the 8 items of Hays and DiMatteo's UCLA Loneliness Scale [28]. Satisfaction was measured with 4 items based on works such as Macintosh and Lockshin [29] and Bodet [30]. Personal innovativeness was measured with 4 items based on works such as Agarwal and Karahanna [24] and Agarwal and Prasad [31]. Finally, compulsive usage of smartphones was measured with Koo's scale [11]; because this study focuses on compulsive usage, we selected the 7 compulsive-usage items from Koo's 20 items. All measurement items except those for compulsive usage were translated into Korean, and a seven-point Likert scale was used. The sample was collected from Korean smartphone users recruited by a research firm. Ages ranged from the teens to the fifties (teens: 31, 20s: 31, 30s: 46, 40s: 42, and 50s: 33). Among the 183 respondents, the numbers of male and female respondents were almost equal (male: 91, female: 92).

4.2 Reliability and Confirmatory Factor Analyses

Preliminary Analysis
Five constructs were used in this analysis: loneliness, compulsive usage, satisfaction, perceived enjoyment, and personal innovativeness. The Cronbach's alpha values of the survey items of each construct were all greater than 0.8 (Table 1), so the items were reliable. In addition, a principal component analysis with varimax rotation was conducted to test the validity of the survey items (Table 1). The total variance explained by the five factors was 71.6%, as shown in Table 1. Loneliness, the first factor, explained 30.31% of the total variance; the second factor, compulsive usage, accounted for 16.76%. From these results, we concluded that the survey items were statistically valid.

The Measurement Model
In this stage, a confirmatory factor analysis was performed to confirm the reliability and validity of the measurement data. For the reliability test, all of the composite reliabilities were greater than 0.7, and the AVEs (average variance extracted) were greater than 0.5 [32]. For the validity test, the square root of the AVE of each factor was larger than the correlations between factors [32]. Therefore, the data in Table 2 are reliable and valid.

The Structural Model
Partial least squares with the bootstrapping procedure was used for testing the hypotheses. At a 95% confidence level, all of the paths except the relationship between loneliness and compulsive usage were supported (Figure 2).
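For reference, a minimal sketch of the Cronbach's alpha computation used in the preliminary reliability test. The simulated response matrix (183 respondents, four items of one construct on a seven-point scale) is a hypothetical stand-in for the actual survey data.

```python
import numpy as np

def cronbach_alpha(items):
    X = np.asarray(items, dtype=float)       # shape (n_respondents, k_items)
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()  # sum of per-item variances
    total_var = X.sum(axis=1).var(ddof=1)    # variance of the scale total
    return k / (k - 1) * (1.0 - item_vars / total_var)

rng = np.random.default_rng(0)
signal = rng.integers(2, 7, size=(183, 1))   # shared construct signal per respondent
noise = rng.integers(-1, 2, size=(183, 4))   # item-level noise
responses = np.clip(signal + noise, 1, 7)
print("alpha = %.3f" % cronbach_alpha(responses))
```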
Table 1. Results of reliability and factor analyses

Survey item                  | LONE   | COM    | SAT   | ENJ   | INNO  | Cronbach's α
loneliness5                  | 0.89   |        |       |       |       | 0.918
loneliness4                  | 0.86   |        |       |       |       |
loneliness2                  | 0.83   |        |       |       |       |
loneliness8                  | 0.83   |        |       |       |       |
loneliness7                  | 0.80   |        |       |       |       |
loneliness1                  | 0.79   |        |       |       |       |
compulsiveusage3             |        | 0.78   |       |       |       | 0.846
compulsiveusage4             |        | 0.76   |       |       |       |
compulsiveusage1             |        | 0.70   |       |       |       |
compulsiveusage5             |        | 0.70   |       |       |       |
compulsiveusage7             |        | 0.70   |       |       |       |
compulsiveusage6             |        | 0.68   |       |       |       |
compulsiveusage2             |        | 0.56   |       |       |       |
phonesatisfaction3           |        |        | 0.89  |       |       | 0.931
phonesatisfaction2           |        |        | 0.88  |       |       |
phonesatisfaction4           |        |        | 0.87  |       |       |
phonesatisfaction1           |        |        | 0.85  |       |       |
enjoyment1                   |        |        |       | 0.84  |       | 0.896
enjoyment2                   |        |        |       | 0.83  |       |
enjoyment4                   |        |        |       | 0.83  |       |
enjoyment3                   |        |        |       | 0.71  |       |
innovative1                  |        |        |       |       | 0.87  | 0.909
innovative2                  |        |        |       |       | 0.85  |
innovative3                  |        |        |       |       | 0.82  |
innovative4                  |        |        |       |       | 0.81  |
Eigenvalue                   | 7.577  | 4.191  | 2.473 | 2.134 | 1.530 |
Variance explained (%)       | 30.307 | 16.764 | 9.893 | 8.535 | 6.118 |
Total variance explained (%) | 30.307 | 47.071 | 56.963 | 65.499 | 71.617 |

Note 1: LONE: Loneliness, COM: Compulsive usage, SAT: Satisfaction, ENJ: Enjoyment, INNO: Personal innovativeness.
Note 2: In loneliness, one item was deleted because of low factor loadings.
Table 2. Correlations of latent variables and AVEs Construct
Composite Reliability
LONE
COM
SAT
ENJ
INNO
Loneliness Compulsive usage Satisfaction Perceived enjoyment Personal innovativeness
0.924 0.881 0.951 0.929 0.937
0.820 -0.099 -0.148 -0.274 -0.166
0.720 0.336 0.407 0.364
0.911 0.466 0.339
0.876 0.493
0.887
Note: Values on the underlined diagonal are the square roots of the AVEs.
Fig. 2. Results of the structural model. Perceived Enjoyment → Satisfaction: 0.466**; Satisfaction → Compulsive Usage of Smartphone: 0.238*; Loneliness → Compulsive Usage: -0.017 (not significant); Personal Innovativeness → Compulsive Usage: 0.280**. ** p < 0.01, * p < 0.05.
4.3 Discussion

This paper analyzed how user characteristics (loneliness and personal innovativeness) and user experiences (perceived enjoyment and satisfaction) affect the compulsive usage of smartphones using PLS. From this analysis, we reached the following conclusions. First, as users feel enjoyment from using a smartphone, they become satisfied with it. In app stores, users download new applications they need. Using these applications, they play mobile games or chat with friends. For example, many users download the free chatting application KAKAO Talk, through which they send and receive messages to and from friends free of charge. It is a very fun experience because friends can exchange messages in real time. This experience makes smartphone users happy and satisfied with their smartphones. Second, satisfied smartphone users tend to use their smartphones compulsively. People who feel that a smartphone is a good tool for their work will use it continuously. In addition, satisfied users want to use smartphones for work or for entertainment purposes. Eventually, satisfied customers will be loyal to smartphones and their applications; ensuring satisfaction is therefore useful for gaining profits. Third, innovative users use smartphones compulsively. The smartphone is a new device, and early adopters buy and use it, meaning these innovative people have used smartphones for a longer period of time than the average person. In addition, innovative persons download and try new applications. Therefore, they may use smartphones quite a bit. Finally, even when users feel lonely, they do not use smartphones compulsively. To alleviate loneliness, many users play online games, where they can meet unknown players. In the case of smartphones, users download many applications from app stores for entertainment and to escape feelings of loneliness, but interaction on smartphones occurs mostly with existing friends; smartphone users talk mainly with people they already know. Therefore, loneliness is not related to the compulsive usage of smartphones.
5 Concluding Remarks

With the introduction of the smartphone, users can process job-related tasks on the move. In addition, by using smartphones, they relieve stress and find enjoyment. However, many smartphone users who use their devices for long periods do not talk with other people face to face; they only watch the smartphone screen to see whether new messages have arrived. This compulsive behavior may cause problems in the near future. Therefore, this study analyzed how user characteristics and experiential factors affect the compulsive usage of smartphones. To study this issue, Korean smartphone users were surveyed and PLS was used. From this analysis, three insights were generated. First, perceived enjoyment is positively related to satisfaction. Second, satisfaction and personal innovativeness positively affect the compulsive usage of smartphones. Third, loneliness is not related to the compulsive usage of smartphones. These results will help smartphone marketing personnel and researchers to understand smartphone users.

Acknowledgments. This study was supported by the WCU (World Class University) program through the National Research Foundation of Korea funded by the Ministry of Education, Science and Technology (Grant No. R31-2008-000-10062-0).
References 1. International Data Corporation (IDC), Worldwide Quarterly Mobile Phone Tracker (2010) 2. Wajcman, J., Bittman, M., Brown, J.E.: Families without Borders: Mobile Phones, Connectedness and Work-Home Divisions. Sociology 42(4), 635–652 (2008) 3. Bianchi, A., Phillips, J.G.: Psychological Predictors of Problem Mobile Phone Use. CyberPsychology & Behavior 8, 39–51 (2005) 4. Reid, D.J., Reid, F.J.M.: Text or Talk? Social Anxiety, Loneliness, and Divergent Preferences for Cell Phone Use. Cyberpsychology & Behavior 10(3), 424–435 (2007) 5. Davis, F.D., Bagozzi, R.P., Warshaw, P.R.: Extrinsic and Intrinsic Motivation to Use Computers in the Workplace. Journal of Applied Social Psychology 22, 1111–1132 (1992) 6. Park, Y., Chen, J.V.: Acceptance and Adoption of the Innovative Use of Smartphone. Industrial Management & Data Systems 107(9), 1349–1365 (2007) 7. Teo, T.S.H., Pok, S.H.: Adoption of WAP-enabled mobile phones among Internet Users. Omega 31(6), 483–498 (2003) 8. Ling, C., Hwang, W., Salvendy, G.: Diversified Users’ Satisfaction with Advanced Mobile Phone Features. Universal Access in the Information Society 5(2), 239–249 (2006) 9. Peaw, T.L., Mustafa, A.: Incorporating AHP in DEA Analysis for Smartphone Comparisons. In: Proceedings of the 2nd IMT-GT Regional Conference on Mathematics, Statistics and Applications, Penang, Malaysia, pp. 307–315 (2006) 10. Toda, M., Monden, K., Kubo, K., Morimoto, K.: Cellular Phone Dependence Tendency of Female University Students. Nippon Eiseigaku Zasshi 59, 383–386 (2004) 11. Koo, H.Y.: Development of a Cell Phone Addiction Scale for Korean Adolescents. Journal of Korean Academy of Nursing 39, 818–828 (2009) 12. Roos, J.P.: Postmodernity and Mobile Communications. In: Proceedings of the 5th Conference European Sociological Association, Finland (2001), http://www.valt.helsinki.fi/staff/jproos/mobilezation.htm
13. Chen, Y.-F.: The Relationship of Mobile Phone Use to Addiction and Depression among American College Students. In: Proceedings of 2004 Seoul Conference on Mobile Communication Conference, Korea, pp. 344–352 (2001) 14. Hashim, H.A., Hamid, S.H.A., Rozali, W.A.W.: A Survey on Mobile Games Usage among the Institute of Higher Learning (IHL) Students in Malaysia. In: Proceedings of the 1st International Symposium on Information Technologies and Applications in Education, China, pp. 40–44 (2007) 15. Nova, N., Girardin, F., Dillenbourg, P.: A Mobile Game to Explore the Use of Location Awareness on Collaboration. In: Proceedings of HCI International 2005, USA (2005), http://craftwww.epfl.ch/research/publications/hci2005_nova.pdf 16. Attewell, J.: Mobile Technologies for Learning. Learning and Skills Development Agency, UK (2005) 17. Shiratuddin, N., Zaibon, S.B.: Mobile Game-Based Learning with Local Content and Appealing Characters. International Journal of Mobile Learning and Organisation 4(1), 55–82 (2010) 18. Salo, J., Karjaluoto, H.: Mobile Games as an Advertising Medium: Towards a New Research Agenda. Innovative Marketing 3(1), 71–82 (2007) 19. Wu, J., Liu, D.: The Effects of Trust and Enjoyment on Intention to Play Online Games. Journal of Electronic Commerce Research 8, 128–140 (2007) 20. Oliver, R.L.: Satisfaction: A Behavioral Perspective on the Consumer. Irwin/McGraw-Hill, New York (1997) 21. McKenna, K.Y.A., Green, A.S., Gleason, M.E.J.: Relationship Formation on the Internet: What’s the Big Attraction? Journal of Social Issues 58, 9–31 (2002) 22. Davis, R.A.: A Cognitive-Behavioral Model of Pathological Internet Use. Computers in Human Behavior 17, 187–195 (2001) 23. Kim, J., LaRose, R., Peng, W.: Loneliness as the Cause and the Effect of Problematic Internet Use: The Relationship between Internet Use and Psychological Well-being. CyberPsychology & Behavior 12, 451–455 (2009) 24. Agarwal, R., Karahanna, E.: Time Flies When You’re Having Fun: Cognitive Absorption and Beliefs about Information Technology Usage. MIS Quarterly 24(4), 665–694 (2000) 25. Yi, M.Y., Jackson, J.D., Park, J.S., Probst, J.C.: Understanding Information Technology Acceptance by Individual Professionals: Toward an Integrative View. Information & Management 43, 350–363 (2006) 26. Lewis, W., Agarwal, R., Sambamurthy, V.: Sources of Influence on Beliefs about Information Technology Use: An Empirical Study of Knowledge Workers. MIS Quarterly 27(4), 657–678 (2003) 27. van der Heijden, H.: Factors Influencing the Usage of Websites: The Case of a Generic Portal in the Netherlands. Information & Management 40(6), 541–549 (2003) 28. Hays, R.D., DiMatteo, M.R.: A Short-Form Measure of Loneliness. Journal of Personality Assessment 51, 69–81 (1987) 29. Macintosh, G., Lockshin, L.S.: Retail Relationships and Store Loyalty: A Multi-Level Perspective. International Journal of Research in Marketing 14, 487–497 (1997) 30. Bodet, G.: Customer Satisfaction and Loyalty in Service: Two Concepts, Four Constructs, Several Relationships. Journal of Retailing and Consumer Services 15(3), 156–162 (2008) 31. Agarwal, R., Prasad, J.: A Conceptual and Operational Definition of Personal Innovativeness in the Domain of Information Technology. Information Systems Research 9(2), 204–215 (1998) 32. Fornell, C., Larcker, D.: Structural Equation Models with Unobservable Variables and Measurement Error. Journal of Marketing Research 18(1), 39–50 (1981)
Leadership Styles, Web-Based Commitment and Their Subsequent Impacts on e-Learning Performance in Virtual Community

Dae Sung Lee1, Nam Young Jo2, and Kun Chang Lee3,*

1 PhD Candidate, SKK Business School, Sungkyunkwan University, Seoul 110-745, Republic of Korea, [email protected]
2 PhD Candidate, SKK Business School, Sungkyunkwan University, Seoul 110-745, Republic of Korea, [email protected]
3 Professor at SKK Business School and WCU Professor at Department of Interaction Science, Sungkyunkwan University, Seoul 110-745, Republic of Korea, [email protected]

* Corresponding author.
Abstract. Virtual communities (VCs) are regarded as increasingly critical groups. This study examines leadership styles, Web-based commitment, and their subsequent impacts on e-learning performance in VCs. The findings indicate that a leader’s initiating structure (task-oriented) has a strong, direct effect on performance. On the other hand, a leader’s consideration (relationship-oriented) has a strong, indirect impact on performance through commitment, which might require more time to observe. These results imply that future studies should consider the related constructs in both the short and long run.

Keywords: virtual community, leader’s consideration (relationship-oriented), leader’s initiating structure (task-oriented), commitment, performance.
1 Introduction

Virtual communities (VCs) are places on the Web where people can find and electronically communicate with others who have similar interests. The rapid growth of computer technology and the Internet has accelerated the expansion of VCs in recent years. Virtual teams are becoming a more common type of work unit and are expected to play an increasingly important role in organizations [14, 18]. As such, virtual team leadership is highly important for virtual team performance [13]. Although some research on virtual leadership styles exists, there is still a need for research assessing how certain leadership styles interact with communication technologies to affect team processes and outcomes. The purpose of our study is to examine the effect of leadership styles and commitment on performance in VCs, together with the mediating effect of commitment. We investigate what kind of leadership style
influences both commitment and performance and how their effects on performance differ. Few studies have investigated commitment and performance in virtual communities. Our study seeks to bridge this research gap by examining the effects of leadership styles on Web-based commitment and performance in VCs. We organized 118 VCs of three to four members each from students taking a Web-based e-learning course. We examine the proposed hypotheses by testing the structural model using survey data from the VC members.
2 Theoretical Background and Research Model 2.1 Leadership in a Virtual Community Researchers have demonstrated that leaders can make a critical difference to team performance and effectiveness [23, 31]. Into their model of teamwork, Salas, Sims, and Burke [28] included team leadership as one of the ‘‘Big Five’’ contributors to team effectiveness. Moreover, Cascio and Shurygailo [5] argued that leaders play important roles in modeling effective teamwork and in setting ground rules for team members to engage in successful team processes. In short, leadership appears to be an integral part of effective teamwork [12]. The virtual environment and its various communication technologies have created a new context for leadership and teamwork [1]. Leadership within this new context has been referred to as “e-leadership” or “virtual leadership” and has been defined as “a social influence process mediated by advanced information technologies to produce changes in attitudes, feelings, thinking, behaviour, and/or performance of individuals, groups, and/or organizations” [2]. Leaders in VCs play the role of guiding their members in pursuing a common interest or purpose (e.g., information sharing, hobbies) [7] by interacting with them largely through electronic modes of communication such as computer bulletin boards and online chats. The role of leadership in generating team-like feelings among the members is more important in a VC because the members’ interaction depends entirely on online communication [3]. In this study, we adopted the style approach as the theoretical framework and employed the Ohio State University (OSU) leadership theory. The style approach provides a framework for assessing leadership in a broad way, as a behavior with task and relationship dimensions. This framework does not tell leaders how to behave, but describes the major components of their behavior. In some situations, leaders need to be more task-oriented, whereas in others they need to be more relationship-oriented [25]. The style approach can be applied easily in ongoing leadership settings. At all levels in any type of organization, managers are continually engaged in task and relationship behaviors. OSU researchers classified leaders’ behaviors into two categories: consideration and initiating structure [16]. Consideration pertains to relationship-oriented behaviors while the initiating structure is related to task-oriented behaviors. According to the OSU leadership theory, consideration is the degree to which a leader is friendly and supportive in dealing with subordinates while the initiating structure refers to the degree to which a leader structures the task, provides direction, and defines the members’ roles in the group as well as the group’s goals [11, 16, 3].
2.2 Web-Based Commitment

Individuals with higher levels of organizational commitment have a sense of belonging to, and identifying with, the organization, which increases their desire to pursue the organization's goals and activities as well as their willingness to remain a part of the organization [20, 24]. Moreover, organizationally committed individuals are far less likely to engage in absenteeism and turnover [19, 21, 24, 27]. Many studies address the relationship between members’ motivation and organizational commitment; however, few have investigated both commitment and performance in virtual communities. It is important to account for commitment in VCs because member activities are largely invisible and the sense of belonging can be weak. Web-based commitment in VCs may nevertheless be stronger than commitment in off-line communities [10, 26].

2.3 Research Model and Hypotheses

The research model used in this paper (Figure 1) examines leadership styles, Web-based commitment, and their subsequent impacts on performance in a virtual community.
Fig. 1. Research model
In the 1950s and 1960s, a multitude of studies were conducted by researchers at both Ohio State University and the University of Michigan to determine how leaders could best combine their task and relationship behaviors to maximize the impact of these behaviors on the satisfaction and performance of followers [25]. In essence, the researchers were looking for a universal theory of leadership that would explain leadership effectiveness in every situation. The results emerging from this large body of literature were contradictory and unclear [30]. Although some of the findings pointed to the value of a leader being both highly task-oriented and highly relationship-oriented in all situations [22], the preponderance of research in this area was inconclusive [25].

Low levels of organizational commitment generally lead to a variety of organizational ills, including reduced organizational citizenship behavior, reduced job effort and performance, and increased absenteeism and turnover [19]. On the basis of these studies, we propose the following five hypotheses:
H1: The leader’s consideration will positively contribute to Web-based commitment.
H2: The leader’s initiating structure will positively contribute to Web-based commitment.
H3: The leader’s consideration will positively contribute to performance.
H4: The leader’s initiating structure will positively contribute to performance.
H5: Web-based commitment will positively contribute to performance.
3 Research Methodology

3.1 Data Collection

A questionnaire survey was used to collect data for the study, and the data were analyzed using partial least squares (PLS). The participants were undergraduate students taking a Web-based e-learning course at a Korean university. The students formed virtual communities and completed the assignments given in the course. Each VC had one designated leader responsible for achieving the common targets (team assignments). In this respect, nonwork-related VCs may be appropriate subjects for studying leadership styles and commitment. We also recognized that the leaders in the VCs could influence the students’ grades. To secure unbiased and self-motivated participation, we announced that bonus credit would be given to students who submitted questionnaires without omitting answers or visibly skewing the distribution of answers.

A Web-based survey was administered to all members of the VCs. To avoid selection bias, we provided questionnaires to all students in the course, distributing a total of 492 surveys. Responses were eliminated from the dataset if respondents supplied inconsistent or incomplete information or did not participate in team activities. This process yielded 460 usable cases (93 percent) and 118 VCs. Table 1 reports the sample characteristics.

Table 1. Sample Characteristics

Characteristic                                  Frequency   Percent
Gender                       Male                  343       74.6
                             Female                117       25.4
Grade                        Freshman              125       27.2
                             Sophomore              76       16.5
                             Junior                 89       19.3
                             Senior                170       37.0
Leader (Do you play a        Yes                   118       25.7
leadership role in the VC?)  No                    342       74.3
Experience (Have you ever    Yes                   288       62.6
experienced a VC?)           No                    172       37.4
Total                                              460      100.0
3.2 Measures

Each item in our model was measured on a 7-point Likert scale, with answers ranging from 1 (“strongly disagree”) to 7 (“strongly agree”) (see Table 2). The items in the survey were developed by adapting existing measures validated by other researchers or by converting construct definitions into a questionnaire format.

Table 2. Construct and Measurement

Leader’s Consideration (related literature: Halpin and Winer [11], Bock et al. [3])
  LC1   My leader does personal favors for the members.
  LC2   My leader finds time to listen to the members.
  LC3   My leader is friendly and approachable.
  LC4   My leader makes the members feel at ease when talking with them.

Leader’s Initiating Structure (related literature: Halpin and Winer [11], Bock et al. [3])
  LIS1  My leader schedules the tasks to be done.
  LIS2  My leader makes sure that his/her part in the community is understood by the members.
  LIS3  My leader lets members know what is expected of them.
  LIS4  My leader coordinates the tasks of the members.

Web-Based Commitment (related literature: Casalo et al. [4], Meyer and Allen [21])
  WC1   I am generally motivated to positively participate in my virtual community (VC).
  WC2   I generally activate my virtual community.
  WC3   I provide useful information to other VC members.
  WC4   I frequently compose and post messages.

Performance (related literature: Kirkpatrick [17])
  P1    Productivity increased with the implementation of the VC.
  P2    Speed of completing assignments increased with the implementation of the VC.
  P3    Ease of completing assignments increased with the implementation of the VC.
  P4    I was generally satisfied with the influence of the VC on my learning performance.
4 Results

4.1 Measurement

For the analysis, we adopted a confirmatory approach using partial least squares (PLS). PLS has been used widely in theory testing and confirmation, and is appropriate for determining whether relationships exist [9]. We used SmartPLS 2.0 to analyze the measurement and structural models. To validate our measurement model, we assessed content, discriminant, and convergent validity. The content validity of our survey was established from the existing literature, and our measures were constructed by adopting constructs validated by other researchers. As shown in Table 3, our composite reliability values ranged from 0.837 to 0.905, and the variance extracted by the measures ranged from 0.563 to 0.704; all of these figures are above acceptable levels. Discriminant validity was assessed by examining the correlations among variables [9]. For satisfactory discriminant validity, the AVE (average variance extracted) of a construct should be greater than the variance shared between that construct and the other constructs in the model [6]. Table 4 lists the correlation matrix, with the correlations among the constructs and the square root of the AVE on the diagonal.

Table 3. Confirmatory Factor Analysis

Construct                      Item   Factor Loading   t-value   Cronbach’s α   Composite Reliability   AVE
Leader’s Consideration         LC1    0.677            15.861    0.828          0.886                   0.662
                               LC2    0.783            30.945
                               LC3    0.900            84.931
                               LC4    0.875            63.799
Leader’s Initiating Structure  LIS1   0.798            33.989    0.839          0.892                   0.674
                               LIS2   0.806            33.753
                               LIS3   0.854            52.381
                               LIS4   0.825            40.792
Web-Based Commitment           WC1    0.749            20.898    0.743          0.837                   0.563
                               WC2    0.807            29.192
                               WC3    0.737            16.075
                               WC4    0.705            18.543
Performance                    P1     0.840            47.534    0.860          0.905                   0.704
                               P2     0.832            42.644
                               P3     0.865            54.188
                               P4     0.817            46.296
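For readers who wish to verify these reliability figures, the composite reliability and AVE can be computed directly from the standardized factor loadings. The short Python sketch below is illustrative (the function names are ours, not from the paper) and reproduces the values reported for the Leader's Consideration construct in Table 3.

```python
# Composite reliability and AVE from standardized factor loadings:
# CR  = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)
# AVE = mean of squared loadings
# The loadings below are taken from Table 3 (Leader's Consideration).

def composite_reliability(loadings):
    total = sum(loadings)
    error = sum(1 - l**2 for l in loadings)  # error variance per indicator
    return total**2 / (total**2 + error)

def average_variance_extracted(loadings):
    return sum(l**2 for l in loadings) / len(loadings)

lc_loadings = [0.677, 0.783, 0.900, 0.875]
print(round(composite_reliability(lc_loadings), 3))       # 0.886, as in Table 3
print(round(average_variance_extracted(lc_loadings), 3))  # 0.662, as in Table 3
```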
Table 4. Correlation of Latent Variables

Construct                       Leader’s        Leader’s Initiating   Web-Based     Performance
                                Consideration   Structure             Commitment
Leader’s Consideration          0.814
Leader’s Initiating Structure   0.616           0.821
Web-Based Commitment            0.338           0.279                 0.751
Performance                     0.465           0.517                 0.373         0.839
4.2 Structural Model

The path coefficients indicate the strengths of the relationships between the dependent and independent variables, and the R2 values represent the amount of variance explained by the independent variables. We tested the structural model by estimating the path coefficients and R2 values. The R2 values and path coefficients (loading and significance) indicate how well the data support the hypothesized model. Figure 2 and Table 5 show the results of the test of the hypothesized structural model. All of the R2 values in this model are above the 10% level recommended by Falk and Miller [8]. Regarding the goodness-of-fit indicators, the leader’s consideration, the leader’s initiating structure, and Web-based commitment account for 34.3% of the variance in performance, and the two leadership styles account for 12.2% of the variance in Web-based commitment. These results support the hypothesized relationships among the leader’s consideration, initiating structure, Web-based commitment, and performance. As shown in Table 5, all of the hypotheses are supported. However, the strength of the paths varies: three paths (H1, H4, and H5) represent strong relationships between the variables, whereas the remaining paths (H2 and H3) have weak β values. The findings are discussed in the section below.

Table 5. Results of the Hypothesis Tests

Hypothesis   Path                                                   Coefficient (β)   Strength   t-value
H1           Leader’s Consideration → Web-Based Commitment          0.267             Strong     4.808**
H2           Leader’s Initiating Structure → Web-Based Commitment   0.114             Weak       1.831*
H3           Leader’s Consideration → Performance                   0.179             Weak       3.154**
H4           Leader’s Initiating Structure → Performance            0.347             Strong     6.793**
H5           Web-Based Commitment → Performance                     0.216             Strong     4.915**

*p<0.1, **p<0.01
Fig. 2. Research Model Results
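Although the paper does not report indirect effects explicitly, they follow from the path coefficients in Table 5 by simple multiplication. The hedged sketch below illustrates the arithmetic behind the mediation argument in the discussion; significance testing of the indirect effects would still require bootstrapping, which is not shown.

```python
# Indirect and total effects implied by the path coefficients in Table 5.
# Indirect effect = (path to mediator) * (mediator's path to outcome).
beta = {
    ("LC", "WC"): 0.267,   # H1: consideration -> commitment
    ("LIS", "WC"): 0.114,  # H2: initiating structure -> commitment
    ("LC", "P"): 0.179,    # H3: consideration -> performance
    ("LIS", "P"): 0.347,   # H4: initiating structure -> performance
    ("WC", "P"): 0.216,    # H5: commitment -> performance
}

for style in ("LC", "LIS"):
    indirect = beta[(style, "WC")] * beta[("WC", "P")]
    total = beta[(style, "P")] + indirect
    print(style, round(indirect, 3), round(total, 3))
# LC:  indirect ~0.058, total ~0.237 (mostly mediated by commitment)
# LIS: indirect ~0.025, total ~0.372 (mostly direct)
```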
5 Discussion

5.1 Theoretical and Practical Implications

According to the style approach to leadership research, there is no single “leadership styles theory” that explains leadership effectiveness in every situation [22]. We focused on the specific situation of a virtual community and introduced leadership styles, commitment, and performance into the framework. The findings suggest that a leader’s initiating structure (task-oriented behavior) has a strong, direct effect on performance, whereas a leader’s consideration (relationship-oriented behavior) has a strong, indirect impact on performance through commitment. Because developing commitment may require time, managers should focus on task-oriented leadership in the short term. The existing literature indicates that commitment is a very important factor influencing organizational performance in the long run; therefore, managers should cultivate commitment and adopt relationship-oriented leadership in order to generate sustained performance. By assessing their leadership style, managers can determine how they are viewed by others and how they could change their behaviors to be more effective. The style approach provides a mirror for managers that can be helpful in answering the frequently asked question, “How am I doing as a leader?”

5.2 Limitations and Future Research

According to Hiltz and Wellman [15], when group members interact over a long period of time in a VC, they can become addicted to the group, thus signifying commitment to the VC. Longitudinal designs should be used to show how commitment and performance develop over time in response to changes in leadership styles. Our cross-sectional design suggested relationships among the selected constructs; future researchers could gather longitudinal data to examine causality and the interrelationships among the variables that are important to commitment and performance. In addition, our sample included students, not full-time employees. Because team members who are full-time employees work in different organizational settings, the results may differ in the workplace; consequently, future research should pursue comparative studies in various environments. Finally, we did not consider multiple leadership approaches. Future research should explore various antecedents to commitment and performance in a virtual community.

Acknowledgments. This study was supported by the WCU (World Class University) program through the National Research Foundation of Korea, funded by the Ministry of Education, Science and Technology (Grant No. R31-2008-000-10062-0).
References

1. Avolio, B.J., Kahai, S., Dodge, G.E.: e-Leadership: Implications for Theory, Research, and Practice. Leadership Quarterly 11(4), 615–668 (2001)
2. Avolio, B.J., Kahai, S., Dumdum, R., Sivasubramaniam, N.: Virtual Teams: Implications for e-Leadership and Team Development. In: London, M. (ed.) How People Evaluate Others in Organizations, pp. 337–358. Lawrence Erlbaum, Mahwah (2001)
3. Bock, G., Ng, W., Shin, Y.: The Effect of a Perceived Leader’s Influence on the Motivation of the Members of Nonwork-Related Virtual Communities. IEEE Transactions on Engineering Management 55 (May 2008)
4. Casalo, L.C., Flavian, C., Guinaliu, M.: Fundaments of Trust Management in the Development of Virtual Communities. Management Research News 32(5), 324–338 (2008)
5. Cascio, W.F., Shurygailo, S.: e-Leadership and Virtual Teams. Organizational Dynamics 31(4), 362–376 (2003)
6. Chin, W.W.: The Partial Least Squares Approach to Structural Equation Modeling. In: Marcoulides, G.A. (ed.) Modern Methods for Business Research. Lawrence Erlbaum, Mahwah (1998)
7. Cothrel, J., Williams, R.: Online Communities: Getting the Most out of Online Discussion and Collaboration. Knowledge Management Review 6, 20–25 (1999)
8. Falk, R.F., Miller, N.B.: A Primer for Soft Modeling. The University of Akron, Akron (1992)
9. Fornell, C., Larcker, D.F.: Evaluating Structural Equation Models with Unobservable Variables and Measurement Error. Journal of Marketing Research 18(1), 39–50 (1981)
10. Griffith, T.L.: Internet Addiction: Does It Really Exist? In: Gackenbach, J. (ed.) Psychology and the Internet: Intrapersonal, Interpersonal, and Transpersonal Implications, pp. 61–75. Academic Press, New York (1998)
11. Halpin, A.W., Winer, B.J.: A Factorial Study of the Leader Behavior Descriptions. In: Stogdill, R.M., Coons, A.E. (eds.) Leader Behavior: Its Description and Measurement. Ohio State University Press, Columbus (1957)
12. Hambley, L.A., O’Neill, T.A., Kline, T.J.B.: Virtual Team Leadership: The Effects of Leadership Style and Communication Medium on Team Interaction Styles and Outcomes. Organizational Behavior and Human Decision Processes 103(1), 1–20 (2007)
13. Hertel, G., Geister, S., Konradt, U.: Managing Virtual Teams: A Review of Current Empirical Research. Human Resource Management Review 15, 69–95 (2005)
14. Hertel, G., Konradt, U., Orlikowski, B.: Managing Distance by Interdependence: Goal Setting, Task Interdependence, and Team-Based Rewards in Virtual Teams. European Journal of Work and Organizational Psychology 13, 1–28 (2004)
15. Hiltz, S.R., Wellman, B.: Asynchronous Learning Networks as a Virtual Classroom. Communications of the ACM 40(9), 44–49 (1997)
16. House, R.J., Mitchell, T.R.: Path-Goal Theory of Leadership. In: Pierce, J.L., Newstrom, J.W. (eds.) Leaders and the Leadership Process: Readings, Self-Assessments, and Applications, 2nd edn., pp. 140–146. Irwin, Homewood (1974)
17. Kirkpatrick, D.L.: Evaluating Training Programs: The Four Levels, 2nd edn. Berrett-Koehler Publishers, San Francisco (1998)
18. Lipnack, J., Stamps, J.: Virtual Teams: People Working Across Boundaries with Technology, 2nd edn. John Wiley, New York (2000)
19. Mathieu, J.E., Zajac, D.: A Review and Meta-Analysis of the Antecedents, Correlates, and Consequences of Organizational Commitment. Psychological Bulletin 108, 171–194 (1990)
20. Meyer, J.P., Allen, N.J.: A Three-Component Conceptualization of Organizational Commitment. Human Resource Management Review 1, 61–89 (1991)
21. Meyer, J.P., Allen, N.J.: Commitment in the Workplace: Theory, Research, and Application. Sage, Thousand Oaks (1997)
22. Misumi, J.: The Behavioral Science of Leadership: An Interdisciplinary Japanese Research Program. University of Michigan Press, Ann Arbor (1985)
23. Morgeson, F.P.: The External Leadership of Self-Managing Teams: Intervening in the Context of Novel and Disruptive Events. Journal of Applied Psychology 90(3), 497–508 (2005)
24. Mowday, R.T., Porter, L.W., Steers, R.: Employee–Organization Linkages: The Psychology of Commitment, Absenteeism, and Turnover. Academic Press, San Diego (1982)
25. Northouse, P.G.: Leadership: Theory and Practice. Sage, Thousand Oaks (2009)
26. Novak, T.P., Hoffman, D.L., Yung, Y.F.: Measuring the Customer Experience in Online Environments: A Structural Modeling Approach. Marketing Science 19(1), 22–42 (2000)
27. Riketta, M.: Attitudinal Organizational Commitment and Job Performance: A Meta-Analysis. Journal of Organizational Behavior 23, 257–266 (2002)
28. Salas, E., Sims, D.E., Burke, C.S.: Is There a “Big Five” in Teamwork? Small Group Research 36(5), 555–599 (2005)
29. Steers, R.M., Porter, L.W., Bigley, G.A.: Motivation and Leadership at Work, 6th edn. McGraw-Hill, Singapore (1996)
30. Yukl, G.: Leadership in Organizations, 3rd edn. Prentice Hall, Englewood Cliffs (1994)
31. Zaccaro, S.J., Klimoski, R.: The Interface of Leadership and Team Processes. Group and Organization Management 27, 4–13 (2002)
Test Case Generation for Formal Concept Analysis

Ha Jin Hwang(1) and Joo Ik Tak(2)

(1) Kazakhstan Institute of Management, Economics, and Strategic Research (KIMEP), [email protected]
(2) Catholic University of Daegu, [email protected]
Abstract. Model-based testing is a system testing technique that derives a suite of test cases from a model representing the behavior of a software system. By executing a set of model-based test cases, the conformance of the target system to its specification can be validated. However, as a large, sometimes infinite, number of operational scenarios could be generated from a given model, an important issue in model-based testing is to determine a minimal set of test cases that provides sufficient test coverage. With the Formal Concept Analysis (FCA) mechanism, we can analyze the coverage of the test cases and eliminate redundant ones. This systematic approach can help reduce the test suite while still maintaining sufficient test coverage.

Keywords: model-based testing, test suite reduction, formal concept analysis, UML state machine diagram.
Mobile Specification Using Semantic Networks

Haeng-Kon Kim

Department of Computer Information & Communication Engineering, Catholic University of Daegu, Kyungbuk, 712-702, South Korea
[email protected]
Abstract. A semantic network is a network that represents semantic relations among concepts and is often used as a form of knowledge representation. This paper introduces a retrieval method based on semantic networks. Hyperlink information is essential for constructing semantic networks: it is very useful because it provides human-authored summaries and further linkage, and it exhibits the properties of review, relation, hierarchy, generality, and visibility. Using these properties, we extracted the keywords of mobile documents and constructed semantic networks among the keywords sampled from mobile pages. This paper extracts the keywords of mobile pages using anchor text, one kind of hyperlink information, and abstracts the hyperlinks of mobile pages as link relations between the keywords of each page. We suggest a useful retrieval method that provides query-word extension or domain knowledge through semantic networks of keywords.
A Novel Image Classification Algorithm Using Swarm-Based Technique for Image Database

Noorhaniza Wahid

Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, Johor, Malaysia
[email protected]
Abstract. Image data has become one of the most popular data types distributed in many multimedia applications. The effectiveness of image deployment depends greatly on the ability to classify and retrieve image files based on their properties or content. However, image classification faces the problem that the number of possible combinations of variables is very high, so algorithms based on exhaustive search cannot cope as the computation becomes infeasible. In this paper, a new image classification algorithm, Simplified Swarm Optimization (SSO), is proposed. This new approach is capable of obtaining high-quality potential solutions in the population, which contributes to improved classification performance. The algorithm has been tested on an image dataset consisting of seven classes of outdoor images. Moreover, the performance of SSO, Particle Swarm Optimization (PSO), and Support Vector Machine (SVM) classifiers has been compared and analyzed. The results show that SSO is more competitive than PSO and SVM and can be fruitfully exploited in image databases for solving the image classification problem.

Keywords: SSO; PSO; SVM; image classification.
1 Introduction

Over recent years, there has been a rapid increase in the amount of image data available in multimedia databases. In addition, personal and public collections of digital images have become increasingly common. Accordingly, image classification techniques that employ statistical methods such as Support Vector Machine (SVM), Decision Tree, and Neural Network (NN) have multiplied [1]. Data mining is one possible approach to dealing with the rapid growth of the massive amount of information stored in image databases. Many data mining applications involve the task of building a model for predictive classification. The aim of data classification is to classify data instances into classes or categories of the same type [2]. Nevertheless, data classification faces the problem that the number of possible combinations of variables is very high, and algorithms based on exhaustive search cannot cope as the computation becomes infeasible [3]. Thus, the need to
automatically recognize to which class an image belongs makes image classification an emerging task and an important research area in data mining, so-called image mining [4]. To solve image classification problems, several evolutionary clustering techniques have been proposed, including ant colony optimization and Particle Swarm Optimization (PSO), as well as optimization-based image classifiers such as COP-K-Means and SOFM [5]. Other studies have also reported the use of Genetic Programming [6] and Genetic Algorithm [7] for the supervised image classification of multidimensional feature spaces. Undoubtedly, recent studies have striven to improve the performance of image classification by applying efficient optimization approaches. To the best of our knowledge, however, the swarm-based PSO algorithm has not yet been applied in recent research on supervised image classification to obtain a decision function from the feature space. In comparison with other stochastic optimization techniques such as GA, PSO has fewer complicated operations and parameters to adjust, and can be coded in just a few lines. Because of these advantages, PSO has received increasing attention in the data mining community in recent years. However, while the basic principle of PSO can easily and efficiently decide the next position for problems with continuous variables, it is neither trivial nor well-defined for problems with discrete variables and sequencing problems. Thus, in this paper a new swarm-based algorithm, Simplified Swarm Optimization (SSO), is proposed for the image classification problem. The proposed algorithm is suitable not only for discrete variables but also for continuous variables. Its key idea is to apply a LowerBound and UpperBound method as a threshold for the prediction condition in the encoding scheme of each particle. The rest of the paper is organized as follows: the next section discusses the basic idea of standard PSO. In Section 3, the novel Simplified Swarm Optimization (SSO) algorithm is introduced. Details of the proposed algorithm incorporated with the data mining methodology are presented in Section 4. Computational results comparing the classification performance of the proposed technique with PSO and SVM are reported in Section 5. Finally, the paper ends with the conclusion and discussion.
2 Particle Swarm Optimization

Particle Swarm Optimization (PSO) was originally introduced by Eberhart and Kennedy in 1995 as an optimization technique based on the metaphor of social communication, such as bird flocking, fish schooling, and even human social behavior [8]. It has turned out to be an effective and competitive optimizer in a wide range of applications such as function optimization, neural network training [3], resource allocation [9], and scheduling [10]. In standard PSO, updating the velocity and position of each individual, a so-called particle, is the most important part. Initially, a population of particles is randomly generated and then searches for the optimum iteratively. In every iteration, each particle is updated based
on the information acquired from the entire swarm population and from its own best solution. The former is known as the global best solution (gbest), and the latter is known as the previous best solution (pbest). In this approach, the particles' velocity and position are updated by the following equations:
v_{ij}^{t} = w \cdot v_{ij}^{t-1} + c_1 \cdot rand_1() \cdot (p_{ij} - x_{ij}^{t-1}) + c_2 \cdot rand_2() \cdot (g_{ij} - x_{ij}^{t-1})    (Eq. 1)

x_{ij}^{t} = x_{ij}^{t-1} + v_{ij}^{t}    (Eq. 2)
In these relations, i = 1, 2, …, N, where N is the size of the swarm population and j indexes the dimension. v_{ij} is the velocity vector and x_{ij} is the current position of the ith particle; p_{ij} and g_{ij} denote the pbest and gbest, respectively. c_1 is called the cognitive parameter and c_2 the social parameter, and in most cases (c_1 + c_2) equals 4 [8]. rand_1() and rand_2() are random numbers generated in the range between 0 and 1. w is an inertia weight that controls the balance of global and local search: a larger inertia weight causes the particle to fly over the optimum search space, while a smaller one traps the particle in a local optimum. PSO has performed successfully in many optimization problems that deal with continuous variables.
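As a concrete illustration of Eqs. 1 and 2, the following minimal Python sketch performs one PSO iteration over a swarm; the parameter values are illustrative placeholders, and the pbest/gbest bookkeeping that precedes each update is omitted for brevity.

```python
import random

W, C1, C2 = 0.7, 2.0, 2.0  # inertia weight and acceleration parameters (illustrative)

def pso_step(x, v, pbest, gbest):
    """One PSO iteration over all particles (Eqs. 1 and 2).

    x, v, pbest: lists of position/velocity/personal-best vectors;
    gbest: the global best position vector.
    """
    for i in range(len(x)):
        for j in range(len(x[i])):
            r1, r2 = random.random(), random.random()
            # Eq. 1: inertia term + cognitive term + social term
            v[i][j] = (W * v[i][j]
                       + C1 * r1 * (pbest[i][j] - x[i][j])
                       + C2 * r2 * (gbest[j] - x[i][j]))
            # Eq. 2: move the particle along its new velocity
            x[i][j] += v[i][j]
```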
Fig. 1. The particles’ update position scheme for the proposed SSO
3 The Proposed Swarm-Based Approach

The Simplified Swarm Optimization (SSO) algorithm used here is adopted from Yeh et al. [11], who developed it in 2009. In this paper, we employ the SSO algorithm for supervised image classification. The idea of SSO is mainly realized through the generation of particles in the initial stage. Unlike PSO, SSO does not need the velocity parameter or the inertia weight; it uses only one random number and three predefined parameters (i.e., C_w, C_p, C_g) to update each particle's position. Suppose the initialized number of particles is N, where each particle represents a potential solution, and the solution is represented in a D-dimensional space. A particle i at iteration t has its own position, represented as X_i^t = \{x_{i1}^t, \ldots, x_{id}^t, \ldots, x_{iD}^t\}. The quality of the solution is determined by the fitness function f(X_i^t). At iteration t, if the fitness value of the particle X_i^t is better than the fitness value of its pbest P_i = \{p_{i1}, \ldots, p_{id}, \ldots, p_{iD}\}, then the pbest is updated with the current particle X_i^t. Similarly, if the quality of the pbest is better than that of the gbest G = \{g_{i1}, \ldots, g_{id}, \ldots, g_{iD}\}, it is saved as the gbest. Once the three predetermined parameters C_w, C_p, and C_g are given, the algorithm iteratively updates the particles' position in every dimension according to the following equation:
x_{id}^{t} =
\begin{cases}
x_{id}^{t-1} & \text{if } rand() \in [0, C_w) \\
p_{id}^{t-1} & \text{if } rand() \in [C_w, C_p) \\
g_{id} & \text{if } rand() \in [C_p, C_g) \\
x & \text{if } rand() \in [C_g, 1)
\end{cases}    (Eq. 3)
where d = 1, 2, 3, …, D and rand() is a random number between 0 and 1. C_w, C_p, and C_g are positive constants between 0 and 1 with C_w < C_p < C_g. For the particle position x_{id}, the pbest position and gbest position in each dimension are represented by p_{id} and g_{id}, respectively, and x is a newly randomly generated position value in that dimension. Figure 1 shows the flowchart of the particle update. The proposed SSO algorithm can be described as follows (a code sketch is given after this list):

Step 1: Initialize the parameters and the population with random positions.
Step 2: Evaluate the fitness value for each particle.
Step 3: Get the new pbest: if the fitness value of particle i is better than its pbest fitness value, set the pbest of particle i to the current fitness value.
Step 4: Get the new gbest: if the fitness value of any updated pbest is better than the current gbest fitness value, set the gbest to the current fitness value.
Step 5: Update the particles' positions according to Eq. 3. In every dimension, one random number between 0 and 1 is generated. If 0 ≤ rand() < C_w, keep the original position value; else if C_w ≤ rand() < C_p, replace the original position value with the pbest value; else if C_p ≤ rand() < C_g, replace it with the gbest value; else if C_g ≤ rand() ≤ 1, replace it with a newly randomly generated value.
Step 6: Repeat from Step 2 until the termination criterion is met.
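A minimal Python sketch of the position update in Step 5 (Eq. 3) might look as follows; the parameter values match those later reported in Table 2 for SSO, while new_random_value() stands in for whatever sampling scheme is used to generate a fresh position component.

```python
import random

CW, CP, CG = 0.1, 0.4, 0.9  # SSO parameters as listed in Table 2

def sso_update(x_id, p_id, g_id, new_random_value):
    """Eq. 3: choose the next value of one position component."""
    r = random.random()
    if r < CW:
        return x_id                 # keep the current value
    elif r < CP:
        return p_id                 # take the pbest value
    elif r < CG:
        return g_id                 # take the gbest value
    else:
        return new_random_value()   # re-sample this component

def sso_step(x, pbest, gbest, new_random_value):
    """Apply Eq. 3 to every dimension of every particle (no velocity needed)."""
    for i in range(len(x)):
        for d in range(len(x[i])):
            x[i][d] = sso_update(x[i][d], pbest[i][d], gbest[d], new_random_value)
```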
4 SSO for Data Mining

We apply SSO to a data mining application by accelerating the occurrence of high-quality potential solutions in the population. The basic idea of this approach is to discover the patterns of the corresponding rule that are commonly found in the best solutions of the population. In this paper, the SSO algorithm is used to solve an image classification problem that contains discrete and continuous variables, and the same classification rule mining is used in both the standard PSO and the SSO algorithm. This approach differs significantly from other applications that combine data mining and PSO, because most of those efforts develop PSO as an optimization technique to solve data mining problems [12-13], such as classification and clustering. Several processes must be taken into account to achieve this aim; details of the data mining methodology are explained in the following subsections.

4.1 Encoding for an Individual (Particle)

In SSO, a particle of the swarm population consists of a conjunction of conditions composing an IF-THEN prediction rule. Such rules have the advantage of being a high-level symbolic knowledge representation, which contributes to the ability to find a small number of rules with higher fitness [14]. Figure 2 depicts the encoding scheme of the particle's position (term), which consists of 3 parts: the attribute, the LowerBound, and the UpperBound. Each particle contains a number of attributes (dimensions) ranging from 1 to N. LowerBound and UpperBound act as thresholds for the prediction condition, representing the lowest and highest range of the training and test data, respectively. The last cell, Class X, represents the predicted class of the rule:

[Attr1 | LowerBound | UpperBound] ... [AttrN | LowerBound | UpperBound] [Class X]

Fig. 2. Particle's encoding scheme
We compute the proposed LowerBound and UpperBound according to the following equations:

LowerBound = x_{ij} - \rho \cdot range[X_{max} - X_{min}]    (Eq. 4)

UpperBound = x_{ij} + \rho \cdot range[X_{max} - X_{min}]    (Eq. 5)
where X_i = (x_{i1}, x_{i2}, …, x_{ij}) denotes the values of particle i across the dimensions, \rho is a random number chosen between 0 and 1, and [X_{max} - X_{min}] represents the range of the data set within each dimension. In SSO rule mining, the LowerBound and UpperBound values are not updated over the generations. In PSO rule mining, by contrast, they are updated according to the new position computed from the velocity in each generation; here, Eq. 2 is employed to update the current value of the corresponding position in PSO.
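The following hedged sketch shows how one rule term could be initialized from a seed instance using Eqs. 4 and 5; make_term and term_covers are our illustrative names, and the attribute ranges and seed value are placeholders rather than values from the paper.

```python
import random

def make_term(attr_index, seed_value, attr_min, attr_max):
    """Build one (attribute, LowerBound, UpperBound) term (Eqs. 4 and 5)."""
    rho = random.random()                  # rho in [0, 1)
    spread = rho * (attr_max - attr_min)   # rho * range of this attribute
    return (attr_index, seed_value - spread, seed_value + spread)

def term_covers(term, instance):
    """A term covers an instance if the attribute value lies within the bounds."""
    attr, lower, upper = term
    return lower <= instance[attr] <= upper
```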
4.2 Fitness Function

To establish a point of reference for the subsequent data mining process, we must find the best particle position among the potential solutions. It is therefore necessary to evaluate the quality of every candidate rule in the training phase, so that we can estimate how well a rule will perform in the testing phase. The quality (fitness) of a rule is measured by the sensitivity and specificity of the rule according to the following equation:

Quality = Sensitivity \times Specificity = \frac{TP}{TP + FN} \cdot \frac{TN}{TN + FP}    (Eq. 6)
where TP (true positive) is the number of instances covered by the rule that have the class predicted by the rule; FP (false positive) is the number of instances covered by the rule that have a class different from the one predicted; FN (false negative) is the number of instances not covered by the rule that have the predicted class; and TN (true negative) is the number of instances not covered by the rule that do not have the predicted class. The quality value lies in the range 0 ≤ quality ≤ 1, and the higher the value, the higher the quality of the rule.

4.3 Rule Pruning

Under the encoding scheme used in this study, a particle's length has the form 3*N; the shorter this length, the easier it is to optimize the values of its components. The aim of rule pruning is therefore to remove irrelevant terms that might have been unnecessarily included in a rule. Besides improving the predictive power of the rule, pruning can also avoid overfitting of the training set [2]. In the first iteration, the pruning process starts with the full rule and tries to remove each part of each term in turn. The quality of the resulting rule is evaluated using Eq. 6; as long as this quality is the same as or higher than that of the original rule, the process of shortening the rule continues. The process is repeated until the rule has just one term or no term is left. Once the rule with the highest quality is found, the instances it classifies correctly are removed from the training set and the rule discovery algorithm is run once more. After the pruning process, any rules that do not contribute to classification accuracy are removed: the training set is classified using the rule list, and rules that do not classify any examples correctly are discarded. Finally, the testing data are used to measure classification accuracy with the rule set obtained. For each instance, a prediction value is computed by examining every rule in the rule set for the corresponding class that covers the instance. The prediction value is calculated according to the prediction function in Eq. 7.
Prediction value += α * rule quality + β * percentage of the rule covered    (Eq. 7)
where α and β are two parameters corresponding to the importance of the rule quality and the percentage of the rule covered, respectively. The former is known as the Quality weight and the latter as the Coverage weight, with α ∈ [0,1] and β = (1 − α). The prediction value for each class is accumulated, and the final result is taken from the class with the highest prediction value.
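A compact Python sketch of Eqs. 6 and 7 might look as follows. The rule_covers test, the rule attributes (predicted_class, quality, coverage), and the reading of "percentage of the rule covered" as the fraction of the rule's terms matched are our assumptions, not definitions from the paper; α defaults to the value given for SSO in Table 2.

```python
def rule_quality(rule, training_set, rule_covers):
    """Eq. 6: quality = sensitivity * specificity."""
    tp = fp = fn = tn = 0
    for instance, label in training_set:
        covered = rule_covers(rule, instance)
        same_class = (label == rule.predicted_class)
        if covered and same_class:       tp += 1
        elif covered and not same_class: fp += 1
        elif not covered and same_class: fn += 1
        else:                            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        return 0.0
    return (tp / (tp + fn)) * (tn / (tn + fp))

def predict(rules, instance, rule_covers, alpha=0.5):
    """Eq. 7: accumulate alpha*quality + beta*coverage per class, pick the max."""
    beta = 1.0 - alpha
    scores = {}
    for rule in rules:
        if rule_covers(rule, instance):
            # rule.coverage is assumed to be the fraction of the rule covered
            score = alpha * rule.quality + beta * rule.coverage
            scores[rule.predicted_class] = scores.get(rule.predicted_class, 0.0) + score
    return max(scores, key=scores.get) if scores else None
```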
5 Experimental Results

Our experiments used the image segmentation dataset from the UCI dataset repository [15]. The downloaded database is composed of a training set and a testing set. The training set consists of 210 instances drawn randomly from a database of 7 outdoor images; the 7 classes of images are named brickface, sky, foliage, cement, window, path, and grass, and each class is represented in the training set by 30 instances. The testing set is a very large set of 2100 instances, from which we chose the first 20 instances of each class in their order of appearance. Thus, our testing set comprises 7x20 = 140 instances, and the whole dataset for this experiment comprises 210+140 = 350 instances. The dataset therefore contributes 60% training data and 40% testing data, which is a fairly typical distribution in classification. The images were hand-segmented to create a classification for every pixel. Each instance is a 3x3 region with 19 attributes, some represented by integers and some by real numbers. Table 1 below describes the image data.

Table 1. Image database attributes and data type

Attribute                           Data type
Region centroid column              Integer
Region centroid row                 Integer
Region pixel count                  Integer
Short line density 5                Real
Short line density 2                Real
Vertical edge mean                  Real
Vertical edge standard deviation    Real
Horizontal edge mean                Real
Horizontal edge standard deviation  Real
Intensity mean                      Real
Raw red mean                        Real
Raw blue mean                       Real
Raw green mean                      Real
Excess red mean                     Real
Excess blue mean                    Real
Excess green mean                   Real
Value mean                          Real
Saturation mean                     Real
Hue mean                            Real
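The train/test split described above can be reproduced along the following lines; the sketch assumes the UCI files have been downloaded locally and converted to CSVs with a 'class' column, which is an assumption about the local setup rather than the paper's actual tooling.

```python
import pandas as pd

# Assumes segmentation.data / segmentation.test were fetched from the UCI
# repository and saved as CSVs with a 'class' column (local assumption).
train = pd.read_csv("segmentation_train.csv")      # 210 instances, 30 per class
test_full = pd.read_csv("segmentation_test.csv")   # 2100 instances

# Keep the first 20 instances of each class, in order of appearance.
test = test_full.groupby("class", sort=False).head(20)

assert len(train) == 210 and len(test) == 140      # 350 instances in total
```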
To verify the performance of the proposed method, computer simulations were conducted on a system with a 1.8 GHz Pentium(R) processor and 2 GB RAM running Windows XP Professional. We evaluated the performance of SSO by comparing it with PSO. In addition, as a further classification technique for the above-mentioned problem, we used the Support Vector Machine (SVM) from the Waikato Environment for Knowledge Analysis (WEKA) system, release 3.6 [2]; WEKA contains a collection of machine learning algorithms for evaluating performance on standard classification problems. SVM is a learning algorithm based on statistical learning theory [16]. It has been widely used in pattern recognition applications such as handwritten digit recognition, text categorization, spam categorization, 3D object recognition, and image mining [17-18]. It uses a hypothesis space of linear functions in a high-dimensional feature space and constructs a discriminant function that separates the feature vectors of the training samples into classes. In this experiment, the radial basis function (RBF) kernel was used to train on the known, labeled feature vectors derived from the image dataset; the SVM parameter values were the WEKA defaults. The implementation specifications and parameters of the swarm-based methods that gave the best classification performance for both techniques are listed in Table 2 (cells marked '-' do not apply to that algorithm).

Table 2. The settings of parameters for the swarm-based classifiers

Parameter             SSO             PSO
Number of particles   50              75
Maximum generation    100             100
Maximum fitness       1.0             1.0
Cw, Cp, Cg            0.1, 0.4, 0.9   -
Coverage weight       0.5             -
Quality weight        0.5             -
c1, c2                -               2.0, 2.0
Maximum weight        -               0.9
Minimum weight        -               0.4
The classification performance was evaluated using a 10-fold cross-validation procedure: the data instances are divided into ten partitions, and each method is run ten times, using a different partition as the test set each time, with the other nine as the training set. We ran the image dataset five times with SSO and PSO and averaged their classification accuracies. The results are tabulated in Table 3. From Table 3 we can see that SSO provides better classification accuracy than PSO and SVM: the accuracy of SSO (86.8%) outperforms both PSO (84.0%) and SVM (70.6%). The difference between SSO and PSO is small, at 2.8%, but interestingly SSO achieves about 16.2% higher accuracy than SVM. Thus, exploiting SSO has proven effective for the image classification problem.
Table 3. Classification accuracies (%) of the three classifiers

SSO     PSO     SVM
86.8%   84.0%   70.6%
In addition, for each experiment we ran the simulation with four different particle counts: 25, 50, 75, and 100. Our aim was to discover the optimum population size of SSO and PSO that leads to the highest classification accuracy. The classification performance with respect to the number of particles is depicted in Figure 3.
Fig. 3. Performance results of the SSO and PSO classifiers with respect to different numbers of particles (x-axis: number of particles, 25–100; y-axis: classification accuracy, 77–88%)
As the figure shows, SSO can achieve a high classification result with a small number of particles, namely 50, and its accuracy remains consistent with 75 and 100 particles. This is because SSO does not use velocity and only needs to update each particle's position according to a very simple condition; the performance of SSO is therefore more efficient than that of PSO. In contrast, PSO tends to need more than 50 particles to reach its best classification accuracy, yet its performance is still not as good as SSO's. When the number of particles was increased to 75 and 100, PSO's classification accuracy increased by about 0.7%. It should be noted that PSO tracks two key vectors per particle, the velocity and the position, and each position update requires more than two equations, three random numbers, five multiplications, and three summations [11]. Therefore, PSO needs to allocate more memory than SSO for each particle to achieve better performance.
6 Conclusion and Discussion

In this paper we have presented one of the most recent metaheuristic algorithms, called SSO. This new algorithm is inspired by traditional PSO and is incorporated with a data mining method applied to an image dataset. To the best of our knowledge, this is the first time the SSO algorithm has been applied to the image classification problem. The LowerBound and UpperBound method was proposed as a threshold in the prediction condition of each particle's encoding scheme in order to improve image classification accuracy. For that purpose, SSO and PSO were designed around the same classification rule mining. We conducted several experiments comparing the performance of the SSO algorithm with traditional PSO and SVM on the image segmentation dataset from the UCI database [15]. The comparison illustrates the different performance of two algorithms based on a biologically inspired paradigm and one algorithm based on statistical learning theory. The results show that SSO tends to find better solutions than PSO and is also more competitive than SVM. To validate the competitiveness of SSO against PSO, four different particle counts were tested in each experiment; the promising results show that SSO works very well with a small number of particles to achieve high accuracy. Overall, the results imply that the SSO algorithm can be fruitfully exploited in image databases for solving the image classification problem.
References

1. Kim, W., Lee, H.-K., Park, J., Yoon, K.: Multi Class Adult Image Classification Using Neural Networks. In: Kégl, B., Lee, H.-H. (eds.) Canadian AI 2005. LNCS (LNAI), vol. 3501, pp. 222–226. Springer, Heidelberg (2005)
2. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
3. Wang, Z., Sun, X., Zhang, D.: Classification Rule Mining Based on Particle Swarm Optimization. In: Wang, G.-Y., Peters, J.F., Skowron, A., Yao, Y. (eds.) RSKT 2006. LNCS (LNAI), vol. 4062, pp. 436–441. Springer, Heidelberg (2006)
4. Tsai, C.F.: Image Mining by Spectral Features: A Case Study of Scenery Image Classification. Expert Systems with Applications 32(1), 135–142 (2007)
5. Piatrik, T., Chandramouli, K., Izquierdo, E.: Image Classification Using Biologically Inspired Systems. In: Proceedings of the 2nd International Mobile Multimedia Communications Conference, MobiMedia 2006 (2006)
6. Agnelli, D., Bollini, A., Lombardi, L.: Image Classification: An Evolutionary Approach. Pattern Recognition Letters 23(1-3), 303–309 (2002)
7. Tseng, M.-H., Chen, S.-J., Hwang, G.-H., Shen, M.-Y.: A Genetic Algorithm Rule-Based Approach for Land-Cover Classification. ISPRS Journal of Photogrammetry and Remote Sensing 63(2), 202–212 (2008)
8. Kennedy, J., Eberhart, R.: Particle Swarm Optimization. In: Proceedings of the IEEE International Conference on Neural Networks, vol. 4, Perth, Australia (1995)
9. Yin, P.-Y., Wang, J.-Y.: A Particle Swarm Optimization Approach to the Nonlinear Resource Allocation Problem. Applied Mathematics and Computation 183(1), 232–242 (2006)
10. Allahverdi, A., Al-Anzi, F.S.: A PSO and a Tabu Search Heuristics for the Assembly Scheduling Problem of the Two-Stage Distributed Database Application. Computers & Operations Research 33(4), 1056–1080 (2006)
11. Yeh, W.C., Chang, W.W., Chung, Y.Y.: A New Hybrid Approach for Mining Breast Cancer Pattern Using Discrete Particle Swarm Optimization and Statistical Method. Expert Systems with Applications 36(4), 8204–8211 (2009)
12. Melgani, F., Bazi, Y.: Classification of Electrocardiogram Signals with Support Vector Machines and Particle Swarm Optimization. IEEE Transactions on Information Technology in Biomedicine 12(5), 667–677 (2008)
13. Yang, F., Sun, T., Zhang, C.: An Efficient Hybrid Data Clustering Method Based on K-Harmonic Means and Particle Swarm Optimization. Expert Systems with Applications 36(6), 9847–9852 (2009)
14. Ang, J.H., Tan, K.C., Mamun, A.A.: An Evolutionary Memetic Algorithm for Rule Extraction. Expert Systems with Applications 37(2), 1302–1315 (2010)
15. Blake, C.L., Merz, C.J.: UCI Repository of Machine Learning Databases. University of California, Irvine (1998), http://www.ics.uci.edu/~mlearn/MLRepository.html
16. Cristianini, N., Shawe-Taylor, J.: An Introduction to Support Vector Machines. Cambridge University Press, Cambridge (2000)
17. Tsai, C.F.: Image Mining by Spectral Features: A Case Study of Scenery Image Classification. Expert Systems with Applications 32(1), 135–142 (2007)
18. Li, S., Kwok, J.T., Zhu, H., Wang, Y.: Texture Classification Using the Support Vector Machines. Pattern Recognition 36(12), 2883–2893 (2003)
An Optimal Mesh Algorithm for Remote Protein Homology Detection

Firdaus M. Abdullah(1), Razib M. Othman(1,*), Shahreen Kasim(2), and Rathiah Hashim(2)

(1) Laboratory of Computational Intelligence and Biotechnology, Universiti Teknologi Malaysia, 81310 UTM Skudai, Malaysia
(2) Department of Web Technology, Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, 86400 Parit Raja, Batu Pahat, Malaysia
[email protected], [email protected], {shahreen,rathiah}@uthm.edu.my
Abstract. Remote protein homology detection is the problem of detecting evolutionary relationships between proteins at low levels of sequence similarity. Open problems in remote protein homology detection include determining which combination of multiple alignment and classification techniques is best, as well as the misalignment of protein sequences during the alignment process. This paper therefore addresses remote protein homology detection by assessing the impact of using structural information, rather than sequence information alone, in protein multiple alignments, and presents the best combinations of multiple alignment and classification programs. It also improves the quality of the multiple alignments by integrating a refinement algorithm. The framework of this paper began with dataset preparation from SCOP version 1.73, followed by multiple alignment of the protein sequences using CLUSTALW, MAFFT, ProbCons, and T-Coffee for sequence-based multiple alignments, and 3DCoffee, MAMMOTH-mult, MUSTANG, and PROMALS3D for structure-based multiple alignments. Next, a refinement algorithm was applied to the protein sequences to reduce misalignments. Lastly, the aligned protein sequences were classified using pHMM generative classifiers, namely HMMER and SAM, and SVM discriminative classifiers, namely SVM-Fold and SVM-Struct. The performance of the assessed programs was evaluated using ROC, precision, and recall tests. The results show that the combination of refined SVM-Struct and PROMALS3D performs best against the other programs, suggesting that this combination is the best for RPHD. This paper also shows that the use of the refinement algorithm increases the performance of the multiple alignment programs by at least 4%.

Keywords: Classification; Multiple alignment; Remote protein homology; Support Vector Machines.
* Corresponding author.
1 Introduction Remote protein homology detection (RPHD) is the inference of structural or functional information of proteins by finding a relationship or homology between new sequences and proteins of which its structural properties are already known at low levels of sequence similarity. Traditional laboratory methods to detect remote protein homology are lengthy and expensive, making it unpractical to be applied to the unprecedented amount of protein sequence data made available from the advances in molecular biology. As a result, researchers have turned to the computational methods which are more powerful and could accommodate such large amount of data. Examples of computational methods in RPHD include multiple alignment and classification. The use of multiple alignment has been proven to improve RPHD [1]. There are two types of multiple alignments in bioinformatics; the multiple sequence alignments and the multiple structural alignments. Based on the observation that protein three-dimensional structure are surprisingly stable with respect to amino acid sequences [2], it is suggested that alignments derived from structural information should have an increased accuracy over sequence alignment alone. On the other hand, in terms of classification, studies have shown that discriminative classification algorithm such as Support Vector Machines (SVMs) outperform generative classification algorithm such as profile Hidden Markov Models (pHMMs) [3-5]. However, there is no study yet that has assessed the combination of these two types of classification algorithm with multiple sequence alignment and multiple structural alignments, further determining which combination is the best. Meanwhile, due to weaknesses in the multiple alignments algorithm, the tendency of misalignments to occur during the multiple alignment process is unacceptably high [6]. These misalignments have to be reduced as it would bring serious alignment errors that could affect the performance of the classification algorithm, thus influencing the determination on the best combination of the multiple alignments and classification algorithm. Multiple alignments can be divided into two types; multiple sequence alignments and multiple structural alignments. A multiple sequence alignment is a sequence alignment of three or more biological sequences. Multiple sequence alignments arranges protein sequences into a rectangular array with the goal that residues in a given column are homologous, superposable or play a common functional role [7]. Examples of most recent multiple sequence alignment software includes MUMMALS [8], DIALIGN-TX [9], MAVID [10] and BAli-Phy [11]. On the other hand, a multiple structural alignment is a form of sequence alignment based on comparison of shape. Multiple structural alignments has been proven to be helpfulin protein structure classification and structure-based functionprediction by highlighting structurally conserved regions of functional significance as well as selectivity determinants [12, 13]. Examples of most current multiple structural alignment softwares include CAALIGN [14], Vorolign [15], Matt [16] and POSA [17]. In order to provide classification on the multiple alignment of protein, two types of classification algorithm are usually used in bioinformatics; the generative classification algorithm and the discriminative classification algorithm. Generative classification algorithm offer a probabilistic measure of relationship between a new sequence and a particular family. 
Such methods, for example pHMMs, can be trained iteratively in a semi-supervised manner using both positively labelled and unlabelled samples of a particular family. Examples of pHMM software include HMMEditor [18], Profile Comparer [19], Meta-MEME [20] and GENEWISE [21]. Discriminative classification algorithms, in contrast, focus on learning a combination of features that discriminates between families. They gain additional accuracy by explicitly modelling the difference between positive and negative examples, and thus provide state-of-the-art performance when used with appropriate kernels. Examples of SVM software include SVMGist [22], SVM Classifier [23], SVM-Prot [24] and SVM-Fold [25].

The use of heuristic approaches in multiple alignment, which usually sacrifice accuracy in favour of computational efficiency, has led to the problem of errors called misalignments. In the progressive alignment strategy, for example, errors made during the early steps of the alignment process are propagated to the final output. This is clearly undesirable, as multiple alignments are central to many bioinformatics applications such as RPHD, protein structure prediction and phylogenetic analysis. Therefore, approaches to reduce misalignments have been introduced, including the use of phylogenetic trees, consistency-based objective functions and iterative realignment. An example of work using a phylogenetic tree is the tool developed by Manohar and Batzoglou [26]. In their tool, named TreeRefiner, a sum-of-pairs function in a restricted three-dimensional space around the alignment is optimized given a multiple alignment, a phylogenetic tree and scoring parameters as inputs. A consistency-based objective function, in turn, incorporates multiple sequence information in scoring pairwise alignments. An example of work using a consistency-based objective function is that of Notredame et al. [27]. In their tool, named COFFEE, evaluation is performed by comparing each pair of aligned residues observed in the multiple alignment to those present in a library collecting all-against-all pairwise alignments of a given set of protein sequences. Iterative approaches, meanwhile, usually work by performing a certain refinement function repeatedly, either alone or in combination with other methods. For example, Edgar [28] applied iterative refinement in the MUSCLE algorithm using a variant of tree-dependent restricted partitioning. The advantage of iteration over other approaches is its simplicity, both in coding the algorithm for integration and in the computational time and memory required [29].
2 Framework for Finding the Optimal Mesh Algorithm

The framework for finding an optimal mesh algorithm for RPHD consists of three stages, as shown in Figure 1. In the first stage, the multiple alignments are produced using multiple sequence alignment and multiple structural alignment programs. Four multiple sequence alignment programs, namely CLUSTALW [30], MAFFT [31], ProbCons [32] and T-COFFEE [33], are used in this paper, while another four, namely 3DCOFFEE [34], MAMMOTH-mult [35], MUSTANG [36] and PROMALS3D [37], are used for multiple structural alignment. In the second stage, a refinement algorithm is applied to the resulting multiple alignments. Its purpose is to reduce the misalignments made during the alignment process, thus increasing alignment accuracy.
Fig. 1. The three-stage framework: Stage 1, multiple alignment of protein sequences from the SCOP 1.73 dataset (sequence-based: CLUSTALW, T-Coffee, MAFFT, ProbCons; structure-based: 3DCoffee, MAMMOTH-mult, MUSTANG, PROMALS3D); Stage 2, realignment of protein blocks using the refinement algorithm; Stage 3, classification (HMMs: SAM, HMMER; SVMs: SVM-Fold, SVM-Struct) and evaluation (ROC, Precision, Recall)
The last stage consists of generative and discriminative classification algorithms. The pHMM programs HMMER [38] and SAM [39] are used as the generative classifiers, while the SVM programs SVM-Struct [40] and SVM-Fold [41] are used as the discriminative classifiers. A similar study has been conducted by Bernardes et al. [42], who compared the performance of various multiple alignment programs; however, they used only pHMMs for classification, and they assessed only four multiple sequence alignment programs and two multiple structural alignment programs. This paper additionally takes into account the use of a refinement algorithm to refine the multiple alignments, the use of SVMs for classification, and two further multiple structural alignment programs for evaluation. Meanwhile, Chakrabarti et al. [43] have applied a similar refinement algorithm, but they used only HMMER and SALTO_global [44] to test their work. Moreover, their datasets differ from those of this paper: their algorithm was tested on 362 CDD [44] multiple alignments and 900 Pfam [45] alignments, and only with multiple sequence alignments produced by CLUSTALW and T-Coffee. This paper uses datasets from the latest version of the Structural Classification of Proteins (SCOP) database [46], version 1.73, and performance is measured using Receiver Operating Characteristic (ROC) analysis [47]. Precision and Recall tests are also applied to complement ROC, because of its tendency to provide an overly optimistic view of classification results.

2.1 Dataset Generation

SCOP is a manually curated database of protein folds and structures that provides a comprehensive ordering according to evolutionary and structural relationships. In this database, protein domains from all species are classified hierarchically into families, superfamilies, folds and classes. The database also covers all entries in the Protein Data Bank (PDB) [49], which makes it well suited to work on RPHD. In this paper, SCOP version 1.73, the latest version of the database, is used to provide the datasets, and only proteins with sequence identity below 30% are considered. The experiments are conducted at the superfamily level because, at this level, families are grouped such that a common evolutionary origin is not obvious from sequence identity but is probable from functional features and structural analysis. Figure 2 displays the diagram and flowchart of the dataset preparation.

The datasets used in this paper are cross-validated. First, protein sequences from the SCOP database are divided by superfamily, and only superfamilies that have at least two families and 20 sequences are chosen. As a result, 35 superfamilies are obtained, as listed in Table 1; from these superfamilies we obtain 411 families and a total of 6,496 protein sequences. Then, leave-one-family-out cross-validation is performed: for any superfamily x having n families, n profiles are built, so that each profile P is built from the sequences in the remaining n-1 families. The sequences of those n-1 families form the training set for profile P, while the sequences of the held-out family form the test positives set and all other database sequences form the test negatives set. A minimal sketch of this split is given below.
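The following sketch is for illustration only and is not the authors' pipeline; it assumes the SCOP data has been parsed into a hypothetical superfamilies mapping of the form {superfamily_id: {family_id: [seq_id, ...]}}.

def leave_one_family_out(superfamilies, all_seq_ids):
    """Yield one cross-validation split per (superfamily, held-out family).

    superfamilies: {superfamily_id: {family_id: [seq_id, ...]}} -- a
    hypothetical layout for the parsed SCOP data.
    all_seq_ids: identifiers of every sequence in the database.
    """
    for sf_id, families in superfamilies.items():
        n_seqs = sum(len(seqs) for seqs in families.values())
        if len(families) < 2 or n_seqs < 20:
            continue  # keep only superfamilies with >= 2 families and >= 20 sequences
        members = {s for seqs in families.values() for s in seqs}
        for held_out, positives in families.items():
            # Profile P is trained on the remaining n-1 families; the held-out
            # family gives the test positives, all other sequences the negatives.
            training = [s for fam, seqs in families.items()
                        if fam != held_out for s in seqs]
            negatives = [s for s in all_seq_ids if s not in members]
            yield sf_id, held_out, training, positives, negatives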
2.2 Multiple Alignments

Cross-validated protein sequences from the dataset generation process are input into the multiple alignment programs.
Fig. 2. Diagram and flowchart of the dataset preparation
Table 1. Superfamily ids. The table lists all the superfamily ids for the protein sequences selected after leave-one-family-out cross-validation.

a.3.1    b.60.1   c.69.1   d.58.18
a.5.2    b.71.1   c.94.1   d.81.1
a.26.1   b.82.1   c.108.1  g.3.6
a.35.1   b.121.4  d.3.1    g.3.7
a.39.1   b.122.1  d.14.1   g.3.11
a.118.1  c.3.1    d.15.1
b.6.1    c.52.1   d.32.1
b.18.1   c.55.3   d.38.1
b.29.1   c.67.1   d.58.4
b.36.1   c.68.1   d.58.7
In this paper, four multiple sequence alignment programs and four multiple structural alignment programs are used to provide the multiple alignments. The multiple sequence alignment programs used are as follows:

• CLUSTALW (http://www.clustal.org/) is a progressive alignment algorithm that improves the sensitivity of the original progressive multiple sequence alignment method via sequence weighting, position-specific gap penalties and weight matrix choices. It first obtains a guide tree and then uses a greedy search to calculate the best match for the selected sequences over aligned clusters of sequences, lining them up so that identities, similarities and differences can be seen.
• T-Coffee (http://www.tcoffee.org) also applies a progressive alignment algorithm. It improves the quality of the initial pairwise sequence alignments by taking into account the alignments between all pairs as it carries out each step of the progressive alignment. By default, T-Coffee compares all sequences two by two, producing a global alignment and a series of local alignments using lalign, and then combines all these alignments into a multiple alignment.
• MAFFT (http://align.bmr.kyushu-u.ac.jp/mafft) is a package consisting of five alignment methods: FFT-NS-1, FFT-NS-2, NW-NS-1, NW-NS-2 and FFT-NS-i. It first applies the Fast Fourier Transform (FFT) to rapidly identify homologous regions, and then uses a simplified scoring system to reduce Central Processing Unit (CPU) time and increase alignment accuracy. Its latest version offers new iterative refinement options, H-INS-i, F-INS-i and G-INS-i [50].
• ProbCons (http://probcons.stanford.edu) merges probabilistic modelling and consistency-based alignment techniques. It introduces probabilistic consistency, a novel scoring function based on paired HMMs, into progressive multiple sequence alignment. Its emission probabilities, which correspond to traditional substitution scores, are based on the BLOSUM62 matrix [51], while its transition probabilities, which correspond to gap penalties, are trained with unsupervised expectation maximization.
Meanwhile, the four programs used in this paper to provide multiple structural alignments are as follows:
• 3DCoffee (http://www.tcoffee.org) is an aligner based on T-Coffee that uses pairwise structure comparison to improve accuracy. Pairwise structure comparison is performed by SAP [52] if both structures are known; if only one structure is known, 3DCoffee uses the FUGUE threading method [53].
• MAMMOTH-mult (http://ub.cbm.uam.es) is a progressive multiple alignment program that uses a sequence-independent heuristic to obtain a fully structural alignment. It starts from a Cα trace, then finds an alignment of local structures based on a similarity score computed from the URMS metric [55], and finally finds similar local structures whose Cα atoms are close in Cartesian space.
• MUSTANG (http://www.cs.mu.oz.au/~arun/mustang/) is a reliable and robust algorithm for the alignment of multiple protein structures. Given a set of protein structures, the program constructs a multiple alignment using the spatial information of the Cα atoms in the set. The algorithm gains accuracy through novel and effective refinement stages, broadly based on the progressive pairwise heuristic. The output of MUSTANG consists of the multiple sequence alignment and the corresponding structural superpositions.
• PROMALS3D (http://prodata.swmed.edu/promals3d/) is an extension of PROMALS [56], a progressive method that clusters similar sequences and aligns them with a simple, fast algorithm, and uses 3D structural information to guide sequence alignments. More elaborate techniques are then applied to align the relatively divergent clusters to each other. As an extension, PROMALS3D first automatically identifies homologs with known 3D structures for the input sequences; second, it derives structural constraints through structure-based alignments and combines them with sequence constraints to construct consistency-based multiple sequence alignments. The output is a consensus alignment that brings together sequence and structural information about the input proteins and their homologs. The advantage of PROMALS3D is that it gives researchers an easy way to produce high-quality alignments consistent with both the sequences and the structures of proteins.

All eight aligners are command-line tools; a short sketch of how they can be driven uniformly in a pipeline is given below.
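The sketch below shows one way such tools can be wrapped with Python's subprocess module. The command templates are illustrative assumptions and should be checked against the installed version of each tool; in particular, some aligners write an output file while others print the alignment to stdout.

import subprocess
from pathlib import Path

# Illustrative command templates; flags may differ between tool versions.
ALIGNERS = {
    "clustalw": ["clustalw", "-INFILE={inp}", "-OUTFILE={out}"],
    "t_coffee": ["t_coffee", "{inp}", "-outfile={out}"],
    "mafft":    ["mafft", "{inp}"],     # prints the alignment to stdout
    "probcons": ["probcons", "{inp}"],  # prints the alignment to stdout
}

def run_aligner(name, input_fasta, output_path):
    """Run one multiple alignment program on a FASTA file of one family."""
    cmd = [part.format(inp=input_fasta, out=output_path)
           for part in ALIGNERS[name]]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    if name in ("mafft", "probcons"):   # save stdout-based output to a file
        Path(output_path).write_text(result.stdout)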
2.3 Refinement Algorithm

The results of the multiple alignment process are then input into a refinement algorithm. This algorithm takes into account the block model of the protein family: a representation of conserved sequence or structure regions that are very unlikely to contain gaps, these conserved regions being common to all family members. A block model comprises a prearranged set of one or more non-overlapping blocks: regions in which all sequences are aligned without gaps. The algorithm iteratively selects random sequences from the multiple alignment and realigns them with the family block model. The iteration continues until the multiple alignment score reaches a stable value or the iteration budget is exhausted. The order in which the sequences are refined is randomized in order to avoid bias and make the use of multiple iterations more effective. A schematic of this loop is sketched below.
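In this schematic, realign_with_block_model and alignment_score are hypothetical helpers standing in for the DP-based realignment and the objective function of this section (the latter corresponds to Equation (2) further below), so this is an outline of the control flow rather than the REFINER implementation.

import random

def refine(alignment, block_model, max_iters=10, tol=1e-6):
    """Iteratively realign sequences against the family block model until the
    alignment score stabilizes or the iteration budget is exhausted."""
    best = alignment_score(alignment)           # hypothetical scorer (cf. Eq. (2))
    for _ in range(max_iters):
        rows = list(range(len(alignment)))
        random.shuffle(rows)                    # randomized order avoids bias
        for r in rows:
            # hypothetical helper: DP realignment of row r against the blocks
            alignment = realign_with_block_model(alignment, r, block_model)
        score = alignment_score(alignment)
        if abs(score - best) < tol:             # score has reached a stable value
            break
        best = score
    return alignment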
In this paper, multiple alignments from the multiple alignment programs are first input into ReadSeq (http://iubio.bio.indiana.edu/soft/molbio/readseq/java/), a program that converts the multiple alignments from Clustal format to Fasta format. The multiple alignments are then converted to .cn3 files using fa2cd (ftp://ftp.ncbi.nih.gov/pub/REFINER/). These conversions are necessary because the refinement algorithm only accepts inputs in the CD format, the same format used by the CDD [44] database. One or more iterations of refinement, each consisting of a 'block shifting' stage followed by a 'block editing' stage, are performed by the algorithm.

In the block shifting stage, a dynamic programming (DP) [57] module acting as an engine runs on each sequence of the original multiple alignment to set new block positions. The DP module finds the optimal placement of every block of the block model on the protein sequence (a row of the multiple alignment) by scoring each allowed arrangement of the blocks on the sequence using the Position-Specific Scoring Matrix (PSSM) [58] of the multiple alignment. The DP engine uses a row score to determine the refined block positions for the selected sequence. This function is denoted as follows:
S_r = Σ_{i=1}^{L} PSSM(i, AA_i^r),    (1)
where, for an alignment of length L, S_r is the sum of scores derived from the PSSM over all aligned positions of sequence r; the PSSM is indexed by the alignment column i and the corresponding residue AA_i^r from sequence r. Meanwhile, the alignment score is calculated using an objective function, denoted as follows:
S_N = Σ_{r=1}^{N} S_r,    (2)
where, for a multiple alignment with N rows, S_N is the sum of all row scores S_r. If the block shifting stage of the refinement has an effect on a sequence, the positions of some of the blocks of the alignment on that sequence are changed or updated. The updates are represented by a function B, denoted as follows:
B = Δ_shift / l_b,    (3)

where, for a given sequence, Δ_shift is the difference in the position of a block b before and after the refinement, and l_b is the length of block b. The shifted blocks retain the same relative order after DP. The block shifting stage iterates until all sequences within the multiple alignment have been used. Equations (1)–(3) are illustrated in the short sketch below.
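Equations (1)–(3) translate directly into code. The indexing convention pssm[i][aa] (column i, residue aa) and the gap symbol '-' are assumptions made here for illustration.

def row_score(pssm, row):
    """Eq. (1): S_r = sum over aligned columns i of PSSM(i, AA_i^r)."""
    return sum(pssm[i][aa] for i, aa in enumerate(row) if aa != '-')

def alignment_score(pssm, alignment):
    """Eq. (2): S_N = sum of the row scores S_r over all N rows."""
    return sum(row_score(pssm, row) for row in alignment)

def block_shift(old_start, new_start, block_length):
    """Eq. (3): B = delta_shift / l_b for one block on one sequence."""
    return abs(new_start - old_start) / block_length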
Next, after all sequences in the multiple alignment have been used, the subsequent block editing stage examines each block across all sequences to see whether it is advantageous to extend it. A set of three heuristic criteria (the '3-fold heuristic criteria'), based on the statistical properties observed for block and loop regions, is used to control block editing. After each block shifting stage, columns in loop regions adjacent to blocks are inspected and added to existing blocks until a column fails the criteria. The 3-fold heuristic criteria are as follows:

• First, the residues aligned in block-forming columns should have a median PSSM value of at least 3 for a column to be added. In their work,
Chakrabarti et al. [43] concluded that this criterion has to be met before a column can be added, based on the pattern of median PSSM values in block-forming columns in their test datasets.
• Second, the frequency of occurrence of negative scores should not exceed 0.3. The frequency of negative scores is computed as the ratio of the number of sequences with a negative PSSM value to the number of alignment rows.
• Third, the relative weight of negative PSSM values should likewise not exceed 0.3. The relative weight of negative PSSM values is computed as the absolute sum of the negative PSSM values in a column divided by the sum of all PSSM values in that column.

A per-column check of these criteria is sketched below.
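In this sketch, column_scores is assumed to hold the PSSM value of each sequence's residue in the candidate loop-region column.

from statistics import median

def column_extends_block(column_scores):
    """Return True if a column adjacent to a block passes the 3-fold criteria."""
    negatives = [s for s in column_scores if s < 0]
    # Criterion 1: the median PSSM value of the column must be at least 3.
    if median(column_scores) < 3:
        return False
    # Criterion 2: the frequency of negative scores must not exceed 0.3.
    if len(negatives) / len(column_scores) > 0.3:
        return False
    # Criterion 3: the relative weight of negative PSSM values (absolute sum of
    # negatives over the sum of all values in the column) must not exceed 0.3.
    total = sum(column_scores)
    if total > 0 and abs(sum(negatives)) / total > 0.3:
        return False
    return True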
The second and third threshold values characterize alignment columns for which block extension is favourable, as suggested by Chakrabarti et al. [43]. Convergence is declared when no further improvement of the overall alignment score is observed or when all iterations have expired. Lastly, the outputs of the refinement algorithm are converted back to Fasta format and then to Clustal format using the ReadSeq software. This conversion is done to suit the requirements of the classification algorithms, which only accept inputs in Clustal format. The flow of the refinement process is shown in Figure 3. The refinement algorithm applied in this paper is not novel; it was introduced by Chakrabarti et al. [43] under the name REFINER. However, the algorithm had not previously been tested on multiple alignments from programs other than CLUSTALW, nor on datasets from SCOP 1.73. The contribution of this paper is therefore the application of this refinement algorithm to multiple alignments from different programs, in order to reduce the misalignment problem, on datasets from SCOP 1.73.

2.4 Classification Algorithm

The results from each of the multiple alignment programs are classified using pHMMs and SVMs. The pHMM programs used are as follows:
• HMMER (http://hmmer.janelia.org/) differentiates between match and insert alignment columns in model building; given the model, HMMER then allocates columns to the match or insert states so as to maximize the posterior probability of the aligned sequences. A Dirichlet mixture with nine components is used for the priors, and the Viterbi algorithm is used for scoring. HMMER version 2.3.3 was used in this work, with three stages: model building, model calibration and model scoring. In model building, the hmmbuild process is carried out to build the HMMER models; the models are then calibrated using hmmcalibrate, and the hmmsearch process is executed for scoring. Throughout the execution, HMMER default parameters are applied.
• The main difference between SAM (http://compbio.soe.ucsc.edu/sam2src/) and HMMER is that SAM uses a script called SAM-T2K, which performs an iterative process to generate multiple alignments and HMMs.
Furthermore, the developers of this software intend to improve it by using information on protein structure and prior probabilities. SAM also applies the standard pHMM architecture with nine transitions, but it does not differentiate between match and insert columns as HMMER does. The software uses a Dirichlet mixture with 20 components for the priors, and the Forward algorithm for scoring. This paper applies SAM version 3.4, using two stages: first, modelfromalign is used to build the models; second, hmmscore is used for score computation. SAM default parameters are applied throughout. A minimal wrapper around the HMMER stages described above is sketched below.
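The wrapper below maps the three HMMER stages onto their command-line calls; the file paths are illustrative, and all options are left at their defaults, as in the paper.

import subprocess

def hmmer_pipeline(alignment_file, model_file, database_file, hits_file):
    """Build, calibrate and score a profile HMM with HMMER 2.3.x defaults."""
    subprocess.run(["hmmbuild", model_file, alignment_file], check=True)  # model building
    subprocess.run(["hmmcalibrate", model_file], check=True)              # model calibration
    with open(hits_file, "w") as out:                                     # model scoring
        subprocess.run(["hmmsearch", model_file, database_file],
                       stdout=out, check=True)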
Meanwhile, in order to provide discriminative classification, the results from the multiple alignment process are input into SVMs. The SVM programs used are as follows:
• SVM-Struct (http://svmlight.joachims.org/) performs supervised learning by approximating a mapping h: X → Y using labelled training examples (x_1, y_1), ..., (x_n, y_n). Unlike regular SVMs, SVM-Struct can predict complex objects y such as trees, sequences or sets. Examples of problems with complex outputs are natural language parsing, sequence alignment in protein homology detection, and Markov models for part-of-speech tagging. SVM-Struct can also be used for linear-time training of binary and multi-class SVMs under the linear kernel. In this paper, a program called pafig (http://www.mobioinfor.cn/pafig/) is used to convert the multiple alignments into feature vectors, which are then input into the svm_learn and svm_classify programs. The latest version of SVM-Struct is version 3.1.
• SVM-Fold (http://svm-fold.c2b2.columbia.edu/) combines SVM kernel methods with a novel multi-class algorithm, delivering efficient and accurate protein fold and superfamily recognition. It detects subtle protein sequence similarities by learning from all available annotated proteins, as well as by utilizing potential hits identified by PSI-BLAST [59]. SVM-Fold uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, in which the underlying feature representation is a histogram of inexact-matching k-mer frequencies. SVM-Fold also employs the one-vs-all machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes.

The SVM-Struct training and classification calls are sketched below.
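This sketch assumes the feature files have already been produced by pafig, and it leaves all learning options at their defaults as a simplifying assumption.

import subprocess

def svm_struct_pipeline(train_features, test_features, model_file, predictions):
    """Train on pafig feature vectors with svm_learn, then classify."""
    subprocess.run(["svm_learn", train_features, model_file], check=True)
    subprocess.run(["svm_classify", test_features, model_file, predictions],
                   check=True)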
2.5 Performance Evaluation

In this paper, we compare the ROC value of each combination of multiple alignment and classifier program to evaluate its performance. The ROC score is the normalized area under a curve that plots true positives against false positives for the different possible classification thresholds [54]. The contingency table shown in Table 2 is used to analyze the evaluation measures in family classification.
Fig. 3. Flow of the refinement process. Input multiple alignments in CLUSTAL format are refined iteratively: the DP engine optimizes the objective function in the block shifting stage, selecting sequences in random order to avoid bias, while the 3-fold heuristic criteria based on observed statistical properties control the block editing stage; the loop repeats until convergence, producing refined multiple alignments as output
The entries in the contingency table, with n the total number of sequences, are as follows:

TP = number of examples correctly classified as positive
FN = number of positive examples incorrectly classified as negative
FP = number of negative examples incorrectly classified as positive
TN = number of negative examples correctly classified as negative
n  = TP + FN + FP + TN (total number of sequences)
In ROC space, the False Positive Rate (FPR) is plotted on the x-axis and the True Positive Rate (TPR) on the y-axis. Both are calculated using the following formulas:

FPR = FP / (FP + TN),    (4)
where FPR is the fraction of negative examples that are misclassified as positive, and
TPR = TP / (TP + FN),    (5)
where TPR is the fraction of positive examples that are correctly classified as positive. Meanwhile, in Precision-Recall (PR) space, Recall is plotted on the x-axis and Precision on the y-axis. The measures are defined as follows:
Recall = TP / (TP + FN),    (6)
where Recall is the fraction of positive examples that are correctly classified as positive, and

Precision = TP / (TP + FP),    (7)
where Precision is the fraction of examples classified as positive (i.e., TP and FP) that are truly positive. The information encoded in the contingency table can be used not only in the evaluation of RPHD but also in general classification problems. In this paper, the best combination of a multiple alignment program and a classification program is identified by the highest ROC score. A reference sketch of these measures is given below.

Table 2. ROC contingency table. The table shows the ROC contingency matrix.

                     Actual Positive         Actual Negative
Predicted Positive   True Positives (TP)     False Positives (FP)
Predicted Negative   False Negatives (FN)    True Negatives (TN)
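Equations (4)–(7) and the ROC score follow directly from the contingency counts; in this sketch the ROC score is approximated by trapezoidal integration of the plotted curve.

def rates(tp, fn, fp, tn):
    """Eqs. (4)-(7): FPR, TPR, Recall and Precision from contingency counts."""
    fpr = fp / (fp + tn)         # Eq. (4)
    tpr = tp / (tp + fn)         # Eq. (5)
    recall = tp / (tp + fn)      # Eq. (6), identical to TPR
    precision = tp / (tp + fp)   # Eq. (7)
    return fpr, tpr, recall, precision

def roc_score(points):
    """Normalized area under a ROC curve given sorted (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))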
3 Results

3.1 HMMER Performance

HMMER performance was assessed using multiple alignments from CLUSTALW, MAFFT, ProbCons and T-Coffee for multiple sequence alignment (MSA), and from 3DCoffee, MAMMOTH-mult,
Fig. 4. (a) ROC, (b) Precision and (c) Recall results for unrefined and refined HMMER
MUSTANG and PROMALS3D for multiple structural alignment (MStA). Figure 4(a) shows the ROC results of unrefined and refined HMMER, while Figure 4(b) and Figure 4(c) show the Precision and Recall results, respectively. For MSA, refined HMMER-MAFFT performs best in terms of ROC, with a value of 0.83, while unrefined HMMER-CLUSTALW performs worst, with an ROC of 0.53. The result
of the Precision test shows that refined HMMER-ProbCons performs best, with a value of 0.02893, while unrefined HMMER-CLUSTALW performs worst, with a value of 0.00694. In the Recall test, refined HMMER-T-Coffee performs best, with a value of 0.03124, and unrefined HMMER-CLUSTALW again performs worst, with a value of 0.00756.

For MStA, refined HMMER-PROMALS3D performs best in terms of ROC, with a value of 0.85, while unrefined HMMER-MUSTANG performs worst, with an ROC of only 0.5. The Precision test further shows that refined HMMER-PROMALS3D performs best, with a value of 0.03377, whereas unrefined HMMER-MAMMOTH-mult performs worst, with only 0.01206. In the Recall test, refined HMMER-PROMALS3D performs best, with a value of 0.03957, while unrefined HMMER-MUSTANG performs worst, with a value of only 0.01394.

Overall, the HMMER results derived from MSA and MStA show that refined HMMER-PROMALS3D performs best in terms of ROC, with a value of 0.85, while unrefined HMMER-MUSTANG performs worst, with a value of only 0.5. The Precision test further shows that refined HMMER-PROMALS3D performs best among all multiple alignment tools, with a value of 0.03377, while the combination with CLUSTALW performs worst, with a Precision of 0.00694. The Recall test likewise shows refined HMMER-PROMALS3D performing best, with a value of 0.03957, and the combination with CLUSTALW performing worst, with a value of 0.00756.

3.2 SAM Performance

SAM performance was also assessed using multiple alignments from CLUSTALW, MAFFT, ProbCons and T-Coffee for MSA and from 3DCoffee, MAMMOTH-mult, MUSTANG and PROMALS3D for MStA. Figure 5(a) shows the ROC results of unrefined and refined multiple alignments combined with SAM, while Figure 5(b) and Figure 5(c) show the Precision and Recall results, respectively. For MSA, the ROC results show that refined SAM-MAFFT performs best, with an ROC of 0.86, whereas unrefined SAM-MAFFT, unrefined SAM-T-Coffee and refined SAM-ProbCons perform worst, with an ROC of only 0.6. The Precision test shows that refined SAM-ProbCons performs best, with a value of 0.03094, while unrefined SAM-T-Coffee performs worst, with a value of only 0.009. In the Recall test, refined SAM-ProbCons displays the best performance, with a value of 0.02797, while refined SAM-T-Coffee performs worst, with a value of only 0.00973.

For MStA, refined SAM-PROMALS3D performs best in terms of ROC, with a value of 0.87, whereas unrefined SAM-3DCoffee and unrefined SAM-MAMMOTH-mult perform worst, with an ROC of only 0.59. The Precision test also shows that refined SAM-PROMALS3D performs best, with a value of 0.03893, while unrefined SAM-3DCoffee performs worst, with a value of only 0.019. The Recall test further shows that refined SAM-PROMALS3D performs best, with a value of 0.04147, while unrefined SAM-3DCoffee performs worst, with a value of only 0.02.
Overall, the comparison between unrefined and refined SAM combined with the multiple alignment tools shows that refined SAM-PROMALS3D performs best, with an ROC of 0.87, while the worst performance is shown by unrefined SAM-3DCoffee and unrefined SAM-MAMMOTH-mult, each with an ROC of 0.59. In terms of Precision, refined SAM-PROMALS3D performs best, with a value of 0.03893, while unrefined SAM-T-Coffee performs worst, with a value of only 0.009. The Recall test also shows that refined SAM-PROMALS3D performs best, with a value of 0.04147, while the worst performance is shown by refined SAM-T-Coffee, with a value of only 0.00973.
Fig. 5. (a) ROC, (b) Precision and (c) Recall results for unrefined and refined SAM
Fig. 6. (a) ROC, (b) Precision and (c) Recall comparison between refined HMMER and SAM
3.3 HMMER and SAM Performance

The performance of HMMER and SAM derived from the refined multiple alignments is also compared. Figure 6(a) shows the ROC comparison between HMMER and SAM, while Figure 6(b) and Figure 6(c) show the
Precision and Recall comparisons, respectively. For MSA, refined SAM-MAFFT performs best in terms of ROC, with a value of 0.86, while the worst combination is refined SAM-ProbCons, with an ROC of only 0.6. In terms of Precision, refined SAM-ProbCons performs best, with a value of 0.03094, while refined SAM-T-Coffee performs worst, with a value of 0.02061. The Recall test shows that refined HMMER-T-Coffee performs best, with a value of 0.03124, while refined SAM-T-Coffee performs worst, with a value of only 0.00973.

For MStA, refined SAM-PROMALS3D performs best in terms of ROC, with a value of 0.87; the worst combination is refined HMMER-MUSTANG, with an ROC of only 0.79. In terms of Precision, refined SAM-PROMALS3D again performs best, with a value of 0.03893, while refined SAM-3DCoffee performs worst, with a value of only 0.02670. The Recall test further shows that refined SAM-PROMALS3D performs best, with a value of 0.04147, while refined HMMER-3DCoffee performs worst, with a value of only 0.02402.

Overall, refined SAM-PROMALS3D performs best in terms of ROC, with a value of 0.87, while refined SAM-ProbCons performs worst, with an ROC of only 0.60. In terms of Precision, refined SAM-PROMALS3D performs best, with a value of 0.03893, while refined SAM-T-Coffee performs worst, with a value of only 0.02061. The Recall test also shows that refined SAM-PROMALS3D performs best, with a value of 0.04147, while the worst result is shown by refined SAM-T-Coffee, with a value of only 0.00973.

3.4 SVM-Fold Performance

This paper also assessed the performance of SVM-Fold using multiple alignments derived from CLUSTALW, MAFFT, ProbCons and T-Coffee for MSA and from 3DCoffee, MAMMOTH-mult, MUSTANG and PROMALS3D for MStA. Figure 7(a) displays the ROC results for unrefined and refined SVM-Fold, while Figure 7(b) and Figure 7(c) show the Precision and Recall results, respectively. For MSA, refined SVM-Fold-MAFFT and refined SVM-Fold-T-Coffee both perform best in terms of ROC, each with a value of 0.86, while the worst performance is displayed by refined SVM-Fold-ProbCons, with an ROC of only 0.61. In the Precision test, refined SVM-Fold-ProbCons performs best, with a value of 0.029, while unrefined SVM-Fold-CLUSTALW shows the worst performance, with a value of only 0.018. The Recall test shows that refined SVM-Fold-T-Coffee performs best, with a value of 0.028, while unrefined SVM-Fold-CLUSTALW performs worst, with a value of only 0.011.

For MStA, the ROC test shows that refined SVM-Fold-3DCoffee performs best, with a value of 0.87, while unrefined SVM-Fold-MAMMOTH-mult performs
worst, with an ROC of only 0.57. The Precision test shows that refined SVM-Fold-PROMALS3D performs best, with a value of 0.035, while unrefined SVM-Fold-MAMMOTH-mult performs worst, with a value of only 0.021. In the Recall test, refined SVM-Fold-PROMALS3D also performs best, with a value of 0.035, while unrefined SVM-Fold-MUSTANG displays the worst performance, with a value of only 0.019.

The overall comparison of SVM-Fold combined with the different types of multiple alignments shows that refined SVM-Fold-3DCoffee performs best, with an ROC of 0.87, while the worst performance is shown by unrefined SVM-Fold-MAMMOTH-mult, with an ROC of only 0.57. The Precision test shows that refined SVM-Fold-PROMALS3D performs best, with a value of 0.035, while unrefined SVM-Fold-CLUSTALW performs worst, with a value of only 0.018. In the Recall test, refined SVM-Fold-PROMALS3D also performs best, with a value of 0.035, while unrefined SVM-Fold-CLUSTALW performs worst, with a value of only 0.011.

3.5 SVM-Struct Performance

SVM-Struct performance is likewise assessed using multiple alignments from CLUSTALW, MAFFT, ProbCons and T-Coffee for MSA and from 3DCoffee, MAMMOTH-mult, MUSTANG and PROMALS3D for MStA. Figure 8(a), Figure 8(b) and Figure 8(c) show the ROC, Precision and Recall results of unrefined and refined SVM-Struct, respectively. For MSA, the best ROC performance is shown by refined SVM-Struct-ProbCons, with a value of 0.88, while unrefined SVM-Struct-T-Coffee performs worst, with an ROC of 0.69. The Precision test also shows that refined SVM-Struct-ProbCons performs best, with a value of 0.034, while unrefined SVM-Struct-CLUSTALW performs worst, with a value of only 0.01732. In the Recall test, refined SVM-Struct-ProbCons once again comes out best, with a value of 0.035, while unrefined SVM-Struct-CLUSTALW performs worst, with a value of only 0.01363.

For MStA-derived SVM-Struct, refined SVM-Struct-PROMALS3D performs best in terms of ROC, with a value of 0.89, while unrefined SVM-Struct-MAMMOTH-mult performs worst, with an ROC of 0.6. The Precision test also shows that refined SVM-Struct-PROMALS3D performs best, with a value of 0.039, while unrefined SVM-Struct-3DCoffee performs worst, with a value of only 0.0257. The Recall test likewise shows refined SVM-Struct-PROMALS3D performing best, with a value of 0.044, while unrefined SVM-Struct-3DCoffee shows the worst performance, with a value of only 0.02301.

The overall comparison between the SVM-Struct MSA and MStA results shows that refined SVM-Struct-PROMALS3D performs best in terms of ROC, with a value of 0.89, while the worst performance is shown by unrefined SVM-Struct-MAMMOTH-mult, with an ROC of only 0.6. The Precision test also shows that refined SVM-Struct-PROMALS3D gives the best performance, with a value of 0.039, while unrefined SVM-Struct-CLUSTALW displays the worst, with a value of only 0.01732. In the Recall test, refined SVM-Struct-PROMALS3D again gives the best performance, with a value of 0.044, while the worst performance is displayed by unrefined SVM-Struct-CLUSTALW, with a value of 0.01363.
Fig. 7. (a) ROC, (b) Precision and (c) Recall results for unrefined and refined SVM-Fold
Fig. 8. (a) ROC, (b) Precision and (c) Recall results for unrefined and refined SVM-Struct
3.6 SVM-Fold and SVM-Struct Performance

This section discusses the results of refined SVM-Fold and SVM-Struct derived from the multiple alignment tools CLUSTALW, MAFFT, ProbCons and T-Coffee for MSA and 3DCoffee, MAMMOTH-mult, MUSTANG and PROMALS3D for MStA. Figure 9(a) shows the ROC results of refined SVM-Fold and SVM-Struct, while Figure 9(b) and Figure 9(c) show the refined Precision and Recall results, respectively.

For MSA, refined SVM-Struct-ProbCons performs best, with an ROC of 0.88, while the worst performance is shown by refined SVM-Fold-ProbCons, with an ROC of only 0.61. The Precision test shows that refined SVM-Struct-ProbCons performs best, with a value of 0.034, while the worst performance is displayed by refined SVM-Fold-CLUSTALW, with a value of only
0.026. In the Recall test, refined SVM-Struct-ProbCons again shows the best result, 0.035, while refined SVM-Struct-CLUSTALW performs worst, with a value of only 0.021.

For MStA, refined SVM-Struct-PROMALS3D performs best in terms of ROC, with a value of 0.89, while refined SVM-Struct-MAMMOTH-mult and refined SVM-Fold-MUSTANG perform worst, each with an ROC of only 0.8. The Precision test shows that refined SVM-Struct-PROMALS3D performs best, with a value of 0.039, while refined SVM-Fold-3DCoffee performs worst among the MStA tools, with a value of only 0.029. The Recall test also shows that refined SVM-Struct-PROMALS3D performs best, with a value of 0.044, while the worst Recall result is shown by refined SVM-Struct-MUSTANG, with a value of only 0.029.

The overall comparison between refined SVM-Fold and SVM-Struct shows that refined SVM-Struct-PROMALS3D performs best, with an ROC of 0.89, while refined SVM-Fold-ProbCons shows the worst performance, with an ROC of only 0.61. In the Precision test, refined SVM-Struct-PROMALS3D displays the best performance, with a value of 0.039, while refined SVM-Fold-CLUSTALW displays the worst, with a value of only 0.026. The Recall test also shows that refined SVM-Struct-PROMALS3D performs best, with a value of 0.044, while the worst performance is shown by refined SVM-Struct-CLUSTALW, with a value of only 0.021.

3.7 pHMMs and SVMs Performance

Lastly, the overall performance of pHMMs and SVMs derived from CLUSTALW, MAFFT, ProbCons and T-Coffee for MSA and from 3DCoffee, MAMMOTH-mult, MUSTANG and PROMALS3D for MStA is also discussed. Figure 10(a) shows the comparison of ROC results for refined pHMMs and SVMs, while Figure 10(b) and Figure 10(c) display the comparison of Precision and Recall results, respectively. For MSA, refined SVM-Struct-ProbCons performs best in terms of ROC, with a value of 0.88, while the worst performance is shown by refined SAM-ProbCons, with an ROC of only 0.6. The Precision test shows that refined SVM-Struct-ProbCons performs best, with a value of 0.034, while refined SAM-T-Coffee performs worst, with a value of 0.02061. In the Recall test, the best performance is also shown by refined SVM-Struct-ProbCons, with a value of 0.035, while the worst is displayed by refined SAM-T-Coffee, with a value of 0.00973.

For MStA, refined SVM-Struct-PROMALS3D performs best in terms of ROC, with a value of 0.89; the worst performance is shown by refined HMMER-MUSTANG, with a value of only 0.79. In the Precision test, refined SVM-Struct-PROMALS3D displays the best performance, with a value of 0.039, while the worst is displayed by refined SAM-3DCoffee, with a value of only 0.0267. In the Recall test, the best performance is again shown by refined SVM-Struct-PROMALS3D, with a value of 0.044, while the worst is shown by refined HMMER-3DCoffee, with a value of only 0.02402.
The overall comparison shows that refined SVM-Struct-PROMALS3D performs best in terms of ROC, with a value of 0.89, while the worst performance is shown by refined SAM-ProbCons, with an ROC of only 0.6. The Precision test likewise shows that refined SVM-Struct-PROMALS3D performs best, with a value of 0.039, while refined SAM-T-Coffee shows the worst performance, with a value of 0.02061. The Recall test also shows that refined SVM-Struct-PROMALS3D performs best, with a value of 0.044, while the worst performance is shown by refined SAM-T-Coffee, with a value of 0.00973.
4 Discussion

4.1 PROMALS3D: The Best Multiple Alignment

Based on the results of this paper, PROMALS3D performs best for multiple alignment. It is believed that PROMALS3D achieves the best performance by deriving structural constraints from representative sequences with known structures. Furthermore, PROMALS3D uses 3D structural information to guide multiple alignments constructed using PROMALS. By automatically identifying homologs with known 3D structures, deriving structural constraints through structure-based multiple alignments and then combining them with sequence constraints to construct consistency-based multiple alignments, the software is able to output a consensus multiple alignment that brings together sequence and structural information.

4.2 SVM-Struct: The Best Classification Algorithm

As for the classification algorithms, the SVMs, in this case SVM-Struct, employ discriminative classification approaches that have proven superior to other methods, including pHMMs. Discriminative approaches typically estimate posterior probabilities directly, without attempting to model the underlying probability distributions [60]. This focuses the computational resources more on the specific task, which in turn increases performance. SVM-Struct is specifically designed for the prediction of complex outputs such as multiple alignments and has superior generalization performance. Unlike regular SVMs, SVM-Struct handles multivariate and structured outputs. Furthermore, it employs the 1-slack cutting-plane algorithm, which uses a new but equivalent formulation of the structural SVM quadratic program and is several orders of magnitude faster than other methods.

4.3 REFINER: The Impact of the Refinement Algorithm

The performance of the optimal mesh algorithm is further enhanced by the application of the realignment algorithm, which increases the quality of the multiple alignments through conserved-core iterative realignment of the protein block model. The refinement algorithm is applied because multiple alignment methods that employ a progressive approach suffer from the problem that misalignments made at earlier
stages cannot be corrected afterwards. These misalignments then propagate into serious alignment errors. Worse still, the final alignment depends strongly on the order in which sequences are aligned. The use of a refinement algorithm can correct misalignments between a given sequence and the rest of the profile while at the same time preserving the family's overall block model. The ROC test shows that the performance of RPHD is increased by 14.1% through the application of the refinement algorithm.
Fig. 9. (a) ROC, (b) Precision and (c) Recall comparison between refined SVM-Fold and SVM-Struct
Fig. 10. (a) ROC, (b) Precision and (c) Recall comparison between refined pHMMs and SVMs
4.4 PRS: The Optimal Mesh Algorithm

Based on the findings of this paper, the combination of PROMALS3D, REFINER and SVM-Struct (PRS) produces an optimal mesh algorithm for RPHD. By utilizing 3D structural information to guide multiple alignment construction, and by employing a discriminative classification approach that has proven superior, the algorithm delivers the best detection performance. It also employs a refinement algorithm to further increase the quality of the multiple alignments, which in turn increases the performance of RPHD.
5 Conclusions

In this paper, a data pre-processing procedure has been introduced to prepare the datasets for RPHD from SCOP 1.73. Furthermore, different combinations of multiple sequence alignment and multiple structural alignment programs with pHMMs and SVMs have been integrated in order to construct an optimal mesh algorithm for RPHD. The optimal mesh algorithm consists of three main stages: the multiple alignment stage, the realignment of protein sequences stage, and the classification and evaluation stage. A refinement algorithm has also been applied to reduce misalignments in the multiple sequence alignments and multiple structural alignments, thus further assisting accurate RPHD. The results for RPHD on SCOP 1.73 can serve as a benchmark for other researchers seeking to further improve remote protein homology detection by identifying the best combination of multiple alignment and classifier programs, covering both pHMMs as generative classifiers and SVMs as discriminative classifiers.

Based on the experimental results, the use of the realignment algorithm clearly improves the detection of remote protein homology by the chosen programs: the performance graphs show that realigned multiple alignments outperform multiple alignments that are not realigned. The optimal mesh algorithm is given by refined SVM-Struct-PROMALS3D. The results and findings of this paper can be extended, for example by integrating biological information from the Gene Ontology to further increase its capability. The feature extraction step can also be improved by applying non-negative matrix factorization to further increase the performance of the optimized algorithm. Lastly, the algorithm designed in this paper can be turned into a web service that enables researchers to perform RPHD.

Acknowledgements. This project is funded by the Malaysian Ministry of Higher Education under the Fundamental Research Grant Scheme (project no. 78185). FMA carried out the studies and drafted the manuscript. RMO and SK conceived the study, participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.
References

1. Madera, M., Gough, J.: A comparison of profile hidden markov model procedures for remote homology detection. Nucleic Acids Research 30, 4321–4328 (2002)
2. Bourne, P., Weissig, H. (eds.): Structural Bioinformatics. Wiley-Liss, Hoboken (2003)
3. Leslie, C.S., Eskin, E., Cohen, A., Weston, J., Noble, W.S.: Mismatch string kernels for discriminative protein classification. Bioinformatics 20, 467–476 (2004)
4. Jaakkola, T., Diekhans, M., Haussler, D.: A discriminative framework for detecting remote protein homologies. Journal of Computational Biology 7, 95–114 (2000)
5. Liao, L., Noble, W.S.: Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. Journal of Computational Biology 10, 857–868 (2003)
6. Chakrabarti, S., Lanczycki, C.J., Panchenko, A.R., Przytycka, T.M., Thiessen, P.A., Bryant, S.H.: Refining multiple sequence alignments with conserved core regions. Nucleic Acids Research 34, 2598–2606 (2006)
7. Edgar, R.C., Batzoglou, S.: Multiple sequence alignment. Current Opinion in Structural Biology 16, 368–373 (2006)
8. Pei, J., Grishin, N.V.: MUMMALS: Multiple sequence alignment improved by using hidden markov models with local structural information. Nucleic Acids Research 34, 4364–4374 (2006)
9. Subramanian, A., Kaufmann, M., Morgenstern, B.: DIALIGN-TX: Greedy and progressive approaches for segment-based multiple sequence alignment. Algorithms for Molecular Biology 3, 6–17 (2008)
10. Bray, N., Pachter, L.: MAVID: Constrained ancestral alignment of multiple sequences. Genome Research 14, 693–699 (2004)
11. Suchard, M.A., Redelings, B.D.: BAli-Phy: Simultaneous bayesian inference of alignment and phylogeny. Bioinformatics 22, 2047–2048 (2006)
12. Sheinerman, F.B., Al-Lazikani, B., Honig, B.: Sequence, structure and energetic determinants of phosphopeptide selectivity of SH2 domains. Journal of Molecular Biology 334, 823–841 (2003)
13. Al-Lazikani, B., Sheinerman, F.B., Honig, B.: Combining multiple structure and sequence alignments to improve sequence detection and alignment: application to the SH2 domains of Janus kinases. PNAS 98, 14796–14801 (2001)
14. Oldfield, T.: CAALIGN: A program for pairwise and multiple protein-structure alignment. Acta Crystallographica Section D 63, 514–525 (2007)
15. Birzele, F., Gewehr, J.E., Csaba, G., Zimmer, R.: Vorolign-fast structural alignment using voronoi contacts. Bioinformatics 23, e205–211 (2007)
16. Menke, M., Berger, B., Cowen, L.: Matt: local flexibility aids protein multiple structure alignment. PLoS Computational Biology 4, e10 (2008)
17. Ye, Y., Godzik, A.: Multiple flexible structure alignment using partial order graphs. Bioinformatics 21, 2362–2369 (2005)
18. Dai, J., Cheng, J.: HMMEditor: A visual editing tool for profile hidden markov model. BMC Genomics 9, S8 (2008)
19. Madera, M.: Profile Comparer: A program for scoring and aligning profile hidden markov models. Bioinformatics 24, 2630–2631 (2008)
20. Grundy, W.N., Bailey, T.L., Elkan, C.P., Baker, M.E.: Meta-MEME: Motif-based hidden markov models of protein families. Computer Applications in the Biosciences 13, 397–406 (1997)
21. Birney, E., Clamp, M., Durbin, R.: GeneWise and Genomewise. Genome Research 14, 988–995 (2004)
22. Pavlidis, P., Wapinski, I., Noble, W.S.: Support vector machine classification on the web. Bioinformatics 20, 586–587 (2004)
23. Pirooznia, M., Deng, Y.: SVM Classifier - A comprehensive java interface for support vector machine classification of microarray data. BMC Bioinformatics 7, S25 (2006)
24. Cai, C.Z., Han, L.Y., Ji, Z.L., Chen, X., Chen, Y.Z.: SVM-Prot: Web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Research 31, 3692–3697 (2003)
25. Melvin, I., Ie, E., Kuang, R., Weston, J., Noble, W., Leslie, C.: SVM-Fold: A tool for discriminative multi-class protein fold and superfamily recognition. BMC Bioinformatics 8, S2 (2007)
26. Manohar, A., Batzoglou, S.: TreeRefiner: A tool for refining a multiple alignment on a phylogenetic tree. In: Proceedings of the 4th International IEEE Computer Society Computational Systems Bioinformatics Conference, pp. 111–119 (2005)
27. Notredame, C., Holm, L., Higgins, D.G.: COFFEE: An objective function for multiple sequence alignments. Bioinformatics 14, 407–422 (1998)
28. Edgar, R.: MUSCLE: A multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113–132 (2004)
29. Wallace, I.M., O'Sullivan, O., Higgins, D.G.: Evaluation of iterative alignment algorithms for multiple alignment. Bioinformatics 21, 1408–1414 (2005)
30. Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., et al.: Clustal W and Clustal X version 2.0. Bioinformatics 23, 2947–2948 (2007)
31. Katoh, K., Misawa, K., Kuma, K., Miyata, T.: MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Research 30, 3059–3066 (2002)
32. Do, C.B., Mahabhashyam, M.S.P., Brudno, M., Batzoglou, S.: PROBCONS: Probabilistic consistency-based multiple sequence alignment. Genome Research 15, 330–340 (2005)
33. Notredame, C., Higgins, D.G., Heringa, J.: T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology 302, 205–217 (2000)
34. O'Sullivan, O., Suhre, K., Abergel, C., Higgins, D.G., Notredame, C.: 3DCoffee: Combining protein sequences and structures within multiple sequence alignments. Journal of Molecular Biology 340, 385–395 (2004)
35. Lupyan, D., Leo-Macias, A., Ortiz, A.R.: A new progressive-iterative algorithm for multiple structure alignment. Bioinformatics 21, 3255–3263 (2005)
36. Konagurthu, A.S., Whisstock, J.C., Stuckey, P.J., Lesk, A.M.: MUSTANG: A multiple structural alignment algorithm. Protein Science 64, 559–574 (2006)
37. Kann, M.G., Thiessen, P.A., Panchenko, A.R., Schaffer, A.A., Altschul, S.F., Bryant, S.H.: A structure-based method for protein sequence alignment. Bioinformatics 21, 1451–1456 (2005)
38. Eddy, S.R.: Profile hidden Markov models. Bioinformatics 14, 755–763 (1998)
39. Karplus, K., Barrett, C., Hughey, R.: Hidden Markov Models for Detecting Remote Protein Homologies. Bioinformatics 14, 846–856 (1998)
40. Rangwala, H., Karypis, G.: Profile-based Direct Kernels for Remote Homology Detection and Fold Recognition. Bioinformatics 21, 4239–4247 (2005)
41. Melvin, I., Ie, E., Kuang, R., Weston, J., Noble, W., Leslie, C.: SVM-Fold: A tool for discriminative multi-class protein fold and superfamily recognition. BMC Bioinformatics 8, 2 (2007)
42. Bernardes, J., Davila, A., Costa, V., Zaverucha, G.: Improving Model Construction of Profile HMMs for Remote Homology Detection Through Structural Alignment. BMC Bioinformatics 8, 435–447 (2007)
43. Chakrabarti, S., Lanczycki, C.J., Panchenko, A.R., Przytycka, T.M., Thiessen, P.A., Bryant, S.H.: Refining multiple sequence alignments with conserved core regions. Nucleic Acids Research 34, 2598–2606 (2006)
44. Marchler-Bauer, A., Anderson, J.B., Chitsaz, F., Derbyshire, M.K., DeWeese-Scott, C., Fong, J.H., Geer, L.Y., Geer, R.C., Gonzales, N.R., Gwadz, M., et al.: CDD: specific functional annotation with the Conserved Domain Database. Nucleic Acids Research 37, D205–210 (2009)
45. Finn, R.D., Tate, J., Mistry, J., Coggill, P.C., Sammut, S.J., Hotz, H.-R., Ceric, G., Forslund, K., Eddy, S.R., Sonnhammer, E.L.L., Bateman, A.: The Pfam protein families database. Nucleic Acids Research 36, D281–288 (2008)
46. Andreeva, A., Howorth, D., Brenner, S.E., Hubbard, T.J.P., Chothia, C., Murzin, A.G.: SCOP database in 2004: Refinements integrate structure and sequence family data. Nucleic Acids Research 32, D226–229 (2004)
47. Sonego, P., Kocsor, A., Pongor, S.: ROC analysis: Applications to the classification of biological sequences and 3D structures. Briefings in Bioinformatics 9, 198–209 (2008)
48. Supper, J., Spangenberg, L., Planatscher, H., Draeger, A., Schroeder, A., Zell, A.: BowTieBuilder: modeling signal transduction pathways. BMC Systems Biology 3, 67 (2009)
49. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N., Bourne, P.E.: The protein data bank. Nucleic Acids Research 28, 235–242 (2000)
50. Katoh, K., Kuma, K., Toh, H., Miyata, T.: MAFFT Version 5: Improvement in Accuracy of Multiple Sequence Alignment. Nucleic Acids Research 33, 511–518 (2005)
51. Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the United States of America 89, 10915–10919 (1992)
52. Taylor, W.R., Orengo, C.A.: Protein Structure Alignment. Journal of Molecular Biology 208, 1–22 (1989)
53. Shi, J., Blundell, T.L., Mizuguchi, K.: FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. Journal of Molecular Biology 310, 243–257 (2000)
54. Gribskov, M., Robinson, N.L.: Use of Receiver Operating Characteristic (ROC) Analysis to Evaluate Sequence Matching. Computers & Chemistry 20, 25–33 (1996)
55. Kedem, K., Chew, L.P., Elber, R.: Unit-vector RMS (URMS) as a tool to analyze molecular dynamics trajectories. Proteins 37, 554–564 (1999)
56. Pei, J., Grishin, N.V.: PROMALS: towards accurate multiple sequence alignments of distantly related proteins. Bioinformatics 23, 802–808 (2007)
57. Wang, Q., Song, E., Jin, R., Han, P., Wang, X., Zhou, Y., Zeng, J.: Segmentation of lung nodules in computed tomography images using dynamic programming and multidirection fusion techniques. Academic Radiology 16, 678–688 (2009)
58. Sato, K., Morita, K., Sakakibara, Y.: PSSMTS: position specific scoring matrices on tree structures. Journal of Mathematical Biology 56, 201–214 (2008)
59. Neuwald, A.F., Poleksic, A.: PSI-BLAST searches using hidden Markov models of structural repeats: prediction of an unusual sliding DNA clamp and of ß-propellers in UV-damaged DNA-binding protein. Nucleic Acids Research 28, 3570–3580 (2000)
60. Ng, A.Y., Jordan, M.I.: On Discriminative vs Generative Classification Algorithms: A Comparison of Logistic Regression and Naive Bayes. In: Dietterich, T., Becker, S., Ghahramani, Z. (eds.) Advances in Neural Information Processing Systems (NIPS), vol. 14, pp. 841–848. MIT Press, Vancouver (2001)
Definition of Consistency Rules between UML Use Case and Activity Diagram Noraini Ibrahim1, Rosziati Ibrahim1, Mohd Zainuri Saringat1, Dzahar Mansor2, and Tutut Herawan3 1 Department of Software Engineering Faculty of Computer Science and Information Technology Universiti Tun Hussein Onn Malaysia 2 Microsoft Malaysia 3 Department of Computer System and Software Engineering Universiti Malaysia Pahang {noraini,rosziati,zainuri}@uthm.edu.my, [email protected], [email protected]
Abstract. Consistency is the situation where two or more overlapping elements of different diagrams describing the behavior of a system are jointly satisfiable. It is one of the attributes used to measure the quality of a UML model. Even though research on consistency between diagrams has increased rapidly, there is still a lack of research on consistency driven by use cases. Therefore, this paper defines the elements of the use case diagram and the activity diagram, as well as the consistency rules between them, using a logic approach. Based on an example of a UML model consisting of both diagrams, we show how the diagrams fulfill our proposed consistency rules. Furthermore, the elements involved in the consistency rules are detected and formally reasoned about. Keywords: Use case diagram; Activity diagram; Consistency rules; Logic.
1 Introduction
Modeling of a system is an essential process in the software development lifecycle. It produces a system artifact called a system model. A system model can provide structure for problem solving, experiments to explore multiple solutions, and abstractions to manage complexity. In object-oriented software development, a system model can be developed using the Unified Modeling Language (UML). A UML system model may be built up from different UML diagrams [1]. They are modeled at different levels of abstraction, from a very abstract view to a concrete one, using a variety of notations. They are also derived from the contributions of many stakeholders and system modelers. These issues lead to consistency problems among the software models. Consistency is the situation where two or more overlapping elements of different software models describing aspects of a system are jointly satisfiable [2]. Consistency between models is very important to ensure that the implementation of the models does not run into trouble [3]. In general, there are three main activities in model consistency management: consistency specification, inconsistency detection
and inconsistency handling [2, 4]. Consistency specifications are descriptions of the consistency rules that must be respected by different UML diagrams in order for them to be consistent. When the diagrams do not hold to the consistency rules, inconsistencies arise, and they must be detected and handled. Research on checking the consistency of UML diagrams is growing rapidly [5], yet consistency checking driven by use cases is still lacking. Fryz and Kotulski [6] have defined consistency between use case and class diagrams using graphs. Sapna and Mohanty [7] describe consistency between use case, activity, sequence, class and statechart diagrams, while Shinkawa [8] specifies consistency between use case, activity, sequence and statechart diagrams using Colored Petri Nets (CPN). Consistency between use case, activity and class diagrams has also been defined by Chanda et al. [9] using a Context Free Grammar (CFG). However, the definitions of the consistency rules in the previous approaches [7-9] can be refined and extended. It is important to define consistency using an approach that provides a mechanism for detecting and handling inconsistencies between diagrams [4, 10]. Therefore, in this paper, we define the consistency between use case and activity diagrams, together with the elements involved in the consistency, using a logic approach. We refine the consistency rules of other researchers [7-9] and propose new ones. We show an example of a UML system model that consists of a use case diagram and two activity diagrams, and we show how the diagrams fulfill our proposed consistency rules. Furthermore, the elements involved in the consistency rules are detected and formally reasoned about. Note that the UML standard [11] is referred to throughout this paper. The rest of this paper is organized as follows. Related work is discussed in Section 2. Section 3 presents the formalization of the elements of the use case diagram and the activity diagram; in this section, we also present the definition of three consistency rules. Section 4 presents an example of a UML system model and the elements involved in consistency problems, and Section 5 concludes the paper.
2 Preliminaries
This section discusses the definition of consistency rules between UML use case and activity diagrams.
2.1 Formal Definition of UML Use Case Diagram (UCD) and Activity Diagram (AD)
Even though various researchers have given formal definitions of UML use case diagrams, each focuses on particular domains. Shinkawa [8] defines actors and use cases in a use case diagram and activities in an activity diagram as places and transitions in CPN. Use cases, actors and the relationships between them are expressed as a CFG by Chanda et al. [9]. One of their rules states that an actor has a relationship with another actor, which is either <<include>> or <<extend>>. This rule disagrees with the UML standard [11], in which <<include>> and <<extend>> are binary relationships between use cases. They also define elements of the activity diagram
such as activity, start node and object flow in the CFG. Sapna and Mohanty [7] define the elements of the use case diagram and the activity diagram using schema tables, but the definitions are limited to the use case, actor and activity elements.
2.2 Consistency Rules between UML UCD and AD
Chanda et al. [9] propose one consistency rule between the use case diagram and the activity diagram. They define an action/activity in the activity diagram as a one-to-one mapping to an event of a use case in the use case diagram, and they use the formalization of the syntax of each diagram to reason about the rule using the CFG. Shinkawa [8] also defines one consistency rule, in which each use case appears in at least one activity model; a use case and an activity diagram are defined as CPN models. Sapna and Mohanty [7] propose that each use case in a use case diagram must have a corresponding activity diagram, and that for each actor in a use case diagram there must exist a matching class in the activity diagram. Their consistency rules are expressed using the Object Constraint Language (OCL).
3 The Proposed Technique
In this section, we describe the formalization of the elements of the use case diagram and the activity diagram.
Definition 1. A system model SysModel consists of different UML diagrams.
3.1 Formalization of UCD
Notation 1. The set of all actors is denoted by Actor. For naming UML elements, we assume a non-empty set of names Name. The set of all use case diagrams is denoted by UCD. The set of all use cases is denoted by UseCase. The set of all associations between use cases and actors is denoted by Assoc.
Definition 2. An include is defined as a binary relation of use cases, i.e., include = (including, included) ∈ Include, where including, included ∈ UseCase.
Definition 3. An extend is defined as a binary relation of use cases, i.e., extend = (extending, extended) ∈ Extend, where extending, extended ∈ UseCase.
Definition 4. A Use Case Diagram (UCD) is defined as a 6-tuple consisting of a set of actors Actor, a set of use cases UseCase, a set of associations between
actors and use cases, a set of generalizations of actors and use cases, and sets of <<include>> and <<extend>> relationships between use cases, i.e., UCD = (Actor, UseCase, Assoc, Gen, Include, Extend).
Definition 5. An actor is an element of Actor and a name is an element of Name, where an actor has a name.
Definition 6. A use case is an element of UseCase, and a useCase has a name.
Definition 7. An assoc is an element of Assoc and is defined as a binary relationship between an actor and a use case, where actor ∈ Actor and useCase ∈ UseCase, i.e., assoc = (actor, useCase) ∈ Assoc.
Definition 8. A generalization is defined as a binary relationship that may hold between use cases and/or between actors, i.e., gen_actor = (actor_i, actor_j) ∈ Gen, where i ≠ j, actor_i is a parent, and actor_j is a child; and gen_useCase = (useCase_m, useCase_n) ∈ Gen, where m ≠ n, useCase_m is a parent, and useCase_n is a child. A generalization is irreflexive, asymmetric and transitive.
3.2 Formal Definition of AD
Notation 2. The set of all activity diagrams is denoted by AD. The set of all activity nodes is denoted by ACT. The set of all activity partitions is denoted by AP.
Definition 9. An activity diagram ad_useCase ∈ AD for a use case useCase consists of an initial node IN_useCase ∈ IN, a set of activity nodes ACT_useCase ⊆ ACT, and a set of activity partitions AP_useCase ⊆ AP.
Definition 10. An activity diagram ad_useCase ∈ AD must contain a start node IN_useCase ∈ IN.
Definition 11. An activity diagram ad_useCase ∈ AD can have zero or more activity nodes in ACT_useCase.
Definition 12. An activity diagram ad_useCase ∈ AD can have zero or more activity partitions in AP_useCase.
3.3 Formalization of Consistency Rules between UML UCD and UML AD
Consistency Rule 1. Each use case has at least one activity diagram.
Involved UML model elements: SysModel, useCase, AD
Definition 13. An inconsistency occurs when a use case in the use case diagram has no associated activity diagram, i.e., ∃useCase ∈ UseCase_SysModel : ∄ad_useCase ∈ AD_SysModel.
Consistency Rule 2. An actor that is associated with a use case must appear as an activity partition in the corresponding activity diagram.
Involved UML model elements: useCase, actor, AP, AD
Definition 14. An inconsistency occurs when an actor associated with a use case does not appear as an activity partition in the associated activity diagram, i.e., ∃actor : assoc(actor, useCase) ∧ ∄ap ∈ AP_useCase.
Consistency Rule 3. When an including use case includes an included use case, the activity diagram associated with the including use case must contain the included use case as an activity node.
Involved UML model elements: includingUseCase, includedUseCase, includingActivity, includedActivity
Definition 15. An inconsistency occurs when an including use case includes an included use case but the activity diagram of the including use case contains no activity node referring to the activity diagram of the included use case, i.e., ∃include = (including, included) ∈ Include, where including, included ∈ UseCase : act_included ∉ ACT_including.
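These three rules are mechanically checkable once a model is encoded as plain data. The following is a minimal Python sketch of such a checker; it is our own illustration (all type and function names are ours, not part of the paper's formalism), and it assumes that an activity node referring to an included use case X is labeled "AD: X", as in the diagrams of Section 4.

from dataclasses import dataclass

@dataclass
class UseCaseDiagram:
    actors: set
    use_cases: set
    assoc: set      # pairs (actor, use_case)
    include: set    # pairs (including, included)

@dataclass
class ActivityDiagram:
    use_case: str   # the use case this activity diagram describes
    partitions: set # activity partition names
    nodes: set      # activity node labels

def check_rule1(ucd, ads):
    # Rule 1: each use case must have at least one activity diagram.
    covered = {ad.use_case for ad in ads}
    return {uc for uc in ucd.use_cases if uc not in covered}

def check_rule2(ucd, ads):
    # Rule 2: every actor associated with a use case must appear as an
    # activity partition in the associated activity diagram.
    violations = set()
    for ad in ads:
        for actor, uc in ucd.assoc:
            if uc == ad.use_case and actor not in ad.partitions:
                violations.add((actor, uc))
    return violations

def check_rule3(ucd, ads):
    # Rule 3: the including activity diagram must contain a node that
    # refers to the activity diagram of the included use case.
    by_uc = {ad.use_case: ad for ad in ads}
    violations = set()
    for including, included in ucd.include:
        ad = by_uc.get(including)
        if ad is not None and ("AD: " + included) not in ad.nodes:
            violations.add((including, included))
    return violations

Each checker returns the set of elements that violate the corresponding rule; an empty set means the rule is fulfilled.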
4 Results and Discussion
This section shows an example of a UML model for a Lecture Assessment System (LAS). The model consists of a UML use case diagram and activity diagrams. The elements of each diagram are described, and we also show how the diagrams fulfill our proposed consistency rules.
4.1 UML Model for LAS
LAS is a web-based system that provides an effective application to help the management of a university monitor the progress of all students. It enables lecturers to input student marks electronically through the web and to produce student grades based on the total marks of assignments, tests, quizzes, laboratory work and the final examination. LAS also provides student attendance lists, is able to notify the lecturer, and produces warning letters to specific students. The system also produces several reports.
The requirements of LAS are captured and visualized using a use case diagram as shown in Fig. 1. The functionality of each use case in Fig. 1 is then modeled using at least one activity diagram. Based on Fig. 1, there should be at least five activity diagrams for LAS. Because of space limitations, only two activity diagrams are shown. Therefore, a system model for LAS is SysModel = {UCD_LAS, AD_LAS}.
4.2 UML Use Case Diagram of LAS
Fig. 1 shows a use case diagram for LAS.
[Figure: the use case diagram shows the actors Lecturer, Academic Advisor, Head of Department, Academic Management Officer, System Admin and Student Information System; the use cases Login, Manage Marks, Manage Student Attendance, Retrieve Report and Manage User; and <<include>> relationships from the other use cases to Login.]
Fig. 1. Use case diagram of LAS
Based on Fig. 1:
a. The use case diagram ucd_LAS is defined as a 5-tuple consisting of a set of actors Actor_LAS, a set of use cases UseCase_LAS, a set of associations between actors and use cases, a set of generalizations of actors, and a set of <<include>> relationships, i.e., ucd_LAS = (Actor_LAS, UseCase_LAS, Assoc_LAS, Gen_LAS, Include_LAS) ∈ UCD. In the diagram, there are no generalizations of use cases and no <<extend>> relationships between use cases.
b. There are six actors, named Lecturer, Academic Advisor, Head of Department, Academic Management Officer, System Admin and Student Information System, i.e., Actor_LAS = {Lecturer, Academic Advisor, Head of Department, Academic Management Officer, System Admin, Student Information System}.
c. Academic Advisor is a specialization of Lecturer, i.e., gen_AcademicAdvisor = (Lecturer, Academic Advisor).
d. There are five use cases, named Login, Manage Marks, Manage Student Attendance, Retrieve Report and Manage User, i.e., UseCase_LAS = {Login, Manage Marks, Manage Student Attendance, Retrieve Report, Manage User}.
e. The Manage Marks, Manage Student Attendance, Retrieve Report and Manage User use cases include the Login use case, i.e., Include_LAS = {(Manage Marks, Login), (Manage Student Attendance, Login), (Retrieve Report, Login), (Manage User, Login)}, with include_ManageMarks = (Manage Marks, Login), include_ManageStudentAttendance = (Manage Student Attendance, Login), include_RetrieveReport = (Retrieve Report, Login), include_ManageUser = (Manage User, Login) and include_Login = null.
f. The associations between actors and use cases are Assoc_LAS = {(Lecturer, Manage Marks), (Lecturer, Manage Student Attendance), (Academic Advisor, Manage Student Attendance), (Academic Advisor, Retrieve Report), (System Admin, Manage User), (Student Information System, Manage Marks)}.
4.3 UML Activity Diagrams of LAS
Based on Fig. 2:
a. It is the activity diagram for the Manage Marks use case, i.e., ad_ManageMarks ∈ AD_LAS.
b. There is an initial node, i.e., IN_ManageMarks ∈ IN_LAS.
c. There is an activity partition named Lecturer, i.e., Lecturer ∈ AP_ManageMarks.
d. There are thirteen activity nodes, such as "AD: Login" and "select subject code and section to be updated", i.e., "AD: Login" ∈ ACT_ManageMarks.
[Figure: within a Lecturer partition, the activity diagram contains nodes such as "AD: Login", "display list of subjects he/she taught for the current semester", "select subject code and section to be updated", "display list of students registered for the selected subject and code", "select from list of assessment (quiz, lab, assignment, project, test, final)", "set assessment", "input marks", "accumulate the assessment for each student", "select 'Submit' button" and "produce grade", with decision nodes "valid marks?" and "continue?" and a "prompt invalid message" branch.]
Fig. 2. Activity diagram for Manage Marks use case
[Figure: the activity diagram contains nodes such as "enter staff id and password", "check roles assigned", "AD: Manage Marks", "AD: Manage Student Attendance", "AD: Retrieve Report" and "Manage User", with a decision node "staff id and password valid?" and partitions for the roles Lecturer, Academic Advisor, Head of Department, Academic Management Officer and System Admin.]
Fig. 3. Activity diagram for Login use case
Based on Fig. 3:
a. It is the activity diagram for the Login use case, i.e., ad_Login ∈ AD_LAS.
b. There is an initial node, i.e., IN_Login ∈ IN_LAS.
c. There are six activity nodes, such as "AD: Manage Marks" and "check roles assigned", i.e., "AD: Manage Marks" ∈ ACT_Login.
4.4 Consistency Rules between UML Use Case Diagram and UML Activity Diagram
Based on Fig. 1 - Fig. 3:
a. For the Manage Marks use case in Fig. 1, there is an associated activity diagram in Fig. 2, i.e., Manage Marks ∈ UseCase_LAS and ad_ManageMarks ∈ AD_LAS. Fig. 1 and Fig. 2 fulfill Consistency Rule 1, i.e., ∃Manage Marks ∈ UseCase_LAS : ∃ad_ManageMarks ∈ AD_LAS.
b. For the Login use case in Fig. 1, there is an associated activity diagram in Fig. 3, i.e., Login ∈ UseCase_LAS and ad_Login ∈ AD_LAS. Fig. 1 and Fig. 3 fulfill Consistency Rule 1, i.e., ∃Login ∈ UseCase_LAS : ∃ad_Login ∈ AD_LAS.
c. In Fig. 1, Lecturer is associated with the Manage Marks use case, and Lecturer appears as a partition in the associated activity diagram in Fig. 2, i.e., assoc(Lecturer, Manage Marks) ∈ Assoc_LAS and Lecturer ∈ AP_ManageMarks. Fig. 1 and Fig. 2 fulfill Consistency Rule 2, i.e., ∃Lecturer : assoc(Lecturer, Manage Marks) ∧ Lecturer ∈ AP_ManageMarks.
d. In Fig. 1, Student Information System is associated with the Manage Marks use case, but Student Information System does not appear as a partition in the associated activity diagram in Fig. 2, i.e., assoc(Student Information System, Manage Marks) ∈ Assoc_LAS and
Student Information System ∉ AP_ManageMarks. Fig. 1 and Fig. 2 do not fulfill Consistency Rule 2, i.e., ∃Student Information System : assoc(Student Information System, Manage Marks) ∧ Student Information System ∉ AP_ManageMarks.
e. In Fig. 1, the Manage Marks use case includes the Login use case. Accordingly, in the activity diagram associated with the Manage Marks use case there is an activity node "AD: Login" that refers to the activity diagram of Login, i.e., include(Manage Marks, Login) ∈ Include_LAS and "AD: Login" ∈ ACT_ManageMarks. Fig. 1 - Fig. 3 fulfill Consistency Rule 3, i.e., ∃include(Manage Marks, Login) ∈ Include_LAS : "AD: Login" ∈ ACT_ManageMarks.
Based on the above example, the activity diagram associated with the Manage Marks use case in Fig. 2 is not consistent with the use case diagram defined in Fig. 1, because the Student Information System actor in Fig. 1 does not appear as an activity partition in Fig. 2. The activity diagram associated with the Login use case in Fig. 3, in contrast, is consistent with the use case diagram defined in Fig. 1. In the activity diagram for the Manage Marks use case in Fig. 2, there is an activity node that refers to the activity diagram for the Login use case; this is consistent with Fig. 1, where the Manage Marks use case includes the Login use case.
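The checks in this section can be reproduced with the illustrative sketch given after Definition 15 (again, our own encoding, not the authors' tooling):

ucd = UseCaseDiagram(
    actors={"Lecturer", "Academic Advisor", "Head of Department",
            "Academic Management Officer", "System Admin",
            "Student Information System"},
    use_cases={"Login", "Manage Marks", "Manage Student Attendance",
               "Retrieve Report", "Manage User"},
    assoc={("Lecturer", "Manage Marks"),
           ("Lecturer", "Manage Student Attendance"),
           ("Academic Advisor", "Manage Student Attendance"),
           ("Academic Advisor", "Retrieve Report"),
           ("System Admin", "Manage User"),
           ("Student Information System", "Manage Marks")},
    include={("Manage Marks", "Login"),
             ("Manage Student Attendance", "Login"),
             ("Retrieve Report", "Login"),
             ("Manage User", "Login")})

ads = [ActivityDiagram("Manage Marks", partitions={"Lecturer"},
                       nodes={"AD: Login", "input marks", "produce grade"}),
       ActivityDiagram("Login", partitions=set(),
                       nodes={"AD: Manage Marks", "check roles assigned"})]

print(check_rule2(ucd, ads))
# -> {('Student Information System', 'Manage Marks')}, matching item d
print(check_rule3(ucd, ads))
# -> set(): Rule 3 holds for the two diagrams shown, matching item e

check_rule1 would also report the three use cases whose activity diagrams are omitted here for reasons of space.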
5 Conclusion
UML is a popular modeling technique, especially in object-oriented software development. A UML model consists of different diagrams, offers views at different levels of abstraction, and receives contributions from different stakeholders and modelers. These factors open the door to consistency problems between diagrams. Even though research on this issue has increased, work on consistency driven by use cases is still lacking. With this motivation, we have proposed a formalization of use case and activity diagram elements. This formalization is then used to define new consistency rules between the diagrams using a logic approach. We intend to use this approach as the platform for conducting the next activity of an inconsistency management framework.
Acknowledgement
The authors would like to thank the Ministry of Higher Education, Malaysia for supporting this research under the Scheme of Academic Training Initiative (SLAI).
References
1. Huzar, Z., Kuzniarz, L., Reggio, G., Sourrouille, J.L.: Consistency Problems in UML-Based Software Development. In: Nunes, N.J., et al. (eds.) UML 2004. LNCS, vol. 3297, pp. 1–12. Springer, Heidelberg (2005)
2. Spanoudakis, G., Zisman, A.: Inconsistency Management in Software Engineering: Survey and Open Research Issues. World Scientific Pub. Co., New Jersey (2001)
3. Nugroho, A., Chaudron, M.R.V.: A survey into the rigor of UML use and its perceived impact on quality and productivity. In: Proceedings of the Second ACM-IEEE International Symposium on Empirical Software Engineering and Measurement, pp. 90–99. ACM, Kaiserslautern (2008)
4. Hubaux, A., Cleve, A., Schobbens, P.-Y., Keller, A., Muliawan, O., Castro, S., Mens, K., Deridder, D., Straeten, R.V.D.: Towards a Unifying Conceptual Framework for Inconsistency Management Approaches. Technical Report, University of Namur, Rue Grandgagnage (2009)
5. Lucas, F.J., Molina, F., Toval, A.: A Systematic Review of UML Model Consistency Management. Information and Software Technology 51, 1631–1645 (2009)
6. Fryz, L., Kotulski, L.: Assurance of System Consistency During Independent Creation of UML Diagrams. In: 2nd International Conference on Dependability of Computer Systems (DepCoS-RELCOMEX 2007), pp. 51–58. IEEE Press, Szklarska (2007)
7. Sapna, P.G., Mohanty, H.: Ensuring Consistency in Relational Repository of UML Models. In: 10th International Conference on Information Technology (ICIT 2007), pp. 217–222 (2007)
8. Shinkawa, Y.: Inter-Model Consistency in UML Based on CPN Formalism. In: 13th Asia Pacific Software Engineering Conference (APSEC 2006), pp. 414–418. IEEE Press, Kanpur (2006)
9. Chanda, J., Kanjilal, A., Sengupta, S., Bhattacharya, S.: Traceability of Requirements and Consistency Verification of UML UseCase, Activity and Class diagram: A Formal Approach. In: International Conference on Methods and Models in Computer Science 2009 (ICM2CS), pp. 1–4. IEEE Press, New Delhi (2009)
10. Straeten, R.V.D.: Inconsistency Management in Model-Driven Engineering: An Approach Using Description Logics. PhD Thesis, Vrije Universiteit Brussel (2005)
11. Object Management Group (OMG): OMG Unified Modeling Language™ (OMG UML) Superstructure version 2.2. Technical Report, Object Management Group (2009)
Rough Set Approach for Attributes Selection of Traditional Malay Musical Instruments Sounds Classification Norhalina Senan1, Rosziati Ibrahim1, Nazri Mohd Nawi1, Iwan Tri Riyadi Yanto2, and Tutut Herawan3 1
Faculty of Information Technology and Multimedia, Universiti Tun Hussein Onn Malaysia, Johor, Malaysia 2 Department of Mathematics, Universitas Ahmad Dahlan, Yogyakarta, Indonesia 3 Faculty of Computer System and Software Engineering, Universiti Malaysia Pahang {halina,rosziati,nazri}@uthm.edu.my, [email protected], [email protected]
Abstract. Feature selection has become an important research topic for musical instrument sounds in handling the problem of the 'curse of dimensionality'. In the literature, various feature selection techniques have been applied in this domain, focusing on Western musical instrument sounds. However, the study of feature selection using rough sets for non-Western musical instrument sounds, including Traditional Malay musical instruments, is inadequate and still needs intensive research. Thus, this paper proposes an alternative feature selection technique for Traditional Malay musical instrument sounds, using rough set theory based on the Maximum Degree of dependency of Attributes (MDA) technique proposed in [7]. The modeling process comprises eight phases: data acquisition, sound editing, data representation, feature extraction, data discretization, data cleansing, feature selection using the proposed technique, and feature validation via classification. The results show that the highest classification accuracy of 99.82% was achieved from the best 17 features with the 1-NN classifier. Keywords: Rough Set Theory; Dependency of Attributes; Feature selection; Traditional Malay musical instruments sounds dataset.
1 Introduction
One of the most common problems encountered in many data mining tasks, including music data analysis and signal processing, is the issue of the 'curse of dimensionality'. This problem concerns high-dimensional data with a massive number of attributes. Using the whole set of attributes is inefficient in terms of processing time and storage requirements. In addition, the data may be difficult to interpret, and classification performance may decrease.
It has also been proven that finding all possible reducts of an information system is an NP-hard problem. For that reason, various feature selection algorithms have been proposed [1]. One of the potential techniques is based on rough set theory. The theory of rough sets, proposed by Pawlak in the 1980s [2], is a mathematical tool for dealing with vagueness and uncertainty in data. In the feature selection problem, rough sets are applied with the aim of finding the minimal subset of attributes that is sufficient to generate the same classification accuracy as the whole set of attributes. This minimal feature set is known as a reduct. Banerjee et al. [3] also claimed that the concepts of reduct and core in rough set theory are relevant in feature selection for identifying the essential features amongst the non-redundant ones. Several attempts to use rough sets as a feature selection technique for musical instrument problems have been made [4-6]. However, all these studies were conducted on Western musical instruments. The study of rough sets for feature selection of non-Western musical instrument sounds is inadequate and still needs intensive research. Thus, in this paper, an alternative feature selection technique based on rough set theory for Traditional Malay musical instrument sounds is proposed. The technique is developed based on rough set approximation using the maximum dependency of attributes proposed in [7]. The main contribution of our work is to select the most significant features by ranking the relevant features based on the highest dependency of attributes in the dataset; the redundant features with similar dependency values are then deleted. To accomplish this study, the quality of the instrument sounds is first examined. Then, 37 features from two feature schemes, perception-based and Mel-Frequency Cepstral Coefficients (MFCC), are extracted [8]. In order to employ rough set theory, this original dataset (with continuous values) is then discretized into categorical values using the equal width and equal frequency binning algorithms [9]. Afterwards, a data cleansing process is carried out to remove the irrelevant features. The proposed technique is then adopted to rank and select the best feature set from the large number of features available in the dataset. Finally, the performance of the selected features, in terms of the accuracy rate achieved, is analyzed and compared. To do this, the open-source machine learning software known as Weka [10] is utilized with various classifier algorithms, including Support Vector Machine (SVM), k-Nearest Neighbour (k-NN) and Naïve Bayes. The rest of this paper is organized as follows: Section 2 discusses related work on feature selection for musical instrument sounds classification. Section 3 presents the theory of rough sets. Section 4 describes the details of the modeling process. A discussion of the results is presented in Section 5, followed by the conclusion in Section 6.
2 Related Work
An assortment of feature ranking and feature selection algorithms has been applied to musical instrument sounds classification [11-13]. Liu and Wan [11] carried out a study on classifying musical instruments into five families (brass, keyboard, percussion, string and woodwind) using NN, k-NN and a Gaussian mixture model (GMM). Three categories of feature schemes, namely temporal features, spectral
features and coefficient features (58 features in total), were exploited. Sequential forward selection (SFS) was used to choose the best features. The k-NN classifier using 19 features achieved the highest accuracy of 93%. In [12], Deng et al. conducted a study on selecting the best feature schemes based on their classification performance. They used 44 features from three categories of feature schemes: human perception, cepstral features and MPEG-7. To select the best features, three entropy-based feature selection techniques were utilized: Information Gain, Gain Ratio and Symmetrical Uncertainty. The performance of the selected features was assessed and compared using five classifiers: k-NN, Naïve Bayes, SVM, multilayer perceptron (MLP) and radial basis functions (RBF). They found that Information Gain produced the best classification accuracy, up to 95.5%, for the 20 best features with the SVM and RBF classifiers. Benetos et al. [13] applied a subset selection algorithm with a branch-and-bound search strategy for feature reduction. A combination of 41 features from general audio data, MFCC and MPEG-7 was used. Using the best 6 features, the non-negative matrix factorization (NMF) classifier yielded an accuracy rate of 95.2% at best. They found that the feature subset selection method adopted in their study was able to increase the classification accuracy. As mentioned in Section 1, studies of rough sets as a feature selection technique for musical instrument problems can be found in [4-6]. Wieczorkowska [5] conducted a study on rough sets for identifying the most important parameters for the musical instrument sounds classification process with a limited number of parameters. She found that the 16 reducts obtained from the 62 full features were able to produce a 90% accuracy rate. In [6], rough sets and neural networks were applied to classify musical instrument sounds on the basis of a limited number of parameters. Their findings show that rough sets produced the higher classification accuracy, up to 97.98%, exceeding the neural network by 0.18%. Overall, all these works demonstrate that reduced feature sets are able to produce high classification rates. On the other hand, Deng et al. [12] claimed that benchmarking is still an open issue in this area of research. This shows that the existing feature ranking approaches applied to various sound files may not work effectively under other conditions. Therefore, there is a significant need to explore other feature ranking methods with different types of musical instrument sounds in order to find the best solution.
3 Rough Set Theory
In this section, the basic concepts of rough set theory in terms of data representation are presented.
3.1 Information System
Data are often presented as a table whose columns are labeled by attributes, whose rows are labeled by objects of interest, and whose entries are attribute values. By an information system we mean a 4-tuple (quadruple) S = (U, A, V, f), where U is a non-empty finite set of objects, A is a non-empty finite set of attributes, V = ∪_{a∈A} V_a, where V_a is the domain
(value set) of attribute a, and f : U × A → V is a total function such that f(u, a) ∈ V_a for every (u, a) ∈ U × A, called the information (knowledge) function. In many applications, there is an outcome of classification that is known. This a posteriori knowledge is expressed by one (or more) distinguished attribute called the decision attribute; the process is known as supervised learning. An information system of this kind is called a decision system. A decision system is an information system of the form D = (U, A ∪ {d}, V, f), where d ∉ A is the decision attribute. The elements of A are called condition attributes.
3.2 Indiscernibility Relation
The notion of an indiscernibility relation between two objects can be defined precisely.
Definition 3.1. Let S = (U, A, V, f) be an information system and let B be any subset of A. Two elements x, y ∈ U are said to be B-indiscernible (indiscernible by the set of attributes B ⊆ A in S) if and only if f(x, a) = f(y, a) for every a ∈ B.
Obviously, every subset of A induces a unique indiscernibility relation. Notice that an indiscernibility relation induced by the set of attributes B, denoted by IND(B), is an equivalence relation. It is well known that an equivalence relation induces a unique partition. The partition of U induced by IND(B) in S = (U, A, V, f) is denoted by U/B, and the equivalence class in the partition U/B containing x ∈ U is denoted by [x]_B. Given an arbitrary subset X ⊆ U, X might not, in general, be expressible as a union of some equivalence classes in U. It means that it may not be possible to describe X precisely in S. Instead, X can be characterized by a pair of its approximations, called the lower and upper approximations. It is here that the notion of a rough set emerges.
3.3 Set Approximations
The indiscernibility relation will next be used to define the approximations, the basic concepts of rough set theory. The notions of lower and upper approximations of a set can be defined as follows.
Definition 3.2. Let S = (U, A, V, f) be an information system, let B be any subset of A and let X be any subset of U. The B-lower approximation of X, denoted by B(X), and the B-upper approximation of X, denoted by B̄(X), are defined respectively by
B(X) = {x ∈ U : [x]_B ⊆ X} and B̄(X) = {x ∈ U : [x]_B ∩ X ≠ ∅}.
The accuracy of approximation (accuracy of roughness) of any subset X ⊆ U with respect to B ⊆ A, denoted by α_B(X), is measured by
α_B(X) = |B(X)| / |B̄(X)|,
where |X| denotes the cardinality of X. For the empty set ∅, we define α_B(∅) = 1. Obviously, 0 ≤ α_B(X) ≤ 1. If X is a union of some equivalence classes of U, then α_B(X) = 1; thus, the set X is crisp (precise) with respect to B. If X is not a union of some equivalence classes of U, then α_B(X) < 1; thus, the set X is rough (imprecise) with respect to B [11]. This means that the higher the accuracy of approximation of a subset X ⊆ U, the more precise (the less imprecise) it is. Another important issue in database analysis is discovering dependencies between attributes. Intuitively, a set of attributes D depends totally on a set of attributes C, denoted C ⇒ D, if all values of attributes from D are uniquely determined by values of attributes from C. In other words, D depends totally on C if there is a functional dependency between the values of D and C. The formal definition of attribute dependency is given as follows.
Definition 3.3. Let S = (U, A, V, f) be an information system and let D and C be any subsets of A. D depends functionally on C, denoted C ⇒ D, if each value of D is associated with exactly one value of C.
3.4 Dependency of Attributes
Since an information system is a generalization of a relational database, a generalized concept of dependency of attributes, called partial dependency of attributes, is also needed.
Definition 3.4. Let S = (U, A, V, f) be an information system and let D and C be any subsets of A. Attribute set D depends on C in a degree k (0 ≤ k ≤ 1), denoted by C ⇒_k D, where

k = γ(C, D) = (Σ_{X∈U/D} |C(X)|) / |U|.    (1)
Obviously, 0 ≤ k ≤ 1. If all sets X are crisp, then k = 1. The expression Σ_{X∈U/D} C(X), called the lower approximation of the partition U/D with respect to C, is the set of all elements of U that can be uniquely classified into blocks of the partition U/D by means of C. D is said to depend fully on C if k = 1; otherwise, D depends partially (in a degree k) on C. Thus, D fully (partially) depends on C if all (some) elements of the universe U can be uniquely classified into equivalence classes of the partition U/D, employing C.
3.5 Reducts and Core
A reduct is a minimal set of attributes that preserves the indiscernibility relation. A core is the common part of all reducts. In order to express this idea more precisely, some preliminary definitions are needed.
Definition 3.5. Let S = (U, A, V, f) be an information system, let B be any subset of A, and let a belong to B. We say that a is dispensable (superfluous) in B if U/(B − {a}) = U/B; otherwise, a is indispensable in B.
To further simplify an information system, some dispensable attributes can be eliminated from the system in such a way that the objects in the table can still be discerned as in the original one.
Definition 3.6. Let S = (U, A, V, f) be an information system and let B be any subset of A. B is called an independent (orthogonal) set if all its attributes are indispensable.
Definition 3.7. Let S = (U, A, V, f) be an information system and let B be any subset of A. A subset B* of B is a reduct of B if B* is independent and U/B* = U/B.
Thus, a reduct is a set of attributes that preserves the partition. It means that a reduct is a minimal subset of attributes that enables the same classification of elements of the universe as the whole set of attributes. In other words, attributes that do not belong to a reduct are superfluous with regard to the classification of elements of the universe. While computing equivalence classes is straightforward, the problem of finding minimal reducts in information systems is NP-hard. Reducts have several important properties; one of them is the core.
Definition 3.8. Let S = (U, A, V, f) be an information system and let B be any subset of A. The intersection of all reducts of B is called the core of B, i.e., Core(B) = ∩ Red(B).
Thus, the core of B is the set of all indispensable attributes of B. Because the core is the intersection of all reducts, it is included in every reduct, i.e., each element of the core belongs to every reduct. Thus, in a sense, the core is the most important subset of attributes, for none of its elements can be removed without affecting the classification power of the attributes.
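The notions above translate directly into a few lines of code. The following is a minimal Python sketch of an information system with its partitions, lower approximations and the dependency degree of equation (1); it is our own illustration, not the authors' implementation, and rows are simply tuples of categorical attribute values indexed by position.

from itertools import groupby

def partition(rows, attrs):
    # U/B: equivalence classes induced by the attribute subset attrs
    key = lambda i: tuple(rows[i][a] for a in attrs)
    idx = sorted(range(len(rows)), key=key)
    return [frozenset(g) for _, g in groupby(idx, key=key)]

def lower_approx(rows, attrs, target):
    # B(X): union of the classes of U/B that are contained in target
    return {i for c in partition(rows, attrs) if c <= target for i in c}

def gamma(rows, c_attrs, d_attrs):
    # k = gamma(C, D) from equation (1): |POS_C(D)| / |U|
    pos = sum(len(lower_approx(rows, c_attrs, x))
              for x in partition(rows, d_attrs))
    return pos / len(rows)

These helpers are reused in the sketches of the proposed technique in Section 4.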
4 The Modeling Process
In this section, the process of this study is presented. There are eight main phases: data acquisition, sound editing, data representation, feature extraction, data discretization, data cleansing, feature selection using the proposed technique, and feature validation via classification. Figure 1 illustrates the phases of this process. To conduct this study, the pre-processing steps (from data representation to feature selection using the proposed model) are implemented in MATLAB version 7.6.0.324 (R2008a). Weka [10] is used for feature validation via classification using various techniques. All these
processes are executed on an Intel Core 2 Duo processor. The total main memory is 2 gigabytes and the operating system is Windows Vista. The details of the modeling process are as follows:
4.1 Data Acquisition, Sound Editing, Data Representation and Feature Extraction
The 150 sound samples of Traditional Malay musical instruments were downloaded from a personal web page [15] and the Warisan Budaya Malaysia web page [14]. The dataset comprises four different families: membranophones, idiophones, aerophones and chordophones. The distribution of the sounds into families is shown in Table 1. This original dataset is non-benchmark (real-world) data. It is well known that the quality of the data is one of the factors that might affect the overall classification task. For that reason, the dataset is first edited and trimmed. Afterwards, two categories of feature schemes, perception-based and MFCC features, are extracted. All 37 extracted features from these two categories are shown in Table 2. Features 1-11 represent the perception-based features, and features 12-37 are the MFCC features. The mean and standard deviation were then calculated for each of these features. The details of these phases can be found in [16].
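As a quick illustration of this extraction step (our own sketch, not the authors' MATLAB code; it assumes the third-party librosa library is available), the per-file means and standard deviations of a few of the Table 2 features can be computed as follows:

import librosa

def extract_features(path):
    y, sr = librosa.load(path, sr=None)    # keep the native sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # 13 x frames
    zcr = librosa.feature.zero_crossing_rate(y)          # 1 x frames
    feats = {"MEANZCR": zcr.mean(), "STDZCR": zcr.std()}
    for i in range(13):                    # features 12-37 of Table 2
        feats["MMFCC%d" % (i + 1)] = mfcc[i].mean()
        feats["SMFCC%d" % (i + 1)] = mfcc[i].std()
    return feats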
Fig. 1. The modelling process for feature selection of the Traditional Malay musical instruments sounds classification
Table 1. Data Sets
Family        | Instrument
Membranophone | Kompang, Geduk, Gedombak, Gendang, Rebana, Beduk, Jidur, Marwas, Nakara
Idiophone     | Gong, Canang, Kesi, Saron, Angklung, Caklempong, Kecerik, Kempul, Kenong, Mong, Mouth Harp
Aerophone     | Serunai, Bamboo Flute, Nafiri, Seruling Buluh
Chordophone   | Rebab, Biola, Gambus
Table 2. Features Descriptions
Number | Feature  | Description
1      | ZC       | Zero Crossing
2      | MEANZCR  | Mean of Zero Crossings Rate
3      | STDZCR   | Standard Deviation of Zero Crossings Rate
4      | MEANRMS  | Mean of Root-Mean-Square
5      | STDRMS   | Standard Deviation of Root-Mean-Square
6      | MEANC    | Mean of Spectral Centroid
7      | STDC     | Standard Deviation of Spectral Centroid
8      | MEANB    | Mean of Bandwidth
9      | STDB     | Standard Deviation of Bandwidth
10     | MEANFLUX | Mean of Flux
11     | STDFLUX  | Standard Deviation of Flux
12     | MMFCC1   | Mean of the MFCCs #1
13     | MMFCC2   | Mean of the MFCCs #2
14     | MMFCC3   | Mean of the MFCCs #3
15     | MMFCC4   | Mean of the MFCCs #4
16     | MMFCC5   | Mean of the MFCCs #5
17     | MMFCC6   | Mean of the MFCCs #6
18     | MMFCC7   | Mean of the MFCCs #7
19     | MMFCC8   | Mean of the MFCCs #8
20     | MMFCC9   | Mean of the MFCCs #9
21     | MMFCC10  | Mean of the MFCCs #10
22     | MMFCC11  | Mean of the MFCCs #11
23     | MMFCC12  | Mean of the MFCCs #12
24     | MMFCC13  | Mean of the MFCCs #13
Table 2. (continued)
25 | SMFCC1  | Standard Deviation of the MFCCs #1
26 | SMFCC2  | Standard Deviation of the MFCCs #2
27 | SMFCC3  | Standard Deviation of the MFCCs #3
28 | SMFCC4  | Standard Deviation of the MFCCs #4
29 | SMFCC5  | Standard Deviation of the MFCCs #5
30 | SMFCC6  | Standard Deviation of the MFCCs #6
31 | SMFCC7  | Standard Deviation of the MFCCs #7
32 | SMFCC8  | Standard Deviation of the MFCCs #8
33 | SMFCC9  | Standard Deviation of the MFCCs #9
34 | SMFCC10 | Standard Deviation of the MFCCs #10
35 | SMFCC11 | Standard Deviation of the MFCCs #11
36 | SMFCC12 | Standard Deviation of the MFCCs #12
37 | SMFCC13 | Standard Deviation of the MFCCs #13
4.2 Data Discretization
The features (attributes) extracted in the dataset take continuous, non-categorical values. In order to employ the rough set approach in the proposed technique, it is essential to transform the dataset into categorical form. For that, the discretization technique known as equal width binning [9] is applied. In this study, this unsupervised method is modified to suit the classification problem. The algorithm first sorts the continuous-valued attribute, then determines the minimum x_min and the maximum x_max of that attribute. The interval width w is then calculated by:
w=
xmax − xmin , k*
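A compact sketch of both binning steps (our reading of this description, not the authors' MATLAB code; it assumes NumPy) is:

import numpy as np

def equal_width_bins(values, k=3):
    # discretize one continuous attribute into k equal-width intervals
    x = np.asarray(values, dtype=float)
    w = (x.max() - x.min()) / k
    edges = x.min() + w * np.arange(1, k)   # boundaries x_min + w*i
    return np.digitize(x, edges)

def equal_frequency_bins(values, k=3):
    # divide the sorted values into k bins of roughly n/k instances each
    x = np.asarray(values, dtype=float)
    cuts = np.quantile(x, np.linspace(0, 1, k + 1)[1:-1])
    return np.digitize(x, cuts)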
In this study, the value of k is set to 3. Afterwards, the equal frequency binning method is used to divide the sorted continuous values into k intervals, where each interval contains approximately n/k data instances with adjacent values of each class.
4.3 Data Cleansing Using Rough Set
As mentioned in Section 1, the dataset used in this study is raw data obtained from multiple sources (non-benchmark data). In the sound editing and data representation phases, the reliability of the dataset was assessed. However, the dataset may still contain irrelevant features. Generally, the irrelevant features present in the dataset are features that have no impact on the processing performance. However, the
existence of these features in the dataset might increase the response time. For that reason, in this phase, a data cleansing process based on the rough set approach explained in Subsection 3.5 is performed to eliminate the irrelevant features from the dataset.
4.4 The Proposed Technique
In this phase, the construction of the feature selection technique using rough set approximation in an information system, based on the dependency of attributes, is presented. The idea of this technique is derived from [7]. The relation between the roughness of a subset X ⊆ U and the dependency between two attributes is first presented in Proposition 3.1.
Proposition 3.1. Let S = (U, A, V, f) be an information system and let D and C be any subsets of A. If D depends totally on C, then
α_D(X) ≤ α_C(X), for every X ⊆ U.
Proof. Let D and C be any subsets of A in information system S = (U, A, V, f). From the hypothesis, the inclusion IND(C) ⊆ IND(D) holds. Furthermore, the partition U/C is finer than U/D; thus, it is clear that any equivalence class induced by IND(D) is a union of some equivalence classes induced by IND(C). Therefore, for every x ∈ X ⊆ U, the equivalence classes satisfy
[x]_C ⊆ [x]_D.
Hence, for every X ⊆ U, we have the relation D(X) ⊆ C(X) ⊆ X ⊆ C̄(X) ⊆ D̄(X). Consequently,
α_D(X) = |D(X)| / |D̄(X)| ≤ |C(X)| / |C̄(X)| = α_C(X).
The generalization of Proposition 3.1 is given below.
Proposition 3.2. Let S = (U, A, V, f) be an information system and let C_1, C_2, ..., C_n and D be any subsets of A. If C_1 ⇒_{k_1} D, C_2 ⇒_{k_2} D, ..., C_n ⇒_{k_n} D, where k_n ≤ k_{n−1} ≤ ... ≤ k_2 ≤ k_1, then
α_D(X) ≤ α_{C_n}(X) ≤ α_{C_{n−1}}(X) ≤ ... ≤ α_{C_2}(X) ≤ α_{C_1}(X), for every X ⊆ U.
Proof. Let C_1, C_2, ..., C_n and D be any subsets of A in information system S. From the hypothesis and Proposition 3.1, the accuracies of roughness are given as
α_D(X) ≤ α_{C_1}(X), α_D(X) ≤ α_{C_2}(X), ..., α_D(X) ≤ α_{C_n}(X).
Since k_n ≤ k_{n−1} ≤ ... ≤ k_2 ≤ k_1, then
[x]_{C_n} ⊆ [x]_{C_{n−1}}, [x]_{C_{n−1}} ⊆ [x]_{C_{n−2}}, ..., [x]_{C_2} ⊆ [x]_{C_1}.
Obviously,
α_D(X) ≤ α_{C_n}(X) ≤ α_{C_{n−1}}(X) ≤ ... ≤ α_{C_2}(X) ≤ α_{C_1}(X). □
Figure 2 shows the pseudo-code of the proposed technique. The technique uses the dependency of attributes from rough set theory in information systems. It consists of five main steps. The first step computes the equivalence classes of each attribute (feature); the equivalence classes of the set of objects U can be obtained using the indiscernibility relation of attribute a_i ∈ A in information system S = (U, A, V, f). The second step determines the dependency degree of attributes, which can be computed using the formula in equation (1). The third step selects the maximum dependency degree. In the next step, the attributes are ranked based on the maximum dependency degree of each attribute. Finally, all redundant attributes with a similar maximum dependency degree are deleted.

Algorithm: FSDA
Input: Data set with categorical values
Output: Selected non-redundant attributes
Begin
Step 1. Compute the equivalence classes using the indiscernibility relation on each attribute a_i.
Step 2. Determine the dependency degree of attribute a_i with respect to all a_j, where i ≠ j.
Step 3. Select the maximum dependency degree of each attribute.
Step 4. Rank the attributes based on the maximum dependency degree of each attribute.
Step 5. Delete the redundant attributes with similar values of the maximum dependency degree.
End
Fig. 2. The FSDA algorithm
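FSDA can be written directly on top of the partition/gamma sketch from Section 3.5; the following is again our own illustrative code, including the tie-breaking rule described after Table 3 (among attributes tied on the maximum degree, the one with the larger next-highest degree survives).

def fsda(rows, attrs):
    # sorted dependency degrees of each attribute on every other attribute
    deps = {a: sorted((gamma(rows, [b], [a]) for b in attrs if b != a),
                      reverse=True)
            for a in attrs}
    # rank by maximum degree, breaking ties by the next-highest degrees
    ranked = sorted(attrs, key=lambda a: deps[a], reverse=True)
    selected, seen_max = [], set()
    for a in ranked:
        if deps[a][0] not in seen_max:  # keep one attribute per tied group
            selected.append(a)
            seen_max.add(deps[a][0])
    return selected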
The computation of the degree of dependency of attributes in an information system based on the formula in equation (1) is illustrated in Example 4.1.
Example 4.1. To illustrate finding the degree of dependency of attributes, consider an information system over U = {1, 2, 3, 4, 5} with attributes A, B, C and D. Based on each attribute, there are four partitions of U induced by the indiscernibility relation, i.e.,
U/A = {{1,2,5}, {3,4}}, U/B = {{1}, {2,3,4,5}}, U/C = {{1,2,3,4}, {5}} and U/D = {{1}, {2,5}, {3,4}}.
Based on the formula in equation (1), the degree of dependency of attribute B on attribute A, denoted A ⇒_k B, can be calculated as follows:
A ⇒_k B, k = (Σ_{X∈U/B} |A(X)|) / |U| = |{3,4}| / |{1,2,3,4,5}| = 0.4.
In the same way, the following degrees are obtained:
B ⇒_k C, k = (Σ_{X∈U/C} |B(X)|) / |U| = |{1}| / |{1,2,3,4,5}| = 0.2,
C ⇒_k D, k = (Σ_{X∈U/D} |C(X)|) / |U| = |{5}| / |{1,2,3,4,5}| = 0.2.
The degrees of dependency of all attributes in this example can be summarized as in Table 3.
Table 3. The degrees of dependency of the attributes in Example 4.1
Attribute | Degree of dependency on each other attribute | Maximum degree of dependency
A         | B: 0.2, C: 0.2, D: 1                          | 1
B         | A: 0.4, C: 0.2, D: 1                          | 1
C         | A: 0.4, B: 0.2, D: 0.6                        | 0.6
D         | A: 0.4, B: 0.2, C: 0.2                        | 0.4
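The degrees in Table 3 can be reproduced with the sketches given earlier; the five rows below are synthetic values of ours, chosen only so that the induced partitions match Example 4.1.

rows = [("a1", "b1", "c1", "d1"),
        ("a1", "b2", "c1", "d2"),
        ("a2", "b2", "c1", "d3"),
        ("a2", "b2", "c1", "d3"),
        ("a1", "b2", "c2", "d2")]
A, B, C, D = 0, 1, 2, 3
print(gamma(rows, [A], [B]))     # 0.4, i.e. A => B in degree 0.4
print(fsda(rows, [A, B, C, D]))  # [1, 2, 3]: B, C, D kept, A deleted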
From Table 3, the attributes A, B, C and D are ranked based on the maximum degree of dependency. It can be seen that attributes A and B have the same maximum degree of dependency. In order to select the best attributes and reduce the dimensionality, only one of the redundant attributes is chosen. To do
this, the selection approach in [7] is adopted, where it is suggested to look at the next-highest degree of dependency among the tied attributes, and so on, until the tie is broken. In this example, attribute A is deleted from the list.
4.5 Feature Evaluation via Classification
The performance of the selected features generated in Subsection 4.4 is then further evaluated using three different classifiers: SVM, k-NN and Naïve Bayes. Each classifier is run with 10-fold cross-validation. The SMO algorithm in Weka [10] is applied for the SVM classifier, with RBF kernels and a C value of 100. Kernel estimation during training is utilized for the Naïve Bayes classifier, and 1-NN is used in this study. All these classifiers are executed in Weka [10]. The accuracy rates achieved by the classifiers are then analyzed to identify the effectiveness of the selected features. Achieving a high accuracy rate is important to ensure that the selected features are the most relevant features, serving the classification architecture well and producing good results. At the end of this phase, the results for the full features and the selected features are compared, in order to identify the effectiveness of the selected features in handling the classification problem.
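An equivalent validation protocol can also be set up outside Weka; the following scikit-learn sketch is our own stand-in for the Weka configuration described above (SMO with an RBF kernel and C = 100, 1-NN, and Naïve Bayes under 10-fold cross-validation):

from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

def evaluate(X, y):
    models = {"SVM (RBF, C=100)": SVC(kernel="rbf", C=100),
              "1-NN": KNeighborsClassifier(n_neighbors=1),
              "Naive Bayes": GaussianNB()}
    for name, model in models.items():
        acc = cross_val_score(model, X, y, cv=10, scoring="accuracy")
        print(name, round(acc.mean(), 4))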
5 Results and Discussion
The aim of this study is to identify the best features using the proposed technique, and then to evaluate the performance of the selected features using three different classifiers: SVM, 1-NN and Naïve Bayes. The assessment of the performance is based on the accuracy rate obtained. From the data cleansing step, it is found that {MMFCC1, SMFCC1} is the dispensable (irrelevant) set of features. This means that the number of relevant features is 35 out of the 37 original full features. After that, the proposed technique is employed to identify the best feature set. As demonstrated in Table 4, all 35 relevant features are ranked based on the value of the maximum degree of attribute dependency. From the table, it is interesting to see that some of the features adopted in this study are redundant. In order to reduce the dimensionality of the dataset, only one of each group of redundant features is selected. It is revealed that the proposed feature selection technique is able to select the best 17 features out of the 35 available. The best selected features are given in Table 5. After that, two datasets, consisting of the full features and the selected features (generated by the proposed technique), are used as input to classify the Traditional Malay musical instrument sounds into four families: membranophone, idiophone, chordophone and aerophone. This approach is meant to assess the performance of the best 17 selected features compared to the 35 full features. For that, three different classifiers, SVM, 1-NN and Naïve Bayes, are exploited. From Table 6, the findings show that for all three classifiers, the dataset with full features achieves classification accuracy of about 95% or more. Of these classifiers, 1-NN produces the highest accuracy rate with 99.46%, followed by SVM with 98.39% and
Naïve Bayes with 94.98%. Even though the classification accuracy decreases with the reduced number of features (the selected features) for SVM and Naïve Bayes, it is still quite satisfactory, with overall performance above 90%.
Table 4. Feature ranking using the proposed method
Number of Features | Name of Features | Maximum Degree of Dependency of Attributes
3  | STDZCR   | 0.826165
36 | SMFCC12  | 0.655914
23 | MMFCC12  | 0.52509
24 | MMFCC13  | 0.52509
22 | MMFCC11  | 0.237455
30 | SMFCC6   | 0.208781
31 | SMFCC7   | 0.208781
1  | ZC       | 0.193548
37 | SMFCC13  | 0.1819
32 | SMFCC8   | 0.108423
33 | SMFCC9   | 0.108423
34 | SMFCC10  | 0.108423
35 | SMFCC11  | 0.108423
27 | SMFCC3   | 0.087814
29 | SMFCC5   | 0.087814
11 | STDFLUX  | 0.077061
21 | MMFCC10  | 0.077061
20 | MMFCC9   | 0.074373
6  | MEANC    | 0.065412
19 | MMFCC8   | 0.065412
18 | MMFCC7   | 0.056452
28 | SMFCC4   | 0.056452
7  | STDC     | 0.042115
8  | MEANB    | 0.042115
9  | STDB     | 0.042115
13 | MMFCC2   | 0.031362
16 | MMFCC5   | 0.031362
17 | MMFCC6   | 0.031362
5  | STDRMS   | 0.021505
Table 4. (continued)
10 | MEANFLUX | 0.011649
2  | MEANZCR  | 0
4  | MEANRMS  | 0
14 | MMFCC3   | 0
15 | MMFCC4   | 0
26 | SMFCC2   | 0
Table 5. The Best Selected Features
Number of Features | Name of Features | Maximum Degree of Dependency of Attributes
3  | STDZCR   | 0.826165
36 | SMFCC12  | 0.655914
23 | MMFCC12  | 0.52509
22 | MMFCC11  | 0.237455
30 | SMFCC6   | 0.208781
1  | ZC       | 0.193548
37 | SMFCC13  | 0.1819
32 | SMFCC8   | 0.108423
27 | SMFCC3   | 0.087814
11 | STDFLUX  | 0.077061
20 | MMFCC9   | 0.074373
6  | MEANC    | 0.065412
18 | MMFCC7   | 0.056452
7  | STDC     | 0.042115
13 | MMFCC2   | 0.031362
5  | STDRMS   | 0.021505
10 | MEANFLUX | 0.011649
Table 6. The Comparison of Features Performance of Three Classifiers
Features | SVM    | Naïve Bayes | 1-NN
All 35   | 98.39% | 94.98%      | 99.46%
Best 17  | 92.92% | 93.73%      | 99.82%
It is fascinating to see that the performance of the selected features increases for 1-NN, which produces the highest classification rate of 99.82%. These results show that the proposed feature selection technique is able to select the best features for Traditional Malay musical instrument sounds.
6 Conclusion
In this study, the effectiveness of the proposed rough-set-based feature selection technique for the problem of Traditional Malay musical instrument sounds is examined. To perform this task, two categories of feature schemes, perception-based and MFCC, consisting of 37 attributes, are extracted. In order to utilize rough set theory, the equal width and equal frequency binning discretization technique is then employed to transform the original dataset with continuous values (non-categorical features) into categorical form. The proposed technique is then adopted for feature selection through feature ranking and dimensionality reduction. Finally, three classifiers in Weka, SVM, 1-NN and Naïve Bayes, are used to evaluate the performance of the selected features in terms of the accuracy rate obtained. The findings show that the proposed technique is able to successfully generate the 17 best features from the 35 full features. The lower performance of the selected features with the SVM and Naïve Bayes classifiers shows that the technique may not scale well to certain classifiers and datasets. However, the highest classification rate, produced with the 1-NN classifier, shows that the proposed feature selection technique is able to identify the most relevant features and correspondingly reduce the complexity of the process. It also indicates that the proposed technique might perform well with other classifiers and datasets. All of this shows that more study is needed to overcome these problems. Thus, future work will investigate the effectiveness of the proposed technique in other musical instrument sound domains and apply different types of classifiers to validate the performance of the selected features.
Acknowledgements
This work was supported by Universiti Tun Hussein Onn Malaysia (UTHM). Special thanks to Dr. Musa Mohd Mokji from the Faculty of Electrical Engineering, Universiti Teknologi Malaysia, for his kindness in teaching Matlab programming for digital signal processing.
References
1. Guyon, I., Elisseeff, A.: An Introduction to Variable and Feature Selection. Journal of Machine Learning Research 3, 1157–1182 (2003)
2. Pawlak, Z.: Rough Sets. International Journal of Computer and Information Science 11, 341–356 (1982)
3. Banerjee, M., Mitra, S., Anand, A.: Feature Selection using Rough Sets. In: Banerjee, M., et al. (eds.) Multi-Objective Machine Learning. SCI, vol. 16, pp. 3–20. Springer, Heidelberg (2006)
4. Czyzewski, A.: Soft Processing of Audio Signals. In: Polkowski, L., Skowron, A. (eds.) Rough Sets in Knowledge Discovery 2: Applications, Case Studies and Software Systems, pp. 147–165. Physica-Verlag, Heidelberg (1998)
5. Wieczorkowska, A.: Rough Sets as a Tool for Audio Signal Classification. In: Ras, Z.W., Skowron, A. (eds.) ISMIS 1999. LNCS, vol. 1609, pp. 365–375. Springer, Heidelberg (1999)
6. Kostek, B., Szczuko, P., Zwan, P.: Processing of Musical Data Employing Rough Sets and Artificial Neural Networks. In: Tsumoto, S., et al. (eds.) RSCTC 2004. LNCS (LNAI), vol. 3066, pp. 539–548. Springer, Heidelberg (2004)
7. Herawan, T., Mustafa, M.D., Abawajy, J.H.: Rough set approach for selecting clustering attribute. Knowledge-Based Systems 23(3), 220–231 (2010)
8. Senan, N., Ibrahim, R., Nawi, N.M., Mokji, M.M.: Feature Extraction for Traditional Malay Musical Instruments Classification. In: Proceedings of the International Conference of Soft Computing and Pattern Recognition, SOCPAR 2009, pp. 454–459 (2009)
9. Palaniappan, S., Hong, T.K.: Discretization of Continuous Valued Dimensions in OLAP Data Cubes. International Journal of Computer Science and Network Security 8, 116–126 (2008)
10. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The WEKA Data Mining Software: An Update. SIGKDD Explorations 11(1) (2009)
11. Liu, M., Wan, C.: Feature Selection for Automatic Classification of Musical Instrument Sounds. In: Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2001, pp. 247–248 (2001)
12. Deng, J.D., Simmermacher, C., Cranefield, S.: A Study on Feature Analysis for Musical Instrument Classification. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 38(2), 429–438 (2008)
13. Benetos, E., Kotti, M., Kotropoulos, C.: Musical Instrument Classification using Non-Negative Matrix Factorization Algorithms and Subset Feature Selection. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2006, vol. 5, pp. 221–224 (2006)
14. Warisan Budaya Malaysia: Alat Muzik Tradisional, http://malaysiana.pnm.my/kesenian/Index.htm
15. Shriver, R.: Web page, http://www.rickshriver.net/hires.htm
16. Senan, N., Ibrahim, R., Nawi, N.M., Mokji, M.M., Herawan, T.: The Ideal Data Representation for Feature Extraction of Traditional Malay Musical Instrument Sounds Classification. In: Huang, D.-S., et al. (eds.) ICIC 2010. LNCS, vol. 6215, pp. 345–353. Springer, Heidelberg (2010)
Pairwise Protein Substring Alignment with Latent Semantic Analysis and Support Vector Machines to Detect Remote Protein Homology
Surayati Ismail 1, Razib M. Othman 1,*, and Shahreen Kasim 2
1 Laboratory of Computational Intelligence and Biotechnology, Universiti Teknologi Malaysia, 81310 UTM Skudai, Malaysia
2 Department of Web Technology, Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, 86400 Parit Raja, Batu Pahat, Malaysia
[email protected], [email protected], [email protected]
Abstract. Remote protein homology detection is widely used as part of the analysis of protein structure and function. In this study, the quality of the protein feature vectors is the key aspect in detecting remote protein homology, since good feature vectors assist the discriminative classifier model in precisely separating proteins into homologue and non-homologue members. To be of good quality, the feature vectors must contain rich protein structural similarity information and be represented in a low dimension, free of contaminated data. This study investigates the contaminated data that originates from the protein dataset; such data may prevent a remote protein homology detection framework from producing the best representation of protein structural similarity information. To reduce the contaminated data and extract structural similarity information, the extraction of protein feature vectors and of protein similarity is examined. Extracting protein feature vectors of good quality is believed to give better results for remote protein homology detection: feature vectors containing useful protein similarity information and represented in a low dimension can be used by the discriminative classifier model to identify protein families precisely. Based on these considerations, a method is introduced that combines Protein Substring Scoring (PSS) and Pairwise Protein Substring Alignment (PPSA) from the sequence comparison model, chi-square and Singular Value Decomposition (SVD) from the generative model, and a Support Vector Machine (SVM) as the discriminative classifier model.

Keywords: Remote Protein Homology Detection; Protein Substring Scoring; Pairwise Protein Substring Alignment; Latent Semantic Analysis; Support Vector Machines.
* Corresponding author.
1 Introduction
Remote protein homology detection is one of the methods that has been widely used by researchers to manage protein sequences by classifying proteins into their respective families [14]. Protein family classification is a technique that can give clues for curing genetic diseases and for drug design [4, 5, 6]. Nowadays, many methods for remote protein homology detection have been developed, and each has its own way of producing the best result. Most methods focus on handling a particular problem in order to achieve better results, such as handling hard-to-align proteins [13], improving the sensitivity of the sequence comparison model in detecting protein similarity [4], or handling complex classification [20]. Basically, remote protein homology detection can be divided into three models [16]: the sequence comparison model, the generative model, and the discriminative classifier model. In the sequence comparison model, the Smith-Waterman algorithm is a well-known algorithm for pairwise sequence comparison because of its ability to produce accurate results, at the cost of efficiency [8]. Many methods apply the Smith-Waterman algorithm, such as those of Zaki and Deris [22], Mohseni-Zadeh et al. [17], and Liao and Noble [16]. Other sequence comparison methods are BLAST [1] and FASTA [18], which prioritize speed over accuracy. Generative models such as the Hidden Markov Model (HMM) and Latent Semantic Analysis (LSA) have been applied to remote protein homology detection with great success: the HMM is capable of handling big protein datasets, whereas LSA is capable of handling high-dimensional protein feature vectors. Discriminative classifier algorithms such as the Support Vector Machine (SVM) have been used to separate each given structural protein class from the 'rest of the world' [2]. The SVM discriminates positive and negative members with an appropriate kernel. As with the other models, particular SVM variants handle particular problems: SVM-Fisher [12] was developed to handle multiple domains, SVM-String-Scoring (SVM-SS, Zaki and Deris [22]) to handle big datasets, and SVM-Pattern-LSA [7] to handle noisy data. All of these methods have succeeded in producing improved results compared to the other methods. The success of the SVM in classification actually depends on the choice of quality protein feature vectors to describe the similarity of each protein [8]. Identifying protein feature vectors of good quality requires further effort, focused on finding a good representation of the similarity between protein sequences. The major problems in finding such a representation are the lack of protein similarity information and the high-dimensional protein feature vectors caused by redundant and noisy data. These are the problems that normally prevent the SVM from producing precise results.
In order to get a maximum margin, which leads to more homologue protein members being detected, a string kernel is applied in the SVM; whereas, to extract rich protein similarity information, Protein Substring Scoring (PSS) and Pairwise Protein Substring Alignment (PPSA) are applied. These methods extract protein similarity information by checking the protein sequence region by region.
On the other hand, the redundant and noisy data which cause high-dimensional protein feature vectors are reduced by using chi-square and Singular Value Decomposition (SVD), respectively. Chi-square is among the most effective feature selection methods in document classification [21], whereas SVD [15] is an efficient feature extraction method. Examples of methods that use the SVD model are SVM-Ngram, SVM-Motif [8], and SVM-Pattern-LSA [7]; these methods significantly improve the performance of remote protein homology detection. In this paper, three models are combined to detect remote protein homology: the sequence comparison model, the generative model, and the discriminative classifier model. The sequence comparison model uses PSS to split the protein sequence into protein substrings, which assists PPSA in extracting protein similarity based on sensitive and non-sensitive regions [22]; the generative model applies chi-square to reduce redundant data and SVD to remove noisy data, which at the same time extracts and represents protein similarity information in protein feature vectors; and the discriminative classifier model applies the SVM method to discriminate homologous and non-homologous proteins [3]. To measure the quality of the results, Receiver Operating Characteristic (ROC) scores, the Median Rate of False Positives (MRFP), and a family-by-family comparison of ROC scores are used. The ROC score is the normalized area under a curve that plots true positives against false positives for different possible classification thresholds. The MRFP is the number of false positives scoring as high as or better than the median-scoring true positives. The family-by-family comparison of ROC scores compares the ROC score of every family; the comparison is between the numbers of true positives and false positives of the 54 families at different possible classification thresholds. Experimental results show that the proposed use of the LSA method produces better results than the other methods, namely SVM-Ngram, SVM-Pattern, SVM-Motif, SVM-Ngram-LSA, SVM-Motif-LSA, SVM-Pattern-LSA, SVM-Fisher, and SVM-String-Scoring.
2 Methods
SVM-SS-LSA, as shown in Fig. 1, is an enhanced method to detect remote protein homology. SVM-SS-LSA uses protein substrings to check the protein sequence region by region in order to obtain the highest similarity information, which is measured using the PPSA method. This similarity information is then used to build good-quality protein feature vectors of low dimension with less redundant and noisy data. To represent the protein feature vectors in a low dimension with less redundant and noisy data, chi-square and SVD from LSA are applied, respectively. The chi-square algorithm [21] reduces redundant data by removing data based on the chi-square independence value, using the distribution with one degree of freedom to judge extremeness, whereas SVD [12] is performed on the similarity information to remove noisy data and produce the protein feature vectors. Then, the resulting low-dimensional, good-quality protein feature vectors are classified by the SVM.
Fig. 1. Overview of SVM-SS-LSA method
3 Dataset
To assess the performance of SVM-SS-LSA, standard evaluation datasets [16] are used. The datasets are taken from the SCOP database (http://scop.mrc-lmb.cam.ac.uk/scop/), version 1.53. Protein sequences were selected using an E-value threshold of 10⁻²⁵, yielding 4352 distinct protein sequences grouped into families and superfamilies. Protein sequences within the family and in the same superfamily are taken as positive training examples, whereas negative training examples are taken from outside the family's fold. Details of the complete datasets and the various families can be found at http://www.cs.columbia.edu/compbio/svmpairwise.
4 Sequence Comparison Model
4.1 Protein Substrings
The purpose of producing protein substrings is to obtain all the possible substrings of amino acids based on sensitive and non-sensitive regions. To obtain protein substrings,
the Protein Substring Scoring (PSS) method [22] is used. The method is implemented by simply sliding a window of length k > 1 over the protein sequence. The method is illustrated with the following protein sequence:

>d3sdha_ 1.1.1.1.1 Hemoglobin I {Ark clam (Scapharca inaequivalvis)}
SVYDAAAQLTADVKKDLRDSWKVIGSDKKGNGVALMTTLFADNQETI

Assume k = 15. Sliding the window over the sequence yields three protein substrings:
1. SVYDAAAQLTADVKK
2. DLRDSWKVIGSDKKG
3. NGVALMTTLFADNQETI

For the last slide of the window over the protein sequence, if the remaining stretch of amino acids is shorter than k (here, k = 15), it is appended to the previous protein substring; this is why the third substring above has length 17.

4.2 Pairwise Protein Substring Alignment
Pairwise protein substring alignment (PPSA) is used to compare protein substrings with the aim of inferring structural, functional, and evolutionary relationships. In this study, pairwise protein substring alignment based on the Smith-Waterman algorithm is used; Smith-Waterman is one of the algorithms from the sequence comparison model. The main purpose of PPSA is to determine an optimal alignment, region by region, between two protein substrings. The Smith-Waterman algorithm, a dynamic programming method, is used to determine the distance or similarity between protein sequences, where the distance is defined by the number of insertions, deletions, and replacements of characters. The Smith-Waterman algorithm is chosen because it produces accurate results in terms of both sensitivity and selectivity [1]. In this study, the protein substrings produced by the PSS method are used as input. The PPSA process begins by calculating the structural similarity score between protein substrings and then searches for the optimal alignment by tracing back through the similarity matrix. To make the process clearer, PPSA works as follows. For two protein substrings A and B, let the length of A be g, |A| = g, and the length of B be m, |B| = m; V(d,e) is the optimal alignment score of the two prefixes A[1]…A[d] and B[1]…B[e]. The calculation of V(d,e) is defined by Equations 1 and 2.

Initialization:
V(d,0) = 0, 0 ≤ d ≤ g;   V(0,e) = 0, 0 ≤ e ≤ m .   (1)

Recursion relation:
V(d,e) = max{ 0,  V(d-1,e-1) + (A[d],B[e]),  V(d-1,e) + (A[d],-),  V(d,e-1) + (-,B[e]) },  1 ≤ d ≤ g, 1 ≤ e ≤ m .   (2)
In these formulas, a "-" stands for a null character or gap; V(d,0) is the result of comparing each character in A with a gap in B; V(0,e) is the counterpart, comparing each character in B with a gap in A; and (A[d],B[e]) is the value from the substitution matrix. While calculating the similarity matrix, the score of any matrix element V(d,e) always depends on the scores of three other elements: the upper-left neighbor V(d-1,e-1), the left neighbor V(d,e-1), and the upper neighbor V(d-1,e). Therefore, the calculation proceeds from the top-left element to the bottom-right element. From the structure of this calculation, it follows that at each clock cycle all elements on the same anti-diagonal of the matrix can be calculated simultaneously. To further describe the level of similarity between two protein substrings, an affine gap model was introduced into the Smith-Waterman algorithm by Gotoh [11]. In the affine gap model, gaps compensate for insertions or deletions, making the alignment more condensed so that it satisfies the expected model; a gap is usually a consecutive run of null characters in a sequence and should be as long as possible. The penalty for the first position of a gap is called the gap open penalty, and the penalty for each following position is called the gap extension penalty. Under the affine gap model, the formulas to calculate the similarity matrix are as follows.

Initialization:
V(d,0) = E(d,0) = 0, 0 ≤ d ≤ g;   V(0,e) = F(0,e) = 0, 0 ≤ e ≤ m .   (3)
Recursion relation:
V(d,e) = max{ 0,  E(d,e),  F(d,e),  V(d-1,e-1) + (A[d],B[e]) },  1 ≤ d ≤ g, 1 ≤ e ≤ m ,   (4)

E(d,e) = max{ V(d,e-1) - α,  E(d,e-1) - β },  1 ≤ d ≤ g, 1 ≤ e ≤ m ,   (5)
F(d,e) = max{ V(d-1,e) - α,  F(d-1,e) - β },  1 ≤ d ≤ g, 1 ≤ e ≤ m .   (6)
In these formulas, α stands for the gap open penalty and β for the gap extension penalty. E(d,e) and F(d,e) are the maxima over the following two choices: open a new gap or keep extending an existing gap. The protein substrings produced by the split method are presented as j-dimensional protein feature vectors, where j is the total number of protein substrings. All the protein substrings are scored against the protein substrings of interest. The alignment scores are based on the notion of distance, counting the number of transformations required to obtain the second protein substring from the first; transformations include substituting one character for another, inserting a string of characters, or deleting a string of characters. After all protein substrings have been aligned, the protein structural similarity score for every alignment needs to be found; the calculation of the score refers to the BLOSUM50 substitution matrix. After the protein structural similarity scores for every alignment are calculated, the optimal alignment is obtained as the one with the highest protein structural similarity score among the possible alignments of the two protein substrings, and it is used in the next process.
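To make Sects. 4.1 and 4.2 concrete, here is a minimal Python sketch of the window split and of the affine-gap recursion in Eqs. (3)-(6). It is a simplified reading of the method rather than the authors' implementation: the toy match/mismatch function stands in for the BLOSUM50 matrix, and the gap penalties α = 10 and β = 2 are illustrative assumptions.

```python
import numpy as np

def split_substrings(seq, k=15):
    """PSS-style split: slide a window of length k and fold any remainder
    shorter than k into the final substring (as described in Sect. 4.1)."""
    subs = [seq[i:i + k] for i in range(0, len(seq), k)]
    if len(subs) > 1 and len(subs[-1]) < k:
        subs[-2] += subs.pop()
    return subs

def smith_waterman_affine(a, b, sub, gap_open=10, gap_ext=2):
    """Best local alignment score with affine gaps, following Eqs. (3)-(6).
    `sub` is a substitution function, e.g. a BLOSUM50 lookup."""
    g, m = len(a), len(b)
    V = np.zeros((g + 1, m + 1))
    E = np.zeros((g + 1, m + 1))   # scores ending in a gap extension along b
    F = np.zeros((g + 1, m + 1))   # scores ending in a gap extension along a
    best = 0.0
    for d in range(1, g + 1):
        for e in range(1, m + 1):
            E[d, e] = max(V[d, e - 1] - gap_open, E[d, e - 1] - gap_ext)
            F[d, e] = max(V[d - 1, e] - gap_open, F[d - 1, e] - gap_ext)
            V[d, e] = max(0.0, E[d, e], F[d, e],
                          V[d - 1, e - 1] + sub(a[d - 1], b[e - 1]))
            best = max(best, V[d, e])
    return best

subs = split_substrings("SVYDAAAQLTADVKKDLRDSWKVIGSDKKGNGVALMTTLFADNQETI")
# Toy substitution function: +5 for a match, -4 for a mismatch.
print(smith_waterman_affine(subs[0], subs[1], lambda x, y: 5 if x == y else -4))
```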
5 Generative Model
5.1 Protein Words
Like strings of letters and words in a text, protein sequences are linear chains of amino acids. Each position in the chain is one of the 20 standard amino acid residues, labeled (A C D E F G H I K L M N P Q R S T V W Y). A string of amino acids is called a protein word. In LSA, each protein substring that belongs to a particular class is treated as a document containing protein words; a document is composed of bags-of-X, where X can be any basic building block of the protein substring [7]. Protein words are not as clear-cut as words in a natural language, due to the absence of any linguistic information. In this paper, protein patterns are selected to produce the basic building blocks of the documents. A protein pattern uses a 'don't care' symbol to represent any of the 20 amino acid residues; for example, the protein pattern C..LH is present in both CVWLH and CKELH, where C, L and H are solid characters and the '.' symbol denotes 'don't care'. Let Σ be the set of 20 amino acid residues and let {.} stand for any residue or gap. Every pattern has a score, calculated using score matrices which associate a weight with each pair of characters. A protein pattern is a string of the form

Σ (Σ ∪ {.})* Σ ,   (7)
that is, a string on the alphabet Σ ∪ {.} that starts and ends with a solid character. The length of a string l (in Σ* or (Σ ∪ {.})*) is denoted by |l|, the letter at position i in l is denoted by l[i], and l is written as l = l[0] l[1] … l[|l|-1].
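Because the 'don't care' symbol coincides with the regular-expression wildcard, pattern occurrence can be checked directly with Python's re module. A small sketch follows; the test strings are made up for illustration.

```python
import re

# The 'don't care' symbol of a protein pattern behaves like the regex
# wildcard '.', so a pattern such as C..LH can be tested directly.
pattern = re.compile("C..LH")
for substring in ("CVWLH", "CKELH", "MKCAALHD", "CLH"):
    print(substring, bool(pattern.search(substring)))  # True, True, True, False
```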
5.2 Protein Pattern Blocks
To extract protein patterns, the TEIRESIAS algorithm [19] is used. TEIRESIAS is implemented in two phases: scanning and convolution. In the scanning phase, elementary protein patterns with sufficient support are identified: the scan builds and locates all elementary protein patterns supported by at least K distinct sequences. Here, L is the number of amino acid residues in a protein sub-pattern and G is the length of a protein sub-pattern, where a protein sub-pattern is a substring of a protein pattern F that is itself a protein pattern. All elementary protein patterns then constitute the building blocks for the convolution phase, which makes it easy to identify and discard non-maximal protein patterns. To generate maximal protein patterns, an all-against-all approach is used, in which the elementary protein patterns are combined into progressively larger protein patterns until the maximal protein patterns are generated. In this study, the TEIRESIAS algorithm is applied to the testing and training protein sequences. The parameter settings used were L = 3, G = 35, and K = 7. Parameter L was set to 3 because this is the smallest value for which the benefits of convolution become apparent, as the prefixes and suffixes used are non-trivial; L > 3 affects the performance of the algorithm by decreasing the convolution time while increasing the scanning time. For parameter G, larger values were tried but no more substantial protein patterns were discovered. K = 7 suits the value L = 3 because there is then a subset of at least K input sequences exhibiting an extensive degree of similarity. With these parameters, a total of 75811 protein patterns were extracted. The extracted protein patterns contain redundant information, and this raises a problem for machine learning algorithms, which struggle to perform well in a high-dimensional feature space built from high-dimensional protein feature vectors. The following sections explain the procedure in detail.

5.3 Chi-Square
Given the problem noted above, that redundant data exists in the original dataset, it is desirable to reduce the dimension of the protein feature vectors by reducing redundant data. This problem can be addressed with feature selection methods such as document frequency, information gain, mutual information, chi-square, and term strength [21]. In this study, the chi-square algorithm is selected because it is one of the most effective feature selection methods in document classification [21]. The chi-square (x²) statistic measures the lack of independence between a feature t and a classification category c, and can be compared to the chi-square distribution with one degree of freedom to judge extremeness. The chi-square value x²(t,c) of feature t relative to category c is defined as follows:
x²(t,c) = N · (A·D - C·B)² / [ (A + C) · (B + D) · (A + B) · (C + D) ] ,   (8)
where N is the total number of documents, A is the number of times t and c co-occur, B is the number of times t occurs without c, C is the number of times c occurs without t, and D is the number of times neither c nor t occurs. The chi-square statistic has a natural value of zero if t and c are independent. The chi-square value is computed between each unique term in the training corpus and each category. Then, the category-specific scores of each feature are combined into two scores as follows:

x²_avg(t) = Σ_{i=1}^{m} Pr(c_i) · x²(t, c_i) ,   (9)

x²_max(t) = max_{i=1}^{m} { x²(t, c_i) } .   (10)
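A minimal sketch of these statistics in Python; the contingency counts below are made up for illustration, and the paper's own counts, which come from the 75811 extracted patterns, are not reproduced here.

```python
def chi_square(A, B, C, D):
    """Eq. (8): chi-square value of feature t relative to category c.
    A: t and c co-occur; B: t without c; C: c without t; D: neither."""
    N = A + B + C + D                        # total number of documents
    den = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / den if den else 0.0

def chi_square_max(counts):
    """Eq. (10): combine per-category scores by taking the maximum.
    `counts` holds one (A, B, C, D) tuple per category."""
    return max(chi_square(A, B, C, D) for A, B, C, D in counts)

# Two made-up categories for one feature: strongly associated with the
# first category, nearly independent of the second.
print(chi_square_max([(40, 10, 5, 45), (12, 38, 13, 37)]))
```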
In this method, the maximum value x²_max(t) is used, since its performance is better than that of the average value.

5.4 Singular Value Decomposition
SVD performed on the protein patterns can remove the noisy data, reducing the dimension of the protein feature vectors in order to obtain a good-quality representation [9]. To remove noisy data from the protein patterns, all the protein patterns first need to be converted into protein feature vectors. In this study, the protein pattern similarities are treated as words and the protein sequences are viewed as documents. By collecting the weight of each word in each document, the word-document matrix is constructed, and LSA is then performed on this matrix to produce the protein feature vectors, leading to noise removal. To extract good-quality protein feature vectors with less noisy data, SVD decomposes the protein pattern matrix into three components:

W = U S Jᵀ ,   (11)
where S is the R×R diagonal matrix of singular values, U is the C×R matrix of eigenvectors derived from the protein pattern correlation matrix given by WWᵀ, and J is the D×R matrix of eigenvectors derived from the document-document correlation matrix given by WᵀW. The decomposition of W leads to the dimensionality reduction; this occurs because SVD makes similar data appear more similar. In the reduced versions of U and J, the protein feature vectors contain components ordered from the most to the least amount of variation accounted for in the original data. Only the top dimensions, those for which the elements of S are greater than a threshold, are considered for further processing; by deleting the protein feature vector dimensions which do not exhibit meaningful variation, the noisy data in the representation is effectively eliminated. Thus, the dimensions of the matrices U, S and J are reduced, leading to data noise removal. The protein feature vectors are thereby transformed into low-dimensional protein feature vectors that contain only the elements accounting for the most significant correlations among protein patterns in the original dataset.
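In NumPy, this truncation can be sketched as follows; the matrix sizes and the threshold value are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((200, 54))   # stand-in word-document matrix: 200 patterns x 54 docs

U, s, Jt = np.linalg.svd(W, full_matrices=False)   # W = U @ diag(s) @ Jt, Jt = J^T

threshold = 1.0                                    # illustrative threshold
R_hat = int(np.sum(s > threshold))                 # keep dimensions with large variation
W_denoised = U[:, :R_hat] @ np.diag(s[:R_hat]) @ Jt[:R_hat, :]

# Each column is one document's low-dimensional protein feature vector.
doc_vectors = np.diag(s[:R_hat]) @ Jt[:R_hat, :]
print(R_hat, doc_vectors.shape)
```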
6 Discriminative Classifier Model
6.1 Support Vector Machines
The Support Vector Machine (SVM) is a class of machine learning method based on statistical learning theory that has shown excellent performance in practice. The SVM addresses the general problem of learning a linear decision boundary to discriminate between positive and negative members of a given class of n-dimensional vectors. The basic idea is to map the n-dimensional vectors into a feature space, determined by the choice of kernel function; the SVM then constructs a hyperplane fitting a linear or non-linear boundary that separates the two classes of vectors with the maximum margin of separation. A maximum margin, or optimal hyperplane, leads to maximal generalization when predicting the classification of unlabeled examples. The samples of the training set E used in this method consist of input vectors

x_i ∈ R^d (i = 1, …, n)   (12)

with corresponding labels

y_i ∈ {+1, -1} (i = 1, …, n) ,   (13)
where R^d refers to the input space, and +1 and -1 stand for the two classes, respectively. If the analysis involves a two-category target variable with two predictor variables, a linear classification rule f, defined by the pair (w, b), is used:

f(x) = (w · x) + b ,   (14)

where x is classified as positive if f(x) > 0 and as negative if f(x) < 0. Geometrically, the decision boundary is the hyperplane
{ x ∈ R^d : w₀ · x + b₀ = 0 } .   (15)
For analyses with more than two predictor variables, the SVM needs to separate the points with a non-linear classifier. To do so, the SVM maps the data by a function Φ into a higher-dimensional feature space and defines a separating hyperplane there. The kernel of the SVM is

K(x, z) = ( Φ(x) · Φ(z) ), for all x, z ∈ R^d ,   (16)

which is used to realize a non-linear mapping into the feature space; K(x, z) is a symmetric function. The SVM can find the appropriate hyperplane in the feature space that corresponds to a decision boundary in the input space. The concept of a kernel mapping function is very powerful: it allows SVM models to perform separations even with very complex boundaries. All these capabilities make the SVM an attractive classification system. In this study, the process of applying the SVM begins with the testing dataset, which is vectorized in the same way as the training dataset. It is fed into the classifier constructed for a given class, which separates positive from negative members. The SVM assigns each protein in the testing dataset a discriminative score which indicates its predicted positive level; proteins with discriminative scores higher than the threshold of zero are classified as positive members, and the others as negative members. This process is iterated until all proteins are tested. To implement this process, the Gist 2.3 SVM package of Liao and Noble [16] is used; it is available at http://bioinformatics.ubc.ca/gist/download.html.
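The decision step can be sketched with any SVM library. The snippet below uses scikit-learn on random stand-in vectors purely for illustration; the study itself used the Gist package, and the RBF kernel here is an assumption, not the kernel reported by the authors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 20))                  # stand-in protein feature vectors
y_train = np.where(rng.normal(size=100) > 0, 1, -1)   # +1 homologue, -1 non-homologue
X_test = rng.normal(size=(10, 20))

clf = SVC(kernel="rbf").fit(X_train, y_train)
scores = clf.decision_function(X_test)                # discriminative score per protein
labels = np.where(scores > 0, 1, -1)                  # threshold of zero, as in the text
print(labels)
```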
7 Results and Discussion
The performance of the proposed method is compared with currently successful homology detection methods: methods that combine SVM with LSA (SVM-Pattern-LSA, SVM-Ngram-LSA, and SVM-Motif-LSA), SVM without LSA (SVM-Pattern, SVM-Ngram, and SVM-Motif), and SVM with a kernel matrix (SVM-String-Scoring and SVM-Fisher). To assess the recognition performance of each method, it is tested on its ability to classify protein sequences into homologue or non-homologue members in SCOP version 1.53. The dataset contains 54 families, each with at least 10 family members and 5 superfamily members outside the family. To measure the performance of the proposed method, the average Receiver Operating Characteristic (ROC) and Median Rate of False Positives (MRFP) values over the 54 experiments are summarized in Table 1. The results show that the proposed method performs better than the other methods. Fig. 2(a) and Fig. 2(b) present the ranked ROC and MRFP scores; in each graph, a higher curve corresponds to more accurate homology detection. Under both ROC and MRFP, the SVM-SS-LSA method performs significantly better than the other methods. The results in Table 1, Fig. 2(a),
and Fig. 2(b) show that the combination of PSS, PPSA, LSA, and SVM has been applied successfully to remote protein homology detection. The implementation of the four methods helps to extract good-quality protein feature vectors: the PSS and PPSA methods assist in extracting more structural similarity information between protein substrings, information which is then kept in the protein feature vectors, while LSA assists in reducing the dimension of the protein feature vectors by reducing the redundant and noisy data in the protein patterns and protein feature vectors. All these methods contribute to producing good-quality protein feature vectors, which makes SVM-SS-LSA a better option than the other methods. Another evaluation of the performance of the proposed method uses the family-by-family comparison of ROC scores, plotted in Fig. 3 and Fig. 4. Fig. 3 shows the performance comparisons between SVM-SS-LSA and the methods without LSA, and Fig. 4 between SVM-SS-LSA and the methods with LSA. SVM-SS-LSA performs better than the other methods, those applying SVM with LSA and SVM without LSA. In Fig. 3 and Fig. 4, families in the bottom-right area mean that the method labeled on the y-axis outperforms the method labeled on the x-axis for those families. The better performance of SVM-SS-LSA is due to the use of chi-square and SVD. Chi-square reduces the amount of redundant data; its success comes from its ability to select the most discriminative features of proteins by their chi-square scores. SVD reduces the high dimension of the protein feature vectors by reducing the noisy data: it decomposes the matrix of structural similarity information into three matrices in order to choose good-quality protein feature vectors, and in this procedure only the top dimensions for which the elements of the diagonal matrix S are greater than the threshold are considered for further processing. Both methods are capable of producing the best representation of protein feature vectors in a low dimension with less redundant and noisy data, which contributes to extracting good-quality protein feature vectors. The success of the method in classifying protein sequences into homologue or non-homologue members depends on the protein similarity score that represents the structural similarity of the protein sequences. Almost all of the protein substrings yield high similarity scores due to the use of the Smith-Waterman algorithm, which is known to be the most sensitive pairwise comparison method and which here also measures the sensitive and non-sensitive regions. Finally, all the high similarity scores are used by LSA to produce the protein feature vectors. In this study, an experiment on the effect of the protein substring length was also carried out. All the protein sequences were split with different values of k, namely 25, 35, 45, 55, and 65, in order to investigate which value produces the best results. The results show that k = 45 is better than the other chosen values of k, producing the best mean ROC and mean MRFP; the detailed results are given in Table 2.
Fig. 2. Relative performance of the nine homology detection methods. Each graph plots the total number of families for which a given method exceeds a score threshold. The top graph (a) uses ROC scores and the bottom graph (b) uses MRFP scores.
Fig. 3. Family-by-family comparison of ROC scores for the SVM-SS-LSA method and the methods without LSA. Each point on the graph corresponds to one of the 54 SCOP families. In each figure, the axes are ROC scores achieved by the two methods compared. Panels (a), (b), (c), (d), and (e) are based on SVM-Ngram, SVM-Pattern, SVM-Motif, SVM-Fisher, and SVM-SS.
Fig. 4. Family by family comparison based on the SVM-SS-LSA method and those with LSA. Each point on the graph corresponds to one of the 54 SCOP families. In each figure, the axes are ROC scores achieved by the two primary methods compared. Fig. (a), (b), and (c) are based on SVM-Ngram-LSA, SVM-Motif-LSA, and SVM-Pattern-LSA.
Table 1. Average ROC and MRFP scores for the 54 families
Methods            Mean ROC    Mean MRFP
SVM-Ngram          0.773000    0.204000
SVM-Pattern        0.791415    0.144053
SVM-Motif          0.813560    0.134893
SVM-Ngram-LSA      0.835387    0.124572
SVM-Motif-LSA      0.859193    0.101688
SVM-Pattern-LSA    0.859484    0.099527
SVM-Fisher         0.878926    0.096300
SVM-SS             0.887630    0.070287
SVM-SS-LSA         0.890626    0.069435
Table 2. ROC and MRFP scores for different lengths of the split protein sequences

Measures     l=25      l=35      l=45      l=55      l=65
Mean ROC     0.00000   0.88764   0.89063   0.88961   0.88990
Mean MRFP    0.00000   0.17532   0.06944   0.07042   0.09211
In remote protein homology detection, computational efficiency is an important aspect. In this regard, SVM-SS-LSA, which combines PSS and PPSA, LSA, and SVM, is comparable with SVM with LSA and SVM without LSA. The results show that SVM-SS-LSA is more efficient than SVM with LSA and slightly less efficient than SVM without LSA. Any SVM-based method includes a vectorization and an optimization step. The time complexity of the vectorization step of an SVM-based method is O(nml), where n is the number of training examples, m is the total number of words, and l is the length of the longest training protein sequence. The protein feature vector extraction of SVM-SS-LSA involves computing n² pairwise scores. Using Smith-Waterman, the vectorization step for SVM-SS-LSA is O(m²), and protein pattern extraction using TEIRESIAS takes
O(nl₁ log nl), where m is here the length of the longest protein substring, yielding a total running time of O(nl₁ log nl + n²m²). SVM-SS-LSA works with shorter protein words, which leads to the lowest running time among the methods that utilize SVM with LSA. The SVM without LSA methods use n-grams, patterns, and motifs; by not incorporating LSA, they achieve a lower running time than SVM-SS-LSA and SVM with LSA. SVM-Ngram, which extracts n-grams using PSI-BLAST, takes O(n²mt), where t is the minimum of n and m; SVM-Pattern, which extracts protein patterns using the TEIRESIAS algorithm, takes O(nl₁ log nl + n²l²m); and SVM-Motif, which extracts motifs with MEME, takes O(n²l²W), where W is the width of the motif. SVM with LSA is similar to SVM without LSA with an additional SVD process, which roughly adds a factor of t.
8 Conclusion
In this paper, a new method called SVM-SS-LSA is introduced and successfully applied to remote protein homology detection. SVM-SS-LSA manages protein sequences by classifying them according to their family. It is based on the detection of high similarity information from protein substrings extracted by PSS and PPSA in the sequence comparison model, a low-dimensional representation of protein feature vectors with less redundant and noisy data filtered by chi-square and SVD in the generative model, and the discrimination of homologue and non-homologue members by the SVM as the discriminative classifier model. The performance of SVM-SS-LSA is shown using several measures: ROC, MRFP, and the family-by-family comparison. The experimental results show that using high similarity information from protein substrings together with low-dimensional protein feature vectors yields protein feature vectors of good quality, which proved more successful than the other methods based on SVM with LSA and SVM without LSA. The experiments were carried out on a benchmark SCOP superfamily recognition task designed to simulate the problem of remote protein homology detection. The remarkable performance of SVM-SS-LSA comes from a combination of good methods: PSS, PPSA, LSA, and SVM. PPSA, which applies the Smith-Waterman algorithm, is currently the best method for aligning protein substrings precisely [22], while LSA and SVM are well-known methods [10] that have shown good performance in detecting remote protein homology [8, 22]. In this study, the Smith-Waterman algorithm, as applied in PPSA, has been developed to quantify the similarity of protein sequences in order to extract rich protein similarity information; its parameters have been optimized over the years to provide relevant measures of similarity for homologous protein substrings, and such alignments now represent core tools in computational biology. Besides that, many of the more recent SVM-based methods focus on finding useful representations of protein sequences; such representations suffer from the peaking phenomenon in many machine learning methods because large-dimensional protein feature vectors may be introduced. SVM-SS-LSA is able to handle large-dimensional feature vectors using chi-square and SVD from LSA, both of which produce low-dimensional protein feature vectors with less redundant and noisy data. Chi-square
uses an independence test to reduce redundant data, whereas SVD uses a thresholding technique on the singular values in S to reduce the dimension of the decomposed matrices, leading to noise removal and a good-quality representation of the protein feature vectors. The successful application of PSS and PPSA from the sequence comparison model, LSA from the generative model, and the SVM from the discriminative classifier model to remote protein homology detection is of great significance. Future improvements are necessary to make SVM-SS-LSA more efficient in detecting remote protein homology. Directions to be considered in future research are optimizing the protein substring width in order to get all possible substrings of amino acids, and investigating secondary and tertiary structure predictions together with functional properties of proteins as a dataset for detecting remote protein homology.

Acknowledgments. This work is supported by the Malaysian Ministry of Science, Technology, and Innovation (MOSTI) under grant no. 01-01-06-SF0228.
References
1. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J.: Basic local alignment search tool. Journal of Molecular Biology 215(3), 403–410 (1990)
2. Burges, C.J.C.: A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2(1), 121–167 (1998)
3. Cai, Y.D., Liu, X.J., Xu, X.B., Zhou, G.P.: Support vector machines for predicting protein structural class. BMC Bioinformatics 2, 3 (2001)
4. Chou, K.C.: Review: structural bioinformatics and its impact to biomedical science. Current Medicinal Chemistry 11(16), 2105–2134 (2004)
5. Chou, K.C., Elrod, D.W.: Prediction of membrane protein types and subcellular locations. Proteins: Structure Function Genetics 34(1), 137–153 (1999)
6. Chou, K.C., Shen, H.B.: Predicting protein subcellular location by fusing multiple classifiers. Journal of Biochemistry and Cell 99(2), 517–527 (2006)
7. Dong, Q.W., Lin, L., Wang, X.L., Li, M.H.: A pattern-based SVM for protein remote homology detection. In: International Conference on Machine Learning and Cybernetics, Guangzhou, China, pp. 3363–3368 (2005)
8. Dong, Q., Wang, X.L., Lin, L.: Application of latent semantic analysis to protein remote homology detection. Bioinformatics 22(3), 285–290 (2006)
9. Fukushima, A., Wada, M., Kanaya, S., Arita, M.: SVD based anatomy of gene expressions for correlation analysis in arabidopsis thaliana. DNA Research 15(1), 367–374 (2008)
10. Gabrys, B., Howlet, R.J., Jain, L.C.: Knowledge-based intelligent information and engineering systems. In: Proceedings of the Tenth KES Conference, Bournemouth, United Kingdom, pp. 393–400 (2006)
11. Gotoh, O.: An improved algorithm for matching biological sequences. Journal of Molecular Biology 162(3), 705–708 (1982)
12. Jaakkola, T., Diekhans, M., Haussler, D.: A discriminative framework for detecting remote protein homologies. Journal of Computational Biology 7(1-2), 95–114 (2000)
13. Kelil, A., Wang, S., Brzezinski, R., Fleury, A.: CLUSS: clustering of protein sequences based on a new similarity measure. BMC Bioinformatics 8(1), 1–19 (2007)
14. Kuang, R., Ie, E., Wang, K., Wang, K., Siddiqi, M., Freund, Y., Leslie, C.: Profile-based string kernels for remote homology detection and motif extraction. Journal of Bioinformatics and Computational Biology 3(3), 152–160 (2004)
15. Landauer, T.K., Foltz, P.W., Laham, D.: Introduction to latent semantic analysis. Discourse Processes 25(1), 259–284 (1998)
16. Liao, L., Noble, W.S.: Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. Journal of Computational Biology 10(1), 857–868 (2003)
17. Mohseni-Zadeh, S., Brezellec, P., Risler, J.L.: Cluster-C, an algorithm for the large-scale clustering of protein sequences based on the extraction of maximal cliques. Computational Biology and Chemistry 28(1), 211–218 (2004)
18. Pearson, W.R.: Rapid and sensitive sequence comparison with FASTP and FASTA. Methods in Enzymology 183, 63–98 (1990)
19. Rigoutsos, I., Floratos, A.: Combinatorial pattern discovery in biological sequences: the TEIRESIAS algorithm. Bioinformatics 14(1), 55–67 (1998)
20. Tang, Y., Jing, B., Zhang, Y.Q.: Granular support vector machines with association rules mining for protein homology prediction. Artificial Intelligence in Medicine 25(1), 121–134 (2005)
21. Yang, Y., Pedersen, J.O.: A comparative study on feature selection in text categorization. In: Proceedings of the International Conference on Machine Learning, pp. 412–420 (1997)
22. Zaki, M.N., Deris, S.: Detecting remote protein evolutionary relationships via string scoring method. International Journal of Biomedical Sciences 2(1), 59–66 (2007)
Jordan Pi-Sigma Neural Network for Temperature Prediction
Noor Aida Husaini 1, Rozaida Ghazali 1, Nazri Mohd Nawi 1, and Lokman Hakim Ismail 2
1 Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, 86400 Parit Raja, Batu Pahat, Johore, Malaysia
2 Faculty of Environmental and Civil Engineering, Universiti Tun Hussein Onn Malaysia, 86400 Parit Raja, Batu Pahat, Johore, Malaysia
[email protected], {rozaida,nazri,lokman}@uthm.edu.my
Abstract. This study examines and analyses the use of a new recurrent neural network model, the Jordan Pi-Sigma Network (JPSN), as a forecasting tool. The JPSN's ability to predict future temperature trends was tested and compared to that of the Multilayer Perceptron (MLP) and the standard Pi-Sigma Neural Network (PSNN), all trained with the standard gradient descent algorithm. A set of historical temperature measurements covering five years, obtained from the Malaysian Meteorological Department, was used as input data to train the networks for next-day prediction. Simulation results show that the JPSN forecast is superior to those of the MLP and PSNN models, with lower prediction error, revealing great potential for predicting temperature measurements.

Keywords: Jordan Neural Network, Recurrent Neural Network, Pi-Sigma Neural Network, Higher Order Neural Network, Temperature Forecasting.
1 Background
Forecasts are made in many disciplines, among them economics [1], [2], weather forecasting [3], [4], disease forecasting [5], [6], and technology forecasting [7], [8]. Temperature forecasting, the subject of this study, is at the core of weather forecasting. Temperature can be defined qualitatively as a measure, during the day, of the corresponding hotness or coldness [9], lying on a predetermined set of values, and it can have a greater influence on daily life than any other single element on a routine basis. Therefore, precise temperature measurement is needed to obtain accurate forecasts [10]. Being located in the equatorial region, Malaysia has a relatively small annual temperature range [11], with temperatures around 26.5°C, varying over the lowland areas. The lowest temperature was recorded at the Kuala Krai Meteorological Station, with a reading of 18.0°C, while the highest temperature was recorded at the Sri Aman Meteorological Station, with a reading of 34.6°C [11].
Temperature forecasting is mainly issued in qualitative terms using conventional methods, assisted by data and projected images taken by meteorological satellites to assess future trends [4], [12]; numerical model products are later used as input to obtain a temperature forecast. Many different scientific efforts in the domain of forecasting meteorological parameters (temperature, solar radiation, humidity, etc.) [11] have been made; however, those techniques involve sophisticated mathematical models to justify the use of empirical rules and require prior knowledge of the characteristics of the input time series in order to predict future events. The potential of achieving outstanding performance by utilising neural networks (NNs) has captured great attention from forecasters, researchers, scientists and engineers. NN theory grew out of Artificial Intelligence (AI) research and naturally adapts special characteristics of human intelligence by learning in a manner similar to the human brain. Haykin [13] describes an NN as an adaptive machine that has a natural tendency for storing experiential knowledge by constructing a nonlinear mapping between input and output data. Various studies in temperature forecasting have been carried out and discussed by many authors. Bhardwaj et al. [12] used Direct Model Output (DMO), which produces forecast values for each location of interest using several atmospheric parameters. Smith et al. [14] noted that ward-style NNs can be used for predicting the temperature, while Radhika and Shashi [15] used the Support Vector Machine (SVM) for one-step-ahead prediction and found that SVMs could be used for weather forecasting. Finally, Wang and Chen [16] found that automatic clustering techniques and two-factor higher-order fuzzy rules can be used to cluster historical numerical data into intervals of different lengths. Until recently, the most widely used NN, the Multilayer Perceptron (MLP), also received great attention in many fields such as prediction [14], [15], pattern recognition [17], signal processing [18], [19], and classification [20], [21]. However, the MLP adopts computationally intensive training algorithms and becomes relatively slow as the number of hidden units increases [2], [22]; it also introduces many local minima in the error surface [23]. For these reasons, the development of Higher Order Neural Networks (HONNs) has captured researchers' attention. The Pi-Sigma Neural Network (PSNN), which lies within this area, has the ability to converge faster while maintaining the high learning capabilities of HONNs [1], [2]. Using the PSNN itself for temperature forecasting is perfectly acceptable; however, in this study we pursue the idea of developing a new network model, the Jordan Pi-Sigma Network (JPSN), to overcome the drawbacks of the MLP while retaining the advantages of the PSNN. Here, the JPSN is used to learn the historical temperature data of Batu Pahat and to predict the temperature measurements for the next day. The remaining parts of the paper are organized into the following sections: Neural Networks, Learning Algorithm of the JPSN, Research Design of Neural Network Models, Results and Discussions, and Conclusions. The Neural Networks section reviews the network models used in this study; the learning algorithm of the JPSN describes the step-by-step process of training the network; the results and discussions compile the pertinent tables and a graphical review of the information; and the paper ends with the conclusions.
2 Neural Networks
An NN is essentially a parametric model that can be a very useful tool for solving many practical forecasting problems. This section describes the NN models used in this study.

2.1 Multilayer Perceptron
The MLP is formed by a set of sensory units that make up the input layer, one or more hidden layers, and an output layer. The MLP, typically used in supervised learning problems, has the ability to learn from input-output pairs [24]. The standard MLP is usually trained with the standard gradient descent algorithm [25]. The input signal propagates through the network layer by layer [13] through weighted sums [26] and the sigmoid activation function 1/(1 + e^(-x)) of the hidden and output units. The sigmoid function is often chosen because it is mathematically convenient and close to linear near the origin, which prevents accelerating growth throughout the network [27], thus allowing the network to model nonlinear problems well [28]. The network error is computed at the output layer, and the error is propagated back to the input layer by updating the weights accordingly, right after each input is presented to the network. This training process is iterated until the stepwise changes of the weights have converged [25]. The MLP has been successfully applied in many applications involving pattern recognition [17], signal processing [18], [19], and classification [20], [21]. However, the ordinary MLP is extremely slow due to its multilayer architecture, especially as the hidden layer grows [22]; it also converges very slowly in typical situations, principally when dealing with complex and nonlinear problems, and does not scale well with problem size [22].

2.2 Pi-Sigma Neural Network
The PSNN is a type of HONN which comprises a single layer of learnable weights [22]. The PSNN consists of two layers of units: the product unit layer and the summing unit layer. The weighted inputs are fed to linear summing units, whose outputs feed the output layer. The number of summing units in a PSNN reflects the network order: adding one summing unit increases the network's order by 1. In the PSNN, the weights from the hidden layer to the output layer are fixed to 1, a property that drastically reduces the training time. Sigmoid and linear activation functions are adopted at the output layer and summing layer, respectively. The use of linear summing units at the hidden layer makes the convergence analysis of the learning rules for the PSNN more accurate and tractable [2]. The network has been successfully applied to function approximation [2], classification [22], pattern recognition [29], a cryptography scheme [30], and so forth. Compared to other HONN models, the PSNN has a simpler structure [2] and can converge faster whilst maintaining the high learning capabilities of HONNs [1], [2]. Moreover, the PSNN is superior to the MLP, requiring at least two orders of magnitude fewer computations than the ordinary MLP for similar performance levels over a broad class of problems [1]. A minimal sketch of the PSNN forward pass is given below.
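The sketch illustrates the product-of-sums structure in Python; the weights and inputs are arbitrary illustrative values, not taken from the study.

```python
import numpy as np

def psnn_forward(x, W):
    """Pi-sigma forward pass: each row of W drives one linear summing unit
    (the last column is the bias); the output unit multiplies the summing-unit
    outputs (fixed weights of 1) and applies a sigmoid."""
    z = np.append(x, 1.0)                      # input plus bias term
    h = W @ z                                  # linear summing units
    return 1.0 / (1.0 + np.exp(-np.prod(h)))   # product, then sigmoid

x = np.array([0.3, 0.7, 0.5])
W = np.full((2, 4), 0.1)                       # a 2nd-order PSNN with 3 inputs
print(psnn_forward(x, W))
```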
2.3 Jordan Neural Network
The Jordan Neural Network (JNN), proposed by Jordan [31], is a type of recurrent neural network (RNN). The motivation for exploring the JNN is its potential for maintaining an internal state to generate temporal sequences. The outputs of the JNN that are clocked back to the input layer are called context units; each context unit receives a copy from the output neurons and from itself [31]. The recurrent connections from the output layer to the context neurons have an associated parameter m, which lies in [0, 1]. The feedback connection concatenates the input and output layers, thereby allowing the network to possess a short-term memory for temporal pattern recognition over a time series. The activation of a context unit at time t is given by the following expression:

c(t) = m·c(t-1) + x(t-1) ,   (1)
where x(t-1) is the network output at time t-1. The activation function for the remaining nodes is calculated as in the MLP. By unrolling the expression for the context neuron activation, it is possible to write:

c(t) = Σ_{j=1}^{t-1} m^(j-1) x(t-j) .   (2)
Consequently, the parameter m endows the JNN with a certain inertia in the context neurons; it determines the sensitivity of the context neurons to preserving this information [31]. Thus, the recurrence on the context units provides a decaying history of the output over the most recent time steps. The applicability of the JNN has been tested in [32], [33], [34] for different kinds of problems.

2.4 Jordan Pi-Sigma Network
A new type of recurrent HONN, called the Jordan Pi-Sigma Network (JPSN), is proposed in this research work. The JPSN is used for temperature prediction, and the network is trained with the standard gradient descent algorithm [25]. The architecture of the JPSN has a recurrent link from the output layer back to the input layer, which captures the temporal dynamics of the time series process [35]. The network combines the properties of both HONNs and RNNs so that better performance can be achieved. Additionally, the unique architecture of the JPSN avoids the combinatorial explosion of higher-order terms as the network order increases. The structure of the JPSN is quite similar to that of the ordinary PSNN; the main difference is the integration of a recurrent feedback link from the output to the input layer, allowing for a more parsimonious computational structure [35]. Weights between the input layer and the summing layer are trainable, while weights between the summing layer and the output layer are fixed to unity. In this work, the number of hidden nodes for the MLP, and the number of higher-order terms for the PSNN and JPSN, are set between 2 and 5. The architecture of the proposed JPSN is shown in Fig. 1.
Fig. 1. The JPSN ( z − 1 denotes the time delay operator)
3 Learning Algorithm of the JPSN
The learning algorithm for the JPSN is calculated as follows:

h_L(t+1) = Σ_{m=1}^{M} w_{Lm} x_m(t) + w_{L(M+1)} + w_{L(M+2)} y(t) = Σ_{m=1}^{M+2} w_{Lm} z_m(t) ,   (3)

where h_L(t+1) represents the activation of unit L at time t+1, and y(t) is the previous network output. The unit's transfer function f denotes the sigmoid function, whose bounded output range [0, 1] is used in this network. For each training example, do:

1. Calculate the output:

   y(t+1) = f( Π_{L=1}^{k} h_L(t+1) ) ,   (4)

   where y(t+1) is the network output.

2. Compute the error e at the output node:

   e(t+1) = d(t+1) - y(t+1) .   (5)

3. Compute the weight changes:

   Δw_{ij}(t+1) = -η ∂J(t+1)/∂w_{ij} + α Δw_{ij}(t) ,   (6)

   where η is the learning rate and α is the momentum term.

4. Update the weights:

   W_i = W_i + ΔW_i .   (7)

5. If the current epoch > the maximum epoch, stop the training; else go to step 1.
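Under the assumption that the cost J in Eq. (6) is the squared error 0.5·e², the five steps above can be sketched in Python as follows. This is a simplified, truncated-gradient reading of the algorithm, not the authors' MATLAB code; the function name jpsn_step, all sizes, and the hyperparameter values are illustrative.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def jpsn_step(x, y_prev, d, W, lr=0.1, momentum=0.2, dW_prev=None):
    """One pass of Eqs. (3)-(7), assuming J = 0.5 * e^2 and ignoring the
    recurrent dependence of y_prev on W (a truncated gradient)."""
    z = np.concatenate([x, [1.0, y_prev]])   # inputs + bias + context unit, Eq. (3)
    h = W @ z                                # summing-unit activations
    y = sigmoid(np.prod(h))                  # network output, Eq. (4)
    e = d - y                                # error, Eq. (5)
    fprime = y * (1.0 - y)                   # sigmoid derivative
    dW = np.zeros_like(W)
    for L in range(W.shape[0]):              # Eq. (6): -lr * dJ/dw + momentum term
        dW[L] = lr * e * fprime * np.prod(np.delete(h, L)) * z
    if dW_prev is not None:
        dW += momentum * dW_prev
    return W + dW, y, dW                     # Eq. (7): weight update

rng = np.random.default_rng(0)
W = rng.uniform(0.0, 0.5, size=(2, 6))       # 2nd-order JPSN with 4 inputs
y_prev, dW_prev = 0.0, None
for x, d in [(np.full(4, 0.2), 0.4), (np.full(4, 0.3), 0.5)]:
    W, y_prev, dW_prev = jpsn_step(x, y_prev, d, W, dW_prev=dW_prev)
```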
4 Research Design of Neural Network Models
4.1 Data Description
We consider univariate input data: five years of daily temperature measurements for the Batu Pahat region, ranging from 2005 to 2009. The data was obtained from the Central Forecast Office, Malaysian Meteorological Department (MMD).

4.2 Experimental Design
All networks were built considering 5 different numbers of input nodes, ranging from 4 to 8. A single neuron was used for the output layer. The number of hidden nodes (for the MLP) and of higher-order terms (for the PSNN and JPSN) initially started at 2 and was increased by one up to a maximum of 5. The combinations of 4 to 8 input nodes and 2 to 5 hidden nodes/higher-order terms for the three network models yield a total of 1215 NN architectures for each in-sample training dataset. Since the forecasting horizon is one step ahead, the output variable represents the temperature measurement one day ahead. Each data series is segregated in time order and divided into three sets: the training, validation and out-of-sample data. To avoid computational problems, the data is normalised between the upper and lower bounds of the network transfer function, which here is the monotonically increasing sigmoid 1/(1 + e^(-x)) [27]; a minimal sketch of such scaling is given below. The interconnection weights of each neuron are adjusted to minimise the error using the standard gradient descent algorithm [25]. To improve the chance of reaching the true global minimum, it is practical to initialise the weights with small random numbers; initialising the weights with small values in [0, 1] prevents saturation for all patterns and insensitivity in the training process, although weights set too small make training start very slowly. The training set serves for training the model and the testing set is used to evaluate the network performance [28]. Training involves the adjustment of the weights so that the NN is capable of predicting the value assigned to each member of the training set. The network is validated and tested in order to preserve some data for calibration purposes [28], which means producing appropriate outputs for input samples that were not encountered during training [4].
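The exact normalisation bounds used in the study are not stated, so the 0.1-0.9 range and the sample readings in this sketch are assumptions.

```python
import numpy as np

def minmax_scale(series, lo=0.1, hi=0.9):
    """Map raw temperatures into [lo, hi], inside the (0, 1) range of the
    sigmoid; the 0.1-0.9 margin keeps values off the saturated tails."""
    s_min, s_max = series.min(), series.max()
    return lo + (hi - lo) * (series - s_min) / (s_max - s_min)

temps = np.array([24.5, 26.0, 28.3, 31.2, 27.7])   # made-up daily readings
print(minmax_scale(temps))
```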
4.3 Assessment Criteria

There is no consensus on the most appropriate measure to assess the performance of a forecasting technique. The network models' performance was evaluated using the Mean Squared Error (MSE), the Signal to Noise Ratio (SNR) [1], the Mean Absolute Error (MAE) [36] and the Normalised Mean Squared Error (NMSE) [1], which are expressed as:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(P_i - P_i^*\right)^2 \qquad (8)$$

$$\mathrm{SNR} = 10 \cdot \lg(\mathrm{sigma}), \quad \mathrm{sigma} = \frac{m^2 \cdot n}{\mathrm{SSE}}, \quad \mathrm{SSE} = \sum_{i=1}^{n}\left(P_i - P_i^*\right)^2, \quad m = \max(P) \qquad (9)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|P_i - P_i^*\right| \qquad (10)$$

$$\mathrm{NMSE} = \frac{1}{\sigma^2}\,\frac{1}{n}\sum_{i=1}^{n}\left(P_i - P_i^*\right)^2, \quad \sigma^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(P_i - \bar{P}\right)^2, \quad \bar{P} = \frac{1}{n}\sum_{i=1}^{n} P_i \qquad (11)$$

where $n$ is the total number of data patterns, and $P_i$ and $P_i^*$ represent the actual and predicted output values [1], [36]. In addition, we also consider the number of iterations during the training process.
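For reference, the four metrics of equations (8)-(11) translate directly into NumPy as below; P holds the actual values and P_hat the predictions, and the NMSE normalisation uses the sample variance of the actual series.

```python
# Performance metrics of eqs. (8)-(11); lg in eq. (9) is taken as log10.
import numpy as np

def mse(P, P_hat):
    return np.mean((P - P_hat) ** 2)                  # eq. (8)

def snr(P, P_hat):
    sse = np.sum((P - P_hat) ** 2)
    sigma = (np.max(P) ** 2) * len(P) / sse
    return 10.0 * np.log10(sigma)                     # eq. (9)

def mae(P, P_hat):
    return np.mean(np.abs(P - P_hat))                 # eq. (10)

def nmse(P, P_hat):
    var = np.sum((P - P.mean()) ** 2) / (len(P) - 1)  # sample variance
    return np.mean((P - P_hat) ** 2) / var            # eq. (11)
```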
5 Results and Discussions

All network training and testing was implemented using MATLAB 7.10.0 (R2010a) on a Pentium® Core™2 Quad CPU. Pattern selection was set randomly and the stopping criterion was set to 3000 epochs. All results presented are based on the average of 10 simulations/runs. The prediction performance of the JPSN is compared with that of the PSNN and MLP based on the aforementioned performance metrics. Table 1
shows the performance of JPSN, PSNN and MLP. Over all the training patterns, the JPSN obtained the lowest MAE, 0.063458, while the MAE for PSNN and MLP was 0.063471 and 0.063646, respectively. JPSN outperformed PSNN by a ratio of 1.95 x 10^-4, and MLP by 2.9 x 10^-3. Moreover, it can be seen that JPSN reached a higher SNR value; therefore, it can be concluded that the network can track the signal better than PSNN and MLP.

Table 1. Comparison of JPSN, PSNN and MLP for all performance metrics

                  JPSN        PSNN        MLP
  MAE             0.063458    0.063471    0.063646
  SNR             18.7557     18.71039    18.69706
  NMSE            0.771034    0.779118    0.781514
  MSE Training    0.006203    0.006205    0.00623
  MSE Testing     0.006462    0.006529    0.006549
  Epoch           1460.9      1211.8      2849.9
The models' performances were also evaluated by comparing their NMSE. Fig. 2 illustrates the NMSE for the three network models; it shows that JPSN achieves a better (lower) NMSE than both PSNN and MLP.
Fig. 2. NMSE for JPSN, PSNN and MLP
The graphs depicted in Fig. 3 present the prediction errors on the testing data set for all network models. As expected, the differences between the predicted and the actual data show that JPSN outperforms PSNN and MLP by 1.038% and 1.341%, respectively. Evaluations on different performance metrics over the temperature data demonstrate that JPSN moderately improves the performance level compared to the two benchmark network models, PSNN and MLP.
a) Temperature Forecast made by JPSN
b) Temperature Forecast made by PSNN
c) Temperature Forecast made by MLP

Fig. 3. Temperature Forecast made by JPSN, PSNN and MLP on the Testing Set
6 Conclusions

This study focuses on using the JPSN as an alternative mechanism for temperature prediction, one that requires less memory during network training. The network combines the properties of both the PSNN and the RNN. On the whole, the JPSN shows a better performance level compared to the ordinary PSNN and the MLP. It is concluded that the JPSN has the capability to forecast temperature, and that, if properly trained, a meteorological station could benefit from the use of this forecasting tool. In future work, we intend to consider the hybridisation of more than one meteorological parameter in order to improve the prediction of future temperature events.

Acknowledgments. The authors would like to thank Universiti Tun Hussein Onn Malaysia for supporting this research under the Postgraduate Incentive Research Grant.
References 1. Ghazali, R., Hussain, A., El-Dereby, W.: Application of Ridge Polynomial Neural Networks to Financial Time Series Prediction. In: International Joint Conference on Neural Networks (IJCNN 2006), Vancouver, BC, pp. 913–920 (2006) 2. Ghazali, R., Al-Jumeily, D.: Application of Pi-Sigma Neural Networks and Ridge Polynomial Neural Networks to Financial Time Series Prediction. In: Zhang, M. (ed.) Artificial Higher Order Neural Networks for Economics and Business. Information Science Reference, pp. 271–293 (2009) 3. Lorenc, A.C.: Analysis Methods for Numerical Weather Prediction. Quarterly Journal of the Royal Meteorological Society 112, 1177–1194 (1986) 4. Paras, Mathur, S., Kumar, A., Chandra, M.: A Feature Based Neural Network Model for Weather Forecasting. In: Proceedings of World Academy of Science, Engineering and Technology. pp. 67–73 (2007) 5. Cooke, B.M., Jones, D.G., Kaye, B., Hardwick, N.V.: Disease Forecasting. In: The Epidemiology of Plant Diseases, pp. 239–267. Springer, Netherlands (2006) 6. Coombs, A.: Climate Change Concerns Prompt Improved Disease Forecasting. Nature Medicine 14, 3 (2008) 7. Byungun, Y., Yongtae, P.: Development of New Technology Forecasting Algorithm: Hybrid Approach for Morphology Analysis and Conjoint Analysis of Patent Information. IEEE Transactions on Engineering Management 54, 588–599 (2007) 8. Cheng, A.-C., Chen, C.-J., Chen, C.-Y.: A Fuzzy Multiple Criteria Comparison of Technology Forecasting Methods for Predicting the New Materials Development. Technological Forecasting and Social Change 75, 131–141 (2008) 9. Childs, P.R.N.: Temperature. In: Practical Temperature Measurement., pp. 1–15. Butterworth-Heinemann, Butterworths (2001) 10. Ibrahim, D.: Temperature and its Measurement. In: Microcontroller Based Temperature Monitoring & Control, Newnes, pp. 55–61 (2002) 11. Malaysian Meteorological Department, http://www.met.gov.my
12. Bhardwaj, R., Kumar, A., Maini, P., Kar, S.C., Rathore, L.S.: Bias-free Rainfall Forecast and Temperature Trend-based Temperature Forecast using T-170 Model Output during the Monsoon Season. Meteorology & Atmospheric Sciences 14, 351–360 (2007) 13. Haykin, S.: Neural Networks - A Comprehensive Foundation. Prentice-Hall, Englewood Cliffs (1998) 14. Smith, B.A., Hoogenboom, G., McClendon, R.W.: Artificial Neural Networks for Automated Year-Round Temperature Prediction. Computers and Electronics in Agriculture 68, 52–61 (2009) 15. Radhika, Y., Shashi, M.: Atmospheric Temperature Prediction using Support Vector Machines. International Journal of Computer Theory and Engineering 1, 55–58 (2009) 16. Wang, N.-Y., Chen, S.-M.: Temperature Prediction and TAIFEX Forecasting based on Automatic Clustering Techniques and Two-Factors High-Order Fuzzy Time Series. Expert Systems with Applications 36, 2143–2154 (2009) 17. Isa, N.A.M., Mamat, W.M.F.W.: Clustered-Hybrid Multilayer Perceptron Network for Pattern Recognition Application. Applied Soft Computing 11, 1457–1466 (2011) 18. Nielsen, J.L.G., Holmgaard, S., Ning, J., Englehart, K., Farina, D., Parker, P.: Enhanced EMG Signal Processing for Simultaneous and Proportional Myoelectric Control. In: Annual International Conference of the IEEE, Engineering in Medicine and Biology Society, EMBC 2009, pp. 4335–4338 (2009) 19. Li, H., Adali, T.: Complex-Valued Adaptive Signal Processing using Nonlinear Functions. EURASIP J. Adv. Signal Process, 1–9 (2008) 20. Yuan-Pin, L., Chi-Hong, W., Tien-Lin, W., Shyh-Kang, J., Jyh-Horng, C.: Multilayer Perceptron for EEG Signal Classification during Listening to Emotional Music. In: TENCON 2007 - 2007 IEEE Region 10 Conference, pp. 1–3 (2007) 21. Belacel, N., Boulassel, M.R.: Multicriteria Fuzzy Classification Procedure PROCFTN: Methodology and Medical Application (2003) 22. Shin, Y., Ghosh, J.: The Pi-Sigma Networks: An Efficient Higher-Order Neural Network for Pattern Classification and Function Approximation. In: Proceedings of International Joint Conference on Neural Networks, Seattle, WA, USA, pp. 13–18 (1991) 23. Yu, W.: Back Propagation Algorithm. Psychology/University of Newcastle (2005) 24. Güldal, V., Tongal, H.: Comparison of Recurrent Neural Network, Adaptive Neuro-Fuzzy Inference System and Stochastic Models in Eğirdir Lake Level Forecasting. Water Resources Management 24, 105–128 (2010) 25. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by backpropagating errors. Nature 323, 533–536 (1986) 26. Rumbayan, M., Nagasaka, K.: Estimation of Daily Global Solar Radiation in Indonesia with Artificial Neural Network (ANN) Method. In: Proceedings of International Conference on Advanced Science, Engineering and Information Technology (ISC 2011), pp. 190–103. Malaysia National University (2011) 27. Cybenko, G.: Approximation by Superpositions of a Sigmoidal Function. Signals Systems 2, 14 (1989) 28. Shrestha, R.R., Theobald, S., Nestmann, F.: Simulation of Flood Flow in a River System using Artificial Neural Networks. Hydrology and Earth System Sciences 9, 313–321 (2005) 29. Shin, Y., Ghosh, J., Samani, D.: Computationally Efficient Invariant Pattern Recognition with Higher Order Pi-Sigma Networks. In: Intelligent Engineering Systems through Artificial Neural Networks-II, pp. 379–384 (1992) 30. Song, G.: Visual Cryptography Scheme Using Pi-Sigma Neural Networks. In: International Symposium on Information Science and Engineering, pp. 679–682 (2008)
31. Jordan, M.I.: Attractor Dynamics and Parallelism in a Connectionist Sequential Machine. In: Proceedings of the Eighth Conference of the Cognitive Science Society, pp. 531–546. IEEE Press, Piscataway (1986) 32. Saha, S., Raghava, G.P.S.: Prediction of Continuous B-cell Epitopes in an Antigen using Recurrent Neural Network. Proteins: Structure, Function, and Bioinformatics 65, 40–48 (2006) 33. Carcano, E.C., Bartolini, P., Muselli, M., Piroddi, L.: Jordan Recurrent Neural Network versus IHACRES in Modelling Daily Streamflows. Journal of Hydrology 362, 291–307 (2008) 34. West, D., Dellana, S.: An Empirical Analysis of Neural Network Memory Structures for Basin Water Quality Forecasting. International Journal of Forecasting (2010) (in press) (corrected proof) 35. Hussain, A.J., Liatsis, P.: Recurrent Pi-Sigma Networks for DPCM Image Coding. Neurocomputing 55, 363–382 (2002) 36. Valverde Ramírez, M.C., de Campos Velho, H.F., Ferreira, N.J.: Artificial Neural Network Technique for Rainfall Forecasting Applied to the São Paulo Region. Journal of Hydrology 301, 146–162 (2005)
Accelerating Learning Performance of Back Propagation Algorithm by Using Adaptive Gain Together with Adaptive Momentum and Adaptive Learning Rate on Classification Problems Norhamreeza Abdul Hamid, Nazri Mohd Nawi, Rozaida Ghazali, and Mohd Najib Mohd Salleh Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, 86400 Parit Raja, Batu Pahat, Johor, Malaysia [email protected], {nazri,rozaida,najib}@uthm.edu.my
Abstract. The back propagation (BP) algorithm is a very popular learning approach in feedforward multilayer perceptron networks. However, the most serious problems associated with BP are the local minima problem and slow convergence speed. Over the years, many improvements and modifications of the back propagation learning algorithm have been reported. In this research, we propose a new modified back propagation learning algorithm by introducing adaptive gain together with adaptive momentum and an adaptive learning rate into the weight update process. By computer simulations, we demonstrate that the proposed algorithm gives a better convergence rate and finds a good solution earlier than the conventional back propagation. We use two common benchmark classification problems to illustrate the improvement in convergence time.

Keywords: back propagation, convergence speed, adaptive gain, adaptive momentum, adaptive learning rate.
1 Introduction

An Artificial Neural Network (ANN) is a computational paradigm which employs simplified models of the biological neuron system, loosely based on diverse characteristics of human brain functionality. The ANN is a powerful data modelling tool capable of capturing and representing complex input-output relationships. Over the years, the application of ANNs has been growing in acceptance, since they are proficient at capturing process information in a black-box mode. Due to its ability to solve some problems with relative ease of use, its robustness to noisy input data, its execution speed, and its capacity for analysing complicated systems without accurate modelling in advance, the ANN has successfully been implemented across an extraordinary range of problem domains involving prediction, and has a wide-ranging usage area in classification problems [1-8].
Among all the diverse types of ANN architectures, the Multilayer Perceptron (MLP) is the best known and the most frequently used [9]. Moreover, it is suitable for a large variety of applications [10]. In general, an MLP is a Feedforward Neural Network (FNN) made up of sets of neurons (nodes) arranged in numerous layers. There are three distinct types of layers: an input layer, one or more hidden layers and an output layer. The connections between the nodes of adjacent layers relay the output signals from one layer to the subsequent layer. An ANN uses the Back Propagation (BP) algorithm to perform parallel training that improves the efficiency of the MLP network. It is the most popular, effective and easy-to-learn model for complex, multilayered networks. BP is a supervised learning technique based on the gradient descent (GD) optimisation technique, which attempts to minimise the error of the network by moving down the gradient of the error curve [11]. The weights of the network are adjusted by the algorithm such that the error is reduced along a descent direction. Unfortunately, despite the common success of BP in learning MLP networks, several major drawbacks still need to be solved. Since the BP algorithm uses the GD optimisation technique, its limitations comprise slow learning convergence and the tendency to get trapped in local minima [12-13]. We have noted that many local minima problems are closely associated with neuron saturation in the hidden layer. When such saturation occurs, neurons in the hidden layer lose their sensitivity to the input signals and the propagation chain is severely blocked; in some cases, the network can no longer learn. Additionally, the convergence behaviour of the BP algorithm also depends on the selection of the network architecture, initial weights and biases, learning rate, momentum coefficient, activation function and the value of the gain in the activation function. In the last decades, a significant number of different learning algorithms have been introduced to improve the performance of the BP algorithm. These involved the development of heuristic techniques, based on studies of the properties of the conventional BP algorithm. Such techniques include varying the learning rate, using momentum and tuning the gain of the activation function. Wang et al. [14] proposed an improved BP algorithm to counter the problems caused by neuron saturation in the hidden layer. Each training pattern has its own activation function for the hidden nodes in order to prevent neuron saturation when the network output has not acquired the desired signals. The activation functions are adjusted by the adaptation of gain parameters during the learning process. However, this approach did not perform well on large problems and practical applications. Meanwhile, Ng et al. [15] proposed a localised generalisation error model for the single layer perceptron neural network (SLPNN), an extension of the localised generalisation error model for supervised learning with mean squared error minimisation. This approach serves as the first step towards localised generalisation error models of the MLP. Ji et al. [16] proposed an improved BP algorithm based on the conjugate gradient (CG) method. In CG algorithms, a search is performed along conjugate directions, which usually leads to faster convergence than along GD directions. Nevertheless, if the search reaches a local minimum, it remains there forever, as the algorithm has no mechanism to escape.
The learning rate value is one of the most effective means of accelerating the convergence of BP learning. Nonetheless, the size of the weight adjustments needs to be
done appropriately during the training process. If the chosen learning rate is too large for the error surface, in an effort to speed up the training process, the network may oscillate and converge more slowly than with a direct descent; besides, it may cause instability in the network. Conversely, the algorithm will take longer to converge, or may never converge, if the learning rate is too small: the descent proceeds in very small steps, extensively increasing the total time to converge. Another effective approach to improve the convergence behaviour is to add a momentum coefficient to the network. It can help to smooth out the descent path by preventing extreme changes in the gradient due to local anomalies [17]. Consequently, it tends to suppress any oscillation that results from changes in the slope of the error surface. The momentum coefficient is typically chosen to be constant in the conventional back propagation algorithm with momentum. However, such a momentum with a fixed coefficient seems to speed up learning only when the current downhill gradient of the error function and the last change in weight have a similar direction. When the current negative gradient is in a direction opposing the previous update, the momentum may cause the weight to be adjusted up the slope of the error surface instead of down the slope as desired [18]. In order to make learning more effective, the momentum should be varied adaptively rather than being fixed during the training process [19]. On the other hand, Nazri et al. [20] demonstrated that changing the 'gain' value adaptively for each node can significantly reduce the training time without changing the network topology. Therefore, this research proposes a further improvement on [20] by adjusting the activation function of neurons in the hidden layer for each training pattern. The activation functions are adjusted by the adaptation of the gain parameters together with adaptive momentum and adaptive learning rate values during the learning process. The proposed algorithm, known as back propagation gradient descent with adaptive gain, adaptive momentum and adaptive learning rate (BPGD-AGAMAL), can significantly prevent the network from getting trapped in the local minima caused by neuron saturation in the hidden layer. In order to verify the efficiency of the proposed algorithm, its performance is compared with the conventional BP algorithm and the back propagation gradient descent with adaptive gain (BPGD-AG) proposed by [20]. Simulation experiments were performed on two classification problems, the iris [21] and card [22] problems, and the validity of our method is verified. The remainder of the paper is organised as follows: Section 2 reviews the effect of using an activation function with adaptive gain, Section 3 presents the proposed algorithm, Section 4 evaluates the performance of the proposed algorithm on benchmark dataset problems, and the final section concludes the paper.
2 The Effect of the Gain Parameter on the Performance of the Back Propagation Algorithm

An activation function is used for limiting the amplitude of the output of a neuron. It generates an output value for a node in a predefined range, such as the closed unit interval [0,1] or alternatively [-1,1], and can be a linear or non-linear function. This value is
a function of the weighted inputs of the corresponding node. The most commonly used activation function is the logistic sigmoid activation function; alternative choices are the hyperbolic tangent, linear and step activation functions. For the $j$-th node, a logistic sigmoid activation function, which has a range of [0,1], is a function of the following variables:

$$o_j = \frac{1}{1 + e^{-c_j\, a_{net,j}}} \qquad (1)$$

where

$$a_{net,j} = \left(\sum_{i=1}^{l} w_{ij}\, o_i\right) + \theta_j \qquad (2)$$

and where
  $o_j$        is the output of the $j$-th unit,
  $o_i$        is the output of the $i$-th unit,
  $w_{ij}$     is the weight of the link from unit $i$ to unit $j$,
  $a_{net,j}$  is the net input activation function for the $j$-th unit,
  $\theta_j$   is the bias for the $j$-th unit,
  $c_j$        is the gain of the activation function.
The value of the gain parameter, $c_j$, directly influences the slope of the activation function. For large gain values $(c \geq 1)$, the activation function approaches a 'step function', whereas for small gain values $(0 < c \leq 1)$, the output values change from zero to unity over a large range of the weighted sum of the input values, and the sigmoid function approximates a linear function; a small sketch of this effect is given below. Most of the application-oriented papers on neural networks tend to advocate that neural networks operate like a 'magic black box', which can simulate the "learning from example" ability of our brain with the help of network parameters such as weights, biases, gain, hidden nodes, and so forth. Also, a unit value for the gain has generally been used in most of the research reported in the literature, though a few authors have researched the relationship of the gain parameter with the other parameters used in BP algorithms. Results in [23] demonstrate that the learning rate, momentum coefficient and gain of the activation function have a significant impact on training speed. Unfortunately, higher values of learning rate or gain may cause instability [24]. Thimm et al. [25] also proved that a relationship between the gain value, a set of initial weight values, and a learning rate value exists. Eom et al. [26] proposed a method for automatic gain tuning using a fuzzy logic system. Nazri et al. [20] proposed a method to change the gain value adaptively for other optimisation methods such as conjugate gradient.
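To make the slope effect concrete, the short sketch below evaluates equations (1)-(2) for one node under three gain values; the weights, inputs and bias are purely illustrative.

```python
# Eq. (2) net input and eq. (1) sigmoid with gain c: larger c gives a steeper,
# more step-like response; small c approaches a linear response.
import numpy as np

def activation(o_in, w, theta, c):
    a_net = np.dot(w, o_in) + theta           # eq. (2)
    return 1.0 / (1.0 + np.exp(-c * a_net))   # eq. (1)

o_in = np.array([0.2, 0.7, 0.5])              # outputs of preceding units
w = np.array([0.4, -0.3, 0.8])                # link weights (illustrative)
for c in (0.25, 1.0, 4.0):
    print(f"c = {c}: o_j = {activation(o_in, w, 0.1, c):.4f}")
```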
3 The Proposed Algorithm

In this section, a further improvement on the current working algorithm proposed by [20] for improving the training efficiency of back propagation is proposed. The proposed algorithm modifies the initial search direction by changing three terms adaptively for each node: the gain value, the momentum coefficient and the learning rate. The advantages of using an adaptive gain value together with adaptive momentum and an adaptive learning rate have been explored. Gain update expressions, as well as weight and bias update expressions for output and hidden nodes, have also been proposed; these expressions have been derived using the same principles as those used in deriving the weight update expressions. The following iterative algorithm is proposed for the batch mode of training, in which the weights, biases, gains, learning rates and momentum terms are calculated and updated over the entire training set presented to the network (a structural sketch of this loop is given below):

For a given epoch:
  For each input vector:
    Step 1. Calculate the weight and bias values using the previously converged gain, momentum coefficient and learning rate values.
    Step 2. Use the weight and bias values calculated in Step 1 to calculate the new gain, momentum coefficient and learning rate values.
  Repeat Steps 1 and 2 for each input vector, and sum all the weight, bias, learning rate, momentum coefficient and gain updating terms.
  Update the weights, biases, gains, momentum coefficients and learning rates using the summed updating terms, and repeat this procedure on an epoch-by-epoch basis until the error on the entire training data set reduces to the predefined value.
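The structural sketch below follows this batch procedure; compute_updates() (Steps 1-2 for one pattern), apply() and total_error() are hypothetical helpers standing in for the update expressions derived next.

```python
# Batch-mode skeleton of the proposed algorithm: per-pattern updating terms
# are accumulated over the epoch, then applied once; the helper methods are
# placeholders, not a published API.
def train(network, data, target_error, max_epochs):
    for epoch in range(max_epochs):
        totals = None
        for x, t in data:                          # for each input vector
            delta = network.compute_updates(x, t)  # Steps 1 and 2
            totals = delta if totals is None else totals.add(delta)
        network.apply(totals)                      # summed updating terms
        if network.total_error(data) < target_error:
            break                                  # error below predefined value
```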
The gain, momentum and learning rate update expressions for a gradient descent method are calculated by differentiating the following error term $E$ with respect to the corresponding gain parameter. The network error $E$ is defined as follows:

$$E = \frac{1}{2}\sum_{k}\bigl(t_k - o_k(o_j, c_k, \alpha_k)\bigr)^2 \qquad (3)$$

For output units, $\partial E / \partial \alpha_k$ needs to be calculated, while for hidden units $\partial E / \partial \alpha_j$ is also required. The respective momentum values would then be updated with the following equations:

$$\Delta\alpha_k = \eta\left(-\frac{\partial E}{\partial \alpha_k}\right) \qquad (4)$$

$$\Delta\alpha_j = \eta\left(-\frac{\partial E}{\partial \alpha_j}\right) \qquad (5)$$

$$\frac{\partial E}{\partial \alpha_k} = -(t_k - o_k)\, o_k (1 - o_k)\left(\sum_{j} w_{jk}\, o_j + \theta_k\right) \qquad (6)$$
Therefore, the momentum update expression for links connecting to output nodes is:

$$\Delta\alpha_k(n+1) = \eta\,(t_k - o_k)\, o_k (1 - o_k)\left(\sum_{j} w_{jk}\, o_j + \theta_k\right) \qquad (7)$$

$$\frac{\partial E}{\partial \alpha_j} = -\left[\sum_{k} \alpha_k w_{jk}\, o_k (1 - o_k)\,(t_k - o_k)\right] o_j (1 - o_j)\left(\left(\sum_{i} w_{ij}\, o_i\right) + \theta_j\right) \qquad (8)$$

and the momentum update expression for the links connecting hidden nodes is:

$$\Delta\alpha_j(n+1) = \eta\left[\sum_{k} \alpha_k w_{jk}\, o_k (1 - o_k)\,(t_k - o_k)\right] o_j (1 - o_j)\left(\left(\sum_{i} w_{ij}\, o_i\right) + \theta_j\right) \qquad (9)$$
Similarly, the weight and bias expressions are calculated as follows. The weight update expression for the links connecting to output nodes with a bias is:

$$\Delta w_{jk} = \eta\,(t_k - o_k)\, o_k (1 - o_k)\, \alpha_k\, o_j \qquad (10)$$

Similarly, the bias update expression for the output nodes would be:

$$\Delta \theta_k = \eta\,(t_k - o_k)\, o_k (1 - o_k)\, \alpha_k \qquad (11)$$

The weight update expression for the links connecting to hidden nodes is:

$$\Delta w_{ij} = \eta\left[\sum_{k} \alpha_k w_{jk}\, o_k (1 - o_k)\,(t_k - o_k)\right] \alpha_j\, o_j (1 - o_j)\, o_i \qquad (12)$$

Similarly, the bias update expression for the hidden nodes would be:

$$\Delta \theta_j = \eta\left[\sum_{k} \alpha_k w_{jk}\, o_k (1 - o_k)\,(t_k - o_k)\right] \alpha_j\, o_j (1 - o_j) \qquad (13)$$
The learning rate value $\eta$ is adjusted after every training iteration in an attempt to accelerate the convergence of back propagation. The $\eta$ for all weights is adjusted as shown in (14):

$$\eta_{i+1} = \eta_i - \frac{\partial E}{\partial \eta} = \eta_i - \frac{\partial E}{\partial w}\cdot\frac{\partial w}{\partial \eta} \qquad (14)$$

where $\partial w / \partial \eta$ is the gradient of the new weight with respect to the learning rate.
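Read literally, equations (7) and (9)-(13) give one pattern's batch-accumulated updating terms as in the sketch below. The array shapes, the coupling of the α term into the forward pass (here treated as a per-node parameter scaling the sigmoid argument, cf. eq. (1)), and all names are our assumptions, since the paper leaves these implementation details open.

```python
# One pattern's updating terms for a 1-hidden-layer sigmoid network, following
# eqs. (7) and (9)-(13) of the text.
import numpy as np

def sigmoid(x, a=1.0):
    return 1.0 / (1.0 + np.exp(-a * x))

def agamal_updates(x, t, w_ij, th_j, w_jk, th_k, a_j, a_k, eta):
    # forward pass
    net_j = x @ w_ij + th_j                     # hidden net inputs
    o_j = sigmoid(net_j, a_j)
    net_k = o_j @ w_jk + th_k                   # = sum_j w_jk o_j + theta_k
    o_k = sigmoid(net_k, a_k)
    dk = (t - o_k) * o_k * (1 - o_k)            # (t_k - o_k) o_k (1 - o_k)
    back = w_jk @ (a_k * dk)                    # sum_k a_k w_jk o_k(1-o_k)(t_k-o_k)
    # weight and bias updating terms, eqs. (10)-(13)
    dw_jk = eta * np.outer(o_j, a_k * dk)                    # eq. (10)
    dth_k = eta * a_k * dk                                   # eq. (11)
    dw_ij = eta * np.outer(x, back * a_j * o_j * (1 - o_j))  # eq. (12)
    dth_j = eta * back * a_j * o_j * (1 - o_j)               # eq. (13)
    # adaptive-term updating terms, eqs. (7) and (9)
    da_k = eta * dk * net_k                                  # eq. (7)
    da_j = eta * back * o_j * (1 - o_j) * net_j              # eq. (9)
    return dw_ij, dth_j, dw_jk, dth_k, da_j, da_k
```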
4 Results and Discussions

The performance criterion used in this research focuses on the speed of convergence, measured in the number of iterations and in CPU time. Two classification problems have been tested and verified using two benchmark datasets, namely the iris [21] and card [22] problems.
The simulations were carried out on a Pentium IV 2 GHz HP workstation with 3.25 GB RAM, using MATLAB version 7.10.0 (R2010a). On each problem, the following three algorithms were analysed and simulated:

1) The Back Propagation Gradient Descent (BPGD)
2) The Back Propagation Gradient Descent with Adaptive Gain (BPGD-AG) [20]
3) The Back Propagation Gradient Descent with Adaptive Gain, Adaptive Momentum and Adaptive Learning Rate (BPGD-AGAMAL)
To compare the performance of the proposed algorithm with the conventional BPGD and BPGD-AG [20], network parameters such as network size and architecture (number of nodes, hidden layers, etc.), the values for the initial weights and the gain parameters were kept the same. For all problems, the neural network had one hidden layer with five hidden nodes, and the sigmoid activation function was used for all nodes. All algorithms were tested using the same initial weights, which were initialised randomly from the range [0,1], and received the input patterns for training in the same sequence. For all training algorithms, as the gain value was modified, the weights and biases were updated using the new values of gain, momentum coefficient and learning rate. To avoid oscillations during training and to achieve convergence, an upper limit of 1.0 is set for the gain value, and the initial value of the gain parameter is set to one. The momentum term and learning rate are randomly generated from the range [0,1] using a trial and error method, and the best momentum term and learning rate values were selected. For each run, the numerical data are stored in two files: the results file and the summary file. The results file lists the data about each network. The number of iterations until the network converged is accumulated for each algorithm, from which the mean, the standard deviation (SD) and the number of failures are calculated. The networks that failed to converge are excluded from the calculations of the mean and SD and are reported as failures. For each problem, 50 different trials were run, each with a different initial random set of weights, and the number of iterations required for convergence is reported. For an experiment of 50 runs, the mean of the number of iterations, the SD and the number of failures are collected. A failure occurs when the network exceeds the maximum iteration limit: each experiment is run to ten thousand iterations, otherwise it is halted and the run is reported as a failure. Convergence is achieved when the outputs of the network conform to the error criterion as compared to the desired outputs.

4.1 Iris Classification Problem
This dataset is a classical classification dataset made famous by Fisher, who used it to illustrate the principles of discriminant analysis [21]. Classifying the Iris dataset involves classifying the data of petal width, petal length, sepal width and sepal length into three classes of species: Iris Setosa, Iris Versicolor and Iris Virginica. The selected architecture of the feedforward neural network is 4-5-3, with the target error set to 0.001. The best momentum term and learning rate values for the conventional BPGD and BPGD-AG on the iris dataset are 0.4 and 0.6 respectively, while for BPGD-AGAMAL the momentum term is initialised randomly in the range [0.1, 0.4] and the learning rate in the range [0.4, 0.8].
Table 1. Algorithm performance for Iris Classification Problem [21]

                                   BPGD          BPGD-AG       BPGD-AGAMAL
  Mean (epochs)                    1081          721           533
  Total CPU time (s) to converge   1.23 x 10^1   5.89          4.26
  CPU time (s)/epoch               1.14 x 10^-2  8.17 x 10^-3  7.99 x 10^-3
  SD                               1.4 x 10^2    4.09 x 10^2   2.45 x 10^2
  Accuracy (%)                     91.9          90.3          93.1
  Failures                         1             0             0
Table 1 shows that the proposed algorithm (BPGD-AGAMAL) exhibits very good average performance in reaching the target error. The proposed algorithm needs only 533 epochs to converge, as opposed to about 1081 epochs for the conventional BPGD, while BPGD-AG needs 721 epochs. Apart from the speed of convergence, the time required for training is another important factor when analysing performance; for numerous models, the training process can be very time consuming. The results in Fig. 1 clearly show that the proposed algorithm (BPGD-AGAMAL) outperforms the conventional BPGD in total convergence time (4.26 s vs. 12.29 s, an improvement ratio of nearly 2.9), while against BPGD-AG (5.89 s) the improvement ratio is 1.38. Furthermore, the accuracy of BPGD-AGAMAL is better than that of the BPGD and BPGD-AG algorithms.
Fig. 1. Performance comparison of BPGD-AGAMAL with BPGD-AG and conventional BPGD on Iris Classification Problem
4.2 Card Classification Problem
This dataset contains all the details of credit card applications. It predicts the approval or non-approval of a credit card for a customer [22]. Descriptions of each attribute name and its values are concealed for confidentiality reasons. This dataset classifies whether the bank granted the credit card or not. The selected architecture of
the neural network is 51-5-2, and the target error was set to 0.001. The best momentum term for the conventional BPGD and BPGD-AG is 0.4, while the best learning rate value is 0.6. The momentum value for BPGD-AGAMAL is initialised randomly in the range [0.1, 0.4], and the learning rate in the range [0.4, 0.8].

Table 2. Algorithm performance for Card Classification Problem [22]

                                   BPGD          BPGD-AG       BPGD-AGAMAL
  Mean (epochs)                    8645          1803          1328
  Total CPU time (s) to converge   5.47 x 10^2   4.72 x 10^1   2.2 x 10^1
  CPU time (s)/epoch               6.33 x 10^-2  2.61 x 10^-2  1.66 x 10^-2
  SD                               2.76 x 10^-3  6.55 x 10^-1  6.75 x 10^-2
  Accuracy (%)                     83.45         82.33         83.9
  Failures                         41            0             0
Table 2 reveals that BPGD needs 547 seconds and 8645 epochs to converge, whereas BPGD-AG needs 47.2 seconds and 1803 epochs. Conversely, the proposed algorithm (BPGD-AGAMAL) performed significantly better, needing only 22 seconds and 1328 epochs to converge. From Fig. 2, it is worth noticing that the performance of BPGD-AGAMAL is almost 96% faster than BPGD and 53.4% faster than BPGD-AG.
Fig. 2. Performance comparison of BPGD-AGAMAL with BPGD-AG and conventional BPGD on Card Classification Problem
The results show that BPGD-AGAMAL performs better than BPGD and BPGD-AG. Moreover, it has been empirically demonstrated that the proposed algorithm (BPGD-AGAMAL) achieves higher accuracy than the BPGD and BPGD-AG algorithms. This conclusion supports the usage of the proposed algorithm as an alternative training algorithm for BP neural networks.
5 Conclusions

Although the BP algorithm is widely implemented in most practical neural network applications and performs relatively well, it still needs some improvements. We have proposed a further improvement on the current working algorithm proposed by Nazri et al. [20]. The proposed algorithm adaptively changes the gain parameter of the activation function together with the momentum coefficient and the learning rate to improve the learning speed. The effectiveness of the proposed algorithm has been compared with the conventional Back Propagation Gradient Descent (BPGD) and the Back Propagation Gradient Descent with Adaptive Gain (BPGD-AG) [20]. The three algorithms were verified by means of simulation on two classification problems. On the iris dataset, the proposed algorithm improved the total time to converge by a ratio of nearly 2.9 over BPGD and 1.38 over BPGD-AG, while on the card dataset it was almost 96% and 53.4% faster than BPGD and BPGD-AG respectively. The results show that the proposed algorithm (BPGD-AGAMAL) has a better convergence rate and learning efficiency compared to the conventional BPGD and BPGD-AG [20].

Acknowledgments. The authors would like to thank Universiti Tun Hussein Onn Malaysia (UTHM) for supporting this research under the Postgraduate Incentive Research Grant.
References 1. Nawi, N.M., Ransing, R.S., Salleh, M.N.M., Ghazali, R., Hamid, N.A.: An Improved Back Propagation Neural Network Algorithm on Classification Problems. In: Zhang, Y., Cuzzocrea, A., Ma, J., Chung, K.-i., Arslan, T., Song, X. (eds.) DTA and BSBT 2010. Communications in Computer and Information Science, vol. 118, pp. 177–188. Springer, Heidelberg (2010) 2. Nawi, N.M., Ghazali, R., Salleh, M.N.M.: The Development of Improved BackPropagation Neural Networks Algorithm for Predicting Patients with Heart Disease. In: Zhu, R., Zhang, Y., Liu, B., Liu, C. (eds.) ICICA 2010. LNCS, vol. 6377, pp. 317–324. Springer, Heidelberg (2010) 3. Sabeti, V., Samavi, S., Mahdavi, M., Shirani, S.: Steganalysis and payload estimation of embedding in pixel differences using neural networks. Pattern Recogn. 43, 405–415 (2010) 4. Mandal, S., Sivaprasad, P.V., Venugopal, S., Murthy, K.P.N.: Artificial neural network modeling to evaluate and predict the deformation behavior of stainless steel type AISI 304L during hot torsion. Applied Soft Computing 9, 237–244 (2009) 5. Subudhi, B., Morris, A.S.: Soft computing methods applied to the control of a flexible robot manipulator. Applied Soft Computing 9, 149–158 (2009) 6. Yu, L., Wang, S.-Y., Lai, K.K.: An Adaptive BP Algorithm with Optimal Learning Rates and Directional Error Correction for Foreign Exchange Market Trend Prediction. In: Wang, J., Yi, Z., Żurada, J.M., Lu, B.-L., Yin, H. (eds.) ISNN 2006. LNCS, vol. 3973, pp. 498–503. Springer, Heidelberg (2006)
7. Lee, K., Booth, D., Alam, P.: A comparison of supervised and unsupervised neural networks in predicting bankruptcy of Korean firms. Expert Systems with Applications 29, 1–16 (2005) 8. Sharda, R., Delen, D.: Predicting box-office success of motion pictures with neural networks. Expert Systems with Applications 30, 243–254 (2006) 9. Popescu, M.-C., Balas, V.E., Perescu-Popescu, L., Mastorakis, N.: Multilayer perceptron and neural networks. WSEAS Trans. Cir. and Sys. 8, 579–588 (2009) 10. Fung, C.C., Iyer, V., Brown, W., Wong, K.W.: Comparing the Performance of Different Neural Networks Architectures for the Prediction of Mineral Prospectivity. In: Proceedings of 2005 International Conference on Machine Learning and Cybernetics, pp. 394–398 (2005) 11. Alsmadi, M.K.S., Omar, K., Noah, S.A.: Back Propagation Algorithm: The Best Algorithm Among the Multi-layer Perceptron Algorithm. International Journal of Computer Science and Network Security 9, 378–383 (2009) 12. Otair, M.A., Salameh, W.A.: Speeding Up Back-Propagation Neural Networks. In: Proceedings of the 2005 Informing Science and IT Education Joint Conference, Flagstaff, Arizona, USA, pp. 167–173 (2005) 13. Bi, W., Wang, X., Tang, Z., Tamura, H.: Avoiding the Local Minima Problem in Backpropagation Algorithm with Modified Error Function. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. E88-A, 3645–3653 (2005) 14. Wang, X.G., Tang, Z., Tamura, H., Ishii, M., Sun, W.D.: An improved backpropagation algorithm to avoid the local minima problem. Neurocomputing 56, 455–460 (2004) 15. Ng, W.W.Y., Yeung, D.S., Tsang, E.C.C.: Pilot Study On The LocalizedGeneralization Error Model For Single Layer Perceptron Neural Network. In: Proceedings of the Fifth International Conference on Machine Learning and Cybernetics, Dalian, August 13-16, pp. 3078–3082 (2006) 16. Ji, L., Wang, X., Yang, X., Liu, S., Wang, L.: Back-propagation network improved by conjugate gradient based on genetic algorithm in QSAR study on endocrine disrupting chemicals. Chinese Science Bulletin 53, 33–39 (2008) 17. Sun, Y.-J., Zhang, S., Miao, C.-X., Li, J.-M.: Improved BP Neural Network for Transformer Fault Diagnosis. Journal of China University of Mining and Technology 17, 138–142 (2007) 18. Hongmei, S., Gaofeng, Z.: A New BP Algorithm with Adaptive Momentum for FNNs Training. In: WRI Global Congress on Intelligent Systems, GCIS 2009, pp. 16–20 (2009) 19. Hamid, N.A., Nawi, N.M., Ghazali, R.: The Effect of Adaptive Gain and Adaptive Momentum in Improving Training Time of Gradient Descent Back Propagation Algorithm on Classification Problems. In: Proceeding of the International Conference on Advanced Science, Engineering and Information Technology 2011, pp. 178–184. Hotel Equatorial Bangi-Putrajaya, Malaysia (2011) 20. Nawi, N.M., Ransing, R.S., Ransing, M.S.: An Improved Conjugate Gradient Based Learning ALgorithm for Back Propagation Neural Networks. International Journal of Information and Mathematical Sciences 4, 46–55 (2008) 21. Fisher, R.A.: The use of multiple measurements in taxonomic problems. Annals of Eugenics 7, 179–188 (1936) 22. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann Series in Machine Learning. Morgan Kaufmann, San Francisco (1993) 23. Maier, H.R., Dandy, G.C.: The effect of internal parameters and geometry on the performance of back-propagation neural networks: an empirical study. Environmental Modelling and Software 13, 193–209 (1998)
24. Hollis, P.W., Paulos, J.J.: The effects of precision constraints in a backpropagation learning network. In: International Joint Conference on Neural Networks, IJCNN, vol. 2, p. 625 (1989) 25. Thimm, G., Moerland, P., Fiesler, E.: The interchangeability of learning rate and gain in backpropagation neural networks. Neural Comput. 8, 451–460 (1996) 26. Eom, K., Jung, K., Sirisena, H.: Performance improvement of backpropagation algorithm by automatic activation function gain tuning using fuzzy logic. Neurocomputing 50, 439– 460 (2003)
Developing an HCI Model: An Exploratory Study of Featuring Collaborative System Design

Saidatina Fatimah Ismail(1), Rathiah Hashim(1), and Siti Zaleha Zainal Abidin(2)

(1) Faculty of Computer Science and Information Technology, Universiti Tun Hussein Onn Malaysia, 86400 Batu Pahat, Johor, Malaysia
(2) Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia
[email protected], [email protected], [email protected]
Abstract. In a networked collaborative virtual environment (NCVE), people work together as a group and share the same user interface (UI) design. Therefore, it is crucial to have a good UI so that activities can be performed quickly and effectively. The human computer interaction (HCI) discipline and its principles can be a solution to poorly designed human-machine interfaces. Nowadays, studies of human and machine interaction attempt to allow tasks to be performed easily, in a user-friendly manner and fast, since many activities run concurrently in a collaborative environment. Thus, this paper presents an exploratory study of featuring collaborative system design. We highlight various collaborative features that follow common HCI rules. The study serves as an effort towards developing a new HCI model that can be applied to any collaborative application.

Keywords: Human computer interaction, user interface, networked collaborative virtual environment, collaborative features.
1 Introduction

Collaborative applications are developed to allow users to collaborate with each other and to bring people together for one reason or another: to socialise, to work together, to innovate, to cooperate and to contribute to the production of something [1]. More concretely, in a collaborative environment, people communicate with each other from different physical locations and share the same resources using the same interface, as in online gaming, emailing, video conferencing, chatting and the like [2]. Furthermore, real-time information features offer faster decision making to accomplish users' objectives. The effectiveness and correctness of the information communicated between users are the key elements in improving any decision-making process [3].
A task that can be performed easily and fast (with a smaller number of clicks) makes collaborative system interfaces more effective and user friendly. Therefore, the best collaborative features should be integrated with HCI rules and principles in order to build the interface design of a networked collaborative virtual environment (NCVE). HCI models and User Interface (UI) guidelines are tools that help in UI design in terms of usability [2]. They have a great impact on usability, making the UI easier to use and easier to train people to use. Our aim is therefore to reveal the collaborative features that follow HCI rules and principles in order to increase user satisfaction. The remainder of this paper is organised as follows: Section 2 discusses related work on HCI and collaborative applications, while an exploratory study of collaborative interface features is performed in Section 3. Section 4 then describes the new features for a collaborative interface, followed by the conclusion in Section 5.
2 Related Works

2.1 Human Computer Interaction

Human computer interaction (HCI) is clearly focused on the study of the interaction between humans and computers rather than just the design of the interface. Therefore, supporting knowledge on the human side, such as cognitive psychology, the social sciences, linguistics, communication theory, graphic and industrial design disciplines, and human performance, is relevant. HCI is also closely related to the UI. The UI, or simply the interface, is the intermediary between human and computer. The interface can also be considered an interpreter or guide through the complexities of a site, mediating between user and content through the physical interface and the software interface. HCI principles are crucial in designing a good UI in terms of usability, information architecture and individual operation, so that the UI can be simple and obvious [4]. The most famous HCI principles are Shneiderman's eight golden rules: strive for consistency, enable frequent users to use shortcuts, offer informative feedback, design dialogs to yield closure, offer simple error handling, permit easy reversal of actions, support an internal locus of control and reduce short-term memory load [2]. In addition, UI principles such as simplicity, structure, visibility, flexibility or tolerance, and reuse can be applied to maintain consistency so that the need for the user to rethink and remember is reduced [5]. These ten general UI principles, also called 'heuristics', are more in the nature of rules of thumb than specific usability guidelines. Nevertheless, Norman [6] developed a number of models of user interaction, based on theories of cognitive processing arising out of cognitive science, that were intended to explain the way users interact with interactive technologies. Another highly influential model based on cognitive theory that made its mark in the 1980s was Card, Moran and Newell's keystroke level model [7]. This was used by a number of researchers and designers as a predictive way of analysing user performance for different interfaces to determine which would be the most effective. More recent models developed in interaction design are user models, which predict what information users want in their interactions, and models that characterise core components of the user experience, such as Norman's model of emotional design [8]. All these principles and guidelines can be used for any UI, including collaborative applications such as computer games and e-Learning systems. For example, in order to
develop an effective interface for an e-Learning application, several issues must be considered, such as a convenient way for students to maintain high concentration without facing any complex process, dynamic transformation of content and size (balanced information) and active participation of students through their characteristics (any action detected from the interface) [9].

2.2 Networked Collaborative Virtual Environment

A Networked Collaborative Virtual Environment (NCVE) allows people to cooperate and work with each other from different locations. Communication and computing technologies are highly interdependent and closely related to each other [10]. Meanwhile, collaborative applications are the sets of tools which support communication between learners, using asynchronous and/or synchronous collaboration [11]. Besides, the way multimedia elements are integrated is also crucial in making collaborative activities successful. The collaborative virtual environment (CVE) provides the platform for performing activities and communication tasks. CVEs are also known as distributed virtual reality systems, and they offer a space or digital landscape. Within this landscape, individuals and data can interact with each other, making it a shared place for work and leisure. Individuals or users, artifacts and data objects can encounter each other by navigating through the space. In this case, they communicate using verbal and non-verbal communication: they can see each other, orient their bodies to each other and converse through visual and auditory channels [12]. Recently, many collaborative applications have been broadly used to develop projects, negotiate business deals, promote products, conduct meetings and also as a medium for teaching and learning [13]. This shows that collaborative communities from different countries can exchange ideas and work together effectively to support a particular set of goals. Meanwhile, enterprises are looking at collaborative applications and technologies to address these drivers for business transformation: the impact of the empowered user, the demand for secure, real-time information, and the concept of a borderless enterprise [1].
3 Exploratory Study

In a networked collaborative virtual environment (NCVE), people work together as a group and share the same UI. This collaborative work is supported by computer supported cooperative work (CSCW) and allows people to collaborate with each other through a UI that supports all this information exchange while being user friendly and easy to use. There are two different scenarios in NCVE: synchronous and asynchronous [1]. In an e-learning collaborative system, the most important part is the set of tools which favours communication between the learners. Such collaboration can occur in an asynchronous and/or synchronous scenario [11]. Various techniques and technologies, such as the whiteboard, are used in synchronous collaboration, where communication is held in real time. These activities can facilitate people working with each other, since information can be exchanged in a faster way. Meanwhile, in
asynchronous collaboration, the last changes performed are always stored in the system when users leave the session. In this way, collaborators can continue to work later with the information from the latest session [11]. Much research has been undertaken to construct a real-world-like interface (Wouter [14]), such as developing an architecture for Discovery Learning. It defines how learners can be supported by collaboration, and how new types of instructional support can be created from the interaction between collaborative support measures and the specific learning processes involved in discovery learning. Table 1 shows the features of Discovery Learning.

Table 1. Features of Discovery Learning [14]

Frame of reference: Collaborative tools must be able to interpret the experimental results and hypotheses stated in the simulation environment; at the least, they need to know what to do with them.

Collaborative tools: They are used to discuss new hypotheses, the results of experiments, reasoning about the results, etc. In general, they are responsible for the collaboration between learners.

Experimentation space: An area in which the experiments are held, providing ways of communication between the experiments and the users' interfaces. More exactly, the results of these experiments are the basis on which learners discuss and reason.

Collaborative scenarios: The kind of collaboration to support the learning tasks. As mentioned before, there are two major approaches, the synchronous and the asynchronous.
Collaborative features are embedded in scripting languages such as JACIE (Java-based Authoring language for Collaborative Interactive Environments) [3]. JACIE is a scripting language designed to support the management of multimedia interaction and communication in collaborative applications. Collaborative mechanisms are popular in networked activities. Furthermore, JACIE is also a practical solution to network programming, focusing on the management of user interaction and communications. It supports communication through the concepts of channels and interaction protocols [2], [15]. The special features of JACIE include a single program for client and server, a template-based programming style and hiding the low-level network programming and event management from the programmer [3]. Upon connecting to the server, it supports server-mediated multi-user interaction for networked collaboration [2]. Table 2 shows the main features of the JACIE language.
Table 2. Features of JACIE [15]

Template-based programming style: A JACIE program fits into a prescribed standard template layout that eases the programming task.

Single program for server and client: JACIE allows the integration of the code segments for both server and client in one program, in order to ease the data manipulation between server and clients. It is the compiler's job to generate the separate programs that run on separate computers.

Communication channels: There are several built-in communication channels that facilitate different forms of media and communication methods: canvas, message, chat, voice, video and whiteboard. The last three channels are supported by Microsoft NetMeeting.

Interaction protocols: All the introduced interaction protocols are built-in and managed by the server, allowing almost all common collaborative applications to be programmed.

Multithreading: JACIE is built using Java, which provides multithreading in order to accommodate the computing environment of a networked system.

Interfacing with Java: JACIE has a programming interface with the Java language for more complicated graphics applications.
However, from the viewpoint of UI design principles, the JACIE interface lacks several factors such as structure, simplicity, visibility and feedback. Therefore, a comparative study on user interface design for a collaborative bridge game was conducted between several bridge game applications, one of which was developed using JACIE Graphical User Interface (GUI) output [2]. While focusing on various accessibility features and HCI rules, different elements of each feature were compared in an attempt to improve the usability of JACIE's interface. Since NCVE does not include communication cues such as gesture, body posture, facial expressions, direction of gaze and verbal communication, awareness issues are important to address [16]. More concretely, the absence of social cues brings a lack of a sense of social awareness. Thus, as stated in [12], control factors (degree of control, immediacy of control, anticipation, mode of control and physical environment modifiability), sensory factors (modality, environmental richness, multimodal presentation, consistency of multimodal presentation, degree of movement and active search) and distraction factors (isolation and selective attention)
contribute very much to a sense of presence and to enhancing levels of social awareness. As described in [13], communication in NCVE can be improved through the use of eight identified awareness models (refer to Table 3).

Table 3. Eight awareness models [13], [17], [18]

Awareness of presence: Who is in the virtual space, how many participants are involved, where they come from and their availability [17].

Awareness of state: The user's state of mind [13].

Awareness of role: The user's position in the virtual space [13].

Awareness of turn taking: Who is talking, who is listening, whose idea it is and whose turn it is to talk [13].

Awareness of emotion: Communication cues, gesture, eye contact and tone of voice [13].

Awareness of identities: When users have the choice to use different identities in different virtual spaces [18].

Contextual awareness: Users are aware of the task and progress of themselves and of others [18].

Conversational awareness: What a user is talking about and what he/she is referring to [13].
In NCVE, many things are shared. People share information and communicate about tasks through interaction with each other and with data representations [12]. This is also known as shared context, in which users share knowledge, artifact environments and so on; in this process they are led to share understandings with each other, for example through shared activity among users (e.g. a shared document). Communication capabilities, publication and collaborative knowledge, community management services, common application services and a collaboration engine (presentation, editing integration, search and mashups) can be shared across applications such as blogs and wikis [1]. The closest example is an e-learning system where students form a group to learn together in different places or at different times and share all the same features of the application [9]. A collaboration tool is an instrument designed to help people involved in a common task to achieve their goals. There are a lot of tools designed to facilitate and manage group activities such as communication and to cover all detailed aspects of collaborative activities while working on the overall project [12]. Effective tools are needed for successful interaction and communication [14]. For example, a collaborative game can perform better when awareness tools are used [19], such as Collabtive, phpGroupWare, Dimdim and eGroupWare.
4 Collaborative Features for a New HCI Model

Today, studies of human-machine interaction attempt to allow tasks to be performed easily, quickly and in a user-friendly manner, since many activities run concurrently in a collaborative environment. Designers are therefore expected to refer to HCI principles and guidelines, which contribute greatly to the user interfaces of many applications, and HCI models that support UI design and evaluation should be chosen conscientiously. A representative HCI model for collaborative systems is one possible way to build an interface that allows activities to be performed quickly and effectively.

Eight issues have been collected and analysed based on current features and on additional criteria gathered from HCI and UI principles: awareness, display style, interaction or communication channel, sound and multimedia, accessibility factors, tools, shareability and scenarios. The combination of these features is important for the UI of a collaborative application because it improves usability, which has a great impact on collaborative activities. Referring to Table 4, the compilation of collaborative features is based on existing features found in e-learning, the JACIE language, game collaboration and so on. The compilation is intended for collaborative applications in general and highlights the specific features that can guide a designer in building any collaborative application.

The awareness features encompass the eight awareness models, which are crucial to adopt in a collaborative application. Representation of users, labeling of users, positioning, blinking elements, turn visibility/taking and reversibility are the elements that follow the UI principle of visibility of system status [5]. In addition, consistency rules [2], [5] from the HCI principles should also be applied to the display features and the sound and multimedia features: elements such as text, colour, contrast, animation, buttons, response time and sound should be consistent and standardized, and should define the specific visual and textual expression of the interface.

A tool is an important feature that facilitates the management of, and covers all detailed aspects of, collaborative activities. The activities involved in successful interaction and communication are closely related to the tools; therefore, all the communication channels, such as canvas, message, online chatting, voice, video and whiteboard, depend on good collaborative tools to deliver a better UI.

Collaborative scenarios determine whether applications run in real time or not. Four scenarios are distinguished: synchronous, synchronous/distributed, asynchronous and asynchronous/distributed. These scenarios enable the shareability of communication capabilities, publication and collaborative knowledge, community management services, common application services and the collaboration engine (presentation, editing integration, search and mashups).

Accessibility factors are the procedures involved in accessing a collaborative application. Unlike some other applications, collaborative systems establish the connection between users and the server before the application starts.
In addition, several collaborative applications require users to enter a username before registering; otherwise, users have to fill in data entry forms or use a confirmation code for security purposes before performing another task (e.g., sending email). Several applications also impose a loading time before their services can be started and used. Ideally, however, a user should only have to click one button to log off from all collaborative applications.
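The access sequence just described (connect, register, verify, start, use, log off) behaves like a small linear state machine, and the sketch below illustrates one possible encoding. The stages and transition rules are our own reading of the text, not a published protocol.

// Hypothetical encoding of the accessibility steps as a linear state
// machine: a client must pass through each stage in order before using
// the application, and a single logOff() releases everything at once.
enum AccessStage { DISCONNECTED, CONNECTED, REGISTERED, VERIFIED, STARTED, IN_USE }

class AccessLifecycle {
    private AccessStage stage = AccessStage.DISCONNECTED;

    void advance(AccessStage next) {
        // Only allow moving one step forward through the sequence.
        if (next.ordinal() != stage.ordinal() + 1) {
            throw new IllegalStateException(stage + " -> " + next);
        }
        stage = next;
    }

    // "One button to log off": return to the initial state from anywhere.
    void logOff() { stage = AccessStage.DISCONNECTED; }
}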
Table 4. Collaborative features for the new HCI model

Awareness: representation of users; labeling of users; positioning; blinking element; turn visibility/taking; reversibility.
Display style: text; colour; contrast; animation; buttons.
Sound and multimedia: response time; sound.
Tools: facilitate, manage and cover all detailed aspects of collaborative activities.
Interaction/communication channel: canvas; message; online chatting; voice; video; whiteboard.
Scenarios: synchronous (same time/same place); synchronous/distributed (same time/different place); asynchronous (same place/different time); asynchronous/distributed (different place/different time).
Shareability: communication capabilities; publication and collaborative knowledge; common application services; community management services; collaboration engine (presentation, editing integration, search and mashups).
Accessibility factors: connection to server; user registration; user verification; starting the application; using the application; logging off.
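One way a designer could operationalise Table 4 is as a checklist that a candidate collaborative UI must satisfy before it is considered to cover the model. The sketch below, including the four scenario types, is a minimal illustration under our own naming (FeatureGroup, FeatureChecklist); it is not a normative API.

// Hypothetical checklist derived from Table 4: a designer records which
// of the eight feature groups a candidate collaborative UI supports.
import java.util.EnumMap;
import java.util.Map;

enum Scenario { SYNCHRONOUS, SYNCHRONOUS_DISTRIBUTED,
                ASYNCHRONOUS, ASYNCHRONOUS_DISTRIBUTED }

enum FeatureGroup { AWARENESS, DISPLAY_STYLE, SOUND_AND_MULTIMEDIA, TOOLS,
                    COMMUNICATION_CHANNEL, SCENARIOS, SHAREABILITY,
                    ACCESSIBILITY }

class FeatureChecklist {
    private final Map<FeatureGroup, Boolean> supported =
            new EnumMap<>(FeatureGroup.class);

    void mark(FeatureGroup group, boolean ok) { supported.put(group, ok); }

    // A UI "covers the model" only when all eight groups are addressed.
    boolean coversModel() {
        return supported.size() == FeatureGroup.values().length
                && supported.values().stream().allMatch(Boolean::booleanValue);
    }
}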
5 Conclusions

We have explored many essential collaborative features that are common to e-learning, the JACIE language, game collaboration and so on. All of them serve as guideline features that can be embedded into an existing scripting language, JACIE. In this paper we have therefore compiled new collaborative features to be used as guidelines for developing collaborative applications. Eight features, together with their descriptions, can be integrated into any collaborative environment. Many user interfaces have been designed using design models or frameworks published in the HCI literature, covering different aspects of user experience in different application domains. However, it remains questionable whether these frameworks or models are applicable to many types of collaborative systems. Thus, this analysis is an initial effort to develop a UI model for collaborative applications from a software development (language-based) perspective on JACIE.

Acknowledgments. The authors would like to thank Universiti Tun Hussein Onn Malaysia for supporting this research under the Post Graduate Incentive Research Grant.
References

1. Cisco Systems, http://cisco.biz/en/US/solutions/collateral/ns340/ns858/C11-503429-00-CollaArchit.pdf
2. Ismail, S.F., Hashim, R., Abidin, S.Z.Z.: Collaborative Bridge Game: A Comparative Study on User Interface Design. In: International Conference on Information Retrieval and Knowledge Management, Shah Alam, pp. 34–39 (2010)
3. Haji-Ismail, A.S., Chen, M., Grant, P.W., Kiddell, M.: JACIE - an Authoring Language for Rapid Prototyping Net-Centric, Multimedia and Collaborative Applications. Annals of Software Engineering 12, 47–75 (2001)
4. Baxley, B.: Universal Model of a User Interface. In: Proceedings of the Conference on Designing for User Experiences, pp. 1–4 (2003)
5. Nielsen, J., Molich, R.: Heuristic Evaluation of User Interfaces. In: Proceedings of the ACM CHI 1990 Conference, Seattle, pp. 249–256 (1990)
6. Norman, D.A.: The Design of Everyday Things. Basic Books, New York (1988)
7. Department of Computer Science, http://www.cs.umd.edu/class/fall2002/cmsc838s/tichi/printer/goms.html
8. Norman, D.A.: Emotional Design: Why We Love (or Hate) Everyday Things. Basic Books, New York (2004)
9. Kojiri, T., Ito, Y., Watanabe, T.: User-oriented Interface for Collaborative Learning Environment. In: Proceedings of the International Conference on Computers in Education (ICCE 2002), pp. 213–214. ACM, New York (2002)
10. Regmi, A., Watt, S.M.: A Collaborative Interface for Multimodal Ink and Audio Documents. In: 10th International Conference on Document Analysis and Recognition (ICDAR 2009), pp. 901–905. IEEE, Barcelona (2009)
11. Carreras, M.A.M., Skarmeta, A.F.G., Gracia, E.M.: Designing Collaborative Environments and their Application in Learning. In: International Conference on Collaborative Computing: Networking, Applications and Worksharing, p. 10. IEEE, San Jose (2005)
12. Churchill, E.F., Snowdon, D.N., Munro, A.J.: Collaborative Virtual Environments: Digital Places and Spaces for Interaction. Springer, New York (2001)
13. Idrus, Z., Abidin, S.Z.Z., Hashim, R., Omar, N.: Social Awareness: The Power of Digital Elements in Collaborative Environment. WSEAS Transactions on Computers 9, 644–653 (2010)
14. Van Joolingen, W.R.: Designing for Collaborative Discovery Learning. In: Gauthier, G., VanLehn, K., Frasson, C. (eds.) ITS 2000. LNCS, vol. 1839, pp. 202–211. Springer, Heidelberg (2000)
15. Abidin, S.Z.Z.: Interaction and Interest Management in a Scripting Language. PhD thesis, Department of Computer Science, University of Wales Swansea, UK (2006)
16. Forland, E.P., Divitini, M.: Supporting Social Awareness: Requirements for Educational CVE. In: Proceedings of the 3rd IEEE International Conference on Advanced Learning Technologies, pp. 366–367. IEEE, Athens (2003)
17. Eisenstadt, M., Komzak, J., Dzbor, M.: Instant Messaging + Maps = Powerful Collaboration Tools for Distance Learning. In: Proceedings of TelEduc 2003, pp. 19–21 (2003)
18. Tran, M.H.: Supporting Group Awareness in Synchronous Distributed Groupware: Framework, Tools and Evaluations. Doctoral thesis, Swinburne University of Technology, Melbourne, Australia (2006)
19. Nova, N., Wehrle, T., Goslin, J., Bourquin, Y., Dillenbourg, P.: Collaboration in a Multi-user Game: Impacts of an Awareness Tool on Mutual Modeling. Multimedia Tools and Applications 32, 161–183 (2007)