Advances in Computer Science and Information Technology: AST/UCMA/ISA/ACN 2010 Conferences, Miyazaki, Japan, June 23-25, 2010. Joint Proceedings ... Applications, incl. Internet/Web, and HCI)
Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
6059
Tai-hoon Kim Hojjat Adeli (Eds.)
Advances in Computer Science and Information Technology AST/UCMA/ISA/ACN 2010 Conferences Miyazaki, Japan, June 23-25, 2010 Joint Proceedings
Volume Editors
Tai-hoon Kim, Hannam University, Daejeon 306-791, South Korea, E-mail: [email protected]
Hojjat Adeli, The Ohio State University, Columbus, OH 43210, USA, E-mail: [email protected]
Library of Congress Control Number: 2010927807
CR Subject Classification (1998): C.2, H.4, H.3, I.2, I.4, I.5
LNCS Sublibrary: SL 3 – Information Systems and Applications, incl. Internet/Web and HCI
ISSN: 0302-9743
ISBN-10: 3-642-13576-5 Springer Berlin Heidelberg New York
ISBN-13: 978-3-642-13576-7 Springer Berlin Heidelberg New York
Advanced Science and Technology, Advanced Communication and Networking, Information Security and Assurance, and Ubiquitous Computing and Multimedia Applications are conferences that attract many academic and industry professionals. The goal of these co-located conferences is to bring together researchers from academia and industry as well as practitioners to share ideas, problems and solutions relating to the multifaceted aspects of advanced science and technology, advanced communication and networking, information security and assurance, and ubiquitous computing and multimedia applications. This co-located event included the following conferences: AST 2010 (The Second International Conference on Advanced Science and Technology), ACN 2010 (The Second International Conference on Advanced Communication and Networking), ISA 2010 (The 4th International Conference on Information Security and Assurance) and UCMA 2010 (The 2010 International Conference on Ubiquitous Computing and Multimedia Applications). We would like to express our gratitude to all of the authors of submitted papers and to all attendees for their contributions and participation. We believe in the need for continuing this undertaking in the future. We acknowledge the great effort of all the Chairs and the members of the advisory boards and Program Committees of the above-listed events, who selected 15% of over 1,000 submissions, following a rigorous peer-review process. Special thanks go to SERSC (Science & Engineering Research Support soCiety) for supporting these co-located conferences. We are grateful in particular to the following speakers who kindly accepted our invitation and, in this way, helped to meet the objectives of the conference: Hojjat Adeli (The Ohio State University), Ruay-Shiung Chang (National Dong Hwa University), Adrian Stoica (NASA Jet Propulsion Laboratory), Tatsuya Akutsu (Kyoto University) and Tadashi Dohi (Hiroshima University). We would also like to thank Rosslin John Robles and Maricel O. Balitanas, graduate students of Hannam University, who helped in editing the material with great passion.
April 2010
Tai-hoon Kim
Preface
This volume contains carefully selected papers that were accepted for presentation at the Second International Conference on Advanced Science and Technology, held in conjunction with ISA, ACN and UCMA on June 23-25, 2010, at the Sheraton Grande Ocean Resort in Miyazaki, Japan. The papers in this volume were recommended based on their scores, obtained from the independent reviewing processes of each conference, and on their relevance to the idea of constructing hybrid solutions to address the real-world challenges of IT. The final selection was also based on the attempt to make this volume as representative of the current trend in IT as possible. The conference focused on various aspects of advances in computer science and information technology, together with the computational sciences and mathematics. It provided a chance for academic and industry professionals to discuss recent progress in the related areas. We expect that the conference and its publications will be a trigger for further related research and technology improvements in this important subject. We would like to acknowledge the great effort of all the Chairs and members of the Program Committee. Out of approximately 122 accepted papers, a total of 49 papers are published in this LNCS volume. The remaining accepted papers were included in the proceedings of each particular event and published by Springer in its CCIS series (respective volume numbers: 74, 75, 76 and 77). We would like to express our gratitude to all of the authors of submitted papers and to all the attendees for their contributions and participation. We believe in the need for continuing this undertaking in the future. Once more, we would like to thank all the organizations and individuals who supported this event as a whole and, in particular, helped in the success of the Second International Conference on Advanced Science and Technology held in conjunction with ISA, ACN and UCMA.
April 2010
Tai-hoon Kim
Table of Contents
Information Security and Assurance

Fuzzy Based Threat Analysis in Total Hospital Information System (Nurzaini Mohamad Zain, Ganthan Narayana Samy, Rabiah Ahmad, Zuraini Ismail, and Azizah Abdul Manaf)
An ID-Based Anonymous Signcryption Scheme for Multiple Receivers Secure in the Standard Model (Bo Zhang and Qiuliang Xu)
A Supervised Locality Preserving Projections Based Local Matching Algorithm for Face Recognition (Yingqi Lu, Cheng Lu, Miao Qi, and Shuyan Wang)
Information Systems Security Criticality and Assurance Evaluation (Moussa Ouedraogo, Haralambos Mouratidis, Eric Dubois, and Djamel Khadraoui)
Security Analysis of 'Two-Factor User Authentication in Wireless Sensor Networks' (Muhammad Khurram Khan and Khaled Alghathbar)
Directed Graph Pattern Synthesis in LSB Technique on Video Steganography (Debnath Bhattacharyya, Arup Kumar Bhaumik, Minkyu Choi, and Tai-hoon Kim)
Feature Level Fusion of Face and Palmprint Biometrics by Isomorphic Graph-Based Improved K-Medoids Partitioning (Dakshina Ranjan Kisku, Phalguni Gupta, and Jamuna Kanta Sing)
Post-quantum Cryptography: Code-Based Signatures (Pierre-Louis Cayrel and Mohammed Meziani)
Security Analysis of the Proposed Practical Security Mechanisms for High Speed Data Transfer Protocol (Danilo Valeros Bernardo and Doan Hoang)
A Fuzzy-Based Dynamic Provision Approach for Virtualized Network Intrusion Detection Systems (Bo Li, Jianxin Li, Tianyu Wo, Xudong Wu, Junaid Arshad, and Wantao Liu)
An Active Intrusion Detection System for LAN Specific Attacks (Neminath Hubballi, S. Roopa, Ritesh Ratti, F.A. Barbhuiya, Santosh Biswas, Arijit Sur, Sukumar Nandi, and Vivek Ramachandran)
Analysis on the Improved SVD-Based Watermarking Scheme (Huo-Chong Ling, Raphael C-W. Phan, and Swee-Huay Heng)

Advanced Communication and Networking

Applications of Adaptive Belief Propagation Decoding for Long Reed-Solomon Codes (Zhian Zheng, Dang Hai Pham, and Tomohisa Wada)
Dynamic Routing for Mitigating the Energy Hole Based on Heuristic Mobile Sink in Wireless Sensor Networks (Seong-Yong Choi, Jin-Su Kim, Seung-Jin Han, Jun-Hyeog Choi, Kee-Wook Rim, and Jung-Hyun Lee)
Grammar Encoding in DNA-Like Secret Sharing Infrastructure (Marek R. Ogiela and Urszula Ogiela)
HATS: High Accuracy Timestamping System Based on NetFPGA (Zhiqiang Zhou, Lin Cong, Guohan Lu, Beixing Deng, and Xing Li)
A Roadside Unit Placement Scheme for Vehicular Telematics Networks (Junghoon Lee and Cheol Min Kim)
Concurrent Covert Communication Channels (Md Amiruzzaman, Hassan Peyravi, M. Abdullah-Al-Wadud, and Yoojin Chung)
Energy Efficiency of Collaborative Communication with Imperfect Frequency Synchronization in Wireless Sensor Networks (Husnain Naqvi, Stevan Berber, and Zoran Salcic)
High Performance MAC Architecture for 3GPP Modem (Sejin Park, Yong Kim, Inchul Song, Kichul Han, Jookwang Kim, and Kyungho Kim)
Modified Structures of Viterbi Alogrithm for Forced-State Method in Concatenated Coding System of ISDB-T (Zhian Zheng, Yoshitomo Kaneda, Dang Hai Pham, and Tomohisa Wada)
A New Cross-Layer Unstructured P2P File Sharing Protocol Over Mobile Ad Hoc Network (Nadir Shah and Depei Qian)
A Model for Interference on Links in Inter-Working Multi-Hop Wireless Networks (Oladayo Salami, Antoine Bagula, and H. Anthony Chan)
An Optimum ICA Based Multiuser Data Separation for Short Message Service (Mahdi Khosravy, Mohammad Reza Alsharif, and Katsumi Yamashita)

Advanced Computer Science and Information Technology

Multiple Asynchronous Requests on a Client-Based Mashup Page (Eunjung Lee and Kyung-Jin Seo)
Using an Integrated Ontology Database to Categorize Web Pages (Rujiang Bai, Xiaoyue Wang, and Junhua Liao)
Video Copy Detection: Sequence Matching Using Hypothesis Test (Debabrata Dutta, Sanjoy Kumar Saha, and Bhabatosh Chanda)
An XML-Based Digital Textbook and Its Educational Effectiveness (Mihye Kim, Kwan-Hee Yoo, Chan Park, Jae-Soo Yoo, Hoseung Byun, Wanyoung Cho, Jeeheon Ryu, and Namgyun Kim)
SIMACT: A 3D Open Source Smart Home Simulator for Activity Recognition (Kevin Bouchard, Amir Ajroud, Bruno Bouchard, and Abdenour Bouzouane)
Design of an Efficient Message Collecting Scheme for the Slot-Based Wireless Mesh Network (Junghoon Lee and Gyung-Leen Park)
A Novel Approach Based on Fault Tolerance and Recursive Segmentation to Query by Humming (Xiaohong Yang, Qingcai Chen, and Xiaolong Wang)
CAS4UA: A Context-Aware Service System Based on Workflow Model for Ubiquitous Agriculture (Yongyun Cho, Hyun Yoe, and Haeng-Kon Kim)
A Power Control Scheme for an Energy-Efficient MAC Protocol (Ho-chul Lee, Jeong-hwan Hwang, Meong-hun Lee, Haeng-kon Kim, and Hyun Yoe)
Towards the Designing of a Robust Intrusion Detection System through an Optimized Advancement of Neural Networks (Iftikhar Ahmad, Azween B Abdulah, and Abdullah S Alghamdi)
Fuzzy Based Threat Analysis in Total Hospital Information System

Nurzaini Mohamad Zain (1), Ganthan Narayana Samy (2), Rabiah Ahmad (1), Zuraini Ismail (3), and Azizah Abdul Manaf (3)

(1) Centre for Advanced Software Engineering (CASE), Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia (UTM), Malaysia
(2) Department of Computer Systems and Communications, Faculty of Computer Science & Information Systems, Universiti Teknologi Malaysia (UTM), Malaysia
(3) Department of Science, College of Science and Technology, Universiti Teknologi Malaysia (UTM), Malaysia
[email protected], [email protected], {rabiah,zurainisma,azizah07}@ic.utm.my
Abstract. This research attempts to develop a fuzzy based threat analysis model in which linguistic variables, fuzzy numbers and fuzzy weighted averages are applied to deal with the uncertainty problem in evaluating potential threats in the Total Hospital Information System (THIS) environment. In the fuzzification process, the Triangular Average Number technique with two sets of membership functions is applied to evaluate the "likelihood" and "consequence" of THIS threat variables for a particular THIS asset. Then, the security threat levels are aggregated using the Efficient Fuzzy Weighted Average (EFWA) algorithm. Finally, the Best Fit Technique is used in the defuzzification process to translate a single fuzzy value into linguistic terms that indicate the overall security threat impact level on a THIS asset. To confirm the effectiveness of the adopted model, a prototype is developed and verified using the scenario method. Findings show that the model is capable of performing threat analysis with incomplete and uncertain information in the THIS environment.

Keywords: Total Hospital Information System (THIS), Risk Analysis, Threats, Information Security, Fuzzy logic.
Therefore, it can be stated that the prediction process of estimating the probability of threats and their consequences in the HIS environment is highly uncertain and crucial. Apparently, from the existing research, there is little work on fuzzy techniques for threat analysis, particularly in HIS. Based on the above gaps, the aim of this study is to assess and analyze threats in HIS by using a fuzzy logic approach. This study also investigates whether the fuzzy logic approach is applicable and capable of performing threat analysis in HIS. In order to verify the effectiveness of the threat analysis model with a fuzzy logic approach in HIS, a scenario is created based on the empirical study and data collected from THIS [1]. Furthermore, multi-expert opinion and judgment using the Delphi method is applied in the fuzzy threat analysis technique. This paper is organized into six sections. The next section describes previous research related to this study. Section 3 explains the method used in this research and Section 4 presents the results and analysis. Section 5 presents the discussion, followed by the conclusion in Section 6.
2 State of the Art

In this section, the risk analysis concept, its uncertainties and the available techniques are explored in general. Furthermore, several risk analysis studies in the context of information security are explored and discussed. The risk analysis model with a fuzzy approach is inspired by previous related work. Several works discuss improvements to fuzzy techniques and algorithms in theory, while others adopt existing fuzzy logic algorithms. These works are applied in various research areas such as information security, software development, network security and enterprise strategic risk assessment. A quantitative risk assessment method for information security is proposed in [4]. It is based on fuzzy number operations and addresses cases where sufficient data collection for security assessment is scarce or even impossible. A comprehensive fuzzy assessment is made using operation rules defined on triangular fuzzy numbers. The probability of information security events is obtained by evaluating the external cause (threat) and the internal cause (survivability) of information. Research has also been done on risk assessment in e-commerce development. A Fuzzy Decision Support System (FDSS) prototype using a fuzzy set approach was developed to assist e-commerce project managers and decision makers [5]. The prototype's function is to help evaluate a company's risk level and to provide an overall risk evaluation of E-Commerce (EC) development. In that research, empirical data is used in categorizing EC development risks and developing the FDSS prototype. The concept of relative membership is introduced into multi-objective fuzzy optimization theory in a proposed model of enterprise strategic risk management [6]. The goal of that study is to choose the best strategic project by using the new relative membership degrees matrix. Based on the related strategic management theory and empirical data, the model attempts to include all categories and processes that are necessary to directly assess corporate strategic risk.
Besides that, a novel Hierarchical Fuzzy Weighted Average (HFWA) method was developed to perform fuzzy risk assessment in network security risk analysis [7]. It is designed to help network managers and practitioners monitor security risk by calculating the overall risk using fuzzy set theory. Basically, it implements hierarchical security structures, and the Fuzzy Weighted Average (FWA) method is used to calculate the final risk value. Before the hierarchical structure is established, the different risk factors that threaten the successful operation and development of the network system are clearly identified according to the different analysis goals. Furthermore, a fuzzy-logic based threat modeling design has been proposed with a Mamdani-style fuzzy inference system incorporated in the MATLAB fuzzy logic tools [8]. The goal of this model is to perform qualitative risk analysis by identifying, quantifying and analyzing potential threats related to computer-based systems. The potential threats are based on empirical data related to six major threat categories (STRIDE: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service and Elevation of Privilege). Apparently, from the above work, most fuzzy techniques used in risk analysis are related to fuzzy multiple-attribute decision making. Fundamental concepts associated with fuzzy set theory as applied to decision systems are membership functions, linguistic variables, natural language computation, linguistic approximation, fuzzy set arithmetic operations, set operations and fuzzy weighted averages [9]. Currently available freeware or open source programming software, client-server and database tools are used. The Java NetBeans Integrated Development Environment (IDE) 6.5.1 and MySQL 5.0 Community Server (v5.0.27) are used for prototype development. Besides, one proprietary software program, Microsoft Excel 2007, is used for documenting the assessment results.
3 Method

The development of the case study is done by referring to an empirical study in which in-depth work investigating the various types of threats that exist in THIS was carried out [1]. In that study, a complete taxonomy of threat categories reveals twenty-two systems and twenty-two potential threat categories with fifty threat descriptions. From the literature review, it is noted that a challenge in this research is the interrelationship between threats and security threat impacts upon a THIS asset. This is due to the fact that the fuzzy logic approach has not been thoroughly researched in the THIS environment. For the development of the fuzzy logic threat analysis prototype, a suitable fuzzy logic model related to the information security field is adopted in this project. Based on the literature, most fuzzy logic work in risk analysis applies fuzzy multiple-attribute decision making. This process focuses on analyzing and interpreting threats in the HIS environment using a fuzzy approach. In developing the fuzzy threat analysis model, the risk analysis methodology is adapted and adopted from [10], [11], [12]. The fuzzy risk analysis model is adapted from [5], [7]. As shown in Fig. 1, the Fuzzy Threat Analysis model is constructed in six major steps. These steps are further discussed in the next subsections.
Fig. 1. Total Hospital Information System (THIS) Fuzzy Threat Analysis Model
3.1 Context Identification

The development of the case study is done by referring to the empirical study in which in-depth work investigating the various types of threats that exist in THIS was carried out [1]. In this study, a government-supported hospital in Peninsular Malaysia is used as the field setting. Based on the empirical study, the THIS system consists of twenty-two systems (assets) and twenty-two potential threat categories with fifty threat descriptions.

3.2 Identification of Threat Categories

In this step, the fuzzy risk analysis model from [5], [7] is adopted. As shown in Fig. 2, the THIS classification framework is broken down as follows:
Level 1 - the goal, the THIS security threats;
Level 2 - the categories of threats for evaluating the threat descriptions (factors);
Level 3 - the threat descriptions (factors) associated with THIS security threats.
Therefore, to identify the security threat impacts at each level, for each possible threat we evaluated its impact or consequence and the likelihood or probability that it would occur. Each threat description was given qualitative values for consequence and likelihood (i.e. Very Low, Low, Medium, High and Very High). As shown in Fig. 3, a simple hierarchical structure is designed to identify the security threat impact for "TC01 - Power failure / loss". There are three threat descriptions associated with "TC01", and the Fuzzy Weighted Average for the security threat impact value is calculated as follows:
(1)
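Equation (1) is referenced above but not reproduced; given the surrounding description, it presumably takes the standard fuzzy weighted average form, sketched here with assumed symbols (w_i for the fuzzy likelihood used as the weight and x_i for the fuzzy consequence of threat description T_i under TC01):

Security threat impact(TC01) = (w_1 ⊗ x_1 ⊕ w_2 ⊗ x_2 ⊕ w_3 ⊗ x_3) / (w_1 ⊕ w_2 ⊕ w_3)

where ⊕ and ⊗ denote fuzzy addition and multiplication on the triangular fuzzy numbers.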
Fig. 2. Hierarchical Structure of Security Threats to Total Hospital Information System
Fig. 3. Simple hierarchical structure to identify the security threats impact for “TC01 - Power failure / loss”
3.3 Natural Language Representation

In this step, as shown in Table 1, a fuzzy set representation is used for each linguistic term. Two membership functions are then defined, "Consequence" and "Likelihood", as depicted in Fig. 4 and Fig. 5. The weighting for each membership function is fixed. The scale definitions for "Likelihood", "Consequence" and "Security Threat Impact" are adopted from [11], [12]. The scale definitions for "Likelihood" and "Consequence" range from 0 to 5.

Table 1. Fuzzy set representation for each linguistic term (levels: Very Low, Low, Medium, High, Very High)

Fig. 4. Membership function of Consequence

Table 2. The membership function scale definition (Security Threat Impact / Description)
Insignificant: Acceptable.
Low: Can lead to acceptable risk. The system can be used with the identified threats, but the threats must be observed to discover changes that could raise the risk level.
Moderate: Can be an acceptable risk for this system, but for each case it should be considered whether necessary measures have been implemented.
High: Can lead to not acceptable risk. The system cannot be used before risk reducing treatment has been implemented.
Very High: Not acceptable risk. Can cause huge financial loss, and risk reduction needs to be implemented.
Fig. 5. Membership function of Likelihood
Table 2 illustrates the predefined scale levels and descriptions for the "Security Threat Impact" result. As depicted in Table 3, the predefined scale levels and descriptions for "Likelihood" (the probability of threat occurrence) and "Consequence" (the outcome to the system / asset value) are clearly determined.

Table 3. Impact of threat level for system definition

Likelihood (Probability of threat occurrence):
Very Low: Very rare or unlikely to occur. Assumes less than once in every 10 years.
Low: Rare. Once every 3 years.
Medium: May happen. Once in a year.
High: Quite often. Once every 4 months.
Very High: Very often. Once every month.

Consequence (Outcome to the system / asset value):
Very Low: Does not affect confidentiality, integrity and availability of the system.
Low: Short interruptions of availability of the system. No breach of confidentiality or integrity of the system.
Medium: Interruptions of availability of the system for a longer period. No breach of confidentiality or integrity of the system.
High: Partial breaches of information confidentiality, integrity and availability of the system.
Very High: Breaches of information confidentiality, integrity and availability which affect the system as a whole.
3.4 Fuzzy Assessment Aggregation

In this step, the Triangular Average Number is applied: the number n of evaluators is considered and the fuzzy average is used to obtain the mean. The fuzzy average value is obtained based on the "likelihood" and "consequence" ratings of each threat given by all identified evaluators.

3.5 Fuzzy Weighted Average Computation

After obtaining the fuzzy average for each sub-category (Tx) of a threat category (TCx), the Fuzzy Weighted Average (FWA) is calculated with the EFWA algorithm (note: x denotes a specific category for a particular main threat). This algorithm is applied in order to find:

(2)

Moreover, this step focuses on adopting and implementing the fuzzy algorithm in the fuzzy threat analysis model. The Fuzzy Weighted Average (FWA) using the EFWA algorithm is adopted from [13]. From the literature, this algorithm has been tested in the FDSS prototype [5], and that study also showed that the developed prototype is widely accepted by fuzzy set theory (FST) experts and EC practitioners. Moreover, the computational algorithm of EFWA is based on the value representation of fuzzy sets and interval analysis [13]. Besides that, this algorithm has also been applied in a context-awareness content gateway system, where [14] agreed that the EFWA technique provides the system with a comprehensible way of measuring the power of learning devices efficiently and delivering the proper learning style. Moreover, the EFWA algorithm is applied because it reduces the number of comparisons and arithmetic operations to O(n log n) rather than O(n^2), as is the case with the Improved Fuzzy Weighted Average (IFWA) algorithm [13]; hence, it seems applicable and relevant for calculating the potential security threat impact for a THIS asset.

3.6 Linguistic Approximation

In this step, the Euclidean distance formula (Best Fit Technique) is applied. As the results are fuzzy numbers, Euclidean distances are used to map the resultant fuzzy interval back to linguistic terms.

(3)
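Equations (2) and (3) are referenced but not shown; under the usual definitions of the fuzzy weighted average and of the best-fit distance they would read roughly as follows (symbols assumed: w_i and x_i are the fuzzy likelihood and consequence of sub-threat T_i within category TC_x; A = (a_1, a_2, a_3) is the resultant triangular fuzzy set and B = (b_1, b_2, b_3) one of the predefined linguistic fuzzy sets):

FWA(TC_x) = (w_1 ⊗ x_1 ⊕ ... ⊕ w_n ⊗ x_n) / (w_1 ⊕ ... ⊕ w_n)

d(A, B) = sqrt((a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2)

The linguistic term whose predefined fuzzy set B minimizes d(A, B) is reported as the security threat impact level.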
3.7 Prototype Architecture and Design

This prototype is a desktop application running in a Java environment. It allows the user to store and manage information on the THIS assets, the threat analysis expert team members and the identified potential threats in the THIS environment. In this project, the architectural design of the Fuzzy Threat Analysis prototype can be divided into three interrelated components: the user interface, the database, and the fuzzy threat analysis component. MySQL JDBC (Java Database Connectivity) over the TCP/IP network protocol is used to connect the MySQL database with the Java client program. The fuzzy threat analysis component is built in Java on the client side. It is called to access the necessary information from the database, such as the likelihood and consequence of each threat description, to perform fuzzy averaging, calculate the fuzzy weighted average and obtain the linguistic approximation. In this research study, the system prototype was tested on Microsoft Windows. MySQL 5.0.27 Community Server is used for the MySQL server and client. Fig. 6 illustrates the three interrelated components of the Total Hospital Information System (THIS) Fuzzy Threat Analysis prototype. An illustrative sketch of this processing pipeline is given after Fig. 6.
Fig. 6. Total Hospital Information System (THIS) Fuzzy Threat Analysis prototype architecture and design
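The paper describes the fuzzy threat analysis component only at the architectural level, so the following self-contained Java sketch is an assumption-laden illustration of that pipeline rather than the authors' implementation: it averages triangular fuzzy ratings from several evaluators, combines likelihood and consequence with a coarse vertex-wise approximation of the fuzzy weighted average (the paper uses the exact interval-based EFWA algorithm), and maps the result back to a linguistic term by Euclidean distance. The class, the method names and the membership values are all assumed.

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of the fuzzy threat analysis pipeline; not the paper's implementation. */
public class FuzzyThreatAnalysisSketch {

    /** Triangular fuzzy number with vertices (l, m, u). */
    record Tri(double l, double m, double u) {
        Tri plus(Tri o)     { return new Tri(l + o.l, m + o.m, u + o.u); }
        Tri scale(double k) { return new Tri(l * k, m * k, u * k); }
    }

    /** Assumed triangular membership functions for the five linguistic terms on the 0-5 scale. */
    static final Map<String, Tri> TERMS = new LinkedHashMap<>();
    static {
        TERMS.put("Very Low",  new Tri(0.00, 0.00, 1.25));
        TERMS.put("Low",       new Tri(0.00, 1.25, 2.50));
        TERMS.put("Medium",    new Tri(1.25, 2.50, 3.75));
        TERMS.put("High",      new Tri(2.50, 3.75, 5.00));
        TERMS.put("Very High", new Tri(3.75, 5.00, 5.00));
    }

    /** Step 3.4: triangular fuzzy average of the evaluators' ratings. */
    static Tri fuzzyAverage(List<Tri> ratings) {
        Tri sum = new Tri(0, 0, 0);
        for (Tri t : ratings) sum = sum.plus(t);
        return sum.scale(1.0 / ratings.size());
    }

    /** Step 3.5 (simplified): likelihood-weighted average of consequences, computed vertex by
     *  vertex. This is only a rough stand-in for the exact interval-based EFWA algorithm. */
    static Tri approxWeightedAverage(List<Tri> likelihoods, List<Tri> consequences) {
        double nl = 0, nm = 0, nu = 0, dl = 0, dm = 0, du = 0;
        for (int i = 0; i < likelihoods.size(); i++) {
            Tri w = likelihoods.get(i), x = consequences.get(i);
            nl += w.l() * x.l(); nm += w.m() * x.m(); nu += w.u() * x.u();
            dl += w.l();         dm += w.m();         du += w.u();
        }
        double eps = 1e-9; // guard against an all-zero vertex in the weights
        return new Tri(nl / Math.max(dl, eps), nm / Math.max(dm, eps), nu / Math.max(du, eps));
    }

    /** Step 3.6: best-fit linguistic approximation by Euclidean distance between vertices. */
    static String bestFit(Tri value) {
        String best = null;
        double bestDist = Double.MAX_VALUE;
        for (Map.Entry<String, Tri> e : TERMS.entrySet()) {
            Tri t = e.getValue();
            double d = Math.sqrt(Math.pow(value.l() - t.l(), 2)
                               + Math.pow(value.m() - t.m(), 2)
                               + Math.pow(value.u() - t.u(), 2));
            if (d < bestDist) { bestDist = d; best = e.getKey(); }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical votes from three evaluators for threat description T04 of category TC02.
        Tri t04Likelihood  = fuzzyAverage(List.of(TERMS.get("Low"), TERMS.get("Medium"), TERMS.get("Low")));
        Tri t04Consequence = fuzzyAverage(List.of(TERMS.get("High"), TERMS.get("Medium"), TERMS.get("Medium")));

        // A second, hypothetical threat description T05 rated directly as single terms.
        Tri t05Likelihood  = TERMS.get("Medium");
        Tri t05Consequence = TERMS.get("High");

        Tri impact = approxWeightedAverage(List.of(t04Likelihood, t05Likelihood),
                                           List.of(t04Consequence, t05Consequence));
        System.out.println("Security threat impact level: " + bestFit(impact));
    }
}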
4 Result

To verify the fuzzy threat analysis model, the Fuzzy Threat Analysis prototype was produced. Since the Picture Archiving Communication System (PACS) is one of the major systems in the THIS component, the overall PACS system characterization is crucial and can be described as follows:
i. The mission of the PACS, including the processes implemented by the system.
ii. The criticality of the PACS, determined by its value and the value of the data to the organization.
iii. The sensitivity of the PACS and its data.
As shown in Fig. 7, the initial observation signifies that the "overall security threat impact level" score for S1 - PACS in linguistic terms is "Moderate", and the closest Euclidean distance value is 0.8055. As defined earlier, "Moderate" indicates that the overall security threat impact level for S1 - PACS can be an acceptable risk. However, for each case it should be considered whether necessary measures have been implemented.
Fig. 7. Threat Assessment Result on PACS
Moreover, for each threat category, the security threat impact in linguistic terms with fuzzy values (resultant fuzzy set, defined fuzzy set) and the closest Euclidean distance value could also be further examined. As shown in Table 4, the description of each threat impact level for the THIS asset is based on Table 2. The PACS threat assessment result shows that none of the threat categories is rated Very High or Very Low. Only one (1) is High, ten (10) are Moderate and eleven (11) are Low, as listed in Table 4. This is the result of looking at only one system (S1 - PACS), where the overall security threat impact level and the level of each threat category (TC01 until TC22) are clearly stated. From this point, this result can be compiled into a report and presented to the risk analysis team or the hospital management. For instance, when the risk analysis team or the hospital management is presented with this information, they can view the result for each threat category (TC01 until TC22). Therefore, they can determine which threats could cause the greatest security threat impact to S1 - PACS and which threats should be addressed first. In this scenario, with in-depth examination, it seems that the staff (the S1 - PACS evaluators) feel that the acts of human error or failure threat (TC02) contributes the highest security threat impact to S1 - PACS, with a "High" score. The "High" security threat impact level of TC02 could lead S1 - PACS to a not acceptable risk. Users cannot start using S1 - PACS before risk reducing treatment has been implemented. The TC02 threat descriptions comprise several unwanted incidents, which are:
i. T04 - Entry of erroneous data by staff.
ii. T05 - Accidental deletion or modification of data by staff.
iii. T06 - Accidental misrouting by staff.
iv. T07 - Confidential information being sent to the wrong recipient.
v. T08 - Storage of data / classified information in unprotected areas by staff.

Table 4. PACS security threat impact in ranking (Security Threat Impact / Threat Categories)

High:
TC02 - Acts of Human Error or Failure

Moderate:
TC01 - Power failure/loss
TC03 - Technological Obsolescence
TC04 - Hardware failures or errors
TC05 - Software failures or errors
TC06 - Network Infrastructure failures or errors
TC07 - Deviations in quality of service
TC08 - Operational issues
TC09 - Malware attacks (Malicious virus, Worm, Trojan horses, Spyware and Adware)
TC16 - Technical failure
TC18 - Misuse of system resources

Low:
TC10 - Communications interception
TC11 - Masquerading
TC12 - Unauthorized use of a health information application
TC13 - Repudiation
TC14 - Communications infiltration
TC15 - Social Engineering attacks
TC17 - Deliberate acts of Theft (including theft of equipment or data)
TC19 - Staff shortage
TC20 - Wilful damages
TC21 - Environmental Support Failure/Natural disasters
TC22 - Terrorism
Although TC02 is rated at the "High" level, it can be stated that the TC04 and TC05 threat categories, rated at the "Moderate" level, might also contribute to this outcome. Moreover, "Moderate" indicates that the security threat impact level of TC04 and TC05 can be an acceptable risk. However, for each threat category it should be considered whether necessary measures have been implemented. For example, TC04 - Hardware failures or errors could cause spurious signals to be generated that are outside the range of inputs expected by the software. The software could then behave unpredictably. Moreover, TC05 - Software failures or errors might lead to unexpected system behavior that could confuse the staff (operator) and result in staff stress. The staff may then act incorrectly and choose inputs that are inappropriate for the current failure situation. These inputs could further confuse the system, and more errors would be generated. A single sub-system failure that is recoverable can thus rapidly develop into a serious problem requiring a complete S1 - PACS shutdown.
Therefore, effective control measures should be put in place and good practice among the staff must be exercised. Furthermore, in-depth analysis should be performed and appropriate controls should be put in place to reduce the security threat levels of TC01, TC03, TC06, TC07, TC08, TC09 and TC16, which are also labelled as "Moderate". Further analysis shows that the TC10, TC11, TC12, TC13, TC14, TC15, TC17, TC19, TC20, TC21 and TC22 threat categories are categorized as "Low". The "Low" security threat impact level of these threat categories could lead to an acceptable risk for S1 - PACS. In this situation, S1 - PACS can be used with the identified threats, but the threats must be observed to discover changes that could raise the risk level. With this threat analysis result, the risk analysis team or hospital management can make decisions and take further steps in the risk analysis process. As mentioned before, security concerns in the Total Hospital Information System (THIS) environment are related to loss of confidentiality, loss of integrity and loss of availability. Therefore, it is vital to ensure that THIS resources are appropriately taken care of and that patients' health information, privacy and safety are securely protected. However, the further steps in risk analysis and information security risk management are out of the scope of this research and will not be discussed in detail.
5 Discussion

The course of conducting this study provided several steps that are applicable and significant for further research. Several achievements have been made, particularly in the design and implementation of a fuzzy threat analysis prototype in the healthcare information system (HIS) environment. Therefore, it can be stated that the main contribution of this study is the proposed fuzzy threat analysis model and the prototype that has been developed. Such a model has not been applied before in HIS, and we have tried to produce significant results. Verification based on the scenario method showed that the adopted fuzzy threat analysis model can be realized by using an appropriate fuzzy threat analysis technique. Besides, one of the main benefits for the organization is that, during the threat assessment process, the involvement of multiple experts in the team evaluation makes the analysis result more accurate and reliable. Moreover, the essence of fuzzy logic in using linguistic representations, which are close to human judgment, also makes the prototype easy to use. The fuzzy set theory, which allows ordinal values, gives a more reliable result than conventional risk assessment methods based on statistics. However, this study has some limitations which need further consideration in order to make its results more acceptable; therefore, several possible future works in the same research direction are pointed out. The first limitation of this study is that the verification of the fuzzy threat analysis model has only been performed on one THIS information system. PACS was selected as it is one of the major components in THIS. Thus, the results given in this study are based on only one THIS system component. Hence, attempts to generalize these results must be made with caution. Therefore, in order to gain more accurate results, it is recommended that in future research all twenty-two (22) systems in THIS should be included and tested with this model. As a result, a complete risk analysis process can be performed and the risk levels of all twenty-two (22) THIS information systems can be determined. Secondly, this model uses a fixed weighted average on the "likelihood" and "consequence" membership functions, where it is assumed that the "weighting" assigned by each evaluator in the risk evaluation is the same. However, the relative importance placed on certain factors by individual decision makers and experts could be widely different. Therefore, it is recommended that further research be carried out to develop different weightings for different evaluators. Thirdly, the proposed prototype focuses only on the fuzzy threat analysis engine in the THIS environment. Less effort has been put into the screen design, the analysis report and user-friendliness. Therefore, it is recommended that the user interface design be improved and more features added before the prototype is implemented in a real THIS environment, so that it can be easily used by the THIS evaluation team members. Rapid prototyping with end-user involvement can be executed to improve this prototype [15]. Finally, in the future, risk analysis using the fuzzy technique can be developed using the algorithm of [4] and implemented in the THIS environment.
6 Conclusion

In future, this study can be used to produce threat analysis tools, particularly in HIS, which can be beneficial to healthcare professionals, top management, policy makers and risk analysis personnel, particularly in the healthcare industry.

Acknowledgments. We gratefully acknowledge the funding received from the Ministry of Science, Technology and Innovation (MOSTI) that helped sponsor this study, and we also give sincere thanks for the cooperation of the Ministry of Health Malaysia, Hospital Selayang and Universiti Teknologi Malaysia (UTM).
References

1. Narayana Samy, G., Ahmad, R., Ismail, Z.: Security Threats Categories in Healthcare Information Systems. In: 14th International Symposium on Health Information Management Research, Sweden, pp. 109–117 (2009)
2. Maglogiannis, I., Zafiropoulos, E.: Modeling risk in distributed healthcare information systems. In: 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 5447–5450. IEEE Press, New York (2006)
3. Ahmad, R., Narayana Samy, G., Bath, P.A., Ismail, Z., Ibrahim, N.Z.: Threats Identification in Healthcare Information Systems using Genetic Algorithm and Cox Regression. In: 5th International Conference on Information Assurance and Security, pp. 757–760. IEEE Computer Society, China (2009)
4. Fu, Y., Qin, Y., Wu, X.: A method of information security risk assessment using fuzzy number operations. In: 4th International Conference on Wireless Communications, Networking and Mobile Computing. IEEE, China (2008)
5. Ngai, E.W.T., Wat, F.K.T.: Fuzzy Decision Support System for Risk Analysis in E-Commerce Development. Decision Support Systems 40(2), 235–255 (2005)
6. Pan, C., Cai, X.: A Model of Enterprise Strategic Risk Assessment: Based on the Theory of Multi-Objective Fuzzy Optimization. In: 4th International Conference on Wireless Communications, Networking and Mobile Computing. IEEE, China (2008)
7. Liao, Y., Ma, C., Zhang, C.: A New Fuzzy Risk Assessment Method for the Network Security Based on Fuzzy Similarity Measure. In: The Sixth World Congress on Intelligent Control and Automation, pp. 8486–8490. IEEE, China (2006)
8. Issues in Informing Science and Information Technology, http://proceedings.informingscience.org/InSITE2007/IISITv4p053-061Sodi261.pdf
9. Zimmermann, H.J.: Fuzzy Sets, Decision Making and Expert Systems. Kluwer Academic Publishers, USA (1987)
10. International Organization for Standardization: ISO/IEC 27005: Information Security Risk Management Standard. ISO Publication, London (2008)
11. Council of Standards Australia: AS/NZS 4360:1999 Australian Standard Risk Management. Standards Association of Australia, NSW (1999)
12. Bones, E., Hasvold, P., Henriksen, E., Strandenaes, T.: Risk analysis of information security in a mobile instant messaging and presence system for healthcare. International Journal of Medical Informatics 76, 677–687 (2007)
13. Lee, D.H., Park, D.: An efficient algorithm for fuzzy weighted average. Fuzzy Sets and Systems 87(1), 39–45 (1997)
14. Huang, Y.M., Kuo, Y.H., Lin, Y.T., Cheng, S.C.: Toward interactive mobile synchronous learning environment with context-awareness service. Computers & Education 51(3), 1205–1226 (2008)
15. Sommerville, I.: Software Engineering. Pearson Education Limited, England (2007)
An ID-Based Anonymous Signcryption Scheme for Multiple Receivers Secure in the Standard Model

Bo Zhang and Qiuliang Xu

School of Computer Science and Technology, Shandong University, 250101, Jinan, Shandong, P.R. China
[email protected], [email protected]
Abstract. Anonymous signcryption is a novel cryptographic primitive which provides anonymity of the sender along with the advantages of a traditional signcryption scheme. In this paper, we propose an anonymous identity-based signcryption scheme for multiple receivers in the standard model. The proposed scheme satisfies semantic security, unforgeability and ambiguity of the signcrypter's identity. We also give a formal security proof of its semantic security under the hardness of the Decisional Bilinear Diffie-Hellman problem and of its unforgeability under the Computational Diffie-Hellman assumption.

Keywords: Signcryption, identity based cryptography, multi-receiver, anonymous signcryption.
1 Introduction
Encryption and signature are basic cryptographic tools to achieve privacy and authenticity. In 1997, Zheng [1] proposed the notion of signcryption, which can perform digital signature and public key encryption simultaneously at lower computational cost and communication overhead than the sign-then-encrypt approach, in order to obtain private and authenticated communication over an open channel. Identity-based (ID-based) cryptosystems were introduced by Shamir [2] in 1984. The main idea is that the public key of a user can be easily derived from an arbitrary string corresponding to his identity information, such as a name, telephone number or email address. The corresponding private key can only be derived by a trusted Private Key Generator (PKG). By combining ID-based cryptography and signcryption, Malone-Lee [3] gave the first ID-based signcryption scheme. Since then, quite a few ID-based signcryption schemes [4,5,6,7,8] have been proposed. In some network applications, we have to distribute the same message to several different persons. A simple approach to achieving this goal is for the sender to encrypt the message for each person separately. Obviously, the cost of using this approach in a large group is very high.
This work is supported by the National Natural Science Foundation of China under Grant No.60873232.
Consider a scenario like this: suppose Bob is a cabinet member who wants to leak a very important piece of information to the public. The fastest and most convenient way is to leak the information to several different journalists at the same time (in case some of them have been corrupted). Bob wants to remain anonymous, but needs to convince these journalists that the information actually came from a cabinet member. At the same time, the information should not be leaked until most of the journalists receive it. Thus, we need anonymity and authentication of Bob, and confidentiality of the information before it reaches the honest journalists. All of these properties are achieved together by a primitive called "Anonymous Signcryption for Multiple Receivers". Anonymous signcryption, or ring signcryption, is a novel cryptographic primitive motivated by ring signatures [9]. It is an important method to realize the ambiguity of the signcrypter's identity. The receiver in an anonymous signcryption scheme only knows that the message was produced by one member of a designated group, but he cannot learn more information about the actual signcrypter's identity. Huang et al. [10] proposed the first ID-based ring signcryption scheme along with a security model. Some more ID-based ring signcryption schemes are reported in [11,12,13]. In 2006, Duan et al. [14] gave the first multi-receiver ID-based signcryption scheme, which only needs one pairing computation to signcrypt a message for n receivers, and in 2009, Sunder Lal et al. [15] proposed the first anonymous ID-based signcryption scheme for multiple receivers. The security of that scheme was proven in the random oracle model [16]. Although this model is efficient and useful, it has been shown that when random oracles are instantiated with concrete hash functions, the resulting scheme may not be secure [17]. Therefore, it is an important research problem to construct an ID-based anonymous signcryption scheme secure in the standard model.

Our contribution. In this paper, we give the first ID-based anonymous signcryption scheme for multiple receivers in the standard model. The proposed scheme satisfies semantic security, unforgeability and ambiguity of the signcrypter's identity. We also give a formal security proof of its semantic security under the hardness of the Decisional Bilinear Diffie-Hellman problem and of its unforgeability under the Computational Diffie-Hellman assumption.
2 Preliminaries
Let G and GT be two cyclic multiplicative groups of prime order p and let g be a generator of G.

2.1 Bilinear Pairings
The map e : G × G → GT is said to be an admissible bilinear pairing if the following conditions hold: (1) e is bilinear, i.e. e(g^a, g^b) = e(g, g)^{ab} for all a, b ∈ Zp; (2) e is non-degenerate, i.e. e(g, g) ≠ 1_GT; (3) e is efficiently computable. We refer the reader to [18] for more details on the construction of such pairings.
2.2 Complexity Assumptions
Decisional Bilinear Diffie-Hellman (DBDH) Assumption. The challenger chooses a, b, c, z ∈ Zp at random and then flips a fair binary coin β. If β = 1 it outputs the tuple (g, A = g^a, B = g^b, C = g^c, Z = e(g, g)^{abc}). Otherwise, if β = 0, the challenger outputs the tuple (g, A = g^a, B = g^b, C = g^c, Z = e(g, g)^z). The adversary must then output a guess β′ of β. An adversary λ has at least an ε advantage in solving the decisional BDH problem if

|Pr[λ(g, g^a, g^b, g^c, e(g, g)^{abc}) = 1] − Pr[λ(g, g^a, g^b, g^c, e(g, g)^z) = 1]| ≥ ε

where the probability is over the randomly chosen a, b, c, z and the random bits consumed by λ.

Definition 1. The DBDH assumption holds if no adversary has at least an ε advantage in solving the above game.

Computational Diffie-Hellman (CDH) Assumption. The challenger chooses a, b ∈ Zp at random and outputs (g, A = g^a, B = g^b). The adversary then attempts to output g^{ab} ∈ G. An adversary λ has at least an ε advantage if Pr[λ(g, g^a, g^b) = g^{ab}] ≥ ε, where the probability is over the randomly chosen a, b and the random bits consumed by λ.

Definition 2. The CDH assumption holds if no adversary has at least an ε advantage in solving the above game.
3 ID-Based Anonymous Signcryption Scheme for Multiple Receivers (IASCfMR Scheme)

3.1 Generic Scheme
An IASCfMR scheme consists of the following algorithms; a sketch of how they could be exposed as an API follows this list.

Setup: Given a security parameter k, the PKG generates a master key S and common parameters P. P is made public while S is kept secret.

Extract: Given an identity IDu, the PKG runs this algorithm to generate the private key du associated with IDu and transmits it to the user via a secure channel.

Signcrypt: To send a message m anonymously to n′ receivers with identities L′ = {ID′1, ..., ID′n′}, the actual signcrypter with identity IDs selects a group of n users' identities L = {ID1, ..., IDn} including himself and obtains a ciphertext σ by running Signcrypt(m, ds, L, L′).

Unsigncrypt: Upon receiving the ciphertext σ, the receiver with identity ID′j in the receiver list L′ = {ID′1, ..., ID′n′} runs Unsigncrypt(σ, dj, L, L′) and obtains the message m or the symbol ⊥, indicating that the ciphertext is invalid.
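The paper defines these four algorithms mathematically rather than as code; purely as a reading aid, here is an assumed Java sketch of the same interface, with byte arrays standing in for group elements, keys and ciphertexts (all names are illustrative, not from the paper).

import java.util.List;

/** Assumed API sketch of the generic IASCfMR scheme; not from the paper. */
public interface IASCfMRScheme {

    /** Output of Setup: public parameters P and the master secret S. */
    record MasterKeys(byte[] publicParams, byte[] masterSecret) {}

    /** Setup(k): generate common parameters P and master key S for security parameter k. */
    MasterKeys setup(int securityParameter);

    /** Extract(ID): the PKG derives the private key d_u for an identity using the master secret. */
    byte[] extract(String identity, byte[] masterSecret);

    /**
     * Signcrypt(m, d_s, L, L'): the actual signer, hidden inside the signer list L,
     * produces one ciphertext for all receivers in L'.
     */
    byte[] signcrypt(byte[] message, byte[] signerPrivateKey,
                     List<String> signerList, List<String> receiverList);

    /**
     * Unsigncrypt(sigma, d_j, L, L'): a receiver in L' recovers m, or null if the
     * ciphertext is invalid (the paper's symbol ⊥).
     */
    byte[] unsigncrypt(byte[] ciphertext, byte[] receiverPrivateKey,
                       List<String> signerList, List<String> receiverList);
}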
3.2 Security Notions
Now we present the security notions for our IASCfMR scheme.

Definition 3. (Signcrypter identity's ambiguity) An IASCfMR scheme is unconditionally anonymous if, for any group of n members with identities in the signer list L, the probability of any adversary identifying the actual signcrypter is no better than a random guess, i.e. the adversary outputs the identity of the actual signcrypter with probability at most 1/n if he is not a member of L, and with probability at most 1/(n − 1) if he is a member of L.

Definition 4. (Semantic security) An IASCfMR scheme is said to have indistinguishability against adaptive chosen ciphertext attacks (IND-IASCfMR-CCA2) if no polynomially bounded adversary has a non-negligible advantage in the following game.

Setup: The challenger C runs the Setup algorithm with a security parameter k and obtains the common parameters P and a master key S. He sends P to the adversary and keeps S secret.

First stage: The adversary performs a polynomially bounded number of queries. These queries may be made adaptively, i.e. each query may depend on the answers to the previous queries.
Extraction queries: The adversary requests the private key of an identity IDu and receives the extracted private key du = Extract(IDu).
Signcryption queries: The adversary produces a signer list L = {ID1, ..., IDn}, a receiver list L′ = {ID′1, ..., ID′n′} and a plaintext m (note that the adversary should not have asked for the private keys corresponding to the identities in the receiver list). C picks an index i ∈ {1, ..., n} at random, computes di = Extract(IDi) and σ = Signcrypt(m, di, L, L′), and sends σ to the adversary.
Unsigncryption queries: The adversary produces a signer list L = {ID1, ..., IDn}, a receiver list L′ = {ID′1, ..., ID′n′} and a ciphertext σ. C picks an index i ∈ {1, ..., n′} at random, computes d′i = Extract(ID′i) and sends the result of Unsigncrypt(σ, d′i, L, L′) to the adversary. This result may be the symbol ⊥ if σ is an invalid ciphertext.

Challenge: The adversary chooses two plaintexts m0 and m1, a signer list L = {ID1, ..., IDn} and a receiver list L′ = {ID′1, ..., ID′n′} on which he wishes to be challenged. He cannot have asked for the private keys corresponding to the identities in the receiver list in the first stage. C chooses a random bit γ, picks an index i ∈ {1, ..., n} at random, computes di = Extract(IDi) and σ = Signcrypt(mγ, di, L, L′), and sends σ to the adversary.

Second stage: The adversary asks a polynomial number of queries adaptively again, as in the first stage. He is not allowed to extract the private keys corresponding to the identities in the receiver list, and he is not allowed to make an unsigncryption query on σ under the challenge receiver list.

Guess: Finally, the adversary produces a bit γ′ and wins the game if γ′ = γ.
Definition 5. (Unforgeability) An IASCfMR scheme is said to be secure against existential forgery under adaptive chosen message attacks (EUF-IASCfMR-CMA) if no polynomially bounded adversary has a non-negligible advantage in the following game.

Setup: The challenger C runs the Setup algorithm with a security parameter k and obtains the common parameters P and a master key S. He sends P to the adversary and keeps S secret.

Queries: The adversary performs a polynomially bounded number of queries adaptively, just as in the previous definition.

Forgery: Finally, the adversary produces a new triple (σ, L, L′) (i.e. a triple that was not produced by the signcryption oracle) for which none of the private keys of the signers in the signer list has been asked. The adversary wins the game if the result of Unsigncrypt(σ, L, L′) is a valid message m and (m, L) has never been asked.
4 The Concrete Scheme
In this section, we describe our IASCfMR scheme. Our concrete scheme is motivated by Waters' ID-based encryption scheme [19] and the signature schemes in [20,21].

Setup: Choose groups G and GT of prime order p such that an admissible pairing e : G × G → GT can be constructed, and pick a generator g of G. Now, pick a random secret α ∈ Zp, compute g1 = g^α and pick g2 ←R G. Furthermore, pick elements u′, m′ ←R G and vectors VU, VM of length nu and nm, respectively, whose entries are random elements from G. Let H, Hu, Hm be cryptographic hash functions H : GT → {0, 1}^lt, Hu : {0, 1}* → {0, 1}^nu, Hm : {0, 1}^lt × {0, 1}* × GT → {0, 1}^nm, where lt is the length of the plaintext. The public parameters are P = (G, GT, e, g, g1, g2, u′, VU, m′, VM, H, Hu, Hm) and the master secret S is g2^α.

Extract: Let U be a bit string of length nu representing an identity and let U[i] be the i-th bit of U. Define U ⊂ {1, ..., nu} to be the set of indices i such that U[i] = 1. To construct the private key du of the identity U, pick ru ← Zp and compute:

du = (g2^α (u′ ∏_{i∈U} ui)^{ru}, g^{ru})
Signcrypt: Let L = {ID1, ID2, ..., IDn} be the list of n identities including the one of the actual signer, let L′ = {ID′1, ID′2, ..., ID′n′} be the receiver list and let m be a bit string representing a message. Let the actual signer be indexed s, where s ∈ {1, 2, ..., n}, with private key

ds = (ds1, ds2) = (g2^α (u′ ∏_{j∈Us} uj)^r, g^r)

He selects a group of n users' identities L = {ID1, ID2, ..., IDn} including himself, picks r1, r2, ..., rn, rm ∈ Zp randomly, computes Uj = u′ ∏_{i∈Uj} ui (for j = 1, 2, ..., n) and U′j = u′ ∏_{i∈U′j} ui (for j = 1, 2, ..., n′), and follows the steps below:
(1) Compute ω = e(g1, g2)^{rm}.
(2) Compute c = m ⊕ H(ω).
(3) Compute σ1 = {R1 = g^{r1}, ..., R_{s−1} = g^{r_{s−1}}, Rs = g^{rs} · ds2 = g^{rs + r}, R_{s+1} = g^{r_{s+1}}, ..., Rn = g^{rn}}.
(4) Compute σ2 = {R′j = (U′j)^{rm} | j = 1, 2, ..., n′}.
(5) Compute σ3 = g^{rm}.
(6) Compute M = Hm(m, L, ω) and σ4 = ds1 · (∏_{j=1}^{n} (Uj)^{rj}) (m′ ∏_{j∈M} mj)^{rm}, where M ⊂ {1, 2, ..., nm} is the set of indices j such that the j-th bit of M is 1.

The resultant ciphertext is σ = (c, σ1, σ2, σ3, σ4, L).

Unsigncrypt: The receiver with index j in L′ decrypts the ciphertext as follows:
(1) Compute ω = e(dj1, σ3) / e(dj2, R′j).
(2) Compute m = c ⊕ H(ω).
(3) Compute M = Hm(m, L, ω).
The receiver accepts the message if and only if the following equality holds:

e(σ4, g) = e(g1, g2) (∏_{j=1}^{n} e(Uj, Rj)) e(m′ ∏_{j∈M} mj, σ3)
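The reason step (1) of Unsigncrypt recovers ω is not spelled out; it follows directly from the structure of the receiver's key dj = (dj1, dj2) = (g2^α (U′j)^{ru}, g^{ru}) and of the ciphertext, as the following short check (using the notation above) shows:

e(dj1, σ3) / e(dj2, R′j)
  = e(g2^α (U′j)^{ru}, g^{rm}) / e(g^{ru}, (U′j)^{rm})
  = e(g2^α, g^{rm}) · e((U′j)^{ru}, g^{rm}) / e((U′j)^{ru}, g^{rm})
  = e(g2, g)^{α rm} = e(g1, g2)^{rm} = ω.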
5 Analysis of the Scheme

5.1 Correctness
Analysis of the Scheme Correctness
The correctness of the scheme can be directly verified by the following equations. e(σ4 , g) = e(ds1 · (
n
(Uj )rj )(m
mj )rm , g)
j∈M
j=1
= e(g2α Usr , g)e(
n
j=1
= e(g2α , g)e( = e(g1 , g2 )(
n
j=1,j=s n
= e(g1 , g2 )(
j=1,j=s n j=1
mj )rm , g)
j∈M
(Uj )rj · Usr , g)e((m
j=1 n
= e(g1 , g2 )(
(Uj )rj , g)e((m
mj )rm , g)
j∈M
e(Uj , Rj )) · e(Usr+rs , g)e(m
mj , σ3 )
j∈M
e(Uj , Rj )) · e(Us , Rs )e(m
e(Uj , Rj ))e(m
j∈M
j∈M
mj , σ3 )
mj , σ3 )
5.2 Security
Theorem 1. The proposed IASCfMR scheme is unconditional anonymous. Proof. We have to show that given a signcryption ciphertext on the message m produced by a member in the signcrypter list L = {ID1 , ID2 , ..., IDn } , anyone is not able to identify the actual signcrypter except the real signcrypter himself. To show our scheme satisfies unconditional anonymous, we only prove that anyone in the signcrypter list can produce the same ciphertext on the message m. We assume there are two signers A and B with identities IDi and IDj (i, j ∈ {1, 2, ..., n}) whose private keys are dA = (dA1 , dA2 ) = (g2α (u uj )rA , g rA ) j∈UA
and
$$d_B = (d_{B1}, d_{B2}) = \Big(g_2^{\alpha}\big(u' \prod_{j \in \mathcal{U}_B} u_j\big)^{r_B},\; g^{r_B}\Big).$$
We know that, to produce a signcryption ciphertext on the message $m$, $A$ picks $r_1, r_2, \ldots, r_i, \ldots, r_j, \ldots, r_n, r_m \in Z_p$ randomly and computes as follows:
(1) Compute $\omega = e(g_1, g_2)^{r_m}$.
(2) Compute $c = m \oplus H(\omega)$.
(3) Compute $\sigma_1 = \{R_1 = g^{r_1}, \ldots, R_{i-1} = g^{r_{i-1}},\, R_i = g^{r_i}\cdot d_{A2},\, R_{i+1} = g^{r_{i+1}}, \ldots, R_n = g^{r_n}\}$.
(4) Compute $\sigma_2 = \{R'_s = (U'_s)^{r_m} \mid s = 1, 2, \ldots, n'\}$.
(5) Compute $\sigma_3 = g^{r_m}$.
(6) Compute $\sigma_4 = d_{A1}\cdot\big(\prod_{j=1}^{n}(U_j)^{r_j}\big)\big(m'\prod_{j \in \mathcal{M}} m_j\big)^{r_m}$.

In the following, it is shown that there exist random numbers $r'_1, \ldots, r'_n, r'_m \in Z_p$ by which $B$ can produce the same signcryption ciphertext. The random numbers chosen by $B$ are $r'_1 = r_1, \ldots, r'_i = r_i + r_A, \ldots, r'_j = r_j - r_B, \ldots, r'_n = r_n$, $r'_m = r_m$. Then $B$ could produce the signcryption ciphertext as
(1) Compute $\omega = e(g_1, g_2)^{r'_m}$.
(2) Compute $c = m \oplus H(\omega)$.
(3) Compute $\sigma_1 = \{R_1 = g^{r'_1}, \ldots, R_i = g^{r'_i}, \ldots, R_j = g^{r'_j}\cdot d_{B2}, \ldots, R_n = g^{r'_n}\}$.
(4) Compute $\sigma_2 = \{R'_s = (U'_s)^{r'_m} \mid s = 1, 2, \ldots, n'\}$.
(5) Compute $\sigma_3 = g^{r'_m}$.
(6) Compute $\sigma_4 = d_{B1}\cdot\big(\prod_{j=1}^{n}(U_j)^{r'_j}\big)\big(m'\prod_{j \in \mathcal{M}} m_j\big)^{r'_m} = d_{A1}\cdot\big(\prod_{j=1}^{n}(U_j)^{r_j}\big)\big(m'\prod_{j \in \mathcal{M}} m_j\big)^{r_m}$.
Obviously, the signcryption ciphertext generated by $B$ is the same as the ciphertext generated by $A$. In other words, given $\sigma = (c, \sigma_1, \sigma_2, \sigma_3, \sigma_4, L)$ on the message $m$,
all of the signers in $L$ can produce it. So, our IASCfMR scheme is unconditionally anonymous. The probability of any adversary identifying the actual signcrypter is no better than a random guess, i.e., the adversary outputs the identity of the actual signcrypter with probability $1/n$ if he is not a member of $L$, and with probability $1/(n-1)$ if he is a member of $L$.

Theorem 2. Assume there is an IND-IASCfMR-CCA2 adversary that is able to distinguish two valid ciphertexts during the game defined in Definition 4 with an advantage $\epsilon$, asking at most $q_E$ extraction queries, $q_S$ signcryption queries and $q_U$ unsigncryption queries. Then there exists a distinguisher $D$ that can solve an instance of the Decisional Bilinear Diffie-Hellman problem with advantage
$$\frac{\epsilon}{2^{n'+2}\big((q_E + q_S + q_U)(n_u + 1)\big)^{n'} q_S (n_m + 1)}.$$
Proof. Assume that the distinguisher $D$ receives a random DBDH problem instance $(g, A = g^a, B = g^b, C = g^c, Z \in G_T)$; his goal is to decide whether $Z = e(g,g)^{abc}$ or not. $D$ will run the adversary as a subroutine and act as the adversary's challenger in the IND-IASCfMR-CCA2 game. Our proof is based on Waters' idea, as in [19,20,21].

Setup: Let $l_u = 2(q_E + q_S + q_U)$ and $l_m = 2q_S$. $D$ chooses randomly:
(1) two integers $k_u$ and $k_m$ ($0 \le k_u \le n_u$, $0 \le k_m \le n_m$);
(2) an integer $x' \in Z_{l_u}$ and an $n_u$-dimensional vector $X = (x_i)$ ($x_i \in Z_{l_u}$);
(3) an integer $z' \in Z_{l_m}$ and an $n_m$-dimensional vector $Z = (z_j)$ ($z_j \in Z_{l_m}$);
(4) two integers $y', \omega' \in Z_p$, an $n_u$-length vector $Y = (y_i)$ ($y_i \in Z_p$) and an $n_m$-length vector $W = (\omega_j)$ ($\omega_j \in Z_p$).
For ease of analysis, we define the following functions for an identity $u$ and a message $m$, respectively:
$$F(U) = -l_u k_u + x' + \sum_{i \in \mathcal{U}} x_i,\qquad J(U) = y' + \sum_{i \in \mathcal{U}} y_i,$$
$$K(m) = -l_m k_m + z' + \sum_{j \in \mathcal{M}} z_j,\qquad L(m) = \omega' + \sum_{j \in \mathcal{M}} \omega_j.$$
Then the challenger assigns the public parameters as follows.
$$g_1 = g^a,\quad g_2 = g^b,\quad u' = g_2^{-l_u k_u + x'} g^{y'},\quad u_i = g_2^{x_i} g^{y_i}\ (1 \le i \le n_u),$$
$$m' = g_2^{-l_m k_m + z'} g^{\omega'},\quad m_j = g_2^{z_j} g^{\omega_j}\ (1 \le j \le n_m).$$
Note that these public parameters have the same distribution as in the game between the distinguisher $D$ and the adversary. For any identity $u$ and any message $m$, we have
$$u' \prod_{i \in \mathcal{U}} u_i = g_2^{F(u)} g^{J(u)},\qquad m' \prod_{j \in \mathcal{M}} m_j = g_2^{K(m)} g^{L(m)}.$$
First stage: $D$ answers the queries as follows:

Extract queries. When the adversary asks for the private key corresponding to an identity $U$, the distinguisher $D$ first checks whether $F(U) = 0$ and aborts in this situation. Otherwise, it chooses a random $r_u \in Z_p$ and gives the adversary the pair
$$d_u = (d_{u1}, d_{u2}) = \Big(g_1^{\frac{-J(u)}{F(u)}}\big(u' \prod_{i \in \mathcal{U}} u_i\big)^{r_u},\; g_1^{\frac{-1}{F(u)}} g^{r_u}\Big).$$
Let $r'_u = r_u - \frac{\alpha}{F(u)}$; as in Waters' proof [19] and Paterson's proof [20], and as we show in the following, $d_u$ is a valid private key for the identity $U$. The distinguisher $D$ can generate such a $d_u$ if and only if $F(U) \neq 0 \bmod l_u$. The simulation is perfect since
$$d_{u1} = g_1^{\frac{-J(u)}{F(u)}}\big(g_2^{F(u)} g^{J(u)}\big)^{r_u} = g_2^{\alpha}\big(g_2^{F(u)} g^{J(u)}\big)^{-\frac{\alpha}{F(u)}}\big(g_2^{F(u)} g^{J(u)}\big)^{r_u} = g_2^{\alpha}\big(g_2^{F(u)} g^{J(u)}\big)^{r'_u}$$
and
$$d_{u2} = g_1^{\frac{-1}{F(u)}} g^{r_u} = g^{r_u - \frac{\alpha}{F(u)}} = g^{r'_u}.$$

Signcryption queries. At any time, the adversary can perform a signcryption query for a signer list $L = \{ID_1, ID_2, \ldots, ID_n\}$, a receiver list $L' = \{ID'_1, ID'_2, \ldots, ID'_{n'}\}$ and a plaintext $m$. If $F(U_j) = 0 \bmod l_u$ for all $j \in [1, n]$, $D$ will simply abort. Otherwise, $D$ first chooses an identity $U_i$ with $F(U_i) \neq 0 \bmod l_u$, generates a private key $d_i$ for $U_i$ by calling the extract query algorithm described above, and then runs Signcrypt$(m, d_i, L, L')$ to answer the adversary's query.

Unsigncryption queries. At any time, the adversary can perform an unsigncryption query on a ciphertext $\sigma$ for a signer list $L = \{ID_1, ID_2, \ldots, ID_n\}$ and a receiver list $L' = \{ID'_1, ID'_2, \ldots, ID'_{n'}\}$. If $F(U'_j) = 0 \bmod l_u$ for all $j \in [1, n']$, $D$ will simply abort. Otherwise, $D$ first chooses an identity $U'_i$ with $F(U'_i) \neq 0 \bmod l_u$, generates a private key $d'_i$ for $U'_i$ by calling the extract query algorithm described above, and then runs Unsigncrypt$(\sigma, d'_i, L, L')$ to answer the adversary's query.
Challenge: After a polynomially bounded number of queries, the adversary chooses a signer list $L^* = \{ID_1^*, ID_2^*, \ldots, ID_n^*\}$ and a receiver list $L'^* = \{ID_1'^*, ID_2'^*, \ldots, ID_{n'}'^*\}$ on which he wishes to be challenged. Note that the adversary has not asked a key extraction query on any identity in $L'^*$ during the first stage. Then the adversary submits two messages $m_0, m_1 \in \{0,1\}^{l_t}$ to $D$. $D$ checks whether the following conditions are fulfilled:
(1) $F(u^*_j) = 0 \bmod l_u$ for all $j \in [1, n']$, where $u^*_j = H_u(ID_j'^*)$;
(2) $K(m^*) = 0 \bmod l_m$, where $m^* = H_m(m_\gamma, L^*, Z)$.
If not all of the above conditions are fulfilled, $D$ will abort. Otherwise, $D$ flips a fair binary coin $\gamma$ and constructs a signcryption ciphertext of $m_\gamma$ as follows. Let $m^*[j]$ denote the $j$-th bit of $m^*$ and let $\mathcal{M} \subset \{1, 2, \ldots, n_m\}$ be the set of indices $j$ such that $m^*[j] = 1$. $D$ chooses an identity $u^*_s$ with $F(u^*_s) \neq 0 \bmod l_u$, picks $r_1, r_2, \ldots, r_n \in_R Z_p$, and sets the ciphertext as
$$\sigma = \Big(m_\gamma \oplus H(Z),\; \{g^{r_1}, g^{r_2}, \ldots, g^{r_{s-1}},\, g^{r_s}\cdot g_1^{\frac{-1}{F(u^*_s)}},\, g^{r_{s+1}}, \ldots, g^{r_n}\},\; \{C^{J(u_i'^*)} \mid i = 1, 2, \ldots, n'\},\; C,\; g_1^{\frac{-J(u^*_s)}{F(u^*_s)}}\cdot\prod_{i=1}^{n}\big(g_2^{F(u^*_i)} g^{J(u^*_i)}\big)^{r_i}\cdot C^{L(m_\gamma)}\Big).$$
Let $Z = e(g,g)^{abc}$, $c = r_m$ and $C = g^c$. The simulation is perfect since
$$Z = e(g,g)^{abc} = e(g_1, g_2)^{r_m},\qquad C^{J(u_i'^*)} = (U_i'^*)^{r_m},$$
$$g_1^{\frac{-J(u^*_s)}{F(u^*_s)}}\cdot\prod_{i=1}^{n}\big(g_2^{F(u^*_i)} g^{J(u^*_i)}\big)^{r_i}\cdot C^{L(m_\gamma)} = d^*_{s1}\cdot\Big(\prod_{j=1}^{n}(U_j)^{r_j}\Big)\Big(m'\prod_{j \in \mathcal{M}} m_j\Big)^{r_m}.$$
Second stage: The adversary then performs a second series of queries, which are treated in the same way as in the first stage.

Guess: At the end of the simulation, the adversary outputs a guess $\gamma'$ of $\gamma$. If $\gamma' = \gamma$, $D$ answers 1, indicating that $Z = e(g,g)^{abc}$; otherwise, $D$ answers 0 to the DBDH problem.

Probability of success: Now we have to assess $D$'s probability of success. For the simulation to complete without aborting, we require the following conditions to be fulfilled:
(1) Extraction queries on an identity $ID$ have $F(u) \neq 0 \bmod l_u$, where $u = H_u(ID)$.
(2) Signcryption queries on a message $m$, a signer list $L$ and a receiver list $L'$ have $F(u_i) \neq 0 \bmod l_u$ for some $i \in [1, n]$, where $ID_i \in L$.
(3) Unsigncryption queries on a ciphertext $\sigma$, a signer list $L$ and a receiver list $L'$ have $F(u'_i) \neq 0 \bmod l_u$ for some $i \in [1, n']$, where $ID'_i \in L'$.
(4) $F(u^*_j) = 0 \bmod p$ for all $j \in [1, n']$, where $u^*_j = H_u(ID_j'^*)$, and $K(m^*) = 0 \bmod p$, where $m^* = H_m(m_\gamma, L'^*)$.

Let $u_1, u_2, \ldots, u_{q_I}$ be the outputs of the hash function $H_u$ appearing in queries not involving the challenge identity list $L'^*$. Clearly, we have $q_I \le q_E + q_S + q_U$. Define the events
$$A_i: F(u_i) \neq 0 \bmod l_u\ (i = 1, 2, \ldots, q_I),\qquad A': F(u^*_j) = 0 \bmod p \text{ for all } j \in [1, n'],$$
$$B^*: K(m^*) = 0 \bmod p, \text{ where } m^* = H_m(m_\gamma, L'^*).$$
Then the probability of $D$ not aborting satisfies $\Pr[\overline{\text{abort}}] \ge \Pr\big[\bigwedge_{i=1}^{q_I} A_i \wedge A' \wedge B^*\big]$. Since the functions $F$ and $K$ are selected independently, the events $(\bigwedge_{i=1}^{q_I} A_i \wedge A')$ and $B^*$ are independent. Assume $l_u(n_u + 1) < p$, which implies $0 \le l_u n_u < p$. It is easy to see that $F(u) = 0 \bmod p \Longrightarrow F(u) = 0 \bmod l_u$. Furthermore, this assumption implies that if $F(u) = 0 \bmod l_u$, there will be a
unique $k_u$ with $0 \le k_u \le n_u$ such that $F(u) = 0 \bmod p$. From the randomness of $k_u$, $x'$ and $X$, we have $\Pr[A'] = \big(\tfrac{1}{l_u(n_u+1)}\big)^{n'}$.
On the other hand, for any $i$, the events $A_i$ and $A'$ are independent, so we have
$$\Pr\Big[\bigwedge_{i=1}^{q_I} A_i \wedge A'\Big] = \Pr[A']\,\Pr\Big[\bigwedge_{i=1}^{q_I} A_i \,\Big|\, A'\Big] = \Pr[A']\Big(1 - \Pr\Big[\bigvee_{i=1}^{q_I} \neg A_i \,\Big|\, A'\Big]\Big) \ge \Pr[A']\Big(1 - \frac{q_I}{l_u}\Big) \ge \frac{1}{2}\Pr[A'].$$
Similarly, we have $\Pr[B^*] = \frac{1}{l_m(n_m+1)}$. By combining the above results, we have
$$\Pr[\overline{\text{abort}}] \ge \Pr\Big[\bigwedge_{i=1}^{q_I} A_i \wedge A' \wedge B^*\Big] \ge \frac{1}{2^{n'+2}\big((q_E + q_S + q_U)(n_u + 1)\big)^{n'} q_S(n_m + 1)}.$$
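To make the arithmetic behind the last inequality explicit, the individual bounds combine as follows; this is our own restatement of the standard Waters-style counting argument, using $l_u = 2(q_E+q_S+q_U)$, $l_m = 2q_S$ and $q_I \le q_E+q_S+q_U = l_u/2$:

```latex
\begin{align*}
\Pr[\overline{\mathrm{abort}}]
  &\ge \frac{1}{2}\,\Pr[A']\,\Pr[B^*]
   \ge \frac{1}{2}\Big(\frac{1}{l_u(n_u+1)}\Big)^{n'}\frac{1}{l_m(n_m+1)} \\
  &= \frac{1}{2}\Big(\frac{1}{2(q_E+q_S+q_U)(n_u+1)}\Big)^{n'}\frac{1}{2q_S(n_m+1)}
   = \frac{1}{2^{n'+2}\big((q_E+q_S+q_U)(n_u+1)\big)^{n'}q_S(n_m+1)}.
\end{align*}
```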
If the simulation does not abort, the adversary will win the game in Definition 4 with advantage at least $\epsilon$. Thus $D$ can solve the DBDH problem instance with advantage
$$\frac{\epsilon}{2^{n'+2}\big((q_E + q_S + q_U)(n_u + 1)\big)^{n'} q_S(n_m + 1)}.$$
Theorem 3. Under the CDH assumption, the proposed IASCfMR scheme is existentially unforgeable against adaptive chosen message attacks.

Proof. Assume that an EUF-IASCfMR-CMA forger for our scheme exists; we will construct a challenger $C$ who runs the forger as a subroutine to solve an instance of the CDH problem. $C$ is given a group $G$, a generator $g$ and elements $g^a$ and $g^b$. His goal is to compute $g^{ab}$. $C$ first sets the public parameters using the Setup algorithm described in the previous proof. Note that in the Setup phase, $C$ assigns $g_1 = g^a$ and $g_2 = g^b$. After $C$ defines the functions $F(u)$, $J(u)$, $K(m)$, $L(m)$ and the public parameters $u'$, $m'$, $u_i$, $m_j$, we have
$$u' \prod_{i \in \mathcal{U}} u_i = g_2^{F(u)} g^{J(u)},\qquad m' \prod_{j \in \mathcal{M}} m_j = g_2^{K(m)} g^{L(m)}.$$
Then, the forger can perform a polynomially bounded number of queries, including private key extraction queries, signcryption queries and unsigncryption queries. The challenger $C$ answers the forger in the same way as in the proof of Theorem 2. Finally, if $C$ does not abort, the forger will return a new ciphertext $\sigma^* = (c^*, \sigma_1^*, \sigma_2^*, \sigma_3^*, \sigma_4^*, L^*)$ on a message $m^*$, where $m^*$ has never been queried. Now, $C$ can unsigncrypt $\sigma^*$ and obtain $m^*$. $C$ checks whether the following conditions are fulfilled:
(1) $F(u^*_j) = 0 \bmod l_u$ for all $j \in [1, n]$, where $u^*_j = H_u(ID^*_j)$;
(2) $K(m^*) = 0 \bmod l_m$, where $m^* = H_m(m_\gamma, L^*)$.
If not all of the above conditions are fulfilled, $C$ will abort. Otherwise, $C$ computes and outputs
$$\frac{\sigma_4^*}{R_1^{J(u^*_1)} \cdots R_n^{J(u^*_n)}\, R_m^{L(m^*)}} = \frac{g_2^{\alpha}\prod_{i=1}^{n}(U_i)^{r_i}\cdot\big(m'\prod_{j \in \mathcal{M}^*} m_j\big)^{r_m}}{\prod_{i=1}^{n} g^{J(u^*_i) r_i}\cdot g^{L(m^*) r_m}} = \frac{g_2^{\alpha}\prod_{i=1}^{n}\big(g_2^{F(u^*_i)} g^{J(u^*_i)}\big)^{r_i}\cdot\big(m'\prod_{j \in \mathcal{M}^*} m_j\big)^{r_m}}{\prod_{i=1}^{n} g^{J(u^*_i) r_i}\cdot g^{L(m^*) r_m}} = g_2^{\alpha} = g^{ab}$$
as the solution to the given CDH problem.
6 Conclusions

We have proposed an IASCfMR scheme that satisfies semantic security, unforgeability and signcrypter identity ambiguity. To the best of our knowledge, this is the first IASCfMR scheme that can be proven secure in the standard model. As can be seen from the concrete scheme, the cost is linear in the size of the group. It remains an open problem to construct a more efficient scheme that is secure in the standard model, with constant-size signcryption ciphertext and without limitations on the size of the group.
References

1. Zheng, Y.: Digital signcryption or how to achieve cost(signature & encryption) ≪ cost(signature) + cost(encryption). In: Kaliski Jr., B.S. (ed.) CRYPTO 1997. LNCS, vol. 1294, pp. 165–179. Springer, Heidelberg (1997)
2. Shamir, A.: Identity-based cryptosystems and signature schemes. In: Blakely, G.R., Chaum, D. (eds.) CRYPTO 1984. LNCS, vol. 196, pp. 120–126. Springer, Heidelberg (1985)
3. Malone-Lee, J.: Identity based signcryption. Cryptology ePrint Archive, Report 2002/098
4. Libert, B., Quisquator, J.: A new identity based signcryption scheme from pairings. In: Proc. IW 2003, pp. 155–158 (2003) 5. Boyen, X.: Multipurpose identity based signcryption: a Swiss army knife for identity based cryptography. In: Boneh, D. (ed.) CRYPTO 2003. LNCS, vol. 2729, pp. 383– 399. Springer, Heidelberg (2003) 6. Chen, L., Malone-Lee, J.: Improved identity-based signcryption. In: Vaudenay, S. (ed.) PKC 2005. LNCS, vol. 3386, pp. 362–379. Springer, Heidelberg (2005) 7. Barreto, P., Libert, B., McCullagh, N., et al.: Efficient and provably-secure identity based signatures and signcryption from bilinear maps. In: Roy, B. (ed.) ASIACRYPT 2005. LNCS, vol. 3788, pp. 515–532. Springer, Heidelberg (2005) 8. Yu, Y., Yang, B., Sun, Y., et al.: Identity based signcryption scheme without random oracles. Computer Standards and Interfaces 31(1), 56–62 (2009) 9. Rivest, R., Shamir, A., Tauman, Y.: How to leak a secret. In: Boyd, C. (ed.) ASIACRYPT 2001. LNCS, vol. 2248, pp. 552–565. Springer, Heidelberg (2001) 10. Huang, X., Su, W., Mu, Y.: Identity-based ring signcryption scheme: cryptographic primitives for preserving privacy and authenticity in the ubiquitous world. In: Safavi-Naini, R., Seberry, J. (eds.) ACISP 2003. LNCS, vol. 2727, pp. 649–654. Springer, Heidelberg (2003) 11. Li, F., Xiong, H., Yu, Y.: An efficient id-based ring signcryption scheme. In: International conference on Communications, Circuits and Systems, ICCCAS 2008, pp. 483–487 (2008) 12. Zhu, Z., Zhang, Y., Wang, F.: An efficient and provable secure identity based ring signcryption scheme. Computer Standards and Interfaces, 649–654 (2008) 13. Zhang, J., Gao, S., Chen, H., et al.: A novel ID-based anonymous signcryption scheme. In: Li, Q., Feng, L., Pei, J., Wang, S.X., Zhou, X., Zhu, Q.-M. (eds.) APWeb/WAIM 2009. LNCS, vol. 5446, pp. 604–610. Springer, Heidelberg (2009) 14. Duan, S., Cao, Z.: Efficient and Provably Secure Multi-receiver Identity-based Signcryption. In: Batten, L.M., Safavi-Naini, R. (eds.) ACISP 2006. LNCS, vol. 4058, pp. 195–206. Springer, Heidelberg (2006) 15. Lal, S., Kushwah, P.: Anonymous ID Based Signcryption Scheme for Multiple Receivers. Cryptology ePrint Archive: Report 2009/345 (2009), http://eprint.iacr.org/2009/345 16. Bellare, M., Rogaway, P.: Random oracles are practical: a paradigm for designing efficient protocols. In: Proc. CCS 1993, pp. 62–73 (1993) 17. Canetti, R., Goldreich, O., Halevi, S.: The random oracle methodology, revisited (preliminary version). In: Proc. STOC 1998, pp. 209–218 (1998) 18. Boneh, D., Franklin, M.: Identity-based encryption from the Weil pairings. In: Kilian, J. (ed.) CRYPTO 2001. LNCS, vol. 2139, pp. 213–229. Springer, Heidelberg (2001) 19. Waters, R.: Efficient identity based encryption without random oracles. In: Cramer, R. (ed.) EUROCRYPT 2005. LNCS, vol. 3494, pp. 114–127. Springer, Heidelberg (2005) 20. Paterson, K., Schuldt, J.: Efficient identity based signatures secure in the standard model. In: Batten, L.M., Safavi-Naini, R. (eds.) ACISP 2006. LNCS, vol. 4058, pp. 207–222. Springer, Heidelberg (2006) 21. Au, M., Liu, J., Yuen, T., et al.: ID-Based ring signature scheme secure in the standard model. In: Yoshiura, H., Sakurai, K., Rannenberg, K., Murayama, Y., Kawamura, S.-i. (eds.) IWSEC 2006. LNCS, vol. 4266, pp. 1–16. Springer, Heidelberg (2006)
A Supervised Locality Preserving Projections Based Local Matching Algorithm for Face Recognition* Yingqi Lu1, Cheng Lu1, Miao Qi2, and Shuyan Wang2,** 1
School of Computer Science and Technology, Jilin University, China 2 School of Computer Science and Information Technology, Northeast Normal University, China [email protected]
Abstract. In this paper, a novel local matching algorithm based on supervised locality preserving projections (LM-SLPP) is proposed for human face recognition. Unlike the holistic face recognition methods, which operate directly on the whole face image and obtain global face features, the proposed LM-SLPP operates on sub-patterns partitioned from the original whole face image and separately extracts the corresponding local sub-features from them. In our method, the input face images are first divided into several sub-images. Then, supervised locality preserving projections is applied on each sub-image set for feature extraction. At last, the nearest neighbor classifier combined with majority voting is utilized to classify new face images. The efficiency of the proposed algorithm is demonstrated by experiments on the Yale and YaleB face databases. Experimental results show that LM-SLPP outperforms other holistic and sub-pattern based methods.

Keywords: Pattern recognition; Face recognition; Manifold learning; Supervised locality preserving projections.
categories: holistic based methods and local matching based methods [3]. Currently, the most representative holistic based methods for face recognition are principal component analysis (PCA) [4], Fisher linear discriminant analysis (LDA) [5], independent component analysis (ICA) [6], non-negative matrix factorization (NMF) [7] and locality preserving projection (LPP) [8]. Their common characteristic is that they operate directly on the whole face image and obtain global face features under different rules. More recently, the local matching based face recognition methods, which extract facial features from different levels of locality, have shown more promising results in face recognition tasks [3]. To the best of our knowledge, the first local matching based face recognition method was proposed by Pentland et al. [9]. In this method, the original eigenface [4] method was extended to a layered representation by combining it with other eigenmodules, such as eigeneyes, eigennoses, and eigenmouths. This modular eigenface approach was then studied and extended by several other researchers. In [10], Rajkiran and Vijayan proposed a modular PCA (mPCA) method for face recognition. mPCA first divides the input face images into smaller sub-images, and then extracts the sub-pattern features by applying PCA to all sub-image blocks. Chen and Zhu proposed a similar approach called sub-pattern PCA (SpPCA) [11]. In their method, the whole images were also first partitioned into a set of equally sized sub-patterns in a non-overlapping way, as in mPCA. Secondly, PCA was performed on each of the sub-pattern sets which share the same original feature components. In [12], the SpPCA method was extended to adaptively weighted sub-pattern PCA (Aw-SpPCA). In Aw-SpPCA, the weight of each sub-image block was determined by the similarities between the sub-pattern's probe set and gallery set. Besides PCA, some other feature extraction methods were also used for local matching based face recognition, such as Sub-Gabor [17], SpNMF [18] and LRR [19]. In [20], an adaptively weighted sub-pattern LPP (Aw-SpLPP) algorithm was proposed for face recognition. This method uses LPP to extract the local facial features, and the weight of each sub-image block is determined by the neighborhood information of each sub-pattern.

In this paper, a novel local matching algorithm based on supervised locality preserving projections (LM-SLPP) is proposed for human face recognition. Like the aforementioned local matching methods, the first step of LM-SLPP is to partition an original whole face image into a set of equally sized non-overlapping sub-patterns; all those sub-patterns sharing the same original feature components are then collected from the training set to compose a corresponding sub-pattern training set. In the second step, SLPP is applied to each sub-pattern training set to extract its features. Finally, each sub-pattern's features are combined to classify a new face image. Since SLPP can simultaneously preserve the manifold structures of the sub-pattern sets and improve the discriminability of the embedded results, the proposed LM-SLPP outperforms other holistic and local matching based methods, such as PCA, LPP and SpPCA. Here, it should be pointed out that the main difference between our method and Aw-SpLPP [20] is that LM-SLPP integrates the discriminative information into the feature extraction step and does not need to compute the weights of the sub-patterns.
The rest of this paper is organized as follows. In Section 2, we briefly review the LPP and supervised LPP (SLPP) algorithms. The proposed LM-SLPP method is presented in Section 3. Experimental and comparison results are shown in Section 4 and conclusions are given in Section 5.
2 Review of LPP and SLPP

Locality preserving projections (LPP) is a recently proposed dimensionality reduction method [8]. Unlike traditional linear methods such as PCA and LDA, which aim to preserve the global structure of the input data, the objective of LPP is to preserve the local structure and discover the underlying manifold geometry of the original high-dimensional data. Formally, let $X = [x_1, x_2, \ldots, x_n]$ denote $n$ data points in a high $M$-dimensional space. The goal of LPP is to project the high-dimensional data into a low-dimensional manifold subspace that maximally preserves the original data's locality. Let us denote the corresponding set of $n$ points in the $m$ ($m \ll M$) dimensional subspace as $Y = [y_1, y_2, \ldots, y_n]$. The objective function of LPP is as follows:
$$\min \sum_{i,j} (y_i - y_j)^2 S_{ij} \qquad (1)$$
where $S_{ij}$ is the similarity of $x_i$ and $x_j$. In [8], two ways of defining $S_{ij}$ using the heat kernel function were given as:
$$S_{ij} = \begin{cases} \exp\big(-\|x_i - x_j\|^2 / t\big), & \text{if } \|x_i - x_j\|^2 < \varepsilon \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$
or
$$S_{ij} = \begin{cases} \exp\big(-\|x_i - x_j\|^2 / t\big), & \text{if } x_i \text{ is among the } k \text{ nearest neighbors of } x_j \text{ or } x_j \text{ is among the } k \text{ nearest neighbors of } x_i \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
where t is a parameter which determines the rate of decay of the similarity function, and ε in Equation (2) is a small positive real number. From the objective function, it can be seen clearly that the choice of symmetric weights Sij (Sij = Sji) incurs a heavy penalty if neighboring points xi and xj are projected far apart. Thus, minimizing Equation (1) can ensure that if xi and xj are close in high-dimensional space, then their projected results yi and yj are close as well. We suppose W is a transformation matrix, that is, Y=WTX. After some simple algebraic steps, the objective function of LPP can be reduced to:
$$\begin{aligned}
\sum_{i,j}(y_i - y_j)^2 S_{ij} &= \sum_{i,j}(W^T x_i - W^T x_j)^2 S_{ij}\\
&= 2\Big(\sum_{i,j} W^T x_i S_{ij} x_i^T W - \sum_{i,j} W^T x_i S_{ij} x_j^T W\Big)\\
&= 2\,\mathrm{tr}\big(W^T X D X^T W - W^T X S X^T W\big)\\
&= 2\,\mathrm{tr}\big(W^T X (D - S) X^T W\big)\\
&= 2\,\mathrm{tr}\big(W^T X L X^T W\big)
\end{aligned} \qquad (4)$$
where $\mathrm{tr}(\cdot)$ denotes the trace operator, $D$ is a diagonal matrix whose entries are the column sums of $S$, i.e., $D_{ii} = \sum_j S_{ij}$, and $L = D - S$ is the Laplacian matrix. The entries of $D$ indicate how important each data point is. Therefore, a constraint is imposed as follows:
$$W^T X D X^T W = I \qquad (5)$$
Finally, the objective function of LPP can be obtained as:
$$\arg\min_{W}\ W^T X L X^T W \quad \text{s.t.}\quad W^T X D X^T W = I \qquad (6)$$
By applying the Lagrange multiplier method, the transformation matrix $W$ that minimizes the objective function is given by the minimum eigenvalue solutions to the generalized eigenvalue problem:
$$X L X^T W = \lambda X D X^T W \qquad (7)$$
Although the LPP method can effectively preserve the manifold structure of the input data, its discriminative ability is limited because the label information is neglected during dimensionality reduction. Therefore, a supervised LPP (SLPP) was proposed to overcome this limitation [13]. In SLPP, the similarity matrix $S$ in Equation (3) is computed with the constraint that each point's $k$ nearest neighbors must be chosen from the samples with the same class label as itself. In other words, $S_{ij}$ in SLPP is obtained as:
$$S_{ij} = \begin{cases} \exp\big(-\|x_i - x_j\|^2 / t\big), & \text{if } x_i \text{ is among the } k \text{ nearest neighbors of } x_j \text{ and has the same class label as } x_j, \text{ or } x_j \text{ is among the } k \text{ nearest neighbors of } x_i \text{ and has the same class label as } x_i \\ 0, & \text{otherwise} \end{cases} \qquad (8)$$
By introducing the class labels into the process of similarity matrix construction, the embedding results of SLPP are easier to classify [13].
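To make Equation (8) concrete, the following Python sketch (our own illustration, not code from the paper) builds the supervised similarity matrix by keeping the heat-kernel weight only between same-class k-nearest neighbours. The array layout, the helper name and the parameter values are assumptions.

```python
import numpy as np

def supervised_similarity(X, labels, k=5, t=800.0):
    """X: (M, n) data matrix with one sample per column; labels: (n,) class ids."""
    n = X.shape[1]
    # pairwise squared Euclidean distances between columns
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    S = np.zeros((n, n))
    for i in range(n):
        same = np.where(labels == labels[i])[0]
        same = same[same != i]
        if same.size == 0:
            continue
        nn = same[np.argsort(sq[i, same])[:k]]   # k nearest same-class neighbours of x_i
        w = np.exp(-sq[i, nn] / t)               # heat-kernel weights of Eq. (8)
        S[i, nn] = w
        S[nn, i] = w                             # symmetrize, as in Eq. (8)
    return S

# toy usage
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 12))            # 12 samples of dimension 20
    y = np.repeat(np.arange(3), 4)           # 3 classes, 4 samples each
    S = supervised_similarity(X, y, k=3, t=10.0)
    print(S.shape, np.allclose(S, S.T))
```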
3 Proposed LM-SLPP The proposed LM-SLPP method consists of three main steps: (1) partition face images into sub-patterns, (2) apply SLPP to sub-patterns sharing the same original feature components for feature extraction, (3) classify an unknown face image. 3.1 Image Partition
In the proposed method, we first need to partition each input face image into several sub-images. In local matching based face recognition methods, a face image can be divided into a set of either equally or unequally sized sub-images. However, how to choose the appropriate sub-image size that gives optimal performance is still an open problem. In our work, without loss of generality, an equally sized partition is adopted, as in many other approaches [10-12].
Fig. 1. The construction of sub-image pattern sets (face images come from Yale face database)
Formally, supposing there are N face images belonging to P persons in the training set, these persons possess N1, N2, …, NP images, respectively, and the size of each image is H1×H2. We first partition each face image into K equally sized sub-images in a non-overlapping way, and then further concatenate them into corresponding column vectors with dimensionality of H1×H2/K. After all training images are partitioned, the sub-pattern vectors at the same position of all face images are collected to form a specific sub-pattern’s training set. Therefore, we can get K separate sub-pattern sets totally. This image partition process is illustrated in Fig. 1.
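As an illustration of this partition step (our own sketch, not the authors' code; the block grid and array shapes are assumptions, since the paper only fixes the total number of sub-patterns K), the sub-pattern training sets can be built as follows:

```python
import numpy as np

def build_subpattern_sets(images, block_shape):
    """images: (N, H1, H2); block_shape: (h, w) dividing H1 and H2 exactly.
    Returns a list of K = (H1//h)*(H2//w) matrices, each of shape (h*w, N)."""
    N, H1, H2 = images.shape
    h, w = block_shape
    assert H1 % h == 0 and H2 % w == 0
    sets = []
    for r in range(0, H1, h):
        for c in range(0, H2, w):
            block = images[:, r:r + h, c:c + w]      # (N, h, w) block at one position
            sets.append(block.reshape(N, h * w).T)   # column-vectorize: (h*w, N)
    return sets

# toy usage: 6 images of size 100x100 cut into 20x20 blocks -> K = 25 sub-pattern sets
imgs = np.random.default_rng(1).random((6, 100, 100))
subsets = build_subpattern_sets(imgs, (20, 20))
print(len(subsets), subsets[0].shape)   # 25 (400, 6)
```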
3.2 SLPP for Feature Extraction
After the image partition procedure, we have obtained $K$ sub-pattern training sets. For each sub-pattern set, denoted by $SP_i$ ($i = 1, 2, \ldots, K$), its locality preserving features can be extracted using SLPP. Let $X_i = [x_{i1}, x_{i2}, \ldots, x_{iN}]$ denote the $N$ column vectors in $SP_i$. In this step, the $k$ nearest neighbors of each $x_{in}$ ($n = 1, 2, \ldots, N$) with the same class label are first selected using the Euclidean metric. Then, the supervised similarity matrix is computed by Equation (8). At last, the transformation matrix $W_i$ of the $i$-th sub-pattern set $SP_i$ can be obtained by solving the generalized eigenvalue problem:
$$X_i L_i X_i^T W_i = \lambda X_i D_i X_i^T W_i \qquad (9)$$
where $D_i$ and $L_i$ are the diagonal matrix and the Laplacian matrix, respectively. Let $\lambda_1, \lambda_2, \ldots, \lambda_r$ ($r < H_1 \times H_2 / K$) be the $r$ smallest generalized eigenvalues of $(X_i L_i X_i^T, X_i D_i X_i^T)$ and $w_1, w_2, \ldots, w_r$ be the corresponding eigenvectors. We can then get:
$$W_i = [w_1, w_2, \ldots, w_r] \qquad (10)$$
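A compact way to compute Equations (9)-(10) is sketched below. This is our own illustration under the assumptions that NumPy/SciPy are available and that supervised_similarity is the helper shown in Section 2; the small regularization term is an implementation convenience not mentioned in the paper.

```python
import numpy as np
from scipy.linalg import eigh

def slpp_projection(Xi, labels, r, k=5, t=800.0):
    """Xi: (d, N) sub-pattern matrix (one column per sample). Returns W_i of shape (d, r)."""
    S = supervised_similarity(Xi, labels, k=k, t=t)   # Eq. (8)
    D = np.diag(S.sum(axis=1))
    L = D - S                                         # Laplacian matrix
    A = Xi @ L @ Xi.T
    B = Xi @ D @ Xi.T
    B += 1e-6 * np.eye(B.shape[0])                    # regularize: B can be singular when d > N
    vals, vecs = eigh(A, B)                           # generalized symmetric eigenproblem, Eq. (9)
    return vecs[:, :r]                                # eigenvectors of the r smallest eigenvalues

# usage on one sub-pattern set from the partition example:
# Wi = slpp_projection(subsets[0], labels, r=30)
# Yi = Wi.T @ subsets[0]                              # r-dimensional sub-features
```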
3.3 Classification
In order to classify a new face, the unknown face image $U$ is first divided into $K$ sub-patterns in the same way previously applied to the training images. Then, the features of each unknown sub-pattern are extracted using the corresponding transformation matrix $W_i$ ($i = 1, 2, \ldots, K$). The identity of each sub-pattern is determined by a nearest neighbor classifier using the Euclidean distance. Because $K$ sub-patterns are obtained from the unknown face image and their classification results are independent of each other, we get $K$ recognition results in total for the unknown face image. Therefore, to obtain the final recognition result for the image $U$, a majority voting method is used. Let the probability of the unknown image $U$ belonging to the $c$-th class be:
$$p_c = \frac{1}{K}\sum_{i=1}^{K} q_i^c \qquad (11)$$
where
$$q_i^c = \begin{cases} 1, & \text{if the } i\text{-th sub-pattern belongs to the } c\text{-th class} \\ 0, & \text{otherwise} \end{cases} \qquad (12)$$
Then, the final identity result of the unknown face image $U$ is
$$\mathrm{Identity}(U) = \arg\max_{c}(p_c) \qquad (13)$$
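The voting rule of Equations (11)-(13) can be sketched as follows (our own illustration; the names W_list, train_feats and train_labels refer to quantities produced by the training-stage sketches above and are assumptions):

```python
import numpy as np

def classify(sub_patterns, W_list, train_feats, train_labels, num_classes):
    """sub_patterns: list of K vectors (one per block of the probe image);
    W_list: K projection matrices; train_feats[i]: (r, N) projected training set i."""
    votes = np.zeros(num_classes)
    for i, x in enumerate(sub_patterns):
        y = W_list[i].T @ x                                  # project the i-th block
        d = np.linalg.norm(train_feats[i] - y[:, None], axis=0)
        votes[train_labels[np.argmin(d)]] += 1               # q_i^c in Eq. (12)
    return int(np.argmax(votes / len(sub_patterns)))         # Eqs. (11) and (13)
```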
4 Experiments

In this section, the performance of the proposed LM-SLPP is evaluated on two standard face databases (Yale and Extended YaleB). Both the holistic (PCA, LPP) and local matching (SpPCA) based methods are used here for comparison. Furthermore, in order to test the effect of the label information on the recognition performance, we also compare LM-SLPP with a local matching method based on unsupervised LPP (LM-LPP), in which the similarity matrix S is constructed by Equation (3). For all face data in each database, the original images were first normalized (in scale and orientation) such that the two eyes were aligned at the same position; the facial areas were then cropped into the final images for recognition.

4.1 Experimental Results on Yale Database
The Yale face database [14] was constructed by the Yale Center for Computational Vision and Control. There are 165 images of 15 individuals in this database (each person has 11 images). The images vary in lighting condition (center-light, left-light and right-light), facial expression (normal, happy, sad, sleepy, surprised and wink), and glasses (with glasses and without glasses). Figure 2 shows sample images of one person from the Yale database. All face images are resized to 100×100 for computational efficiency in our experiments.
Fig. 2. Sample images of one individual in Yale database
In this experiment, we randomly choose six images of each individual to form the training set, and the remaining five images of each individual are used as the testing set. This random selection is repeated 10 times. For LPP, LM-LPP and LM-SLPP, the parameters are set as t=800 and k=5. The sub-image size in all local matching methods is chosen as 20×20. The average recognition rates versus subspace dimensions of all methods are shown in Fig. 3, and the best recognition rate obtained by each method is shown in Table 1. We can see that the performances of SpPCA, LM-LPP and LM-SLPP are all better than those of the holistic methods such as PCA and LPP. This is because some local facial features may not vary with pose, illumination and expression; extracting these local features from sub-patterns of the face images can therefore improve the robustness of local matching methods. Moreover, we can also observe that LM-LPP and LM-SLPP outperform SpPCA. This is because PCA is a linear feature extraction method and cannot preserve the manifold structure of face images. Finally, it can be seen that LM-SLPP performs better than LM-LPP. The reason is that LM-SLPP takes the label information into account during feature extraction and can produce more discriminative embedded results.
Fig. 3. Performance comparisons of different algorithms on Yale database

Table 1. The top recognition rate and corresponding subspace dimensions for different approaches on Yale database

Methods      PCA    LPP      SpPCA    LM-LPP   LM-SLPP
Top rates    78%    79.07%   82.53%   88.33%   90.0%
Dimensions   70     40       30       50       45
4.2 Experimental Results on Extended YaleB Database
The extended YaleB face database [15][16] is an extension of the Yale face database. For this database, we simply use the cropped images and resize them to 64×64 pixels. In our experiment, a dataset which contains 38 individuals, with around 64 near-frontal images per individual under different expressions and illumination conditions, is chosen from the database. Figure 4 shows some sample cropped images of one person from the extended YaleB database.
Fig. 4. Sample images of one individual in YaleB database
In this experiment, the parameters we set for all methods are the same as in Section 4.1. Thirty images of each person are randomly selected as the training set and the remaining images form the testing set. The sub-image size is set as 16×16. The best recognition rate
Table 2. The top recognition rate and corresponding subspace dimensions for different approaches on Extended YaleB database

Methods      PCA     LPP      SpPCA    LM-LPP   LM-SLPP
Top rates    56.4%   78.51%   91.99%   94.56%   95.8%
Dimensions   70      70       40       50       65
achieved by PCA, LPP, SpPCA, LM-LPP and LM-SLPP can be seen in Table 2. From this table, we can find that the local matching methods outperform the holistic methods and the proposed LM-SLPP obtains the best performance. These two observations are consistent with the experimental results in Yale database.
5 Conclusions

A supervised locality preserving projections based local matching algorithm (LM-SLPP) is proposed in this study. Our method possesses the following two characteristics. First, LM-SLPP extracts local facial features from the sub-patterns partitioned from whole face images; thus, it is not very sensitive to facial pose, illumination and expression. Second, LM-SLPP uses supervised LPP for feature extraction, which can not only preserve the manifold structures of the sub-pattern sets, but also take the label information into consideration. We test our method on two standard face databases and compare it with other holistic and local matching methods. Experimental results show that the proposed method can produce a better recognition rate.
References 1. Cevikalp, H., Neamtu, M., Wikes, M., Barkana, A.: Discriminative Common Vectors for Face Recognition. IEEE Transaction on Pattern Analysis and Machine Intelligence 27(1), 4–13 (2005) 2. Zhao, W., Chellappa, R., Phillips, P.J., Rosenfeld, A.: Face recognition: a literature survey. ACM Comput. Surv. 35(4), 399–458 (2003) 3. Zou, J., Ji, Q., Nagy, G.: A Comparative Study of Local Matching Approach for Face Recognition. IEEE Transactions on Image Processing 16(10), 2617–2628 (2007) 4. Turk, M., Pentland, A.: Eigenfaces for recognition. J. Cognitive Neurosci. 3(1), 71–86 (1991) 5. Belhumeur, P.N., Hepanha, J.P., Kriegman, D.J.: Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transaction on Pattern Analysis and Machine Intelligence 19(7), 711–720 (1997) 6. Barlett, M.S., Movellan, J.R., Sejnowski, T.J.: Face recognition by independent component analysis. IEEE Transaction on Neural Network 13(6), 1450–1464 (2002) 7. Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. Adv. Neural Inf. Process, 556–562 (2000) 8. He, X., Yan, S., Hu, T., Niyogi, P., Zhang, H.: Face recognition using Laplacianfaces. IEEE Transaction on Pattern Analysis and Machine Intelligence 27(3), 328–340 (2005)
9. Pentland, A., Moghaddam, B., Starner, T.: View-Based and Modular Eigenspaces for Face Recognition. In: CVPR 1994, pp. 84–91 (1994) 10. Gottumukkal, R., Asari, V.K.: An improved face recognition technique based on modular PCA approach. Pattern Recognition Letters 25, 429–436 (2004) 11. Chen, S., Zhu, Y.: Subpattern-based principle component analysis. Pattern Recognition 37, 1081–1083 (2004) 12. Tan, K., Chen, S.: Adaptively weighted sub-pattern PCA for face recognition. Neurocomputing 64, 505–511 (2005) 13. Zheng, Z., Zhao, Z., Yang, Z.: Gabor Feature Based Face Recognition Using Supervised Locality Preserving Projection. In: Blanc-Talon, J., Philips, W., Popescu, D., Scheunders, P. (eds.) ACIVS 2006. LNCS, vol. 4179, pp. 644–653. Springer, Heidelberg (2006) 14. Yale University Face Database, http://cvc.yale.edu/projects/yalefaces/yalefaces.html 15. Georghiades, A.S., Belhumeur, P.N., Kriegman, D.J.: From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose. IEEE Trans. Pattern Anal. Mach. Intelligence 23(6), 643–660 (2001) 16. Lee, K.C., Ho, J., Kriegman, D.: Acquiring Linear Subspaces for Face Recognition under Variable Lighting. IEEE Trans. Pattern Anal. Mach. Intelligence 27(5), 684–698 (2005) 17. Nanni, L., Maio, D.: Weighted Sub-Gabor for face recognition. Pattern Recognition Letters 28, 487–492 (2007) 18. Zhu, Y.-L.: Sub-pattern non-negative matrix factorization based on random subspace for face recognition. In: International Conference on Wavelet Analysis and Pattern Recognition, pp. 1356–1360 (2007) 19. Xue, H., Zhu, Y., Chen, S.: Local ridge regression for face recognition. Neurocomputing 72, 1342–1346 (2009) 20. Wang, J., Zhang, B., Wang, S., Qi, M., Kong, J.: An adaptively weighted sub-pattern locality preserving projection for face recognition. J. Network Comput. Appl. (2010), doi:10.1016/j.jnca.2009.12.013
Information Systems Security Criticality and Assurance Evaluation Moussa Ouedraogo1,2, Haralambos Mouratidis2, Eric Dubois1, and Djamel Khadraoui1 1 Public Research Center Henri Tudor - 1855 Kirchberg/Luxembourg {moussa.ouedraogo,eric.dubois,djamel.khadraoui}@tudor.lu 2 School of Computing, IT and Engineering, University of East London, England [email protected]
Abstract. A prerequisite to implementing effective and efficient Information Systems security measures is to have a clear understanding of both the business that the system will support and the importance of the system in the operating environment. Similarly, the evaluation of one's confidence in the deployed safeguarding measures to adequately protect system assets requires a better understanding of the security criticality of the system within its context of use (i.e., where is the system used and what for?). This paper proposes metrics as well as a methodology for the evaluation of operational systems security assurance. A critical feature of our approach is that the assurance level depends on the measurement of security correctness and system security criticality. To that extent, we also propose a novel classification scheme for Information Systems based on their security criticality. Our work is illustrated with an application based on the case study of a Domain Name Server (DNS).

Keywords: Security assurance, criticality, security verification, Multi-agent systems.
but their actual deployment may be less impressive or unidentified hazards in the system environment may render them less effective. How good, for instance, is a fortified door if the owner, inadvertently, leaves it unlocked? Or considering a more technical example, how relevant is a firewall for a critical system linked to the Internet if it is configured to allow any incoming connections?
Fig. 1. Security assurance evaluation model
Therefore, monitoring and reporting on the security status or posture of IT systems can be carried out to determine compliance with security requirements [2] and to get assurance as to their ability to adequately protect system assets. This remains one of the fundamental tasks of security assurance, which is here defined as the ground for confidence in deployed security measures to adequately protect system assets. Unfortunately, most of what has been written about security assurance is definitional. Published literature either aims at providing guidelines for identifying metrics ([3], [4], [5]) without providing indications on how to combine them into quantitative or qualitative indicators, which are important for a meaningful understanding of the security posture of an IT component, or targets end products ([6]).

Our approach: We argue that the evaluation of system security assurance only makes sense when placed within a risk management context. To reflect this, our method literally takes place after the risk assessment has been completed and the countermeasures deployed. Figure 1 shows the security assurance evaluation model and how it relates to the risk assessment stage, whose concepts are depicted in bold. The security requirements identified for risk mitigation could come either in the form of security functions deployed on the system or in the form of guidelines for security-relevant properties, i.e., those parameters that are not directly linked to security but, when altered, could induce a security issue. According to the NIST special publication 800-33 [7], the assurance that the security objectives (integrity, availability, confidentiality, and accountability) will be adequately met by a specific implementation depends partly on whether the required
security functionality is present and correctly implemented. Heeding that call, our approach to evaluating the security assurance of a security measure is founded on:

• Key verifications that aim to (i) ensure that any security measure identified as necessary during the risk assessment stage has been implemented and is running (availability check), and (ii) ensure the correctness of the configuration of the security measures at any time using a reference configuration file (conformity check).
• The security criticality of the context in which the system is operating, defined as the magnitude of the impact of an eventual security breach for an organization or individual in a specific context, which is accounted for when determining the security assurance level of a system.

The results of these three parameters are integrated in our security assurance function (refer to Section 4.4) to yield a value of security assurance. Users may elect to use a system with a set of predefined security measures for its protection. However, once the system is deployed, previously unknown errors or vulnerabilities may surface for a given security entity, or environmental assumptions may need to be revised. Furthermore, the effectiveness of most security measures is limited in time. Today's state-of-the-art protection may be by-passed with relative ease tomorrow, as attackers' techniques are becoming more and more sophisticated. As a result of operation, feedback could be given that would require the operator to correct the system security model or redefine its security requirements or environmental assumptions in view of strengthening the security of the system. To handle that eventuality, the vulnerability check, which is associated with each evaluated security entity, uses a known vulnerability database such as the National Vulnerability Database (NVD, http://nvd.nist.gov) to verify whether any vulnerability has been identified for an evaluated protection measure or security-relevant parameter. Recommendations on how to overcome such matters are then taken into account by the operator and will help constitute the new reference against which any a posteriori conformity evaluation of the protection measure will be undertaken. This ensures that the system security policy is permanently updated and henceforth presents enough quality to face up to potential threats to the system. One of the main drawbacks of traditional risk management is that it is often a one-shot activity, or at best it is performed at regular but distant intervals of time (every six months, or so). To that extent, the continuous vulnerability check adds a hint of "dynamic risk management" to our approach.

Outline: The rest of the paper is organized as follows: Section 2 presents related work. Section 3 provides a classification scheme for measuring a system's security criticality. Section 4 describes the steps of the security assurance methodology. Section 5 discusses the choice of architecture for the approach, while Section 6 illustrates its applicability with the aid of an application based on a Domain Name Server (DNS). Section 7 concludes the paper and presents directions for future work.
2 Related Work Considerable efforts have been made across computer science disciplines to address the ever-growing issue of security. Information System engineering, for instance, has
recently called for the systematic integration of security in the development process, to help developers in producing more secure systems. In that perspective modeling methodologies such as Secure Tropos (www.securetropos.org) [8] and UMLsec [9] have been proposed in the literature. The rationale is that without a rigorous and effective way of dealing with security at system development process, the end product cannot be secured. While this is true, the emphasis on design and process evidence versus actual product software largely overshadows practical security concerns involving the implementation and deployment of operational systems [2]. The Common Criteria [6] defines security assurance evaluation requirements for the development and design phases. The seven security assurance rating of the standard can be used as a platform for customers to compare the security product of different vendors. However, CC is not directly applicable for the evaluation of the security assurance of a system in operation. In fact, it defines how the system must be developed, but not how to maintain it in the “correct” (i.e. intended) state. Defining metric taxonomy is also a topic where extensive works have been realized. Amongst the taxonomies for the evaluation of security are the one proposed by Vaughn et.al. [3], Seddigh et.al. [4] and Savola[5]. However, these contributions only provide means to find the metrics and do not indicate how to combine them into quantitative or qualitative indicators more meaningful for appreciating the security posture of an IT component. Current techniques for assurance evaluation, such as BUGYO [10, 11] assume that assurance level is independent of the context of use of the system. However, it is common for systems to operate in an evolving context i.e. different environments and/or for different purposes. Thus, assurance that implemented security measures will protect a system should depend on the security criticality level of the context. Our approach exhibits the following advantages: end-users’ usage of IT systems is multipurpose. Therefore, providing them with continuous monitoring so to get some assurance on the correctness of the security measures would help determine which activities can be performed (at a given time) with confidence, and those not to undertake or to be performed with more caveats. Moreover, providing indicators on the actual status of security measures will assist security managers in the management of their systems security since a drop in security assurance level would imply that a component security posture is no longer compliant with the security requirements specifications. The consequence being that, ill-intentioned individuals might exploit such vulnerability to inflict damage. Our methodology and tool may therefore help practitioners identify areas of the security measures which need attention and address them before it is too late.
3 Categorization and Determination of Systems Context Security Criticality Although a system security criticality is taken into account when defining the security measures, the actual evaluation of the confidence level in those measures has so far been conducted without considering the criticality of the context in which the system is running. In fact, far from being static, the security criticality of a system may change
over time (depending on several factors including new or changed business objectives, new or updated regulations, new threats and so forth) while the in-place security measures remain unchanged. Consider the following example: Alice may feel very confident in using an unsecured wireless connection for simple Internet browsing, but that confidence would drop considerably if Alice were to use it for Internet banking. We could state that purpose 1 (web browsing) requires low security, or that its context security criticality level (α) is low (mainly because any potential risk impact for that context will be relatively low for the user), whereas purpose 2 has a higher context security criticality level. Therefore, our methodology supports the systematic determination of a system's context security criticality by using a well-defined classification scheme. The Federal Information Processing Standards (FIPS), in publication FIPS PUB 199 [12], establishes security categories for both information and IS. According to the latter, the security criticality levels are based on the potential impact on an organization, should certain events occur that jeopardize the information and IS needed by the organization to accomplish its assigned mission, protect its assets, fulfill its legal responsibilities, maintain its day-to-day functions, and protect individuals [12]. It is worth mentioning that Criticality Assessment is an activity of Risk Assessment (RA) ([13],[14]) dealing with the impact of incidents. After an analysis and comparison of different approaches for Criticality Assessment, we retain the model proposed in the Norwegian Oil Industry Association (OLF) guidelines [15], mainly because of the level of detail it provides and the alignment of its classification scheme with the definitions of the FIPS. Thus, we tailored it for assessing the security criticality of IS depending on their context of use in the following way:

(1) Categories for impact evaluation. Health and safety: Does the occurrence of an Information Security incident for a given context lead to death or injury? Revenue or finance: Does the occurrence of an incident lead to financial losses? Organization or individuals' intangible assets: Does the occurrence of an incident lead to material loss or to the loss of intangible assets such as an individual's or an organization's reputation or competitiveness? Organization or individuals' activity performance: Does the occurrence of an incident lead to the degradation of an individual's or organization's activity performance?

(2) Levels of security criticality. Low: The loss of confidentiality, integrity, or availability (CIA) could be expected to have no or a negligible adverse effect on organizational operations, assets, or individuals; Moderate or Medium: The loss of CIA could be expected to have a limited adverse effect on organizational operations, organizational assets, or individuals; High: The loss of CIA could be expected to have a serious adverse effect on organizational operations, organizational assets, or individuals; Very high: The loss of CIA could be expected to have a severe or catastrophic adverse effect on organizational operations, organizational assets, or individuals.

Our aim is to provide a quantitative, context-dependent security assurance value. Therefore, the value of the system security criticality (denoted α) may be in the range [0, 1] or expressed as a percentage.
The security criticality levels defined above can be quantitatively classified as follows: Low: α ≤ 0.25; Moderate: 0.25 < α ≤ 0.5; High: 0.5 < α ≤ 0.75; Very high: 0.75 < α ≤ 1. For a given context, there may be several impacts for the organization or the individual that may be considered qualitatively the same (moderate, for instance) but with different relevance; associating security criticality levels with a range of values serves this purpose. Importantly, we consider that a system with a context security criticality level α means that the
security measures deployed for its protection should have an effectiveness level greater than or equal to α for the system to stand a chance of being secure. This implies, for instance, that in a context where α = 0 the security measures are only required to have an effectiveness level ≥ 0, i.e., whether the security measures are working or not is irrelevant, since the system is supposedly in a context where no IT security risks exist. The determination of a system's context security criticality is based on answering questions that are valuable for the individual or the organization. The questions should cover all four categories described previously. Rather than adopting a weighting system for the possible categories to account for their relevance for the individual or the organization, the highest security criticality of the four categories determines the security criticality of the system context, as illustrated in the sketch after the table captions below. Tables 1 to 4 provide the detailed questionnaires proposed for the determination of the security criticality for each category. The questions have been identified based on the definitions of security criticality and criticality levels provided by the FIPS. For each category, the process of determining the security criticality level is as follows: when a Yes is given to a question, the corresponding level should be noted and the next category of questions should be reviewed; there is no need to answer the remaining questions within the same category. The organization or individual may insert its own portfolio of relevant questions into the questionnaire to better suit their case.

Table 1. Health and safety related questions
Table 2. Finance and revenue related questions
Table 3. Intangible assets related questions
Table 4. Activity performance questions
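The following Python sketch (our own illustration, not the authors' tool) captures this determination procedure; the questionnaire contents are omitted, and the representative α value picked inside each numeric range is an assumption.

```python
LEVELS = ["low", "moderate", "high", "very high"]
RANGES = {"low": (0.0, 0.25), "moderate": (0.25, 0.5),
          "high": (0.5, 0.75), "very high": (0.75, 1.0)}

def category_level(answers):
    """answers: list of (level, yes_or_no) pairs in questionnaire order;
    the first 'Yes' fixes the level of that category."""
    for level, yes in answers:
        if yes:
            return level
    return "low"

def context_criticality(per_category_answers):
    levels = [category_level(a) for a in per_category_answers]
    worst = max(levels, key=LEVELS.index)          # highest category level dominates
    lo, hi = RANGES[worst]
    return worst, (lo + hi) / 2                    # representative alpha inside the range

# toy usage: the finance questionnaire answers "Yes" at its "high" question
answers = [[("very high", False), ("high", False)],     # health & safety
           [("very high", False), ("high", True)],      # revenue / finance
           [("moderate", False)],                       # intangible assets
           [("moderate", False)]]                       # activity performance
print(context_criticality(answers))                     # ('high', 0.625)
```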
4 Steps of the Security Assurance Evaluation Methodology The security assurance evaluation methodology was mainly inspired by the BUGYO methodology to evaluate, maintain and document the security assurance of IS [10]. However, it also distinguishes itself from BUGYO by considering assurance level as dependent on the context of a system. The methodology consists of the following steps: Assurance Components Modelling, Specification of the Metrics, Verification of the Component Security Posture, Assurance Level Aggregation, Display and Monitor 4.1 Assurance Components Modeling Modeling consists of determining and modeling assurance relevant components. This means that the model does not reflect the whole system but only critical elements that are important and need to be security assured towards a system’s well functioning continuity. An efficient way of identifying those critical components is an a priori use of a RA methodology. Weights can be assigned to each component to account for their respective impact on the overall system that is decomposed in hierarchical way. 4.2 Specification of the Metrics This step involves mainly specifying, for each assurance relevant component, the probe to conduct the verification of the correctness of a component and the frequency of the verification. A probe may be an existing security tool (such as firewall tester, IDS, system vulnerability scanner and so forth) or self developed programs to audit the operational system. The quality level of those probes affects the value of the security assurance. In fact a correlation exists between the quality of the probe used and one’s confidence (assurance) in the check result achieved. Highly qualitative probes will provide higher confidence in the verification outcome. We therefore developed a probe quality metric taxonomy to assist evaluators in determining the quality of the probe involved in the security verification. The metric taxonomy used is inspired by the organization of the Common Criteria security requirements into hierarchy of class–family–dependency and from the Systems Security Engineering Capability Maturity Model (SSE-CMM) levels. The SSE-CMM-like tailored levels are used to define the different quality levels possible for probes evaluating a security measure while some of the Common Criteria EAL families serve as the requirements for being assured at some level of quality. The underlying concept of this process is that high assurance can be consistently attained if a process exists to continuously measure and improve the quality of the security evaluation process. The Common Criteria (CC) philosophy of assurance helps in identifying some of the quality requirements pertinent to assurance. As a matter of fact, the CC philosophy asserts that greater assurance results from the application of greater evaluation effort, and that the goal is to apply the minimum effort required to provide the necessary level of assurance. The increasing level of effort is based upon: (i) Scope or coverage: The effort is greater because a larger portion of the IT system is included in the verification; Depth: The effort is greater because it is deployed to a finer level of design and implementation detail; Rigor: The effort is greater because the verification is applied in a more structured, formal manner. We use the five capability maturity levels of the SSE-CMM,
which we tailored to represent the verification probe quality levels, and some of the CC families (Scope, Rigor, Depth and Independent verification) as the requirements to fulfil in order to be at some level of quality. The matrix shown in Table 5 expresses the minimum requirements to achieve a certain quality level.

Table 5. Probe quality metric taxonomy

Class: QAM – Probe Quality Metric
Family (and meaning)                                                               QL1  QL2  QL3  QL4  QL5
QAM_COV: Coverage (larger coverage of the verified security measure provides
         more confidence in the results about its status)                           1    2    2    2    3
QAM_DPT: Depth (a detailed verification of the security measure will decrease
         the likelihood of undiscovered errors)                                     1    2    2    3    4
QAM_RIG: Rigor (the more structured the evaluation of the deployed security
         measure, the more reliable the outcome of the verification)                1    2    2    2    2
QAM_IND: Independent Verification (verification performed by a third-party
         evaluator or software tool provides more assurance)                        1    1    2    2    3
A probe satisfies quality level 3 (QL3), for instance, if at least the following requirements are met: QAM_COV.2 (only some of the key areas of the security measure, known to be relevant for its well functioning, are verified in the process), QAM_DPT.2 (high-level verification of the security measure through its interface), QAM_RIG.2 (the verification process is structured and follows the requirements within a verification documentation or a standard), QAM_IND.2 (partial verification by independent probes available on the market). Due to space limitations, the complete probe quality taxonomy is not detailed in this paper; interested readers may refer to [16] for further details.
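The quality-level check can be expressed as a simple lookup: a probe reaches the highest level whose requirements it meets in every family. The sketch below is illustrative Python, not part of the methodology's tooling; the requirement matrix is transcribed from Table 5 and the capability values are those reported for the Samhain probe in the case study of Section 6.

```python
# Minimum family levels required for each probe quality level (Table 5).
QL_REQUIREMENTS = {
    1: {"COV": 1, "DPT": 1, "RIG": 1, "IND": 1},
    2: {"COV": 2, "DPT": 2, "RIG": 2, "IND": 1},
    3: {"COV": 2, "DPT": 2, "RIG": 2, "IND": 2},
    4: {"COV": 2, "DPT": 3, "RIG": 2, "IND": 2},
    5: {"COV": 3, "DPT": 4, "RIG": 2, "IND": 3},
}

def probe_quality_level(probe):
    """Return the highest QL whose requirements the probe satisfies in every family."""
    achieved = 0
    for ql in sorted(QL_REQUIREMENTS):
        required = QL_REQUIREMENTS[ql]
        if all(probe.get(family, 0) >= level for family, level in required.items()):
            achieved = ql
    return achieved

# Samhain capabilities from the case study: COV=2, DPT=3, RIG=2, IND=3 -> QL4.
print(probe_quality_level({"COV": 2, "DPT": 3, "RIG": 2, "IND": 3}))  # prints 4
```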
Fig. 2. Security measure conformity verification process
4.3 Verification of the Components' Security Posture

Once the metrics have been specified, the next step consists in performing the verification of the security measures through the probes available in the system. Those probes carry out the metric specifications, or base measures, which provide information on the nature of the verification, the targeted component, the frequency of the verification and so on. The verification process of a security measure with respect to a conformity check is depicted in Figure 2. It shows that verification probes verify the implemented security measures by performing some base measures. An interpretation of the results of the base measures is then performed, i.e. they are compared to a reference, which can be a mandatory or recommended configuration file, a best-practices database, and the like, to inform on whether a security measure has been properly deployed. The result of the interpretation is a derived measure that is normalized (associated with a discrete value) to inform on the posture of the security measure. Expert knowledge is required to associate to each possible conformity mismatch a discrete number c ∈ [0, QL], which is correlated with the gravity of the mismatch for the system security: 0 means that nothing conforms to the reference or that the mismatch is of high risk, and QL (the probe quality level) is used when there is a maximum match or a mismatch with the lowest risk for the system security. As for availability checks, the normalized value is Boolean: "0" signifies that the security measure is not present and "1" that the security measure is present and working.
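The normalization step can be pictured as a small lookup driven by expert knowledge: every mismatch category observed by a probe is mapped onto a discrete conformity value in [0, QL], and availability is reduced to a Boolean. The categories and mapping below are hypothetical placeholders used only to illustrate the shape of that step.

```python
PROBE_QL = 4  # quality level of the probe performing the check

# Hypothetical expert mapping from observed mismatch categories to c in [0, QL].
CONFORMITY_MAPPING = {
    "full_match": PROBE_QL,     # maximum match with the reference
    "low_risk_mismatch": 2,     # tolerable deviation from the recommended configuration
    "high_risk_mismatch": 0,    # deviation considered a high risk for system security
}

def normalize_conformity(mismatch_category):
    """Turn the interpreted derived measure into a discrete conformity value."""
    return CONFORMITY_MAPPING[mismatch_category]

def normalize_availability(measure_present_and_working):
    """Availability checks are Boolean: 1 if the measure is present and working, else 0."""
    return 1 if measure_present_and_working else 0

print(normalize_conformity("low_risk_mismatch"), normalize_availability(True))
```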
Fig. 3. Security assurance aggregation process
4.4 Assurance Level Aggregation

As previously mentioned, the confidence in a security-enforcing mechanism to adequately protect system assets, in a specific context of use, is dependent on whether the concerned security measure is present and properly implemented. For the sake of
verifying each of the deployed security measures, dedicated software probes are used. As shown in Figure 3, probe agents collect results from the probes and proceed to a first aggregation aiming at determining the overall indicator value for each of the assurance parameters (availability, conformity). Such an aggregation is particularly necessary since there may be a need to verify more than one parameter (with different importance) of the security measure to account for its status with respect to each category of checks. Once that aggregation is completed, the actual value of the security measure's security assurance is determined using a mathematical function. As a matter of fact, the security assurance of a security measure (S) with a given conformity level with respect to the specified security requirement, working in a context with a security criticality level α, is a function SAL: R^3 → [0, 1] that takes as input a 3-tuple of values, namely the availability (A) and conformity (c) of (S) and the context security criticality α (discussed in section 3), and produces a real value within the range [0, 1]. Some important elements to take into account in defining the assurance function are: (i) no assurance can be gained if the evaluated security measure is not "available", i.e. present and working (A = 0 ⇒ SAL = 0); (ii) a complete lack of conformity of the security measure with respect to the reference provided by the security requirements will result in no confidence (c = 0 ⇒ SAL = 0); (iii) in the event that the system is working in a context with a security criticality level α = 100% (or 1), no assurance can be gained, since no security measure in practice can guarantee 100% security. An interesting mathematical characterization close to the requirements specified above can be derived from probability theory, namely from the normal law of probability. Let α be the estimated security criticality of a system's context of use and c be the value of conformity between the deployed security measure and the security policy specification. The assurance that the security measure will exhibit an effectiveness level at least equal to α, given its current conformity level c, is given by the following formula:

SAL = 1 - α^c    (1)

Given that the non-deployment or non-availability of a security measure would result in no assurance at all, since the security measure cannot avert security risks, we extend (1) by adding the availability parameter:

SAL = A(1 - α^c)    (2)

with α ∈ [0, 1[, ci ∈ [0, QL], A ∈ {0, 1} and c = Agf{ci}, 1 ≤ i ≤ number of conformity checks performed within the security measure. For a fixed value of α (same context criticality, Figure 4a), the higher the conformity level, the higher the confidence in the security measure. One obvious way of increasing the security assurance level through the value of c is to use a probe with a high quality level. When one switches to a context with a higher security criticality (Figure 4b), the confidence level is still positively correlated with the conformity level, but drops sharply. The value of the security assurance for c = QL (the probe quality level) is called the achievable security assurance level. It represents the highest assurance level one could expect while using a particular probe. The value of the security assurance for c = QL also reflects the well functioning of the security measure.
Therefore it should be displayed along with the actual value of the security assurance to help the operator appreciate the gap between the two values and subsequently take appropriate actions to get closer to the achievable level.
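A minimal sketch of formula (2) in code follows; it simply tabulates how the assurance level of a single available security measure grows with the conformity level for two context criticality values, reproducing the behaviour of Figures 4a and 4b.

```python
def sal(availability, alpha, conformity):
    """Security assurance of one measure: SAL = A * (1 - alpha ** c), formula (2)."""
    if availability == 0:
        return 0.0                      # requirement (i): nothing to assure
    return availability * (1.0 - alpha ** conformity)

PROBE_QL = 4                            # quality level of the verification probe
for alpha in (0.3, 0.8):                # low vs. high context security criticality
    levels = [round(sal(1, alpha, c), 2) for c in range(PROBE_QL + 1)]
    print(f"alpha={alpha}: SAL for c=0..{PROBE_QL} -> {levels}")
# alpha=0.3: [0.0, 0.7, 0.91, 0.97, 0.99]
# alpha=0.8: [0.0, 0.2, 0.36, 0.49, 0.59]
```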
Fig. 4a. Assurance fluctuation with conformity level
Fig. 4b. Assurance fluctuation with the context
A system in general, and a critical component in particular, is normally associated with several security measures protecting it against specific threats. The security assurance of such a component or system therefore requires the consideration of the security assurance levels of all its associated security measures. An aggregation function (Agf) is therefore necessary. Algorithms such as the minimum, the maximum or a weighted average can be used, depending on how the associated security measures contribute to the component's security. Equation (2) can be generalized as follows:

SAL(Component) = Agf{SAL(SMi)}    (3)
where SAL(SMi) represents the security assurance level of an associated security measure and 1 ≤ i ≤ number of associated security measures. Subsequently, the security assurance level of a critical system or service can be obtained by aggregating the assurance levels of its critical components. Figure 3 shows the sequence of aggregation up to the system level.

4.5 Display and Monitoring of the Assurance Level

Once the overall value of the security assurance level has been estimated, an almost real-time display of the security assurance of the service is performed. Furthermore, a comparison between the current value of the security assurance level and the achievable security assurance level will result in an appropriate message being displayed in case of a mismatch between the two. This can help the security manager identify causes of assurance deviation and also assist him/her in making decisions.
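A minimal sketch of the aggregation in equation (3), assuming a weighted average as the aggregation function Agf (the minimum or maximum could be substituted, as noted above); the assurance values and weights below are purely illustrative.

```python
def aggregate_weighted(assurance_levels, weights):
    """Agf realized as a weighted average of the security measures' assurance levels."""
    total = sum(weights)
    return sum(a * w for a, w in zip(assurance_levels, weights)) / total

# Hypothetical component protected by three security measures of unequal importance.
component_sal = aggregate_weighted([0.59, 0.80, 0.45], weights=[0.5, 0.3, 0.2])

# The same function can be reused one level up to obtain a system-level value
# from the assurance levels of its critical components (Figure 3).
system_sal = aggregate_weighted([component_sal, 0.70], weights=[0.6, 0.4])
print(component_sal, system_sal)
```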
5 Architectural Choice for the Assurance Evaluation Methodology

Given the highly distributed nature of most current systems, verification of the security measures is more challenging due to issues such as concurrency, fault tolerance, security and interoperability. Multi-agent systems (MAS) [17] offer interesting
features for verifying the security of such systems. In our work, we consider an agent as an encapsulated computer system that is situated in some environment and that is capable of flexible, autonomous action in that environment in order to meet its design objectives [18]. As agents have control over their own behavior, they must cooperate and negotiate with each other to achieve their goals [17]. The convergence of these agent properties and the behavior of distributed systems makes a MAS architecture an appropriate mechanism to evaluate the security assurance of critical infrastructures run by distributed systems. In our framework, the security assurance evaluation approach has been implemented using JADE [19] with an agent organization involving a hierarchy of three types of agents: a Server agent (embedded in the server), Multiplexer or MUX agents (for large, multi-domain systems with dedicated firewalls, crossing each firewall every time a check is needed is not advisable; thus MUX agents can be defined, for each sub-domain, at the firewall level to relay the information to the probe agents) and Probe agents (agents triggering a probe or collecting information from a probe during the assurance evaluation). Importantly, although an XML-based format is used for the messages between the server agent and the MUX agents, the message format between a probe agent and its probe is specific to the probe. Thus the message from the probe agent has to be transformed into a format understandable by the probe.
Fig. 5. Agents interaction scenario (BM: base measure; PA: probe agent) [Bulut et al. 2007]
The following scenario and Figure 5 [10] describe how the agents are hierarchically organized and how they interact when conducting a security assurance measurement. The Server-Agent receives a request including the list of verifications to perform, or base measures (BM1, BM2), with the targeted network elements (1). It consults the roles directory to determine the sub-domains to which the targeted network elements belong and the corresponding MUX-Agent (2). Then, it sends the request to the concerned MUX-Agent (3). The MUX-Agent dispatches the base-measure requests (4, 5) after determining which Probe-Agents can perform these base measures. The Probe-Agents receive the requests intended for them, launch the measurement at the probe level (6), get the result and format it in a well-defined agent-messaging format (7) and send it to the MUX-Agent (8). The MUX-Agent collects all the measurement results and sends
them to the Server-Agent (9). The Server-Agent collects all the measurements coming from the MUX-Agent, or generally from all the MUX-Agents, and aggregates them before generating the response (10).
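The routing logic of this scenario can be sketched in a few lines: the roles directory maps base measures to the probe agents able to perform them, the MUX agent dispatches the requests and collects the results, and the server agent aggregates the responses. This is a deliberately framework-free, illustrative simplification; the actual system is implemented with JADE agents and an XML message format.

```python
# Hypothetical roles directory: probe agent -> base measures it can perform.
ROLES_DIRECTORY = {"PA1": {"BM1", "BM3"}, "PA2": {"BM2"}}

def mux_dispatch(base_measures, run_probe):
    """MUX agent: dispatch each base measure to a capable probe agent and collect results."""
    results = {}
    for bm in base_measures:
        agent = next(a for a, caps in ROLES_DIRECTORY.items() if bm in caps)
        results[bm] = run_probe(agent, bm)          # steps 4-8 of the scenario
    return results

def server_request(base_measures, run_probe):
    """Server agent: route the request via the MUX agent and aggregate the responses."""
    measurements = mux_dispatch(base_measures, run_probe)   # steps 3 and 9
    return measurements                                      # step 10: build the response

print(server_request(["BM1", "BM2"], lambda agent, bm: f"{bm} done by {agent}"))
```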
6 Case Study

For the purpose of illustrating the security assurance evaluation approach, let us consider the following scenario, a brief summary of a larger case study used in a technical report. Bob is an employee who frequently uses his office IT network for other activities such as online banking, personal email checks and so on. A critical component of the network that is relevant to Bob is the DNS server. In fact, an illicit modification of the DNS configuration can cause the transmission of malicious information and affect the security of the DNS-dependent elements. This may result in Bob being tricked while visiting his bank website and subsequently giving his banking details on a replica of the website controlled by fraudsters. We propose to evaluate the security assurance of the DNS server taking into account the context of use of the system by Bob. For that, we consider a scenario where errors have been injected into the address resolution file in order to corrupt the DNS (BIND9). In this scenario, Samhain [20], an open-source host-based intrusion detection system that uses cryptographic checksums of files to detect modifications, is used as the probe for the verification. An effective functioning of that probe helps detect whether the integrity of the address resolution files has been corrupted as a result of a malicious attack. Furthermore, self-developed scripts have been used to verify whether the DNS file is well constructed. Following the steps of the methodology discussed in the previous section, the following analysis is performed:
Fig. 6. A simple representation of Bob’s workplace network
Step 1: Assurance components modeling. Figure 6 provides an overview of the system model and the assurance-relevant components, particularly the DNS. Step 2: Specification of the metrics. Based on information obtained from the Samhain documentation and by comparing the quality metric taxonomy with the specification in
Table 6, we derived the following conclusions. Coverage of the measures: the coverage of the Samhain measurements satisfies QAM_COV.2; in fact the measures only represent a static behavior of the service and not a dynamic network view (incoming and outgoing flows from and to the DNS server). Depth of the measures: the measures undertaken by Samhain target the address resolution file. This is relevant because a missing address resolution file or a bad content affects the correct DNS behavior; this satisfies at least QAM_DPT.3. Rigor of the measures: Samhain is a dedicated open-source integrity-check software tool (QAM_RIG.2). Independence of verification: a recent version (v2.4.5) was used, with a continuous evolution of the dictionary, so that the main weaknesses of the DNS are controlled (QAM_IND.3). The values of the Samhain (S) capabilities do not explicitly correspond to any quality level of Table 5. Nonetheless, all the parameters of Samhain (Coverage: 2, Depth: 3, Rigor: 2 and Independent verification: 3) are greater than or equal to those of quality level 4, while some are lower than those of quality level 5. We can therefore conclude that the Samhain probe corresponds to quality level 4. Step 3: Verification of the components' security posture. For the security assurance evaluation of the DNS, two probe agents have been launched for verifying both the DNS conversion file integrity and the conversion file construction. The verification frequency was set to 1 hour, during which the agents were dispatched in the network to perform the evaluation. Table 6 describes the requirements for the DNS conformity verification.

Table 6. Requirements for the DNS conformity verification

Reference | Base measures | Required probes | Frequency
DNS conversion files integrity | Check the integrity of the address resolution file (files corrupted?) | Samhain | 1 hour
DNS conversion files construction | Check if the content of the resolution files is well constructed | Scripts | 1 hour
Step 4: Assurance level aggregation. Determining the security assurance level of the DNS based on Bob's experience. Availability value: an availability check only concerns security measures which have been implemented or installed; in the case of the DNS verification, for instance, it is rather a case of a security-relevant parameter where altering some elements of its configuration could result in a security loophole. In such a case the value of availability is by default set to 1. Conformity value: taking into account the quality level of Samhain and the possible results obtained from Samhain, the overall conformity level for the DNS (depending on the gravity of the security breach in the expert's view) can be summarized as follows. If the integrity of the address resolution files is compromised:

• Corrupted files (an evil-minded modification): the conformity level is 0 and a message is triggered.
• Configuration errors: the conformity level is 1 and an appropriate message is displayed.
• Otherwise, if everything is fine, the conformity level is 4.

The above classification of the mismatches assumes that a configuration error is less serious than a malicious modification.
Table 7. Determining the security criticality for Internet banking

Context description: Internet banking | Aggregated security criticality: Very high
Category | Security criticality | Rationale
Health and safety | Low | The context is not directly related to health and safety.
Revenue/finance | Very high | An IT security incident may result in financial losses for Bob. The severity of the loss may depend on Bob's financial capability and the time that he takes to realize something irregular is happening with his account. We here assume that the impact may be the highest, i.e. severe or catastrophic for Bob.
Intangible assets (reputation, privacy, competitiveness, ...) | Moderate | An IT security incident may result in the disclosure of Bob's banking information, which is a breach of his privacy.
Activity performance | Low | The context has no direct impact on Bob's other activities and especially on his professional activities.

Table 8. Security assurance values for the DNS, SAL = A(1 - α^c) with A = 1, depending on the context: Internet banking (α = 0.8), personal email checks (α = 0.6), simple web browsing (α = 0.3)
Determining the security criticality of the contexts of use for Bob: considering the daily usage of the network by Bob for his personal matters, we use the classification scheme in section 2 to determine the security criticality for: Internet banking, estimated as very high (0.8), see Table 7; email checks, estimated as high (0.6), because a security breach may have a high impact on Bob's privacy; and simple web browsing, which could be associated with 0.3 (moderate). Owing to page limitation, only the
determination process for Internet banking has been further detailed. The figures in the table show that the achievable assurance level (i.e. no errors detected) for Internet banking is 0.59. In case of a security loophole leading to a conformity level c = 0 (corrupted files, evil-minded modification), the displayed assurance level is 0. The presence of configuration errors (c = 1) gives an assurance level of 0.2. Step 5: Display and monitoring. Once the security assurance level has been evaluated, the value is displayed on the assurance evaluation system as shown in figure 7. The current security assurance level is then compared to the previously evaluated value. The operator can then use the assurance measurement history to find out whether the assurance level has experienced any drop or increase, or is static.
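Using the assurance function of Section 4.4 with A = 1 and the Samhain quality level QL = 4, the figures quoted for the case study can be recomputed directly; the snippet below does so for the three contexts of use.

```python
def sal(alpha, c, availability=1):
    return availability * (1.0 - alpha ** c)   # SAL = A * (1 - alpha ** c)

contexts = {"Internet banking": 0.8, "Personal email checks": 0.6, "Simple web browsing": 0.3}
QL = 4  # quality level of the Samhain probe, i.e. the best reachable conformity value

for name, alpha in contexts.items():
    achievable = sal(alpha, QL)        # no errors detected (c = QL)
    config_error = sal(alpha, 1)       # configuration error (c = 1)
    corrupted = sal(alpha, 0)          # corrupted files (c = 0)
    print(f"{name}: achievable={achievable:.2f}, c=1 -> {config_error:.2f}, c=0 -> {corrupted:.2f}")
# Internet banking yields 0.59, 0.20 and 0.00, matching the values reported above.
```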
7 Conclusion

This paper has presented a methodology for the evaluation of the security assurance of an IT system given the criticality of the context in which it is operating. A classification scheme that aims at helping to evaluate the security criticality of a system's context has also been provided. Our approach exhibits the following advantages. On one hand, it can help increase end-users' trust in their usage of IT systems and, moreover, help them know which activities can be performed with more confidence and which ones it is advisable to refrain from, based on the security assurance values. On the other hand, it helps security experts better manage deployed security measures by identifying areas of the system that need attention before a security loophole is exploited by malicious individuals. Nonetheless, the categories proposed for criticality assessment still require some investigation so as to ensure they are complete enough to be used for most systems. During the application of the methodology, we realized that a sound knowledge of the system components is imperative for a better specification of the metrics, which in turn will guarantee realistic values for the security assurance. We currently envisage a wider application of the approach to a variety of systems in view of its improvement. Moreover, we plan to integrate the approach with network management tools and risk assessment software and to simulate real-world complexity.

Acknowledgement. This work has been supported by the TITAN project, financed by the national fund of research of the Grand Duchy of Luxembourg under contract C08/iS/21.
References 1. Le Grand, C.H.: Software security assurance: A framework for software vulnerability management and audit. CHL Global Associates and Ounce Labs, Inc. (2005) 2. Jansen, W.: Directions in Security Metrics Research. National Institute of Standards and Technology Special publication# NISTIR 7564 (2009) 3. Vaughn, R.B., Henning, R., Siraj, A.: Information Assurance Measures and Metrics – State of Practice and Proposed Taxonomy. In: Proceedings of the IEEE/HICSS 2003, Hawaii (2002)
4. Seddigh, N., Pieda, P., Matrawy, A., Nandy, B., Lambadaris, L., Hatfield, A.: Current Trends and Advances in Information Assurance Metrics. In: Proc. of PST 2004, pp. 197– 205 (2004) 5. Savola, R.M.: Towards a Taxonomy for Information Security Metrics. In: International Conference on Software Engineering Advances (ICSEA 2007), Cap Esterel, France (2007) 6. Common Criteria for information Technology, part 1-3, version 3.1 (September 2006) 7. Stoneburner, G.: Underlying Technical Models for Information Technology Security, National Institute of Standards and technology Special publication #800-33 (2001) 8. Mouratidis, H., Giorgini, P.: Secure Tropos: A Security-Oriented Extension of the Tropos methodology. International Journal of Software Engineering and Knowledge Engineering (IJSEKE) 17(2), 285–309 (2007) 9. Jürjens, J.: Secure Systems Development with UML. Springer, Berlin (2005) 10. Bulut, E., Khadraoui, D., Marquet, B.: Multi-Agent based security assurance monitoring system for telecommunication infrastructures. In: Proceedings to the Communication, Network, and Information Security conference, Berkely/California (2007) 11. Bugyo project, http://projects.celtic-initiative.org/bugyo/ (accessed: March 8, 2009) 12. Evans, D.L., Bond, P.J., Bement, A.L.: Standards for Security categorization of Federal Information And Information Systems. NIST Gaithersburg, MD 20899-8900 (2004) 13. Operationally Critical Threat, Asset and Vulnerability Evaluation (OCTAVE), Carnegie Mellon - Software Engineering Institute (June 1999) 14. CRAMM, CCTA Risk Analysis and Management Method, http://www.cramm.com/ 15. OLF Guideline No 123: Classification of process control, safety and support ICT systems based on criticality, Norway (2009) 16. Ouedraogo, M., Mouratidis, H., Khadraoui, D., Dubois, E.: A probe capability metric taxonomy for assurance evaluation. In: Proceedings of UEL 5th conference on Advances in Computing and Technology Conference (AC&T), England (2010) 17. Wooldridge, M.: An Introduction to Multi-Agent Systems. John Wiley & Sons, Chichester (2002) 18. Jennings, N.R.: An agent-based software engineering. In: Garijo, F.J., Boman, M. (eds.) MAAMAW 1999. LNCS, vol. 1647. Springer, Heidelberg (1999) 19. JADE, http://jade.tilab.com (accessed: March 10, 2008) 20. Samhain, http://www.la-samhain.de/samhain (accessed: March 10, 2008)
Security Analysis of 'Two-Factor User Authentication in Wireless Sensor Networks'
Muhammad Khurram Khan1 and Khaled Alghathbar1,2
1 Center of Excellence in Information Assurance, King Saud University, Saudi Arabia
2 Information Systems Department, College of Computer and Information Sciences, King Saud University, Saudi Arabia
[email protected], [email protected]
Abstract. Authenticating remote users in wireless sensor networks (WSN) is an
important security issue due to their unattended and hostile deployments. Usually, sensor nodes are equipped with limited computing power, storage and communication modules; thus authenticating remote users in such a resource-constrained environment is a critical security concern. Recently, M.L. Das proposed a two-factor user authentication scheme for WSN and claimed that his scheme is secure against different kinds of attacks. However, in this paper, we prove that the M.L. Das scheme has some critical security pitfalls and is not recommended for real applications. We point out that in his scheme users cannot change or update their passwords, the scheme does not provide mutual authentication between the gateway node and the sensor nodes, and it is vulnerable to the gateway node bypassing attack and the privileged-insider attack.
work is valuable and should only be given access to registered or legitimate users. Benenson et al. first sketched the security issues of user authentication in WSN and introduced the notion of n-authentication [4]. Later on, Watro et al. proposed the TinyPK authentication protocol with public key cryptography that uses the RSA and Diffie-Hellman algorithms [5]; however, this protocol suffers from a masquerade sensor node attack, in which an adversary can spoof the user. In 2006, Wong et al. [6] proposed a light-weight dynamic user authentication scheme for the WSN environment. They justified their scheme through security and cost analysis and discussed the implementation issues with recommendations for using the security features of the IEEE 802.15.4 MAC sublayer. Later, Tseng et al. [7] identified that Wong et al.'s scheme has some security weaknesses and cannot be implemented in a real-life environment. They showed that Wong et al.'s scheme is not protected from replay and forgery attacks, that passwords can easily be revealed by any of the sensor nodes, and that users cannot freely change their passwords. To overcome these discrepancies, Tseng et al. proposed an enhanced scheme and claimed that their scheme not only retains the advantages of Wong et al.'s scheme but also provides resistance to replay and forgery attacks, reduction of the password leakage risk, and the capability of changeable passwords with better efficiency [7]. Lately, T.H. Lee [8] also analyzed Wong et al.'s scheme and proposed two simple dynamic user authentication protocols that are variations of Wong et al.'s scheme. In his first protocol, T.H. Lee simplified the authentication process by reducing the computational load of the sensor nodes while preserving the same security level as Wong et al.'s scheme. In his second protocol, T.H. Lee proposed a scheme in which an intruder cannot impersonate the gateway node to grant access to illegitimate users. L.C. Ko [9] proved that while Tseng et al.'s scheme achieves several security improvements over Wong et al.'s scheme, it is still insecure under a reasonable attack model. L.C. Ko discussed that Tseng et al.'s scheme does not achieve mutual authentication between the gateway node (GW) and the sensor node (SN), nor between the user (U) and the SN. Furthermore, L.C. Ko identified that an adversary can forge the communication message which is sent from the sensor node to the gateway node. Consequently, L.C. Ko proposed a modified scheme which attempts to overcome the aforementioned security pitfalls of Tseng et al.'s protocol [7] and proved that his scheme has better security features than Tseng et al.'s scheme. Recently, Binod et al. [10] cryptanalyzed the authentication schemes of Wong et al. and Tseng et al. and proposed an improved scheme. Binod et al. showed that their scheme is more robust than the previously published schemes, can withstand replay, forgery and man-in-the-middle attacks, and provides mutual authentication between the login node and the gateway node. More recently, M.L. Das [11] proposed a two-factor user authentication scheme for wireless sensor networks. M.L. Das also identified that Wong et al.'s protocol is vulnerable to the many-logged-in-users-with-the-same-login-id threat, that is, anyone who has a valid user's password can easily log in to the sensor network [11]. He also identified that Wong et al.'s protocol is susceptible to the stolen-verifier attack, because the GW-node and login-node maintain a lookup table of all the registered users' credentials. Consequently, M.L. Das proposed his protocol to overcome the security flaws of Wong et al.'s scheme. His protocol uses a two-factor authentication concept based on password
and smart card and resists the many-logged-in-users-with-the-same-login-identity, stolen-verifier, guessing, replay and impersonation attacks. However, in this paper, we identify that the M.L. Das scheme is also not secure and is vulnerable to several critical security attacks. We show that the M.L. Das scheme is defenseless against the GW-node bypassing attack, does not provide mutual authentication between the GW-node and the sensor nodes, has the security threat of an insider attack, and does not have a provision for changing or updating the passwords of registered users. The rest of the paper is organized as follows: Section 2 briefly reviews the M.L. Das scheme, Section 3 elaborates on the weaknesses and security pitfalls of his scheme, and Section 4 concludes the findings of this paper.
2 Review of the M.L. Das Scheme

In this section, we briefly review the user authentication scheme of M.L. Das, which is divided into two phases, namely the registration phase and the authentication phase.

2.1 Registration Phase

When a user Ui wants to register with the wireless sensor network (WSN), he submits his IDi and PWi to the gateway node (GW-node) in a secure manner. Upon receiving the registration request, the GW-node computes Ni = h(IDi || PWi) ⊕ h(K), where K is a symmetric key which is known only to the GW-node, '||' is a bit-wise concatenation operator and '⊕' denotes the bitwise XOR operation. Now, the GW-node personalizes the smart card with the parameters h(.), IDi, Ni and xa, where h(.) is a one-way secure hash function and xa is a secret value generated securely by the GW-node and stored in some designated sensor nodes before deploying the WSN. At the end of this phase, Ui gets his personalized smart card in a secure manner.

2.2 Authentication Phase

The authentication phase is invoked when Ui wants to log in to the WSN or access data from the network. This phase is further sub-divided into two phases, namely the login and verification phases.

1) Login Phase. In the login phase, Ui inserts his smart card into a terminal and inputs IDi and PWi. The smart card validates IDi and PWi with the stored values. If Ui is successfully authenticated, the smart card performs the following steps:

Step-L1: Computes DIDi = h(IDi || PWi) ⊕ h(xa || T), where T is the current timestamp of the system.
Step-L2: Computes Ci = h(Ni || xa || T), then sends <DIDi, Ci, T> to the GW-node.

2) Verification Phase. Upon receiving the login request <DIDi, Ci, T> at time T*, the GW-node authenticates Ui by the following steps:

Step-V1: Checks whether (T* - T) ≤ ΔT; if so, the GW-node proceeds to the next step, otherwise the verification step is terminated. Here ΔT denotes the expected time interval for the transmission delay.
Step-V2: Computes h(IDi || PWi)* = DIDi ⊕ h(xa || T) and Ci* = h((h(IDi || PWi)* ⊕ h(K)) || xa || T).
Step-V3: If Ci* = Ci, the GW-node accepts the login request; otherwise the login request is rejected.
Step-V4: The GW-node now sends a message <DIDi, Ai, T'> to some nearest sensor node Sn over a public channel to respond to the query/data that Ui is looking for, where Ai = h(DIDi || Sn || xa || T') and T' is the current timestamp of the GW-node. Here, the value of Ai is used to ensure that the message originally comes from the real GW-node.
Step-V5: After receiving the message <DIDi, Ai, T'>, Sn validates the timestamp. If the timestamp is within the valid interval, Sn computes h(DIDi || Sn || xa || T') and checks whether it is equal to Ai. If this step is passed, then Sn responds to Ui's query.
3 Security Analysis and Pitfalls of the M.L. Das Scheme

3.1 GW-Node Bypassing Attack

In the M.L. Das scheme, after performing the verification phase and accepting the login request of Ui, the GW-node sends an intimation message <DIDi, Ai, T'> to some nearest sensor node Sn to inform it about the successful login of Ui and to request it to respond to the query/data of Ui. Here, Ai is computed as Ai = h(DIDi || Sn || xa || T'), where xa is a secret parameter which is known to the GW-node and the sensor node and is stored in the smart card of Ui, T' is the timestamp of the GW-node, and DIDi is the dynamic ID of user Ui, calculated as DIDi = h(IDi || PWi) ⊕ h(xa || T). In the M.L. Das scheme the value of Ai is used to ensure that the message is coming from the legitimate GW-node. Here, we observe that if the value of xa is extracted from the smart card of Ui by some means [12, 13], then Ui himself or any adversary can log in to Sn without going through the verification of the GW-node, so the scheme is vulnerable to the 'GW-node bypassing attack'. In the following, we show how this attack works on the M.L. Das scheme:

(i) Suppose an adversary, or Ui himself, computes a fake dynamic identity by using the xa extracted from the smart card: DIDf = h(IDf || PWf) ⊕ h(xa || Tf), where IDf is a fake ID of the adversary, PWf is a randomly chosen fake password, and Tf is the current timestamp of the adversary's machine.
(ii) The adversary computes Af = h(DIDf || Sn || xa || Tf), where Sn is the nearest sensor node for querying the data.
(iii) Now, the adversary sends the message <DIDf, Af, Tf> to Sn over an insecure communication channel.
(iv) After receiving the message, Sn first validates Tf. If (T - Tf) ≤ ΔT, then Sn proceeds to the next step, otherwise it terminates the operation. Here, ΔT denotes the expected time interval for the transmission delay.
(v) Sn now computes Af' = h(DIDf || Sn || xa || Tf) and checks whether Af' = Af. If it holds, Sn responds to the adversary's query, and the adversary, who is not a legitimate user of the sensor network system, enjoys the resources as an authorized user without being a member of the system.
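For illustration only, the forged values of this attack can be sketched with a generic cryptographic hash. The snippet below uses SHA-256 as a stand-in for h(.) and invented placeholder strings; it is a schematic re-statement of the attack steps, not code taken from the analyzed scheme or from this paper.

```python
import hashlib
import time

def h(*parts):
    """Generic one-way hash over the concatenation of its arguments."""
    return hashlib.sha256("||".join(str(p) for p in parts).encode()).hexdigest()

def xor_hex(a, b):
    return format(int(a, 16) ^ int(b, 16), "064x")

# The secret x_a extracted from the smart card is all the adversary needs.
x_a = "extracted-secret-parameter"
fake_id, fake_pw, s_n = "attacker", "guess123", "SensorNode7"
t_f = int(time.time())

did_f = xor_hex(h(fake_id, fake_pw), h(x_a, t_f))   # forged dynamic identity (step i)
a_f = h(did_f, s_n, x_a, t_f)                        # forged authenticator (step ii)

# The sensor node, knowing x_a, recomputes the same value and accepts the request (step v).
accepted = (a_f == h(did_f, s_n, x_a, t_f))
print(accepted)  # True: the GW-node has been bypassed
```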
3.2 No Mutual Authentication between GW and Sensor Nodes

In the M.L. Das scheme, after accepting the login request of Ui, the GW-node sends a message <DIDi, Ai, T'> to some nearest sensor node Sn, where Ai = h(DIDi || Sn || xa || T') and T' is the current timestamp of the GW-node. This message informs the sensor node to respond to the query/data which Ui is requesting from the sensor network. In this message, the value of Ai is used to assure the sensor node that the message comes from the real GW-node. However, while the sensor node verifies the authenticity of the GW-node, there is no verification of whether the sensor node itself is fake or real. Thus, the M.L. Das scheme only provides unilateral authentication between the GW-node and the sensor node; there is no mutual authentication between the two nodes, which is an indispensable property in authentication protocol design [14] [15].

3.3 Privileged-Insider Attack

In a real environment, it is common practice that many users use the same passwords to access different applications or servers, for the convenience of remembering long passwords and using them easily whenever required. However, if the system manager or a privileged insider of the GW-node learns the password of Ui, he may try to impersonate Ui by accessing other servers where Ui could be a registered user. In the M.L. Das scheme, Ui performs registration with the GW-node by presenting his password PWi in plain format. Thus, his scheme has the pitfall of an insider attack by a privileged user of the GW-node who has come to know the password of Ui and can misuse the system in the future [15].

3.4 No Provision for Changing/Updating Passwords

In the M.L. Das scheme, there is no provision for Ui to change or update his password whenever required. It is a widely recommended security policy for highly secure applications that users should update or change their passwords frequently, while there is no such option in the M.L. Das scheme.
4 Conclusion

In this paper, we have shown that the recently proposed two-factor user authentication scheme for the wireless sensor network environment is insecure against different kinds of attacks and should not be implemented in real applications. We have demonstrated that in the M.L. Das scheme there is no provision for users to change or update their passwords, the GW-node bypassing attack is possible, the scheme does not provide mutual authentication between the GW-node and the sensor node, and it is susceptible to the privileged-insider attack.
References 1. Chiara, B., Andrea, C., Davide, D., Roberto, V.: An Overview on Wireless Sensor Networks Technology and Evolution. Sensors 9, 6869–6896 (2009) 2. Callaway, E.H.: Wireless Sensor Networks, Architectures and Protocols. Auerbach Publications, Taylor & Francis Group, USA (2003) 3. Chong, C.Y., Kumar, S.: Sensor Networks: Evolution, Opportunities, and Challenges. Proceedings of the IEEE 91, 1247–1256 (2003) 4. Benenson, Z., Felix, C.G., Dogan, K.: User Authentication in Sensor Networks. In: Proceedings of Workshop Sensor Networks, Germany, pp. 385–389 (2004) 5. Watro, R., Derrick, K., Sue-fen, C., Charles, G., Charles, L., Peter, K.: TinyPK: Securing Sensor Networks with Public Key Technology. In: Proceedings of the 2nd ACM workshop on Security of ad hoc and sensor networks, USA, pp. 59–64 (2004) 6. Wong, K.H.M., Yuan, Z., Jiannong, C., Shengwei, W.: A dynamic user authentication scheme for wireless sensor networks. In: Proceedings of Sensor Networks, Ubiquitous, and Trustworthy Computing, Taichung, pp. 244–251 (2006) 7. Tseng, H.R., Jan, R.H., Yang, W.: An Improved Dynamic User Authentication Scheme for Wireless Sensor Networks. In: Proceedings of IEEE Globecom, pp. 986–990 (2007) 8. Tsern, H.L.: Simple Dynamic User Authentication Protocols for Wireless Sensor Networks. In: Proceedings of 2nd International Conference on Sensor Technologies and Applications, pp. 657–660 (2008) 9. Ko, L.C.: A Novel Dynamic User Authentication Scheme for Wireless Sensor Networks. In: Proceedings of IEEE ISWCS, pp. 608–612 (2008) 10. Binod, V., Jorge, S.S., Joel, J.P.C.R.: Robust Dynamic User Authentication Scheme for Wireless Sensor Networks. In: Proceedings of ACM Q2SWinet, Spain, pp. 88–91 (2009) 11. Das, M.L.: Two-Factor User Authentication in Wireless Sensor Networks. IEEE Transactions on Wireless Communications 8, 1086–1090 (2009) 12. Kocher, P., Jaffe, J., Jun, B.: Differential Power Analysis. In: Wiener, M. (ed.) CRYPTO 1999. LNCS, vol. 1666, pp. 388–397. Springer, Heidelberg (1999) 13. Messerges, T.S., Dabbish, E.A., Sloan, R.H.: Examining Smartcard Security under the Threat of Power Analysis Attacks. IEEE Transactions on Computers 51, 541–552 (2002) 14. Khan, M.K., Zhang, J.: Improving the Security of A Flexible Biometrics Remote User Authentication Scheme. Computer Standards & Interfaces, Elsevier Science 29, 82–85 (2007) 15. Ku, W.C., Chen, S.M.: Weaknesses and Improvements of An Efficient Password based Remote user Authentication Scheme using Smart Cards. IEEE Transactions on Consumer Electronics (50), 204–207 (2004) 16. Wang, X., Zhang, W., Zhang, J., Khan, M.K.: Cryptanalysis and Improvement on Two Efficient Remote User Authentication Scheme using Smart Cards. Computer Standards & Interfaces, Elsevier Science 29, 507–512 (2007) 17. Khan, M.K.: Fingerprint Biometric-based Self and Deniable Authentication Schemes for the Electronic World. IETE Technical Review 26, 191–195 (2009)
Directed Graph Pattern Synthesis in LSB Technique on Video Steganography Debnath Bhattacharyya2, Arup Kumar Bhaumik1, Minkyu Choi2, and Tai-hoon Kim2,* 1
Abstract. In this paper, we propose data hiding and extraction procedures for high resolution AVI videos. Although AVI videos are large in size, they can be transmitted securely from source to target over a network after processing the source video with these data hiding and extraction procedures. Two different procedures are used, at the sender's end and at the receiver's end respectively, and they serve as the key of the data hiding and extraction. Keywords: Data hiding, AVI, security, LSB technique, pattern synthesis.
1 Introduction

Currently, the Internet and digital media are getting more and more popular, so the requirement for secure transmission of data has also increased. Various good techniques have been proposed and already taken into practice. Data hiding is the process of secretly embedding information inside a data source without changing its perceptual quality. It is the art and science of writing hidden messages in such a way that no one apart from the sender and the intended recipient even realizes there is a hidden message. Generally, in data hiding, the actual information is not maintained in its original format; it is converted into an alternative equivalent multimedia file like an image, video or audio, which in turn is hidden within another object. This apparent message is sent through the network to the recipient, where the actual message is separated from it. The requirements of any data hiding system can be categorized into security, capacity and robustness (Cox et al. 1996). All these factors are inversely proportional to each other, creating the so-called data hiding dilemma. The focus of this paper is on maximizing the first two factors of data hiding, i.e. security and capacity, coupled with alteration detection. The proposed scheme is a data-hiding method that uses high resolution digital video as a cover signal. The intended recipient need only process the required steps in order to reveal the message; otherwise the existence of the hidden information is virtually undetectable. The proposed scheme provides the ability to hide a significant quantity of information, making it different from typical data hiding mechanisms, because here we consider applications that require significantly larger payloads like video-in-video and picture-in-video. The purpose of hiding such information depends on the application and the needs of the owner/user of the digital media. Data hiding requirements include the following:

• Imperceptibility: the video with hidden data and the original data source should be perceptually identical.
• Robustness: the embedded data should survive any processing operation the host signal goes through and preserve its fidelity.
• Capacity: maximize the data embedding payload.
• Security: security is in the key.

Data hiding is a different concept from cryptography, but uses some of its basic principles [1]. In this paper, we have considered some important features of data hiding. Our consideration is that of embedding information into video in a way that could survive attacks on the network.
2 Previous Works

Since a video file consists of a sequence of images, data hiding techniques for images also apply to video data hiding.

2.1 Least-Significant Bit Modifications

The most widely used technique to hide data is the usage of the LSB. Although there are several disadvantages to this approach, the relative ease of implementing it makes it a popular method. To hide a secret message inside an image, a proper cover image is needed. Because this method uses bits of each pixel in the image, it is necessary to use a lossless compression format; otherwise the hidden information will get lost in the transformations of a lossy compression algorithm. When using a 24-bit color image, a bit of each of the red, green and blue color components can be used, so a total of 3 bits can be stored in each pixel. Thus, an 800 × 600 pixel image can contain a total amount of 1,440,000 bits (180,000 bytes) of secret data. For example, the following grid can be considered as 3 pixels of a 24-bit color image, using 9 bytes of memory:

(00100111 11101001 11001000) (00100111 11001000 11101001) (11001000 00100111 11101001)
When the character A, whose binary value equals 10000001, is inserted, the following grid results:

(00100111 11101000 11001000) (00100110 11001000 11101000) (11001000 00100111 11101001)

In this case, only three bits needed to be changed to insert the character successfully. On average, only half of the bits in an image will need to be modified to hide a secret message using the maximal cover size. The resulting changes that are made to the least significant bits are too small to be recognized by the human eye, so the message is effectively hidden. While using a 24-bit image gives a relatively large amount of space to hide messages, it is also possible to use an 8-bit image as a cover source. Because of the smaller space and different properties, 8-bit images require a more careful approach. Where 24-bit images use three bytes to represent a pixel, an 8-bit image uses only one. Changing the LSB of that byte will result in a visible change of color, as another color in the available palette will be displayed. Therefore, the cover image needs to be selected more carefully and should preferably be in grayscale, as the human eye will not detect the difference between gray values as easily as between different colors.

2.2 Masking and Filtering

Masking and filtering techniques, usually restricted to 24-bit or grayscale images, take a different approach to hiding a message. These methods are effectively similar to paper watermarks, creating markings in an image. This can be achieved, for example, by modifying the luminance of parts of the image. While masking does change the visible properties of an image, it can be done in such a way that the human eye will not notice the anomalies. Since masking uses visible aspects of the image, it is more robust than LSB modification with respect to compression, cropping and different kinds of image processing. The information is not hidden at the "noise" level but is inside the visible part of the image, which makes it more suitable than LSB modification in case a lossy compression algorithm like JPEG is being used [2].

2.3 Transformations

A more complex way of hiding a secret inside an image comes with the use and modification of discrete cosine transforms. Discrete cosine transforms (DCT) are used by the JPEG compression algorithm to transform successive 8 × 8 pixel blocks of the image into 64 DCT coefficients each. Each DCT coefficient F(u,v) of an 8 × 8 block of image pixels f(x,y) is given by:

F(u,v) = (1/4) C(u) C(v) Σx=0..7 Σy=0..7 f(x,y) cos[(2x+1)uπ/16] cos[(2y+1)vπ/16]
where C(x) = 1/√2 when x equals 0 and C(x) = 1 otherwise. After calculating the coefficients, the following quantizing operation is performed:

FQ(u,v) = round( F(u,v) / Q(u,v) )

where Q(u,v) is a 64-element quantization table. A simple pseudo-code algorithm to hide a message inside a JPEG image could look like this:

Input: message, cover image
Output: steganographic image containing message
While data left to embed do
    Get next DCT coefficient from cover image
    If DCT not equal to 0 and DCT not equal to 1 then
        Get next LSB from message
        Replace DCT LSB with message bit
    End if
    Insert DCT into steganographic image
End while

Although a modification of a single DCT coefficient will affect all 64 image pixels, the LSB of the quantized DCT coefficient can be used to hide information. Losslessly compressed images are susceptible to visual alterations when the LSBs are modified. This is not the case with the above described method, as it takes place in the frequency domain of the image instead of the spatial domain, and therefore there will be no visible changes to the cover image [3]. When information is hidden inside video, the program or person hiding the information will usually use the DCT (discrete cosine transform) method. DCT works by slightly changing each of the images in the video, only so much that it is not noticeable by the human eye. To be more precise about how DCT-based hiding works, it alters the values of certain parts of the images, and it usually rounds them up; for example, if part of an image has a value of 6.667, it will round it up to 7. Data hiding in videos is similar to data hiding in images, except that information is hidden in each frame of the video. When only a small amount of information is hidden inside a video it generally is not noticeable at all; however, the more information that is hidden, the more noticeable it will become.
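To make the pseudo-code above concrete, the following sketch implements the block DCT and quantization formulas directly with NumPy and embeds message bits in the least significant bit of the quantized coefficients that are neither 0 nor 1. It is an illustrative simplification: a flat quantization value is used instead of the standard JPEG luminance table, and no entropy coding is performed.

```python
import numpy as np

def dct_block(f):
    """8x8 DCT: F(u,v) = 1/4 C(u)C(v) sum f(x,y) cos((2x+1)u pi/16) cos((2y+1)v pi/16)."""
    C = lambda k: 1 / np.sqrt(2) if k == 0 else 1.0
    F = np.zeros((8, 8))
    x = np.arange(8)
    for u in range(8):
        for v in range(8):
            cos_u = np.cos((2 * x + 1) * u * np.pi / 16)
            cos_v = np.cos((2 * x + 1) * v * np.pi / 16)
            F[u, v] = 0.25 * C(u) * C(v) * np.sum(np.outer(cos_u, cos_v) * f)
    return F

def embed_bits(block, bits, Q=16):
    """Quantize FQ(u,v) = round(F(u,v)/Q) and hide bits in the LSBs of usable coefficients."""
    shifted = block.astype(float) - 128            # centre pixel values on zero, as JPEG does
    coeffs = np.round(dct_block(shifted) / Q).astype(int)
    flat = coeffs.flatten()
    for i in range(flat.size):
        if not bits:
            break
        if flat[i] not in (0, 1):                   # skip 0 and 1, as in the pseudo-code
            flat[i] = (flat[i] & ~1) | bits.pop(0)  # replace the LSB with a message bit
    return flat.reshape(8, 8)

rng = np.random.default_rng(0)
block = rng.integers(0, 256, (8, 8))
message_bits = [1, 0, 1, 1, 0, 1, 0, 0]
print(embed_bits(block, list(message_bits)))
```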
3 Our Work with Analysis

The main high resolution AVI file is nothing but a sequence of high resolution images called frames. Initially we stream the video and collect all the frames in bitmap format (Fig. 1), and also collect the following information: Starting frame: it indicates the frame from which the algorithm starts message embedding. Starting macro block: it indicates the macro block within the chosen frame from which the algorithm starts message embedding.
Number of macro blocks: it indicates how many macro blocks within a frame are going to be used for data hiding. These macro blocks may be consecutive or chosen according to a predefined pattern. Apparently, the more macro blocks we use, the higher the embedding capacity we get. Moreover, if the size of the message is fixed, this number will be fixed too; otherwise it can be changed dynamically. Frame period: it indicates the number of inter frames which must pass before the algorithm repeats the embedding. However, if the frame period is too small and the algorithm repeats the message very often, that might have an impact on the coding efficiency of the encoder [4]. Apparently, if the video sequence is large enough, the frame period can be accordingly large. The encoder reads these parameters from a file. The same file is read by the software that extracts the message, so that the two codes are synchronized. After streaming the AVI video file into AVI frames, we use the conventional LSB replacement method. The LSB replacement technique has been extended to multiple bit planes as well. Recently, [5] has claimed that LSB replacement involving more than one least significant bit plane is less detectable than single bit plane LSB replacement; hence the use of multiple bit planes for embedding has been encouraged. But the direct use of 3 or more bit planes leads to the addition of a considerable amount of noise in the cover image. Still, as our work is on high resolution video, we get an RGB combination for each pixel as in Fig. 2; hence if we consider one LSB we have a choice of 3 bits for each pixel [6].
Fig. 1. AVI video Streaming and Data Hiding Algorithm
Fig. 2. The RGB bit Patterns
Fig. 3. Simple LSB data insertion technique
Fig. 4. Star directed graph
Fig. 5. Hiding Data with the Star Graph
Now, to overcome the claim of [5] in a better way, we propose a directed graph pattern synthesis method which inserts the binary bits of the stego data into the main or carrier video following a directed graph pattern, i.e. the data are inserted depending on the graph direction. This gives the data hiding method a higher level of security. In the proposed method, if we insert data into the video following the directed graph in Fig. 4, we get the corresponding result in Fig. 5. Here, 5 values out of 9 can be changed with this star graph structure; the remaining 4 bits can be used as a key value for each group of 3 pixels. As in this case we are getting 5 data bits out of 9 bits, this graph pattern is useful in cases where the amount of data is small. On the other hand, if we consider the graph in Fig. 6, we can get 8 data bits and only one key bit out of the 9 bits of 3 corresponding pixels. With this graph, if we insert the data A, considering the binary of A as 10000011, we get the result shown in Fig. 7.
Fig. 6. Graph
Fig. 7. Hiding Data with help of Fig. 6
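A minimal sketch of the graph-directed insertion over one 3-pixel block (9 colour channels): the traversal order derived from the directed graph decides which channel receives which data bit, and the untouched channels keep their LSBs as key bits. The specific channel orderings below are invented for illustration; the paper's actual traversals of the star graph of Fig. 4 and the graph of Fig. 6 would supply these orders.

```python
# Channel indices 0..8 address the R, G, B values of three consecutive pixels.
# Hypothetical traversal orders standing in for the graphs of Fig. 4 and Fig. 6.
STAR_GRAPH_ORDER = [4, 1, 3, 5, 7]                  # 5 data channels, 4 key channels
DENSE_GRAPH_ORDER = [0, 1, 2, 3, 4, 5, 6, 7]        # 8 data channels, 1 key channel

def embed_block(channels, bits, order):
    """Replace the LSB of the channels visited by the graph with the data bits."""
    out = list(channels)
    for channel_index, bit in zip(order, bits):
        out[channel_index] = (out[channel_index] & ~1) | bit
    return out

block = [0b00100111, 0b11101001, 0b11001000,
         0b00100111, 0b11001000, 0b11101001,
         0b11001000, 0b00100111, 0b11101001]        # the 3-pixel example used above
data = [1, 0, 0, 0, 0, 0, 1, 1]                     # bits of the hidden character
print([format(v, "08b") for v in embed_block(block, data, DENSE_GRAPH_ORDER)])
```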
4 Data Size Estimation

Each frame of the video is taken as a data source for data hiding. First the maximum size of the hidden data is calculated, as shown in Fig. 3, with the simple method. The size of the image is 2000 × 1000, modified to 2048 × 1024. On further calculation we get 786,432,000 characters that can be embedded. We have followed the equation mentioned below, with a bitmap size of 2048 × 1024:

(((width × height) × 3 bits) / 8 bits) / 3 bytes × 3000 frames = characters/video

The steps for calculating the maximum amount of hidden information are:

1. Each frame consists of 2048 × 1024 = 2,097,152 pixels.
2. Each pixel includes 3 bytes, and in each byte we use a single bit to encode the hidden data: R = 1 bit, G = 1 bit and B = 1 bit.
3. Each frame therefore offers 2,097,152 × 3 = 6,291,456 bits.
4. In each frame we can hide at most 6,291,456 bits / 8 bits = 786,432 bytes.
5. If the video has 3000 frames, this gives 786,432 × 3000 = 2,359,296,000 bytes (1 byte = 1 character).
6. For 1 Unicode character we need 3 bytes, so 2,359,296,000 bytes / 3 = 786,432,000 characters.
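The capacity estimate can be checked with a few lines of arithmetic (one hidden bit per colour channel, three channels per pixel, three bytes per Unicode character):

```python
width, height, frames = 2048, 1024, 3000

bits_per_frame = width * height * 3          # one LSB per R, G, B channel = 6,291,456 bits
bytes_per_frame = bits_per_frame // 8        # 786,432 bytes
bytes_per_video = bytes_per_frame * frames   # 2,359,296,000 bytes
unicode_chars = bytes_per_video // 3         # 3 bytes per character -> 786,432,000 chars

print(bits_per_frame, bytes_per_frame, bytes_per_video, unicode_chars)
```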
5 Conclusion

In this paper we propose a data hiding technique for high resolution video. Our intention is to provide proper protection of the data during transmission. To verify that the message extracted at the receiver is correct, comparison tools and statistical analysis can be used. The main advantage of the scheme is that it is a blind scheme and its effect on video quality and coding efficiency is almost negligible. It is highly configurable, and thus it may achieve high data capacities. Finally, it can be easily extended, resulting in better robustness, better data security and higher embedding capacity.
References [1] Bhattacharyya, D., Das, P., Mukherjee, S., Ganguly, D., Bandyopadhyay, S.K., Kim, T.-h.: A Secured Technique for Image Data Hiding. In: Communications in Computer and Information Science, vol. 29, pp. 151–159. Springer, Heidelberg (2009) [2] Bandyopadhyay, S.K., Bhattacharyya, D., Ganguly, D., Mukherjee, S., Das, P.: A Tutorial Review on Steganography. In: International Conference on Contemporary Computing (IC3 2008), Noida, India, August 7-9, pp. 105–114 (2008) [3] Krenn, J.R.: Steganography and Steganalysis (January 2004)
[4] Kapotas, S.K., Varsaki, E.E., Skodras, A.N.: Data Hiding in H.264 Encoded Video Sequences. In: IEEE 9th Workshop on Multimedia Signal Processing, Crete, October 1-3, pp. 373–376 (2007) [5] Ker, A.: Steganalysis of Embedding in Two Least-Significant Bits. IEEE Trans. on Information Forensics and Security 2(1), 46–54 (2007) [6] Bhaumik, A.K., Choi, M., Robles, R.J., Balitanas, M.O.: Data Hiding in Video. International Journal of Database Theory and Application 2(2) (June 2009)
Feature Level Fusion of Face and Palmprint Biometrics by Isomorphic Graph-Based Improved K-Medoids Partitioning Dakshina Ranjan Kisku1,*, Phalguni Gupta2, and Jamuna Kanta Sing3 1
Department of Computer Science and Engineering, Dr. B. C. Roy Engineering College, Durgapur – 713206, India 2 Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur – 208016, India 3 Department of Computer Science and engineering, Jadavpur University, Kolkata – 700032, India {drkisku,jksing}@ieee.org, [email protected]
Abstract. This paper presents a feature level fusion approach which uses the improved K-medoids clustering algorithm and isomorphic graph for face and palmprint biometrics. Partitioning around medoids (PAM) algorithm is used to partition the set of n invariant feature points of the face and palmprint images into k clusters. By partitioning the face and palmprint images with scale invariant features SIFT points, a number of clusters is formed on both the images. Then on each cluster, an isomorphic graph is drawn. In the next step, the most probable pair of graphs is searched using iterative relaxation algorithm from all possible isomorphic graphs for a pair of corresponding face and palmprint images. Finally, graphs are fused by pairing the isomorphic graphs into augmented groups in terms of addition of invariant SIFT points and in terms of combining pair of keypoint descriptors by concatenation rule. Experimental results obtained from the extensive evaluation show that the proposed feature level fusion with the improved K-medoids partitioning algorithm increases the performance of the system with utmost level of accuracy. Keywords: Biometrics, Feature Level Fusion, Face, Palmprint, Isomorphic Graph, K-Medoids Partitioning Algorithm.
1 Introduction

In multibiometrics fusion [1], feature level fusion [2,3] makes use of integrated feature sets obtained from multiple biometric traits. Fusion at feature level [2,3] is found to be more useful than fusion at other levels, such as match score fusion [4], decision fusion
[4], rank level fusion [4]. Since feature set contains relevant and richer information about the captured biometric evidence, fusion at feature level is expected to provide more accurate authentication results. It is very hard to fuse multiple biometric evidences [2,3] at feature extraction level in practice because the feature sets are sometimes found to be incompatible. Apart from this reason, there are two more reasons to achieve fusion at feature extraction level such as the feature spaces are unknown for different biometric evidences and fusion of feature spaces may lead to the problem of curse of dimensionality problem [2]. Further, poor feature representation may cause to degrade the performance of recognition of users. Multimodal systems [4] acquire information from more than one source. Unibiometric identifiers [5] use single source biometric evidence and often are affected by problems like lack of invariant representation, non-universality, noisy sensor data and lack of individuality of the biometric trait and susceptibility to circumvention. These problems can be minimized by using multibiometric systems that consolidate evidences obtained from multiple biometric sources. Feature level fusion [2] of biometric traits is a challenging problem in multimodal fusion. However, good feature representation and efficient solution to curse of dimensionality problem can lead to feature level fusion with ease. Multibiometrics fusion [4] at match score level, decision level and rank level have extensively been studied and there exist a few feature level fusion approaches. However, to the best of the knowledge of authors, there is enough scope to design an efficient feature level fusion approach. The feature level fusion of face and palmprint biometrics proposed in [6] uses single sample of each trait. Discriminant features using graph-based approach and principal component analysis techniques are used to extract features from face and palmprint. Further, a distance separability weighting strategy is used to fuse two sets at feature extraction level. Another example of feature level fusion of face and hand biometrics has been proposed in [7]. It has been found that the performance of feature level fusion outperforms the match score fusion. In [8], a feature level fusion has been studied where phase congruency features are extracted from face and Gabor transformation is used to extract features from palmprint. These two feature spaces are then fused using user specific weighting scheme. A novel feature level fusion of face and palmprint biometrics has been presented in [9]. It makes use of correlation filter bank with class-dependence feature analysis method for feature fusion of these two modalities. A feature level fusion of face [10] and palmprint [11] biometrics using isomorphic graph [12] and K-medoids [13] is proposed in this paper. SIFT feature points [14] are extracted from face and palmprint images as part of feature extraction work. Using the partitioning around medoids (PAM) algorithm [15] which is considered as a realization of K-medoids clustering algorithm is used to partition the face and palmprint images of a set of n invariant feature points into k number of clusters. Then for each cluster, an isomorphic graph is drawn on SIFT points which belong to the clusters. 
Graphs are drawn on each partition or cluster by searching for the most probable isomorphic graphs with an iterative relaxation algorithm [16] among all possible isomorphic graphs, while the graphs are compared between the face and palmprint templates. Each pair of clustered graphs is then fused by concatenating the invariant SIFT points, and all pairs of isomorphic graphs of the clustered regions are further fused into a single concatenated feature vector. The same invariant feature
vector is also constructed from the query pair of face and palmprint samples. Finally, matching between these two feature vectors is performed by computing distances with the K-Nearest Neighbor [17] and normalized correlation [18] approaches. The IIT Kanpur multimodal database is used to evaluate the proposed feature level fusion technique. The paper is organized as follows. The next section discusses SIFT feature extraction from face and palmprint images. Section 3 presents the K-medoids partitioning of SIFT features into a number of clusters; the method of obtaining isomorphic graphs on the sets of SIFT points belonging to the clusters is also discussed there. Section 4 presents the feature level fusion of clustered SIFT points by pairing the two graphs drawn on a pair of clustered regions of the face and palmprint images. Experimental results are presented in Section 5, and conclusions are drawn in the last section.
2 SIFT Keypoints Extraction
To recognize and classify objects efficiently, feature points can be extracted from objects to build a robust feature descriptor or representation. David Lowe [14] introduced the Scale Invariant Feature Transform (SIFT) to extract such features from images. These features are invariant to scale, rotation, partial illumination and 3D projective transform, and they have been shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. SIFT features describe an object in a way that is not affected by occlusion, clutter and unwanted image noise. In addition, SIFT features are highly distinctive and achieve correct matching of feature point pairs with high probability between a large database and a test sample. Four major filtering stages of computation are used to generate the set of SIFT features [14]. In the proposed work, the face and palmprint images are first normalized by adaptive histogram equalization [2]. Face localization is done with the face detection algorithm proposed in [19], while palmprint localization uses the algorithm discussed in [20]. After geometric normalization and spatial enhancement, SIFT features [14] are extracted from the face and palmprint images. Each feature point is composed of four types of information: spatial location (x, y), scale (S), orientation (θ) and keypoint descriptor (K). In the experiments, only the keypoint descriptor [14] is used; it consists of a vector of 128 elements representing the neighborhood intensity changes around each keypoint. More formally, local image gradients are measured at the selected scale in the region around each keypoint and transformed into a 128-element vector per keypoint. These keypoint descriptor vectors capture local shape distortions and illumination changes. Figure 1 and Figure 2 show the SIFT features extracted from a face image and a palmprint image, respectively.
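To make this extraction step concrete, the following minimal Python sketch (not the authors' implementation) detects SIFT keypoints and their 128-element descriptors; it assumes an OpenCV build in which cv2.SIFT_create is available and uses CLAHE as a rough stand-in for the adaptive histogram equalization mentioned above. The file names are placeholders.

import cv2

def extract_sift_keypoints(image_path):
    """Load a grayscale biometric image, enhance it, and return SIFT keypoints/descriptors."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Adaptive histogram equalization (CLAHE) as a stand-in for the spatial enhancement step.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    img = clahe.apply(img)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(img, None)
    # Each descriptor is a 128-element vector; keypoints carry (x, y), scale and orientation.
    return keypoints, descriptors

# Example usage (paths are placeholders):
# face_kp, face_desc = extract_sift_keypoints("face.png")
# palm_kp, palm_desc = extract_sift_keypoints("palm.png")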
Fig. 1. Face Image and SIFT Keypoints Extraction
Fig. 2. Palm Image and SIFT Keypoints Extraction
3 Feature Partitioning and Isomorphic Graph Representation
In most multimodal biometric applications [4], a poor feature representation degrades performance. A good representation of the feature space and template in terms of invariant feature points therefore helps to obtain a robust and efficient solution for user authentication. Instead of considering the whole biometric template and all the SIFT keypoints, clustering the feature points into a number of clusters, each with a limited number of invariant points, can be an efficient feature space representation. A clustering approach [21] gathers the keypoints that are the most relevant members of a particular cluster, and the association of these keypoints represents the relations among the keypoints in a cluster. The proposed fusion approach partitions the SIFT keypoints [14] extracted from the face and palmprint images into a number of cluster regions, each with a limited number of keypoints, and an isomorphic graph [12] is then formed on each cluster from the keypoints of the partitioned face and palmprint images. Prior to constructing the isomorphic graphs on the clusters, corresponding pairs of clusters are established in
terms of the relation between keypoints and the geometric distance between keypoints, regarded as vertices and edges respectively, using auto-isomorphism [12] for the face and palmprint images. Three steps are followed to establish a correspondence between a pair of face and palmprint clusters after clustering of the keypoints. Since the number of keypoints on the face is larger than that on the palmprint, the face image is taken as reference with respect to the palmprint image. An auto-isomorphic graph is then built on each face cluster from its keypoints, and the corresponding isomorphic graph is built on a palm cluster while point correspondences are established using a point pattern matching approach [3]. A pair of clusters corresponding to a pair of face and palmprint images is then found by mapping the isomorphic graph of the face cluster to the isomorphic graph of the palmprint cluster. This process is carried out for all pairs of clusters of the face and palmprint images. Lastly, each pair of clusters with the same number of keypoints is fused by the sum rule approach [3]. Each keypoint descriptor is a vector of 128 elements, and each face and palm cluster is represented by an isomorphic graph; the isomorphic graphs of the face and palm clusters contain the same number of keypoints with a one-to-one mapping. The two feature vectors containing the SIFT keypoints are then fused using the sum rule.
3.1 SIFT Keypoints Partitioning Using the PAM Characterized K-Medoids Algorithm
A medoid can be defined as the object of a cluster whose average dissimilarity to all the objects in the cluster is minimal. K-medoids [13] chooses data points as cluster centers (called medoids). K-medoids clusters a dataset of n objects into k clusters and is more robust to noise and outliers than the K-means clustering algorithm. It is an adaptive version of K-means and partitions the dataset into groups that minimize the squared error between the points belonging to a cluster and the point designated as the cluster center. A well-known realization of the K-medoids approach is the Partitioning Around Medoids (PAM) algorithm [15], which is applied to the SIFT keypoints of the face and palmprint images to obtain a partition of the features into more discriminative and meaningful clusters of invariant features. The algorithm is given below, followed by a small code sketch.
Step 1: Randomly select k points from the SIFT point set as the medoids.
Step 2: Assign each SIFT feature point to the closest medoid according to a distance metric (e.g., Minkowski distance over the Euclidean space).
Step 3: For each medoid i, i = 1, 2, ..., k, and for each non-medoid SIFT point j, swap i and j and compute the total cost of the configuration.
Step 4: Select the configuration with the lowest cost.
Step 5: Repeat Step 2 to Step 4 until there is no change in the medoids.
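The code below is a minimal NumPy sketch of this PAM partitioning step, written from the description above rather than taken from the authors' implementation; the descriptor matrix and the number of clusters k are assumed inputs, and plain Euclidean distance is used for simplicity.

import numpy as np

def pam_kmedoids(points, k, max_iter=100, seed=0):
    """Partition SIFT descriptors (rows of `points`) into k clusters around medoids (PAM)."""
    rng = np.random.default_rng(seed)
    n = len(points)
    medoids = rng.choice(n, size=k, replace=False)                    # Step 1
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)  # pairwise distances

    def total_cost(meds):
        return dist[:, meds].min(axis=1).sum()                        # Step 2 cost

    cost = total_cost(medoids)
    for _ in range(max_iter):
        improved = False
        for mi in range(k):                       # Step 3: try swapping each medoid
            for j in range(n):                    # with each non-medoid point
                if j in medoids:
                    continue
                candidate = medoids.copy()
                candidate[mi] = j
                c = total_cost(candidate)
                if c < cost:                      # Step 4: keep the cheapest configuration
                    medoids, cost, improved = candidate, c, True
        if not improved:                          # Step 5: stop when no swap lowers the cost
            break
    labels = dist[:, medoids].argmin(axis=1)      # final assignment of points to medoids
    return medoids, labels

# e.g. medoids, labels = pam_kmedoids(face_descriptors, k=4)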
Improved version of PAM clustering using Silhouette approximations. The Silhouette technique [15] can be used to verify the quality of a cluster of data points. After applying PAM clustering [15] to the sets of SIFT keypoints of the face and palmprint images, each cluster can be verified with the Silhouette technique. For each keypoint i, let x(i) be the average distance of i to all the keypoints in its cluster cm, and let x(i+1) be the analogous average distance for the next keypoint (i+1). These two successive distances x(i) and x(i+1) measure how well the keypoints i and (i+1) match the cluster to which they are assigned. Then the average distances of i and (i+1) to the keypoints of another single cluster are computed, and this is repeated for every cluster of which i and (i+1) are not members. Let y(i) and y(i+1) be the lowest such average distances for i and (i+1), respectively (y(i+1) being the next lowest average distance after y(i)); the cluster attaining them is the neighboring cluster of the cluster to which i and (i+1) are assigned. The Silhouette value is defined by the following equation:
S(i) = [ (y(i) + y(i+1))/2 − (x(i) + x(i+1))/2 ] / max{ (x(i) + x(i+1))/2 , (y(i) + y(i+1))/2 }     (1)
From Equation (1) it follows that −1 ≤ S(i) ≤ 1. When x(i)+x(i+1) << y(i)+y(i+1), S(i) is very close to 1. The distances x(i) and x(i+1) measure the dissimilarity of i and (i+1) to their own cluster; if y(i)+y(i+1) is small, the match to the neighboring cluster is also close, whereas a large y(i)+y(i+1) indicates a bad match. A keypoint is well clustered when S(i) is close to 1; a negative S(i) means it rather belongs to another cluster, and S(i) = 0 means the keypoint lies on the border between two clusters. The existing Silhouette measure has been extended here by taking the additional average distances x(i+1) and y(i+1) for a pair of clusters, and it has been found that a better approximation arises when the PAM algorithm is used to partition the keypoint set. This improved approximation increases the precision of each cluster, since more relevant keypoints are retained instead of a restricted number of keypoints for fusion.
3.2 Establishing Correspondence between Clusters of Face and Palmprint Images
To establish a correspondence [10] between two clusters of the face and palmprint images, note that more than one keypoint on the face image may correspond to a single keypoint on the palmprint image. To eliminate false matches, only the minimum pair distance from the set of pair distances is used for the correspondence; first, the numbers of feature points available in the face cluster and in the palmprint cluster are compared. When the face cluster has fewer feature points than the palmprint cluster, many points of interest from the palmprint cluster must be discarded. If the face cluster has more points of interest than the palmprint cluster, a single interest point of the palmprint cluster may act as the match point for many points of interest of the face cluster. After computing all distances between the corresponding points of interest of the face and palmprint clusters, only the minimum pair distance is paired. After establishing the correspondence between the face and palmprint clusters, an isomorphic graph representation [12] is formed for each cluster while a few more keypoints are removed from the paired clusters. Further, an iterative relaxation
algorithm [16] is used to search for the best possible pair of isomorphic graphs among all possible graphs.
3.3 Isomorphic Graph Representations of Partitioned Clusters
To interpret each pair of face and palmprint clusters, an isomorphic graph representation is used. Each cluster contains a set of SIFT keypoints [14], and each keypoint is considered a vertex of the proposed isomorphic graph. A one-to-one mapping function maps the keypoints of the isomorphic graph constructed on a face cluster to those of the corresponding palmprint cluster, once the two clusters have been put in correspondence. When two isomorphic graphs are constructed on a pair of face and palmprint clusters with an equal number of keypoints, two feature vectors of keypoints are obtained for fusion.
[Figure: the Face Graph (FG) and the Palm Graph (PG), each with vertices V1–V6.]
Fig. 3. One-to-one Correspondence between Two Isomorphic Graphs
Let FG and PG be two graphs and let f be a mapping function from the vertex set of FG to the vertex set of PG. If
- f is one-to-one, and
- f(vk) is adjacent to f(wk) in PG if and only if vk is adjacent to wk in FG,
then f is called an isomorphism and the two graphs FG and PG are isomorphic. In other words, FG and PG are isomorphic if there is a one-to-one correspondence between the vertices of FG and those of PG such that whenever two vertices of FG are adjacent, so are their images in PG. Two isomorphic graphs are essentially the same graph, although the locations of the vertices may differ. Figure 3 shows an example of one-to-one correspondence between two isomorphic graphs, where each colored circle represents an independent vertex.
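As a small illustration (not part of the paper's algorithm), the following sketch checks whether a candidate one-to-one vertex mapping between two graphs, stored as edge lists, satisfies the adjacency-preservation condition above; the toy edge set below is invented for the example.

def is_isomorphism(mapping, edges_fg, edges_pg):
    """Return True if `mapping` (dict vertex->vertex) carries the edge set of FG exactly onto PG."""
    # f(v) adjacent to f(w) in PG  <=>  v adjacent to w in FG
    image_edges = {frozenset((mapping[v], mapping[w])) for v, w in edges_fg}
    pg_edges = {frozenset(e) for e in edges_pg}
    one_to_one = len(set(mapping.values())) == len(mapping)
    return one_to_one and image_edges == pg_edges

# Toy example with six vertices (edge set made up for illustration):
fg = [("V1", "V2"), ("V2", "V3"), ("V3", "V4"), ("V4", "V5"), ("V5", "V6")]
f = {f"V{i}": f"V{i}" for i in range(1, 7)}
print(is_isomorphism(f, fg, fg))   # True: the identity map is an isomorphism of FG onto itself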
4 Fusion of Keypoints and Matching
4.1 Fusion of Keypoints
To fuse the SIFT keypoint descriptors obtained from each isomorphic graph of the face and palmprint images, two fusion rules are applied serially: the sum rule [3] and the concatenation rule [2]. Let FG(vk) = (vk1, vk2, vk3, ..., vkn) and PG(wk) = (wk1, wk2, wk3, ..., wkn) be the two sets of keypoints obtained from the two isomorphic graphs of a pair of face and palmprint clusters, and suppose there are m clusters in each of the face and palmprint images. These cluster pairs are fused using the sum rule, and the concatenation rule is then applied to form an integrated feature vector. Let FG1, FG2, FG3, ..., FGm be the sets of keypoints obtained from a face image after clustering and isomorphism, and PG1, PG2, PG3, ..., PGm the sets of keypoints obtained from a palmprint image. The sum rule for the fusion of keypoints is defined as follows:
S_FP1 = FG1 + PG1 = {(v^1_k1 + w^1_k1), (v^1_k2 + w^1_k2), (v^1_k3 + w^1_k3), ..., (v^1_kn + w^1_kn)}
S_FP2 = FG2 + PG2 = {(v^2_k1 + w^2_k1), (v^2_k2 + w^2_k2), (v^2_k3 + w^2_k3), ..., (v^2_kn + w^2_kn)}
...
S_FPm = FGm + PGm = {(v^m_k1 + w^m_k1), (v^m_k2 + w^m_k2), (v^m_k3 + w^m_k3), ..., (v^m_kn + w^m_kn)}     (2)

In Equation (2), S_FPi (i = 1, 2, ..., m) denotes the fused set of keypoint descriptors for the i-th pair of isomorphic graphs obtained with the sum rule, and v_kj and w_kj (j = 1, 2, ..., n) denote a keypoint of the face graph and of the palm graph, respectively, where the superscript indexes the cluster pair. In the next step, the concatenation rule is applied to the fused sets of keypoints to form a single feature vector.
4.2 Matching Criterion and Verification
The K-Nearest Neighbor (K-NN) distance [17] and the correlation distance [18] are used to compute distances between concatenated feature sets. In the K-NN approach, the Euclidean distance metric is used to obtain the K best matches. Let di be the Euclidean distance of the concatenated feature set of subject Si, i = 1, 2, ..., K, belonging to the K best matches against a query subject. The subject St is verified against the query subject if dt ≤ Th, where dt is the minimum of d1, d2, ..., dK and Th is a pre-assigned threshold. The correlation distance metric, on the other hand, is used to compute the distance between a reference fused feature set and a probe fused feature set. The similarity between two concatenated feature vectors f1 and f2 is computed as follows:
d = ( Σ f1 f2 ) / ( Σ f1 Σ f2 )     (3)
Equation (3) denotes the normalized correlation between feature vectors f1 and f2. Let di be the similarity of the concatenated feature set of subject Si, i = 1, 2, … K, with respect to that of a query subject. Then the subject St is verified against the query subject if dt ≥ Th where dt is the maximum of d1, d2, ..., dK and Th is the pre-assigned threshold.
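The following Python sketch ties Sections 4.1 and 4.2 together under stated assumptions: the per-cluster keypoint sets are already paired arrays of equal size, the correlation follows Equation (3) as printed above, and the threshold values and function names are hypothetical, not the authors' code.

import numpy as np

def fuse_and_concatenate(face_clusters, palm_clusters):
    """Sum-rule fuse each matched cluster pair (Eq. 2), then concatenate into one feature vector."""
    fused = [fg + pg for fg, pg in zip(face_clusters, palm_clusters)]  # element-wise sum per cluster
    return np.concatenate([c.ravel() for c in fused])

def normalized_correlation(f1, f2):
    """Similarity of Eq. (3); higher means a closer match (assumes non-zero vectors)."""
    return np.sum(f1 * f2) / (np.sum(f1) * np.sum(f2))

def knn_verify(query, gallery, threshold, k=5):
    """Accept if the smallest of the K nearest Euclidean distances falls below the threshold."""
    dists = sorted(np.linalg.norm(query - g) for g in gallery)[:k]
    return dists[0] <= threshold

# Usage sketch: reference = fuse_and_concatenate(ref_face, ref_palm)
#               probe     = fuse_and_concatenate(probe_face, probe_palm)
#               accept    = normalized_correlation(reference, probe) >= some_threshold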
5 Experimental Results and Discussion
5.1 Database
The proposed feature level fusion approach is tested on the IIT Kanpur multimodal database, which contains face and palmprint images of 400 subjects. For the experiment, two face images and two palmprint images are taken per subject. Face images are acquired in a controlled environment with a maximum head tilt of 20 degrees; for evaluation, frontal-view faces with uniform lighting and minor changes in facial expression are used. The face images are acquired in two different sessions; of the two face images, one is used as the reference face and the other as the probe face. After preprocessing, each face image is cropped so that only the face portion is used for evaluation; the face portion is detected with the algorithm discussed in [19]. Palmprint images are also acquired in a controlled environment with a flatbed scanner at a spatial resolution of 200 dpi, with a rotation of at most ±35 degrees allowed for each user. In total, 800 palmprint images are collected from the 400 subjects, each subject contributing 2 palmprint images. The palmprint images are preprocessed with an image enhancement technique to achieve uniform spatial resolution, and the palm portion is then detected from the palmprint image using the technique proposed in [20].
5.2 Experimental Results
The performance of the proposed fusion approach is determined using a one-to-one matching strategy. Experimental results are obtained with two distance approaches, namely the K-Nearest Neighbor (K-NN) distance [17] and normalized correlation [18]. We determine not only the performance of the proposed feature level fusion approach but also that of the face and palmprint modalities independently. The fused feature set obtained from the reference face and palmprint images is matched with the feature set obtained from the probe pair of face and palmprint images by computing the distance between the two sets. Receiver Operating Characteristic (ROC) curves are determined for six distinct cases: (i) face modality using K-NN, (ii) face modality using normalized correlation, (iii) palmprint modality using K-NN, (iv) palmprint modality using normalized correlation, (v) feature fusion verification using K-NN and (vi) feature fusion verification using normalized correlation.
Table 1. Different error rates (FAR and FRR, in %) for the compared methods: Face Recognition (K-NN), Face Recognition (Correlation), Palmprint Verification (K-NN), Palmprint Verification (Correlation), Feature Level Fusion (K-NN) and Feature Level Fusion (Correlation).
The False Accept Rate (FAR), the False Reject Rate (FRR) and the recognition rate are determined from the 800 face and palmprint images of the 400 subjects. The feature level fusion method using normalized correlation outperforms the other proposed methods, including the individual matching of the face and palmprint modalities. The correlation-based feature level fusion obtains a 98.75% recognition rate at 0% FAR, while the K-NN-based feature level fusion method reaches a recognition rate of 97.5% at 2% FAR. It can also be noted that the FAR of each proposed method is lower than its corresponding FRR. The palmprint modality performs better than the face modality with both the K-NN and correlation metrics. The distance metrics play an important role irrespective of the use of invariant features and isomorphic graph representations; nevertheless, the robust representation of face and palmprint images by isomorphic graphs, together with the invariant SIFT keypoints and the PAM-characterized K-medoids algorithm, makes the proposed feature level fusion method efficient. The single modalities use the same approach that is applied in the feature level fusion method, so the error rates obtained from the
single modalities and the feature fusion method are determined under a uniform framework. The methodology used for feature level fusion is found not only to be superior to the other methods but also to show significant improvements in terms of recognition rate and FAR. Table 1 shows the different error rates for the proposed methods, while the Receiver Operating Characteristic (ROC) curves determined at different operating threshold points are given in Figure 4.
6 Conclusion
This paper has presented a feature level fusion scheme for face and palmprint traits using the invariant SIFT descriptor and an isomorphic graph representation. The performance of the feature level fusion has been verified with two distance metrics, namely the K-NN and normalized correlation metrics; the normalized correlation metric is found to be superior to the K-NN metric for all the verification methods proposed in this paper. The main aim of this paper is to present a robust representation of invariant SIFT features for face and palmprint images, which is useful not only for the individual verification of the face and palmprint modalities but also for the proposed feature level fusion approach. Since an isomorphic graph is used to represent the feature points extracted from the face and palmprint images, identical numbers of matched point pairs are used for fusion. In addition, the PAM-characterized K-medoids algorithm, used as a feature reduction technique, has proved useful for keeping the relevant feature keypoints. The single-modality palmprint method performs better than the face modality with both the K-NN and correlation approaches. The feature level fusion approach attains a 98.75% recognition rate at 0% FAR and can be used efficiently.
References 1. Ross, A., Jain, A.K., Qian, J.Z.: Information Fusion in Biometrics. In: Bigun, J., Smeraldi, F. (eds.) AVBPA 2001. LNCS, vol. 2091, pp. 354–359. Springer, Heidelberg (2001) 2. Kisku, D.R., Gupta, P., Sing, J.K.: Feature Level Fusion of Biometric Cues: Human Identification with Doddington’s Caricature. In: International Conference on Security Technology, Communications in Computer and Information Sciences, pp. 157–164. Springer, Heidelberg (2009) 3. Rattani, A., Kisku, D.R., Bicego, M., Tistarelli, M.: Feature Level Fusion of Face and Fingerprint Biometrics. In: 1st IEEE International Conference on Biometrics, Theory, Applications and Systems, pp. 1–6 (2007) 4. Ross, A., Nandakumar, K., Jain, A.K.: Handbook of Multibiometrics. Springer, Heidelberg (2006) 5. Jain, A.K., Ross, A., Prabhakar, S.: An Introduction to Biometric Recognition. IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on Image- and Video-Based Biometrics 14(1), 4–20 (2004) 6. Yao, Y.-F., Jing, X.-Y., Wong, H.-S.: Face and Palmprint Feature Level Fusion for Single Sample Biometrics Recognition. Neurocomputing 70(7), 1582–1586 (2007) 7. Ross, A., Govindarajan, R.: Feature Level Fusion using Hand and Face Biometrics. In: SPIE Conference on Biometric Technology for Human Identification II, pp. 196–204 (2005)
8. Fu, Y., Ma, Z., Qi, M., Li, J., Li, X., Lu, Y.: A Novel User-specific Face and Palmprint Feature Level Fusion. In: 2nd International Symposium on Intelligent Information Technology Application, pp. 296–300 (2008) 9. Yan, Y., Zhang, Y.-J.: Multimodal Biometrics Fusion using Correlation Filter Bank. In: International Conference on Pattern Recognition, pp. 1–4 (2008) 10. Kisku, D.R., Rattani, A., Grosso, E., Tistarelli, M.: Face Identification by SIFT-based Complete Graph Topology. In: 5th IEEE International Workshop on Automatic Identification Advanced Technologies, pp. 63–68 (2007) 11. Jain, A.K., Feng, J.: Latent Palmprint Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 31(6), 1032–1047 (2009) 12. Whitney, H.: Congruent Graphs and the Connectivity of Graphs. Am. J. Math. 54, 160–168 (1932) 13. Zhang, Q., Couloigner, I.: A New and Efficient K-Medoid Algorithm for Spatial Clustering. In: Gervasi, O., Gavrilova, M.L., Kumar, V., Laganà, A., Lee, H.P., Mun, Y., Taniar, D., Tan, C.J.K. (eds.) ICCSA 2005. LNCS, vol. 3482, pp. 181–189. Springer, Heidelberg (2005) 14. Lowe, D.G.: Object Recognition from Local Scale-invariant Features. In: International Conference on Computer Vision, pp. 1150–1157 (1999) 15. Theodoridis, S., Koutroumbas, K.: Pattern Recognition, vol. 855. Elsevier, Amsterdam (2006) 16. Horiuchi, T.: Colorization Algorithm using Probabilistic Relaxation. Image and Vision Computing 22(3), 197–202 (2004) 17. Dasarathy, B.V.: Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE Computer Society Press, Los Alamitos (1991) 18. Kumar, A., Wong, D.C.M., Shen, H.C., Jain, A.K.: Personal Verification using Palmprint and Hand Geometry Biometric. In: 4th International Conference on Audio- and Video-Based Biometric Authentication, pp. 668–675 (2003) 19. Kisku, D.R., Tistarelli, M., Sing, J.K., Gupta, P.: Face Recognition by Fusion of Local and Global Matching Scores using DS Theory: An Evaluation with Uni-classifier and Multi-classifier Paradigm. In: IEEE Computer Vision and Pattern Recognition Workshop on Biometrics, pp. 60–65 (2009) 20. Ribarić, S., Fratrić, I.: A Biometric Identification System based on Eigenpalm and Eigenfinger Features. IEEE Transactions on Pattern Analysis and Machine Intelligence 27(11), 1698–1709 (2005) 21. Dubes, R., Jain, A.K.: Clustering Techniques: The User's Dilemma. Pattern Recognition 8(4), 247–260 (1976)
Post-quantum Cryptography: Code-Based Signatures Pierre-Louis Cayrel and Mohammed Meziani CASED – Center for Advanced Security Research Darmstadt Mornewegstrasse, 64293 Darmstadt, Germany [email protected], [email protected]
Abstract. This survey provides a comparative overview of code-based signature schemes with respect to security and performance. Furthermore, we explicitly describe several code-based signature schemes with additional properties such as identity-based, threshold ring and blind signatures. Keywords: post-quantum cryptography, code-based cryptography, digital signatures.
1 Introduction
Secure digital signatures are essential components of IT-security solutions, and several schemes, such as the Digital Signature Algorithm (DSA) and the Elliptic Curve Digital Signature Algorithm (ECDSA), are already used in practice. The security of such schemes relies on the hardness of the discrete logarithm problem, either in the multiplicative group of a prime field or in a subgroup of points of an elliptic curve over a finite field. These computational assumptions, however, could be broken in a quantum setting by Shor's algorithm [39], proposed in 1997, which moreover succeeds in polynomial time. Therefore, new, quantum-attack-resistant signature schemes must be designed. Code-based cryptosystems are promising alternatives to classical public key cryptography, and they are believed to be secure against quantum attacks. Their security is based on the conjectured intractability of problems in coding theory, such as the syndrome decoding problem, which has been proven to be NP-complete by Berlekamp, McEliece, and van Tilborg [4]. In 1978, McEliece [28] first proposed an asymmetric cryptosystem based on coding theory, which derives its security from the general decoding problem. The general idea is to first select a particular (linear) code for which an efficient decoding algorithm is known, and then to use a trapdoor function to disguise the code as a general linear code. Though numerous computationally intensive attacks against the scheme appear in the literature [5,20], no efficient attack has been found to date. The McEliece encryption scheme is not invertible, and therefore it cannot be used for authentication or for signature schemes; this is indeed why very few
signature schemes based on coding theory have been proposed. This problem remained open until 2001, when Courtois et al. [15] showed how to achieve a code-based signature scheme whose security is based on the syndrome decoding problem. While this problem is NP-complete, constructions based on it are still inefficient for large numbers of errors. A few code-based signature schemes with additional properties, most of them based on the construction of [15], have recently been published. Lattice-based digital signature schemes for a post-quantum age are described in [8]; this paper describes code-based solutions.
Contribution and Organisation. After recalling some basic definitions and notations in Section 2, we discuss the various code-based signature schemes, starting with CFS, Stern, and KKS in Section 3. In Section 4, we describe all code-based signature schemes with additional properties, and we conclude in Section 5.
2 Coding Theory Background
This section recalls some basic definitions and then lists some instances of hard problems in coding theory.
Definition 1: (Linear Code). An (n, k)-code over F_q is a linear subspace C of the linear space F_q^n. Elements of F_q^n are called words, and elements of C are codewords. We call n the length and k the dimension of C.
Definition 2: (Hamming distance, weight). The Hamming distance d(x, y) between two words x, y is the number of positions in which x and y differ, that is, d(x, y) = |{i : xi ≠ yi}|, where x = (x1, ..., xn) and y = (y1, ..., yn). Here |S| denotes the number of elements, or cardinality, of a set S. In particular, d(x, 0) is called the Hamming weight of x, where 0 is the all-zero vector of length n. The minimum distance of a linear code C is the minimum Hamming distance between any two distinct codewords.
Definition 3: (Generator matrix). A generator matrix of an (n, k)-linear code C is a k × n matrix G whose rows form a basis of the vector subspace C. A code is called systematic if it can be characterized by a generator matrix of the form G = (I_{k×k} | A_{k×(n−k)}), where I_{k×k} is the k × k identity matrix and A is a k × (n−k) matrix.
Definition 4: (Parity-check matrix). A parity-check matrix of an (n, k)-linear code C is an (n−k) × n matrix H whose rows form a basis of the orthogonal complement of the vector subspace C, i.e. C = {c ∈ F_q^n : H c^T = 0}.
In what follows, we recall several NP-complete problems in coding theory. Note that NP-completeness ensures the impossibility of solving a problem in polynomial time in the worst case; in other words, the property ensures the existence of some hard instances, not the hardness of every instance.
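A small worked example of Definitions 1–4 in Python, using the binary (7,4) Hamming code (chosen here only for illustration); it checks that G·H^T = 0 and recovers the minimum distance as the minimum non-zero codeword weight.

import numpy as np
from itertools import product

# Generator matrix of the systematic (7,4) Hamming code: G = (I | A)
G = np.array([[1,0,0,0, 1,1,0],
              [0,1,0,0, 1,0,1],
              [0,0,1,0, 0,1,1],
              [0,0,0,1, 1,1,1]])
# Matching parity-check matrix: H = (A^T | I), so every codeword has zero syndrome
H = np.array([[1,1,0,1, 1,0,0],
              [1,0,1,1, 0,1,0],
              [0,1,1,1, 0,0,1]])

assert not ((G @ H.T) % 2).any()                          # rows of G lie in the code of H
codewords = [np.dot(m, G) % 2 for m in product([0, 1], repeat=4)]
d_min = min(int(c.sum()) for c in codewords if c.any())   # min distance = min non-zero weight
print(d_min)                                              # 3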
Definition 5: (Binary Syndrome Decoding (SD) problem)
– Input: An r × n matrix H over F_2, a target binary vector s ∈ F_2^r, and an integer t > 0.
– Question: Is there a binary word x ∈ F_2^n of weight ≤ t such that s = H x^T ?
This problem has been proved to be NP-complete by Berlekamp, McEliece, and van Tilborg [4]. In 1994, Barg [2] extended this result over F_q by proving that the following problem, called the q-ary Syndrome Decoding (q-SD) problem, is NP-complete.
Definition 6: (q-ary Syndrome Decoding (q-SD) problem)
– Input: An r × n matrix H over F_q, a target vector s ∈ F_q^r, and an integer t > 0.
– Question: Is there a word x ∈ F_q^n of weight ≤ t such that s = H x^T ?
To end this section, we state the Goppa Code Distinguishing (GD) problem, which has been proved NP-complete in [19].
Definition 7: (Goppa Code Distinguishing (GD) problem)
– Input: An (n−k) × n binary matrix H.
– Question: Is H a parity-check matrix of an (n, k)-Goppa code or of a random (n, k)-code?
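To make the SD problem concrete, the toy sketch below plants a low-weight error pattern and recovers a solution by brute force; it is exponential in n and only meant to illustrate the statement of the problem, not an attack.

import numpy as np
from itertools import combinations

def solve_sd(H, s, t):
    """Search for x of Hamming weight <= t with H x^T = s (mod 2); brute force, tiny n only."""
    r, n = H.shape
    for w in range(t + 1):
        for support in combinations(range(n), w):
            x = np.zeros(n, dtype=int)
            x[list(support)] = 1
            if np.array_equal((H @ x) % 2, s % 2):
                return x
    return None

rng = np.random.default_rng(1)
H = rng.integers(0, 2, size=(5, 12))
e = np.zeros(12, dtype=int); e[[2, 7]] = 1       # plant an error pattern of weight 2
s = (H @ e) % 2
print(solve_sd(H, s, t=2))                       # some x of weight <= 2 with the same syndrome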
3 Code-Based Signature Schemes
During the last twenty years several (linear-)code-based signature schemes have been proposed; the first attempts were due to Xinmei Wang [43], followed by Harn and Wang [24] and Alabbadi and Wicker [1]. Unfortunately, the security of these constructions cannot be reduced to the hardness of the problems above, and the schemes were proved insecure [43,24]. Several signature schemes based on these problems were subsequently designed; we outline them below.
3.1 Courtois et al.'s Scheme
Unlike with RSA, one of the major obstacles to the widespread use of the McEliece or Niederreiter cryptosystems for signatures is that their encryption maps cannot be inverted on arbitrary inputs: a random word y ∈ F_2^n is not necessarily decodable, that is, the Hamming distance between y and every codeword may be greater than the error-correcting capability of the code. This is due to the fact that the number of decodable words is very small. To fix this problem, Courtois, Finiasz, and Sendrier [15] (CFS) suggested a method, named complete decoding, which increases the correction capability in order to find, with high probability, the nearest codeword to a given word.
Post-quantum Cryptography: Code-Based Signatures
85
The CFS signature scheme uses Goppa codes, which are subfield subcodes of particular alternant codes [27]. For given integers m and t, binary Goppa codes have length n = 2^m, dimension k = n − mt, and are t-error-correcting. The basic idea of the CFS signature scheme is to find parameters n, k, and t such that the Niederreiter scheme described in Algorithm 1 is practically invertible.
Algorithm 1. The Niederreiter PKC
Key Generation:
- Choose an (n, k)-code C over F_q having a decoding algorithm γ.
- Construct an (n−k) × n parity-check matrix H̃ of C.
- Choose randomly an (n−k) × (n−k) invertible matrix Q over F_q.
- Choose randomly an n × n permutation matrix P over F_q.
- The public key: H = Q H̃ P
- The private key: (P, H̃, Q, γ)
Encryption: To encrypt a message x ∈ F_q^n of weight t:
- Compute y = H x^T.
Decryption: To decrypt a ciphertext y ∈ F_q^{n−k} such that y = H x^T:
- Compute Q^{−1} y (= H̃ P x^T)
- Find P x^T from Q^{−1} y by applying γ
- Find x by applying P^{−1} to P x^T.
A CFS signature on a message M (see Algorithm 2) is generated by hashing M to a syndrome and then trying to decode it. However, for a t-error-correcting Goppa code of length n = 2^m, only about 1/t! of the syndromes are decodable. Thus, a counter is appended to M, and the signer updates the counter until the hash value is decodable. The signature consists of both the weight-t error pattern corresponding to the syndrome and the counter value.
Algorithm 2. The CFS signature
Key Generation:
- Pick a random parity-check matrix H̃ of a binary (n, k), t-error-correcting Goppa code with decoding algorithm γ.
- Construct the binary matrices Q, H and P as in Algorithm 1.
Signature: To sign a message M:
(1) i ← i + 1
(2) x ← γ(Q^{−1} h(h(M)|i))
(3) If no x was found, go to (1).
- Output (i, xP)
Verification:
- Compute s1 = H (xP)^T and s2 = h(h(M)|i).
- The signature is valid if s1 and s2 are equal.
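The sketch below mimics the counter-and-hash structure of Algorithm 2 with toy parameters: a brute-force search over weight-t patterns stands in for the Goppa decoder γ, and SHA-256 stands in for h. It only illustrates the signing loop and is neither a secure nor an efficient CFS implementation.

import hashlib
import numpy as np
from itertools import combinations

def hash_to_syndrome(msg, counter, r):
    """h(h(M)|i) mapped to an r-bit syndrome (toy stand-in for the scheme's hash)."""
    digest = hashlib.sha256(hashlib.sha256(msg).digest() + counter.to_bytes(8, "big")).digest()
    return np.unpackbits(np.frombuffer(digest, dtype=np.uint8))[:r].astype(int)

def try_decode(H, s, t):
    """Brute-force search for a weight-t pattern with syndrome s (plays the role of gamma)."""
    r, n = H.shape
    for support in combinations(range(n), t):
        x = np.zeros(n, dtype=int); x[list(support)] = 1
        if np.array_equal((H @ x) % 2, s):
            return x
    return None

def cfs_like_sign(H, msg, t):
    """Increment the counter until the hashed syndrome is decodable, as in Algorithm 2."""
    i = 0
    while True:
        i += 1
        x = try_decode(H, hash_to_syndrome(msg, i, H.shape[0]), t)
        if x is not None:
            return i, x    # signature: counter + weight-t error pattern

# Toy run: H = np.random.default_rng(0).integers(0, 2, (8, 16)); print(cfs_like_sign(H, b"hello", 3))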
Security. The authors of [20] show an attack against the CFS scheme due to Daniel Bleichenbacher. This attack is based on an ’unbalanced’ Generalized Birthday Attack. Therefore, the values of m and t used by CFS have been
changed. For a security level of more than 2^80 binary operations, [20] proposed the new parameters m = 21 and t = 10; m = 19 and t = 11; or m = 15 and t = 12. Furthermore, the authors of the modified CFS scheme (mCFS) [16] give a security proof in the random oracle model, where the counter is chosen at random in {1, ..., 2^{n−k}}.
Stern’s Identification Scheme
In 1993, Stern [41] presented a 3-pass zero-knowledge protocol which is closely related to the Niederreiter cryptosystem. This protocol enables a prover P to identify himself to a verifier V. Its principle is as follows. Let H be an (n−k) × n binary matrix common to all users, where n and k are integers with k ≤ n. Each prover P has an n-bit secret key s of weight t and an (n−k)-bit public identifier y satisfying y = H s^T. When P needs to authenticate to V as the owner of y, P and V run Algorithm 3. It was shown in [41] that the probability that an adversary successfully impersonates an honest prover is 2/3 per round.
Algorithm 3. Stern's Scheme
Key Generation: Given a random binary (n, k)-code with parity-check matrix H and a secure hash function h:
- Private key: s ∈ F_2^n such that w(s) = t
- Public key: y ∈ F_2^{n−k} such that H s^T = y
Commitments:
- P chooses randomly u from F_2^n and a permutation σ of {1, ..., n}
- P computes the commitments c1, c2 and c3 as follows: c1 = h(σ, H u^T), c2 = h(σ(u)), c3 = h(σ(u ⊕ s))
- P sends c1, c2 and c3 to V
Challenge: V randomly chooses b ∈ {0, 1, 2} and sends it to P
Response:
- If b = 0: P sends u and σ to V
- If b = 1: P sends u ⊕ s and σ to V
- If b = 2: P sends σ(u) and σ(s) to V
Verification:
- If b = 0: V checks that c1 and c2 were honestly computed
- If b = 1: V checks that c1 and c3 were honestly computed
- If b = 2: V checks that c2 and c3 were honestly computed and that w(σ(s)) = t
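One round of Algorithm 3 can be exercised with the toy Python sketch below (illustrative only: real instantiations use much larger parameters and randomized commitments). SHA-256 plays the role of h, and the honest prover passes every challenge.

import hashlib
import numpy as np

def commit(*parts):
    """SHA-256 commitment over integer arrays (toy; a real scheme also adds commitment randomness)."""
    h = hashlib.sha256()
    for p in parts:
        h.update(str(np.asarray(p).tolist()).encode())
    return h.hexdigest()

def stern_round(Hm, s, y, t, rng):
    """One round of Algorithm 3; returns True when the honest prover convinces the verifier."""
    n = Hm.shape[1]
    u = rng.integers(0, 2, n)                          # random word u
    sigma = rng.permutation(n)                         # random permutation sigma
    c1 = commit(sigma, (Hm @ u) % 2)                   # c1 = h(sigma, H u^T)
    c2 = commit(u[sigma])                              # c2 = h(sigma(u))
    c3 = commit((u ^ s)[sigma])                        # c3 = h(sigma(u + s))
    b = rng.integers(0, 3)                             # challenge b in {0,1,2}
    if b == 0:                                         # reveal u, sigma: check c1 and c2
        return c1 == commit(sigma, (Hm @ u) % 2) and c2 == commit(u[sigma])
    if b == 1:                                         # reveal u+s, sigma: check c1 and c3
        v = u ^ s                                      # note H u^T = H v^T + y
        return c1 == commit(sigma, ((Hm @ v) + y) % 2) and c3 == commit(v[sigma])
    su, ss = u[sigma], s[sigma]                        # reveal sigma(u), sigma(s): check c2, c3, weight
    return c2 == commit(su) and c3 == commit(su ^ ss) and ss.sum() == t

rng = np.random.default_rng(0)
n, k, t = 16, 8, 3
Hm = rng.integers(0, 2, (n - k, n))
s = np.zeros(n, dtype=int); s[rng.choice(n, t, replace=False)] = 1   # secret of weight t
y = (Hm @ s) % 2                                                     # public syndrome
print(all(stern_round(Hm, s, y, t, rng) for _ in range(20)))         # honest prover always passes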
In 1995, Véron [42] proposed a dual version of Stern’s scheme, which, unlike other schemes based on the SD problem, uses a generator matrix of a random binary linear code. This allows, among other things, for an improved transmission rate. It is possible to convert Stern’s construction into a signature algorithm using the Fiat-Shamir method [18]: the verifier-queries are replaced by values suitably derived from the commitments and the message to be signed. In this case, however, the signature is large, of roughly 120 Kbits. A variation of the Stern construction using double circulant codes is proposed in [21]. The circulant structure of the public parity-check matrix allows for an easy generation of the whole binary matrix with very little memory storage. They
propose a scheme with a public key of 347 bits and a private key of 694 bits. We can also imagine a construction based on quasi-dyadic codes as proposed in [30]. A secure implementation [11] of Stern's scheme uses quasi-circulant codes; this scheme also inherits Stern's natural resistance to leakage attacks such as SPA and DPA.
3.3 Kabatianskii et al.'s Scheme
Kabatianskii, Krouk, and Smeets (KKS) [25] proposed a signature scheme based on arbitrary linear error-correcting codes. They actually proposed three versions (using different linear codes), presented in the sequel, which all have one point in common: the signature is a codeword of a linear code. We give a full description of the KKS scheme, illustrated in Algorithm 4. First consider a code C defined by a random parity-check matrix H, and let d be a good estimate of its minimum distance. Next, consider a linear code U of length n' ≤ n and dimension k defined by a generator matrix G = [g_{i,j}]. We suppose that there exist integers t1 and t2 such that t1 ≤ w(u) ≤ t2 for any non-zero codeword u ∈ U. Let J be a subset of {1, ..., n} of cardinality n', let H(J) be the submatrix of H consisting of the columns h_i with i ∈ J, and define the r × k matrix F := H(J) G^T. Define a k × n matrix G* = [g*_{i,j}] with g*_{i,j} = g_{i,j} if j ∈ J and g*_{i,j} = 0 otherwise. The KKS signature of any m ∈ F_q^k is σ = m G*. The main difference with the Niederreiter signature occurs in the verification step, where the receiver checks that t1 ≤ w(σ) ≤ t2 and F m^T = H σ^T.
Algorithm 4. The KKS Signature
Key Generation:
- Pick a random (n, n−r) code C, then choose secretly and randomly:
  (1) a generator matrix G of an (n', k) code U with n' < n such that t1 ≤ w(v) ≤ t2 for all v ∈ U, v ≠ 0
  (2) a subset J of {1, ..., n} of cardinality n'
- Form the submatrix H(J) consisting of the columns h_i, i ∈ J, of a parity-check matrix H of C
- Define the matrix F as F = H(J) G^T
- Private key: (J, G)
- Public key: (F, H, t1, t2)
Signature: To sign a message m:
(1) Compute σ* = m · G
(2) Produce σ by placing the entries of σ* at the positions indexed by J and setting σ_j = 0 for j ∉ J
Verification: Given (σ, m), test whether the following hold:
(1) H σ^T = F m^T
(2) t1 ≤ w(σ) ≤ t2
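The following toy sketch reproduces the algebra of Algorithm 4 over F_2 with tiny, insecure parameters: it builds F = H(J) G^T, embeds m·G on the secret support J, and checks the two verification conditions. The weight bounds t1, t2 here are placeholders, not parameters from the paper.

import numpy as np
rng = np.random.default_rng(2)

n, r, n_prime, k = 24, 10, 12, 4                     # toy sizes (far too small to be secure)
H = rng.integers(0, 2, (r, n))                       # parity-check matrix of the public code C
G = rng.integers(0, 2, (k, n_prime))                 # secret generator matrix of the hidden code U
J = np.sort(rng.choice(n, n_prime, replace=False))   # secret support positions
F = (H[:, J] @ G.T) % 2                              # public matrix F = H(J) G^T

def kks_sign(m):
    """Embed the codeword m*G into an n-bit word supported on J (the KKS signature)."""
    sigma = np.zeros(n, dtype=int)
    sigma[J] = (m @ G) % 2
    return sigma

def kks_verify(m, sigma, t1=1, t2=n_prime):
    weight_ok = t1 <= sigma.sum() <= t2              # placeholder weight bounds
    return weight_ok and np.array_equal((H @ sigma) % 2, (F @ m) % 2)

m = rng.integers(0, 2, k)
print(kks_verify(m, kks_sign(m)))                    # True for honestly generated signatures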
Security. The authors of [25] proposed four KKS signature schemes, KKS-1, KKS-2, KKS-3 and KKS-4, which are claimed to be as secure as the Niederreiter scheme if the public parameters do not leak any information. Unfortunately, the authors of [12] showed that a generated KKS signature discloses a lot of information about the secret set J, so that an adversary can find the secret matrix G with very high probability. Indeed, an attacker needs about 2^77 binary operations and at most 20 signatures to break the original KKS-3 scheme. For this reason, the authors of [12] suggest the following new parameters for a security of 40 signatures: n = 2000, k = 160, n' = 1000, r = 1100, t1 = 90 and t2 = 110.
4 Code-Based Signature Schemes with Additional Properties
To date, there exist only a few code-based signature schemes with special properties, namely blind signatures, (threshold) ring signatures, and identity-based signature schemes. By comparison, classical cryptography offers more than sixty classes of signature schemes, some with special properties such as group or proxy signatures; this variety reflects the wide range of application scenarios. In recent years, existing signature schemes have been combined with specific protocols in order to obtain enhanced code-based constructions with additional features, such as anonymity, on top of the properties of the underlying basic scheme, e.g. authentication and non-repudiation. In what follows, we give the state of the art of such signature schemes.
4.1 Ring Signatures
The concept of ring signatures was first introduced in 2001 by Rivest, Shamir, and Tauman [34]. Such signature schemes allow the signer of a document to remain anonymous within a group of users, called a ring. As opposed to group signatures, no group manager, group setup procedure, cooperation, or revocation mechanism is needed: the signer specifies an arbitrary ring and then signs on its behalf without permission or assistance from the other users. To generate a valid signature, a user needs his private key and the other members' public keys.
Zheng et al.'s scheme. In [45], Zheng, Li, and Chen (ZLC) proposed the first code-based ring signature, which extends the CFS signature scheme and is based on the syndrome decoding problem. To describe the ZLC signature, we use the following notation. Let N and l be the number of potential signers and the number of signers participating in the signature generation, respectively. Denote by Si and Sr a potential signer and the ring signer, respectively. Let M be a message and h a hash function with range F_2^{n−k}. Write the concatenation of s1 and s2 as (s1|s2), and let u ←R U indicate that u is randomly selected from a set U. The ring signer and all other potential signers run Algorithm 5 to generate a ring signature on M.
Algorithm 5. The ZLC Ring Signature
Key Generation: Each potential signer Si generates a private/public key pair as in the CFS algorithm (Algorithm 2):
- Public key: Hi = Qi H̃i Pi
- Private key: (Pi, H̃i, Qi, γi)
Signature: To sign a message M:
(1) Initialization: for j = 0, 1, 2, ...
  - x̄j ←R {0, 1}^(n−k)
  - Set xr+1,j = h(N | h(M) | x̄j)
(2) Generating ring sequences: for j = 0, 1, 2, ... and each ring member i:
  - zi,j ←R {0, 1}^n such that w(zi,j) = t
  - Set xi+1,j = h(N | h(M) | Hi · zi,j^T ⊕ xi,j)
(3) Find a j0 such that xr,j0 ⊕ x̄j0 is decodable.
(4) Apply the decoding algorithm to obtain zr,j0 such that Hr · zr,j0^T = xr,j0 ⊕ x̄j0.
(5) Compute the index Izi,j0 corresponding to each zi,j0.
(6) The ring signature is (x0,j0, Iz1,j0, ..., Izl−1,j0).
Verification: Given (x0,j0, Iz1,j0, ..., Izl−1,j0):
(1) Derive zi,j0 from Izi,j0 for each i ∈ {0, 1, ..., l−1}
(2) Compute xi+1,j0 = h(N | h(M) | Hi · zi,j0^T ⊕ xi,j0) for i ∈ {0, 1, ..., l−1}
(3) Accept if xl,j0 = x0,j0 and reject otherwise.
Security and Efficiency. The ZLC scheme is based on CFS signatures, whose security relies on two assumptions: it is hard to solve an instance of the SD problem, and it is hard to distinguish a Goppa code from a random one (the GD problem). The authors of [45] also showed that the ZLC construction provides unforgeability and anonymity: the probability of forging a signature is 1/2^n, and an adversary outside the ring cannot guess the signer's identity because of the uniform distribution of the xi,j0. The scheme is as efficient as the CFS signature: verification takes t·l column operations (one column operation is one access to a table plus one operation such as a comparison or an addition) and l + 1 hash computations, and the total signature length is close to (n − k) + l·log2(C(n, t)) bits, where C(n, t) denotes the binomial coefficient and log2(C(n, t)) is the number of bits required to address a word of length n and weight t. For instance, for m = 16 and t = 9, the signature length is about 144 + 126·l bits; the small check below confirms these two figures.
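A quick arithmetic check of the quoted figures (a sketch assuming Python 3.8+ for math.comb):

from math import comb, log2, ceil

m, t = 16, 9
n = 2 ** m
overhead = m * t                      # n - k = mt bits for the syndrome part
index_bits = ceil(log2(comb(n, t)))   # bits needed to address one weight-t word of length n
print(overhead, index_bits)           # 144 126, matching the quoted 144 + 126*l bits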
4.2 Threshold Ring Signatures
Since its introduction in 2001, a lot of effort has gone into modifying and extending the ring signature scheme [34]. One such extension is the BSS threshold ring signature scheme, first proposed by Bresson, Stern and Szydlo [7] in 2002. In threshold ring signature schemes, the secret signing key is distributed amongst N members, and at least l of these members are required to generate a valid signature. More precisely, in an (l, N) threshold signature scheme, any set of l members can generate an l-out-of-N signature on behalf of the whole group without revealing their identities. This type of construction decreases the cost of signing, as it does not require the participation of all N members.
Several threshold ring signatures have followed [7]. For example, Wong et al. [44] proposed the tandem construction, a threshold signature scheme using a secure multiparty trapdoor transformation. The threshold ring signature in [26] uses both RSA- and DL-based public keys at the same time and introduces the notion of separability: all signers can select their own keys independently, with distinguishable parameter domains. These signatures, however, and many others, are factoring-, ECC- or pairing-based; only two code-based proposals are known to date. In the following, we outline these proposals.
Aguilar et al.'s scheme (ACG). The first non-generic code-based threshold ring signature scheme was introduced in [29]; it generalizes Stern's identification protocol into a threshold ring signature scheme using the Fiat-Shamir paradigm [18]. Algorithm 6 explains how Aguilar et al.'s construction works. We denote by N the number of signers (provers) in the ring and by l (l ≤ N) the number of actual signers; a leader SL amongst them gives the ring members their public keys.
Security and Efficiency. Aguilar et al.'s identification scheme is a zero-knowledge protocol with a cheating probability of 2/3, as in Stern's scheme. Its security relies on the hardness of the SD problem: finding a vector s ∈ F_2^{nN} of weight t·l and null syndrome with respect to H such that each of the N blocks of length n has weight t or 0. The signing complexity and signature length are N times those of Stern's signature scheme: a complexity of about 140·n²·N, independently of l, and a length of about 20 kB × N. In order to reduce the public key size, [29] suggested the use of double-circulant matrices, requiring nN/2 rather than n²N/2 storage bits. For double-circulant matrices, [21] proposes the parameters n = 347 and t = 76 for an 83-bit security level, rather than n = 634, t = 69 and rate 1/2 as in an 80-bit secure Stern scheme.
Dallot et al.'s scheme. A second code-based threshold ring signature has been proposed by Dallot and Vergnaud (DV) in [17], combining the generic construction of Bresson et al. [7] with the CFS signature scheme. The DV construction requires an (n, k), t-error-correcting binary Goppa code with n = 2^m and k = n − mt, where m is a positive integer. We denote by N and l the number of ring users and the number of signers, respectively. Let h be a public collision-resistant hash function with range {0, 1}^{mt}, let f(·) be a trapdoor one-way function from {0, 1}^a to {0, 1}^{mt}, and let (Ek,i) be a family of random permutations that encrypt b-bit messages with a0-bit keys and an additional parameter i ∈ [1, N]. We again denote concatenation by (s|s') and random selection by x ←R S. For simplicity, we index the signers as 1, ..., l. In addition, each ring member i is associated with a secret/public key pair as in the CFS construction, i.e. the public key is Hi = Qi H̃i Pi and the secret key is (Pi, H̃i, Qi, γi). Dallot et al.'s procedure is presented in Algorithm 7.
Algorithm 6. The ACG Identification Scheme
Key Generation: Each potential signer Si has:
- Public key: the (n−k) × n binary matrix Hi
- Private key: an n-bit word si of weight t such that Hi si^T = 0
- The ring public key is the (n−k)N × nN block-diagonal binary matrix H = diag(H1, H2, ..., HN).
Commitment:
- Each prover Si (among the l signers) chooses randomly zi ∈ F_2^n and a permutation σi of {1, ..., n}.
- Each prover Si sends to SL three commitments c1,i, c2,i and c3,i given by: c1,i = h(σi, Hi zi^T), c2,i = h(σi(zi)), c3,i = h(σi(zi ⊕ si)).
- SL generates the N − l missing commitments for the N − l non-signers by fixing all remaining si to 0.
- SL chooses randomly a constant n-block permutation Π on N blocks.
- SL computes the master commitments C1, C2 and C3 from the c1,i, c2,i and c3,i: C1 = h(Π(c1,1, ..., c1,N)), C2 = h(Π(c2,1, ..., c2,N)), C3 = h(Π(c3,1, ..., c3,N)).
- SL sends C1, C2 and C3 to the verifier V.
Challenge: V sends a challenge b ∈ {0, 1, 2} to SL, which forwards it to the l signers.
Response:
- Perform the challenge step of Stern's protocol between each prover Si and SL.
- SL simulates the missing N − l instances of Stern's protocol with si = 0 for all l+1 ≤ i ≤ N.
- SL gathers all answers to create the global response for V as follows:
  * If b = 0: SL sets z = (z1, ..., zN) and Ω = Π ◦ (σ1, ..., σN), and reveals z and Ω.
  * If b = 1: SL constructs x = (z1 ⊕ s1, ..., zN ⊕ sN) and reveals x and Ω.
  * If b = 2: SL constructs Π(z1, ..., zN) and reveals it together with Ω(s1, ..., sN).
Verification:
- If b = 0: V checks that Ω is an n-block permutation and that C1 and C2 were honestly computed.
- If b = 1: V checks that Ω is an n-block permutation and that C1 and C3 were honestly computed.
- If b = 2: V checks that C2 and C3 were honestly computed, that w(Ω(s)) = l·t, and that each length-n block of Ω(s) has weight t or 0.
Algorithm 7. The DV threshold ring signature scheme
Key Generation: Each signer in the ring:
- chooses an (n, k)-code Ci over F_2 with a decoding algorithm γi correcting up to t errors;
- constructs an (n−k) × n parity-check matrix H̃i of Ci;
- chooses randomly an (n−k) × (n−k) invertible matrix Qi over F_2;
- chooses randomly an n × n permutation matrix Pi over F_2.
- Public key: Hi = Qi H̃i Pi
- Private key: (Pi, H̃i, Qi, γi)
Signature: To generate a signature on a message M:
- Compute the symmetric key for E: k = h(M).
- Compute the value at the origin: v0 = h(H1, ..., HN).
- Choose random seeds: for each i = l+1, ..., N do
  (1) xi ←R {x ∈ F_2^n : w(x) ≤ t}
  (2) ri ←R {1, ..., 2^{tm}}
  (3) yi ← Hi xi^T + h(M|ri)
- Compute a sharing polynomial: find a polynomial f over F_{2^{tm}} such that
  - deg(f) = N − l
  - f(0) = v0
  - f(i) = Ek,i(yi) for all l+1 ≤ i ≤ N
- For each i = 1, ..., l do
  - xi ← ∅
  - while xi = ∅ do
    (1) ri ←R {1, ..., 2^{tm}}
    (2) zi ← γi(Qi^{−1} · (Ek,i(f(i)) + h(M|ri)))
    (3) if zi ≠ ∅ then xi ← zi Pi^{−1}
- The signature: σ = (N, x1, ..., xN, r1, ..., rN, f)
Verification: Given (N, x1, ..., xN, r1, ..., rN, f), any user can verify the signature by:
- recovering the symmetric key k = h(M);
- recovering the yi: yi = Hi xi^T + h(M|ri);
- checking the equations:
  (1) f(0) = h(H1, ..., HN)
  (2) f(i) = Ek,i(yi) for all 1 ≤ i ≤ N.
Security and Efficiency. The DV construction is a provably secure threshold ring signature satisfying three properties: consistency, anonymity, and unforgeability [17]. Unforgeability is proved under two coding theory assumptions. The first is the well-known NP-complete [4] Goppa Bounded Distance Decoding problem (GBDP), a variant of the SD problem with the constraint that the number of errors is at most (n − k)/log2(n), as in the mCFS signature scheme. The second is the Goppa Code Distinguishing (GD) problem: distinguishing a randomly sampled Goppa code from a random linear code with the same parameters, a problem widely considered to be difficult [36]. The complete security proof of the DV scheme is given in [17]. For a ring with N members, the set of public keys (Hi) is stored in n(n−k)N bits. To produce a valid signature, the signer has to perform the following
calculations: computing N − l syndromes, N polynomial evaluations that can be performed in 2N(N−l) binary operations using Horner's rule, and l·t! decodings of Goppa codes, each consisting of computing a syndrome (about t²m²/2 binary operations), computing a locator polynomial (6t²m binary operations) and computing its roots (t²m² binary operations). Thus, the total cost of generating a signature is about (N−l)·t²m²/2 + 2N(N−l) + l·t!·t²m²·(3/2 + 6/m) binary operations. Signature verification requires (N+1) polynomial evaluations and N syndrome computations, resulting in 2(N+1)(N−l) + N·t²m²/2 binary operations. The signature consists of: the number N of ring users, stored in log2(N) bits; N random vectors xi of weight up to t, each of which can be indexed with a counter of log2(Σ_{i=1..t} C(2^m, i)) bits; N random vectors ri in {0, ..., 2^{mt} − 1}, requiring at most mt bits each; and a polynomial of degree N − l, which needs (N−l+1)·mt bits. The signature size is thus about N·log2(Σ_{i=1..t} C(2^m, i)) + 2Nmt + log2(N) − (l−1)·mt bits.
4.3 Blind Signatures
Blind signatures were first introduced by Chaum [13] for applications such as e-voting or electronic payment systems, which require anonymity. The main goals of Chaum's scheme are blindness (the signed message is disguised, or blinded, before signing) and untraceability (the signer cannot trace the signed message after the sender has revealed the signature publicly). Several blind signature schemes followed Chaum's proposal. In 1988, the authors of [14] presented a new signature scheme for electronic payment systems; later, the authors of [40] introduced fair blind signature schemes. In 1992, another blind signature scheme, based on factoring and on discrete-logarithm-based identification schemes, was developed [31]. Provably secure blind signature schemes based on Schnorr's [35] and Guillou-Quisquater's [23] protocols were presented in [33]. As far as we know, there exists only a single code-based blind signature scheme, namely Overbeck's construction [32].
Overbeck's scheme. The general idea behind Overbeck's protocol is, instead of blinding the message, to use permuted kernels in order to derive a blinded public key from the signer's public key. A blind signature is thus generated by the owner of a valid secret key with respect to the blinded public key. During verification, the blinder gives a static zero-knowledge proof showing that the private and public keys are paired. This proof is based on the Permuted Kernels Problem (PKP), which can be formulated as follows: given a random (n, k) code and a random permuted subcode of dimension L < k, find the permutation. This problem is known to be NP-hard in the general case [38]. For simplicity, we denote a code by its generator matrix. Let h be a hash function, r a random seed, and w a positive integer. Denote by PKP-Proof(A, B) the static PKP proof that code A is an isometric subcode of code B, with dim(A) ≤ dim(B), where dim(C) stands for the dimension of the code C. A slightly modified version of Overbeck's blind signature scheme is depicted in Algorithm 8.
Algorithm 8. Overbeck's blind signature
Key Generation:
- choose an (n, k)-code C over $\mathbb{F}_2$ having a decoding algorithm γ correcting up to t errors.
- construct an $(n-k) \times n$ parity-check matrix $\tilde H$ of C.
- choose randomly an $(n-k) \times (n-k)$ invertible matrix Q over $\mathbb{F}_q$.
- choose randomly an $n \times n$ permutation matrix P over $\mathbb{F}_q$.
- The public key: $H = Q\tilde H P$
- The private key: $(P, \tilde H, Q, \gamma)$
Blinding: The user has to:
- generate a random $p \times n$ public matrix $R_0$ over $\mathbb{F}_q$ from the seed r
- generate a random $L \times p$ matrix K of full rank over $\mathbb{F}_q$ and set $R = KR_0$
- generate an $n \times n$ permutation matrix Π
- create the blind generator matrix $G_b = [\,G;\,R\,]\,\Pi$ (G stacked on top of R, columns permuted by Π)
- derive from $G_b$ the blind parity-check matrix $H_b$
- solve $H_b x^T = h(M|H_b)$ for x
- output $s = H(x\Pi^{-1})^T$ (the blind syndrome) and $u = (r, \Pi, H_b)$ (the unblinding information)
Unblinding: Given M, s and a correct signature σ for H:
- Check that $w(\sigma) = t$ and $H\sigma^T = s$; if not, output failure.
- Generate a PKP-Proof($H_b$, $H_0$) with $H_0 = [\,H;\,R_0\,]$ (H stacked on top of $R_0$)
- Output the blind signature $\sigma_b = (r, H_b, \sigma\Pi, \text{PKP-Proof}(H_b, H_0))$
Verification: Given r, M, and $\sigma_b = (r, H_b, \sigma\Pi, \text{PKP-Proof}(H_b, H_R))$, where $H_R$ is a parity-check matrix of the code generated by R, verify $\sigma_b$ as follows:
- Generate the matrix $R_0$ from r
- Find some vector τ satisfying $H\tau^T = h(M|H_b)$
- Verify that $w(\sigma\Pi) < t$ and that $\tau - \sigma\Pi$ belongs to the code defined by $H_b$
- Check PKP-Proof($H_b$, $H_R$)
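To make the blinding step concrete, here is a small Python/NumPy sketch (ours, not part of Overbeck's paper) of the construction $G_b = [\,G;\,R\,]\,\Pi$: the public generator matrix is stacked on top of the random matrix $R = KR_0$ and the columns are permuted. The dimensions below are arbitrary toy values and no claim of cryptographic soundness is made.

```python
import numpy as np

rng = np.random.default_rng(0)

def blind_generator(G: np.ndarray, R: np.ndarray):
    """Stack G on top of R and apply a random column permutation Pi (over GF(2))."""
    n = G.shape[1]
    assert R.shape[1] == n, "G and R must have the same length n"
    Pi = rng.permutation(n)                 # column permutation
    Gb = np.vstack([G, R])[:, Pi] % 2       # blinded generator matrix
    return Gb, Pi

# toy example: k = 4, L = 2, p = 3, n = 10 (illustrative only)
k, L, p, n = 4, 2, 3, 10
G  = rng.integers(0, 2, size=(k, n))        # public generator matrix of the code
R0 = rng.integers(0, 2, size=(p, n))        # random public matrix derived from the seed r
K  = rng.integers(0, 2, size=(L, p))        # random matrix (full rank not enforced here)
R  = (K @ R0) % 2
Gb, Pi = blind_generator(G, R)
```

Deriving the corresponding parity-check matrix $H_b$ and solving $H_b x^T = h(M|H_b)$ would additionally require linear algebra over GF(2) (e.g., Gaussian elimination), which is omitted here.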
Security and Efficiency. Overbeck assessed the efficiency of his scheme by applying it to the CFS construction. For a $(2^m, 2^m - mt)$ binary Goppa code, the complexity of the scheme is as follows. Storing the public parity-check matrix H requires $2^m \times mt$ bits, and the blind matrix $H_b$ is a binary matrix of $2^m \times (mt - L)$ bits. To generate a single signature, the Blinding algorithm is run about $2^{mt}/\binom{2^m}{t}$ times, each run requiring $m^3t^2$ binary operations for the signer and $m^3t^3$ for the blinder. Thus, the total signing complexity is about $\frac{2^{mt}}{\binom{2^m}{t}}(m^3t^2 + m^3t^3)$ binary operations per signature. The blind signature size mainly depends on PKP-Proof($H_b$, $H_0$), which requires storing the generator matrix $G_b$ of size $(k + L) \times n$ bits in each round. The author of [32] does not give an explicit security proof for the proposed construction, but he claims that the scheme is provably secure based on the hardness of some instances of the PKP and SD problems.
4.4 Identity-Based Signatures
Identity-based cryptography was proposed by Shamir in 1984 [37] so as to simplify PKI requirements. An identity is associated with data such as an e-mail or IP address instead of a public key; the secret key is issued by a trusted Key Generation Center (KGC) thanks to a master secret that only the KGC knows. Some PKI and certificate costs can thus be avoided. However, identity-based cryptography suffers from a major drawback: the KGC must be trusted completely. A solution to this problem, also known as the key escrow problem, is to employ multiple PKGs that jointly produce the master secret key (see [6]). Identity-based cryptography has led to the development, in 1984, of identity-based signature (IBS) schemes. One of the most interesting contributions to this subject is the framework of [3], in which a large family of IBS schemes is proved secure. This work was later extended in [22], which implied the existence of generic IBS constructions with various additional properties that are provably secure in the standard model.
Cayrel et al.'s identification scheme. The first code-based IBS, which appeared in [10], is due to Cayrel, Gaborit and Girault (CGG). The main idea of this scheme is to combine the mCFS scheme with a slightly modified Stern scheme to obtain an IBS scheme whose security relies on the syndrome decoding problem. The mCFS scheme is used in the first step to solve an instance of the SD problem given a hash value of an identity, while Stern's protocol is used for identification in the following step. Consider a linear (n, k)-code over $\mathbb{F}_2$ with a disguised parity-check matrix $H = Q\tilde H S$, where $\tilde H$ is the original parity-check matrix, Q an invertible matrix and S a permutation matrix. The matrix H is public, while Q and S are kept secret by a trusted Key Generation Center (KGC). Denote by h a hash function with outputs in $\{0,1\}^{n-k}$. In addition, let y be the identity associated to the prover wishing to authenticate to a verifier. The Cayrel et al. identification scheme works like Stern's protocol, with a few variations (see Algorithm 9).
Security and Efficiency. Cayrel et al.'s identification scheme (CFS-Stern IBS) is provably secure against passive (i.e., eavesdropping-only) impersonation attacks [10], based on the hardness of the SD and GD problems. The security and the performance of the proposed identification scheme mainly depend on the difficulty of finding a pair {s, j} without the secret description of H. At the same time, an attacker needs to minimize the number of attempts used to find j, so as to be able to find s at minimal cost.
4.5 Summary
In Table 1 we summarize the complexity of the code-based proposals with special properties, using the following notation: t is the error-correction capability of the code, n denotes the code length, which equals $2^m$ in the case of Goppa codes, k indicates the code dimension, N is the number of users in the ring, l is the number of signers involved in the ring, L is the dimension of the subcode introduced in [32], and $r_i$ is the number of rounds, for i = 1, 2.
Algorithm 9. The CGG identity-based scheme
Key Deliverance:
- The prover sends its identity y to the KGC
- The KGC (trusted authority) runs the CFS algorithm (Alg. 2) on y to get {s, j} s.t. $h(h(y)|j) = Hs^T$ with $w(s) \le t$
- The public key: $h(h(y)|j)$
- The private key: {s, j}
Identification: Run Stern's protocol as follows:
- Commitments:
  - P chooses randomly u from $\mathbb{F}_2^n$ and a permutation σ over {1, . . . , n}
  - P computes the commitments $c_1$, $c_2$ and $c_3$ as follows: $c_1 = h(\sigma, Hu^T)$, $c_2 = h(\sigma(u))$, $c_3 = h(\sigma(u \oplus s))$
  - P sends $c_1$, $c_2$, $c_3$ and j to V
- Challenge: V chooses randomly b ∈ {0, 1, 2} and sends it to P
- Response:
  - If b = 0: P sends u and σ to V
  - If b = 1: P sends u ⊕ s and σ to V
  - If b = 2: P sends σ(u) and σ(s) to V
- Verification:
  - If b = 0: V checks that $c_1$ and $c_2$ were honestly computed
  - If b = 1: V checks that $c_1$ and $c_3$ were honestly computed
  - If b = 2: V checks that $c_2$ and $c_3$ were honestly computed and that w(s) = t
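The following Python sketch (ours, purely illustrative) shows how the commitment step of Algorithm 9 can be realized with an off-the-shelf hash function. The choice of SHA-256, the encoding of σ and the toy dimensions are our assumptions; the scheme itself only requires a generic hash function h.

```python
import hashlib
import secrets

def h(*parts: bytes) -> bytes:
    """Stand-in for the hash function h of Algorithm 9 (SHA-256 is our choice)."""
    return hashlib.sha256(b"|".join(parts)).digest()

def encode(bits) -> bytes:
    return bytes(bits)

def stern_commitments(H, s, n):
    """Commitment step: pick u in F_2^n and a permutation sigma, then commit to
    (sigma, H*u^T), sigma(u) and sigma(u XOR s)."""
    u = [secrets.randbelow(2) for _ in range(n)]
    sigma = list(range(n))
    for i in range(n - 1, 0, -1):                 # Fisher-Yates shuffle with a CSPRNG
        j = secrets.randbelow(i + 1)
        sigma[i], sigma[j] = sigma[j], sigma[i]
    syndrome = [sum(H[r][c] * u[c] for c in range(n)) % 2 for r in range(len(H))]
    def perm(v):
        return [v[sigma[i]] for i in range(n)]
    c1 = h(",".join(map(str, sigma)).encode(), encode(syndrome))
    c2 = h(encode(perm(u)))
    c3 = h(encode(perm([ui ^ si for ui, si in zip(u, s)])))
    return (c1, c2, c3), (u, sigma)

# toy parameters: a random 3 x 7 parity-check matrix and a secret s of weight 2
n, r = 7, 3
H_mat = [[secrets.randbelow(2) for _ in range(n)] for _ in range(r)]
s = [1, 1, 0, 0, 0, 0, 0]
(c1, c2, c3), _ = stern_commitments(H_mat, s, n)
```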
Based on these notations, we define the following two quantities:
– $A(m, t, N, l) = N\log_2\!\left(\sum_{i=1}^{t}\binom{2^m}{i}\right) + 2mt + \log_2(N) - (l-1)mt$
– $B(m, t, N, l) = (N-l)t^2m^2/2 + 2N(N-l) + l\,(t!)(3/2 + 6/m)$.
For example, for m = 22, t = 9,
Table 1. Code-based signatures with special properties, with (m, t, N, l, L, r1, r2) = (15, 12, 100, 50, 40, 58, 80). Schemes compared: identity-based signatures (PGGG [9]), ring signatures (ZLC [45]), threshold ring signatures (ACG [29], DV [17]) and blind signatures.
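To give a feel for these quantities, the following Python sketch (ours, not part of the survey) evaluates A and B exactly as written above for the parameter set of Table 1, (m, t, N, l) = (15, 12, 100, 50); the function names are ours.

```python
import math

def A(m: int, t: int, N: int, l: int) -> float:
    """Signature size in bits, evaluated exactly as the formula for A above."""
    index_bits = math.log2(sum(math.comb(2**m, i) for i in range(1, t + 1)))
    return N * index_bits + 2 * m * t + math.log2(N) - (l - 1) * m * t

def B(m: int, t: int, N: int, l: int) -> float:
    """Signing cost in binary operations, evaluated exactly as the formula for B above."""
    return ((N - l) * t**2 * m**2 / 2
            + 2 * N * (N - l)
            + l * math.factorial(t) * (3 / 2 + 6 / m))

m, t, N, l = 15, 12, 100, 50          # parameter set used for Table 1
print(f"A = {A(m, t, N, l):,.0f} bits")
print(f"B = {B(m, t, N, l):,.2e} binary operations")
```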
Several code-based signature schemes already exist, exhibiting features such as small public key size (Stern [41]), short signature size (CFS [15]), or a good balance
of public key and signature size at the expense of security (KKS [25]). By combining such schemes, additional constructions such as identity-based, threshold ring, or blind signatures can be obtained. However these schemes also inherit the disadvantages of the underlying protocols. We strongly encourage the code-based research community to actively investigate future possibilities for post-quantum signature schemes, such as multi-signatures, group signatures, or linkable signatures.
References 1. Alabbadi, M., Wicker, S.B.: Digital signature scheme based on error–correcting codes. In: Proc. of 1993 IEEE International Symposium on Information Theory, pp. 19–29. Press (1993) 2. Barg, S.: Some New NP-Complete Coding Problems. Probl. Peredachi Inf. 30, 23–28 (1994) 3. Bellare, M., Chanathip, N., Gregory, N.: Security Proofs for Identity-Based Identification and Signature Schemes. J. Cryptol. 22(1), 1–61 (2008) 4. Berlekamp, E., McEliece, R., van Tilborg, H.: On the inherent intractability of certain coding problems. IEEE Transactions on Information Theory 24(3), 384–386 (1978) 5. Bernstein, D.J., Lange, T., Peters, C.: Attacking and defending the McEliece cryptosystem. Cryptology ePrint Archive, Report 2008/318 (2008), http://eprint.iacr.org/ 6. Boneh, D., Franklin, M.: Identity-based encryption from the Weil pairing, pp. 213– 229. Springer, Heidelberg (2001) 7. Bresson, E., Stern, J., Szydlo, M.: Threshold Ring Signatures and Applications to Ad-hoc Groups. In: Yung, M. (ed.) CRYPTO 2002. LNCS, vol. 2442, pp. 465–480. Springer, Heidelberg (2002) 8. Buchmann, J., Lindner, R., Ruckert, M., Schneider, M.: Post-Quantum Cryptography: Lattice Signatures (2009) 9. Cayrel, P.-L., Gaborit, P., Galindo, D., Girault, M.: Improved identity-based identification using correcting codes. CoRR, abs/0903.0069 (2009) 10. Cayrel, P.-L., Gaborit, P., Girault, M.: Identity-based identification and signature schemes using correcting codes. In: Augot, D., Sendrier, N., Tillich, J.-P. (eds.) WCC 2007, pp. 69–78 (2007) 11. Cayrel, P.-L., Gaborit, P., Prouff, E.: Secure Implementation of the Stern Authentication and Signature Schemes for Low-Resource Devices. In: Grimaud, G., Standaert, F.-X. (eds.) CARDIS 2008. LNCS, vol. 5189, pp. 191–205. Springer, Heidelberg (2008) 12. Cayrel, P.L., Otmani, A., Vergnaud, D.: On Kabatianskii-Krouk-Smeets Signatures. In: Carlet, C., Sunar, B. (eds.) WAIFI 2007. LNCS, vol. 4547, pp. 237–251. Springer, Heidelberg (2007) 13. Chaum, D.: Blind Signatures for Untraceable Payments. In: CRYPTO, pp. 199–203 (1982) 14. Chaum, D., Fiat, A., Naor, M.: Untraceable Electronic Cash. In: Goldwasser, S. (ed.) CRYPTO 1988. LNCS, vol. 403, pp. 319–327. Springer, Heidelberg (1990) 15. Courtois, N., Finiasz, M., Sendrier, N.: How to Achieve a McEliece-based Digital Signature Scheme. In: Boyd, C. (ed.) ASIACRYPT 2001. LNCS, vol. 2248, pp. 157– 174. Springer, Heidelberg (2001)
16. Dallot, L.: Towards a Concrete Security Proof of Courtois, Finiasz and Sendrier Signature Scheme (2007), http://users.info.unicaen.fr/˜ldallot/download/articles/ CFSProof-dallot.pdf 17. Dallot, L., Vergnaud, D.: Provably Secure Code-Based Threshold Ring Signatures. In: Cryptography and Coding 2009: Proc. of the 12th IMA International Conference on Cryptography and Coding, pp. 222–235. Springer, Heidelberg (2009) 18. Fiat, A., Shamir, A.: How to prove yourself: practical solutions to identification and signature problems. In: Odlyzko, A.M. (ed.) CRYPTO 1986. LNCS, vol. 263, pp. 186–194. Springer, Heidelberg (1987) 19. Finiasz, M.: Nouvelles constructions utilisant des codes correcteurs dérreurs en cryptographie à clé publique. PhD thesis, INRIA - Ecole Polytechnique (2004) 20. Finiasz, M., Sendrier, N.: Security Bounds for the Design of Code-based Cryptosystems. To appear in Advances in Cryptology – Asiacrypt 2009 (2009), http://eprint.iacr.org/2009/414.pdf 21. Gaborit, P., Girault, M.: Lightweight code-based authentication and signature. In: IEEE International Symposium on Information Theory – ISIT 2007, Nice, France, pp. 191–195. IEEE, Los Alamitos (2007) 22. Galindo, D., Herranz, J., Kiltz, E.: On the Generic Construction of Identity-Based Signatures with Additional Properties. In: Lai, X., Chen, K. (eds.) ASIACRYPT 2006. LNCS, vol. 4284, pp. 178–193. Springer, Heidelberg (2006) 23. Guillou, L.C., Quisquater, J.-J.: A practical zero-knowledge protocol fitted to security microprocessor minimizing both transmission and memory. In: Günther, C.G. (ed.) EUROCRYPT 1988. LNCS, vol. 330, pp. 123–128. Springer, Heidelberg (1988) 24. Harn, L., Wang, D.C.: Cryptoanalysis and modification of digital signature scheme based on error–correcting codes. Electronics Letters 28(2), 157–159 (1992) 25. Kabatianskii, G., Krouk, E., Smeets, B.J.M.: A digital signature scheme based on random error-correcting codes. In: Darnell, M.J. (ed.) Cryptography and Coding 1997. LNCS, vol. 1355, pp. 161–167. Springer, Heidelberg (1997) 26. Liu, J.K., Wei, V.K., Wong, D.S.: A Separable Threshold Ring Signature Scheme. In: Lim, J.-I., Lee, D.-H. (eds.) ICISC 2003. LNCS, vol. 2971. Springer, Heidelberg (2004) 27. MacWilliams, F.J., Sloane, N.J.A.: The theory of error-correcting codes, vol. 16. North-Holland Mathematical Library, Amsterdam (1977) 28. McEliece, R.J.: A public-key cryptosystem based on algebraic coding theory. Jpl dsn progress report 42-44, pp. 114–116 (1978) 29. Aguilar Melchor, C., Cayrel, P.-L., Gaborit, P.: A New Efficient Threshold Ring Signature Scheme Based on Coding Theory. In: Buchmann, J., Ding, J. (eds.) PQCrypto 2008. LNCS, vol. 5299, pp. 1–16. Springer, Heidelberg (2008) 30. Misoczki, R., Barreto, P.S.L.M.: Compact McEliece Keys from Goppa Codes. Preprint (2009), http://eprint.iacr.org/2009/187.pdf 31. Okamoto, T.: Provably Secure and Practical Identification Schemes and Corresponding Signature Schemes. In: Brickell, E.F. (ed.) CRYPTO 1992. LNCS, vol. 740, pp. 31–53. Springer, Heidelberg (1993) 32. Overbeck, R.: A Step Towards QC Blind Signatures. Cryptology ePrint Archive, Report 2009/102 (2009), http://eprint.iacr.org/ 33. Pointcheval, D., Stern, J.: Provably Secure Blind Signature Schemes. In: Kim, K.-c., Matsumoto, T. (eds.) ASIACRYPT 1996. LNCS, vol. 1163, pp. 252–265. Springer, Heidelberg (1996) 34. Rivest, R.L., Shamir, A., Tauman, Y.: How to leak a secret. In: Boyd, C. (ed.) ASIACRYPT 2001. LNCS, vol. 2248, p. 552. Springer, Heidelberg (2001)
35. Schnorr, C.-P.: Efficient Signature Generation by Smart Cards. J. Cryptology 4(3), 161–174 (1991) 36. Sendrier, N.: Cryptosystèmes à clé publique basés sur les codes correcteurs d’erreurs. Mémoire d’habilitation à diriger des recherches, Université Paris 6 (March 2002) 37. Shamir, A.: Identity-based cryptosystems and signature schemes. In: Blakely, G.R., Chaum, D. (eds.) CRYPTO 1984. LNCS, vol. 196, pp. 47–53. Springer, Heidelberg (1985) 38. Shamir, A.: An efficient identification scheme based on permuted kernels. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 606–609. Springer, Heidelberg (1990) 39. Shor, P.W.: Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer. SIAM J. Sci. Statist. Comput. 26, 1484 (1997) 40. Stadler, M., Piveteau, J.-M., Camenisch, J.: Fair Blind Signatures. In: Guillou, L.C., Quisquater, J.-J. (eds.) EUROCRYPT 1995. LNCS, vol. 921, pp. 209–219. Springer, Heidelberg (1995) 41. Stern, J.: A new identification scheme based on syndrome decoding. In: Stinson, D.R. (ed.) CRYPTO 1993. LNCS, vol. 773, pp. 13–21. Springer, Heidelberg (1994) 42. Véron, P.: Improved Identification Schemes Based on Error-Correcting Codes. Appl. Algebra Eng. Commun. Comput. 8(1), 57–69 (1996) 43. Wang, X.M.: Digital signature scheme based on error-correcting codes. Electronics Letters (13), 898–899 (1990) 44. Wong, D.S., Fung, K., Liu, J.K., Wei, V.K.: On the RS-Code Construction of Ring Signature Schemes and a Threshold Setting of RST. In: Qing, S., Gollmann, D., Zhou, J. (eds.) ICICS 2003. LNCS, vol. 2836, pp. 34–46. Springer, Heidelberg (2003) 45. Zheng, D., Li, X., Chen, K.: Code-based Ring Signature Scheme. I. J. Network Security 5(2), 154–157 (2007)
Security Analysis of the Proposed Practical Security Mechanisms for High Speed Data Transfer Protocol Danilo Valeros Bernardo and Doan Hoang iNext – Centre for Innovation for IT Services and Applications Faculty of Engineering and Information Technology University of Technology, Sydney [email protected], [email protected]
Abstract. The development of next-generation protocols, such as UDT (UDP-based Data Transfer), promptly addresses various infrastructure requirements for transmitting data in high-speed networks. However, this development creates new vulnerabilities when these protocols are designed to rely solely on the existing security solutions of existing protocols such as TCP and UDP. It is clear that not all security protocols (such as TLS) can be used to protect UDT, just as security solutions devised for wired networks cannot be used to protect unwired ones. The development of UDT, like the development of TCP/UDP many years ago, lacked a well-thought-out security architecture to address the problems that networks are presently experiencing. This paper proposes and analyses practical security mechanisms for UDT.
Keywords: Next Generation, GSS-API, High Speed Bandwidth, UDT, HIP, CGA, SASL, DTLS.
from the SDSS project so far has increased to 2 terabytes and continues to grow. Currently, this 2 terabytes of data is being delivered to the Asia-Pacific region, including Australia, Japan, South Korea, and China. Astronomers also want to execute online analysis on multiple datasets stored in geographically distributed locations [17]. This implementation offers a promising direction for future high speed data transfer in various industries. Securing data during its operations across network layers is therefore imperative in ensuring UDT itself is protected when implemented. The challenge to reduce the cost and complexity of running streaming applications over the Internet and through wireless and mobile devices, while maintaining security and privacy for their communication links, continues to mount. The absence of a well-thought security mechanism for UDT in its development, however, drives this paper to introduce ways to secure UDT in a few implementation scenarios. This paper presents application and IP-based mechanisms and a combination of existing security solutions of existing layers that may assist in further enhancing the work earlier presented by Bernardo [8,9,10,11] for UDT. Bernardo [8] presented a framework which adequately addresses vulnerability issues by implementing security mechanisms in UDT while maintaining transparency in data delivery. The development of the framework was based on the analyses drawn from the source codes of UDT found at SourceForge.net. The source codes were analyzed and tested on Win32 and Linux environments to gain a better understanding on the functions and characteristics of this new protocol. Network and security simulations using NS2 [37] and the Evaluation Methods for Internet Security Technology tool (EMIST) developed at the Pennsylvania State University with support from the US Department of Homeland Security and the National Science Foundation [38], were performed. Most of the security vulnerability testing, however, was conducted through simple penetration and traffic load tests. The results provided significant groundwork in the development of a proposal for a variety of mechanisms to secure UDT against various adversaries, such as Sybil, addresses, man-in-the-middle and the most common, DoS attacks. This paper discusses these mechanisms, their simulation and implementation in a controlled environment. The discussion is categorized in the following format. In Section 2 of this paper, the authors present an overview of UDT [17]. More details on UDT and its architecture were discussed by Bernardo in his early works [8,9,10,11]. His works were drawn from and mainly influenced by the works of Gu [17], who developed UDT at the National Data Mining Centre in the US. Also in this section, the descriptions of the proposed security designs and implementation drawn from the initial work performed by Bernardo, and the motivation behind his work, are presented. In Sections 3, 4 and 5, new security approaches are presented. Section 6 discusses the conclusion of the paper and future work.
2 Overview
UDT is a connection-oriented duplex protocol [17], which supports data streaming and partial reliable messaging. It also uses rate-based congestion control (rate control)
and window-based flow control to regulate outgoing traffic. The protocol was designed such that rate control updates the packet-sending period at every constant interval, whereas flow control updates the flow window size each time an acknowledgment packet is received (a schematic illustration of these two update rules is sketched at the end of this section). UDT was later expanded to satisfy more requirements for both network research and application development [8-11,17]. This expansion is called Composable UDT and is designed to complement kernel-space network stacks. This feature is intended for:
• Implementation and deployment of new control algorithms. Data transfer through private links can be implemented using Composable UDT.
• Support for application-aware algorithms.
• Easier testing of new algorithms intended for kernel space, compared to modifying an OS kernel.
The Composable UDT library implements a standard TCP congestion control algorithm (CTCP). CTCP can be redefined to implement further TCP variants, such as loss-based and delay-based TCP. The designers [10] emphasized that the Composable UDT library does not implement the same mechanisms as those in the TCP specification: TCP uses byte-based sequencing, whereas UDT uses packet-based sequencing. This does not prevent CTCP from simulating TCP's congestion avoidance behavior [8-11,17]. UDT was designed with the Configurable Congestion Control (CCC) interface, which is composed of four categories: 1) control event handler callbacks, 2) protocol behavior configuration, 3) packet extension, and 4) performance monitoring. Its services/features can be used for bulk data transfer and streaming data processing, unlike TCP, which cannot be used for this type of processing because of two problems. Firstly, in TCP, the link must be clean (little packet loss) for it to fully utilize the bandwidth. Secondly, when two TCP streams start at the same time, the stream with the longer RTT will be starved due to the RTT bias problem; thus, the data analysis process will have to wait for the slower data stream [8-11]. UDT, moreover, can cater for streaming video to many clients and can provide selective streaming for each client when required, whereas TCP cannot send data at a fixed rate. Additionally, in UDP most of the data reliability control work has to be handled by the application.
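The following Python toy (ours, not UDT's actual algorithm or API) only illustrates where the two update rules described above are triggered: the rate-control update runs on a fixed timer, while the flow-window update runs on every acknowledgment. All constants and update formulas are placeholders.

```python
class ToyRateWindowSender:
    """Schematic sender combining rate-based congestion control with
    window-based flow control, as described in the overview above."""

    def __init__(self) -> None:
        self.pkt_period = 0.001   # seconds between packets (rate control)
        self.flow_window = 16     # packets allowed in flight (flow control)
        self.in_flight = 0

    def on_rate_control_timer(self, loss_seen: bool) -> None:
        # called at every constant interval: slow down on loss, speed up otherwise
        self.pkt_period *= 1.125 if loss_seen else 0.99

    def on_ack(self, packets_acked: int, receiver_window: int) -> None:
        # called on every acknowledgment packet: refresh the flow window
        self.in_flight -= packets_acked
        self.flow_window = receiver_window

    def can_send(self) -> bool:
        return self.in_flight < self.flow_window
```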
3 Related Works Bernardo and Hoang [8-11] present a security framework highlighting the need to secure UDT. The work focuses on UDT’s position in the layer architecture which provides a layer-to-layer approach to addressing security. Its implementation relies on proven security mechanisms developed and implemented on existing mature protocols. A summary of security mechanisms and their implementations are presented in fig. 2. As UDT is at the application layer or on top of UDP, data are required to be transmitted securely and correctly. This is implemented by each application and not by an operating system or by a separate stack [8]. These implementations may be, and often
are, based on generic libraries [17]. The existence of five application-dependent components, such as the API module, the sender, receiver, and UDP channel, as well as four data components: sender’s protocol buffer, receiver’s protocol buffer, sender’s loss list and receiver’s loss list [8], require that UDT security features must be implemented on an application basis.
Fig. 1. UDT in Layer Architecture. UDT is in the application layer above UDP. The application exchanges its data through UDT socket, which then uses UDP socket to send or receive data [8-11].
4 Motivation
4.1 UDT Security Limitations
The case for dedicated security mechanisms for the new UDT is derived from four important observations [2,5]:
• Absence of an inherent security mechanism, such as a checksum, in UDT.
• Dependency on user preferences and on the implementation of the layer on which UDT is deployed.
• Dependency on the existing security mechanisms of other layers in the stack.
• Dependency on TCP/UDP, which in turn depend on nodes and their addresses, exposing a high-speed data transfer protocol to a number of attacks such as neighborhood, Sybil and DoS (Denial of Service) attacks.
The presentation of a security framework for UDT supports the need to minimize its sending rates [8-11] in retransmissions and to introduce its own checksum in its design. It also supports the importance of implementing security in UDT. However, the
introduction of other security mechanisms to secure UDT is presented to address its vulnerabilities to adversaries exploiting the application, transport, and IP layers. 4.1.1 Possible Security Mechanisms Previous literature [8-11] presented an overview of the basic security mechanisms for UDT. As the research progresses, the following approaches are further developed. UDT is designed to run on UDP and is thus dependent on its existing security mechanisms. Consequently, the designers of the applications that use UDT are faced with limited choices to protect the transmission. This paper proposes the following. Firstly, utilising IPsec RFC 2401; however, for a number of reasons, this is only suitable for some applications [7]. Secondly, designing a custom security mechanism on the application layer using API, such as GSS-API [23,29,47], or a custom security mechanism on IP layer, such as HIP-CGA [1,3,4,5,19,21,25,26,33,43] Thirdly, integrating SASL [31] or DTLS [39] on the transport layer. These approaches can be significant for application and transport layer- based authentication and end-to-end security for UDT. • GSS-API - Generic Security Service Application Program Interface [23,29,47]. • Self-certifying addresses using HIP-CGA. • SASL - Simple Authentication and Security Layer (SASL) [31]. • DTLS – Data Transport Layer Security [39]. • IPsec – IP security [7]. They are also applicable to a combination of the following: * SASL/GSS-API for authentication + channel binding to DTLS, DTLS for data integrity/confidentiality protection. * SASL/GSS-API for authentication + channel binding to IPsec, IPsec for data integrity/confidentiality protection. In this paper, only a brief description of each approach is presented due to space limitations.
5 Securing UDT
Bernardo [8-11] presented an overview on securing UDT implementations in various layers. However, securing UDT at the application and other layers still needs to be explored for future UDT deployments in various applications. There are application- and transport-layer-based authentication and end-to-end [12] security options for UDT. This paper also advocates the use of the GSS-API in UDT in the development of applications using TCP/UDP. The use of the Host Identity Protocol (HIP), a state-of-the-art protocol, combined with Cryptographically Generated Addresses (CGA), is explored to solve the problem of address-related attacks.
5.1 Host Identity Protocol (HIP) Implementing Host Identity Protocol (HIP) [1,19] is one possible way to secure UDT on top of UDP and IP. This protocol solves the problem of address generation in a different way by removing the dual functionality of IP addresses as both host identifiers and topological locations. In order to do this, a new network layer called the Host Identity is required. Securing IP addresses plays an important role in networking, especially in the transport layer. Generating a secure IP address can be achieved through HIP. It is considered the building block which is used in other protocols, as well as being a way to secure the address generation in practice [19]. Much literature has been published on the various research on HIP since it was first introduced in RFC 4423. This resulted in a number of new experimental RFCs in April of 2008. Host identification is attained by using IP addresses that depend on the topological location of the hosts, consequently overloading them. The main motivation behind HIP is to separate the location and host identification information to minimize stressing IP addresses, since they are identifying both hosts and topological locations. HIP introduces a new namespace, cryptographic in nature, for host identities [19]. The IP addresses continue to be used for packet routing.
Using HIP underneath UDP/TCP, through the new network layer called Host Identity (HI), protects not only the underlying transport protocol but UDT as well, since UDT runs on top of UDP. The HI layer is placed between the IP and transport layers; see fig. 2. In HIP, the public key [32,39] of an asymmetric key pair is used as the HI, and the host itself is defined as the entity that holds the private key of that key pair.
Application and other higher-layer protocols are bound to the HI instead of an IP address. A prerequisite for a HIP implementation is support for RSA and DSA [41] public-key cryptography.
5.2 Cryptographically Generated Addresses (CGA)
Solving the problem of address-related attacks can also be achieved by using CGA for address generation and verification. Self-certifying addresses are widely used and standardized, for example by the Host Identity Protocol (HIP) [1,19] and the Accountable Internet Protocol (AIP) [2]. CGA [3,4] uses a cryptographic hash of the public key. It is a generic method for self-certifying address generation and verification that can be used for specific purposes; in this paper, the conventions used apply to either IPv4 or IPv6. The simplified setting for CGA is presented in fig. 3. The interface identifier is generated by taking a cryptographic hash [39] of the encoded public key of the user. Modern cryptography has functions that produce a message digest with more than the number of bits required in CGA. The interface identifier is therefore formed by truncating the output of the cryptographic hash function to a specific number of bits, depending on the number of leftmost bits that form the subnet prefix; i.e., since IPv6 addresses are 128-bit data blocks, the leftmost 64 bits form the prefix and the rightmost 64 bits form the interface identifier. The prefix is used to determine the location of each node in the Internet topology, while the interface identifier is used as the identity of the node. Using a cryptographic hash of the public key is the most effective method to generate self-certifying addresses. In CGA, the assumption is that each node in the network is equipped with a public key before generating its address and that the underlying public-key cryptosystems have no known weaknesses. Similarly, in UDT, the assumption is that its protection is derived from the security controls implemented at existing transport layers. In this paper, the authors focus on generic attack models that can be adapted to both UDT and CGA.
Fig. 3. Simplified and modified principle of Cryptographically Generated Addresses [3]
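As a concrete illustration of the simplified CGA setting of fig. 3, the following Python sketch (ours) hashes an encoded public key, truncates the digest to 64 bits and appends it to a 64-bit subnet prefix. The use of SHA-256 and the example values are our assumptions; the full CGA of RFC 3972 additionally involves a modifier, a collision count and a Sec parameter, all omitted here.

```python
import hashlib
import ipaddress

def simplified_cga(subnet_prefix: str, public_key: bytes) -> ipaddress.IPv6Address:
    """Simplified CGA: interface identifier = first 64 bits of H(public key),
    address = 64-bit subnet prefix || interface identifier."""
    digest = hashlib.sha256(public_key).digest()
    interface_id = int.from_bytes(digest[:8], "big")            # truncate hash to 64 bits
    prefix_bits = (int(ipaddress.IPv6Address(subnet_prefix)) >> 64) << 64
    return ipaddress.IPv6Address(prefix_bits | interface_id)

def verify(addr: ipaddress.IPv6Address, public_key: bytes) -> bool:
    """Verification simply recomputes the interface identifier from the key."""
    digest = hashlib.sha256(public_key).digest()
    return int(addr) & ((1 << 64) - 1) == int.from_bytes(digest[:8], "big")

# toy usage with a made-up key blob and the documentation prefix 2001:db8::/64
addr = simplified_cga("2001:db8::", b"-----encoded public key bytes-----")
assert verify(addr, b"-----encoded public key bytes-----")
```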
5.2.1 HIP-CGA and UDT
HIP introduces a new namespace, which is cryptographic in nature, for host identifiers. It introduces a way of separating the location and host identity information.
A hashed encoding of the HI, the HIT, is used in protocols to represent the Host Identity. The HIT is 128 bits long and has the following three properties [1,19]:
- It can be used in address-sized fields in APIs and protocols.
- It is self-certifying (i.e., given a HIT, it is computationally hard to find a Host Identity key that matches this HIT).
- The probability of a HIT collision between two hosts is very low.
Since the HITs are self-certifying, no certificates are needed in practice. The protocol used in HIP to establish an IP-layer communications context, called a HIP association, prior to communication is called the base exchange [5]. The details are briefly summarized below [33].
- The initiator sends a trigger packet (I1) to the responder containing the HIT of the initiator and, if it is known, the HIT of the responder.
- Next, the responder sends the R1 packet, which contains a puzzle, a cryptographic challenge that the initiator must solve before continuing the exchange. The puzzle mechanism protects the responder from a number of DoS threats; see RFC 5201 [33,34] (a toy sketch of such a hash puzzle is given at the end of this subsection). R1 also contains the initial Diffie-Hellman parameters and a signature covering part of the message.
- In the I2 packet, the initiator must present the solution to the received puzzle. If an incorrect solution is given, the I2 message is discarded. I2 also contains a Diffie-Hellman parameter that carries the information needed by the responder. The packet is signed by the sender.
- The R2 packet finalizes the base exchange and is likewise signed.
The base exchange protocol is used to establish a pair of IPsec security associations between two hosts for further communication. HIP introduces a cryptographic namespace for host identifiers to remove the dual functionality of IP addresses as both identifiers and topological locations. When UDT is implemented on top of UDP, its packets are delivered through HIP. With HIP, the transport layer operates on Host Identities instead of using IP addresses as end points, while the network layer uses IP addresses as pure locators. This provides added protection to the transport layer for applications using UDT's high-speed data transmission. With the hashed encoding of the Host Identifier, a Host Identity Tag can be used in address-sized fields in APIs and protocols, including UDT. The hash is truncated to values which are larger in the case of an IPv6 implementation, and hence more secure compared with all security levels of CGA [3,4]. The main challenge of implementing HIP is the requirement of a new network layer, the Host Identity layer [1], which is difficult to run alongside the networking protocols currently in use.
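The following Python sketch (ours, simplified) illustrates the kind of hash puzzle used in the base exchange to rate-limit initiators: the responder picks a random value I and a difficulty k, and the initiator must find J such that the lowest-order k bits of H(I | HIT-I | HIT-R | J) are zero. The exact encoding, fields and hash function of RFC 5201 differ; SHA-256 and the parameter choices here are our assumptions.

```python
import hashlib
import os

def _low_bits_zero(digest: bytes, k: int) -> bool:
    return int.from_bytes(digest, "big") & ((1 << k) - 1) == 0

def solve_puzzle(I: bytes, hit_i: bytes, hit_r: bytes, k: int) -> bytes:
    """Initiator side: brute-force a solution J (expected ~2**k hash evaluations)."""
    while True:
        J = os.urandom(8)
        if _low_bits_zero(hashlib.sha256(I + hit_i + hit_r + J).digest(), k):
            return J

def check_puzzle(I: bytes, hit_i: bytes, hit_r: bytes, J: bytes, k: int) -> bool:
    """Responder side: verification costs a single hash evaluation."""
    return _low_bits_zero(hashlib.sha256(I + hit_i + hit_r + J).digest(), k)

# toy usage with 16-byte stand-ins for the HITs and a small difficulty
I, hit_i, hit_r, k = os.urandom(8), os.urandom(16), os.urandom(16), 12
J = solve_puzzle(I, hit_i, hit_r, k)
assert check_puzzle(I, hit_i, hit_r, J, k)
```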
5.3 Generic Security Service Application Program Interface (GSS-API)
There are significant application- and transport-layer-based authentication and end-to-end security options for UDT. In this paper, the authors also propose the Generic Security Service Application Program Interface (GSS-API).
The GSS-API is a generic API for carrying out DT client-server authentication. The motivation behind it is that every security system has its own API [26], and the effort in adding different security systems to applications is made extremely difficult by the variance between security APIs. However, with a common API, application vendors could write to the generic API and it could work with any number of security systems, according to [17,29,48]. Vendors can use GSS-API during the UDT implementation. It is considered the easiest to use and implement and implementations exist, such as Kerberos [35]. The Generic Security Service Application Programming Interface provides security services to calling applications. It allows a communicating application to authenticate the user associated with another application, to delegate rights to another application, and to apply security services such as confidentiality and integrity on a per-message basis. Details of GSS-API are discussed in RFC 1964 [29,48]. In summary, the protocol when used in UDT application can be viewed as: • Authenticate (exchange opaque GSS context) through the user interface and CCC option of UDT. • The utilize per-message token functions (GSS-API) to protect UDT messages during transmissions. The GSS-API is a rather large API for some implementations, but for applications using UDT, one need only use a small subset of that API [48]. 5.4 Data Transport Layer Security (DTLS) Another possible mechanism is DTLS. DTLS [40] provides communications privacy for datagram protocols. The protocol allows client/server applications to communicate in a way that is designed to prevent eavesdropping, tampering, or message forgery. The DTLS protocol is based on the Transport Layer Security (TLS) [14] protocol and provides equivalent security guarantees. Datagram semantics of the underlying transport are preserved by the DTLS protocol. DTLS is similar to TLS, but DTLS is designed for datagram transport. High speed data transmission uses datagram transport such as UDP for communication due to the delay-sensitive nature of transported data. The speed of delivery and behavior of applications running UDT are unchanged when DTLS is used to secure communication, since it does not compensate for lost or re-ordered data traffic when applications using UDT running on top of UDP are employed. DTLS, however, is susceptible to DoS attacks. Such attacks are launched by consuming excessive resources on the server by transmitting a series of handshake initiation requests, and by sending connection initiation messages with a forged source of the victim. The server sends its next message to the victim machine, thus flooding it. In implementing DTLS, designers need to include cookie exchange with every handshake during the implementation of applications using UDT and UDP. 5.5 Internet Protocol Security (IPsec) Most protocols for application security, such as DTLS, operate at or above the transport layer. This renders the underlying transport connections vulnerable to denial of
service attacks, including connection assassination (RFC 3552). IPsec offers the promise of protecting against many denial of service attacks. It also offers other potential benefits. Conventional software-based IPsec implementations isolate applications from the cryptographic keys, improving security by making inadvertent or malicious key exposure more difficult. In addition, specialized hardware may allow encryption keys protected from disclosure within trusted cryptographic units. Also, custom hardware units may well allow for higher performance.
Fig. 4. UDT flow using end-to-end security [8-11]. IPsec can be used without modifying UDT and the applications running it.
Implementing UDT running at or above the application layers with IPsec provides adequate protection for data transmission (fig. 4). A datagram-oriented client application using UDT will use the connection-oriented part of its API (because it is using a given datagram socket to talk to a specific server), while the server it is talking to can use the connection-oriented API because it is using a single socket to receive requests from and send replies to a large number of clients. If nothing else works or is possible in the development of APIs, and in introducing other protocols to protect UDT, IPsec may be a last possible option which provides less overhead in the implementation of applications running UDT.
Fig. 5. Schematic diagram of securing UDT on top of UDP [8-11]: UDP encapsulation of IPsec ESP packets, i.e., the UDP header (source port, destination port, length, checksum) followed by the ESP header (RFC 2406).
IPsec can be administered separately, and its management can be left to administrators. It is possible to create a security arrangement to secure UDT connections, for example with authentication handled by IPsec. Since UDT relies on UDP, developers can use UDP encapsulation (see fig. 5) to ensure that the underlying UDP connection is secure. IPsec provides keying and authentication services; adding ESP extends these services to encryption. Specifications on protecting UDP packets can be found in RFC 3948.
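As a byte-level illustration of the encapsulation shown in fig. 5, the following Python sketch (ours) lays out an 8-byte UDP header followed by the ESP header fields (SPI and sequence number); the zero UDP checksum and the placeholder ESP payload are our simplifying assumptions, and no actual ESP encryption is performed.

```python
import struct

def udp_encapsulated_esp(sport: int, dport: int, spi: int, seq: int,
                         esp_payload: bytes) -> bytes:
    """UDP header (source port, destination port, length, checksum) followed by
    the ESP header (SPI, sequence number) and an opaque ESP payload."""
    esp_header = struct.pack("!II", spi, seq)           # 32-bit SPI, 32-bit sequence number
    length = 8 + len(esp_header) + len(esp_payload)     # UDP length covers header + data
    udp_header = struct.pack("!HHHH", sport, dport, length, 0)   # checksum left at zero
    return udp_header + esp_header + esp_payload

packet = udp_encapsulated_esp(4500, 4500, spi=0x1234, seq=1, esp_payload=b"\x00" * 32)
```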
6 Simulation and Implementation Schemes
To observe the behavior of UDT in both protected and unprotected settings, two schemes are constructed: (1) a simulation scheme and (2) an implementation scheme. The simulated environment runs separately on NS2 [37] and EMIST [38] to provide internal validation, and is used to simulate the behavior of data transmission when UDT is used on top of UDP. A test is performed using a new probabilistic packet marking scheme consisting of 3000 nodes, of which 1000 attackers are selected randomly. To determine the number of packets required to reconstruct the attacking path, one path is selected from all of the attacking paths, with length w, w = 1, 2, ..., 30. For each value of w, the experiment is repeated until the protocol shows a clear attacking path. This allows the simulation to produce a pattern of the behavior of UDT without any means of protection (a toy version of this kind of experiment is sketched below). The implementation environment comprises a simple topology. Two honeypot servers (HP1 and HP2) with UDT for Windows are installed at two separate locations. They operate in a network environment running on a 10G trunk with 802.1Q tunneling behind firewalls, and the attackers are sourced from the Internet. In the first implementation, all traffic is allowed to traverse from any source, destined to any UDP and TCP port, and locked to the destination honeypot, where UDT is running on top of UDP. A simple data transfer of 600 MB to 200 GB to another server is then performed. The test is initially performed without any protection; subsequent tests are performed with the proposed security mechanisms, and the results are compared. The following protection schemes are attempted:
(1) A simple authentication scheme using Kerberos [29,35] through the GSS-API on an application running UDT over UDP.
(2) IPsec between HP1 and HP2, running the application within the encrypted tunnel.
(3) VPN SSL connections, running the applications on HP1 and HP2.
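For illustration, the following Python toy (ours, not the authors' marking scheme) simulates the classic probabilistic-packet-marking setting behind such an experiment: every router on a length-w attack path marks a passing packet with probability p, later routers overwrite earlier marks, and the victim counts how many packets it needs before every router on the path has been observed at least once. The marking probability and trial count are arbitrary choices.

```python
import random

def packets_to_reconstruct(w: int, p: float = 0.04, trials: int = 100) -> float:
    """Average number of marked packets needed to observe all w routers of one path."""
    total = 0
    for _ in range(trials):
        seen, count = set(), 0
        while len(seen) < w:
            count += 1
            mark = None
            for router in range(1, w + 1):      # packet traverses routers 1..w in order
                if random.random() < p:         # each router marks with probability p,
                    mark = router               # overwriting any previous mark
            if mark is not None:
                seen.add(mark)
        total += count
    return total / trials

# rough trend over path lengths, as in the experiment described above (w = 1..30)
for w in (1, 5, 10, 20, 30):
    print(w, round(packets_to_reconstruct(w)))
```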
7 Results
The number of attacks in Figures 6 and 7 is constant in the implementation scheme. The dropped packets were detected when the IDS/IPS was activated on the firewalls.
Fig. 6. Unprotected environment (attacks and unknown traffic).
Fig. 7. Protected environment (attacks and unknown/dropped traffic).
The simple authentication scheme developed to transfer a file via UDT, provided by Kerberos through the GSS-API on the UDP socket on which UDT was operating, offered added protection, since the sources at the location of the authenticating party were assumed to be within the protected environment. The trend presented in fig. 7 shows a significant improvement. End-to-end transfer of data is transparent to the UDT application. The available security mechanisms for UDT that require minimal application and program development are feasible and largely applicable to UDT implementations. For simple file transfer, many available mechanisms for UDP and TCP, and existing security protections for applications, are acceptable, e.g., simple authentication. However, for extensive use of UDT, such as SDSS and other large project implementations that require security, UDT needs a security mechanism that is developed and tailored to its behavior and characteristics based on its design. This paper emphasizes the need, just as for the existing mature protocols, for continuing security evaluation: to develop and provide adequate protection, to maintain integrity and confidentiality against various adversaries and unknown attacks, and to minimize dependencies on security solutions applied to other protocols, so as to ensure minimal overhead in data and message transmission streams. A limitation of the simulation and implementation schemes constructed may be the simplicity of the applications developed for the tests. Experiments are difficult to perform on the following mechanisms: HIP, because of the required additional Host Identity layer, and DTLS + CGA, because of a lack of resources. These experiments will be performed in future work.
More extensive development of an application that uses UDT might have yielded more detailed and comprehensive results. Furthermore, the number of false positives and collisions are not considered in the tests. However, the results provide an important indication of how the application which utilizes UDT behaves in such environments.
8 Conclusion and Future Work Protecting UDT can be achieved by introducing approaches related to self-certifying address generation and verification. A technique which can be applied without major modifications in practice is Cryptographically Generated Addresses (CGA). This technique is standardised in a protocol for IPv6. Similarly, HIP solves the problem of address generation in a different way by removing the functionality of IP addresses as both host identifiers and topological locations. However in order to achieve this, a new network layer, called Host Identity, is introduced, which makes HIP incompatible with current network protocols.
Another way of protecting UDT is by using GSS-API in UDT; however, this needs to be thoroughly evaluated by application vendors. The use of the GSS-API interface does not in itself provide an absolute security service or assurance; instead, these attributes are dependent on the underlying mechanism(s) of UDT which support a GSSAPI implementation. In the simulation and implementation schemes, IPsec provides adequate protection on data transfer, and also provides end to end protection on source and destination nodes. In this scheme, the performance of UDT remains the same. More options remain to be explored such as DTLS – Data Transport Layer Security, SASL - Simple Authentication and Security Layer, and their combinations such as SASL/GSS-API for authentication + channel binding to DTLS; DTLS for data integrity/confidentiality protection; SASL/GSS-API for authentication + channel binding to IPsec; IPsec for data integrity/confidentiality protection.
References 1. Al-Shraideh, F.: Host Identity Protocol. In: ICN/ICONS/MCL, p. 203. IEEE Computer Society, Los Alamitos (2006) 2. Andersen, D.G., Balakrishnan, H., Feamster, N., Koponen, T., Moon, D., Shenker, S.: Accountable Internet Protocol (AIP). In: Bahl, V., Wetherall, D., Savage, S., Stoica, I. (eds.) SIGCOMM, pp. 339–350. ACM, New York (2008) 3. Aura, T.: Cryptographically Generated Addresses (CGA). In: Boyd, C., Mao, W. (eds.) ISC 2003. LNCS, vol. 2851, pp. 29–43. Springer, Heidelberg (2003) 4. Aura, T.: Cryptographically Generated Addresses (CGA). RFC 3972, IETF (March 2005) 5. Aura, T., Nagarajan, A., Gurtov, A.: Analysis of the HIP Base Exchange Protocol. In: Boyd, C., González Nieto, J.M. (eds.) ACISP 2005. LNCS, vol. 3574, pp. 481–493. Springer, Heidelberg (2005) 6. Bellovin, S.: Defending Against Sequence Number Attacks. RFC 1948 (1996) 7. Bellovin, S.: Guidelines for Mandating the Use of IPsec. Work in Progress, IETF (October 2003)
8. Bernardo, D.V., Hoang, D.: A Conceptual Approach against Next Generation Security Threats: Securing a High Speed Network Protocol – UDT. In: Proc. IEEE the 2nd ICFN 2010, Shanya China (2010) 9. Bernardo, D.V., Hoang, D.: Security Requirements for UDT. IETF Internet-Draft – working paper (September 2009) 10. Bernardo, D.V., Hoang, D.: Network Security Considerations for a New Generation Protocol UDT. In: Proc. IEEE the 2nd ICCIST Conference 2009, Beijing China (2009) 11. Bernardo, D.V., Hoang, D.: A Security Framework and its Implementation in Fast Data Transfer Next Generation Protocol UDT. Journal of Information Assurance and Security 4(354-360) (2009), ISN 1554-1010 12. Blumenthal, M., Clark, D.: Rethinking the Design of the Internet: End-to-End Argument vs. the Brave New World. Proc. ACM Trans Internet Technology 1 (August 2001) 13. Clark, D., Sollins, L., Wroclwski, J., Katabi, D., Kulik, J., Yang, X.: New Arch: Future Generation Internet Architecture. Technical Report, DoD – ITO (2003) 14. Dierks, T., Allen, C.: The TLS Protocol Version 1.0. RFC 2246 (January 1999) 15. Falby, N., Fulp, J., Clark, P., Cote, R., Irvine, C., Dinolt, G., Levin, T., Rose, M., Shifflett, D.: Information assurance capacity building: A case study. In: Proc. 2004 IEEE Workshop on Information Assurance, U.S. Military Academy, June 2004, pp. 31–36 (2004) 16. Gorodetsky, V., Skormin, V., Popyack, L. (eds.): Information Assurance in Computer Networks: Methods, Models, and Architecture for Network Security, St. Petersburg. Springer, Heidelberg (2001) 17. Gu, Y., Grossman, R.: UDT: UDP-based Data Transfer for High-Speed Wide Area Networks. Computer Networks 51(7) (2007) 18. Hamill, J., Deckro, R., Kloeber, J.: Evaluating information assurance strategies. Decision Support Systems 39(3), 463–484 (2005) 19. H. I. for Information Technology, H. U. of Technology, et al. Infrastructure for HIP (2008) 20. Harrison, D.: RPI NS2 Graphing and Statistics Package, http://networks.ecse.rpi.edu/~harrisod/graph.html 21. Jokela, P., Moskowitz, R., Nikander, P.: Using the Encapsulating Security Payload (ESP) Transport Format with the Host Identity Protocol (HIP). RFC 5202, IETF (April 2008) 22. Joubert, P., King, R., Neves, R., Russinovich, M., Tracey, J.: Highperformance memorybased web servers: Kernel and user-space performance. In: USENIX 2001, Boston, Massachusetts (June 2001) 23. Jray, W.: Generic Security Service API Version 2: C-bindings. RFC 2744 (January 2000) 24. Kent, S., Atkinson, R.: Security Architecture for the Internet Protocol. RFC 2401 (1998) 25. Laganier, J., Eggert, L.: Host Identity Protocol (HIP) Rendezvous Extension. RFC 5204, IETF (April 2008) 26. Laganier, J., Koponen, T., Eggert, L.: Host Identity Protocol (HIP) Registration Extension. RFC 5203, IETF (April 2008) 27. Leon-Garcia, A., Widjaja, I.: Communication Networks. McGraw Hill, New York (2000) 28. Linn, J.: Generic Security Service Application Program Interface Version 2, Update 1. RFC 2743 (January 2000) 29. Linn, J.: The Kerberos Version 5 GSS-API Mechanism. IETF, RFC 1964 (June 1996) 30. Mathis, M., Mahdavi, J., Floyd, S., Romanow, A.: TCP selective acknowledgment options. IETF RFC 2018 (April 1996) 31. Melnikov, A., Zeilenga, K.: Simple Authentication and Security Layer (SASL) IETF. RFC 4422 (June 2006) 32. Menezes, A.J., van Oorschot, P.C., Vanstone, S.A.: Handbook of Applied Cryptography. CRC Press, Boca Raton (1997)
33. Moskowitz, R., Nikander, P.: RFC 4423: Host identity protocol (HIP) architecture (May 2006) 34. Moskowitz, R., Nikander, R., Jokela, P., Henderson, T.: Host Identity Protocol. RFC 5201, IETF (April 2008) 35. Neuman, C., Yu, T., Hartman, S., Raeburn, K.: Kerberos Network Authentication Service (V5), IETF, RFC 1964 (June 1996) 36. NIST SP 800-37. Guide for the Security Certification and Accreditation of Federal Information Systems (May 2004) 37. NS2, http://isi.edu/nsna/ns 38. PSU Evaluation Methods for Internet Security Technology (EMIST) (2004), http://emist.ist.psu.edu (visited December 2009) 39. Rabin, M.: Digitized signatures and public-key functions as intractable as Factorization. MIT/LCS Technical Report, TR-212 (1979) 40. Rescorla, E., Modadugu, N.: Datagram Transport Layer Security. RFC 4347, IETF (April 2006) 41. Rivest, R.L., Shamir, A., Adleman, L.M.: A method for obtaining digital signature and public-keycryptosystems. Communication of ACM 21, 120–126 (1978) 42. Schwartz, M.: Broadband Integrated Networks. Prentice Hall, Englewood Cliffs (1996) 43. Stewart, R. (ed.): Stream Control Transmission Protocol. RFC 4960 (2007) 44. Stiemerling, M., Quittek, J., Eggert, L.: NAT and Firewall Traversal Issues of Host Identity Protocol (HIP) Communication. RFC 5207, IETF (April 2008) 45. Stoica, I., Adkins, D., Zhuang, S., Shenker, S., Surana, S.: Internet Indirection Infrastructure. In: Proc. ACM SIGCOMM 2002 (August 2002) 46. Szalay, A., Gray, J., Thakar, A., Kuntz, P., Malik, T., Raddick, J., Stoughton, C., Vandenberg, J.: The SDSS SkyServer - Public access to the Sloan digital sky server data. ACM SIGMOD (2002) 47. Wang, G., Xia, Y.: An NS2 TCP Evaluation Tool, http://labs.nec.com.cn/tcpeval.html 48. Williams, N.: Clarifications and Extensions to the Generic Security Service Application Program Interface (GSS-API) for the Use of Channel Bindings. RFC 5554 (May 2009) 49. Globus XIO, http://unix.globus.org/toolkit/docs/3.2/xio/index.html (retrieved on November 1, 2009) 50. Zhang, M., Karp, B., Floyd, S., Peterson, L.: RR-TCP: A reordering-robust TCP with DSACK. In: Proc. the Eleventh IEEE International Conference on Networking Protocols (ICNP 2003), Atlanta, GA (November 2003)
A Fuzzy-Based Dynamic Provision Approach for Virtualized Network Intrusion Detection Systems Bo Li1,*, Jianxin Li1, Tianyu Wo1, Xudong Wu1, Junaid Arshad2, and Wantao Liu1 1
School of Computer Science and Engineering, Beihang University, Beijing, China {libo,lijx,woty,wuxudong,liuwt}@act.buaa.edu.cn 2 School of Computing, University of Leeds, Leeds UK LS2 9JT [email protected]
Abstract. With the increasing prevalence of virtualization and cloud technologies, virtual security appliances have emerged and become a new way for traditional security appliances to be rapidly distributed and deployed in IT infrastructure. However, virtual security appliances are challenged with achieving optimal performance, since the physical resources are shared by several virtual machines, and this issue is aggravated when virtualizing network intrusion detection systems (NIDS). In this paper, we propose a novel approach named fuzzyVIDS, which enables dynamic resource provision for an NIDS virtual appliance. In fuzzyVIDS, we use a fuzzy model to characterize the complex relationship between performance and resource demands, and we develop an online fuzzy controller to adaptively control the resource allocation for the NIDS under varying network traffic. Our approach has been successfully implemented in the iVIC platform. Finally, we evaluate our approach with comprehensive experiments based on the Xen hypervisor and the Snort NIDS, and the results show that the proposed fuzzy control system can precisely allocate resources to the NIDS according to its resource demands, while still satisfying the performance requirements of the NIDS.
Keywords: Network intrusion detection systems, fuzzy control, virtualization, dynamic provision.
Xen and Virtual PC. Virtual security appliances allow users to consolidate and manage security and networking products in a virtualized way which can, therefore, greatly reduce hardware costs and simplify IT management. Virtual security appliances are challenged with achieving optimal performance due to the fact that, in virtualized environments, physical servers are shared by all the virtual machines running on them whereas those resources are dedicated for physical security appliances. Furthermore, workloads are often consolidated in virtualized environments for the efficient use of server resources. However, server sharing will result in resource competition between virtual security appliance and the workload VMs running on the same physical server. The issues described above are aggravated when virtualizing network intrusion detection and prevention systems due to multiple factors. Firstly, a network intrusion detection system (NIDS) often has a sensor that is required to analyze network packets at or near wire speed. However, a failure in this respect can lead to intrusion or malicious behaviors going undetected. Secondly, in order to comprehensively examine network traffic, NIDS need to analyze entire packet and also the payload, thereby consuming large amount of CPU cycles and memory pages. Finally, NIDSs often face varying network traffic; therefore, the resource consumption will fluctuate accordingly. If NIDS is virtualized, the performance of the other virtual servers which share the same physical resource with the NIDS virtual appliance will be affected. Moreover, resource competition will also degrade the performance of NIDS and affect its detection accuracy. To guarantee the performance of virtualized NIDS, one common approach is allocating enough resources to IDS VM according to its maximum resource demand. However, the resource consumption of NIDSs varies with varying network traffic, which leads to resource idleness and waste thereby violating the objective of server consolidation. This, therefore, reflects a tradeoff between resource utilization and accuracy of security appliances in general and NIDS in particular. The emphasis of this paper is to present our efforts to achieve an effective tradeoff with the objective to preserve the performance of security appliances such as NIDS. To improve resource efficiency without sacrificing the performance of NIDS, one way can be to establish a mathematical model to characterize the relationship between workloads1 and resource requirement of NIDS virtual appliance. Consequently, resource allocation can be done dynamically to allocate appropriate resources for NIDS to match the varying network traffic in a real-time manner. Unfortunately, the complex nature of NIDS poses great challenges to accomplish this objective. Firstly, the detection mechanism of NIDS is complex. It involves analyzing every packet based on a number of matching rules which often requires varying processing times. Secondly, a NIDS’s resource usage also depends on the characteristics of network traffic it analyzes. For example, the CPU cycles consumed for inspecting UDP and TCP packets are different; traffic containing more malicious packets will require more processing times than normal traffic for performing alerting and logging. Finally, for a NIDS virtual appliance, the resource usage includes resource consumption by both the NIDS application and the virtual machine hosting the NIDS application. 
For example, when handling network traffic, the frequent network I/O operations of the NIDS VA also consume a lot of CPU cycles. Therefore, it is a nontrivial task to estimate the overall resource consumption of an NIDS appliance. To address the above issues, we propose a dynamic provision approach called fuzzyVIDS based on fuzzy control theory, which can continuously control the resource allocation of an NIDS virtual appliance to cope with varying network traffic while still satisfying the performance requirements of the NIDS. The major contributions are summarized as follows:
– We design a fuzzy model to characterize the relationship between workloads, resource demands and performance of NIDS.
– We develop an online feedback-driven control system to accurately adjust the resource allocation in real time while still fulfilling the NIDS's performance requirements.
– A prototype of fuzzyVIDS is implemented in our iVIC platform. We leverage the run-time provisioning functions provided by Xen to control the resource allocation for the NIDS virtual appliance. The Snort [2] NIDS is used to evaluate the effectiveness of our approach. Experimental results show that our feedback fuzzy control approach can effectively allocate resources to the NIDS virtual appliance under time-varying network traffic while still satisfying the performance requirements of the NIDS.
The remainder of this paper is organized as follows. Section 2 introduces basic concepts and the problem statement, and Section 3 presents our fuzzy controller for adaptive resource allocation. We introduce the design and implementation of fuzzyVIDS in Section 4. The performance evaluation is given and analyzed in Section 5. We discuss related work on resource allocation for NIDS virtual appliances in Section 6. Finally, we conclude the paper in Section 7.
2 Problem Statement
2.1 Terminology and Assumption
In a virtualized environment, the NIDS application is encapsulated in a dedicated virtual machine and deployed on a physical server. A physical server runs a virtual machine monitor (VMM) and hosts one NIDS virtual appliance and many virtual servers which host applications. The NIDS virtual appliance is responsible for monitoring the incoming and outgoing network traffic of the virtual servers. To differentiate, we use the term NIDS VA for the virtual appliance, which contains both the NIDS application and the underlying operating system, and NIDS APP to refer to the NIDS application itself. For simplicity, we only consider CPU and memory resources. In this paper, we focus on investigating how resource provision affects the performance of the NIDS. With respect to CPU and memory allocation, the packet drop rate is the most relevant performance metric, so we choose it as the performance indicator. In the rest of the paper, packet drop rate and performance of the NIDS are used interchangeably. We use DR to represent the packet drop rate. Since no NIDS can guarantee zero packet loss, we define a target drop rate TDR. If DR <= TDR, we consider the performance of the NIDS to be satisfactory.
2.2 Problem Definition
If the resource allocation is too small, the NIDS will experience performance degradation and allow malicious packets to enter the network undetected. However, if the NIDS is over-provisioned, this leads to poor resource utilization and resource waste. There is a trade-off between performance and resource requirements, especially in shared virtual environments in which the NIDS shares physical resources with other virtual servers. We can therefore conclude that the key problem is, given a network traffic rate, to determine the appropriate resources that need to be allocated to the NIDS VA to satisfy the performance requirements of the NIDS APP. Fig. 1 shows the model of the NIDS virtual appliance.
Fig. 1. A Model of NIDS Virtual Appliance
CPU allocation, memory allocation and network traffic affect the performance of the NIDS to varying degrees. For example, CPU or memory overload directly leads to packet drops, and the characteristics of the network traffic that the NIDS analyzes influence its resource usage and therefore affect the drop rate indirectly. However, it is very hard to construct a precise mathematical model that characterizes the correlation between performance and resource consumption, especially when handling varying network traffic.
3 Fuzzy Controller
Instead of designing a mathematical model to characterize the correlation among network traffic load, resource requirements and performance of the NIDS virtual appliance, we resort to feedback control techniques to adaptively allocate resources to meet its performance requirements. We have considered traditional feedback control systems, most of which require specifying a mathematical model in advance. However, the NIDS virtual appliance is too complex to be represented by such a model. To solve this problem, we use fuzzy models to characterize the complex relationship between performance and resource demands. We design fuzzy logic-based controllers which control the resource allocation based on a set of linguistic IF-THEN rules, so no explicit mathematical model is necessary. Our fuzzy controller has three main objectives: (1) guarantee the performance of the NIDS virtual appliance; (2) do not impact the performance of applications in other virtual servers (performance isolation); (3) maintain high resource utilization.
3.1 Background
A fuzzy control system is a control system based on fuzzy logic [13]. Fuzzy logic is effective in dealing with uncertain, imprecise, or qualitative decision-making problems, so fuzzy control systems are often used to handle control problems that are too complex to be modeled by conventional mathematical methods.
In a fuzzy control system, a core concept is the fuzzy set, whose elements have degrees of membership. The input variables of a fuzzy control system are mapped into fuzzy sets by membership functions; this process is called fuzzification. After fuzzification, the controller decides what action to take based on a set of linguistic if-then rules; this process is called inference. Finally, the control actions are translated into a quantifiable output. Since this process is the inverse transformation of fuzzification, it is called defuzzification.
3.2 Fuzzy Modeling
As shown in Fig. 2, we use a fuzzy controller to control the NIDS virtual appliance. One input of the NIDS is the network traffic it analyzes; this input is not directly under our control, but it has an impact on the system output. The other input is the resource allocation of the NIDS virtual appliance. The output is the drop rate of the NIDS, and it is fed back to the fuzzy controller. The controller first computes the deviation of the current drop rate from the target value, then takes the deviation as input and produces a control action specifying how much resource will be allocated. The control action is executed by the VMM to adjust the actual resource allocation of the NIDS virtual appliance. As Fig. 2 shows, the whole process is a feedback control loop, and the control goal is to continuously adjust the resource allocation so that the drop rate approximates the target as closely as possible.
Fig. 2. Fuzzy controller for dynamic resource allocation
The key component in the control loop is the fuzzy controller. To design a fuzzy controller, we need to determine six things:
– State variable and control variable. The state variable is the input of the controller; the control variable is the output of the controller, and its value will be applied to the NIDS virtual appliance. In our model, the state variable is the drop rate deviation, i.e., the difference between the current drop rate and the target value; the control variable is the amount of resource to be allocated to the NIDS.
– Fuzzy sets. The input and output are normalized on the interval [-1, +1]. As shown in Fig. 3, the space of input and output is partitioned into seven regions. Each region is associated with a linguistic term. The membership function of the fuzzy sets is triangular.
– Fuzzification. We choose a singleton fuzzifier, which measures the state variables without uncertainty.
– Fuzzy rules. We have conducted experiments in which the resource allocation was adjusted manually, and collected knowledge such as: "IF deviation IS small negative THEN cpu_change IS small negative".
– Inference method. We choose the Mamdani inference method for its simplicity.
– Defuzzification. The center of gravity method is selected for defuzzification.
Fig. 3. Input and output regions
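To make the above design choices concrete, the following C sketch (an illustration only, not the fuzzyVIDS implementation; the region layout, the mirrored rule table and the sampling step of the output universe are assumptions) shows how a single-input controller of this kind can be realized: the normalized drop-rate deviation is fuzzified with triangular membership functions over seven regions, Mamdani (min-max) inference fires the rules, and the center-of-gravity method yields the normalized resource change.

    #include <stdio.h>

    #define NREG 7                      /* seven linguistic regions on [-1, +1]  */
    static const double centers[NREG] = /* peaks of the triangular fuzzy sets    */
        { -1.0, -2.0/3, -1.0/3, 0.0, 1.0/3, 2.0/3, 1.0 };

    /* Triangular membership degree of x in region r. */
    static double mu(int r, double x)
    {
        double left  = (r == 0)        ? centers[0]        : centers[r - 1];
        double right = (r == NREG - 1) ? centers[NREG - 1] : centers[r + 1];
        if (x <= left || x >= right)
            return (x == centers[r]) ? 1.0 : 0.0;
        if (x <= centers[r])
            return (x - left) / (centers[r] - left);
        return (right - x) / (right - centers[r]);
    }

    /* One-dimensional rule base: input region r fires output region rule[r],
     * mirroring rules such as "IF deviation IS small negative THEN cpu_change
     * IS small negative" (the full rule table is an assumption).              */
    static const int rule[NREG] = { 0, 1, 2, 3, 4, 5, 6 };

    /* Singleton fuzzification of the crisp deviation, Mamdani (min) inference,
     * max aggregation, and center-of-gravity defuzzification.                 */
    static double fuzzy_control(double deviation)
    {
        double num = 0.0, den = 0.0;
        for (double y = -1.0; y <= 1.0; y += 0.01) {   /* sample output universe */
            double agg = 0.0;
            for (int r = 0; r < NREG; r++) {
                double fire = mu(r, deviation);        /* rule firing strength   */
                double out  = mu(rule[r], y);
                double clip = fire < out ? fire : out; /* min (Mamdani)          */
                if (clip > agg) agg = clip;            /* max aggregation        */
            }
            num += y * agg;
            den += agg;
        }
        return den > 0.0 ? num / den : 0.0;            /* crisp change in [-1,1] */
    }

    int main(void)
    {
        /* Positive deviation: drop rate above the target, so more resource. */
        for (double e = -1.0; e <= 1.0; e += 0.5)
            printf("deviation %+.2f -> normalized resource change %+.3f\n",
                   e, fuzzy_control(e));
        return 0;
    }

In fuzzyVIDS the crisp output would then be scaled to an actual CPU or memory adjustment before being handed to the allocation actuator.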
4 System Design and Implementation
Before we introduce the design of our fuzzy control system, we first describe how the NIDS virtual appliance is deployed in the virtual environment.
4.1 The Deployment of NIDS Virtual Appliance
There are two typical deployment methods for an NIDS virtual appliance: in-line protection and off-line protection (also known as port mirroring). With the in-line protection method, the NIDS virtual appliance also acts as a router and thus incurs additional overhead, so we choose the port mirroring method in our implementation. As illustrated in Fig. 4, the virtual servers, Peth0 and the vif-n interface of the NIDS virtual appliance are connected to a virtual switch. It should be noted that multiple virtual switches can coexist in one VMM. Peth0 is a physical network interface connected to the physical network; any packet the virtual servers send to or receive from the outside world goes through Peth0.
Fig. 4. The deployment of NIDS in virtual environments
The NIDS virtual appliance is responsible for monitoring the incoming and outgoing network traffic of the virtual servers. It has two virtual network interfaces, vif-n and vif-c.
Vif-n is connected to the mirror port of the virtual switch and allows us to select, through the virtual switch control tools, which virtual servers are monitored by the NIDS. The incoming and outgoing packets of the monitored virtual servers are duplicated and forwarded to the NIDS virtual appliance. Once the NIDS application in the virtual appliance is started, vif-n is set to promiscuous mode so that all packets arriving at vif-n can be captured. Vif-c is used exclusively by the NIDS virtual appliance to communicate with the outside world, for example to transmit alert information to a database or to communicate with the IDS console. To guarantee that the rest of the system is not disturbed, a physical network interface Peth1 is dedicated to connecting vif-c with the physical network.
4.2 System Architecture
In this section, we propose the fuzzyVIDS framework for dynamically adapting the resource allocation of the NIDS virtual appliance. The system architecture is illustrated in Fig. 5.
Fig. 5. FuzzyVIDS Architecture
We have chosen Xen as the hypervisor. In Xen, domain 0 is a special privileged domain which serves as an administrative interface to Xen. As shown in Fig. 5, there are three modules in domain 0. The performance sensor module periodically collects drop rate statistics from the performance monitor plug-in of the NIDS APP and sends the drop rate data to the fuzzy controller module. The fuzzy controller consists of four components. First, the fuzzifier maps the drop rate data into fuzzy values using the given membership functions. Then, the inference engine makes decisions from the fuzzy values and produces output actions according to the fuzzy rules stored in the rule base; these output actions are also fuzzy values. Last, the defuzzifier combines the output values into a crisp value, the actual resource allocation for the NIDS virtual appliance. The allocation actuator executes the resource allocation actions made by the fuzzy controller through the Xen interface. It is worth noting that the performance sensor and the allocation actuator communicate with the fuzzy controller over TCP, which means the fuzzy controller can be located anywhere as long as it has a network connection to the other modules, thus achieving good scalability.
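The paper does not specify the wire format used between the performance sensor and the fuzzy controller; assuming, purely for illustration, one ASCII line of the form "droprate <value>" per sampling period and a hypothetical controller address, a minimal sensor-side TCP client in C could look as follows.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    /* Hypothetical controller endpoint; the real address, port and message
     * format are not given in the paper. */
    #define CTRL_IP   "192.168.1.10"
    #define CTRL_PORT 9000

    static double read_drop_rate(void)
    {
        /* Placeholder: in fuzzyVIDS this value comes from the Snort
         * performance-monitor plug-in via the XenStore channel. */
        return 0.02;
    }

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket"); return 1; }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(CTRL_PORT);
        inet_pton(AF_INET, CTRL_IP, &addr.sin_addr);

        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect"); return 1;
        }

        char msg[64];
        for (;;) {                                     /* one sample per period */
            snprintf(msg, sizeof(msg), "droprate %.4f\n", read_drop_rate());
            if (send(fd, msg, strlen(msg), 0) < 0) { perror("send"); break; }
            sleep(1);                                  /* sampling period (assumed) */
        }
        close(fd);
        return 0;
    }

Keeping the exchange this simple is consistent with the stated design goal that the controller may run on any host reachable over the network.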
The enforcement of the new resource allocation by the allocation actuator will affect the performance of the NIDS APP inside the NIDS VA, and the performance monitor component of the NIDS APP will record the drop rate data and send them to the performance sensor through the XenStore channel. At this point, a feedback control loop is complete.
4.3 Implementation Details
iVIC2 is a virtual computing environment developed for HaaS and SaaS applications. iVIC enables users to dynamically create and manage various kinds of virtual resources such as virtual machines, virtual clusters and virtual networks. It can also deliver on-demand streaming applications to a client in real time without on-premise installation. The NIDS virtual appliance has been employed by iVIC for virtual network monitoring to reduce the risk of virtual machine intrusion and infection. We implemented the fuzzyVIDS framework in the iVIC system to adaptively provision resources for the NIDS virtual appliance. Currently, fuzzyVIDS runs on the Xen hypervisor, and we choose Debian lenny as the operating system for dom0, the NIDS virtual appliance and the virtual servers. We use the Snort NIDS in fuzzyVIDS, since it is the most widely used open source network intrusion prevention and detection system.
4.3.1 Virtual Switch and Port Mirroring
Linux Bridge works at Layer 2 in a manner similar to a physical switch, so we choose Linux Bridge as the virtual switch in our implementation. We modified Linux Bridge to support port mirroring. We add a flag to the net_bridge_port struct to indicate whether the traffic traversing that port should be duplicated and forwarded, and a pointer to the net_bridge struct which points to the bridge port acting as the mirror port. We add the following code to duplicate and forward the packets:

    struct sk_buff *copy_skb;
    /* duplicate the frame to the mirror port (br->dst_port) when the ingress
       port p or the destination port dst->dst is marked for mirroring, and
       the mirror port is not the ingress port itself */
    if (br->dst_port && (p->copy_flag || dst->dst->copy_flag))
        if (br->dst_port != p) {
            copy_skb = skb_copy(skb, GFP_ATOMIC);
            br_forward(br->dst_port, copy_skb);
        }

4.3.2 The Implementation of the Fuzzy Control System
The performance sensor is implemented as a Linux daemon. Besides periodically collecting the drop rate data, it also gathers some real-time performance statistics of the NIDS virtual appliance through xentop, such as CPU usage and network statistics. The fuzzy controller is implemented in pure C for performance. We employ a triangular membership function to map the input into fuzzy values. The inference engine is implemented using the Mamdani inference method. The fuzzy rules, together with the state variable, the control variable and the fuzzy sets, are stored in a text file and loaded into the inference engine when the fuzzy controller is started.
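The syntax of this rule file is not given in the paper; a hypothetical fragment using the linguistic terms quoted earlier (NS = small negative, NM = middle negative, ZE = zero, PS/PM = small/middle positive) might look like this:

    # state variable: deviation, control variable: cpu_change (illustrative syntax)
    IF deviation IS NM THEN cpu_change IS NM
    IF deviation IS NS THEN cpu_change IS NS
    IF deviation IS ZE THEN cpu_change IS ZE
    IF deviation IS PS THEN cpu_change IS PS
    IF deviation IS PM THEN cpu_change IS PM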
2 http://www.ivic.org.cn
The allocation actuator is responsible for enforcing the CPU and memory adjustment decisions. For CPU scheduling, we choose the Xen credit scheduler in non-work-conserving mode, which provides strong performance isolation. The Xen balloon driver is used for run-time adjustment of the memory allocation. We modify the performance monitor plug-in of Snort to transmit the real-time performance data out of the virtual appliance. Using the network for this data transmission is undesirable, since it would disturb the detection performed by the NIDS. Snort currently supports two real-time performance data output methods: console and file. We add a third method, xen-channel, which leverages the XenStore mechanism to exchange information between domain 0 and the NIDS virtual appliance without involving the network.
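The paper does not list the exact commands issued by its actuator; as a rough sketch under the assumption that the standard Xen 3.x xm toolstack is available in domain 0, the two adjustments described above could be driven as follows (the domain name "nids-va" is a placeholder):

    #include <stdio.h>
    #include <stdlib.h>

    /* Apply a CPU cap (percent of one CPU) to a domain via the Xen credit
     * scheduler in non-work-conserving mode. */
    static int set_cpu_cap(const char *domain, int cap_percent)
    {
        char cmd[128];
        snprintf(cmd, sizeof(cmd), "xm sched-credit -d %s -c %d",
                 domain, cap_percent);
        return system(cmd);
    }

    /* Set the memory target of a domain (in MB) through the balloon driver. */
    static int set_memory(const char *domain, int mem_mb)
    {
        char cmd[128];
        snprintf(cmd, sizeof(cmd), "xm mem-set %s %d", domain, mem_mb);
        return system(cmd);
    }

    int main(void)
    {
        if (set_cpu_cap("nids-va", 60) != 0) fprintf(stderr, "cpu cap failed\n");
        if (set_memory("nids-va", 256) != 0) fprintf(stderr, "mem-set failed\n");
        return 0;
    }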
5 Experimental Evaluation
5.1 Experimental Environment Setup
We conduct a series of experiments to evaluate the effectiveness of our approach. The experimental environment consists of one physical server and several workload-generating clients, all interconnected with Gigabit Ethernet. Xen 3.2 is installed on the physical server, which has Intel Core2 Duo CPUs and 4 GB RAM. We prepare an NIDS virtual machine image which encapsulates our modified version of Snort 2.7 and mounts a 2 GB disk image as the primary partition and a 1 GB image as the swap partition.
NIDS Workloads Generation. We collect network traffic traces to test the performance of Snort. Tcpdump is used to capture and save the network packets into a .pcap trace file. Tcpreplay is used to resend the captured packets from the trace file, and it also allows us to control the speed at which the traffic is replayed. To impose various loads on Snort, we collect various kinds of network traffic traces. For example, we capture normal network traffic traversing the gateway of our lab; we also use tools such as nessus, nmap and snot to generate malicious packets and then capture them using tcpdump.
5.2 Experiment Results
This section summarizes the experimental evaluation of the suitability of the proposed fuzzy control system for dynamically allocating resources to the NIDS virtual appliance under time-varying workloads.
Experiment Group 1. Before testing the performance of our fuzzy controller, we first evaluate how resource allocation affects the drop rate of Snort. We set a very low CPU allocation of 10% for the NIDS VA, which means the VM cannot use more than 10% of the CPU time even if there are idle CPU cycles, and we allocate 192 MB of memory to the NIDS VA. Tcpreplay is used to send 100,000 packets at a speed of 50 Mbit/s from a load-generating client. First, we investigate the CPU allocation. We expect the drop rate to be high, since the CPU allocation is very low, but the result is totally beyond our expectation: the drop rate reported by Snort is only 3.5%. However, we notice that the number of packets Snort captured is 42,378, which is far fewer than the number the client has sent.
At first we thought the drop rate data produced by Snort was wrong, but by monitoring the Tx and Rx counters of the NIDS VA reported by the proc file system, we noticed that the number of received packets is consistent with the number of packets Snort captured. We also observed the number of packets that arrived at the bridge port connected with Peth0, and it is almost the same as the number of packets the client has sent. That is to say, some packets arrived at the bridge but did not reach the kernel of the NIDS VA. We gradually increase the CPU allocation, and the percentage of packets dropped by the NIDS VA decreases accordingly. As shown in Table 1, when the CPU allocation reaches 60%, all packets are received by the NIDS VA. Therefore, we can conclude that if the CPU allocation is inadequate, the NIDS VA itself will also drop packets. A strange phenomenon is that Snort gains relatively more CPU cycles than the rest of the NIDS VA (Snort's drop rate is relatively low). Generally speaking, the operations in the Linux networking stack are kernel-mode operations and cannot be preempted by a user-mode application such as Snort, so we had expected the Snort process to be starved. This phenomenon is probably related to the scheduling strategies of the Xen scheduler and the Linux networking subsystem. We also observed that when the CPU allocation is increased, the drop rate reported by Snort changes as well. The actual drop rate consists of two parts: the drop rate Snort reports and the drop rate of the NIDS virtual appliance. In the following experiments, we calculate the drop rate by summing the two parts. We notice that when the CPU allocation is 100%, Snort's drop rate is 1.5%. Recall that in the above experiment the memory allocation is 192 MB; we therefore increase the memory allocation to 256 MB and repeat the experiment. The results show that Snort's drop rate decreases, and for 80% and 100% CPU allocation in particular, Snort's drop rate is nearly 0%.

Table 1. Drop rate for NIDS VA and Snort under different CPU allocations

  CPU alloc            10%     20%     30%     40%     50%    60%    80%    100%
  Dropped by NIDS VA   58.6%   46.8%   20.9%   11.3%   4.4%   0.0%   0.0%   0.0%
  Dropped by Snort     5.5%    5.0%    7.1%    4.1%    3.9%   3.2%   1.7%   1.5%
Experiment Group 2. From the first experiment we can see that the performance of the NIDS VA can be improved by adjusting the CPU and memory allocation. In this experiment group, we evaluate the effectiveness and performance of fuzzyVIDS for adaptive CPU allocation. To simulate a resource competition situation, a virtual server runs on the same physical machine as the NIDS VA and CPUburn runs inside it to consume as much CPU as possible. As shown in Fig. 6, to simulate time-varying workloads we change the packet sending speed every 10 seconds. Fig. 7 shows the actual CPU allocation produced by the fuzzy controller when handling the varying network traffic. We set three target drop rates (TDR) for the fuzzy controller, 1%, 2% and 3%, and examine the difference in CPU allocation for the three TDRs. First, we can see from Fig. 7 that all of them achieve adaptive CPU allocation that keeps up with the time-varying workloads. For the 3% TDR, the CPU allocation is smaller than the allocation for the 1% and 2% TDRs at almost any time, and it saves about 7% CPU on average compared with the 1% TDR.
Fig. 6. Time-varying workloads
Fig. 7. CPU allocation under different drop rate targets
Fig. 8. Transient and accumulated drop rate for 2% drop rate target
Fig. 9. Accumulated drop rate under different drop rate targets
For the 1% TDR, the latter part of the curve exhibits more jitter and declines more slowly compared with the 2% and 3% TDRs. This is because there are sudden bursts in the transient drop rate, which have a more significant impact for a smaller TDR. For example, when encountering a sudden burst of 8%, the deviation for the 1% TDR is 7%, while for the 3% TDR the deviation is 5%, so the controller will allocate more CPU for the 1% TDR than for the 3% TDR. We can also infer this from the following fuzzy rule segments:
IF deviation IS small negative THEN cpu_change IS small negative
IF deviation IS middle negative THEN cpu_change IS middle negative
Fig. 8 shows the transient and accumulated drop rate for the 2% TDR. We can see that the transient drop rate fluctuates around the TDR, while the accumulated drop rate gradually converges to the TDR. We can also see some transient spikes in the drop rate; for example, at the 105th second the drop rate is almost 6%. Most of the transient spikes are anomalous and should be filtered out. We set a threshold for transient spikes: only if the current drop rate exceeds the threshold for two successive observation points is it fed back to the fuzzy controller. Fig. 9 shows the variation of the accumulated drop rates. For the 1%, 2% and 3% TDRs, the accumulated drop rates converge approximately to their respective TDRs. Combined with the results shown in Fig. 7, we can see that there is a balance between CPU allocation and the performance of the NIDS VA, and our fuzzy controller can dynamically control the CPU provision for the NIDS VA to maintain the drop rate at a given target value.
Experiment Group 3. To evaluate the effectiveness of our approach for memory adjustment, we first set the initial memory size of the NIDS virtual appliance to 128 MB and observe that the used swap size reaches 69,428 KB after Snort has started, which indicates that memory is under pressure. We use Tcpreplay to generate network traffic at a speed of 200 Mbit/s to stress Snort, and the target drop rate is set to 0% - a very stringent requirement. Two methods can be used to adjust a VM's memory size in Xen 3.2: "xm mem-set dom_id mem_size" in dom0 and "echo -n 'mem_size' > /proc/xen/balloon" in domU. We choose the latter, since it can adjust memory at the granularity of KB.
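A minimal sketch of the second method as run inside the NIDS VA, following the command quoted above (it assumes the guest kernel exposes /proc/xen/balloon, as the paravirtualized Xen 3.2 kernels used here do); the value written is the new memory target in KB:

    #include <stdio.h>

    /* Request a new memory target (in KB) from inside the guest by writing to
     * the balloon driver's proc interface, equivalent to
     * "echo -n '<mem_size>' > /proc/xen/balloon".                             */
    static int balloon_set_target_kb(unsigned long target_kb)
    {
        FILE *f = fopen("/proc/xen/balloon", "w");
        if (!f) { perror("/proc/xen/balloon"); return -1; }
        fprintf(f, "%lu", target_kb);   /* no trailing newline, as with echo -n */
        fclose(f);
        return 0;
    }

    int main(void)
    {
        return balloon_set_target_kb(262144);   /* e.g. 256 MB */
    }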
Fig. 10. Memory adjustment for NIDS VA
Fig. 11. Transient drop rate of Snort
As shown in Fig. 11, Snort experiences a severe performance bottleneck at the beginning of the experiment due to the extreme shortage of memory, and its drop rate reaches 82.6%. From Fig. 10 we can see that the fuzzy controller allocates about 40 MB of memory in three consecutive time intervals and greatly relieves Snort from the performance bottleneck. At the sixth second, however, the drop rate reaches 20%; this is because the newly allocated memory has been exhausted and the performance of Snort degrades again. The fuzzy controller continuously adjusts the memory allocation to satisfy the performance of Snort based on the drop rate it observes, and after the 31st second the drop rate stays at almost zero. One may notice that in this experiment the memory allocation increases monotonically. This is because we found that in Xen 3.2 the memory allocation of a VM cannot be decreased below 238,592 KB. For example, if the current memory size is 512 MB and we try to adjust it to 128 MB through "xm mem-set", the actual memory size can only be shrunk to 238,592 KB. This also means that once allocated, the memory is hard to reclaim. To avoid resource over-provisioning, we modified the fuzzy sets and rules to enable much finer tuning when the drop rate is relatively low. The experimental results show that the memory allocation given by the controller gradually approaches an appropriate value based on the observed drop rate.
6 Related Work
Recently, virtual security appliances have shown great market potential in the virtualization and security appliance markets. A recent report [3] from IDC pointed out that "virtual security appliances deliver a strong left hook to traditional security appliances" and "radically change the appliance market". In the academic world, researchers have adopted virtual machine technology to enhance intrusion detection systems. Livewire [4] leverages virtual machine technology to isolate the IDS from the monitored host, while still enabling the IDS VM to monitor the internal state of the host through VM introspection techniques. Joshi et al. used vulnerability-specific predicates to detect past and present intrusions [5]: when a vulnerability is discovered, predicates are determined and used as signatures to detect future attacks. Hyperspector [6] is a virtual distributed monitoring environment for intrusion detection which provides three inter-VM monitoring mechanisms to enable the IDS VM to monitor the server VMs. In most of the above systems, the IDS VM shares the physical resources with the host and the other VMs on the same machine. Sharing brings resource contention and impacts the performance of the IDS, but none of these systems considered the performance issues. Many research works focus on the performance issues of NIDS. Several proposed NIDSes have been tested with respect to their performance [7][8], but the approaches described in these papers are only used for performance analysis and evaluation and do not consider the relationship between performance and resource usage. Lee et al. [9, 10] propose dynamic adaptation approaches which change the NIDS's configuration according to the current workload. Dreger et al. proposed an NIDS resource model to capture the relationship between resource usage and network traffic, and use this model to predict the resource demands of different NIDS configurations [11]. Both of these works focus on NIDS configuration adaptation, and the implementation of the adaptation capability depends to some extent on the implementation details of the NIDS: the mechanism implemented in one NIDS may not fit others. For example, [11] assumes that the NIDS can be structured as a set of subcomponents that work independently; this is a strong assumption, since we cannot force all NIDSes to be implemented in the same way. By contrast, fuzzyVIDS leverages a feedback fuzzy control mechanism to dynamically provision resources to the NIDS application to fulfill its performance requirements, without needing a model to estimate its resource usage. Xu et al. presented a two-layered approach to manage resource allocation to virtual containers sharing a server pool in a data center [12]. Their local controller also uses fuzzy logic, but there fuzzy logic is used to learn the behavior of the virtual container, not for online feedback control, and the proposed method is essentially concerned with server applications, not with NIDS. To our knowledge, fuzzyVIDS is the first system that leverages a feedback fuzzy control mechanism to achieve adaptive resource provision for NIDS.
7 Conclusion
In this paper, we proposed fuzzyVIDS, a dynamic resource provision system for NIDS virtual appliances. We use fuzzy models to characterize the complex relationship between performance and resource demands, in order to compensate for the lack of a mathematical model for the NIDS virtual appliance.
An online fuzzy controller has been developed to adaptively control the resource allocation for the NIDS under varying network traffic based on a set of linguistic rules. We have implemented our approach on the Xen VMM, and our experience shows that it is a viable solution for the dynamic resource provision of NIDS virtual appliances. There is still much work to be done. In this paper, we designed the rules of our fuzzy controller manually; in the future we would like to use learning methods to learn the behavior of the NIDS virtual appliance under varying network traffic and generate the fuzzy rules automatically. Furthermore, we are planning to collect real production traces of the iVIC system to perform a detailed performance analysis of our fuzzy control system.
References
1. Virtual security appliance, http://en.wikipedia.org/wiki/Virtual_security_appliance
2. Snort: An open-source network intrusion prevention and detection system by Sourcefire, http://www.snort.org/
3. Virtual Security Appliance Survey: What's Really Going On?, http://www.idc.com/getdoc.jsp?containerId=220767
4. Garfinkel, T., Rosenblum, M.: A Virtual Machine Introspection Based Architecture for Intrusion Detection. In: Proceedings of the 10th Annual Network and Distributed System Security Symposium (February 2003)
5. Joshi, A., King, S.T., Dunlap, G.W., Chen, P.M.: Detecting Past and Present Intrusions through Vulnerability-specific Predicates. In: Proceedings of the 2005 SOSP (October 2005)
6. Kourai, K., Chiba, S.: Hyperspector: Virtual Distributed Monitoring Environments for Secure Intrusion Detection. In: Proceedings of the 1st ACM/USENIX International Conference on Virtual Execution Environments (2005)
7. Paxson, V.: Bro: A System for Detecting Network Intruders in Real-Time. Computer Networks 31(23-24), 2435–2463 (1999)
8. Kruegel, C., Valeur, F., Vigna, G., Kemmerer, R.: Stateful Intrusion Detection for High-Speed Networks. In: Proceedings of the IEEE Symposium on Security and Privacy. IEEE Computer Society Press, Calif. (2002)
9. Lee, W., Cabrera, J.B., Thomas, A., Balwalli, N., Saluja, S., Zhang, Y.: Performance Adaptation in Real-Time Intrusion Detection Systems. In: Wespi, A., Vigna, G., Deri, L. (eds.) RAID 2002. LNCS, vol. 2516, p. 252. Springer, Heidelberg (2002)
10. Lee, W., Fan, W., Miller, M., Stolfo, S.J., Zadok, E.: Toward Cost-sensitive Modeling for Intrusion Detection and Response. Journal of Computer Security 10(1-2), 5–22 (2002)
11. Dreger, H., Feldmann, A., Paxson, V., Sommer, R.: Predicting the Resource Consumption of Network Intrusion Detection Systems. In: Lippmann, R., Kirda, E., Trachtenberg, A. (eds.) RAID 2008. LNCS, vol. 5230, pp. 135–154. Springer, Heidelberg (2008)
12. Xu, J., Zhao, M., Fortes, J., Carpenter, R., Yousif, M.: Autonomic Resource Management in Virtualized Data Centers Using Fuzzy Logic-based Approaches. Cluster Comput. J. 11, 213–227 (2008)
13. Jantzen, J.: Foundations of Fuzzy Control. John Wiley & Sons, Chichester (2007)
An Active Intrusion Detection System for LAN Specific Attacks Neminath Hubballi, S. Roopa, Ritesh Ratti, F.A. Barbhuiya, Santosh Biswas, Arijit Sur, Sukumar Nandi , and Vivek Ramachandran Department of Computer Science and Engineering Indian Institute of Technology Guwahati, India - 781039 {neminath,roopa.s,.ratti,ferdous,santosh biswas,arijit, sukumar}@iitg.ernet.in, [email protected] http://www.iitg.ernet.in
Abstract. Local Area Network (LAN) based attacks are due to compromised hosts in the network and mainly involve spoofing with falsified IP-MAC pairs. Such attacks are possible because the Address Resolution Protocol (ARP) is a stateless protocol. Several schemes have been proposed in the literature to circumvent these attacks; however, these techniques either make the IP-MAC pairing static, modify the existing ARP, or require patching the operating systems of all the hosts, etc. In this paper we propose an Intrusion Detection System (IDS) for LAN specific attacks without any extra constraint such as static IP-MAC pairs or changes to ARP. The proposed IDS is an active detection mechanism in which every IP-MAC pair is validated by a probing technique. The scheme has been successfully validated in a test bed, and the results also illustrate that the proposed technique adds only minimally to the network traffic. Keywords: LAN Attack, Address Resolution Protocol, Intrusion Detection System.
1 Introduction
The security and performance considerations in any organization with a sizeable number of computers lead to the creation of LANs. A LAN is a high-speed communication system designed to link computers and other data processing devices together within a small geographic area, such as a department or a building. LAN-specific security threats to any computer always originate from a compromised machine. The basic step involved in most of these attacks is cache poisoning with falsified IP-MAC pairs, which may then lead to other attacks, namely man in the middle, denial of service, etc. [1]. ARP is used by hosts in a LAN to map network addresses (IP) to physical addresses (MAC). ARP is a stateless protocol, so when an ARP reply is received, the host updates its ARP cache without verifying the genuineness of the IP-MAC pair of the source [1].
The work reported in this paper is a part of the project “Design, Development and Verification of Network Specific IDS using Failure Detection and Diagnosis of DES”, Sponsored by Department of Information Technology, New Delhi, INDIA.
There are a number of solutions proposed in the literature to detect, mitigate and prevent such attacks. The schemes can be broadly classified as follows.
Static ARP entries [2]: The most foolproof way to prevent ARP attacks is to manually assign static IPs to all systems and maintain the static IP-MAC pairings at all the systems. However, in a dynamic environment this is not a practical solution.
Security features [3]: One possible action to combat ARP attacks is enabling port security (CIS) on the switch. This feature, available in high-end switches, ties a physical port to a MAC address; the port-address associations are stored in Content Addressable Memory (CAM) tables. A change in the transmitter's MAC address can result in port shutdown or in the change being ignored. The problem with this approach is that if the very first packet sent already carries a spoofed MAC address, the whole system fails. Further, any genuine change in an IP-MAC pair will be discarded (e.g., when notified by a gratuitous request and reply).
Software based solutions: The basic notion of port security, i.e., observing changes in IP-MAC pairs at switches, has also been utilized in software solutions such as ARPWATCH [4], ARPDEFENDER [5] and COLASOFT-CAPSA [6]. These software solutions are cheaper than switches with port security but have slower response times. Obviously, these tools suffer from the same drawbacks as port security in switches.
Signature and anomaly based IDS: Signature based IDSs like Snort [7] can be used to detect ARP attacks and inform the administrator with an alarm. The main problem with such IDSs is that they tend to generate a high number of false positives. Furthermore, the ability of IDSs to detect all forms of ARP related attacks is limited [8]. Recently, Hsiao et al. [9] have proposed an anomaly IDS to detect ARP attacks based on SNMP statistics. A set of features is extracted from SNMP data, and data mining algorithms such as decision trees, support vector machines and Bayes classifiers are applied to separate attack data from normal data. The reported results show false negative rates as high as 40%.
Modifying ARP using cryptographic techniques: Several cryptography based techniques have been proposed to prevent ARP attacks, namely S-ARP [10] and TARP [11]. The addition of cryptographic features to ARP may lead to a performance penalty [8]. It also calls for upgrading the network stacks of all the hosts in the LAN, which makes the solution non-scalable.
Active techniques for detecting ARP attacks: In active detection of ARP attacks, the IDS sends probe packets to systems in the LAN in addition to observing changes in IP-MAC pairs. In [12], a database of known IP-MAC pairs is maintained, and on detection of a change the new pair is actively verified by sending a probe with a TCP SYN packet to the IP in question. A genuine system will respond with a SYN/ACK or RST depending on whether the corresponding port is open or closed. While this scheme can validate the genuineness of IP-MAC pairs, it violates the network layering architecture.
Moreover, it is able to detect only ARP spoofing attacks. An active scheme for detecting MiTM attacks is proposed in [13]. The scheme assumes that any attacker involved in MiTM must have IP forwarding enabled. First, all systems with IP forwarding enabled are detected (actively). Once all such systems are detected, the IDS attacks them one at a time and poisons their caches. The poisoning is done in such a way that all traffic being forwarded by the attacker reaches the IDS (instead of the system to which the attacker with IP forwarding wants to send it). Thus, the IDS can differentiate the real MiTM attackers from the other systems with IP forwarding. There are several drawbacks in this approach, namely the huge traffic in a large network where all machines have IP forwarding enabled, the assumption of successful cache poisoning of the machine involved in the MiTM attack, and the requirement that the cache poisoning (of the machine involved in the MiTM attack, by the IDS) happens exactly when the attack is going on. So, from this review, it may be stated that an ARP attack prevention/detection scheme needs to have the following features:
– It should not modify the standard ARP or violate the layering architecture of the network.
– It should generate minimal extra traffic in the network.
– It should not require patching or installation of extra software on all systems.
– It should detect a large set of LAN based attacks.
– The hardware cost of the scheme should not be high.
In this paper we propose an active IDS for ARP related attacks. The technique requires installing the IDS on just one system in the network, does not require changes to the standard ARP, and does not violate the principles of the layering architecture, unlike [12] (which violates them while sending its active probes). In addition, the IDS has no additional hardware requirements such as switches with port security. Our proposed scheme detects all spoofing attempts and, in addition, identifies the MAC of the attacker in case of successful MiTM attacks. The rest of the paper is organized as follows. In Section 2 we present the proposed approach. In Section 3 we discuss the test bed and experimental results. Finally, we conclude in Section 4.
2 Proposed Scheme
In this section we discuss the proposed active intrusion detection scheme for ARP related attacks.
2.1 Assumptions
The following assumptions are made regarding the LAN:
1. Non-compromised (i.e., genuine) hosts will send one response to an ARP request within a specific interval Treq.
2. The IDS is running on a trusted machine with a static IP-MAC pair. It has two network interfaces: one is used for data collection in the LAN through port mirroring and the other is used exclusively for sending/receiving ARP probe requests/replies.
2.2 Data Tables for the Scheme
Our proposed scheme ensures the genuineness of each IP-MAC pairing by an active verification mechanism. The scheme sends verification messages, termed probe requests, upon receiving ARP requests and ARP replies. To assist in the probing and in separating the genuine IP-MAC pairs from the spoofed ones, we maintain some of the information obtained from the probe requests, ARP requests and ARP replies in a set of data tables. The information and the data tables used are enumerated below. Henceforth in the discussion, we use the following short notations: IPS - source IP address, IPD - destination IP address, MACS - source MAC address, MACD - destination MAC address. Fields of any table are represented by TableName_field; e.g., RQT_IPS represents the source IP field of the Request table. Also, TableName_MAX represents the maximum number of elements in the table at a given time.
1. Every time an ARP request is sent from any host querying some MAC address, an entry is created in the "Request table" (denoted RQT) with the source IP (RQT_IPS), source MAC (RQT_MACS) and destination IP (RQT_IPD). The time τ when the request was received is also recorded in the table as RQT_τ. Its entries time out after Treq seconds. The value of Treq depends on the ARP request-reply round trip time, which can be fixed after a series of experiments on the network. According to [14], the approximate ARP request-reply round trip time in a LAN is about 1.2 ms - 4.8 ms.
2. Every time an ARP reply is received from any host replying with the MAC address corresponding to some IP address, an entry is created in the "Response table" (denoted RST) with the source IP (RST_IPS), source MAC (RST_MACS), destination IP (RST_IPD) and destination MAC (RST_MACD). The time when the response was received is also recorded in the table. Its entries time out after Tresp seconds. The Tresp value can be determined from the maximum ARP cache timeout value of all the hosts in the LAN.
3. When some IP-MAC pair is to be verified, an ARP probe is sent and the response is verified. The probe is initiated by the IDS upon receiving either a request or a response. The source IP address and the source MAC address from the request/response packet under verification are stored in the "Verification table" (denoted VRFT). The entries in this table are the source IP (VRFT_IPS) and source MAC (VRFT_MACS).
4. Every time an IP-MAC pair is verified and found to be correct, an entry is created for the pair in the "Authenticated bindings table" (denoted AUTHT). There are two fields in this table, the IP address (AUTHT_IP) and the MAC address (AUTHT_MAC).
5. Every time a spoofing attempt or an unsolicited response is detected, an entry is created in the "Log table" (denoted LT) with the source IP (LT_IPS), source MAC (LT_MACS), destination IP (LT_IPD) and destination MAC (LT_MACD). The time when the spoof or unsolicited response was detected is also recorded in the table as LT_τ. The table has the same fields as the Response table.
6. The "Unsolicited response table" (denoted URSPT) is used for storing the number of unsolicited responses received by each host within a specified time interval (δ). Every time an unsolicited response is received, an entry is created in the Unsolicited response table with the destination IP (URSPT_IPD), the time (URSPT_τ) when the unsolicited response was received, and an unsolicited response counter (URSPT_counter) for each IP. In general, ARP replies are received corresponding to ARP requests; if unsolicited responses are observed in the network traffic, this implies an attempted ARP attack. On receiving such an unsolicited ARP reply for a host, the corresponding unsolicited response counter is incremented and the timestamp updated (in the Unsolicited response table). The entries may time out after Tunsolicit, which can be fixed considering the maximum cache timeout period of all the hosts in the network. However, there is an exception to the rule that all unsolicited ARP replies are attack attempts: gratuitous ARP responses are unsolicited replies generated by systems at startup to notify the network of their IP-MAC address. Gratuitous ARP responses are not entered into the table and are handled separately.
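One compact way to picture these tables is as C records keyed by the fields listed above. The sketch below is illustrative only: the field types, array sizes and flat in-memory layout are our assumptions, and the prototype described in Section 3 actually keeps the tables in a MySQL database.

    #include <time.h>
    #include <netinet/in.h>          /* struct in_addr */

    #define MAC_LEN 6

    /* Request table (RQT): one row per ARP request seen on the LAN. */
    struct rqt_entry {
        struct in_addr ip_s;            /* RQT_IPS  - source IP                    */
        unsigned char  mac_s[MAC_LEN];  /* RQT_MACS - source MAC                   */
        struct in_addr ip_d;            /* RQT_IPD  - destination IP being queried */
        time_t         tau;             /* RQT_tau  - arrival time, expires Treq   */
    };

    /* Response table (RST): one row per ARP reply seen on the LAN. */
    struct rst_entry {
        struct in_addr ip_s, ip_d;
        unsigned char  mac_s[MAC_LEN], mac_d[MAC_LEN];
        time_t         tau;             /* expires after Tresp */
    };

    /* Verification table (VRFT): IP-MAC pairs with an outstanding probe. */
    struct vrft_entry {
        struct in_addr ip_s;
        unsigned char  mac_s[MAC_LEN];
    };

    /* Authenticated bindings table (AUTHT): verified IP-MAC pairs. */
    struct autht_entry {
        struct in_addr ip;
        unsigned char  mac[MAC_LEN];
    };

    /* Log table (LT): spoofing attempts and unsolicited replies. */
    struct lt_entry {
        struct in_addr ip_s, ip_d;
        unsigned char  mac_s[MAC_LEN], mac_d[MAC_LEN];
        time_t         tau;
    };

    /* Unsolicited response table (URSPT): per-IP counters for DoS detection. */
    struct urspt_entry {
        struct in_addr ip_d;            /* URSPT_IPD     */
        time_t         tau;             /* URSPT_tau     */
        unsigned int   counter;         /* URSPT_counter */
    };

    /* Fixed-size in-memory pools (sizes are arbitrary placeholders). */
    static struct rqt_entry request_table[1024];
    static struct rst_entry response_table[1024];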
The proposed attack detection technique has two main modules, namely ARP REQUEST-HANDLER() and ARP RESPONSE-HANDLER(). These are elaborated in Algorithm 1 and Algorithm 2, respectively. Algorithm 1 processes all the ARP request packets in the network. For any ARP request packet RQP, it first checks whether the packet is malformed (i.e., whether any of the immutable fields of the ARP packet header have been changed, or the MAC addresses in the Ethernet header and the ARP header differ) or unicast; if so, a status flag is set accordingly and further processing of this packet stops. If the packet is not unicast or malformed but is a request packet from the IDS, i.e., RQP_IPS is the IP of the IDS and RQP_MACS is the MAC of the IDS, Algorithm 1 skips processing of this packet; we do not consider ARP requests from the IDS, since we assume that the IP-MAC pairing of the IDS is known and validated. If the ARP request is not from the IDS, the source IP (RQP_IPS), source MAC (RQP_MACS), destination IP (RQP_IPD) and the time τ at which the request packet was received are recorded in the Request table. The algorithm next determines whether the received packet is a gratuitous ARP request, and the status flag is set accordingly; a gratuitous ARP request can be recognized by RQP_IPS == RQP_IPD. For such a gratuitous ARP request, an ARP probe is sent to check the correctness of the IP-MAC pair; hence the VERIFY IP-MAC() module is called for RQP along with τ (the time at which RQP was received). If none of the above cases match, RQP_IPS is searched in the Authenticated bindings table. If a match is found as AUTHT_IPS[i] (where i is the i-th entry in AUTHT) and the corresponding MAC address AUTHT_MACS[i] in the table is the same as RQP_MACS, the packet has a genuine IP-MAC pair which is already recorded in the Authenticated bindings table. In case of a mismatch in the MAC address (i.e., RQP_MACS != AUTHT_MACS[i]), the packet is spoofed with a wrong MAC address and hence the status flag is set as spoofed; the details of the non-genuine RQP are also stored in the Log table. It may be noted that this spoofing check is done without an ARP probe, thereby reducing the ARP traffic needed for verification. It may also be the case that the IP-MAC pair given in RQP has not yet been verified, so that no match can be found in the Authenticated bindings table. In that case, an ARP probe is to be sent by the IDS to verify the correctness of the RQP_IPS - RQP_MACS pair. This is handled by the VERIFY IP-MAC() module with RQP and τ as parameters.
Algorithm 1: ARP REQUEST-HANDLER
if (RQP is malformed) then
    Status = malformed
else if (RQP is Unicast) then
    Status = Unicast
else if (RQP_IPS == IP(IDS) && RQP_MACS == MAC(IDS)) then
    EXIT
else
    ADD RQP_IPS, RQP_MACS, RQP_IPD and τ to the Request table
    if (RQP_IPS == RQP_IPD) then
        Status = Gratuitous Packet
        VERIFY IP-MAC(RQP, τ)
    else
        if (RQP_IPS == AUTHT_IPS[i]) (for some i, 1 ≤ i ≤ AUTHT_MAX) then
            if (RQP_MACS == AUTHT_MACS[i]) then
                Status = Genuine
            else
                Status = Spoofed
                ADD RQP_IPS, RQP_MACS, RQP_IPD, NULL, and τ to the Log table
            end if
        else
            VERIFY IP-MAC(RQP, τ)
        end if
    end if
end if
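For the malformed-packet test at the top of Algorithm 1, one concrete criterion mentioned in the text is a mismatch between the source MAC in the Ethernet header and the sender hardware address in the ARP header. A minimal check in C might look as follows (a sketch; the complete set of immutable-field checks is not spelled out in the paper):

    #include <string.h>
    #include <arpa/inet.h>           /* ntohs                              */
    #include <net/ethernet.h>        /* struct ether_header, ETHER_ADDR_LEN */
    #include <net/if_arp.h>          /* ARPOP_REQUEST, ARPOP_REPLY          */
    #include <netinet/if_ether.h>    /* struct ether_arp                    */

    /* Return 1 if the frame looks malformed: the source MAC carried in the
     * Ethernet header differs from the sender hardware address inside the
     * ARP payload, or the opcode is neither request nor reply.             */
    static int arp_is_malformed(const unsigned char *frame, unsigned int len)
    {
        if (len < sizeof(struct ether_header) + sizeof(struct ether_arp))
            return 1;                                /* truncated packet    */

        const struct ether_header *eth = (const struct ether_header *)frame;
        const struct ether_arp *arp =
            (const struct ether_arp *)(frame + sizeof(struct ether_header));

        if (memcmp(eth->ether_shost, arp->arp_sha, ETHER_ADDR_LEN) != 0)
            return 1;                                /* MAC-field mismatch  */

        unsigned short op = ntohs(arp->ea_hdr.ar_op);
        if (op != ARPOP_REQUEST && op != ARPOP_REPLY)
            return 1;
        return 0;
    }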
Algorithm 2 is the ARP response handler. For any ARP reply packet RSP, the algorithm first determines whether the reply is malformed; if so, a status flag is set accordingly and the next packet is processed. Otherwise, the source IP (RSP_IPS), source MAC (RSP_MACS), destination IP (RSP_IPD) and timestamp τ of the received packet are recorded in the Response table. Next, it checks whether the packet is a gratuitous ARP reply, i.e., whether RSP_IPS == RSP_IPD. For such a gratuitous ARP reply, an ARP probe is sent to check the correctness of the IP-MAC pair; hence the VERIFY IP-MAC() module is called. If the reply packet is not gratuitous, the algorithm checks whether it is a reply to an ARP probe sent by the VERIFY IP-MAC() module (i.e., an ARP probe by the IDS). A response to an ARP probe can be recognized by RSP_IPD == IP(IDS) together with RSP_IPS having an entry in the Verification table. For such response packets, Algorithm 2 calls the SPOOF-DETECTOR() module. If none of the above cases holds, the reply packet is matched against a corresponding request in the Request table using its source IP.
If a match is found (i.e., RSP_IPS == RQT_IPD[i]), RSP_IPS is searched in the Authenticated bindings table. If a match is found and the corresponding MAC address in the table is the same as RSP_MACS, the packet has a genuine IP-MAC pair (which is already recorded in the Authenticated bindings table). In case of a mismatch in the MAC address (i.e., RSP_MACS != AUTHT_MACS[j]), the packet may be spoofed with a wrong MAC address and hence the status flag is set as spoofed; the details of the non-genuine RSP are also stored in the Log table. If RSP_IPS is not present in the Authenticated bindings table, an ARP probe is sent for verification by the VERIFY IP-MAC() module. If there was no corresponding request for the response packet in the Request table, then it is an unsolicited response packet; hence UNSOLICITED-RESPONSE-HANDLER() is called with the destination IP of the RSP and τ. This entry is also added to the Log table in order to check for MiTM attempts.

Algorithm 2: ARP RESPONSE-HANDLER
Input: RSP - ARP response packet, τ - time at which RSP was received, Request table, Verification table, Authenticated bindings table
Output: Updated Response table, Updated Log table, Status
1:  if RSP is malformed then
2:      Status = malformed
3:  else
4:      Add RSP_IPS, RSP_MACS, RSP_IPD, RSP_MACD and τ to the Response table
5:      if (RSP_IPS == RSP_IPD) then
6:          Status = Gratuitous
7:          VERIFY IP-MAC(RSP, τ)
8:      else
9:          if (RSP_IPD == IP(IDS) && (RSP_IPS == VRFT_IPS[k])) (for some k, 1 ≤ k ≤ VRFT_MAX) then
10:             EXIT
11:         else
12:             if (RSP_IPS == RQT_IPD[i]) (for some i, 1 ≤ i ≤ RQT_MAX) then
13:                 if (RSP_IPS == AUTHT_IPS[j]) (for some j, 1 ≤ j ≤ AUTHT_MAX) then
14:                     if (RSP_MACS == AUTHT_MACS[j]) then
15:                         Status = Genuine
16:                     else
17:                         Status = Spoofed
18:                         Add RSP_IPS, RSP_MACS, RSP_IPD, RSP_MACD and τ to the Log table
19:                     end if
20:                 else
21:                     VERIFY IP-MAC(RSP, τ)
22:                 end if
23:             else
24:                 Add RSP_IPS, RSP_MACS, RSP_IPD, RSP_MACD and τ to the Log table
25:                 UNSOLICITED-RESPONSE-HANDLER(RSP_IPD, τ)
26:             end if
27:         end if
28:     end if
29: end if

The main modules discussed in Algorithm 1 and Algorithm 2 are assisted by three sub-modules, namely VERIFY IP-MAC(), SPOOF-DETECTOR() and UNSOLICITED-RESPONSE-HANDLER(). We now discuss these sub-modules in detail. VERIFY IP-MAC() (Algorithm 3) sends ARP probes to verify the correctness of the IP-MAC pair given in the source of the request packet RQP or response packet RSP. Every time a probe is sent, a record of it is inserted into the Verification table. Before sending the ARP probe request, we need to check whether such a request has already been made by the IDS and a response is awaited. This can be verified by checking the IP and MAC in the Verification table; if a matching pair is found, the module exits. A spoofing attempt may be indicated if the IP matches an entry in the Verification table but the MAC does not; in this case, the status is set as spoofed and the Log table is updated. This check of the Verification table (before sending a probe) limits the number of ARP probes sent for any known falsified IP-MAC address, thereby lowering the extra ARP traffic. If the corresponding IP address is not found in the Verification table, a probe request is sent and the algorithm adds the IP and the MAC to the Verification table. At the same time the SPOOF-DETECTOR() module is called, which waits for a round trip time and analyzes all entries in the Response table collected during that time (as replies to the probe).

Algorithm 3: VERIFY IP-MAC
Input: RP - ARP request/reply packet, τ - time of arrival of RP, Verification table
Output: Updated Verification table, Status
1:  if (RP_IPS ∈ VRFT_IPS[i]) (for some i, 0 ≤ i ≤ VRFT_MAX) then
2:      if (RP_MACS == VRFT_MACS[i]) then
3:          EXIT
4:      else
5:          Status = Spoofed
6:          Add RP_IPS, RP_MACS, RP_IPD, RP_MACD and τ to the Log table
7:      end if
8:  else
9:      Send ARP Probe Request to RP_IPS
10:     Add RP_IPS and RP_MACS to the Verification table
11:     SPOOF-DETECTOR(RP, τ)
12: end if
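The probe sent in line 9 of Algorithm 3 is an ordinary ARP request for RP_IPS transmitted from the IDS's dedicated probe interface. The sketch below (ours; the interface name and addresses are placeholders, and the actual prototype may build the packet differently) shows one way to construct and send such a probe with a Linux packet socket:

    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <net/ethernet.h>
    #include <net/if.h>
    #include <net/if_arp.h>
    #include <netinet/if_ether.h>
    #include <netpacket/packet.h>

    /* Broadcast an ARP request ("who has target_ip?") out of the interface
     * with index ifindex, using the IDS's own MAC/IP as the sender.        */
    static int send_arp_probe(int ifindex,
                              const unsigned char src_mac[6],
                              struct in_addr src_ip, struct in_addr target_ip)
    {
        unsigned char frame[sizeof(struct ether_header) + sizeof(struct ether_arp)];
        struct ether_header *eth = (struct ether_header *)frame;
        struct ether_arp *arp = (struct ether_arp *)(frame + sizeof(*eth));

        memset(eth->ether_dhost, 0xff, ETHER_ADDR_LEN);      /* broadcast    */
        memcpy(eth->ether_shost, src_mac, ETHER_ADDR_LEN);
        eth->ether_type = htons(ETHERTYPE_ARP);

        arp->ea_hdr.ar_hrd = htons(ARPHRD_ETHER);
        arp->ea_hdr.ar_pro = htons(ETHERTYPE_IP);
        arp->ea_hdr.ar_hln = ETHER_ADDR_LEN;
        arp->ea_hdr.ar_pln = 4;
        arp->ea_hdr.ar_op  = htons(ARPOP_REQUEST);
        memcpy(arp->arp_sha, src_mac, ETHER_ADDR_LEN);
        memcpy(arp->arp_spa, &src_ip, 4);
        memset(arp->arp_tha, 0x00, ETHER_ADDR_LEN);
        memcpy(arp->arp_tpa, &target_ip, 4);

        int fd = socket(AF_PACKET, SOCK_RAW, htons(ETHERTYPE_ARP));
        if (fd < 0) return -1;

        struct sockaddr_ll sll;
        memset(&sll, 0, sizeof(sll));
        sll.sll_family  = AF_PACKET;
        sll.sll_ifindex = ifindex;
        sll.sll_halen   = ETHER_ADDR_LEN;
        memset(sll.sll_addr, 0xff, ETHER_ADDR_LEN);

        int rc = sendto(fd, frame, sizeof(frame), 0,
                        (struct sockaddr *)&sll, sizeof(sll));
        close(fd);
        return rc < 0 ? -1 : 0;
    }

    int main(void)
    {
        /* "eth1" is the IDS's dedicated probe interface (placeholder). */
        unsigned char mac[6] = { 0x00, 0x16, 0x3e, 0x00, 0x00, 0x01 };
        struct in_addr src, tgt;
        inet_pton(AF_INET, "192.168.1.2", &src);    /* IDS IP (placeholder)       */
        inet_pton(AF_INET, "192.168.1.50", &tgt);   /* RP_IPS under verification  */
        return send_arp_probe(if_nametoindex("eth1"), mac, src, tgt);
    }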
SPOOF-DETECTOR() (Algorithm 4) is called from VERIFY IP-MAC() after an ARP probe request has been sent to the source IP of the packet being checked for spoofing (RP_IPS).
As discussed, it is assumed that all replies to the ARP probe will be sent within Treq time. So, SPOOF-DETECTOR() waits for a Treq interval, thereby collecting all probe responses in the Response table. As it is assumed that non-compromised hosts always respond to a probe, at least one response to the probe will arrive; in other words, one of the replies to the probe carries the genuine MAC for the IP RP_IPS. Following that, the Response table is searched for IP-MAC (source) pairs having the IP RP_IPS. If all the IP-MAC pairs found have the same MAC, the packet in question is not spoofed. If the packet is spoofed, more than one reply will arrive for the probe, one with the genuine MAC and another with the spoofed MAC. The reason for expecting more than one reply in case of spoofing is as follows. Let a packet be spoofed as IP(of B)-MAC(of D). For the ARP probe to B, B will reply with IP(of B)-MAC(of B), which would lead to tracking the attacker (MAC of D). To avoid self-identification, attacker D therefore has to reply to all queries asking for B with the spoofed IP-MAC pair IP(B)-MAC(D). The IDS has no clue whether IP(B)-MAC(B) or IP(B)-MAC(D) is genuine; only the possibility of spoofing is detected. If a spoofing attempt is determined, the Log table is updated. If the packet is found genuine, the Authenticated bindings table is updated with its source IP (RP_IPS) and the corresponding MAC.

Algorithm 4: SPOOF-DETECTOR
Input: RP - ARP request/reply packet, Treq - time required for arrival of all responses to an ARP probe (ARP request-reply round trip time), Response table
Output: Updated Authenticated bindings table, Updated Log table, Status
1: Wait for Treq time interval
2: if (RP_IPS == RST_IPS[i]) && (RP_MACS != RST_MACS[i]) (for some i, 1 ≤ i ≤ RST_MAX) then
3:     Status = Spoofed
4:     Add RP_IPS, RP_MACS, RP_IPD, RP_MACD and τ to the Log table
5:     Add RST_IPS, RST_MACS, RST_IPD, RST_MACD and τ to the Log table
6:     EXIT
7: end if
8: Update the Authenticated bindings table with RP_IPS, RP_MACS
UNSOLICITED-RESPONSE-HANDLER() (Algorithm 5) is invoked whenever an unsolicited ARP reply packet is received (i.e., an ARP reply packet that did not find a matching ARP request in the Request table) and is used for the detection of denial of service attacks. The entries in the Unsolicited response table maintain the number of unsolicited responses received against individual IP addresses along with the timestamp of the latest such reply. The algorithm finds out whether the IP address against which the unsolicited reply is currently received has a matching entry in the Unsolicited response table. It declares the detection of a DoS attack if the number of unsolicited ARP replies against a particular IP within a time interval (δ) exceeds a preset threshold DoS_Th. If a matching entry is not found, a new entry is made in the Unsolicited response table.
Algorithm 5: UNSOLICITED-RESPONSE-HANDLER
Input: IP - destination IP of the RSP, τ - time when RSP is received, δ - time window, DoS_Th - DoS threshold, Unsolicited response table
Output: Status
1:  if (IP == URSPT_IPD[i]) (for some i, 1 ≤ i ≤ URSPT_MAX) then
2:      if (τ - URSPT_τ[i] < δ) then
3:          URSPT_counter[i]++
4:          URSPT_τ[i] = τ
5:          if (URSPT_counter[i] > DoS_Th) then
6:              Status = DoS
7:              EXIT
8:          end if
9:      else
10:         URSPT_counter[i] = 1
11:         URSPT_τ[i] = τ
12:     end if
13: else
14:     ADD IP, τ and 1 to the Unsolicited response table
15: end if
Algorithms 1-5 can detect spoofing, malformed ARP packets, and denial of service attacks. To detect man-in-the-middle attacks, another module, MiTM-DETECTOR(), is used; it is discussed next. This module scans the Log table at certain intervals to determine man-in-the-middle attacks. As spoofing or unsolicited replies are required for MiTM [1], the module MiTM-DETECTOR() is invoked whenever a spoofing attempt is detected or an unsolicited response is received. In either of these two cases an entry is added to the Log table and MiTM-DETECTOR() is invoked with its source IP, source MAC and destination IP. This module analyzes all the entries in the Log table to detect possible MiTM attacks (as each spoofing attempt or unsolicited reply is recorded in the Log table). If, for a particular source IP - destination IP pair, there is another record having the destination IP - source IP pair (the flipped version of the earlier one) with the same MAC address, within a particular time interval T_MiTM, then it detects the possibility of a MiTM attack and the associated attacker's MAC. The algorithm first determines the subset of entries of the Log table whose source MAC matches the source MAC of the ARP packet last added to the Log table. Also, only those entries of the Log table are considered which arrived within (and not including) T_MiTM time of the arrival of the packet last added. Thus, we obtain a subset of the Log table, denoted LT'. Then, if there is an entry in LT' whose source IP matches the destination IP of the packet last added and whose destination IP matches the source IP of the packet last added, MiTM is detected.
An Active Intrusion Detection System for LAN Specific Attacks
139
when the entry was added in Log table, TMiT M - Time interval for arrival of packets causing MiTM, Log table Output: Status 1: LT = {LT |∀i (LTMACS [i] == M ACS) && (τ - LTτ ) < TMiT M } 2: if (LTIP S [j] == IP D) && (LTIP D [j] == IP S),(for any j, 1 ≤ j ≤ LTMAX )
then
3: Status=M iT M and “attacker is M ACS” 4: end if
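Algorithm 6 reduces to the following scan over the Log table; the tuple layout of the Log entries is assumed for illustration.

def mitm_detector(ip_s, mac_s, ip_d, tau, log_table, t_mitm):
    # Log table entries: (ip_src, mac_src, ip_dst, mac_dst, timestamp).
    for lt_ip_s, lt_mac_s, lt_ip_d, _lt_mac_d, lt_tau in log_table:
        # Keep only entries with the same source MAC that arrived within T_MiTM.
        if lt_mac_s != mac_s or not (tau - lt_tau < t_mitm):
            continue
        # A flipped IP pair means the same MAC is poisoning both ends of a flow.
        if lt_ip_s == ip_d and lt_ip_d == ip_s:
            return "MiTM, attacker is " + mac_s
    return None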
2.3 An Example

In this sub-section we illustrate ARP reply verification in the normal and spoofed cases. Here, the network has five hosts A, B, C, D and E; E is the third-party IDS and D is the attacker. Port mirroring is enabled at the switch so that E has a copy of all outgoing and incoming packets from all ports. Also, E has a dedicated network interface to send ARP probes and receive ARP probe replies.
Fig. 1. Example of a Normal and Spoofed Reply
Figure 1 shows the sequence of packets (indicated with packet sequence numbers) injected in the LAN when (i) A sends a genuine reply to B with IP(A)-MAC(A), followed by ARP-probe-based verification of the reply, and (ii) attacker D sends a spoofed reply "IP(B)-MAC(D)" to host A, followed by its verification. The sequences of packets as recorded in the Request table, Response table, Verification table and Authenticated bindings table are shown in Table 1 - Table 4.

Genuine reply from A to B and its verification

– Packet Sequence (PS) 1: A reply is sent by A to B to inform B of its MAC address. The Response table is updated with a new entry corresponding to the reply packet sent by A.
– Packet Sequence 2: Since there is no entry for the IP-MAC of A in the Authenticated bindings table, E will send an ARP probe to learn the MAC of A, and an entry is made in the Verification table.
– Packet Sequence 3: Following PS 2, SPOOF-DETECTOR() starts. Within Treq, only A will respond to this ARP probe request, and the Authenticated bindings table is updated with the valid entry of the IP-MAC of A.

Spoofed reply from D to A and its verification

– Packet Sequence 4: Let D respond to A with the IP of B and its own MAC (D), which is recorded in the Response table.
– Packet Sequence 5: Since there is no entry for the IP-MAC of B in the Authenticated bindings table, E will send an ARP probe to learn B's MAC. IP(B)-MAC(D) is entered in the Verification table.
– Packet Sequences 6 and 7: SPOOF-DETECTOR() is executed. Within Treq, both B and attacker D will respond to the ARP probe request (sent to learn the MAC of B) with their own MACs. These responses are recorded in the Response table. There are two entries in the Response table for IP(B), one with the MAC of B and the other with the MAC of D, so response spoofing is detected.

Table 1. Request table

PS | SRC IP | SRC MAC | DEST IP
– | – | – | –

Table 2. Response table

PS | SRC IP | SRC MAC | DEST IP | DEST MAC
1 | IP A | MAC A | IP B | MAC B
3 | IP A | MAC A | IP E | MAC E
4 | IP B | MAC D | IP A | MAC A
6 | IP B | MAC D | IP E | MAC E
7 | IP B | MAC B | IP E | MAC E

Table 3. Verification table

PS | IP | MAC
2 | IP A | MAC A
5 | IP B | MAC D

Table 4. Authenticated bindings table

PS | IP | MAC
– | IP A | MAC A
3 Experimentation

The test bed created for our experiments consists of 5 machines running different operating systems. We name the machines with the letters A-E. Machines A-E run the following OSs: Windows XP, Ubuntu, Windows 2000, Backtrack 4 and Backtrack 4, respectively. Machine D, with Backtrack 4, acts as the attacker machine and machine E is set up as the IDS. These machines are connected in a LAN with a CISCO Catalyst 3560 G series switch [15] with port mirroring enabled for system E. The tables mentioned above are created in a MySQL database. The algorithms are implemented in the C language. The IDS has two preemptive modules, namely the packet grabber and the packet injector. The packet grabber sniffs packets from the network, filters ARP packets and invokes either Algorithm 1 or Algorithm 2 depending upon the
packet type. The packet injector generates the ARP probes necessary for the verification of IP-MAC pairs. The attack generation tools Ettercap and Cain and Abel were deployed on machine D and several scenarios of spoofing MAC addresses were attempted. In our experiments we tested the proposed scheme with several variants of LAN attack scenarios (including the one discussed in the example above). Table 5 presents the types of LAN attacks generated and detected successfully by the proposed scheme. In the table we also report the capabilities of other LAN attack detection tools for these attacks.

Table 5. Comparison of ARP Attack Detection Mechanisms

ATTACKS | PROPOSED | ACTIVE [12] | COLASOFT [6] | ARPDEFENDER [5]
ARP spoofing | Y | Y | Y | Y
ARP MiTM | Y | N | N | Y
ARP DoS | Y | N | Y | N
Network Scanning | Y | N | N | N
Malformed Packets | Y | Y | N | N
MAC Flooding | Y | N | N | Y
Fig. 2. ARP traffic (number of ARP packets versus time in seconds) for four cases: without the IDS, with the IDS but no attack, without the IDS under attack, and with the IDS under attack
Figure 2 shows the amount of ARP traffic generated in the experiment in four cases. The first case is normal operation in the absence of the IDS. The second case is when the IDS is running and no attacks are generated in the network. The third case is when we injected 100 spoofed IP-MAC pairs into the LAN and the IDS is not running. The fourth case is when we injected 100 spoofed IP-MAC pairs into the LAN with the IDS running. We notice almost the same amount of ARP traffic under normal conditions with and without the IDS running. Once genuine IP-MAC pairs are identified (by probing) they are stored in the Authenticated bindings table. After that, no probes need to be sent for any ARP request/reply from these IP-MAC pairs. In case of attack, a little extra
traffic is generated by our IDS for the probes. With each spoofed ARP packet, our IDS sends a probe request and expects at least two replies (one from the genuine host and the other from the attacker), thereby adding only three ARP packets for each spoofed packet.
4 Conclusion

In this paper we presented an IDS for detecting a large class of LAN-specific attacks. The scheme uses an active probing mechanism and does not violate the principles of the network layering architecture. Being a software-based approach, it does not require any additional hardware to operate. At present the scheme can only detect the attacks; in other words, in case of spoofing it can only determine the conflicting IP-MAC pairs without differentiating the spoofed IP-MAC pair from the genuine one. If some diagnosis capability can be added to the scheme, remedial action against the attacker could be taken.
References
1. Held, G.: Ethernet Networks: Design, Implementation, Operation, Management, 1st edn. John Wiley & Sons, Ltd., Chichester (2003)
2. Kozierok, C.M.: TCP/IP Guide, 1st edn. No Starch Press (October 2005)
3. Cisco Systems PVT LTD: Cisco 6500 catalyst switches
4. Arpwatch, http://www.arpalert.org
5. Arpdefender, http://www.arpdefender.com
6. Colasoft capsa, http://www.colasoft.com
7. Snort: Light weight intrusion detection, http://www.snort.org
8. Abad, C.L., Bonilla, R.I.: An analysis on the schemes for detecting and preventing arp cache poisoning attacks. In: ICDCSW 2007: Proceedings of the 27th International Conference on Distributed Computing Systems Workshops, Washington, DC, USA, pp. 60–67. IEEE Computer Society, Los Alamitos (2007)
9. Hsiao, H.W., Lin, C.S., Chang, S.Y.: Constructing an arp attack detection system with snmp traffic data mining. In: ICEC 2009: Proceedings of the 11th International Conference on Electronic Commerce, pp. 341–345. ACM, New York (2009)
10. Gouda, M.G., Huang, C.T.: A secure address resolution protocol. Comput. Networks 41(1), 57–71 (2003)
11. Lootah, W., Enck, W., McDaniel, P.: Tarp: Ticket-based address resolution protocol, pp. 106–116. IEEE Computer Society, Los Alamitos (2005)
12. Ramachandran, V., Nandi, S.: Detecting arp spoofing: An active technique. In: Jajodia, S., Mazumdar, C. (eds.) ICISS 2005. LNCS, vol. 3803, pp. 239–250. Springer, Heidelberg (2005)
13. Trabelsi, Z., Shuaib, K.: Man in the middle intrusion detection. In: Globecom, San Francisco, California, USA, pp. 1–6. IEEE Communication Society, Los Alamitos (2006)
14. Sisaat, K., Miyamoto, D.: Source address validation support for network forensics. In: JWICS '06: The 1st Joint Workshop on Information Security, pp. 387–407 (2006)
15. CISCO Whitepaper, http://www.cisco.com
Analysis on the Improved SVD-Based Watermarking Scheme

Huo-Chong Ling1, Raphael C.-W. Phan2, and Swee-Huay Heng1

1 Research Group of Cryptography and Information Security, Centre for Multimedia Security and Signal Processing, Multimedia University, Malaysia
{hcling,shheng}@mmu.edu.my
2 Loughborough University, LE11 3TU, United Kingdom
[email protected]
Abstract. Watermarking schemes allow a cover image to be embedded with a watermark image, for diverse applications including proof of ownership and image hiding. In this paper, we present an analysis of the improved SVD-based watermarking scheme proposed by Mohammad et al. The scheme is an improved version of the scheme proposed by Liu and Tan and is claimed to solve the problem of false-positive detection in the Liu and Tan scheme. We show how the designers' security claims, related to robustness and the proof-of-ownership application, can be invalidated. We also demonstrate a limitation of the Mohammad et al. scheme which degrades the whole watermarked image. Keywords: Singular value decomposition; watermarking; attacks; robustness; proof of ownership; false-positive detection.
1 Introduction
Most information, documents and contents these days are stored and processed within a computer in digital form. However, since the duplication of digital content results in perfectly identical copies, the copyright protection issue is a main problem that needs to be addressed. A watermarking scheme [1,2] is one where it is desired to protect the copyright of a content owner by embedding the owner's watermark into the content. In order to prove the ownership of the watermarked content, the owner takes the case of ownership claim to the authority, and proves ownership by performing the watermark detection process on the claimed content to extract his watermark. Therefore, robustness of the watermarking scheme is an important factor, i.e. it should be infeasible for an attacker to remove, modify or prevent the extraction of an embedded watermark without visible distortions of the image. In this paper, we concentrate on singular value decomposition (SVD)-based watermarking schemes. SVD is a linear algebra scheme that can be used for many applications, particularly in image compression [3], and subsequently for image watermarking [1,2]. Using SVD, it is possible to get an image that is
indistinguishable from the original image, but using only 45% of the original storage space [3]. For an N-by-N image matrix A with rank r ≤ N, the SVD of A is defined as A = U S V^T = Σ_{i=1}^{r} u_i s_i v_i^T, where S is an N-by-N diagonal matrix containing singular values (SVs) s_i satisfying s_1 ≥ s_2 ≥ ... ≥ s_r > s_{r+1} = ... = s_N = 0, and U and V are N-by-N orthogonal matrices. V^T denotes the adjoint (transpose and conjugate) of the N-by-N matrix V. Since the SVs are arranged in decreasing order, the later terms have the least effect on the overall image. The most popularly cited SVD-based watermarking scheme is due to Liu and Tan [2], which makes sole use of SVD for watermarking. They proposed to insert the watermark into the SVD domain of the cover image, and demonstrated its high robustness against image distortion. Nevertheless, attacks have appeared, e.g. Rykaczewski [4] and Zhang and Li [5], on the application of the Liu and Tan scheme for proof of ownership. Mohammad et al. [1] proposed an improved SVD-based watermarking scheme, which is intended to solve the problem of false-positive detection in the Liu and Tan scheme. However, in this paper, we show attacks and a limitation that jeopardize the Mohammad et al. scheme. In section 2, we recall the basics of the scheme proposed by Mohammad et al. [1]. We then present attacks on the scheme in section 3 that invalidate the security claims of the designers. Experimental results verifying our attacks and the limitation of the scheme are given in section 4, and section 5 concludes this paper.
2 Improved SVD-Based Watermarking Scheme
Mohammad et al. [1] proposed an improved SVD-based watermarking scheme for ownership protection. The SVD transform is performed on the cover image to get its singular values. The whole watermark, multiplied by a coefficient (a scale factor that controls the strength of the embedded watermark), is then added to the singular values. An inverse SVD transform is then performed on the modified singular values together with the U and V components of the cover image to get the watermarked image. The watermark embedding steps of the scheme are as follows:

E1. Denote the cover image A and the watermark W as N-by-N matrices.
E2. Perform SVD on the cover image A:
A = U S V^T.    (1)
E3. Multiply the watermark W by a coefficient α, where α is a scale factor that controls the strength of the embedded watermark. The modified watermark is then added to the singular values S in Eq. (1) to obtain the reference watermark S_n:
S_n = S + αW.    (2)
E4. Perform the inverse SVD to obtain the watermarked image A_w as
A_w = U S_n V^T.    (3)
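A minimal NumPy sketch of the embedding steps E1-E4 follows; the function name and the choice to return U, S and V for later extraction are ours, not part of the original scheme description.

import numpy as np

def embed(cover, watermark, alpha):
    # E2: SVD of the cover image A = U S V^T.
    U, s, Vt = np.linalg.svd(cover, full_matrices=False)
    S = np.diag(s)
    # E3: Sn = S + alpha * W (the whole watermark scaled by alpha).
    Sn = S + alpha * watermark
    # E4: inverse SVD gives the watermarked image Aw = U Sn V^T.
    Aw = U @ Sn @ Vt
    return Aw, (U, S, Vt)   # the owner keeps U, S, V for extraction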
The watermark extraction steps are as follows:

X1. Obtain the possibly corrupted singular values S_n* as
S_n* = U^T A_w* V,    (4)
where A_w* is the possibly corrupted watermarked image.
X2. The possibly distorted watermark W* is obtained by
W* = (S_n* − S)/α.    (5)
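Continuing the sketch above, extraction steps X1-X2 use the stored cover-image components; again this is an illustrative reading of the equations, not the authors' code.

def extract(Aw_star, U, S, Vt, alpha):
    # X1: Sn* = U^T Aw* V (note Vt.T is V).
    Sn_star = U.T @ Aw_star @ Vt.T
    # X2: W* = (Sn* - S) / alpha.
    return (Sn_star - S) / alpha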
The difference between the Mohammad et al. [1] scheme and the Liu and Tan [2] scheme is that the latter has an extra step after step E3, whereby SVD is performed on the reference watermark S_n in Eq. (2) to obtain the U_w, S_w and V_w components. The S_w component, instead of the reference watermark S_n, is then used to obtain the watermarked image A_w in Eq. (3). Note that in the watermark embedding step E2, the content owner needs to keep the cover image or its SVD components U, S and V so that he can use them later in the extraction steps X1 and X2. For the Liu and Tan scheme, the content owner needs to keep U_w, S and V_w. As was proven in [4,5], the Liu and Tan scheme turned out to be flawed since it only embeds the diagonal matrix S_w, and false-positive detection [4,5] is possible using the reference watermark SVD pair (U_w and V_w). Therefore, Mohammad et al. [1] claim that false-positive detection is not possible in their scheme since they use the cover image and do not use any watermark image components during the extraction steps, as opposed to the Liu and Tan scheme. They also claim that their scheme is robust. However, their scheme is still vulnerable to the attacks described in the next section.
3 Attacks
We show in this section how attacks can be mounted that invalidate the security claims made by Mohammad et al. [1], namely that the scheme can be used for proof of ownership and that it is robust. To give an intuition for the attack, consider as an illustrative example a scenario in which Alice is the owner of an image A and Bob is an attacker on the scheme. Alice embeds her watermark W into the image A using the Mohammad et al. scheme to obtain the watermarked image A_w. She keeps the cover image A for later use during watermark extraction. Bob obtains A_w and performs the embedding steps E1 to E4 with A_w and his own watermark W_B, to obtain the watermarked image A_WB. Both watermarked images A_w and A_WB are perceptually correlated with each other since only the singular values of A_w are modified. A dispute arises later when Bob claims that he is the real owner of A_WB since he can extract his own watermark W_B from A_WB by performing extraction steps X1 and X2 using the SVD components of A_w. Alice could also lay equal claim to A_WB since she too can extract her own watermark W from A_WB by performing extraction steps X1 and X2 using the SVD components of
A. This leads to ambiguity because Bob lays equal claim as Alice, and therefore no one can prove who is the real owner of the image A_WB. The precise steps of our attack are as follows:

A1. Denote the watermark W_B as an N-by-N matrix.
A2. Perform SVD on the watermarked image A_w:
A_w = U_w S_w V_w^T.    (6)
A3. Multiply the watermark W_B by the coefficient α. The modified watermark is then added to the singular values S_w in Eq. (6):
S_nB = S_w + α W_B.    (7)
A4. Perform the inverse SVD to obtain the watermarked image A_WB as
A_WB = U_w S_nB V_w^T.    (8)

The extraction steps are as follows:

D1. Obtain the possibly corrupted singular values S_n* as
S_n* = U_w^T A_WB* V_w,    (9)
where A_WB* is the possibly corrupted watermarked image.
D2. The possibly distorted watermark W* is obtained by
W* = (S_n* − S_w)/α.    (10)
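The ambiguity attack can be reproduced with the embed and extract sketches given earlier; the random stand-in images, their sizes and the α values below are illustrative only.

import numpy as np

rng = np.random.default_rng(0)
A  = rng.random((200, 200))    # stand-in for Alice's cover image
W  = rng.random((200, 200))    # Alice's watermark
WB = rng.random((200, 200))    # Bob's watermark

Aw,  alice_keys = embed(A,  W,  alpha=0.02)   # Alice publishes Aw and keeps (U, S, V)
AwB, bob_keys   = embed(Aw, WB, alpha=0.05)   # Bob re-embeds into Aw and keeps (Uw, Sw, Vw)

# Bob recovers WB essentially exactly from the disputed image AwB ...
WB_hat = extract(AwB, *bob_keys, alpha=0.05)
# ... while Alice can still extract a version of W, distorted by Bob's larger alpha,
# so both parties lay claim to AwB.
W_hat = extract(AwB, *alice_keys, alpha=0.02)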
This attack shows that the Mohammad et al. scheme cannot be used for proof of ownership claims, directly invalidating the designers' claim that it can. The problem in the Mohammad et al. scheme is that the designers concentrated on ensuring that false-positive detection does not occur by using the cover image SVD components instead of the watermark image SVD components. They neglected the fact that the orthogonal matrices U and V can preserve the major information of an image [4,5]. Therefore, if Bob uses his own U_w and V_w (as in Eq. (9)), he can still obtain a good estimate of his own watermark W_B since, during the embedding steps A3 and A4, only the singular values of A_w are modified. Besides that, the Mohammad et al. scheme also fails the robustness test if the coefficient α used by the attacker in step A3 is high enough to distort the owner's watermark while the watermarked image remains visually acceptable in terms of peak signal-to-noise ratio (PSNR). This result is verified in the next section.
4 Experimental Results
In this section, experiments are carried out to show that the attacks in Section 3 are feasible. Fig. 1(a) and Fig. 1(b) show a gray Lena image and the owner's watermark, both of size 200 x 200. Fig. 1(c) shows the watermarked image after going
through the embedding steps E1 - E4. The PSNR and correlation coefficient of the watermarked image in Fig. 1(c) with respect to Fig. 1(a) are 34.6 and 0.99, respectively. The coefficient α used in this experiment is 0.02, which is the value used in the experiment section of Mohammad et al. The attacks of Section 3 are carried out using the attacker's watermark in Fig. 2(a) and the watermarked image in Fig. 1(c). Fig. 2(b) shows the watermarked image after the attacker's watermark has been embedded into the image in Fig. 1(c). The coefficient α used during the attack is 0.05, instead of 0.02, because the higher the watermark strength, the better the robustness of the attacker's watermark. The PSNR and correlation coefficient of the watermarked image in Fig. 2(b) with respect to Fig. 1(c) are 26.4 and 0.96, respectively. This shows that both images are perceptually correlated.

Fig. 1. (a) Lena image (b) Owner's watermark (c) Watermarked Lena image
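For reference, the PSNR and correlation figures quoted in this section can be computed as in the following sketch; the peak value of 255 and the use of the Pearson correlation coefficient are our assumptions about the metrics.

import numpy as np

def psnr(x, y, peak=255.0):
    # Peak signal-to-noise ratio between two images of the same size.
    mse = np.mean((np.asarray(x, dtype=np.float64) - np.asarray(y, dtype=np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def corr_coef(x, y):
    # Pearson correlation between the two images viewed as vectors.
    return float(np.corrcoef(np.ravel(x), np.ravel(y))[0, 1])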
Fig. 2. (a) Attacker's watermark (b) Watermarked Lena image after attack
The extraction process is then carried out on Fig. 2(b). Fig. 3(a) shows the watermark extracted using the attacker's SVD components of Fig. 1(c), and Fig. 3(b) shows the watermark extracted using the owner's SVD components of Fig. 1(a). As can be seen from Fig. 3(a) and Fig. 3(b), only the attacker's watermark is clearly visible, with a PSNR of 27.5 and a correlation coefficient of 0.97, compared to the owner's watermark with a PSNR of 4.8 and a correlation coefficient of 0.15. A further increase of the coefficient α during the attack will make the owner's watermark
even harder to see (or completely invisible). This means that the Mohammad et al. scheme is not suitable for the protection of rightful ownership and is not robust.
One of the possible countermeasures against these attacks is to choose the coefficient α so that any further application of the Mohammad et al. scheme to the watermarked image by an attacker results in a watermarked image that is not visually acceptable in quality. Fig. 4(a) shows the owner's watermarked image using a coefficient α of 0.04 (PSNR = 28.6, correlation coefficient = 0.98). Fig. 4(b) shows the attacker's watermarked image, which has significant artifacts (PSNR = 21.4, correlation coefficient = 0.88).
Fig. 4. (a) Owner's watermarked image with α = 0.04 (b) Attacker's watermarked image (c) Watermarked image with α = 0.1
Besides that, there is a limitation in the Mohammad et al. scheme whereby the watermarked image shows some artifacts even if the coefficient value is 0.02. This can be seen with the naked eye in Fig. 1(c). A coefficient α of 0.1, which is the value used in the Liu and Tan scheme, results in a watermarked image that has a significant degradation in quality, as shown in Fig. 4(c) (PSNR = 19.4, correlation coefficient = 0.88). It is observed that the quality of the watermarked image, seen with the naked eye, is not as good as that of the Liu and Tan scheme. The reason is the embedding step E3, whereby the whole watermark is multiplied by the coefficient α, instead of only the singular values of the watermark as in the Liu and Tan scheme.
5 Conclusions
We have presented attacks on the Mohammad et al. [1] scheme, which is an improved version of the scheme proposed by Liu and Tan [2]. These attacks work due to the designers' oversight related to properties of the SVD. According to [4,5], the orthogonal matrices U and V can preserve the major information of an image. The other factor that contributes to the attacks is the higher coefficient α used during the attack. This makes the attacker's watermark more robust, and therefore the owner's watermark has less visibility when extracted. Our attacks directly invalidate the security claims made by the scheme designers, namely robustness and use for proof of ownership applications. Our results are the first known attacks on this scheme. We have also shown a limitation in the Mohammad et al. scheme whereby a higher coefficient α degrades the whole watermarked image.
References
1. Mohammad, A.A., Alhaj, A., Shaltaf, S.: An Improved SVD-based Watermarking Scheme for Protecting Rightful Ownership. Signal Processing 88, 2158–2180 (2008)
2. Liu, R., Tan, T.: An SVD-based Watermarking Scheme for Protecting Rightful Ownership. IEEE Transactions on Multimedia 4(1), 121–128 (2002)
3. Andrews, H.C., Patterson, C.L.: Singular Value Decomposition (SVD) Image Coding. IEEE Transactions on Communications 24(4), 425–432 (1976)
4. Rykaczewski, R.: Comments on an SVD-based Watermarking Scheme for Protecting Rightful Ownership. IEEE Transactions on Multimedia 9(2), 421–423 (2007)
5. Zhang, X.P., Li, K.: Comments on an SVD-based Watermarking Scheme for Protecting Rightful Ownership. IEEE Transactions on Multimedia 7(2), 593–594 (2005)
Applications of Adaptive Belief Propagation Decoding for Long Reed-Solomon Codes

Zhian Zheng1, Dang Hai Pham2, and Tomohisa Wada1

1 Information Engineering Department, Graduate School of Engineering and Science, University of the Ryukyus, 1 Senbaru Nishihara, Okinawa, 903-0213, Japan
2 Faculty of Electronics and Telecommunications, Hanoi University of Technology, 1 Dai Co Viet Street, Hai Ba Trung, Hanoi, Vietnam
[email protected], [email protected], [email protected]
Abstract. The Reed-Solomon (204,188) code has been widely used in many digital multimedia broadcasting systems. This paper focuses on a low-complexity derivation of adaptive belief propagation bit-level soft-decision decoding for this code. Simulation results demonstrate that the proposed normalized min-sum algorithm, used as the belief propagation (BP) process, provides the same decoding performance in terms of packet error rate as the sum-product algorithm. An outer adaptation scheme, which moves the adaptation of the parity check matrix out of the BP iteration loop, is also introduced to reduce decoding complexity. Simulation results show that the proposed two schemes provide a good trade-off between decoding performance and decoding complexity. Keywords: Reed-Solomon codes, Adaptive belief propagation, Sum-Product algorithm, Min-sum algorithm, Bit-level parity check matrix, Gaussian elimination.
decoding methods for a wide range of RS codes. The ABP algorithm is an iterative decoding algorithm. At each iteration of the ABP algorithm, a belief propagation (BP) operation is performed on an adapted parity check matrix, in which the columns corresponding to the least reliable bits are reduced to an identity submatrix by Gaussian elimination. As effective as it may be, the hardware implementation of the ABP algorithm is a complicated task, especially for long RS codes, due to 1) floating-point-based sum-product processing for the extrinsic information of each bit, and 2) the adaptation of the parity check matrix by Gaussian elimination being involved in every iteration loop. This paper introduces two modified methods for ABP decoding that provide a good trade-off between decoding performance and decoding complexity. The main idea of this paper can be summarized as adopting the normalized MSA (NMSA) as the BP procedure of the ABP algorithm. The other method for complexity reduction relies on eliminating the adaptation of the PCM from the iteration loop. The rest of the paper is organized as follows. In Section 2, a brief review of the ABP algorithm for RS decoding is given. Modified ABP methods with reduced complexity are proposed in Section 3. Section 4 presents simulation results of the proposed algorithms operating on infinite-precision data and on quantized data. Finally, Section 5 offers the conclusions of this paper.
2 Brief Review of the ABP Algorithm for RS Decoding

Consider a narrow-sense RS(n, k) code defined over the Galois field GF(2^q), where n denotes the number of codeword symbols and k denotes the number of data symbols. Let H_s be the parity check matrix (PCM) of this code, where H_s is an (n − k) × n matrix over GF(2^q). Additionally, the RS(n, k) code can be represented as RS(N, K) at the bit level over GF(2), with N = n × q and K = k × q. H_s then has an equivalent binary image expansion H_b (see [8] for details), where H_b is an (N − K) × N binary parity check matrix over GF(2). Now, let us consider the decoding of the RS code in the receiver. Let L = [l_1, l_2, ..., l_N] be the received bit soft information in terms of log-likelihood ratios (LLRs). The binary parity check matrix H_b and the bit soft information L are the two inputs required to run a BP algorithm for decoding, as widely used for decoding low-density parity check (LDPC) codes [9]. However, the high density of the binary parity check matrix of RS codes leads to many short cycles. As a result, the parity check matrix is not directly suitable for running a BP decoding algorithm. The main contribution of the ABP algorithm for RS codes is the insight that the BP algorithm will run effectively on a high-density parity check matrix if the cycles are eliminated within the sub-matrix corresponding to the low-reliability received bits. The ABP algorithm is shown in Fig. 1. In step 1 (ST1), Gaussian elimination is applied to transform the PCM H_b into a new PCM H_b' before the BP step. This new PCM H_b' has
the property that (N − K) unit-weight columns correspond to the (N − K) lowest-reliability bits. Step 2 (ST2) consists of running a general BP algorithm on the PCM H_b'. After ST2, a hard-decision decoder (BM) can be used during each iteration to improve the performance and accelerate decoding as well. In the BP procedure of the ABP algorithm, the bit reliability updating method applies the idea of optimization methods such as gradient descent. The extrinsic information L_ext^(l) within the l-th BP iteration is generated according to the sum-product algorithm (SPA) as in formula (1) (see [7] for details),
L_ext^(l)(c_i) = Σ_{j=1, H_ji^(l)=1}^{n−k} 2 tanh^{-1} ( Π_{p=1, p≠i, H_jp^(l)=1}^{n} tanh( L^(l)(c_p) / 2 ) ).    (1)
The bit soft information is then updated as

L^(l+1) = L^(l) + γ L_ext^(l),    (2)

where 0 < γ < 1 is called a damping coefficient. As in [7], ABP provides good decoding performance for RS(204,188) and RS(255,239) if γ = 1/8.
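As a concrete illustration of Eqs. (1)-(2), a NumPy sketch of one SPA-based reliability update is given below; the dense binary-matrix representation of the adapted PCM, the loop structure and the clipping of the tanh product are our own choices, not the authors' implementation.

import numpy as np

def spa_extrinsic(H, L):
    # Eq. (1): for every bit i, sum over the checks j containing i of
    # 2*arctanh of the product of tanh(L/2) over the other bits in check j.
    L_ext = np.zeros_like(L)
    t = np.tanh(L / 2.0)
    for j in range(H.shape[0]):
        idx = np.flatnonzero(H[j])
        for i in idx:
            others = idx[idx != i]
            if others.size == 0:
                continue
            prod = np.clip(np.prod(t[others]), -0.999999, 0.999999)
            L_ext[i] += 2.0 * np.arctanh(prod)
    return L_ext

def damped_update(L, L_ext, gamma=1.0 / 8.0):
    # Eq. (2): L^(l+1) = L^(l) + gamma * L_ext^(l).
    return L + gamma * L_ext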
Fig. 1. Structure of original ABP
Fig. 2. Structure of outer adaptation scheme
3 Modified ABP with Reduced Complexity

3.1 Normalized Min-Sum Algorithm (NMSA)

For the calculation of the extrinsic information L_ext^(l), the original ABP methodology employs the SPA using tanh(·) processing as in Eq. (1). The tanh(·) processing requires operations on infinite-precision data and results in high implementation complexity. It is well known that the complexity of the SPA can be reduced using the "min-sum" approximation (known as the min-sum algorithm, MSA). The calculation of L_ext^(l) using the MSA is expressed as
L_ext^(l)(c_j) = Σ_{i=1, H_ij^(l)=1}^{n−k} ( Π_{p=1, p≠j, H_ip^(l)=1}^{n} sign[L(c_p)] ) ( min_{p=1, p≠j, H_ip^(l)=1}^{n} |L(c_p)| ).    (3)
The MSA greatly reduces the complexity but incurs a distinct loss of decoding performance. The authors of [7] indicate that the MSA approximation results in a performance loss of about 0.3 dB compared to the SPA (see Fig. 7 of [7]). The proposed NMSA for the calculation of L_ext^(l) for ABP is inspired by [10]. The NMSA is based on the fact that the magnitude of the extrinsic information obtained by Eq. (3) is always larger than the one obtained from Eq. (1). To decrease the magnitude difference, it is natural to divide L_ext^(l) of Eq. (3) by a normalization factor β, which is greater than one. Eq. (3) is then replaced by Eq. (4):

L_ext^(l)(c_j) = (1/β) Σ_{i=1, H_ij^(l)=1}^{n−k} ( Π_{p=1, p≠j, H_ip^(l)=1}^{n} sign[L(c_p)] ) ( min_{p=1, p≠j, H_ip^(l)=1}^{n} |L(c_p)| ).    (4)
To obtain the best performance, β should vary with the signal-to-noise ratio (SNR) and with the iteration number. However, to keep the complexity as low as possible, β is kept constant for all iterations and all SNR values. By using the normalization factor β, a new damping coefficient α = γ/β is obtained for the bit soft information update of Eq. (2) as follows:
L^(l+1) = L^(l) + (γ/β) L_ext^(l) = L^(l) + α L_ext^(l).    (5)
A simple approach to determine the normalization factor β can be found in [10]. It is found that the error-correcting performance of the NMSA is exactly the same as that of the SPA if α = γ/β = 1/16 for both RS(255,239) and RS(204,188).
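The same update with the min-sum approximation and normalization can be sketched as follows; since γ = 1/8 and the reported α = γ/β = 1/16, the implied β is 2, and the normalization is folded into α in the update step (this folding is our inference from the reported constants).

import numpy as np

def minsum_extrinsic(H, L):
    # Eq. (3): per check, sign product times the minimum magnitude of the other bits.
    L_ext = np.zeros_like(L)
    for i in range(H.shape[0]):
        idx = np.flatnonzero(H[i])
        for j in idx:
            others = idx[idx != j]
            if others.size == 0:
                continue
            L_ext[j] += np.prod(np.sign(L[others])) * np.min(np.abs(L[others]))
    return L_ext

def nmsa_update(L, L_ext, alpha=1.0 / 16.0):
    # Eq. (5): normalization (1/beta) and damping (gamma) merged into alpha = gamma/beta.
    return L + alpha * L_ext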
3.2 Outer Adaptation Scheme of Parity Check Matrix
The original ABP algorithm adapts the parity check matrix at each BP iteration loop using Gaussian elimination. This adaptation makes a hardware implementation of the procedure impractical for long RS codes because of the serial processing required by Gaussian elimination. The outer adaptation of the parity check matrix, which moves the adaptation out of the BP iteration loop, results in BP running on the same parity check matrix for all iterations. This means that Gaussian elimination is run only once per decoding, and the complexity of ABP is largely reduced. This reduced-complexity scheme is shown in Fig. 2. Here, we call the adaptation scheme of the original ABP algorithm shown in Fig. 1 the inner adaptation scheme and the proposed adaptation scheme shown in Fig. 2 the outer adaptation scheme.
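The structural difference between the inner and outer adaptation schemes (Fig. 1 versus Fig. 2) can be summarized by the following sketch; adapt, bp_iter and hard_decode stand for the Gaussian-elimination, BP-update and BM blocks and are supplied by the caller, so none of the names refers to an existing library.

def abp_decode(L, H, n_iter, adapt, bp_iter, hard_decode, outer_adaptation=True):
    # Outer adaptation: run Gaussian elimination on the PCM once, before the loop.
    if outer_adaptation:
        H = adapt(H, L)
    for _ in range(n_iter):
        # Inner adaptation (original ABP): re-adapt the PCM in every iteration.
        if not outer_adaptation:
            H = adapt(H, L)
        L = bp_iter(H, L)              # ST2: SPA or NMSA extrinsic plus damped update
        ok, word = hard_decode(L)      # optional BM attempt on the hard decisions
        if ok:
            return word
    return None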
4 Simulation Results

The usefulness of the modified ABP is verified by evaluating the decoding performance in terms of PER using computer simulation. For the simulations, the output of the RS encoder is assumed to be BPSK-modulated and transmitted over an AWGN channel. The following notations are used in the legends. "BM" refers to the hard-decision decoder using the BM algorithm. "NMSA" refers to the ABP scheme using the proposed NMSA algorithm. "MSA" refers to the ABP scheme using the MSA algorithm. "SPA" refers to the ABP scheme using the SPA algorithm. "damping" refers to the value of the damping coefficient for the bit soft information update. "inner adap" refers to the ABP scheme with the original adaptation method, in which the adaptation of the PCM is done within each iteration. "outer adap" refers to the ABP scheme with the proposed adaptation method presented in Section 3.2.

4.1 Performance of Modified ABP Operating on Infinite Precision Data
This section presents the decoding performance of the modified ABP operating on infinite-precision data. Figs. 3, 4 and 5 show the decoding performance of the proposed NMSA. In order to highlight the effect of the NMSA method, the SPA with damping coefficient damping = 1/8 is set as the reference (optimal) scheme. Fig. 3 shows the performance of ABP with 5 BP iterations for the RS(255,239) code. The proposed NMSA outperforms the MSA by about 0.25 dB at PER = 10^-3. Even when varying the value of the damping coefficient (damping = 1/8 and damping = 1/16), it is found that the SPA is only slightly sensitive to the value of the damping coefficient. It is also seen that the performance results are very close for the NMSA and the SPA. The effectiveness of the normalized MSA can be further demonstrated by increasing the number of BP iterations. As shown in Figs. 4 and 5, for both RS(255,239) and RS(204,188), if the number of iterations is set to 20, it is observed that 1) the performance of the NMSA remains almost the same as that of the SPA, and 2) there is a 0.1 dB decoding gain over 5 iterations.
Fig. 3. Performance of NMSA with 5 iterations for RS (255,239)
Fig. 4. Performance of NMSA with 20 iterations for RS (255,239)
Fig. 5. Performance of NMSA with 20 iterations for RS (204,188)
Fig. 6. Performance of outer adaptation scheme for RS (204,188)
The decoding performance of the proposed outer adaptation scheme for RS(204,188) is shown in Fig. 6. The NMSA with damping = 1/16 and with PCM adaptation within the inner BP loop is set as the reference for the performance evaluation of the outer adaptation scheme. Simulation results show that the outer adaptation scheme performs about 0.1-0.15 dB worse than the inner adaptation scheme.

4.2 Performance of Modified ABP Operating on Quantized Data
For hardware implementation, the ABP decoder must operate on quantized data. This section shows that the proposed NMSA still works well even when operating on quantized data. As mentioned in the previous sections, the value of the damping coefficient is chosen as 1/16 for the NMSA. As a result, the update of the bit soft information using Eq. (5) can be carried out by an addition and a right shift. The following notations are used in the legend of Fig. 7. "double" refers to the ABP scheme that operates on infinite-precision data. "q(N)" refers to the ABP scheme that operates on N-bit quantized data. Based on Fig. 7, the NMSA based on 7-bit quantized data achieves almost the same performance as the NMSA operating on infinite-precision data. The MSA based on 7-bit quantized data incurs about 0.2-0.25 dB loss at a PER of 10^-3. It should be noted that the performance of the NMSA deteriorates if it operates on 6-bit quantized data.
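The remark that the Eq. (5) update reduces to an addition and a right shift when α = 1/16 can be illustrated as follows; the 7-bit saturation range is an assumption matching the quantization used in Fig. 7.

def fixed_point_update(L_q, L_ext_q, shift=4, q_bits=7):
    # L <- saturate(L + (L_ext >> shift)); shift = 4 realizes alpha = 1/16.
    lo, hi = -(1 << (q_bits - 1)), (1 << (q_bits - 1)) - 1
    return [max(lo, min(hi, l + (e >> shift))) for l, e in zip(L_q, L_ext_q)]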
Fig. 7. Performance of NMSA operating on quantized data for RS (204,188)
5 Conclusion

The ABP algorithm is an efficient bit-level, iterative soft-decision decoding methodology for RS codes. In this paper, we have presented two modified algorithms which lower the decoding complexity of the original ABP algorithm. Simulation
results showed that the proposed two algorithms are effective even for the long practical codes RS(255,239) and RS(204,188). It is shown that the NMSA used as the BP procedure provides exactly the same decoding performance as the SPA. It is also shown that the outer adaptation scheme of the parity check matrix for ABP provides a good trade-off (0.1-0.15 dB coding difference) between decoding performance and decoding complexity. The proposed NMSA operating on quantized data meets the performance of the decoder that operates on infinite-precision data.
References
1. Reed, I.S., Solomon, G.: Polynomial Codes over Certain Finite Fields. Journal of the Society for Industrial and Applied Mathematics 8(2), 300–304 (1960)
2. ETSI TS 102 427 V1.1.1: Digital Audio Broadcasting (DAB); Data Broadcasting – MPEG-2 TS Streaming (2005)
3. ETSI EN 300 429 V1.2.1: Digital Video Broadcasting (DVB); Framing Structure, Channel Coding and Modulation for Cable Systems (1998)
4. ETSI EN 300 744 V1.5.1: Digital Video Broadcasting (DVB); Framing Structure, Channel Coding, and Modulation for Digital Terrestrial Television (2004)
5. ISDB-T: Terrestrial Television Digital Broadcasting Transmission. ARIB STD-B31 (1998)
6. Berlekamp, E.R.: Algebraic Coding Theory. McGraw-Hill, New York (1960)
7. Jiang, J., Narayanan, K.R.: Iterative Soft-Input-Soft-Output Decoding of Reed-Solomon Codes by Adapting the Parity Check Matrix. IEEE Transactions on Information Theory 52(8), 3746–3756 (2006)
8. Lin, S., Costello, D.J.: Error Control Coding: Fundamentals and Applications. Prentice Hall, New Jersey (1983)
9. Kschischang, F.R., Frey, B.J., Loeliger, H.-A.: Factor Graphs and the Sum-Product Algorithm. IEEE Transactions on Information Theory 47(2), 498–519 (2001)
10. Chen, J.H., Dholakia, A., Eleftheriou, E., Fossorier, M.P.C., Hu, X.Y.: Reduced-Complexity Decoding of LDPC Codes. IEEE Transactions on Communications 53(8), 1288–1299 (2005)
Dynamic Routing for Mitigating the Energy Hole Based on Heuristic Mobile Sink in Wireless Sensor Networks

Seong-Yong Choi1, Jin-Su Kim1, Seung-Jin Han2, Jun-Hyeog Choi3, Kee-Wook Rim4, and Jung-Hyun Lee1

1 Dept. of Computer Science Engineering, Inha University
2 School of Information & Media, Kyungin Women's College
3 School of Management & Tourism, Kimpo College
4 Dept. of Computer and Information Science, Sunmoon University, South Korea
[email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Abstract. Because the nodes of a sensor network have limited resources and are easily exposed to harsh external environments, they should be able to use energy efficiently, send data reliably, and cope properly with changes in the external environment. Furthermore, the lifetime of networks adopting multi hop routing is shortened by the energy hole, which is the rapid depletion of energy in the nodes surrounding the sink. This study proposes Dynamic Routing, which addresses these requirements at the same time by using a dynamic single path, monitoring its own transmission process, and moving the sink heuristically in response to changes in the surrounding environment. According to the results of our experiment, the proposed method increased network lifetime, mitigated the energy hole, and enhanced adaptability to topological changes. Keywords: Wireless Sensor Network, Energy Efficiency, Topology Adaptation, Mobile Sink, Dynamic Routing.
in data transmission rather than in detection of data and processing of detected data, an energy-efficient routing algorithm is essential [1][2][3][4]. Multi hop routing can reduce packet collisions, enable channel reuse in different regions of a wireless sensor network, lower the energy consumption of sensor nodes, and extend the lifetime of sensor nodes, so it is suitable for wireless sensor networks [5]. Multi hop routing algorithms in sensor networks include flooding [6], Directed Diffusion [7], and GRAB [8]. In a multi hop sensor network with a stationary sink, the nodes close to the sink not only have to detect events but also have to relay data detected by other nodes to the sink. That is, because the nodes surrounding the sink deplete their limited energy quickly, the entire network stops operating although most nodes in the network have sufficient energy; this problem is called the energy hole [9]. A mobile sink can solve the energy hole and increase network lifetime by balancing the energy consumption of nodes through changing the data flow. The authors of [10] have theoretically proven that, under the conditions of short-path routing and a round network region, moving along the network periphery is the optimum strategy for a mobile sink. But fixed-track moving strategies lack adaptability to different networks and have to be redesigned when network devices are deployed in various circumstances. This study proposes a dynamic routing algorithm aiming at reliable data transmission, energy efficiency, topological adaptability to changes in the external environment, and mitigation of the energy hole at the same time, using a dynamic single path and a mobile sink in a multi hop sensor network. The proposed method maintains a cost_table and transmits data after searching for the optimal single path using the COST to the sink. In response to changes in the external environment, each node monitors the transmission process. If a node detects a damaged path, it changes the optimal path dynamically in a way that distributes energy consumption evenly over the nodes, and by doing so, it enhances network reliability and the energy efficiency of each node. On a change of network topology, only the changed part is reconstructed instead of the whole network. Therefore the proposed method minimizes unnecessary energy consumption and does not require periodic flooding to adapt to topological changes. Furthermore, the sink monitors changes in the residual energy of its surrounding nodes, and determines the time and place of its movement heuristically. This mitigates the energy hole and extends network lifetime. Chapter 2 reviews previous studies, and Chapter 3 describes the method proposed in this study. Chapter 4 presents the results of the experiment, and Chapter 5 analyzes them.
2 Related Studies

Flooding [6] is the most reliable and fastest of the multi hop routing methods by which a source node that detects events in the surrounding environment transmits its collected data to a sink. Also, it does not require costly topology maintenance or a complex route discovery algorithm. However, each node transmits data to its neighbor nodes regardless of whether they have already received the data and, as a result, there are overlapping and implosion problems. Furthermore, because each node does not consider the energy level of itself and its neighbor nodes, the algorithm is not energy-efficient.
Directed Diffusion [7] is a data forwarding protocol in which a sink floods its interest to build reverse paths from all potential sources to the sink. After receiving data from the source, the sink refreshes and reinforces the interest. Although Directed Diffusion has the potential for significant energy savings, it copes with communication errors and node failures by periodically flooding data to repair the path. GRAB (GRAdient Broadcast) [8] is a routing protocol that improves on Directed Diffusion. GRAB transmits data according to a cost field and a credit. Data detected by a source node are transmitted in the direction that decreases COST. GRAB uses multiple paths for reliable data transmission, and uses the credit to adjust the width of the multiple paths. As the width of the multiple paths is controlled using the credit, the reliability of data transmission increases. However, the use of multiple paths causes additional energy consumption, so it is not desirable in terms of energy efficiency. Furthermore, in a sensor network using limited resources, the topology changes frequently, and such changes require modification of the network structure. For this, the network is updated periodically or when changes are detected by an algorithm for detecting network changes. However, as network reconstruction involves all the nodes in the network, it increases energy consumption and shortens network lifetime.
3 Dynamic Routing Algorithm

Table 1. Types and functions of packets used in the dynamic routing algorithm

Packet type | Function
INIT | A packet that the sink broadcasts over the network at the beginning of network construction or just after the sink has moved.
NEG | A packet broadcast in order to determine whether a sensing node that has detected an event can be the source node to send data to the sink.
TRN | A packet broadcast by the source node or a node that sends data to the sink.
ACK | A packet broadcast to neighbor nodes without requesting a reply.
REQ | A packet requesting the current remaining energy of neighbor nodes within a specific number of hops in order to decide the destination of a sink move.
REP | A packet broadcast by a node that has received a REQ packet to report its current remaining energy.
HELLO | A packet for a newly added node or a moved node to advertise its existence to its neighbor nodes.

In Dynamic Routing, each node must build and maintain a cost_table. In order to maintain the optimal path to the sink according to changes in the energy and topology of surrounding nodes, each node should set HC, NAE, and COST and inform its neighbor nodes at a hop's distance of these data. Here, HC is the number of hops between the sink and the node, NAE is the average residual energy of the nodes on the path from the node to the sink, and COST is the cost of transmission to the sink calculated using HC and NAE. For this, each node should measure NRE, which is its own normalized residual energy, accurately. This is because a decrease in the residual energy
of a node affects other nodes' setting of COST. However, if data were transmitted to neighbor nodes whenever NRE changes, traffic would increase and network lifetime would be reduced. For this reason, in Dynamic Routing, routing information is updated and sent to neighbor nodes only when a node has to send a packet. On receiving such a packet, the neighbor nodes immediately update their cost_table with the routing information of the sending node. The process of Dynamic Routing is composed of five sub-processes: initialization, the initial process of network construction; negotiation, which selects a representative node among nodes that have detected the same event; transmission, which sends detected data to the sink; reconfiguration, which copes with the addition and movement of nodes; and sink mobility, related to monitoring the occurrence of an energy hole and moving the sink.

3.1 Initialization

At the beginning of network construction, the sink transmits an INIT packet. On transmitting the INIT packet, the sink sets the transmission node ID to sink, and HC and NAE to 0. Node n, which receives an INIT packet, prevents redundant transmission of INIT by waiting for 1HD_Time (1 hop distance time: the time for the node farthest from it among its neighbor nodes at a hop's distance to receive the transmitted packet) after first receiving an INIT sent by a neighbor node ni, during which it may receive further INIT packets. After the lapse of 1HD_Time, it calculates (1)-(7) by searching its cost_table, and modifies the packet to INIT(node_ID_n, HC_n, NAE_n). In the initialization process, all the nodes send INIT only once. In order to explain the mechanism of the dynamic routing proposed in this study, we set up a network consisting of three sensor nodes and a sink as in Figure 1.
j* = argmin_j cost_table(n).COST_j    (1)

HC_min = cost_table(n).HC_j*    (2)

NAE_min = cost_table(n).NAE_j*    (3)

NRE_n = E_residual(n) / E_initial(n)    (4)

HC_n = HC_min + 1    (5)

NAE_n = (HC_min × NAE_min + NRE_n) / (HC_min + 1)    (6)

COST_n = HC_n / NAE_n    (7)

All the nodes were simplified so that NRE decreases by 0.03 when sending a packet and by 0.02 when
receiving a packet, and it was assumed that energy is not consumed in computing and event detection. NRE in Figure 1(a) is the initial NRE of each node. At the beginning of network construction, the sink creates INIT (sink, 0, 0) and broadcasts it to its neighbor nodes at a hop’s distance.
Fig. 1. The illustration of an initialization process

(a)
① On receiving INIT_sink, n1 records HC_sink, NAE_sink, and COST_sink = 0 in its cost_table as a neighbor node, and waits for 1HD_Time. After the lapse of 1HD_Time, it measures NRE_n1 = 0.68, searches COST_min and the corresponding HC_min and NAE_min in its cost_table, and calculates HC_n1, NAE_n1, and COST_n1 = 1.471 by (5), (6), and (7). Because n1 does not yet have its own cost set in cost_table, it records HC_n1, NAE_n1, and COST_n1 in cost_table, modifies the packet to INIT(n1, 1, 0.68), and broadcasts it as in (b). At that time, NRE_n1 decreases to 0.65.
② n2, which behaves similarly to n1, modifies and broadcasts INIT(n2, 1, 0.78) as in (c). At that time, NRE_n2 decreases to 0.75. The cost_table in (a) is that after n1 and n2 have sent INIT_n1 and INIT_n2, respectively.

(b)
① The sink drops the received INIT_n1.
② On receiving INIT_n1, n2 calculates COST_n1 = HC_n1/NAE_n1 = 1.471 and records HC_n1, NAE_n1, and COST_n1 in its cost_table as a neighbor node. After receiving INIT_n1, n2 waits for 1HD_Time and, if another INIT arrives, receives it and records it in cost_table as a neighbor node as above. After the lapse of 1HD_Time, n2 measures NRE_n2 = 0.73 and calculates (1)-(7) by searching cost_table. Because the calculated COST_n2 = 1.370 is higher than 1.282, the cost already in cost_table, n2 neither updates HC_n2, NAE_n2, and COST_n2 nor rebroadcasts INIT_n2.
③ As in (b), n3, which has received INIT_n1, also records HC_n1, NAE_n1, and COST_n1 in its cost_table as a neighbor node. Because n3 has to wait for 1HD_Time, it waits until INIT_n2 is received in (c). At that time, NRE_n3 decreases to 0.58. The cost_table in (b) is the cost_table of n2 and n3 updated after all the neighbor nodes have received INIT_n1 sent by n1.

(c)
① n1, which has received INIT_n2, behaves as in (b)②. At that time, NRE_n1 decreases to 0.63.
② The sink drops the received INIT_n2.
③ As in (c), n3, which has received INIT_n2, records HC_n2, NAE_n2, and COST_n2 in its cost_table as a neighbor node. At that time, NRE_n3 decreases to 0.56. In (b)③, after the lapse of 1HD_Time from receiving INIT_n1, n3 calculates (1)-(7) by searching cost_table. Because its own cost is not yet set in cost_table, n3 records HC_n3, NAE_n3, and COST_n3, modifies the packet to INIT(n3, 2, 0.67), and broadcasts it as in (d). At that time, NRE_n3 decreases to 0.53. The cost_table in (c) is that of n1 and n3 updated after all the neighbor nodes have received INIT_n2 sent by n2.

(d)
① n1, which has received INIT_n3, behaves as in (b)②. At that time, NRE_n1 decreases to 0.61.
② n2, which has received INIT_n3, behaves as in (b)②. At that time, NRE_n2 decreases to 0.71. The cost_table in (d) is that of n1 and n2 updated after all the neighbor nodes have received INIT_n3 sent by n3.
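A compact sketch of the per-node update defined by Eqs. (1)-(7) is given below; the list-based cost_table is an assumption made for illustration, and the printed example reproduces n1's value COST = 1/0.68 ≈ 1.471 from the initialization example.

def update_own_cost(cost_table, nre):
    # cost_table: list of (HC, NAE, COST) tuples, one per neighbor node.
    hc_min, nae_min, _ = min(cost_table, key=lambda entry: entry[2])   # Eqs. (1)-(3)
    hc = hc_min + 1                                                    # Eq. (5)
    nae = (hc_min * nae_min + nre) / (hc_min + 1)                      # Eq. (6)
    cost = hc / nae                                                    # Eq. (7)
    return hc, nae, cost

# n1 hears only the sink (HC = 0, NAE = 0, COST = 0) and measures NRE = 0.68:
print(update_own_cost([(0, 0.0, 0.0)], nre=0.68))   # (1, 0.68, 1.4705...)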
3.2 Negotiation

In order to prevent redundant data transmission, the node with the smallest COST among the nodes that have detected the same event is selected as the source node for sending the detected data. For this, every node n that has detected the event updates its cost_table with HC_n, NAE_n, and COST_n calculated according to its residual energy at the time of detection, creates and broadcasts NEG(node_ID_n, HC_n, NAE_n), and then waits for 1HD_Time. Node n, which may receive NEG packets from neighbor nodes ni that have detected the same event during 1HD_Time, can become the source node only when its own cost is the lowest. At that time, the energy consumed by a detection node is as in Equation (8), and the energy consumed by the source node is as in Equation (9).
E_detection(n) = E_broadcast(n) + Σ_{ni ∈ A} E_receive    (8)

E_source(n) = E_detection(n) + E_broadcast(n)    (9)
where A is the set of nodes, among all the nodes ni within a hop's distance of the event, that can communicate with detection node n; E_receive is the energy consumption for receiving a packet; and E_broadcast(n) is the energy consumption for node n to broadcast a packet. As in Figure 2, if n3 detects a fire, it updates its cost_table immediately and broadcasts NEG(n3, 2, 0.655). At that time, NRE_n3 decreases to 0.50. n2 is assumed to have gone down before the occurrence of the fire. Because n3 has not received a NEG with a cost lower than its own cost during 1HD_Time, it becomes the source node that sends the detected data. At that time, n1 updates the information on its neighbor nodes in cost_table with the received NEG_n3, and NRE_n1 decreases to 0.59.
Fig. 2. The illustration of a negotiation process
3.3 Transmission

The transmission node n that delivers received data toward the sink (or the source node) updates its cost_table with its own HC_n, NAE_n, and COST_n calculated by (1)-(7) according to its residual energy. Then it creates and broadcasts TRN_n, waits for 1HD_Time, and monitors whether the (node_ID_n, HC_n, NAE_n, COST_n) data are transmitted safely during that time. Among the neighbor nodes of node n, only one node changes the received TRN_n and retransmits it. If node n has not received a TRN with a decreased cost from a neighbor node ni during 1HD_Time, it judges that the node that was to receive TRN_n has gone down, deletes the records of the corresponding node from cost_table, and repeats the process above. If it has, it ends the monitoring process. Figure 3 is an example of the process by which data detected by source node n3 are transmitted to the sink.
Fig. 3. The illustration of a transmission process
① Source node n3 calculates (1)-(7) by searching its cost_table and updates cost_table. Then it creates and broadcasts TRN(n3, 2, 0.64, 1.282). After the transmission, NRE_n3 decreases to 0.47. The nodes that receive TRN_n3 are n1 and n2. n2 has gone down, and because n1 has a cost higher than 1.282, the cost carried in TRN_n3, it judges that it is not on the optimal path and therefore does not rebroadcast TRN_n3. However, it updates the neighbor node information in its cost_table with the received TRN_n3. At that time, NRE_n1 decreases to 0.57.
② n3, which has sent TRN_n3 in ①, expects the retransmission of TRN from n2 but does not receive a TRN with a decreased cost, due to the failure of n2. n3 waits for 1HD_Time and then removes the record of n2 from its cost_table. Again, n3 calculates (1)-(7) by searching cost_table, updates cost_table, and then creates and broadcasts TRN(n3, 2, 0.575, 1.471). After the transmission, NRE_n3 decreases to 0.44. n1, which has received TRN_n3, updates the neighbor node information in its cost_table with the received TRN_n3. At that time, NRE_n1 decreases to 0.55. Because its own cost is equal to 1.471, the cost carried in TRN_n3, n1 judges that it is on the optimal path. Then n1 calculates (1)-(7) by searching cost_table and updates cost_table. In addition, it changes the packet to TRN(n1, 1, 0.55, 0) and broadcasts it. After the transmission, NRE_n1 decreases to 0.52. The nodes that receive TRN_n1 are n3 and the sink. n3 updates its neighbor node information with the received TRN_n1 and ends the monitoring process. At that time, NRE_n3 decreases to 0.42.
③ The sink ends the transmission process by broadcasting ACK(sink, 0, 0), which does not request a reply, in order to prevent looping. At that time, NRE_n1 decreases to 0.50. The cost_table in Figure 3 is that of n2 and n3 updated after the TRN sent by source node n3 has been delivered to the sink.
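The forwarding test a node applies when it overhears a TRN packet can be sketched as follows; the (node_ID, HC, NAE, COST) tuple layout mirrors the example above, and the rest is our simplification rather than the authors' implementation.

def on_trn_received(my_id, my_hc, my_nae, my_cost, my_next_hop_cost, trn):
    # trn = (node_ID, HC, NAE, COST); the COST field is the cost of the sender's
    # chosen next hop, so only that next hop rebroadcasts the packet.
    _sender_id, _sender_hc, _sender_nae, expected_cost = trn
    if my_cost != expected_cost:
        return None                       # not the intended next hop: stay silent
    # Rebroadcast with this node's own data and the cost of ITS next hop
    # (0 when the next hop is the sink), e.g. TRN(n1, 1, 0.55, 0) in Figure 3.
    return (my_id, my_hc, my_nae, my_next_hop_cost)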
3.4 Reconfiguration

Node n, which is either a node that has finished moving or a newly added node, initializes its cost_table, creates and broadcasts HELLO(node_ID_n), and waits for 1HD_Time. Each neighbor node ni of node n that receives HELLO_n replies immediately without waiting for 1HD_Time. Node ni updates its cost_table with HC_ni, NAE_ni, and COST_ni calculated by (1)-(7) according to its current residual energy, and then
creates and broadcasts ACK(node_ID_ni, HC_ni, NAE_ni). Node n, which has received the ACK packets during 1HD_Time and recorded its neighbor nodes in cost_table, updates cost_table with HC_n, NAE_n, and COST_n calculated by (1)-(7) according to its residual energy, and then creates and broadcasts ACK(node_ID_n, HC_n, NAE_n). When node n's neighbor nodes receive ACK_n and update their cost_tables with the information of node n, the reconfiguration process is finished. Figure 4 is an example showing a case where node n4 is newly added to the network or moved to a new location.
Fig. 4. The illustration of a reconfiguration process
① ②
n4 initializes cost_table and broadcasts HELLO , and then waits for 1HD_Time. At that time, NRE decreases to 0.87. The nodes receiving HELLO are the sink, n1, and n3. The sink broadcasts ACK sink, 0, 0 . n1 measures NRE =0.48, calculates (1) ~ (7) by searching cost_table, and then updates cost_table. Then, it creates and broadcasts ACK n1, 1, 0.48 . After the transmission, NRE decreases to 0.45. n3 measures NRE =0.40, calculates (1) ~ (7) by searching cost_table, and then updates cost_table. Then, it creates and broadcasts ACK n3, 2, 0.475 . After n4’s neighbor nodes have sent ACK , n4 and its neighbor nodes receive ACK . n1 receives ACK and ACK and records neighbor node information in cost_table. At that time, NRE decreases to 0.41. n3 receives ACK and records neighbor node information in cost_table. At that time, NRE decreases to 0.35. n4, which has sent HELLO , receives ACK , ACK , ACK and records neighbor node information in cost_table. At that time, NRE decreases to 0.81. n4 calculates (1) ~ (7) by searching cost_table. Then, it creates and broadcasts ACK n4, 1, 0.81 . After the transmission, NRE decreases to 0.78. All the neighbor nodes of n4 that receive ACK calculate COST =1.235, and record neighbor node information in cost_table. At that time, NRE decreases to 0.39, and NRE to 0.33. Cost_table in Figure 4 is that of n1, n3, and n4 changed after the completion of the reconfiguration process.
3.5 Heuristic Sink Mobility

What should be considered in sink movement for mitigating the energy hole is how to set the time and place of the sink movement. For this, Dynamic Routing uses the maximum move hop count (MMHC), which is the maximum movable distance of the sink, and the average residual energy surrounding the sink (ARES), which is the average residual energy of all the neighbor nodes within a hop of the sink. In Dynamic Routing, the sink can monitor, in real time, changes in the remaining energy of all its neighbor nodes through the received data packets, so it calculates ARES whenever it receives a new data packet, and if the resulting value is less than the predefined threshold, it begins to move. At that time, the sink moves to the node with the highest energy among the nodes within MMHC in order to make energy consumption even among the nodes. For this, the sink sends a REQ packet containing MMHC. The node that has received the REQ packet discards the packet if its HC is larger than MMHC, or otherwise retransmits the packet to its neighbor nodes. Furthermore, the node that has received a REQ packet sends the sink a REP packet in order to report its current energy. The sink, having received REP packets from all the nodes within MMHC, moves to the node with the highest remaining energy. After finishing its move, the sink rebuilds the network by sending an INIT packet.
/* INIT_ARES : the average remaining energy of nodes within a distance of a hop from
   the sink at the beginning of network construction or just after a sink's move */
calculate INIT_ARES
P ← moving threshold
MMHC ← Maximum Move Hop Count
if (isDATA_Received()) then
    calculate ARES
    if ARES < INIT_ARES * (1 - P) then
        transfer REQ
        receive REP    // from within the range of MMHC
        move the sink to the node with the maximum remaining energy within the range of MMHC
        calculate INIT_ARES
    end if
end if

Fig. 5. The heuristic sink mobility algorithm
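To make the decision rule of Fig. 5 concrete, the following short Python sketch evaluates one step of it. The node representation (a list of dictionaries with per-node energy and hop distance to the sink) and the example values are assumptions made only for illustration; the REQ/REP exchange is abstracted into a simple scan of the nodes within MMHC.

def sink_mobility_step(nodes, init_ares, p, mmhc):
    """Return the node the sink should move to, or None if it stays put."""
    one_hop = [n for n in nodes if n["hops"] == 1]
    ares = sum(n["energy"] for n in one_hop) / len(one_hop)
    if ares >= init_ares * (1 - p):
        return None                                    # threshold not reached
    candidates = [n for n in nodes if n["hops"] <= mmhc]
    return max(candidates, key=lambda n: n["energy"])  # richest node within MMHC

# hypothetical snapshot: the one-hop neighborhood has drained below 50% of INIT_ARES
nodes = [{"id": 1, "energy": 0.40, "hops": 1},
         {"id": 2, "energy": 0.45, "hops": 1},
         {"id": 3, "energy": 0.90, "hops": 3}]
print(sink_mobility_step(nodes, init_ares=1.0, p=0.5, mmhc=7))   # -> node 3

After such a move, INIT_ARES would be recomputed around the new position and the INIT packet would rebuild the network, as in the figure.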
4 Experiment and Performance Evaluation

For the simulation, we built a 300 m × 300 m square sensor field. The number of sensor nodes in the sensor field was 100 including the sink, and the other 99 sensor nodes were deployed at intervals of 30 m. The sink was positioned at the bottom-left part of the sensor field. The sink was assumed to be a large-capacity system without an energy limitation, and all the nodes were assumed to consume 0.021 J for sending 128 bytes of data and 0.014 J for receiving the same amount of data, based on the energy consumed for sending and
receiving one bit at a transmission rate of 10 kbps in WINS NG [11]. Power consumption in standby mode and for computation was not counted. For performance evaluation, the network lifetime was defined as the period of time until the first node dies, and we generated a sensing event every second by randomly choosing a source node. For the simulation, the radio transmission range was set to 45 m uniformly for all the nodes including the sink. We conducted an experiment varying the node failure rate from 0% to 25% in steps of 5% in order to change the external environment after network construction while the sink does not move. We compared the performance of the proposed method with that of flooding and simplified GRAB, in which an ADV packet involving all the nodes is transmitted every 50 seconds in order to cope with changes of the network topology. The experiment was repeated 5 times for each routing algorithm, and the mean value was calculated. Figure 6(a) shows the average volume of data that the sink received as the node failure rate changed. In the results of the experiment, none of the three routing algorithms was significantly affected by the node failure rate.
(a) Average volume of data
(b) Average transmission time (s)
(c) Average residual energy (J)
Fig. 6. Comparison of the performance of flooding, simplified GRAB, and Dynamic Routing while the sink does not move
Simplified GRAB uses multiple paths and sends a network configuration packet periodically, whereas Dynamic Routing uses a single path and sends a network configuration packet only once at the beginning of network construction, so the average number of packets received under Dynamic Routing is higher. Figure 6(b) shows the average length of time for data transmission from a source node to the sink. According to the results of the experiment, in simplified GRAB the data transmission delay did not increase even as the node failure rate increased, owing to the use of multiple paths and the periodic transmission of a network reconfiguration packet.
In Dynamic Routing, however, the data transmission delay increased with the node failure rate. Figure 6(c) shows the average residual energy of nodes as the node failure rate changed. In the experimental results, the average residual energy was highest in Dynamic Routing, which uses a single path for data transmission and minimizes the energy consumption of nodes. Figure 7(a) shows the average volume of data packets that the sink received in the experiment, repeated five times, in which the moving threshold was varied from 10% to 90% in steps of 10% and MMHC from 2 hops to 8 hops in steps of one hop, in order to determine the optimal moving threshold and MMHC in Dynamic Routing. The experiment assumed that no sensing event happens while the sink is moving. To vary the experimental circumstances, the radio transmission range was set uniformly to 31 m, including the sink. In Figure 7(a), the reason that MMHC showed low performance at 2 hops and 3 hops is that nodes closer to the sink have a higher energy consumption rate and, as a result, some nodes consume all their energy before the energy level becomes even among all the nodes. The reason that performance is lower when MMHC is large (e.g., 8 hops) than when it is small is that the sink moves a longer distance, which distributes energy consumption unevenly over the nodes. The highest performance was obtained when MMHC was 7 hops and the moving threshold was 50%, at which 1,232.0 data packets were received on average.
Fig. 7. (a) The average volume of data packets received according to moving threshold and MMHC, (b) The average number of sink movements according to moving threshold and MMHC
Figure 7(b) shows the number of sink moves in the experiment of Figure 7(a). The lower the moving threshold, the larger the number of sink moves; but when the threshold was 0.5 or higher, the number of sink moves dropped below 10 and was not significantly affected by the MMHC value. This result is explained by the fact that Dynamic Routing decides on a move based on the change in the average energy of the nodes around the sink, and makes energy consumption even among the nodes.
In order to evaluate the performance of Dynamic Routing when the sink moves in an environment with even node density, we conducted an experiment on four cases (heuristic, periphery, random, and stationary), varying the initial energy of each node from 10 J to 40 J in steps of 10 J. The heuristic case was set to a 7-hop MMHC and a moving threshold of 50%, which showed the highest performance in the experiment of Figure 7(a). The periphery case, in which the sink moves along the outer line of the sensor field, was set to a 5-hop MMHC and a moving threshold of 30%, which showed the highest performance, although those data are not presented here. In the random case, in which the sink moves to a random place at a random time without using information on node energy, the sink moved with a probability of 0.01 (once in every 100 times) in order to make the environment similar to that of the heuristic sink movement when the initial energy of each node was 10 J. Figure 8 shows the results obtained from repeating the experiment five times for each of the heuristic, periphery, random, and stationary cases. Figure 8(a) is the average volume of data that the sink received, and Figure 8(b) is the average number of sink movements. The total average volume of data that the sink had received by the end of the network lifetime was 3075.5 in the heuristic case, 3511.45 in the periphery case, 2581.25 in the random case, and 765.1 in the stationary case.
(a) Average volume of data the sink received
(b) Average number of sink movements
Fig. 8. Performance evaluation of dynamic routing when the sink moves (even node density)
According to the results of the experiment, the performance of the periphery movement, in which the sink moves along the outer line of the network, was the highest, as theoretically proved in [10]. The reason that the random movement is not much lower in performance than the heuristic and periphery movements is that Dynamic Routing consumes node energy evenly during network operation. Furthermore, while the number of sink movements increased with the initial energy level in the random case, it was stable regardless of the initial energy level in the heuristic and periphery cases. The reason that the heuristic case makes fewer movements than the periphery one is that, in the periphery movement, the sink moves unconditionally if ARES is below the threshold, whereas in the heuristic movement it does not move if the current position is found to be optimal even if ARES is below the threshold. Figure 9 diagrams the remaining energy of each node after the end of the network lifetime in the stationary, periphery, and heuristic cases, respectively. In Figure 9(a), the energy of the nodes around the sink decreased sharply compared to the other nodes due to the energy hole, but the sink movement in Figure 9(b) and Figure 9(c) reduced the workload on the nodes within the energy hole, so energy consumption was
even among the nodes. The node energy consumption rate in Figure 9(c) is not equal to that in Figure 9(b) because, as shown in Figure 8, sometimes the sink does not move depending on the residual energy of the surrounding nodes.
(a) stationary
(b) periphery
(c) Heuristic
Fig. 9. The remaining energy of each node after the closing of the network in Dynamic Routing
In order to evaluate the performance of Dynamic Routing when the sink moves in an environment with uneven node density, we built a network consisting of 200 nodes by deploying an additional 100 nodes at regular intervals of 15 m in the area from 7.5 m to 142.5 m on each axis. Figure 10 shows the results of the experiment, presented as in Figure 8. According to the results, the total average volume of data that the sink had received by the end of the network lifetime was 2608.4 in the heuristic case, 2508.6 in the periphery case, 1766.45 in the random case, and 702.8 in the stationary case. In the periphery movement, the sink moves regardless of node density, but in the heuristic movement, the sink moves according to node density; as a result, the residual energy of the nodes was consumed more evenly and the average volume of data received increased in the heuristic case.
(a) Average volume of data the sink received
(b) Average number of sink movements
Fig. 10. Performance evaluation of dynamic routing when the sink moves (uneven node density)
In all the cases, the network lifetime was shorter than in the experiment of Figure 8 because the addition of 100 nodes increased the volume of packet transmission. In the heuristic and periphery cases, however, the number of sink movements decreased because, with the additional 100 nodes, ARES decreased more slowly and the condition for a sink movement was reached less often.
5 Conclusions and Future Works

This study proposed Dynamic Routing, which mitigates the energy hole while enhancing energy efficiency, reliability of data transmission, and adaptability to changes in the external environment for a multi-hop wireless sensor network. In the proposed method, each node transmits data through a single path, monitors its own data transmission, and changes the path dynamically for an even distribution of energy consumption over the entire network. Furthermore, when the network topology has changed, it reconfigures only the parts involved in the change, which minimizes unnecessary energy consumption. Moreover, in order to maximize the network lifetime, the sink decides its movement heuristically. When the proposed Dynamic Routing was compared with an existing method using multi-hop routing, Dynamic Routing extended the network lifetime, but the data transmission delay increased with the node failure rate. In the experiment evaluating the performance of sink movement in an environment with even node density, the network lifetime was shorter for the heuristic movement than for the periphery movement, but when the node density was uneven, the heuristic case made fewer sink movements and achieved a longer network lifetime. As this study did not consider data generated while the sink is moving, however, further research is necessary to solve the resulting reliability problems. In addition, we need to solve the problem that the network lifetime is shortened by increased packet transmission when a large number of nodes are added.
Acknowledgement “This research was supported by the MKE(The Ministry of Knowledge Economy), Korea, under the ITRC(Information Technology Research Center) support program supervised by the NIPA(National IT Industry Promotion Agency)” (NIPA-2010C1090-1031-0004).
References 1. Akkaya, K., Younis, M.: A survey on routing protocols for wireless sensor networks. Adhoc Networks 3(3), 325–349 (2005) 2. Karaki, N.A.I., Kamal, E.: Routing techniques in wireless sensor networks: A survey. IEEE Wireless Communications 11(6), 6–28 (2004) 3. Niculescu, D.: Communication paradigms for sensor networks. IEEE Communications Magazine 43(3), 116–122 (2005) 4. Bi, Y., Sun, L., Ma, J., Li, N., Khan, I.A., Chen, C.: HUMS: An autonomous moving strategy for mobile sinks in data-gathering sensor networks. EURASIP Journal on Wireless Communication and Networking, 1–15 (2007) 5. Zheng, Z., Wu, Z., Lin, H., Zheng, K.: WDM: An Energy-Efficient Multi-hop Routing Algorithm for Wireless Sensor Networks. In: Proc. International Conference on Computational Science, pp. 461–467 (2005) 6. Zhang, Y., Fromherz, M.: A robust and efficient flooding-based routing for wireless sensor networks. Journal of Interconnection Networks 7(4), 549–568 (2006)
7. Intanagonwiwat, C., Govindan, R., Estrin, D.: Directed diffusion: a scalable and robust communication paradigm for sensor networks. In: Proc. of ACM MobiCom, pp. 56–67 (2000) 8. Ye, F., Zhong, G., Lu, S., Zhang, L.: Gradient Broadcast: A Robust Data Delivery Protocol for Large Scale Sensor Networks. Springer Science Wireless Networks 11, 285–298 (2005) 9. Marta, M., Cardei, M.: Improved sensor network lifetime with multiple mobile sinks. Pervasive and Mobile Computing 5(5), 542–555 (2009) 10. Luo, J., Hubaux, J.P.: Joint mobility and routing for lifetime elongation in wireless sensor networks. In: Proc. of 24th Annual Conference of the IEEE Computer and Communications Societies, pp. 1735–1746 (2005) 11. Sensoria Corporation, WINS NG Power Usage Specification: WINS NG 1.0 (2000), http://www.sensoria.com/ 12. Vergados, D.J., Pantazis, N.A., Vergados, D.D.: Energy-efficient route selection strategies for wireless sensor networks. Mob. Netw. Appl. 13(3-4), 285–296 (2008) 13. Chang, J.-H., Tassiulas, L.: Maximum Lifetime Routing in Wireless Sensor Networks. In: Proc. of the 4th Conference on Advanced Telecommunications/Information Distribution Research Program, pp. 609–619 (2000) 14. Choi, S.-Y., Kim, J.-S., Han, S.-J., Choi, J.-H., Rim, K.-W., Lee, J.-H.: Dynamic Routing Algorithm for Reliability and Energy Efficiency in Wireless Sensor Networks. In: Lee, Y.h., et al. (eds.) FGIT 2009. LNCS, vol. 5899, pp. 277–284. Springer, Heidelberg (2009)
Grammar Encoding in DNA-Like Secret Sharing Infrastructure

Marek R. Ogiela and Urszula Ogiela
AGH University of Science and Technology
Al. Mickiewicza 30, PL-30-059 Kraków, Poland
Tel.: +48-12-617-38-543; Fax: +48-12-634-15-68
{mogiela,ogiela}@agh.edu.pl
Abstract. This publication presents a new technique for splitting secret information based on mathematical linguistics methods and allowing sequences of one or several bits to be coded in the way used in DNA cryptography. This solution represents a novel approach allowing DNA substitutions to be generalised and bit blocks of any length to be coded. Apart from extending the capability of coding in DNA cryptography, the technique presented will also make it possible to develop a new type of a hierarchical secret splitting scheme. Such schemes can be employed when developing new types of cryptographic protocols designed for the intelligent splitting and sharing of secrets. Keywords: DNA-cryptography, secret sharing protocols, mathematical linguistics, information coding.
Thus the purpose of this publication is to present new algorithms and protocols in the form of so-called linguistic threshold schemes which allow important information to be hierarchically split. The splitting algorithm makes use of information coding with a suitably defined context-free grammar, and the coding process itself depends on the different sizes of encrypted bit blocks obtained for input data. Particular blocks will be coded into terminal symbols of the grammar in a way resembling bit substitutions with the nitrogen bases (A, T, G, C) in DNA cryptography. The proposed method can also be implemented in the form of a protocol for distributing information among a group of trusted individuals. This protocol can be followed in various cryptographic versions, for example with the involvement of a trusted arbiter who will generate component shadows of the shared information, but also in a version without any privileged instance, in which all participants of this protocol have exactly equal rights. This protocol can also be implemented in various hierarchical structures of information flow, i.e. ones in which there is a need to independently split the secret for various information management levels or for various levels of knowledge of/access to strategic data. If sufficiently strong context-free grammars are used, the proposed algorithm will also be significantly more universal than simple DNA sequence coding techniques, mainly because of the ability to code not just single or double pairs of bits on one nitrogen base, but also the ability to code longer bit sequences on a single symbol of the context-free grammar.
2 Information Coding in DNA Cryptography

The potential computational capabilities of molecules were not explored until the 1970s. The first ideas of combining computers with DNA chains appeared in 1973, when Charles Bennett published a paper in which he proposed a model of a programmable molecular computer capable of executing any algorithm. However, only about 20 years after this publication were the first successful attempts made. In 1994, Leonard Adleman [1] became the first to make calculations using a DNA computer, solving the Hamiltonian path problem for seven cities. Since then, more solutions using DNA coding for various cryptographic problems have appeared (for copying information, steganography, cryptanalysis etc.). In practice, every method based on DNA coding boils down, in at least one operating stage, to storing data in DNA molecules (including real amino acids). It is at this level of coding that we already have a number of opportunities for using the acids as a medium. The most obvious one is using the structure of particular nucleotides. As there are four types of them, one base can store 2 bits of information. As an example we can assume the following: • • • •
We can also code in such a way that one pair of nucleotides, independently of its polarisation (i.e. the binding of a purine to a pyrimidine), codes one bit. For example:
• the A-T binding may represent 0;
• the G-C binding may represent 1.
This approach makes possible code breaking even more difficult by the so-called masking of the overt text (this happens in homophonic encryption, which records the same characters as different nucleotides). What is important, however, is that the reading from the homophone table is random, which hides the statistical characteristics of the coded characters. For such possible coding methods, the further chapters will introduce a generalisation allowing bit blocks of any length to be coded using terminal symbols of a context-free grammar. Blocks coded in this way will then constitute the information which will undergo the secret sharing procedure using one of the known threshold schemes of information sharing [4, 12, 13, 14].
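As a small illustration of the substitution coding described in this section, the Python sketch below maps bit pairs to nucleotides and back. The particular assignment (A = 00, C = 01, G = 10, T = 11) is an assumed example, not the one fixed by any specific DNA-cryptographic scheme.

BASE_OF_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}   # assumed 2-bit mapping
BITS_OF_BASE = {base: bits for bits, base in BASE_OF_BITS.items()}

def bits_to_dna(bits):
    """Encode an even-length bit string as a nucleotide sequence (2 bits per base)."""
    assert len(bits) % 2 == 0
    return "".join(BASE_OF_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_bits(seq):
    """Decode a nucleotide sequence back into the original bit string."""
    return "".join(BITS_OF_BASE[base] for base in seq)

encoded = bits_to_dna("0110110001")          # -> "CGTAC"
assert dna_to_bits(encoded) == "0110110001"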
3 Characterisation of Mathematical Linguistic Methods

Mathematical linguistic methods were first developed by N. Chomsky in publications [6] on elaborating grammar formalisms allowing natural languages to be modelled. In these publications a classification of formal grammars was presented, and this later contributed a great deal to the creation of computer translation techniques as well as the theory of automata and transcription systems. The following four classes were distinguished among the basic types of grammars:
• unrestricted (type-0) grammars;
• context-sensitive grammars;
• context-free grammars;
• regular grammars.
The introduction of this classification obviously necessitated defining such basic concepts as the alphabet, dictionary, grammar, language, and syntax analyzer. Here it is worth noting the constantly growing range of opportunities for applying formal grammars. Originally, they had been defined solely for modelling natural languages. Later, however, in addition to these applications, further important areas kept appearing, leading to the applications described later in this publication, namely modern secret sharing algorithms. All the applications of linguistic methods so far can be listed as follows:
• natural language modelling;
• translation and compiler theory;
• syntactic pattern recognition;
• cognitive systems [10];
• secret sharing threshold schemes.
The following chapter will present a secret sharing algorithm employing a coding modelled on DNA methods and using context-free grammars defined as shown below.
The context-free grammar is in general defined by the following formula [10]:

G_SECRET = (Σ_N, Σ_T, P_S, S_S), where:
Σ_N – the set of non-terminal symbols;
Σ_T – the set of terminal symbols;
S_S – the grammar start symbol;
P_S – the set of grammar rules of the form A → γ, where A ∈ Σ_N and γ ∈ (Σ_N ∪ Σ_T)+.
4 An Idea of Secret Sharing

Secret sharing algorithms are quite a young branch of information technology and cryptography. In the most general case, their objective is to generate such parts for the data in question as can be shared by multiple authorised persons [13, 15]. What arises here is the problem of splitting information in a manner allowing its reconstruction by a certain n-person group interested in reconstructing the split information. Algorithmic solutions developed to achieve this objective should at the same time make sure that no group of participants in such a protocol whose number is smaller than the required m persons can read the split message. The algorithms for dividing information make it possible to split it into chunks known as shadows that are later distributed among the participants of the protocol, so that the shares of certain subsets of users, when combined together, are capable of reconstructing the original information. There are two groups of algorithms for dividing information, namely secret splitting and secret sharing. In the first technique, information is distributed among the participants of the protocol, and all the participants are required to put their parts together to have it reconstructed. The more universal method of splitting information is the latter, i.e. secret sharing. In this case, the message is also distributed among the participants of the protocol, yet to have it reconstructed it is enough to have a certain number of constituent shares defined while building the scheme. Such algorithms are also known as threshold schemes; they were proposed independently by A. Shamir [12] and G. Blakley [4], and were thoroughly analysed by G. Simmons [13]. The next section describes a method of extending such classical threshold schemes for secret sharing to include an additional linguistic stage at which binary representations of the shared secret are coded into new sequences representing the rules of an introduced formal grammar [10]. Such a stage introduces additional security against the unauthorised reconstruction of the information and can be executed in two independent versions of the protocol for assigning the created shadows to protocol participants. The first is the version involving a trusted arbiter to mediate in the assignment and reconstruction of information. The second is the version without the arbiter, but with the assignment of the introduced grammar as a new, additional part of the secret.
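For reference, the sketch below shows a classical (m, n)-threshold scheme in the style of Shamir [12], which is the stage that the linguistic coding of the next section is placed in front of. It is a minimal Python illustration only: the prime modulus and the toy secret are arbitrary assumptions, and no attention is paid to secure randomness or to encoding real data.

import random

PRIME = 2**127 - 1   # an assumed prime field, large enough for a toy secret

def make_shares(secret, m, n):
    """Split `secret` into n shares so that any m of them reconstruct it."""
    coeffs = [secret] + [random.randrange(PRIME) for _ in range(m - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):             # Horner evaluation of the polynomial at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret from any m shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(123456789, m=3, n=5)
assert reconstruct(shares[:3]) == 123456789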
5 Linguistic Extension of Threshold Schemes for DNA-Like Secret Sharing

The expansion of the threshold scheme by an additional stage, converting the secret recorded in the form of a bit sequence, is performed by applying a context-free grammar.
Depending on the production set, such a grammar can change bit sequences in the form of zeros and ones into a sequence of grammar production numbers that allows the generation of the original bit sequence. The conversion of the representation is ensured by a syntax analyser that changes the bit sequence into the numbers of the grammar's linguistic rules in quadratic time. The graphic representation of using the grammar expansion in classical threshold schemes is presented in Fig. 1. After performing such a transformation, any secret sharing scheme can be applied to distribute the constituents among any number n of participants of the protocol. This means that at this stage any classical (m, n)-threshold algorithm for secret sharing can be run. However, the secret being split is not a pure bit sequence, but a sequence composed of the numbers of syntactic rules of the introduced grammar. Depending on its structure and type, it can encode blocks of two, three or more bits. In that case, the structure of the grammar will be similar, but the sequence of generation rule numbers obtained will have a greater range of values.
Fig. 1. Shadow generation scheme in the expanded threshold algorithm. The expansion concerns the use of grammar at the stage of converting the bit representation into sequences of numbers of linguistic rules in grammar.
To illustrate the idea of the enhanced linguistic coding, a generalised version of a linguistic information splitting algorithm will be presented for a grammar that converts blocks of several bits:

G = (V_N, V_T, P_S, S_S), where:
V_N = {SECRET, BB, 1B, 2B, 3B, 4B, 5B, 6B, …, NB} – the set of non-terminal symbols;
V_T = {1b, 2b, 3b, 4b, 5b, 6b, …, nb, λ} – the set of terminal symbols, which define each bit block value ({λ} denotes the empty symbol);
S_S = SECRET – the grammar start symbol.
The production set P_S is defined in the following way:
1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12.
A grammar introduced in this way can support a quicker and briefer re-coding of the input representation of the secret to be shared. Versions for longer bit blocks can be used in the same way. An obvious benefit of grouping bits into larger blocks is that during the following steps of the secret sharing protocol we get shorter representations for the split data. Executing the introduced algorithms provides an additional stage for re-coding the shared secret into a new representation using grammatical rules. The grammar itself can be kept secret or made available to the participants of the entire protocol. If the allocation of grammatical rules is to remain secret, what we deal with is an arbitration protocol, which – to reconstruct the secret for the authorised group of shadow owners – requires the participation of a trusted arbiter, equipped with information about grammar rules. Should the grammar be disclosed, the reconstruction of the secret is possible without the participation of the trusted person and only on the basis of the constituent parts of the secret kept by the authorised group of participants in the algorithm of information sharing.
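A toy sketch of this linguistic re-coding stage is given below in Python. The actual production set P_S (rules 1-12) is not reproduced in this text, so the grammar here is an assumed two-bit-block example: each production number stands for one 2-bit terminal, the secret bit string becomes a sequence of production numbers, and it is that sequence which would then be handed to a classical (m, n)-threshold scheme.

RULES = {1: "00", 2: "01", 3: "10", 4: "11"}          # assumed: rule number -> 2-bit block
BLOCK_TO_RULE = {block: num for num, block in RULES.items()}

def encode_bits(bits):
    """Re-code a bit string into the sequence of grammar production numbers."""
    assert len(bits) % 2 == 0
    return [BLOCK_TO_RULE[bits[i:i + 2]] for i in range(0, len(bits), 2)]

def decode_rules(rule_numbers):
    """Reverse derivation: production numbers back to the original bit string."""
    return "".join(RULES[n] for n in rule_numbers)

secret_bits = "0110001011"                   # hypothetical secret
coded = encode_bits(secret_bits)             # [2, 3, 1, 3, 4]
assert decode_rules(coded) == secret_bits

Whether the table of productions is published to all participants or kept only by a trusted arbiter then corresponds to the two protocol variants described above.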
6 Characteristics of Linguistic Threshold Schemes

The most important characteristics of the proposed method of linguistic secret splitting include:
• Linguistic methods may be used to create a new class of threshold schemes for secret sharing. Thus the use of formal grammars supports the creation of a new class of intelligent secret sharing methods operating on various lengths of coded data.
• Sequential grammars allow creating a more generalised protocol which may be used in the hierarchical management of strategic information. Managing such information may become necessary in various circumstances, e.g. in industrial corporations, in the management of secret data important for security reasons, but also of ordinary multimedia data.
• The security of this method is guaranteed by the mathematical properties of the information splitting methods used.
• The complexity of the process of creating the components of the split secret remains at the polynomial level. For context-free grammars, this complexity should be no greater than O(n²).
• The linguistic scheme for bit block coding is a far-reaching extension of the method of secret information coding used in DNA cryptographic methods. This coding allows blocks of any number of bits to be formed and coded using terminal symbols of the grammar.
7 Conclusion

This publication presents a new information sharing concept based on the use of mathematical linguistic methods and formal grammars. This procedure can be used both as a new type of secret splitting algorithm and as an intelligent protocol for assigning secret components to the authorised participants of such a protocol. It is worth noting that the presented method essentially represents an extension of classical secret sharing methods by adding a new stage of coding the input information into sequences of numbers of linguistic rules which represent bit blocks of various lengths. The security of this method is guaranteed by the mathematical properties of the information splitting methods used, while the stage of additionally coding bit blocks does not make the cryptanalysis of this scheme any easier. This means that the generated information shadows are completely secure, and without the required number of them there is no way to reconstruct the original secret. Another important feature of the proposed approach is that the linguistic scheme for bit block coding is a far-reaching extension of the method of secret information coding used in DNA cryptographic methods. However, unlike the technique of converting bit pairs into particular nitrogen bases used in that method, in our procedure it is possible to code larger bit blocks, which significantly enhances the opportunities for using this technique of information hiding or splitting. Acknowledgments. This work has been supported by the AGH University of Science and Technology under Grant No. 10.10.120.783.
References 1. Adleman, L.: Molecular Computation of Solutions to Combinational Problems. Science, 266 (1994) 2. Ateniese, G., Blundo, C., De Santis, A., Stinson, D.R.: Constructions and bounds for visual cryptography. In: Meyer auf der Heide, F., Monien, B. (eds.) ICALP 1996. LNCS, vol. 1099, pp. 416–428. Springer, Heidelberg (1996)
3. Beimel, A., Chor, B.: Universally ideal secret sharing schemes. IEEE Transactions on Information Theory 40, 786–794 (1994) 4. Blakley, G.R.: Safeguarding Cryptographic Keys. In: Proceedings of the National Computer Conference, pp. 313–317 (1979) 5. Charnes, C., Pieprzyk, J.: Generalised cumulative arrays and their application to secret sharing schemes. Australian Computer Science Communications 17, 61–65 (1995) 6. Chomsky, N.: Syntactic structures. Mouton & Co., Netherlands (1957) 7. Desmedt, Y., Frankel, Y.: Threshold Cryptosystems. In: Brassard, G. (ed.) CRYPTO 1989. LNCS, vol. 435, pp. 307–315. Springer, Heidelberg (1990) 8. Hang, N., Zhao, W.: Privacy-preserving data mining Systems. Computer 40, 52–58 (2007) 9. Jackson, W.-A., Martin, K.M., O’Keefe, C.M.: Ideal secret sharing schemes with multiple secrets. Journal of Cryptology 9, 233–250 (1996) 10. Ogiela, M.R., Tadeusiewicz, R.: Modern Computational Intelligence Methods for the Interpretation of Medical Images. Springer, Heidelberg (2008) 11. Ogiela, M.R., Ogiela, U.: Linguistic Cryptographic Threshold Schemes. International Journal of Future Generation Communication and Networking 1(2), 33–40 (2009) 12. Shamir, A.: How to Share a Secret. Communications of the ACM, 612–613 (1979) 13. Simmons, G.J.: An Introduction to Shared Secret and/or Shared Control Schemes and Their Application in Contemporary Cryptology. In: The Science of Information Integrity, pp. 441–497. IEEE Press, Los Alamitos (1992) 14. Tang, S.: Simple Secret Sharing and Threshold RSA Signature Schemes. Journal of Information and Computational Science 1, 259–262 (2004) 15. Wu, T.-C., He, W.-H.: A geometric approach for sharing secrets. Computers and Security 14, 135–146 (1995)
HATS: High Accuracy Timestamping System Based on NetFPGA

Zhiqiang Zhou1, Lin Cong1, Guohan Lu2, Beixing Deng1, and Xing Li1
1 Department of EE, Tsinghua University, Beijing, China
2 Microsoft Research Asia, Beijing, China
[email protected]
Abstract. The delay and dispersion of the packet train have been widely used in most network measurement tools. The timestamp of the packet is critical for the measurement accuracy. However, timestamping performed either in the application or the kernel layer would be easily affected by the source and destination hosts especially in high-speed network. Therefore, to evaluate the impact of the timestamp precision on the measurement, a high accuracy timestamping hardware system (HATS) based on NetFPGA was designed and implemented. With HATS, the deviation of timestamp accuracy among the application, the kernel and the hardware layers was analyzed. Keywords: Network measurement; timestamp accuracy.
HATS is built to timestamp the probe packets in hardware with high time resolution. It can eliminate the effect caused by the host architecture. With HATS, we can measure and estimate the network behavior accurately. It is also possible to evaluate the effect of the host behavior on timestamp precision. HATS is based on NetFPGA, which is an open Linux platform with high-speed network hardware. The outline of this paper is as follows. Part 2 introduces the NetFPGA platform. This is followed by parts 3 and 4, where we discuss the design and implementation of HATS. In part 5 we describe the design of experiments evaluating the timestamp accuracy. The analysis of the timestamp accuracy among the application, the kernel, and the hardware layers is performed in part 6. Related work about timestamp precision is presented in part 7. Finally, we conclude in part 8.
2 Platform Architecture

NetFPGA is an open network platform for high-speed network research and design. It has a core clock running at 125 MHz, giving a timer resolution of 8 ns. There are NetFPGA packages (NFPS) with source code implementing various network functions available on the Internet [1]. NFPS include three parts: the kernel module, the software used to communicate with the hardware, and the reference hardware designs. Four reference designs are available: reference router, reference NIC, reference switch, and reference hardware-accelerated Linux router. In the reference designs, the hardware is divided into modules, which helps users modify the reference design. The registers module is independent of the pipeline modules; it is specially designed to simplify the process of adding a module to the design, as this does not require modifying the central pipeline. With the reference designs and NFPS, one can verify an idea, extend one's own design based on a reference design, or implement a completely new design independently of the reference designs.
3 System Design

Based on the NetFPGA, the system is to be attached to a desktop PC or server via the PCI bus and run as a Gbps NIC. It can accurately timestamp each packet passing through, regardless of the packet size, the packet rate, and the interval between adjacent packets. At the same time, the timestamp module must affect neither the receiving nor the sending of the probe packets. The design of the system is described as follows.

3.1 Reference NIC

To achieve an accurate timestamp for each probe packet, we should timestamp the packet with the NetFPGA core clock. HATS can be designed and implemented based on the reference NIC. The block diagram of the reference NIC is shown in Fig. 1. The pipeline includes 8 transmit queues, 8 receive queues, and a user data path comprising the input arbiter, output port lookup, and output queues modules.
Fig. 1. The block diagram of the reference NIC
A packet from the Ethernet ports (receiving) or from the PCI over DMA (sending) arrives at the receive queues first. The input arbiter in the user data path decides which receive queue to service next. The packet is pulled from the receive queue and transmitted to the output port lookup module (OPL). The OPL decides which port the packet should go out of and submits it to the corresponding output queue. The transmit queues are responsible for sending the packets to the driver via the PCI (receiving) or the Ethernet ports (sending). In the registers module, each module has a group of registers which show its state. These registers can be easily read and written via the PCI in certain functions.

3.2 Time Stamp

3.2.1 Format
To timestamp packets in the hardware layer with the NetFPGA core clock, we can use the stamp counter module in NFPS. As the probe packets may get lost or arrive out of order, it is not enough to record only the sending or receiving time if we want to identify which packet a timestamp corresponds to. Therefore a 16-bit identifier which identifies the packet uniquely is defined. The identifier can be different for different protocols: the 16-bit urgent pointer for TCP, the 16-bit identification field for IPv4, the low 16 bits of the flow label for IPv6, etc. Both the identifier and the time are recorded in the timestamp.

3.2.2 Storage
Up to now, the timestamp has usually been processed in one of two ways: either stored in the NIC registers or inserted into the packet data. In the former case, a register read is performed after each packet has been received or transmitted. This can be implemented by polling in the software; otherwise the registers would be flushed by the following packet and the timestamp could be lost. That might be acceptable in a low-speed network, but in a high-speed network there will be PCI access conflicts. Reading registers via the PCI will influence the receiving and sending of the packets, which need to be
transferred between the kernel and the NetFPGA via the PCI bus. In the latter case, there are two limitations. First, writing the timestamp into the packet data may increase the packet processing time. Second, we can only put the receiving timestamp into a received packet; for transmitted packets, the timestamp is produced only after the packet has left the hardware, so it is impossible to insert the timestamp into the transmitted packet itself, and inserting it into the following packet would also increase the packet processing time. Taking all this into consideration, we use the RAM in the NetFPGA to store the probe packet timestamps in our design.
Fig. 2. Synchronization Mechanism
3.3 Synchronization Mechanism

As the application cannot access the RAM directly, the timestamps are read through registers. As a result, synchronization between the application and the hardware is required. The synchronization mechanism for the RAM read is shown in Fig. 2. The state of the application is presented in (a), and that of the hardware in (b). First, the application writes STATE_READ into the flag register. When the hardware detects this, one timestamp is read out of the RAM. In the meantime the state of the flag register changes to STATE_READ_RAM. The timestamp is stored in the time registers in the next clock cycle. If the application detects the flag register state to be STATE_READ_RAM, it reads the two time registers and writes the flag register again. This is repeated until all timestamps have been read. Then the flag register state changes to STATE_FINISHED. When this is detected by the application, the RAM read is finished.
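The application-side half of this handshake can be written as a simple polling loop, sketched below in Python. The register-access functions reg_read()/reg_write(), the register addresses, and the numeric state codes are assumptions standing in for the actual NetFPGA register interface; only the handshake logic follows the description above.

STATE_READ, STATE_READ_RAM, STATE_FINISHED = 0, 1, 2   # assumed encodings of the flag states

def read_all_timestamps(reg_read, reg_write, flag_reg, time_lo_reg, time_hi_reg):
    """Drain the timestamp RAM through the flag/time registers, as in Fig. 2(a)."""
    stamps = []
    reg_write(flag_reg, STATE_READ)                 # ask the hardware for the first entry
    while True:
        state = reg_read(flag_reg)
        if state == STATE_FINISHED:                 # hardware reports the RAM is drained
            break
        if state == STATE_READ_RAM:                 # one 64-bit entry sits in the two regs
            lo, hi = reg_read(time_lo_reg), reg_read(time_hi_reg)
            stamps.append((hi << 32) | lo)
            reg_write(flag_reg, STATE_READ)         # request the next entry
    return stamps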
Fig. 3. High Accuracy Timestamping System
4 System Implementation

The block diagram of HATS, shown in Fig. 3, consists of a time counter, RAM to store the timestamps, and registers to access the RAM. The detailed implementation of the system is described as follows.

4.1 Time Stamp Counter

There is a 64-bit time counter in the time stamp counter module. It records the relative time with 8 ns resolution from the moment the bitfile is downloaded to the NetFPGA.

4.2 Time Stamp

The time stamp module is placed at the junction of the NIC and the Ethernet. There, it timestamps each probe packet sent from the NetFPGA to the Ethernet or received from the Ethernet by the NetFPGA. Consequently, the most accurate timestamp can be obtained to estimate the network behavior, eliminating the influence of the end host. In the subsequent experiments with UDP probe packets, the 16-bit sequence number of the UDP packets generated by Iperf can be used as the identifier. The format of the time stamp is shown in Fig. 4. With the low 48 bits recording the receiving and sending time, the maximum time that can be recorded is 2^48 ns = 281475 s = 78.2 h, which is more than sufficient for network measurement tools. Once a packet is received or transmitted in the NIC, the hardware writes the timestamp into the RAM. When the receiving and transmitting have finished for all probe packets, all timestamps can be read out of the RAM.
Fig. 4. Timestamp Format
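The helper below illustrates the 64-bit layout implied by Fig. 4 and the text: a 16-bit packet identifier in the upper bits and the 48-bit time value (8 ns ticks of the 125 MHz core clock) in the lower bits. It is only a software illustration of the word format; the field placement beyond the stated 16 + 48 split is an assumption.

def pack_timestamp(identifier, time_ns):
    """Pack a 16-bit identifier and a time (in ns, 8 ns resolution) into one 64-bit word."""
    ticks = (time_ns // 8) & ((1 << 48) - 1)
    return ((identifier & 0xFFFF) << 48) | ticks

def unpack_timestamp(word):
    """Return (identifier, time in ns) from a packed 64-bit timestamp word."""
    return (word >> 48) & 0xFFFF, (word & ((1 << 48) - 1)) * 8

word = pack_timestamp(identifier=0x1234, time_ns=1_000_000)    # 1 ms after counter reset
assert unpack_timestamp(word) == (0x1234, 1_000_000)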
4.3 RAM

There are two blocks of RAM in the FPGA to record the timestamps of the received and transmitted probe packets respectively. Limited by the resources in the FPGA, the RAM size cannot be unreasonably large. Taking into account both the limited resources and the requirements of network measurement tools, the RAM size is set at 24576 × 64, which means the number of timestamps kept in the RAM is at most 24576. To prevent overflow, the hardware writes the RAM circularly: if the RAM address reaches 24575, the next data item is written at RAM address 0 and replaces the old data. As new data are much more meaningful than older data in the measurements, this approach is reasonable. On finishing receiving or transmitting a probe packet, we enable the corresponding RAM, and the timestamp is written into the RAM in the next clock cycle.

4.4 Registers

As all registers in the NetFPGA are 32 bits wide, we need three registers for the RAM read: two for timestamp storage and one for synchronization between the application and the hardware.
5 Experiment Design

As mentioned above, timestamps in the application, the kernel and the hardware layers are different. In this section, experiments are designed and deployed to evaluate
Fig. 5. Testbed Topology
the deviation of timestamp accuracy among them. There are many factors which affect the timestamp accuracy; to illustrate the application of HATS, only the network cross traffic and the probe packet rate are chosen as examples. Systematic experiments will be performed in the near future.

5.1 Experimental Environment

The testbed, as shown in Fig. 5, has a dumbbell topology, which is commonly used in network measurement. Table 1 shows the configurations of hosts M0-M3, the 1 Gbps switch S, and the 100 Mbps switches S0-S1. VLANs are configured in S0 and S1 to isolate M0/M2 and M1/M3. As the NetFPGA installed in M0 does not support auto-negotiation, it cannot be attached to S0 directly, so we attach it to S0 through S. All link capacities are 100 Mbps.

5.2 Traffic Generation

The probe packets used for the system are generated by Iperf [2], which was developed by NLANR/DAST as a modern alternative for measuring maximum TCP and UDP bandwidth performance. It allows the tuning of various parameters such as packet protocol, packet length, connection port, testing time, etc. The UDP packet generated by Iperf contains a 16-bit sequence number, which can be used as the timestamp identifier in the test. While sending and receiving UDP packets, Iperf is modified to export the departure and arrival timestamps of the packets in the application layer. The Poisson cross traffic is generated by a tool developed by ourselves. In the following tests, the UDP packet length is fixed at 1500 bytes. The UDP packet rate ranges from 0.01 to 100 Mbps and the cross traffic rate ranges from 10 to 100 Mbps; the step sizes are both 10 Mbps. As the link capacity is 100 Mbps, the sum of the probe packet rate and the cross traffic rate is limited to no more than 100 Mbps. The smallest unit used for the accuracy of time is the microsecond (us).
Table 2. Program Deployed in Hosts M0-3

Test | M0                              | M1                | M2 | M3
A    | Iperf (sending) / poisson_rcv   | Iperf (receiving) | —— | poisson_snd
B    | Iperf (receiving) / poisson_snd | Iperf (sending)   | —— | poisson_rcv
5.3 Experiment Design

To evaluate the timestamp accuracy at both the sending and receiving ends, two groups of tests are performed in the testbed shown in Fig. 5. As the delay and dispersion of the packet train may be affected by the network cross traffic and the probe packet rate, the programs deployed in hosts M0-M3 are described in Table 2. All tests were run 6 times to collect enough data for analysis. Both Iperf and poisson_snd last for 60 s each time, so the RAM on the NetFPGA board can be completely filled each time. About 24576 × 6 = 147456 groups of data are collected in each test.

5.4 Data Collection and Metrics

Iperf exports the timestamp in the application layer, while tcpdump records the timestamp in the kernel layer in the pcap file. HATS is responsible for keeping the timestamp in the hardware layer.
Fig. 6. Timestamp Interval in three layers
As shown in Fig. 6, the timestamp interval in the application layer is denoted as A, in the kernel layer as K, and in the hardware layer as H. The absolute deviation between K and H is denoted as AD_K, and that between A and H as AD_A. The relative deviation between K and H is denoted as RD_K, and that between A and H as RD_A, where

RD_A = |A − H| / H = AD_A / H,    (1)
RD_K = |K − H| / H = AD_K / H.
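As a tiny numerical illustration of these definitions (the interval values below are invented for the example):

A, K, H = 520.0, 505.0, 500.0            # per-packet intervals in microseconds (hypothetical)
AD_A, AD_K = abs(A - H), abs(K - H)      # absolute deviations: 20.0 and 5.0
RD_A, RD_K = AD_A / H, AD_K / H          # relative deviations: 0.04 and 0.01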
6 Evaluation and Analysis

6.1 Preprocess

In consideration of packet loss, the collected data are preprocessed. We extract the identifiers from all timestamps and keep, for the following analysis, only the probe packets that have the same identifier in all three layers; the others are discarded. A, K, and H for all retained packets are calculated in this step.

Table 3. ANOVA Table of CTR for Sending (AD_A)

Source  | SS           | df       | MS      | F      | Prob>F
Columns | 0.2          | 6        | 0.03    | 1.4e-5 | 1
Error   | 2483793433.4 | 10131779 | 2407.29 | ——     | ——
Total   | 2483793433.6 | 10131785 | ——      | ——     | ——
Table 4. ANOVA Table of CTR for Sending (AD_K)

Source  | SS          | df       | MS      | F      | Prob>F
Columns | 0.03        | 6        | 0.005   | 5.4e-5 | 1
Error   | 96279276.20 | 10131779 | 93.3139 | ——     | ——
Total   | 96279276.23 | 10131785 | ——      | ——     | ——
Table 5. ANOVA Table of PPR for Sending (AD_A)

Source  | SS           | df     | MS     | F  | Prob>F
Columns | 1.3          | 5      | 0.27   | 0  | 1
Error   | 2230752360.7 | 884202 | 2522.9 | —— | ——
Total   | 2230752362.0 | 884207 | ——     | —— | ——
Table 6. ANOVA Table of PPR for Sending (AD_K)

Source  | SS         | df     | MS      | F    | Prob>F
Columns | 7.7        | 5      | 1.538   | 0.01 | 0.999
Error   | 98576388.9 | 884202 | 111.486 | ——   | ——
Total   | 98576396.6 | 884207 | ——      | ——   | ——
Table 7. ANOVA Table of CTR for Receiving (AD_A)

Source  | SS             | df       | MS      | F      | Prob>F
Columns | 0.05           | 6        | 0       | 6.1e-7 | 1
Error   | 14907654031.05 | 10131562 | 1445105 | ——     | ——
Total   | 14907654031.10 | 10131568 | ——      | ——     | ——
Table 8. ANOVA Table of CTR for Receiving (AD_K)

Source  | SS          | df       | MS      | F      | Prob>F
Columns | 0.01        | 6        | 0.0015  | 2.3e-5 | 1
Error   | 65210966.99 | 10131562 | 63.2158 | ——     | ——
Total   | 65210967.00 | 10131568 | ——      | ——     | ——
Table 9. ANOVA Table of PPR for Receiving (AD_A)

Source  | SS            | df     | MS      | F      | Prob>F
Columns | 9.7           | 5      | 1.9     | 7.7e-5 | 1
Error   | 22305934803.3 | 884028 | 25232.2 | ——     | ——
Total   | 22305934813.0 | 884033 | ——      | ——     | ——
Table 10. ANOVA Table of PPR for Receiving (AD_K)

Source  | SS         | df     | MS     | F    | Prob>F
Columns | 10.2       | 5      | 2.0437 | 0.02 | 0.9997
Error   | 75742611.8 | 884028 | 85.679 | ——   | ——
Total   | 75742622.0 | 884033 | ——     | ——   | ——
Table 11. AD and RD of various PPR for Sending (CTR = 30 Mbps)

Packet rate (Mbps) | 10   | 20   | 30   | 40   | 50   | 60   | 70
AD_A (us)          | 28.1 | 25.3 | 25.4 | 25.3 | 23.8 | 25.7 | 26.1
RD_A (%)           | 2.5  | 4.4  | 6.6  | 9.2  | 11.7 | 17.6 | 22
AD_K (us)          | 16.1 | 14.9 | 14.8 | 15.2 | 14.3 | 15.9 | 16.1
RD_K (%)           | 1.5  | 2.6  | 3.9  | 5.6  | 6.8  | 9.7  | 12.1
Table 12. AD and RD of various PPR for Receiving (CTR = 30 Mbps)

Packet rate (Mbps) | 10   | 20   | 30   | 40   | 50   | 60   | 70
AD_A (us)          | 21.4 | 20.9 | 21.2 | 22.1 | 22.4 | 23.3 | 22.3
RD_A (%)           | 1.8  | 3.4  | 5.2  | 7.2  | 9.2  | 11.5 | 12.7
AD_K (us)          | 7.0  | 7.9  | 8.0  | 8.2  | 10.1 | 9.0  | 9.7
RD_K (%)           | 0.6  | 1.3  | 1.9  | 2.6  | 4.1  | 4.3  | 5.5
6.2 One-Way Analysis of Variance

Analysis of variance (ANOVA) is a technique for analyzing the way in which the mean of a variable is affected by different types and combinations of factors. One-way analysis of variance is the simplest form; it is an extension of the independent t-test and can be used to compare more than two groups or treatments. It compares the variability between the samples (caused by the factor) with the variability within the samples (caused by systematic error). Rejecting the null hypothesis means the factor is significant. In this section, one-way analysis of variance is used to evaluate the effect of different probe packet rates (PPR) and cross traffic rates (CTR) on AD_A and AD_K. As the two factors are independent, CTR is set at 40 Mbps for testing PPR, while PPR is set at 40 Mbps for testing CTR. The results for sending are shown in Tables 3-6, and Tables 7-10 show the results for receiving. The first column of each table shows the source of the variability. The second shows the sum of squares due to each source. The third shows the degrees of freedom associated with each source. The fourth shows the mean squares for each source. The fifth shows the F statistic. The sixth shows the p-value, which is derived from the cdf of F. As the p-values in Tables 3-10 are all close to 1, we should accept the null hypothesis, which means that neither PPR nor CTR is significant for the absolute measurement error.

6.3 Relative Deviation Analysis

In this section, the relative deviation of the timestamp interval for different probe packet rates is analyzed. The cross traffic rate is set at 30 Mbps for the following analysis. The results report 90% confidence intervals.
As shown in Tables 11 and 12, the absolute deviation for the various PPR values is similar for both sending and receiving, which is consistent with the one-way ANOVA above. However, as the probe packet rate increases, RD_A and RD_K increase significantly. The reason is that, as the PPR increases, the absolute deviation varies little while the interval between probe packets decreases linearly; as a result, RD increases markedly. Another point to notice in the tables above is that AD_A is clearly larger than AD_K, so the timestamp in the application layer is less accurate than that in the kernel layer.
7 Related Work

Most measurement tools are affected by the timestamp accuracy. There is much research focusing on modifying network state measurement algorithms, but little work has been done to analyze and evaluate the deviation of timestamp accuracy among the application, the kernel, and the hardware layers. There are several methods to improve the timestamp accuracy. The Global Positioning System (GPS) is widely used in network time synchronization to measure the one-way delay (OWD) [3]. It can provide reliable clock synchronization with high accuracy, on the order of tens to hundreds of nanoseconds. Li Wenwei et al. proposed moving the timestamping point from the application to the network driver to improve timestamp precision [4]. Some network cards (including SysKonnect) have an onboard timestamp register which can provide information on the exact packet arrival time and pass this timestamp to the system buffer descriptor [5]; the NIC timestamp can then replace the system clock timestamp. Endace DAG NICs provide 100% packet capture, regardless of interface type, packet size or network loading. They supply packets through their own API, which provides nanosecond timestamps [6]. However, they are not regular networking cards, as they capture packets bypassing the kernel, the network stack, and libpcap. They are also relatively expensive, and requiring custom hardware at the end points limits the flexibility of the framework. Reference [7] presented a method that estimated the timestamp accuracy obtained from the measurement hardware Endace DAG 3.5E and the software Packet Capture Library. Reference [8] quantified and discussed various impacts on the timestamp accuracy of application-level measurements; they used the Distributed Passive Measurement Infrastructure (DPMI), with Measurement Points (MPs) instrumented with DAG 3.5E cards for the reference link-level measurements. Reference [9] investigated how measurement accuracy is affected by the hardware and software used to collect traffic traces in networks; they compared the performance of the popular free software tools tcpdump and windump with the dedicated measurement card DAG in terms of packet inter-arrival times and data loss.
8 Conclusions

An accurate timestamping system, HATS, was designed and implemented based on NetFPGA. With HATS, the deviation of timestamp accuracy among the application,
the kernel, and the hardware layers was evaluated and analyzed. The experiments demonstrated that the timestamps in the application and kernel layers, affected by the end hosts, are not as accurate as those in the hardware layer, and that the relative deviation increases considerably as the packet rate increases. Therefore, timestamping in the application or kernel layer is only suitable for low-speed network measurement, while in high-speed networks the probe packets need to be timestamped in the hardware layer. Next, we plan to perform more experiments with HATS to evaluate the errors induced by application- and kernel-layer timestamping for different measurement algorithms and tools. Moreover, we will attempt to improve the performance of measurement tools in high-speed networks based on these results. Acknowledgments. This work is supported by the National Basic Research Program of China (973 Project, No. 2007CB310806) and the National Science Foundation of China (No. 60850003).
References 1. NetFPGA, http://www.netfpga.org 2. Iperf, http://dast.nlanr.net/Projects/Iperf/ 3. Batista, D.M., Chaves, L.J., da Fonseca, N.L.S., Ziviani, A.: Performance analysis of available bandwidth estimation. The Journal of Supercomputing (October 2009) 4. Wenwei, L., Dafang, Z., Gaogang, X., Jinmin, Y.: A High Precision approach of Network Delay Measurement Based on General PC. Journal of Software (February 2006) 5. SysKonnect, http://www.syskonnct.com 6. Endace DAG NICs, http://www.endace.com/dag-network-monitoring-cards.html 7. Arlos, P., Fiedler, M.: A Method to Estimate the Timestamp Accuracy of Measurement Hardware and Software Tools. In: Uhlig, S., Papagiannaki, K., Bonaventure, O. (eds.) PAM 2007. LNCS, vol. 4427, pp. 197–206. Springer, Heidelberg (2007) 8. Wac, K., Arlos, P., Fiedler, M., Chevul, S., Isaksson, L., Bults, R.: Accuracy Evaluation of Application-Level Performance Measurements. In: Proceedsings of the 3rd EURO-NGI Conference on Next Generation Internet Networks Design and Engineering for Heterogeneity (NGI 2007) (May 2007) 9. Arlos, P., Fiedler, M.: A Comparison of Measurement Accuracy for DAG, Tcpdump and Windump (January 2007), http://www.its.bth.se/staff/pca/aCMA.pdf,verif
A Roadside Unit Placement Scheme for Vehicular Telematics Networks

Junghoon Lee1 and Cheol Min Kim2
1 Dept. of Computer Science and Statistics
2 Dept. of Computer Education
Jeju National University, 690-756, Jeju Do, Republic of Korea
[email protected], [email protected]

Abstract. This paper designs and measures the performance of a roadside unit placement scheme for the vehicular telematics network, aiming at improving connectivity and reducing the disconnection interval for a given number of roadside units, transmission range, and overlap ratio on the road network of Jeju city. The placement scheme begins with an initial selection in which every intersection is a candidate. For each circle surrounding a candidate position with a radius equal to the transmission range, the number of vehicle reports inside the circle is counted. After ordering the candidates by this count, the placement scheme keeps a candidate only if it is at least the distance criterion away from all the other candidates already selected. Performance measurement results obtained using real-life movement history data in Jeju city show that about 72.5% connectivity can be achieved when the number of roadside units is 1,000 and the transmission range is 300 m, while the disconnection time is mostly kept below 10 seconds. Keywords: Vehicular telematics, roadside unit, vehicle movement history, network connectivity, network planning.
1 Introduction
Nowadays, the vehicular telematics network is extending its coverage area and integrating a wide range of communication technologies such as IEEE 802.11, Zigbee, DSRC (Dedicated Short Range Communication), and the like [1]. Roadside units, or RSUs, installed at fixed locations along the roadside, make it possible for vehicles to connect to the global network [2]. An RSU can be an 802.11 access point, which typically plays the role of a gateway from the vehicle to the global network [3]. Through this buffer point, data on the RSU can be uploaded and downloaded, including location-dependent advertisements, real-time traffic, and vicinity digital maps, along with the existing safety applications [4].
This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2010(C1090-1011-0009)). Corresponding author.
However, it is very difficult to decide where to place RSUs. The performance criterion for an RSU placement scheme is necessarily the network connectivity, that is, the probability that a vehicle can connect to an RSU. Intuitively, it is desirable to install RSUs at the places where many vehicles are concentrated. However, it is not so simple if we consider multiple RSUs, the transmission range of the wireless interface, the vehicle density distribution, and particularly, the road network organization. In the meantime, the location history data obtained from a real-time vehicle tracking system can be very helpful for designing and assessing an RSU placement scheme. In this regard, this paper designs an RSU placement scheme capable of taking into account the underlying road network organization based on real-life vehicle movement data. This paper is organized as follows: after introducing the problem in Section 1, Section 2 provides some background and related work. Then, Section 3 proposes an RSU placement scheme. After demonstrating the performance measurement results in Section 4, Section 5 summarizes and concludes this paper with a brief description of future work.
2 Background and Related Work
Nominated as the Telematics Model City, Jeju Island is launching many industrial and academic projects [5]. Telematics devices are now popular in both rent-a-cars and taxis in this area. Accordingly, the Jeju area possesses a telematics network consisting of a large number of active telematics devices, enabling us to design, develop, and test diverse challenging services. For example, the Taxi Telematics System collects the location of each taxi to support real-time tracking and efficient taxi dispatch. Each member taxi reports its location every minute. A great deal of location and movement data is being accumulated day by day, and various network planning systems can benefit from these data to estimate available bandwidth, connection duration time, and cost, as well as to test the feasibility of a new service [6]. As for a study on R2V communication, Zhang et al. have proposed a data access scheduling scheme [2]. When a vehicle enters the roadside unit area, it first listens to the wireless channel, and each vehicle can send a request with its deadline to the unit when it wants to access the network [7]. To cope with access from multiple vehicles, the messages are scheduled by deadline and data size. Additionally, this scheme exploits a single broadcast to serve multiple requests, and the authors also identified the effect of upload requests on data quality. However, simply aiming at enhancing the service ratio, this scheme may suffer from the problem that a node with a good link condition keeps being served, while some nodes have little chance to transmit their messages. In our previous work, RSU placement begins with the initial selection of candidate locations on a virtual grid [8]. The distance between two candidates is chosen by the overlap ratio, which indicates how much the transmission ranges of two adjacent RSUs can overlap. Then, every circle counts the number of vehicle points belonging to itself. A vehicle point means a snapshot location of
a vehicle stored in the location history database. However, this scheme did not consider the road network topology and gave no precedence to the road area for the candidate locations. Recently, Yu et al. proposed an RSU placement scheme based on a genetic algorithm [9]. From the initial RSU placement, the genetic algorithm runs until it finds a solution that meets the given requirements on cost and connectivity. However, this scheme needs manual setting of the candidate set. In addition, the classical maximal closure problem needs quite a complex heuristic, and it is generally appropriate only for Euclidean space.
3 Placement Scheme
3.1 Framework Overview
Figure 1 shows the user interface implementation of our connectivity analysis framework. The road network of Jeju city is drawn with light lines and rectangles, respectively representing road segments and intersections. Dark dots indicate the location of each vehicle. Recall that each vehicle reports its location every minute, and every location is marked with a point regardless of its timestamp value. The figure also shows circles, each of which surrounds an RSU, with a radius equal to the transmission range. If a vehicle is located inside a circle, it can access data or even the Internet via the RSU. Otherwise, that is, when it is not included in any circle, the vehicle is disconnected from the network system, and it should keep the messages generated during that interval in its local cache [10]. Hence, the more dots are included in the circles, the better the RSU placement. If a vehicle can make a connection to any RSU, it can send its data, so our goal is to maximize the probability that a vehicle can access an RSU for the given number of RSUs. The connectivity depends on many factors such as the transmission range, the number of RSUs, and the movement pattern of each vehicle. The movement pattern decides the vehicle density, and it greatly depends on the road network organization. Intuitively, the larger the transmission range, the higher the connectivity we can expect. In addition, the more RSUs there are, the more vehicles can be reachable. The mobility pattern is very similar from day to day, so if a scheme works well on previous data, it will also work well in the future. Hence, the history log can be very helpful. Besides the connectivity, how long a connection will last can give a useful guideline for designing a new vehicular application. Moreover, how long a vehicle will be disconnected during its drive can specify the amount of buffer space needed [11]. Thus, the analysis framework also implements a function to trace the state change of a vehicle, that is, connected or disconnected, based on the timestamp of each history record.
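To make this tracing concrete, the short Python sketch below derives the connectivity ratio and the disconnection intervals of a single vehicle from its timestamped reports. It is only an illustrative sketch under stated assumptions: reports are (timestamp, x, y) tuples taken once per minute in a planar coordinate system, and the function and variable names are ours, not the framework's actual interface.

```python
import math

def analyze_vehicle(reports, rsus, tx_range, report_period=60):
    """reports: list of (timestamp_sec, x, y) for one vehicle, sorted by time.
    rsus: list of (x, y) roadside-unit positions.
    Returns (connectivity ratio, list of disconnection interval lengths in seconds)."""
    def connected(x, y):
        return any(math.hypot(x - rx, y - ry) <= tx_range for rx, ry in rsus)

    states = [connected(x, y) for _, x, y in reports]
    connectivity = sum(states) / len(states) if states else 0.0

    # A run of consecutive disconnected reports forms one disconnection interval.
    intervals, run = [], 0
    for s in states:
        if not s:
            run += 1
        elif run:
            intervals.append(run * report_period)
            run = 0
    if run:
        intervals.append(run * report_period)
    return connectivity, intervals
```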
3.2 Placement Scheme
The problem to solve is, for the given number of RSUs, to place circles of radius r, equal to the transmission range, so that the number of points included
Fig. 1. Framework overview
in the circles is maximized. The placement strategy depends on the overlap ratio which represents how much the transmission area of two RSUs can overlap. If this ratio is large, the coverage area of the city-wide roadside network gets smaller. However, temporary disconnection can be much reduced and the connection will last longer. For the high density area, a large overlap ratio is desirable, as more points can be excluded from the circles, even if the excluded area is small.
Fig. 2. Overlap ratio
The main idea of the placement scheme is that every intersection of the road network can be a candidate location for RSU installation. The number of intersections in Jeju city is about 17,000, and they are represented by light rectangles in Figure 1. Initially, all candidate locations are selected, and the placement scheme checks to which circle each vehicle report point (dark dot) belongs. A single dot can belong to more than one circle. Then, the candidate locations are sorted by the number of dots they include, to give precedence to the locations having heavier traffic. Starting from the candidate with the highest count, the placement scheme
decides whether to include the candidate in the final set. A candidate will survive if it is located sufficiently far away from all the RSUs already included in the survival list. In the end, the first n surviving candidates become the final RSU locations, where n is the number of RSUs given to the placement procedure. The distance criterion is decided by the transmission range and the overlap ratio. For example, if the transmission range is 300 m and the overlap ratio is 0.8, a candidate can survive only if it is apart from all the surviving RSUs by at least 300 × (1.0 + 0.8) m, namely 540 m. This strategy may lead to frequent disconnections when the distance between two intersections is longer than the above-mentioned bound. Even in this case, we can calculate the length of each road segment to find those exceeding the distance requirement; such a segment can then be given one or more virtual intersections in the middle of it.
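As a concrete illustration of the selection loop just described, the following Python sketch ranks the intersections by their vehicle-report counts and then applies the distance-based survival test. It is a minimal sketch under stated assumptions (planar coordinates in meters, Euclidean distance); the helper names and the toy data are ours, not the authors' implementation.

```python
import math

def greedy_rsu_placement(intersections, reports, n_rsu, tx_range, overlap_ratio):
    """Rank intersections by the number of vehicle reports inside their circle,
    then keep a candidate only if it is at least tx_range * (1 + overlap_ratio)
    away from every candidate already kept."""
    min_dist = tx_range * (1.0 + overlap_ratio)

    def count_reports(center):
        cx, cy = center
        return sum(1 for (x, y) in reports
                   if math.hypot(x - cx, y - cy) <= tx_range)

    # Order candidates by descending report count (heavier traffic first).
    ranked = sorted(intersections, key=count_reports, reverse=True)

    survivors = []
    for cand in ranked:
        if len(survivors) == n_rsu:
            break
        if all(math.hypot(cand[0] - s[0], cand[1] - s[1]) >= min_dist
               for s in survivors):
            survivors.append(cand)
    return survivors

# Toy usage with made-up coordinates (meters).
if __name__ == "__main__":
    intersections = [(0, 0), (400, 0), (800, 0), (200, 300)]
    reports = [(10, 5), (390, 20), (410, -10), (805, 3), (190, 290)]
    print(greedy_rsu_placement(intersections, reports,
                               n_rsu=2, tx_range=300, overlap_ratio=0.8))
```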
4 Performance Measurement
This section measures the performance of the proposed scheme in terms of network coverage and disconnection time, according to the transmission range, the number of RSUs, and the overlap ratio. The transmission range is the distance reachable from an RSU. It depends on the wireless interface technology, currently ranging from 50 m to 300 m, and upcoming wireless technologies will continue to extend this range. Even for the same wireless interface, the actual transmission distance can differ between areas, for example, an open field area versus a downtown area with many tall buildings. So, we consider it as the average transmission distance. Additionally, a small overlap ratio can generate multiply covered areas, while a large one can create blind areas covered by no RSU.
Fig. 3. Analysis of connectivity: (a) overlap ratio = 0.8, (b) overlap ratio = 1.2 (connectivity versus the number of RSUs, for transmission ranges of 100 m, 200 m, and 300 m)
Figure 3 plots the network coverage according to the number of RSUs placed by the proposed scheme, for the transmission ranges of 100 m, 200 m, and 300 m,
respectively. In this experiment, the overlap ratio is set to 0.8 and 1.2, as shown in Figure 3 (a) and Figure 3 (b). Naturally, the connectivity is best when the transmission range is 300 m. For the same number of RSUs, connectivity is better when the overlap ratio is 0.8 than when it is 1.2. In Figure 3 (a), the connectivity can reach up to 72.4 %, compared with 56.1 % in Figure 3 (b). Each 100 m increase of the transmission range improves the connectivity by about 17 % when the number of RSUs is 1,000. In contrast, this improvement has less effect for the large overlap ratio. When the overlap ratio is 1.2, there is no area multiply covered by two or more RSUs, and each RSU is apart from the others by more than its transmission range. So, the total area of all circles is the largest, but there must be many uncovered spots. This situation is problematic in areas which have high traffic. Figure 4 shows the average disconnection interval according to the number of RSUs, again when the overlap ratio is 0.8 and 1.2. Each graph has three curves for the cases where the transmission range is 100 m, 200 m, and 300 m. When the overlap ratio is 0.8 and the transmission range is 300 m, the disconnection interval is less than 10 seconds for more than 300 RSUs in the entire city. For the case that the transmission range is 200 m, 500 RSUs are needed for the same level of intermittent interval. The disconnection interval is less affected by the overlap ratio. The large overlap ratio is appropriate when a vehicle mainly connects to an RSU for a very short time.
"100m" "200m" "300m"
45 40 35 30 25 20 15 10 5
50
Disconnection interval (sec)
Disconnection interval (sec)
50
"100m" "200m" "300m"
45 40 35 30 25 20 15 10 5
0
100 200 300 400 500 600 700 800 900 1000 The number of RSUs
(a) Overlap ratio=0.8
0
100 200 300 400 500 600 700 800 900 1000 The number of RSUs
(b) Overlap ratio=1.2
Fig. 4. Analysis of disconnection interval
5 Conclusion
This paper has designed and measured the performance of a roadside unit placement scheme for the vehicular telematics network, aiming at improving connectivity and reducing the disconnection interval for the given number of roadside units, the transmission range, and the overlap ratio on the actual road network of Jeju city. For this purpose, our analysis framework implements a road network visualizer, an RSU locator, and a transmission range marker along with the movement
tracker of each vehicle, making it possible to measure how many vehicle reports are included in the roadside network coverage as well as how long a connection will last. The placement scheme begins with an initial selection in which every intersection is a candidate. For each circle surrounding a candidate position with a radius set to the transmission range, the number of vehicle reports is counted. After ordering the candidates by this count, the placement scheme lets a candidate survive only when it is apart from all the other candidates already selected by at least the distance criterion. Performance measurement results obtained using the real-life movement history data in Jeju city show that about 72.5 % connectivity can be achieved when the number of RSUs is 1,000 and the transmission range is 300 m, while the disconnection time is mostly kept below 10 seconds. Moreover, each 100 m increase of the transmission range improves the connectivity by about 17 % when the number of RSUs is 1,000. As future work, we are planning to investigate how to decide the overlap ratio according to the vehicle density. Even though this paper assumed that this ratio is given as a constant, it can be adjusted according to many factors such as vehicle density, tolerable disconnection time, and so on.
References
1. Society of Automotive Engineers: Dedicated short range communication message set dictionary. Tech. Rep. Standard J2735, SAE (2006)
2. Zhang, Y., Zhao, J., Cao, G.: On scheduling vehicle-roadside data access. In: ACM VANET, pp. 9–18 (2007)
3. Ott, J., Kutscher, D.: Drive-thru internet: IEEE 802.11b for automobile users. In: IEEE INFOCOM (2004)
4. US Department of Transportation: Vehicle safety communication project - final report. Technical Report HS 810 591 (2006), http://www-nrd.nhtsa.dot.gov/departments/nrd-12/pubs_rev.html
5. Lee, J., Park, G., Kim, H., Yang, Y., Kim, P., Kim, S.: A telematics service system based on the Linux cluster. In: Shi, Y., van Albada, G.D., Dongarra, J., Sloot, P.M.A. (eds.) ICCS 2007. LNCS, vol. 4490, pp. 660–667. Springer, Heidelberg (2007)
6. Wu, H., Guensler, R., Hunter, M.: MDDV: A mobility-centric data dissemination algorithm for vehicular networks. In: ACM VANET, pp. 47–56 (2004)
7. Mak, T., Laberteaux, K., Sengupta, R.: A multi-channel VANET providing concurrent safety and commercial services. In: ACM VANET, pp. 1–9 (2006)
8. Lee, J.: Design of a network coverage analyzer for roadside-to-vehicle telematics network. In: 9th ACIS SNPD, pp. 201–204 (2008)
9. Yu, B., Gong, J., Xu, C.: Data aggregation and roadside unit placement for a VANET traffic information system. In: ACM VANET, pp. 49–57 (2008)
10. Hull, B., Bychkovsky, V., Zhang, Y., Chen, K., Goraczko, M.: CarTel: A distributed mobile sensor computing system. In: ACM SenSys (2006)
11. Caccamo, M., Zhang, L., Sha, L., Buttazzo, G.: An implicit prioritized access protocol for wireless sensor networks. In: Proc. IEEE Real-Time Systems Symposium (2002)
Concurrent Covert Communication Channels
Md Amiruzzaman, Hassan Peyravi, M. Abdullah-Al-Wadud, and Yoojin Chung
Department of Computer Science, Kent State University, Kent, Ohio 44242, USA {mamiruzz,peyravi}@cs.kent.edu
Department of Industrial and Management Engineering, Hankuk University of Foreign Studies, Kyonggi, 449-791, South Korea [email protected]
Department of Computer Science, Hankuk University of Foreign Studies, Kyonggi, 449-791, South Korea [email protected]
Abstract. This paper introduces a new steganographic technique in which a set of concurrent hidden channels is established between a sender and multiple receivers. Each channel is protected by a separate key. The method works with JPEG blocks: an 8 × 8 block is divided into four non-overlapping sets, each constituting a covert channel that hides a single bit of information. A receiver can decode its independent hidden data using its dedicated key. The distortion of the covert channel data is controlled by minimizing the round-off error of the JPEG image. The method tries to keep the coefficients of the original histogram intact while carrying hidden bits, and it is immune against first order statistical detection. Keywords: Covert channel, concurrent channel, steganography, JPEG.
1 Introduction
In recent years multimedia security has received significant attention in the steganography field. As more and more data hiding techniques are developed or improved, so are steganalysis and covert channel detection techniques. The pioneering work of steganography started with the modification of the least significant bit (LSB) of an image. LSB modification and LSB matching of images have two different application areas: LSB modification is popular for the uncompressed domain, while LSB matching is popular for the compressed domain. The detection processes for these techniques are also different. Nowadays, steganographic techniques are becoming more secure against statistical detection and undetectable by other detection techniques. Many innovative steganographic algorithms have been developed within the last decade.
Corresponding author.
Steganography, in digital image processing, started with altering normal JPEG images to hide information. JPEG divides an image into 8 by 8 pixel blocks, and then calculates the discrete cosine transform (DCT) of each block by Equations (1) through (5). Figure (1) shows the JPEG compression process along with the embedded covert channel encoding unit that is described in Section 2:

8 × 8 pixel block ⇒ Discrete Cosine Transform (DCT) ⇒ Covert Channel Encoder ⇒ Binary Encoder ⇒ Quantization ⇒ Entropy Encoding ⇒ output data stream

Fig. 1. Baseline sequential JPEG encoding and a covert channel unit
During the DCT process of image compression, the image is divided into 8 × 8 blocks [12], and each block contains 64 DCT coefficients. The DCT coefficients F(u, v) of an 8 × 8 block of image pixels f(x, y) are obtained by Equation (1) and Equation (2):

F(u,v) = \frac{1}{4} I(u) I(v) \sum_{x=0}^{7}\sum_{y=0}^{7} f(x,y)\,\cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16}   (1)

f(x,y) = \frac{1}{4} \sum_{u=0}^{7}\sum_{v=0}^{7} I(u) I(v) F(u,v)\,\cos\frac{(2x+1)u\pi}{16}\,\cos\frac{(2y+1)v\pi}{16}   (2)

where

I(u), I(v) = \begin{cases} \frac{1}{\sqrt{2}} & \text{for } u, v = 0 \\ 1 & \text{otherwise} \end{cases}   (3)

The JPEG coefficients are integer numbers, obtained by Equation (4):

F^{Q}(u,v) = \mathrm{IntegerRound}\!\left(\frac{F(u,v)}{Q(u,v)}\right)   (4)

This method also keeps the information of the DCT values¹, which are obtained by Equation (6):

F^{Q}(u,v) = \frac{F(u,v)}{Q(u,v)}   (6)

¹ The difference between DCT coefficients and DCT values is that DCT coefficients are integer numbers and DCT values are floating point numbers.
Since DCT coefficients are integer numbers and DCT values are floating point numbers, the rounding operation introduces an error, which can be measured by Equation (7):

E(u,v) = \left|\,\frac{F(u,v)}{Q(u,v)} - \mathrm{IntegerRound}\!\left(\frac{F(u,v)}{Q(u,v)}\right)\right|   (7)
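For illustration, the following Python sketch computes the quantized coefficients of Equation (4), the DCT values of Equation (6) and the rounding errors of Equation (7) for one block, using scipy's orthonormal 2-D DCT (which should match the normalization of Equation (1)) instead of the explicit sums. The flat quantization table and the use of the error magnitude are assumptions made only for this sketch.

```python
import numpy as np
from scipy.fft import dctn

def rounding_errors(block, qtable):
    """block: 8x8 array of (level-shifted) pixel values; qtable: 8x8 quantization table.
    Returns (integer coefficients F_Q per Eq. (4), rounding errors E per Eq. (7))."""
    F = dctn(block, norm='ortho')           # 2-D DCT of the block, Eq. (1)
    values = F / qtable                     # unrounded DCT values, Eq. (6)
    coeffs = np.round(values).astype(int)   # integer JPEG coefficients, Eq. (4)
    E = np.abs(values - coeffs)             # rounding error magnitude, Eq. (7) (assumed absolute)
    return coeffs, E

# Toy usage with a flat placeholder quantization table.
block = np.random.randint(-128, 128, size=(8, 8)).astype(float)
coeffs, E = rounding_errors(block, np.full((8, 8), 16.0))
print(coeffs[0, 0], E.max())
```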
The J-Steg [11] scheme is considered the first steganographic method for digital images, in which the least significant bit (LSB) of an image is altered to hide a bit of data. This technique suffers from first order statistical detection, in which a simple chi-square (χ2) test can detect the hidden information [15]. Subsequent modifications of JSteg in the form of F3 and F4 [13] made detection slightly harder. To avoid first order statistical detection, the F5 [13] and OutGuess [7] methods were introduced. The OutGuess method can closely preserve the histogram of the original image. With a matrix embedding technique, F5 received significant attention, mainly due to relatively fewer modifications of the original image and better embedding capacity. What distinguishes F5 from F4 is its matrix embedding implementation. While OutGuess and F5 are immune from first order statistical detection such as the χ2 test, they are not immune from a generalized χ2 test [8,14]. OutGuess and F5 were broken separately by a method known as the calibrated statistical attack [3,4]. Later, several other methods, known as MBS1 (Model Based Steganography 1) [9] and MBS2 (Model Based Steganography 2) [10], were developed. Model Based Steganographic methods can be broken by first order statistics [2]. As more and more data hiding techniques have been developed and improved, so have steganalysis techniques, to the point that steganography has become very complicated. Nowadays, steganographic techniques are becoming more immune to statistical detection techniques, and some can avoid other detection techniques as well. The way to maximize the strength of a steganographic method and minimize the diagnostic effect of its symptoms under steganalysis is to reduce the level of distortion and the number of flipped bits. Normal JPEG images suffer distortion for two reasons: quantization and the rounding of DCT values [5]. The basic problem with the F5 method is the increasing number of zeros it generates, called shrinkage. To overcome the shrinkage problem, and considering rounding errors, a modified matrix embedding technique was proposed [6]. Distortion caused by the rounding operation in JPEG image processing has been studied in [5].
The rest of this paper is organized as follows. Section 2 describes the new concurrent hidden channel method. Section 3 presents the encoding and decoding algorithms. Section 4 summarizes performance evaluation and experimental results. Section 5 describes how this algorithm has better resistance against detection techniques. Section 6 concludes the paper with the limitations of the concurrent hidden channels and final remarks.
2 Concurrent Covert Channel Scheme
The concurrent covert channel (CCC) scheme divides a JPEG image block into m non-overlapping sub-blocks, each of which is associated with one of the concurrent covert channels. Figure (2) illustrates four concurrent channels with and without user keys.

Fig. 2. Concurrent covert channels (a) with and (b) without key assignment
The concurrent covert channel (CCC) method hides data in JPEG coefficients similarly to [11,13], except that the method starts from an uncompressed image and keeps track of the rounding error information obtained from Equation (7). The method partitions the set of DCT coefficients into m sub-blocks, S1, S2, ..., Sm, and computes the sum of the coefficient values in each sub-block. If the sum is odd and the hidden bit is 0, then one of the values in the set has to be modified. Similarly, if the sum of the set is even and the hidden bit is 1, then one of the values in the set, the one with maximum error, will be modified. Otherwise, no modification takes place. The scheme finds a coefficient in the set Si with a maximum error to modify. The candidate F^Q(u*, v*) for modification is found from

E(u^*, v^*) = \max\{E(u,v),\ \forall\, u, v\}.   (9)

2.1 Implementation
The proposed concurrent covert channel (CCC) method partitions an 8 × 8 JPEG block into n × n blocks, each having m × m coefficients, where (n × n) × (m × m) = (8 × 8) = 64 JPEG coefficients. Without loss of generality, let m = 4 and n = 2. In this case we have four blocks, S1, S2, S3, and S4, each having 16 JPEG coefficients:

C = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{18} \\ c_{21} & c_{22} & \cdots & c_{28} \\ \vdots & \vdots & \ddots & \vdots \\ c_{81} & c_{82} & \cdots & c_{88} \end{bmatrix} = \begin{bmatrix} S_1 & S_2 \\ S_3 & S_4 \end{bmatrix}
where

S_k = \{\, k + 4j \mid 0 \le j \le 15 \,\}, \quad 1 \le k \le 4.

In general,

S_k = \{\, k + mj \mid 0 \le j < 64/m \,\}, \quad 1 \le k \le m.   (10)
Table (1) illustrates the JPEG coordinates of the coefficients in each set.

Table 1. Partitioning blocks
S1 = { 1  5  9 13 17 21 25 29 33 37 41 45 49 53 57 61 }
S2 = { 2  6 10 14 18 22 26 30 34 38 42 46 50 54 58 62 }
S3 = { 3  7 11 15 19 23 27 31 35 39 43 47 51 55 59 63 }
S4 = { 4  8 12 16 20 24 28 32 36 40 44 48 52 56 60 64 }
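A one-line Python sketch of the partition in Equation (10) and Table 1, using 1-based coefficient positions; the function name is ours.

```python
def partition_indices(m=4):
    """Return the index sets S_1..S_m of Equation (10) over the 64 positions
    of an 8x8 block, using 1-based positions as in Table 1."""
    return {k: [k + m * j for j in range(64 // m)] for k in range(1, m + 1)}

sets = partition_indices()
print(sets[1][:4])  # [1, 5, 9, 13], matching the first row of Table 1
```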
Each set Si, 1 ≤ i ≤ 4, is responsible for hiding one bit of the secret information. The first set contains the DC coefficient, which is not used to hide data. Bit bi is hidden in sub-block Si by modifying an AC coefficient in Si(ℓ). The candidate AC coefficient Si(ℓ) for modification is the one obtained from Equation (9). Let Si(ℓ) represent the location of the AC coefficient to change; then the transmitting coefficient is modified by incrementing or decrementing its value according to Equation (11), which distinguishes the cases AC[Si(ℓ)] > 0 and AC[Si(ℓ)] < 0, where A is an array that records the number of up/down shifts on the transmitting AC coefficients.

Example 1. Consider the sub-block S1 that contains five non-zero DCT coefficients of Table (1):

S1 = { · · ·  −1  1  2  2  −3  · · · }

Consider the hidden information bit b1 = 0 and the corresponding rounding errors obtained from Equation (7) to be

E = { · · ·  0.49  0.48  0.45  0.46  0.42  · · · }

Since the sum of the values in S1 is odd and the hidden bit is b1 = 0, we need to find a candidate AC coefficient in S1 with the maximum rounding error value. Such a candidate is −1 ∈ S1 with corresponding rounding error 0.49 ∈ E. Therefore, the transmitting set with the hidden bit will be

S1 = { · · ·  −2  1  2  2  −3  · · · }
Now, consider

S2 = { · · ·  −2  1  2  2  −5  · · · }

with rounding errors

E = { · · ·  0.43  0.42  0.45  0.46  0.27  · · · }

and hidden bit b2 = 1. Since the sum of the values in S2 is even and the hidden bit is b2 = 1, we need to find a candidate AC coefficient in S2 with the maximum rounding error value. Such a candidate is 2 ∈ S2 with corresponding rounding error 0.46 ∈ E. Therefore, the transmitting set will be

S2 = { · · ·  −2  1  2  1  −5  · · · }.

3 Encoding and Decoding Algorithms
The encoding and decoding of the proposed CCC method are described in Sections 3.1 and 3.2, respectively. Hiding the secret message is expressed as encoding and extracting the message is expressed as decoding.

3.1 Encoding
The proposed CCC scheme hides one bit of information in each non-zero sub-block. The parity (odd or even) of the sum of each sub-block is used to represent one secret bit; an odd sum can be modified to an even sum, or vice versa, to hide a bit of information. Algorithm (1) describes the process steps.
Algorithm 1 (Encoding)
1. Partition an 8 × 8 JPEG block into m sets, Si, 1 ≤ i ≤ m, using Equation (10).
2. Set i = 1.
3. Compute si = Σ_{ℓ=1}^{k} Si(ℓ), where k = 64/m.
   3.1 Skip sub-blocks with no non-zero elements, i.e., si = 0.
4. If si is odd and the hidden bit bi = 0, or if si is even and the hidden bit bi = 1, then
   4.1 Find a DCT coefficient to modify according to Equation (9).
5. Increment/decrement the DCT coefficient according to Equation (11).
   5.1 Update array A.
6. Move to the next sub-block to hide the next bit, i = i + 1, and go to Step 3.
Analysis. The sub-block concept provides more flexibility to the encoding process. This scheme allows more chances to encode a bit and keeps distortion low, which helps avoid detection.
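The following Python sketch illustrates steps 3-5 of Algorithm 1 for a single sub-block. Because Equation (11) is only partially legible in this copy, the modification step below uses a placeholder rule (move a coefficient of magnitude 1 away from zero, otherwise toward zero) that reproduces the two worked examples above but is our assumption, not necessarily the authors' exact rule; the bookkeeping array A is omitted.

```python
def embed_bit(coeffs, errors, bit):
    """Embed one bit in a sub-block (Algorithm 1, steps 3-5).
    coeffs: JPEG coefficients of the sub-block (modified in place).
    errors: rounding errors of the same coefficients (Equation (7)).
    bit: 0 or 1. Returns True if the sub-block was usable."""
    if not any(coeffs):
        return False                      # step 3.1: skip all-zero sub-blocks
    if sum(coeffs) % 2 == bit:            # parity already encodes the bit
        return True
    # Step 4.1: pick the non-zero coefficient with the largest rounding error.
    idx = max((i for i, c in enumerate(coeffs) if c != 0),
              key=lambda i: errors[i])
    c = coeffs[idx]
    # Placeholder for Equation (11): |c| = 1 moves away from zero, otherwise
    # toward zero; this flips the parity and never creates a new zero.
    step = 1 if abs(c) == 1 else -1
    coeffs[idx] = c + step if c > 0 else c - step
    return True

# Worked examples from the text: S1 with b1 = 0 and S2 with b2 = 1.
s1, e1 = [-1, 1, 2, 2, -3], [0.49, 0.48, 0.45, 0.46, 0.42]
s2, e2 = [-2, 1, 2, 2, -5], [0.43, 0.42, 0.45, 0.46, 0.27]
embed_bit(s1, e1, 0); embed_bit(s2, e2, 1)
print(s1, s2)  # [-2, 1, 2, 2, -3] and [-2, 1, 2, 1, -5]
```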
3.2 Decoding
Extracting the covert message on the receiving end is called message decoding. The decoding process is simpler than the encoding process. First, the method splits each 8 × 8 JPEG block into sub-blocks and then checks whether each sum is odd or even. Algorithm (2) describes the process steps.

Algorithm 2 (Decoding)
1. Partition an 8 × 8 JPEG block into m sets, Si, 1 ≤ i ≤ m, using Equation (10).
2. Set i = 1.
3. Compute si = Σ_{ℓ=1}^{k} Si(ℓ), where k = 64/m.
   3.1 Skip sub-blocks with no non-zero elements, i.e., si = 0.
4. If si is odd then the hidden bit bi = 1, and if si is even then the hidden bit bi = 0.
5. Move to the next sub-block to decode the next bit, i = i + 1, and go to Step 3.
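Decoding reduces to reading the parity of each non-zero sub-block sum, as in the following Python sketch (the function name and data layout are ours):

```python
def extract_bits(sub_blocks):
    """Algorithm 2: recover one bit per non-zero sub-block from the parity of its sum."""
    bits = []
    for coeffs in sub_blocks:
        if not any(coeffs):
            continue                      # step 3.1: skip all-zero sub-blocks
        bits.append(sum(coeffs) % 2)      # odd sum -> 1, even sum -> 0
    return bits

# Continuing the worked examples: the modified S1 and S2 decode to 0 and 1.
print(extract_bits([[-2, 1, 2, 2, -3], [-2, 1, 2, 1, -5]]))  # [0, 1]
```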
3.3 Detection Avoidance
Detection can be avoided by preserving the original histogram shape. Array A is used to keep track of the modified DCT coefficients. The method tries to keep the original histogram intact while modifying one coefficient in each sub-block, and the array is used to offset any deviations of the modified histogram. The role of array A, which stores traces of the up/down modifications of transmitting DCT coefficients, is shown in Equation (11).
4 Experimental Results
The Peak Signal-to-Noise Ratio (PSNR) [1] is a well known distortion performance measurement that can be applied to stego-images:

PSNR = 10 \log_{10}\frac{C_{\max}^{2}}{MSE}, \qquad MSE = \frac{1}{MN}\sum_{u=1}^{M}\sum_{v=1}^{N}\left(C'_{i} - C_{i}\right)^{2}   (12)

where MSE denotes the mean square error, u and v are the image coordinates, M and N are the dimensions of the image, C'_i is the generated stego-image and C_i is the cover image. Also, C_max holds the maximum value in the image; for example,

C_{\max} = \begin{cases} 1 & \text{double-precision} \\ 255 & \text{uint 8 bit} \end{cases}   (13)

Consider C_max = 255 as a default value for 8-bit images. It can be the case, for instance, that the examined image has only up to 253 or fewer representations of gray colors. Since C_max is raised to a power of 2, this results in a severe change to the PSNR value. Thus C_max can be defined as the actual maximum value rather than the largest possible value.
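A small Python sketch of Equations (12)-(13), assuming numpy arrays for the cover and stego images; the default C_max of 255 corresponds to the 8-bit case.

```python
import numpy as np

def psnr(cover, stego, c_max=255.0):
    """PSNR of Equation (12); c_max follows Equation (13): 255 for 8-bit images,
    1.0 for double-precision images."""
    cover = np.asarray(cover, dtype=float)
    stego = np.asarray(stego, dtype=float)
    mse = np.mean((stego - cover) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(c_max ** 2 / mse)

# Toy usage on random 8-bit data.
rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(64, 64))
stego = np.clip(cover + rng.integers(-1, 2, size=cover.shape), 0, 255)
print(round(psnr(cover, stego), 2))
```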
PSNR is often expressed on a logarithmic scale in decibels (dB). PSNR values falling below 30 dB indicate fairly low quality, i.e., the distortion caused by embedding can be obvious; a high quality stego-image should strive for 40 dB and above. Implementing the proposed CCC method is simple. The encoder and decoder of the CCC method were tested on several images; however, in this paper only four images are used to show the performance with different threshold values. The sample images were 512 × 512, containing 4,096 8 × 8 DCT blocks. With different threshold values, different numbers of sensitive and non-sensitive blocks are selected to hide data; the threshold values are used to control the capacity as well as the quality (i.e., PSNR). In the Lena image, the proposed CCC method embedded 8,034 bits with a 43.09 dB PSNR value, while the F5 method gives a 37.61 dB PSNR value. In the Barbara image, the proposed CCC method hides 8,786 bits with 39.53 dB PSNR, while the F5 method with the same hiding capacity produces a 33.04 dB PSNR value. The data hiding capacity in the Baboon image by the proposed CCC method is 13,820 bits with 37.95 dB PSNR; with the same data hiding capacity, F5 gives a 36.10 dB PSNR value. Lastly, the data hiding capacity of the Gold-hill image by the proposed CCC method is 10,757 bits with 40.47 dB PSNR, while F5 gives a 37.50 dB PSNR value. Table (2) shows the detailed performance comparison between CCC and F5.

Table 2. Performance comparison of the proposed method and the F5 algorithm with different JPEG images
Image      Method  Capacity [bits]  PSNR [dB]  Histogram changes
Lena       CCC     8,034            43.09      No
Lena       F5      8,034            37.61      Yes
Barbara    CCC     8,786            39.53      No
Barbara    F5      8,786            33.04      Yes
Baboon     CCC     13,820           37.95      No
Baboon     F5      13,820           36.10      Yes
Gold-hill  CCC     10,757           40.47      No
Gold-hill  F5      10,757           37.50      Yes
While the PSNR value reflects the visual quality of an image, and visual quality relates to high-level (visual) detection, visual analysis is not the only detection technique for steganography. A good steganographic algorithm must avoid detection by strong statistical detection techniques.
5 Detection
The performance of a steganographic technique is measured not only by its data hiding capacity, but also by its resilience against a variety of detection (breaking)
tools. While an increase in the capacity of a steganographic technique is a performance objective, it should not reduce the resilience of the scheme with respect to detection techniques. To measure the strength of the proposed CCC method, a set of extensive tests has been conducted. The method was tested on more than 1,000 JPEG images; in Section 4 only 4 images are presented since they are benchmark images. It has been observed that when the PSNR of a steganographic method reaches about 40 dB, the method becomes strong enough against detection [1]. The proposed CCC method is undetectable by first order statistics and χ2 analysis. Its resilience against statistical detection and the χ2 test is illustrated in Sections 5.1 and 5.2, respectively.

5.1 First Order Statistical Detection
Every JPEG image has a common property in which the number of coefficients with value 1 is greater than the number of coefficients with value 2 [13]. Similarly, the number of coefficients with value -1 is greater than the number of coefficients with value -2. Let the frequency of a JPEG coefficient be denoted by P(C) (where C is a JPEG coefficient); then the set of JPEG coefficients has the following properties:

P(C=1) > P(C=2) > P(C=3) > \cdots
P(C=1) - P(C=2) > P(C=2) - P(C=3) > \cdots   (14)
A steganographic JPEG image that does not hold the properties in Equations (14) is vulnerable to first order statistical detection, and one can break it easily. In most cases, the frequency of the modified JPEG coefficients fails to maintain the properties expressed in Equations (14). For example, the JSteg, F3, and F4 methods can be easily detected by the first order statistics mentioned above. However, the proposed CCC method preserves the above properties of a JPEG image and cannot be detected by first order statistical detection methods.

5.2 χ2 Analysis
The χ2 test is one of the popular techniques to detect steganographic schemes; the details of the χ2 analysis are explained in [7] and [13]. The χ2 analysis checks the histogram shape of a modified JPEG image. If a steganographic method can preserve the histogram shape of the modified JPEG image, then that method cannot be detected by the χ2 analysis [7,13]. Let hi be the histogram of JPEG coefficients. The assumption for a modified image is that adjacent frequencies h_{2i} and h_{2i+1} are similar; then the arithmetic mean will be

n^{*}_{i} = \frac{h_{2i} + h_{2i+1}}{2}   (15)

To determine the expected distribution and compare it against the observed distribution,

n_{i} = h_{2i}   (16)
\chi^{2}_{d-1} = \sum_{i=1}^{d} \frac{\left(n_{i} - n^{*}_{i}\right)^{2}}{n^{*}_{i}}   (17)

where d is the degree of freedom. Let P be the probability of our statistic under the condition that the distributions of n_i and n*_i are equal. It is calculated by integrating the density function:

P = 1 - \frac{1}{2^{\frac{d-1}{2}}\,\Gamma\!\left(\frac{d-1}{2}\right)} \int_{0}^{\chi^{2}_{d-1}} e^{-\frac{x}{2}}\, x^{\frac{d-1}{2}-1}\, dx   (18)

The proposed method preserves the JPEG histogram shape, and thus remains undetected by the χ2 method.
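For illustration, the following Python sketch evaluates Equations (15)-(18) with scipy's chi-square distribution. The pairing of histogram bins (h_2i with h_2i+1), the even number of bins, and the handling of empty bins are assumptions made only for this sketch.

```python
import numpy as np
from scipy.stats import chi2

def chi_square_p(histogram):
    """Chi-square steganalysis statistic of Equations (15)-(18).
    histogram: counts h_i of JPEG coefficient values (even length assumed),
    indexed so that (h_{2i}, h_{2i+1}) are the pairs assumed equal after embedding."""
    h = np.asarray(histogram, dtype=float)
    h2i, h2i1 = h[0::2], h[1::2]
    expected = (h2i + h2i1) / 2.0           # n*_i, Eq. (15)
    observed = h2i                          # n_i,  Eq. (16)
    mask = expected > 0                     # skip empty bin pairs
    stat = np.sum((observed[mask] - expected[mask]) ** 2 / expected[mask])  # Eq. (17)
    d = int(mask.sum())
    return 1.0 - chi2.cdf(stat, d - 1)      # Eq. (18): P close to 1 suggests embedding

# Toy usage: a histogram whose paired bins are nearly equal yields P near 1.
print(round(chi_square_p([120, 118, 60, 61, 30, 29, 15, 16]), 3))
```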
6 Conclusion
In this paper, a new steganographic technique that supports multiple concurrent covert channels has been proposed. The proposed method has several merits over existing steganographic methods. The method has the freedom to modify any non-zero AC coefficient, does not increase the number of zeros, and modifies only one of the coefficients from each sub-block that represents one of the concurrent channels. Experimental results have shown that the proposed CCC scheme gives better performance in terms of capacity, peak signal-to-noise ratio, and resilience against first order statistical detection and the χ2 test when compared to the F5 method. It is not clear whether the CCC method is as strong against second order statistical detection techniques.
Acknowledgments. This work was supported by Hankuk University of Foreign Studies Research Fund of 2009.
References
1. Curran, K., Cheddad, A., Condell, J., McKevitt, P.: Digital image steganography: Survey and analysis of current methods. Signal Processing 90(3), 727–752 (2010)
2. Böhme, R., Westfeld, A.: Breaking cauchy model-based JPEG steganography with first order statistics. In: Samarati, P., Ryan, P.Y.A., Gollmann, D., Molva, R. (eds.) ESORICS 2004. LNCS, vol. 3193, pp. 125–140. Springer, Heidelberg (2004)
3. Fridrich, J., Goljan, M., Hogea, D.: Attacking the OutGuess. In: ACM Workshop on Multimedia and Security, Juan-les-Pins, France (December 2002)
4. Fridrich, J.J., Goljan, M., Hogea, D.: Steganalysis of JPEG images: Breaking the F5 algorithm. In: Petitcolas, F.A.P. (ed.) IH 2002. LNCS, vol. 2578, pp. 310–323. Springer, Heidelberg (2003)
5. Fridrich, J.J., Goljan, M., Soukal, D.: Perturbed quantization steganography with wet paper codes. In: Dittmann, J., Fridrich, J.J. (eds.) MM&Sec, pp. 4–15. ACM, New York (2004)
6. Kim, Y., Duric, Z., Richards, D.: Modified matrix encoding technique for minimal distortion steganography. In: Camenisch, J.L., Collberg, C.S., Johnson, N.F., Sallee, P. (eds.) IH 2006. LNCS, vol. 4437, pp. 314–327. Springer, Heidelberg (2007)
7. Provos, N.: Defending against statistical steganalysis. In: Proceedings of the Tenth USENIX Security Symposium, Washington, DC, USA, August 13–17. USENIX (2001)
8. Provos, N., Honeyman, P.: Detecting steganographic content on the internet. Technical report, ISOC NDSS 2002 (2001)
9. Sallee, P.: Model-based steganography. In: Kalker, T., Cox, I., Ro, Y.M. (eds.) IWDW 2003. LNCS, vol. 2939, pp. 154–167. Springer, Heidelberg (2004)
10. Sallee, P.: Model-based methods for steganography and steganalysis. Int. J. Image Graphics 5(1), 167–190 (2005)
11. Upham, D.: http://zooid.org/~paul/crypto/jsteg/
12. Wallace, G.K.: The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38(1), 18–34 (1992)
13. Westfeld, A.: F5 — A steganographic algorithm. In: Moskowitz, I.S. (ed.) IH 2001. LNCS, vol. 2137, pp. 289–302. Springer, Heidelberg (2001)
14. Westfeld, A.: Detecting low embedding rates. In: Petitcolas, F.A.P. (ed.) IH 2002. LNCS, vol. 2578, pp. 324–339. Springer, Heidelberg (2003)
15. Westfeld, A., Pfitzmann, A.: Attacks on steganographic systems. In: Pfitzmann, A. (ed.) IH 1999. LNCS, vol. 1768, pp. 61–76. Springer, Heidelberg (2000)
Energy Efficiency of Collaborative Communication with Imperfect Frequency Synchronization in Wireless Sensor Networks
Husnain Naqvi, Stevan Berber, and Zoran Salcic
Department of Electrical and Computer Engineering, The University of Auckland, New Zealand [email protected], [email protected], [email protected]
Abstract. Collaborative communication produces a significant power gain (N², where N is the number of nodes used for collaboration) and overcomes the effect of fading. With imperfect frequency synchronization, a significant power gain, slightly less than N², can still be achieved. As N increases, more power gain can be achieved at the expense of more circuit power. In this paper an energy consumption model for a collaborative communication system with imperfect frequency synchronization is proposed. The model calculates the energy consumed by the sensor network for local communication and for communication with the base station. Energy efficiency models for collaborative communication with the off-the-shelf products CC2420 and AT86RF212 are presented. It is also shown that significant energy can be saved using collaborative communication compared to traditional SISO (single-input single-output) transmission for these products. The break-even distance, where the energy consumed by SISO and collaborative communication is equal, is also calculated. From the results it is revealed that collaborative communication using 5 nodes produces efficient energy saving. Keywords: Sensor network; collaborative communication; bit error rate; Rayleigh fading; energy consumption; frequency synchronization; energy efficiency.
frequency errors, the collaborative communication of N nodes will produce an N² gain in the received power [5]. Another factor that significantly degrades data transmission, and results in more required power, is channel fading. In recent work related to collaborative communication [3] and [5]-[7], it is shown that substantial power gain can be achieved with imperfect frequency synchronization. In our recent work [7], a collaborative communication model is presented and it is shown that significant power gain and robustness to fading can be achieved with imperfect frequency synchronization. In [7] a theoretical model and performance analysis for collaborative communication in sensor networks in the presence of AWGN, Rayleigh fading and frequency offsets are presented. The theoretical results are confirmed by simulation, showing that substantial power gain and reduction in BER can be achieved with imperfect frequency synchronization. The power gain and BER depend upon the number of sensor nodes used in collaborative communication. But as the number of nodes increases, more operational power of the network (circuit power) is required. So the total energy saving depends upon the energy gain and the circuit energy used by the network. In this paper we present a model to investigate the energy efficiency of collaborative communication with imperfect frequency synchronization in wireless sensor networks. The trade-off between the required circuit power and the power gain achieved using collaborative communication is analyzed. In [8]-[10] different energy efficient models for SISO systems are proposed to investigate the optimized system parameters. It is observed that high power gain can be achieved using multiple-input multiple-output (MIMO) systems, but due to the more complex circuitry of MIMO more operational (circuit) power is required. An energy consumption model for MIMO systems is proposed, analyzed and compared with SISO in [11]. It is shown in [11] that for short ranges the SISO systems are more efficient than the MIMO systems, but for large transmission distances the MIMO systems are more energy efficient than SISO. The energy efficiency of major cooperative diversity techniques such as decode-and-forward and virtual multiple-input single-output (MISO) is presented and analyzed in [12]. The results show that the decode-and-forward technique is more energy efficient than the virtual MISO [12]. In this paper an energy consumption model is proposed, modeled and analyzed for collaborative communication with imperfect frequency synchronization by considering the system parameters of the off-the-shelf products CC2420 [15] and AT86RF212 [16]. The total energy required by collaborative communication is the sum of the circuit energy and transmission energy for local communication (within the sensor network) and the energy consumed for communication between the sensor network and the base station. The energy consumption models for local communication and for communication with the base station are presented. This model is compared with the SISO system without frequency offsets. The energy efficiency is calculated over different transmission distances for different frequency offsets and numbers of nodes used for collaborative communication (N), for both CC2420 and AT86RF212. The break-even distance, where the energy of SISO and collaborative communication is equal, is also calculated. The paper is organized as follows. Section 2 describes the collaborative communication model, Section 3 presents the energy consumption model for SISO and
collaborative communication, Section 4 presents the analysis and results, and Section 5 gives the conclusions.
2 Collaborative Communication Systems

Let N be the number of nodes in the network. The data that needs to be transmitted to the base station is exchanged among the nodes, so all the nodes transmit the same data to the base station. Since each node has an independent oscillator, there could be a frequency offset in the carrier signal of each transmitted signal. The physical model is shown in Fig. 1. In our recent work [7] a collaborative communication model with imperfect frequency synchronization is proposed, in which one node in the network exchanges the data with a set of nodes in the network denoted as collaborative nodes. All collaborative nodes transmit the data towards the base station as shown in Figure 1.

Fig. 1. Geometry of sensor nodes [7]
Since the sensor nodes have their own oscillators, there may be a frequency mismatch among the transmitted signals. The proposed collaborative communication model can achieve a high signal-to-noise ratio gain and a reduction in BER with imperfect phase synchronization.

Fig. 2. Theoretical model of the system [7] (the received signal r(t) is demodulated with 2cos(w0 t), low-pass filtered, integrated over the interval T, and used for power and BER calculation)
2.1 Theoretical Model

The theoretical model of the system is shown in Figure 2. Let N collaborative nodes form a network to transfer the information from the master node to the base station, and let s(t) be the information data to be transmitted to the base station. The received signal at the base station is given by [7]

r_m(t) = \sum_{i=1}^{N} h_i\, s(t) \cos(w_i t) + n(t)   (1)

where n(t) is AWGN and h_i is the Rayleigh fading. After demodulation, the signal at the decision circuit is given by [7]

R = \sum_{i=1}^{N} h_i\, S\, \frac{\sin(\Delta w_i T)}{\Delta w_i T} + n   (2)

where Δw_i is the frequency error, S = ±√E_b is the signal amplitude and n is the noise amplitude at sampling time T. The expressions for the average received power and the BER in the presence of frequency errors, AWGN and Rayleigh fading are derived and simulated in [7] and are given by

E[P_R] = N S^{2}\left[1 + \frac{(w_e T)^{4}}{180} - \frac{(w_e T)^{2}}{9}\right] + \frac{N(N-1)\,b^{2} S^{2}}{2}\left[1 - \frac{(w_e T)^{2}}{18}\right]^{2} + \frac{N_0}{2}   (3)

where w_e is the distribution limit of the frequency error and b is the mode of the Rayleigh random variable h. The probability of error of the received signal is given by

P_e = 0.5\,\mathrm{erfc}\!\left(\sqrt{\pi}\, b \left[1 - \frac{(w_e T)^{2}}{18}\right] \sqrt{\frac{N^{2}\,(E_b/N_0)}{2\left(2 N b^{2} u\,(E_b/N_0) + 1\right)}}\right)   (4)

where u = 0.429 − 0.048 (w_e T)^2 + 0.0063 (w_e T)^4.

2.2 Analysis and Results of Collaborative Communication Systems
We have performed a Monte Carlo simulation of the above system in MATLAB. The BER in the presence of AWGN and Rayleigh fading at the base station is shown in Figures 3, 4, 5 and 6. The simulation results match the theoretical findings closely, which confirms the validity of our theory and simulation. To calculate the BER we set the energy per bit of each collaborative node to be Eb/N², i.e., the total transmission energy is Eb/N. The BER decreases as the number of transmitters increases, which confirms that collaborative communication overcomes the fading effect. These results are used in Section 4 to calculate the energy saving.
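The theoretical curves behind Figures 3-6 follow from Equation (4). The Python sketch below evaluates that expression as reconstructed above; both the exact grouping inside the erfc argument and the illustrative value of the frequency-error product w_e·T are assumptions, so the numbers are indicative only.

```python
import numpy as np
from scipy.special import erfc

def ber_collaborative(eb_no_db, N, weT, b=1.0):
    """Evaluate the reconstructed Equation (4).
    eb_no_db: Eb/N0 in dB, N: number of collaborative nodes,
    weT: frequency-error limit times the bit interval (dimensionless),
    b: mode of the Rayleigh fading variable."""
    g = 10.0 ** (np.asarray(eb_no_db, dtype=float) / 10.0)   # Eb/N0, linear scale
    u = 0.429 - 0.048 * weT**2 + 0.0063 * weT**4
    factor = np.sqrt(np.pi) * b * (1.0 - weT**2 / 18.0)
    return 0.5 * erfc(factor * np.sqrt(N**2 * g /
                                       (2.0 * (2.0 * N * b**2 * u * g + 1.0))))

# Example: theoretical BER at Eb/N0 = 10 dB for an illustrative weT = 0.5.
for N in (1, 3, 5):
    print(N, float(ber_collaborative(10.0, N, weT=0.5)))
```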
Fig. 3. BER for CC2420 with total transmitted power Eb/N, frequency error 200 KHz and data rate 250 Kbps (BER versus Eb/No in dB, for AWGN only and N = 1 to 11)
Fig. 4. BER for CC2420 with total transmitted power Eb/N, frequency error 350 KHz and data rate 250 Kbps (BER versus Eb/No in dB, for AWGN only and N = 1 to 9)
Fig. 5. BER for AT86RF212 with total transmitted power Eb/N, frequency error 55 KHz and data rate 40 Kbps (BER versus Eb/No in dB, for AWGN only and N = 1 to 11)
Fig. 6. BER for AT86RF212 with total transmitted power Eb/N, frequency error 70 KHz and data rate 40 Kbps (BER versus Eb/No in dB, for AWGN only and N = 1 to 11)
3 Energy Efficiency of Collaborative Communication Systems

In this section we present an energy consumption model for the collaborative communication system with imperfect frequency synchronization and for the SISO system. The energy consumption model is a function of the total received power, the required circuit power and the transmission distance. The developed energy consumption model is used to calculate the energy efficiency in wireless sensor networks in the presence of frequency errors.

3.1 SISO Energy Consumption Model

In SISO, there is a single transmitter and a single receiver, so the total energy consumption is the sum of the total power consumed by the transmitter, Ptx, and by the receiver, Prx. The energy consumed per bit is given by

E_{SISO} = (P_{tx} + P_{rx}) / R_s   (5)

where R_s is the transmission data rate. The power required for data transmission in a Rayleigh fading channel can be calculated by the simplified (log-distance) path loss model [13]. The log-distance path loss model has a concise format and captures the essence of signal propagation [14]. Assuming the transmitter antenna gain Gt and the receiver antenna gain Gr are equal to 1, Ptx is given by

P_{tx} = P_{cir} + \frac{(4\pi)^{2} P_r\, d^{\alpha}}{d_0^{\alpha-2}\, \lambda^{2}}   (6)

where P_cir is the power consumed by the transmitter circuitry, P_r is the power of the received signal, λ = c/f_c, c is the speed of light, f_c is the carrier frequency, α is the path loss exponent, d is the actual distance between transmitter and receiver, and d_0 is the reference distance for the far-field region. To achieve the desired BER, the minimum required received power P_r is given by

P_r = P_s \times r_{eber}   (7)

where P_s is the receiver sensitivity (in watts) required to achieve the desired BER with AWGN only, and r_eber is the ratio of the Eb/No (in watts) required to achieve the desired BER with Rayleigh fading and AWGN to that required with AWGN only. The r_eber may be calculated as

r_{eber} = \frac{(1 - 2P_e)^{2} / \left(1 - (1 - 2P_e)^{2}\right)}{\left(\mathrm{erfc}^{-1}(2P_e)\right)^{2}}   (8)

where erfc^{-1}(·) is the inverse of the complementary error function, i.e., erfc(x) = \frac{2}{\sqrt{\pi}} \int_{x}^{+\infty} e^{-t^{2}}\, dt. Using Equations (5), (6) and (7), the total energy consumed by SISO may be written as

E_{SISO} = \left(P_{cir} + \frac{(4\pi)^{2} P_s\, r_{eber}\, d^{\alpha}}{d_0^{\alpha-2}\, \lambda^{2}} + P_{rx}\right) / R_s   (9)
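The SISO energy per bit of Equation (9) can be evaluated directly, as in the following Python sketch. The parameter values in the usage line are taken from Table 1 for a CC2420-like setting, while the path loss exponent, reference distance, target BER and distance are illustrative assumptions.

```python
import math
from scipy.special import erfcinv

C = 3.0e8  # speed of light, m/s

def r_eber(pe):
    """Equation (8): extra Eb/N0 factor needed for the target BER pe under
    Rayleigh fading, relative to AWGN only."""
    awgn = erfcinv(2.0 * pe) ** 2
    rayleigh = (1.0 - 2.0 * pe) ** 2 / (1.0 - (1.0 - 2.0 * pe) ** 2)
    return rayleigh / awgn

def e_siso(pe, d, fc, rs, p_cir, p_rx, p_sens_dbm, alpha=4.0, d0=1.0):
    """Equation (9): energy per bit (J) of the SISO link."""
    lam = C / fc
    p_s = 1e-3 * 10.0 ** (p_sens_dbm / 10.0)   # sensitivity in watts
    p_path = ((4.0 * math.pi) ** 2 * p_s * r_eber(pe) * d ** alpha
              / (d0 ** (alpha - 2) * lam ** 2))
    return (p_cir + p_path + p_rx) / rs

# CC2420-like parameters from Table 1 (illustrative only).
print(e_siso(pe=1e-5, d=50.0, fc=2.45e9, rs=250e3,
             p_cir=1.2e-3, p_rx=52.2e-3, p_sens_dbm=-95.0))
```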
3.2 Collaborative Communication Energy Consumption Model
The collaborative model is shown in Figures 1 and 2. The total energy consumption of the collaborative communication system is the sum of the energy consumed for local communication by the sensor nodes within the network, E_local, and the energy consumed for transmission to the base station, E_long. We consider a Rayleigh fading channel both within the sensor network and between the sensor network and the base station. The distances between collaborative nodes differ, but we consider the maximum distance, which gives the maximum energy consumed for local communication. The energy consumed by the sensor network for local communication may be written as

E_{local} = \left(P_{tx\_local} + N P_{rx\_local}\right) / R_s   (10)

where N is the number of collaborative nodes in the sensor network and P_tx_local can be calculated as in SISO. The energy consumption for communication between the sensor network and the base station may be written as

E_{long} = \left(P_{tx\_long} + P_{rx}\right) / R_s   (11)

where P_tx_long is the total power used by all N collaborative nodes, given by

P_{tx\_long} = N P_{cir} + \frac{(4\pi)^{2} P_{r\_long}\, d^{\alpha}}{N d_0^{\alpha-2}\, \lambda^{2}}   (12)

The minimum received power required to achieve the desired BER, P_r_long, may be written as

P_{r\_long} = P_s \times r_{col\_ber}   (13)

where r_col_ber is the ratio of the Eb/No (in watts) required by the collaborative communication system with frequency error, Rayleigh fading and AWGN to the Eb/No (in watts) required by the system with AWGN only to achieve the required BER. The r_col_ber may be calculated as

r_{col\_ber} = \frac{\mathrm{BER}^{-1}(P_e, N)}{\left(\mathrm{erfc}^{-1}(2P_e)\right)^{2}}   (14)

where BER^{-1}(·) is the inverse function of Equation (4).
Using Equations (11), (12), (13) and (14), the total energy consumed by the collaborative communication system may be written as

E_{colab} = \left(P_{cir} + \frac{(4\pi)^{2} P_s\, r_{eber\_local}\, d_{local}^{\alpha}}{d_{local,0}^{\alpha-2}\, \lambda^{2}} + N P_{rx} + N P_{cir} + \frac{(4\pi)^{2} P_{r\_long}\, d^{\alpha}}{N d_0^{\alpha-2}\, \lambda^{2}} + P_{rx}\right) / R_s   (15)

The energy saving using the collaborative communication model may be written as

E_{saving}(\%) = 100 \times \frac{E_{SISO} - E_{colab}}{E_{SISO}}   (16)

For small transmission distances, the circuit energy is dominant over the energy saved using collaborative communication. At the transmission range where the energy consumed by SISO equals the energy consumed by collaborative communication, the energy saving is 0 %; this distance is called the break-even distance.
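Putting Equations (9), (15) and (16) together, the following Python sketch compares the two energies and solves numerically for the break-even distance. It relies on the reconstruction of Equation (4) given in Section 2.1 and inverts it numerically for Equation (14); the chosen parameter values (CC2420-like entries from Table 1, an illustrative w_e·T, α = 4 and d_0 = d_local = 1 m) are assumptions, so the resulting distance is indicative only.

```python
import numpy as np
from scipy.special import erfc, erfcinv
from scipy.optimize import brentq

def ber_eq4(g, N, weT, b=1.0):
    """Reconstructed Equation (4); g = Eb/N0 (linear scale)."""
    u = 0.429 - 0.048 * weT**2 + 0.0063 * weT**4
    B = 1.0 - weT**2 / 18.0
    return 0.5 * erfc(np.sqrt(np.pi) * b * B *
                      np.sqrt(N**2 * g / (2.0 * (2.0 * N * b**2 * u * g + 1.0))))

def r_eber(pe):   # Equation (8)
    return ((1 - 2*pe)**2 / (1 - (1 - 2*pe)**2)) / erfcinv(2*pe)**2

def r_col_ber(pe, N, weT):
    """Equation (14); Eq. (4) is inverted numerically (assumes the target BER
    is reachable for the chosen N and weT)."""
    g_req = brentq(lambda g: ber_eq4(g, N, weT) - pe, 1e-6, 1e9)
    return g_req / erfcinv(2*pe)**2

def energies(pe, N, d, weT, fc, rs, p_cir, p_rx, p_sens_dbm,
             alpha=4.0, d0=1.0, d_local=1.0):
    """Return (E_SISO, E_colab) per bit from Equations (9) and (15)."""
    lam = 3.0e8 / fc
    p_s = 1e-3 * 10.0 ** (p_sens_dbm / 10.0)
    path = lambda dist, r: (4*np.pi)**2 * p_s * r * dist**alpha / (d0**(alpha-2) * lam**2)
    e_siso = (p_cir + path(d, r_eber(pe)) + p_rx) / rs
    e_local = (p_cir + path(d_local, r_eber(pe)) + N * p_rx) / rs
    e_long = (N * p_cir + path(d, r_col_ber(pe, N, weT)) / N + p_rx) / rs
    return e_siso, e_local + e_long

# Break-even distance: smallest d where the collaborative scheme saves energy.
params = dict(pe=1e-5, N=5, weT=0.5, fc=2.45e9, rs=250e3,
              p_cir=1.2e-3, p_rx=52.2e-3, p_sens_dbm=-95.0)
d_be = brentq(lambda d: np.subtract(*energies(d=d, **params)), 0.1, 500.0)
print("break-even distance (m):", round(d_be, 2))
```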
4 Analysis and Results of the Energy Efficiency Model

For our analysis we have considered the circuit parameters of the off-the-shelf RF products CC2420 [15] and AT86RF212 [16]. The break-even distance is calculated for different numbers of collaborative nodes and different frequency errors. The maximum local distance among collaborative nodes is considered to be 1 meter and the required BER is 10^-5.

Table 1. Product data and parameters [15,16]
Symbol  Description                                  AT86RF212 [16]  CC2420 [15]
        modulation                                   BPSK            BPSK
w0      operating frequency                          915 MHz         2.45 GHz
∆w      maximum frequency error                      55 KHz          200 KHz
Rs      transmission data rate (BPSK)                40 Kbps         250 Kbps
U       operating voltage (typical)                  3 V             3 V
Irx     current for receiving states                 9 mA            17.4 mA
Prx     receiving power, Prx = U·Irx                 27 mW           52.2 mW
Iidle   current for idle states                      0.4 mA          0.4 mA
Pcir    electronic circuitry power, Pcir = U·Iidle   1.2 mW          1.2 mW
Psen    receiver sensitivity                         -110 dBm        -95 dBm
The reason for selecting these products is their support for BPSK. The considered value of the path loss exponent α is between 4.0 and 6.0 [17]. The product data and the parameters used for the calculation of energy efficiency are shown in Table 1. Figures 7, 8, 9 and 10 show the energy saving and break-even distance for different numbers of collaborative nodes. From the results it is observed that the break-even distance increases as the number of collaborative nodes increases, and that AT86RF212 has a larger break-even distance and lower energy savings than CC2420.
Fig. 7. Energy saving and break-even distance with frequency error 200 KHz and data rate 250 Kbps for different N for product CC2420 (energy savings in % versus distance d in m, for N = 2 to 11)
Fig. 8. Energy saving and break-even distance with frequency error 350 KHz and data rate 250 Kbps for different N for product CC2420 (energy savings in % versus distance d in m, for N = 2 to 11)
Fig. 9. Energy saving and break-even distance with frequency error 55 KHz and data rate 40 Kbps for different N for product AT86RF212 (energy savings in % versus distance d in m, for N = 2 to 11)
Energy savings (%)
80 70 60 50 40 30 20 10 0
0
20
40
60
80 100 120 Distance, d (m)
140
160
180
200
Fig. 10. Energy saving and break-even distance with frequency error 70 KHz and data rate 40 Kbps for different N for product AT86RF212
The break-even distances for the products CC2420 and AT86RF212 are summarized in Table 2 for different numbers of collaborative nodes. It is also observed that as the distance increases, the energy saving using collaborative communication also increases, but after a certain distance it reaches a steady state. The energy savings for different frequency errors at distances of 60 m and 100 m for the products CC2420 and
AT86RF212 are summarized in Table 3. From Tables 2 and 3 it is also observed that, for both CC2420 and AT86RF212, 5 collaborative nodes produce significant energy saving using collaborative communication.

Table 2. Break-even distance for CC2420 and AT86RF212
Table 3. Energy saving (%) at distances of 100 m and 60 m, for N = 2, 3, 4, 5, 6, 7, 9, 11
100 m   99.45  99.83  99.86  99.85  99.84  99.8  99.75  98.7
60 m    98     98     97.5   97     99.5   96    95     94
5 Conclusions

We have presented an energy efficiency model for collaborative communication in sensor networks with imperfect frequency synchronization in the presence of noise and Rayleigh fading. The theoretical model of the system is presented, and expressions for the energy consumption and energy saving are derived. The model is analyzed by considering two off-the-shelf products, CC2420 and AT86RF212. It is concluded that using collaborative communication up to 99% of the energy can be saved with imperfect frequency synchronization. It is also concluded that collaborative communication is very useful when the distance between the transmitters and the base station is greater than the break-even distance, and that the break-even distance increases as the number of collaborative nodes increases. Collaborative communication of 5 sensor nodes can save energy efficiently. It is also concluded that the energy saving
increases as the distance between transmitter and base station increases, but after a certain distance it reaches a steady state. It is also concluded that the AT86RF212 reaches the steady state more rapidly than the CC2420.
Acknowledgment
Husnain Naqvi is supported by the Higher Education Commission (HEC), Pakistan and International Islamic University, Islamabad, Pakistan.
References
[1] Estrin, D., Girod, L., Pottie, G., Srivastava, M.: Instrumenting the world with wireless sensor networks. In: Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), vol. 4, pp. 2033–2036 (2001)
[2] Kahn, J.M., Katz, R.H., Pister, K.S.J.: Next century challenges: mobile networking for smart dust. In: MobiCom 1999: Proc. 5th ACM/IEEE Intl. Conf. on Mobile Computing and Networking, pp. 271–278 (1999)
[3] Barriac, G., Mudumbai, R., Madhow, U.: Distributed beamforming for information transfer in sensor networks. In: Proc. 3rd International Symposium on Information Processing in Sensor Networks (IPSN 2004), April 26–27, pp. 81–88 (2004)
[4] Han, Z., Poor, H.V.: Lifetime improvement in wireless sensor networks via collaborative beamforming and cooperative transmission. Microwaves, Antennas & Propagation, IET 1, 1103–1110 (2007)
[5] Mudumbai, R., Barriac, G., Madhow, U.: On the feasibility of distributed beamforming in wireless networks. IEEE Trans. Wireless Commun. 6(5), 1754–1763 (2007)
[6] Naqvi, H., Berber, S.M., Salcic, Z.: Collaborative Communication in Sensor Networks. Technical report No. 672, University of Auckland Engineering Library (2009)
[7] Naqvi, H., Berber, S.M., Salcic, Z.: Performance Analysis of Collaborative Communication with Imperfect Frequency Synchronization and AWGN in Wireless Sensor Networks. In: Proceedings of the 2009 International Conference on Future Generation Communication and Networking, Jeju Island, Korea (December 2009)
[8] Schurgers, C., Aberthorne, O., Srivastava, M.B.: Modulation scaling for energy aware communication systems. In: Proc. Int. Symp. Low Power Electronics Design, August 2001, pp. 96–99 (2001)
[9] Min, R., Chandrakasan, A.: A framework for energy-scalable communication in high-density wireless networks. In: Proc. Int. Symp. Low Power Electronics Design, August 2002, pp. 36–41 (2002)
[10] Cui, S., Goldsmith, A.J., Bahai, A.: Modulation optimization under energy constraints. In: Proc. ICC 2003, AK, May 2003, pp. 2805–2811 (2003), http://wsl.stanford.edu/Publications.html
[11] Cui, S., Goldsmith, A.J., Bahai, A.: Energy-Efficiency of MIMO and Cooperative MIMO Techniques in Sensor Networks. IEEE Journal on Selected Areas in Communications 22(6), 1089–1098 (2004)
[12] Simić, L., Berber, S., Sowerby, K.W.: Energy-Efficiency of Cooperative Diversity Techniques in Wireless Sensor Networks. In: The 18th Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, PIMRC 2007 (2007)
[13] Sklar, B.: Rayleigh fading channels in mobile digital communication systems. I. Characterization. IEEE Communications Magazine 35(7), 90–100 (1997)
[14] Goldsmith, A.: Wireless Communications, pp. 31–42. Cambridge University Press, Cambridge (2005)
[15] CC2420, Texas Instruments Chipcon Products, http://focus.ti.com/analog/docs/enggresdetail.tsp?familyId=367&genContentId=3573
[16] AT86RF212, ATMEL Products, http://www.atmel.com/dyn/products/product_card.asp?PN=AT86RF212
[17] Cheng, J., Beaulieu, N.C.: Accurate DS-CDMA bit-error probability calculation in Rayleigh fading. IEEE Trans. on Wireless Commun. 1(1), 3–15 (2002)
High Performance MAC Architecture for 3GPP Modem
Sejin Park, Yong Kim, Inchul Song, Kichul Han, Jookwang Kim, and Kyungho Kim
Samsung Electronics, Digital Media & Communication R&D Center, Korea
{sejini.park,yong95.kim,Inchul.song,kichul.han,jookwang,kyungkim}@samsung.com
Abstract. This paper presents the architecture and implementation of LMAC, the core block for high speed modems such as HSPA+ and LTE. The required data rates of these modems are 28.8 Mbps/11.5 Mbps (HSPA Release 7 modem) and 100 Mbps/50 Mbps (LTE modem) for downlink/uplink, respectively. To support such higher data rates, we designed a new LMAC. Architecturally, LMAC includes cipher HW and provides functions such as fast data transfer, packet generation and parsing. In particular, we designed a new function which combines data transfer with cipher and yields an additional performance benefit. As a result, our design can be used as the platform to support both HSPA+ and LTE modems. We also reduced the processing time and the CPU and bus utilization, so the SW obtains more margin to process the protocol stack. Keywords: 3GPP, Modem, HSPA+, LTE, LMAC, Cipher.
1 Introduction
In wireless and telecommunication systems, higher data rates are necessary due to the increasing usage of multimedia such as music, UCC and so on. Therefore commercialized communication systems (3GPP) [1] such as WCDMA and HSDPA/HSUPA are moving to HSPA+ and LTE (Long Term Evolution), which can support higher data rates. Not only the physical throughput but also the protocol stack performance should be enhanced to support higher data rates. As the data rate increases, more performance is required from the MAC (Medium Access Control) layer. The MAC processor for WLAN [2] and the LMAC (Lower MAC) for mWiMAX [3] are representative examples of increasing the data rate. This paper presents a new hardware architecture of the MAC, which has been neglected in 3GPP modem design. In the proposed architecture, the data is efficiently transferred between PHY and CPU. In particular, we integrate cipher H/W to support security [4], [5], [6] into LMAC. This design can reduce memory access time and CPU and bus utilization. Therefore, a high performance modem can be constructed. This paper is organized as follows. Section 2 presents the design challenges for the LMAC architecture, and Section 3 describes the implementation details of LMAC. Section 4 discusses the experimental results of LMAC. Finally, Section 5 provides a brief conclusion and a description of future work.
2 Design Challenges
This section describes the design challenges of LMAC for HSPA+ and LTE modems. The first and most important goal of LMAC is the support of a high data rate. The second goal is the design of an LMAC platform for both modems. We analyze each modem's requirements and differences, and explain the key features of our design choices.
2.1 Basic Requirements
The HSPA+ modem should support 28.8 Mbps/11.5 Mbps throughput for downlink/uplink, respectively, and support the WCDMA, HSDPA, and HSUPA specifications. The packet formats which must be handled in the MAC/RLC layer are varied. Each PDU (Protocol Data Unit) size is 320 bits or 640 bits. When making and controlling these packets, bit operations are necessary because the header is not byte-aligned; therefore this overhead must be minimized. The LTE modem should support 100 Mbps/50 Mbps throughput for downlink/uplink, respectively. Contrary to the HSPA+ modem, bit operations are not necessary because the header and payload of a packet are byte-aligned. It transmits a small number of PDUs, but must support a higher data rate. Therefore, the LTE modem must have an architecture suitable for burst transmissions.
2.2 Data Path Design and Cipher HW
3GPP defines the security level provided by the UE (User Equipment). It also defines the functions and algorithms used in the protocol stack. The functions are called cipher in the uplink and decipher in the downlink; generally we call both cipher. Cipher is processed in the RLC and MAC layers in the case of HSPA+, and in the PDCP layer in the case of LTE. Cipher is divided into f8, the data confidentiality function, and f9, the data integrity function. The KASUMI [7] and SNOW 3G algorithms are used in HSPA+, and the SNOW 3G and AES algorithms are used in LTE. Basically, cipher is a time consuming job; the processing time increases in proportion to the data size. If cipher is implemented in hardware [8], the sequence is as follows. First, the HW reads the data from memory, processes the cipher function and then writes the result back to memory. Therefore the total processing time is determined by the performance of the cipher algorithm and the data access time. The data rate changes from WCDMA (384 Kbps, 20 ms TTI) to HSPA+ (28.8 Mbps, 2 ms TTI) and LTE (100 Mbps, 1 ms TTI) in the downlink. This means that cipher must be processed about 75–250 times faster. There are several methods to increase the performance of cipher, such as using a higher clock, improving the cipher core algorithm, and parallelizing the cipher core. Also, adding input/output buffers enables burst data transfer. If all data were stored in one buffer, the buffer size would have to be large; therefore a double buffering method can be used. This method reduces not only the buffer size but also the latency of reads and writes.
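The double-buffering idea can be sketched as follows in Python; the dma_read and cipher stand-ins, the XOR "cipher" and the buffer handling are illustrative assumptions, not the LMAC hardware.

# Minimal sketch of double buffering: while the cipher core works on the PDU in
# one buffer, the DMA fills the other buffer with the next PDU, hiding the read
# latency. Function names and the trivial XOR cipher are hypothetical.

def dma_read(pdu):
    """Stand-in for the eXDMAC burst read of one PDU from external memory."""
    return bytes(pdu)

def cipher(block):
    """Stand-in for the cipher core (e.g. a stream cipher); here a trivial XOR."""
    return bytes(b ^ 0x5A for b in block)

def process_pdus_double_buffered(pdus):
    buffers = [None, None]                        # two input buffers (ping/pong)
    out = []
    current = 0
    buffers[current] = dma_read(pdus[0])          # prefetch the first PDU
    for i in range(len(pdus)):
        nxt = current ^ 1
        if i + 1 < len(pdus):
            buffers[nxt] = dma_read(pdus[i + 1])  # fill the other buffer "in parallel"
        out.append(cipher(buffers[current]))      # cipher the current buffer
        current = nxt
    return out

if __name__ == "__main__":
    pdus = [bytes([i] * 8) for i in range(4)]
    print(process_pdus_double_buffered(pdus))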
The most important architectural issue is how to design the data path. Two main data paths are necessary (uplink and downlink), because the modem must support uplink and downlink data transmissions simultaneously. From this point of view, it is important whether LMAC includes or excludes the cipher hardware. Cipher hardware with an independent data path can be implemented, because cipher does not depend on data transmission. But in the uplink, data is transferred after ciphering, and in the downlink, data is transferred after deciphering. In other words, if cipher does not happen simultaneously with the transfer, or the time budget is sufficient, the data path can be shared to save modem area. On the other hand, if we can control the sequence of data, we can improve the performance by eliminating redundant memory accesses. We explain this in detail in the next section.
2.3 SW and HW Partitioning
In the 3GPP specifications, there are many functions which must be handled in the MAC or RLC layer, so it is hard to find the HW functions that maximize performance and efficiency. In this paper, we concentrate on the functions that consume the most time in the data transmission sequence. In the case of the WCDMA/HSUPA/HSDPA modem, the packet header is non-byte-aligned. This means that bit operations are necessary when we make or parse a packet. If these are handled by software, the performance degrades. Therefore, hardware functions for packet generation and parsing are efficient for reducing the data transmission time. In the case of the LTE modem, packet sizes vary but are byte-aligned, so these HW functions are not necessary.
3 Implementation of LMAC
In this section, the architectures and implementations of the LMAC for the HSPA+ (3GPP Release 7) modem and the LTE modem are described. LMAC is an abbreviation of Lower MAC, which can be defined as a HW accelerator of the L2 layer (MAC & RLC [9], [10], [11], [12]). With the existing software-only implementation, it is hard to process the high speed data required by HSPA+ and LTE. Therefore, to increase the throughput, the hardware architecture and the various functional blocks including cipher are designed in this paper. Figure 1 shows a simple diagram of the HSPA+ and LTE modems. LMAC is located between the CPU/memory and the PHY. It is responsible for the main data path over the AXI bus. The cipher HW in the LMAC processes the cipher function more efficiently than the original method in which it was processed by the MAC/RLC/PDCP layer.
3.1 LMAC Architecture
Figure 2 describes the architecture of LMAC in more detail. For convenience, the HSPA+ modem and the LTE modem are shown together; the grey part represents the LTE-modem-only path. The two modems use the memory interface and commonly
Fig. 1. HSPA+ & LTE Modem Simple Diagram
Fig. 2. HSPA+ & LTE LMAC Block Diagram (TX/RX eXDMACs, cipher/decipher cores with input and output buffers, TX/RX LMAC controllers, loopback buffer)
use the encoder buffer of the TX path and the decoder buffer of the RX path as the modem-PHY interface. Through the AXI bus, LMAC reads or writes the data from or to the memory, and then reads or writes the data from or to the buffers. For fast data transmission, LMAC includes an embedded DMA controller (eXDMAC) for the AXI bus. As shown in Figure 2, the LTE modem has two eXDMACs which can simultaneously process transfer, cipher and decipher for the uplink and downlink data. From the modem's point of view, it has four DMA channels to access the bus. The cipher and decipher are symmetric, but the sizes of the input and output buffers differ according to the data rates. All data accesses are performed using the eXDMAC. In the case of the LTE modem, the data sizes for cipher and decipher are huge. Therefore, the double buffering method is applied to the input buffer to
reduce the buffer size and hide the read latency. In the case of the HSPA+ modem, a significant performance enhancement was achieved by writing the data to the encoder buffer directly. A detailed explanation of this method is presented in the next section.
3.2 Packet Generation and Header Parsing
In the case of the HSPA+ modem, LMAC processes some parts of the MAC/RLC layer functions, such as packet generation and header parsing. The packet generation function integrates non-byte-aligned headers and a byte-aligned payload into a packet, and then writes it to the encoder. Conversely, header parsing separates headers and payload from a packet, and then writes them to memory. These functions are time consuming because bit operations must be used. There are many packet formats in the MAC layer, such as MAC, MAC-hs, MAC-e, and MAC-es. Figure 3 and Figure 4 show representative packet formats. The headers may be required or eliminated depending on the channel mapped to the PHY, so the packet types are various and complicated. In the case of LTE, headers are byte-aligned, so we use the scatter/gather operation supported by the eXDMAC; there is no dedicated HW.

Fig. 3. HSPA+ MAC PDU

Fig. 4. HSPA+ MAC-hs PDU
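The bit-level packing performed by the packet generation block can be illustrated with the short Python sketch below; the field names and widths are made up for illustration and do not follow the exact 3GPP header layout.

# Sketch of packing a non-byte-aligned header together with a byte-aligned
# payload into one PDU. Field widths and values are hypothetical.

def pack_bits(fields):
    """fields: list of (value, bit_width); returns the packed bytes, padded to a byte."""
    acc, nbits = 0, 0
    for value, width in fields:
        acc = (acc << width) | (value & ((1 << width) - 1))
        nbits += width
    pad = (-nbits) % 8                      # pad to a byte boundary
    acc <<= pad
    return acc.to_bytes((nbits + pad) // 8, "big")

def make_pdu(header_fields, payload):
    """Concatenate a non-byte-aligned header with a byte-aligned payload."""
    header_bits = sum(w for _, w in header_fields)
    packed = pack_bits(header_fields + [(b, 8) for b in payload])
    return packed, header_bits

if __name__ == "__main__":
    # e.g. a 3-bit type, 2-bit C/T and 6-bit id style header (widths made up)
    pdu, hdr_bits = make_pdu([(0x3, 3), (0x1, 2), (0x15, 6)], b"\x11\x22\x33")
    print(hdr_bits, pdu.hex())              # 11 header bits, payload shifted by 11 bits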
3.3 Cipher HW
In our implementation, cipher is a significant component in terms of architecture and performance. Figure 5 shows the common block diagram of the cipher for the HSPA+ and LTE modems. Architecturally, the cipher core has input and output buffers, and the eXDMAC controls the data stream to minimize latency. We present the method to maximize the performance in this section.
Fig. 5. Cipher Block Diagram (eXDMAC, cipher input buffer, Kasumi/Snow 3G/AES cores, cipher output buffer)
Combine Data Transfer with Cipher. From the HW point of view, the sequence of cipher and data transfer in the HSPA+ modem is as follows. First, LMAC reads data from memory into the cipher input buffer using the eXDMAC. Then LMAC writes the data from the cipher output buffer to memory after the cipher operation. Finally, LMAC reads the data from memory, performs packet generation and then writes the result to the encoder buffer. From the SW point of view, cipher is processed after the parameters are set in the RLC layer, and then the ciphered data is transferred to the PHY after the transfer parameters are set in the MAC layer. In this sequence, we observe that the ciphered data can be transferred directly to the PHY without passing through memory. To support direct transfer, the following conditions should be satisfied. First, a direct path from the cipher output buffer to the PHY must exist. Second, LMAC must be able to generate the packet. Last, LMAC must be able to control the data flow we described. LMAC including the cipher HW can support this data flow. Figure 6 shows the LMAC architecture for this mechanism. The left arrow indicates the direct path from the cipher output to the PHY through the packet generation logic. After finishing the cipher, LMAC can make the packet using the data in the output buffer and the parameters previously set, and then transfer it to the encoder buffer. Using this mechanism, two memory accesses and one interrupt can be eliminated and the bus utilization can be decreased. Also, the RLC layer does not need to wait for the completion of cipher, so it is possible to get more CPU idle time.
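The saving can be sketched by counting external-memory accesses per PDU in the two flows; the code below is a hypothetical model of the sequence described above, not the RTL.

# Separate flow: read PDU, cipher, write back, read again, generate packet.
# Combined flow: read PDU, cipher output feeds packet generation directly.

def separate_flow(pdu, cipher):
    accesses = 0
    accesses += 1                  # read the plain PDU from memory
    ciphered = cipher(pdu)
    accesses += 1                  # write the ciphered PDU back to memory
    accesses += 1                  # read it again for packet generation
    packet = ciphered              # packet generation, written to the encoder buffer
    return packet, accesses

def combined_flow(pdu, cipher):
    accesses = 1                   # read the plain PDU from memory
    packet = cipher(pdu)           # cipher output goes straight to packet generation
    return packet, accesses        # and on to the encoder buffer

if __name__ == "__main__":
    xor = lambda b: bytes(x ^ 0x5A for x in b)
    print(separate_flow(b"\x01\x02", xor)[1], "memory accesses (separate)")
    print(combined_flow(b"\x01\x02", xor)[1], "memory accesses (combined)")

The difference of two memory accesses per PDU is exactly the saving claimed above.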
Fig. 6. Data Transfer With Cipher Block Diagram

Fig. 7. HSPA+ & LTE Modems Floorplan Snapshot
Combine Data Transfer with Decipher. According to the downlink data flow of the HSPA+ modem, LMAC writes the PHY data to memory after parsing. At this time, the data is written to memory in a scattered pattern. After decipher, this scattered data should be assembled into an SDU (Service Data Unit); that is, SDU generation needs two more memory accesses. This sequence is similar to cipher, except for the direction: LMAC reads the data, performs the decipher and then writes the deciphered data into memory directly. As explained in Section 3.3, we obtain the same advantages.
Double Buffering in Cipher. As shown in Figure 5, we use the input/output buffers of the cipher HW in order to minimize the processing time. In the case of the LTE modem, the amount of buffer needed is large because of the higher data rate. On the other hand, the PDU size is at most 1500 bytes but the number of PDUs is small, unlike the HSPA+ modem. We adopt a double buffering method which can read data and cipher simultaneously: after reading one PDU, ciphering can start while the next PDU is stored in the other cipher buffer. As a result, the read latency is reduced to 1/8 and the buffer size is reduced to 1/4.
3.4 HSPA+ and LTE Modems Floorplan
Figure 7 shows the floorplan snapshots of the HSPA+ modem (left side) and the LTE modem (right side), which we call CMC212 and CMC220, respectively. In the case of the HSPA+
modem, LMAC occupies 6%; in the case of the LTE modem, LMAC occupies 3%. As mentioned in the previous section, the LMAC of the HSPA+ modem has more complicated functions than that of the LTE modem, so the gate count of the HSPA+ LMAC is larger than that of the LTE LMAC.
4 Performance Evaluation
Figure 8 presents the timing diagram of the LMAC operation. The upper part shows the HSPA+ modem and the lower part shows the LTE modem; the left portions show the downlink timing and the right portions the uplink timing, respectively. The operation inside LMAC is labelled LMAC and has a fixed processing time. The operation through the AXI bus is labelled AXI Bus; its time varies because of the traffic on the AXI bus. The HSPA+ and LTE modems are still under test, so the performance is evaluated with RTL simulation. We assume the bus is otherwise idle and the remaining factors are clock accurate. The base clock is 133 MHz and the bus interface is 64 bits wide.
Fig. 8. Timing Diagram of LMAC (HSPA 1 TTI = 2 ms, LTE 1 TTI = 1 ms; phases: parsing & data transfer, cipher & decipher, memory read & write, Tx data transfer, SDU generation)
4.1 Performance of HSPA+ LMAC
The assumptions for the LMAC performance of the HSPA+ modem are as follows. The data rate is 11.5/14.4 Mbps for uplink/downlink, respectively, and one PDU size is 640 bits. These are the baseline values of the HSPA+ Release 6 specification. Our modem supports Release 7, but in this section we use the Release 6 values: the data rate of Release 7 is higher than that of Release 6, but the PDU size is variable and large and the header is byte aligned, so the SW processing time is lower than in Release 6. That is, from a system performance point of view, the overhead of Release 6 is higher than that of Release 7.
Table 1. Basic Processing Time of HSPA+ LMAC

                         Uplink    Downlink
1 memory access time     14.3us    17.7us
Snow 3G time             56us      69.5us
Packet generation time   7.7us     NA
SDU generation time      NA        35.4us
Table 1 shows the basic processing times of the HSPA+ LMAC. The memory access time is a very important factor, because all functions of LMAC involve memory accesses. The SDU generation time is about twice the time of one memory access because of the buffering. The Snow 3G and packet generation times are deterministic and increase with the data size.

Table 2. Processing Time of HSPA+ LMAC

                            Uplink     Downlink
Cipher + data transfer      106.5us    130.4us
Data transfer with cipher   78us       94.9us
Table 2 shows the processing times of the HSPA+ LMAC functions. "Cipher + data transfer" is the sum of the times of the separate functions; "data transfer with cipher" is the method we proposed. In the uplink case, we saved 28.6 us, which is the time of two memory accesses. In the downlink case, we saved 35.5 us, because the SDU generation time is eliminated. Therefore, our method attains 26.8% and 27.2% improvement over the separate functions for uplink and downlink, respectively. In addition, we obtain a reduction of bus traffic because of the decrease in memory transaction time.
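The quoted percentages follow directly from Table 2; the short snippet below simply recomputes them from the table values.

def reduction(before_us, after_us):
    """Relative processing-time reduction in percent."""
    return 100.0 * (before_us - after_us) / before_us

print(round(reduction(106.5, 78.0), 1))   # uplink:   28.5 us saved -> ~26.8 %
print(round(reduction(130.4, 94.9), 1))   # downlink: 35.5 us saved -> ~27.2 %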
4.2 Performance of LTE LMAC
The LTE modem uses a maximum of 1500 bytes for one PDU and supports 50 Mbps/100 Mbps for uplink/downlink, respectively. Table 3 shows the single memory access time and the cipher time for this data rate. Compared with the HSPA+ modem, the single memory access time of the LTE modem is lower; the reason is that the HSPA+ modem handles many PDUs of small size, which increases the overhead of accessing the AXI bus.

Table 3. Basic Processing Time of LTE LMAC

                       Uplink   Downlink
1 memory access time   9.2us    20.2us
Snow 3G time           24.7us   54us
Table 4 shows the processing times of the LTE LMAC functions. The LMAC with the double buffering scheme is 11.8% and 14.8% faster than without double buffering. As shown in Figure 8, the cipher function starts after reading the first PDU rather than all PDUs, so the read latency can be hidden. Therefore double buffering is an effective method, since it reduces both the read latency and the size of the cipher buffer.

Table 4. Processing Time of LTE LMAC

                                                 Uplink   Downlink
LMAC processing time without double buffering    58.6us   120.8us
LMAC processing time with double buffering       51.7us   102.9us
5 Conclusions and Future Works
This paper has presented the design and implementation of an LMAC architecture for high speed 3GPP modems. We designed a new LMAC architecture which includes cipher HW. Functionally, LMAC processes data transfer, cipher, packet generation and header parsing; these are time consuming jobs if processed by SW. In particular, the function which combines data transfer with cipher increases the performance of LMAC dramatically. Hereafter, we will design an integrated LMAC which has the functions of both HSPA+ and LTE and shares memory, registers, and control logic. In the near future, the two modems will be integrated into one chip; therefore, the new LMAC will be the platform that supports the two modems simultaneously. Furthermore, our architecture can be the base platform for 4th generation modems.
References
1. 3GPP, http://www.3gpp.org
2. Thomson, J., Baas, B., Cooper, E., Gilbert, J., Hsieh, G., Husted, P., Lokanathan, A., Kuskin, J., McCracken, D., McFarland, B., Meng, T., Nakahira, D., Ng, S., Rattehalli, M., Smith, J., Subramanian, R., Thon, L., Wang, Y., Yu, R., Zhang, X.: An Integrated 802.11a Baseband and MAC Processor. In: ISSCC (2002)
3. Saito, M., Yoshida, M., Mori, M.: Digital Baseband SoC for Mobile WiMAX Terminal Equipment. Fujitsu Scientific and Technical Journal 44(3), 227–238 (2008)
4. 3GPP TS 33.102, 3G Security; Security Architecture
5. 3GPP TS 33.201, 3G Security; Specification of the 3GPP Confidentiality and Integrity Algorithms; Document 1: f8 and f9 specification
6. 3GPP TS 33.202, 3G Security; Specification of the 3GPP Confidentiality and Integrity Algorithms; Document 2: KASUMI Specification
7. Satoh, A., Morioka, S.: Small and High-Speed Hardware Architectures for the 3GPP Standard Cipher KASUMI. In: Chan, A.H., Gligor, V.D. (eds.) ISC 2002. LNCS, vol. 2433, pp. 48–62. Springer, Heidelberg (2002)
8. Marinis, K., Moshopoulos, N.K., Karoubalis, F., Pekmestzi, K.Z.: On the Hardware Implementation of the 3GPP Confidentiality and Integrity Algorithms. In: Davida, G.I., Frankel, Y. (eds.) ISC 2001. LNCS, vol. 2200, pp. 248–265. Springer, Heidelberg (2001)
9. 3GPP TS 25.321, Medium Access Control (MAC) protocol specification
10. 3GPP TS 25.322, Radio Link Control (RLC) protocol specification
11. 3GPP TS 36.321, Medium Access Control (MAC) protocol specification
12. 3GPP TS 36.322, Radio Link Control (RLC) protocol specification
13. 3GPP TS 36.323, Packet Data Convergence Protocol (PDCP) specification
Modified Structures of Viterbi Algorithm for Forced-State Method in Concatenated Coding System of ISDB-T
Zhian Zheng1, Yoshitomo Kaneda2, Dang Hai Pham3, and Tomohisa Wada1,2
1 Information Engineering Department, Graduate School of Engineering and Science, University of the Ryukyus, 1 Senbaru Nishihara, Okinawa, 903-0213, Japan
2 Magna Design Net, Inc., Okinawa 901-0155, Japan
3 Faculty of Electronics and Telecommunications, Hanoi University of Technology, 1 Dai Co Viet Street, Hai Ba Trung, Hanoi, Vietnam
[email protected], [email protected], [email protected], [email protected]
Abstract. Iterative decoding based on the forced-state method is applied to improve the decoding performance of a concatenated coding system. In the application targeted here, this iterative decoding method is proposed for the channel decoding of Japanese Terrestrial Digital TV (ISDB-T). A modified structure of the Viterbi algorithm that operates on quantized data, with a view to implementation of the iterative decoding, is presented. In general, knowledge about the implementation of the conventional Viterbi algorithm can be applied to the modified Viterbi algorithm. The computational kernel of the proposed structure is the path metric calculation and the trace back for the Viterbi algorithm with the forced-state method based on quantized data. Keywords: ISDB-T, Concatenated coding, Convolutional codes, Reed-Solomon codes, Viterbi algorithm, BM algorithm, forced-state decoding.
high complexity and a large amount of memory for hardware implementation. In contrast to the turbo method of mutually interchanging soft decision data, this paper considers iterative decoding that exchanges hard decision data between the two decoders. This technique was introduced in [6]. The inner code used in [6] is a unit-memory convolutional code, which is different from the one adopted by ISDB-T. It should be noted that this technique is also applied in [7] for evaluating the gain provided by feedback from the outer decoder in the Consultative Committee for Space Data Systems. The author of [7] calls this technique forced-state (FS) decoding for a concatenated coding system. This paper describes modified structures of the VD that are suitable for hardware implementation of FS decoding. The main idea of this paper can be summarized as extending conventional techniques used in the implementation of the Viterbi algorithm to the modified VD. Specifically, the well-known rescaling approach [8] and quantized channel data [9] of the VD (Q-VD) are used in our work. In the following sections of this paper, the modified structure of the VD for FS decoding based on quantized data is called Q-VDFS. The rest of the paper is organized as follows. In Section 2, a brief review of the Q-VD is given. FS decoding for ISDB-T and the proposed structures of Q-VDFS are presented in Section 3. Section 4 presents the performance of FS decoding with the use of Q-VDFS. Finally, Section 5 offers the conclusions of this paper.
2 Brief Review of Q-VD
The CC encoder adopted by ISDB-T is shown in Fig. 1. This encoder outputs four possible pairs XY ("00", "01", "10", "11"). A general Viterbi algorithm consists of the following three major parts: 1) Branch metric calculation: calculation of a Euclidean distance between the received pair of channel data and one of the four possible transmitted ideal pairs ("+1+1", "+1-1", "-1+1", "-1-1"). 2) Path metric calculation: calculation of the path metric of each encoder state for the survivor path ending in this state (a survivor path is a path with the minimum metric).
Fig. 1. Encoder of Convolutional Code in ISDB-T
3) Trace back: this step is necessary for a hardware implementation that does not store full information about the survivor paths, but stores only a one-bit decision every time one survivor path is selected from the two.
2.1 Branch Metric Calculation Based on Quantized Channel Data
A branch metric is measured using the Euclidean distance. Let rx be the first received channel data value in the pair at time t and ry be the second; let x0 and y0 be the transmitted ideal pair. Then the branch metric is calculated as in formula (1),
BR_t = (r_x - x_0)^2 + (r_y - y_0)^2.    (1)
Since x0 y0 has 4 possible pairs, the value of BRt also takes 4 different values. For Viterbi decoding, it has been proven that the actual value of BRt is not necessary; only the differences between the metrics matter. The branch metric (1) can thus be written as formula (2),
BR_t = x_0 r_x + y_0 r_y.    (2)
For hardware implementation, the Viterbi decoder must operate on quantized data. As shown in Fig. 2 (proposed in [9]), 8-level and 16-level quantization of the received channel data rx, ry can be used. For easy implementation, the branch metric can be further adjusted to a non-negative number by an offset value using formula (3),

BR_t = x_0 r_x + y_0 r_y + M_offset.    (3)
where M_offset equals the number of quantization levels, e.g. M_offset = 16 when 16-level quantization is used for the channel data.
Fig. 2. Quantization thresholds and intervals of received channel data for Q-VD
Simulations show that a VD operating on 8-level quantized data entails a loss of 0.2 dB in decoding performance compared to an infinite-precision one. However, 16-level quantization is very close to infinite precision.
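A minimal sketch of the branch metric of formulas (2)-(3) on quantized data is given below; the uniform quantizer is an assumption and may differ from the exact thresholds of Fig. 2.

# Quantize a received sample, then compute the offset branch metric (formula (3)).
# The quantizer step and range are hypothetical.

def quantize(r, levels=16, step=0.25):
    """Map a real received value to a signed integer in [-(levels//2), levels//2 - 1]."""
    q = int(round(r / step))
    lo, hi = -(levels // 2), levels // 2 - 1
    return max(lo, min(hi, q))

def branch_metric(rx_q, ry_q, x0, y0, m_offset=16):
    """x0, y0 in {+1, -1}; rx_q, ry_q are quantized channel samples."""
    return x0 * rx_q + y0 * ry_q + m_offset

if __name__ == "__main__":
    rx_q, ry_q = quantize(0.8), quantize(-1.1)
    for x0, y0 in ((+1, +1), (+1, -1), (-1, +1), (-1, -1)):
        print((x0, y0), branch_metric(rx_q, ry_q, x0, y0))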
2.2 Path Metric Calculation for Q-VD (PMC_Q-VD)
Fig. 3 shows the structure of the PMC for Q-VD. The basic operation of the PMC is the so-called ACS (Add-Compare-Select) unit. Consider the calculation of the PM for a given state S0: there are two states (say S1 and S2) at the previous step which can move to this state, together with the output bit pairs that correspond to these transitions. To calculate the new path metric of S0, we add the previous path metrics PS1 and PS2 to the corresponding branch metrics BRS1 and BRS2, respectively, and then the better metric is selected between the two new path metrics. This procedure is executed for every encoder state and repeated recursively with incrementing time t. The problem with the PMC is that the PM values tend to grow constantly and will eventually overflow. But since the absolute values of the PMs do not actually matter, and the difference between them is limited, a data type with a finite number of bits is sufficient. The rescaling approach described in [8] is to subtract the minimum metric from all PMs. Since the PM is stored in finite bits for Q-VD, it is still possible that the value of a PM overflows. To settle this problem, the so-called flipping approach should be applied using formula (4), where the value of the PM of state S0 after the flipping step is denoted P'_{S0} and quantized on m bits:

P_{S_0}' = \begin{cases} P_{S_0}, & P_{S_0} < 2^m - 1 \\ 2^m - 1, & P_{S_0} \ge 2^m - 1 \end{cases}    (4)
Fig. 3. PMC structure for Q-VD
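A minimal sketch of one ACS step with the rescaling of [8] and the saturation ("flipping") of formula (4) is shown below; the 8-bit width and the example metrics are arbitrary, and the connectivity is not the 64-state ISDB-T trellis.

M_BITS = 8
PM_MAX = (1 << M_BITS) - 1

def saturate(pm):
    """Formula (4): clip the path metric to 2^m - 1."""
    return pm if pm < PM_MAX else PM_MAX

def acs(pm_s1, pm_s2, br_s1, br_s2):
    """New path metric of a state fed by states s1 and s2; returns (pm, decision bit)."""
    cand1 = saturate(pm_s1 + br_s1)
    cand2 = saturate(pm_s2 + br_s2)
    return (cand1, 0) if cand1 <= cand2 else (cand2, 1)

def rescale(path_metrics):
    """Subtract the minimum metric from all states to keep values bounded [8]."""
    m = min(path_metrics)
    return [pm - m for pm in path_metrics]

if __name__ == "__main__":
    print(acs(120, 95, 14, 30))      # -> (125, 1): the survivor comes from s2
    print(rescale([125, 130, 250]))  # -> [0, 5, 125]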
3 Proposed Structure of Q-VD with FS Method (Q-VDFS)
Fig. 4 illustrates the structure of FS decoding for the concatenated coding system of ISDB-T. In this coding system, the outer decoder, which uses the BM algorithm for the RS code, can correct byte errors up to t = 8. By definition, a decoding error occurs when the BM decoder finds a codeword other than the transmitted codeword; this is in contrast to a decoding failure, which occurs when the BM decoder fails to find any codeword at all.
For the RS(204,188) code, a decoding error occurs with a probability less than 1/t! [10][11]. This means that we can consider the decoded data from the RS decoder to be virtually error-free, because if more than 8 errors occur a decoding error appears only with a probability of around 1/t! (about 10^-5). For the FS decoding technique, the outer decoder provides not only decoded data (DD) but also a decoding failure flag (DFF) to the inner decoder. The DFF indicates whether the BM decoder failed or succeeded in finding any codeword at all (0: decoding failure, 1: decoding success). In Fig. 4, the modified VD of the inner decoder for FS decoding is denoted Q-VDFS. A detailed description of Q-VDFS is given in Sections 3.1 and 3.2.
Fig. 4. Structure of FS decoding for concatenated coding system of ISDB-T
3.1 Path Metric Calculation (PMC) of Q-VDFS
This subsection shows that the PMC processing for Q-VDFS can be divided into two different processes, called the non forced-state process and the forced-state process. The signals DD and DFF in the FS decoding technique shown in Fig. 4 are fed back through an interleaver at byte level, which means that a DFF equal to 1 continues for at least 8 bit periods if at least one RS code packet is decoded successfully. Furthermore, since the memory of the CC used in the ISDB-T system equals 6, six DD bits leave the encoder of the CC in a known state. For decoding, this known state is called the forced state (FS). Consider the PMC of Q-VDFS on the trellis in Fig. 5 (for simplicity, not all paths are shown). The whole PMC process is divided in the time domain into the non forced-state process and the forced-state process. The two processes are distinguished by the value of a DFF counter. For ISDB-T systems, the process with DFF_Counter greater than or equal to 6 is defined as the forced-state process, otherwise as the non forced-state process. DFF_Counter is defined by formula (5),

\mathrm{DFF\_Counter} = \begin{cases} 0, & \mathrm{DFF} = 0 \\ \mathrm{DFF\_Counter} + 1, & \mathrm{DFF} = 1 \end{cases}    (5)
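The DFF counter of formula (5) and the resulting forced/non-forced process decision can be sketched as follows; the code only mirrors the rule stated above (forced-state process once DFF_Counter >= 6 for the memory-6 ISDB-T code) and is an illustration, not the decoder hardware.

def update_dff_counter(counter, dff):
    """Formula (5): reset on decoding failure, count consecutive successes."""
    return 0 if dff == 0 else counter + 1

def is_forced_state_process(counter, memory=6):
    """Forced-state process once enough reliable DD bits pin down the encoder state."""
    return counter >= memory

if __name__ == "__main__":
    counter = 0
    for dff in (1, 1, 1, 1, 1, 1, 1, 0, 1):
        counter = update_dff_counter(counter, dff)
        print(dff, counter, is_forced_state_process(counter))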
Fig. 5. Trellis with path metric calculation of VDFS decoding
The processing of the PMC is identical to the normal VD in the non forced-state process. When the VD decodes based on a minimum metric, the practical way to realize the PMC in the forced-state process is to initialize the forced state with a zero metric and all other states with a large metric. Here, the forced state can be decided simply from a register holding the previous 6 bits of the DD signal. The path metrics of the non forced states in Fig. 6 are denoted as ∞; however, they should be 2^m - 1 if the path metric of Q-VD is realized in m bits. Fig. 6 shows the PMC structure of Q-VDFS. The calculation of the PM value P_sx of state sx can be summarized as formula (6),
Fig. 6. Structure of PM calculation for modified Q-VDFS
P_{sx} = \begin{cases} 0, & sx \text{ is the forced state} \\ 2^m - 1, & sx \text{ is a non forced state (forced-state process)} \\ P_{sx}', & \text{non forced-state process} \end{cases}    (6)

where P'_{sx} is calculated by the PMC of the normal Q-VD.
3.2 Trace Back of Q-VDFS
This subsection shows that the trace back of Q-VDFS can be realized uniformly with the normal Viterbi algorithm. Consider the path metric transition of VDFS on the trellis, starting from the initial state sx0 (the forced state) with P_{sx0} = 0 at time t in Fig. 7. For simplicity, the non-forced states with path metrics equal to ∞ are not shown in the figure. For the CC in ISDB-T, each state produces two possible transitions, e.g. state sx0 has transitions to states sx1 and sx2. In addition, each state has two possible paths leading to it; e.g. the path metric of state sx1 derives from state sx0 or from some other state sxx (drawn with a dotted line) using the ACS rule. In the case of infinite-precision VDFS, the path metric of the initial state is initialized to 0 and the other states are initialized to an infinite value at time t. As the result of the ACS competition at time t+1, the path metrics P_{sx1} and P_{sx2} derive from P_{sx0}. Again, P_{sx1} and P_{sx2} are each passed on to two states, resulting in four states with finite path metrics at time t+2. By analogy, the numbers of states at times t+3, t+4, t+5, t+6 which derive from the previous stage with a finite
Fig. 7. Derivation of PM from initial state for VDFS
path metric are 2^3, 2^4, 2^5, 2^6, respectively. In other words, the path metric of each state at time t+6 derives from P_{sx0} of the state sx0 at time t. Assume the survivor path from time t to t+6 through the trellis is denoted as a sequence of states \Pi_t^{t+6} = \{\Psi_t, \Psi_{t+1}, \ldots, \Psi_{t+6} : \Psi_x \in \{s0, s1, \ldots, s63\},\ x \in \{t, t+1, \ldots, t+6\}\}. Based on the above
discussion, \Pi_t^{t+6} should be an element of the set \breve{\Pi}_t^{t+6}, where

\breve{\Pi}_t^{t+6} = \{\Psi_t, \Psi_{t+1}, \ldots, \Psi_{t+6} : \Psi_t = \text{forced state } sx0,\ \Psi_{t+6} \in \{s0, s1, \ldots, s63\}\}.    (7)
The \breve{\Pi}_t^{t+6} in formula (7) implies that the forced state at time t will certainly be returned at the trace back step. Furthermore, the trace back of VDFS can be realized uniformly with the normal Viterbi algorithm. In contrast to VDFS, the PMs of the non forced states for Q-VDFS are initialized to 2^m - 1 for an m-bit wide path metric, which is different from the value ∞ in the VDFS case. Here, the test method shown in Fig. 8 is used to check whether the trace back of Q-VDFS can be realized as VDFS does. In Fig. 8, the data decoded by Q-VDFS (Do) is compared with DD if DFF equals 1. This is expressed as (8),
if ((DD == Do) & (DFF == 1)) .
(8)
In our test patterns, the bit width of the path metric for Q-VDFS is set to 6, 7 or 8 bits and the branch metric to 5 bits. The Eb/No ratio over the AWGN channel is set from 0.5 dB to 2.7 dB. In the test results, no errors occurred for any pattern. This means that the trace back of Q-VDFS can also be realized uniformly with the normal Viterbi algorithm.
Fig. 8. Test method for trace back of Q-VDFS
4 Simulation Results
In this section, the usefulness of the Q-VDFS is demonstrated by evaluating the performance of FS decoding for the concatenated coding system using computer simulation. The coding parameters and structure are the same as the channel coding of ISDB-T. In our simulations, the signal is assumed to be modulated by QPSK and propagated through an AWGN channel. The FS decoding structure is shown in Fig. 4. In order to highlight the effect of Q-VDFS with the FS method, decoding without iteration from the
outer decoder to the inner decoder is set as the reference conventional channel decoding for ISDB-T; it is plotted with dashed lines in Fig. 9 and Fig. 10. Figs. 9 and 10 show the computer simulation results. The dashed curves refer to the BER performance of conventional decoding without the FS method, in contrast to the solid curves, which refer to the BER performance of the proposed FS decoding. The following notations are used in the legends: "double" refers to decoding that operates on infinite precision data; "quantization16" and "quantization8" refer to quantization of the channel data on 16 and 8 levels, respectively; "pmt6" and "pmt8" refer to path metrics stored on 6 and 8 bits, respectively; "it(N)" refers to FS decoding with a maximum of N iterations from the outer decoder to the inner decoder. Fig. 9 shows the performance of FS decoding; the performance of conventional decoding operating on infinite precision data is also presented in this figure. Conventional decoding operating on 8-level quantized channel data incurs a loss of about 0.25 dB at BER = 10^-5, while 16-level quantization provides performance close to infinite precision. It should be noted that the bit width of the path metric can be reduced to 6 bits without any performance loss for 16-level soft quantization of the channel data. FS decoding (plotted with solid lines) provides a coding gain of 0.3-0.35 dB over the conventional decoding method at one iteration, and 0.4-0.45 dB at two iterations, at BER = 10^-5. In addition, the performance improvement is not distinct if the number of iterations increases to three or four.
Fig. 9. Performance of FS decoding for ISDB-T
Fig. 10 shows the performance of FS decoding operating on quantized data, with the proposed Q-VDFS applied. Compared with Fig. 9, FS decoding based on Q-VDFS acts as effectively as FS decoding on infinite-precision data. That is, with 16-level quantization of the channel data and a 6-bit path metric, Q-FS decoding at one iteration also provides a 0.3-0.35 dB decoding gain over the conventional decoding method operating on quantized data, and the decoding gain extends to 0.4-0.45 dB at two iterations.
Fig. 10. Performance of FS decoding using Q-VDFS
5 Conclusion
We have presented a survey of techniques for hardware implementation of Viterbi decoding and proposed a modified structure of the Viterbi algorithm for forced-state decoding of the concatenated coding system adopted by ISDB-T. The analysis shows that the trace back of the modified VD, even with quantized data, can be implemented uniformly with the normal Viterbi algorithm. The effectiveness of the proposed Q-VDFS is confirmed by the decoding performance evaluation of the forced-state based iterative decoding with quantized data. Simulation results (QPSK, AWGN) show that Q-FS decoding provides a 0.3-0.35 dB decoding gain over the conventional decoding method at one iteration, and the decoding gain extends to 0.4-0.45 dB at two iterations. As a general conclusion, iterative decoding based on the proposed Q-VDFS may be realized in practical applications, providing a considerable channel decoding gain for ISDB-T systems.
References
1. ISDB-T: Terrestrial Television Digital Broadcasting Transmission. ARIB STD-B31 (1998)
2. Francis, M., Green, R.: Forward Error Correction in Digital Television Broadcast Systems (2008), http://www.xilinx.com/support/documentation/white_papers/wp270.pdf
3. Viterbi, A.J.: Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm. IEEE Transactions on Information Theory 13(2), 260–269 (1967)
4. Berlekamp, E.R.: Algebraic Coding Theory. McGraw-Hill, New York (1960)
5. Lamarca, M., Sala, J., Martinez, A.: Iterative Decoding Algorithms for RS-Convolutional Concatenated Codes. In: Proc. 3rd International Symposium on Turbo Codes & Related Topics, Brest, France (2003)
6. Lee, L.-N.: Concatenated Coding Systems Employing a Unit-Memory Convolutional Code and a Byte-Oriented Decoding Algorithm. IEEE Transactions on Communications 25(10), 1064–1074 (1977)
7. Paaske, E.: Improved Decoding for a Concatenated Coding System Recommended by CCSDS. IEEE Transactions on Communications 38(8), 1138–1144 (1990)
8. Hekstra, A.P.: An Alternative to Metric Rescaling in Viterbi Decoders. IEEE Transactions on Communications 37(11), 1220–1222 (1989)
9. Heller, J.A., Jacobs, I.M.: Viterbi Decoding for Satellite and Space Communication. IEEE Transactions on Communication Technology 19(5), 835–848 (1971)
10. McEliece, R., Swanson, L.: On the Decoder Error Probability for Reed-Solomon Codes. IEEE Transactions on Information Theory 32(5), 701–703 (1986)
11. Cheng, K.-M.: More on the Decoder Error Probability for Reed-Solomon Codes. IEEE Transactions on Information Theory 35(4), 895–900 (1989)
A New Cross-Layer Unstructured P2P File Sharing Protocol over Mobile Ad Hoc Network
Nadir Shah and Depei Qian
Sino-German Joint Software Institute, Beihang University, Beijing, China
[email protected], [email protected]
Abstract. We consider the scenario of a mobile ad hoc network (MANET) where users equipped with cell phones, PDAs and other handheld devices communicate through low radio range technology. In this paper, we propose a new cross-layer unstructured peer-to-peer (P2P) file sharing protocol over MANET. We assume that all nodes, though not all of them need be members of the P2P network, are cooperative in forwarding data for others. In our proposed algorithm, the connectivity among the peers in the overlay is maintained close to the physical topology by efficiently using the expanding-ring-search algorithm during the joining and leaving process of a peer. The connectivity information among peers is used to send the P2P traffic only to the concerned nodes, avoiding extra traffic to other nodes. Taking advantage of the wireless broadcast, we propose a multicasting mechanism to send the keep-alive and file-lookup messages from a node to its neighbor peers, further reducing the routing overhead in the network. Simulation results show that our approach performs better in comparison with ORION (the flooding-based approach). Keywords: MANET, P2P, MAC layer multicasting.
1 Introduction
A peer-to-peer (P2P) network is a robust, distributed and fault tolerant architecture for sharing resources like CPU, memory, files, etc. The approaches proposed for P2P over the wired network (Internet) [1, 2, 3, 4] can be roughly classified into structured and unstructured architectures [5]. Each of them has its own applications and advantages. P2P networking is a hot research topic and several P2P applications have been deployed over the Internet [6, 7, 8]. In a mobile ad hoc network (MANET), the mobile nodes, communicating through low radio range, self-organize themselves in a distributed manner. Each node in a MANET works as both a host (sending/receiving data) and a router (maintaining routing information to forward the data of other nodes). Due to recent advances in mobile and wireless technology, P2P networks can be deployed over MANETs composed of mobile devices. The approaches proposed for P2P over the Internet cannot be directly applied over MANETs due to the unique characteristics of MANET, e.g. node
mobility, scarcity of power, limited memory and infrastructure-less nature. Recently, several schemes have been proposed for P2P over MANETs. The majority of them are modifications of existing P2P schemes for the Internet [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19], while others adopt new approaches [20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]. Our approach targets MANET scenarios where not all nodes share or access files, i.e. some are peers and others are non-peers. We define a node that joins the P2P network for sharing and/or accessing files as a peer; non-peer nodes are called normal nodes, but normal nodes are cooperative in forwarding data for the peers. Most of the current approaches for P2P over MANETs would perform poorly in such a scenario. Most of the approaches [9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] are based on existing routing protocols for MANETs, like OLSR [33] and DSR [34], which send the route request to all nodes (including normal nodes), producing heavy redundant routing overhead. We propose a new cross-layer approach for unstructured P2P file sharing over MANETs based on the concept of maintaining connectivity among the peers close to the physical network. The file-lookup request is only sent to the joined peers and is not forwarded to irrelevant normal nodes. Taking advantage of the broadcast nature of the wireless medium, an expanding-ring-search algorithm is used for peer joining and peer recovery to avoid flooding the whole network. To further exploit the wireless broadcast, we propose a multicasting mechanism [33, 34, 35, 36, 37] to reliably send the keep-alive and file-lookup messages from a node to its neighbor peers, reducing the routing traffic. The rest of the paper is organized as follows. Section 2 presents related work. Our approach is presented in Section 3. Simulation schemes and results are given in Section 4. Finally we conclude the paper and discuss future research in Section 5.
2 Related Work
In ORION [23], a node broadcasts the file-lookup query in the entire network. Receiving the file-lookup query, a node checks its local repository for a matching result. If a matching result is found, the node sends the response on the reverse path toward the requesting peer, as in AODV [39]. We refer to this approach as the flooding-based approach. Duran et al. [24] extended the work in [23] by also implementing query filtering along with response filtering; that is, upon receiving the file-lookup request, a node forwards the request only for those files which are not found on the receiving node. Abada et al. [25] extended the approach in [23] by monitoring the quality of redundant paths for a file at a node on the path, periodically calculating the round-trip time of each path; the request for a block of the file is forwarded on the path having the smaller round-trip time. Nadir et al. [26] further extended the approach in [23] by calculating the route lifetime for a file. In that approach, a peer retrieves the file over the path having the maximum lifetime instead of the shortest path. This reduces
the chances of route disconnection while retrieving the file. The flooding-based approaches are inefficient and do not scale well; therefore the above mentioned approaches are not suitable for P2P file sharing over MANET. Hui et al. [29] presented an approach, named p-osmosis, for P2P file sharing in opportunistic networks. It propagates the file-lookup request in the network using an epidemic approach, which produces heavy routing traffic. Lee et al. [30] proposed network coding and mobility assisted data propagation for P2P file sharing over MANET, based on the concept that nodes with the same interest meet frequently with each other. They use one-hop transmission and rely on sharing the shared-file information among the neighbors. Their protocol would perform poorly in our scenario, because in our scenario the nodes interested in a file may be few and may not meet each other frequently.
3 Proposed Algorithm
In our system the term node refers to both normal nodes (non-peers) and peers. In our algorithm, a node maintains a routing table at the network layer, storing the destination, the distance from the current node to the destination, and the next hop toward the destination. This routing table is updated when control messages pass through. Each peer also maintains a local repository, a file-cache and an application layer peer-routing table. The local repository stores the files provided by the peer. The file-cache at a peer stores location information about shared files stored at the neighbor peers up to two hops logically away; the peer extracts this information when file-lookup requests and responses pass through. The file-cache is used to respond quickly to a matching file without forwarding the request to the source of the file, thus decreasing the delay of the response as well as the routing overhead. A peer maintains the file-cache for its neighbor peers up to 2 hops away (logically) because the availability of these peers can be confirmed from the peer-routing table. For example, in Figure 1, peer 1 maintains the file-cache for peer 4 and peer 5 but not for peer 6. The peer-routing table at a node stores the neighbor peers, their corresponding neighbor peers, the maximum distance to their neighbor peers (MND) and their status. A peer updates its peer-routing table when P2P traffic passes through. At a peer, the status of a neighbor peer is either BIND or NBIND. The current peer must send keep-alive messages to a neighbor peer with BIND status, and receive keep-alive messages from a neighbor peer with NBIND status. In this way we avoid redundant transmission of keep-alive messages. Our approach is described in detail as follows.
3.1 Peer-Join
Whenever a node decides to join the P2P file sharing network, it informs its routing agent so that the routing agent can inform the application layer about P2P traffic passing through. Then it broadcasts a joining request (JRQST) using the expanding ring search (ERS) algorithm to find the nearest existing peer
Fig. 1. Routing table of Peer 4; peer-routing table of Peer 4
of the P2P network. The JRQST contains the source node address (SRC, the node requesting to join), a sequence number (SEQ), a time-to-live (TTL) and other required fields. A node uses the SRC and SEQ fields to discard duplicate requests. Receiving a JRQST, a node updates its routing table to store the route to the requesting node (SRC) along with its distance. Receiving a JRQST, a normal node forwards the JRQST provided the TTL of the received message is greater than zero. Receiving a JRQST, a peer stores the sending peer (SRC) as its neighbor peer in its peer-routing table with NBIND status and a distance of the maximum-neighbor distance (MND), and sends a join reply message (JRPLY) to the requesting node. The JRPLY contains the destination (DST, the address of the requesting node), the source address (SRC, the address of the responding node), the list of its neighbor peers and the maximum distance to its neighbor peers (MND). The peer obtains the distances to its neighbor peers from the routing agent and finds the largest one; this value is assigned to the MND (the MND is used in the recovery operation). The JRPLY is forwarded along the reverse path established during the request phase. Sending JRQSTs stops when the requesting peer either receives a JRPLY from at least one other peer or the TTL of the ERS algorithm reaches a maximum threshold value in case there is no JRPLY. The requesting peer receiving a JRPLY stores the source of the JRPLY in its peer-routing table as its neighbor with BIND status. The JRPLY also contains the neighbor peers of the node issuing the JRPLY; this information is also stored in the peer-routing table of the requesting peer along with the other information. Figure 1 shows the routing table and peer-routing table at a normal node and at some peers.
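A minimal sketch of the expanding ring search used for the peer-join is given below; the toy adjacency-list network, the BFS model of a TTL-limited broadcast and the chosen maximum TTL are illustrative assumptions, not the protocol implementation.

from collections import deque

MAX_TTL = 7   # hypothetical ERS threshold

def nodes_within(topology, src, ttl):
    """BFS up to `ttl` hops, modelling the scope of a TTL-limited JRQST broadcast."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == ttl:
            continue
        for nb in topology[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, dist + 1))
    return seen - {src}

def ers_join(topology, peers, src):
    """Return (ttl, reachable peers) for the first ring that contains an existing peer."""
    for ttl in range(1, MAX_TTL + 1):
        found = nodes_within(topology, src, ttl) & peers
        if found:
            return ttl, found          # JRPLYs would come back from these peers
    return None, set()                 # no existing peer reachable: src starts the overlay

if __name__ == "__main__":
    topo = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
    print(ers_join(topo, peers={4, 5}, src=1))   # -> (3, {4})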
3.2 Update
Each peer periodically sends keep-alive messages to its directly-connected neighbor peers with BIND status to maintain the connectivity. The keep-alive message of a peer contains the SRC, the list of destinations, the list of all its neighbors and the maximum distance to its neighbor peers (MND), calculated as in the peer-join operation. Through keep-alive messages, the peer updates its connectivity as well as the neighbor lists. The response to a keep-alive message contains the same set of information as the keep-alive message itself. As a further optimization, to reduce the number of transmissions, the update operation is initiated by the peer having the larger degree in terms of the number of neighbor
peers. In case of a tie, the peer having the lower node ID is selected. Here, by utilizing the broadcast nature of the wireless medium, we propose to use MAC layer multicasting to reduce the number of transmissions, and thus the routing overhead and delay, while ensuring reliability. Receiving a keep-alive message, a normal node forwards the message to those nodes for which it is used as the next hop; this can be obtained from the MAC-layer agent, as detailed in Section 3.6. After exchanging the keep-alive messages, a peer P examines its peer-routing table. If two of its neighbor peers, P1 and P2, are also neighbors of each other, and P1 is closer to P2 than to P, then peer P removes the neighbor relationship with P1. By doing so, the connectivity between peers in the P2P network matches well the connectivity in the physical network. This prevents direct connectivity between far away peers provided they can communicate through an intermediate peer, as the following example explains. Figure 2(a) shows the relationship among the peers, their corresponding routing tables and peer-routing tables. Suppose peer 2 joins the P2P network; after exchanging the keep-alive messages, peer 3 detects that one of its neighbor peers, peer 1 (with BIND status), is also a neighbor of the new peer 2, and the distance between peer 3 and peer 1 is greater than the distance between peer 2 and peer 1, as shown in Figure 2(b). Then peer 3 removes the neighbor relationship with peer 1 because they can communicate through the new intermediate peer (i.e. peer 2); the resulting topology is shown in Figure 2(c). This P2P topology is closer to the physical network.

Fig. 2. (b) P2P network after node 2 joins; (c) P2P network after removing redundant links
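The pruning rule of this section can be sketched as a small function over the peer-routing information; the data structures below (neighbor sets and a hop-distance map) are simplified stand-ins for the real tables.

def prune_redundant_links(p, neighbors, dist):
    """
    neighbors: dict peer -> set of that peer's own neighbor peers
    dist:      dict (a, b) -> hop distance between a and b (symmetric)
    Returns the set of neighbor peers that P should keep.
    """
    keep = set(neighbors)
    drop = set()
    for p1 in keep:
        for p2 in keep:
            if p2 != p1 and p1 in neighbors.get(p2, set()) and dist[(p1, p2)] < dist[(p, p1)]:
                drop.add(p1)           # P1 is reachable more cheaply through P2
                break
    return keep - drop

if __name__ == "__main__":
    # peer 3 (P) with neighbor peers 1 and 2; 1 and 2 are neighbors, and 1 is closer to 2
    neighbors = {1: {2, 3}, 2: {1, 3}}
    dist = {(3, 1): 2, (1, 3): 2, (3, 2): 1, (2, 3): 1, (1, 2): 1, (2, 1): 1}
    print(prune_redundant_links(3, neighbors, dist))   # -> {2}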
3.3 Recovery
Fig. 2. (a) P2P network with the routing tables and peer-routing tables of the nodes; (b) P2P network after node 2 joins; (c) P2P network after removing redundant links. The legend distinguishes peers, non-peers and communication links.
When a peer P detects that its connectivity to a neighbor peer P1 is broken, it broadcasts a broken message (BROK). The loss of connection may be caused
by different reasons, e.g., P1 has left the P2P network, P1 has been switched off, or the nodes have moved. A peer detects the disconnection of a neighbor peer through keep-alive messages or through the MAC-layer feedback mechanism. The BROK message contains the SRC, the ID of the disconnected peer P1, a TTL and other required fields. The TTL value is set to the sum of the MND and the distance of P1 from P, so that the BROK message reaches only the disconnected peer and the neighbor peers of the disconnected peer. On receiving a BROK message, if the ID of the receiving peer matches the disconnected peer in the message, or the receiving peer is one of the directly-connected neighbor peers of the disconnected peer, then the receiving peer updates its routing tables as it does for JRQST and sends a reply-broke message (RBROK) having the same fields as JRPLY. Otherwise, the receiving peer forwards the BROK message further, provided the TTL in the message is greater than zero. At the end of this operation, a peer checks its peer-routing table and removes connectivity to far-away peers provided they can communicate through an intermediate peer, as discussed in Section 3.2.
3.4 Peer-Leave
When a peer wants to leave the P2P network, it invokes the peer-leave operation to inform its neighbor peers, so that they can invoke the recovery operation. Normally, however, a peer does not inform its neighbor peers about its leaving.
3.5 File-Lookup
Whenever a peer wants to access a file, it first checks its file-cache for a matching file. If the information cannot be found in the file cache, the peer issues a file-lookup request (FSRCH) to its neighbor peers. The FSRCH message contains the SRC, the list of destinations, key-words and other related information. To take advantage of the wireless broadcast medium, we use multicasting at the MAC layer to reduce the number of transmissions while ensuring reliability. Receiving FSRCH, a normal node simply forwards the message to those nodes for which it is used as a next-hop; this information is obtained from the MAC-layer agent, whose details are presented in Section 3.6. Receiving FSRCH, a peer searches its local repository and then its file-cache for a matching file. If no matching file is found in either place, the receiving peer forwards the FSRCH to its neighbor peers, excluding those which have already been visited. If a matching file is found in the local repository or in the file-cache, the receiving peer sends a file-reply message (RSRCH) to the requesting peer and the FSRCH is not forwarded further. Receiving RSRCH, a node updates its routing table with the new routing information. Receiving the RSRCH message, an intermediate peer updates its file-cache to record the location of the file. Upon receiving RSRCH, the requesting peer accesses the file in blocks as in [23].
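A compact way to read the lookup logic at a receiving peer is sketched below; the repository, cache and forwarding primitives are assumptions made for illustration, not the protocol's actual code.

# Illustrative sketch of FSRCH handling at a peer (Section 3.5).
def handle_fsrch(peer, fsrch, send, forward_to_unvisited_neighbors):
    """fsrch: dict with 'src', 'keywords', 'visited' (peers already reached)."""
    match = peer["repository"].get(fsrch["keywords"]) or peer["file_cache"].get(fsrch["keywords"])
    if match is not None:
        # matching file (or its known location) found: reply and stop forwarding
        send(fsrch["src"], {"type": "RSRCH", "src": peer["id"],
                            "dst": fsrch["src"], "location": match})
        return
    # no match: forward only to neighbor peers that have not been visited yet
    fsrch["visited"].add(peer["id"])
    forward_to_unvisited_neighbors(fsrch)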
3.6 Multicasting Mechanism
In a MANET, there are three mechanisms for a node to send a packet to more than one node: broadcast, multiple unicasts and multicasting. The broadcasting
mechanism is simple and efficient but less reliable, because IEEE 802.11 does not use RTS/CTS and ACK for broadcast packets [34]. In multiple unicasts, the same packet is sent to each destination node one by one using CSMA/CA (RTS/CTS and ACK) in IEEE 802.11; this approach is more reliable but causes redundant traffic and more delay. To send a packet from a node to more than one next-hop, the multicasting mechanism is used to reduce redundant traffic while ensuring reliability: a node sends the data to a group of nodes reliably with little redundant traffic. We implement the multicasting mechanism by integrating the routing agent and the MAC-layer agent through a cross-layer approach, as follows. To send a data packet to more than one destination, the routing agent at a node passes the list of destinations and their corresponding next-hops from the routing table to the MAC-layer agent and asks it to use MAC-layer multicasting. Several approaches exist for MAC-layer multicasting [34, 35, 36, 37, 38] that reduce redundant traffic while providing reliability, each intended for a specific scenario; we extend the approach in [34] for this purpose. Receiving a multicast data packet from the routing agent, the MAC-layer agent first sends an MRTS (a modified version of RTS) message and waits a certain time to receive CTS packets. The MRTS contains a list of destination hops instead of a single destination. Receiving the MRTS message, a MAC-layer agent responds with a CTS message if its ID is in the destination list of the message; it delays sending the CTS packet in proportion to its index in the destination list in order to avoid packet collisions. After the timeout expires, the MAC-layer agent that sent the MRTS sends the data packet to those nodes from which a CTS message was received. Receiving the data packet, a MAC-layer agent informs its routing agent of the destinations for which it is used as a next hop. Receiving the data packet, the routing agent checks whether its ID is in the destination list of the received message; if so, it passes one copy of the message to the corresponding application. The receiving routing agent then forwards the message to those nodes for which it is used as a next-hop, again through the multicasting mechanism. This repeats until the message is delivered to all the destinations.
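As a rough illustration of the MRTS/CTS exchange described above, the sketch below staggers CTS replies by each receiver's index in the destination list and then transmits only to responders. The timing constants, message formats and the channel abstraction are assumptions for illustration, not the IEEE 802.11 extension of [34] itself.

# Illustrative sketch of the cross-layer MAC multicasting handshake (Section 3.6).
CTS_SLOT = 0.001      # assumed per-index CTS delay (seconds)
CTS_TIMEOUT = 0.02    # assumed time the sender waits for CTS replies

def send_multicast(mac, destinations, payload, channel, now):
    """mac: this node's MAC id; channel.send / channel.send_at / channel.collect
    are assumed primitives of a simulated wireless channel."""
    channel.send({"type": "MRTS", "src": mac, "dst_list": list(destinations)})
    cts_from = {f["src"] for f in channel.collect(now + CTS_TIMEOUT) if f["type"] == "CTS"}
    responders = [d for d in destinations if d in cts_from]
    if responders:
        channel.send({"type": "DATA", "src": mac, "dst_list": responders, "payload": payload})
    return responders

def on_mrts(mac, frame, channel, now):
    # Reply with CTS only if addressed; delay proportional to index to avoid collisions.
    if mac in frame["dst_list"]:
        delay = CTS_SLOT * frame["dst_list"].index(mac)
        channel.send_at(now + delay, {"type": "CTS", "src": mac, "dst": frame["src"]})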
4 Simulation
We use the NS-2 simulator [40] to simulate both our approach and ORION [23], the flooding-based approach, for comparison. The simulation environment is specified as follows: IEEE 802.11 MAC layer, 100 nodes in total, a transmission range of 250 m, a simulation area of 1000 m x 1000 m, a total simulation time of 1000 seconds, Random WayPoint as the mobility model and a bandwidth of 2 Mbps. Our approach has join/leave operations and maintains the connectivity among the peers proactively, while ORION (the flooding-based approach) does not. We would like to see whether the merits obtained by maintaining the connectivity among peers outweigh the cost of the repeated file-lookups in ORION. We emphasize
the scenario where the peers move around with a maximum speed limit and randomly join/leave the P2P network while maintaining the given peers ratio. For each scenario we performed three random executions and took the average of the results. We take routing overhead and false-negative ratio as the metrics for comparison.
4.1 Comparison of Routing Overhead
Routing overhead is the number of messages transmitted at the routing layer. Figure 3 shows the routing overhead of both approaches as the total number of file-lookup queries initiated by peers is varied. When the total number of file-lookup queries initiated by the peers is small, ORION performs better than our approach; as the total number of file-lookup queries increases, ORION incurs a larger routing overhead than our approach, i.e., the routing overhead of ORION is linear in the number of lookup queries. This is because in ORION every file-lookup query is broadcast in the entire network, whereas our approach maintains the connectivity among the peers proactively: once the connectivity is established, a file-lookup query is sent only to peers over the overlay network. In our approach, the use of the multicasting mechanism further reduces the routing traffic. As the total number of file-lookup queries initiated by peers increases, the difference in routing overhead between ORION and our approach decreases until a point is reached where both approaches have the same routing overhead. We call this point the balance-point. Beyond the balance-point, our approach performs better than ORION in terms of routing overhead. For example, in Figure 3 both approaches have the same routing overhead when the total number of file-lookup queries initiated by the peers is 36; if the total number of file-lookup queries is larger than 36, our approach performs better than ORION. Next, we find the balance-point for various peers ratios by varying the maximum moving speed of the nodes. This determines the number of file-lookup queries, per peer and by all peers, needed to amortize the cost of peer joining and of maintaining connectivity among the peers in our approach compared with ORION. Figure 4 shows that the balance-point is reached at a larger number of file-lookup queries per peer as the maximum moving speed of the nodes increases. For example, from figure 4a, for a 10% peers ratio the balance-point is reached when the number of file-lookup queries per peer is 3.6 at a maximum moving speed of 0.4 m/s, while it is 3.7 at 0.8 m/s. This is because a higher maximum moving speed causes frequent topology changes in our approach, producing more routing traffic to re-establish links among the peers. Figure 4 also shows that, with the increase of the peers ratio, the number of file-lookup queries per peer needed for the routing overhead of ORION to equal that of our approach decreases. That is, the balance-point is reached at a smaller number of file-lookup queries per peer as the peers ratio increases.
Fig. 3. Routing overhead versus total file-lookup queries initiated in the system
Fig. 4. The balance-point obtained by varying the peers ratio and the maximum (Max) moving speed of the nodes: (a) 10% peers, (b) 20% peers, (c) 30% peers, (d) 40% peers, (e) 50% peers. Each panel plots the number of file-lookup queries per peer and the total number of file-lookup queries by all peers against the maximum velocity (m/s).
For example, from figures 4a and 4b, for a maximum moving speed of 0.4 m/s, the balance-point at a 10% peers ratio is reached when the number of file-lookup queries per peer is 3.6, while it is 3.0 for a 20% peers ratio. This is because, with the increase of the peers ratio, the peers in the network become less dispersed.
Fig. 5. Comparison of the false-negative (FN) ratio between ORION and our approach at maximum speeds of 0.4, 0.8, 1.2, 1.6 and 2.0 m/s ((a)-(e)); each panel plots the FN ratio against the peers ratio (10%-50%).
In our approach, a peer then maintains connectivity to the physically closer peers, which reduces the routing traffic; the multicasting mechanism decreases the routing overhead further. From Figure 4, the highest and lowest numbers of file-lookup queries per peer needed to amortize the cost of peer joining and of maintaining connectivity among the peers in our approach, relative to ORION, are 3.6 and 1.75 respectively. These values are reasonable because a peer usually initiates more than 4 file-lookup queries in a P2P network. Moreover, as the number of peers in the P2P network increases, the number of file-lookup queries initiated per peer is also likely to increase. Thus, compared with ORION, our approach reduces the routing overhead. This in turn reduces the energy consumption of the nodes, prolonging network lifetime, and the reduced routing traffic also lowers the probability of packet collisions in the network, resulting in more reliable transmission.
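To make the balance-point idea concrete, the following sketch compares two hypothetical linear overhead models, a flooding-style cost that grows with every query and a proactive cost with a fixed maintenance component plus a smaller per-query cost, and solves for the query count at which they meet. The coefficients are invented for illustration (chosen only so the example lands near the reported balance-point of 36); they are not values measured in the paper.

# Illustrative balance-point computation under assumed linear overhead models.
def balance_point(maintenance_cost, per_query_overlay, per_query_flood):
    """Smallest query count q for which
    maintenance_cost + per_query_overlay * q <= per_query_flood * q."""
    if per_query_flood <= per_query_overlay:
        return None  # flooding never becomes more expensive per query
    return maintenance_cost / (per_query_flood - per_query_overlay)

# Hypothetical numbers: proactive maintenance costs 900 messages, each overlay
# lookup 50 messages, each network-wide flood 75 messages.
q_star = balance_point(900, 50, 75)
print(f"balance-point at about {q_star:.0f} file-lookup queries")  # -> 36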
4.2 Comparison of False-Negative Ratio
False-negative (FN) ratio is the ratio of the number of unresolved file-lookup queries for files that exist in the P2P network to the total number of initiated file-lookup queries. Figure 5 shows that our approach has an overall lower FN ratio than ORION. This is because our approach maintains the overlay and sends the file-lookup request reliably, while ORION broadcasts the file-lookup request in the entire network. With the increase of the peers ratio, the FN ratio increases in both approaches. The reason in ORION is that we randomly select the peers that generate file-lookup requests; with a higher peers ratio, the chance that peers generate file-lookup requests simultaneously increases, which increases the probability of packet collisions. Similarly, in our approach, a higher peers ratio generates more traffic in the network, which causes more packet collisions. Likewise, with the increase of the maximum moving speed of the nodes, the FN ratio increases in both approaches. The reason in our approach is that with a higher maximum moving speed the topology changes more frequently, which results in more routing traffic to re-establish the links among the peers. The reason in ORION is that the chance of reverse-path disconnection increases with increased node mobility. It can also be seen that the increase in FN ratio with the maximum moving speed is smaller than the increase in FN ratio with the peers ratio. This is because our approach uses the IEEE 802.11 MAC-layer feedback mechanism to detect link disconnection due to node mobility: when the routing layer at a node detects a link disconnection, it stores the packet, re-establishes the link and sends the packet again. This reduces the FN ratio in our approach.
5 Conclusion and Future Work
In this paper we proposed a new unstructured P2P file sharing algorithm for mobile ad hoc networks. We assumed that not all nodes are necessarily members of the P2P network. In the proposed algorithm, a peer maintains connectivity to its closest neighbor peers, using the expanding-ring-search algorithm for peer joining and taking advantage of the broadcast nature of the wireless medium. Further exploiting the wireless broadcast, we proposed a multicasting mechanism at each node for sending keep-alive and file-lookup queries to the neighbor peers. Simulation results show that our approach performs better than ORION (the flooding-based approach). As future work, we plan to extend our work to other aspects of P2P networks, such as anonymity and the replication factor.
References
1. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H.: Chord: A scalable peer-to-peer lookup service for internet applications. In: Proceedings of the ACM SIGCOMM 2001 Conference, San Diego, California, USA (August 2001)
2. Rowstron, A., Druschel, P.: Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. In: Guerraoui, R. (ed.) Middleware 2001. LNCS, vol. 2218, p. 329. Springer, Heidelberg (2001) 3. Pourebrahimi, B., Bertels, K.L.M., Vassiliadis, S.: A Survey of Peer-to-Peer Networks. In: Proceeding of the 16th Annual Workshop on Circuits, Systems and Signal Processing, ProRisc (November 2005) 4. Ripeanu, M., Foster, I., Iamnitchi, A.: Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System Design. IEEE Internet Computing Journal special issue on peer-to-peer networking 6(1) (2002) 5. Meshkova, E., Riihij¨ arvi, J., Petrova, M., M¨ ah¨ onen, P.: A survey on resource discovery mechanisms, peer-to-peer and service discovery frameworks. Journal: Computer Networks 52 (August 2008) 6. http://gtk-gnutella.sourceforge.net/en/?page=news 7. http://www.kazaa.com/us/help/glossary/p2p.htm 8. http://opennap.sourceforge.net/status 9. Turi, G., Conti, M., Gregori, E.: A Cross Layer Optimization of Gnutella for Mobile Ad hoc Networks. In: Proceeding of the ACM MobiHoc Symposium, UrbanaChampain (May 2005) 10. Ahmed, D.T., Shirmohammadi, S.: Multi-Level Hashing for Peer-to-Peer System in Wireless Ad Hoc Environment. In: Proceeding of the IEEE Workshop on Mobile Peer-to-Peer Computing, White Plains, NY, USA (March 2007) 11. Kummer, R., Kropf, P., Felber, P.: Distributed Lookup in Structured Peer-to-Peer Ad-Hoc Networks. In: Meersman, R., Tari, Z. (eds.) OTM 2006. LNCS, vol. 4276, pp. 1541–1554. Springer, Heidelberg (2006) 12. Ding, G., Bhargava, B.: Peer-to-peer file-sharing over mobile ad hoc networks. In: Proceedings of the Second IEEE Annual Conference on Pervasive Computing and Communications Workshops, pp. 104–108 (2004) 13. da Hora, D.N., Macedo, D.F., Nogueira, J.M., Pujolle, G.: Optimizing Peer-toPeer Content Discovery over Wireless Mobile Ad Hoc Networks. In: The ninth IFIP International Conference on Mobile and Wireless Communications Networks (MWCN), pp. 6–10 (2007) 14. Pucha, H., Das, S.M., Hu, Y.C.: Ekta: An efficient DHT substrate for distributed applications in mobile ad hoc networks. In: Sixth IEEE Workshop on Mobile Computing Systems and Applications (WMCSA 2004), pp. 163–173 (2004) 15. Tang, B., Zhou, Z., Kashyap, A., Chiueh, T.-c.: An Integrated Approach for P2P File Sharing on Multi-hop Wireless Networks. In: Proceeding of the IEEE International Conference on Wireless and Mobile Computing, Networking and Communication (WIMOB 2005), Montreal, Canada (August 2005) 16. Oliveira, R., Bernardo, L., Pinto, P.: Flooding Techniques for Resource Discovery on High Mobility MANETs. In: International Workshop on Wireless Ad-hoc Networks 2005 (IWWAN 2005), London, UK, May 2005, pp. 23–26 (2005) 17. Niethammer, F., Schollmeier, R., Gruber, I.: Protocol for peer-to-peer networking in mobile environments. In: Proceedings of IEEE12th International Conference on Computer Communications and Networks, ICCCN (2003) 18. Li, M., Chen, E., Sheu, P.C.-y.: A Chord-Based Novel Mobile Peer-to-Peer File Sharing Protocol. In: Zhou, X., Li, J., Shen, H.T., Kitsuregawa, M., Zhang, Y. (eds.) APWeb 2006. LNCS, vol. 3841, pp. 16–18. Springer, Heidelberg (2006) 19. Leung, A.K.-H., Kwok, Y.-K.: On Topology Control of Wireless Peer-to-Peer File Sharing Networks: Energy Efficiency, Fairness and Incentive. In: The 6th IEEE International Symposium on a World of Wireless Mobile and Multimedia Networks, Taormina, Giardini, Naxos (June 2005)
20. Sozer, H., Tekkalmaz, M., Korpeoglu, I.: A Peer-to-Peer File Search and Download Protocol for Wireless Ad Hoc Networks. Computer Communications 32(1), 41–50 (2009) 21. Papadopouli, M., Schulzrinne, H.: Effects of Power Conservation, Wireless Coverage and Cooperation on Data Dissemination among Mobile Devices. In: Proc. ACM Symp. (MobiHoc 2001), Long Beach, CA (2001) 22. Lindemann, C., Waldhorst, O.: A Distributed Search Service for Peer-to-Peer File Sharing in Mobile Applications. In: Proc. 2nd IEEE Conf. on Peer-to-Peer Computing (P2P 2002), Link¨ oping, Sweden, pp. 71–83 (2002) 23. Klemm, A., Lindemann, C., Waldhorst, O.: A special-purpose peer-to-peer file sharing system for mobile ad hoc networks. In: Proc. Workshop on Mobile Ad Hoc Networking and Computing (MADNET 2003), Sophia- Antipolis, France, March 2003, pp. 41–49 (2003) 24. Duran, A., Chien-Chung: Mobile ad hoc P2P file sharing. In: Wireless Communications and Networking Conference, WCNC 2004, vol. 1, pp. 114–119. IEEE, Los Alamitos (2004) 25. Abada, A., Cui, L., Huang, C., Chen, H.: A Novel Path Selection and Recovery Mechanism for MANETs P2P File Sharing Applications. In: Wireless Communications and Networking Conference, WCNC 2007, IEEE, Los Alamitos (2007) 26. Shah, N., Qian, D.: Context-Aware Routing for Peer-to-Peer Network on MANETs nas. In: IEEE International Conference on Networking, Architecture, and Storage, July 2009, pp. 135–139 (2009) 27. Li, M., Lee, W.-C., Sivasubramaniam, A.: Efficient peer-to-peer information sharing over mobile ad hoc networks. In: Proceedings of the 2nd Workshop on Emerging Applications for Wireless and Mobile Access (MobEA 2004), in conjunction with the World Wide Web Conference (WWW) (May 2004) 28. Goel, S.K., Singh, M., Xu, D., Li, B.: Efficient Peer-to-Peer Data Dissemination in Mobile Ad-Hoc Networks. In: Proceedings of International Workshop on Ad Hoc Networking (IWAHN 2002, in conjunction with ICPP 2002), Vancouver, BC (August 2002) 29. Hui, P., Leguay, J., Crowcroft, J., Scott, J., Friedman, T.: Vania: Osmosis in Pocket Switched Networks. In: First International Conference on Communications and Networking in China, Beijing, China (October 2006) 30. Lee, U., Park, J.-S., Lee, S.-H., Ro, W.W., Pau, G., Gerla, M.: Efficient Peer-topeer File Sharing using Network Coding in MANET. Journal of Communications and Networks (JCN), Special Issue on Network Coding (November 2008) 31. Clausen, T., Jacquet, P.: Optimized Link-State Routing Protocol, IETF RFC-3626 (October 2003) 32. Johnson, D., Maltz, D., Hu, Y.: The Dynamic Source Routing Protocol for Mobile Ad Hoc Networks, Internet Draft (April 2003) 33. Fabius, K., Srikanth, Y.Z.K., Tripathi, S.: Improving TCP Performance in Ad Hoc Networks Using Signal Strength Based Link Management. Journal of Ad Hoc Networks 3, 175–191 (2005) 34. Gossain, H., Nandiraju, N., Anand, K., Agrawal, D.P.: Supporting MAC Layer Multicast in IEEE 802.11 based MANETs: Issues and Solutions. In: Proceedings of the 29th Annual IEEE International Conference on Local Computer Network, pp. 172–179 (2004) 35. Chen, A., Lee, D., Chandrasekaran, G., Sinha, P.: High Throughput MAC Layer Multicasting over Time-Varying Channels. Computer Communications 32-1, 94– 104 (2009)
36. Sum, M.T., et al.: Reliable MAC Layer Multicast in IEEE 802.11 Wireless Networks. Wireless Communication and Mobile Computing 3, 439–453 (2003) 37. Jain, S., Das, S.R.: MAC Layer Multicast in Wireless Multihop Networks. In: Proc. Comsware 2006, New Delhi, India (January 2006) 38. Chen, A., Chandrasekaran, G., Lee, D., Sinha, P.: HIMAC: High Throughput MAC Layer Multicasting in Wireless Networks. In: Proc. of IEEE MASS, Vancouver, Canada (October 2006) 39. Perkins, C., Belding-Royer, E., Das, S.: Ad hoc On-Demand Distance Vector (AODV) Routing. RFC 3561 (July 2003) 40. http://www.isi.edu/nsnam
A Model for Interference on Links in Inter-working Multi-hop Wireless Networks Oladayo Salami, Antoine Bagula, and H. Anthony Chan Communication Research Group, Electrical Engineering Department, University of Cape Town, Rondebosch, Cape Town, South Africa [email protected], [email protected], [email protected]
Abstract. Inter-node interference is an important performance metric in interworking multi-hop wireless networks. Such interference results from simultaneous transmissions by the nodes in these networks. Although several interference models exist in literature, these models are for specific wireless networks and MAC protocols. Due to the heterogeneity of link-level technologies in interworking multi-hop wireless networks, it is desirable to have generic models to evaluate interference on links in inter-working multi-hop wireless networks. This paper presents a generic model to provide information about the interference level on a link irrespective of the MAC protocol in use. The model determines the probability of interference and uses the negative second moment of the distance between a receiver-node and interfering-nodes to estimate the interference power on a link. Numerical results of the performance of the model are presented. Keywords: Interference, Inter-working, Multi-hop, Wireless networks.
wireless network, while externally generated interference is caused by nodes in co-located wireless networks. Either type of interference can hamper the reliability of wireless channels (links) in terms of throughput and delay and thereby limit the performance gain of the network [5]. Research has identified INI as one of the most important causes of performance degradation in wireless multi-hop networks. Hence, the search for analytical models for estimating INI in different wireless networks has received a lot of attention over the past few years. This interest is expected to increase due to the advent of new architectures and communication technologies, e.g., wireless networks sharing the same (unlicensed) frequency band, infrastructure-less wireless networks and ultra-wideband systems [6]. The modeling of INI for inter-working multi-hop wireless networks is an important step towards the design, analysis and deployment of such networks. Recent research papers such as [6], [7], [8], [9], [10] and [11] have developed models for interference in wireless networks. In [13], INI models for the ALOHA MAC protocol and the "Per-route" Carrier Sense Multiple Access (PR-CSMA) MAC protocol were derived. In [14], the effect of interference in ALOHA ad-hoc networks was investigated. In [15], a model was proposed for calculating interference in multi-hop wireless networks using CSMA for medium access control. [7] presented the use of the Matérn point process and the Simple Sequential Inhibition (SSI) point process for modeling the interference distribution in CSMA/CA networks. The authors in [12] put forth a mathematical model for interference in cognitive radio networks, wireless packet networks and networks consisting of narrowband and ultra-wideband wireless nodes. The work in [11] presented a statistical model of interference in wireless networks in which the power of the nearest interferer is used as the main performance indicator instead of the total interference power. These related works developed interference models for particular networks under different network scenarios and topologies. For example, the model presented in [15] was specifically for ad-hoc networks with a hexagonal topology. Such deterministic placement of nodes (square, rectangular or hexagonal) may be applicable where the locations of nodes are known or constrained to a particular structure, but it is not realistic for inter-working multi-hop wireless networks. Although some of the works mentioned use stochastic models for nodes' locations, their interference models are still tied to specific transmission technologies and multiple-access schemes [12]. These constraints make the results obtained in these works difficult to carry over to other wireless technologies where parameters may differ. A particular challenge of inter-working multi-hop wireless networks is the variation in the transmission technologies of the constituent wireless access networks; these technological differences make it difficult to adopt the interference models presented in the reviewed works. Therefore, it is desirable to characterize inter-node interference on links in inter-working multi-hop wireless networks. This characterization is necessary for the design of strategies that can optimize network performance and resource allocation. Interfering nodes' (I-nodes') behavior (e.g.
changes in power levels, movement and distance relative to the receiving node (R-node)) can influence network parameters such as throughput, delay and bit error rate. Thus, interference models are useful in the design of power control strategies and traffic engineering strategies (e.g., routing, admission control and resource allocation).
It is known that the higher the interference between nodes, the lower the effectiveness of any routing strategy in the network [12]. Consequently, the provisioning of quality of service (QoS) and resource dimensioning within the network are affected. Hence, it is necessary to understand the impact of interference on network parameters. This paper presents a MAC-protocol-independent model for INI in inter-working multi-hop wireless networks. Specifically, the statistical negative moment of the distances between nodes and the probability of interference are used to evaluate the INI power on a link in a region of an inter-working multi-hop wireless network. In order to find the expected value of the INI power on a link, the distribution of the distance (βk,R) between the R-node and the I-nodes is determined. The spatial density of interfering nodes is then estimated using the probability of interference within the inter-working network. A region of interference is defined for each R-node, and interference from nodes beyond this region is taken to be negligible. An approximation of the negative second moment allows a tractable mathematical analysis of the relationship between the INI power experienced on a link and other important parameters such as the SINR, the node transmit power and the spatial node density. The analysis also shows how a wireless link's performance in terms of SINR depends on these parameters; such an understanding gives valuable insights to designers of inter-working multi-hop wireless networks. The numerical results presented validate the interference model by showing the influence of interference on the SINR of a link in an inter-working multi-hop wireless network, and they provide insight into the effect of the interfering-node density on the INI power. The rest of this paper is organized as follows: Section 2 discusses the network models, which include the node distribution and inter-working models, the channel propagation and mobility models, and the signal to interference and noise ratio model. Section 3 presents the analysis of the INI power and Section 4 concludes the paper.
2 Network Models
So what do we need to characterize interference? A typical model of interference in any network requires: 1) a model that provides the spatial locations of nodes; 2) a channel propagation model that describes the propagation characteristics of the network, including path loss, node mobility models, etc.; and 3) a model for the transmission characteristics of nodes together with a threshold-based receiver performance model.
2.1 Node Distribution and Inter-working Network Model
Since nodes' locations are completely unknown a priori in wireless networks, they can be treated as random. This irregular location of nodes, which is influenced by factors such as mobility or unplanned placement, may be considered a realization of a spatial point pattern (or process) [3] [8]. A spatial point pattern (SPP) is a set of locations, irregularly distributed within a designated region and presumed to have been generated by some form of stochastic mechanism.
Fig. 1. Network Model
In most applications, the designated region is essentially a planar R^d Euclidean space (e.g., d = 2 for the two-dimensional case) [16]. The lack of dependence between the points is called complete spatial randomness (CSR) [3]. According to the theory of complete spatial randomness, for an SPP the number of points inside a planar region P follows a Poisson distribution [16]. It follows that the probability of p points being inside region P, Pr(p in P), depends on the area of the region (A_P) and not on the shape or location of the plane. Pr(p in P) is given by (1), where μ is the spatial density of points:

\Pr(p \text{ in } P) = \frac{(\mu A_P)^p}{p!}\, e^{-\mu A_P}, \qquad p = 0, 1, 2, \ldots \qquad (1)
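As a quick numerical illustration of (1), the snippet below evaluates the probability of finding p nodes in a region of a given area for an assumed density; the numbers are arbitrary examples, not values from the paper.

# Evaluate Pr(p in P) from equation (1) for illustrative parameters.
from math import exp, factorial

def prob_p_in_region(p, density, area):
    lam = density * area                      # mean number of points in the region
    return (lam ** p) / factorial(p) * exp(-lam)

density = 0.05   # assumed nodes per unit area
area = 200.0     # assumed region area (same units)
for p in range(0, 4):
    print(p, round(prob_p_in_region(p, density, area), 4))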
Poisson process’ application to nodes’ positions modeling in wireless environments was firstly done in [17] and then in [18]. In [9] it was proved that if the node positions are modeled with Poisson point process, then parameters such as the spatial distribution of nodes, transmission characteristics and the propagation characteristics of the wireless link can be easily accounted for. The nodes are randomly and independently located. This is a reasonable model particularly in a network with random node placement such as the inter-working multi-hop wireless networks. Moreover the most popular choice for the modelling of the nodes’ spatial distribution is the Poisson point process [3] [6] [7] [8] [9] [11]. The network in fig 1 represents a set of inter-working multi-hop wireless networks. Each network is considered as a collection of random and independently positioned nodes. The nodes in the network in fig 1 are contained in a Euclidean space of 2-dimensions (R2). These set of multi-hop wireless networks are overlapping. They are inter-worked with a gateway i.e. there is inter-domain co-ordination between the networks. The gateways co-ordinate the handover issues within the inter-working networks. The inter-working network in fig. 1 is represented as network Ω, which contain three subset networks (sub-networks) A, B, and C. The total number of nodes in Ω is denoted NΩ, while the number of nodes in sub-networks A, B, C are Na, Nb and Nc respectively, where Na+ Nb+ Nc = NΩ. The spatial density of each
sub-network is given by μA, μB, μC (μ = N/a, where N is the number of nodes in a sub-network, a is the sub-network's coverage area and μ is given in nodes per unit square). The entire inter-working network is considered a merged Poisson process with spatial density μA + μB + μC = μNet. In the network, node-to-node communication may be multi-hop and nodes transmit at a data rate of Ψ bps. In this paper, source nodes are referred to as transmitter-nodes (T-nodes) while destination nodes are referred to as receiver-nodes (R-nodes). {l : l = 1, 2, 3, ..., n} ∈ L represents the links between nodes, where L is the set of all links in the entire network. The length of a communication link is represented by βT,R, where subscript T denotes the transmitter-node and subscript R denotes the receiver-node on the link.
2.2 Propagation and Mobility Models
In fig. 1, for a packet transmitted by the T-node on link l (l = 1, 2, 3, ..., n) and received by the R-node, the actual received power at the R-node can be expressed by the Friis equation given as:
P_l^r = c\, P_l^t A_l = c\, P_l^t (\beta_{T,R})^{-\alpha}, \qquad c = \frac{G_t G_r \lambda_c^2}{(4\pi)^2 L_f} \qquad (2)
Plt is the power transmitted by the transmitter-node on link l and Plr is the power received by the receiver-node on link l. Gt and Gr are the transmitter and receiver gains respectively. λc is the wavelength, λc = g/fc (g is the speed of light and fc is the carrier frequency). Lf ≥ 1 is the system loss factor. To account for path loss, the channel attenuation for link l is denoted by Al. Path loss is an attenuation effect which reduces the transmitted signal power in proportion to the propagation distance between a T-node and the corresponding R-node. The received power from a T-node at distance βT,R from the R-node decays with distance as (βT,R)^-α, where α is the path-loss exponent representing the decay of the transmitted power; it depends on the environment and is typically a constant between 2 and 6. In this paper α = 2 (the free-space value), so Al = (βT,R)^-2. This decay of power with distance makes it possible to consider interference from nodes located far from the R-node as negligible. In this paper, it is assumed that the randomness in the distance between nodes, irrespective of the network topology, captures the movement of nodes. The movement of a node from one point to another changes its location and consequently its distance to a reference node (the R-node on the link for which interference is being measured). Thus, the variation in the distance between any I-node and an R-node is highly coupled with the movement of the I-node. A case where transmitting nodes and interfering nodes use the same physical-layer techniques (e.g., modulation) is termed homogeneous. A heterogeneous
case occurs when nodes use different physical-layer techniques. Nodes are able to transmit signals at power levels that are random and independent between nodes.
2.3 Link Signal to Interference and Noise Ratio
The receiver performance model is based on the signal to interference and noise ratio (SINR) physical-layer model. The SINR is defined as the ratio of the power at which a transmitted signal is received to the total interference power experienced by the receiving node on a link. The total interference power is the sum of the inter-node interference power and the noise power level, as in equation 3:
P_{int} = P_o + P_{ini} \qquad (3)

• P_o: thermal noise power level at the R-node on link l. P_o = F k T_o B, where k = 1.38 × 10^-23 J/K (Boltzmann's constant), T_o is the ambient temperature, B is the transmission bandwidth and F is the noise figure [19].
• P_int: total interference power experienced by the receiver at the end of link l; it is the sum of the thermal noise power and the inter-node interference.
• P_ini: inter-node interference power, given by equation 4:

P_{ini} = \sum_{k=1}^{S} c\, P_{t(k)} (\beta_{k,R})^{-\alpha} \qquad (4)
Pini represents the total interference power from nodes transmitting simultaneously with the T-node on the reference link. For a T-node and an R-node on link l in fig. 1, Pini is the cumulative interfering power that the R-node experiences from nodes transmitting concurrently with the T-node. I-nodes are the nodes that can transmit simultaneously with the T-node. S is the total number of I-nodes and k is a counter such that k = 1, 2, 3, ..., S. Pt(k) is the transmitting power of the kth I-node and βk,R is the distance between the kth I-node and the R-node. The value of Pini depends mostly on the density of I-nodes, which is determined by the number of nodes in the network and the distance between the R-node and the kth I-node. θ(l) represents the SINR on the lth link in the network and is expressed as:
\theta^{(l)} = \frac{P_l^r}{P_{int}} = \frac{c\, P_l^t (\beta_{T,R})^{-2}}{\sum_{k=1}^{S} c\, P_{t(k)} (\beta_{k,R})^{-2} + P_o} \qquad (5)
A transmitted signal (packet) at a data rate Ψbps can only be correctly decoded by the R-node if θ(l) is not less than an appropriate threshold θ(th) throughout the duration of packet transmission [4] [20]. This condition is given as:
\theta^{(l)} \ge \theta^{(th)} \qquad (6)
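The link budget of Section 2 can be exercised numerically. The sketch below computes the received power with equation (2), sums the interference terms of equation (4), and checks the threshold condition (6); all parameter values (gains, carrier frequency, powers, distances, threshold) are assumptions for illustration only, not values from the paper.

# Illustrative SINR computation for one link, following equations (2)-(6).
from math import pi

def friis_constant(gt=1.0, gr=1.0, fc=2.4e9, lf=1.0):
    lam = 3.0e8 / fc                       # carrier wavelength
    return gt * gr * lam**2 / ((4 * pi)**2 * lf)

def link_sinr(pt_tx, beta_tr, interferers, noise_power, alpha=2.0):
    """interferers: list of (transmit_power, distance_to_R_node) for the I-nodes."""
    c = friis_constant()
    p_rx = c * pt_tx * beta_tr**(-alpha)                      # eq. (2)
    p_ini = sum(c * p * d**(-alpha) for p, d in interferers)  # eq. (4)
    return p_rx / (p_ini + noise_power)                       # eq. (5)

# Assumed example: 10 mW transmitter at 50 m, three interferers, -100 dBm noise floor.
sinr = link_sinr(0.01, 50.0, [(0.01, 120.0), (0.01, 200.0), (0.01, 300.0)], 1e-13)
print("SINR:", sinr, "decodable:", sinr >= 10.0)              # eq. (6), assumed threshold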
3 Inter-node Interference
From the denominator of equation 5, inter-node interference is a major term that determines the SINR. Keeping the transmit power and received power at fixed values and the noise power level constant, note that all βk,R are independent and identically distributed random variables (R.V.s), distributed within the area of interference.
Fig. 2. Inter-node interference (Pini, dBm) versus normalized distance (βk,R)
Fig. 3. Inter-node interference (Pini, dBm) versus number of interfering nodes (S)
Fig. 2 shows the effect of these R.V.s on Pini. With a constant number of I-nodes and varying βk,R, it can be observed that the larger the value of βk,R, the smaller Pini. Fig. 3 confirms that, as the number of I-nodes increases, Pini intuitively increases. From this, an interference constraint can be defined for the inter-working network. Fig. 4 illustrates the constraint, which is that nodes beyond a boundary (r + δr) contribute negligible interference due to the decay of power caused by signal
Fig. 4. Representation of the transmission from a T-node to an R-node, with interfering nodes (I-nodes), non-interfering nodes (N-nodes), nodes beyond δr (B-nodes) and gateway nodes (G-nodes)
attenuation. With this constraint, an inter-node interference region bounded by equation 7 is defined.
r < \beta_{k,R} \le r + \delta r \qquad (7)
For all potential I-nodes, their separation distance to the reference R-node must fulfill equation 7. The bounded region (δr) is defined as the inter-node interference cluster. The interference cluster consists of nodes (the I-nodes) that can simultaneously transmitting within the frequency band of interest. According to [12] such nodes effectively contribute to the total inter-node interference and thus irrespective of the network topology or multiple-access technique, Pini can be derived. The Non-interfering nodes (N-nodes) are found within range r. Normally, whenever a link is established between a T-node and an R-node, the MAC technique will prohibit nearby nodes in the network from simultaneous transmission. The portion of the network occupied by these nearby nodes is directly related to the size of r around the Rnode, which is a fixed value in case of no node power control [21]. The interference cluster in fig.4 is defined with respect to an R-node. A particular R-node in the inter-working network is surrounded by both I-nodes and N-nodes. Since, interference could be internally or externally generated, there are different scenarios in which an R-node can find itself, based on a defined interference constraint as dictated by the MAC protocol. Some of the scenarios that can occur include: 1) The node could be surrounded by I-nodes and N-nodes from the same multi-hop wireless network. 2) The node could be surrounded by I-nodes and N-nodes from different multi-hop wireless networks. As illustrated in fig. 4, the R-node is surrounded by nodes of other networks, which are its I-nodes, N-nodes and of course other nodes beyond δr (the B-nodes). Theorem 1,
as stated below, can be used to characterize these nodes within the inter-working multi-hop wireless network.
Theorem 1: If the random points of a Poisson process in R^d with density λ are of N different types and each point, independently of the others, is of type i with probability P_i for i = 1, 2, ..., N, such that \sum_{i=1}^{N} P_i = 1, then the N point types are mutually independent Poisson processes with densities \lambda_i = P_i \lambda such that \sum_{i=1}^{N} \lambda_i = \lambda [22].
Using the splitting property of the Poisson process in theorem 1, let all nodes in the inter-working network, which is characterized by a Poisson point process with spatial density μNet be sorted independently into 3 types, I-nodes, N-nodes, and B-nodes. If the probability of a node being an I-node, N-node or a B-node is PI, PN, or PB respectively such that PI+PN+PB=1, then these 3 types of nodes are mutually independent Poisson processes with spatial densities:
\mu_I = P_I \mu_{Net}, \quad \mu_N = P_N \mu_{Net}, \quad \mu_B = P_B \mu_{Net}, \quad \text{where } \mu_{Net} = \mu_I + \mu_N + \mu_B
μI represents the spatial density of I-nodes, μN is the spatial density of the N-nodes and μB is the spatial density of nodes beyond δr. From here, the effective density of Inodes can be derived. If βx,R represents the link distance between the R-node and an arbitrary node x in the network, then:
\Pr(x \in \text{N-nodes}) = \Pr(\beta_{x,R} \le r), \quad \Pr(x \in \text{I-nodes}) = \Pr(r < \beta_{x,R} \le r + \delta r), \quad \Pr(x \in \text{B-nodes}) = \Pr(\beta_{x,R} > r + \delta r)
As noted earlier, Pini has a stochastic nature due to the random geographic dispersion of nodes; therefore Pini can be treated as a random variable. Since several nodes can transmit simultaneously in the δr region and together influence the value of Pini, θ(l) (the SINR on a link) can be estimated using the expected value of Pini, which is given as:
E[P_{ini}] = E\!\left[\sum_{k=1}^{S} c\, P_{t(k)} (\beta_{k,R})^{-\alpha}\right] \qquad (8)
S is the total number of I-nodes and k is a counter such that k = 1, 2, 3, ..., S. For analytical tractability and to avoid complexity, let the transmission powers of all I-nodes (Pt(k)) be equal. Note that the T-node's transmission power and modulation technique are not necessarily the same as those of the I-nodes; thus, the network in fig. 4 can represent a heterogeneous network in which different multi-hop wireless networks are inter-working.
E[P_{ini}] = c\, P_{t(k)}\, E\!\left[\sum_{k=1}^{S} (\beta_{k,R})^{-2}\right] \qquad (9)
In order to solve equation 8, the distribution function of the distance between the Rnode and the I-nodes (βk,R), given by f( β ) (r) , is of particular interest. k,R
3.1 Distribution Function of βk, R
From fig. 4, the total inter-node interference region is the area outside the range r. This region consists of nodes that can interfere with the R-node’s reception. However, nodes beyond the bounded region (r+δr) cause negligible interference. The region within δr is the interference cluster, which consists of the effective number of I-nodes. In order to find the probability that the distance between the R-node and all I-nodes fulfill the condition in (7), two events are defined. ξ1= {no I-node exist within distance r}. ξ2= {at least one I-node exist within δr}.
Similar to the nearest neighbor analysis in [23], the probability that concurrently transmitting nodes fulfill the condition in (7) is given by:
\Pr[\xi_1 \cap \xi_2] = \Pr(\xi_1)\,\Pr(\xi_2) \qquad (10)
\Pr(\xi_1) = e^{-\mu_I \pi r^2} \qquad (11)
To evaluate Pr(ξ2), the interference cluster is laid out as a strip of length 2πr and width δr, as shown in fig. 5.
Fig. 5. An approximation of the ring created by the interference cluster (a strip of length 2πr and width δr)
As δr approaches zero, the area of the annulus can be approximated by 2πrδr. It follows from Poisson distribution that the probability of at least one node in the annulus is:
\Pr(\xi_2) = 1 - e^{-\mu_I 2\pi r \delta r} \qquad (12)
From the first and second terms of the Taylor series [23], 1 - e^{-\mu_I 2\pi r \delta r} \approx \mu_I 2\pi r \delta r. Therefore, the probability of having I-nodes within the cluster (annulus) is:
\Pr(\xi_1)\,\Pr(\xi_2) = (2\mu_I \pi r \delta r)\, e^{-\mu_I \pi r^2} \qquad (13)
This probability can be expressed as:
\Pr(r < \beta_{k,R} \le r + \delta r) = (2\mu_I \pi r \delta r)\, e^{-\mu_I \pi r^2} = f_{\beta_{k,R}}(r)\,\delta r, \qquad \therefore\; f_{\beta_{k,R}}(r) = 2\mu_I \pi r\, e^{-\mu_I \pi r^2} \qquad (14)
The distribution of the distance between the R-node and the I-nodes is f_{βk,R}(r) in (14), which follows from the underlying Poisson point process. To evaluate the expected value in equation 9, the sum of the negative second moments of the random variable βk,R must be solved:

E\!\left[\sum_{k=1}^{S} (\beta_{k,R})^{-2}\right] = \sum_{k=1}^{S} E\!\left[(\beta_{k,R})^{-2}\right] = \sum_{k=1}^{S} \varpi \qquad (15)
Very few approximations for the solution of the negative moments of Poisson R.Vs exist in the literature. Two solutions that have been identified by the authors of this paper are the Tiku’s estimators [24] and the approximations developed by C. Matthew Jones et al in [25]. However, in this paper, the Tiku’s approximation has been adopted. It follows from [24] that:
\varpi \approx \frac{1}{(\mu_I - 1)(\mu_I - 2)\cdots(\mu_I - \tau)} \qquad (16)
for the τth negative moment of βk,R (τ represents the positive value of the power of βk,R), where μI = PI μNet and PI is the probability of interference.
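As a numerical illustration of (16) with τ = 2, the snippet below compares a Monte Carlo estimate of the negative second moment of a zero-truncated Poisson variate against the approximation; the value of μI used here is an arbitrary illustrative choice, not one taken from the paper.

# Numerical check of the Tiku-style approximation (16) for tau = 2.
import numpy as np

def tiku_negative_second_moment(mu):
    return 1.0 / ((mu - 1.0) * (mu - 2.0))   # equation (16) with tau = 2

rng = np.random.default_rng(1)
mu_I = 8.0                                   # assumed illustrative density parameter
samples = rng.poisson(mu_I, size=200_000)
samples = samples[samples > 0]               # zero-truncated, as in Tiku's setting
monte_carlo = np.mean(samples.astype(float) ** -2)
print(f"Monte Carlo E[X^-2] = {monte_carlo:.4f}, approximation = {tiku_negative_second_moment(mu_I):.4f}")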
3.2 Probability of Interference
In practice, not all nodes within δr will transmit at the same time; therefore PI can be defined by two events: ξ3 (at least one node exists within δr) and ξ4 (the node is transmitting). For an inter-working multi-hop wireless network with density μNet, Pr(ξ3) is the probability that the distance between an arbitrary node and the R-node is greater than r and at most r + δr (r and δr are defined with reference to the R-node of interest). This probability can also be expressed as the probability that more than 0 nodes exist within δr of the R-node, and it is given by 1 − e^{−μNet AI}, where AI is the area of δr for the R-node of interest.
\Pr(\xi_4) = \begin{cases} 1, & \text{if } P_{t(k)} > 0 \\ 0, & \text{if } P_{t(k)} = 0 \end{cases} \qquad \text{for all } P_{t(k)} \ge 0.
Thus: P_I = \Pr(\xi_3)\,\Pr(\xi_4) = 1 - e^{-\mu_{Net} A_I} \quad \text{for all } P_{t(k)} > 0.
3.3 Evaluation of the Interference Power (Pini)
Since μI can now be evaluated, from (8);
E[P_{ini}] \approx c\, P_{t(k)} \times S \times \varpi \qquad (17)
S ≈ PI × NΩ, where NΩ is the total number of nodes in the network. Equation 17 expresses the expected value of the effective Pini experienced on a link; Pini depends mostly on the spatial density of the interfering nodes (μI) and on the interfering nodes' transmitting power. In order to validate equation 17, Pini has been used to estimate the value of θ(l), and numerical results have been obtained as shown in figs. 6 and 7. Thus, θ(l) can be approximated as:
\theta^{(l)} \approx \frac{c\, P_l^t (\beta_l)^{-2}}{c\, P_{t(k)} \times S \times \varpi + P_o} \qquad (18)
The network scenario considered is a case of inter-working IEEE 802.11a/b/g mesh networks with 10, 15 and 25 nodes respectively in a 1000-unit-square area. Gt and Gr, the transmitter and receiver gains, are assumed equal to 1, and Lf = 1. Nodes in the network transmit at 10 mW. The area of the interference cluster is 200 unit square. The evaluation of the interference power shown in figs. 6-8 is done with respect to an R-node on a link of interest (in fig. 4) in the inter-working multi-hop wireless network; the number of nodes in the inter-working network was increased as applicable. Fig. 6 plots the interfering-node density against the network node density. As more nodes are deployed in the inter-working network (i.e., the network becomes denser), the likelihood of more nodes interfering with the R-node of interest increases: the increase in node density increases the probability of interference and thus the density of I-nodes. Fig. 7 plots the calculated values of equation 17: keeping the I-nodes' transmitting power fixed, the expected value of the interference power (Pini) rises as the density of I-nodes increases. To validate the model presented in this paper, the effect of the expected interference power on the link's signal to interference and noise ratio (θ(l)) is shown in fig. 8: the signal to interference and noise ratio on the link of interest decreases as the interference power increases. In an inter-working multi-hop wireless network, nodes that can transmit simultaneously with a T-node on a link of interest effectively contribute to the total inter-node interference experienced by the R-node on the same link. Thus, irrespective of the network topology or multiple-access technique, an approximation of the expected value of the inter-node interference power (Pini) can be derived with the model presented in this paper.
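The sketch below simply mechanizes equations (16)-(18) for the stated scenario (50 nodes in total, a 1000-unit-square area, an interference-cluster area of 200 and 10 mW transmitters). The carrier frequency, link distance and noise power are assumptions needed to obtain numbers; they are not taken from the paper, so the printed values are purely illustrative.

# Mechanizing equations (16)-(18) for the stated scenario, with assumed extras.
from math import exp, pi

fc = 2.4e9                                     # assumed carrier frequency
lam = 3.0e8 / fc
c = 1.0 * 1.0 * lam**2 / ((4 * pi)**2 * 1.0)   # Gt = Gr = Lf = 1, as in the paper

n_total, area, a_i = 50, 1000.0, 200.0
mu_net = n_total / area
p_interf = 1.0 - exp(-mu_net * a_i)            # probability of interference (Section 3.2)
mu_i = p_interf * mu_net
s = p_interf * n_total                         # expected number of I-nodes

varpi = 1.0 / ((mu_i - 1.0) * (mu_i - 2.0))    # equation (16), tau = 2
p_t = 0.01                                     # 10 mW
e_p_ini = c * p_t * s * varpi                  # equation (17)

beta_l, p_o = 30.0, 1e-13                      # assumed link distance and noise power
theta_l = c * p_t * beta_l**-2 / (e_p_ini + p_o)   # equation (18)
print(f"E[P_ini] = {e_p_ini:.3e} W, theta(l) = {theta_l:.3e}")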
Fig. 6. Interfering node density (μI) versus network node density (μNet)
Fig. 7. Expected interference power (dB) versus interfering node density (μI)
Fig. 8. Signal to interference and noise ratio θ(l) (dB) versus expected interference power (Pini) (dB)
4 Conclusion
The quality of a wireless link is a measure of how reliable it is. One of the physical-layer metrics that can be used to measure a link's quality is the level of inter-node interference on the link. This paper presented a model for inter-node interference on a link in an inter-working multi-hop wireless network. The model incorporates the probability of interference in inter-working networks and uses the negative second moment of the distance between a receiver-node and the nodes transmitting simultaneously with the transmitter-node to evaluate the expected value of the inter-node interference power on a link. Tiku's approximation for the negative moment of a random variable was adopted. The results obtained confirm that the level of inter-node interference has a substantial effect on the expected quality of the signal received at the receiver node. Thus, irrespective of the multiple-access technique, the expected value of the inter-node interference power (Pini) can be derived with the model presented in this paper. Future work includes applying this model in a simulation environment.
References 1. Siddiqui, F., Zeadally, S.: Mobility management across hybrid wireless networks: Trends and challenges. Computer Communications 29(9), 1363–1385, 31 (2006) 2. Li, W., Pan, Y.: Resource Allocation in Next Generation Wireless Networks. Wireless Networks and Mobile Computing series, vol. 5 (2005) ISBN: 1-59454-583-9 3. Salami, O., Bagula, A., Chan, H.A.: Analysis of Route Availability in Inter-working Multihop Wireless Networks. In: Proc. of the 4th International Conference on Broadband Communications, Information Technology and Biomedical Applications (BroadCom 2009), Wroclaw, Poland, 15-19 (2009) 4. Blough, D., Das, S., Santi, P.: Modeling and Mitigating Interference in Multi-Hop Wireless Networks. Tutorial presented at Mobicom 2008, San Francisco, September 14 (2008) 5. Gupta, P., Kumar, P.R.: The Capacity of Wireless Networks. IEEE Transactions on Information Theory 46(2), 388–404 (2000) 6. Salbaroli, E., Zanella, A.: Interference analysis in a Poisson field of nodes of finite area. IEEE Trans. on Vehicular Tech. 58(4), 1776–1783 (2009) 7. Busson, A., Chelius, G., Gorce, J.: Interference Modeling in CSMA Multi-Hop Wireless Networks. INRIA research report, No. 6624 (2009) ISSN: 0249-6399 8. Babaei, A.: Statistical Interference Modeling and Coexistence Strategies in Cognitive Wireless Networks. PhD thesis, George Mason University (Spring semester 2009) 9. Pinto, P.: Communication in a Poisson field of interferers. Master of Science Thesis, Massachusetts Institute of Technology (February 2007) 10. Zanella, A.: Connectivity properties and interference characterization in a Poisson field of nodes. WILAB (IEIIT/CNR) presentation (September 19, 2008) 11. Mordachev, V., Loyka, S.: On Node Density – Outage Probability Tradeoff in Wireless Networks. IEEE Journal on Selected Areas in Communications (2009) (accepted for publication) arXiv:0905.4926v1 12. Win, M., Pinto, P., Shepp, L.: A Mathematical Theory of Network Interference and Its Applications. IEEE Transaction 97(2) (2009) ISSN: 0018-9219
13. Ferrari, G., Tonguz, O.K.: Performance of Ad-hoc Wireless Networks with Aloha and PRCSMA MAC protocols. In: Proc. of IEEE Globecom, San Francisco, pp. 2824–2829 (2003) 14. Ganti, R., Haenggi, M.: Spatial and Temporal Correlation of the Interference in ALOHA Ad Hoc Networks. IEEE Communications Letters 13(9), 631–633 (2009) 15. Heckmat, R., Van Mieghem, P.: Interference in Wireless Multi-hop Adhoc Networks and its effect on Network Capacity. Wireless Networks 10, 389–399 (2004) 16. Diggle, P.J.: Statistical Analysis of Spatial Point Patterns, 2nd edn., 168 p. A Hodder Arnold Publication (2001) ISBN 0-340-74070-1 17. Sousa, E.: Performance of a spread spectrum packet radio network link is a Poisson field of interferes. IEEE Trans. Information Theory 38(6), 1743–1754 (1992) 18. Ilow, J., Hatzinakos, D., Venetsanopoulos, A.: Performance of FH SS radio networks with interference and modeled as a mixture of Gaussian and alpha-stable noise. IEEE Trans. Communication 46(4), 509–520 (1998) 19. Rappaport, T.S.: Wireless Communications-Principles and Practice. Prentice Hall, Englewood Cliffs (2002) 20. Avin, C., Emek, Y., Kantor, E., Lotker, Z., Peleg, D., Roditty, L.: SINR Diagrams: Towards Algorithmically Usable SINR Models of Wireless Networks. In: Proc. of the 28th ACM symposium on Principles of distributed computing, pp. 200–209 (2009) ISBN: 9781-60558-396-9 21. Hekmat, R., An, X.: Relation between Interference and Neighbor Attachment Policies in Ad-hoc and Sensor Networks. International Journal of Hybrid Information Technology 1(2) (2008) 22. Hohn, N., Veitch, D., Ye, T.: Splitting and merging of packet traffic: measurement and modeling. Performance Evaluation 62(1-4), 164–177 (2005) 23. Cherni, S.: Nearest neighbor method, http://www.mcs.sdsmt.edu/rwjohnso/html/sofiya.pdf 24. Tiku, M.L.: A note on the negative moments of a truncated Poisson variate. Journal of American Statistical Association 59(308), 1220–1224 (1964) 25. Matthew Jones, C.: Approximating negative and harmonic mean moments for the Poisson distribution. Mathematical Communications 8, 157–172 (2003)
An Optimum ICA Based Multiuser Data Separation for Short Message Service Mahdi Khosravy1, Mohammad Reza Alsharif1, , and Katsumi Yamashita2 1 Department of Information Engineering, Faculty of Engineering, University of the Ryukyus, 1 Senbaru, Nishihara, Okinawa 903-0213, Japan [email protected], [email protected] 2 Graduate School of Engineering, Osaka Prefecture University, 1-1 Gakuen-cho, Sakai, Osaka, Japan [email protected]
Abstract. This paper presents a new algorithm for the efficient separation of short messages that are mixed in a multiuser short message system. The separation of mixed random binary sequences is more difficult than that of mixed sequences of multivalued signals. The proposed algorithm applies Kullback-Leibler independent component analysis (ICA) to the mixed binary sequences of received data. Normally, the binary codes of short messages are shorter than the length required for the ICA algorithm to work well. To overcome this problem, a random binary tail is inserted after each user's short message at the transmitter side. The inserted tails for different users are chosen so as to have the least correlation between them. The optimum choice of the random binary tails increases the separation performance not only by increasing the data length but also by minimizing the correlation between the multiuser data. Keywords: Short message service, independent component analysis, Kullback-Leibler, MIMO.
1
Introduction
Short Message Service (SMS) is a communication service standardized in the GSM mobile communication system; it uses standardized communication protocols allowing the interchange of short text messages between mobile phone devices. SMS text messaging is the most widely used data application on the planet, with 2.4 billion active users, or 74% of all mobile phone subscribers, sending and receiving text messages on their phones. There are many possible applications for short text messages; Figure 1 shows some possibilities of SMS applications. The SMS technology has facilitated the development and growth of text
Also, Adjunct Professor at School of Electrical and Computer Engineering, University of Tehran.
Fig. 1. Possibilities of SMS communication
messaging. The connection between the phenomenon of text messaging and the underlying technology is so great that in parts of the world the term "SMS" is used as a synonym for a text message or the act of sending a text message, even when a different protocol is being used. SMS as used on modern handsets was originally defined as part of the GSM series of standards in 1985 [1] as a means of sending messages of up to 160 characters (including spaces) [2] to and from GSM mobile handsets [3]. Multiuser data separation is a very important concept in multi-input multi-output (MIMO) communication systems, such as MIMO-OFDM [4]. In such systems, the data of several different users are transmitted by several transmitter antennas. The data of the users pass through a medium and are inevitably mixed with each other. The multiuser mixed data are received at the receiver by several antennas.
Fig. 2. The typical mixture model for BSS problem.
One of the tasks that must be performed in a MIMO communication system is removing the interference effect and separating the multiuser data. Independent component analysis (ICA) [5, 6] has been used by Khosravy et al. for the separation of multiuser data in a MIMO-OFDM system [7-9]. Here, we investigate the separation of the multiuser data of a short message system (SMS) by using Kullback-Leibler ICA [10]. The binary codes of the text messages are mixed together, and ICA is applied for the separation of the binary codes. The short length of the data is a problem for the application of ICA, which we solve by a technique called random binary tail insertion. The rest of the paper is organized as follows. Section 2 explains the proposed method, Section 3 discusses the experimental results, and finally Section 4 concludes the paper.
2
Short Messages Separation by ICA
This section explains how to apply Kullback-Leibler ICA for the separation of mixed short messages. ICA is one of the blind source separation (BSS) methods. First, we briefly explain BSS and Kullback-Leibler ICA as a BSS method. Then its application to mixed short messages is explained.
2.1 Blind Source Separation
The typical mixture model for the BSS problem is shown in Fig. 2. The mixing process is described as

x(t) = As(t) + n(t)    (1)

where the n-dimensional source vector s(t) = [s_1(t), · · · , s_n(t)]^T is mixed by an m × n mixing matrix A, and x(t) = [x_1(t), · · · , x_m(t)]^T is the m-dimensional observation vector. The blind source separation problem is, without prior knowledge of s(t) and A, to find an n × m separation matrix W such that the output of the separation process

Y(t) = Wx(t)    (2)

equals s(t), where Y(t) = [Y_1(t), · · · , Y_n(t)]^T. Combining Eqs. (1) and (2), we can write

Y(t) = Gs(t) + Wn(t)    (3)

where G = WA is the overall mixing/separating matrix. Under the noiseless scenario, we assume that Wn(t) can be approximated by zero. Thus the recovered sources are

Y(t) = Gs(t)    (4)

where, apart from permutation and scaling ambiguities, setting G to the identity matrix I is the target.
2.2 Kullback-Leibler ICA
Though there exist many BSS algorithms, the natural gradient learning algorithm [11] is chosen here for its hardware-friendly iterative processing structure. This algorithm is known as an information-theoretic approach. In fact, the statistical independence of the separated sources can be measured using the Kullback-Leibler (KL) divergence between the product of marginal densities ∏_{i=1}^{n} P_{Y_i}(Y_i) and the joint density P_Y(Y). The KL divergence measure is given by

D(P_Y(Y) ‖ ∏_{i=1}^{n} P_{Y_i}(Y_i)) = ∫ P_Y(Y) log [ P_Y(Y) / ∏_{i=1}^{n} P_{Y_i}(Y_i) ] dY    (5)

and independence is achieved if and only if this KL divergence is equal to zero, i.e.,

P_Y(Y) = ∏_{i=1}^{n} P_{Y_i}(Y_i)    (6)

This is equivalent to minimizing the mutual information

I(Y) = ∑_{i=1}^{n} H(Y_i) − H(Y)    (7)

where

H(Y_i) = −∫ P_{Y_i}(Y_i) log P_{Y_i}(Y_i) dY_i    (8)

and

H(Y) = −∫ P_Y(Y) log P_Y(Y) dY    (9)
are the marginal entropy and the joint entropy, respectively. The first information-theoretic approach was proposed in [10]; its iterative update equation for W is given by

W_{t+1} = W_t + μ[W_t^{−T} − g(Y)X^T]    (10)

where μ is a step size and g(·) is a nonlinear function defined as

g(Y) = [g(Y_1), · · · , g(Y_n)]^T,   g(Y_i) = −(∂/∂Y_i) log(P_{Y_i}(Y_i))    (11)

Later, to obtain faster convergence, the natural gradient learning algorithm (NGLA) was proposed in [11] by modifying Eq. (10) to

W_{t+1} = W_t + μ[I − g(Y)Y^T]W_t    (12)
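To make the update rule concrete, the following minimal Python sketch (ours, not part of the original paper) implements the natural-gradient iteration of Eq. (12). The tanh score function, step size, and iteration count are illustrative assumptions; the paper only defines g(·) implicitly through Eq. (11), and the bipolar binary data of the SMS application would call for a different score function.

```python
import numpy as np

def kl_ica_natural_gradient(X, mu=0.01, n_iter=2000, seed=0):
    """Natural-gradient iteration of Eq. (12): W <- W + mu*(I - E[g(Y)Y^T])W,
    with Y = W X. Here g(y) = tanh(y), a common score function for
    super-Gaussian sources; Eq. (11) leaves g generic."""
    m, T = X.shape
    rng = np.random.default_rng(seed)
    W = np.eye(m) + 0.01 * rng.standard_normal((m, m))
    I = np.eye(m)
    for _ in range(n_iter):
        Y = W @ X
        gY = np.tanh(Y)
        # empirical average of g(Y) Y^T over the T samples
        W = W + mu * (I - (gY @ Y.T) / T) @ W
    return W, W @ X

# toy usage with super-Gaussian (Laplacian) sources and an assumed 2x2 mixing matrix
rng = np.random.default_rng(1)
s = rng.laplace(size=(2, 1000))
A = np.array([[1.0, 0.6], [0.5, 1.0]])
W, Y = kl_ica_natural_gradient(A @ s)   # rows of Y approximate s up to permutation/scale
```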
2.3 The Proposed Technique
Normally, short messages are character data no longer than 160 characters, and their binary codes are short-length data too. To apply ICA for the separation of mixed short messages, their length is required to be longer; the longer the mixtures, the better the separation performance of ICA. To overcome the problem of the short length of SMS data, we use a simple trick: a tail of random binary data is attached to the binary code of each SMS message. By the insertion of this tail, the mixed data after the channel are longer and ICA can separate them efficiently. To make the technique optimal, the inserted binary tails are chosen so as to have the least correlation with each other, since lower correlation of the source data also leads to higher ICA performance. The additional tail is removed at the receiver side. The random binary tails with the least correlation can be prepared in advance; it is not essential to generate them in real time. In this way, by using the prepared binary tails, the system performs faster. Figure 3 shows the block diagram of the proposed technique.
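The following sketch (ours, not the authors' implementation) illustrates how low-correlation random binary tails could be prepared offline and later attached to and stripped from each user's message. It assumes a bipolar ±1 representation of the binary data and a simple greedy search over random candidates; the paper does not specify how the least-correlated tails are generated.

```python
import numpy as np

def make_uncorrelated_tails(n_users, tail_len, n_candidates=200, seed=0):
    """Pick one +/-1 tail per user so that pairwise normalized
    cross-correlations stay as small as possible (greedy search)."""
    rng = np.random.default_rng(seed)
    cands = rng.choice([-1, 1], size=(n_candidates, tail_len))
    tails = [cands[0]]
    for _ in range(1, n_users):
        # worst-case correlation of each candidate against the tails chosen so far
        worst = [max(abs(c @ t) / tail_len for t in tails) for c in cands]
        tails.append(cands[int(np.argmin(worst))])
    return np.stack(tails)

def attach_tail(bits, tail):
    """Append the prepared tail to a user's binary message (transmitter side)."""
    return np.concatenate([bits, tail])

def strip_tail(bits_with_tail, tail_len):
    """Remove the inserted tail after separation (receiver side)."""
    return bits_with_tail[:-tail_len]
```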
3
Experimental Results and Discussion
Here we evaluate the proposed technique. We try to separate the mixed binary codes of two different short messages from two different users. To this end, we have used two different text messages as the data of the two users, as follows:

– user 1: Have you ever been in Okinawa?
– user 2: ACN 2010 will be held in Miyazaki, Japan.

The messages are transferred to binary codes, and the above-mentioned random binary tails are attached after each user's binary code. The multiuser data then pass through a MIMO communication system, and we receive the mixed codes at the receiver side.
Fig. 3. Block diagram of the proposed technique
Kullback-Leibler ICA is applied over the mixed data to separate them. The permutation and scaling ambiguity of ICA is assumed to be solved. The separated data are then equalized to binary codes, the additional tails are removed, and the binary codes are transferred back to text, so that the text messages are obtained.
3.1 Effect of the Length of the Inserted Tail
Here, the original length of the binary data is 135 digits. We observed that without a random binary tail of at least 170 digits, ICA does not work. When it works, we receive the text messages without any error, and when it does not work we lose them completely. To evaluate the performance, we counted the percentage of successful separations. By increasing the length of the additional tail, this percentage increases. Table 1 shows the effect of the length of the additional tail on the performance of the proposed technique. As can be seen, by using an additional tail of length 170 digits, the technique works in 100 percent of the transmission runs.

Table 1. Effect of the length of the random binary tail without optimization

Tail length (digits):  145   150   155   160   165   170
Success percentage:     7%   20%   39%   90%   97%   100%
3.2
Effect of Optimizing the Random Binary Tail to Uncorrelated Sequences
The results obtained above correspond to random binary tails chosen without any constraint on being uncorrelated. Here we use the prepared random binary tails with the least correlation.
Table 2. Effect of the length of the random binary tail with optimization

Tail length (digits):  145   150   155   160   165   170
Success percentage:     7%   13%   57%   97%  100%   100%
From Table 2 it can be seen that the required tail length has decreased by 5 digits. In this way, by enforcing the least correlation between the random binary tails, the efficiency of the proposed technique is increased.
4
Conclusion
A new optimum technique for the efficient separation of short messages mixed in a multiuser short message system has been proposed. The proposed technique applies ICA to the mixed binary codes. To make ICA effective for the separation of mixed binary sequences, a random binary tail is inserted after the binary code of each user's SMS data at the transmitter side. The random binary tails make ICA more efficient not only by increasing the data length but also by decreasing the correlation between the binary sequences. To this end, prepared binary tails with the least correlation have been used. It has been shown that by using an additional binary tail of just 165 digits, the technique works in 100% of the runs.
References 1. GSM Doc 28/85, Services and Facilities to be provided in the GSM System, rev. 2 (June 1985) 2. LA Times: Why text messages are limited to 160 characters 3. GSM 03.40, Technical realization of the Short Message Service (SMS) 4. Sampath, H., Talwar, S., Tellado, J., Erceg, V., Paulraj, A.: A fourth-generation MIMO-OFDM broadband wireless system: design, performance, and field trial results. IEEE Commun. Mag. 40(9), 143–149 (2002) 5. Wong, C.S., Obradovic, D., Madhu, N.: Independent component analysis (ICA) for blind equalization of frequency selective channels. In: Proc. 13th IEEE Workshop Neural Networks Signal Processing, pp. 419–428 (2003) 6. Jutten, C., Herault, J.: Independent component analysis versus PCA. In: Proc. EUSIPCO, pp. 643–646 (1988) 7. Khosravy, M., Alsharif, M.R., Guo, B., Lin, H., Yamashita, K.: A Robust and Precise Solution to Permutation Indeterminacy and Complex Scaling Ambiguity in BSS-based Blind MIMO-OFDM Receiver. In: Adali, T., Jutten, C., Romano, J.M.T., Barros, A.K. (eds.) ICA 2009. LNCS, vol. 5441, pp. 670–677. Springer, Heidelberg (2009) 8. Khosravy, M., Asharif, M.R., Yamashita, K.: An Efficient ICA Based Approach to Multiuser Detection in MIMO OFDM Systems. In: 7th International Workshop on Multi-Carrier Systems and Solutions (MC-SS 2009), Herrsching, Germany, May 5-6. LNEE, vol. 41, pp. 47–56 (2009) 9. Khosravy, M., Asharif, M.R., Guo, B., Lin, H., Yamashita, K.: A Blind ICA Based Receiver with Efficient Multiuser Detection for Multi-Input Multi-Output OFDM Systems. In: The 8th International Conference on Applications and Principles of Information Science (APIS), Okinawa, Japan, pp. 311–314 (2009)
10. Bell, A.J., Sejnowski, T.J.: An information-maximization approach to blind separation and blind deconvolution. Neural Computation 7, 1129–1159 (1995) 11. Amari, S., Cichocki, A., Yang, H.H.: A new learning algorithm for blind signal separation. In: Advances in Neural Information Processing Systems, vol. 8, pp. 757–763 (1996)
Mahdi Khosravy received the B.S. degree (with honors) in electrical engineering (Bioelectric) from the Sahand University of Technology, Tabriz, Iran in 2002, and the M.Sc. degree in biomedical engineering (Bioelectric) from Beheshti University of Medical Sciences, Tehran, Iran in 2004. Since 2007, he has been working toward the Ph.D. degree in Interdisciplinary Intelligent Systems at the Department of Information Engineering, University of the Ryukyus, Okinawa, Japan. He has worked as a Research Assistant in the Digital Signal Processing Lab in connection with Field System Inc. Laboratories in Tokyo, Japan, to develop an algorithm to resolve multipath effects in the Sound Code project. His research interests lie in the areas of Blind Source Separation, MIMO Speech and Image Processing, MIMO Communication systems, Linear and nonlinear Digital filters (especially Morphological filters and adaptive filters), Medical Signal and Image processing, ECG Preprocessing and ECG arrhythmia detection. He has published two journal papers, four lecture notes and 15 conference papers. Mr. Khosravy is currently a scholar of the Monbukagakusho scholarship of the Japanese government. Mohammad Reza Asharif was born in Tehran, Iran, on December 15, 1951. He received the B.Sc. and M.Sc. degrees in electrical engineering from the University of Tehran, Tehran, in 1973 and 1974, respectively, and the Ph.D. degree in electrical engineering from the University of Tokyo, Tokyo, in 1981. He was Head of the Technical Department of IIRB College, Iran from 1981 to 1985. He was then a senior researcher at Fujitsu Labs. Co., Kawasaki, Japan from 1985 to 1992, and an assistant professor in the School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran from 1992 to 1997. Since 1997, Dr. Alsharif has been a full professor at the Department of Information Engineering, University of the Ryukyus, Okinawa, Japan. He has developed an algorithm and implemented its hardware for real-time T.V. ghost canceling. He introduced a new algorithm for an Acoustic Echo Canceller and released it on VSP chips. He has contributed many publications to journals and conference proceedings. His research topics of interest are in the field of Blind Source Separation, MIMO Speech and Image Processing, MIMO Communication systems, Echo Canceling, Active Noise Control and Adaptive Digital Filtering. He is a senior member of IEEE, and a member of IEICE. Katsumi Yamashita received the B.E. degree from Kansai University, the M.E. degree from Osaka Prefecture University and the Dr.Eng. degree from Osaka University in 1974, 1976 and 1985, respectively, all in electrical engineering. In 1982, he became an assistant professor at the University of the Ryukyus, where he became a professor in 1991. He is now a professor at Osaka Prefecture University. His current interests are in digital communication and digital signal processing. Dr. Yamashita is a member of the IEEE, IEICE, and IEEJ.
Multiple Asynchronous Requests on a Client-Based Mashup Page* Eunjung Lee and Kyung-Jin Seo Computer Science Department, Kyonggi University, San 94 Yi-ui Dong, Young-Tong Gu, Suwon, Kyunggy Do, South Korea [email protected], [email protected]
Abstract. This paper considers a client-based mashup, in which a page interacts with multiple service methods asynchronously. Browser systems execute callbacks when the corresponding reply arrives, potentially concurrently with user interface actions. In this case, callbacks and user interface actions share data memory and the screen. Furthermore, when the user sends multiple requests, the shared resource problem becomes more complex due to multiple callbacks. To solve the problem of multiple requests, we adopted the following approach. First, we modeled a mashup page with user actions and callbacks, and presented several types of callbacks. Second, we defined the concurrency conditions between callbacks and user actions in terms of shared resources. In addition, we proposed a serialization approach to guarantee the safe execution of callbacks. Finally, we applied the proposed concurrency condition to the XForms language, and extended an XForms browser to implement the proposed approach. The prototype implementation showed that the proposed approach enables a better user experience on mashup pages. Keywords: Web services, asynchronous service calls, REST, callbacks.
1 Introduction Mashup is a web development method that composes web resources and services to form an application with new functions and/or services [1,2]. With the growing popularity of open web services, mashup has been attracting attention as a new approach to software development [3,4]. In particular, client-side mashup is expected to be one of the most popular types of web applications for lightweight, easy-to-develop, and customized user interfaces for web services. There has also been active research on mashup development methods and tools [3,5,6]. REST is a lightweight protocol for web services, which has been accepted as major web service standard along with SOAP [7]. REST and Ajax are the most popular approaches currently used to develop mashup pages [8,9]. Asynchronous web service communication using Ajax allows the users to perform another task without waiting for the response, thus reducing the user’s waiting time as well as the server load. *
This work was partly supported by the Gyeonggi Regional Research Center.
There have been many studies on the currently available mashup techniques, and comparisons of the various tools available [2,10]. In addition, some papers have presented a new software development approach by composing web services as components [3,6]. Research on client-side mashup is active in the areas of data and user interfaces [5,6,10], but service communications and controls in the client pages have not received as much attention thus far. We inspect the executions of callbacks for asynchronous service calls, and present the problems that can result from executing callbacks concurrently with user actions. In addition, when multiple requests are allowed to be sent out at the same time, the condition of “data race” between callbacks can result. A callback of an asynchronous web service is responsible for storing the result to the local memory, and for notifying the user of the response. In addition, a callback might update or enable a view so that the user can check the response result. Therefore, callbacks share data memory and UI resources with user actions. In order to clarify the problems of callback processing, we define safe conditions for concurrent callbacks, and present an inspection method. The proposed inspection method is applied to XForms language to build a concrete algorithm. We also present a serialization method for potentially conflicting (not safe) callbacks, by delaying their execution until the user requests it explicitly. To verify the proposed approach, we implemented the method to the XForms browser. In the extended browser system, potentially conflicting callbacks are put in a waiting queue until the user requests their execution. This paper is organized as follows. Section 2 briefly examines the related research, and section 3 introduces a mashup page model with asynchronous request callbacks. Section 4 formally introduces the safe condition of callbacks and the inspection method. Section 5 presents the application of the proposed inspection method to XForms language, and the implementation of the callback serialization method by delaying potentially conflicting callbacks. In addition, it presents a prototype implementation to show that the callback serialization method helps users to handle multiple request callbacks. Section 6 concludes the paper.
2
Related Work
With the emergence of web 2.0 and the expansion of mashup over the web, research into mashup has recently become active. Many papers have derived various definitions and formal descriptions of mashup [1,10]. Through these studies, the idea of mashup gained credence in academic society. There have been many studies that have classified and analyzed real-world mashups through a formal approach. Wong et al. presented a classification of mashup [8]. Yu et al. studied the ecosystem of mashup, and found that a few major sites got most of the traffic [11]. Several papers compare and analyzed mashup development tools [2, 10]. There have been several proposals made in terms of classifying mashup types [10]. Mashup is either client or server-based, in terms of the place where the mashup happens. In addition, it is either row or flow type depending on the manner in which the service methods are connected. On the other hand, mashup is classified by the focus of the compositions, which may be either data-centered or user-interface-centered [10].
Mashup is considered a component-based development approach, in which services and methods are composed over the web. Some papers refer to the basic component of a mashup as a “mashlet,” and a web application might be developed by composing them as black boxes [5]. In addition, Taivalsaari et al. introduced Mashware, which is a next-generation RIA (Rich Internet application) [12]. Perez et al. proposed a web page design method that minimized communication overhead when multiple widgets were composed into a new application [13]. Mashup clients are often developed in web pages, and run on browser systems. Most browser systems support asynchronous web service communications [14,15] using Javascript API Ajax [8]. Ajax uses a separate thread to wait for the response of each request and to run its callback. Also, the Apache SOAP project developed the Axis web service framework, which supports asynchronous web services either by polling or by callbacks [15]. However, current asynchronous web service approaches assume that the client system handles one response at a time. When multiple requests are sent out at the same time, the responses arrive in a random order, and the current callback frameworks may result in unexpected situations. On the other hand, callback may conflict with user actions. Although there have been studies on the concurrency problem on communication layers or servers [17,18], to the best of our knowledge asynchronous web service requests and callbacks on mashup clients have not yet been studied in the literature. Since multiple requests are important when composing methods on mashup pages asynchronously, they are an important topic for the design and development of mashup applications.
3 Callback Activities of Mashup Pages

3.1 Mashup Page Model

A client-based mashup page allows users to send service requests and to handle the responses. For all service methods, the page not only provides a user interface but also contains local memory maintaining input parameters and output results. Generally, client-based mashup is lightweight and easy to develop, since there is no need to consider server-side development. In addition, it is easy to customize to user needs. In this paper, we call the units of mashup "service methods" (or simply "methods" if there is no confusion), and S is the set of methods in the page. Methods are assembled in a row style from RESTful services. Mashup pages have evolved from the traditional web pages in which submitting a request moves to a new page, as shown in Figure 1(a). Figure 1(b) shows the loosely connected mashup structure, in which the data and user interface of each method are independent from the others. Some mashups have a strongly connected page structure, as shown in Figure 1(c), in which the data and user interfaces are blended together. Although a great deal of research has focused on this type of page, most mashup pages on the web are in a loosely coupled style. One of the reasons is that development methods and tools are not yet sufficiently mature to aid in the development of strongly coupled pages, and therefore the development of complex client page code is expensive. Approaches that handle service methods as components are based on the loosely coupled architecture. Most client-side mashup tools are also based on this loosely coupled style [12].
Fig. 1. Structure Types of Mashup Pages: (a) page download for each service request, (b) loosely coupled mashup, (c) strongly coupled mashup
There are many protocols for web services, including SOAP, REST, and XML-RPC. REST is a lightweight and well-defined protocol, and has been adopted as the most popular web service protocol on the Internet, often called Open API or RESTful services [11]. RESTful services determine predefined method types and input/output data types, which provide essential guidelines for client-side mashup pages. In this paper, we assume mashup pages built on RESTful services and in a loosely coupled style.
3.2 Callback Behavior in Mashup Pages
Client mashup pages provide an interface for method calls and response processing. Page activities are triggered either by user actions or by callbacks. Users prepare input parameters and send requests through user actions (often through controls such as a submit button). Other user actions include selecting a view, checking local data and editing input parameters for the services. On the other hand, callbacks store the response results if they exist, notify users of the response, and sometimes invoke a view for users to check the result. Therefore, user actions and callbacks share memory and screen in potential concurrency. Figure 2 shows the page activities in terms of user actions (downward arrows) and callbacks (upward arrows). There are three types of operations in mashup pages: data operations for reading or writing the memory area, UI operations for updating or holding screens, and communication operations for sending and receiving service requests to/from servers. On a mashup page, the user might wait after sending out a request until the response arrives, or send out another request while doing other tasks. If the user sends out another request before the response to the previous one is received, multiple requests may be sent. Callbacks are normally run on separate threads, and are concurrent to user actions. Therefore, (1) callbacks and user actions can be in a data race condition due to concurrent data operations, and (2) UI operations might conflict by updating the screens concurrently. Many mashup pages prohibit multiple requests because of these potential conflicts. However, allowing multiple requests is necessary in some cases; for example, if service responses are delayed for some reason, or if the user is working in a disconnected environment, finishing more than one task and sending out all requests at once when connected.
Fig. 2. Page Activities by user action and callback
Supporting multiple requests is sometimes essential for enhancing the user experience. Some service methods need to communicate synchronously, depending on the application and the objects of the methods. For example, a reservation or payment should be run in a transaction, and users should respond immediately to exception states. Therefore, the mashup page should prevent users from sending another request before the finalization of such a method.
4 Testing Safe Callbacks for Multiple Requests

When a callback runs concurrently with other callbacks and user actions, the operations triggered by them may conflict with each other. This section discusses the potential conflicts in terms of operations on shared resources. Based on these observations, we propose a serialization method for potentially conflicting callbacks. In order to model the action processing of mashup pages, we define a page with the following views and actions. A page P consists of a set of service methods S^P, a set of views V^P, and a local memory M^P, which is shared by the service methods. In addition, P includes a set of actions A^P and a set of operations O^P. Then, a mashup page is represented as P = (S^P, V^P, M^P, A^P, O^P). We omit the superscript P if the context is clear. We represent user actions and callbacks as Auser and Acallback, where A = Auser ∪ Acallback and Auser ∩ Acallback = ∅. A user action or a callback triggers a sequence of operations. First, we define the effects of an operation in terms of memory and screen resources. Communication operations are not considered in this paper.
[Definition 1]. Let P = (S, V, M, A, O) be a mashup page, and R be the screen area in which P resides. For a given operation o ∊ O, we define the effects of o as follows:

i) read(o) ⊂ M is the memory area o reads,
ii) write(o) ⊂ M is the memory area o writes,
iii) hold(o) ⊂ R is the part of the screen R that should be held still during the operation, and
iv) update(o) ⊂ R is the part of the screen R which o updates.

Additionally, data_access(o) = read(o) ∪ write(o) and ui_access(o) = hold(o) ∪ update(o). This paper considers the screen area R as a shared resource, and the accessed area of R is represented as update(o) and hold(o). The area hold(o) is the part of the screen that should not be updated while the user is performing the operation o. For example, while the user is thinking of the next move in a chess game, the board should stay still until the user's next move. update(o) is the area an operation updates; an example is a "set focus" operation, in which invoking the view updates the screen area. We can define race conditions between two operations using these effects.

(1) Two operations o1, o2 ∊ O are in a data race condition if data_access(o1) ∩ write(o2) ≠ ∅ or write(o1) ∩ data_access(o2) ≠ ∅.
(2) Two operations o1, o2 ∊ O are in a screen race condition if ui_access(o1) ∩ update(o2) ≠ ∅ or ui_access(o2) ∩ update(o1) ≠ ∅.

If the two operations o1 and o2 are neither in a data race condition nor in a screen race condition, then we say they are safe to be run concurrently, denoted as o1 ║ o2. This means that there is no potential conflict between them in terms of shared resources. As a next step, let us consider concurrent actions. Let α be a user action and γ be a callback. They are represented as sequences of operations triggered by these actions:

α = (a1, a2, ..., an),   γ = (b1, b2, ..., bm),

where ai, bj ∊ O, 0 ≤ i ≤ n, 0 ≤ j ≤ m. We can extend the operation effect function write to a sequence of operations: write(α) = ∪_{0≤i≤n} write(ai) for an action α, and write(X) = ∪_{u∊X} write(u) for a set of operations X. Other effect functions can be extended in the same way. Now, we are ready to define concurrent actions.

[Definition 2]. A user action α = (a1, a2, ..., an) and a callback γ = (b1, b2, ..., bm) are safe to run concurrently, denoted as α ║ γ, if ai ║ bj for all i, j.
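As a small illustration of Definitions 1 and 2 (ours, not from the paper), the sketch below represents operation effects as Python sets and tests the two race conditions; the resource names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Effect:
    """Effects of one operation (Definition 1): memory areas and screen regions."""
    read: set = field(default_factory=set)
    write: set = field(default_factory=set)
    hold: set = field(default_factory=set)
    update: set = field(default_factory=set)

def data_race(o1, o2):
    da1, da2 = o1.read | o1.write, o2.read | o2.write
    return bool(da1 & o2.write) or bool(o1.write & da2)

def screen_race(o1, o2):
    ui1, ui2 = o1.hold | o1.update, o2.hold | o2.update
    return bool(ui1 & o2.update) or bool(ui2 & o1.update)

def safe_concurrent(action_ops, callback_ops):
    """Definition 2: every operation pair must be free of both race conditions."""
    return all(not data_race(a, b) and not screen_race(a, b)
               for a in action_ops for b in callback_ops)

# hypothetical example: the user edits input parameters while a callback
# writes its own output area and updates a separate result view
user_action = [Effect(write={"instance/input"}, hold={"edit-view"})]
callback = [Effect(write={"instance/output"}, update={"result-view"})]
print(safe_concurrent(user_action, callback))  # True: disjoint memory and views
```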
We can inspect the concurrency between a user action and a callback in terms of the shared data memory and screen area. If a given callback does not use memory to store the result where user actions write, there is no data race condition. Usually, user actions write on the input parameter area, and callbacks write in the output result area on mashup pages. [Lemma 1]. Let γ be a callback and U be a set of all operations triggered by user actions. If write(U) ∩ write(γ) = ∅ and ui_access(U) ∩ update(γ) = ∅, then the callback γ is safe.
Callbacks are initiated by the corresponding responses from the server, and usually run on separate threads. Therefore, callbacks might be concurrent to each other, and there might be race conditions between them. When two callbacks can run in any order, we say that they are serializable.

[Lemma 2]. For two callbacks γ1 and γ2, if update(γ1) ∩ update(γ2) = ∅ and write(γ1) ∩ write(γ2) = ∅, then γ1 and γ2 are serializable.

(Proof) For callbacks γ1 and γ2, we can assume read(γ1) ∩ write(γ2) = ∅ and read(γ2) ∩ write(γ1) = ∅. In other words, a callback does not access the data written by other callbacks. Therefore, it is possible to check only the write sets. On the other hand, since callbacks do not hold the user interface, we can assume hold(γ1) = hold(γ2) = ∅. Therefore, we can check concurrency considering only write(γ1) and write(γ2). □
Following the above discussion, we can obtain the condition for a callback to be safe to run concurrently.

[Theorem 1]. Let P = (S, V, M, A, O) be a client mashup page, and γ be a callback. Then, γ is safe if
(i) γ ║ α for all α ∈ Auser, and
(ii) γ is serializable with respect to all other callbacks.

This result can be applied to the design of mashup pages as useful patterns. We can categorize a service method μ ∈ S into one of the following: (1) the method μ should run synchronously, (2) the method μ should be guaranteed to be safe, or (3) the callback of μ needs no guarantee to be safe. This categorization is determined by the application and service characteristics. We can then design data and screen usage according to the concurrency conditions. While a request of case (1) is in progress, no other method should be allowed to be sent out. While the callback of a method in case (2) should be guaranteed to access shared resources exclusively, other methods are allowed to issue multiple requests. However, consideration should be given to the possibility of race conditions between callbacks. Sometimes, conflicts are allowed so that newer data overwrite older data. To protect against the conflicts caused by concurrent callbacks, we propose a serialization method for multiple requests. User actions are serialized by the user; therefore, we can serialize callbacks if they are run according to the user's explicit request. We call the method of delaying the execution of callbacks until the user explicitly asks for them callback serialization. In this way, we can guarantee the serial execution of callbacks and prevent race conditions.
5 Design and Implementation on XForms Browser System
5.1 Testing the Safe Concurrency Condition on XForms Language

We use the result of Theorem 1 to test safe concurrency. When the condition of Theorem 1 is not satisfied, we consider the callback to have potential conflicts. In this section, we apply the proposed method to the XForms language. XForms has an XML data model and is an appropriate language for developing web service interface pages [27]. Moreover, since the standard does not specify communication protocols, asynchronous communication can be supported by browser systems. The activities of a page are determined by well-defined action elements. To test the safety of a callback, we should be able to compute the operation effects triggered by the callback and by user actions. For general programming languages such as Javascript, the control structure is too complex to compute the operation effects. However, XForms uses predefined element types to describe operations, where each element type has predefined behavior in terms of memory access and UI updates. The XForms page has a local data model M with more than one XML instance tree, and we assume that there are separate instance trees for input parameters and output results. Therefore, if the set of trees for input parameters is Minput and the set of trees for output results is Moutput, then M = Minput ∪ Moutput, Minput ∩ Moutput = ∅. We designed the data model such that each service method has a separate output result tree. The XForms page consists of views, each of which corresponds to a <group> element. Each view contains rendering objects and UI controls. To simplify the problem, we assume that each view occupies the whole screen area, and views are stacked like cards. There are control elements for user actions Auser, <submit> and <trigger>. The <trigger> element triggers a sequence of data and UI operations, while <submit> triggers a sequence of data and UI operations ending with a service method request. Each control element includes operations shown as child elements (<setvalue> is an operation element for <submit> in Figure 3). The <submit> element has a corresponding submission id as an attribute. The <submission> element encloses all the information for the service method communication, including the service URL (action attribute in Figure 3), the input parameter data (ref attribute in Figure 3), and the location to store the result (ref attribute in Figure 3). Moreover, the <submission> element has operations executed by the callback, which appear as child elements (<setfocus> is an operation performed at the callback of the service method in Figure 3). XForms specifies predefined behavior for each operation element. Therefore, it is possible to compute the write() and update() functions, as shown in Table 1. In the table, the XPath π is the path for XML instances in the data model [28].
Fig. 3. XForms code part for edit data request

Table 1. Write and update functions for XForms operations

XForms element   Operands (attributes)   write(o)   update(o)
setvalue         π, value                π          –
insert           π, tag, value           π          –
delete           π                       π          –
setfocus         group id                –          corresponding view area
refresh          –                       –          corresponding view area
List 1 shows the algorithm to test the safeness of concurrent callbacks using Table 1. write_UI is the union of all the data areas that user actions write. The algorithm computes write_UI and write(σ) for all submissions σ while scanning the page code. Therefore, we can decide the concurrency of each submission statically.

[List 1] Callback safeness test algorithm
Input: all submission elements σ of an XForms page P
Output: a set of safe concurrent callbacks CS

(1) For the body part of the page,
    (1-1) write_UI ← ∅,
    (1-2) for all operations u which are descendants of the body element,
        (1-2-1) write_UI ← write_UI ∪ write(u).
(2) For each submission element σ,
    (2-1) write(σ) ← target attribute of the element σ, and update(σ) ← ∅,
    (2-2) for all descendant operations o of σ,
        (2-2-1) write(σ) ← write(σ) ∪ write(o),
        (2-2-2) update(σ) ← update(σ) ∪ update(o).
    (2-3) if write(σ) ∩ write_UI = ∅ and update(σ) = ∅, then add σ to CS.
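A minimal Python rendering of [List 1] is given below (our sketch, not the browser code). It assumes that the write and update sets of each operation have already been derived from the page, e.g. via Table 1, and are supplied as plain Python sets.

```python
def safe_submissions(body_ops, submissions):
    """body_ops: list of dicts with 'write' sets for operations reachable from
    user actions; submissions: {sid: {'target': node, 'ops': [...]}} for each
    <submission>, where ops are its callback operations.
    Returns the set CS of submissions whose callbacks are safe."""
    write_ui = set()
    for op in body_ops:                      # step (1): union of UI writes
        write_ui |= op.get("write", set())

    cs = set()
    for sid, sub in submissions.items():     # step (2): per-submission effects
        w = {sub["target"]}                  # instance node receiving the reply
        upd = set()
        for op in sub.get("ops", []):
            w |= op.get("write", set())
            upd |= op.get("update", set())
        if not (w & write_ui) and not upd:   # condition (2-3)
            cs.add(sid)
    return cs
```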
5.2 Callback Serialization on XForms Browser

In Section 4, we proposed a callback serialization method. Serialization is a feature that browser systems should support, so we implemented the proposed method on top of the XForms browser we previously developed [29]. Callbacks with potential conflicts are identified using the algorithm in List 1 while loading the page. When a response arrives for a callback with potential conflicts, the browser system stores the message in a message queue and notifies the user of its arrival. When the user selects the message in the queue, the browser performs the operations of the callback. We also divide a callback into two phases: (1) storing the response result at the specified location, and (2) performing the other callback operations. We implemented the asynchronous communication using the Java XmlHttpRequest API class, which has a similar function to Ajax [30]. The browser system creates an object of the extended class AsyncRequest and runs a thread to wait for the response. We implemented a message queue for callbacks with potential conflicts. In addition, the user interface has a callback notification panel to show the delayed callback messages and allow users to select a callback to be executed. When a response arrives, the thread does the following:

(1) If the callback is safe to be run concurrently, it runs the callback and finishes.
(2) If the callback has no data conflicts with UI operations and other callbacks, the result data is stored to the specified area, the message is put into the queue, and the thread finishes.
(3) If the callback is not safe to run concurrently, the message is put into the queue, and the thread finishes.
(4) The callback notification panel is refreshed when a new message is added.
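The dispatch policy described above can be summarized by the following sketch. It is illustrative only: the actual system is a Java XForms browser, and the submission attributes and method names used here (is_safe, store_result, and so on) are assumptions.

```python
import queue, threading

class CallbackDispatcher:
    def __init__(self):
        self.pending = queue.Queue()   # deferred (submission, response, stored?) items
        self.lock = threading.Lock()   # serializes access to shared page data

    def on_response(self, submission, response):
        if submission.is_safe:                      # case (1): run immediately
            with self.lock:
                submission.store_result(response)
                submission.run_ui_operations(response)
        elif submission.data_conflict_free:         # case (2): store phase now,
            with self.lock:                         # UI phase deferred
                submission.store_result(response)
            self.pending.put((submission, response, True))
        else:                                       # case (3): defer everything
            self.pending.put((submission, response, False))

    def run_selected(self, submission, response, stored):
        """Called when the user presses [Go] in the notification panel."""
        with self.lock:
            if not stored:
                submission.store_result(response)
            submission.run_ui_operations(response)
```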
Fig. 4. Callback notification panel screenshot
When the user asks for a callback to be run, the browser system runs the callback operations and deletes the message from the queue. Figure 4 shows a screenshot of the callback notification panel. At the top is the pending list of messages waiting for a response, and below it is the response list waiting for the execution of callbacks. Each message of the response list has two buttons, one for starting the execution of the callback and one for deleting the message from the queue.

5.3 Scenario Services and Implementation Result
We developed a schedule service to demonstrate the extended browser features. The web services provide RESTful style data management methods for the schedule data that resides in servers. The XForms page consists of several views, which are as shown in figure 5. View (a) shows “schedule list view,” which provides buttons for search, create view, detailed view and edit view. The last two buttons trigger a GET request for the detailed schedule data. View (b) is “edit view,” which provides Update or Delete method submissions. The server system is developed using the Rails framework [31], and a delayed reply was simulated in order to test multiple requests. The callback notification panel showing the waiting message queue is shown below.
In figure 5 (a), ‘show’ request is waiting for the response, and ‘edit’ message has been responded to and is waiting for the callback execution. In this case, [Go] button for 'edit' message invokes the view in (b) by <setfocus> operation to allow users to edit the returned list. In the view of Figure 5 (b), ‘update' message is added to the pending list. At this time, ’show' message is responded and added to the response list.
(a)
(b)
Fig. 5. Views of schedule service mashup page
6 Conclusion In this paper, we discussed the problems that can be caused by multiple asynchronous service requests and their callbacks in client-side mashup pages. We first showed that the execution of callbacks concurrently to user actions might cause shared resource problems in mashup pages. We formally defined the safe conditions for the concurrent execution of callbacks. In addition, we presented a test algorithm to determine the safeness of a callback. Our observations suggest meaningful design guidelines for mashup pages in terms of shared resources such as local memory and screen area. The algorithm is applied to XForms language, where we can identify callbacks with potential conflicts. In addition, we proposed a serialization method for callbacks with potential conflicts, and demonstrated the proposed approach on the XForms
browser system. The implemented browser system put the messages into a waiting queue if the callback is not safe to run concurrently. The user can then choose when to run the callback. The prototype implementation shows that the proposed serialization method is a valid approach to helping users to handle multiple requests. Callback concurrency is a well-known problem. Current mashup pages do not allow multiple requests because of the risk. Usually, developing and testing multiple requests and concurrency is expensive, and requires extensive effort to prevent conflicts or fine-tune the user experience. In this paper, we formally described the problem and presented safe conditions that are easy to test. Our work will assist developers in deciding when to give extra consideration to callbacks with potential conflicts, while providing a useful design pattern for mashup pages to enable safe callbacks to be run concurrently.
References 1. Yu, J., et al.: Understanding mashup development. IEEE Internet Computing 12(5), 44–52 (2008) 2. Koschmider, A., et al.: Elucidating the Mashup Hype: Definition, Challenges, Methodical Guide and Tools for Mashups. In: 2nd Workshop on Mashups, Enterprise Mashups and Lightweight Composition on the Web in conjunction with the 18th International World Wide Web Conference, pp. 1–9 (2009) 3. Taivalsaari: Mashware the future of web applications is software. Sun Labs Technical Report TR-2009-181 (2009) 4. Auer, S., et al.: Dbpedia: A nucleus for a web of open data. In: Aberer, K., Choi, K.-S., Noy, N., Allemang, D., Lee, K.-I., Nixon, L.J.B., Golbeck, J., Mika, P., Maynard, D., Mizoguchi, R., Schreiber, G., Cudré-Mauroux, P. (eds.) ASWC 2007 and ISWC 2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007) 5. Ennals, R., Garofalakis, M.: MashMaker: mashups for the masses. In: Proc. SIGMOD 2007, pp. 1116–1118. ACM Press, Beijing (2007) 6. Wong, J., Hong, J.: Making mashups with Marmite: Towards end-user programming for the web. In: Proceedings of the SIGCHI conference on Human factors in computing systems, San Jose, USA, pp. 1435–1444 (2007) 7. Fielding, R.T.: Architectural Styles and the Design of Network-Based Software Architectures. Doctoral dissertation, Dept. of Computer Science, Univ. of Calif., Irvine (2000) 8. Holdener III, A.: Ajax, the definitive guide. O’Reilly, Sebastopol (2008) 9. Richardson, L., Ruby, S.: RESTful Web Services. O’Reilly Media, Inc., Sebastopol (2007) 10. Lorenzo, G., Hacid, H., Paik, H., Benatallah, B.: Data integration in mashups. ACM Sigmod Record 38(1), 59–66 (2009) 11. Yu, S., Woodard, J.: Innovation in the programmable web: characterizing the mashup ecosystem. In: ICSOC 2008. LNCS, vol. 5472, pp. 136–147. Springer, Heidelberg (2009) 12. Linaje, M., Preciado, J.C., Sánchez-Figueroa, F.: A Method for Model-Based Design of Rich Internet Application Interactive User Interfaces. In: Baresi, L., Fraternali, P., Houben, G.-J. (eds.) ICWE 2007. LNCS, vol. 4607, pp. 226–241. Springer, Heidelberg (2007) 13. Perez, S., Diaz, O., Melia, S., Gomez, J.: Facing interaction-rich RIAs: the orchestration model. In: Proc. of 8th International Conference on Web Engineering (ICWE), pp. 24–37 (2008) 14. Firefox browser, http://www.mozilla.com/firefox/ 15. XForms processor from Forms Player, http://www.formsplayer.com
16. Apache group, Axis web services, http://ws.apache.org/axis/ 17. Brambilla, M., et al.: Managing asynchronous web services interactions. In: Proc. IEEE International conference on Web services, pp. 80–87 (2004) 18. Puustjarvi, J.: Concurrency control in web service orchestration. In: Proc. IEEE International conference on Computer and Information Technology (CIT 2008), pp. 466–472 (2008) 19. XForms 1.1 W3C Candidate Recommandation, http://www.w3.org/TR/xforms11/ 20. World-Wide Web Consortium standards including XML, XML Schema, and XPath 21. Yoo, G.: Implementation of XForms browser as an open API platform, MS thesis, Kyonggi University (2007) 22. Java2Script, http://j2s.sourceforge.net/ 23. Thomas, D., Hansson, D.H.: Agile Web Development with Rails, 2nd edn. Pragmatic Bookshelf (2006)
Using an Integrated Ontology Database to Categorize Web Pages Rujiang Bai, Xiaoyue Wang, and Junhua Liao Shandong University of Technology Library Zibo 255049, China {brj,wangxixy,ljhbrj}@sdut.edu.cn
Abstract. As we know, current classification methods are mostly based on the VSM (Vector Space Model), which only accounts for term frequency in the documents and ignores important semantic relationships between key terms. We propose a system that uses integrated ontologies and Natural Language Processing techniques to index texts, so that the traditional word-based matrix is replaced by a concept-based matrix. For this purpose, we developed fully automated methods for mapping keywords to their corresponding ontology concepts. A Support Vector Machine, a successful machine learning technique, is used for classification. Experimental results show that our proposed method does improve text classification performance significantly. Keywords: Text classification; ontology; RDF; SVM.
in language translation, information extraction and in search engines. The SENSEVAL-2 workshop (http://www.senseval.org/), in which 35 teams with a total of 90 systems participated, demonstrated that many of the different approaches yield good results and that current systems manage to achieve an accuracy of up to ninety percent, although most systems achieve a much lower accuracy. However, most of these systems do not solve the ontological indexing problem satisfactorily. Our goal is to propose an optimized algorithm that can efficiently transform the traditional 'Bag-of-Words' matrix into a 'Bag-of-Concepts' matrix based on the underlying RDF ontologies. This method can clearly improve the performance of text classification. The rest of the paper is organized as follows. Section 2 describes some preliminaries. Section 3 discusses our proposed method. The experimental setting and results are discussed in Sect. 4. We conclude our paper in Sect. 5.
2 Preliminaries

Definition 1. Controlled vocabulary CV
CV := named set of concepts c with c: (name, definition, identifier, synonyms)

Controlled vocabularies are named lists of terms that are well defined and may have an identifier. The elements of a controlled vocabulary are called concepts. In ontologies the concepts are linked by directed edges, thus forming a graph. The edges of an ontology specify in which way concepts are related to each other, e.g. 'is-a' or 'part-of'.

Definition 2. Ontology O
O := G(CV, E) with E ⊆ CV × CV and a totally defined function t: E → T, which defines the types of edges. T is the set of possible edge types, i.e. the semantics of an edge in natural language and its algebraic relational properties (transitivity, symmetry and reflexivity). All ontologies have an edge type 'is-a' ∈ T. If two concepts c1, c2 ∈ CV are connected by an edge of this type, the natural language meaning is 'c1 is a c2'.
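As an illustration only (not from the paper), Definitions 1 and 2 could be represented in code roughly as follows; the field and method names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Concept:                    # element of a controlled vocabulary (Def. 1)
    name: str
    definition: str = ""
    identifier: str = ""
    synonyms: frozenset = frozenset()

@dataclass
class Ontology:                   # directed graph with typed edges (Def. 2)
    name: str
    concepts: dict = field(default_factory=dict)   # identifier -> Concept
    edges: set = field(default_factory=set)        # (child_id, parent_id, type)

    def superconcepts(self, cid, edge_type="is-a"):
        return {p for c, p, t in self.edges if c == cid and t == edge_type}

    def subconcepts(self, cid, edge_type="is-a"):
        return {c for c, p, t in self.edges if p == cid and t == edge_type}
```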
3 Proposed Method

The aim is to develop a semantic text classifier that uses several mapped RDF ontologies for concept-based text indexing. The system consists of several components that together fulfil the required functionality (see Fig. 1). In the following, an overview of the different steps is given.

Importing and mapping ontologies. First, the different ontologies have to be imported into the system by using an RDF parser. Then the equivalent concepts of the different ontologies are mapped.

Documents preprocessing. This step includes tokenization, removal of special characters, dropping stopwords, etc.

Building the "Bag of Concepts" (BOC) matrix. In this step, the words in the text are linked to ontological concepts. In order to support word sense disambiguation (mouse as a pointing device vs. mouse as an animal), the context in which a word appears in a
text is compared to the context of homonymous concepts (subconcept vs. superconcept). In addition, word stemming and part-of-speech (POS) information is used. Stemming allows mapping words independently of their flection (i.e. conjugation and declension). POS information prevents that, for example, verbs and nouns are confused (the word milk in "to milk a cow" vs. "the milk is sour").

Input of the matrices to the text classifier. During the experiments, the BOC and the traditional BOW matrices are input to the text classifier separately.
Fig. 1. Overview of the system
3.1 Importing Ontologies

Since there are many different formats for storing ontologies, a common mechanism for exchanging them is needed. The RDF recommendation (http://www.w3.org/RDF/) released by the W3 consortium may serve as at least a common denominator. RDF is a data format based on XML that is well suited as an interchange format for ontologies and controlled vocabularies. RDF is widely accepted for data interchange, and RDF versions of the most important ontologies are available. For the work presented in this publication, WordNet [6], OpenCyc [7] and SUMO (http://ontology.teknowledge.com) were used. In order to parse and import ontologies into the database, we used the Jena API, v1.6.0 (McBride). It was developed by McBride and is promoted by HP under a BSD-style license for free. Importing an ontology consists of mainly three steps:
1. Convert the RDF file to a Jena model.
2. Read the configuration file (Fig. 2) for ontology import. A configuration file stores the individual properties of an ontology and maps the different notations for equivalent characteristics. Each ontology has a set of characterizing attributes. Beside name and language, the relation types of an entry are essential: synonym, isA, isPartOf, isComplementOf and isInstanceOf. This information can vary in different RDF files and must be inserted manually, according to the RDF source that is used (e.g. comment as description or comment). The UniqueId attribute configures the import to use complete URIs or concept names as concept IDs. New ontologies that are available in RDF can be imported simply by adding a new line to the configuration file.
3. Write the concepts and relations into the database. The concepts, relations and synonyms are written to the relational backend via the Java JDBC interface (Java Database Connectivity).
Fig. 2. Configuration file for ontology import
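A compressed sketch of this import flow is shown below. It is not the authors' code: the paper uses the Jena API and JDBC in Java, whereas this illustration uses Python's rdflib and sqlite3 and omits the configuration-driven mapping of relation types (step 2); the table schema is an assumption.

```python
import sqlite3
from rdflib import Graph

def import_ontology(rdf_path, ontology_name, db_path="ontology.db"):
    g = Graph()
    g.parse(rdf_path, format="xml")            # step 1: parse the RDF/XML file

    con = sqlite3.connect(db_path)             # step 3: relational backend
    con.execute("CREATE TABLE IF NOT EXISTS concept(id TEXT, ontology TEXT)")
    con.execute("CREATE TABLE IF NOT EXISTS relation("
                "subj TEXT, pred TEXT, obj TEXT, ontology TEXT)")
    for subj, pred, obj in g:                  # write concepts and relations
        con.execute("INSERT INTO concept VALUES (?, ?)",
                    (str(subj), ontology_name))
        con.execute("INSERT INTO relation VALUES (?, ?, ?, ?)",
                    (str(subj), str(pred), str(obj), ontology_name))
    con.commit()
    con.close()
```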
3.2 Mapping Ontologies

For the work in this publication, equivalent concepts in the different ontologies had to be aligned. For aligning ontologies several approaches exist:

(a) Linguistically similar concept definitions.
(b) Linguistically similar concept names and synonyms of the concepts.
(c) Manual and tool-supported concept mapping. Several tools that support manual ontology mapping exist, such as Chimera [8], FCA-Merge [9], SMART [10] and PROMPT [11]. Some of these tools are reviewed in [12].
(d) Comparing the concept context between the ontologies to be aligned (same name in subconcept and/or superconcept).

Considering the size of the ontologies, it was decided to use a fully automated method for ontology alignment. This alignment procedure should be conservative, i.e. only equivalent concepts should be aligned, even if not all equivalent concepts could be found in the different ontologies. Approach (d) seems to be best suited for this purpose. In order to define how equivalent concepts are found, we have to define homographic concepts first. Two concepts c are homographic if they have the same name and belong to different ontologies O.

Definition 3. Homographic concepts H
H := {(c1, c2) | c1 ∈ O1 ∧ c2 ∈ O2 ∧ O1 ≠ O2 ∧ (name(c1) = name(c2) ∨ name(c1) ⊆ synonyms(c2) ∨ synonyms(c1) ∩ synonyms(c2) ≠ ∅)}
where name(c) is a function that returns the concept name, and synonyms(c) is a function that returns the set of all synonyms of a concept. Equivalent concepts are concepts that represent the same real-world entity. Homographic concepts may still contain words that refer to different real-world entities, such as mouse (pointing device) vs. mouse (animal). For the scope of this work, concepts are considered to be equivalent if they are homographic and if their sub- or superconcepts are homographic:

Definition 4. Equivalent concepts S
S := {(c1, c2) | c1 ∈ O1 ∧ c2 ∈ O2 ∧ O1 ≠ O2 ∧ (subconcepts(c1, t) and subconcepts(c2, t) are homographic ∨ superconcepts(c1, t) and superconcepts(c2, t) are homographic)}

where superconcepts(c, t) and subconcepts(c, t) are functions that return the set of all direct super- and sub-concepts of a given concept (with relation type t).
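For illustration, the alignment test of Definitions 3 and 4 could be coded as below (our sketch; it builds on a simple Concept/Ontology representation like the one sketched earlier, whose attribute names are assumptions, and it treats name(c1) ⊆ synonyms(c2) as a membership test).

```python
def homographic(c1, o1, c2, o2):
    """Definition 3: same name or overlapping synonyms, different ontologies."""
    if o1.name == o2.name:
        return False
    return (c1.name == c2.name
            or c1.name in c2.synonyms
            or bool(c1.synonyms & c2.synonyms))

def equivalent(c1, o1, c2, o2, edge_type="is-a"):
    """Definition 4: aligned if the direct sub- or superconcepts are homographic."""
    def neighbours(o, ids):
        return [o.concepts[i] for i in ids if i in o.concepts]

    def any_homographic(a_set, b_set):
        return any(homographic(a, o1, b, o2) for a in a_set for b in b_set)

    subs = any_homographic(neighbours(o1, o1.subconcepts(c1.identifier, edge_type)),
                           neighbours(o2, o2.subconcepts(c2.identifier, edge_type)))
    sups = any_homographic(neighbours(o1, o1.superconcepts(c1.identifier, edge_type)),
                           neighbours(o2, o2.superconcepts(c2.identifier, edge_type)))
    return subs or sups
```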
The context wcont of a word w is defined as the set of stems of all the words that occur in the same document as w. The context does not contain stopwords, which have already been filtered out. The ontological indexing process maps words in the text to concepts of the ontologies. For each individual word, we calculate a mapping score ms(w, c), which indicates how well the word w maps to the ontological concept c, by comparing wcont and ccont. For the prototypical implementation of the system, we count the number of elements that occur in both contexts, i.e. no difference is made between synonyms, subconcepts and superconcepts. This number is divided by the context size, in order to compensate for different context sizes. In addition, part-of-speech information is taken into account, and word stemming is used for the comparison of words and concepts.

Definition 6. Mapping score ms
ms(w, c) := |wcont ∩ ccont| / |ccont|.    (1)
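A direct transcription of Equation (1) as Python code (our sketch; the contexts are assumed to be given as sets of stems):

def mapping_score(wcont, ccont):
    """ms(w, c) = |wcont ∩ ccont| / |ccont| (Definition 6)."""
    return len(wcont & ccont) / len(ccont) if ccont else 0.0

# toy contexts of a word and a concept
print(mapping_score({"comput", "keyboard", "screen"}, {"comput", "screen", "devic"}))  # 2/3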
Mapping algorithm:

// W: list of all words in the document that is indexed
W := {w | w in document}
// loop through all words in the document
for i = 1 to |W|
    /* Loop through the set of all ontologies O and create a list C with all
       concepts c where the stem of the concept name or the stem of a synonym of
       the concept equals the stem of the word, and where the part-of-speech of
       the concept equals the part-of-speech of the word */
    C := {};
    for k = 1 to |O|
        L := {c | c ∈ Ok}
        for m = 1 to |L|
            if stem(wi) = stem(name(c)) OR stem(wi) ∈ stem(synonyms(c)) then
                if POS(wi) = POS(c) then
                    C := C ∪ {c}
        next m
    next k
    // maximum score
    mxs := 0
    // index of the best scoring concept
    mxj := 0
    for j = 1 to |C|
        if ms(wi, cj) > mxs then
            mxs := ms(wi, cj)
            mxj := j
    next j
    // map the best scoring concept to the current word
    map(wi, cmxj, mxs)
next i

where stem(w) returns the stemmed representation of a word w and POS(w) returns the part-of-speech information for a word w. This mapping algorithm is simplified in order to keep it easy to understand. The implemented algorithm also handles situations where several concepts obtain the same score for a given word; in this case, the word is mapped to both concepts. When equivalent concepts of different ontologies match, the word is only mapped to the concept of the largest ontology. When no concept is found for a word in any of the ontologies, a new concept is added along with the POS information; however, this concept has no relations to any other concepts. The introduced method for context-based ontological indexing of documents requires word stemming and POS tagging, so appropriate methods and tools for both had to be selected. In this work we decided to use QTag 3.1 [13] for several reasons: good accuracy (97%), performance (several MB of text per second on a 2400 MHz Pentium), a Java API, and availability free of charge for non-commercial purposes. QTag uses a variant of the Brown/Penn-style tagsets [14]. QTag is in principle language independent, although the current release only comes with resource files for English and German; if there is a need to use it with other languages, it can be trained on pre-tagged sample texts to create the required resource files. For word stemming, different approaches exist. We used an implementation of the Paice/Husk algorithm [15], also known as the Lancaster algorithm, which is an affix-removal stemmer. Such stemmers are widely used and different implementations are available; we used the Java adaptation of the original implementation from Lancaster University, as it is the best-known implementation.
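A minimal Python sketch of this mapping loop follows; the concept structures, the stemmer and the POS tagger are crude stand-ins for the ontology database, the Lancaster stemmer and QTag used by the authors, and all names and toy data are our own.

from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    synonyms: set = field(default_factory=set)
    context: set = field(default_factory=set)   # stems of related concept names/synonyms
    pos: str = "NN"

def stem(word):
    # stand-in for a Lancaster (Paice/Husk) stemmer
    return word.lower().rstrip("s")

def pos(word):
    # stand-in for a QTag-style POS tagger; here every word is tagged as a noun
    return "NN"

def mapping_score(word_context, concept):
    # Definition 6: ms(w, c) = |wcont ∩ ccont| / |ccont|
    return len(word_context & concept.context) / len(concept.context) if concept.context else 0.0

def index_document(words, ontologies):
    """Map each word of a document to its best-scoring ontological concept."""
    word_context = {stem(w) for w in words}   # wcont (stopwords assumed already removed)
    mapping = {}
    for w in words:
        # candidate concepts: same stem (name or synonym) and same POS as the word
        candidates = [c for onto in ontologies for c in onto
                      if (stem(w) == stem(c.name) or stem(w) in {stem(s) for s in c.synonyms})
                      and pos(w) == c.pos]
        if candidates:
            best = max(candidates, key=lambda c: mapping_score(word_context, c))
            mapping[w] = (best.name, mapping_score(word_context, best))
    return mapping

# toy usage with two hypothetical homographic concepts
onto = [Concept("mouse", {"computer mouse"}, {"keyboard", "screen", "computer"}),
        Concept("mouse", {"house mouse"}, {"cat", "cheese", "animal"})]
print(index_document(["computer", "mouse", "keyboard"], [onto]))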
4 Experiments
4.1 Data Sets
The main goal of this research is to increase the classification accuracy through the "Bag of Concepts" model. We used three real data sets in our experiments: Reuters-21578 [16], OHSUMED [17], and 20 Newsgroups (20NG) [18]. For Reuters-21578, following common practice, we used the ModApte split (9603 training and 3299 test documents) and two category sets: the 10 largest categories, and the 90 categories with at least one training example and one test example. OHSUMED is a subset of MEDLINE that contains 348,566 medical documents; each document contains a title, and about two-thirds (233,445) also contain an abstract. Each document is labeled with an average of 13 MeSH categories (out of 14,000 in total). Following [19], we used a subset of documents from 1991 that have abstracts, taking the first 10,000 documents for training and the next 10,000 for testing. To limit the number of categories for the experiments, we randomly generated 5 sets of 10 categories each. 20 Newsgroups (20NG) is a well-balanced data set containing 20 categories of 1000 documents each.
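For reference, a ModApte-style split of Reuters-21578 can be loaded, for example, through NLTK's copy of the corpus; this is only our loading sketch, not the authors' setup, and it requires the corpus to be downloaded first.

from nltk.corpus import reuters   # requires: import nltk; nltk.download('reuters')

train_ids = [f for f in reuters.fileids() if f.startswith("training/")]
test_ids = [f for f in reuters.fileids() if f.startswith("test/")]
print(len(train_ids), len(test_ids))   # ApteMod training/test document counts

# the 10 largest categories by number of training documents
sizes = {c: sum(1 for f in reuters.fileids(c) if f.startswith("training/"))
         for c in reuters.categories()}
top10 = sorted(sizes, key=sizes.get, reverse=True)[:10]
print(top10)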
4.2 Experimental Results
A linear Support Vector Machine (SVM) [19] is used to learn a model to classify documents. We measured text categorization performance using the precision-recall break-even point (BEP). For the Reuters and OHSUMED data sets, we report both micro-F1 and macro-F1, since their categories differ substantially in size. Micro-F1 operates at the document level and is primarily affected by the categorization performance on the larger categories, whereas macro-F1 averages results over categories, so small categories have a large impact on the overall performance. Following established practice, we used a fixed data split for the Reuters and OHSUMED data sets, and consequently used the macro sign test (S-test) to assess the statistical significance of differences in classifier performance. For the 20NG data set, we performed fourfold cross-validation and used a paired t-test to assess significance.
Results. As a general finding, the results obtained in the experiments suggest that our proposed method typically achieves better classification for both macro- and micro-F1 when concept-based features are used. Fig. 3 summarizes the results of the experiments for the different data sets with the best macro-F1 values.
Fig. 3. Different data sets with the best macro-F1 values (BOW vs. BOC)
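For reference, micro- and macro-averaged F1 can be computed as follows; the labels below are toy values, not the paper's data.

from sklearn.metrics import f1_score

# toy multi-class predictions; the categories differ in size
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 0, 2]

# micro-F1 aggregates decisions over all documents (dominated by large categories)
print("micro-F1:", f1_score(y_true, y_pred, average="micro"))
# macro-F1 averages per-category F1 (small categories weigh as much as large ones)
print("macro-F1:", f1_score(y_true, y_pred, average="macro"))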
Fig. 4 summarizes the results of the experiments for the different data sets with the best micro-F1 values. Compared to the BOW representation model, the results show that in all data sets the performance can be improved by including conceptual features, peaking at a relative improvement of 6.5% for macro-F1 and 4.2% for micro-F1 on the OHSUMED data set.
Fig. 4. Different data sets with the best micro-F1 values (BOW vs. BOC)
The relative improvements achieved on OHSUMED are generally higher than those achieved on the Reuters-21578 corpus and 20NG. This makes intuitive sense, as the documents in the OHSUMED corpus are taken from the medical domain. Documents from this domain typically suffer from problems such as synonymous terms and multi-word expressions, which the BOC representation can handle efficiently. Another reason is that our ontology database is not sufficient to disambiguate word senses and to generalize the text representation properly on the Reuters-21578 corpus and 20NG. The results of the significance tests allow us to conclude that the improvements in macro-averaged F1 are higher than in micro-averaged F1, which suggests that the additional concepts are particularly helpful for special-domain data sets.
5 Conclusions
In this paper we presented a new ontology-based methodology for the automated classification of documents. To improve text classification, we enrich documents with related concepts and perform explicit disambiguation to determine the proper meaning of each polysemous concept expressed in the documents. By doing so, background knowledge can be introduced into the documents, which overcomes the limitations of the BOW approach. The experimental results demonstrate the effectiveness of our approach. In future work, we plan to improve our ontology mapping algorithm. Our ontology database will include a relation graph for each concept, comprising synonyms, hyponyms and associative concepts; such a graph can be useful for achieving an improved disambiguation process.
Acknowledgments. This work was supported in part by the Shandong Natural Science Foundation of China, Project # ZR2009GM015.
References 1. Kehagias, A., et al.: A comparison of word- and sense-based text categorization using several classification algorithms. Journal of Intelligent Information Systems 21(3), 227–247 (2003) 2. Moschitti, A., Basili, R.: Complex Linguistic Features for Text Classification: A Comprehensive Study. In: McDonald, S., Tait, J.I. (eds.) ECIR 2004. LNCS, vol. 2997, pp. 181– 196. Springer, Heidelberg (2004)
3. Sahlgren, M., Cöster, R.: Using Bag-of-Concepts to Improve the Performance of Support Vector Machines in Text Categorization. In: Proceedings of the 20th International Conference on Computational Linguistics, pp. 487–493 (2004) 4. Mihalcea, R., Moldovan, D.: An iterative approach to word sense disambiguation. In: Proceedings of the Thirteenth International Florida Artificial Intelligence Research Society (FLAIRS), Orlando, Florida, USA. AAAI Press, Menlo Park (2000) 5. Voorhees, E.: Natural language processing and information retrieval. In: Pazienza, M.T. (ed.) Information Extraction: Towards Scalable, Adaptable Systems, pp. 32–48. Springer, New York (1999) 6. Fellbaum, C.: WordNet: An Electronic Lexical Database, Language, Speech, and Communication. MIT Press, Cambridge (1998) 7. Lenat, D.B., Guha, R.V.: Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Addison-Wesley Pub. Co., Reading (1989) 8. McGuinness, D.L., Fikes, R., Rice, J., Wilder, S.: An Environment for Merging and Testing Large Ontologies. In: Proceedings of the Seventh International Conference on Principles of Knowledge Representation and Reasoning (KR 2000), Breckenridge, Colorado (April 2000)
Topic Detection by Topic Model Induced Distance Using Biased Initiation
Yonghui Wu1,2, Yuxin Ding2, Xiaolong Wang1,2, and Jun Xu2
1 Harbin Institute of Technology, Harbin, People's Republic of China
2 Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, People's Republic of China
{yhwu,wangxl}@insun.hit.edu.cn, [email protected], [email protected]
Abstract. Clustering is widely used in topic detection tasks. However, vector space model based distances, such as the cosine-like distance, yield low precision and recall when the corpus contains many related topics. In this paper, we propose a new distance measure: the Topic Model (TM) induced distance. Assuming that the word distribution differs across topics, the documents can be treated as samples of a mixture of k topic models, which can be estimated using expectation maximization (EM). A biased initiation method is proposed in this paper for topic decomposition using EM, which generates a converged matrix used to derive the TM induced distance. Collections of web news are clustered into classes using this TM distance. A series of experiments is described on a corpus containing 5033 web news documents from 30 topics. K-means clustering is performed on test sets with different topic numbers, and a comparison of clustering results using the TM induced distance and the traditional cosine-like distance is given. The experimental results show that the proposed topic decomposition method using biased initiation is more effective than topic decomposition using random values. The TM induced distance generates more topical groups than the VS model based cosine-like distance. On web news collections containing related topics, the TM induced distance achieves better precision and recall. Keywords: Topic detection, topic model, clustering, distance measure.
1 Introduction
Ever since the Internet became an important part of our lives, Internet users have been facing a huge amount of news articles. Users prefer well structured or classified information, which makes it convenient to find desired topics quickly and easily. Nowadays, search engines are popular tools for searching for concrete concepts that can be described by "key words". However, many users just want to know the news topics and get an outline of a topic. This problem, which is known as "topic detection" in
Topic Detection and Tracking (TDT) [16] research, is also addressed in the comparative text mining (CTM) [14] task. Clustering algorithms are used to cluster the text stream from web news into different topic clusters to help users follow recent topics. The representation of text is the key point of clustering. The Vector Space (VS) model is widely used for representing text documents, where news documents are represented as bags of words. The similarity between documents is obtained from the VS model using different similarity measures, such as the cosine method, and these similarity values are then used in the clustering process to optimize an objective function derived from the representation. In K-means clustering, the optimization objective is the minimum squared representation error. However, optimizing the representation error does not always generate optimal topical classes, as a result of the limited ability of the VS model. In order to capture long-range topical dependency features, language models (LM) are introduced. The collection of web news streams is treated as a single collection in LM based topical clustering. Assuming the word distribution is different in each topic, the word distribution of a topic can be treated as a topic-based language model (TM), and each document can be treated as a sample of a mixture of k TMs. The parameters of a TM can be estimated by any estimator; the EM method is widely used for TM parameter estimation [14][3][10]. The parameter estimation of TMs is also called "topic decomposition" [3], in which the sequences of words in each document are assigned different topical probabilities according to the evidence accumulated from the document collection during the EM iteration. This method is widely used in the "topical structure mining" problem, which aims to find the structure of a certain topic; the testing corpus there is a collection of web news talking about the same topic. In this paper, we propose a distance measure for clustering based on the TM induced topical distance for the "topic detection" problem. EM based TM clustering is used to decompose the news document collection and construct a converged measure matrix, which is then used to obtain the corresponding TM induced distance. Instead of using random values to initialize the parameters, we propose a biased initiation method, which is more efficient. The collection of web news is then clustered into classes using the TM induced distance. The testing corpus is a collection of 5033 web news documents from 30 topics, from which we randomly select test sets containing different numbers of topics and documents. The clustering results show that the TM induced distance achieves better performance than the VS model based distance. When the data set contains related topics, the TM induced distance can achieve better precision and recall, which are always lower with the cosine-like distance. The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 contains the formulation of the topic model and the EM iteration steps using a biased initiation, as well as the topic model induced distance. In
Section 4 we present the experimental results. Conclusions and future work are described in Section 5.
2 Related Work
Much work has been done on text clustering [11][5]. Most text clustering methods are based on the VS (or bag-of-words) model using a cosine-like distance [1][12]; a series of useful distance measures can be found in [15]. Usually, the definition of a topic is more complex than the definition of a category in text clustering, so when clustering is used for topic detection, more topically rich features have to be extracted to achieve better performance. The topic detection problem is addressed both in the Topic Detection and Tracking task and in the Comparative Text Mining task, and clustering methods are widely used in related research on topic detection. In order to capture more topical features, topic models are used in the topic structure mining (or topic analysis) problem, and many TMs have been proposed for mining topic structure. H. Li and K. Yamanishi [8] proposed a method using an EM based mixture model to analyze topic structure. C. Zhai, A. Velivelli and B. Yu [14] proposed a novel text mining problem, Comparative Text Mining, in which a cross-collection mixture model is used to mine topic structure by considering the background distribution over all collections; this model is also used for mining topic evolution theme patterns in [10]. Sentence-level models are then used in evolutionary theme pattern mining [9] and hot topic extraction [2]. B. Sun et al. [13] proposed a mutual information based method for topic segmentation. However, there are problems. First, in the topic structure mining problem, the testing corpus is selected from a series of web news on a related topic with the real topic number unknown, which makes the evaluation of TMs a non-trivial task. In this paper, we gathered a collection of web news with topic tags labeled by hand. Instead of directly using the converged matrix, we define a new distance measure on the matrix; clustering can then be applied based on this distance, and the clustering result can be evaluated using the F-measure. Second, the EM method is widely used as the estimator in TM clustering, and most EM iterations start from randomly initialized parameter values, which may cause the likelihood to converge to a local maximum. The method used in [10] sets an empirical threshold to select the most salient topics. In this paper, we propose a biased initiation method to replace the random values, which is more efficient than random values with an empirical threshold. To the best of our knowledge, the work most related to ours is [7]. In that research, a user-centered approach is proposed to evaluate topic models. The corpus is a collection of web news, and the topics labeled by two users are used as the evaluation baseline. As an evaluation method, the topic number kt is known, being the number judged by the users. The collection of news is clustered into classes using K-means with a fixed k = kt, and the TMs are recomputed in each K-means loop using a statistical method.
In our research, the number of topics is unknown. The collections of news are decomposed by a mixture model based on the EM method, and the number of TMs is selected automatically by the EM iteration process. Using the TM induced distance defined on the converged matrix, standard K-means is used to compare the two different distance measures. Different values of k are used in K-means; the values near the real topic number are presented, as this is the most useful range for the topic detection problem.
3 Clustering on the TM Induced Distance
3.1 Topic Detection from Web News
The TDT defines a topic as "a seminal event or activity, along with all directly related events and activities" [16]. In this research we focus on topic detection from web news. A topic is defined as a group of web news articles talking about the same event or activities. The topic detection problem then becomes a clustering problem: clustering the web news stream into groups, with each group containing the web news talking about the same topic.
3.2 Formulation of a Uni-gram TM
By assuming that the word distribution is different in each topic, the word distribution of a topic can be treated as a topic model (TM). Then, each document can be treated as a sample of a mixture of k TMs. Let D = {d1, d2, . . . , dn} be a collection of documents, T the set of k topics in this collection, W the word set of this collection, (θ1, . . . , θk) the corresponding k TMs, and Di the set of documents talking about θi. Then the probability that a word w ∈ W occurs in D can be defined as:

P(w|D) = Σ_{t∈T} P(w|t) P(t|D).    (1)

Here P(w|t) is the topic-specific word probability, which denotes the occurrence probability of word w in the TM of topic t, and P(t|D) is the mixture parameter of the different topics t. This is the simple mixture TM proposed by Daniel Gildea [3]. TMs with different mixture patterns have been proposed in related research; by considering background knowledge, Zhai et al. [14] proposed the cross-collection mixture model for P(w|Di), which mixes a background model (with weight λB) with collection-specific topic models (with weight 1 − λB). Here, we use the simple mixture model described by Equation 1 as an example; other TMs can be used in our topic detection method by changing the EM iteration equations.
Using the EM method, the TM defined by Equation 1 can be estimated as follows.
E-step: using the mixture parameter P(t|d) and the probability P(w|t) of word w in TM t from the prior iteration, estimate the probability that word w from document d ∈ D belongs to TM t ∈ T:

P^n(t|w, d) = P^{n−1}(w|t) P^{n−1}(t|d) / Σ_{t′∈T} P^{n−1}(w|t′) P^{n−1}(t′|d).    (2)

M-step: adjust the parameters:

P^n(w|t) = Σ_d count(w, d) P^n(t|w, d) / Σ_{w′} Σ_d count(w′, d) P^n(t|w′, d),    (3)

P^n(t|d) = Σ_w count(w, d) P^n(t|w, d) / Σ_{t′} Σ_w count(w, d) P^n(t′|w, d).    (4)
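A compact NumPy sketch of Equations (2)–(4) is given below, assuming a dense document-by-word count matrix; for brevity it uses random initialization, whereas the paper's biased initiation (Algorithm 1 below) would replace that step. Variable names and toy data are ours.

import numpy as np

def em_topic_mixture(counts, k, n_iter=100, seed=0):
    """EM for the simple mixture TM of Eq. (1): counts is a |D| x |W| word-count matrix."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_t = rng.random((k, n_words)); p_w_t /= p_w_t.sum(axis=1, keepdims=True)   # P(w|t)
    p_t_d = rng.random((n_docs, k));  p_t_d /= p_t_d.sum(axis=1, keepdims=True)   # P(t|d)
    for _ in range(n_iter):
        # E-step (Eq. 2): P(t|w,d) proportional to P(w|t) P(t|d)
        resp = p_t_d[:, :, None] * p_w_t[None, :, :]          # shape (d, t, w)
        resp /= resp.sum(axis=1, keepdims=True) + 1e-12
        weighted = counts[:, None, :] * resp                   # count(w,d) * P(t|w,d)
        # M-step (Eq. 3): P(w|t)
        p_w_t = weighted.sum(axis=0)
        p_w_t /= p_w_t.sum(axis=1, keepdims=True) + 1e-12
        # M-step (Eq. 4): P(t|d)
        p_t_d = weighted.sum(axis=2)
        p_t_d /= p_t_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_t, p_t_d

# toy usage: 4 documents over a 5-word vocabulary, 2 topics
counts = np.array([[3, 2, 0, 0, 1],
                   [4, 1, 0, 1, 0],
                   [0, 0, 3, 2, 1],
                   [0, 1, 4, 3, 0]], dtype=float)
p_w_t, p_t_d = em_topic_mixture(counts, k=2)
print(np.round(p_t_d, 2))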
As can be seen from Section 2, many TMs have been proposed in related research, but most of them are used in the "topic structure mining" problem, which aims to construct an evolution graph of a topic. The testing corpora there are web news talking about the same topic; for example, a search result for tsunami is used in [10]. EM based TM clustering is used to mine the structure of the topics, and the related research on TM clustering for topic structure mining shows that this method can detect topic evolution patterns effectively.
3.3 Topic Decomposition Using Biased Initial Values
Usually the EM iteration of a TM starts from a series of randomly initialized parameters. In the iteration of the TM defined by Equation 1, the E-step computes the probability that a word w in document d is generated by TM θi; in the M-step, the parameters P(w|t) and P(t|d) are adjusted. In order to reduce the running complexity, on-line or approximation methods can be used in the iteration. In our experiments, random values are not used. The initial values of P(w|t) and P(t|d) are set to 1/|W| and 1/|T| respectively, and an initial iteration step is added to the EM iteration, which generates the biased parameter values. These values then converge to a stable distribution in the following EM iterations. We find that this method generates better TMs than random initial values. The EM iteration process using the biased initial values is shown in Algorithm 1. In the biased initiation process, we select a document di for each ti ∈ T and assume that the distribution of words P(ti|w, di) is the converged distribution of words in ti (P(wi|ti)). Then, the parameter P(w|t) can be adjusted using the following equation:

P^n(w|t) = Σ_d count(w, d) P^n(t|w, d) / (|D| · Σ_{w′} count(w′, di) P^n(t|w′, di)).    (5)
Since the EM iteration is processed by assuming that the document di is a mixture of |T| topic models, using the distribution of w ∈ di as the real distribution of a TM is more reasonable than using random values. We call this initial process a biased initiation. The biased initial values will converge to the real distribution as the EM iteration proceeds over the document set.

Algorithm 1. EM Iteration Using Biased Initial Values
Input: P(w|t), P(t|d), |D|, |T|, ε, loopmax, loopbias
Output: converged P(w|t), P(t|d)
    P(w|t) := 1/|W| ;
    P(t|d) := 1/|T| ;
    select a biased di for each ti ∈ T ;
    while not end of loopbias do
        E-step using Equation 2 ;
        biased adjustment of P(w|t) using Equation 5 ;
        adjust P(t|d) using Equation 4 ;
    end
    while not end of loopmax and error above ε do
        E-step using Equation 2 ;
        adjust P(w|t) using Equation 3 ;
        adjust P(t|d) using Equation 4 ;
    end
    return P(w|t), P(t|d) ;
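A loose Python sketch of the bias phase of Algorithm 1 follows. The paper leaves some details unspecified (how the seed documents are chosen, and how they are seeded in the first E-step), so the seed rule and the full assignment of each seed document to its topic are our assumptions.

import numpy as np

def biased_init(counts, k, n_bias_iter=5):
    """Bias phase of Algorithm 1, using Eq. (2), Eq. (5) and Eq. (4)."""
    n_docs, n_words = counts.shape
    p_w_t = np.full((k, n_words), 1.0 / n_words)   # P(w|t) := 1/|W|
    p_t_d = np.full((n_docs, k), 1.0 / k)          # P(t|d) := 1/|T|
    seeds = np.argsort(counts.sum(axis=1))[-k:]    # assumed seed rule: the k longest documents
    for _ in range(n_bias_iter):
        # E-step (Eq. 2)
        resp = p_t_d[:, :, None] * p_w_t[None, :, :]
        resp /= resp.sum(axis=1, keepdims=True) + 1e-12
        # assumption: treat each seed document d_i as generated entirely by its topic t_i
        for t, d_i in enumerate(seeds):
            resp[d_i] = 0.0
            resp[d_i, t] = 1.0
        weighted = counts[:, None, :] * resp       # count(w,d) * P(t|w,d)
        # biased adjustment of P(w|t) (Eq. 5): normalizer built from the seed document only
        for t, d_i in enumerate(seeds):
            denom = n_docs * weighted[d_i, t].sum() + 1e-12
            p_w_t[t] = weighted[:, t].sum(axis=0) / denom
        # adjust P(t|d) (Eq. 4)
        p_t_d = weighted.sum(axis=2)
        p_t_d /= p_t_d.sum(axis=1, keepdims=True) + 1e-12
    # the returned values serve as the starting point for the standard EM of Eqs. (2)-(4)
    return p_w_t, p_t_d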
3.4 Formulation of the TM Distance
Instead of directly using the converged matrix, we define a distance on it. The TM induced topic distance is derived from the converged TMs of the EM iteration. When the EM iteration converges, we obtain the mixture weight P(ti|dj) of TM θi for each document dj. Then, using the TMs as a k-dimensional coordinate system, each document dj can be represented as a k-dimensional vector:

V(dj) = (P(t1|dj), . . . , P(tk|dj)).    (6)

The TM distance between documents di and dj can then be calculated as the sum of squared errors (squared Euclidean distance) in the TM space:

D(di, dj) = ||V(di) − V(dj)||².    (7)
Using this TM distance, we can obtain the similarity matrix of the web news collection. From Equation 4 we can see that the TM induced distance takes into account the word distribution in the different TMs, which makes it more discriminative for topic detection. To demonstrate the effect of the TM distance on the topic detection problem, we compare it with the cosine-like distance widely used in text clustering.
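A short sketch of the TM induced distance of Eq. (7), computed from a converged P(t|d) matrix (for example the one returned by the EM sketch above); names and toy values are ours.

import numpy as np

def tm_distance_matrix(p_t_d):
    """Pairwise TM induced distance D(di, dj) = ||V(di) - V(dj)||^2 over rows of P(t|d)."""
    diff = p_t_d[:, None, :] - p_t_d[None, :, :]   # shape (|D|, |D|, k)
    return (diff ** 2).sum(axis=2)

# toy usage: 3 documents, 2 topic models
p_t_d = np.array([[0.9, 0.1],
                  [0.8, 0.2],
                  [0.1, 0.9]])
print(tm_distance_matrix(p_t_d))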
The cosine-like distance used in this research is defined by the following equation, which is also used in [4] to compute LexRank values:

cosine(di, dj) = Σ_{w∈di∩dj} tf_{w,di} · tf_{w,dj} · (idf_w)² / ( √(Σ_{w∈di} (tf_{w,di} · idf_w)²) · √(Σ_{w∈dj} (tf_{w,dj} · idf_w)²) ).    (8)

3.5 Parameters in TM Clustering for Topic Detection
In this research, a uni-gram TM is used as an example, which only considers the occurrence counts of words. Words that occur in too many web news documents carry little topical information and can be dropped by a threshold. Usually, the maximum document frequency (DF) DFmax and the minimum document frequency DFmin are widely used in Information Retrieval: words with a DF above DFmax or below DFmin are dropped. From the viewpoint of topic representation, words occurring in too many topics contain little topical information, so DFmax is used in both distance measure methods. The minimum DF is not used, since many topic-specific words have a low DF. We use the same DFmax for both the cosine-like distance and the TM induced distance, set to 0.35 according to comparison experiments using different DFmax values. There is another parameter in the decomposition process using TMs: the predefined number of TMs kt. As an unsupervised method, the number of topics in the collection of web news is unknown, so kt is usually set to a reasonably large value. As the EM iteration converges, the word probabilities P(w|t) and the mixture weights P(t|d) drop to zero for some TMs; when the probability value is very low, the TM is dropped. In the experiments we use (1/|D|) Σ_{d∈D} P(t|d) to drop TMs with low probability. Thus, a suitable value of kt can be selected from the iteration.
4 Experiments and Results
4.1 Testing Corpus
The corpus used in the experiments is a collection of web news pages, in total 5033 documents from 30 topics. The web news was collected from the special news columns of major portal sites in China using our web crawler. News pages in the special news columns are grouped by the reported topic, which makes it easy to label the web pages with topics. The number of pages in each topic can be seen in Fig. 1. The web news was preprocessed before being used in the experiments: HTML script, punctuation, numbers and other non-informative text were removed using a purify program, and the advertisement text in the web news was
removed by hand. The texts were then processed by our Chinese word segmentation and Named Entity Recognition system ELUS. The ELUS system uses a trigram model with a smoothing algorithm for Chinese word segmentation and a maximum entropy model for Named Entity Recognition; in the third SIGHAN 2006 bakeoff, the ELUS [6] system was the best in the MSRA open test. In the following experiments, we use balanced test sets with topics randomly sampled from the corpus; in each randomly sampled topic, 30 web news documents are randomly selected.
Fig. 1. Web Page Numbers in Different Topics
4.2 Convergence of the EM Iteration Using Biased Initiation
In the first series of experiments, test sets with different topic numbers ranging from 5 to 10 are randomly sampled to test the convergence speed of the EM iteration using the biased initiation method. The squared error defined in Equation 9 is used to test the convergence of the EM iteration. Fig. 2 shows that, using the biased initiation method, the EM iteration converges in fewer than 100 loops.

error_n = Σ_{d∈D} Σ_{t∈T} (P^n(t|d) − P^{n−1}(t|d))².    (9)
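The convergence criterion of Eq. (9) is simply the summed squared change of the P(t|d) matrix between two iterations; a one-line helper (our code):

import numpy as np

def squared_change(p_t_d_new, p_t_d_old):
    # Eq. (9): sum over documents and topics of the squared change in P(t|d)
    return float(((p_t_d_new - p_t_d_old) ** 2).sum())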
4.3 Automatic Selection of kt
In this experiment, we select a reasonably large value for kt and drop the low-probability TMs so that the iteration converges to a suitable kt. The threshold for dropping a TM is defined as:

λ = (1/|D|) Σ_{d∈D} P(t|d).    (10)

λ can be used to control the converged number of TMs.
Fig. 2. Convergence speed of EM using biased initiation for different |T|
4.4 Comparison of Biased Initiation and Random Values
We proposed the biased initiation method for topic decomposition. In order to compare our method with topic decomposition using random values, we use the two methods to decompose a test set with 9 topics into different numbers of TMs. A maximum iteration number of 150 is used for both methods. Then, using the TM induced distance derived from the converged matrices of the two methods, the web news is clustered into different numbers of classes ranging from 4 to 20 using K-means. The best clustering results of the two methods over 1000 runs of K-means are compared using the F-measure. Fig. 3 shows the comparison of the two methods; from it we can see that parameter estimation using biased initiation is better than using random values, since the parameter matrix converges quickly. We only plot the results for TM numbers in {6, 9, 10, 14} to keep the figure clear.
4.5 Comparison of TM Distance and Cosine Distance
Different values of kt can be obtained by setting different thresholds λ. For each kt we then obtain the corresponding TM induced distance, which is used in K-means clustering and compared with the cosine-like distance. Fig. 4 shows the F-measure of the clustering results for different numbers of converged TMs on a test set containing 9 topics, using 1000 runs of K-means. Both distances use the same randomly selected centers in each K-means run, but the assignment of points and the generation of new centers are based on the corresponding distance matrix. The clustering results are then evaluated using the F-measure. Clustering using the TM distance with kt below the real topic number is also plotted in Fig. 4.
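Because the TM induced distance of Eq. (7) is the squared Euclidean distance between the P(t|d) vectors, K-means can be run directly on those vectors; a hedged sklearn sketch (our code, not the authors' implementation):

import numpy as np
from sklearn.cluster import KMeans

# p_t_d: converged |D| x k_t matrix of P(t|d), e.g. from the EM sketch above (toy values here)
p_t_d = np.array([[0.9, 0.1], [0.8, 0.2], [0.15, 0.85], [0.1, 0.9]])

# K-means with squared Euclidean distance on the topic-mixture vectors
# is equivalent to clustering with the TM induced distance of Eq. (7).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(p_t_d)
print(labels)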
Fig. 3. Comparison of Biased Initiation and Random Values
We can see that the TM induced distance generates more topical clusters than the cosine-like distance for values of k in K-means near the real topic number. However, if the dropping threshold λ is set to a higher value, the EM iteration converges to a kt lower than the real topic number (kt = {4, 6}). The clustering results with kt less than the real topic number are not as good as the clustering with a larger kt (kt = {14, 18}).
Fig. 4. Comparison of TM Distance with Cosine Distance for |T| = 9
4.6 Precision and Recall in Each Class
As the harmonic mean of precision and recall, the F-measure is widely used to compare the efficiency of a whole clustering result; the F-measure results were described in the previous subsection. In this part, we give the comparison of precision and recall in each class using the two distances.
Here, we use the best clustering result on the 9-topic test set with k = 9 in K-means and kt = 14 in the TM decomposition. The comparison of precision, recall and F-measure in each class using the two distances can be seen in Table 2; the class label for each class is given in Table 1. Since the topics are randomly selected from the 30 topics, some of them belong to a more general category: classes 2, 5 and 9 are topics about disasters, and classes 3 and 7 are topics about diseases. From Table 2 we can see that, using the cosine-like distance, the clustering results for related topics are not as good as for the topics less related to other topics (classes 1, 4, 6). A better precision and recall can be achieved on these related topics {3, 7}, {2, 5, 9} using the TM induced distance.

Table 1. Topics in the Clustering Result
Class ID  Topic
1   President Election in America
2   Heavy Snow Disaster in China
3   Bird Flu in China
4   Prime Minister Resigned in Japan
5   Boat Sank in Egypt
6   Death of Michael Jackson
7   H1N1 in China
8   SanLu Milk Scandal
9   Earthquake in Sichuan Province of China
Table 2. Comparison of Precision and Recall in Each Class (classes 1–9)
Using the EM based topic decomposition method with biased initiation, we obtain the TM induced distance on two further test sets with 15 and 20 topics. The experimental results can be seen in Fig. 5 and Fig. 6, and we reach the same conclusion as from Fig. 4.
Fig. 5. Comparison of TM Distance with Cosine Distance for |T| = 15
Fig. 6. Comparison of TM Distance with Cosine Distance for |T| = 20
From the experiments we can see that topic detection using the TM induced distance is more effective than using the cosine-like distance as the number of topics and pages grows.
5 Conclusion
In this paper, we proposed a new distance measure method using EM based TM clustering, which is widely used in topic structure mining. A biased initiation method is proposed to replace the random values used in related research. The TM induced distance is defined based on the converged measure matrix. Using
this TM induced distance, the web news collection can be clustered into different topical groups. A collection of 5033 web news documents from 30 topics was gathered from the special news columns of Chinese websites, with the topical tags labeled by hand. The TM induced distance and the cosine-like distance are compared on this corpus using K-means clustering. To eliminate random factors from the comparison, the clustering based on the two distances shares the same initial centers in each K-means loop, and the F-measure is used to evaluate the clustering results of the two distances. The experiments show that the proposed topic decomposition method using biased initiation is more effective than topic decomposition using random values. Using the converged matrix we obtain the TM induced distance, which generates more topical clusters than the cosine-like distance. The efficiency of our method is most visible on data sets containing related topics, where the TM induced distance achieves better precision and recall.
Acknowledgments. This investigation was supported by the National Natural Science Foundation of China (No. 60703015 and No. 60973076).
References 1. Allan, J., Papka, R., Lavrenko, V.: On-line new event detection and tracking. In: SIGIR 1998: Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval, pp. 37–45 (1998) 2. Chen, K.Y., Luesukprasert, L., Chou, S.T.: Hot Topic Extraction Based on Timeline Analysis and Multidimensional Sentence Modeling. IEEE Trans. on Knowl. and Data Eng., 1016–1025 (2007) 3. Gildea, D., Hofmann, T.: Topic-Based Language models using EM. In: Proceedings of the 6th European Conference on Speech Communication and Technology, pp. 109–110 (1999) 4. Erkan, G., Radev, D.R.: LexRank: Graph-based Lexical Centrality as Salience in Text Summarization. J. Artif. Int. Res., 457–479 (2004) 5. Jain, A.K., Merty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput. Surv., 264–323 (1999) 6. Jiang, W., Guan, Y., Wang, X.: A Pragmatic Chinese Word Segmentation System. In: Proceedings of the Fifth SIGHAN Workshop on Chinese Language Processing, Sydney, pp. 189–192 (2006) 7. Kelly, D., D´ıaz, F., Belkin, N.J., Allan, J.: A User-Centered Approach to Evaluating Topic Models. In: McDonald, S., Tait, J.I. (eds.) ECIR 2004. LNCS, vol. 2997, pp. 27–41. Springer, Heidelberg (2004) 8. Li, H., Yamanishi, K.: Topic analysis using a finite mixture model. In: Proceedings of the 2000 Joint SIGDAT conference on Empirical methods in natural language processing and very large corpora., pp. 35–44. ACM, NJ (2000) 9. Liu, S., Merhav, Y., Yee, W.G.: A sentence level probabilistic model for evolutionary theme pattern mining from news corpora. In: SAC 2009: Proceedings of the 2009 ACM symposium on Applied Computing, pp. 1742–1747. ACM, New York (2009)
10. Mei, Q., Zhai, C.: Discovering evolutionary theme patterns from text: an exploration of temporal text mining. In: KDD 2005: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pp. 198–207. ACM, New York (2005) 11. Michael, S., George, K., Vipin, K.: A Comparison of Document Clustering Techniques. In: KDD 2000, 6th ACM SIGKDD International Conference on Data Mining, Sydney, pp. 109–110 (2000) 12. Pons-Porrata, A., Berlanga-Llavori, R., Ruiz-Shulcloper, J.: Topic discovery based on text mining techniques. Inf. Process. Manage., 752–768 (2007) 13. Sun, B., Mitra, P., Giles, C.L., Yen, J., Zha, H.: Topic segmentation with shared topic detection and alignment of multiple documents. In: SIGIR 2007: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 199–206. ACM, New York (2007) 14. Zhai, C., Velielli, A., Yu, B.: A cross-collection mixture model for comparative text mining. In: KDD 2004: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 743–748. ACM, Seattle (2004) 15. Zobel, J., Moffat, A.: Exploring the similarity space. In: SIGIR Forum, pp. 18–34. ACM, New York (1998) 16. The 2004 Topic Detection and Tracking Task Definition and Evaluation Plan (2004), http://www.nist.gov/speech/tests/tdt/
Mining Significant Least Association Rules Using Fast SLP-Growth Algorithm
Zailani Abdullah1, Tutut Herawan2, and Mustafa Mat Deris3
1 Department of Computer Science, Universiti Malaysia Terengganu
2 Department of Mathematics Education, Universitas Ahmad Dahlan, Indonesia
3 Faculty of Information Technology and Multimedia, Universiti Tun Hussein Onn Malaysia
[email protected], [email protected], [email protected]
Abstract. The development of least association rule mining algorithms is very challenging in data mining. Complexity and excessive computational cost have always been the main obstacles, compared to mining frequent rules. Indeed, most previous studies still adopt Apriori-like algorithms, which are very time consuming. To address this issue, this paper proposes a scalable trie-based algorithm named SLP-Growth. This algorithm generates the significant patterns using an interval support and determines their correlation. Experiments with real datasets show that the SLP-Growth algorithm can discover highly positively correlated and significant least association rules. Indeed, it also outperforms the fast FP-Growth algorithm by up to a factor of two, thus verifying its efficiency. Keywords: Least association rules; Data mining; Correlated.
require excessive computational cost, very limited attention has been paid to discovering highly correlated least ARs. For both frequent and least ARs, the degree of correlation may differ. Highly correlated least ARs are itemsets whose frequency does not satisfy the minimum support but which are very highly correlated. ARs are classified as highly correlated if the correlation is positive and at the same time fulfils a predefined minimum degree of correlation. Recently, statistical correlation techniques have been widely applied to transaction databases [3] to find relationships among pairs of items, i.e. whether they are highly positively or negatively correlated. In reality, it is not necessarily true that frequent items are positively correlated as compared to least items. In this paper, we address the problem of mining least ARs with the objective of discovering significant least ARs that are surprisingly highly correlated. A new algorithm named Significant Least Pattern Growth (SLP-Growth) is proposed to extract these ARs. The proposed algorithm imposes an interval support to capture the complete family of least itemsets first, before constructing a significant least pattern tree (SLP-Tree). A correlation technique for finding the relationship between itemsets is also embedded in this algorithm. The remainder of this paper is organized as follows. Section 2 describes related work. Section 3 explains the basic concepts and terminology of ARs mining. Section 4 discusses the proposed method. This is followed by a performance analysis through two experiment tests in Section 5. Finally, conclusions and future directions are reported in Section 6.
2 Related Work
Numerous works have been published on discovering scalable and efficient methods for frequent ARs; however, only little attention has been paid to mining least ARs. As a result, ARs that rarely occur in the database are always filtered out by the minimum support-confidence threshold. In the real world, rare ARs also provide significant and useful information for experts, particularly for detecting highly critical and exceptional situations. Zhou et al. [4] suggested an approach to mine ARs by considering only infrequent itemsets; its limitation is that the Matrix-based Scheme (MBS) and Hash-based Scheme (HBS) algorithms face the expensive cost of hash collisions. Ding [5] proposed the Transactional Co-occurrence Matrix (TCOM) for mining association rules among rare items; however, the implementation of this algorithm is too costly. Yun et al. [2] proposed the Relative Support Apriori Algorithm (RSAA) to generate rare itemsets; the challenge is that if the minimum allowable relative support is set close to zero, it takes a similar time as Apriori. Koh et al. [6] introduced the Apriori-Inverse algorithm to mine infrequent itemsets without generating any frequent rules; its main constraints are that it suffers from too many candidate generations and a high time consumption when generating rare ARs. Liu et al. [7] proposed the Multiple Support Apriori (MSApriori) algorithm to extract rare ARs; in actual implementation, this algorithm still suffers from the "rare item problem". Most of the proposed approaches [2,4−7] use a percentage-based approach to improve the performance of existing single minimum support based approaches. Brin et al. [8] presented objective measures called lift and chi-square as correlation measures for ARs. Lift compares the frequency of a pattern against a baseline
frequency computed under the statistical independence assumption. Besides lift, quite a number of interestingness measures have been proposed for ARs. Omiecinski [9] introduced two interestingness measures based on the downward closure property, called all-confidence and bond. Lee et al. [10] proposed two algorithms for mining all-confidence and bond correlation patterns by extending the frequent pattern-growth methodology. Han et al. [11] proposed the FP-Growth algorithm, which breaks the two bottlenecks of the Apriori series of algorithms; currently, FP-Growth is one of the fastest and most popular algorithms for frequent itemset mining. This algorithm is based on a prefix-tree representation of the database transactions (called an FP-tree).
3 Basic Concepts and Terminology
3.1 Association Rules (ARs)
ARs were first proposed for market basket analysis to study customer purchasing patterns in retail stores [1]. Recently, they have been applied in various disciplines such as customer relationship management [12] and image processing [13]. In general, association rule mining is the process of discovering associations or correlations among itemsets in transaction databases, relational databases and data warehouses. There are two subtasks involved in ARs mining: generating the frequent itemsets that satisfy the minimum support threshold, and generating strong rules from the frequent itemsets. Let I be a nonempty set such that I = {i1, i2, …, in}, and D a database of transactions where each T is a set of items such that T ⊂ I. An association rule has the form A ⇒ B, where A, B ⊂ I such that A ≠ φ, B ≠ φ and A ∩ B = φ. The set A is called the antecedent of the rule and the set B the consequent. An itemset is a set of items, and a k-itemset is an itemset that contains k items. An itemset is said to be frequent if its support count satisfies a minimum support count (minsupp); the set of frequent k-itemsets is denoted Lk. The support of an AR is the ratio of transactions in D that
contain both A and B (or A ∪ B); the support can also be considered as the probability P(A ∪ B). The confidence of an AR is the ratio of transactions in D containing A that also contain B; the confidence can also be considered as the conditional probability P(B|A). ARs that satisfy the minimum support and confidence thresholds are said to be strong.
3.2 Correlation Analysis
A few years after the introduction of ARs, Aggrawal et al. [14] and Brin et al. [8] realized the limitations of the confidence-support framework. Many studies have shown that the confidence-support framework alone is insufficient for discovering interesting ARs; therefore, correlation can be used as a complementary measure of this framework. This leads to correlation rules of the form

A ⇒ B  (supp, conf, corr).    (1)
The correlation rule is measured based on the minimum support, the minimum confidence and the correlation between itemsets A and B. There are many correlation measures applicable to ARs; one of the simplest is lift. The occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A)P(B); otherwise itemsets A and B are dependent and correlated. The lift between the occurrences of itemsets A and B can be defined as:

lift(A, B) = P(A ∩ B) / (P(A)P(B)).    (2)

Equation (2) can be rewritten to produce the following definitions:

lift(A, B) = P(B|A) / P(B),    (3)

or

lift(A, B) = conf(A ⇒ B) / supp(B).    (4)
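To make Equations (2)–(4) concrete, here is a small sketch computing support, confidence and lift from a list of transactions (toy data, our code):

def support(itemset, transactions):
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)

def confidence(A, B, transactions):
    return support(set(A) | set(B), transactions) / support(A, transactions)

def lift(A, B, transactions):
    # Eq. (4): lift(A, B) = conf(A => B) / supp(B)
    return confidence(A, B, transactions) / support(B, transactions)

transactions = [{"1", "4", "5"}, {"1", "4"}, {"1", "2", "4"}, {"3", "5"}, {"1", "4", "2"}]
print(lift({"1"}, {"4"}, transactions))   # > 1 indicates positive correlation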
The strength of the correlation is measured by the lift value. If lift(A, B) = 1, or P(B|A) = P(B) (or P(A|B) = P(A)), then A and B are independent and there is no correlation between them. If lift(A, B) > 1, or P(B|A) > P(B) (or P(A|B) > P(A)), then A and B are positively correlated, meaning the occurrence of one implies the occurrence of the other. If lift(A, B) < 1, or P(B|A) < P(B) (or P(A|B) < P(A)), then A and B are negatively correlated, meaning the occurrence of one discourages the occurrence of the other. Since the lift measure is not downward closed, it will not suffer from the least item problem; thus, least itemsets with low counts that by chance occur a few times (or only once) together can produce enormous lift values.
3.3 FP-Growth
The main bottleneck of Apriori-like methods is the candidate-set generation and test. This problem was resolved by introducing a compact data structure called the frequent pattern tree, or FP-tree. FP-Growth was then developed based on this data structure and is currently a benchmark and one of the fastest algorithms for frequent itemset mining [11]. FP-Growth requires two scans of the transaction database: first, it scans the database to compute a list of frequent items sorted in descending order, eliminating rare items; second, it scans the database to compress it into an FP-Tree structure and mines the FP-Tree recursively by building conditional FP-Trees. A simulation dataset [15] is shown in Table 1. First, the algorithm sorts the items in the transaction database and removes infrequent items. Say the minimum support is set to 3; then only the items f, c, a, b, m, p are kept. The algorithm scans the transactions from T1 to T5. For T1, it prunes the transaction down to its frequent items in frequency-descending order, e.g. {f, c, a, m, p}. The algorithm then compresses this transaction into a prefix tree in which f becomes the root. Each path in the tree represents a set of transactions with the same prefix. This process is executed recursively until the end of the transactions. Once the complete tree has been built, the subsequent pattern mining can be performed easily (a code sketch of this construction is given after Table 1).
Table 1. A Simple Transaction Data
TID  Items
T1   a, c, m, f, p
T2   a, b, c, f, l, m, o
T3   b, f, h, j, o
T4   b, c, k, s, p
T5   a, f, c, e, l, p, m, n
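A compact sketch of the two-pass FP-tree construction just described, run on the transactions of Table 1 with a minimum support of 3 (our code; the tie-breaking order among equally frequent items and the node layout may differ from the original implementation):

from collections import Counter, defaultdict

class Node:
    def __init__(self, item, parent=None):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}

def build_fp_tree(transactions, minsup):
    # pass 1: count item frequencies and keep only frequent items
    freq = Counter(i for t in transactions for i in t)
    frequent = {i for i, c in freq.items() if c >= minsup}
    order = lambda t: sorted((i for i in t if i in frequent),
                             key=lambda i: (-freq[i], i))
    # pass 2: insert each pruned, frequency-ordered transaction into the prefix tree
    root, header = Node(None), defaultdict(list)
    for t in transactions:
        node = root
        for item in order(t):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)
            child.count += 1
            node = child
    return root, header

# the transactions of Table 1
transactions = [list("acmfp"), list("abcflmo"), list("bfhjo"), list("bcksp"), list("afcelpmn")]
root, header = build_fp_tree(transactions, minsup=3)
print({item: sum(n.count for n in nodes) for item, nodes in header.items()})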
4 Methodology
4.1 Algorithm Development
Determine Interval Support for Least Itemsets. Let I be a non-empty set such that I = {i1, i2, …, in}, and D a database of transactions where each T is a set of items such that T ⊂ I. An itemset is a set of items, and a k-itemset is an itemset that contains k items. An itemset is said to be least if its support count falls within a range of threshold values called the Interval Support (ISupp). The Interval Support has the form ISupp(ISMin, ISMax), where ISMin is the minimum and ISMax the maximum value, such that ISMin ≥ φ, ISMax > φ and ISMin ≤ ISMax. The resulting set is denoted Rk. Itemsets are said to be significant least itemsets if they satisfy two conditions: first, the support counts of all items in the itemset must be greater than ISMin; second, the itemset must contain at least one least item. In brief, the significant least itemsets are the union of least items and frequent items, with a non-empty intersection between them.
Construct Significant Least Pattern Tree. A Significant Least Pattern Tree (SLP-Tree) is a compressed representation of the significant least itemsets. This trie data structure is constructed by scanning the dataset one transaction at a time and mapping each transaction onto a path in the SLP-Tree. The SLP-Tree is built only with the items that satisfy ISupp. In the first step, the algorithm scans all transactions to determine the list of least items, LItems, and frequent items, FItems (together the least-frequent items, LFItems). In the second step, all transactions are sorted in descending order and mapped against LFItems; a transaction must contain at least one of the least items, otherwise it is disregarded. In the final step, each transaction is transformed into a new path or mapped onto an existing path, and this step continues until the end of the transactions. The problems with the existing FP-Tree are that it may not fit into memory and is expensive to build, since the FP-Tree must be built completely from the entire set of transactions before the support of each item is calculated. The SLP-Tree is therefore a more practical alternative that overcomes these limitations.
Generate Significant Least Pattern Growth (SLP-Growth). SLP-Growth is an algorithm that generates significant least itemsets from the SLP-Tree by exploring the tree with a bottom-up strategy. A 'divide and conquer' method is used to decompose the task into smaller units for mining the desired patterns in conditional databases, which can optimize the
searching space. The algorithm extracts the prefix-path sub-trees ending with any least item. In each prefix-path sub-tree, the algorithm executes recursively to extract all frequent itemsets and finally builds a conditional SLP-Tree. A list of least itemsets is then produced based on the suffix sequence and the sequence in which they are found. The pruning processes in SLP-Growth are faster than in FP-Growth, since most of the unwanted patterns are already cut off while constructing the SLP-Tree data structure. The complete SLP-Growth algorithm is shown in Fig. 1.
1:  Read dataset, D
2:  Set Interval Support (ISMin, ISMax)
3:  for items, I in transaction, T do
4:      Determine support count, ItemSupp
5:  end for loop
6:  Sort ItemSupp in descending order, ItemSuppDesc
7:  for ItemSuppDesc do
8:      Generate list of frequent items, FItems > ISMax
9:  end for loop
10: for ItemSuppDesc do
11:     Generate list of least items, ISMin <= LItems < ISMax
12: end for loop
13: Construct frequent and least items, FLItems = FItems ∪ LItems
14: for all transactions, T do
15:     if (LItems ∩ I in T > 0) then
16:         if (Items in T = FLItems) then
17:             Construct items in transaction in descending order, TItemsDesc
18:         end if
19:     end if
20: end for loop
21: for TItemsDesc do
22:     Construct SLP-Tree
23: end for loop
24: for all prefix SLP-Tree do
25:     Construct conditional items, CondItems
26: end for loop
27: for all CondItems do
28:     Construct conditional SLP-Tree
29: end for loop
30: for all conditional SLP-Tree do
31:     Construct association rules, AR
32: end for loop
33: for all AR do
34:     Calculate support and confidence
35:     Apply correlation
36: end for loop
Fig. 1. SLP-Growth Algorithm
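A sketch of the preprocessing part of Fig. 1 (lines 1–20): computing item supports, splitting the items into frequent and least items under an interval support, and keeping only the transactions that contain at least one least item. Thresholds and data are toy values, and the code is ours, not the authors' C# implementation.

from collections import Counter

def split_items(transactions, is_min, is_max):
    """Return (least_items, frequent_items, supports) under ISupp(ISMin, ISMax)."""
    n = len(transactions)
    supp = {i: c / n for i, c in Counter(i for t in transactions for i in t).items()}
    least = {i for i, s in supp.items() if is_min <= s < is_max}
    frequent = {i for i, s in supp.items() if s >= is_max}
    return least, frequent, supp

def significant_transactions(transactions, least, frequent, supp):
    """Keep transactions containing at least one least item, restricted to least ∪ frequent,
    with items sorted in descending support order (the input to the SLP-Tree construction)."""
    keep = least | frequent
    out = []
    for t in transactions:
        if least & set(t):
            out.append(sorted((i for i in t if i in keep), key=lambda i: -supp[i]))
    return out

transactions = [{"1", "4", "5", "3", "2"}, {"1", "4", "3"}, {"1", "4", "5"}, {"4", "3"}, {"5", "3"}]
least, frequent, supp = split_items(transactions, is_min=0.1, is_max=0.4)
print(least, frequent)
print(significant_transactions(transactions, least, frequent, supp))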
4.2 Weight Assignment
Apply Correlation. The weighted ARs (AR values) are derived from formula (4); this correlation formula is also known as lift. The process of generating weighted ARs takes place after all patterns and ARs have been produced.
Discover Highly Correlated Least ARs. From the list of weighted ARs, the algorithm scans all of them; only those weighted ARs with a correlation value greater than one are captured and considered highly correlated. ARs with a correlation of less than one are pruned and classified as having low correlation.
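The final filtering step can be as simple as keeping the rules whose lift exceeds one; in this sketch the rule tuples and their values are purely illustrative.

# each rule: (antecedent, consequent, support, confidence, lift) -- toy values
weighted_rules = [("1", "2", 0.10, 0.11, 2.31), ("1", "4", 0.70, 0.75, 0.97)]

highly_correlated = [r for r in weighted_rules if r[4] > 1.0]
low_correlation = [r for r in weighted_rules if r[4] <= 1.0]
print(highly_correlated)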
5 Experiment Tests
The performance comparison is made by comparing the tree structures produced by FP-Growth and SLP-Growth. In terms of the total number of ARs, both FP-Growth and Apriori-like algorithms produce exactly the same number. In the final phase, the algorithm determines which ARs are highly correlated based on the specified correlation thresholds. The interpretations are made based on the results obtained.
5.1 A Dataset from [16]
We evaluate the proposed algorithm on air pollution data taken in Kuala Lumpur in July 2002, as presented and used in [16]. The ARs of the presented results are based on a set of air pollution data items, i.e. {CO2, O3, PM10, SO2, NO2}. The value of each item is given in parts per million (ppm), except PM10, which is given in micrograms per cubic metre (μg/m³). The data were taken every hour, every day; the actual data are presented as the average amount of each data item per day. For brevity, each data item is mapped to the parameters 1, 2, 3, 4 and 5 respectively, as shown in Table 2. From Table 2, there are five different parameters (items), i.e. {1, 2, 3, 4, 5}, and 30 transactions; each transaction is defined as a set of data items corresponding to these numbers. The executed transactions are described in Table 3. In addition, the SLP-Growth algorithm incorporates the lift measure to determine the degree of correlation of the significant least itemsets. Table 4 shows 26 ARs and their correlations. The top 10 ARs in Table 4 have positive correlations with the least item 2 as consequent; the highest correlation value for least item 2 as consequent is 2.31 and the lowest is 1.07. Surprisingly, item 4, which has the second-highest support count (77%), has a negative correlation in the AR 1→4. To evaluate the significance of the least item, a domain expert confirmed that item 2 (O3) is the most dominant factor in determining the criticality status of air pollution. Fig. 2 shows the prefix trees of both algorithms, and Fig. 3 states the support counts of all items in the air pollution data. During the mining stage, SLP-Growth only focuses on the branches inside the rectangular box (1→4→5→3→2) and ignores the other branches (4→3, 5→3). Thus, the ARs and itemsets produced by SLP-Growth are proven to be more significant: they consist of least items and, surprisingly, most of them are highly positively correlated. In addition, SLP-Growth reduces the mining of unnecessary branches as compared to FP-Growth.
Table 2. The mapped air pollution data

Data          Items
CO2 ≥ 0.02    1
O3 ≤ 0.007    2
PM10 ≥ 80     3
SO2 ≥ 0.04    4
NO2 ≥ 0.03    5
Table 3. The executed transactions

TID    Items    Items (desc)
T1     1 5      1 5
T2     1        1
T3     1 4      1 4
:      :        :
T30    1 4      1 4
Table 4. ARs with different weight schemes
Fig. 2. Tree Data Structure. FP-Tree and SLP-Tree indicate the prefix-tree respectively. The number of nodes in SLP-Tree is obviously less than appeared in FP-Tree.
[Bar chart: Total Support Count of Items; y-axis: Percentage; x-axis: Item; approximate values: item 1 = 93%, item 2 = 10%, item 3 = 53%, item 4 = 77%, item 5 = 70%]
Fig. 3. Total support count of each item. Item 2 appears in only 10% of the transactions and is considered a least item; the rest of the items are frequent.
5.2 Retail Dataset from [17]
We finally compare the real computational performance of the proposed algorithm against the benchmark FP-Growth algorithm. The Retail dataset from the Frequent Itemset Mining Dataset Repository is employed in this experiment. The experiment has been conducted on an Intel® Core™ 2 CPU at 1.86 GHz with 512 MB of main memory, running Microsoft Windows XP. All algorithms have been developed using C# as the programming language. The aims of the experiment are to evaluate the actual computational performance and complexity of SLP-Growth versus the benchmark FP-Growth algorithm.
Table 5 shows the fundamental characteristics of the dataset; the performance of both algorithms is shown in Figure 4. A variety of interval supports or minimum supports (Supp) are employed. Here, ISMin in the interval support is set equivalent to Supp and ISMax is set to 100%. On average, the processing time to mine ARs using SLP-Growth is 2 times faster than FP-Growth. From Figure 4, the processing times decrease as the minimum supports increase. Figure 5 shows the computational complexity of both algorithms. The computational complexity is measured based on the number of iterations counted while constructing and mining the SLP-Tree. The result reveals that the average number of iterations in SLP-Growth is 4 times less than in FP-Growth. Therefore, the SLP-Growth algorithm is significantly efficient and more suitable for mining the least patterns. In addition, it can also be used for mining frequent itemsets and is proven to outperform the FP-Growth algorithm.

Table 5. Retail dataset characteristics

Data set    Size        #Trans    #Items
Retail      4.153 MB    88,162    16,471
[Line chart: Performance Analysis for the Retail dataset; y-axis: Duration (ms); x-axis: Supp (%) from 0.7 down to 0.3; series: SLP-Growth, FP-Growth]
Fig. 4. Computational Performance of Mining the Retail dataset for the FP-Growth and SLP-Growth Algorithms
Figures 6 and 7 show the computational performance of the SLP-Growth algorithm based on several interval supports, ISupp. The ISMin and ISMax values in ISupp are set in small ranges of 0.1, 0.2 and 0.3, respectively. Figure 6 indicates that the total processing time to mine the SLP-Tree depends on the number of itemsets discovered. In the case of the Retail dataset, the significant least itemsets increase when ISupp is decreased. Therefore, more significant itemsets are revealed if the interval support settings keep decreasing.
[Line chart: Computational Complexity for the Retail dataset; y-axis: Iteration; x-axis: Supp (%) from 0.7 down to 0.3; series: SLP-Growth, FP-Growth]
Fig. 5. Computational Complexity of Mining the Retail dataset for the FP-Growth and SLP-Growth Algorithms
[Line chart: Performance Analysis for the Retail dataset using the SLP-Growth algorithm; x-axis: ISupp, x (%) from 0.7 down to 0.3; series: ISupp(x, x+0.1), ISupp(x, x+0.2), ISupp(x, x+0.3)]
Fig. 6. Computational Performance of Mining the Retails dataset using SLP-Growth Algorithm with different Interval Supports
From Figure 7, the computational complexity increases when ISupp is decreased. This phenomenon indicates that when more significant least patterns are discovered, more iterations are required to accomplish the mining process. However, the lowest ISupp is not the dominant factor in determining the computational complexity. Rather, the computational cost basically depends on the number of least itemsets that can be discovered for a predetermined ISupp.
[Line chart: Computational Complexity for the Retail dataset using the SLP-Growth algorithm; y-axis: Iteration; x-axis: ISupp, x (%) from 0.7 down to 0.3; series: ISupp(x, x+0.1), ISupp(x, x+0.2), ISupp(x, x+0.3)]
Fig. 7. Computational Complexity of Mining the Retails dataset using SLP-Growth Algorithm with different Interval Supports
6 Conclusion
Mining ARs is undeniably crucial for discovering exceptional cases such as air pollution, rare-event analysis, etc. It is quite complicated and computationally expensive, and thus has attracted only limited attention. The traditional support-confidence approach and existing algorithms such as Apriori and FP-Growth are not scalable enough to deal with these complex problems. In this paper we proposed a new algorithm named SLP-Growth to generate highly correlated and significant least ARs. Interval Support is introduced to ensure that only the desired least items for the ARs are produced. We compared our algorithm with existing algorithms on a benchmark dataset and a real dataset. The results show that our algorithm can discover the significant least ARs and minimize prefix-tree generation. In addition, our method can also generate significant least ARs that are highly correlated, with excellent computational performance.
References 1. Agrawal, R., Imielinski, T., Swami, A.: Database Mining: A Performance Perspective. IEEE Transactions on Knowledge and Data Engineering 5(6), 914–925 (1993) 2. Yun, H., Ha, D., Hwang, B., Ryu, K.H.: Mining Association Rules on Significant Rare Data Using Relative Support. The Journal of Systems and Software 67(3), 181–191 (2003) 3. Xiong, H., Shekhar, S., Tan, P.-N., Kumar, V.: Exploiting A Support-Based Upper Bond Pearson’s Correlation Coefficient For Efficiently Identifying Strongly Correlated Pairs. In: The Proceeding of ACM SIGKDD 2004 (2004) 4. Zhou, L., Yau, S.: Association Rule and Quantative Association Rule Mining Among Infrequent Items. In: The Proceeding of ACM SIGKDD 2007 (2007) 5. Ding, J.: Efficient Association Rule Mining Among Infrequent Items. Ph.D Thesis, University of Illinois, Chicago (2005)
6. Koh, Y.S., Rountree, N.: Finding sporadic rules using apriori-inverse. In: Ho, T.-B., Cheung, D., Liu, H. (eds.) PAKDD 2005. LNCS (LNAI), vol. 3518, pp. 97–106. Springer, Heidelberg (2005) 7. Liu, B., Hsu, W., Ma, Y.: Mining Association Rules With Multiple Minimum Supports. SIGKDD Explorations (1999) 8. Brin, S., Motwani, R., Silverstein, C.: Beyond Market Basket: Generalizing ARs to Correlations. Special Interest Group on Management of Data (SIGMOD), 265–276 (1997) 9. Omniecinski, E.: Alternative Interest Measures For Mining Associations. IEEE Trans. Knowledge and Data Engineering 15, 57–69 (2003) 10. Lee, Y.-K., Kim, W.-Y., Cai, Y.D., Han, J.: CoMine: Efficient Mining of Correlated Patterns. In: The Proceeding of ICDM 2003 (2003) 11. Han, J., Pei, H., Yin, Y.: Mining Frequent Patterns Without Candidate Generation. In: The Proceeding of SIGMOD 2000. ACM Press, New York (2000) 12. Han, J., Kamber, M.: Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2006) 13. Au, W.H., Chan, K.C.C.: Mining Fuzzy ARs In A Bank-Account Database. IEEE Transactions on Fuzzy Systems 11(2), 238–248 (2003) 14. Aggrawal, C.C., Yu, P.S.: A New Framework For Item Set Generation. In: Proceedings of the ACMPODS Symposium on Principles of Database Systems, Seattle, Washington (1998) 15. Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E.Y.: Pfp: Parallel Fp-Growth For Query Recommendation. In: Proceedings of RecSys 2008, pp. 107–114 (2008) 16. Mustafa, M.D., Nabila, N.F., Evans, D.J., Saman, M.Y., Mamat, A.: Association rules on significant rare data using second support. International Journal of Computer Mathematics 83(1), 69–80 (2006) 17. http://fimi.cs.helsinki.fi/data/
Maximized Posteriori Attributes Selection from Facial Salient Landmarks for Face Recognition

Phalguni Gupta1, Dakshina Ranjan Kisku2, Jamuna Kanta Sing3, and Massimo Tistarelli4

1 Department of Computer Science and Engineering, Indian Institute of Technology Kanpur, Kanpur - 208016, India
2 Department of Computer Science and Engineering, Dr. B. C. Roy Engineering College / Jadavpur University, Durgapur – 713206, India
3 Department of Computer Science and Engineering, Jadavpur University, Kolkata – 700032, India
4 Computer Vision Laboratory, DAP University of Sassari, Alghero (SS), 07041, Italy
{drkisku,jksing}@ieee.org, [email protected], [email protected]
Abstract. This paper presents a robust and dynamic face recognition technique based on the extraction and matching of devised probabilistic graphs drawn on SIFT features related to independent face areas. The face matching strategy is based on matching individual salient facial graph characterized by SIFT features as connected to facial landmarks such as the eyes and the mouth. In order to reduce the face matching errors, the Dempster-Shafer decision theory is applied to fuse the individual matching scores obtained from each pair of salient facial features. The proposed algorithm is evaluated with the ORL and the IITK face databases. The experimental results demonstrate the effectiveness and potential of the proposed face recognition technique also in case of partially occluded faces. Keywords: Face biometrics, Graph matching, SIFT features, Dempster-Shafer decision theory, Intra-modal fusion.
vision researchers have adapted and applied an abundance of algorithms for pattern classification, recognition and learning. There exist the appearance-based techniques which include Principal Component Analysis (PCA) [1], Linear Discriminant Analysis (LDA) [1], Fisher Discriminant Analysis (FDA) [1] and Independent Component Analysis (ICA) [1]. Some local feature based methods are also investigated [4-5]. A local feature-based technique for face recognition, called Elastic Bunch Graph Matching (EBGM) has been proposed in [3]. EBGM is used to represent faces as graphs and the vertices localized at fiducial points (e.g., eyes, nose) and the geometric distances or edges labeled with the distances between the vertices. Each vertex contains a set known as Gabor Jet, of 40 complex Gabor wavelet coefficients at different scales and orientations. In case of identification, these constructed graphs are searched and get one face that maximizes the graph similarity function. There exists another graph-based technique in [6] which performs face recognition and identification by graph matching topology drawn on SIFT features [7-8]. Since the SIFT features are invariant to rotation, scaling and translation, the face projections are represented by graphs and faces can be matched onto new face by maximizing a similarity function taking into account spatial distortions and the similarities of the local features. This paper addresses the problem of capturing the face variations in terms of face characteristics by incorporating probabilistic graphs drawn on SIFT features extracted from dynamic (mouth) and static (eyes, nose) salient facial parts. Differences in facial expression, head pose changes, illumination changes, and partly occlusion, result variations in facial characteristics and attributes. Therefore, to combat with these problems, invariant feature descriptor SIFT is used for the proposed graph matching algorithm for face recognition which is devised pair-wise manner to salient facial parts (e.g., eyes, mouth, nose). The goal of the proposed algorithm is to perform an efficient and cost effective face recognition by matching probabilistic graph drawn on SIFT features whereas the SIFT features [7] are extracted from local salient parts of face images and directly related to the face geometry. In this regard, a face-matching technique, based on locally derived graph on facial landmarks (e.g., eye, nose, mouth) is presented with the fusion of graphs in terms of the fusion of salient features. In the local matching strategy, SIFT keypoint features are extracted from face images in the areas corresponding to facial landmarks such as eyes, nose and mouth. Facial landmarks are automatically located by means of a standard facial landmark detection algorithm [8-9]. Then matching a pair of graphs drawn on SIFT features is performed by searching a most probable pair of probabilistic graphs from a pair of salient landmarks. This paper also proposes a local fusion approach where the matching scores obtained from each pair of salient features are fused together using the Dempster-Shafer decision theory. The proposed technique is evaluated with two face databases, viz. the IIT Kanpur and the ORL (formerly known as AT&T) databases [11] and the results demonstrate the effectiveness of the proposed system. The paper is organized as follows. The next section discusses SIFT features extraction and probabilistic graph matching for face recognition. 
Experimental results are presented in Section 3 and conclusion is given in the last section.
2 SIFT Feature Extraction and Probabilistic Matching
2.1 SIFT Keypoint Descriptor for Representation
The basic idea of the SIFT descriptor [6-7] is to detect feature points efficiently through a staged filtering approach that identifies stable points in the scale-space. Local feature points are extracted by searching for peaks in the scale-space of a difference of Gaussian (DoG) function. The feature points are localized using a measurement of their stability, and orientations are assigned based on local image properties. Finally, the feature descriptors, which represent local shape distortions and illumination changes, are determined.
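As a rough illustration of this extraction stage, the sketch below assumes an off-the-shelf SIFT implementation, namely OpenCV's cv2.SIFT_create (available in OpenCV 4.4 and later); the paper does not specify which implementation was used, and the file names are hypothetical.

import cv2

def extract_sift(image_path):
    # Detect DoG keypoints and compute 128-D SIFT descriptors on a grayscale face image.
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(img, None)
    # Each keypoint carries spatial location (x, y), scale and orientation;
    # each row of "descriptors" is one 128-element vector.
    return keypoints, descriptors

kp_gallery, des_gallery = extract_sift("gallery_face.png")
kp_probe, des_probe = extract_sift("probe_face.png")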
Fig. 1. Invariant SIFT Feature Extraction on a pair of Face Images
Each feature point is composed of four types of information – spatial location (x, y), scale (S), orientation (θ) and Keypoint descriptor (K). For the sake of the experimental evaluation, only the keypoint descriptor [6-7] has been taken into account. This descriptor consists of a vector of 128 elements representing the orientations within a local neighborhood. In Figure 1, the SIFT features extracted from a pair of face images are shown. 2.2 Local Salient Landmarks Representation with Keypoint Features Deformable objects are generally difficult to characterize with a rigid representation in feature spaces for recognition. With a large view of physiological characteristics in biometrics including iris, fingerprint, hand geometry, etc, faces are considered as highly deformable objects. Different facial regions, not only convey different relevant and redundant information on the subject’s identity, but also suffer from different time variability either due to motion or illumination changes. A typical example is the case of a talking face where the mouth part can be considered as dynamic facial landmark part. Again the eyes and nose can be considered as the static facial landmark parts which are almost still and invariant over time. Moreover, the mouth moves changing its appearance over time. As a consequence, the features extracted from the
mouth area cannot be directly matched with the corresponding features from a static template. Moreover, single facial features may be occluded making the corresponding image area not usable for identification. For these reasons to improve the identification and recognition process, a method is performed which searches the matching features from a pair of facial landmarks correspond to a pair of faces by maximizing the posteriori probability among the keypoints features. The aim of the proposed matching technique is to correlate the extracted SIFT features with independent facial landmarks. The SIFT descriptors are extracted and grouped together by searching the sub-graph attributes and drawing the graphs at locations corresponding to static (eyes, nose) and dynamic (mouth) facial positions. The eyes and mouth positions are automatically located by applying the technique proposed in [8]. The position of nostrils is automatically located by applying the technique proposed in [9]. A circular region of interest (ROI), centered at each extracted facial landmark location, is defined to determine the SIFT features to be considered as belonging to each face area. SIFT feature points are then extracted from these four regions and gathered together into four groups. Then another four groups are formed by searching the corresponding keypoints using iterative relaxation algorithm by establishing relational probabilistic graphs [12] on the four salient landmarks of probe face. 2.3 Probabilistic Interpretation of Facial Landmarks In order to interpret the facial landmarks with invariant SIFT points and probabilistic graphs, each extracted feature can be thought as a node and the relationship between invariant points can be considered as geometric distance between the nodes. At the level of feature extraction, invariant SIFT feature points are extracted from the face images and the facial landmarks are localized using the landmark detection algorithms discussed in [8], [9]. These facial landmarks are used to define probabilistic graph which is further used to make correspondence and matching between two faces. To measure the similarity of vertices and edges (geometric distances) for a pair of graphs [12] drawn on two different facial landmarks of a pair of faces, we need to measure the similarity for node and edge attributes correspond to keypoint descriptors and geometric relationship attributes among the keypoints features. Let, two graphs be G ' = {N ' , E ' , K ' , ς '} and G ' ' = {N ' ' , E ' ' , K ' ' , ς ' '} where N', E', K', ζ' denote nodes, edges, association between nodes and association between edges respectively. Therefore, we can denote the similarity measure for nodes n'i ∈ N ' and n' ' j ∈ N ' ' by
$s^n_{ij} = s(k'_i, k''_j)$, and the similarity between edges $e'_{ip} \in E'$ and $e''_{jq} \in E''$ can be denoted by $s^e_{ipjq} = s(e'_{ip}, e''_{jq})$.
Further, suppose that $n'_i$ and $n''_j$ are vertices in the gallery graph and the probe graph, respectively. Now, $n''_j$ would be the best probable match for $n'_i$ when $n''_j$ maximizes the posteriori probability [12] of labeling. Thus, for the vertex $n'_i \in N'$, we are searching for the most probable label or vertex $\bar{n}'_i = n''_j \in N''$ in the probe graph. Hence, it can be stated as

\bar{n}'_i = \arg\max_{j,\, n''_j \in N''} P\big(\psi_i^{n''_j} \mid K', \varsigma', K'', \varsigma''\big)    (1)
To simplify the solution of the matching problem, we adopt a relaxation technique that efficiently searches for the matching probabilities $P^n_{ij}$ for vertices $n'_i \in N'$ and $n''_j \in N''$. By reformulating, Equation (1) can be written as

\bar{n}'_i = \arg\max_{j,\, n''_j \in N''} P^n_{ij}    (2)
This relaxation procedure is an iterative algorithm for searching for the best labels for $n'_i$. This can be achieved by assigning prior probabilities $P^n_{ij}$ proportional to $s^n_{ij} = s(k'_i, k''_j)$. Then the iterative relaxation rule is

\hat{P}^n_{ij} = \frac{P^n_{ij}\, Q^n_{ij}}{\sum_{q,\, n''_q \in N''} P^n_{iq}\, Q^n_{iq}}    (3)

Q^n_{ij} = \prod_{p,\, n'_p \in N'_i} \; \sum_{q,\, n''_q \in N''_j} s^e_{ipjq}\, P^n_{pq}    (4)
Relaxation cycles are repeated until the difference between the prior probabilities $P^n_{ij}$ and the posteriori probabilities $\hat{P}^n_{ij}$ becomes smaller than a certain threshold $\Phi$; when this is reached, the relaxation process is assumed to be stable. Therefore,

\max_{i,\, n'_i \in N',\; j,\, n''_j \in N''} \big| P^n_{ij} - \hat{P}^n_{ij} \big| < \Phi    (5)
Hence, the matching between a pair of graphs is established by using the posteriori probabilities in Equation (2) about assigning the labels from the gallery graph G ' to the points on the probe graph G ' ' . From these groups pair-wise salient feature matching is performed in terms of graph matching. Finally, the matching scores obtained from these group pairs are fused together by the Dempster-Shafer fusion rule [10] and the fused score is compared against a threshold for final decision.
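A minimal numerical sketch of the relaxation update in Equations (3)-(4) and the stopping rule (5). This is our own illustration; the similarity values below are random placeholders rather than SIFT-derived scores.

import numpy as np

def relaxation_matching(node_sim, edge_sim, threshold=1e-4, max_iters=100):
    # node_sim: (n1, n2) similarities s^n_ij between gallery and probe keypoints.
    # edge_sim: (n1, n1, n2, n2) similarities s^e_ipjq between gallery edge (i, p)
    #           and probe edge (j, q).
    n1, n2 = node_sim.shape
    P = node_sim / node_sim.sum(axis=1, keepdims=True)      # priors proportional to s^n_ij
    for _ in range(max_iters):
        Q = np.ones((n1, n2))                                # Eq. (4)
        for i in range(n1):
            for j in range(n2):
                for p in range(n1):
                    Q[i, j] *= np.dot(edge_sim[i, p, j, :], P[p, :])
        P_new = P * Q
        P_new /= P_new.sum(axis=1, keepdims=True)            # Eq. (3)
        if np.max(np.abs(P_new - P)) < threshold:            # stopping rule, Eq. (5)
            return P_new
        P = P_new
    return P

rng = np.random.default_rng(0)
node_sim = rng.random((4, 5))
edge_sim = rng.random((4, 4, 5, 5))
labels = relaxation_matching(node_sim, edge_sim).argmax(axis=1)
print(labels)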
3 Experimental Evaluation
To investigate the effectiveness and robustness of the proposed graph-based face matching strategy, experiments are carried out on the IITK face database and the ORL
face database [11]. The IITK face database consists of 1200 face images, with four images per person (300×4), which have been captured in a controlled environment with ±20 degree changes of head pose, almost uniform lighting and illumination conditions, and facial expressions kept consistent with some negligible changes. For the face matching, all probe images are matched against all target images. On the other hand, the ORL face database consists of 400 images taken from 40 subjects. Out of these 400 images, we use 200 face images for the experiment, in which ±20 to ±30 degrees of orientation change have been considered. The face images show variations of pose and facial expression (smile/no smile, open/closed eyes). The original resolution is 92 × 112 pixels for each face; however, for our experiment we set the resolution to 140 × 100 pixels, in line with the IITK database.
[Figure: ROC curves determined from the ORL and IITK face databases; Genuine Acceptance Rate vs. False Acceptance Rate]
Fig. 2. ROC curves for the proposed matching algorithm for ORL and IITK databases
The ROC curves of the error rates obtained from the face matching applied to the two face databases are shown in Figure 2. The computed recognition accuracy is 93.63% for the IITK database and 97.33% for the ORL database. The relative accuracy of the proposed matching strategy on the ORL database thus increases by about 3% over the IITK database. In order to verify the effectiveness of the proposed face matching algorithm for recognition and identification, we compare our algorithm with the algorithms discussed in [6], [13], [14], and [15]. Several face matching algorithms discussed in the literature are tested on different face databases or with different processes, and a uniform experimental environment in which experiments can be performed with multiple attributes and characteristics is not available. By extensive comparison, we have found that the proposed algorithm differs from the algorithms in [6], [13], [14], [15] in terms of performance and design issues. In [13], a PCA approach is discussed for different views of face images without transformation, and the algorithm achieved 90% recognition
accuracy for some specific views of faces. On the other hand, [14] and [15] use Gabor jets for face processing and recognition, where the former uses Gabor jets without transformation and the latter uses Gabor jets with geometrical transformation. Both techniques are tested on the Bochum and FERET databases, which are characteristically different from the IITK and ORL face databases, and their recognition rates are at most 94% and 96%, respectively, across the various tests performed. Two further graph-based face recognition techniques drawn on SIFT features are discussed in [6], where the graph matching algorithms are developed by considering the whole face instead of the local landmark areas. The proposed face recognition algorithm not only derives keypoints from the local landmarks, but also combines the local features for robust performance.
4 Conclusion This paper has proposed an efficient and robust face recognition techniques by considering facial landmarks and using the probabilistic graphs drawn on SIFT feature points. During the face recognition process, the human faces are characterized on the basis of local salient landmark features (e.g., eyes, mouth, nose). It has been determined that when the face matching accomplishes with the whole face region, the global features (whole face) are easy to capture and they are generally less discriminative than localized features. On contrary, local features on the face can be highly discriminative, but may suffer for local changes in the facial appearance or partial face occlusion. In the proposed face recognition method, local facial landmarks are considered for further processing rather than global features. The optimal face representation using probabilistic graphs drawn on local landmarks allow matching the localized facial features efficiently by searching and making correspondence of keypoints using iterative relaxation by keeping similarity measurement intact for face recognition.
References 1. Shakhnarovich, G., Moghaddam, B.: Face Recognition in Subspaces. In: Li, S., Jain, A. (eds.) Handbook of Face Recognition, pp. 141–168. Springer, Heidelberg (2004) 2. Shakhnarovich, G., Fisher, J.W., Darrell, T.: Face Recognition from Long-term Observations. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2352, pp. 851–865. Springer, Heidelberg (2002) 3. Wiskott, L., Fellous, J., Kruger, N., Malsburg, C.: Face recognition by Elastic Bunch Graph Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 775–779 (1997) 4. Zhang, G., Huang, X., Wang, S., Li, Y., Wu, X.: Boosting Local Binary Pattern (LBP)Based Face Recognition. In: Li, S.Z., Lai, J.-H., Tan, T., Feng, G.-C., Wang, Y. (eds.) SINOBIOMETRICS 2004. LNCS, vol. 3338, pp. 179–186. Springer, Heidelberg (2004) 5. Heusch, G., Rodriguez, Y., Marcel, S.: Local Binary Patterns as an Image Preprocessing for Face Authentication. In: IDIAP-RR 76, IDIAP (2005)
6. Kisku, D.R., Rattani, A., Grosso, E., Tistarelli, M.: Face Identification by SIFT-based Complete Graph Topology. In: IEEE workshop on Automatic Identification Advanced Technologies, pp. 63–68 (2007) 7. Lowe, D.: Distinctive Image Features from Scale-invariant Keypoints. International Journal of Computer Vision 60(2), 91–110 (2004) 8. Smeraldi, F., Capdevielle, N., Bigün, J.: Facial Features Detection by Saccadic Exploration of the Gabor Decomposition and Support Vector Machines. In: 11th Scandinavian Conference on Image Analysis, vol. 1, pp. 39–44 (1999) 9. Gourier, N., James, D.H., Crowley, L.: Estimating Face Orientation from Robust Detection of Salient Facial Structures. In: FG Net Workshop on Visual Observation of Deictic Gestures, pp. 1–9 (2004) 10. Bauer, M.: Approximation Algorithms and Decision-Making in the Dempster-Shafer Theory of Evidence—An Empirical Study. International Journal of Approximate Reasoning 17, 217–237 (1996) 11. Samaria, F., Harter, A.: Parameterization of a Stochastic Model for Human Face Identification. In: IEEE Workshop on Applications of Computer Vision (1994) 12. Yaghi, H., Krim, H.: Probabilistic Graph Matching by Canonical Decomposition. In: IEEE International Conference on Image Processing, pp. 2368–2371 (2008) 13. Moghaddam, B., Pentland, A.: Face recognition using View-based and Modular Eigenspaces. In: SPIE Conf. on Automatic Systems for the Identification and Inspection of Humans. SPIE, vol. 2277, pp. 12–21 (1994) 14. Wiskott, L., Fellous, J.-M., Kruger, N., von der Malsburg, C.: Face recognition by Elastic Bunch Graph Matching. IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7), 775–779 (1997) 15. Maurer, T., von der Malsburg, C.: Linear Feature Transformations to Recognize Faces Rotated in Depth. In: International Conference on Artificial Neural Networks, pp. 353–358 (1995)
Agent Based Approach to Regression Testing

Praveen Ranjan Srivastava1 and Tai-hoon Kim2

1 Computer Science & Information System Group, BITS PILANI – 333031 (India)
[email protected]
2 Dept. of Multimedia Engineering, Hannam University, Korea
[email protected]
Abstract. Software Systems often undergo changes or modifications based on the change in requirements. So, it becomes necessary to ensure that the changes don’t bring with themselves any side effects or errors that may be hampering the overall objective of developing good quality software. A need is felt to continuously test the software so that such type of risks may be kept at minimum. In recent years, agent-based systems have received considerable attention in both academics and industry. The agent-oriented paradigm can be considered as a natural extension of the object-oriented (OO) paradigm. In this paper, a procedure for regression testing has been proposed to write algorithms for monitor agent and the test case generator agent for regression testing using an agent based approach. For illustration an example Book Trading agent based system is used for the testing purposes. Keywords: Regression Testing, Agent.
to monitor these changes and another agent would generate the test cases for the changed version of the program. For testing purposes we have taken a two agent Book Trading software system comprising of a Book Buyer agent and a Book Seller agent and for this we have used JADE [2]. JADE is a tool for developing and maintaining agent based systems. The extension of object-oriented software testing techniques to agent oriented software testing was already proposed [3]. An attempt was made to apply random testing for agents; behavior based testing for agents and partition testing at agent level. A procedure using message flow graph (MFG) [4] for regression testing of agent oriented software systems was proposed. This method is useful when the interaction pattern of an agent changes. Also, a procedure for selection for selection of modification traversing regression test cases using execution traces was proposed. Software testing is an important part of SDLC [5]. Moreover, regression testing is a much more important issue which involves repeating the execution of a test suite after software has been changed, in an attempt to reveal defects introduced by the change. One reason this is important is that it is often very expensive. If the test suite is comprehensive then it can take significant time and resources to conduct and evaluate the new test.
2 Automation of Regression Testing Traditional regression testing strongly focuses on black-box comparison of program outputs. It has been proposed to apply continuous regression testing by facilitating the development of a robust regression test suite, as well as fully automating regression test suite generation and execution. A basic behavioral regression test suite can be generated automatically to capture the project code's current functionality. Once this baseline regression test suite is in place, its intelligence can be incrementally improved with the concept of agents. Since the environment in which agents work changes dynamically, it needs to monitor the agents at each point of time. JAMES [6] (Java based agent modeling environment for simulation), can be used for creating the virtual environments and generating the dynamic test cases with the help of Monitor agent and subsequently the Test case generator agent [7].The simplest way to set up automated regression testing is to construct a suite of test cases, each of which consists of a test input data and the expected outputs. The input data are processed and the resulting outputs compared with the correct answer outputs. But one thing has to keep in mind that if the earlier test cases defined for the program still successfully passes the testing of the newer version of the software then there is no need to generate new test case for that. The proposed test case generator would generate new test cases only if the earlier test cases don’t satisfy the modifications in the code. For agents, a “Test Fixture” is a collection of multiple related tests. Before each test, the Test Fixtures Setup and Teardown methods are called if present. Each test has multiple Assertions that must have the correct results for the test to pass. Agents can be used for generating regression test cases. The behaviors of an agent can be considered as the basic units or modules of an agent [10]. So, like unit testing
is performed for any software module, the agents can be tested whenever there is a change in their behavior. Since agents communicate through messages, the agent under testing can be sent messages for its reaction and further analyzing. Monitor agents would monitor the behavior of the agents under testing [11, 12]. For example, the Book buyer agent specifies the title and price of the target book when requesting the seller agents in the Request Performer behavior. If on demand it is required to include the publisher and author of the book when trying to buy the book, then the behavior is also changed. The monitor agent in this case would identify the changes and report it to the test case generator agent. The test case generator agent would rerun the test cases defined for the RequestPerformer behavior. In this case the previous test cases would fail because two new attributes publisher and author have been added. So, new test cases need to be generated. The test case inputs would now include four attributes i.e. title, price, publisher, and author. The expected results could be maintained manually in the Test Oracle.
3 Software Agent Methodology
A software agent can be defined as "a software system that is situated in some environment and is capable of flexible autonomous action in order to meet its design objectives". Agents communicate with each other through the Agent Communication Language (ACL) as described above [8]. This is a simple example of a FIPA-ACL message with a request performative:

(request
  :sender (agent-identifier :name [email protected])
  :receiver (agent-identifier :name [email protected])
  :ontology travel-assistant
  :language FIPA-SL
  :protocol fipa-request
  :content ""((action (agent-identifier :name [email protected]) ))""
)

JADE (Java Agent Development Environment) is used for developing the Book Trading System. It is a development environment for building multi-agent systems and is fully implemented in the Java language. It strictly follows the FIPA specification. On launching the book trading agent system, a Dummy Agent has been used to stimulate the Book Buyer agent and the Book Seller agent by sending user-specified messages, and the reactions in terms of received messages are analyzed. The Introspector agent is an inbuilt agent provided by the JADE RMA GUI [8]. It shows the received and sent messages for the Book Buyer agent and the Book Seller agent in separate windows. It also shows the queue of messages of any particular agent in a container. The Introspector agent and the Sniffer agent [8] provided by JADE show the sequence of interactions between the Book Buyer and Book Seller agents diagrammatically, as shown in Fig. 1 [13].
Fig. 1. Sequence diagram showing messages between Book Buyer and Book Seller agents
In this example there are two agents i.e. a BookBuyerAgent and a Bookseller Agent communicating with each other dynamically. The Book Buyer Agent continuously looks out for a particular book by sending requests to the Seller agent .It possesses the list of known seller agents .Whenever it gets a reply from the seller side , it updates the list. It also tracks information about the best offer and the best price along with the number of replies from the seller side. Then it sends a purchase request to the seller agent providing best offer price. On receiving a reply from the seller side it orders for the book. Either the book may be available, not available, or sold already. The Buyer and the Seller agent would act accordingly. The Book Seller agent contains a catalogue of books for sale. It maps the title of a book to its price. It also provides a provision for a GUI by means of which the user can add books in the catalogue. First of all the Seller agent registers the book-selling service. Then it specifies the behaviors for serving the queries and purchase orders coming from the Book Buyer agent side. The seller agent removes the purchased book from its catalogue and replies with an INFORM message to notify the buyer that the purchase has been successfully completed. A. Behaviors of Book Buyer Agent TickerBehaviour: - It schedules a request to seller agents after regular events. RequestPerformer:-It is used by Book-buyer agents to request seller agents the target book. B. Behaviors of Book Seller Agent OfferRequestsServer: - This is the behavior used by Book-seller agents to serve incoming requests for offer from buyer agents. If the requested book is in the local
catalogue the seller agent replies with a PROPOSE message specifying the price. Otherwise a REFUSE message is sent back. PurchaseOrdersServer: -It is a cyclic behavior and is used by Book-seller agents to serve incoming offer acceptances (i.e. purchase orders) from buyer agents. The seller agent removes the purchased book from its catalogue and replies with an INFORM message to notify the buyer that the purchase has been successfully completed. The Book Buyer and the Book Seller agent communicate with each other through ACL messages. ACL is the standard language for communication defined for Multiagent systems. The Book Buyer and the Book Seller agent send the following messages to each other: - CFP (call for Purpose), PROPOSE, REFUSE, ACCEPT_PROPOSAL, INFORM and FAILURE. If we see the present usage trends of the s/w testing community, we can see that regression testing is mostly performed by automating the test scripts [9]. The test scripts are written to ensure that modified versions of software function faultlessly and without bringing any new side effect errors. These automated test script tools may work on object oriented systems. But the main problem is that these tools don’t support dynamism and independency on their part. They can’t behave on the basis of changes in the software that are performed dynamically. So there is a need to develop a tool that can change its behavior dynamically and generate the test cases. C. AGENTS USED IN THE PROPOSAL 1. Book Buyer Agent: - This agent generates requests for books to the book seller. 2. Book Seller Agent: - This agent serves the requests coming from the Book Buyer agent. It behaves according to the type of request coming from the Buyer agent. 3. GUI Agent: - This agent serves to present a user interface so that new books can be added in the catalogue and the user can interact with the agents’ functionality. 4. Monitor Agent: - This agent would monitor the state of the current system and track any changes if they occur. 5. Test case generator agent: - This agent would act on the basis of the findings done by the monitor agent and would generate regression test cases automatically. 6. Test Case Executor: - This agent executes the test cases generated by the generator agent and stores the results along with the test cases in the database file.
4 The Testing Framework
The testing framework consists of the following components, as shown in Fig. 2:
1. Files containing agents under test.
2. File containing agent information, storing information extracted from the agent classes under test as well as the tests performed and results given. This is done by parsing the agent classes.
3. Test Oracle.
4. The Monitor Agent.
5. Test Case Sequence Generator.
6. Test Case Executor.
Fig. 2. The testing framework
A. Generating Test Case Sequences Whenever there is a change in the behavior of agents, then the sequence of messages communicated between the agents is also changed. After code generation, agent classes are parsed to extract the information from them. This information includes such details as the names of each of member functions within its classes, their parameters, and their types. The agent’s structure is used for white box testing because single behavior can be seen as a black box. Applying the coverage criteria on the agents, it is possible to describe the internal structure of the agent. After that black box testing of the agent behavior can be initiated. This will lead to generation of a series of test sequences that is a series of messages. Test cases are generated keeping in view the sequence of messages between the agents. We can use the Dummy agent and the Introspector agent provided by JADE for testing purposes here. Sample sequences of messages when Book Buyer agent the Book Seller Agent interacts with each other are shown in Table 1: Table 1. Sequences of messages between Book Buyer and the Book Seller Agent
B. Algorithm for developing Monitor agent

Data variables:
    Boolean Modified
    Struct Change {
        Integer LineNumberChanged;   // indicates the location of the change
        String BehaviorChanged;      // identifies the changed behavior
    };
Input:  File OlderVersion, File NewVersion
        // OlderVersion contains the agent classes before the change.
        // NewVersion contains the agent classes after the change.
Output: Struct Change Ch

Define the behaviors of the Monitor agent.
Define the action() method for the operations to be performed when the behavior is in execution.
Modified = Compare(OlderVersion, NewVersion);
If (Modified == TRUE) {
    Ch.LineNumberChanged = line no. of the changed code;
    Ch.BehaviorChanged = identifyBehavior(LineNumberChanged);
} Else {
    Ch.LineNumberChanged = NULL;
    Ch.BehaviorChanged = NULL;
}
Return (Ch) to the Test Case Generator;
// Keep track of the changes so that they can be used for generating
// test cases for the modified version of the program.
Declare the done() method to indicate whether or not a behavior has completed and is to be removed from the pool of behaviors an agent is executing.

C. The Test Case Generator Agent
The function of the Test Case Generator agent depends on the output of the Monitor agent. The Monitor agent has the responsibility to continuously monitor the source code and to identify when and where changes have been made. The Monitor agent then identifies the location of the change and specifies it to the Test Case Generator Agent. By identifying the location of the change, the functionality that has been changed can be identified; in the case of agent-based systems this functionality is the behaviors of the agents. Agents behave dynamically under different conditions by sensing their environment. The Test Case Generator Agent takes as input the identified behavior along with the previously defined test cases for that behavior. The previously defined test cases for any behavior can be found in the Test Case Repository stored in the database.
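The change-detection step that the Monitor agent performs (the Compare call above, and the continuous monitoring just described) is essentially a source-level diff between the two versions of the agent classes. A minimal sketch, assuming Python's standard difflib; the paper does not prescribe an implementation, and the file names below are hypothetical.

import difflib

def find_changed_lines(old_path, new_path):
    # Report the 1-based line positions in the new version where the two
    # versions differ (a rough stand-in for Compare and LineNumberChanged).
    with open(old_path) as f_old, open(new_path) as f_new:
        old_lines, new_lines = f_old.readlines(), f_new.readlines()
    changed = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old_lines, new_lines).get_opcodes():
        if tag != "equal":
            changed.extend(range(j1 + 1, max(j2, j1 + 1) + 1))
    return changed

print(find_changed_lines("BookSellerAgent_old.java", "BookSellerAgent_new.java"))

Mapping a changed line number back to the enclosing behavior (identifyBehavior) would then require parsing the agent class, as described in the testing framework of Section 4.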
The test Oracle for the behavior is taken as a reference for comparing with the new test cases generated for the behavior. The results are analyzed to ensure that the modified code functions as per the requirements.

D. Algorithm for Test Case Generator
Input: Location of change L, Agent A
• Determine the behavior associated with the change in A: Behavior B = changedBehavior(A, L)
• Extract the changed parameters of B: Param P = changedParameters(B)
• TestCase Tc = PreviousTestCases(Behavior B)
• If Satisfy(Tc, B, P) == 1 {
      Execute test case Tc: Results R = execute(Tc);
  } Else {
      Generate new test cases for B against P: TestCase TcNew = generate(B, P);
      Execute test case TcNew: Results R = execute(TcNew);
  }
• Find expected results from the Test Oracle: Expected E = find(Test Oracle, Behavior B);
• Boolean bool = Compare(R, E);
• If (bool == TRUE) {
      System.out.println("Test Passed.");
      Save TcNew in the test repository
  } Else {
      System.out.println("Test Failed");
  }
• Prepare Test Report report
• Save report into the Database File.
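A compact sketch of the generator's decision flow above, in illustrative Python only; Satisfy, generate and execute are represented by toy stand-ins, and the behavior, repository and oracle data are invented for the example.

def regression_cycle(behavior, changed_params, repository, oracle, execute):
    # Reuse stored test cases that already cover the changed parameters;
    # otherwise generate a new case, execute it, and compare with the oracle.
    cases = [c for c in repository.get(behavior, []) if changed_params <= set(c["inputs"])]
    if not cases:
        cases = [{"inputs": sorted(changed_params), "message": "CFP"}]
        repository.setdefault(behavior, []).extend(cases)   # save newly generated cases
    report = []
    for case in cases:
        actual = execute(behavior, case)                    # observed reply message
        expected = oracle[(behavior, case["message"])]
        report.append((case["inputs"], "Passed" if actual == expected else "Failed"))
    return report

oracle = {("OfferRequestsServer", "CFP"): "Propose"}
repository = {"OfferRequestsServer": [{"inputs": ["price", "title"], "message": "CFP"}]}
execute = lambda b, c: "Propose" if {"publisher", "author"} <= set(c["inputs"]) else "Refuse"
print(regression_cycle("OfferRequestsServer",
                       {"price", "title", "publisher", "author"},
                       repository, oracle, execute))

With the catalogue schema extended by publisher and author, as in Section 5, the old {price, title} case no longer covers the change, so a new case is generated and passes.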
5 Result The Book Buyer and Seller agents are tested on the basis of their behaviors. Both of them can be tested individually with the help of the dummy agent. But initially the seller agents are initialized by adding book details in their corresponding catalogues as shown in Table 2.
Table 2. Addition of Books in Sellers' catalogues

Seller Agent    Book Title                  Price
S1              Learning C++                300
S2              Learning C++                290
S3              Java Unleashed              470
S4              Java Unleashed              490
S5              Software Engineering        500
S6              Principles of Management    380
Table 3. Expected offers and the actual results for any buyer

Target Book                 List of Sellers    Best offer given by    Actually bought from
Learning C++                S1, S2             S2                     S2
Java Unleashed              S3, S4             S3                     S3
Software Engineering        S5                 S5                     S5
Principles of Management    S6                 S6                     S6
Testing the Request Performer behavior of Book Buyer Agent

Message Received (test data)    Actual Reply       Test case Result
Inform                          CFP                Passed
Propose                         Accept Proposal    Passed
Propose                         CFP                Failed
Refuse                          CFP                Passed
Refuse                          Accept Proposal    Failed
Testing the Offer Request Server behavior of Book Seller Agent

Message Received (test data)    Expected Reply    Actual Reply    Test case Result
Request                         Inform            Inform          Passed
Request                         Inform            Refuse          Failed
CFP                             Propose           Propose         Passed
CFP                             Refuse            Propose         Failed
CFP                             Refuse            Refuse          Passed
Accept Proposal                 Inform            Refuse          Failed
Accept Proposal                 Inform            Inform          Passed
The differences in prices offered by different book sellers may be due to varying discounts. Now, if a buyer wants to buy the books dynamically, they join the same environment. The expected and the actual results for any buyer who wants to buy a book are as shown in Table 3. Changing the sequence of messages between the Book Buyer and the Book Seller can change the behavior of the agents, and in that case new test cases are generated for that behavior. For example, if book publisher and author details are also required to be added to the sellers' catalogue, then the behavior of the seller certainly changes, and as a result the sequence of messages also changes. The following table shows the change in the Offer Request Server behavior of the book seller agent when two new attributes are added for the book details.
Test Data                            Expected Message    Actual Message    Test Results
{Price, Title}                       Propose             Refuse            Failed
{Price, Title, publisher, author}    Propose             Propose           Passed
The above proposed algorithm is implemented in the source code of a Binary Search algorithm. As the algorithm is proposed in two steps, in the first phase the CFG of the given source code is drawn. Then the possible paths of execution are taken, and the statements changed within each path in the enhanced project are considered. A priority level is given to the test cases executing the definite paths on the basis of the number of statements that have changed. In the next phase of the algorithm, from the existing set of test cases, normalized test cases are chosen by an n-way testing technique, so that the modified test suite is efficient enough to cover all the possible sets of combinations formed from the previous one.
6 Conclusion and Future Work In this paper, agents have been used for doing regression testing of a Book Trading dual agent system. The algorithms for developing Monitor agent and the Test Case Generator Agent have been proposed. This way, an attempt has been made to simplify the time and effort complexities associated with regression testing. Since agent based testing is an emerging area, the paper is limited to developing only the algorithms. For developing the complete coding of the Monitor agent and the Test Case generator agent, in depth knowledge of code parsing, code compiling and also agent based regression testing techniques is quite essential.
References [1] Pressman, R.S.: Software Engineering-A practitioner’s approach, 6th edn. McGraw Hill International, New York (2005) [2] Bellifemine, F., Poggi, A., Rimassa, G.: JADE A FIPA 2000 Compliant Agent Development Environment [3] Srivastava, P.R., et al.: Extension of Object-Oriented Software Testing Techniques to Agent Oriented Software Testing. Journal of Object Technology (JOT) 7(8) (November December 2008) [4] Srivastava, P.R., et al.: Regression Testing Techniques for Agent Oriented Software. In: 10th IEEE ICIT, Bhubanswer, India, December 17-20. IEEEXPLORE (2008), doi:ieeecomputersociety.org/10.1109/ICIT.2008.30 [5] Desikan, S., Ramesh, G.: Software testing principles and practices. Pearson Education, London (2002) [6] Himmelspach, J., Röhl, M., Uhrmacher, A.: Simulation for testing software agents – An Exploration based on JAMES. In: Proceedings of the 35th Winter Simulation Conference: Driving Innovation, New Orleans, Louisiana, USA, December 7-10. ACM, New York (2003) ISBN 0-7803-8132-7 [7] Kissoum, Y., Sahnoun, Z.: Test Cases Generation for Multi-Agent Systems Using Formal Specification, http://www.cari-info.org/actes2006/135.pdf [8] Bellifemine, F., Caire, G., Greenwood, D.: AG, Switzerland, Developing Multi-Agent Systems with JADE. Wiley Publications, Chichester [9] Morrison, S.: Code Generating Automated Test Cases, [email protected] 214-769-9081 [10] Zhang, Z., Thangarajah, J., Padgham, I.: Automated Unit Testing for Agent Systems. In: Proceedings of the 7th international joint conference on Autonomous agents and multiagent systems: demo papers, pp. 1673–1674 (2008) [11] Ummu Salima, T.M.S., Askarunisha, A., Ramaraj, N.: Enhancing the Efficiency of Regression Testing Through Intelligent Agents. In: International Conference on Computational Intelligence and Multimedia Applications (2007) [12] Jeya Mala, D., Mohan, V.: Intelligent Tester –Test Sequence Optimization framework using Multi- Agents. Journal of Computers 3(6) (June 2008) [13] Coelho, R., Kulesza, U., von Staa, A., Lucena, C.: Unit Testing in Multi-agent Systems using Mock Agents and Aspects. In: Proceedings of the 2006 International Workshop on Software Engineering for Large-scale Multi-agent Systems, Shanghai, China, pp. 83–90 (2006)
A Numerical Study on B&B Algorithms for Solving Sum-Of-Ratios Problem

Lianbo Gao and Jianming Shi

Department of Computer Science and Systems Engineering, Muroran Institute of Technology
[email protected]
Abstract. The purpose of this paper is threefold: (1) offer a synopsis and algorithmic review of, and a comparison between, two branch-and-bound approaches for solving the sum-of-ratios problem; (2) modify a promising algorithm for the nonlinear sum-of-ratios problem; (3) study the efficiency of the algorithms via numerical experiments.
Keywords: Sum-of-ratios problem, fractional programming, branch-and-bound approach.
1
Introduction
In this research we consider the following Sum-Of-Ratios (SOR) problem (P), which is a class of fractional programming problems:

(P)   \max\; h(x) = \sum_{j=1}^{p} \frac{n_j(x)}{d_j(x)}
      \text{s.t. } x \in X,

where p ≥ 2 and −n_j(·), d_j(·) : R^n → R are bounded and convex in X for all j = 1, ..., p. The set X = {x ∈ R^n | Ax ≤ b} is a convex compact subset of R^n; here A is an m × n matrix and b is a vector. Generally, the sum of concave/convex ratios in (P) is not quasiconcave [Sc77]. Actually, the SOR problem has many locally optimal solutions that are not globally optimal. Theoretically, this problem is NP-hard, even in the case of the sum of a concave/convex ratio and a linear function [FrJa01, Ma01]. In that sense, locally optimal techniques are incompetent to resolve the difficulties of problem (P). The difficulty of the problem mainly arises from the number of ratios. Some algorithms have been proposed to solve the problem with a modest number of ratios [KoYaMa91, KoAb99, KoYa99, MuTaSc95, HoTu03]. The two most promising algorithms were proposed by Kuno [Ku02] and Benson [Be02]. When the number of ratios is relatively large (say, greater than 10), the branch-and-bound (B&B) approach for solving problem (P) is most powerful and simple [ScSh03]. The main differences between the two algorithms are as follows.
Corresponding author.
Kuno used a trapezoid to make a concave envelope over the function t_j/w_j, while Benson used a rectangle to do this. Therefore their B&B operators work on different feasible regions: the former is trapezoidal and the latter is rectangular. Kuno's algorithm is designed for linear ratios; Benson's is for the nonlinear ratios problem. Natural questions about the two algorithms are: which one is more efficient, and is it possible to extend Kuno's algorithm to solving the nonlinear SOR problem? To the best knowledge of the authors, there are no numerical experiments that evaluate the efficiency of Benson's algorithm. We believe that the numerical behavior of an algorithm is crucial to the algorithm, so this research provides a benchmark for other researchers working on this problem. Hence, the results and outcomes of this study will help researchers to develop well-designed global optimization software. This paper is organized as follows. In Section 2, we outline Benson's and Kuno's algorithms and their properties, and we extend Kuno's algorithm to solving the nonlinear SOR problem. The numerical efficiency is reported in Section 3. Finally, we give some concluding remarks in Section 4.
2
Review the Algorithms
In this section, we review two B&B algorithms that were proposed by Benson [Be02] and Kuno [Ku02], respectively. Denote

l_j := \min\{n_j(x) \mid x \in X\},\quad u_j := \max\{n_j(x) \mid x \in X\};
L_j := \min\{d_j(x) \mid x \in X\},\quad U_j := \max\{d_j(x) \mid x \in X\};
H_j := \{(t_j, s_j) \in R^2 \mid l_j \le t_j \le u_j,\; L_j \le s_j \le U_j\},\; j = 1, 2, \ldots, p,\quad H = \prod_{j=1}^{p} H_j.
We assume that l_j ≤ u_j and L_j ≤ U_j for j = 1, ..., p. We consider the following problem:

(P_H)   \max\; \sum_{j=1}^{p} t_j / s_j
        \text{s.t. } -n_j(x) + t_j \le 0,\; j = 1, 2, \ldots, p,
                     d_j(x) - s_j \le 0,\; j = 1, 2, \ldots, p,
                     (t_j, s_j) \in H_j,\; j = 1, 2, \ldots, p,
                     x \in X.

Throughout this paper, we denote t = (t_1, ..., t_p)^T, s = (s_1, ..., s_p)^T, and (t, s) ∈ H if and only if (t_1, s_1, ..., t_p, s_p) ∈ H. We denote by

F(H) := \{(x, t, s) \mid -n_j(x) + t_j \le 0,\; d_j(x) - s_j \le 0,\; j = 1, 2, \ldots, p,\; x \in X,\; (t, s) \in H\}
for the feasible region of problem (P_H). Thus we see that F(H) is a convex compact set. Let \hat{l}_j, \hat{u}_j, \hat{L}_j, \hat{U}_j be lower and upper bounds for t_j and s_j satisfying l_j \le \hat{l}_j \le t_j \le \hat{u}_j \le u_j and L_j \le \hat{L}_j \le s_j \le \hat{U}_j \le U_j, j = 1, 2, \ldots, p. Denote

\hat{H}_j := \{(t_j, s_j) \in R^2 \mid \hat{l}_j \le t_j \le \hat{u}_j,\; \hat{L}_j \le s_j \le \hat{U}_j\}

for j = 1, 2, ..., p and \hat{H} = \hat{H}_1 \times \hat{H}_2 \times \cdots \times \hat{H}_p. Obviously, \hat{H} is a subset of H. Within the framework of a B&B algorithm, Benson's algorithm finds a globally optimal solution of problem (P_H) by lessening the difference between the best incumbent lower bound and the upper bound. Kuno used a trapezoid to construct an approximation of the objective function for the linear SOR problem, i.e., when n_j(·) and d_j(·) are linear/affine. Denote

\Omega := \{(t, s) \in R^{2p} \mid t_j = n_j(x),\; s_j = d_j(x),\; j = 1, 2, \ldots, p,\; x \in X\}
(2.1)
and Γj := {(tj , sj ) ∈ R2 | ¯lj ≤ tj + sj ≤ u¯j }, Δj := {(tj , sj ) ∈ R2 | s¯j tj ≤ sj ≤ t¯j tj } for all j, where ¯lj ≤ min{nj (x) + d(x) | x ∈ X}, u ¯j ≥ max{nj (x) + d(x) | x ∈ X}, s¯j ≤ min{nj (x)/d(x) | x ∈ X}, t¯j ≥ max{nj (x)/d(x) | x ∈ X}.
(2.2)
As pointed out in [Ku02], all values in (2.2) can be obtained by solving a linear programming problem if n_i(·) and d_i(·) are linear. Denote \Gamma := \prod_{j=1}^{p} \Gamma_j and \Delta := \prod_{j=1}^{p} \Delta_j. Then problem (P) is equivalently rewritten as problem (M_P):

(M_P)   z = \max\Big\{ \sum_{j=1}^{p} t_j / s_j \;\Big|\; (t, s) \in \Omega \cap \Gamma \cap \Delta \Big\},    (2.3)
In the algorithm, Δ is divided into smaller trapezoids Δh (h ∈ H) such that p h h Δ = j=1 Δj ; Δj = h∈H Δhj and that the interior sets of Δhj are disjoint, i.e., intΔhj 1 ∩ intΔhj 2 = ∅ if h1 = h2 . Obviously, for each h if we can solve the following subproblem P (Δh ): p tj max z(Δh ) = s P (Δh ) (2.4) j=1 j s.t. (t, s) ∈ Ω ∩ Γ ∩ Δh . Note that Δhj is a 2-dimensional cone. Suppose that for each h Δhj := {(tj , sj ) ∈ R2 | s¯hj tj ≤ sj ≤ t¯hj tj }.
Then \Gamma_j \cap \Delta_j^h is a trapezoid with the following four vertices:

(\bar{u}_j, \bar{s}_j^h \bar{u}_j)/(\bar{s}_j^h + 1),\quad (\bar{l}_j, \bar{s}_j^h \bar{l}_j)/(\bar{s}_j^h + 1),
(\bar{u}_j, \bar{t}_j^h \bar{u}_j)/(\bar{t}_j^h + 1),\quad (\bar{l}_j, \bar{t}_j^h \bar{l}_j)/(\bar{t}_j^h + 1).
(2.5)
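For illustration, the four vertices of the trapezoid Γ_j ∩ Δ_j^h arise as the intersections of the two cone boundary lines s = s̄_j^h t and s = t̄_j^h t with the two lines t + s = l̄_j and t + s = ū_j. A small numerical sketch (the helper is our own, not from the paper):

def trapezoid_vertices(l_bar, u_bar, s_bar_h, t_bar_h):
    # Intersection of the line s = c*t with the line t + s = a is (a, c*a)/(c + 1).
    def cross(a, c):
        return (a / (c + 1.0), c * a / (c + 1.0))
    return [cross(l_bar, s_bar_h), cross(u_bar, s_bar_h),
            cross(u_bar, t_bar_h), cross(l_bar, t_bar_h)]

print(trapezoid_vertices(1.0, 4.0, 0.5, 2.0))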
If n_j(x) and d_j(x) are linear, then problem P̄(Δ^h) is equivalent to the following linear program:

\[
\bar{P}(\Delta^h)\qquad
\begin{array}{ll}
\max & \bar{z}(\Delta^h) = \displaystyle\sum_{j=1}^{p} \zeta_j \\
\text{s.t.} & (\bar{t}_j^h + 1)(s_j - \bar{s}_j^h t_j) - \bar{l}_j^h \zeta_j + \bar{l}_j^h \bar{s}_j^h \ge 0, \quad \forall j, \\
 & (\bar{s}_j^h + 1)(s_j - \bar{t}_j^h t_j) - \bar{u}_j^h \zeta_j + \bar{u}_j^h \bar{t}_j^h \ge 0, \quad \forall j, \\
 & \bar{s}_j^h \le \zeta_j \le \bar{t}_j^h, \quad \forall j, \\
 & (t, s) \in \Omega \cap \Gamma \cap \Delta^h, \; x \in X.
\end{array}
\tag{2.6}
\]

Extension of Kuno's Algorithm for Solving the Nonlinear Case

Though Kuno's algorithm is originally designed to solve a linear SOR problem, the idea behind the algorithm is useful for solving a nonlinear SOR problem [Ku02]. Now we briefly discuss an extension of Kuno's algorithm for solving the nonlinear SOR problem. When −n_j(x) and d_j(x) are convex, Ω in (2.1) is no longer convex. To keep the convexity of the feasible region, we consider the following region:

\[
\Omega_N = \left\{ (t, s) \in \mathbb{R}^{2p} \;\middle|\;
\begin{array}{l}
\min\{ n_j(x) \mid x \in X \} \le t_j \le \max\{ n_j(x) \mid x \in X \} \\
\min\{ d_j(x) \mid x \in X \} \le s_j \le \max\{ d_j(x) \mid x \in X \}
\end{array}
\right\}
\tag{2.7}
\]

Then (P) is equivalent to the following problem:

\[
Q(\Delta)\qquad
\begin{array}{ll}
\max & z(\Delta) = \displaystyle\sum_{j=1}^{p} \frac{t_j}{s_j} \\
\text{s.t.} & -n_j(x) + t_j \le 0, \quad j = 1,2,\dots,p, \\
 & d_j(x) - s_j \le 0, \quad j = 1,2,\dots,p, \\
 & (t, s) \in \Omega_N \cap \Gamma \cap \Delta, \; x \in X.
\end{array}
\tag{2.8}
\]
We note that Ω_N ∩ Γ ∩ Δ is a convex set that can be represented by linear inequalities. Then the feasible region of (2.8) is a convex set. For any Δ^h in the algorithm, the maximum z̄(Δ^h) of the following problem Q̄(Δ^h) serves as an upper bound of problem Q(Δ^h):

\[
\bar{Q}(\Delta^h)\qquad
\begin{array}{ll}
\max & \bar{z}(\Delta^h) = \displaystyle\sum_{j=1}^{p} \varphi_j^h(t_j, s_j) \\
\text{s.t.} & -n_j(x) + t_j \le 0, \quad j = 1,2,\dots,p, \\
 & d_j(x) - s_j \le 0, \quad j = 1,2,\dots,p, \\
 & (t, s) \in \Omega_N \cap \Gamma \cap \Delta^h, \; x \in X.
\end{array}
\tag{2.9}
\]
Instead of solving P̄(Δ^h) as in Kuno's algorithm, we use the idea in [Ku02] to solve the nonlinear SOR problem by solving Q̄(Δ^h). When −n_j(x) and d_j(x) are convex, the values of l̄_j and ū_j in (2.2) can be obtained by solving a d.c. program. Suppose S is a simplex containing X and all vertices V_S = {v_S^1, …, v_S^{n+1}} of S are available. We can obtain l̄_j and ū_j as follows:

\[
\bar{l}_j := \min\{ n_j(v_S^i) \mid i = 1, \dots, n+1 \} + \min\{ d_j(x) \mid x \in X \}, \qquad
\bar{u}_j := \max\{ n_j(x) \mid x \in X \} + \max\{ d_j(v_S^i) \mid i = 1, \dots, n+1 \}
\tag{2.10}
\]

and

\[
\bar{s}_j := \frac{\min\{ n_j(v_S^i) \mid i = 1, 2, \dots, n+1 \}}{\max\{ d_j(v_S^i) \mid i = 1, 2, \dots, n+1 \}}, \qquad
\bar{t}_j := \frac{\max\{ n_j(x) \mid x \in X \}}{\min\{ d_j(x) \mid x \in X \}}.
\tag{2.11}
\]
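The formulas (2.10)-(2.11) only combine a few scalar values, as the following schematic Python helper shows. It is our own sketch: the callables min_over_X and max_over_X stand for the convex (or d.c.) subproblem solvers over X mentioned above and are not implemented here.

    def nonlinear_ratio_bounds(n_j, d_j, simplex_vertices, min_over_X, max_over_X):
        """Bounds (2.10)-(2.11) for concave n_j and convex d_j.
        min_over_X(f) / max_over_X(f) are assumed to solve the corresponding
        subproblems over X; simplex_vertices are the vertices of S containing X."""
        n_at_V = [n_j(v) for v in simplex_vertices]
        d_at_V = [d_j(v) for v in simplex_vertices]
        l_bar = min(n_at_V) + min_over_X(d_j)        # (2.10)
        u_bar = max_over_X(n_j) + max(d_at_V)
        s_bar = min(n_at_V) / max(d_at_V)            # (2.11)
        t_bar = max_over_X(n_j) / min_over_X(d_j)
        return l_bar, u_bar, s_bar, t_bar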
3 Numerical Experiments
In this section, we compare the numerical behavior of the algorithms discussed in the previous sections. We believe that it is fair to evaluate different algorithms by numerical experiments that solve the same instances under the same environment. To this end, we coded both algorithms in Scilab 3.0. Table 1 provides details of the experimental environment.

Table 1. Environments of Experiments

Code:    Scilab Ver. 3.0
CPU:     Pentium 4 2.80 GHz
Memory:  512 MB
OS:      Win XP Professional
3.1 Linear Ratios Case
When n_j and d_j in (P) are linear/affine functions, (P) is a linear SOR problem. We consider the following problem in the numerical experiments:

\[
(L_e)\qquad
\begin{array}{ll}
\max & h(x) = \displaystyle\sum_{j=1}^{p} \frac{\sum_{i=1}^{n} n_{ji} x_i + c}{\sum_{i=1}^{n} d_{ji} x_i + c} \\
\text{s.t.} & \displaystyle\sum_{i=1}^{n} a_{ki} x_i \le 1.0, \quad k = 1, \dots, m, \\
 & x_i \ge 0, \quad i = 1, \dots, n.
\end{array}
\tag{3.12}
\]

In this study, n_{ji}, d_{ji} ∈ [0.0, 0.5], a_{ki} ∈ [0.0, 1.0] and c ∈ [2.0, 80.0]. Every instance is solved by both algorithms. Table 2 gives the CPU time in seconds.
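The paper specifies only the coefficient ranges for the random test problems. The generator below is our own sketch of how such (L_e) instances can be produced; the uniform sampling and the seed are assumptions, not the authors' exact procedure.

    import numpy as np

    def make_linear_sor_instance(m, n, p, c_range=(2.0, 80.0), seed=0):
        """Random (L_e) instance: sum_j (N[j]@x + c) / (D[j]@x + c) over A x <= 1, x >= 0."""
        rng = np.random.default_rng(seed)
        N = rng.uniform(0.0, 0.5, size=(p, n))   # numerator coefficients n_ji
        D = rng.uniform(0.0, 0.5, size=(p, n))   # denominator coefficients d_ji
        A = rng.uniform(0.0, 1.0, size=(m, n))   # constraint rows a_ki
        c = rng.uniform(*c_range)                # common constant c
        return N, D, A, c

    def objective(x, N, D, c):
        """Evaluate h(x) for a given point x."""
        return float(np.sum((N @ x + c) / (D @ x + c)))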
Table 2. CPU time (s) for Linear Ratios Case (Kuno, Benson), for (m, n) = (5, 5), (10, 10), (15, 15), (20, 20)
As proposed in the previous section, Kuno's algorithm can be extended to solve the nonlinear SOR problem. Now we examine the behavior of Benson's algorithm and the extension of Kuno's algorithm when −n_j and d_j are convex. It is not easy to generate a good test problem for the nonlinear SOR problem with a large number of ratios. We consider:

\[
(NL_2)\qquad
\begin{array}{ll}
\max & \displaystyle\sum_{j=1}^{p} \frac{n_{j1} x_1^2 + n_{j2} x_1 + n_{j3} x_2^2 + n_{j4} x_2 + n_{j5} x_3^2 + n_{j6} x_3 + n_{j0}}{d_{j1} x_1^2 + d_{j2} x_1 + d_{j3} x_2^2 + d_{j4} x_2 + d_{j5} x_3^2 + d_{j6} x_3 + d_{j0}} \\
\text{s.t.} & x_1 + x_2 + x_3 \le 10, \\
 & -x_1 - x_2 + x_3 \le 4, \\
 & x_1, x_2, x_3 \ge 1.
\end{array}
\tag{3.13}
\]

Here, for odd j we set (n_{j1}, n_{j2}, n_{j3}, n_{j4}, n_{j5}, n_{j6}, n_{j0}) = (−1, 4, −2, 8, −3, 12, 56) and (d_{j1}, d_{j2}, d_{j3}, d_{j4}, d_{j5}, d_{j6}, d_{j0}) = (1, −2, 1, −2, 0, 1, 20); for even j we set (n_{j1}, n_{j2}, n_{j3}, n_{j4}, n_{j5}, n_{j6}, n_{j0}) = (−2, 16, −1, 8, 0, 0, 2) and (d_{j1}, d_{j2}, d_{j3}, d_{j4}, d_{j5}, d_{j6}, d_{j0}) = (0, 2, 0, 4, 0, 6, 0). Then it is easy to confirm that n_j(x) is concave and d_j(x) is convex. Both algorithms obtain an optimal solution (1.84, 1.00, 1.00) to problem (NL_2). Table 3 reports the CPU time for various values of p.

Table 3. The CPU time (s) for solving problem (NL_2)
           p = 2   p = 4   p = 6   p = 8   p = 10
Benson        30     109     656    3222    27085
Kuno          32     116     684    3346    27499
From the limited experiments we see that Benson's algorithm is slightly better than Kuno's algorithm. We know that it is difficult to judge an algorithm without substantive experiments. From the results of our numerical experiments, it seems that there is no significant difference between the rectangular and trapezoidal divisions in the algorithms for solving the nonlinear SOR problem.
4 Concluding Remarks
We have discussed the efficiency of two of the most promising algorithms for solving the sum-of-ratios problem, Kuno's and Benson's algorithms, through numerical experiments. Kuno's original algorithm is designed for solving the linear sum-of-ratios problem. We have extended the algorithm to the nonlinear sum-of-ratios problem and compared the efficiency of Benson's algorithm and the extension. Without substantial numerical experiments it is hard to firmly evaluate any algorithm. Even so, the preliminary experiments indicate that Kuno's method might be more efficient than Benson's for solving the linear sum-of-ratios problem, while Benson's algorithm might be somewhat better than Kuno's algorithm for the nonlinear sum-of-ratios problem.
References

[Be02] Benson, H.P.: Using concave envelopes to globally solve the nonlinear sum of ratios problem. Journal of Global Optimization 22, 343–364 (2002)
[Ku02] Kuno, T.: A branch-and-bound algorithm for maximizing the sum of several linear ratios. Journal of Global Optimization 22, 155–174 (2002)
[Cr88] Craven, B.D.: Fractional Programming. Sigma Series in Applied Mathematics, vol. 4. Heldermann Verlag, Berlin (1988)
[FrJa01] Freund, R.W., Jarre, F.: Solving the sum-of-ratios problem by an interior-point method. Journal of Global Optimization 19, 83–102 (2001)
[HoTu03] Hoai Phuong, N.T., Tuy, H.: A unified monotonic approach to generalized linear fractional programming. Journal of Global Optimization 26, 229–259 (2003)
[KoAb99] Konno, H., Abe, N.: Minimization of the sum of three linear fractional functions. Journal of Global Optimization 15, 419–432 (1999)
[KoFu00] Konno, H., Fukaishi, K.: A branch-and-bound algorithm for solving low rank linear multiplicative and fractional programming problems. Journal of Global Optimization 18, 283–299 (2000)
[KoYaMa91] Konno, H., Yajima, Y., Matsui, T.: Parametric simplex algorithms for solving a special class of nonconvex minimization problems. Journal of Global Optimization 1, 65–81 (1991)
[KoYa99] Konno, H., Yamashita, H.: Minimization of the sum and the product of several linear fractional functions. Naval Research Logistics 46, 583–596 (1999)
[MuTaSc95] Muu, L.D., Tam, B.T., Schaible, S.: Efficient algorithms for solving certain nonconvex programs dealing with the product of two affine fractional functions. Journal of Global Optimization 6, 179–191 (1995)
[Ma01] Mastumi, T.: NP hardness of linear multiplicative programming and related problems. Journal of Information and Optimization Sciences 9, 113–119 (1996)
[Sc77] Schaible, S.: A note on the sum of a linear and linear-fractional function. Naval Research Logistics Quarterly 24, 691–693 (1977)
[ScSh03] Schaible, S., Shi, J.: Fractional programming: the sum-of-ratios case. Optimization Methods and Software 18, 219–229 (2003)
Development of a Digital Textbook Standard Format Based on XML

Mihye Kim1, Kwan-Hee Yoo2,*, Chan Park2, and Jae-Soo Yoo2

1 Department of Computer Science Education, Catholic University of Daegu, 330 Hayangeup Gyeonsansi Gyeongbuk, South Korea
[email protected]
2 Department of Computer Education and IIE, Department of Information Industrial Engineering, Department of Information Communication Engineering, Chungbuk National University, 410 Seongbongro Heungdukgu Cheongju Chungbuk, South Korea
{khyoo,szell,yjs}@chungbuk.ac.kr
Abstract. This paper proposes a standard format for the digital textbooks used in Korean elementary and secondary schools. This standard format comprises a function format and an XML-based document format. The format aims to maximize the effectiveness of learning by combining the usual advantages of printed textbooks with the additional functions of digital media such as searching and navigation, audiovisuals, animations, 3D graphics and other state-of-the-art multimedia functions. Another objective is to facilitate the interoperability of digital textbooks among different users and service providers by developing an XML-based document format.

Keywords: Digital textbook, Digital textbook standard format.
To utilize these advantages, several DTs have been developed. The focus of most studies on DTs, however, is on those with simple contents and technologies, which cannot completely substitute for PTs. Rather, they were conceived of as supplementary resources using a computer or a website [2], [4], [5], [6]. In addition, they are often 'computer-based books' into which the contents of PTs have been simply transferred without the advantages of traditional paper textbooks. Furthermore, the learning habits of students accustomed to PTs had not been considered in the development of such DTs [4]. As a result, they had been criticized for their lack of practical use in actual school environments [7], [8].

As a consequence of the expansion of ubiquitous computing environments, current studies on DTs are going beyond their simple supplementary functions, and the form of DTs is becoming more advanced [3], [4], [9], [10]. Digital contents, however, use a variety of different and incompatible document formats [11]; that is, there are many heterogeneous proprietary formats depending on the provider, creating usage barriers and reducing adoption rates. DTs are also often combined with a proprietary operating program, which makes their size huge and makes it difficult to revise or enhance their contents. Under this scheme, even if only part of the contents needs revising, all the contents plus the operating program must be redistributed [9]. As such, the need to develop standards for textbook contents is emerging. Once a standard has been developed, DTs could perform their intended function as the main media for teaching and learning activities in schools. In the educational business field, many companies would be able to freely participate in the development of DTs, which would make digital contents far more abundant and enable the development of new information delivery technologies [4].

The aim of this study is to define a standard format for DTs that are used in Korean elementary and secondary schools, which would include all the functions and roles of PTs while adding the advantages of digital media; that is, the aim of this study is to come up with DTs that have the advantages of PTs and additional digital media functions such as searching and navigation, and multimedia learning functions such as the provision of audiovisual content, animations, and 3D graphics, to make the learning experience more convenient and effective [3], [4], [9]. We would like to develop DTs in close adherence to the paradigm of traditional PTs, to accommodate the learning habits of students, in addition to completely substituting for the existing PTs. Another objective of this study is to facilitate the interoperability of DTs among different users and providers, as well as across different operating environments. In this paper, the proposed standard format is named the Korean Digital Textbook (KDT) standard.

The structure of this paper is as follows. In Section 2, previous studies on standards for DTs and electronic books are reviewed. Section 3 presents the research methods that were used to come up with the KDT standard. In Section 4, the developed KDT standard format is presented. The paper concludes with a discussion of possible future directions of the research described in this paper.
2 Related Work

An electronic book (e-book) is an electronic publication that can be read with the aid of a computer, a special e-book reader, a personal digital assistant (PDA), or even a
mobile phone [12]. In other words, an e-book is a book displayed on a screen, rather than on paper. Unlike an ordinary book, a textbook contains educational materials in the form of systematized knowledge or information that students and teachers use in school. Accordingly, DTs are used in a comparatively limited way, to facilitate the educational process in schools, and they are less inclusive in concept than e-books.

In Korea, as the development of DTs accelerates, a need to standardize a DT format has emerged, and some studies on the educational and technical aspects of such standardization have recently been conducted. In 2005, the Korea Education and Research Information Service (KERIS) directed a study on the standardization of Korean DTs [9]. The study defined the meaning of a DT and the functions that it should offer. In another study, a draft of a Korean standard for DTs was developed [3]. This study extended the previous study [9] and more concretely defined the DT functions with an actual example of a mathematics DT. The contents of this paper are based on these studies. In 2007, the Korean Ministry of Education, Science, and Technology established and began to promote a mid- and long-term DT commercialization strategy, which aims to develop a DT model. The ministry also planned to test-run the DTs in 100 schools nationwide by 2011, and to distribute them to all schools by 2013 [13].

At the international level, instead of studies on models and standards for DTs, several standardization studies for e-book contents have been conducted. The use of proprietary formats and the lack of standards are also major obstacles in the use of e-books [12], [14]. Having recognized these problems, the International Digital Publishing Forum (IDPF) developed a standard, the Open Publication Structure (OPS), for presenting e-book contents [15]. The OpenReader Consortium [16] is also establishing a universal end-user digital publication format named the Open eBook Publication Structure (OEBPS), which has an XML-based framework [17]. Academic publishers are also working on the development of a common XML-based format that can be used universally for digital contents [12]. In Japan, the Japanese Electronic Publishing Association announced JapaX 0.9 in 1999 as its standard for the storage and exchange of e-book contents [18]. Regarding the standardization of digital textbooks with academic contents, the IMS Global Learning Consortium is developing the Common Cartridge standard. The IMS Consortium believes that it is time to standardize DTs, to clearly define their functions, and to develop a concrete development plan for them.

As well, there are widely used international document formats for e-books or DTs such as Microsoft's XAML (eXtensible Application Markup Language) [20], Adobe's MXML (XML-based Markup Language) [21], and MS Office's open XML formats [22]. One of these might be used as the document format for Korean DTs. To make these languages usable for Korean DTs, however, they must be able to support Korean characters. As they are still incomplete in this respect, this study establishes a new standard that is suitable for Korean word processing and educational environments and that is based on the common XML document format. The KDT standard, however, refers to various international standard formats that are already defined, for international compatibility.
It is expected that DTs will ultimately integrate curriculum and classroom into one, so that in the near future, digital media will play a major role in addressing the learning demands of the digital generation.
3 Method of Development of a KDT Standard Format To develop a DT standard for Korean elementary and secondary schools, first, a literature survey was performed. Then interviews with those who had participated directly or indirectly in previous studies on DTs were conducted. Consultations with experts were subsequently undertaken. As there is no clear standardization report yet for a DT format, it was believed that the literature survey, which included case studies and DT prototypes, would be a very important method of determining the direction of this study. Consequently, the KDT concepts were established through case studies, especially by analyzing the advantages and disadvantages of previously developed DTs. In addition, mathematics textbooks for elementary and middle schools were analyzed as a model for identifying more concrete KDT functions. Due to the short duration of this study, a questionnaire survey of teachers, who were in charge of the courses, was excluded. Instead, interviews were conducted with those who had participated directly or indirectly in previous DT studies. The interviews were aimed at analyzing the role of the KDT with respect to the improvement of learning achievements, the promotion of learning interest, and support for systematic learning patterns and effective teaching methods. Another aim of the interviews was to determine and confirm the characteristics of technologies that are needed to promote learning achievements. In addition, experts were consulted to establish the basic elements and functions of the KDT standard and to examine whether the design methodologies, development guidelines and practical models, which had been developed, are appropriate or not. The experts consisted of technical staff, elementary school teachers, XML specialists, professors, elementary mathematics specialists, and educational engineering specialists who had participated in DT development projects.
4 KDT Standard Format KDT is defined as “a digital learning textbook that maximizes the convenience and effectiveness of learning by digitizing existing printed textbooks, to provide the advantages of both printed media and multimedia learning functions such as images, audiovisuals, animations, and 3D graphics as well as convenience functions such as search and navigation” [3], [9], based on previously developed concepts of DTs. Considering international compatibility, the KDT standard format refers to various international standard formats that are already defined. The standard formats referenced are for the search function, hyperlinks, multimedia, mathematical symbols and formulas, and 2D/3D graphical and animated representations. The KDT standard format consists of a function format and an XML-based document format. The function format defines the functions that must be provided by DTs. Each of these functions is defined by its name, description, and types of input values, examples, and references. The XML-based document format defines document formats for representing the DT functions using XML components to support the interoperability of DTs.
Table 1. The category number and name of a defined function in the function format

User information: Enter user name
User authentication: Register password
Display contents: Display text or images
Page view: Double-page view; Single-page view
Zoom in and out: Zoom in; Zoom out
Fit page: Fit to width; Fit to height
Page scroll: Page scrolling
Indicate page: Indicate page thickness; Indicate page number
Text hide: Text hiding
Writing: Stylus writing; Delete writing; Autosave writing; Stylus writing impossible
Memo: View memo; Enter memo; Autosave memo; Edit memo; Open memo; Delete memo; Select memo pen color; Search memo; Create a table of memo contents; Assign window position; Move memo window; Resize memo window
Notes: View notes; Enter notes; Save notes; Edit notes; Delete notes; Indicate date of notes; Save notes as an image; Save notes as a text file; Resize notes window
Underline: Underlining; Select color for underline; Select shape for line; Select thickness for line; Save underlining; Edit underlining; Delete underlining
Highlight: Highlighting; Select color for highlight; Select shape for highlight; Select thickness for highlight; Autosave highlighting; Edit highlighting; Delete highlighting
Voice memo: Record voice memo; Play voice memo; Delete voice memo
Textbox: Create textbox; Autosave textbox; Edit textbox; Delete textbox; Display hyperlinked contents in textbox; Create input box
Formula: Enter formula; Edit formula; Delete formula; View formula
Navigation function: Move by the table of contents (TOC); Move by the table of tags; Move by previous/next page; Move by page number; Look over textbook; Page turning
Bookmark: Set bookmark; Move to other bookmark; Save bookmark; Edit bookmark; Delete bookmark; Move to previous/next bookmark; Set bookmark for log-out
Search: Keyword search; Multimedia search; Search among DTs (keyword search, multimedia search)
Print function: Print textbook contents; Print memo; Print notes
Copy function: Copy text; Copy images
Sound effect: Click sound effect; Select sound effect; Error sound effect; Open/Close sound effect
Multimedia: View multimedia objects; View multimedia in new window
Interactive multimedia: View interactive multimedia; View interactive multimedia in new window
Hyperlink: Hyperlink shape; Hyperlink move
Glossary: View glossary; Search glossary
Subject menu: View additional menu for subject; Construct additional menu for new subject
Data transmission: Teacher to individuals; Teacher to groups; Teacher to class
Formative evaluations: View questions; Solve questions; View evaluation results; View statistical data
4.1 Function Format of the KDT Standard

The function format is partitioned into eight areas in order to place them appropriately in the XML hierarchy: authentication, display, input, move, search, print, multimedia support, and learning support functions. Each of these functions is further divided into sub-functions that are explained in detail with its name, description, and types of input values, examples and references. Due to space limitations, only the names and brief explanations of each function with examples for the representative features are presented. Table 1 shows the category number and name of a function defined in this function format.

Authentication. Similar to writing the owner's name on a PT, the authentication functions allow users to enter their name on a DT (i.e., the owner of the DT) when they begin to use it. These functions also include the password function, with which users can protect their private information and maintain security, and the log-in function, which certifies a user as an authorized user of the DT.

Display Functions. These functions are related to the display of the contents of a DT on the screen of a desktop, tablet PC, or notebook. Text, images, tables, and other contents of a DT are displayed, just as in PTs, but are integrated with other multimedia features such as 3D graphics, animation, and audiovisual objects. The functions of displaying the text and images of a DT, viewing a DT by a page unit (single or double page), zooming in and out, and fitting a page to the width and/or height of a screen are included in these display functions, as well as the functions of page scrolling, indicating the page thickness, page numbering, and text hiding. The page scroll function controls the movement of the viewing window up and down, and left and right, and the textbook's thickness. The page number function shows the changes in the textbook's thickness, depending on the number of pages. The text hiding function includes the functions of either hiding or showing certain portions of the contents, such as answers or explanation materials. When zooming in, some portions of the content may extend beyond the screen, for which horizontal and vertical scrolling functions are made available as needed.

Input Functions. The input functions control data entry with several input devices such as a pen, keyboard, or mouse, and include the stylus writing, memo, note, underline, highlight, voice memo, textbox, and formula functions. The stylus writing allows users to write onto any portion of the DT. The memo and note functions enable the entry, editing, saving, opening, deletion, and viewing of memos and notes. The functions of selecting a memo pen color and creating a table of memo contents, as well as the functions of indicating the date and time of notes and saving notes as images or text files, are also defined in these memo and note functions. The underline function includes the functions of editing, saving, and deleting underlined text by selecting the color, shape, and thickness of the line, and the highlighting function using a highlight pen. The input functions also contain functions related to voice memos, textbox inputting, the creation of an input box in which users can enter the answers to questions in formative evaluation, and the formula edit function. Fig. 1 shows examples of these input functions.
Fig. 1. Examples of the stylus writing, memo, note, underline, and formula functions
Move Functions. The move functions allow users to quickly and easily move to a certain location in a DT using the Table of Contents (TOC), the Table of Bookmarks, and the Table of Tags, entering a page number, moving to a page using the previous/next button, browsing, and page turning. The move functions also include the bookmark function, with which users can save the location of certain contents that they want to remember in the DT. The log-out function is also included in the move functions to provide a related functionality by which users can move quickly to the last page that they were studying when they open the textbook. Fig. 2 shows examples of the move function.
Fig. 2. Examples of moving to a page via the Table of Contents, the page number, the scroll bar, and the arrow bar
Search and Print Functions. Users can navigate around DTs more conveniently via the search functions than via the move functions. Two main search methods, text search and multimedia search, are supported. That is, users can find specific content or multimedia information not only within a DT, but also among several DTs by entering any text word or multimedia object. The print functions let the users output the contents of a DT by page, section, or chapter, or even the whole book, via a printer or other printing device. User-created memos and notes can also be printed, in full or in
part. As well, the copy function is defined to allow copying of a specific part of a DT, such as text, images and memos, to a word processor or other program.

Multimedia Support Functions. The multimedia support functions help users' visual understanding through multimedia features to enhance the learning experience. For example, teachers can use motion pictures to attract the interest of students at the beginning of the class as shown in the left screen of Fig. 3. In another case, with instructions involving diagrams or geometry, teachers can present such objects in 3D motion graphics, moving them right and left or up and down as shown in the right screen of Fig. 3. That is, the DT can enhance students' understanding by presenting pictures or images in 3D graphics rather than in the 2D graphics of PTs. When a complex calculation is required in the teaching process, tools like a calculator and a specific program can be provided. The functions support multimedia objects such as pictures, voice, animations, graphics, and motion pictures. The multimedia support functions also define the functions for opening multimedia in a new window and for interactive multimedia to support interaction between the user and the DT. Fig. 3 shows examples related to the multimedia support functions. The multimedia objects can be selected and activated by using a mouse, electronic pen, or keyboard.
Fig. 3. Examples of motion pictures and 3D motion graphics
Learning Support Functions. These functions give users access to additional information to facilitate their understanding in the learning process using a DT. A hyperlink format for referring to specific contents inside and outside the DT is defined in this function. Text, multimedia objects, and websites at a specific location in the DT can be designated with hyperlinks. Hyperlinks can be shown to students in various formats, depending on the operating environment and services. These functions also include dictionary or glossary mechanisms to support the DT, including glossary search. Additional menus for each subject can be constructed, and therefore, this function defines the feature for formulating individually designed menus for each subject. In addition, functions for teachers to send data to individuals, groups, or the entire class are defined. Furthermore, as shown in Fig. 4, teachers can set up formative and summative evaluations for individuals, groups, or their whole class, and they can also formulate statistical data based on the evaluation results. For example, a teacher can register formative evaluation questions to estimate the learning achievements of
students during the instruction process. The students can then solve the problems directly on the DT and send their answers to the teacher, who can evaluate the answers immediately and return the results to the students with feedback. Moreover, the teacher can determine the follow-up instructional contents according to the results of the evaluation; that is, the teacher can adopt an individualized educational or instructional approach based on each student's level. Fig. 4 shows examples of the learning support functions. The left screen of Fig. 4 shows an example of a formative evaluation exam formed by a number of questions, whereas the right screen shows an answer sheet digitized for summative evaluation. As such, the DT can perform various online evaluations.
Fig. 4. Examples of the learning support functions
4.2 XML-Based Document Format of the KDT Standard

The XML document format consists of the basic information document format and the supplementary information document format. The basic information represents the digital textbook contents as XML documents to enable the use of the same contents in different environments. The format for supplementary information presents various additional information that users create while using a DT through its operating environment or a viewer. To manage such additional information and to use it with the same meaning in different environments, it is maintained in the XML document format. The XML-based document format for the KDT contents refers to the Document Type Definition (DTD) that defines the schema for the XML document structure to describe the contents of the original DT. The XML elements and attributes are defined in the DTD format. XML makes it possible to distinguish the contents of a document in terms of their meaning, rather than in terms of their style, and to express the contents hierarchically. An XML document does not include information on style, because it creates its style using the eXtensible Stylesheet Language (XSL) or Cascading Style Sheets (CSS). Similarly, the DT functions are defined via XSL or CSS, so that the user can view the contents of the DT regardless of the type of style sheet language that was used.
Fig. 5. A hierarchical structure of the XML-based KDT
Basic Structure of the XML Document Format. Fig. 5 shows the hierarchical structure of the XML document of the KDT. The hierarchical structure of the XML document format starts with the top-level (i.e., root) element 'kdt'. It is divided into the metadata region and the content region. The metadata region contains the basic information of the DT and the content region contains all learning contents of the DT. That is, the element 'kdt' consists of the 'metainfo' and 'textbooks' elements. The element 'metainfo' is formed by the logical and additional information of a document. The element 'textbooks' expresses the contents of one or more actual textbooks with a hierarchical structure. The element 'dc-metainfo' is divided into the 15 elements (dc: ***) defined by Dublin Core (the Dublin Core Metadata Initiative: http://dublincore.org/), and the element 'x-metadata' that the user can extend and define. The service unit of the KDT can consist of more than one textbook, i.e., it can be composed of a collection of textbooks. The basic structure of a collection is constructed with the 'cover', 'front', 'textbook(s)' and 'back' elements in a hierarchy, and one textbook is structured with the 'cover', 'front', 'body' and 'back' elements. The element 'body' is the main part of a textbook, and consists of the 'part(s)', 'chapter(s)', 'section(s)', and 'media object(s)'. Additional elements can be appended into the element 'body' and the 'section' can have 'subsection(s)'.
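As a concrete illustration of this hierarchy, the Python sketch below builds a minimal document skeleton using the element names described above (kdt, metainfo, dc-metainfo, textbooks, textbook, cover, front, body, chapter, section, back) and the common attributes id and lang. The sample title, language, and text values are placeholders, and the sketch does not reproduce the full content models of the actual KDT DTD.

    import xml.etree.ElementTree as ET

    ET.register_namespace("dc", "http://purl.org/dc/elements/1.1/")
    DC = "{http://purl.org/dc/elements/1.1/}"

    # Skeleton of the hierarchy in Fig. 5: kdt -> (metainfo, textbooks);
    # textbook -> (cover, front, body, back); body -> chapter -> section.
    kdt = ET.Element("kdt")
    metainfo = ET.SubElement(kdt, "metainfo")
    dc_meta = ET.SubElement(metainfo, "dc-metainfo")
    ET.SubElement(dc_meta, DC + "title").text = "Grade 6 Mathematics"
    ET.SubElement(dc_meta, DC + "language").text = "ko"

    textbooks = ET.SubElement(kdt, "textbooks")
    textbook = ET.SubElement(textbooks, "textbook", id="tb1")
    ET.SubElement(textbook, "cover")
    ET.SubElement(textbook, "front")
    body = ET.SubElement(textbook, "body")
    chapter = ET.SubElement(body, "chapter", id="ch1")
    section = ET.SubElement(chapter, "section", id="ch1-s1", lang="ko")
    section.text = "Fractions and ratios ..."
    ET.SubElement(textbook, "back")

    print(ET.tostring(kdt, encoding="unicode"))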
Document Format for Basic Information. The basic information document format defines the functions of the KDT standard that should be presented in XML. First, the basic attributes that must be included in all XML elements are defined. Then the root element, the elements for the basic information, the elements related to metadata, and the elements that express the learning contents are defined.

Common Attributes of XML Elements. All elements of the XML document format have common attributes. These are 'id' for reference, 'role' for distinguishing among elements, 'lang' to indicate the language used, 'comment' to describe the objective of the DT, 'revision' for the change history of the DT, and 'hdir' and 'vdir', which define horizontal or vertical writing. Table 2 shows these common attributes.

Table 2. Common attributes of XML elements

Name      Type                            Usage
id        ID                              Identifier
role      CDATA                           Distinguishing of elements
lang      NMTOKEN                         Contents language
comment   CDATA                           Comments
revision  (changed|added|deleted|none)    Revision
hdir      (ltr|rtl)                       Horizontal writing
vdir      (ttb|btt)                       Vertical writing
The attribute 'id' is a unique identifier to distinguish elements. It can be used to construct a link in any place within the document. This attribute is declared using the ID type. In general, the value of the 'id' attribute can be omitted, except for some elements. The attribute 'role' is the extension of an element that specifies the requests of the style-sheet processing with keywords whenever necessary. It is also used when an additional explanation is required for an element. It is declared using the CDATA type. The attribute 'lang' describes the language of the content markup. Using this attribute, different fonts can be used in the style-sheet if necessary. It is declared with the NMTOKEN type. The attribute 'comment' can be used to describe the purpose of the DT. The attribute 'revision' can be used to specify the history of content changes after publication of the DT. The attributes 'hdir' and 'vdir' are used to specify the horizontal direction or the vertical direction, respectively, in presenting the contents. The types of the attribute 'hdir' are ltr (left to right) and rtl (right to left), and the types of 'vdir' are ttb (top to bottom) and btt (bottom to top). The actual presentation of the contents is processed by the style-sheet.

Elements for Basic Information. We divided these into three categories and defined them accordingly: the root and basic elements, the metadata elements, and the learning content elements. Fig. 6 shows the definition of the element 'kdt'. Other elements are also defined in the same way as the element 'kdt'. Since the volume of all the elements is very large, the explanations, properties, and usage of other elements were excluded, and only the name and usage of the elements were described. Tables 3 and 4 show the XML elements of the basic information defined in the XML document format.
Fig. 6. Definition of the element 'kdt'

Table 3. Basic elements of the basic information

Element      Usage
kdt          Korean Digital Textbook
textbooks    Textbooks
textbook     Textbook
cover        Cover
isbn         ISBN
volid        Volume identifier
front        Front matter
preface      Preface
foreword     Foreword
intro        Introduction
body         Body matter
back         Back matter
affiliation  Author's affiliation
role         Role indicator
vita         Curriculum vita
stylesheet   Print style sheet information
Document Format for Supplementary Information. During the use of a DT through its operating environment or browser, teachers or students can create various additional information. To save and manage such information and to use it with the same meaning in different environments, the XML document format for this additional information is defined independently from the XML document format for the DT contents. The XML document format for the supplementary information refers to the DTD that defines the schema of the XML document structure. The XML elements and attributes that are suggested in this paper can be defined either via the DTD format or the XML schema format. Additional formats for other information that are not defined in the KDT document format can be further defined, if necessary.

The supplementary information is divided into, and defined accordingly as, the formats for user certification; the formats for saving stylus writing, memos, notes, underlines, highlights, voice memos, textboxes, and bookmarks; and the formats for the glossary, formulas, and additional menus for each subject. In this paper, the XML document structures for each format are not described due to the limited space. Table 5 shows the XML elements defined in the document format for the supplementary information.

Table 5. Elements of the supplementary information

Supplementary information: User Authentication, Stylus Writing, Memo, Note, Underline, Highlight, Voice Memo, Textbox, Bookmark, Glossary, Formula, Additional Menus
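As a small illustration of how one piece of supplementary information might be serialized, the sketch below writes a bookmark record as XML. The child element and attribute names used here (location, created, label, page) are hypothetical examples of our own, not the normative element set defined by the KDT standard.

    import xml.etree.ElementTree as ET

    # Hypothetical supplementary-information record for a bookmark.
    supp = ET.Element("supplementaryinfo")
    bm = ET.SubElement(supp, "bookmark", id="bm1", role="user-created")
    ET.SubElement(bm, "location", textbook="tb1", page="27")
    ET.SubElement(bm, "created").text = "2010-03-02T10:15:00"
    ET.SubElement(bm, "label").text = "Review before quiz"
    print(ET.tostring(supp, encoding="unicode"))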
5 Conclusion

A DT is not only a very important learning tool for the future of education, but also an intelligent tool that can support better learning environments for students. For more advanced education, more improved tools, textbooks, and devices will be needed. As a consequence, much active research on DTs has been done, and many companies and research institutions have developed various types of DTs. However, most DT contents so far have used proprietary document formats, often making them incompatible. These proprietary formats and the lack of standards for DTs are major obstacles in the evolution of DTs. Accordingly, there is a need to establish a standard format for DTs in Korea.

In response to such a need, we have developed a DT standard format for Korean elementary and secondary school textbooks, referred to as the 'KDT standard format'. This standard integrates the advantages of traditional PTs with the advantages of multimedia learning functions, such as searching and navigation, as well as audiovisual,
animation, and 3D graphics and animated representations. In addition, the standard format supports interoperability among different providers and users by defining the XML-based document format.

We believe that this study is valuable because this standard format can lead users to more active use of DTs by accommodating the paradigm of PTs to the maximum extent. It can also allow interoperability of DTs among different users and providers, so that many business companies could freely participate in the development of DTs. Moreover, the KDT functions defined in this paper can be used as guidelines for the development of DTs by many publishers. Furthermore, with many companies taking part, digital contents will be enriched and made more useful for learning, so that the quality of DTs would be improved. Once DTs are actively used in classrooms, they should create a marked change in Korea's educational environment by helping teachers go beyond unilateral teaching methods that are intensively based on teachers' lectures toward interactive and self-directed learning.

To verify the effectiveness of the proposed KDT standard format and to formulate an improvement plan, a DT was developed for Grade 6 mathematics. For the experiment, test runs of the mathematics DT were carried out in four elementary schools for two semesters [23]. The experiment results showed that the mathematics DT was actively used in mathematics classes in a number of ways that were identical to how PTs are used. In addition, the results of the survey on user satisfaction with the DT indicated that overall satisfaction was very high and there were very positive responses on the DT usage and learning activity environments. Furthermore, the results of the learning achievement test showed the possibility that the more familiar the students become with the use of DTs, the greater the effectiveness of the DT usage class will be [23].

DTs continue to evolve. For the technical and pedagogical advancement of DTs, studies and test-runs must continue over an extended period. Many issues must be addressed before DTs can become widely used. These issues are related not only to the development of textbooks, but also to changes in many aspects of the educational system, particularly the curriculum, the distribution systems, and education for teachers. Furthermore, previous experience shows that using DTs may create various new needs among teachers and students. Some of these issues may be addressed when they are recognized as the natural course of changes in society in accordance with the introduction and application of DTs. Others may require proactive efforts from the Korean educational system and collaboration among the business and government sectors, particularly regarding technology issues.

Acknowledgments. This work was supported by the MEST and KOTEF through the Human Resource Training Project for Regional Innovation and by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (The Regional Research Universities Program and Chungbuk BIT Research-Oriented University Consortium).
References

1. Tonkery, D.: E-Books Come of Age with Their Readers. Research Information 24, 26–27 (2006)
2. Yim, K.: Future Education and Digital Textbook. Journal of Korea Textbook Research, Korea Textbook Research Foundation 51, 6–12 (2007)
3. Byun, H., Cho, W., Kim, N., Ryu, J., Lee, G., Song, J.: A Study on the Effectiveness Measurement on Electronic Textbook, Korean Education & Research Information Service, Research Report CR 2006-38, Republic of Korea (2006)
4. Yoo, K., Yoo, J., Lee, S.: The present state of the standardization of digital textbooks. Review of Korean Institute of Information Scientists and Engineers 26(6), 53–61 (2008)
5. Son, B.: A Concept and the Possibility of Digital Textbook. Journal of Korea Textbook Research, Korea Textbook Research Foundation 5, 13–19 (2007)
6. Yun, S.: A Study on Some Problems in the Adapting of Digital Textbook. Journal of Korea Textbook Research, Korea Textbook Research Foundation 51, 20–26 (2007)
7. Kim, N.: Design and Implementation of Electronic Textbook for High School Based on XML, Master Thesis, Graduate School of Education, Yonsei University (2001)
8. Kim, S.: Development of a User Interface Prototype for Electronic Textbook System, Master Thesis, Department of Computer Science Education, Hanyang University (1998)
9. Byun, H., Yoo, K., Yoo, J., Choi, J., Park, S.: A Study on the Development of an Electronic Textbook Standard in 2005, Korean Education & Research Information Service, Research Report CR 2005-22, Republic of Korea (2005)
10. Kim, J., Kwon, K.: The Features of Future Education in Ubiquitous Computing Environment, Korean Education & Research Information Service, Research Report KR 2004-27, Republic of Korea (2004)
11. Son, B., Seo, Y., Byun, H.: Case Studies on Electronic Textbook, Korean Education & Research Information Service, Research Report RR 2004-5, Republic of Korea (2004)
12. Nelson, M.R.: E-Books in Higher Education: Nearing the End of the Era of Hype? EDUCAUSE Review 43(2), 40–56 (2008)
13. Ministry of Education & Human Resources Development (Before: Ministry of Science, Education and Technology), Strategy for Commercial Use of Digital Textbook, Republic of Korea (2007)
14. Hillesund, T., Noring, J.E.: Digital Libraries and the Need for a Universal Digital Publication Format. Journal of Electronic Publishing 9(2) (2006)
15. Open Publication Structure (OPS) 2.0 v0.984, http://www.idpf.org/2007/ops/OPS_2.0_0.984_draft.html
16. OpenReader Consortium, http://www.openreader.org/
17. Open eBook Publication Structure (OEBPS) 1.2, http://www.idpf.org/oebps/oebps1.2/index.htm
18. Japan Electronic Publishing Association, http://www.jepa.or.jp/
19. IMS Global Learning Consortium: Common Cartridge Working Group, http://www.imsglobal.org/commoncartridge.html
20. Microsoft Corporation, XAML Overview, http://msdn.microsoft.com/en-us/library/ms747122.aspx
21. Coenraets, C., Evangelist, M.F.: An overview of MXML: The Flex markup language, Adobe Systems Incorporated, http://msdn.microsoft.com/en-us/library/ms747122.aspx
22. Adobe Systems Incorporated, Developing Applications in MXML, http://livedocs.adobe.com/flex/3/html/help.html?content=mxml_1.html
23. Lee, S., Yoo, J., Yoo, K., Byun, H., Song, J.: Design and Implementation of e-Textbook Based on XML. Journal of Korea Contents Association 6(6), 74–87 (2006)
A Pattern-Based Representation Approach for Online Discourses

Hao Xu1,2,⋆

1 College of Computer Science and Technology, Jilin University, China
2 Department of Information Science and Engineering, University of Trento, Italy
[email protected]
Abstract. Navigation online is one of the most common daily experiences in the research communities. Although researchers could get more and more benefits from the development of Web Technologies, most online discourses today are still the electronic facsimiles of traditional linear structured articles. In this paper, we propose a pattern-based representation approach to providing readers another efficient means to get into desirable knowledge units directly without being overwhelmed by additional detailed information. Our ultimate goal is to facilitate a new format of reading and search online for scientific discourses.
1 Introduction
Online publishing makes scientific publications much easier to disseminate, navigate, and reuse in research communities. Moreover, the representation formats for online discourses are significant for their efficiency and effect. In the last decade, a handful of models targeting research on scientific discourse representations were proposed, based on Cognitive Coherence Relations [1] or the Rhetorical Structure Theory [2]. Nevertheless, there does not appear to be a unified, widely used discourse knowledge representation model, especially for the ubiquitous Scientific Knowledge [3] on the Semantic Web. In this work, we tackle this problem using a semantic pattern approach inspired by the Pattern Language [4] of Christopher Alexander and the Semantic Patterns [5] of Steffen Staab et al. We focus on how patterns can be applied to describe knowledge representation, composition, and relations, and how a semantic framework can be applied to categorize and retrieve that knowledge at both the data and metadata levels, which facilitates visualization for readers.

In the remainder of the paper, we first describe related work and projects in scientific discourse representation in Section 2, and proceed with an outline of our pattern approach to the problem along with a case study in Sections 3 and 4. Section 5 concludes.

⋆ This research was done in the KnowDive Group, University of Trento, supervised by Prof. Fausto Giunchiglia.
2 Related Work
This section presents a succinct overview of existing dominant scientific publication representation models and projects. Conceptually, all of them share a similar representation form with the features of coarse-grained rhetorical structure, fine-grained rhetorical structure, relations, domain knowledge and shallow metadata support [6].

The ABCDE Format is proposed by Anita de Waard et al. It provides an open standard and widely reusable format for creating rich semantic structures for articles during writing. The letters ABCDE stand for "Annotation", "Background", "Contribution", "Discussion", and "Entities" respectively [7]. Using this format, people can easily mark papers semantically, especially in the LaTeX editing environment.

The Scholarly Ontologies Project, led by Simon Buckingham Shum et al. at the Open University, aims at building and deploying a prototype infrastructure for making scholarly claims about the significance of research documents. "Claims" are made by building connections between ideas. The connections are grounded in a discourse/argumentation ontology, which facilitates providing services for navigating, visualizing and analyzing the network as it grows. They also implemented a series of software tools such as ClaiMaker, ClaimFinder, ClaimBlogger and so on.

The SWRC (Semantic Web for Research Communities) Project specifies an ontology for research communities, which describes several entities related to the research community, such as persons, organizations, publications, and their relationships. It is widely used in a number of applications and projects such as the AIFB portal, Bibster and the SemIPort project [8]. It aims at facilitating the distribution, maintenance, interlinking and reuse of scientific resources.

SALT (Semantically Annotated LaTeX) is developed by the Digital Enterprise Research Institute (DERI) Galway. It provides a semantic authoring framework which aims at enriching scientific publications with semantic annotations, and it can be used both during authoring and post-publication. It consists of three ontologies, i.e., a Document Ontology, a Rhetorical Ontology, and an Annotation Ontology [9], which deal with annotating the linear structure, the rhetorical structure, and the metadata of the document respectively.

The Liquid Publication Project aims to take advantage of the achievements of these ongoing projects and propose possible solutions for managing ubiquitous Scientific Knowledge Objects during their creation, evolution, collaboration and dissemination. It is also dedicated to providing a viable means to generate semantic documents for scientific publications in a simple and intuitive way, which will essentially be an extension and complement of the ABCDE format and SALT.
A pattern for scientific papers in our approach is described in Table 1 by sections as follows:

Table 1. Description of Patterns for Scientific Publications

Section               Description
Pattern Name          Meaningful descriptor of the pattern.
Intent                Short statement of which situation the pattern addresses.
Document Structure    The linear structure of the traditional printable paper.
Rhetorical Structure  The rhetorical structure hidden in a publication's content.
Metadata              A categorized metadata schema for various types of scientific publications [10].
Related Entities      Entities related to research communities mentioned or annotated in the publication's content.
Related Patterns      A classification of related patterns in our proposed Pattern Repository or other imported ontologies.
Example               Shows how to put this pattern into practice.
We provide an open-standard, widely (re)usable rhetorical structure for both authoring and post-publication, which is an extension of the ABCDE format and the Cell format5 for modeling different types of papers instead of being either too general or too specific. Here are some patterns for papers predefined in our Pattern Repository:

001. Theoretical theory development: Syllogistic Reasoning
002. Theoretical Modeling: Syllogistic Reasoning
003. Inductive Reasoning
004. Case Study
005. Positivistic Reasoning
006. Reasoning by Falsification
007. Vision Paper
008. Survey Paper
009. Evaluation Paper
010. Book Review
011. Critique
012. PhD Thesis
......
4 Pattern Example: An Evaluation Paper Pattern

An Evaluation Paper Pattern is defined as follows:

– Pattern Name: Evaluation Paper
– Intent: Used for Evaluation Papers' writing and reading.
5 Article of the Future: http://beta.cell.com/index.php/2009/07/article-of-the-future/
– Document Structure: Section, Subsection, Acknowledgement, References.
– Rhetorical Structure: Summary, Background, Problem, Algorithm, Data Set, Results, Discussion, References.
– Metadata: There are seven categories of metadata for this pattern, namely General, Life Cycle, Content, Technical, Intellectual Property, Reputation, and Meta-metadata. A metadata specification for this pattern is proposed in [10].
– Related Entities: Entities of Person, Institution, Conference, Project and so on, highlighted or annotated in the paper.
– Related Patterns: The classification of the Pattern Repository and other imported ontologies/classifications such as the ACM Classification, the SALT Ontology, and the Bibliographic Ontology.
– Example: A Large Scale Taxonomy Mapping Evaluation. URL: http://www.dit.unitn.it/~yatskevi/taxme1.pdf

We take this example paper mentioned above as a case study and also tentatively propose our interface for scientific publication patterns.
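To make the pattern sections of Table 1 concrete, the following Python sketch (our own illustration, not an artifact described in the paper) encodes the Evaluation Paper pattern as a simple in-memory record:

    from dataclasses import dataclass, field
    from typing import List, Dict

    @dataclass
    class PublicationPattern:
        """In-memory sketch of the pattern sections listed in Table 1."""
        name: str
        intent: str
        document_structure: List[str]
        rhetorical_structure: List[str]
        metadata: Dict[str, str] = field(default_factory=dict)
        related_entities: List[str] = field(default_factory=list)
        related_patterns: List[str] = field(default_factory=list)
        example: str = ""

    evaluation_paper = PublicationPattern(
        name="Evaluation Paper",
        intent="Used for evaluation papers' writing and reading",
        document_structure=["Section", "Subsection", "Acknowledgement", "References"],
        rhetorical_structure=["Summary", "Background", "Problem", "Algorithm",
                              "Data Set", "Results", "Discussion", "References"],
        metadata={"categories": "General, Life Cycle, Content, Technical, "
                                "Intellectual Property, Reputation, Meta-metadata"},
        related_entities=["Person", "Institution", "Conference", "Project"],
        related_patterns=["ACM Classification", "SALT Ontology", "Bibliographic Ontology"],
        example="A Large Scale Taxonomy Mapping Evaluation",
    )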
Fig. 1. Document Structure
Figure 1 depicts the document structure of the paper. It provides a traditional means of reading as a linear structure by sections. The rhetorical structure of the paper is illustrated in Figure 2. The paper is reorganized by a pattern of rhetorical elements including Summary, Background, Problem, Algorithm, Data Set, Results, Discussion, and References, which is a more integrated and linked structure. A reader can easily access in-depth information on a specific algorithm or on experimental results, for instance, without being overwhelmed by additional explanatory details. In this case, an expert in Semantic
Mapping area could use this rhetorical structure to read the concrete algorithm more directly and efficiently. In Figure 3, we describe the metadata schema for the Evaluation Paper Pattern. A metadata schema comprises sets of attributes. An attribute represents a property of an object as a name-value pair, where the name of the attribute identifies its meaning and the value is an instance of a particular data type. The data type
Fig. 2. Rhetorical Structure
Fig. 3. Metadata
A Pattern-Based Representation Approach for Online Discourses
383
of an attribute could be either a simple data type, e.g., Integer, Float, or Date, or an entity type such as Person, Conference or Project. A reader could flexibly use these categorized metadata for search or navigation. We always provide attribute values with clickable hyperlinked URLs.

Figure 4 lists sets of related entities. Entities are differentiated by both entity types and various relationships. For example, some of the listed conferences are in a "PresentedIn" relation and others in a "ReferencedConference" relation. More detailed related entities, such as Person, Institution and Project, are not shown in this figure. The last function tab is "Related Patterns", which is described in Figure 5. Here we can clearly find the classification of related patterns in the Pattern Repository we proposed, or we can optionally classify papers into other imported classifications or ontologies.

Patternized modularization is a viable means to externalize Scientific Knowledge with hidden rhetorical structures and linked knowledge, which makes it feasible to search and navigate scientific publications more efficiently and semantically. We intend to extend our investigations to other different types of
Fig. 4. Related Entities
Fig. 5. Related Patterns
Scientific Publications and to specify all these patterns in RDF (Resource Description Framework)8 in the near future.
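As a hint of what such an RDF encoding might look like, the sketch below expresses the Evaluation Paper pattern as triples with rdflib. The vocabulary URI and the property names (pat:intent, pat:hasRhetoricalBlock) are hypothetical placeholders of our own, since the paper does not fix an RDF schema.

    from rdflib import Graph, Namespace, Literal
    from rdflib.namespace import RDF, RDFS

    # Hypothetical vocabulary for publication patterns.
    PAT = Namespace("http://example.org/patterns#")

    g = Graph()
    g.bind("pat", PAT)
    p = PAT.EvaluationPaper
    g.add((p, RDF.type, PAT.PublicationPattern))
    g.add((p, RDFS.label, Literal("Evaluation Paper")))
    g.add((p, PAT.intent, Literal("Used for evaluation papers' writing and reading")))
    for part in ["Summary", "Background", "Problem", "Algorithm",
                 "Data Set", "Results", "Discussion", "References"]:
        g.add((p, PAT.hasRhetoricalBlock, Literal(part)))

    print(g.serialize(format="turtle"))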
5 Conclusion
In this paper, we proposed a pattern-based approach to the representation of Scientific Publications. The new visualization format, consisting of document structure, rhetorical structure, metadata, related entities and related patterns, provides more choices to readers when they navigate and search for desirable knowledge. In the future, we will refine our pattern repository and pattern structures. Our main goal is to help researchers read articles online more efficiently and to keep our pattern approach in step with the rapid development of the Semantic Web.
References

[1] Spooren, W.P.M., Noordman, L.G.M., Sanders, T.J.M.: Coherence relations in a cognitive theory of discourse representation. Cognitive Linguistics 4(2), 93–133 (1993)
[2] Thompson, S.A., Mann, W.C.: Rhetorical structure theory: A theory of text organization. Technical report, Information Science Institute (1987)
[3] Fausto, G., Ronald, C.: Scientific knowledge objects v.1. Technical report, University of Trento, Dipartimento di Ingegneria e Scienza dell'Informazione, Trento, Italy (2009)
[4] Ishikawa, S., Silverstein, M., Jacobson, M., Fiksdahl-King, I., Schlomo, A., Alexander, C.: A pattern language: Towns, buildings, construction. Addison-Wesley, Boston (1977)
[5] Maedche, A., Staab, S., Erdmann, M.: Engineering ontologies using semantic patterns. In: Proceedings of the IJCAI 2001 Workshop on E-Business & the Intelligent Web, Seattle, WA, USA, August 5 (2001)
[6] Buckingham, T.C.S., Groza, S.T., Handschuh, S., de Waard, A.: A short survey of discourse representation models. In: Proceedings 8th International Semantic Web Conference, Workshop on Semantic Web Applications in Scientific Discourse, Washington, DC. LNCS, Springer, Berlin (2009)
[7] de Waard, A., Tel, G.: The ABCDE format enabling semantic conference proceedings. In: SemWiki (2006)
[8] Sure, Y., Bloehdorn, S., Haase, P., Hartmann, J., Oberle, D.: The SWRC ontology - semantic web for research communities. In: EPIA, pp. 218–231 (2005)
[9] Groza, T., Handschuh, S., Möller, K., Decker, S.: SALT - semantically annotated LaTeX for scientific publications. In: Franconi, E., Kifer, M., May, W. (eds.) ESWC 2007. LNCS, vol. 4519, pp. 518–532. Springer, Heidelberg (2007)
[10] Xu, H., Giunchiglia, F.: Scientific knowledge objects types specification. Technical report, University of Trento, Dipartimento di Ingegneria e Scienza dell'Informazione, Trento, Italy (2009)
A Fault Tolerant Architecture for Transportation Information Services of E-Government
Woonsuk Suh 1, Boo-Mann Choung 1, and Eunseok Lee 2
1 National Information Society Agency, NIA Bldg, 77, Mugyo-dong, Jung-ku, Seoul, 100-775, Korea
[email protected], [email protected]
2 School of Information and Communication Engineering, Sungkyunkwan University, 300 Chunchun Jangahn, Suwon, 440-746, Korea
[email protected]
Abstract. Many governments have been deploying Intelligent Transportation Systems (ITS) nationwide based on the National ITS Architecture. Real-time transportation information is one of the key services of an electronic government (e-government) and of the ITS. The ITS combines advanced communications, electronics, and information technologies to improve the efficiency, safety, and reliability of transportation systems. The core functions of the ITS are the collection, management, and provision of real-time transport information, and because it consists of interconnected heterogeneous systems across national and local governments, it can be deployed efficiently on the Common Object Request Broker Architecture (CORBA) of the Object Management Group (OMG). Fault Tolerant CORBA (FT-CORBA) supports the real-time requirements of transport information reliably through redundancy, namely the replication of server objects. However, object replication, its management, and the related protocols of FT-CORBA require extra CPU and memory resources and can degrade system performance both locally and as a whole. This paper proposes an improved architecture that enhances the performance of FT-CORBA based ITS by generating and managing object replicas adaptively during system operation with an agent. The proposed architecture is expected to be applicable to other FT-CORBA based systems for an e-government. Keywords: CORBA, E-Government, Fault Tolerance, Transportation.
through many regions to reach their destinations. Second, travelers should be able to receive real-time information from many service providers while driving at high speed, and transport information should be collected and transmitted to them in real time. Third, the update cycle of transport information to travelers is 5 minutes internationally, as in the Vehicle Information and Communication System (VICS) in Japan [19]. The ITS is deployed by various independent organizations and therefore operates on heterogeneous platforms to satisfy the characteristics, functions, and performance requirements described earlier. FT-CORBA with stateful failover is needed to satisfy the real-time requirements of transport information, considering the update cycle of 5 minutes. In stateful failover, checkpointed state information is periodically sent to the standby object so that, when the object crashes, the checkpointed information can help the standby object restart the process from that point [18]. FT-CORBA protocols need additional CORBA objects such as the Replication Manager and Fault Detectors, server object replicas, and communications for fault tolerance, and therefore consume additional CPU and memory, which can cause processing delays and thereby deteriorate performance. A processing delay can amount to a failure for real-time transportation information services. This paper proposes an agent-based architecture to enhance the performance of FT-CORBA based ITS. Because of the real-time and composite characteristics of ITS, the proposed architecture is expected to be applicable to most applications. In Section 2, CORBA based ITS and FT-CORBA related work are presented. In Section 3, the proposed architecture introduces an agent to enhance the performance of FT-CORBA based ITS. In Section 4, the performance of the proposed architecture is evaluated by simulation, focusing on CPU and memory usage. In Section 5, this research is concluded and future research directions are presented.
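As a concrete illustration of the stateful failover just described, the following minimal Python sketch shows a primary object that periodically checkpoints its state to a warm standby, which then resumes from the last checkpoint after a crash. The class and method names are our own illustrative choices and do not correspond to any CORBA product API.

import copy

# Minimal sketch of warm-passive, stateful failover: the primary periodically
# checkpoints its state to a standby, and on a crash the standby resumes from
# the last checkpoint instead of starting empty.
class TrafficInfoObject:
    def __init__(self, state=None):
        self.state = state if state is not None else {}   # e.g., {link_id: speed}

    def update(self, link_id, speed):
        self.state[link_id] = speed

class Standby:
    def __init__(self):
        self.checkpointed = {}

    def checkpoint(self, primary):
        self.checkpointed = copy.deepcopy(primary.state)   # periodic snapshot

    def promote(self):
        # failover: continue from the checkpointed state, not from scratch
        return TrafficInfoObject(self.checkpointed)

primary, standby = TrafficInfoObject(), Standby()
primary.update("link-17", 42.0)
standby.checkpoint(primary)          # checkpoint sent every update cycle
primary.update("link-17", 38.5)      # updates after the last checkpoint are lost
new_primary = standby.promote()      # crash detected by a fault detector
print(new_primary.state)             # {'link-17': 42.0}

Only the state accumulated since the last checkpoint is lost on failover, which is why the checkpoint period must be short relative to the 5-minute update cycle of the transport information.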
2 Related Work

The physical ITS architectures established by the countries mentioned earlier are summarized in Fig. 1 [12].

Fig. 1. Physical ITS architecture
There are several representative CORBA based ITS worldwide. The Beijing Traffic Management Bureau (BTMB) in China built an ITS using IONA's Orbix 2000 for the 2008 Olympic Games [10]. Los Angeles County in the US coordinates multiple traffic control systems (TCSs) on its arterial streets using a new Information Exchange Network (IEN) whose network backbone is CORBA software [3]. The Dublin City Council in Ireland selected IONA Orbix™ as the integration technology for an intelligent traffic management system [10]. The Land Transport Authority in Singapore performed the 'traffic.smart' project, which is based on CORBA [8]. The Incheon International Airport in Korea built information systems, including an ITS, based on IONA Orbix 2.0 [11]. ISO published ISO TR 24532:2006, which clarifies the purpose of CORBA and its role in ITS [9]; it provides some broad guidance on usage and prepares the way for further ISO deliverables on the use of CORBA in ITS. The Object Management Group (OMG) established FT-CORBA, which enhances fault tolerance by creating replicas of objects in CORBA based information systems. The FT-CORBA standard aims to provide robust support for applications that require a high level of reliability, including applications that require more reliability than can be provided by a single backup server. The standard requires that there be no single point of failure. Fault tolerance depends on entity redundancy, fault detection, and recovery. The entity redundancy by which this specification provides fault tolerance is the replication of objects. This strategy allows greater flexibility in configuration management of the number of replicas, and of their assignment to different hosts, compared with server replication [17]. End-to-end temporal predictability of an application's behavior can be provided by existing real-time fault-tolerant CORBA work such as MEAD and FLARe [1][2][14]; however, these also adopt the replication styles of FT-CORBA as they are. Active and passive replication are two approaches for building fault-tolerant distributed systems [5]. Prior research has shown that passive replication and its variants are more effective for distributed real-time systems because of their low execution overhead [1]. In the WARM_PASSIVE replication style, the replica group contains a single primary replica that responds to client messages. In addition, one or more backup replicas are pre-spawned to handle crash failures. If the primary fails, a backup replica is selected to function as the new primary and a new backup is created to keep the replica group size above a threshold. The state of the primary is periodically loaded into the backup replicas, so that only a (hopefully minor) update to that state is needed for failover [7]. The WARM_PASSIVE replication style is considered appropriate for ITS in terms of service requirements and computing resource utilization. In practice, most production applications use the WARM_PASSIVE replication scheme for fault tolerance; the FT-CORBA specification also recommends it for the field of logistics. However, a method is required to maintain a constant replica group size efficiently. As noted earlier, the additional CORBA objects required by the FT-CORBA protocols, such as the Replication Manager, the Fault Detectors, the server object replicas, and the associated communications, consume CPU and memory, which can cause processing delays and thereby deteriorate performance.
A processing delay can amount to a failure for real-time transportation information services. Natarajan et al. [16] have studied a solution to dynamically configure the appropriate replication style,
monitoring style of object replicas, polling intervals, and membership style. However, a method that maintains the minimum number of replicas dynamically and autonomously, that is, one that adjusts the "threshold" specified in the WARM_PASSIVE replication style to improve resource efficiency and reduce the overhead of the overall system, still needs to be developed.
3 Proposed Architecture

FT-CORBA can be represented as shown in Fig. 2 when an application uses the WARM_PASSIVE style.

Fig. 2. FT-CORBA Protocol
The processes of Fig. 2 are summarized as follows [15]:
1. An application manager requests the Replication Manager to create a replica group, using the create_object operation of FT-CORBA's GenericFactory interface and passing to it a set of fault tolerance properties for the replica group.
2. The Replication Manager, as mandated by the FT-CORBA standard, delegates the task of creating individual replicas to local factory objects based on the Object Location property.
3. The local factories create the objects.
4. The local factories return the individual object references (IORs) of the created objects to the Replication Manager.
5. The Replication Manager informs the Fault Detectors to start monitoring the replicas.
6. The Fault Detectors poll the objects periodically.
7. The Replication Manager collects all the IORs of the individual replicas, creates an Interoperable Object Group Reference (IOGR) for the group, and designates one of the replicas as the primary.
8. The Replication Manager registers the IOGR with the Naming Service, which publishes it to other CORBA applications and services.
9. The Replication Manager checkpoints the IOGR and other state.
10. A client interested in the service contacts the Naming Service.
11. The Naming Service responds with the IOGR.
12. Finally, the client makes a request, and the client ORB ensures that the request is sent to the primary replica.
The Fault Detector, Application Object, and Generic Factory in Fig. 2 are located on the same server.
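The control flow of steps 1 through 9 can be made concrete with a small, plain-Python simulation. It uses no real ORB or FT-CORBA API; the class names and the InitialNumberReplicas property key below are only illustrative stand-ins for the entities in Fig. 2.

# Plain-Python simulation of steps 1-9 above (no real ORB or FT-CORBA calls).
class LocalFactory:                                    # steps 2-4
    def __init__(self, host):
        self.host = host

    def create_object(self, group, index):
        return f"IOR:{group}-replica{index}@{self.host}"

class FaultDetector:                                   # steps 5-6
    def __init__(self):
        self.monitored = []

    def start_monitoring(self, ior):
        self.monitored.append(ior)                     # would be polled periodically

class ReplicationManager:                              # steps 1, 7-9
    def __init__(self, factories, detector, naming, checkpoint_store):
        self.factories, self.detector = factories, detector
        self.naming, self.checkpoints = naming, checkpoint_store

    def create_replica_group(self, group, ft_properties):
        size = ft_properties["InitialNumberReplicas"]
        iors = [self.factories[i % len(self.factories)].create_object(group, i)
                for i in range(size)]                  # delegate creation (2-4)
        for ior in iors:
            self.detector.start_monitoring(ior)        # start monitoring (5)
        iogr = {"group": group, "members": iors, "primary": iors[0]}   # (7)
        self.naming[group] = iogr                      # publish via naming (8)
        self.checkpoints[group] = iogr                 # checkpoint the IOGR (9)
        return iogr

naming, checkpoints = {}, {}
rm = ReplicationManager([LocalFactory("hostA"), LocalFactory("hostB")],
                        FaultDetector(), naming, checkpoints)
rm.create_replica_group("TrafficInfo", {"InitialNumberReplicas": 3})    # (1)
print(naming["TrafficInfo"]["primary"])   # what a client would resolve (10-12)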
The administrator of the ITS can manage the number of object replicas with the application manager in Fig. 2 by adjusting the fault tolerance properties adaptively. However, ITS administration needs to be performed autonomously and adaptively with minimal intervention by the administrator. In addition, FT-CORBA's use of CPU and memory is large, which can affect the real-time characteristics of the ITS through processing delays, because FT-CORBA is an architecture that enhances fault tolerance through the redundancy of objects. Accordingly, it is possible to enhance efficiency and prevent potential service delays by introducing an autonomous agent (the FTAgent) into the FT-CORBA based ITS, which adjusts the minimum number of object replicas autonomously and adaptively; it can also be applied to other applications based on FT-CORBA. An autonomous agent is a system situated within, and a part of, an environment that senses that environment and acts on it, over time, in pursuit of its own agenda, and so as to effect what it senses in the future [6]. The FTAgent has an algorithm and a database [13] that help maintain the number of replicas efficiently, because replicas consume CPU and memory both directly and indirectly and can thus lower the performance of the overall ITS. The FTAgent is introduced in Fig. 3 on the same system as the Replication Manager of Fig. 2, which in this paper maintains 3 replicas for each object, i.e., the primary, first secondary, and second secondary replicas.

Fig. 3. Architecture to improve FT-CORBA
The FTAgent maintains its own DB, whose schema is shown in Table 1, to support the Replication Manager in the management of object replicas.

Table 1. DB maintained by the FTAgent

IOGR ID | date (dd/mm/yy) | time              | failure 1 | failure 2 | flag | riskyk | NoROR
1       | 01/01/10        | 00:00:00~00:04:59 | 0         | 0         | 0    | 0      | 1
1       | ·               | 00:05:00~00:09:59 | 0         | 0         | 0    | 0      | 1
·       | ·               | ·                 | ·         | ·         | ·    | ·      | ·
1       | ·               | 23:50:00~23:54:59 | 1         | 1         | 1    | 10     | 0
1       | 01/01/10        | 23:55:00~23:59:59 | 1         | 1         | 1    | 11     | 0
1       | 02/01/10        | 00:00:00~00:04:59 | 1         | 0         | 0    | 0      | 1
·       | ·               | ·                 | ·         | ·         | ·    | ·      | ·
1       | 31/01/10        | 23:55:00~23:59:59 | 0         | 1         | 0    | 0      | 1
·       | ·               | ·                 | ·         | ·         | ·    | ·      | ·
100     | 31/01/10        | 23:55:00~23:59:59 | 0         | 1         | 0    | 0      | 1
The IOGR IDs identify the replica groups of each object, of which there are 100 in this paper. The number of records in Table 1 is kept under one million because the time attribute is recorded in 5-minute intervals each day (100 object groups × 288 intervals per day × 30 days = 864,000 records). The date identifies the days of one month. The time is measured every 5 minutes. failure 1 denotes failures of primary object replicas, whether original or recovered from previous failures. failure 2 denotes failures of first secondary replicas after they have become the primary. The values of failure 1 and failure 2 are 0 when the replica is working and 1 when it has failed. The flag is 0 when the primary or the first secondary is working and 1 when both the primary and the first secondary have failed within the same 5-minute service interval. riskyk is a fault-possibility index for object groups; it is assigned to each 5-minute interval of the hour preceding the current time and is initially set to zero. k and riskyk range from 0 to 11, because the flag can be set to 1 at most 11 times in one hour, and the values are assigned so that 11 goes to the 5-minute interval nearest to the current time and 0 to the furthest one. The FTAgent searches the DB managed by the Replication Manager and updates the states (failed or working) of the primary and first secondary replicas of each object (1~100) in its own DB in real time, repeating every 5 minutes over the window that ranges from the middle of the previous 5-minute information service interval to the middle of the next one, restricted to the last 30 days. The search intervals are set between the respective middles of the former and latter service intervals because the moment at which the transport information is updated matters more than any other time. By searching its DB in real time, the FTAgent identifies whether the primary and first secondary replicas of an object have failed simultaneously. Object faults in the ITS tend to result from recent, short-lived causes rather than old, long-lived ones, because the system is influenced by road conditions, weather, traffic, and so on, which vary in real time. If simultaneous failures within a 5-minute interval have occurred during the last month up to the present, that is, the first secondary replica, promoted to primary as soon as the original primary failed, has itself crashed, and it is currently rush hour, the FTAgent asks the Replication Manager to set the number of replicas of the relevant objects to 3 or 2; otherwise it asks it to reduce the number to 2. In other words, the FTAgent lets the Replication Manager adjust the number of object replicas autonomously and adaptively. How the rush hours parameter is set, i.e., whether it is currently rush hour, is beyond the scope of this paper and is a matter of traffic-engineering judgment. The algorithm of the FTAgent is described as follows.

FTAgent(int rush_hours){
  while(there is no termination condition){
(1)   search whether the primary replicas of each object are working in the DB maintained by the Replication Manager (RM), in real time, repeating every 5 minutes over the window from the middle of the previous to the middle of the next 5-minute information service interval, restricted to the last 30 days;
(2)   if(primary replica is working){ failure 1 = 0 for all object groups identified by IOGRs; flag = 0; }
(3)   else{ failure 1 = 1 for all object groups;
(4)     confirm whether the first secondary of each object, promoted to primary by the RM, is working in the RM DB;
(5)     if(first secondary is working){ failure 2 = 0; flag = 0; }
(6)     else{ failure 2 = 1;
(7)       confirm whether the replica created by the RM as a substitute for the crashed primary is working;
(8)       if(it is working){ failure 1 = 0; flag = 0; }
(9)       else flag = 1; } }
(10)  Decision_Number_of_Replicas(rush_hours); } }

Decision_Number_of_Replicas(int rush_hours){
(11) an array holding, for every object group, the number of occurrences of two successive 1's among its flag values = 0;
(12) search for successions of two 1's in the flag values of all object groups;
(13) if(there are two successive 1's of flag values) increment the count for the relevant objects;
(14) if{(number of two successive 1's >= 1 for the last one hour) and (rush_hours)}{
(15)   NoROR = [3 - 3*{max(riskyk)/11}]/3; NoROR1 = NoROR;
       select the smaller of NoROR1 and NoROR2, round it off, and assign the result to NoROR;
(18)   let the RM keep the number of relevant object replicas minus NoROR, the replicas being selected in the order of their ID numbers; }
(19) else if{(number of separate 1's >= 2 for the last one hour) and (rush_hours)}{
(20)   if(min|ti - tj| < 5 minutes) let the RM keep the number of relevant object replicas at 3;
(21)   else let the RM reduce the number to 2; }
(22) else let the RM reduce the number to 2, meaning the two of the 3 replicas that are working at the moment, with selection priority in the order of their ID numbers; }

In line (15), NoROR stands for the number of reduced object replicas, and in line (16), NoRORd means the minimum number of reduced object replicas in the corresponding 5-minute time slot of each day over the last 30 days. In line (20), ti and tj denote the times at which the flag values are 1. The proposed architecture can be applied to work such as MEAD and FLARe to increase resource availability and decrease overhead by improving the utilization efficiency of CPU and memory, thereby improving the end-to-end temporal predictability of the overall system.
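The core of lines (11)-(15) can be sketched as follows. This is our own simplification: the rush-hour test is reduced to a boolean parameter, the NoROR2/NoRORd branch is omitted, and the flag history is passed in as a plain list, so the sketch only illustrates how riskyk drives the number of removable replicas (NoROR).

# Simplified sketch of lines (11)-(15): detect two successive flag = 1 intervals
# within the last hour and derive the number of removable replicas (NoROR)
# from the riskyk index of the most recent risky interval.
def risky_indices(flags_last_hour):
    """flags_last_hour: 12 flag values, oldest first; riskyk ranges over 0..11."""
    return [k for k, flag in enumerate(flags_last_hour) if flag == 1]

def has_two_successive_failures(flags_last_hour):
    return any(a == 1 and b == 1
               for a, b in zip(flags_last_hour, flags_last_hour[1:]))

def noror(flags_last_hour, rush_hours):
    if rush_hours and has_two_successive_failures(flags_last_hour):
        max_risky = max(risky_indices(flags_last_hour))
        return round((3 - 3 * max_risky / 11) / 3)     # line (15), NoROR1 only
    return 1                                            # otherwise reduce 3 -> 2

# An object group whose two most recent intervals both failed keeps all 3 replicas:
flags = [0] * 10 + [1, 1]
print(3 - noror(flags, rush_hours=True))                # replicas kept -> 3

The more recent the simultaneous failures (the larger max(riskyk)), the smaller NoROR becomes and the more replicas are retained, which matches the intent of keeping the replica group large only when faults are currently likely.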
4 Evaluations

The items for performance evaluation are the total CPU time used and the maximum memory usage of the servers involved in the 11 processes of Fig. 2 (excluding the 12th process), measured from the beginning to the termination of two simulation types that maintain 3 and 2 object replicas for fault tolerance, respectively [4]. The simulation was performed on a PC with an Intel Pentium Dual CPU at 2.16 GHz, 1.96 GB of memory, and Windows XP as the OS. The programs were implemented in Visual C++ 6.0. First, the CPU utilization during the simulation is 100% in this environment, so it is appropriate to measure and compare the total CPU times from the beginning to the termination of the simulation programs of the two types. They must be measured for all servers related to the creation and maintenance of object replicas in Fig. 2; processes without numbers on their arrows in Fig. 2 are not considered, so the number of CPUs considered is 11. Second, peak usage is a more appropriate measure for memory than continuous measurement, so the maximum usage is measured for each of the two types (3 and 2 replicas). The total CPU time and the maximum memory usage are compared for the cases in which the Replication Manager maintains 3 and 2 replicas of each object, respectively. That is, the 11 processes preceding the client's service request in Fig. 2 are simulated with two separate programs that describe the two types in terms of CPU and memory use. The components of FT-CORBA are the same in both types and are therefore not modeled in the programs in terms of memory use. The processing latencies, implemented as loops in the programs, are set for the 3-replica type as follows: 1) latency between internal components: 2 s; 2) latency between external components: 3 s; 3) latency for the FTAgent to search the DBs maintained by the Replication Manager and by itself and to deliver the related information: 5 s. Latencies directly related to creating and maintaining 2 replicas are set to two thirds of those for 3 replicas. The measured latencies vary somewhat because of background OS processes in the implementation environment, but this variation is ignored because it affects the simulations of both types equally. These settings are based on the experimental observation that the processing latency for selecting records satisfying the condition of line (14) of the algorithm is about 3 seconds with an Oracle 9i DBMS holding 1 million records with 13 columns on an IBM P650 with four 1.24 GHz CPUs and 12 GB of memory, located 34 km from the client. A commercial Internet browser is used as the object to simulate CPU and memory usage for the creation and termination of 3 and 2 object replicas. The object is executed 3 or 2 times, depending on the type, and kept as processes until the simulation finishes. The two types are each simulated by executing the relevant programs 5 times, where the browser, set to 3 different URLs, is called 3 times consecutively; the function executing the browser set to 3 different URLs is treated as an object. The URLs are www.nia.or.kr, www.springer.com, and www.ifip.org. The results for the total CPU time used are shown in Fig. 4.
Fig. 4. Total time of CPU use in seconds (five runs; 3-replica and 2-replica types)
The total CPU time used ranges from 27.17 to 30.58 seconds for the 3-replica type. The arithmetic mean is 28.86 seconds and the standard deviation is 1.46 seconds, which is 5.4% of the minimum of 27.17 seconds. On the other hand, the total CPU time used ranges from 24.94 to 26.50 seconds for the 2-replica type. The arithmetic mean is 25.69 seconds and the standard deviation is 0.60 seconds, which is 2.4% of the minimum of 24.94 seconds. The deviations result from background processes of Windows XP, the properties of the processed data, and a variable network situation, the latter because a browser is called as the object. The performance improvement in terms of CPU is 10.98%, obtained by comparing the two arithmetic means. Accordingly, the improvement ranges from 0 to 10.98%, whose lower and upper bounds correspond to simultaneous failures of 100% and 0% of the primary and first secondary replicas, respectively. The results for maximum memory usage are shown in Fig. 5.

Fig. 5. Maximum usage of memory in MB (five runs; 3-replica and 2-replica types)
The peak memory usage ranges from 129.25 to 137.95 MB for the 3-replica type. The arithmetic mean is 134.14 MB and the standard deviation is 3.55 MB, which is 2.75% of the minimum of 129.25 MB. On the other hand, the peak memory usage ranges from 87.02 to 95.78 MB for the 2-replica type. The arithmetic mean is 92.18 MB and the standard deviation is 4.12 MB, which is 4.73% of the minimum of 87.02 MB. The deviations result from the same causes as in the CPU case described earlier. The performance improvement in terms of memory is 31.28%, obtained by comparing the two arithmetic means. Accordingly, the improvement ranges from 0 to 31.28%, whose lower and upper bounds correspond to simultaneous failures of 100% and 0% of the primary and first secondary replicas, respectively. The simulation was also performed with the URLs www.ieee.org, www.acm.org, and www.iso.org to investigate how much the properties of the processed data and a variable network situation influence the results. The total CPU time used ranges from 30.45 to 36.95 seconds for the 3-replica type, with an arithmetic mean of 33.92 seconds, and from 25.67 to 26.39 seconds for the 2-replica type, with an arithmetic mean of 25.91 seconds. The performance improvement in terms of CPU is 23.61%, obtained by comparing the two arithmetic means; accordingly, the improvement ranges from 0 to 23.61%, which is 12.63 percentage points higher than with the previous URLs. The peak memory usage ranges from 127.28 to 139.93 MB for the 3-replica type, with an arithmetic mean of 133.27 MB, and from 89.30 to 96.81 MB for the 2-replica type, with an arithmetic mean of 92.36 MB. The performance improvement in terms of memory is 30.70%; accordingly, the improvement ranges from 0 to 30.70%, which is 0.58 percentage points lower than with the previous URLs. To sum up, the influence of the properties of the processed data and of a variable network situation on the performance improvement ratios for CPU and memory is not abnormal, although there is a difference for CPU because of network delay.
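The reported improvement ratios follow directly from the arithmetic means given above; the short check below recomputes them (all values are taken from the text).

# Recompute the reported improvement ratios from the arithmetic means above.
def improvement(mean_3_replicas, mean_2_replicas):
    return (mean_3_replicas - mean_2_replicas) / mean_3_replicas * 100

print(f"{improvement(28.86, 25.69):.2f}")    # CPU, first URL set     -> 10.98 %
print(f"{improvement(134.14, 92.18):.2f}")   # memory, first URL set  -> 31.28 %
print(f"{improvement(33.92, 25.91):.2f}")    # CPU, second URL set    -> 23.61 %
print(f"{improvement(133.27, 92.36):.2f}")   # memory, second URL set -> 30.70 %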
5 Conclusion

The ITS, one of the key systems of e-government, can be deployed efficiently on FT-CORBA, considering its heterogeneous and real-time properties. However, improvement is needed to enhance the performance of FT-CORBA based ITS, because it requires additional CPU and memory for object redundancy. This paper has proposed an architecture that adjusts the number of object replicas autonomously and adaptively with an agent, the FTAgent. In the future, additional research is needed as follows to optimize the number of object replicas in real ITS environments. First, the FTAgent can improve its own performance over time by learning from statistical data related to the recovery of replicas per object, such as the failure-check interval and failure frequency, which means improving lines (14) through (22) of the algorithm. Second, the size of the DB maintained by the FTAgent, which in this paper records failures for one month, also has to be studied experimentally. It will be decided according to the characteristics of transportation
information, which is generated in real time. The proposed architecture can be applied to implementing the National ITS Architectures established by the countries mentioned earlier and to other FT-CORBA based systems for e-government, because the ITS is a composite system that has the properties of most applications.
References
1. Balasubramanian, J., Gokhale, A., Schmidt, D.C., Wang, N.: Towards Middleware for Fault-tolerance in Distributed Real-time and Embedded Systems. In: Meier, R., Terzis, S. (eds.) DAIS 2008. LNCS, vol. 5053, pp. 72–85. Springer, Heidelberg (2008)
2. Balasubramanian, J., Tambe, S., Lu, C., Gokhale, A.: Adaptive Failover for Real-time Middleware with Passive Replication. In: 15th Real-time and Embedded Application Symposium, pp. 1–10. IEEE, Los Alamitos (2009)
3. County of Los Angeles Department of Public Works, http://www.ladpw.org/TNL/ITS/IENWeb/index.cfm
4. Fatih Akay, M., Katsinis, C.: Performance improvement of parallel programs on a broadcast-based distributed shared memory multiprocessor by simulation. Simulation Modelling Practice and Theory 16(3), 347–349 (2008)
5. Felber, P., Narasimhan, P.: Experiences, Approaches and Challenges in building Fault-tolerant CORBA Systems. IEEE Transactions on Computers 54(5), 497–511 (2004)
6. Franklin, S., Graesser, A.: Is it an Agent, or just a Program?: A Taxonomy for Autonomous Agents. In: Jennings, N.R., Wooldridge, M.J., Müller, J.P. (eds.) ECAI-WS 1996 and ATAL 1996. LNCS, vol. 1193, p. 25. Springer, Heidelberg (1997)
7. Gokhale, A., Natarajan, B., Schmidt, D.C., Cross, J.: Towards Real-time Fault-Tolerant CORBA Middleware. Cluster Computing: The Journal on Networks, Software, and Applications, Special Issue on Dependable Distributed Systems 7(4), 15 (2004)
8. Guan, C.C., Li, S.L.: Architecture of traffic.smart. In: 8th World Congress on ITS, ITS America, Washington, DC, pp. 2–5 (2001)
9. International Organization for Standardization: Intelligent transport systems - Systems architecture, taxonomy and terminology - Using CORBA (Common Object Request Broker Architecture) in ITS standards, data registries and data dictionaries. ISO TR 24532:2006 (2006)
10. IONA Technologies, http://www.iona.com/
11. Lee, J.K.: IICS: Integrated Information & Communication Systems. Journal of Civil Aviation Promotion 23, 71–80 (2000)
12. Ministry of Construction and Transportation: National ITS Architecture, Korea (2000)
13. Nagi, K., Lockemann, P.: Implementation Model for Agents with Layered Architecture in a Transactional Database Environment. In: 1st Int. Bi-Conference Workshop on Agent Oriented Information Systems (AOIS), pp. 2–3 (1999)
14. Narasimhan, P., Dumitras, T.A., Paulos, A.M., Pertet, S.M., Reverte, C.F., Slember, J.G., Srivastava, D.: MEAD: Support for Real-Time Fault-Tolerant CORBA. Concurrency and Computation: Practice and Experience 17(12), 1533–1544 (2005)
15. Natarajan, B., Gokhale, A., Yajnik, S.: DOORS: Towards High-performance Fault Tolerant CORBA. In: 2nd Distributed Objects and Applications (DOA) Conference, pp. 1–2. IEEE, Los Alamitos (2000)
16. Natarajan, B., Gokhale, A., Yajnik, S., Schmidt, D.C.: Applying Patterns to Improve the Performance of Fault Tolerant CORBA. In: 7th International Conference on High Performance Computing, pp. 11–12. ACM/IEEE (2000)
17. Object Management Group: Fault Tolerant CORBA. CORBA Version 3.0.3 (2004) 18. Saha, I., Mukhopadhyay, D., Banerjee, S.: Designing Reliable Architecture For Stateful Fault Tolerance. In: 7th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT 2006), p. 545. IEEE Computer Society, Washington (2006) 19. Vehicle Information and Communication System (VICS), http://www.vics.or.jp/english/vics/index.html
Design and Implementation of Binary Tree Based Proactive Routing Protocols for Large MANETs
Pavan Kumar Pandey and G.P. Biswas
Department of Computer Science and Engineering, Indian School of Mines, Dhanbad, Jharkhand
[email protected], [email protected]
Abstract. A Mobile Ad hoc Network (MANET) is a collection of connected mobile nodes without any centralized administration. The proactive routing approach is one category of proposed routing protocols that is not well suited to larger networks because of the high overhead of maintaining a routing table at each and every node. The novelty of this approach is to form a binary tree structure of several independent sub-networks by decomposing a large network into sub-networks. Each sub-network is monitored by an agent node, selected through several broadcasted regulations. The agent node maintains two kinds of routing information: one for local routing within the sub-network and another for routing through all other agent nodes. Initially the whole network is divided into two nearly equal independent sub-networks, and this process continues until each sub-network contains a feasible number of nodes. In the routing mechanism, the source node first checks for the destination within its sub-network; if the destination is not available locally, the source sends the destination address to the respective parent agent node, and this process continues until the destination node is reached. This approach gives any proactive routing protocol scalability for every routing mechanism. The proposed approach is thoroughly analyzed, and its justification with respect to connectivity across sub-networks, routing between every source-destination pair, scalability, etc., is given, showing the expected performance. Keywords: Wireless Ad Hoc Networks, Proactive Routing Protocol, Scalable Proactive Routing Protocol, Binary Tree Based Routing Protocol.
This kind of network is most useful in environments where nodes change frequently and only low bandwidth is available, such as moving vehicles on a battlefield. Many restrictions must be considered carefully, such as limited power and bandwidth. One of the major issues in MANET implementation is routing. The major routing approaches in this area are the well-known DSDV [4] among proactive protocols and AODV [2] among reactive routing protocols. As the number of nodes increases, the routing overhead increases rapidly. In order to support multi-hop routing [2], several different protocols have been proposed. There are different criteria for dividing these routing protocols into categories: proactive versus reactive routing, or flat routing versus cluster-based routing, and so on. In proactive protocols [1, 3], routes between every pair of nodes are established in advance even if no data transmission is required. Routes to all destinations are updated periodically. These routing protocols maintain consistent, up-to-date routing information for every pair of source and destination in the network. In contrast, on-demand (reactive) protocols [1, 3] establish the route to a destination only when data transmission is required. Thus, a node broadcasts a route request packet to the network and waits for the route reply message to form a route to the destination node. In flat routing protocols, each source-destination pair communicates through a peer-to-peer relationship and all nodes participate equally in routing. To a certain extent, such routing performs poorly in terms of scalability [8]. For example, DSDV [4, 5], DSR [7], and AODV [5] are typical flat routing protocols. Hierarchical [1, 6] routing protocols communicate through many sub-networks formed by dividing the whole network into many logical areas. Different routing strategies are used inside and outside a logical area. Compared with flat routing protocols, hierarchical routing protocols possess better performance and are well suited to supporting scalable networks. At present, the growing interest in wireless ad hoc network techniques has resulted in many hierarchical routing protocols such as ZRP [9], each with its own advantages and drawbacks. The novelty of this protocol is to provide better scalability, low delay, and low normalized routing overhead in multi-hop wireless ad hoc networks. The main feature of this protocol is to divide a large network into different independent clusters and to use the proactive routing approach to construct a routing table for each and every node. The network is divided into several sub-networks having a feasible number of nodes, and an agent node is selected for each sub-network to interconnect all these sub-networks. In this protocol the proactive routing approach is used both within a sub-network and for inter-sub-network routing. This approach can be used in an organization having several different co-organizations that are required to communicate among themselves. For a better understanding of this paper, the reader should be aware of proactive routing protocols and some related previous work, which are discussed in detail in the next section. The rest of this paper is organized as follows: the proactive routing approach and a detailed description of this protocol, illustrating the main features of its operation, are given next, followed by the correctness of the protocol and an analysis of its performance complexity; the last section presents our conclusion.
2 Proactive Routing Protocols and Related Works

A proactive routing protocol periodically updates its routing information for the network topology, and every node eventually has consistent, up-to-date global routing information for the entire network. This approach has the advantage of timely exchange of network information such as available bandwidth, delay, and topology, and it supports real-time services. However, it is not suitable for large-scale networks, since many unnecessary routes still need to be established and the periodic updating may increase routing and communication overhead. The proactive routing family basically includes Destination Sequenced Distance Vector routing (DSDV) [4] and the Optimized Link State Routing protocol (OLSR) [11]. The Destination-Sequenced Distance Vector routing protocol (DSDV) [4, 5] is a typical routing protocol for MANETs, with a minor modification of the distributed Bellman-Ford algorithm. In DSDV, every mobile node in the network maintains a routing table that records all possible destinations within the network, the next hop to be visited, and the total number of hops to each destination. Each route is tagged with a sequence number, originated by the destination, indicating how old the route is. Updates are transmitted periodically or immediately when any significant topology change is detected. There are two ways of performing a routing update: a "full dump", in which a node transmits the full contents of its routing table, and an "incremental update". All other nodes repeat this process until they receive an update with a higher sequence number that provides them with a fresh route again. To avoid fluctuations in route updates, DSDV employs "settling time" data, which is used to predict the time when a route becomes stable. The Optimized Link-State Routing protocol (OLSR) [11], another proactive routing protocol, has also been proposed. It too is suitable for smaller networks, and scalability problems arise in larger networks. To improve the performance of such routing algorithms, there is a need to combine features of distance-vector and link-state schemes; one protocol that does so is the Wireless Routing Protocol (WRP), which removes the counting-to-infinity problem and eliminates temporary loops without increasing the amount of control traffic. Numerous modifications of DSDV have already been proposed to improve its performance in wireless ad hoc networks. The primary attributes of DSDV are simplicity of implementation, loop freedom to avoid routing overhead, and low storage overhead for routing tables. An efficient DSDV (Eff-DSDV) [10] protocol was proposed to improve the performance of DSDV by overcoming the problem of stale routes; it establishes a temporary route through neighbor nodes that have a valid route to the particular destination when a link breakage is detected. The DSDV-MC [13] protocol extends DSDV into a multiple-channel version; to maintain the consistency of routing tables in a dynamically varying topology, each node periodically transmits updates. A new proactive routing protocol, the Multipath Destination Sequenced Distance Vector protocol (MDSDV) [14], is based entirely on the well-defined single-path protocol DSDV and extends it to multiple separate paths to each destination in the network. Two new fields called second hop and link-id are added to the routing table.
Both the second hop and the link-id, which are
generated by the destination, are used to obtain separate paths for any source-destination pair. Another approach to routing in multi-hop wireless ad hoc networks is based on the Subarea Tree [15], which is originated by root nodes. Once a subarea tree is established, a logical subarea has been established as well. Several other protocols have also been proposed to achieve scalability [12] of routing. A subarea tree contains several logical subareas connected through interconnect nodes. The routing strategy within a subarea, as well as among root nodes and interconnect nodes, is proactive, whereas the strategy between subareas is on-demand, i.e., reactive.
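A minimal Python sketch of the DSDV-style route selection rule described above (a fresher destination sequence number always wins; on a tie, the smaller hop count wins) is given below; the data structure and names are our own and are only meant to make the rule concrete.

# Minimal sketch of the DSDV-style route selection rule: a route with a higher
# (fresher) destination sequence number always wins, and on equal sequence
# numbers the route with the smaller metric (hop count) wins.
from dataclasses import dataclass

@dataclass
class Route:
    destination: str
    next_hop: str
    metric: int        # hop count to the destination
    seq_no: int        # sequence number originated by the destination

def update_route(table: dict, advertised: Route) -> None:
    current = table.get(advertised.destination)
    if (current is None
            or advertised.seq_no > current.seq_no
            or (advertised.seq_no == current.seq_no
                and advertised.metric < current.metric)):
        table[advertised.destination] = advertised

# Example: a fresher advertisement replaces a stale route.
table = {"D": Route("D", "B", 3, seq_no=100)}
update_route(table, Route("D", "C", 5, seq_no=102))   # accepted: newer sequence
update_route(table, Route("D", "E", 4, seq_no=102))   # accepted: same seq, fewer hops
print(table["D"].next_hop)                            # -> "E"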
3 Proposed Binary Tree Based Proactive Routing Protocol

The proposed binary tree based proactive (BTB-Proactive) routing approach overcomes the scalability problem of any proactive routing protocol by using binary tree decomposition of a multi-hop wireless ad hoc network. In this approach the whole network is divided into several sub-networks, which are connected in a binary tree architecture, and each sub-network is monitored by an agent node. Every agent node maintains two kinds of routing information: one for local routing within its sub-network and another for routing to the other parent and child agent nodes in the binary tree architecture. A route for every source-destination pair can then be defined using any proactive routing protocol. According to this protocol, the network is first divided into two equal parts and agent nodes are selected for those sub-networks; these sub-networks are then again divided into two equal sub-networks and agent nodes are selected for them. In forming the binary tree architecture, the agent nodes of the current sub-networks are connected to the agent node of the previously divided sub-network; in this way the whole network is converted into a binary tree of several sub-networks.

3.1 Deployment of the Network in Several Sub-networks

This is the first step of the proposed approach. During this step the network is divided into several sub-networks that are connected in a binary tree architecture. This step is entirely oriented toward the scalability of the proactive routing protocol, because when the number of nodes in the network is very large, routing becomes very difficult. Therefore, to provide scalability, the large network must be divided into sub-networks with a feasible number of nodes. In this step the large network is first divided into two nearly equal independent sub-networks. No node should be common to two or more sub-networks. If the number of nodes in a sub-network is not feasible for efficient routing using a proactive routing approach, the sub-network is again divided into two nearly equal sub-networks. This process continues until each sub-network contains a feasible number of nodes for performing the routing mechanism properly. This value is not standardized or fixed and can therefore vary from network to network. The output of this step is many independent sub-networks, each containing a feasible number of nodes.

3.2 Selection of the Agent Node for Each Sub-network

This is the next step of the proposed decomposition approach. It is responsible for selecting the node that monitors the sub-network. The agent node is selected by an auto-discovery
process, which is a little complex because it involves the following steps: first, a regulation, which is basically a condition, is broadcast; the node that satisfies this condition is selected as the agent node. This regulation depends on the network characteristics and may be the node ID, the number of neighbor nodes, residual energy, stability, or other measured values. In this way the agent node of each sub-network is selected to monitor the whole sub-network and the other agent nodes. This node is also part of the sub-network. Every sub-network should have an agent node that connects it to the other sub-networks. This node plays a key role in this approach because it is responsible for routing among the several sub-networks. In this decomposition approach, every node keeps information about all the nodes in its sub-network. The agent node maintains two types of information: the first is the same as that of the other nodes in the sub-network, and the second table concerns the other selected agent nodes of the several sub-networks. Thus, after dividing the network into several sub-networks, every node maintains a routing table according to any proactive routing table format. Table 1 describes all fields included in the table maintained by each node in the MANET.

Table 1. Field description of the routing table of the proposed routing approach

Field               | Description
Destination address | All nearest neighbors
Next Hop            | Next node to be visited for the particular destination
Metric              | Total number of nodes to be visited up to the destination
Sequence No.        | Number assigned by the destination, representing how stale the route is
Settling time       | Time at which the route to the particular destination stabilizes
After the routing tables are established, every node holds the destinations of all nodes in its sub-network, together with the next hop to be visited and the total cost to each destination. Every route has a sequence number originated by the destination. The route with the higher sequence number is always preferred for routing; if two routes have the same sequence number, the one with the better metric is preferred. Based on the sequence number of a table update, a node may forward or reject the table. Data is also kept about the length of time between the arrival of the first route and the arrival of the best route for each particular destination. Based on this data, a decision may be made to delay advertising routes that are about to change soon, as illustrated in the sketch below.
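The following sketch is our own simplified reading of this settling-time behaviour, not the paper's exact procedure: it delays advertising a route for a destination until the learned settling time for that destination has elapsed.

import time

# Illustrative sketch: delay advertising a route whose "settling time" -- the
# observed gap between the first route and the best route for a destination --
# has not yet elapsed.
class SettlingTimeTracker:
    def __init__(self):
        self.first_seen = {}     # destination -> time the first route arrived
        self.settling = {}       # destination -> learned settling time (seconds)

    def record_first(self, dest):
        self.first_seen.setdefault(dest, time.time())

    def record_best(self, dest):
        # learn how long this destination's routes usually take to stabilize
        if dest in self.first_seen:
            self.settling[dest] = time.time() - self.first_seen.pop(dest)

    def ok_to_advertise(self, dest, route_age):
        # advertise only if the route is older than the learned settling time
        return route_age >= self.settling.get(dest, 0.0)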
3.3 Establishment of the Sub-network Binary Tree

In the process of establishing the binary tree architecture, each agent node of a network is connected to the agent nodes of its sub-networks; every agent node thus acquires the routing information of the other nodes included in its sub-network and of the other agent nodes. Through the agent nodes, every node in the network therefore has its own proper routing information. In the binary tree architecture a network is divided into two equal independent parts, so the agent node of the network is connected to the agent nodes of both sub-networks; this process is then repeated down to the feasible number of nodes, forming a binary tree architecture of several sub-networks. Fig. 1 presents the binary tree architecture of several sub-networks connected through their agent nodes.
Fig. 1. Binary Tree Architecture with connected agent node
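The decomposition of Sections 3.1-3.3 can be sketched with a small recursive routine; the min-ID agent selection used here is only a placeholder for whatever broadcast regulation a real deployment would use.

# Sketch (our own data structures) of the recursive binary decomposition:
# split the node set until each sub-network has at most k nodes, and pick
# an agent for each sub-network (smallest node ID as a placeholder rule).
class SubNetwork:
    def __init__(self, nodes):
        self.nodes = nodes
        self.agent = min(nodes)          # placeholder selection regulation
        self.left = None
        self.right = None

def decompose(nodes, k):
    subnet = SubNetwork(nodes)
    if len(nodes) > k:
        mid = (len(nodes) + 1) // 2      # two nearly equal halves
        subnet.left = decompose(nodes[:mid], k)
        subnet.right = decompose(nodes[mid:], k)
    return subnet

root = decompose(list(range(1, 17)), k=4)   # the 16-node example used later
print(root.agent, root.left.agent, root.right.agent)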
Meanwhile, after the process of establishing the tree architecture has finished, an independent hierarchical architecture is formed. Every node in the network is connected to all other nodes through the agent nodes of the sub-networks. Each node is directly connected to the other nodes within its sub-network and is connected to all nodes of the sibling sub-network through the parent agent node; therefore every pair of nodes in the network is connected.

3.4 Maintenance of the Whole Network Topology and the Routing Mechanism

Our proposed approach is basically aimed at providing network scalability; it therefore partitions the network and then applies any proactive routing approach for the routing and updating mechanisms. For routing, two steps are followed to find a proper route from source to destination. First, the source node tries to find the destination within its sub-network; if it is available, the source is directly connected to the destination. Otherwise the source sends the destination address to the agent node of its sub-network, and the agent node communicates with its parent agent node regarding the route to the destination; the same process is followed until the destination is reached. Based on the above description, the routing mechanism proceeds as follows:
If the destination node of a data packet is in the sub-network of the source node, the source node forwards the packet to the destination directly. If the destination node is not in the sub-network of the source node, the node forwards the packet to its agent node. The agent node then sends the data packet to the agent node of the sibling sub-network through the parent agent node. The current agent node then takes the role of the previous agent node, and this process repeats until the destination node is found in the sub-network of some agent node, at which point the route from source to destination is determined. This route consists of the source node and all agent nodes up to the destination. The whole route for an assumed source-destination pair is depicted in Fig. 2.
Fig. 2. Routing from source node to destination
For updating information, every node identifies whether its neighbor nodes still exist by detecting the routing information sent periodically by each neighbor, and vice versa. One of the most important parameters to be chosen is the time between broadcasts of the routing information packets. However, when new or substantially modified route information is received by a mobile host, the new information is retransmitted soon, subject to constraints imposed to damp route fluctuations, effecting the most rapid possible dissemination of routing information among all cooperating mobile hosts. A broken link may be detected by the layer protocol, or it may be inferred if no broadcasts have been received for a while from a former neighbor; a broken link is described by an infinite metric (i.e., any value greater than the maximum allowed metric). Since this qualifies as a substantial route change, such modified routes are immediately disclosed in a broadcast routing information packet. Sequence numbers defined by the originating mobile hosts are defined to be even numbers, and sequence numbers generated to indicate infinite metrics are odd numbers. In order to reduce the amount of information carried in these packets, two packet types are defined. One carries all the available routing information and is called a "full dump".
The other type carries only the information changed since the last full dump and is called an "incremental".

3.5 Pseudo-code of the Binary Tree Based Proactive Routing Protocol

Procedure BTB-Proactive (G, S, D, R)                    // Ref. Section 3
// the network is viewed as a graph G(V, E); S is the source node, D the
// destination node, and R the route from S to D
{
  Procedure divide-network (V, v1, v2, k)               // Ref. Section 3.1
  // v1 and v2 are the numbers of nodes in the divided sub-networks and
  // k is the feasible number of nodes
  {
    if (V <= k)
      v1 = v2 = V;
    else {
      if (V % 2 == 0)
        v1 = v2 = V / 2;
      else {
        v1 = (V + 1) / 2;
        v2 = V / 2;
      }
    }
  }

  Procedure select-agent (v, A, R)                      // Ref. Section 3.2
  // v is the number of nodes in the sub-network, A is the agent node and
  // R is the broadcasted regulation
  {
    for (i = 1; i <= v; i++) {
      if (v[i].feature == Regulation(R))
        A = v[i];
    }
  }

  Procedure Update (R)                                  // Ref. Section 3.4
  // updates the route R due to a change in topology
  // New[ ]: best route for the current sequence number
  // Old[ ]: best route for the previous sequence number
  {
    Update.Metric(R);
    if (R.Seq == New[R.dest].Seq && R.Metric < New[R.dest].Metric) {
      New[R.dest] = R;
      New[R.dest].BestTime = Now;
    }
    else if (R.Seq > New[R.dest].Seq) {
      Old[R.dest] = New[R.dest];    // save the best route of the last seq. no.
      New[R.dest] = R;
      New[R.dest].FirstTime = Now;
      New[R.dest].BestTime = Now;
    }
  }

  Procedure Routing (G, S, D, R)                        // Ref. Section 3.4
  // G(V, E) is the network graph; S is the source node, D the destination
  // node and R the route from S to D
  {
    for (j = 1; j <= K1; j++)        // K1 is the number of nodes in the
                                     // sub-network of the source
      if (v[j] == D)
        R = {the direct route (S, D) within the sub-network};
      else {
        // forward to the agent node A of the sub-network of S
        call Routing (G, A, D, R1);  // R1 is the new route when A acts as
                                     // the source node
        R = R ∪ R1;
      }
  }
}

3.6 Example

A simplified example with 16 mobile nodes is given in Fig. 3. At first all the nodes are alike and each node has a unique ID.
Fig. 3. Network with proper route for example of proposed approach
A line between two nodes denotes a wireless link, and two nodes can communicate directly if both are in the same sub-network. Assume that the feasible number of nodes per sub-network for the proactive routing protocol is 4, so the 16-node network is divided down to sub-networks of 4 nodes and a binary tree is formed among them. Suppose node 6 wants to send a data packet to node 13. First, node 6 checks for node 13 in its own sub-network; since node 6 has nodes 1, 2, 4, 6, and 7 in its sub-network, it sends the packet to its agent node 4. Node 4, being also the agent node of the parent sub-network, additionally checks the sub-network of agent node 9; since node 13 is not available there, node 4 sends the packet to node 5. Node 5 then acts as a source node, finds the route to node 13 according to the proposed approach, and combines it with the previous path. Finally the process stops, and the route from node 6 to node 13 is 6 → 4 → 5 → 13.
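Continuing the SubNetwork/decompose sketch given after Fig. 1, the two-step lookup of Section 3.4 can be traced with the simplified routine below. It descends from the root instead of forwarding bottom-up through parent agents, and because the earlier agent selection is only a placeholder, the resulting agents and route differ from the 6 → 4 → 5 → 13 route of this example.

# Recursive lookup over the SubNetwork tree built in the earlier sketch:
# step 1 checks the source's own sub-network, step 2 relays via agent nodes.
def find_leaf(subnet, node):
    if subnet.left is None:                    # leaf sub-network
        return subnet if node in subnet.nodes else None
    return find_leaf(subnet.left, node) or find_leaf(subnet.right, node)

def route(root, src, dst):
    leaf = find_leaf(root, src)
    if dst in leaf.nodes:                      # step 1: destination is local
        return [src, dst]
    hops = [src]                               # step 2: relay via agent nodes
    current = root
    while current.left is not None:
        hops.append(current.agent)
        current = current.left if dst in current.left.nodes else current.right
    return hops + [dst]

print(route(root, 6, 13))                      # e.g. [6, 1, 9, 13] with min-ID agents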
4 Performance Analysis of the BTB-Proactive Approach

The binary tree based proactive routing protocol is basically intended for large MANETs, providing scalability by dividing the whole network into several independent sub-networks. This approach can be used in an organization where several different ad hoc networks are connected for communication; the performance of the proposed routing protocol therefore plays an important role. The performance is measured in terms of time complexity. Some assumptions are made to determine the actual performance of the algorithm. Suppose the number of nodes in the network is n and the feasible number of nodes in each MANET for proper routing using a proactive routing protocol is k. This value also influences the size of the sub-networks, because the network is divided down to the value k. Two main operations are performed in procedure BTB-Proactive: divide-network and select-agent. The performance of the proposed protocol therefore depends on the number of sub-networks. If we do not consider the value of k, the complexity of the proposed routing protocol is O(log2 n), because the protocol is based on a binary tree architecture: the complete network is divided into two parts, then again into two parts, so the complexity is the same as that of a binary tree. The selection of agent nodes is also performed in the same way as a binary tree operation, so the total time complexity of the algorithm is O(log2 n) for n nodes in the network.
5 Justification of the Proposed Routing Protocol

According to the protocol, initially every node in the network has its own unique ID. Can this approach then act on the network and produce numerous independent sub-networks that include all nodes, and can it be extended to any number of nodes? This question concerns the effectiveness of the proposed approach, and its correctness needs to be proved. The following theorem assures it.

5.1 Theorems

We visualize the network as a graph G(V, E), where V is the set of vertices and E is the set of edges. Assume G is connected; then the following statements hold for the network.
• Through BTB-Proactive, for every v ∈ V of divide-network(V, v1, v2, k), v ∈ v1 ∪ v2 (and v ∉ v1 ∩ v2).
• After adding some nodes N, the total number of nodes becomes V + N; then case (i) must also hold for the V + N nodes.
• There exists a route R = {S, Ai, ..., D}, where S is the source node, D the destination node, and the Ai are agent nodes; S, D, Ai ∈ V and i = 1, 2, 3, ..., n, where n is the number of sub-networks.
5.2 Proof

1. In this case we want to prove that, in the process of divide-network, every node of the network is included in one of the two sub-networks and that no node is included in both, because the divided sub-networks are independent. This case is proved by mathematical induction. For V = 1 there is no need to divide further, so the first statement is correct for V = 1. Suppose the statement is true for V = n, i.e., v[i] ∈ V for i = 1, 2, ..., n and the first case of the theorem holds for n nodes. Then it must also be true for V = n + 1, because if n is odd then n + 1 is even and if n is even then n + 1 is odd, and both cases are already handled in procedure BTB-Proactive. Since G is connected, the additional node will be included in some sub-network, so case 1 is also true for n + 1 nodes; that is, in this approach no node is skipped and no node is common to two sub-networks.
2. After adding some nodes N, the total number of nodes becomes (V+N); if all V nodes participate in some sub-network, then all (V+N) nodes also participate in some sub-network. In case 1 it was already proved that all V nodes participate in some sub-network and no node in V is skipped. After adding the N nodes and forming the independent sub-networks of the whole network with (V+N) nodes, we can say that every y ∈ V participates in divide-network(V+N, v1, v2, k), and for every z ∈ N there exists z ∈ v1 ∪ v2. Through the agent nodes, the sub-network of one node is connected to the sub-network of another node without skipping any of the (V+N) nodes.
3. According to this approach, Route(S, D) = {S, A1, A2, A3, ..., D} exists, where the Aj (j = 1, 2, 3, ..., n) are agent nodes. This statement, that every pair of nodes has a route, is proved by mathematical induction. For V = 1 no route is required, so the case is true for V = 1. Suppose the case is true for V = n, i.e., source and destination form a route from source node S to destination node D through several agent nodes Aj (j = 1, 2, ..., n). For V = n + 1, the extra node is a neighbor of some node and is also connected to some of the n existing nodes, so after dividing the network this node is included in some sub-network together with previous nodes; therefore a path is also established among the n + 1 nodes through some agent nodes.
6 Conclusion

A scalable extension of proactive routing protocols for multi-hop wireless ad hoc networks has been proposed. The approach divides a large network into several independent sub-networks connected in binary-tree fashion through agent nodes, and thus supports any proactive routing scheme in dense and large-scale ad hoc networks. Performance improves particularly when the number of nodes in the network is high, because the routing overhead caused by high nodal density is reduced. Agent-node selection yields significant gains in dense networks by introducing modularity. The correctness argument shows that this approach provides a flexible framework for scalable routing over mobile ad hoc networks while retaining all the advantages of the underlying proactive routing scheme.
Extract Semantic Information from WordNet to Improve Text Classification Performance

Rujiang Bai, Xiaoyue Wang, and Junhua Liao

Shandong University of Technology Library, Zibo 255049, China
{brj,wangxixy,ljhbrj}@sdut.edu.cn
Abstract. For the past decade, text categorization has been an active field of research in the machine learning community. Most approaches are based on term occurrence frequency. The performance of such surface-based methods can degrade when texts are too complex, i.e., ambiguous. One alternative is to use semantic-based approaches that process textual documents according to their meaning. In this paper, we propose a concept-based Vector Space Model which reflects a more abstract version of the semantic information than the ordinary Vector Space Model of the text. The model adjusts the weights of the vector space by importing the hypernymy-hyponymy relations between synonym sets and the concept chains in WordNet. Experimental results on several data sets show that the proposed approach, with concepts built from WordNet, achieves significant improvements with respect to the baseline algorithm.

Keywords: Text classification; document representation; WordNet; concept-based VSM.
1. Multi-word expressions with a meaning of their own, like "European Union", are chunked into pieces with possibly very different meanings, like "union".
2. Synonymous words like "tungsten" and "wolfram" are mapped to different features.
3. Polysemous words are treated as a single feature although they may have multiple distinct meanings.
4. Lack of generalization: there is no way to generalize similar terms like "beef" and "pork" to their common hypernym "meat".

While items 1–3 directly address issues that arise at the lexical level, item 4 addresses an issue situated at the conceptual level. To overcome these limitations, work has been done to exploit ontologies for content-based classification of large document corpora. Gabrilovich et al. [5,6] applied feature generation techniques to text processing using ODP and Wikipedia. Their text classification experiments confirmed that background-knowledge-based features generated from ODP and Wikipedia can facilitate text categorization, and their results show that Wikipedia is less noisy than ODP when used as a knowledge base. However, ODP and Wikipedia are not structured thesauri like WordNet [7], and therefore they cannot directly resolve synonymy and polysemy, two fundamental problems in text classification. In this paper, we use WordNet to tackle these issues; a short example below illustrates items 2 and 4. The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 introduces WordNet. Section 4 presents the sense disambiguation-based text classification algorithm built on WordNet. Section 5 presents our experimental results and discussion. We summarize our work in Section 6.
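As a concrete, hedged illustration of items 2 and 4 above, the following Python snippet uses NLTK's WordNet interface (our choice of toolkit; the paper does not prescribe one) to show that "tungsten" and "wolfram" share a synset and that "beef" and "pork" generalize to "meat". Sense identifiers may differ across WordNet versions.

```python
from nltk.corpus import wordnet as wn  # requires: pip install nltk; nltk.download('wordnet')

# Item 2 (synonymy): "tungsten" and "wolfram" are distinct bag-of-words features,
# but WordNet places them in the same synset.
print(wn.synsets('wolfram'))                # typically [Synset('tungsten.n.01')]

# Item 4 (generalization): "beef" and "pork" share the hypernym "meat".
beef = wn.synset('beef.n.02')               # the meat sense (sense numbers may vary by version)
pork = wn.synset('pork.n.01')
print(beef.lowest_common_hypernyms(pork))   # typically [Synset('meat.n.01')]
```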
2 Related Work To date, the work on integrating semantic background knowledge into text representation is quite limited, and the classification or clustering results are not satisfactory. The authors in [8,9] successfully integrated the WordNet resource for a document categorization task. They evaluated their methods on the Reuters corpus, and showed improved classification results with respect to the Rocchio and Widrow-Hoff algorithms. In contrast to our approach, Rodriguez et al. [8] and Urena-Lopez et al. [9] utilized WordNet in a supervised scenario without employing WordNet relations such as hypernyms and associative relations. Furthermore, they built the term vectors manually. The authors in [10] utilized WordNet synsets as features for document representation, and subsequent clustering. Word sense disambiguation was not performed, and WordNet synsets actually decreased clustering performance. Hotho et al. [11] integrated WordNet knowledge into text clustering, and investigated word sense disambiguation strategies and feature weighting schema by considering the hyponym relations derived from WordNet. Experimental results on the Reuters Corpus have shown improvements in comparison with the best baseline. However, their approach ignores the abundant structural relations within WordNet, such as hierarchical categories, synonymy and polysemy. In this paper, we tackle these issues.
3 About WordNet

WordNet is a lexical database for the English language. It groups English words into sets of synonyms called synsets, provides short, general definitions, and records the various semantic relations between these synonym sets. The purpose is twofold: to produce a combination of dictionary and thesaurus that is more intuitively usable, and to support automatic text analysis and artificial intelligence applications. The database and software tools have been released under a BSD-style license and can be downloaded and used freely. The database can also be browsed online. WordNet was created and is being maintained at the Cognitive Science Laboratory of Princeton University under the direction of psychology professor George A. Miller. Development began in 1985. Over the years, the project has received funding from government agencies interested in machine translation.

3.1 Database Contents

As of 2006, the database contains about 150,000 words organized in over 115,000 synsets for a total of 207,000 word-sense pairs; in compressed form, it is about 12 megabytes in size. WordNet distinguishes between nouns, verbs, adjectives and adverbs because they follow different grammatical rules. Every synset contains a group of synonymous words or collocations (a collocation is a sequence of words that go together to form a specific meaning, such as "car pool"); different senses of a word are in different synsets. The meaning of a synset is further clarified by short defining glosses (definitions and/or example sentences). A typical example synset with gloss is:

good, right, ripe -- (most suitable or right for a particular purpose; "a good time to plant tomatoes"; "the right time to act"; "the time is ripe for great sociological changes")

Most synsets are connected to other synsets via a number of semantic relations. These relations vary based on the type of word, and include:

Nouns
– hypernyms: Y is a hypernym of X if every X is a (kind of) Y (canine is a hypernym of dog)
– hyponyms: Y is a hyponym of X if every Y is a (kind of) X (dog is a hyponym of canine)
– coordinate terms: Y is a coordinate term of X if X and Y share a hypernym (wolf is a coordinate term of dog, and dog is a coordinate term of wolf)
– holonym: Y is a holonym of X if X is a part of Y (building is a holonym of window)
– meronym: Y is a meronym of X if Y is a part of X (window is a meronym of building)

Verbs
– hypernym: the verb Y is a hypernym of the verb X if the activity X is a (kind of) Y (to perceive is a hypernym of to listen)
– troponym: the verb Y is a troponym of the verb X if the activity Y is doing X in some manner (to lisp is a troponym of to talk)
– entailment: the verb Y is entailed by X if by doing X you must be doing Y (to sleep is entailed by to snore)
– coordinate terms: those verbs sharing a common hypernym (to lisp and to yell)

Adjectives
– related nouns
– similar to
– participle of verb

Adverbs
– root adjectives

While semantic relations apply to all members of a synset, because they share a meaning and are mutual synonyms, words can also be connected to other words through lexical relations, including antonyms (opposites of each other) and derivationally related forms. WordNet also provides the polysemy count of a word: the number of synsets that contain the word. If a word participates in several synsets (i.e., has several senses), then typically some senses are much more common than others. WordNet quantifies this with a frequency score: several sample texts have all their words semantically tagged with the corresponding synset, and a count then indicates how often a word appears in a specific sense.

3.2 Knowledge Structure

Both nouns and verbs are organized into hierarchies, defined by hypernym or IS-A relationships. For instance, the first sense of the word dog has a hypernym hierarchy in which the words at the same level are synonyms of each other: one sense of dog is synonymous with senses of domestic dog and Canis familiaris, and so on. Each set of synonyms (synset) has a unique index and shares its properties, such as a gloss (or dictionary) definition.
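The polysemy count, frequency score and hypernym hierarchy just described can be inspected with NLTK's WordNet interface. The sketch below is illustrative only; the exact counts depend on the WordNet release bundled with NLTK rather than on the WordNet 2.1 version used in this paper.

```python
from nltk.corpus import wordnet as wn

# Polysemy count of the noun "board" (nine senses in WordNet 2.1; the number
# may differ in other WordNet releases).
senses = wn.synsets('board', pos=wn.NOUN)
print(len(senses))

# Hypernym (IS-A) chain for the first sense of "dog"; there may be more than
# one path up to the root synset.
dog = wn.synset('dog.n.01')                  # {dog, domestic dog, Canis familiaris}
for path in dog.hypernym_paths():
    print(' -> '.join(s.name() for s in reversed(path)))   # dog ... -> entity

# SemCor-based frequency score for "dog" in this sense.
print(wn.lemma('dog.n.01.dog').count())
```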
At the top level, these hierarchies are organized into base types, 25 primitive groups for nouns, and 15 for verbs. These groups form lexicographic files at a maintenance level. These primitive groups are connected to an abstract root node that has, for some time, been assumed by various applications that use WordNet. In the case of adjectives, the organization is different. Two opposite 'head' senses work as binary poles, while 'satellite' synonyms connect to each of the heads via synonymy relations. Thus, the hierarchies, and the concept involved with lexicographic files, do not apply here the same way they do for nouns and verbs. The network of nouns is far deeper than that of the other parts of speech. Verbs have a far bushier structure, and adjectives are organized into many distinct clusters. Adverbs are defined in terms of the adjectives they are derived from, and thus inherit their structure from that of the adjectives.
4 Text Classification Algorithm Based on WordNet

In this section, we first present our proposed word sense disambiguation method WSDN (word sense disambiguation based on WordNet). WSDN is based on the idea that a set of words co-occurring in a document will determine the appropriate senses for one another, despite each individual word being multiply ambiguous. A common example of this effect is the set of nouns base, bat, glove and hit: although each of them has several senses, taken together the intended topic is clearly a baseball game. To exploit this idea automatically, a set of categories representing the different senses of words needs to be defined. A counter is maintained for each category, counting the number of words that have its associated senses. The sense of an ambiguous word is determined by the category with the largest counter. Then, the nearest ancestors of the senses of all the non-stopwords are selected as the classes of a given document.

4.1 WSDN Construction

Using each separate hierarchy as a category is well defined but too coarse-grained. For example, in Figure 1, senses 1 through 9 of "board" all fall in the {entity, thing} hierarchy. WSDN is therefore intended to define an appropriate middle-level category.
Fig. 1. Nine different senses of the noun "board" in WordNet 2.1
To define the WSDN of a given synset s, consider the synsets and the hyponymy links in WordNet as vertices and directed edges of a graph. Then, the WSDN of s is defined as the largest connected subgraph that contains s, contains only descendents of an ancestor of s, and contains no synset that has a descendent including another instance of a member of s as a member. A WSDN is represented by its root. Figures 2–5 illustrate the definition of WSDN, assuming synset s consists of k words w1, w2, w3, …, wk, and H1, H2, H3, …, Hn are n ancestors of s, where Hm is the father of Hm−1. Suppose Hm (1 ≤ m ≤ n) has a descendent synset which also includes wj (1 ≤ j ≤ k) as a member; then Hm−1 is one of the roots of the WSDNs of s, as shown in Figure 2.
Fig. 2. Definition of WSDN1
If m is 1, s itself is the root, as shown in Figure 3.
Fig. 3. Definition of WSDN2
If no such m is found, the root of the WordNet hierarchy, r, is the root of the WSDN of s, as shown in Figure 4. If s itself has a descendent synset that includes wj as a member, there is no WSDN in WordNet for s, as shown in Figure 5.
Fig. 4. Definition of WSDN3
Fig. 5. Definition of WSDN4
Because some synsets have more than one parent, a synset can have more than one WSDN. A synset has no WSDN if the same word is a member of both the synset and one of its descendents. For example, in Figure 1 the WSDN of the "committee" sense of "board" is rooted at the synset {group, grouping} (and thus the WSDN for that sense is the entire hierarchy in which it occurs), because no other synset in this hierarchy contains "board" (Figure 4); the WSDN of the "circuit_board" sense of "board" is rooted at {circuit, closed_circuit}, because the synset {electrical_device} has a descendent {control_panel, display_panel, panel, board} containing "board" (Figure 2); and the WSDN of the "panel" sense of "board" is rooted at the synset itself, because its direct parent {electrical_device} has a descendent synset {circuit_board, circuit_card, board, card} containing "board" (Figure 3).

4.2 Word Sense Disambiguation

After the WSDNs for each synset in WordNet are constructed, they can be used to select the sense of an ambiguous word in a given text document. The senses of the nouns in a text document from a given document collection are selected using the following two-step process. A procedure called marking(w) is fundamental to both steps. Marking(w) visits synsets and maintains a counter for each synset, which is increased by 1 whenever the synset is visited. Given a word w, what marking(w)
does is find all instances of w in WordNet and then, for each identified synset s, follow the parent-child links up to the root of the hierarchy while incrementing the counter of each synset it visits. The first step of the two-step process is collection-oriented: marking(w) is called for each occurrence of w in all the documents of the collection, and the number of times marking(w) is called for each w is recorded. This step produces a set of global counts (relative to this particular collection) at each synset. The second step is document-oriented: marking(w) is called for each occurrence of w in an individual text document, and again the number of calls is recorded for that document. This step produces a set of local counts at each synset. Given the local and global counts, the sense of a given ambiguous word w within a particular document is selected as follows:
Dif = (local_visits / local_calls) − (global_visits / global_calls)    (1)
The difference Dif is computed at the root of the WSDN for each sense of w. If a sense has no WSDN, or if the local count at its WSDN root is less than 2, the difference is set to 0. If a sense has multiple WSDNs, the difference is set to the largest difference over the set of WSDNs. The sense corresponding to the WSDN root with the largest positive difference is selected as the sense of the word in the document; if no sense has a positive difference, no WordNet sense is chosen for this word. The idea behind the disambiguation process is to select senses from the areas of the WordNet hierarchies where document-induced (local) activity is greater than the expected (global) activity. The WSDN construct is designed to provide a point of comparison that is broad enough to encompass markings from several different words yet narrow enough to distinguish among senses.
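A simplified sketch of this selection step is given below. It is our own illustration of Eq. (1) and the rules above, with hypothetical counter values, not the authors' code: each candidate sense is mapped to its WSDN root, and the sense whose root has the largest positive difference is returned.

```python
# Illustrative only: select the sense of an ambiguous word from local (document)
# and global (collection) WSDN-root counters, following Eq. (1) and the rules above.

def dif(local_visits, local_calls, global_visits, global_calls):
    """Difference between document-induced and collection-wide activity."""
    return local_visits / local_calls - global_visits / global_calls

def choose_sense(sense_roots, local, global_, local_calls, global_calls):
    """sense_roots maps each candidate sense to its WSDN root (None if it has no WSDN)."""
    best_sense, best_dif = None, 0.0
    for sense, root in sense_roots.items():
        if root is None or local.get(root, 0) < 2:   # no WSDN, or local count below 2 -> Dif = 0
            continue
        d = dif(local[root], local_calls, global_.get(root, 0), global_calls)
        if d > best_dif:                              # keep the largest positive difference
            best_sense, best_dif = sense, d
    return best_sense                                 # None means no WordNet sense is chosen

# Hypothetical counters for two senses of "board" in one document of a collection:
sense_roots = {'committee': 'group', 'circuit_board': 'circuit'}
local = {'group': 5, 'circuit': 2}                    # visits induced by this document
global_ = {'group': 400, 'circuit': 30}               # visits over the whole collection
print(choose_sense(sense_roots, local, global_, local_calls=50, global_calls=10000))
# -> 'committee' (Dif 0.06 vs. 0.037 for 'circuit_board')
```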
5 Empirical Evaluation

5.1 Data Sets and Experimental Settings

To test the proposed method, we used three different kinds of data sets: UseNet newsgroups (20 Newsgroups), web pages (WebKB), and newswire articles (Reuters 21578). For fair evaluation on Newsgroups and WebKB, we employed five-fold cross-validation: each data set is split into five subsets, and each subset is used once as test data in a particular run while the remaining subsets are used as training data for that run. The split into training and test sets for each run is the same for all classifiers, so all the results of our experiments are averages over five runs. The Newsgroups data set, collected by Ken Lang, contains about 20,000 articles evenly divided among 20 UseNet discussion groups (McCallum & Nigam, 1998; Nigam et al., 1998). Many of the categories fall into confusable clusters; for example, five of them are comp.* discussion groups, and three of them discuss religion.
In this paper, we used only 16 categories after removing four categories: three miscellaneous categories (talk.politics.misc, talk.religion.misc, and comp.os.ms-windows.misc) and one duplicate-meaning category (comp.sys.ibm.pc.hardware). After removing words that occur only once or appear on a stop word list, the resulting average vocabulary over the five training splits has 43,249 words (no stemming). The second data set comes from the WebKB project at CMU (Craven et al., 2000). This data set contains web pages gathered from university computer science departments. The pages are divided into seven categories: course, faculty, project, student, department, staff, and other. As in other studies that used this data set (Joachims, 2001; Lanquillon, 2000; McCallum & Nigam, 1998; Nigam, 2001), we used the four most populous entity-representing categories: course, faculty, project, and student. The resulting data set consists of 4,198 pages, and the average vocabulary over the five training splits has 18,742 words. The Reuters 21578 Distribution 1.0 data set consists of 12,902 articles and 90 topic categories from the Reuters newswire. Following other studies (Joachims, 2001; Nigam, 2001), results for the ten most populous categories are reported. To split training and test data, we followed the standard 'ModApte' split. We used all the words in the title and body, applied a stop word list, and performed no stemming; the vocabulary from the training data has 12,001 words. About 25% of the documents from the training data of each data set were selected for a validation set. After all parameter values of our experiments were set on the validation set, we evaluated the proposed method using these parameter values. We applied a statistical feature selection method (χ2 statistics) for each classifier at its preprocessing stage (Yang & Pedersen, 1997). As performance measures, we followed the standard definitions of recall, precision, and F1. For averaging performance across categories, we used the micro-averaging method (Yang, 1999). Results on Reuters are reported as precision-recall breakeven points, a standard information retrieval measure for binary classification (Joachims, 2001; Yang, 1999). Each category c is associated with a classifier based on either the Naïve Bayes method (NB) or the Support Vector Machine (SVM). Both NB and SVM require a fixed (predefined) feature set, which was built using the χ2 (chi-square) weighting technique. The experiments proceed as follows: first, we classify the three data sets with NB and SVM using the traditional bag-of-words model; second, the documents are preprocessed so that all terms are replaced by their senses in context (Section 4.2), and NB and SVM are then applied. The resulting systems are named NB_WSDN, NB_BOC, SVM_WSDN, and SVM_BOC, respectively.
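The experimental pipeline can be approximated with off-the-shelf components. The sketch below uses scikit-learn as a stand-in (our assumption; the paper does not name its implementation), combining a bag-of-words (or sense-replaced) representation, chi-square feature selection with an arbitrarily chosen k, and NB/SVM classifiers scored with micro-averaged F1.

```python
# Hedged sketch with scikit-learn stand-ins; the value of k and other settings are our choices.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import Pipeline
from sklearn.metrics import f1_score

def build_classifier(estimator, k=2000):
    """Bag-of-words (or sense-replaced) counts -> chi-square selection -> classifier."""
    return Pipeline([
        ('vect', CountVectorizer(stop_words='english')),
        ('chi2', SelectKBest(chi2, k=k)),
        ('clf', estimator),
    ])

def evaluate(train_docs, train_labels, test_docs, test_labels):
    """For the *_WSDN variants, replace each term in the documents by its
    disambiguated WordNet sense (Section 4.2) before calling this function."""
    for name, estimator in [('NB', MultinomialNB()), ('SVM', LinearSVC())]:
        model = build_classifier(estimator).fit(train_docs, train_labels)
        predictions = model.predict(test_docs)
        print(name, 'micro-F1:', f1_score(test_labels, predictions, average='micro'))
```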
5.2 Results and Discussion

Figure 6 shows the MicroF1 of SVM_WSDN and SVM_BOC, and Figure 7 shows their MacroF1. For SVM, word sense disambiguation based on WordNet does not provide an improvement over the traditional Vector Space Model.
Fig. 6. MicroF1 for SVM_BOC and SVM_WSDN on the Reuters, 20NG and WebKB data sets
Fig. 7. MacroF1 for SVM_BOC and SVM_WSDN on the Reuters, 20NG and WebKB data sets
Figure 8 shows the MicroF1 of NB_WSDN and NB_BOC, and Figure 9 shows their MacroF1. Here the improvement is evident. We find that the SVM method is not sensitive to the word features; the reason is that SVM depends on the support vectors rather than on all the vectors.
Fig. 8. MicroF1 for NB_BOC and NB_WSDN on the Reuters, 20NG and WebKB data sets
Fig. 9. MacroF1 for NB_BOC and NB_WSDN on the Reuters, 20NG and WebKB data sets
6 Conclusions and Future Work

In this paper, we proposed a text classification method based on word sense disambiguation. In order to define an appropriate mid-level category for each sense, WSDN was implemented on WordNet. Each non-stopword in a given document was then mapped to the concept hierarchy, where each synset maintains a counter; the WSDN and the associated counters determine the intended sense of an ambiguous word. Our sense-based text classification algorithm is an automatic technique for disambiguating word senses and then classifying text documents. If
this automatic technique can be applied in real applications, the classification of e-documents would be accelerated dramatically, which would be a great contribution to the management of Web pages, e-books, digital libraries, etc. In future work, we plan to construct a relation graph for each concept, including synonyms, hyponyms and associative concepts; such a graph can help achieve an improved disambiguation process.

Acknowledgments. This work was supported by Shandong Natural Science Foundation of China Project # ZR2009GM015.
References
1. Yang, Y., Liu, X.: A re-examination of text categorization methods. SIGIR, 42–49 (1999)
2. Han, E., Karypis, G.: Centroid-Based Document Classification: Analysis & Experimental Results. In: Zighed, D.A., Komorowski, J., Żytkow, J.M. (eds.) PKDD 2000. LNCS (LNAI), vol. 1910, pp. 424–431. Springer, Heidelberg (2000)
3. McCallum, A., Nigam, K.: A Comparison of Event Models for Naïve Bayes Text Classification. In: AAAI/ICML Workshop on Learning for Text Categorization (1998)
4. Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys 34(1), 1–47 (2002)
5. Gabrilovich, E., Markovitch, S.: Feature generation for text categorization using world knowledge. In: Proceedings of the 19th International Joint Conference on Artificial Intelligence, IJCAI 2005 (2005)
6. Gabrilovich, E., Markovitch, S.: Overcoming the brittleness bottleneck using Wikipedia: enhancing text categorization with encyclopedic knowledge. In: Proceedings of the 21st AAAI Conference on Artificial Intelligence, AAAI 2006 (2006)
7. Miller, G.: WordNet: a lexical database for English. Communications of the ACM (1995)
8. de Buenaga Rodriguez, M., Gomez Hidalgo, J.M., Agudo, B.D.: Using WordNet to complement training information in text categorization. In: The 2nd International Conference on Recent Advances in Natural Language Processing, RANLP 1997 (1999)
9. Urena-Lopez, L.A., Buenaga, M., Gomez, J.M.: Integrating linguistic resources in TC through WSD. Comput. Hum. 35, 215–230 (2001)
10. Dave, K., Lawrence, S., Pennock, D.M.: Mining the peanut gallery: opinion extraction and semantic classification of product reviews. In: Proceedings of the 12th International World Wide Web Conference, WWW 2003 (2003)
11. Hotho, A., Staab, S., Stumme, G.: WordNet improves text document clustering. In: Proceedings of the Semantic Web Workshop at SIGIR 2003 (2003)
12. Reuters-21578 text categorization test collection, Distribution 1.0. Reuters (1997), http://www.daviddlewis.com/resources/testcollections/reuters21578/
13. Hersh, W., Buckley, C., Leone, T., Hickam, D.: OHSUMED: an interactive retrieval evaluation and new large test collection for research. In: Proceedings of the 17th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval (SIGIR 1994), pp. 192–201 (1994)
14. Lang, K.: Newsweeder: learning to filter netnews. In: Proceedings of the 12th International Conference on Machine Learning (ICML 1995), pp. 331–339 (1995)
15. Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 137–142. Springer, Heidelberg (1998)
Managing Ubiquitous Scientific Knowledge on Semantic Web

Hao Xu

1 College of Computer Science and Technology, Jilin University, China
2 Department of Information Science and Engineering, University of Trento, Italy
[email protected]
Abstract. Managing ubiquitous scientific knowledge is a part of daily life for scholars, and it has also become a hot topic in the Semantic Web research community. In this paper, we propose a SKO Types framework aiming to facilitate the management of ubiquitous Scientific Knowledge Objects (SKOs), driven by semantic authoring, modularization, annotation and search. The SKO Types framework comprises the SKO Metadata Schema, SKO Patterns and SKO Editor, corresponding to a metadata layer, an ontology layer and an interface layer respectively. The SKO Metadata Schema specifies sets of attributes describing SKOs individually and relationally. SKO Patterns is a model based on three ontologies that modularizes scientific publications syntactically and semantically, while SKO Editor supplies a LaTeX-like mark-up language and editing environment for authoring and annotating SKOs concurrently.
1 Introduction
During the past five decades, the theory of metadata has developed in a variety of directions, typically in the digital library area. The advent of the Semantic Web has also had a significant impact on scientific publication management, especially on formal classification[1], information retrieval, archival preservation, metadata interoperability and so forth. A major concern in the scientific publication research community today is to continue to improve semantics during a SKO's entire lifecycle[2], i.e. creation, dissemination, evaluation, publication, and reuse, since the concept of "Semantic Publishing"[3] has been investigated quite intensively in recent years. Although existing editing tools and search engines for scientific publications have brought tremendous benefits to researchers in their daily lives, they still do not satisfy the ever-increasing demands for semantics and precision. Since a scientific paper is usually preserved as an indivisible unit, there is still a major barrier to searching and navigating specific fragments of SKOs directly, such as a table of experimental results, a theoretical definition, or the "contribution" section of a paper. Another issue is how to semantically enrich a SKO
This work was done in the course of ”Research Methodology” given by DISI, University of Trento and supervised by Prof. Fausto Giunchiglia.
during authoring and post-publication, as well as how to mark up the relevant metadata and entities therein. Although much effort is being spent on addressing these weaknesses, an efficient and effective method has yet to be developed. In this paper, we introduce a tentative solution for authoring, annotating and searching semantic documents. To begin with, we propose a SKO Metadata Schema, which is an extension of current metadata standards, since there is still no specific metadata schema for SKOs at the time of this writing. Moreover, SKO Patterns consists of three ontologies that are used to classify and modularize SKOs: a document ontology breaks a paper into parts syntactically; a rhetorical ontology helps users match rhetorical blocks with paper parts semantically; and an annotation ontology, based on the SKO Metadata Schema, annotates attributes and entities appearing in a paper. Finally, SKO Editor is the implementation level for the afore-mentioned infrastructure. The paper proceeds as follows. Section 2 gives a short review of metadata, semantic technologies, and some state-of-the-art utilities for SKO management. Section 3 discusses some problems with semantic authoring, annotation and search in the research community. A SKO Types framework is proposed in Section 4, along with its three components, i.e. SKO Metadata Schema, SKO Patterns, and SKO Editor. Section 5 contains some conclusions plus ideas for future work.
2 State of the Art
Metadata is generally defined as "data about data" or "information about data", and is used to facilitate resource discovery, e-resource organization, interoperability, digital identification, archiving and preservation. There are three main types of metadata, i.e. descriptive metadata, structural metadata, and administrative metadata[4]. During the past fifty years, many metadata schemas have been developed in a variety of disciplines. Standards for metadata in digital libraries include Dublin Core, METS (Metadata Encoding and Transmission Standard), PREMIS (PREservation Metadata: Implementation Strategies), and OAI-PMH (Open Archives Initiative - Protocol for Metadata Harvesting). The Dublin Core Metadata Element Set is the best-known international standard for cross-domain information resource description. Moreover, FOAF (Friend of a Friend) defines an open, decentralized technology and metadata schema for connecting social web sites and the people they describe. LOM (Learning Object Metadata)[5] focuses on learning objects, digital or non-digital, and their management, location, and evaluation. The afore-mentioned standards constitute the metadata foundation for scientific publication management.
Meanwhile, metadata promotes the evolution of semantic technologies, e.g. ontologies, mark-up languages, semantic search, semantic matching and so forth. An ontology is a formal representation of a set of concepts[6]. It focuses on a specific domain and the relationships between the concepts in it, and is applied to reason about the metadata of that domain or to define the domain. A markup language is an artificial language comprising metadata, markup and data content. People use it to describe information with respect to the structure of text or its display, and it is already popular in computer typesetting and word-processing systems, such as HTML, XML, RDF (Resource Description Framework) and OWL (Web Ontology Language). Additionally, semantic matching[7, 8] and semantic search[9] have improved the search process by leveraging XML, RDF and OWL data to produce highly relevant results. The essential difference between semantic search and traditional search is that semantic search is based on semantics, while traditional search mainly relies on keyword matching. Recently, applications for scientific publication search, such as Google Scholar and CiteSeer, have proliferated. With the advent of the semantic browser[10], semantic wiki[11, 12] and semantic desktop[13], users may enjoy more conveniences brought by the Semantic Web and social network services. A semantic wiki is a collaborative site that has an underlying model of the knowledge described in its pages. It makes it possible to capture or identify metadata within pages, and the relationships between entities, in ways that can be queried or exported. The concept of the "semantic desktop" is to improve personal knowledge management and collaboration. Tudor Groza et al.[14] proposed the Semantically Annotated LaTeX for Scientific Publications (SALT) framework for authoring and annotating semantic documents on the desktop, which is also an extension and implementation of the ABCDE format of Anita de Waard et al.[15] The LiquidPub project proposes a paradigm shift in the way scientific knowledge is created, disseminated, evaluated and maintained. This shift is enabled by the notion of Liquid Publications[16], which are evolutionary, collaborative, and composable scientific contributions. Fausto Giunchiglia et al.[17] gave a formal definition of SKOs and their associated structures. The approach they presented is based on three organization levels (Data, Knowledge and Collection) and three states (Gas, Liquid, Solid) that regulate the metadata and operations allowed at each level.
3 Problem Statement
The motivation comes from a narrative of writing a PhD qualifying paper. To start with, the student uses Google Scholar and Citeseer to accumulate his
background knowledge, reach the state of the art, and generate an initial "Gas" idea. He then discusses it with his supervisor and colleagues via email and begins to draft his "Liquid" paper. After several iterations, he finishes editing the qualifying paper in LaTeX and sends the "Solid" PDF file to the teacher. He receives feedback from reviewers and checks the review forms item by item against his paper to make a final revision. Although some progress has been made in this scenario, at least three major obstacles must be overcome before a semantic framework can be realized. Firstly, collaborative work is not very efficient in this use case. Since a SKO evolves and changes its lifecycle state in a distributed production environment, several versions of the SKO are generated, and various comments and reviews are mixed together. A supervisor may give general comments by email, while commenters and reviewers provide detailed critiques in files with non-unified formats. There is still no standard schema and container to describe, comment on, and review SKOs in order to facilitate collaboration, version management and metadata sharing. Secondly, when the student hunts for background knowledge about his research topic, he frequently wishes to obtain specific parts of a paper directly, such as the results of an evaluation experiment, the definition of a novel concept, or an impressive figure. To date, a paper or a SKO is always treated as a basic indivisible unit, so a specific modularization is needed for the rhetorical representation and description of SKOs. Thirdly, when the student finds some interesting related work, e.g. a reference, a relevant project, or even a researcher mentioned in a paper, he has to type their titles or names into search engines and begin a time-consuming navigation. Marking them up as entities and annotating them with Uniform Resource Identifiers (URIs), along with sets of attributes, would definitely improve the efficiency of SKO search and navigation. Semantically enriching papers is still a difficult problem that has yet to be adequately resolved: papers lack semantics both during authoring and in the post-publication period. Helping readers easily and intuitively reach a rhetorical block that describes background, contribution or discussion is another research issue to be tackled. The three prime issues can be summarized as follows:

1. Current metadata schema standards are not sufficient to describe SKOs/SKO parts and their relationships.
2. Modularity patterns for semantically modelling different kinds of SKOs are needed, both for reading and for writing purposes.
3. Existing editing tools for SKOs, like LaTeX and Open Office, are not fit for semantic authoring and annotating.

These difficulties are real challenges faced by researchers attempting to develop such a framework.
4 Possible Solutions
In this section, we propose a SKO Types Framework comprising three components, SKO Metadata Schema, SKO Patterns and SKO Editor, which are dedicated
to resolving the three problems addressed in Section 3. SKO Metadata is an extension of current standards, e.g. Dublin Core and LOM from the digital library area, and supplies the schema with semantics and lifecycle features. SKO Patterns is based on three ontologies, i.e. a document ontology, an annotation ontology and a rhetorical ontology; it is a faceted classification[18] dealing not only with syntactic patterns but also with semantic ones. SKO Editor provides a semantic editing environment for managing SKOs and their metadata during both authoring and post-publication. These three components constitute the foundation for SKO Types theory and applications.
4.1 SKO Metadata Schema
A tentative conceptual data schema that defines the structure of SKO metadata is specified in this subsection; it aims to change the way Scientific Knowledge Objects are represented, created, disseminated, collaborated on, and evaluated. Our approach extends the shallow metadata schemas currently in use with the notion of Entities. SKO Metadata comprises sets of attributes. An attribute represents a property of an object as a name-value pair, where the name of the attribute identifies its meaning and the value is an instance of a particular data type. The data type of a SKO attribute can be either a simple data type, e.g. Integer, Float or Date, or an entity type such as SKO, Person, Organization, Conference and so on. An Entity is a basic information element that represents either a digital or a physical object. Generally, an entity is a collection of the object's metadata defined as a set of attributes, and an entity type (EType) identifies an entity as a particular instance of a kind, along with the metadata and constraints over it. More precisely, an EType defines two sets of entity attributes, strictly mandatory attributes (SMA) and mandatory attributes (MA), where SMA cannot have a Null value and MA can. When the data type of an attribute value is an entity type, we call this attribute a relational property, which is modelled as a unidirectional link via the attribute. For example, the paper "SKO Types" is a SKO, which has an attribute "author"; the data type of the attribute "author" is the Person entity type. "Hao Xu" is the author of "SKO Types", so the attribute "author" indicates a link from a SKO entity to a Person entity. Extending the basic set of data types with EType data types is therefore crucial for representing relationships among entities for further semantic services. Besides, meta-metadata is an attribute of an attribute that represents attribute provenance information, such as the creator of the attribute or its timestamp. We group related attributes into categories in the SKO Metadata Schema, which consists of the six categories below; Fig. 1 illustrates its E-R diagram, and a minimal code sketch of the model follows the figure.

1. The general category groups the general information that describes the SKO as a whole.
2. The lifecycle category groups the features related to the history and current state of the SKO and to those who have affected the SKO during its evolution.
3. The relational category groups features that define the relationship between the SKO and other Entities.
4. The technical category groups the technical requirements and technical characteristics of the SKO.
5. The rights category groups the intellectual property rights, authorship, copyrights and conditions of use for the SKO.
6. The meta-metadata category groups information about the metadata instance itself, rather than the SKO that the metadata instance describes.
Fig. 1. SKO Metadata Schema
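As a minimal sketch (our illustration, not part of the schema specification), the following Python fragment represents the EType model described above: attributes as name-value pairs, strictly mandatory (SMA) and mandatory (MA) attribute sets per entity type, and a relational attribute linking a SKO entity to a Person entity. All class and field names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class EType:
    name: str
    sma: set = field(default_factory=set)   # strictly mandatory: value cannot be None
    ma: set = field(default_factory=set)    # mandatory: must be present, may be None

@dataclass
class Entity:
    etype: EType
    attributes: dict = field(default_factory=dict)

    def validate(self):
        for a in self.etype.sma:
            assert self.attributes.get(a) is not None, f"SMA '{a}' missing"
        for a in self.etype.ma:
            assert a in self.attributes, f"MA '{a}' absent"

# Hypothetical types: a SKO whose 'author' attribute links to a Person entity.
person_t = EType('Person', sma={'name'}, ma={'affiliation', 'email'})
sko_t = EType('SKO', sma={'title', 'author'}, ma={'abstract'})

author = Entity(person_t, {'name': 'Hao Xu', 'affiliation': None, 'email': None})
paper = Entity(sko_t, {'title': 'SKO Types', 'author': author, 'abstract': None})
paper.validate()     # the relational attribute 'author' holds an Entity, i.e. a link
```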
4.2 SKO Patterns
We propose an open-standard, widely (re)usable format, which is an extension of the ABCDE format for modelling different kinds of papers. In the SKO Metadata Schema, we predefined three kinds of SKOs, i.e. article, monograph and article collection, where a paper can be further divided into journal paper, conference paper, and technical report. A monograph can be a book, a master thesis or a PhD thesis, while an article collection is a complex SKO, such as conference proceedings or a journal issue. In this paper, we only take "paper" into account as a first step. We define three ontologies as the basis of general SKO Patterns:

1. Document Ontology: capturing the syntactic structure of a SKO (section, subsection, figure, table, etc.).
2. Rhetorical Ontology: modelling the rhetorical structure of a SKO (Annotation, Background, Contribution, Discussion, Entity, etc.).
3. Annotation Ontology: creating the bridge between the rhetorical structure and the syntactic structure, especially for metadata creation and extraction.

We use these ontologies to modularize this PhD qualifying paper itself as a case study. The document ontology is based on LaTeX syntax, including section,
Fig. 2. The Rhetorical Ontology Schema
subsection, figure, and so forth. According to the rhetorical ontology given in Fig. 2, the rhetorical structure of this paper is Annotation, Background: state of the art, Background: problem, Contribution: solution, Discussion: conclusion, Discussion: future work, and Entities. The annotations contain title, author, abstract, references, and so on. Entities consist of Entity: SKO, Entity: person, Entity: project, Entity: conference, etc. Modularization is a viable means to divide a SKO into parts, which makes it feasible to search SKO parts and entities directly. We intend to extend our investigations to other kinds of SKOs and to specify all three ontologies in RDF in the near future; the sketch below illustrates what such a specification might look like.
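A hedged sketch of an RDF serialization of these ontologies is given below, using rdflib; the namespace URI, class and property names are illustrative assumptions, not the project's published vocabulary.

```python
from rdflib import Graph, Namespace, RDF, RDFS

SKO = Namespace("http://example.org/sko#")   # placeholder namespace
g = Graph()
g.bind("sko", SKO)

# Rhetorical ontology: a few of the blocks from Fig. 2
for cls in ("RhetoricalBlock", "Background", "Contribution", "Discussion",
            "Annotation", "Entity", "Section"):
    g.add((SKO[cls], RDF.type, RDFS.Class))
for sub in ("Background", "Contribution", "Discussion"):
    g.add((SKO[sub], RDFS.subClassOf, SKO.RhetoricalBlock))

# Annotation ontology: bridge between a syntactic unit and a rhetorical block
g.add((SKO.annotates, RDF.type, RDF.Property))
g.add((SKO.annotates, RDFS.domain, SKO.Section))        # document-ontology class
g.add((SKO.annotates, RDFS.range, SKO.RhetoricalBlock))

print(g.serialize(format="turtle"))
```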
4.3 SKO Editor
We wish to provide a SKO Editor for authoring and annotating semantic documents. As a first attempt, SKO Editor would be a LaTeX-like editing environment supporting the creation of both content data and related metadata for scientific publications. The PDF file format is an ideal container for SKO semantics, since it can be considered the de facto standard of electronic publishing. The vision of SKO Editor addresses the creation, distribution, collaboration and evaluation of SKOs, enabled by the ontologies we predefined in SKO Patterns and the metadata schema we specified in the SKO Metadata Schema. We maintain that the best way to present a narrative to a computer is to let the author explicitly create a rich semantic structure for the SKO during writing. SKO Editor provides a viable way to author and annotate semantic documents using SKO Patterns. With SKO Editor, readers can quickly glance through the contribution and skip to the section they are interested in. Writing at the syntax level in SKO Editor will be compatible with regular LaTeX commands, and the specific annotation commands are proposed as a mark-up language. All these commands provide support for creating rhetorical elements,
creating implicit and explicit visual annotations, and inserting arbitrary annotations in SKOs. In fact, semantic annotation creates a bridge between the actual SKO and its metadata. We propose a pseudo mark-up language in Fig. 3, which describes a semantic writing and reading environment. Ideally, after annotating an entity such as a person or a project, the system could retrieve its attributes automatically without a further search. For example, in Fig. 3, when we click on the Person "Hao Xu", the system retrieves his attributes such as "name", "affiliation", "email" and so forth, which are predefined in the SKO Metadata Schema.
Fig. 3. PDF/HTML creation from annotated SKO
5 Conclusion
In this paper we propose some possible solutions for managing ubiquitous Scientific Knowledge Objects during their creation, evolution, collaboration and dissemination. We also aim to provide a viable means of generating semantic documents for scientific publications in a simple and intuitive way. To achieve this objective, we have introduced a SKO Types framework that consists of a metadata layer, an ontology layer and an interface layer. The SKO Metadata Schema specifies a set of attributes for each kind of SKO; a relational attribute indicates the relationship between two entities as a link. SKO Patterns is formed by three main ontologies, i.e. the document ontology, the rhetorical ontology and the annotation ontology. SKO Editor takes charge of authoring and annotating SKOs, with semantics and interoperability as its two prominent features. In the future, the focal point will be an extension and refinement of the SKO Types framework, especially stable and exchangeable metadata, specific ontologies across disciplines, as well as the implementation of SKO Editor and the SKO
Web Platform. We will define metadata and patterns not only for various SKOs, but also for other Entity Types related to SKOs, such as Researcher, Project and Conference, along with their own attribute sets, in order to facilitate Entity Search[19]. Our ultimate goal is to convince users to adopt our platform for SKO management; thus, we will also perform an evaluation phase after implementation.
References
[1] Fausto, G., Maurizio, M., Ilya, Z.: Towards a theory of formal classification. In: Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI 2005), Pittsburgh, Pennsylvania, USA, July 9-13. AAAI Press, Menlo Park (2005)
[2] Baez, M., Casati, F., Marchese, M.: Universal resource lifecycle management. In: IEEE 25th International Conference on Data Engineering, ICDE 2009, pp. 1741–1748 (2009)
[3] Shotton, D.: Semantic publishing: the coming revolution in scientific journal publishing. Learned Publishing 22(2), 85–94 (2009)
[4] National Information Standards Organization: Understanding metadata. NISO Press, Bethesda (2004)
[5] IEEE Learning Technology Standards Committee (LTSC): IEEE P1484.12 Learning Object Metadata Working Group (2000)
[6] Ganter, B., Stumme, G., Wille, R.: Formal concept analysis: Theory and applications. j-jucs 10(8), 926–926 (2004)
[7] Fausto, G., Pavel, S., Mikalai, Y.: Semantic matching. Encyclopedia of Database Systems (2009)
[8] Fausto, G., Mikalai, Y., Pavel, S.: Semantic matching: Algorithms and implementation. In: Spaccapietra, S., Atzeni, P., Fages, F., Hacid, M.-S., Kifer, M., Mylopoulos, J., Pernici, B., Shvaiko, P., Trujillo, J., Zaihrayeu, I. (eds.) Journal on Data Semantics IX. LNCS, vol. 4601, pp. 1–38. Springer, Heidelberg (2007)
[9] Ramanathan, G., Rob, M., Eric, M.: Semantic search. In: WWW 2003: Proceedings of the 12th International Conference on World Wide Web, pp. 700–709. ACM Press, New York (2003)
[10] Lee, T.B., Chen, Y., Chilton, L., Connolly, D., Dhanaraj, R., Hollenbach, J., Lerer, A., Sheets, D.: Tabulator: Exploring and analyzing linked data on the semantic web. In: Proceedings of the 3rd International Semantic Web User Interaction Workshop (SWUI 2006), p. 6 (2006)
[11] Völkel, M., Krötzsch, M., Vrandecic, D., Haller, H., Studer, R.: Semantic Wikipedia. In: WWW 2006: Proceedings of the 15th International Conference on World Wide Web, pp. 585–594. ACM, New York (2006)
[12] Souzis, A.: Building a semantic wiki. IEEE Intelligent Systems 20(5), 87–91 (2005)
[13] Sauermann, L., Bernardi, A., Dengel, A.: Overview and outlook on the semantic desktop. In: Proc. of the Semantic Desktop Workshop at the ISWC (2005)
[14] Groza, T., Handschuh, S., Möller, K., Decker, S.: SALT - semantically annotated LaTeX for scientific publications. In: Franconi, E., Kifer, M., May, W. (eds.) ESWC 2007. LNCS, vol. 4519, pp. 518–532. Springer, Heidelberg (2007)
[15] de Waard, A., Tel, G.: The ABCDE format enabling semantic conference proceedings. In: SemWiki (2006)
[16] Casati, F., Giunchiglia, F., Marchese, M.: Publish and perish: why the current publication and review model is killing research and wasting your money. Ubiquity 8(3), 1 (2007)
[17] Fausto, G., Ronald, C.: Scientific knowledge objects v.1. Technical report, University of Trento, Dipartimento di Ingegneria e Scienza dell'Informazione, Trento, Italy (2009)
[18] Vickery, B.: Faceted classification for the web. Axiomathes 18, 1122–1151 (2007)
[19] Chang, K.C.: Entity search engine: Towards agile best-effort information integration over the web (2007)
A Semantic Pattern Approach to Managing Scientific Publications

Hao Xu

1 College of Computer Science and Technology, Jilin University, China
2 Department of Information Science and Engineering, University of Trento, Italy
[email protected]
Abstract. With the advancement of digital library techniques and open access services, more and more off-the-shelf utilities for managing scientific publications are emerging and becoming widely used. Nevertheless, most online articles today remain electronic facsimiles of traditional, linearly structured papers, lacking semantics and interlinked knowledge. In this paper, we propose a pattern-based approach to externalizing ubiquitous scientific publications, in step with the development of the Semantic Web and pattern theory, which aims to substantially evolve the means of reading, writing, and publishing for research communities.
1 Introduction
Externalization is the process of articulating tacit knowledge into explicit concepts, as defined by Nonaka[1]. A cognitive externalization makes scientific publications much easier to disseminate, navigate, understand and reuse in research communities[2]. In the last decade, a handful of models targeting the externalization of the rhetoric and argumentation captured within the discourse of scientific publications were proposed, represented by the ABCDE format[3], the Scholarly Ontologies Project, the SWRC (Semantic Web for Research Communities) Project, and SALT (Semantically Annotated LaTeX). Moreover, the journal Cell recently launched an "Article of the Future" initiative to provide a new online format that complements the traditional print paper. However, few online publishers have resolved the problem of how best to employ empirical rhetorical structures to enrich the representation of scientific publications. Essentially, there does not yet appear to be a widely accepted knowledge representation model for scientific papers on the Semantic Web. Organizing an article not only by linear structure, but also by rhetorical structure
This research was done in the KnowDive Group, University of Trento, supervised by Prof. Fausto Giunchiglia.
Scholarly Ontology Project: http://projects.kmi.open.ac.uk/scholonto/
SWRC: http://ontoware.org/projects/swrc/
SALT: http://salt.semanticauthoring.org/
"Article of the Future" initiative: http://beta.cell.com/index.php/2010/01/celllaunches-article-of-the-future-format
with semantic links and metadata will definitely help readers access specific information units more efficiently, without being overwhelmed by undesirable additional detail. In this paper, we tackle the problem mentioned above using a semantic pattern approach inspired by A Pattern Language[4] by Christopher Alexander and by Semantic Patterns[5] by Steffen Staab et al. We focus on how patterns can be applied to describe the representation, composition and relations of various types of scientific publications, and on how entity-relationships can be applied to categorize and retrieve knowledge at both the data and the metadata level on the Semantic Web.
2 A Pattern Approach to Scientific Publications Management
Inspired by the literature on architecture[4] and software engineering[6], we use the term "pattern" to denote a reusable template for capturing successful practices in managing recurrent tasks. Initially, Alexander developed and used the pattern approach to capture his perceptions of the "timeless way" of designing towns and buildings. His theory is based on the consideration that every design problem is the result of a certain configuration of forces in a specific context, and that "each pattern describes a problem which occurs over and over again in our environment and then describes the core of the solution to that problem." Our research aims at specifying and employing patterns that capture experience with rhetorical structures. This section outlines our approach to acquiring, describing, and modularizing the structural representation, organization and presentation of the article itself. Users will subsequently be able to query the resulting catalog of patterns according to the data and metadata specified for various scientific publications. A pattern for scientific papers in our approach is described in Table 1 by the following sections:

Table 1. SKO Pattern Structure
– Pattern Name: a meaningful descriptor of the pattern.
– Intent: a short statement of which situation the pattern addresses.
– Sequence: a sequence of activities.
– Structure: document structure, rhetorical structure and serialization[7]. Our approach focuses on the rhetorical structure and imports the existing dominant ontologies for document structure and serialization.
– Classification: a classification of related patterns in the pattern repository.
– Metadata: a metadata schema for scientific publications[8].
– Example: shows how to put the pattern into practice.
We provide an open-standard, widely (re)usable rhetorical "Structure" for both authoring and post-publication, which is an extension of the ABCDE format and the Cell format for modelling different types of papers, instead of being
either too general or too specific. In our Patterns Repository, we predefined three types of Scientific Publications, i.e. article, monograph and article collection, where a paper can be further divided into journal paper, conference paper, and tech report. A monograph could be a book, a master thesis or a PhD thesis, while an article collection is a complex Scientific Knowledge Object[7], such as conference proceedings or a journal issue. In this paper, we only take "paper" into account as a first step. Scientific Publications Patterns will be ontology-structured and also given faceted classifications[9]. Here are some of the patterns for papers from the Computer Science community that we have predefined, for example:

001. Design Briefings
002. Empirical Papers
003. Experience Papers
004. Methodology Papers
005. Opinion Papers
006. System Papers
007. Theory Papers
008. Vision Paper
009. Survey
010. Book Review
011. Critique
012. PhD Thesis
Our main contribution to scientific knowledge representation is that we specify types of Scientific Publications as patterns. We define each pattern with a further specific representation of its rhetorical structure. Besides, we enrich these rhetorical structures and rhetorical chunks with semantics using the Scientific Publication Metadata Schema [8]. An example of a Scientific Publication Pattern is described as follows; here we omit the specifications of attribute elements.

– Pattern Name: PhD Thesis
– Intent: Used for PhD theses' writing and reading.
– Sequence: In the "PhD thesis defense" pattern, we will use this "PhD Thesis" pattern as a related pattern. Other related patterns could be "submit", "review", "evaluate", "comment" and so on.
– Structure: Introduction, Motivation, State of the Art, Problem, Methodology, Solution, Evaluation, Discussion, Conclusion and Future Work.
– Classifications: Scientific Publication - Paper - PhD Thesis
– Metadata: identifier, title, author, subject, abstract, reference, hasVersion, hasChapter, hasSection, hasTextChunk, hasFigure, hasTable, copyRight.
– Example: Ilya Zaihrayeu's PhD Thesis. URL: http://www.dit.unitn.it/ilya/Download/Publications/PhD-Thesis.pdf

Patternized modularization is a viable means to divide Scientific Knowledge into parts, which makes it feasible to search and navigate elementary knowledge units directly. We intend to extend our investigations to other kinds of
Scientific Publications and specify all these patterns in RDF (Resource Description Framework) in the near future.
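As an illustration of what such an RDF specification could look like, the sketch below (not part of the paper; the namespace, the property names, and the use of the rdflib toolkit are all assumptions for demonstration) encodes the "PhD Thesis" pattern described above.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

SKO = Namespace("http://example.org/sko#")   # hypothetical pattern vocabulary
g = Graph()
thesis = URIRef("http://example.org/sko/patterns/PhDThesis")

g.add((thesis, RDF.type, SKO.Pattern))
g.add((thesis, SKO.intent, Literal("Used for PhD theses' writing and reading")))
for section in ("Introduction", "Motivation", "State of the Art", "Problem",
                "Methodology", "Solution", "Evaluation", "Discussion",
                "Conclusion and Future Work"):
    g.add((thesis, SKO.hasRhetoricalBlock, Literal(section)))
g.add((thesis, SKO.classification,
       Literal("Scientific Publication > Paper > PhD Thesis")))

print(g.serialize(format="turtle"))   # emit the pattern description as Turtle
```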
3 Conclusion
In this paper, we present a novel pattern approach to solving the problems of Scientific Publications' representation and management on the Semantic Web. A pattern consists of several components dealing with metadata and with the rhetorical representation of data, respectively. The main contribution of this work is to provide a high-level pattern language for the externalization of the rhetoric and argumentation captured within scientific knowledge objects, which will definitely facilitate discovery, dissemination and reuse of scientific knowledge in our research communities. In the future, the focal point will be an extension and refinement of the Scientific Knowledge Objects Patterns Framework, especially stable and exchangeable metadata and specific ontologies across disciplines for the Pattern Repository, as well as the implementation of a Scientific Publication Patterns Platform.
References
[1] Takeuchi, H., Nonaka, I.: The knowledge-creating company: How Japanese companies create the dynamics of innovation. Oxford University Press, Oxford (1995)
[2] Simon, T.C., Groza, B.S.T., Handschuh, S., de Waard, A.: A short survey of discourse representation models. In: Proceedings 8th International Semantic Web Conference, Workshop on Semantic Web Applications in Scientific Discourse, Washington, DC, October 26. LNCS. Springer, Berlin (2009)
[3] de Waard, A., Tel, G.: The ABCDE format enabling semantic conference proceedings. In: SemWiki (2006)
[4] Ishikawa, S., Silverstein, M., Jacobson, M., Fiksdahl-King, I., Schlomo, A., Alexander, C.: A pattern language: Towns, buildings, construction. Addison-Wesley, Boston (1977)
[5] Maedche, A., Staab, S., Erdmann, M.: Engineering ontologies using semantic patterns. In: Proceedings of the IJCAI-01 Workshop on E-Business & the Intelligent Web, Seattle, WA, USA, August 5 (2001)
[6] Gamma, E., Helm, R., Johnson, R., Vlissides, J.: Design patterns: elements of reusable object-oriented software. Addison-Wesley Professional, Reading (1995)
[7] Fausto, G., Ronald, C.: Scientific knowledge objects v.1. Technical report, University of Trento, Dipartimento di Ingegneria e Scienza dell'Informazione, Trento, Italy (2009)
[8] Xu, H., Giunchiglia, F.: Scientific knowledge objects types specification. Technical report, University of Trento, Dipartimento di Ingegneria e Scienza dell'Informazione, Trento, Italy (2009)
[9] Giunchiglia, F., Dutta, B., Maltese, V.: Faceted Lightweight Ontologies. In: Conceptual Modeling: Foundations and Applications: Essays in Honor of John Mylopoulos, pp. 36–51. Springer, Heidelberg (2009)
RDF: http://www.w3.org/RDF/
This research is partly supported by the European Project: Liquid Publication, http://project.liquidpub.org/
A Bootstrap Software Reliability Assessment Method to Squeeze Out Remaining Faults
Mitsuhiro Kimura¹ and Takaji Fujiwara²
¹ Faculty of Science and Engineering, Hosei University, 3-7-2 Kajino-cho, Koganei-shi, Tokyo, 184-8584 Japan, [email protected]
² Business Cube and Partners, Inc., 1-20-18 Ebisu, Shibuya-ku, Tokyo, 150-0013 Japan, [email protected]
Abstract. This paper develops a bootstrap software reliability assessment method which can evaluate the number of remaining software faults at the final stage of the software testing process. The bootstrap method for reliability assessment problems has been already developed in the literature. However the method has a weak point which affects the applicability to the data set to be analyzed. We propose a new calculation formula in order to overcome this weak point. After showing the reliability assessment method by the traditional NHPP (nonhomogeneous Poisson process) models, we compare the performance of software reliability prediction with the bootstrap-based method by using a real software fault data set. Keywords: Software reliability, Growth curve model, Bootstrap method, Nonhomogeneous Poisson process, Data analysis.
1 Introduction
Precise software reliability assessment is necessary to evaluate and predict the reliability and performance of a developed software product. This issue is still an urgent one in software engineering for developing a quality software product at a reasonable cost. In order to tackle these software development management issues, a number of software reliability assessment methods and models have been proposed by many researchers over the last three decades (e.g. [1,2,3]). Among these models, software reliability growth models (e.g. [4]) play an important role in software reliability assessment. This approach focuses on the testing process of the software development, and analyzes the time-dependent behavior of the software testing process through some observable quantities. Usually, the quantities form a data set, and the testing managers assess it to estimate the degree of software reliability or testing progress.
This work was partially supported by KAKENHI, the Grant-in-Aid of Scientific Research (C)(20500036).
One of the essential problems arising in software reliability analysis is that the testing managers can obtain only one data set from their testing process of the software under development. That is, no one executes the software testing twice or more with a fixed set of test cases. From the viewpoint of statistics, we are limited to using only one sample (path) of a stochastic phenomenon to find the values of its unknown parameters. Our first motivation for proposing a new bootstrap-based method came from this difficulty. In this paper, we first describe the modeling and how to estimate the model parameters. Comparing the model with a traditional NHPP (nonhomogeneous Poisson process) model, we discuss the advantages and weak points of both models and methods by analyzing a sample, but actually collected, data set.
2 Generalized Growth Curve Model
In this study, we assume that a data set forms (t_i, y_i) (i = 1, 2, ..., n), where t_i is the i-th testing time recorded and y_i the cumulative number of detected software faults up to time t_i. It is also assumed that 0 < y_1 < y_2 < ... < y_n is satisfied for our basic model [5]. One of our main concerns is how we estimate the number of remaining software faults that are latent in the software system. In the literature, many growth curve models have been developed and applied to such data sets to forecast the software reliability in the software testing process. In this section, we provide one generalization of a certain class of reliability growth curves. In software reliability growth modeling, the following growth curves are widely known [4].

E(t) = m_1 (1 - e^{-m_2 t})   (m_1 > 0, m_2 > 0),   (1)
D(t) = d_1 (1 - (1 + d_2 t) e^{-d_2 t})   (d_1 > 0, d_2 > 0).   (2)

The functions E(t) and D(t) are respectively called an exponential curve and a delayed S-shaped one. Each one is used as a mean value function of an NHPP. These two curves have convergence values when t goes to infinity:

\lim_{t \to \infty} E(t) = m_1,   (3)
\lim_{t \to \infty} D(t) = d_1.   (4)

It is understood that the number of software faults is finite. Taking the following transformation of these functions, we have

\log\{ \frac{dE(t)}{dt} \} = \log\{ m_1 m_2 \} - m_2 t,   (5)
\log\{ \frac{dD(t)}{dt} / t \} = \log\{ d_1 d_2^2 \} - d_2 t.   (6)
We can see that the above equations show a linear regression formula. From this fact, these curves can be generalized by using the following linear regression formula.
\log\{ \frac{dH(t)}{dt} / t^\alpha \} = A - B t,   (7)

where H(t) is a non-decreasing and differentiable function and α, A and B are constant unknown parameters. In this study, we assume B > 0 so that the model describes a software reliability growth phenomenon. Needless to say, when α = 0 and α = 1, we have

H(t)|_{\alpha=0} = E(t),   (8)
H(t)|_{\alpha=1} = D(t),   (9)

respectively. Also, the differential equation (7) can be solved with the initial condition H(0) = 0 as

H(t) = \frac{e^A}{B^{\alpha+1}} \{ \Gamma[\alpha+1, 0] - \Gamma[\alpha+1, Bt] \},   (10)

where Γ[α+1, x] is the incomplete gamma function defined as

\Gamma[\alpha+1, x] = \int_x^{\infty} s^{\alpha} e^{-s} ds.   (11)
The parameter α must satisfy α > −1. H(t) in Eq. (10) is called an incomplete gamma function model [6]. This model has more applicability as a growth curve than the exponential or delayed S-shaped model by the parameter α.
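As a quick illustration of Eq. (10), the incomplete gamma function model can be evaluated with standard numerical libraries. The sketch below is not from the paper; the parameter values are assumed purely for demonstration, and SciPy's regularized upper incomplete gamma function is rescaled to match the Γ[α+1, x] used here.

```python
import numpy as np
from scipy.special import gamma, gammaincc

def upper_incomplete_gamma(a, x):
    # Gamma(a, x) = integral_x^inf s^(a-1) e^(-s) ds; gammaincc is the
    # regularized version, so multiply back by Gamma(a).
    return gamma(a) * gammaincc(a, x)

def H(t, alpha, A, B):
    # Incomplete gamma function model, Eq. (10).
    c = np.exp(A) / B ** (alpha + 1)
    return c * (upper_incomplete_gamma(alpha + 1, 0.0)
                - upper_incomplete_gamma(alpha + 1, B * t))

alpha, A, B = 0.5, 0.6, 0.2          # hypothetical parameter values
t = np.linspace(0.0, 30.0, 7)
print(H(t, alpha, A, B))             # predicted cumulative faults at each t
# a = lim_{t->inf} H(t), cf. Eq. (12):
print(np.exp(A) / B ** (alpha + 1) * upper_incomplete_gamma(alpha + 1, 0.0))
```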
3 Estimation of the Number of Remaining Faults
In this section, we show two methods to estimate the number of remaining software faults in the software. First we assume that the number of software faults which are latent at the beginning of software testing is a constant, denoted by a. This parameter can be expressed as

a = \lim_{t \to \infty} H(t) = \frac{e^A}{B^{\alpha+1}} \Gamma[\alpha+1, 0].   (12)

3.1 NHPP-Based Method
Let {N(t), t ≥ 0} be a nonhomogeneous Poisson process, where N(t) represents the cumulative number of detected software faults up to testing time t. We have the following formula:

\Pr[N(t) = n] = \frac{E[N(t)]^n}{n!} \exp[-E[N(t)]]   (n = 0, 1, 2, ...),   (13)

where E[N(t)] is the mean value function of the NHPP. We can apply H(t) in Eq. (10) to this function as

E[N(t)] = H(t).   (14)
Simultaneously, the variance of N(t) when t is given can be represented by

Var[N(t)] = H(t).   (15)

Therefore, letting M(t) be the number of remaining software faults at time t, we have

M(t) = a - N(t).   (16)

Its mean and variance are respectively given by

E[M(t)] = \frac{e^A}{B^{\alpha+1}} \Gamma[\alpha+1, Bt],   (17)
Var[M(t)] = H(t).   (18)

In order to estimate the unknown parameters α, A, and B, we apply the method of maximum likelihood. The likelihood function L(α, A, B) is represented as

L(\alpha, A, B) = \prod_{k=1}^{n} \frac{ \{ H(t_k) - H(t_{k-1}) \}^{y_k - y_{k-1}} }{ (y_k - y_{k-1})! } \exp[-H(t_n)],   (19)

where t_0 ≡ 0 and y_0 ≡ 0. Maximizing L with respect to α, A, and B, we obtain the estimates of these parameters. Usually we use the function log L to estimate them, as follows:

\log L = \sum_{k=1}^{n} (y_k - y_{k-1}) \ln[H(t_k) - H(t_{k-1})] - H(t_n) - \sum_{k=1}^{n} \ln[(y_k - y_{k-1})!].   (20)

Searching for the A and B which provide the maximum value of ln L while changing the given value of α, we obtain \hat{A} and \hat{B} along with \hat{\alpha}.
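The maximization of Eq. (20) can be carried out with a generic numerical optimizer; the sketch below is an illustration only (the fault data and the optimizer choice are assumptions, not the authors' tooling), searching over A and B for each fixed α on a grid.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gamma, gammaincc, gammaln

def H(t, alpha, A, B):
    # Mean value function of Eq. (10); Gamma(a, x) is the upper incomplete gamma.
    g = lambda a, x: gamma(a) * gammaincc(a, x)
    return np.exp(A) / B ** (alpha + 1) * (g(alpha + 1, 0.0) - g(alpha + 1, B * t))

def log_L(alpha, A, B, t, y):
    # Grouped-data NHPP log-likelihood, Eq. (20), with t0 = 0 and y0 = 0.
    if B <= 0:
        return -np.inf
    tt = np.concatenate(([0.0], t))
    yy = np.concatenate(([0.0], y))
    dH, dy = np.diff(H(tt, alpha, A, B)), np.diff(yy)
    return np.sum(dy * np.log(dH) - gammaln(dy + 1)) - H(tt[-1], alpha, A, B)

# Hypothetical (ti, yi) data: cumulative number of detected faults.
t = np.arange(1.0, 11.0)
y = np.array([3, 6, 9, 12, 14, 16, 17, 18, 19, 20], dtype=float)

best = None
for alpha in np.linspace(0.0, 2.0, 21):          # coarse grid over alpha
    res = minimize(lambda p: -log_L(alpha, p[0], p[1], t, y),
                   x0=[0.5, 0.2], method="Nelder-Mead")
    if best is None or -res.fun > best[0]:
        best = (-res.fun, alpha, res.x)
print("alpha=%.2f  A=%.3f  B=%.3f  logL=%.3f"
      % (best[1], best[2][0], best[2][1], best[0]))
```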
3.2 Bootstrap-Based Method
In this study, we focus on the degree of the deviation of M(t). Kimura and Fujiwara [6] recently showed a software quality control charting method using a bootstrap approach. We here briefly describe the key concept of the method and its limitation. Equation (7) can be called an adaptive linear regression. Its independent variable is t and the objective one is \log\{ \frac{dH(t)}{dt} / t^\alpha \}. We denote the objective variable by z(α, t). We cannot know the values of z(α, t) in principle even if t is given, because the values of α, A, and B are all unknown and dH(t)/dt cannot be evaluated simultaneously. However, if we use a numerical differentiation to evaluate dH(t)/dt approximately, this regression analysis can be evaluated adaptively. That is, we present the objective value at t = t_i as

z(\alpha, t_i) =
  \log\{ \frac{1}{2} ( \frac{y_{i+1} - y_i}{t_{i+1} - t_i} + \frac{y_i - y_{i-1}}{t_i - t_{i-1}} ) / t_i^\alpha \}   (1 ≤ i ≤ n-1),
  \log\{ \frac{y_n - y_{n-1}}{t_n - t_{n-1}} / t_n^\alpha \}   (i = n),
  (21)
where t_0 ≡ 0 and y_0 ≡ 0. By using the virtual data set (t_i, z(α, t_i)) (i = 1, 2, ..., n), we apply the following linear regression model:

z(\alpha, t_i) = A - B t_i + \epsilon_i   (i = 1, 2, ..., n),   (22)

where ε_i represents an error term. Note that the value of α is still unknown at this stage. From Eq. (22), we formally obtain the estimators of A and B as

\hat{A} = \frac{1}{n} \sum_{i=1}^{n} z(\alpha, t_i) + \hat{B} \times \frac{1}{n} \sum_{i=1}^{n} t_i,   (23)

\hat{B} = - \frac{ \sum_{i=1}^{n} (t_i - \bar{t}) z(\alpha, t_i) }{ \sum_{i=1}^{n} (t_i - \bar{t})^2 }.   (24)

These estimators are functions of α. Therefore, recalling the sum of squared errors S(α), we derive it as

S(\alpha) = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} [ z(\alpha, t_i) - \bar{z}(\alpha) + \hat{B} (t_i - \bar{t}) ]^2,   (25)

where

\bar{z}(\alpha) = \frac{1}{n} \sum_{i=1}^{n} z(\alpha, t_i),   (26)
\bar{t} = \frac{1}{n} \sum_{i=1}^{n} t_i.   (27)

Minimizing S(α) with respect to α numerically, we can obtain the estimate \hat{\alpha}. Thus \hat{A} and \hat{B} can be respectively evaluated by Eqs. (23) and (24). For the sake of the numerical evaluation with this adaptive regression analysis, we need to use some sort of mathematical tool which provides a formula manipulation system. We can then estimate the number of software faults at the beginning of the software testing process as

\hat{a} = \frac{e^{\hat{A}}}{\hat{B}^{\hat{\alpha}+1}} \Gamma[\hat{\alpha}+1, 0].   (28)
By using the bootstrap method, we can obtain the bootstrap samples of the number of remaining software faults. The linear regression in Eq. (22) is suitable for the bootstrap sampling. We show this re-sampling scheme as follows [6].
Step 1. Estimate \hat{A}_0 = \hat{A} and \hat{B}_0 = \hat{B} with the data set (t_i, z(\hat{\alpha}, t_i)) (i = 1, 2, ..., j). The number j is given, and t_j means the evaluation time point of this scheme (j ≤ n).
Step 2. Calculate the residuals w(t_i) by

w(t_i) = z(\hat{\alpha}, t_i) - (\hat{A}_0 - \hat{B}_0 t_i)   (i = 1, 2, ..., j).   (29)

Step 3. Set the total number of iterations K. Let p = 1 (p = 1, 2, ..., K).
Step 4. Generate a new z_p(\hat{\alpha}, t_i) by randomly choosing one value w(t^*) from the set {w(t_1), w(t_2), w(t_3), ..., w(t_j)}. Therefore we have a sequence of j bootstrap samples by

z_p(\hat{\alpha}, t_i) = \hat{A}_0 - \hat{B}_0 t_i + w(t^*)   (i = 1, 2, ..., j).   (30)

Thus a new bootstrap data set can be generated as

\{ z_p(\hat{\alpha}, t_1), z_p(\hat{\alpha}, t_2), ..., z_p(\hat{\alpha}, t_j) \}.   (31)

Step 5. Estimate the parameters A_p and B_p by the following regression formula:

z_p(\hat{\alpha}, t_i) = A_p - B_p t_i   (i = 1, 2, ..., j).   (32)

Step 6. Let p = p + 1 and go back to Step 4 if p < K.
Step 7. Stop.

Hence we obtain K pairs of bootstrap estimates (\hat{A}_p, \hat{B}_p) (p = 1, 2, ..., K). We also have K bootstrap samples of the number of remaining software faults at time t_j by

\hat{M}_p(t_j) = \frac{e^{\hat{A}_p}}{\hat{B}_p^{\hat{\alpha}+1}} \Gamma[\hat{\alpha}+1, \hat{B}_p t_j]   (p = 1, 2, ..., K).   (33)

This method works under the strict condition y_0 ≡ 0 < y_1 < y_2 < ... < y_n. The condition is needed to calculate z(α, t_i) in Eq. (21). However, in the final stage of the software testing process, we often observe a data set which does not meet the condition (cf. Fig. 1), because this stage is a squeeze-out process for the remaining faults latent in the software system. Therefore our bootstrap-based method described above cannot be applied to such data sets as is. In order to overcome this limitation, we have tried several ways of extracting the values of \log\{ \frac{dH(t)}{dt} / t^\alpha \} from the original data pairs (t_i, y_i) (i = 1, 2, ..., n). Consequently, in this study we propose the following numerical differentiation method. Instead of Eq. (21), we use

z(\alpha, t_i) = \log\{ \frac{1}{k} \sum_{l=1}^{k} \frac{y_{i+l} - y_{i+l-1}}{t_{i+l} - t_{i+l-1}} / t_i^\alpha \},   (34)

where k is the maximum number of consecutively identical values among {y_1, y_2, ..., y_n}. In Eq. (34), the indices i and k must satisfy i + k ≤ n.
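A compact sketch of the re-sampling scheme (Steps 1 to 7) together with the modified slope extraction of Eq. (34) is given below. It is an illustration under assumed data, not the authors' implementation: the fault counts, the fixed value of α̂, and the choice of drawing one residual per observation in Step 4 are all assumptions made here.

```python
import numpy as np
from scipy.special import gamma, gammaincc

def upper_gamma(a, x):
    return gamma(a) * gammaincc(a, x)

def z_values(t, y, alpha, k):
    # Eq. (34): average slope over k consecutive intervals, divided by t_i^alpha.
    z = []
    for i in range(len(t) - k):
        slopes = [(y[i + l] - y[i + l - 1]) / (t[i + l] - t[i + l - 1])
                  for l in range(1, k + 1)]
        z.append(np.log(np.mean(slopes) / t[i] ** alpha))
    return np.array(z), t[: len(z)]

def fit_line(tt, zz):
    # Least squares for z = A - B t, cf. Eqs. (23)-(24).
    B = -np.sum((tt - tt.mean()) * zz) / np.sum((tt - tt.mean()) ** 2)
    return zz.mean() + B * tt.mean(), B

# Hypothetical squeeze-out data (repeated cumulative counts allowed).
t = np.arange(1.0, 24.0)
y = np.array([2, 4, 7, 9, 11, 13, 14, 15, 16, 17, 18, 18, 19, 20, 20,
              21, 21, 21, 22, 22, 22, 22, 22], float)
alpha_hat, k, K = -0.3, 5, 1000        # alpha_hat is assumed fixed here

zz, tt = z_values(t, y, alpha_hat, k)
A0, B0 = fit_line(tt, zz)                              # Step 1
w = zz - (A0 - B0 * tt)                                # Step 2: residuals
rng = np.random.default_rng(1)
samples = []
for _ in range(K):                                     # Steps 3-7
    zp = A0 - B0 * tt + rng.choice(w, size=len(tt), replace=True)   # Step 4
    Ap, Bp = fit_line(tt, zp)                          # Step 5
    samples.append(np.exp(Ap) / Bp ** (alpha_hat + 1)
                   * upper_gamma(alpha_hat + 1, Bp * t[-1]))        # Eq. (33)
print(np.mean(samples), np.percentile(samples, [5, 95]))
```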
Fig. 1. Time behavior of cumulative number of detected faults in the final stage of software testing process
4 Numerical Examples and Discussion
In this section, we show an example of software reliability analysis. In particular, we present the estimation results for the number of remaining software faults. The data set was collected from the final software testing phase of a certain real software development project. The software system under testing belongs to the category of software development tools, and it works as a web-based application. The application consists of Java and Visual Basic code; the size of the program code is about 5800 steps in Java and 10400 steps in VB. Figure 1 illustrates the overall data set (t_i, y_i) (i = 1, 2, ..., 23). First we analyze the data by the NHPP model described in the previous section. Table 1 shows the estimated parameters. The estimated mean value function E[N(t)] in Eq. (14), along with the data set, is shown in Fig. 2. From this result, we can obtain the number of remaining faults \hat{a} - y_{23} and its 90% confidence interval based on a Poisson distribution. The results are shown in Table 2. Figure 3 additionally represents the estimated probability function of a - y_{23}. On the other hand, we have also estimated these quantities by the bootstrap-based method. We set K = 1000 for the number of bootstrap iterations and fixed k = 5 in Eq. (34).

Table 1. Estimated parameters (by NHPP model)
\hat{A} = 0.6096,  \hat{B} = 0.2372,  \hat{\alpha} = 0.79
Fig. 2. Estimated E[N(t)] by the NHPP model

Table 2. Estimated remaining software faults at t_23 (NHPP model)
\hat{a} - y_{23}: 22.45 - 22 = 0.45    90% confidence interval: -7 < a - y_{23} < 9
Fig. 3. Estimated probability function of the number of remaining faults evaluated at t23 (Plot is smoothed)
We also confirmed that w(t_i) in Eq. (29) obeys the normal distribution at the 5% significance level by using the Kolmogorov-Smirnov test (cf. Fig. 4). The estimated results of the 1000 values of a - y_{23} yield a histogram, which is depicted in Fig. 5. The parameter α was estimated as \hat{\alpha} = -0.3286. From this histogram, we calculated the mean number of remaining faults and the 90% confidence interval of the number of remaining software faults. The upper and lower bounds were obtained on a pro-rata basis from the histogram. Table 3 shows these results.
Fig. 4. Results of Kolmogorov-Smirnov test for w(ti )
Fig. 5. Histogram of the number of remaining software faults evaluated at t23
For this data set, the bootstrap-based method provides a narrower confidence interval than that of the NHPP-based method. Additionally, this software system actually experienced one software failure in the operation phase. In this sense, our estimation result of \hat{a} - y_{23} = 1.20 in Table 3 might be a good prediction. Figure 6 shows the time-dependent behavior of the estimated mean number of remaining software faults by the NHPP-based and bootstrap-based methods. We have plotted the results while changing the evaluation time from t_16 to t_23.

Table 3. Estimated remaining software faults at t_23 (bootstrap)
\hat{a} - y_{23}: 23.20 - 22 = 1.20    90% confidence interval: -2.82 < a - y_{23} < 5.59
Fig. 6. Estimated mean number of remaining faults evaluated at t16 to t23
At a glance, the NHPP model gives us a very stable forecast between t_16 and t_23. However, from the testing managers' point of view, this plot by the NHPP-based method only means "this software still has a few (1 to 3) failures." In other words, from t_16 to t_23, the reliability growth of this software already seems saturated. At this point, the bootstrap-based method can indicate that this software test might be close to its end at about t_21, and finally, that the number of remaining software faults is about 1 at t_23. This difference in the evaluation results comes from the fact that the bootstrap-based method uses the information on the decrease of \log\{ dH(t)/dt \} along with the testing time, but the NHPP-based method does not mainly utilize such information. Hence the bootstrap-based method could give us a precise forecast for this data set and, at the same time, could work as an indicator of the software reliability growth phenomenon. However, our bootstrap-based method still has some limitations. We refer to them in the following list.

1. This bootstrap-based method is applicable if the time behavior of (t_i, y_i) (i = 1, 2, ..., n) shows saturation. That is, the data points (t_i, z(\hat{\alpha}, t_i)) need to go down on average on the plot, because the parameter B must be positive in the regression formula

\log\{ \frac{dH(t)}{dt} / t^\alpha \} = A - B t.

For example, we depict the first regression result in Fig. 7. The parameters \hat{A}_0 and \hat{B}_0 were obtained from this regression analysis. This figure shows that the software testing is approaching saturation.
Fig. 7. First regression results (\hat{\alpha} = -0.3286)
2. In the case that the parameter k in Eq. (34) is large, the size of the data set becomes small since i + k ≤ n. Thus Eq. (34) needs more improvement. This calculation method is a simple, but rough, way to overcome the weak point mentioned in Section 3.2.

Consequently, we consider that if we jointly employ these different methods (the NHPP model and the bootstrap) to evaluate the software reliability and the progress of the testing process, the evaluation results will be more trustworthy. On the contrary, a methodology in which the software testing manager provides many kinds of mean value functions of NHPP models, chooses the best function in the sense of goodness of fit to the data set to be analyzed, and evaluates the software reliability, might not exert a good performance and will not acquire useful information for software testing management, especially in its final stage.
5 Conclusion
In the software testing process, the software testing managers would like to squeeze out the remaining software faults from the software. In such a situation, we often observe a data set which behaves like no more detection of the software faults. We called this situation a squeeze-out process of the software faults. The basic bootstrap-based assessment method which has been already proposed in [6] has not been able to analyze a data set which was gathered from such a squeeze-out process. In this study, we made some improvement to the method so as to analyze the data set. We showed several comparisons between the traditional NHPP model and our bootstrap method from the view point of the characteristics of the software reliability prediction, and discussed the advantages and weak points of these methods. As a result, we have confirmed that our bootstrap-based method provides a good assessment in terms of the estimation of the number of remaining
software faults and the method works as an indicator of the progress of the software testing process. In the future study, we need to find a better way to extract the slope of the sequence of (ti , yi ) (i = 1, 2, . . . , n). This will improve the performance of our bootstrap-based software assessment method.
References
1. Musa, J.: Software Reliability Engineering. McGraw-Hill, New York (1999)
2. Kapur, P.K., Garg, R.B., Kumar, S.: Contributions to Hardware and Software Reliability. World Scientific, Singapore (1999)
3. Pham, H. (ed.): Recent Advances in Reliability and Quality in Design. Springer, London (2008)
4. Ryu, M. (ed.): Handbook of Software Reliability Engineering. McGraw-Hill, New York (1995)
5. Kimura, M.: A Study on Two-parameter Numerical Differentiation Method by Gamma Function Model. In: 12th ISSAT International Conference on Reliability and Quality in Design, Chicago, pp. 225–229 (2006)
6. Kimura, M., Fujiwara, T.: Practical Optimal Software Release Decision Making by Bootstrap Moving-Average Quality Control Chart. International Journal of Software Engineering and Its Applications 4(1), 29–42 (2010)
Markov Chain Monte Carlo Random Testing
Bo Zhou, Hiroyuki Okamura, and Tadashi Dohi
Department of Information Engineering, Graduate School of Engineering, Hiroshima University, Higashi-Hiroshima, 739–8527, Japan
{okamu,dohi}@rel.hiroshima-u.ac.jp
Abstract. This paper proposes a software random testing scheme based on Markov chain Monte Carlo (MCMC) method. The significant issue of software testing is how to use the prior knowledge of experienced testers and the information obtained from the preceding test outcomes in making test cases. The concept of Markov chain Monte Carlo random testing (MCMCRT) is based on the Bayes approach to parametric models for software testing, and can utilize the prior knowledge and the information on preceding test outcomes for their parameter estimation. In numerical experiments, we examine effectiveness of MCMCRT with ordinary random testing and adaptive random testing. Keywords: Software testing, Random testing, Bayes statistics, Markov chain Monte Carlo.
1 Introduction
Software testing is significant for verifying the reliability of a software system. It is important to consider how testing can be performed more effectively and at lower cost through the use of systematic and automated methods. Since exhaustive testing, the checking of all possible inputs, is usually prohibitively difficult and expensive, it is essential for testers to make the best use of their limited testing resources and generate good test cases which have a high probability of detecting as-yet-undiscovered errors. Although random testing (RT) is simple in concept and is often easy to implement, it has been used to estimate the reliability of software systems. RT is one of the testing techniques commonly used by practitioners. However, it is often argued that such random testing is inefficient, as there is no attempt to make use of any available information about the program or specifications to guide testing. A growing body of research has examined the concept of adaptive random testing (ART) [5], which is an attempt to improve the failure-detection effectiveness of random testing. In random testing, test cases are simply generated in a random manner. However, the randomly generated test cases may happen to be close to each other. In ART, test cases are not only randomly selected but also evenly spread. The motivation for this is that, intuitively, evenly spread test cases have a greater chance of finding faults. Chen and Merkel [6] also proposed quasi-random testing, which uses a class of quasi-random sequences possessing
the property of low discrepancy to reduce the computational costs compared to ART. Besides random test case generators, there are two other test case generation methods [9]. Structural or path-oriented test case generators [10] are based on covering certain structural elements in the program. Most of these generators use symbolic execution to generate test cases that meet a testing criterion such as path coverage or branch coverage. Goal-oriented test case generators [11] select test cases from the program specification, in order to exercise features of the specification. In this paper, we propose a new software random testing method: Markov chain Monte Carlo random testing (MCMCRT), based on a statistical model using prior knowledge of program semantics. The main benefit of MCMCRT is that it allows the use of statistical inference techniques to compute probabilistic aspects of the testing process. The test case generation procedure is accomplished by using the Markov chain Monte Carlo (MCMC) method, which generates new test cases from previously generated test cases based on the construction of a software testing model such as the input domain model. The rest of this paper is organized as follows. Section 2 summarizes previous software testing methods. Section 3 describes MCMCRT. Section 4 presents numerical experiments and compares the proposed method to existing methods. Finally, in Section 5, we discuss the results and future work in the area of software testing.
2 Software Random Testing

2.1 Random Testing and Adaptive Random Testing
Among the test case selection strategies, random testing (RT) is regarded as a simple but fundamental method. It avoids complex analysis of program specifications or structures and simply selects test cases from the whole input domain randomly. Hence, the test case generation process is cost effective and can be fully automated. Recently, Chen et al. [5] proposed adaptive random testing (ART) to improve on the fault detection capability of RT by exploiting successful test cases. ART is based on the observation [7] that failure-causing inputs are normally clustered together in one or more contiguous regions in the input domain. In other words, failure-causing inputs are denser in some areas than others. In general, common failure-causing patterns can be classified into the point, strip and block patterns [2]. These patterns are schematically illustrated in Fig. 1, where we have assumed that the input domain is two-dimensional. A point pattern occurs when the failure-causing inputs are either stand alone inputs or cluster in very small regions. A strip pattern and a block pattern refer to those situations when the failure-causing inputs form the shape of a narrow strip and a block in the input domain, respectively. Distance-based ART (DART) [5] is the first implementation of ART. This method maintains a set of candidate test cases C = {C1 , C2 , . . . , Ck } and a set
Fig. 1. Failure pattern: point, strip, and block
of successful test case S = {S1 , S2 , . . . , Sl }. The candidate set consists of a fixed number of test case candidates which are randomly selected. The successful set records the locations of all successful test cases, which are used to guide the selection of the next test case. For each test case candidate Ci , DART computes its distance di from the successful set (defined as the minimum distance between Ci and the successful test cases), and then selects the candidate Ci having the maximum di to be the next test case. Restricted random testing (RRT) [3] is another implementation of ART. It only maintains the successful set S = {S1 , S2 , . . . , Sl } without any candidate set. Instead, RRT specifies exclusion zones around every successful test case. It randomly generates test case one by one until a candidate outside all exclusion zones is found. Both DART and RRT select test cases based on the locations of successful test cases, and use distances as a gauge to measure whether the next test case is sufficiently far apart from all successful test cases.
3 Markov Chain Monte Carlo Random Testing

3.1 Bayes Statistics and MCMC
Assume that we need to compute the posterior probability p(ξ|x) of an unknown parameter ξ given data x, based on the likelihood p(x|ξ) and the prior probability p(ξ). According to the Bayes rule,

p(\xi | x) = \frac{ p(x|\xi) p(\xi) }{ Z },   (1)

we get the posterior probability, where Z is the normalizing constant:

Z = \int p(x|\xi) p(\xi) d\xi.   (2)

In general, Eq. (2) becomes a multiple integral. When the dimension of ξ is high, it is usually very difficult or impossible to compute the normalizing constant.
MCMC is a general-purpose technique for generating fair samples from a probability distribution in a high-dimensional space. The idea of MCMC is simple: construct an ergodic Markov chain whose stationary distribution is consistent with the target distribution, then simulate the Markov chain by sampling; the sample obtained by the long-term Markov simulation can be regarded as a sample drawn from the stationary distribution, i.e., the target distribution. In MCMC, a Markov chain should be constructed such that its stationary distribution is the probability distribution from which we want to generate samples. There are a variety of standard MCMC algorithms, such as Gibbs sampling [1] and the Metropolis-Hastings algorithm [8]. Here we summarize Gibbs sampling. Given an arbitrary starting value x^{(0)} = (x_1^{(0)}, ..., x_n^{(0)}), let p(x) = p(x_1, ..., x_n) denote a joint density, and let p(x_i | x_{-i}) = p(x_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n) denote the induced full conditional densities for each of the components x_i. The Gibbs sampling algorithm is often presented as follows [1]:

Repeat for j = 0, 1, ..., N-1:
  Sample y_1 = x_1^{(j+1)} from p(x_1 | x_{-1}^{(j)}).
  Sample y_2 = x_2^{(j+1)} from p(x_2 | x_1^{(j+1)}, x_3^{(j)}, ..., x_n^{(j)}).
  ...
  Sample y_i = x_i^{(j+1)} from p(x_i | x_1^{(j+1)}, ..., x_{i-1}^{(j+1)}, x_{i+1}^{(j)}, ..., x_n^{(j)}).
  ...
  Sample y_n = x_n^{(j+1)} from p(x_n | x_{-n}^{(j+1)}).
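For concreteness, the algorithm above can be written for a toy target whose full conditionals are available in closed form; the bivariate-normal example below is illustrative only and is not taken from the paper.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=10000, seed=0):
    # Target: (X1, X2) ~ N(0, [[1, rho], [rho, 1]]).
    # Full conditionals: X1 | X2 = x2 ~ N(rho * x2, 1 - rho^2), and symmetrically.
    rng = np.random.default_rng(seed)
    x1, x2 = 0.0, 0.0                       # arbitrary starting value x^(0)
    samples = np.empty((n_iter, 2))
    for j in range(n_iter):
        x1 = rng.normal(rho * x2, np.sqrt(1 - rho ** 2))   # x1^(j+1) ~ p(x1 | x2^(j))
        x2 = rng.normal(rho * x1, np.sqrt(1 - rho ** 2))   # x2^(j+1) ~ p(x2 | x1^(j+1))
        samples[j] = (x1, x2)
    return samples

s = gibbs_bivariate_normal(rho=0.8)
print(np.corrcoef(s[2000:].T))   # after burn-in, the sample correlation approaches 0.8
```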
3.2 Software Testing Model
Before describing MCMCRT, we discuss how to represent software testing activities as a parametric probability model. In fact, since MCMCRT is essentially built on statistical parameter estimation, its fault-detection capability depends on the underlying software testing model used to generate test cases. In this paper, we introduce Bayesian networks (BNs) to build effective software testing models. BNs are annotated directed graphs that encode probabilistic relationships among distinctions of interest in an uncertain-reasoning problem. BNs enable an effective representation and computation of the joint probability distribution over a set of random variables. BNs derive from Bayesian statistical methodology, which is characterized by providing a formal framework for the combination of data with the judgments of experts such as software testers. A BN is an annotated graph that represents a joint probability distribution over a set of random variables V which consists of n discrete variables X1 , . . . , Xn . The network is defined by a pair B =< G, Ξ >, where G is the directed acyclic graph. The second component Ξ denotes the set of parameters
Fig. 2. Input domain model
Fig. 3. Neighborhood relationship of input: variants (a), (b), and (c)
of the network. This set contains the parameter ξ_{x_i|Φ_i} = P_B(x_i|Φ_i) for each realization x_i of X_i conditioned on Φ_i, the set of parents of X_i in G. Then

P_B(X_1, ..., X_n) = \prod_{i=1}^{n} P_B(X_i | \Phi_i) = \prod_{i=1}^{n} \xi_{X_i | \Phi_i}.   (3)
Consider a software testing model using BNs. As a simple representation, this paper introduces a two-dimensional n-by-n input domain model, where each node indicates an input for the software. Assume that each input (a node in Fig. 2) has a unique state T_{(i,j)} ∈ {1, -1}, i, j = 1, 2, ..., n, where -1 means this input, chosen as a test case, is successfully executed, and 1 means this input, chosen as a test case, causes a failure. Based on the input domain model, the problem of finding a failure-causing input is reduced to finding the node having the highest probability that the state is 1. Define the test result as T. According to the Bayes rule, we have

p(T_{(1,1)}, T_{(1,2)}, ..., T_{(n,n)} | T) ∝ p(T | T_{(1,1)}, T_{(1,2)}, ..., T_{(n,n)}) \times p(T_{(1,1)}, T_{(1,2)}, ..., T_{(n,n)}),   (4)
where ∝ means the proportional relationship. We assume that each input in the input domain has four neighbors, as shown in Fig. 3(a). For each node, we can get the marginal posterior probability:
452
B. Zhou, H. Okamura, and T. Dohi
P (T(i,j) = 1|T(i−1,j) = t1 , T(i+1,j) = t2 , T(i,j−1) = t3 , T(i,j+1) = t4 ) ∝ P (T(i−1,j) = t1 |T(i,j) = 1)P (T(i+1,j) = t2 |T(i,j) = 1) ×P (T(i,j−1) = t3 |T(i,j) = 1)P (T(i,j+1) = t4 |T(i,j) = 1)P (T (i, j) = 1). (5) This means that whether the input has fault is related to if the neighbors have faults. In detail, the conditional probability is given by P (T = t|S = 1) =
exp(ξ1 t) , exp(ξ1 t) + exp(ξ1 t¯)
(6)
and P (T = t|S = −1) =
exp(ξ2 t) , exp(ξ2 t) + exp(ξ2 t¯)
(7)
where S is one of the neighbor inputs and \bar{t} is the reverse of t. When ξ_1 = -ξ_2, the input domain model is equal to the well-known Ising model in physics. According to Eqs. (5)-(7), the state of an input is defined by

P(T_{(i,j)} = t | T_{\Phi(i,j)} = t_{\Phi(i,j)}) = \frac{ \exp(\beta t \sum_{i=1}^{\phi} t_{\Phi(i,j)}) }{ \exp(\beta t \sum_{i=1}^{\phi} t_{\Phi(i,j)}) + \exp(\beta \bar{t} \sum_{i=1}^{\phi} t_{\Phi(i,j)}) },   (8)
where β is a constant and φ is the total number of neighbors of the input.

3.3 MCMC-RT
Similar to ART, MCMCRT utilizes the observations from previous test cases. The concept of MCMCRT is based on the Bayes approach to parametric models for software testing, and it can utilize prior knowledge and the information on preceding test outcomes as its model parameters. Thus different software testing models provide different concrete MCMCRT algorithms. In the framework of the input domain model, MCMCRT chooses the input which has the highest probability of a failure as a test case, based on Bayesian estimation. Therefore the first step of MCMCRT is to calculate the state probability of each input by using MCMC with prior information and the information on preceding test outcomes. If we know the fact that failure-causing inputs form a cluster, the probabilities of the neighbors of a successful input are less than the others. Such a probability calculation in MCMCRT is similar to the distance calculation in ART. The concrete MCMCRT steps in the case of the input domain model are as follows:

Step 1: Construct the input domain model and define the initial state of each node in the model.
Step 2: Repeat the following steps k times and return (MCMC step).
Step 2-1: Choose one node randomly from the input domain.
Step 2-2: According to Eq. (8), calculate the fault-existing probability P of the chosen node.
Step 2-3: Generate a random number u from U(0, 1). If P < u, set the state of the chosen node to 1, which means a fault exists. Otherwise, set the state to -1, which means no fault exists.
Step 3: Select a node which has state 1 randomly from the input domain as the test case.
Step 4: Execute the test; according to the test result, if no fault is found, set the state of the node to -1 and return to Step 2, until the first failure is revealed or the stopping condition is reached.

In the context of test case selection, MCMCRT has been designed as a more effective replacement for random testing, given that MCMCRT retains most of the virtues of random testing and offers nearly optimum effectiveness. MCMCRT follows random testing with two important extensions. First, test cases selected from the input domain are probabilistically generated based on a probability distribution that represents a profile of actual or anticipated use of the software. Second, a statistical analysis is performed on the test history that enables the measurement of various probabilistic aspects of the testing process. The main problem of MCMCRT is test case generation and analysis. A solution to the problem is achieved by constructing a model to obtain the test cases and by developing an informative analysis of the test history.
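A minimal sketch of these steps on the n-by-n input domain model is shown below. It follows Eq. (8) for the neighbor-conditional probability; the grid size, the failure-causing region, the book-keeping that excludes already-tested inputs, and the update that sets a node to 1 with probability P (the usual Gibbs rule, which may differ from the convention intended in Step 2-3) are assumptions made for illustration.

```python
import numpy as np

def neighbor_sum(state, i, j):
    # Sum of the four neighbors' states (Fig. 3(a)), ignoring off-grid positions.
    n, s = state.shape[0], 0
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= i + di < n and 0 <= j + dj < n:
            s += state[i + di, j + dj]
    return s

def p_fault(state, i, j, beta):
    # Eq. (8) with t = 1 and t_bar = -1: P(T = 1 | neighbors).
    s = neighbor_sum(state, i, j)
    return np.exp(beta * s) / (np.exp(beta * s) + np.exp(-beta * s))

def mcmcrt(failure_region, n=40, beta=-1.0, k=1000, max_tests=2000, seed=0):
    rng = np.random.default_rng(seed)
    state = rng.choice([-1, 1], size=(n, n))           # Step 1: initial states
    tested = set()
    for test_count in range(1, max_tests + 1):
        for _ in range(k):                             # Step 2: MCMC sweep
            i, j = rng.integers(n), rng.integers(n)    # Step 2-1
            p = p_fault(state, i, j, beta)             # Step 2-2
            state[i, j] = 1 if rng.random() < p else -1  # Step 2-3 (state 1 with prob. p)
        ones = [(a, b) for a in range(n) for b in range(n)
                if state[a, b] == 1 and (a, b) not in tested]
        i, j = ones[rng.integers(len(ones))] if ones else (rng.integers(n), rng.integers(n))
        tested.add((i, j))                             # Step 3: pick a candidate test case
        if (i, j) in failure_region:                   # Step 4: execute the test
            return test_count                          # F-measure: tests until first failure
        state[i, j] = -1                               # successful test: record state -1
    return max_tests

# Hypothetical point-pattern failure region with two failure-causing inputs.
print(mcmcrt(failure_region={(5, 7), (30, 12)}))
```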
4 Numerical Experiments
In this section, we investigate the fault-detection capabilities of MCMCRT, compared to existing methods. In our experiments, we assumed that the input domain was square and the size was n-by-n. Failure rate, denoted by θ, is defined by the ratio of the number of failure-causing inputs to the number of all possible inputs. F-measure refers to the number of tests required to detect the first program failure. For each experiment, the failure rate θ and the failure pattern were fixed. Then a failure-causing region of rate θ was randomly located within the input domain. With regard to the experiments for point patterns, failure-causing points were randomly located in the input domain; the total number of failure-causing points is equivalent to the corresponding failure rate. A narrow strip and a single square of size equivalent to the corresponding failure rate were used for strip patterns and block patterns, respectively. Here we examine the failure-detection capabilities of RT, ART and MCMCRT. In these numerical experiments, RT is a little different from ordinary RT in that it avoids selecting already examined test cases. DART and RRT are executed by using the algorithm described in [4]. The parameter β of MCMCRT is examined in four settings, 1, -0.6, -1 and -2, since we found these values more effective for detecting a failure-causing input. In the experiments, we perform k = 1000 MCMC steps to update the states of the inputs. We also consider several variants of the input domain model, as Fig. 3(b) and Fig. 3(c) show.
Table 1. F-measure results for input domain size n = 40 and failure rate θ = 0.00125

Method                    point
RT                        482
DART                      527
RRT                       482
MCMCRT a   β = 1          502
           β = -0.6       367
           β = -1         307
           β = -2         412
MCMCRT b   β = 1          506
           β = -0.6       243
           β = -1         394
           β = -2         338
MCMCRT c   β = 1          487
           β = -0.6       410
           β = -1         255
           β = -2         264
For each combination of failure rate and failure pattern, 100 test runs were executed and the average F-measure for each combination was recorded. Tables 1 and 2 present the results of our experiments, where a, b, c correspond to the shapes of the input domain model in Fig. 3(a)-(c), respectively. It is clear from these results that software testing using the MCMC method offers considerable improvements in effectiveness over random testing, and the failure-finding efficiency of MCMCRT is close to that of ART.
5 Conclusion
RT is a fundamental testing technique. It simply selects test cases randomly from the whole input domain and can effectively detect failures in many applications. ART was then proposed to improve on the fault-detection capability of RT. Previous investigations have demonstrated that ART requires fewer test cases to detect the first failure than RT. It should be noted that extra computations are required for ART to ensure an even spread of test cases, and hence ART may be less cost-effective than RT. We have proposed MCMCRT to improve the efficiency of the failure-finding capability. The original motivation behind MCMCRT was to use a statistical model to drive the test case generation, because the probabilities of failure-causing inputs are not evenly spread over the input domain. Failures attached to relatively high-probability test cases will impact the testing stochastic process more than failures attached to lower-probability test cases. We constructed the input domain model and used the MCMC method to find the inputs having high probabilities of a failure. Currently, MCMCRT has been applied only to artificial input domains. In ongoing and future research, we plan to examine the performance of MCMCRT on the source code of real programs. Since ART needs a definition of distance to generate test cases, and in some real programs it is difficult to calculate such a distance, we believe that MCMCRT is a better choice in such situations, since the MCMC method just calculates the probability that each input causes a failure. In summary, in this paper we have presented a new random testing scheme based on the MCMC method and constructed a concrete MCMCRT algorithm on the input domain model. According to the algorithm, we generate test cases using the information from previous test cases. Several numerical experiments were presented, and they exhibited that MCMCRT has an F-measure comparable to that of ART. In future research, we plan to apply MCMCRT to another input domain model based on Bayesian networks, since the Ising model is insufficient to represent actual software testing activities. Further, we will discuss software reliability evaluation based on MCMCRT.
References
1. Brooks, S.P.: Markov chain Monte Carlo method and its application. Journal of the Royal Statistical Society, Series D (The Statistician) 47(1), 69–100 (1998)
2. Chan, K.P., Chen, T.Y., Mak, I.K., Yu, Y.T.: Proportional sampling strategy: guidelines for software testing practitioners. Information and Software Technology 38(12), 775–782 (1996)
3. Chan, K.P., Chen, T.Y., Towey, D.: Normalized restricted random testing. In: Rosen, J.-P., Strohmeier, A. (eds.) Ada-Europe 2003. LNCS, vol. 2655, pp. 368–381. Springer, Heidelberg (2003)
4. Chen, T.Y., Huang, D.H., Tse, T.H., Yang, Z.: An innovative approach to tackling the boundary effect in adaptive random testing. In: Proceedings of the 40th Annual Hawaii International Conference on System Sciences, p. 262a (2007)
5. Chen, T.Y., Leung, H., Mak, I.K.: Adaptive random testing. In: Maher, M.J. (ed.) ASIAN 2004. LNCS, vol. 3321, pp. 320–329. Springer, Heidelberg (2004)
6. Chen, T.Y., Merkel, R.G.: Quasi-random testing. IEEE Transactions on Reliability 56(3), 562–568 (2007)
7. Chen, T.Y., Tse, T.H., Yu, Y.T.: Proportional sampling strategy: a compendium and some insights. Journal of Systems and Software 58(1), 65–81 (2001)
8. Chib, S., Greenberg, E.: Understanding the Metropolis-Hastings algorithm. The American Statistician 49(4), 327–335 (1995)
9. Ferguson, R., Korel, B.: The chaining approach for software test data generation. ACM Transactions on Software Engineering and Methodology 5(1), 63–86 (1996)
10. Korel, B.: Automated software test data generation. IEEE Transactions on Software Engineering 16(8), 870–879 (1990)
11. Korel, B.: Dynamic method for software test data generation. Journal of Software Testing, Verification and Reliability 2(4), 203–213 (1992)
An Integrated Approach to Detect Fault-Prone Modules Using Complexity and Text Feature Metrics
Osamu Mizuno¹ and Hideaki Hata²
¹ Kyoto Institute of Technology, Matsugasaki GoshoKaido-cho, Sakyo-ku, Kyoto 606-8585, Japan, [email protected], http://se.is.kit.ac.jp/
² Osaka University, 1-5 Yamadaoka, Suita, Osaka 565-0871, Japan
Abstract. Early detection of fault-prone products is necessary to assure the quality of software product. Therefore, fault-prone module detection is one of the major and traditional area of software engineering. Although there are many approaches to detect fault-prone modules, they have their own pros and cons. Consequently, it is recommended to use appropriate approach on the various situations. This paper tries to show an integrated approach using two different fault-prone module detection approaches. To do so, we prepare two approaches of fault-prone module detection: a text feature metrics based approach using naive Bayes classifier and a complexity metrics based approach using logistic regression. The former one is proposed by us and the latter one is widely used approach. For the data for application, we used data obtained from Eclipse, which is publicly available. From the result of pre-experiment, we find that each approach has the pros and cons. That is, the text feature based approach has high recall, and complexity metrics based approach has high precision. In order to use their merits effectively, we proposed an integrated approach to apply these two approaches for fault-prone module detection. The result of experiment shows that the proposed approach shows better accuracy than each approach.
1 Introduction

Fault-prone module detection is one of the most traditional and important areas in software engineering. In order to improve the software process from the viewpoint of product quality, detection of fault-prone modules has quite an important role in the improvement activity. Therefore, studies to detect fault-prone modules have been widely conducted so far [1, 2, 3, 4]. Most of these studies used some kind of software metrics, such as program complexity, size of modules, or object-oriented metrics, and constructed mathematical models to calculate fault-proneness. Machine learning approaches have been used for fault-prone module detection recently. Use of machine learning approaches induces development of new software metrics for fault-prone module detection [5]. Thus, several new metrics suites have been proposed so far. For example, Layman et al. showed that change history data are effective for fault-prone module detection [6]. Kim et al. proposed a notion of "memories of
bug fix” and showed that such memories of bug fix deeply related to the existence of faults in a module [7]. On the other hand, we have introduced a text feature based approach to detect faultprone modules [8, 9]. In this approach, we extract text features from the frequency information of words in source code modules. In other words, we construct a large metrics set representing the frequency of words in source code modules. Once the text features are obtained, the Bayesian classifier is constructed from text features. In faultprone module detection of new modules, we also extract text features from source code modules, and Bayesian model classifies modules into either fault-prone (FP) or nonfault-prone (NFP). Since less effort or cost needed to collect text feature metrics than other software metrics, it may be applied to the agile software development process easily. Although there are many approaches to detect fault-prone modules, they have their own pros and cons. Consequently, it is recommended to use appropriate approach on the various situations. This paper tries to show an integrated approach using two different fault-prone module detection approaches. To do so, we prepare two approaches of fault-prone module detection: a text feature metrics based approach using naive Bayes classifier [9] and a complexity metrics based approach using logistic regression [10]. From the result of pre-experiment, we find that each approach has the pros and cons. That is, the text feature based approach has high recall, and complexity metrics based approach has high precision. In order to use their merits effectively, we proposed an integrated approach to apply these two approaches for fault-prone module detection. The result of experiment shows that the proposed approach shows better accuracy than each approach.
2 Metrics for Fault-Prone Detection

2.1 Complexity Metrics

In order to conduct a comparative study, we prepared a data set obtained from the Eclipse project by Zimmermann [10, 11], which is called the promise-2.0a data set. In the data set, 31 complexity metrics as well as the numbers of pre- and post-release faults are defined and collected. Although promise-2.0a includes metrics collected from both files and packages, we used the metrics from files for this study. One of the advantages of using promise-2.0a from Eclipse is that it is publicly available on the Web, so many researchers can use the same data set and compare their approaches. The complexity metrics are shown in Table 1. There are 5 metrics from the viewpoint of methods, 4 metrics from the viewpoint of classes, and 4 metrics from the viewpoint of files. For the metrics related to methods and classes, statistical values such as average, max, and total are collected. Consequently, there are 31 kinds of metrics data in the data set. The data set includes the values of the metrics shown in Subsection 2.1 and fault data collected by the SZZ algorithm [12] for each class file, that is, for each software module. An overview of the data for a software module is shown in Table 2. The data are obtained from Eclipse versions 2.0, 2.1, and
Table 1. Complexity metrics in the Eclipse data set [10]

methods: FOUT  Number of method calls (fan out)
         MLOC  Method lines of code
         NBD   Nested block depth
         PAR   Number of parameters
         VG    McCabe cyclomatic complexity
classes: NOF   Number of fields
         NOM   Number of methods
         NSF   Number of static fields
         NSM   Number of static methods
files:   ACD   Number of anonymous type declarations
         NOI   Number of interfaces
         NOT   Number of classes
         TLOC  Total lines of code
Table 2. An overview of the promise-2.0a data set

name      type     description
plugin    string   A plugin name
file      string   A file name
pre       integer  Number of pre-release faults
post      integer  Number of post-release faults
ACD       integer  Metric ACD
FOUT avg  real     Average of metric FOUT
FOUT max  integer  Maximum of metric FOUT
FOUT sum  integer  Total number of FOUT
...       ...      ...
3.0; the numbers of modules in the three versions are 6,729, 7,888 and 10,593, respectively. Hereafter, we call the data sets from Eclipse 2.0, 2.1, and 3.0 M^C_20, M^C_21, and M^C_30, respectively. As shown in Table 2, two kinds of fault data are collected. Here, we used the number of post-release faults, post, to determine whether a class file is faulty or not. Concretely speaking, if post > 0, the class file is considered faulty; otherwise, non-faulty.

2.2 Text Features

We have proposed a text feature based approach to detect fault-prone modules [9]. In this approach, text features are extracted from source code from which comments have been removed. This means that everything except comments, separated by spaces or tabs, can be treated as a feature. The number of occurrences of each text feature is counted per module. For replication of the experiment, the Weka data mining toolkit [13] is used in this paper. To extract features properly, every variable, method name, function name, keyword, and operator connected without a space or tab is separated. Since using all features requires much time and memory, the approximate number of features used can be determined by setting options in Weka; this option is intended to discard less useful features.
Table 3. Number of faulty modules for each Eclipse version

Eclipse version                    # modules
2.0   post > 0 (faulty)            975
      post = 0 (non-faulty)        5,754
      Total                        6,729
2.1   post > 0 (faulty)            868
      post = 0 (non-faulty)        7,020
      Total                        7,888
3.0   post > 0 (faulty)            1,568
      post = 0 (non-faulty)        9,025
      Total                        10,593
These text features can be regarded as metrics Num(term_i), where term_i represents the i-th text feature. Text feature metrics are very large-scale compared with other complexity metrics suites. Furthermore, one large difference between text feature metrics and other metrics suites is that text features are "customized" for target projects. That is, text features have to be extracted from the training data and then applied to the test data within one target project. It is difficult to reuse one text feature metrics suite for another project. In this study, since promise-2.0a has a file name entry, we can easily obtain the corresponding source code from the source code archives. We then extracted 1,614 kinds of text features from all source code modules in Eclipse 2.0, 2.1, and 3.0 to compare with promise-2.0a. Figure 1 shows a part of the extracted text features from Eclipse. Hereafter, we call the data sets of text feature metrics extracted from Eclipse 2.0, 2.1, and 3.0 M^T_20, M^T_21, and M^T_30, respectively.
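To make the idea concrete, a minimal sketch of extracting such word-count features from a comment-stripped Java module might look as follows. The comment-stripping regex and the tokenization below are simplifications assumed for illustration; they are not the Weka-based toolchain the authors used.

```python
import re
from collections import Counter

COMMENT_RE = re.compile(r"//.*?$|/\*.*?\*/", re.DOTALL | re.MULTILINE)
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|==|!=|<=|>=|&&|\|\||[{}()\[\];,.+\-*/<>=!]")

def text_features(source: str) -> Counter:
    # Remove comments, then count every remaining token
    # (identifiers, keywords, literals, operators) in the module.
    code = COMMENT_RE.sub(" ", source)
    return Counter(TOKEN_RE.findall(code))

module = """
/* compute absolute value */
public int abs(int x) {
    if (x < 0) { return -x; }  // negate
    return x;
}
"""
print(text_features(module).most_common(5))
```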
3 Fault-Prone Module Detection Approaches

For fault-prone module detection, we used two models: logistic regression and the naive Bayes classifier. Both of them are implemented in the Weka data mining toolkit [13].

3.1 Logistic Regression with Complexity Metrics

Logistic regression, a standard classification technique in the experimental sciences, has already been used in software engineering to predict fault-prone components [14,15,16]. A logistic model is based on the following equation:

P(Y | x_1, ..., x_n) = \frac{ e^{b_0 + b_1 x_1 + ... + b_n x_n} }{ 1 + e^{b_0 + b_1 x_1 + ... + b_n x_n} },

where x_1, ..., x_n are explanatory variables in the model, and Y is a binary dependent variable which represents whether or not a module is fault-prone. P is the conditional probability that Y = 1 (i.e., a module is fault-prone) when the values of x_1, ..., x_n are determined. The coefficients are estimated by the maximum likelihood method using the training data.
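The authors fit this model with Weka; an equivalent sketch using scikit-learn (a substitution, not the original toolchain, and with hypothetical complexity-metric values) would be:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: rows = modules, columns = complexity metrics
# (e.g. FOUT sum, MLOC sum, VG max, TLOC); y = 1 if post-release faults exist.
X_train = np.array([[12, 340, 9, 410], [3, 55, 2, 80],
                    [25, 900, 14, 1200], [1, 20, 1, 35]], float)
y_train = np.array([1, 0, 1, 0])

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

X_test = np.array([[18, 600, 11, 800]], float)
p_fp = model.predict_proba(X_test)[0, 1]   # P(Y = 1 | x1, ..., xn) from the logistic model
print("fault-prone" if p_fp >= 0.5 else "non-fault-prone", p_fp)
```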
Fig. 1. A part of extracted text features from Eclipse source code
3.2 Naive Bayes Classifier

The naive Bayes classifier classifies a module as follows:

\arg\max_{C \in \{FP, NFP\}} P(C) \prod_{i=1}^{n} P(m_i | C),
where C is a class, which is either FP or NFP, P(C) is the prior probability of class C, and P(m_i | C) is the conditional probability of a metric m_i given class C. Menzies et al. reported that defect predictors using naive Bayes achieved standout good results compared with OneR and J48 in their experiments using Weka [4].

3.3 Integrated Approach

The basic idea of the integrated approach for fault-prone detection is simple. We observe that there are four kinds of approaches for fault-prone module detection: (1) high capability to find faults but low cost effectiveness, (2) low capability to find faults but high cost effectiveness, (3) high capability to find faults and high cost effectiveness, and (4) low capability to find faults and low cost effectiveness. If we find an approach of type (3), it is the best solution. However, unfortunately, most fault-prone module detection approaches fall into (1), (2), or (4). Practically, since faults in source code are not uniformly distributed, we can apply an approach of type (1) to the code modules with more faults and an approach of type (2) to the code modules with fewer faults.
462
O. Mizuno and H. Hata Table 4. Classification result matrix Classified non-fault-prone fault-prone Actual non-faulty True negative (TN) False positive (FP) faulty False negative (FN) True positive (TP)
to the code modules with more faults and an approach of (2) to the code modules with less faults. To determine whether the code modules have many faults or not, we used the metric pre in Table 2. pre is the number of pre-release faults for a module. The number of prerelease faults is a known metric before fault-prone module prediction. If a module have many pre-release faults, the module seems to include many post-release faults, too.
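The routing idea described above can be sketched as follows. Note that the paper actually splits the Eclipse 3.0 test data by plugin (Section 4.4), whereas this simplified sketch, written with scikit-learn-style classifiers as stand-ins for the Weka models, routes individual modules by a threshold on their pre-release fault count; the threshold and the data layout are assumptions of this illustration.

```python
import numpy as np

def integrated_predict(complexity_model, text_model,
                       X_complexity, X_text, pre_faults, threshold=1):
    """Route each module to one of the two fitted classifiers by its pre-release faults."""
    fault_rich = np.asarray(pre_faults) > threshold
    predictions = np.empty(len(pre_faults), dtype=int)
    # High-recall text-feature classifier for the fault-rich part of the test data...
    predictions[fault_rich] = text_model.predict(X_text[fault_rich])
    # ...high-precision complexity-metrics classifier for the rest.
    predictions[~fault_rich] = complexity_model.predict(X_complexity[~fault_rich])
    return predictions
```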
4 Experiments

4.1 Procedure of Pre-experiments

In order to see the pros and cons of both approaches, we conducted two pre-experiments as follows:

E1: Fault-prone module prediction using complexity metrics and logistic regression, as in [10]. This approach was previously proposed by Zimmermann et al.
E2: Fault-prone module prediction using text feature metrics and the naive Bayes classifier [9]. This approach was previously proposed by Hata et al.

For each experiment, we used the data of Eclipse 2.1 for training and the data of Eclipse 3.0 for testing. That is, we construct a fault prediction model using the data M^*_21 and test the constructed model using the data M^*_30. The procedure of E1 is as follows:

1. Build a logistic model from the training data set M^C_21.
2. Classify the test data set M^C_30 with the constructed logistic model.
The procedure of E2 is as follows:

1. Build a naive Bayes classifier from the text feature metrics for training, M^T_21.
2. Classify the test data set M^T_30 with the constructed naive Bayes classifier.
4.2 Evaluation Measures Table 4 shows a classification result matrix. True negative (TN) shows the number of modules that are classified as non-fault-prone, and are actually non-faulty. False positive (FP) shows the number of modules that are classified as fault-prone, but are actually non-faulty. On the contrary, false negative shows the number of modules that are classified as non-fault-prone, but are actually faulty. Finally, true positive shows the number of modules that are classified as fault-prone which are actually faulty.
Table 5. E1: Predicted result by the logistic regression model using complexity metrics [10]

                         Classified
                         non-fault-prone    fault-prone
Actual   non-faulty      8,939              86
         faulty          1,350              218

Precision 0.717    Recall 0.139    Accuracy 0.864    F1 0.233
In order to evaluate the results, we use the following measures: recall, precision, accuracy, and the F1-measure. Recall is the ratio of modules correctly classified as fault-prone to the number of all faulty modules:

Recall = TP / (TP + FN).

Precision is the ratio of modules correctly classified as fault-prone to the number of all modules classified as fault-prone:

Precision = TP / (TP + FP).

Accuracy is the ratio of correctly classified modules to all modules:

Accuracy = (TP + TN) / (TN + TP + FP + FN).

Since recall and precision are in a trade-off, the F1-measure is used to combine them:

F1 = (2 × recall × precision) / (recall + precision).

In this definition, recall and precision are weighted evenly.
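The following small helper, a sketch added here for illustration, computes these four measures directly from the classification result matrix; the call uses the counts of Table 5 as an example.

```python
def evaluation_measures(tp, tn, fp, fn):
    """Recall, precision, accuracy and F1 from the classification result matrix."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return recall, precision, accuracy, f1

# Counts from Table 5 (E1): TP=218, TN=8939, FP=86, FN=1350
print(evaluation_measures(218, 8939, 86, 1350))   # approximately (0.139, 0.717, 0.864, 0.233)
```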
4.3 Result of Pre-experiments

The results of the pre-experiments are summarized in Tables 5 and 6. Table 5 shows the result of E1, obtained using logistic regression with complexity metrics [10]. Table 6 shows the result of E2, obtained using the naive Bayes classifier with text feature metrics. We can observe that the prediction trends of the complexity-based and the text-feature-based approaches are opposite. That is, the complexity-based approach achieves high precision, low recall, and relatively high accuracy, while the text-feature-based approach achieves low precision, high recall, and relatively low accuracy.
Table 6. E2: Predicted result by the naive Bayes classifier using text features

                         Classified
                         non-fault-prone    fault-prone
Actual   non-faulty      6,539              2,486
         faulty          594                974

Precision 0.282    Recall 0.621    Accuracy 0.709    F1 0.387
As shown in Tables 5 and 6, E2 has the larger F1. This implies that the prediction of fault-prone modules by the text feature metrics is more balanced than that by the complexity metrics. Let us investigate the result in more detail. We can see that the complexity-metrics-based approach in E1 tends to predict modules as non-fault-prone, whereas the text-feature-metrics-based approach in E2 tends to predict modules as fault-prone. This difference seems to derive from the amount of information used in the two approaches: the number of metrics used in E1 is 31, while the number of metrics used in E2 is 1,614. We conjecture that, because less information about fault-prone modules is available for the prediction, the predicted result of E1 leans toward non-fault-prone. Consequently, in E1, the complexity-metrics-based approach achieves higher precision. This implies that when this approach predicts a module as “fault-prone”, the decision is correct with high probability; however, the approach misses more than 80% of the actually faulty modules. On the other hand, in E2, the text-feature-metrics-based approach achieves higher recall. This implies that about 62% of the actually faulty modules are covered by its “fault-prone” predictions; however, the correctness of the prediction is not so high. In fact, we get three or four wrong “fault-prone” predictions for every correct one.

4.4 Experiment for the Integrated Approach

E3: Fault-prone module prediction using both the complexity-metrics-based and the text-feature-based approaches. This approach is proposed in this study.

The procedure of E3 is as follows:

1. Build a logistic model from the training data set M^C_21.
2. Build a naive Bayes classifier from the text feature metrics for training, M^T_21.
3. Identify plugins that seem to include more faults using the number of pre-release faults in Eclipse 3.0. We then divide the test data M^*_30 into two sets, M^*1_30 and M^*2_30, which are estimated to include many faults and fewer faults, respectively.
4. Classify the test data set M^C2_30 with the constructed logistic model.
5. Classify the test data set M^T1_30 with the constructed naive Bayes classifier.
Here, we apply the integrated approach to the same data source. To do so, we first identify which parts of the source code have more injected faults.
Table 7. Number of pre-release faults for each plugin in Eclipse 3.0
Table 7 shows the total number of pre-release faults for each plugin in Eclipse 3.0. Eclipse 3.0 includes 70 plugins, and 7,422 pre-release faults were found in total. We can see that 5,698 faults, that is, 77% of the total, were detected in the top 10 plugins. We thus estimate that these top 10 plugins have more post-release faults than the others. Therefore, we apply the text-feature-based approach to these 10 plugins and the complexity-metrics-based approach to the remaining 60 plugins. The result of this application is shown in Table 8: the upper row shows the prediction by the complexity-metrics-based approach, and the lower row shows that by the text feature metrics. The evaluation measures for each approach are also included in the table. As we can see in Table 8, the main purpose of the integrated approach is achieved; that is, the evaluation measures are more balanced than those of the original approaches. In particular, from the viewpoint of F1, the integrated approach achieves a better F1 value than both the complexity and the text feature approaches in Tables 5 and 6. We can say that the integrated approach can be a cost-effective way to detect fault-prone modules in practice.
5 Threats to Validity

The threats to validity are categorized into four categories as in [17]: external, internal, conclusion, and construction validity. External validity mainly concerns the generalizability of the proposed approach. Since we applied the approach to the Eclipse data set only, a certain degree of threat to external validity remains. One of the construction validity threats is the collection of fault-prone modules from open source software projects. Since the data used in this study is a publicly available one [10], we share this construction validity threat with other studies using the same data. The development of more precise ways to collect faulty modules from software repositories would mitigate this threat.
Table 8. E3 : Predicted result by the integrated approach. (up: complexity metrics, down: text feature metrics).
As for internal validity, we cannot find any threats in our study at this point. The way of statistical analysis usually causes threats to conclusion validity; we cannot find any threats to conclusion validity in our study at this point either.
6 Related Work

Much research on the detection of fault-prone software modules has been carried out so far. Munson and Khoshgoftaar used software complexity metrics and logistic regression analysis to detect fault-prone modules [16]. Basili et al. also used logistic regression for the detection of fault-proneness using object-oriented metrics [14]. Fenton et al. proposed a Bayesian belief network based approach to calculate fault-proneness [18]. In addition, various other approaches have been studied, such as neural networks [19, 20], zero-inflated Poisson regression [21], decision trees [22], linear discriminant analysis [23, 24], and so on. Data-mining-based approaches have also been carried out. Menzies et al. used the result of static code analysis as detectors of fault-prone code [4]. Stoerzer et al. tried to find failure-inducing changes from dynamic analysis of Java code [25]. Hassan and Holt computed the ten most fault-prone modules after evaluating four heuristics: most frequently modified, most recently modified, most frequently fixed, and most recently fixed [26]. Kim et al. have tried to detect the fault density of entities using previous fault localities, based on the observation that most faults do not occur uniformly [27]. Ratzinger et al. [28] investigated the interrelationship between previous refactorings and future software defects. Zimmermann et al. collected a set of complexity metrics of both faulty modules and non-faulty modules from the Eclipse project and made the data
publicly available. They also show the result of fault-prone module detection using logistic regression [10].
7 Conclusion

In this study, we proposed an integrated approach to fault-prone module detection that combines two different approaches: a complexity-metrics-based approach and a text-feature-metrics-based approach. As future work, we have to compare the effort or cost of collecting text feature metrics with that of conventional software metrics. Besides, we have to apply our approach not only to open source software development but also to actual development in industry. In addition, further investigation of misclassified modules will contribute to the improvement of accuracy.
Acknowledgements This research is partially supported by the Japan Ministry of Education, Science, Sports and Culture, Grant-in-Aid for Young Scientists (B), 20700025, 2009.
References 1. Briand, L.C., Melo, W.L., Wust, J.: Assessing the applicability of fault-proneness models across object-oriented software projects. IEEE Trans. on Software Engineering 28(7), 706– 720 (2002) 2. Khoshgoftaar, T.M., Seliya, N.: Comparative assessment of software quality classification techniques: An empirical study. Empirical Software Engineering 9, 229–257 (2004) 3. Bellini, P., Bruno, I., Nesi, P., Rogai, D.: Comparing fault-proneness estimation models. In: Proc. of 10th IEEE International Conference on Engineering of Complex Computer Systems, pp. 205–214 (2005) 4. Menzies, T., Greenwald, J., Frank, A.: Data mining static code attributes to learn defect predictors. IEEE Trans. on Software Engineering 33(1), 2–13 (2007) 5. Catal, C., Diri, B.: Review: A systematic review of software fault prediction studies. Expert Syst. Appl. 36(4), 7346–7354 (2009) 6. Layman, L., Kudrjavets, G., Nagappan, N.: Iterative identification of fault-prone binaries using in-process metrics. In: Proc. of 2nd International Conference on Empirical Software Engineering and Measurement, September 2008, pp. 206–212 (2008) 7. Kim, S., Pan, K., Whitehead Jr., E.E.J.: Memories of bug fixes. In: Proc. of 14th ACM SIGSOFT international symposium on Foundations of software engineering, pp. 35–45. ACM, New York (2006) 8. Mizuno, O., Kikuno, T.: Training on errors experiment to detect fault-prone software modules by spam filter. In: Proc. of 6th joint meeting of the European software engineering conference and the ACM SIGSOFT symposium on the foundations of software engineering, pp. 405–414 (2007) 9. Hata, H., Mizuno, O., Kikuno, T.: Fault-prone module detection using large-scale text features based on spam filtering. Empirical Software Engineering (September 2009), doi:10.1007/s10664–009–9117–9
10. Zimmermann, T., Premraj, R., Zeller, A.: Predicting defects for eclipse. In: Proc. of 3rd International Workshop on Predictor models in Software Engineering (2007) 11. Boetticher, G., Menzies, T., Ostrand, T.: PROMISE Repository of empirical software engineering data repository, West Virginia University, Department of Computer Science (2007), http://promisedata.org/ 12. Śliwerski, J., Zimmermann, T., Zeller, A.: When do changes induce fixes? (on Fridays). In: Proc. of 2nd International workshop on Mining software repositories, pp. 24–28 (2005) 13. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005) 14. Basili, V.R., Briand, L.C., Melo, W.L.: A validation of object oriented metrics as quality indicators. IEEE Trans. on Software Engineering 22(10), 751–761 (1996) 15. Briand, L.C., Basili, V.R., Thomas, W.M.: A pattern recognition approach for software engineering data analysis. IEEE Trans. on Software Engineering 18(11), 931–942 (1992) 16. Munson, J.C., Khoshgoftaar, T.M.: The detection of fault-prone programs. IEEE Trans. on Software Engineering 18(5), 423–433 (1992) 17. Wohlin, C., Runeson, P., Höst, M., Ohlsson, M.C., Regnell, B., Wesslén, A.: Experimentation in software engineering: An introduction. Kluwer Academic Publishers, Dordrecht (2000) 18. Fenton, N.E., Neil, M.: A critique of software defect prediction models. IEEE Trans. on Software Engineering 25(5), 675–689 (1999) 19. Gray, A.R., McDonell, S.G.: Software metrics data analysis - exploring the relative performance of some commonly used modeling techniques. Empirical Software Engineering 4, 297–316 (1999) 20. Takabayashi, S., Monden, A., Sato, S., Matsumoto, K., Inoue, K., Torii, K.: The detection of fault-prone program using a neural network. In: Proc. of International Symposium on Future Software Technology, Nanjing, October 1999, pp. 81–86 (1999) 21. Khoshgoftaar, T.M., Gao, K., Szabo, R.M.: An application of zero-inflated Poisson regression for software fault prediction. In: Proc. of 12th International Symposium on Software Reliability Engineering, pp. 66–73 (1999) 22. Khoshgoftaar, T.M., Allen, E.B.: Modeling software quality with classification trees. Recent Advances in Reliability and Quality Engineering, 247–270 (1999) 23. Ohlsson, N., Alberg, H.: Predicting fault-prone software modules in telephone switches. IEEE Trans. on Software Engineering 22(12), 886–894 (1996) 24. Pighin, M., Zamolo, R.: A predictive metric based on statistical analysis. In: Proc. of 19th International Conference on Software Engineering, pp. 262–270 (1997) 25. Stoerzer, M., Ryder, B.G., Ren, X., Tip, F.: Finding failure-inducing changes in java programs using change classification. In: Proc. of 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pp. 57–68. ACM Press, New York (2006) 26. Hassan, A.E., Holt, R.C.: The top ten list: Dynamic fault prediction. In: Proc. of 21st IEEE International Conference on Software Maintenance, Washington, DC, USA, pp. 263–272. IEEE Computer Society, Los Alamitos (2005) 27. Kim, S., Zimmermann, T., Whitehead Jr., E.J., Zeller, A.: Predicting faults from cached history. In: Proc. of 29th International Conference on Software Engineering, Washington, DC, USA, pp. 489–498. IEEE Computer Society, Los Alamitos (2007) 28. Ratzinger, J., Sigmund, T., Gall, H.: On the relation of refactorings and software defect prediction. In: Proc. of 5th International workshop on Mining software repositories, pp. 35–38.
ACM, New York (2008)
An Effective Video Steganography Method for Biometric Identification*

Yingqi Lu¹, Cheng Lu¹, and Miao Qi²,**

¹ School of Computer Science and Technology, Jilin University, China
² School of Computer Science and Information Technology, Northeast Normal University, China
[email protected]
Abstract. This paper presents an effective video steganography method to protect transmitted biometric data for secure personal identification. Unlike most existing biometric data hiding methods, the hidden content in this work is an image set, not a single image or feature vector, which guarantees valid identification. On the basis of a human visual system (HVS) model, both inter-frame and intra-frame motion information are considered to make the hiding method more invisible and robust. Instead of embedding the data redundantly to resist attacks, the biometric image set is embedded into the video sequence discretely using the discrete wavelet transform (DWT). In particular, the sequence number of each frame is embedded into the corresponding frame as a watermark for detecting the integrity of the stego-video and guaranteeing the exact extraction of the biometric image set. Finally, the extracted image set is identified by the biometric system. Extensive experimental results demonstrate that the proposed method can achieve perfect imperceptibility and good robustness, and guarantees secure biometric identification.

Keywords: Video Steganography, Human Visual System, Discrete Wavelet Transform, Biometrics.
* This work was supported by the Students Innovative Pilot Scheme Project, Jilin University, China.
** Corresponding author.

1 Introduction

Nowadays, biometrics has been widely recognized as an effective and reliable identity recognition technique. The rapid development of networks and multimedia promotes the extensive application of biometrics. It is well known that information transmitted across a network is easily intercepted or attacked. Once the biometric information is intercepted, the attacker might alter the content to degrade the performance of the biometric system. Therefore, protecting the security and integrity of transmitted biometric data is a necessary and important issue. Researchers have taken advantage of watermarking techniques [1-4] to handle this issue. Although the watermarking methods can make the embedding results more robust and assure valid recognition, the security is not high enough, since those
methods used the biometric image itself as the transmission carrier; because of the distinctive appearance of biometric images, such a carrier exposed on the network easily arouses suspicion and attracts destruction. To increase the security, [5] embedded biometric data into three different types of cover images to divert the attention of attackers. Literature [6] presented a content-based hiding method in which an iris template was embedded into a public cover image. However, like most existing methods, the content of each transmission is a single piece of biometric data rather than a data set. If the biometric data is attacked severely and cannot be used as secure data, it must be re-transmitted for effective recognition. Obviously, this increases the transmission cost and prolongs the recognition. To overcome this drawback, literature [7] proposed a biometric data watermarking scheme based on a video stream. In that study, the fingerprint feature image was embedded into the video redundantly based on a combination of DWT and LSB to obtain stronger robustness. However, the motion information of the video was not considered in the embedding process. In this paper, we propose an effective image-set-based video steganography for secure biometric identification. Unlike existing biometric data hiding methods, the hidden content is an image set rather than a single biometric image or feature vector, which makes the method more practicable. To improve the perceptual invisibility and robustness of the hiding method, the biometric data is embedded into the video sequence discretely by considering the motion information. In particular, before embedding the biometric data, a watermark formed by the sequence number of each frame is embedded into the corresponding frame to guarantee exact extraction of the biometric data. In other words, to assure secure and effective identification, the watermark plays a detective and instructional role in the extraction of the biometric data so as to resist some temporal attacks. Extensive experimental results indicate that the proposed method not only achieves good stego-video quality, but is also robust against some common attacks and guarantees the validity of the biometric identification. The rest of this paper is organized as follows. In Section 2, we describe the proposed video steganography method in detail. The extensive experiments are presented and analyzed in Section 3. Finally, conclusions are given in Section 4.
2 The Proposed Video Steganography Method

In this paper, a palmprint image set is used as the embedded biometric data for secret transmission, and some popular testing video sequences are employed as the cover database. Given a palmprint image set, a video sequence is taken randomly from the video database as the transmission carrier. First, the sequence number of each frame is embedded into the corresponding frame as a watermark, aiming to assure the exact extraction of the secret data even when the stego-video is subjected to some temporal attacks. Then, a motion analysis method is adopted to decide the embedding regions. Finally, the biometric data is embedded into the fast-motion blocks of the fast-motion frames. In the following, we first describe the motion analysis method briefly and then the embedding and extraction of the watermark and the palmprint image set.
2.1 Motion Analysis

To make the hiding method more invisible and robust, many researchers have incorporated the human visual system model into the embedding procedure using motion detection techniques [8-10]. Generally, the secret information is embedded into fast-moving regions, since humans notice the motion itself but are not sensitive to slight content changes within fast motion. The temporal difference [11] is an effective method for motion detection. The approach takes consecutive video frames and computes their absolute difference. Suppose I_n is the intensity of the n-th frame; then the difference image D_n of I_n can be computed as:

D_n = I_n − I_{n−1}    (1)
where I_{n−1} is the frame preceding I_n. Based on the temporal difference, there are two distinct methods, named the frame-based method and the block-based method, for finding the positions where the picture content is moving fast and where embedding can take place.

Frame-based method: In the frame-based method, a frame is treated as a whole and its motion activity is computed. If the sum of pixel values in the difference image D_n is greater, the motion variation between consecutive video frames is more significant and the motion activity is higher.

Block-based method: Instead of treating the frame as a whole object, the difference image D_n is divided into blocks in advance. The sum of pixel values in each block is calculated to obtain the motion activity of each block. Similarly, if the sum of pixel values in a block is greater, the motion activity is higher.

Obviously, a high motion activity value means the motion level is fast. The frame-based method considers the intra-frame motion information and the block-based method depicts the inter-frame motion information. Fig. 1 shows an example of the motion analysis of the video sequence ‘News’. Fig. 1(a) is the motion activity histogram using the frame-based method, Fig. 1(c) and (d) are two consecutive frames (352 × 288), and Fig. 1(b) is the activity distribution using the block-based method. As shown in Fig. 1, the motion level of each frame is different; even within the same frame, the motion activity varies from block to block.
Fig. 1. The motion activity analysis of both intra-frame and inter-frame
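A compact sketch of the two activity measures is given below, using NumPy. The 16 × 8 block size matches the one used later in the experiments, while the use of the absolute difference and the handling of frame borders are assumptions of this illustration.

```python
import numpy as np

def motion_activity(frame_prev: np.ndarray, frame_curr: np.ndarray,
                    block_h: int = 8, block_w: int = 16):
    """Return the frame-based activity and the per-block activities of D_n (Eq. (1))."""
    diff = np.abs(frame_curr.astype(np.int32) - frame_prev.astype(np.int32))
    frame_activity = diff.sum()                      # frame-based method
    h, w = diff.shape
    trimmed = diff[:h - h % block_h, :w - w % block_w]
    block_activity = trimmed.reshape(trimmed.shape[0] // block_h, block_h,
                                     trimmed.shape[1] // block_w, block_w).sum(axis=(1, 3))
    return frame_activity, block_activity            # block-based method
```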
2.2 Watermark and Palmprint Embedding and Extraction
Given a video sequence, the sequence number of each frame is embedded into that frame as a watermark, both to resist some temporal attacks and to allow exact extraction of the biometric image set. For simplicity and practicality, the least significant bit (LSB) method [12] is used to embed the watermarks. In our study, the discrete wavelet transform (DWT) is adopted to embed the palmprint images. The embedding process can be described as follows.
Step 1: Embedding region location. One-level DWT is performed on each watermarked frame. According to the length of the binary sequence, some frames whose approximation sub-band has higher motion activity (that is, the frames with fast motion) are selected as embedding frames. Simultaneously, the blocks with higher motion activity in these frames are used for embedding the palmprint image set.

Step 2: Palmprint image conversion. For convenient embedding, each pixel of a palmprint image is represented by eight bits, and a binary sequence is obtained by concatenating the bits of each pixel in order. Thus, the N secret images of size m × n are converted into a binary sequence B = (B(s) ∈ {0, 1}, s = 1, 2, ..., N × m × n × 8).

Step 3: Binary sequence embedding. Inspired by the embedding algorithm of [13], the binary sequence is embedded into the DWT approximation coefficients of the high-motion-activity regions located in Step 1. The embedding rule is

V'_{k,ij} = V_{k,ij} + α × B(s),    (2)

where V_{k,ij} is an approximation coefficient with higher motion activity in the k-th frame and α is the embedding strength.

The extraction process is the reverse of embedding. Given a received stego-video, the watermarks are first extracted from each frame and used to judge whether the stego-video was attacked during transmission and, if so, the type of temporal attack. Then, taking the watermarks into account, the binary sequence is extracted according to the motion activity of the original video. To recover the palmprint images, the binary sequence is segmented into N groups; within each group, every eight bits are composed into an item and converted to decimal form to obtain the palmprint image set for further identification.
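The embedding of Step 3 can be sketched as follows with PyWavelets. The choice of the Haar wavelet is an assumption (the paper does not name the mother wavelet), and coeff_index stands for the positions of the high-motion-activity approximation coefficients selected in Step 1.

```python
import numpy as np
import pywt

ALPHA = 8  # embedding strength used in the paper's experiments

def embed_bits(frame: np.ndarray, bits: np.ndarray, coeff_index: np.ndarray) -> np.ndarray:
    """Additively embed bits into selected LL coefficients: V' = V + alpha * B(s) (Eq. (2))."""
    cA, (cH, cV, cD) = pywt.dwt2(frame.astype(float), 'haar')
    flat = cA.ravel().copy()
    flat[coeff_index[:len(bits)]] += ALPHA * bits
    cA = flat.reshape(cA.shape)
    return pywt.idwt2((cA, (cH, cV, cD)), 'haar')
```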
3 Experimental Results

In order to validate the effectiveness and efficiency of the proposed video steganography method, four popular test video sequences form the video database and are used as carriers for hiding the biometric image set. The first frames of the four video sequences are shown in Fig. 2, and their detailed information is listed in Table 1.
Fig. 2. The first frames of the four video sequences (News, Paris, Indoor, Highway)

Table 1. The detailed information of the test video sequences

Sequence           News        Paris       Indoor      Highway
Size               352 × 288   352 × 288   320 × 240   320 × 240
Frame rate (fps)   24          24          24          24
Frame number       300         300         300         300
Motion level       small       normal      normal      acute
The proposed method is evaluated on the palmprint database from the Hong Kong Polytechnic University (http://www4.comp.polyu.edu.hk/~biometrics). In our study, 1,000 palmprint images from 100 individuals are used to evaluate the performance of the proposed approach. Each individual provides ten images; five images are used for training and the others form the transmitted testing set. The size of each palmprint image is 128 × 128. To satisfy the requirements of practical applications and enhance the validity of the identification results, an identification method based on image sets is employed. The locally linear discriminant embedding (LLDE) [14] algorithm is adopted for feature extraction. Given a test image set, the features of each image are matched with all templates in the template database using Euclidean distance, and the nearest neighbor classifier decides the class of each image. In the decision stage, we adopt a voting rule to confirm the final class of the image set. In our experiments, the content of each transmission is five palmprints, so the length of the embedded binary sequence B is 5 × 128 × 128 × 8 = 655,360 bits. For the block-based motion analysis, the size of each block is set to 16 × 8, the embedding capacity of each block is 128 bits, and the 32 blocks with the highest activity are used for embedding. As a result, the embedding capacity is 4,096 bits per frame, and a total of 160 frames is needed to embed the whole binary sequence. The embedding strength is set to α = 8. In the following, we analyze the proposed method from three aspects: security, imperceptibility, and robustness.

3.1 Security Evaluation
The improved security of our method can be argued as follows: (1) Instead of a watermarking technique, a steganography technique is adopted for secret communication. The video sequences used as transmission carriers are public and have no specific visual relation to the biometric data, so the possibility of being attacked is reduced.
(2) The hidden content is a biometric image set rather than a single biometric image or feature vector. Even if one or two images are destroyed, the remaining images can still guarantee the validity of the identification result. As analyzed above, our proposed method provides high security.

3.2 Imperceptibility Evaluation
The perceptual degradation of the video quality caused by the embedded image set is evaluated by the Peak Signal to Noise Ratio (PSNR). In our experiment, since the carrier for a palmprint image set is selected randomly from the four video sequences, the 100 testing image sets need 100 video carriers. The average PSNR between the original video sequence and the stego-video is computed for each video sequence, and only the PSNR of the embedded frames is computed. Table 2 lists the averaged PSNR values. In general, the image quality is considered acceptable if the PSNR value is greater than 35, that is, the embedded information is invisible to human eyes. We can observe that all PSNR values are greater than 44. These results illustrate that our method achieves good stego-video quality.

Table 2. PSNR values between the original video and the stego-video sequences

Video sequence   News    Paris   Indoor   Highway
PSNR             46.61   46.01   45.24    44.51
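For reference, the PSNR between an original frame and its stego version can be computed as in the following small sketch; the 8-bit peak value of 255 is assumed.

```python
import numpy as np

def psnr(original: np.ndarray, stego: np.ndarray, peak: float = 255.0) -> float:
    """Peak Signal to Noise Ratio in dB between an original frame and its stego version."""
    mse = np.mean((original.astype(float) - stego.astype(float)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)
```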
3.3 Robustness Evaluation
As presented in Section 2.2, the sequence number of each frame is embedded into the video sequence as a watermark. The role of the watermark is to judge whether the received stego-video is intact or has been subjected to some temporal attack, instead of extracting the secret image set blindly. The advantage of this embedding is that we can rectify the attacked video so that the extracted secret image set is as exact as possible, which also improves the identification result. Given a received stego-video S’, we first extract the watermark of each frame and store the watermarks in a vector V’. The extraction of the secret image set is carried out by matching the length of S’ and the vector V’ against the length of the original video S and the corresponding vector of sequence numbers V, respectively. The final extraction rule is as follows:

If length(S’) = length(S) and V’ = V, then
    the stego-video is intact; extract the palmprint image set normally.
elseif length(S’) ≠ length(S), then
    the stego-video was attacked by frame dropping; fill the dropped frames, according to the difference between V and V’, before extracting the palmprint image set.
elseif length(S’) = length(S) and V’ ≠ V and reorder(V’) = V, then
    the stego-video was attacked by frame swapping; swap the swapped frames back before extracting the palmprint image set.
else
    extract the palmprint image set normally.
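The rule can be written compactly as below. This is only an illustrative sketch in which the frame identifiers are assumed to be available as Python lists; the actual repair of dropped or swapped frames is left out.

```python
def decide_extraction(stego_len: int, orig_len: int,
                      extracted_ids: list, original_ids: list) -> str:
    """Map the extracted frame-number watermarks to the recovery action of Section 3.3."""
    if stego_len == orig_len and extracted_ids == original_ids:
        return "intact: extract normally"
    if stego_len != orig_len:
        return "frame dropping: fill dropped frames from the id difference, then extract"
    if sorted(extracted_ids) == sorted(original_ids):
        return "frame swapping: reorder frames to match the original ids, then extract"
    return "other distortion: extract normally"
```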
To evaluate the robustness of the proposed method against unintentional and intentional attacks, we attack the stego-video with both spatial and temporal operations, including JPEG compression, spatial filtering, noise addition, scaling, frame dropping, frame averaging, and frame swapping. To test the effect of the various attacks on the identification rate, different attack factors are applied for each type of attack. When there is no attack, the identification rate reaches 99%.

3.3.1 JPEG Compression
Compression is one of the most common attacks, and JPEG is a popular and widely used compression format. Fig. 3(b) shows the palmprint set extracted from the compressed stego-video at an 80% compression ratio. The original image set is shown in Fig. 3(a); the same set is used for the following attacks. The number below each image is the corresponding mean square error (MSE) between the original image and the extracted one.
Fig. 3. The results of the JPEG compression attack: (a) original palmprint set; (b) extracted palmprint set, with MSE values 0.2208, 0.2165, 0.2136, 0.2145, and 0.2116
3.3.2 Spatial Filtering
Filtering, such as high-pass and low-pass filtering, is a common operation in digital image processing. In this experiment, the stego-video is filtered by a 3 × 3 Gaussian low-pass filter. Fig. 4 shows the palmprint set extracted from the filtered stego-video, where the standard deviation of the Gaussian function is set to 0.45.
Fig. 4. The results of the spatial filtering attack: extracted palmprint set, with MSE values 0.1966, 0.1959, 0.2004, 0.1952, and 0.1862
3.3.3 Noise Addition
Noise addition is also a common attack during transmission. We evaluate the robustness by adding impulse noise to the stego-video. Fig. 5 shows the palmprint set extracted from the attacked stego-video with 3% impulse noise added.
Fig. 5. The results of the impulse noise attack: extracted palmprint set, with MSE values 0.2661, 0.2653, 0.2760, 0.2688, and 0.2667
3.3.4 Scaling
We use different scaling factors to change the size of the video frames with nearest-neighbor interpolation. Fig. 6 shows the palmprint set extracted from the attacked stego-video with a scaling factor of 1.1.
Fig. 6. The results of the scaling attack: extracted palmprint set, with MSE values 0.4574, 0.4530, 0.4596, 0.4973, and 0.5146
3.3.5 Frame Dropping
For this type of attack, n frames are dropped randomly. By comparing the extracted sequence number of each frame with the original one, we can identify the dropped frames. Suppose that frame f_n is judged to be a lost frame; then the frame preceding f_n in the incomplete video is duplicated to replace the lost frame. Fig. 7 shows the extracted palmprint set after a frame dropping attack in which the number of dropped frames is eight.
Fig. 7. The results of the frame dropping attack: extracted palmprint set, with MSE values 0.1648, 0.1150, 0, 0.1074, and 0.1832
3.3.6 Frame Averaging
Frame averaging is a simple collusion attack. The procedure for attacking one frame is as follows. First, a frame f_n is selected randomly. Then, the average frame of f_{n−1}, f_n, and f_{n+1} is computed. Last, the frame f_n is replaced by this average frame. The palmprint set extracted after averaging 6 frames is shown in Fig. 8.
Fig. 8. The results of the frame averaging attack: extracted palmprint set, with MSE values 0.1177, 0.1112, 0, 0.1203, and 0.0081
3.3.7 Frame Swapping
For frame swapping attacks, we exchange one or more pairs of randomly chosen frames. Because the sequence number of each frame is first extracted to detect the frame swapping attack, the image set is extracted after swapping the swapped frames back. Therefore, our proposed method resists this type of attack completely.

Table 3. Identification results of various attacks
Compression attacks         Compression ratio           90%    85%    80%    75%    70%
                            Identification rate         97%    95%    91%    90%    80%
Spatial filtering attacks   Standard deviation          0.35   0.40   0.45   0.50   0.55
                            Identification rate         99%    98%    87%    63%    50%
Noise addition attacks      Noise density               0.01   0.02   0.03   0.04   0.05
                            Identification rate         98%    97%    89%    68%    45%
Scaling attacks             Scaling factor              1.1    1.2    1.3    1.4    1.5
                            Identification rate         51%    14%    59%    24%    99%
Frame dropping attacks      Number of dropped frames    2      4      6      8      10
                            Identification rate         99%    99%    98%    98%    100%
Frame averaging attacks     Number of averaged frames   2      4      6      8      10
                            Identification rate         98%    100%   99%    99%    98%
The identification results of the various attacks with different attack factors are listed in Table 3. For the spatial attacks, we can observe that our method is robust against attacks of small intensity, except for the scaling attack, and that the identification rates decrease rapidly as the attack intensity increases. The behavior under the temporal attacks is different from that under the spatial attacks: in terms of identification rates, our proposed method is robust against these attacks. Moreover, the identification rate sometimes reaches 100%, which is even better than the result without attack. These results are related to the type of attack. Because the indices of the attacked frames are generated randomly, the embedding frames might not be selected for attacking in some experiments. Even when the embedded frames are destroyed, the destroyed palmprint information might be redundant for identification or not favorable for classification. Therefore, these attacks keep the identification results stable and can even lead to improved identification results. In summary, our proposed method exhibits high security, perfect imperceptibility, and good robustness, and can accomplish the task of secret transmission effectively while assuring valid biometric identification.
4 Conclusions

In this paper, an effective video steganography method has been proposed to protect biometric data. Different from most existing biometric hiding methods, the content of each transmission is an image set instead of a single image or template, which enhances the practicability and validity of identification. To make the proposed method more invisible and robust, motion analysis is adopted and the biometric images are embedded discretely into the frequency coefficients. In particular, watermarks are embedded into the video to detect the integrity of the stego-video and to assure the exact extraction of the secret data. We also analyzed the security, imperceptibility, and robustness of our method in detail. The extensive experimental results show that it can protect the integrity of the biometric data and further guarantee valid identification.
References 1. Ratha, N.K., Connell, J.H., Bolle, R.M.: Secure data hiding in wavelet compressed fingerprint images. In: Proceedings of the 2000 ACM Workshop on Multimedia, pp. 127–130 (2000) 2. Jain, A.K., Uludag, U., Hsu, R.L.: Hiding a face in a fingerprint image. In: Proceeding of the International Conference on Pattern Recognition, vol. 3, pp. 756–759 (2002) 3. Vatsa, M., Singh, R., Noore, A.: Feature based RDWT watermarking for multimodal biometric system. Image and Vision Computing 27, 293–304 (2009) 4. Noore, A., Singh, R., Vatsa, M., Houck, M.M.: Enhancing security of fingerprints through contextual biometric watermarking. Forensic Science International 169, 188–194 (2007) 5. Ratha, N.K.: Hiding Biometric Data. IEEE Transaction on Pattern Analysis and Machine Intelligence 25, 1494–1498 (2003) 6. Khan, M.K., Zhang, J., Tian, L.: Chaotic secure content-based hidden transmission of biometric templates. Chaos, Solitons and Fractal 32, 1749–1759 (2007)
7. Jung, S., Lee, D., Lee, S., Paik, J.: Biometric Data-based Robust Watermarking Scheme of Video Streams. In: ICICS, pp. 1–5 (2007) 8. Lu, Z., Ge, Q., Niu, X.: Robust adaptive video watermarking in the spatial domain. In: The 5th International Symposium on Test and Measurement, pp. 1875–1880 (2003) 9. Ye, D.P., Zou, C.F., Dai, Y.W., Wang, Z.Q.: A new adaptive watermarking for real-time MPEG videos. Applied Mathematics and Computation 185, 907–918 (2007) 10. Cerin, O., Ozcerit, A.T.: A new steganography algorithm based on color histograms for data embedding into raw video streams. Computers & Security 28, 670–682 (2009) 11. Anderson, C., Burt, P., van der Wal, G.: Change detection and tracking using pyramid transformation techniques. In: Proceedings of SPIE—Intelligent Robots and Computer Vision, vol. 579, pp. 72–78 (1985) 12. Bender, W., Gruhl, D., Morimoto, N.: Techniques for data hiding. IBM Systems J. 35, 313–336 (1996) 13. Tao, P.N., Eskicioglu, A.M.: A robust multiple watermarking scheme in the Discrete Wavelet Transform domain. In: Proc. SPIE, vol. 5601, pp. 133–144 (2004) 14. Li, B., Zheng, C.-H., Huang, D.-S.: Locally linear discriminant embedding: An efficient method for face recognition. Pattern Recognition 41, 3813–3821 (2008)
A Video Coding Technique Using Octagonal Motion Search and BTC-PF Method for Fast Reconstruction

Bibhas Chandra Dhara¹, Sanjoy Kumar Saha², and Bhabatosh Chanda³

¹ Department of Information Technology, Jadavpur University, [email protected]
² Department of Computer Science and Engineering, Jadavpur University, sks [email protected]
³ Electronics and Communication Sciences Unit, Indian Statistical Institute, [email protected]
Abstract. Video coding systems include motion compensation, frequency transformation, quantization, and lossless (entropy) coding. For applications such as video playback, the time complexity of the decoder is an important issue. Motion estimation (ME) is the most time-consuming module of a video encoder, and the frequency transformation and inverse transformation also consume a considerable amount of time. For real-time applications, the decoder has to be fast enough to reconstruct the frames from the transmitted data, and its most time-consuming module is the inverse transformation. In this paper, a fast motion estimation algorithm is used and, for residual frame coding, a fast method based on block truncation coding with the pattern fitting concept is employed. The proposed video coding method is fast and gives good quality at a reasonable bit-rate, and its decoder is much faster.

Keywords: Video coding/decoding, motion estimation, octagonal search, BTC-PF coding.
1 Introduction

The increasing number of multimedia applications, such as video playback, demands higher video coding/decoding efficiency. In a video sequence, there is a high correlation of pixel values between frames as well as within a frame, referred to as temporal redundancy and spatial redundancy, respectively. The video compression standards MPEG-x and ITU-H.26x [1,2] have several mechanisms to exploit this redundant information; the significant modules are motion compensation, transform coding followed by quantization, and entropy coding. The most important feature of any video coding technique is its ability to exploit the spatial and temporal redundancies inherent in a video sequence. This is accomplished through predictive coding, where each pixel is predicted and thus a residual frame is obtained, which needs to be transmitted. If pixel values are predicted from other pixels of the same frame, the spatial redundancy is reduced
and is called intra-frame prediction. Alternatively, pixel values are predicted from other frames of the video sequence, which is called inter-frame prediction and reduces the temporal redundancy. The MPEG/ITU standards allow many prediction models to estimate pixels; for example, H.264 [2] normally generates 16 × 16 inter-predicted blocks and also allows 4 × 4 intra prediction. The most significant tool of a video compression system is inter-frame prediction, also known as motion compensation. The block matching algorithm (BMA) is widely used in motion compensation. In a BMA, the current frame is divided into a number of non-overlapping macroblocks of size N × N. For each macroblock, the objective is to find the N × N block that matches it best in the reference frame (which is already encoded). The process of finding the best-matching block in the reference frame is known as motion estimation (ME). An exhaustive search over the entire reference frame gives the optimum match; however, this is impractical because of its time complexity. Instead, the search is restricted to a [−p, p] search window around the location of the current block. Moreover, the search is performed only at a few selected locations within the search window, guided by some strategy. A macroblock is referred to by (x, y), the left-top corner of the block. If (x + u, y + v) is the location of the best-matched block in the reference frame, then the motion is defined by the motion vector (MV) (u, v). There are many choices for the block distortion measure (BDM), such as the mean-square-error (MSE) and the sum of absolute differences (SAD); SAD is more appealing for video coding because of its simplicity and performance. Full search (FS) is the simplest block-matching algorithm and gives the optimum result, but it is impractical due to its time complexity. In the literature there are several fast ME algorithms; these algorithms include fast full search methods [3], may use a simplified matching criterion [4], or consider only selected locations within the search window [5,6,7,8]. In video coding standards, normally N = 16 and p = 7, 15, or 63, depending on the degree of motion in the video sequence. Higher compression can be achieved by variable-block-size motion estimation [9]; H.264 supports seven block sizes for motion estimation: 4 × 4, 4 × 8, 8 × 4, 8 × 8, 8 × 16, 16 × 8, and 16 × 16. Transform-based coding plays a very important role in video and image compression. The discrete cosine transform (DCT) is most widely used because its performance is close to that of the KLT and it has efficient hardware and software implementations. Video compression standards such as H.261/2 and MPEG-2/-4 part 2 use the 8 × 8 DCT to encode the difference frames, and H.264/AVC uses a 4 × 4 integer approximation of the DCT. Video standards like H.261/2 and MPEG-1/2 use an adaptive quantization step, whereas in H.261 a flat quantization matrix is used. The final component of the video compression standards is entropy coding. Advanced entropy coding, such as arithmetic coding, is employed in H.263 and H.264; H.264 uses a number of techniques for entropy coding: Golomb codes, context adaptive variable length coding (CAVLC), and context adaptive binary arithmetic coding (CABAC). In this paper, a video coding method is proposed that leads to very fast decoding of frames. Here, to exploit the temporal redundancy, the octagon-based fast BMA [10] is employed, and the difference frame is coded by hierarchical BTC-PF, a modified version of the BTC-PF [11] method. The proposed method
Fig. 1. Distribution of search points over a circular region: (a) Square grid pattern, (b) Diamond pattern, (c) Hexagonal pattern, and (d) Octagonal pattern
is fast and gives quite good quality at a reasonable bit-rate. The organization of the paper is as follows. Section 2 focuses on the motion estimation method, and the BTC-PF method is introduced in Section 3. The proposed coding method is described in Section 4. Section 5 presents experimental results and an analysis of the proposed method, and also compares the results with the H.264 standard. Finally, conclusions are drawn in Section 6.
2 Octagonal Search

Inter-frame prediction reduces the temporal redundancy by block motion estimation, as stated earlier. Fast search methods, in general, use some selected search points (SPs) to find the MV; these methods are based on the assumption that the error surface is unimodal, i.e., that the block distortion error decreases monotonically as the search point moves closer to the global minimum. In these methods the next search locations are selected according to a search pattern. Hence, the search pattern should be isotropic (i.e., circular) with respect to the current search point, and the search length should be unlimited. Different search patterns approximating a circular region are shown in Fig. 1, where the square pattern [Fig. 1(a)] is adopted by TSS [12], NTSS [13] and 4SS [14], the diamond pattern [Fig. 1(b)] by DS [5], and the hexagonal pattern [Fig. 1(c)] by HEXBS [6] and ENHEXBS [8]. From the patterns, it is clear that the octagonal pattern [Fig. 1(d)] most closely resembles the circular search pattern. Octagonal search (OS) [10] is a fast BMA and is faster than other recently reported fast BMAs. To speed up the search, the octagonal pattern (OP) is morphologically decomposed into two different search patterns, the square pattern (SQP) [Fig. 2(a)] and the cross pattern (CP) [Fig. 2(b)], which are used alternately. Thus, during the search, the octagonal pattern is effectively generated by dilating an SQP by a CP (or conversely), because OP = SQP ⊕ CP; a demonstration is given in Fig. 3. This method is able to trap both small and large motion activity, as the search length is not restricted. To further speed up the search, spatio-temporal information is used to predict the initial search center (ISC). In the prediction, the motion vectors of the (0,1) and (1,0) spatial neighboring blocks and the (0,0) temporal neighboring block are used. The motion vector which gives the minimum BDM of the current block
Fig. 2. The basic search patterns used in OS with size 2: (a) Square pattern (SQP), size is half of the horizontal (or vertical) distance between any two boundary points, (b) Cross pattern (CP), distance between center and any other point is the size of this pattern
is taken as the ISC. The search starts at the ISC with an SQP of size 2, and the process continues using the CP and the SQP alternately in subsequent steps. If the winning point of the current step is the center of the current search pattern, the size of the next pattern is reduced by 1. The search process continues until the size becomes zero. Fig. 4 illustrates different search paths for finding the motion vector (MV); encircled search points are used as search centers in subsequent steps.
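A simplified sketch of this alternating SQP/CP search is given below. It omits the ISC prediction and the size-one refinement step of the full algorithm (Section 4), and the exact point sets of the two patterns are an assumption of this illustration.

```python
import numpy as np

def sad(cur, ref, x, y, u, v, n=16):
    """Sum of absolute differences between the current macroblock and a candidate block."""
    if y + v < 0 or x + u < 0:
        return np.inf                          # candidate falls outside the reference frame
    cand = ref[y + v:y + v + n, x + u:x + u + n]
    if cand.shape != (n, n):
        return np.inf
    return int(np.abs(cur[y:y + n, x:x + n].astype(np.int32) - cand.astype(np.int32)).sum())

def octagonal_search(cur, ref, x, y, isc=(0, 0), n=16):
    """Alternate square (SQP) and cross (CP) patterns, shrinking the size whenever the
    pattern center wins, until the size reaches zero."""
    sqp = lambda s: [(du, dv) for du in (-s, 0, s) for dv in (-s, 0, s)]
    cp = lambda s: [(0, 0), (-s, 0), (s, 0), (0, -s), (0, s)]
    best, size, use_sqp = isc, 2, True
    while size > 0:
        pattern = sqp(size) if use_sqp else cp(size)
        candidates = [(best[0] + du, best[1] + dv) for du, dv in pattern]
        winner = min(candidates, key=lambda mv: sad(cur, ref, x, y, mv[0], mv[1], n))
        if winner == best:                     # center of the pattern wins: shrink the pattern
            size -= 1
        best, use_sqp = winner, not use_sqp
    return best                                # motion vector (u, v)
```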
Fig. 3. Octagon pattern, SQP followed by CP: (a) CP applied to the left-top boundary point of the SQP; (b) CP applied to the four corner points of the SQP, resulting in an octagon pattern with a 3 × 3 grid structure of SPs at the center
3 BTC-PF Method

The BTC-PF method [11,15] is a combination of block truncation coding [16] and vector quantization [17]. In block truncation coding, an image is first divided into non-overlapping blocks of size n × n, where n is usually taken to be an integer power of 2. Compression is achieved by representing each block by Q
Fig. 4. Examples of the OS method; encircled SPs represent duplicate computations of the BDM. (a) ISC = (0,0), MV = (0,0). (b) ISC = (0,0), MV = (+1,−3). (c) ISC = (+3,−4), MV = (+5,−4).
different gray values (Q ≪ n²) corresponding to a Q-level pattern of size n × n. In conventional BTC [16], Q is 2. Hence, a Q-level pattern of size n × n and Q different gray values are required to reconstruct each image block. In the BTC-PF method, instead of determining the Q-level pattern from the block statistics, it is selected from a set of, say, M predefined Q-level patterns. The pattern should match the candidate image block in terms of some quality measure. Thus, the index of the selected pattern and Q gray values are sufficient for reconstruction. As the method selects a pattern from a set of predefined patterns, the quality of the reconstructed image is, in general, a little lower than that of conventional BTC. However, this small sacrifice in PSNR earns a huge gain in compression ratio, and the performance of the BTC-PF method depends on the pattern selected. The method of selecting the best pattern for an image block B is as follows. For an image block B, let the pixels be x_i (i = 1, 2, ..., n²) with corresponding pixel intensities f(x_i). Let the available patterns in the given patternbook be P_j (j = 1, 2, ..., M) of size n × n, and let the levels present in each pattern be represented by t, where 0 ≤ t ≤ Q−1; i.e., any pattern P_j = p_j0 ∪ p_j1 ∪ ... ∪ p_j(Q−1), such that p_js ∩ p_jt = ∅ if s ≠ t, and p_jt is the collection of pixel coordinates having level t in P_j. In other words, p_jt = {x_i | P_j(x_i) = t}. The image block B is fit to these patterns in the least-square-error sense and the pattern which fits best is selected. The pattern-fit error between the image block B and the pattern P_j may be defined as

e_j = Σ_{t=0}^{Q−1} e_jt    (1)

where

e_jt = Σ_{x_i ∈ p_jt} (f(x_i) − μ_t)²    (2)

μ_t = (1 / |p_jt|) Σ_{x_i ∈ p_jt} f(x_i)    (3)

where |p_jt| is the cardinality of the set p_jt, i.e., the number of pixels having level t in the j-th pattern. Finally, the index I of the best-fit pattern is the one for which this error is minimum, i.e.,

I = arg min_{j ∈ {1, 2, ..., M}} {e_j}    (4)

Now, if max{μ_t} − min{μ_t} of the best-fit pattern P_I is less than a predefined threshold, the corresponding image block is treated as a smooth block. A smooth block is represented by an additional index not contained in the pattern set. To reconstruct a smooth block, only the overall block mean μ is required, where μ = (Σ_{t=0}^{Q−1} μ_t · |p_It|) / n²; otherwise, the index I of the selected pattern and the corresponding Q means {μ_t : t = 0, 1, ..., Q−1} are necessary. Thus, compared with conventional BTC, the BTC-PF method quantizes the bit patterns by the P_j and the gray values present in the block by the μ_t. In this method, only ⌈log₂ M⌉ bits are required to represent the Q-level pattern, rather than ⌈n² log₂ Q⌉ bits, where ⌈x⌉ denotes the smallest integer greater than or equal to x. Usually ⌈log₂ M⌉ ≪ ⌈n² log₂ Q⌉, which leads to significant compression. In this work, the patternbook is generated by a clustering technique using a large number of residual frames as a training set [11].
4 Proposed Coding Method

In this paper, a video codec is proposed. The performance (bit-rate, time complexity, and quality) of a coding technique depends highly on the intra-prediction, the inter-prediction, the transformation used (including quantization), and the entropy coding method. The proposed method not only reduces the time complexity of the decoder, it also reduces the complexity of the ME in the encoder. Thus, the method is very suitable for applications like video playback, video retrieval and video on demand, where compressed video frames are reconstructed frequently. The basic block diagram of the proposed coder is shown in Fig. 5. The motion estimation is accomplished through the octagonal search method, and the BTC-PF method is used to encode the difference frame, so that only a table look-up is needed for decoding. In the decoder of a standard video coding technique, the inverse transformation consumes the major part of the time; since a table look-up is much faster [15] than an inverse DCT, the decoder of the proposed method is many times faster. Like a standard video encoder, our proposed method has three main modules: (i) inter-pixel redundancy removal, (ii) the hierarchical BTC-PF method to encode the difference frames, and (iii) Huffman coding of the information obtained from the other modules.

The redundancy removal process consists of intra-frame prediction and inter-frame prediction. In intra-prediction (I-prediction), the image (frame) is first partitioned into 4 × 4 blocks. An image block B(r, c) is estimated as B̂(r, c) using information from the already processed neighboring blocks B̃(r, c − 1) and B̃(r − 1, c). The residual block B_e(r, c) is defined as B_e(r, c) = B(r, c) − B̂(r, c) and is coded by the BTC-PF method. The details of the estimation
Fig. 5. Block structure of the proposed video encoder and decoder
process are explained in [15]. In the inter-prediction process, only P-frame prediction is used. The temporal redundancy is exploited by the octagon-based block matching algorithm (OS) [10]. The octagonal search method is faster than other BMAs such as DS [5], ETSS [7], HEXBS [6], and ENHEXBS [8]. The reference software of H.264 [18] uses a hybrid motion estimation method, which includes HEX, DS, CS, etc. The BMA used here is as follows:

Step 1: Predict the motion vector as the (component-wise) median of the spatio-temporal neighboring motion vectors. Select as ISC the predicted vector or the (0,0) vector, whichever gives the minimum distortion for the current macroblock.
Step 2: Determine the initial search pattern, square pattern (SQP) or cross pattern (CP). If the ISC indicates purely horizontal or vertical motion with respect to the current position, the cross pattern is selected; otherwise the square pattern is selected. The initial size of the pattern is 2.
Step 3: Find the winning point for the next step. If the winning point is the center of the current pattern, the size is decreased by 1.
Step 4: If the size is non-zero, consider the other pattern at the winning point and go to Step 3.
Step 5: If the size becomes zero for the first time, reset size = 1, consider the other pattern at the winning point and go to Step 3; otherwise stop.

The hierarchical BTC-PF method (HBTC-PF) is employed to encode the residual frames resulting from the removal of the inter-pixel redundancy. The hierarchical BTC-PF method used is given below:

Step 1: The difference frame is first partitioned into 16 × 16 blocks (B16).
Step 2: For each B16 do the following:
Step 2.1: Compute the block intensity range (R16). If R16 ≤ Th, the block is represented by its mean; otherwise B16 is decomposed into four 8 × 8 blocks (B8), and for each B8 block the following is done:
Step 2.1.1: Compute the block intensity range (R8). If R8 ≤ Th, then the block is represented by its block mean; else, if Th < R8 ≤ 2Th, then a 4 × 4 block is generated from the 8 × 8 block by sub-sampling by 2 and the block is then coded by the BTC-PF method; else (i.e., R8 > 2Th) four 4 × 4 blocks (B4) are constructed. For each B4 do the following:
Step 2.1.1.a: Compute the block intensity range (R4). If R4 ≤ Th, the block is represented by its block mean; else, if Th < R4 ≤ 2Th, then the block is coded by the BTC-PF method; else (i.e., R4 > 2Th) the values of the block are estimated from the neighboring blocks and the corresponding error is quantized and transmitted.
Entropy coding. This is a lossless coding technique used to reduce the statistical redundancy. In the proposed method, the inter-prediction module returns a motion vector for each macroblock, and the BTC-PF method outputs Q gray levels and the index of the selected pattern for each block (of size 4 × 4). In addition, in the HBTC-PF method, each B16 block is partitioned hierarchically and coded. The string representing the partition, called the "partition string", has to be sent along with the other information. In H.264, context-based entropy coding is used to achieve greater compression. In this method, we simply use the Huffman coding technique to encode all the information except the motion vector.
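To make the hierarchical partitioning described above concrete, the following Python sketch reproduces the range-threshold decision logic of Steps 1 through 2.1.1.a. It is only an illustration: the BTC-PF coder itself and the neighbor-prediction/quantization step are passed in as placeholder callables (btc_pf_encode, predict_and_quantize), and all names are ours rather than the authors'.

```python
import numpy as np

def intensity_range(block):
    """Block intensity range R = max - min."""
    return int(block.max()) - int(block.min())

def encode_hbtc_pf(diff_frame, Th, btc_pf_encode, predict_and_quantize):
    """Sketch of the hierarchical BTC-PF partitioning of a difference frame.

    `btc_pf_encode` and `predict_and_quantize` are placeholders for the
    BTC-PF coder of [11,15] and the neighbor-prediction/quantization step;
    their internals are not needed to show the partitioning decisions.
    """
    codes = []                               # (position, mode, payload) triples
    H, W = diff_frame.shape
    for r in range(0, H, 16):
        for c in range(0, W, 16):
            b16 = diff_frame[r:r+16, c:c+16]
            if intensity_range(b16) <= Th:                       # Step 2.1
                codes.append(((r, c, 16), 'mean', int(b16.mean())))
                continue
            for r8 in range(r, r+16, 8):                         # split into 8x8
                for c8 in range(c, c+16, 8):
                    b8 = diff_frame[r8:r8+8, c8:c8+8]
                    R8 = intensity_range(b8)
                    if R8 <= Th:                                 # Step 2.1.1
                        codes.append(((r8, c8, 8), 'mean', int(b8.mean())))
                    elif R8 <= 2 * Th:
                        sub = b8[::2, ::2]                       # 8x8 -> 4x4 by sub-sampling by 2
                        codes.append(((r8, c8, 8), 'btc_pf_sub', btc_pf_encode(sub)))
                    else:                                        # split into 4x4
                        for r4 in range(r8, r8+8, 4):
                            for c4 in range(c8, c8+8, 4):
                                b4 = diff_frame[r4:r4+4, c4:c4+4]
                                R4 = intensity_range(b4)
                                if R4 <= Th:
                                    codes.append(((r4, c4, 4), 'mean', int(b4.mean())))
                                elif R4 <= 2 * Th:
                                    codes.append(((r4, c4, 4), 'btc_pf', btc_pf_encode(b4)))
                                else:
                                    codes.append(((r4, c4, 4), 'pred_err', predict_and_quantize(b4)))
    return codes
```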
5 Experimental Results
The performance of the proposed method is evaluated on some standard video sequences like 'football', 'foreman', 'news', 'Miss America', etc. This set consists of sequences having different degrees of motion. For the performance analysis we use only the luminance component of the video sequences. In the proposed method, we have introduced two tools for video coding: a fast motion estimation method and a coding method for the difference frame that leads to a very fast decoding method. First we evaluate the performance of the proposed ME algorithm within the H.264 platform. To do this, we use the source code of H.264, which is available at [18], and replace its ME algorithm with our method. The result is shown in Table 1. It is clear that the proposed ME method has the same performance in terms of PSNR (quality) and bpp (bit-rate), and is faster than the method used in [18]. In Table 2, the performance of the proposed method is compared with that of H.264 and is found to be inferior. The main reason for the inferior performance of the proposed method is the use of HBTC-PF in place of transform-based coding. It is well established that the performance of transform-based image compression methods is superior to that of spatial-domain methods, and all the video coding standards use transform-based compression techniques. In the proposed method, we have purposefully used the BTC-PF method, as our main target is to develop a coding method that leads to a very efficient decoding technique. The complexity of the decoder of any transform-based method is the same as that of the encoder, because the complexity of both the forward and the inverse transformation is
Table 1. Comparative performance of H.264 ME and proposed ME algorithm

Video sequence   | Given ME Algo                 | Our ME Algo
                 | PSNR  bpp   ME time (in sec.) | PSNR  bpp   ME time (in sec.)
Carphone         | 35.83 0.317 1.686             | 35.85 0.309 1.243
Miss America     | 39.06 0.078 0.600             | 39.02 0.078 0.619
Mother Daughter  | 36.29 0.105 1.325             | 36.28 0.103 1.144
Football         | 36.72 0.382 3.018             | 36.76 0.402 2.269
Foreman          | 35.93 0.234 1.999             | 35.90 0.241 1.577
Hall Monitor     | 37.37 0.096 3.942             | 37.38 0.096 3.966
News             | 37.38 0.098 1.415             | 37.37 0.098 1.338
Average          | 36.94 0.187 1.998             | 36.94 0.190 1.736
Table 2. Experimental results of the proposed video coding technique

Video sequence   | Proposed method
                 | PSNR  bpp
Carphone         | 35.49 0.652
Miss America     | 38.63 0.176
Mother Daughter  | 35.73 0.266
Football         | 34.10 0.776
Foreman          | 35.44 0.427
Hall Monitor     | 37.01 0.238
News             | 36.89 0.253
Average          | 36.18 0.398
same. The BTC-PF method is asymmetric in that sense, i.e., the time complexity of the decoder is negligible compared to that of the encoder. The decoder requires only a table look-up and a few addition operations; the basic steps of the proposed decoder are shown in the right part of Fig. 5. The decoding technique of transform-based coding normally uses an 8 × 8 inverse DCT, which requires 176 multiplications and 464 additions. The details of the calculation are given in [15]. A 4 × 4 integer approximation of the DCT is used in H.264, and the inverse transformation in that case needs 64 additions and 16 shifting operations [19]. The computational requirement of the proposed method may be given as follows. HBTC-PF partitions the B16 blocks hierarchically based on the intensity range, and a block is coded by
– block mean; no operation is required for reconstruction.
– the BTC-PF method; in this experiment a 3-level BTC-PF method is employed, which returns an index I and gray levels μ1, μ2, and μ3. With some extra bits, called the "order index", the gray levels are ordered as μ1 ≤ μ2 ≤ μ3 and then coded as μ1, μ2 − μ1, μ3 − μ2. Hence, for these blocks, 2 additions are required for reconstruction.
– the quantized prediction error; the median of three neighbors is used as the predicted value. This is followed by quantizing the error, which is transmitted directly. So for each pixel 3 comparisons, one shifting operation and one addition are required for reconstruction.
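As an illustration of why decoding reduces to a table look-up plus two additions, the following sketch reconstructs a 4 × 4 block coded by the 3-level BTC-PF. The pattern book contents shown are purely hypothetical; only the structure (a pattern of level labels selected by the index I, plus differentially coded gray levels) follows the description above.

```python
import numpy as np

def decode_btc_pf_block(pattern_book, index, d1, d2, d3):
    """Reconstruct a 4x4 block coded by 3-level BTC-PF.

    `pattern_book[index]` is assumed to be a 4x4 array of level labels
    {0, 1, 2}; d1, d2, d3 are the differentially coded gray levels
    (mu1, mu2 - mu1, mu3 - mu2) described above.
    """
    mu1 = d1
    mu2 = mu1 + d2                      # first addition
    mu3 = mu2 + d3                      # second addition
    levels = np.array([mu1, mu2, mu3])
    return levels[pattern_book[index]]  # pure table look-up

# A hypothetical pattern book with a single pattern, just to exercise the routine:
book = [np.array([[0, 0, 1, 2],
                  [0, 1, 1, 2],
                  [0, 1, 2, 2],
                  [1, 1, 2, 2]])]
block = decode_btc_pf_block(book, 0, 40, 25, 30)   # mu1 = 40, mu2 = 65, mu3 = 95
```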
Table 3. Average complexity of the decoder at block level

Block Size | Video Standard          | Proposed method
           | Mult  Add  Comp  Shift  | Mult  Add    Comp  Shift
4 × 4      | 0     64   0     16     | 0     2.60   2.40  0.80
8 × 8      | 176   464  0     0      | 0     10.40  9.60  3.20
In the proposed method, it is estimated that fewer than 5% (approx.) of the blocks are coded by the prediction method. Assuming all the remaining blocks are coded by BTC-PF, the complexity of the proposed decoder is summarized in Table 3. The results in Table 3 indicate how much faster our decoder is than that of the standard method.
6 Conclusions
In this paper, a video coding method is proposed that leads to very fast decoding. The proposed method uses a fast motion search method, which is faster than the hybrid search method used in H.264. For residual frame coding, a hierarchical BTC-PF method is used. The decoding complexity of the HBTC-PF is negligible compared to that of the transformation-based methods used in video standards. The proposed method is suitable for applications like video playback, video-on-demand and video retrieval, where once-compressed video frames are reconstructed frequently.
References
1. ITU-T: Recommendation H.261 - Video codec for audiovisual services at p × 64 kbit/s (1990)
2. ITU-T and ISO/IEC: Joint video specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), Joint Committee Draft, Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, document JVT-G050r1 (2003)
3. Ahn, T.G., Moon, Y.H., Kim, J.H.: Fast full search motion estimation based on multilevel successive elimination algorithm. IEEE Trans. Circuits and Syst. Video Technol. 14, 1265–1269 (2004)
4. Bhaskaran, V., Konstantinides, K.: Image and Video Compression Standards: Algorithms and Architectures, 2nd edn. Kluwer Academic Publishers, Dordrecht (1999)
5. Zhu, S., Ma, K.K.: A new diamond search algorithm for fast block-matching motion estimation. IEEE Trans. Image Processing 9, 287–290 (2000)
6. Zhu, C., Lin, X., Chau, L.P.: Hexagon-based search pattern for fast block motion estimation. IEEE Trans. Circuits and Syst. Video Technol. 12, 349–355 (2002)
7. Jing, X., Pui, L.P.: An efficient three-step search algorithm for block motion estimation. IEEE Trans. Multimedia 6, 435–438 (2004)
8. Zhu, C., Lin, X., Chau, L., Po, L.M.: Enhanced hexagonal search for fast block motion estimation. IEEE Trans. Circuits and Syst. Video Technol. 14, 1210–1214 (2004)
9. Khan, N.A., Masud, S., Ahmad, A.: A variable block size motion estimation algorithm for real-time H.264 video coding. Signal Processing: Image Communication 21, 306–315 (2006)
10. Dhara, B.C., Chanda, B.: Block motion estimation using predicted partial octagonal search. In: Proc. VIE 2006, pp. 277–282 (2006)
11. Dhara, B.C., Chanda, B.: Block truncation coding using pattern fitting. Pattern Recognition 37(11), 2131–2139 (2004)
12. Koga, T., Iinuma, K., Hirano, A., Iijima, Y., Ishiguro, T.: Motion compensated interframe coding for video conferencing. In: Proc. Nat. Telecommun. Conf., pp. G.5.3.1–G.5.3.5 (1981)
13. Li, R., Zeng, B., Liou, M.: A new three-step search algorithm for block motion estimation. IEEE Transactions on Circuits and Systems for Video Technology 4, 438–442 (1994)
14. Po, L.M., Ma, W.C.: A novel four-step search algorithm for fast block motion estimation. IEEE Transactions on Circuits and Systems for Video Technology 6, 313–317 (1996)
15. Dhara, B.C., Chanda, B.: Color image compression based on block truncation coding using pattern fitting principle. Pattern Recognition 40, 2408–2417 (2007)
16. Delp, E.J., Mitchell, O.R.: Image compression using block truncation coding. IEEE Trans. Commun. 27, 1335–1342 (1979)
17. Salomon, D.: Data Compression: The Complete Reference. Springer, New York (2000)
18. H.264/AVC reference software, http://iphome.hhi.de/suehring/tml/
19. Malvar, H.S., Hallapuro, A., Karczewicz, M., Kerofsky, L.: Low-complexity transformation and quantization in H.264/AVC. IEEE Trans. Circuits and Syst. Video Technol. 13, 598–603 (2003)
Rough Set Approach in Ultrasound Biomicroscopy Glaucoma Analysis

Soumya Banerjee1, Hameed Al-Qaheri2, El-Sayed A. El-Dahshan3, and Aboul Ella Hassanien4

1 Birla Inst. of Technology, CS Dept., Mesra, India
2 Kuwait University, CBA, IS Dept., Kuwait
3 Physics Dept., Faculty of Science, Ain Shams University, Abbassia, Cairo, Egypt
4 Information Technology Department, FCI, Cairo University, 5 Ahamed Zewal Street, Orman, Giza, Egypt
Abstract. In this paper, we present an automated approach for Ultrasound Biomicroscopy (UBM) glaucoma image analysis. To increase the efficiency of the introduced approach, an intensity adjustment process is applied first, using the Pulse Coupled Neural Network (PCNN) with a median filter. This is followed by applying the PCNN-based segmentation algorithm to detect the boundary of the anterior chamber of the eye image. Then, the glaucoma clinical parameters are calculated and normalized, followed by the application of a rough set analysis to discover the dependency between the parameters and to generate a set of reducts that contains a minimal number of attributes. Experimental results show that the introduced approach is very successful and has high detection accuracy.
Keywords: Rough Sets, Classification, PCNN, glaucoma images analysis.
1 Introduction
Glaucoma is a disease that can cause severe impairment of visual function and leads to irreversible blindness if untreated. About 60 million people worldwide will have glaucoma by 2010, and the number will increase to nearly 80 million by 2020, according to a recent study in the British Journal of Ophthalmology [1]. It has been estimated that one-half of glaucoma patients are affected by angle closure glaucoma [2]. Angle closure glaucoma (ACG) has been called the most common form of glaucoma worldwide, and the leading cause of bilateral blindness [2,3,4]. If the disease is detected in its early stages, damage can be minimized and the long-term prognosis for the patient is improved. UBM operates at a frequency of 50 to 100 MHz with 20 to 60 μm resolution and 4 mm penetration [6,7]. It produces high-resolution images of the anterior part of the eye, by which a qualitative and a quantitative evaluation of structures and their relation can be done [5]. In spite of recent advances in ultrasonic imaging, manual assessment of glaucoma clinical parameters on UBM images by physicians
is still a challenging task due to poor contrast, missing boundaries, low signal-to-noise ratio (SNR), speckle noise and refraction artifacts in the images. Besides, manual identification of glaucoma clinical parameters is tedious and sensitive to observer bias and experience. Thus, semi-automatic or automatic angle closure glaucoma clinical parameter measurement methods provide robust results with a certain degree of accuracy and can remove the physical weaknesses of observer interpretation of ultrasound images [8,9]. This is essential for the early detection and treatment of glaucoma disease. Rough set theory [10,11,12] is a fairly new intelligent technique that has been applied to the medical domain and is used for the discovery of data dependencies; it evaluates the importance of attributes, discovers the patterns of data, reduces all redundant objects and attributes, and seeks the minimum subset of attributes. Moreover, it is used for the extraction of rules from databases. One advantage of the rough set is the creation of readable if-then rules. Such rules have the potential to reveal new patterns in the data material. This paper introduces a rough set scheme for Ultrasound Biomicroscopy glaucoma image analysis in conjunction with a pulse coupled neural network. This paper is organized as follows: Section 2 discusses the proposed rough set approach to Ultrasound Biomicroscopy glaucoma image analysis in detail. Experimental analysis and discussion of the results are described in Section 3. Finally, conclusions and future work are presented in Section 4.
2 Rough Set Approach in Ultrasound Biomicroscopy Glaucoma Analysis
Figure 1 illustrates the overall steps in the proposed Ultrasound Biomicroscopy Glaucoma Rough Set Image Analysis Scheme using a UML activity diagram, where a square or rectangle represents a data object, a rounded rectangle represents an activity, and solid and dashed directed lines indicate control flow and data object flow, respectively. Functionally, RBIS can be partitioned into three distinct phases.
2.1 Preprocessing Phase
In the first phase of the experiment, the UBM eye images have been preprocessed to remove noise. Eye structures in UBM images are not very clear, and this makes them very challenging to analyze, both for the naked human eye and for any automatic assessment algorithm. The PCNN is a very powerful tool for enhancing the boundaries in ultrasound images. To increase the efficiency of automating the boundary detection process, a pre-processing step should be considered to enhance the quality of the eye images before detecting their boundaries. An intensity adjustment process is applied first using the Pulse Coupled Neural Network with a median filter [13,14]. The success of the application of PCNNs to image segmentation depends on the proper setting of the various parameters of the network, such as the linking
Fig. 1. Rough Set Approach in Ultrasound Biomicroscopy Glaucoma Analysis
parameter β, the thresholds θ, the decay time constants αθ, and the interconnection matrices M and W. The image can be represented as an array of M × N normalized intensity values. This array is then fed in as the M × N inputs of the PCNN. If initially all neurons are set to 0, the input results in activation of all
of the neurons at the first iteration. The threshold of each neuron, Θ, increases significantly when the neuron fires, and then the threshold value decays with time. When the threshold falls below the respective neuron's potential (U), the neuron fires again, which again raises the threshold. The process continues, creating binary pulses for each neuron. We observe that the visible difference between the enhanced image and the original image is not too drastic. However, segmentation without preprocessing results in a blank image, whereas with the preliminary preprocessing it does not.
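For readers unfamiliar with PCNN dynamics, the following is a minimal sketch of the pulsing mechanism just described: the internal activity U is compared with a decaying threshold Θ that jumps whenever the neuron fires. The update equations follow the standard PCNN formulation, and the kernel and parameter values are illustrative only; as noted above, their proper setting is problem dependent and is not prescribed here.

```python
import numpy as np
from scipy.signal import convolve2d

def pcnn_iterate(S, n_iter=20, beta=0.2, alpha_theta=0.2, V_theta=20.0,
                 alpha_l=1.0, V_l=1.0):
    """Minimal sketch of the PCNN pulsing process.

    S is the normalized M x N intensity image; all parameter values and the
    linking kernel K below are illustrative, not the authors' settings.
    """
    S = S.astype(float)
    K = np.array([[0.5, 1.0, 0.5],
                  [1.0, 0.0, 1.0],
                  [0.5, 1.0, 0.5]])           # assumed linking kernel
    L = np.zeros_like(S)
    Y = np.zeros_like(S)
    Theta = np.zeros_like(S)                  # all thresholds start at 0
    pulses = []
    for _ in range(n_iter):
        F = S                                 # feeding input (no feedback term in this sketch)
        L = np.exp(-alpha_l) * L + V_l * convolve2d(Y, K, mode='same')
        U = F * (1.0 + beta * L)              # internal activity
        Y = (U > Theta).astype(float)         # neuron fires when U exceeds its threshold
        Theta = np.exp(-alpha_theta) * Theta + V_theta * Y   # firing raises Theta, which then decays
        pulses.append(Y.copy())               # binary pulse image of this iteration
    return pulses
```

With Theta initialized to zero, every neuron fires at the first iteration, matching the behaviour described in the text.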
2.2 Clinical Parameters Assessment Phase
The second phase of the experiment concerns the clinical parameter assessment. The degree of angle opening was measured using the following variables: the trabecular-iris angle (TIA), the angle-opening distance (AOD) at 500 microns from the scleral spur (AOD500), and the angle-recess area (ARA500), as described by Pavlin et al. [15,16].
Clinical parameters assessment algorithm. We designed an algorithm to identify the scleral spur and then automatically calculate the distance along a perpendicular line drawn from the corneal endothelial surface to the iris at 500 μm, yielding AOD500. The total area bounded by the iris and the cornea at 500 μm from the scleral spur (apex point) was calculated as the angle-recess area (ARA500). The TIA was also measured from the apex point. Then the measured TIA and AOD500 parameters are fed to the classifier to classify the cases as normal or glaucomatous eyes. The angles of patients were categorized as Grade 0 to Grade 4 using Shaffer's classification [3]. These angles were quantified by ultrasound biomicroscopy using the following biometric characteristics: AOD500 and ARA500 [3,17]. The angles were further segregated into narrow angles (Shaffer's Grade 2 or less) and open angles (Shaffer's Grade 3 and 4).
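The sketch below illustrates the geometry behind AOD500 and TIA on segmented boundaries. It is not the authors' implementation: the boundary representation, the way the 500 μm point is located, and the construction of the perpendicular are assumptions made purely for illustration.

```python
import numpy as np

def closest_point(pts, p):
    """Return the boundary point nearest to p."""
    d = np.linalg.norm(pts - p, axis=1)
    return pts[np.argmin(d)]

def angle_parameters(cornea_pts, iris_pts, apex, dist=0.5):
    """Illustrative AOD500 / TIA computation on segmented boundaries.

    cornea_pts and iris_pts are (K, 2) arrays of boundary coordinates in mm
    (e.g. from the PCNN segmentation); apex is the scleral spur / recess
    point identified on the image; dist = 0.5 mm corresponds to 500 microns.
    Straight-line distance from the apex is used to locate the 500-micron
    point, which is an approximation of the clinical definition.
    """
    d_from_apex = np.linalg.norm(cornea_pts - apex, axis=1)
    p_cornea = cornea_pts[np.argmin(np.abs(d_from_apex - dist))]  # point ~500 um from apex
    p_iris = closest_point(iris_pts, p_cornea)                    # foot of the perpendicular on the iris
    aod500 = np.linalg.norm(p_cornea - p_iris)                    # angle-opening distance
    v1, v2 = p_cornea - apex, p_iris - apex                       # arms of the trabecular-iris angle
    cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    tia = np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
    return aod500, tia
```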
3 Implementation and Results Evaluation

3.1 UBM Images Characteristic
The UBM images were from the New York Glaucoma Research Institute, obtained with the UBM Model 840, Paradigm Medical Industries Inc., with a 50 MHz transducer probe. The images have lateral and axial physical resolutions of approximately 50 μm and 25 μm, respectively, and a penetration depth of 4-5 mm, with typical dimensions of 5 x 5 mm at a resolution of 440 x 240 pixels. Twenty images were used in the verification of the technique. The technique was implemented on a PC with a 3 GHz P4 processor using MATLAB 7.01.
3.2 PCNN: Chamber Boundary Detection Results
Preprocessing results: In the first phase of the experiment, the UBM eye images were preprocessed to remove noise. Eye structures in UBM images are not very clear, which makes them very challenging to analyze, both for the naked human eye and for any automatic assessment algorithm. It can be seen that with the preprocessing module, which removes image noise, smoothes images and enhances the image resolution, the performance of the segmentation module can be significantly improved. Figure 2(a) is the original image. After noise removal and image enhancement by the preprocessing module, the output image is shown in Figure 2(b). Figure 2(c) shows the boundary of the anterior chamber on the original image. Figure 2(d) shows the boundary of the anterior chamber alone.
Fig. 2. Determination of chamber boundaries: (a) original, (b) PCNN enhanced, (c) segmentation, (d) boundaries
Table 1 represents the chamber area rough decision system. We reach the minimal number of reducts containing a combination of attributes which has the same discrimination factor. The final generated reduct set, which is used to generate the list of rules for the classification, is: {TIA, with Support 100%}. A natural use of a set of rules is to measure how well the ensemble of rules is able to classify new and unseen objects. The way to measure the performance of the rules
Table 1. Chamber area decision table

Angle-TIA | AOD500 | ARA    | Decision class
45.43     | 28.161 | 63.04  | 1
24.8      | 11.78  | 150.17 | 0
13.68     | 6.13   | 77.66  | 0
13.6      | 6.05   | 75.89  | 0
24.58     | 11.52  | 145.03 | 0
56.4      | 48.19  | 771.28 | 1
37.44     | 20.61  | 277.53 | 1
is to assess how well the rules do in classifying new cases. So we apply the rules produced from the training set data to the test set data. The following presents the generated rules in a more readable format:
R1: IF TIA < 29.94 THEN Decision Class is 0.0
R2: IF TIA >= 29.94 THEN Decision Class is 1.0
Measuring the performance of the rules generated from the training data set in terms of their ability to classify new and unseen objects is also important. Our measuring criteria were Rule Strength and Rule Importance [18], and to check the performance of our method, we calculated the confusion matrix between the predicted classes and the actual classes, as shown in Table 2. The confusion matrix is a table summarizing the number of true positives, true negatives, false positives, and false negatives when using a classifier to classify the different test objects.

Table 2. Model Prediction Performance (Confusion Matrix)

Actual   | Predict Class 0 | Predict Class 1 | Accuracy
Class 0  | 17              | 0               | 1.0
Class 1  | 0               | 32              | 1.0
         | 1.0             | 1.0             | 1.0
Several runs were conducted using different settings of the rule strength threshold. Table 4 shows the number of rules generated using rough sets; for the sake of comparison, we have also generated rules using a neural network.
4 Conclusions
We have developed an advanced hybrid rough set and pulse coupled neural network scheme for Ultrasound Biomicroscopy glaucoma image analysis and provided
a methodology for assessing the clinical parameters of angle closure glaucoma based on UBM images of the eye. To increase the efficiency of the introduced hybrid scheme, an intensity adjustment process is applied first, based on the Pulse Coupled Neural Network with a median filter. This is followed by applying the PCNN-based segmentation algorithm to detect the boundary of the anterior chamber in the eye image. Combining the adjustment and segmentation enables us to eliminate the PCNN's sensitivity to the setting of the various PCNN parameters, whose optimal selection can be difficult and can vary even for the same problem. Then, chamber boundary features have been extracted and normalized, followed by the application of a rough set analysis to discover the dependency between the attributes and to generate a set of reducts that contains a minimal number of attributes.
References
1. Quigley, H.A., Broman, A.T.: The number of people with glaucoma worldwide in 2010 and 2020. Br. J. Ophthalmol. 90(3), 262–267 (2006)
2. Razeghinejad, M.R., Kamali-Sarvestani, E.: The plateau iris component of primary angle closure glaucoma: Developmental or acquired. Medical Hypotheses 69, 95–98 (2007)
3. Kaushik, S., Jain, R., Pandav, S.S., Gupta, A.: Evaluation of the anterior chamber angle in Asian Indian eyes by ultrasound biomicroscopy and gonioscopy. Indian Journal of Ophthalmology 54(3), 159–163 (2006)
4. Quigley, H.A.: Number of people with glaucoma worldwide. Br. J. Ophthalmol. 80, 389–393 (1996)
5. Radhakrishnan, S., Goldsmith, J., Huang, D., Westphal, V., Dueker, D.K., Rollins, A.M., Izatt, J.A., Smith, S.D.: Comparison of optical coherence tomography and ultrasound biomicroscopy for detection of narrow anterior chamber angles. Arch. Ophthalmol. 123(8), 1053–1059 (2005)
6. Urbak, S.F.: Ultrasound Biomicroscopy. I. Precision of measurements. Acta Ophthalmol. Scand. 76(11), 447–455 (1998)
7. Deepak, B.: Ultrasound biomicroscopy: An introduction. Journal of the Bombay Ophthalmologists Association 12(1), 9–14 (2002)
8. Zhang, Y., Sankar, R., Qian, W.: Boundary delineation in transrectal ultrasound image for prostate cancer. Computers in Biology and Medicine 37(11), 1591–1599 (2007)
9. Youmaran, R., Dicorato, P., Munger, R., Hall, T., Adler, A.: Automatic detection of features in ultrasound images of the eye. In: Proceedings of the IEEE IMTC 2005, vol. 3, pp. 1829–1834, Ottawa, Canada (May 16-19, 2005)
10. Pal, S.K., Polkowski, L., Skowron, A. (eds.): Rough-Neuro Computing: Techniques for Computing with Words. Springer, Berlin (2002)
11. Pawlak, Z.: Rough Sets. Int. J. Computer and Information Sci. 11, 341–356 (1982)
12. Grzymala-Busse, J., Pawlak, Z., Slowinski, R., Ziarko, W.: Rough Sets. Communications of the ACM 38(11), 1–12 (1999)
13. El-Dahshan, E., Redi, A., Hassanien, A.E., Xiao, K.: Accurate detection of prostate boundary in ultrasound images using biologically inspired spiking neural network. In: International Symposium on Intelligent Signal Processing and Communication Systems Proceedings, Xiamen, China, November 28-December 1, pp. 333–336 (2007)
14. Hassanien, A.E.: Pulse coupled neural network for detection of masses in digital mammogram. Neural Network World Journal 2(6), 129–141 (2006)
15. Pavlin, C.J., Harasiewicz, K., Foster, F.S.: Ultrasound biomicroscopy of anterior segment structures in normal and glaucomatous eyes. Am. J. Ophthalmol. 113, 381–389 (1992)
16. Hodge, A.C., Fenster, A., Downey, D.B., Ladak, H.M.: Prostate boundary segmentation from ultrasound images using 2D active shape models: Optimisation and extension to 3D. Computer Methods and Programs in Biomedicine 8(4), 99–113 (2006)
17. Gohdo, T., Tsumura, T., Iijima, H., Kashiwagi, K., Tsukahara, S.: Ultrasound biomicroscopic study of ciliary body thickness in eyes with narrow angles. American Journal of Ophthalmology 129(3), 342–346 (2000)
18. Ning, S., Xiaohua, H., Ziarko, W., Cercone, N.: A generalized rough sets model. In: Proceedings of the 3rd Pacific Rim International Conference on Artificial Intelligence, Beijing, China, vol. 431, pp. 437–443. Int. Acad. Publishers (1994)
19. Sbeity, Z., Dorairaj, S.K., Reddy, S., Tello, C., Liebmann, J.M., Ritch, R.: Ultrasound biomicroscopy of zonular anatomy in clinically unilateral exfoliation syndrome. Acta Ophthalmol. 86(5), 565–568 (2008)
20. Dorairaj, S.K., Tello, C., Liebmann, J.M., Ritch, R.: Narrow angles and angle closure: Anatomic reasons for earlier closure of the superior portion of the iridocorneal angle. Acta Ophthalmol. 125, 734–739 (2007)
21. Okamoto, F., Nakano, S., Okamoto, C., et al.: Ultrasound biomicroscopic findings in aniridia. Amer. J. Ophthalmol. 137(5), 858–862 (2004)
Video Copy Detection: Sequence Matching Using Hypothesis Test

Debabrata Dutta1, Sanjoy Kumar Saha2, and Bhabatosh Chanda3

1 Tirthapati Institution, Kolkata, India
2 CSE Department, Jadavpur University, Kolkata, India
3 ECS Unit, Indian Statistical Institute, Kolkata, India
Abstract. Video copy detection is intended to verify whether a video sequence is copied from another or not. Such techniques can be used for protecting copyright. A content-based video copy detection system extracts a signature of the video from its visual constituents. The signature of the test sequence is matched against those of the sequences in the database. Deciding whether two sequences are similar enough, even in the presence of distortion, is a big challenge. In this work, we have focused on sequence matching. We have proposed a hypothesis test based scheme for comparing the similarity of two sequences. Experiments have been carried out to verify the capability of the concept, and the results seem satisfactory.
Keywords: Video Copy Detection, Video Fingerprinting, Sequence Matching, Hypothesis Test.
1 Introduction
Technological development has made the capture and storage of video data easy and inexpensive. It has led to huge growth in video data volume. Moreover, developments in networking and communication and the increase in bandwidth have encouraged video sharing and broadcasting enormously. All these have an adverse effect on copyright management. The technology has enabled easy access, editing and duplication of video data. Such activities result in the violation of digital rights. Considering the huge volume of a video database, detection of a copy becomes very difficult. But it is the basic requirement for protecting intellectual property rights. Driven by the importance of copyright protection, a new area of research called video fingerprinting has come up. Lee et al. [1] have defined a fingerprint as perceptual features forming a short summary of a multimedia object. The goal of video fingerprinting is to judge whether two videos have the same contents even under quality-preserving distortions like resizing, frame rate change and lossy compression [2]. Video fingerprinting is also commonly referred to as video copy detection. There are two basic approaches to address the issue of copyright detection – watermarking and content-based copy detection. In the first approach, a watermark/non-visible information is embedded into the content and later, if required, this
embedded information is used for establishing the ownership. On the other hand, in the content-based approach, no additional information is inserted. It is said that "Video itself is the watermark" [3]. Unique signatures (features) are derived from the content itself. Such signatures are also extracted from the questioned version of the media and are compared with those of the original media stored in the database [4,5,6,7,3]. Apart from protecting the rights, copy detection may also help in media tracking [8], i.e., determining how many times a particular media item is being used. The performance of a copy detection scheme relies on suitable signature extraction and a sequence matching scheme. The system must survive in the presence of the various distortions adopted by the copier. In this work, we have focused on the aspect of sequence matching. The paper is organized as follows. After this introduction, Section 2 presents a brief review of video copy detection techniques. Section 3 describes the proposed methodology. Experimental results are presented in Section 4, and finally, conclusions are drawn in Section 5.
2 Past Work
Features of a video copy detection system must satisfy the properties outlined in [2]. They must be robust, so that the fingerprints of a degraded video and of the original are similar. They should be pairwise independent, so that perceptually different videos have different fingerprints. Finally, the fingerprint must support fast search, i.e., it should be search efficient. Various features like colour histograms [9,10], luminance based descriptors [11,12,13] and dominant colour [3] have been tried. Various gradient based features [2,14] are also used. Joly et al. [15] considered local descriptors based on the Harris detector. Wu et al. [16] have suggested trajectory based visual patterns. Temporal ordinal measurement has been proposed as a global descriptor by Chen and Stentiford [17]. A DCT based hash algorithm has been used by Coskun et al. [18]. Ordinal measures [19] and combinations of spatio-temporal information [11] have also been used as signatures. Maani et al. [20] have developed local descriptors for identified regions of interest based on angular intensity variation and region geometry. The test/query video and those in the database are to be matched on the basis of the extracted signature. This matching is a crucial part of a video copy detection system. A variety of schemes have been tried by researchers. In [11], a spatio-temporal measure to compute the similarity between two video sequences has been presented, and it relies on a threshold for detecting a copy. Moreover, computing the distance with all the database clips is prohibitive. The schemes presented in [2,21] also suffer from the problem of threshold selection. Wu et al. [16] had to take on the burden of computing a huge similarity matrix, and in the hash function based scheme [18], selection of a suitable hash function is difficult. Moreover, a hash function is very sensitive to changes in the content, and making the system robust against distortion is a great challenge. Various schemes like the ordinal measure based technique [19] and histogram intersection of DCT frames [22] have been proposed. Similarity between two sequences has also been measured by calculating the number of frames matched between two
shots [23]. Again, such comparisons have to be carried out with all the sequences in the database. Shen et al. [24] proposed to compute similarity based on the volume of intersection between two hyper-spheres governed by the video clips. Schemes based on indexes built over the signatures are also found [12,25,26]. Several keyframe based schemes have been reported. Jain et al. [27] have proposed a keyframe (or sub-sampled frame) based sequence matching method. Similar approaches have also been reported in [28,9,29]. Various clustering based schemes [30,23] have also been tried. Frames are clustered and one or more keyframes are extracted from each cluster. Comparison is restricted to keyframes only. Maani et al. [20], in their technique, selected a set of matched keyframes from the database corresponding to each keyframe in the query sequence. From the matched set of keyframes, they tried to find continuous subsequences. If the length of such a subsequence exceeds a threshold, it is considered a copy. The scheme reduces computation, as the final matching is restricted to a limited set of database sequences. But the selection of the threshold poses a problem. Thus, it appears from the past work that sequence matching is an important issue and it demands attention.
3 Proposed Methodology
In a video copy detection method, the task is to verify whether or not a test/query sequence is a copied version of a sequence present in the database. It has already been discussed that such a system consists of two major modules, namely the extraction of the signature (feature vector) and sequence matching. Signatures must fulfill the diverging criteria of discriminating capability and robustness against various geometric and signal distortions. The sequence matching module bears the responsibility of devising the match strategy and verifying the test sequence against the likely originals in the database. In this work, we put our effort into developing the sequence matching module. We have relied on a hypothesis test based strategy for this purpose. As the video signature is likely to be multi-dimensional, we have considered multivariate Wald-Wolfowitz run test [31] based hypothesis testing.
3.1 Multivariate Wald-Wolfowitz Test
The Wald-Wolfowitz runs test is used to solve the two-sample similarity problem for non-parametric distributions. Suppose there are two samples X and Y of size m and n respectively, and the corresponding distributions are Fx and Fy. H0, the null hypothesis to be tested, and H1, the alternative hypothesis, are as follows:
H0: X and Y are from the same population, i.e. Fx = Fy
H1: They are from different populations, i.e. Fx ≠ Fy
In the classical Wald-Wolfowitz test, it is assumed that the sample points are univariate. The N = n + m observations are sorted in ascending order and the labels X or Y are assigned to them depending on the sample to which they belong. Friedman and Rafsky [32] have suggested a multivariate generalization by using the minimal spanning tree (MST) of the sample points. In this approach each sample point
is considered as a node and every node is connected to its closest node (based on the similarity between their feature vectors) to form the MST of the sample points. Now, if we remove all the edges connecting pairs of points coming from two different samples, each subtree formed would consist of samples from one and only one population and is equivalent to a run of the univariate case. Thus the number of nodes in each subtree is equivalent to the run length and R, the number of subtrees, is equivalent to the number of runs. The test statistic W is defined as

W = (R − E[R]) / √Var[R]    (1)

where E[R] = 2mn/N + 1 and

Var[R] = [2mn / (N(N − 1))] × [(2mn − N)/N + ((C − N + 2) / ((N − 2)(N − 3))) × (N(N − 1) − 4mn + 2)],

C being the number of edge pairs in the MST sharing a common node. As W follows the standard normal distribution, a critical region may be chosen for a given level of significance, which signifies the maximum probability of rejecting a true H0. If W falls within the critical region, H0 is rejected. Physically, a low value of R expresses that the two samples are less interleaved in the ordered list, and this leads to the interpretation that they are from different populations.
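For concreteness, a direct implementation of Eq. (1) on top of an MST is sketched below using NumPy/SciPy. The Euclidean distance is assumed for building the MST; the statistic follows the expressions for E[R] and Var[R] given above.

```python
import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import minimum_spanning_tree

def ww_mst_test(X, Y):
    """Multivariate Wald-Wolfowitz run test of Friedman and Rafsky [32].

    X (m x d) and Y (n x d) are the two samples of frame feature vectors.
    Returns the standardized statistic W of Eq. (1); a large negative W
    (few runs) is evidence against H0, i.e. against the two sequences
    coming from the same population.
    """
    m, n = len(X), len(Y)
    N = m + n
    Z = np.vstack([X, Y])
    labels = np.array([0] * m + [1] * n)

    # MST over the pooled sample (Euclidean distances assumed)
    D = cdist(Z, Z)
    mst = minimum_spanning_tree(D).toarray()
    edges = np.transpose(np.nonzero(mst))          # the N - 1 MST edges (i, j)

    # R = number of subtrees left after cutting the between-sample edges
    cross = sum(labels[i] != labels[j] for i, j in edges)
    R = cross + 1

    # C = number of edge pairs sharing a common node = sum over nodes of deg*(deg-1)/2
    deg = np.zeros(N)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    C = np.sum(deg * (deg - 1) / 2.0)

    ER = 2.0 * m * n / N + 1.0
    VarR = (2.0 * m * n / (N * (N - 1))) * (
        (2.0 * m * n - N) / N
        + ((C - N + 2.0) / ((N - 2) * (N - 3))) * (N * (N - 1) - 4.0 * m * n + 2.0)
    )
    return (R - ER) / np.sqrt(VarR)
```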
3.2 Sequence Matching
A video sequence is a collection of frames. Each frame is described by an n-dimensional feature vector. Thus, a sequence may be thought of as {Vi}, the set of feature vectors, where Vi is the feature vector corresponding to the i-th frame in the sequence. Let St and Sd be the test sequence and a sequence from the database which are to be compared. Signatures are extracted for St and Sd to obtain the sets of feature vectors {Vt} and {Vd} respectively. Thus, {Vt} and {Vd} may be thought of as two samples, and the hypothesis testing described in Section 3.1 can be applied to verify whether they are from the same population or not. If {Vt} and {Vd} belong to the same population, it is declared that the sequences are similar. As the database consists of a large number of sequences, it is prohibitive to compare the test sequence with each and every sequence in the database. In order to address this issue and to reduce the number of sequences, we outline the proposed scheme as follows (a sketch of the resulting pipeline is given after this list).
– Obtain Kd, the collection of keyframes (representative frames) extracted from all the video sequences in the database.
– Obtain Kt, the collection of keyframes extracted from the test sequence.
– For each keyframe in Kt, find the most similar one (in terms of feature vector) from Kd to obtain the matched keyframe set, Km.
– Form a candidate sequence set, Sc, by taking the sequences corresponding to the keyframes in Km.
– Verify St only against each Sd (where Sd ∈ Sc) using the multivariate Wald-Wolfowitz test.
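A compact sketch of this pipeline is given below. The helpers frame_features() and extract_keyframes() stand in for the wavelet signature of Section 3.3 and the keyframe selection of [33], the database object is a hypothetical container exposing its keyframes and per-sequence features, and the 5% one-sided critical value is only an example; none of these names come from the paper.

```python
import numpy as np

def is_copy(test_frames, database, z_alpha=1.64):
    """Sketch of the proposed matching scheme (hypothetical helper names)."""
    test_feats = np.array([frame_features(f) for f in test_frames])
    test_keys = extract_keyframes(test_feats)

    # candidate set Sc: sequences owning the best-matched database keyframes
    candidates = set()
    for k in test_keys:
        # database.keyframes is assumed to be a list of (sequence_id, keyframe_feature) pairs
        best = min(database.keyframes, key=lambda item: np.linalg.norm(k - item[1]))
        candidates.add(best[0])

    # hypothesis test against the candidates only
    for seq_id in candidates:
        W = ww_mst_test(test_feats, database.features(seq_id))
        if W > -z_alpha:        # few runs (large negative W) would reject H0
            return seq_id       # H0 not rejected: declared a copy of seq_id
    return None                 # no candidate passes the test
```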
Further refinement of the scheme may be done based on the following considerations. In order to avoid the exclusion of any possible candidate sequence, instead of only the best matched keyframe, a few of the top-ranked matched keyframes may be considered to generate Km. This may increase the size of Sc, but the growth of the size of Sc can be controlled by putting higher priority on the candidate sequences with a higher number of matched keyframes. An indexing scheme may be employed to search for similar keyframes in the database to reduce the cost of the matching scheme.
3.3 Extraction of Signature
It has been outlined earlier that our focus is on the sequence matching technique. However, in order to verify the effectiveness of the proposed scheme, we need to extract signatures for the frames of a video sequence. For this purpose we have relied on wavelet based features.
Fig. 1. Wavelet decomposition (LL, HL, LH and HH sub-bands)
We have considered the grayscale version of the image, and it has been decomposed into four sub-bands (LL, LH, HL and HH), as shown in Fig. 1, using the 2-dimensional Haar wavelet transformation. Thus, the average intensity or low frequency component is retained in the LL sub-band, and the other three sub-bands contain the high frequency components, i.e., the details of the image. The energy of the values in each sub-band is taken as a feature. The decomposition is continued iteratively, treating the LL sub-image as the image. Normally, along with the energy, the average intensity is also considered as a feature. But, as it is more affected by common attacks like changes in brightness and contrast, we have relied only on the energy. In successive iterations, as we deal with the average image in the LL band, the impact of noise also gets reduced, which enables us to cope with some specific attacks. In our experiment, we have considered 5 levels of decomposition to obtain a 20-dimensional feature vector as the signature of each frame.
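A self-contained sketch of this signature is given below: a hand-rolled one-level Haar decomposition applied iteratively to the LL band, collecting the four sub-band energies at each of five levels (4 × 5 = 20 features). The exact filter normalization and energy definition used by the authors are not specified, so plain averaging/differencing and sum-of-squares energy are assumed.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar-style decomposition: returns (LL, LH, HL, HH)."""
    img = img[:img.shape[0] // 2 * 2, :img.shape[1] // 2 * 2].astype(float)
    a, b = img[0::2, :], img[1::2, :]            # pair adjacent rows
    lo_r, hi_r = (a + b) / 2.0, (a - b) / 2.0    # vertical low / high bands
    c, d = lo_r[:, 0::2], lo_r[:, 1::2]          # pair adjacent columns of the low band
    e, f = hi_r[:, 0::2], hi_r[:, 1::2]
    LL, LH = (c + d) / 2.0, (c - d) / 2.0
    HL, HH = (e + f) / 2.0, (e - f) / 2.0
    return LL, LH, HL, HH

def frame_signature(gray_frame, levels=5):
    """20-dimensional frame signature: four sub-band energies per level."""
    feats, band = [], gray_frame
    for _ in range(levels):
        LL, LH, HL, HH = haar_dwt2(band)
        feats.extend(float(np.sum(s ** 2)) for s in (LL, LH, HL, HH))
        band = LL                                # iterate on the LL sub-image
    return np.array(feats)
```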
4 Experimental Results
We have carried out our experiment in a focused manner to judge the effectiveness of the proposed hypothesis test based sequence matching technique. As we
have not implemented the whole system, we have used a database of manageable size, which is sufficient to verify the core part of the matching strategy. We have worked with a database consisting of 1000 sequences obtained from various recordings of news, sports and documentary sequences and from the TRECVID 2001 and 2005 databases. 300 test sequences have been prepared in various ways. 50 of them are chosen from video sequences which are not part of the database, and for these no match is expected when they are matched against the database sequences. Test sequences are also generated by randomly selecting frames from a database sequence. We have considered around 100 such sequences. For the rest of the test sequences, we have incorporated various attacks.
Fig. 2. (a) Sample frames from different sub-parts of a database sequence (b) Sample frames from test sequence
Fig. 3. (a) Sample frames from a database sequence (b) Frames after contrast enhancement
Fig. 4. (a) Sample frames from a database sequence (b) Frames with increased brightness
A sequence may have considerable variation, so that visually it may be thought of as a collection of different sub-sequences. Sampled frames of one such sequence are shown in Fig. 2(a). A test sequence is generated focusing on one part only (see Fig. 2(b)). While copying a sequence, the copier may change the contrast or brightness. Test sequences have been generated by incorporating such attacks (see Fig. 3 and 4). As shown in Fig. 5, test sequences have also been generated by adding random noise to the sampled frames of the original sequence. A few test sequences have also been generated by applying red, green and blue filters (see Fig. 6).
Fig. 5. (a) Sample frames from a database sequence (b) Frames corrupted by noise
For all the frames in the database, feature vectors are computed, and keyframe(s) for all the sequences are obtained following the methodology proposed in [33]. The keyframes are stored in the database. In the same way, the frames of the test sequence are described by their feature vectors and keyframe(s) are selected. In the present scope of work, the set of candidate sequences is obtained by comparing the keyframes of the test sequence with those in the database in an exhaustive manner. The Euclidean distance between the feature vectors is taken as the measure of dissimilarity between two frames. Corresponding to each keyframe in the test sequence, the database sequence containing the best matched keyframe is included in the candidate set. Finally, the test sequence is matched with those in the candidate set using the hypothesis test based scheme.
Fig. 6. (a) Sample frames from a database sequence (b), (c), (d) Frames obtained applying red, green and blue filter respectively

Table 1. Copy Detection Performance

Sampled Frames | Contrast Changed | Brightness Changed | Noise Added | Filtered image
100%           | 82%              | 84%                | 91%         | 76%
The scheme is successful in identifying all the true-negative cases (the cases where the sequence is truly not a copy), i.e., all the test sequences taken from outside the database are correctly detected as not being copies. Table 1 shows the performance of the proposed scheme under various attacks. The sequences obtained by randomly selecting frames from a database sequence have also been correctly identified as copies. In the case of the other attacks, the performance is quite satisfactory but there are certain cases of failure as well. Addition of noise increases the spread of the feature values, but the runs in the hypothesis test are less affected. As a result, the proposed scheme can identify the copy in most of the cases. As long as the noise affects up to 40% of the pixels and the intensity values are moderately modified, the sequence was identified as a copy. Variation in contrast leads to a considerable change in the energy of the wavelet sub-bands, and the scheme cannot withstand such variation beyond a limit. For brightness variation, the change in sub-band energy is less significant than that due to contrast variation, but due to the brightness shift the runs get affected. It may be noted that wide variation affects the quality of the copy heavily, and such degradation makes it of no use. The copied version of a sequence may also be obtained by applying a colour filter to the original one.
For such filtered versions, mixed results have been achieved. The details of a particular colour are lost, leading to a significant change in the energy of the sub-bands. Thus, if the presence of a colour is insignificant in a sequence, then the corresponding filtered version does not provide much of a clue about the original one and the scheme fails. More devoted effort is required to judge the performance against such attacks, and it has to depend on the features/signatures used to represent the frames. In general, the experiment has established that the proposed methodology has strong potential in addressing the issue of sequence matching, even under the possible attacks.
5 Conclusion
In this work, we have presented a novel scheme for video copy detection. By comparing the keyframes of the test sequence and the database sequences, a subset of the database sequences is taken as the candidate set. Finally, we have proposed a multivariate Wald-Wolfowitz run based hypothesis testing scheme to verify whether the test sequence and any sequence of the candidate set are the same or not. Experimental results show that the proposed sequence matching scheme is effective enough to detect copies. It is also evident that the proposed methodology can cope with commonly deployed attacks. In future, further work may be carried out on developing features to handle the attacks in a more elegant manner, and a suitable indexing scheme may also be used to work with large databases.
References
1. Seo, J.S., Jin, M., Lee, S., Jang, D., Lee, S.J., Yoo, C.D.: Audio fingerprinting based on normalized spectral subband centroids. In: Proc. ICASSP, pp. 213–216 (2005)
2. Lee, S., Yoo, C.D.: Video fingerprinting based on centroids of gradient orientations. In: Proc. ICASSP, pp. 401–404 (2006)
3. Hampapur, A., Bolle, R.: Comparison of sequence matching techniques for video copy detection. In: Proc. Intl. Conf. on Multimedia and Expo, pp. 188–192 (2001)
4. Chang, E.Y., Wang, J.Z., Li, C., Wiederhold, G.: RIME: A replicated image detector for the world-wide-web. In: Proc. SPIE Multimedia Storage and Archiving Systems III, pp. 68–71 (1998)
5. Kim, C.: Ordinal measure of DCT coefficients for image correspondence and its application to copy detection. In: Proc. SPIE Storage and Retrieval for Media Databases, pp. 199–210 (2003)
6. Kim, C.: Content-based image copy detection. Signal Process. Image Comm. 18(3), 169–184 (2003)
7. Mohan, R.: Video sequence matching. In: Proc. ICASSP, pp. 3697–3700 (2001)
8. Hampapur, A., Bolle, R.: Feature based indexing for media tracking. In: Proc. Intl. Conf. on Multimedia and Expo, pp. 67–70 (2000)
9. Cheung, S.C.S., Zakhor, A.: Efficient video similarity measurement with video signature. IEEE Trans. CSVT 13(1), 59–74 (2003)
10. Li, Y., Jin, L., Zhou, X.: Video matching using binary signature. In: Proc. Intl. Symp. on Intelligent Signal Processing and Comm. Systems, pp. 317–320 (2005)
11. Kim, C., Vasudev, B.: Spatiotemporal sequence matching for efficient video copy detection. IEEE Trans. CSVT 15(1), 127–132 (2005)
12. Oostveen, J., Kalker, T., Haitsma, J.: Feature extraction and a database strategy for video fingerprinting. In: Chang, S.-K., Chen, Z., Lee, S.-Y. (eds.) VISUAL 2002. LNCS, vol. 2314, pp. 117–128. Springer, Heidelberg (2002)
13. Hua, X.S., Chen, X., Zhang, H.J.: Robust video signature based on ordinal measure. In: Proc. ICIP, pp. 685–688 (2004)
14. Lowe, D.G.: Object recognition from local scale invariant features. In: Proc. ICCV, pp. 1150–1157 (1999)
15. Joly, A., Buisson, O., Frelicot, C.: Content-based copy retrieval using distortion-based probabilistic similarity search. IEEE Trans. Multimedia 9(2), 293–306 (2007)
16. Wu, X., Zhang, Y., Wu, Y., Guo, J., Li, J.: Invariant visual patterns for video copy detection. In: Proc. ICPR, pp. 1–4 (2008)
17. Chen, L., Stentiford, F.W.M.: Video sequence matching based on ordinal measurement. Technical Report No. 1, UCL Adastral (2006)
18. Coskun, B., Sankur, B., Memon, N.: Spatio-temporal transform based video hashing. IEEE Trans. Multimedia 8(6), 1190–1208 (2006)
19. Bhat, D.N., Nayar, S.K.: Ordinal measures for image correspondence. IEEE Trans. PAMI 20(4), 415–423 (1998)
20. Maani, E., Tsaftaris, S.A., Katsaggelos, A.K.: Local feature extraction for video copy detection. In: Proc. ICIP, pp. 1716–1719 (2008)
21. Yan, Y., Ooi, B.C., Zhou, A.: Continuous content-based copy detection over streaming videos. In: Proc. Intl. Conf. on Data Engg., pp. 853–862 (2008)
22. Naphade, M., Yeung, M., Yeo, B.: A novel scheme for fast and efficient video sequence matching using compact signatures. In: Proc. SPIE Conf. Storage and Retrieval for Media Databases, vol. 3972, pp. 564–572 (2000)
23. Chen, L., Chua, T.S.: A match and tiling approach to content-based video retrieval. In: Proc. Intl. Conf. on Multimedia and Expo (2001)
24. Shen, H., Ooi, B.C., Zhou, X.: Towards effective indexing for very large video sequence database. In: Proc. SIGMOD, pp. 730–741 (2005)
25. Schmid, C., Mohr, R.: Local grayvalue invariants for image retrieval. IEEE Trans. PAMI 19(5), 530–535 (1997)
26. Zhao, H.V., Wu, M., Wang, Z.J., Liu, K.J.R.: Forensic analysis of nonlinear collusion attacks for multimedia fingerprinting. IEEE Trans. IP 14(5), 646–661 (2005)
27. Jain, A.K., Vailaya, A., Xiong, W.: Query by clip. Multimedia System Journal 7(5), 369–384 (1999)
28. Chang, S.F., Chen, W., Meng, H.J., Sundaram, H., Zhong, D.: VideoQ: An automated content based video search system using visual cues. In: ACM Multimedia, pp. 313–324 (1997)
29. Sze, K.W., Lam, K.M., Qiu, G.: A new keyframe representation for video segment retrieval. IEEE Trans. CSVT 15(9), 1148–1155 (2005)
30. Guil, N., Gonzalez-Linares, J.M., Cozar, J.R., Zapata, E.L.: A clustering technique for video copy detection. In: Proc. Iberian Conf. on Pattern Recog. and Image Analysis, pp. 451–458 (2007)
31. Wald, A., Wolfowitz, J.: On a test whether two samples are from the same population. Annals of Mathematical Statistics 11, 147–162 (1940)
32. Friedman, J.H., Rafsky, L.C.: Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. The Annals of Statistics 7(4), 697–717 (1979)
33. Mohanta, P.P., Saha, S.K., Chanda, B.: Detection of representative frames of a shot using multivariate Wald-Wolfowitz test. In: Proc. ICPR, Florida, USA (2008)
An XML-Based Digital Textbook and Its Educational Effectiveness

Mihye Kim1, Kwan-Hee Yoo2,∗, Chan Park2, Jae-Soo Yoo2, Hoseung Byun2, Wanyoung Cho2, Jeeheon Ryu3, and Namgyun Kim4

1 Department of Computer Science Education, Catholic University of Daegu, 330 Hayangeup Gyeonsansi Gyeongbuk, South Korea
[email protected]
2 Department of Computer Education and IIE, Chungbuk National University, 410 Seongbongro Heungdukgu Cheongju Chungbuk, South Korea
{khyoo,szell,yjs,hobyun,wycho}@chungbuk.ac.kr
3 Department of Education, Chonnam National University, 77 Yongbongro Bukgu Kwangju Chongnam, South Korea
[email protected]
4 Department of Mathematics Education, Cheongju National University of Education, 330 Cheongnamro Heungdukgu Cheongju Chungbuk, South Korea
[email protected]
Abstract. Textbooks are undergoing a transformation into digital textbooks, which can offer a diverse range of supplementary digital media functions including sounds, audiovisuals, animations, 3D graphics and other state-of-the-art multimedia features. This paper proposes such an XML-based digital textbook, aiming to maximize learning effectiveness by integrating the advantages of traditional printed textbooks with additional digital media functions. The functions of the digital textbook are defined, and an XML document format is established to facilitate more flexible use and interoperability of digital textbooks among different users and providers. As an application of these proposals, a digital textbook for sixth-grade mathematics was developed and then tested for two semesters at three elementary schools to assess the overall effectiveness of the proposed concept. Our results indicate that classroom use of the digital mathematics textbook is comparable to that of a printed textbook, but the digital version offers more diverse learning opportunities and facilitates the improvement of learning achievement.
Keywords: Digital textbook, XML-based digital textbook, Mathematics digital textbook, Educational effectiveness of digital textbook.
1 Introduction

Rapid advances in digital technology, along with concurrent developments in Information and Communication Technology (ICT), are making computers and related devices a ubiquitous part of life. In a digital-based learning environment, the textbooks used in schools are also undergoing a transformation into digital textbooks (hereafter,
DTs), which offer greater flexibility to teachers and students than traditional analog media [1], [2]. Students born into the digital age, that is, the so-called 'digital natives' or 'Google generation' born after 1993, prefer electronic sources of information [3], [4], because they have been immersed in learning environments that are different from those familiar to 'analog natives' [5]. To address these changing needs, various types of DTs have been developed [6], [7], [8]. Unlike ordinary books, textbooks are the primary learning resources used in schools. Accordingly, DTs are defined as curriculum-based digitized textbooks designed to replace printed textbooks in schools via a desktop computer or proprietary terminal through wired or wireless networks without time or space limitations. They can provide all the functions of traditional printed textbooks, as well as the added benefit of various types of digital media. In addition, they can be swiftly updated with the latest information, and are also capable of providing a much wider range of learning activities by linking to other educational materials [5]. To date, however, most existing DTs are used as supplementary materials to existing printed textbooks, rather than as stand-alone replacements. They are often included as part of E-learning content [1], [5], [9]. In addition, the traditional learning habits of students using printed textbooks have not been considered in the development of DTs, resulting in minimal use [1]. A study of DT design [10] pointed out that students are more familiar with DTs that are similar to printed textbooks, and they interact with DTs according to their experience with paper textbooks. Our aim is to develop DTs in close adherence to the paradigm of traditional printed textbooks, combining the advantages of paper textbooks with those of digital media, such as searching and navigation, audiovisuals, animations, 3D graphics and other state-of-the-art multimedia features, to make the learning experience more convenient and effective [1], [2], [11]. Another objective is to facilitate the interoperability of DTs among different users and providers by developing an XML-based document format. The main difference between previous work and our work is an emphasis on teaching and learning functions for the DTs beyond simple textbook digitization; that is, the proposed DT integrates teaching and learning activities into the process of utilizing the DT through various multimedia learning support functions. Note that the contents of this paper are based on studies [2], [12], [13] conducted by the Korea Education and Research Information Service (KERIS). This paper is organized as follows. Section 2 presents a review of existing literature on DTs. Section 3 provides the functions of the proposed DT and an XML document format compatible with a variety of DTs. In Section 4, a DT for sixth-grade mathematics is presented as an application of the proposed DT. Section 5 describes the results of an experimental trial using the mathematics DT with students. The paper concludes with a discussion of possible directions for future research.
2 Related Work

DTs are used in a comparatively limited way to facilitate the educational process in schools, utilizing a desktop computer, tablet PC, or proprietary terminal. DTs were originally known as 'electronic textbooks', emphasizing their formal and functional
aspects, but since 2007 they have been referred to as 'digital textbooks' to emphasize their teaching and learning functions [14]. In Korea, earlier studies in this area focused primarily on DTs with simple contents and technologies that were not designed to completely replace printed textbooks. Rather, they were conceived of as supplementary materials for printed textbooks, delivered through a website [1], [5]. Since 2005, however, the direction of research has been toward developing DTs as a replacement for printed textbooks. In 2005, KERIS directed a study on the standardization of Korean DTs [8]. The following year, a draft of a Korean standard for DTs was created [2], and a study to measure the effectiveness of DTs was carried out [12]. In 2007, the Korean Ministry of Education & Human Resources Development established and began to promote a mid- and long-term DT commercialization strategy, the objective of which was the formulation of a DT model. The ministry has also planned an experimental trial of DTs in 100 schools nationwide by 2011, and an initiative to distribute DTs to all schools by 2013 [15]. At the international level, there have been a number of case studies in this area. One of these was an electronic textbook project named 'eduPAD' launched by Singapore's Ministry of Education in September 1999 [16]. The eduPAD was a portable computerized device with a variety of functions, such as the display of animation, the use of hyperlinks, and wireless Internet capability. In 2000, a pilot test of eduPAD was conducted with 160 first-year middle school students. Contrary to project expectations, however, the system did not promote dynamic interaction between students and teachers or facilitate collaborative peer learning or independent learning. Another case study on electronic textbooks was the MalayBook project in Malaysia [17]. The MalayBook was a slightly modified educational version of the standard Psion netBook that was developed for the distribution of e-books. In 2002, several thousand MalayBooks capable of substituting for paper textbooks were produced and distributed to about 8,000 students for a test run. However, the MalayBook included only the contents of printed textbooks and supported no additional learning functionality; hence it was not suited to satisfying students' diverse learning needs, and the project failed [18]. In the United States, research has focused more on systems for digitizing printed materials that are intended primarily for higher education, rather than for elementary and secondary schools. Many universities, including the University of Texas [19], California State University [20], the University of Denver [21], and the University of Phoenix, have developed web-based electronic textbooks for online education. The digitization of textbooks and distribution of DTs seem to be progressing more rapidly in higher education than in elementary and secondary schools. In fact, the purchase of textbooks at UK universities has gradually declined over the years [22], and schools are becoming increasingly interested in digital courseware solutions. In addition, terminals such as the Rocket e-Book, Softbook Reader, and GoReader, which are used exclusively with electronic textbooks, have been introduced into the market. GoReader [23], [24] has many features that students use in printed textbooks, such as a highlighting pen, a pencil, post-it notes, and writing functions.
It has storage space for about 150 textbooks, but offers only reading, highlighting, and on-screen memos, rather than more useful learning functions. Very recently, the California Digital Textbook initiative was enacted to make free DTs available by next fall to high school math and science classes throughout California [25].
The UK government conducted the Electronic Books ON-screen Interface (EBONI) project, which developed guidelines for the design of electronic textbooks and e-book hardware for higher education [26], [27]. In addition, many other countries, including Japan, Canada, and countries in Europe, are sponsoring research on e-learning and DTs.
3 An XML-Based Digital Textbook The proposed DT was developed based on an XML document format. Aside from XML, there are other candidate representation languages such as HTML, PDF, and SGML. Among these, HTML and PDF do not allow the logical and systematic expression of contents; PDF, moreover, does not allow the separation of document contents and styles. In contrast, XML can represent content and style separately and can logically structure contents, supporting searching, linking, and document-styling functions. XML was designed not only to make it easy to use SGML on the Web but also for ease of implementation and for interoperability with both SGML and HTML. For these reasons, we adopted XML as the representation language for the document format of the proposed DT. 3.1 Functions of the Digital Textbook The functions of the DT are divided into eight categories: authentication, display, input, move, search, print, multimedia, and learning support. These functions are further divided into sub-functions, and each of these is given a detailed definition that includes a name, description, types of input values, examples, and references. Due to space limitations, only a summary of the DT functions is presented here, in Table 1. 3.2 XML-Based Document Format for the Digital Textbook The XML document format of the DT is defined in two categories: basic and supplementary information. The document format for basic information is designed to present the original contents of the DT. The format for supplementary information is designed to present the various types of additional information that users create while using the DT, via the operating environment or a viewer. The XML-based document format for DT contents refers to a Document Type Definition (DTD) that defines the schema for the XML document structure. The XML elements and attributes are defined in accordance with the DTD; they can equivalently be defined in accordance with an XML schema. XML makes it possible to distinguish the content of a document in terms of meaning, rather than in terms of style, and it represents the contents in a hierarchy. An XML document does not include any information on style, because its style is created using the eXtensible Stylesheet Language (XSL) or Cascading Style Sheets (CSS). Similarly, the presentation of the DT is defined via XSL or CSS, so that users can view the contents of the DT regardless of the type of style sheet language used.
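To make this separation of content and style concrete, the short sketch below pairs a content fragment with an XSL template that renders it. The element names ('section', 'title', 'para') and the HTML output are illustrative assumptions only and are not taken from the DT format described in this paper.

  <!-- content fragment (element names are hypothetical) -->
  <section>
    <title>Fractions</title>
    <para>A fraction represents a part of a whole.</para>
  </section>

  <!-- an XSL style sheet that turns the same content into a viewer-specific presentation -->
  <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="section">
      <div class="dt-section">
        <h2><xsl:value-of select="title"/></h2>
        <xsl:apply-templates select="para"/>
      </div>
    </xsl:template>
    <xsl:template match="para">
      <p><xsl:value-of select="."/></p>
    </xsl:template>
  </xsl:stylesheet>

Because the content carries no style of its own, the same fragment could equally be rendered through a CSS rule set, which is what allows the DT to be viewed regardless of the style sheet language chosen.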
Table 1. Summary of the digital textbook functions
- Authentication Function: User authentication (register a user name and password), Enter a user name in a DT, Login
- Display Function: Display texts or images, View by page unit (single or double page), Zoom in and out, Fit the page to the screen, Page scroll, Fit to width and/or to height of a page, Indicate page thickness and page number, Text hide
- Input Function: Writing (Stylus writing, Save, delete writing), Memo & Notes (Enter, save, edit, open, view, delete memo/notes, Select memo pen color, Create a table of memo contents, Assign, move, resize a memo/note window, Indicate the date and time of notes, Save notes as an image or a text file), Underline & Highlight (Select a color, shape, thickness for underline/highlight, Save, edit, delete an underline and highlight), Voice memo (Record, play, delete voice memo), Textbox (Create, save, edit, delete a textbox, View the content of a textbox hyperlink), Create input box, Formula (Enter, edit, delete, view a formula)
- Move Function: Navigation function (Move to a particular page in the DT by the table of contents and tags, by previous, next buttons, and by entering page number or page turning), Bookmark (Set, view, save, edit, delete a bookmark, Move to previous or next bookmark, Set bookmark for the log-out page)
- Search Function: Search within a DT or among DTs (by a keyword or a multimedia object)
- Print Function: Print function (Print a specific page, section, chapter, whole), Print memo and notes (in part or in full), Copy function (Copy a specific text or image to a word processor), Sound effects (Click, select, error sound effect)
- Multimedia Function: Multimedia (View pictures, 3D motion graphics, Animations, Audiovisuals or Virtual reality, Open multimedia objects in new window), Interactive multimedia (View interactive multimedia, Open interactive multimedia objects in new window)
- Learning Support Function: Hyperlink (Create hyperlink), Glossary (View, search glossary), Subject menu (Construct, view additional menus for a subject), Data transmission (Teacher to student, to group, to class), Formative/Summative evaluations for individuals, groups, or the entire class (View questions, Solve questions, View evaluation results and statistical data)
The XML document format is constructed in a hierarchical structure as shown in Fig. 1. This structure of the XML format for a DT starts with the top-level element ‘dt’, which consists of the ‘metainfo’ and ‘textbooks’ elements; the former includes basic and logical information about the DT, while the latter designates the contents of one or more actual textbooks. The element ‘dc-metainfo’ is divided into the 15 elements (dc:***) defined by the Dublin Core Metadata Initiative (http://dublincore.org/) and the user-defined element ‘x-metadata’. A DT service unit can consist of more than one textbook, and hence may include a collection of textbooks. The basic structure of a collection consists of the ‘cover’, ‘front’, ‘textbook(s)’, and ‘back’ elements in a hierarchy, and similarly each single textbook is structured with the ‘cover’, ‘front’, ‘body’, and ‘back’ elements, where body is the main part of a textbook, consisting of the ‘part(s)’, ‘chapter(s)’, ‘section(s)’, and ‘media object(s)’.
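As an illustration of the hierarchy just described, a skeleton instance of the basic-information format might look as follows. The element spellings are taken from the description above rather than from the formal DTD, which is not reproduced in this paper, so they should be read as an approximation only.

  <dt xmlns:dc="http://purl.org/dc/elements/1.1/">
    <metainfo>
      <dc-metainfo>
        <dc:title>Mathematics 6</dc:title>
        <dc:language>ko</dc:language>
        <!-- the remaining Dublin Core elements (dc:*) go here -->
      </dc-metainfo>
      <x-metadata><!-- user-defined metadata --></x-metadata>
    </metainfo>
    <textbooks>
      <cover/>
      <front/>
      <textbook>
        <cover/>
        <front/>
        <body>
          <part>
            <chapter>
              <section>
                <mediaobject/>
              </section>
            </chapter>
          </part>
        </body>
        <back/>
      </textbook>
      <back/>
    </textbooks>
  </dt>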
All elements for the basic and supplementary information of the document format are defined with a full name, description, content specification, and attributes. Because the number of elements is quite large, only the names of the basic-information elements are listed in Table 2. When using a DT in an operating environment or a viewer, users can create a variety of supplementary information. To present such information consistently in different environments, an XML document format is defined independently of the DT content document format. This supplementary information is divided into, and defined accordingly as, formats for user certification; formats for saving stylus writing, memos, notes, underlines, highlights, voice memos, textboxes, and bookmarks; and formats for the glossary, formulas, and additional menus for each subject. Additional formats for presenting other types of information can be devised as needed. Due to the limited space, the XML document structures for each format are not described in this paper.
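Since those supplementary structures are not reproduced here, the following fragment is purely a hypothetical sketch of how a saved memo and a bookmark could be recorded for one user; every element and attribute name in it is an assumption made for illustration.

  <supplement user="student01">
    <memo id="m1" page="42" created="2009-10-05T10:12:00">
      <position x="120" y="260"/>
      <pen-color>blue</pen-color>
      <content>Reduce the fraction before adding.</content>
    </memo>
    <bookmark page="57" label="Chapter 3 review"/>
  </supplement>

Keeping such records in a format that is independent of the content document is what allows the same annotations to be reloaded in a different operating environment or viewer.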
4 A Mathematics Digital Textbook As an application of the above protocols, a mathematics DT for 6th-grade elementary students in Korea was developed. A viewer was also developed to operate the mathematics DT on desktop computers, notebooks, or tablet PCs, rather than on dedicated terminals with a packaged format. The structure and contents of the mathematics DT basically adhere to the paradigm of traditional printed textbooks. DT content is displayed in a color style, just as in paper textbooks; it is stored internally as texts, images, animations, and audiovisuals, but shown in the form of a printed textbook using the style information set for the viewer. Fig. 2 shows an example of texts, tables, and images in the double-page view (the default setting, which can be changed to single-page view at any time). Specific text or multimedia objects can be selected using a mouse, electronic pen, or keyboard, and images and texts can be zoomed in and out. The thickness of the textbook is shown on the left, bottom, and right sides of the screen. The input function controls data entry through various input devices, and includes stylus writing, memo, note, underlining, highlighting, and voice memo functions. The writings, drawings, and notes are displayed on the screen exactly as they are entered. Other basic input functions include voice memo, textbox, and formula input. Fig. 3 shows an example of how the various input functions are used. Unlike a printed textbook, the DT offers various types of page movement. The basic move function is accomplished by using the Table Of Contents (TOC), the table of bookmarks or tags, entering a specific page number, or the previous or next page arrow button. The primary advantage of digital media is the search function, which allows a user to quickly and easily find particular contents or objects, not only in a single DT, but also among several DTs. Two main search methods are provided: text and multimedia search. In addition, the user can print any part of the DT, including content, notes, and memos, either in full or in part, by page, section, or chapter, via a printer or other printing device. A copy function is also supported, allowing the user to copy any specific part of the DT to a word processor or other program.
Fig. 2. An example of texts, tables, and images in the double-page view
Fig. 3. An example of stylus writing, memo, and highlighting
The multimedia support function is designed to facilitate user understanding of the content and to improve overall learning achievement through the use of multimedia features. For example, a teacher can play videos or animations at the beginning of a class to enhance the students' understanding, while simultaneously stimulating their interest and advancing the learning experience. The function supports multimedia objects such as pictures, 3D motion graphics, animations, and virtual reality. For example, diagrams and geometrical figures appearing in the mathematics textbook are presented as 3D graphics that can be moved in any direction, rather than in the 2D graphic format of printed textbooks. An electronic calculator and a dedicated program are also supported when a complex calculation is required. Multimedia objects are displayed in embedded form inside the DT, and some of them have interactive features. The right screen of Fig. 4 shows examples of 3D graphics that appear in the mathematics DT. The DT offers various learning support functions to enhance the learning process. Users can easily access any required information inside or outside of the DT, using hyperlinks via the Internet. When students need to refer to a terminological dictionary or to main concepts or terms that are critical to their understanding of a subject, the DT presents the appropriate content by using hyperlinks in the corresponding texts. Texts or multimedia objects inside the DT, as well as external websites, can be designated as hyperlink targets. Customized menus can also be designed for any subject, by defining the specific features of the subject and appending appropriate submenus. In addition, certain portions of the DT can be hidden, such as answers and explanatory material pertaining to test questions. The teacher can also send information to individuals, groups, or the whole class. Furthermore, teachers can create test questions for formative and summative evaluation. When a teacher wishes to assess the students' grasp of a topic, the appropriate questions can be sent to the students. The students then answer the questions directly on the DT and return their answers to the teacher. After receiving the answers, the teacher can immediately evaluate and analyze the results and return his or her assessment to the students, together with any comments. In this way, evaluation of student understanding is greatly facilitated, and teachers can adapt their educational or instructional approach to the level of student achievement. The teacher can determine the necessary follow-up instructional content for each student, in accordance with individual test results.
Fig. 4. Examples of test items for some problems inside the digital textbook and 3D graphics
As previously noted, the content of the proposed DT is structured to facilitate educational activities in relation to student abilities, with additional learning content for weaker students and enrichment materials for stronger students. Fig. 4 also shows examples of test items for some problems inside the DT, the answers to which can be filled in by the students.
5 Experiments The mathematics DT was subjected to a practical in-school trial and was classroom-tested for two semesters at three elementary schools. The objective of this experiment was to verify the educational effectiveness of the proposed DT and to examine the feasibility of distributing DTs in schools. The experiment was conducted by analyzing the learning patterns of students while using the DT and by administering a learning achievement test to the students. 5.1 Experimental Design The learning patterns of students were examined by directly observing how the DT was employed in the classes. To observe classroom discourse and the practical usage of the DT, class proceedings were videotaped over two semesters in three elementary schools. In each class, three video cameras were installed. The video files were then analyzed and the results were subsequently transcribed. A learning achievement test (LAT) was also carried out three times, the first at the end of the first semester, and the second and third at the end of the second semester. The LAT was measured against scores from a midterm exam conducted before the use of the DTs. The experimental group was divided into two groups and compared: one that used a DT and one that used a traditional printed textbook (PT). Table 3 lists the experimental groups and their environments. The three schools are labeled A, B, and C. School A is located in a large city, with about 35 students per class. School B, a small rural school with only one class per year and about 10 students per class, used only the DT. For a comparative study with school B, one class was selected in a third school, C, which has a social and cultural environment similar to that of school B.
Table 3. Experimental group and its environment
School | Classroom type | Textbook type | No. of classes | Number of participant students | Region | Classroom | Computing environment
A | A-D | Digital textbook | 4 | 144 | Urban | Computer room | 1 desktop PC per student, 1 electronic whiteboard
A | A-P | Print textbook | 4 | 144 | Urban | Normal classroom | 1 desktop PC, 1 projection TV
B | B-D | Digital textbook | 1 | 10 | Rural | Normal classroom | 1 tablet notebook per student, 1 electronic whiteboard
C | C-P | Print textbook | 1 | 19 | Rural | Normal classroom | 1 desktop PC, 1 projection TV
5.2 Experimental Results 5.2.1 Learning Patterns of Students The video files were analyzed to extract the learning patterns of students, with special attention to how teachers and students employed the DT for a number of curricular items. A comparative analysis of practical usage among the four different classroom types was performed. Several interviews were also conducted with the teachers who participated in the experiments, to verify the results of the video analysis. The following results were obtained from the analysis of the classroom observations. First, it was frequently observed that classes that used the DT saved considerable operational time and effort compared to those that used the printed textbook (A-P and C-P). The A-P and C-P students spent too much time working on the models presented by the teachers, and consequently did not have enough time to formulate their results and present their findings in class. In contrast, the A-D and B-D students, who were often aided by multimedia features, spent less time on operational activities, and thus had more time to formulate their results and present their findings in class. Second, in the A-D and B-D groups, weaker students who were unable to follow the teacher's instructions or keep up with other students voluntarily studied the material using the multimedia features of the DT, obtaining help from the teacher. The teachers monitored the activities of these students and confirmed their progress through question and answer. In contrast, such students in the PT classes received additional instruction from the teacher on topics they could not understand, but did not have the opportunity to explore these topics on their own. This suggests that the use of DTs will allow academically challenged students to understand the material by exploring multimedia features, instead of simply listening to repeated explanations. Third, it was often observed that students in the DT classes freely manipulated the multimedia features of the DT whenever necessary, even without the teacher's guidance. Moreover, some students formulated their own learning results by navigating the DT. This appears to support the contention that the use of DTs may not only improve the quality of the learning process but also provide wider learning opportunities to students. These results also seem to support the hypothesis that when a DT is used, students can carry out self-directed learning according to their own abilities. Fourth, the students skillfully used the various functions of the DT, just as they would have used a PT; that is, the mathematics DT was used in the same manner as any printed mathematics textbook. No students had functional difficulties with either the DT or the applications embedded in it. On the contrary, it was sometimes the teacher who asked the students how to use an application. It was thus determined that even if various media functions are included in DTs, these functions do not impede the learning process, and one may expect that a DT environment can make the overall learning process more effective. Finally, the obstacles to effective learning did not come from the DT itself but rather from the environment of the computer classroom. Such factors included the inconvenience of moving to the room in which the computers were installed and of using desktop computers to access the DT (class A-D).
In fact, the students who used a tablet PC (class B-D) were much more comfortable with the DT, and showed a higher level of satisfaction than those who used a desktop computer.
5.2.2 Learning Achievement Test A Learning Achievement Test (LAT) was carried out three times in school A, where both digital and printed textbooks were used, to verify the educational effectiveness of the proposed DT. The first LAT was administered at the end of the first semester, while the second and third were carried out at the end of the second semester. The LAT results were compared with scores from a midterm exam conducted before the DT was used, classified into four performance groups (percentile ranges of 0~24.9%, 25~49.9%, 50~74.9%, and 75~100%) and two treatment groups (DT and PT). Two hundred eighty-eight students participated in the first semester, while 230 students were tested in the second semester. Table 4 shows the results of the LAT for the two treatment and four performance groups for the first (a), second (b), and third (c) tests. Fig. 5 presents the results of Table 4 in graphical form; adjusted scores were used for the second and third test results. As shown in Table 4 and Fig. 5(a), the results of the first LAT indicate that there were no differences between the DT and PT usage groups. An ANalysis Of VAriance (ANOVA) was performed on the treatment and performance groups. The results revealed no statistically significant difference between the treatment groups (F1,280 = .00, p > .05), whereas a main effect appeared in the performance groups (F3,280 = 40.4, p < .05). This, however, appeared to arise from the learners' competence and has no direct bearing on learning effectiveness. No significant interaction was found between the treatment and performance groups (F3,280 = .15, p > .05). Regarding the results of the second LAT, the DT usage groups exhibited a lower learning achievement than the PT usage groups in general, as shown in Fig. 5(b).
Table 4. Average learning achievements of the treatment and performance groups
Fig. 5. Results of the learning achievement in Table 4
Only within the 75~100% percentile range did the average score of the DT group turn out to be slightly higher. The ANOVA results, however, showed no main effect for either the treatment groups (F1,221 = .07, p > .05) or the performance groups (F3,221 = 2.18, p > .05). Regarding the third LAT results, the DT groups showed a higher learning achievement than the PT groups, especially those within the 25~49.9% percentile range. In addition, the ANOVA results for the two treatment groups (F1,221 = 4.18, p < .05) and the four performance groups (F3,221 = 7.70, p < .05) showed significant differences. Accordingly, for more accurate verification, a test of simple main effects was performed on the performance groups. The results showed that only the 25~49.9% percentile range exhibited a significant difference (F1,225 = 4.86, p < .05); no significant differences were found in the other groups. Overall, the use of the DT was not always effective, and in most cases there was no significant difference between the DT and PT groups. However, it can be assumed that the more familiar the students become with DTs, the greater the effectiveness of the DT class will become.
6 Conclusion In this paper, an XML-based mathematics DT was proposed that closely adheres to the paradigm of the printed textbook, while offering the added advantages of digital media. The objective was to develop a DT that could transparently and seamlessly replace existing printed textbooks in elementary and secondary schools, together with an appropriate XML-based document format to support interoperability and practical usage in different environments. The aim was also to maximize learning effectiveness by providing digital media functions such as searching and navigation, and especially multimedia learning support functions. A sixth-grade mathematics DT was developed using the proposed protocols and was tested in actual classrooms for two semesters at three elementary schools, to examine the possibility of distributing DTs in schools and to verify the educational effectiveness of the proposed DT. The results showed that the mathematics DT was used in class in much the same way as a printed textbook, and that the added multimedia features could enhance the overall learning experience and learning effectiveness. Some of the experimental results indicated that the mathematics DT can facilitate teaching tasks
in accordance with student level, in addition to offering a self-directed learning environment and more diverse learning opportunities. Moreover, the DT usage class exhibited potential for improving learning achievement compared to the printed textbook class. These results could support the feasibility of using DTs as the main educational resource, replacing existing printed textbooks in elementary and secondary schools. Although this paper may help prepare the way for the possible use of DTs in schools, there are still many fundamental issues that need to be explored. These issues are related not only to the further development of DTs but also to changes in the educational system. In addition, there should be an ongoing investigation of the requirements of teachers and students via a wider range of case studies. Finally, the development of DTs must take into account the computing environment in which they will be used, so as to facilitate the improvement of learning achievement. In the sphere of information and knowledge, the outcome of any knowledge-based activity depends on who owns and utilizes the tools that support knowledge activities in the most efficient manner. DTs are important learning tools that could usher in the future of educational systems and achieve a balanced development of state-of-the-art education. They are also tools for bolstering an individual student's knowledge and improving the overall quality of the learning experience. Better education requires better tools, which in turn require continuous research and effort. Acknowledgments. This work was supported by the MEST and KOTEF through the Human Resource Training Project for Regional Innovation and by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (The Regional Research Universities Program and Chungbuk BIT Research-Oriented University Consortium).
References
1. Yoo, K., Yoo, J., Lee, S.: The present state of the standardization of digital textbooks. Review of Korean Institute of Information Scientists and Engineers 26(6), 53–61 (2008)
2. Lee, S., Yoo, J., Yoo, K., Byun, H., Song, J.: Design and implementation of e-textbook based on XML. Journal of Korea Contents Association 6(6), 74–87 (2006)
3. Tonkery, D.: E-Books Come of Age with Their Readers. Research Information 24, 26–27 (2006)
4. Rowlands, I., Nicholas, D., Williams, P., Huntington, P., Fieldhouse, M., Gunter, B., Withey, R., Jamali, H.R., Dobrowolski, T., Tenopir, C.: The Google generation: the information behaviour of the researcher of the future. Aslib Proceedings: New Information Perspectives 60(4), 290–310 (2008)
5. Yim, K.: Future education and digital textbook. Journal of Korea Textbook Research, Korea Textbook Research Foundation 51, 6–12 (2007)
6. Kim, N.: Design and implementation of electronic textbook for high school based on XML. Master Thesis, Graduate School of Education, Yonsei University, Seoul, Korea (2001)
7. Sohn, W., Ko, S., Choy, Y., Lee, K., Kim, S., Lim, S.: Development of a standard format for eBooks. In: Proc. of the 2002 ACM Symposium on Applied Computing, pp. 535–540 (2002)
8. Kim, E.: Design and implementation of electronic textbook of music for elementary school based on XML. Master Thesis, Graduate School of Dongguk University, Seoul, Korea (2004)
9. Son, B.: A concept and the possibility of digital textbook. Journal of Korea Textbook Research, Korea Textbook Research Foundation 51, 13–19 (2007)
10. Wilson, R., Landoni, M., Gibb, G.: The WEB Book experiments in electronic textbook design. Journal of Documentation 59(4), 454–477 (2003)
11. Byun, H., Kim, N., Cho, W., Heo, H.: A study on development methodology for a Mathematics electronic textbook in 2005. Korean Education & Research Information Service, Research Report RR 2005-23, Republic of Korea (2005)
12. Byun, H., Cho, W., Kim, N., Ryu, J., Lee, G., Song, J.: A study on the effectiveness measurement of electronic textbooks. Korean Education & Research Information Service, Research Report CR 2006-38, Republic of Korea (2006)
13. Byun, H., Yoo, K., Yoo, J., Choi, J., Park, S.: A study on the development of an electronic textbook standard in 2005. Korean Education & Research Information Service, Research Report CR 2005-22, Republic of Korea (2005)
14. Jung, E.: Status and future direction of digital textbook. Institute for Information Technology Advancement, Weekly Technology Trends 1347, 14–22 (2008)
15. Ministry of Education & Human Resources Development (now: Ministry of Education, Science and Technology): Strategy for commercial use of digital textbook. Republic of Korea (2007), http://www.nhrd.net/nhrd-app/jsp/tre0202.jsp?sSeq=20070255, http://www.en.wikipedia.org/wiki/Digital_Textbook (2009)
16. Lourdusamy, A., Hu, C., Wong, P.: Perceived Benefits of eduPAD in Enhancing Learning. In: The International Educational Research Conference, University of Notre Dame, Fremantle, Western Australia (2001), http://www.aare.edu.au/01pap/atp01299.htm
17. Hasbrouck, E.: MalayBook: Special version of the Psion netBook produced for schools in Malaysia and now available as surplus for a fraction of the standard netBook price. Malaysia (2003), http://www.hasbrouck.org/netbook/
18. Son, B., Seo, Y., Byun, H.: Case studies on electronic textbooks. Korean Education & Research Information Service, Research Report RR 2004-5, Republic of Korea (2004)
19. Dillon, D.: E-books: The University of Texas experience, Part 1. Library Hi Tech 19(2), 113–124 (2001)
20. Langston, M.: The California State University E-Book Pilot Project: implications for cooperative collection development. Library Collections, Acquisitions, & Technical Services 27(1), 19–32 (2003)
21. Levine-Clark, M.: Electronic book usage: a survey at the University of Denver. Portal: Libraries and the Academy 6(3), 285–299 (2006)
22. Davy, T.: E-Textbooks: Opportunities, Innovations, Distractions and Dilemmas. Serials: The Journal for the Serials Community 20(2), 98–102 (2007)
23. Planet eBook: GoReader (2002), http://www.planetebook.com/mainpage.asp?webpageid=15&TBToolID=1006
24. Dearnley, J., McKnight, C.: The revolution starts next week: the findings of two studies considering electronic books. Information Services and Use 21(2) (2001)
25. Farrell, M.B.: Schwarzenegger's Push for Digital Textbooks. abc NEWS, June 14 (2009), http://abcnews.go.com/Technology/Economy/story?id=7827997&page=1
26. Wilson, R., Landoni, M., Gibb, G.: A user-centered approach to e-book design. The Electronic Library 20(4), 322–330 (2002)
27. Wilson, R., Landoni, M., Gibb, G.: Guidelines for Designing Electronic Books. In: Agosti, M., Thanos, C. (eds.) ECDL 2002. LNCS, vol. 2458, pp. 127–139. Springer, Heidelberg (2002)
SIMACT: A 3D Open Source Smart Home Simulator for Activity Recognition Kevin Bouchard, Amir Ajroud, Bruno Bouchard, and Abdenour Bouzouane LIARA Laboratory, Universite du Quebec a Chicoutimi (UQAC) 555 boul. Universite, Saguenay (QC), Canada, G7H 2B1 {Kevin.Bouchard,Amir.Ajroud,Bruno.Bouchard, Abdenour.Bouzouane}@uqac.ca
Abstract. Smart home technologies have become, in the last few years, a very active topic of research. However, many scientists working in this field do not possess smart home infrastructure allowing them to conduct satisfactory experiments in a concrete environment with real data. To address this issue, this paper presents a new flexible 3D smart home infrastructure simulator developed in Java specifically to help researchers working in the field of activity recognition. A set of pre-recorded scenarios, made with data extracted from clinical trials, will be included with the simulator in order to give a common foundation for testing activity recognition algorithms. The goal is to release the SIMACT simulator as an open source component that will benefit the whole smart home research community. Keywords: Smart Home, 3D Simulator, Activity Recognition, Assistance, Real Case Scenarios, Open source.
real case scenarios because of the complexity of gathering real data from conducting clinical trials. Also, real large-scale experiments are very expensive, and thus only a limited number can be conducted. Therefore, to adequately investigate this field, one must be able to conduct satisfactory experiments with concrete data at an affordable cost. We also need a common foundation upon which different recognition methods and algorithms can be validated experimentally and compared. To address this issue, several projects [7][8][11] have, over the last few years, tried to develop simulation tools for conducting smart home experiments. However, these previous tools are not flexible enough; they are not specifically designed to test recognition approaches; and, more importantly, they do not provide a set of pre-recorded real-case activity scenarios with the tool. In this paper, we propose a new flexible 3D smart home simulator developed in Java. This simulator is specifically designed to help researchers working in the field of activity recognition [15] conduct experiments with real data. The architecture of the system allows a third-party application (such as an intelligent recognition agent) to easily connect itself to a database in order to receive, in real time, the simulated sensor inputs. A set of pre-recorded scenarios, made with data extracted from clinical trials, will be included with the tool in order to give a common foundation for testing activity recognition algorithms. The goal is to release the SIMACT simulator as an open source component [14] that will benefit the whole smart home research community. The paper is organized as follows. Section 2 presents the proposed simulator: software architecture, implementation, scripting, etc. Section 3 presents our current initiative to gather real data that will be incorporated in the simulator as real-case pre-recorded scenarios. Section 4 presents an overview of related work. Finally, Section 5 concludes the paper by outlining future developments of this work.
2 SIMACT: A 3D Smart Home Simulator SIMACT was built with the intention of providing a software experimentation tool that will help researchers validate their smart home recognition approaches. It is a simple and straightforward application. As we can see in Figure 1, the window of the application is divided into three main zones. The first one is a three-dimensional graphic canvas built on one of the most promising game engines in Java: the Java Monkey Engine (JME) [9]. This frame contains all that is necessary to simulate a real environment incorporating virtual sensors. In the 3D zone, you can freely move the camera around the environment to take different points of view and decide on the most appropriate position from which to see what is going on. Another quality of the 3D zone is that it was built in a way that abstracts code from design. SIMACT comes with a ready-to-use 3D kitchen, but one may decide to change it into another kind of room. These changes are effortless to achieve, and it is also possible to add sensors and animated objects without touching a line of code. The second zone is the script control zone, which provides a fast, reliable, and easy way to control the execution of a script in real time. Put simply, it works like a video player; you can watch it play, make it pause, and even go backward and forward in the execution. During play time, the application shows useful information about the current step. You can adjust the speed of the execution without changing the real
time value. For example, if one step normally takes five seconds and the playing speed is multiplied by a factor of five, it will take only a second to execute, but SIMACT will understand that it has taken five seconds to execute the task. During the execution of a script simulation, SIMACT gathers data from virtual sensors and saves it in a database. Consequently, a third-party application (i.e., a recognition agent) may be built independently and connected to the database to extract the data from the simulation. SIMACT will come with a predefined set of scripts so that users can get started quickly, because we really want to provide an easy-to-use tool. Users also have the possibility to write custom scripts. SIMACT's flexibility can be truly seen in this abstraction of the script. To be clear, the scripts define a series of interventions in the simulated environment, and SIMACT interprets these actions and acts accordingly. For example, the script may say: "Tactile mat in front of the oven activated", and the response from SIMACT will be to change the color of the mat, insert an event in the database, and show a message to the user.
Fig. 1. SIMACT in action
There are several options that can modify the way SIMACT works or even the way scripts are interpreted. You may define, for instance, all steps to have a fixed duration of five seconds, or you can decide to add a random value of one to ten seconds to the execution time of a step, as in the sketch below. This makes it possible to capture the human variation in the time taken to perform an action. As an open source application [14], SIMACT will evolve to become a standard reference tool commonly used to test and compare smart home recognition methods and other related ambient algorithms.
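As a sketch only, such timing options could be expressed in a small configuration fragment like the one below; the tag and attribute names are hypothetical and are not taken from the actual SIMACT distribution.

  <simact-options>
    <!-- play every step with a fixed duration of five seconds -->
    <step-timing mode="fixed" seconds="5"/>
    <!-- alternatively, add a random delay of one to ten seconds to each step -->
    <!-- <step-timing mode="random" min-extra="1" max-extra="10"/> -->
  </simact-options>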
2.1 Software Architecture The software architecture is flexible and adaptable to the specificity of the problems encountered by scientists. It was conceived in a way that separates code from every dynamic aspect of the software. As we can see in Figure 2, there is a set of classes used to control the 3D canvas. These classes are the only ones containing JME code and are strictly used to operate the frame, not to control it. More precisely, the main system contains interpreter classes whose role is to understand what to do and how to do it. Imagine the event "Open the oven door". The command will be read from the script by an XML reader module and sent to the command interpreter, which will finally tell the 3D canvas to visually open the oven door.
Fig. 2. Architecture representation
XML tags are used to write scripts and to define the 3D environment. This avoids diving into the code each time we want to modify it. Interpreter classes do not only command the 3D frame; they also control the database and the user console. The console is only used to see in real time what is going on (triggered events). Every time an event is inserted in the database, the user sees it at the same time on the console. Through the database, a third-party application can easily get all the data from the simulated person's scenario in real time and act on it. The third-party application is completely independent and can be coded in other languages. 2.2 Java Implementation The SIMACT simulator was programmed entirely in Java because we wanted to make it open source and portable. The open source community [14] around Java is quite large, and no other language met our requirements. Java is also a powerful object-oriented language, which is known in computer science to favor code reusability. The Swing library was chosen for the GUI (graphical user interface) because it is cross-platform, well documented, and easy to use. The 3D part was a little trickier because Java is not widely used in game programming. After some research, we found JME [9], a relatively new and promising 3D engine. JME is in a mature state; it is a serious engine that has been adopted by the Java community over the last few years. It gave us all the functionalities that we expected for our 3D simulation and more. For the 3D
<step time="5">
    Open refrigerator door
    <description>Open the refrigerator door to get the milk</description>
    The refrigerator door is open
    -0.5y
</step>
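In the same spirit, the 3D environment and its virtual sensors are declared in XML rather than in code, which is what allows sensors and animated objects to be added without recompiling. The fragment below is only a plausible sketch of such a declaration; its element and attribute names are assumptions and do not come from the actual SIMACT files.

  <environment name="kitchen">
    <object id="refrigerator" model="refrigerator.obj" x="2.0" y="0.0" z="1.5"/>
    <sensor id="fridge-door-contact" type="contact" attached-to="refrigerator"/>
    <sensor id="oven-mat" type="tactile-mat" x="1.2" y="0.0" z="0.4"/>
  </environment>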
3 Experimentations, Trials and Real Case Scenarios In its current beta state, SIMACT is working and offers a variety of functionalities and options. Functionality is certainly important, but one of SIMACT's most significant contributions to the community is that it comes with a well-defined set of activity scripts, based on real-case scenarios extracted from clinical trials. To achieve this, we signed a formal collaboration agreement with the Cleophas-Claveau rehabilitation center of La Baie (QC), which provides us with an adequate group of cognitively impaired people, such as Alzheimer's patients. We have already been approved by the ethics committee to conduct clinical experiments with a group of 20 people suffering from Alzheimer's disease and other similar cognitive deficits. We cooperated with our colleague Julie Bouchard, a neuropsychologist researcher, who helped us define the test. We decided to exploit a well-known test used by therapists, the "Naturalistic Action Test" (NAT) [17]. We recently conducted a preliminary experiment with 20 normal subjects, who performed the NAT activities. These tests were filmed and the data recorded. We analyzed every execution sequence and constructed from them a first set of representative scenarios. Moreover, we recorded the time taken by each person and compiled detailed statistics for the different steps of the activities. The collected information will be used to conduct the next experiments with patients suffering from Alzheimer's disease. We are presently recruiting patients and plan to run the experiments in the summer of 2010. We are convinced that our efforts will provide high-quality real-case scenarios with the SIMACT tool for the benefit of the community. Finally, as the simulator was purposely built to help experiment with new activity recognition approaches in a smart home, we performed a first validation by using it to test a new recognition model that we have just developed. This model is an extension of our previous work [1], which incorporates non-symbolic temporal constraints [3]. Each step (basic action) of the scenarios included in SIMACT has a precise completion time based on the data recorded in the trials. This allowed us to clearly test the efficiency of the new temporal constraints incorporated in our
model and to show how these constraints helped to identify errors linked to time (e.g., letting the water boil for one hour). The details of this new recognition model and the complete results of this experiment will be published later in 2010. 3.1 Details about the NAT Test The NAT [17] has proved over the past few years to be an efficient way to study cognitive abilities. It proposes three standard activities, of which we chose only two in order to reduce the patient evaluation time. The first selected ADL is preparing coffee and toast. Fig. 3 illustrates a step-by-step model of the possible ways of achieving this activity, designed on the basis of the results of the clinical trials. This model is used to describe the general script outline, which will serve to define a larger set of scripts simulating incorrect behavior. The activity is composed of two sub-tasks that can be interleaved: preparing a cup of coffee and making a toast.
Fig. 3. Model of the first selected activity: preparing coffee and making toasts
This activity is a very good choice because it is composed of two sub-tasks with several interleaved steps, it can be done in less than 10 minutes, and it is similar in nature and size to many examples from the smart home literature, such as preparing tea [12], cooking pasta [1], and brushing teeth [6]. The second chosen activity is wrapping a gift. This activity is less common but brings a different activity context to the collected data. It also gives the opportunity to emphasize many different kinds of cognitive impairment within the set of scripts provided with SIMACT.
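To connect this activity model with the simulator, a fragment of the coffee-and-toast scenario could be scripted with the same step format shown earlier. The steps, timings, and the 'scenario' wrapper element below are illustrative assumptions only and are not the actual data recorded during the trials.

  <scenario name="prepare-coffee-and-toast">
    <step time="4">
      Take a cup from the cupboard
      <description>The subject takes a cup and places it on the table</description>
    </step>
    <step time="6">
      Pour hot water into the cup
      <description>The kettle is tilted over the cup</description>
    </step>
    <step time="3">
      Add instant coffee and sugar
      <description>One spoonful of coffee and one of sugar are added</description>
    </step>
  </scenario>

A larger set of such scripts, including variants that reproduce the errors and hesitations observed in the trials, is what SIMACT is intended to ship with.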
3.2 Details about the Experimental Protocol Used to Build Scenarios The first experiment that we conducted consisted of clinical trials using the designed tasks with an adequate set of normal subjects. The objective was to gather relevant data about the normal performance of each activity: the normal completion time for each step and for the whole activity, the usual step order, errors committed, hesitations, differences by sex and age, etc. Scientists will then be able to use this data, which will be included in the SIMACT database, to calibrate their recognition algorithms and to compare against abnormal observed behavior. For this first test, we obtained the approval of the ethics committee and recruited a set of normal subjects on a voluntary basis. The study was advertised in regional newspapers as well as on campus. The participants had to be between 18 and 77 years of age. The only inclusion criterion was the absence of any type of cognitive impairment, whether caused by a psychiatric or neurological condition or by the use of prescription drugs. The participants were also asked to refrain from any alcohol or drug use for the 24 hours preceding the test. A total of 20 participants were selected to perform the two activities. The experimental protocol works as follows. The participant takes a seat on a chair, with all the material needed for completing the activity laid on the table in front of him or her. The participant is free to use anything he or she wants to complete the task and can do so in any order. First, the participants were asked to prepare a cup of coffee with sugar and milk, and to prepare a toast with butter and jelly. Second, participants were asked to wrap a gift with the provided material. There was no time limit. The participants had to leave the room while the assistant prepared the setup for the second activity. Each trial was recorded on video and timed. The videos are framed in a manner that preserves the anonymity of the participants; faces are not shown on the video. We developed a scoring sheet to record the order in which the different steps of the task were performed. Following the completion of the test, the order and time of each step were individually processed. Statistical analyses (modal and mean analysis, mean comparison, and correlation) were performed to determine the order of the steps in the different models and to determine whether any subgroup difference was relevant, for example whether there was any difference depending on the gender or age of the participant.
4 Related Work Over the last few years, several projects [7][8][11] have tried to address the issue of developing a simulation tool for smart home experimentation. Lazovik proposed in 2009 a new 2D tool [8] with a flexible GUI in order to virtualize a complete smart home environment. The objective of this tool was to provide a full-featured simulation of any possible domotic scenario. However, this tool is mainly focused on the planning aspect of activities of daily living and does not take into account important elements of the recognition process, such as gathering all the details about the sensor inputs and providing a simple and comprehensive interface to interrogate them in real time. Park recently proposed another simulation tool named CASS (Context-Aware Simulation System for Smart Home) [11]. This tool aims to generate the context information associated with virtual sensors and virtual devices in a smart home domain.
By using CASS, the developer of a self-adaptive application can immediately detect rule conflicts in context information and determine optimal sensors and devices in a smart home. As we can see, CASS focuses on sensor distribution problems more than on providing a means to experiment with recognition algorithms. The closest project to our proposal is the smart home simulator proposed by Krzyska [7]. Her team developed a smart home editor providing the functionalities needed to build a customized smart home from scratch. The user turns himself into an architect by defining walls, doors, and sensors. In addition, there is scripting functionality similar to ours, allowing one to simulate the activation of the sensors. However, a limitation of her approach is that the scripting does not allow simulating complex real-case scenarios, because it was only designed for very basic experiments. Moreover, the simulation is done in a 2D frame made with the Swing library for Java, and it lacks clarity. In contrast, the SIMACT tool specifically focuses on the integration of real-case scenarios and is more user-friendly with its animated 3D frame. In addition, none of these previous contributions proposed freely offering their simulation tool to the community in an open source environment allowing further enhancements. Finally, one of the most significant contributions over previous work consists in providing a complete set of adequate real-case scenarios with the tool.
5 Conclusion The population will get older during the 21st century because of a decreasing birth rate combined with longer life expectancy [4]. This reality will be challenging, knowing that we already lack qualified caregivers. In addition, cognitive diseases will continue to ravage our society, amplifying the problem. Smart homes constitute a viable avenue toward a solution [12]. However, research and development in this field require complex infrastructures and substantial funding that many scientists cannot afford. Furthermore, it may be very hard to get access to real patients in order to conduct adequate experiments. Simulation tools therefore constitute a very promising and affordable alternative [11] that can be reused at any time. In this paper, we demonstrated the potential of SIMACT, a new user-friendly 3D animated simulator offering a simple interface. We also showed how it can be easily adapted to individual needs. We hope that SIMACT will become a very useful tool for the community with its XML scripting language, many options, customizable 3D frame, and set of pre-defined real-case scenarios. The next step is to conduct our study with Alzheimer's patients in order to add more representative real-case scenarios. In further developments, we will also test SIMACT with other third-party applications. Finally, we will extend the development to add other rooms (bedroom, dining room, etc.) and, of course, to develop real-case scenarios for these rooms. Acknowledgments. We would like to thank our sponsors for their financial support: the Natural Sciences and Engineering Research Council of Canada (NSERC), the Quebec Research Fund on Nature and Technologies (FQRNT), the Canadian Foundation for Innovation (CFI), the Foundation of the Université du Québec à Chicoutimi (FUQAC), and of course our University (UQAC). We also thank our psychologist partners J. Bouchard, A. Potvin and H. Laprise and the Cleophas-Claveau center of La Baie for their collaboration.
References
1. Bouchard, B., Bouzouane, A., Giroux, S.: A Keyhole Plan Recognition Model for Alzheimer's Patients: First Results. Journal of Applied Artificial Intelligence (AAI) 22(7), 623–658 (2007)
2. Carberry, S.: Techniques for plan recognition. User Modeling and User-Adapted Interaction 11(1–2), 31–48 (2001)
3. Dechter, R., Meiri, I., Pearl, J.: Temporal Constraint Networks. Artificial Intelligence 49(1–3), 61–95 (1991)
4. Diamond, J.: A report on Alzheimer disease and current research. Technical report, Alzheimer Society of Canada, pp. 1–26 (2006)
5. Haigh, K., Kiff, L., Ho, G.: The Independent LifeStyle Assistant: Lessons Learned. Assistive Technology 18(1), 87–106 (2006)
6. Hoey, J., von Bertoldi, A., Poupart, P., Mihailidis, A.: Assisting persons with dementia during handwashing using a partially observable Markov decision process. In: Int. Conference on Vision Systems (ICVS 2007) (Best Paper award), pp. 89–99 (2007)
7. Krzyska, C.: Smart House Simulation Tool. Master thesis, Informatics and Mathematical Modelling, Technical University of Denmark, DTU (2006)
8. Lazovik, E., den Dulk, A.C.T., Groote, M., Lazovik, A., Aiello, M.: Services inside the Smart Home: A Simulation and Visualization tool. In: Int. Conf. on Service Oriented Computing, ICSOC 2009 (2009)
9. Powell, M.: Java Monkey Engine, JME (2008), http://www.jmonkeyengine.com
10. Patterson, D.J., Kautz, H.A., Fox, D., Liao, L.: Pervasive computing in the home and community. In: Bardram, J.E., Mihailidis, A., Wan, D. (eds.) Pervasive Computing in Healthcare, pp. 79–103. CRC Press, Boca Raton (2007)
11. Park, J., Moon, M., Hwang, S., Yeom, K.: CASS: A Context-Aware Simulation System for Smart Home. In: 5th ACIS International Conference on Software Engineering Research, Management & Applications (SERA 2007), pp. 461–467 (2007)
12. Pigot, H., Mayers, A., Giroux, S.: The intelligent habitat and everyday life activity support. In: 5th International Conference on Simulations in Biomedicine (2003)
13. Ramos, C., Augusto, J.C., Shapiro, D.: Ambient Intelligence: the Next Step for Artificial Intelligence. IEEE Intelligent Systems 23, 5–18 (2008)
14. Refaat Nasr, M.: Open Source Software: The use of open source software and its impact on organizations. Master Thesis, Middlesex University, 129 pages (June 2007)
15. Roy, P., Bouchard, B., Bouzouane, A., Giroux, S.: Challenging issues of ambient activity recognition for cognitive assistance. In: Mastrogiovanni, F., Chong, N. (eds.) Ambient Intelligence and Smart Environments, pp. 1–25. IGI Global (in press, 2010)
16. Tibben, H., West, G.: Learning User Preferences in an Anxious Home. In: Smart Homes and Beyond, Proc. of the 4th International Conference on Smart Homes and Health Telematics (ICOST 2006), Milwaukee, USA, December 10–13, pp. 188–195 (2006)
17. Schwartz, M.F., Segal, M., Veramonti, T., Ferraro, M., Buxbaum, L.J.: The Naturalistic Action Test (NAT): A standardised assessment for everyday action impairment. Neuropsychological Rehabilitation 12(4), 311–339 (2002)
Design of an Efficient Message Collecting Scheme for the Slot-Based Wireless Mesh Network Junghoon Lee and Gyung-Leen Park Dept. of Computer Science and Statistics, Jeju National University 690-756, Jeju Do, Republic of Korea {jhlee,glpark}@jejunu.ac.kr
Abstract. This paper proposes an efficient message collection scheme for wireless mesh networks and measures its performance. Built upon the WirelessHART protocol, which provides slot-based deterministic network access as well as the split-merge operation capable of switching channels within a single slot, the proposed scheme allocates two consecutive slots to each pair of child nodes of a receiver. In addition to the primary sender, the secondary sender tries to send a message, if it has one, after a predefined interval (additional transmit), while the receiver selects the frequency channel bound to each sender according to the clear channel assessment result within a single slot (additional receive). The simulation results, obtained from a discrete event scheduler targeting a 4-level binary tree topology, show that the proposed scheme can improve the message delivery ratio by up to 35.4% and reduce wasteful transmissions by up to 38% over the given slot error ranges, compared with the non-switching scheme. Keywords: Wireless mesh network, WirelessHART, split-merge operation, sensor data collection, message delivery ratio.
1 Introduction
The wireless mesh network can be deployed for remote monitoring systems in which end nodes collect sensor data and send it to an information server [1,2]. If applied to the vehicular network, this architecture makes it possible to monitor current traffic conditions, read remote meters, and provide Internet access to commuters and drivers [3]. Practically, many traffic-related or environmental devices, such as speed detectors, pollution meters, and traffic signal controllers, can be added to the vehicular network. Moreover, vehicles themselves can also
This research was supported by the MKE(The Ministry of Knowledge Economy), Korea, under the ITRC(Information Technology Research Center) support program supervised by the NIPA(National IT Industry Promotion Agency). (NIPA-2010(C1090-1011-0009)). Corresponding author.
carry a sensor device and report the collected sensor data to a static router node. In such a wireless mesh network, mesh routers are generally immobile, so the topology is stable, making MAC and routing the main performance factors [4]. The network can exploit various wireless communication technologies such as 802.11, 802.15, and 802.16, while new ones are still being developed [5,6]. Each wireless technology has its own MAC, which may or may not be slot-based. Built on top of the underlying time synchronization function, a slot-based MAC can provide predictable access to each node. Moreover, it is possible to allocate slots according to the traffic characteristics of each stream and the routing decision. Predictability is all the more important when all traffic is concentrated on a single gateway, or a limited number of gateways, which connect the nodes inside the mesh network to the outside world such as the Internet. Such a network can run a process control application, in which timely reaction to status changes is essential. To this end, the sensor data and the control reactions must be delivered accurately and in a timely manner to the controller and to the end nodes equipped with actuators. A slot-based MAC can meet the requirements of such applications. However, it is well known that the wireless channel inherently suffers from dynamic quality fluctuations and severe instability [7]. In slot-based schemes, if the channel status is not good during the time slot, the slot is wasted. Meanwhile, the WirelessHART standard provides a robust wireless protocol for various process control applications [8]. First of all, this communication protocol standard exploits a slot-based access scheme to guarantee predictable message delivery, and each slot is assigned to the appropriate (sender, receiver) pair. In addition, to overcome the transient instability of wireless channels, a special emphasis is put on reliability through mesh networking, channel hopping, and time-synchronized messaging. This protocol has been implemented and is about to be released to the market [9]. The current version of the protocol standard defines the CCA (Clear Channel Assessment) before the transmission in each time slot, but does not specify how to react to a bad CCA result, opening many possibilities for coping with channel errors. As an example, it can employ a split-merge operation to mask channel errors as well as an efficient routing scheme to find the path that is most likely to successfully deliver the message [10]. As the traffic is concentrated on a gateway node, the message delivery paths form a tree topology [11]. The WirelessHART protocol can improve the reliability and timeliness of message delivery to the gateway node. Namely, an intermediary node having one or more children can run the merge operation and select the sender that returns a clear channel status. In this way, each hop can overcome transmission channel errors. In this regard, this paper designs and analyzes the performance of a reliable message delivery scheme for the wireless mesh network based on the WirelessHART protocol, particularly targeting the binary tree topology. Each slot is bound to a (sender, receiver) pair which has its own wireless channel, and the receiver is the parent of one or more senders. After allocating the two consecutive slots to each child, the intermediary node switches channels according to the channel status assessment. The effect of slot
saving and message recovery is sustained all the way to the root node, or the gateway, so we can expect a significant improvement in the delivery success ratio. The paper is organized as follows: after the problem definition in Section 1, Section 2 introduces the background of this paper, focusing on the WirelessHART protocol and related issues. Section 3 designs the channel allocation and switching scheme that exploits the split-merge operation during the sensor message collection process. The performance measurement results are discussed in Section 4, and finally Section 5 summarizes and concludes this paper.
2 Background and Related Work
The WirelessHART standard is defined over the IEEE 802.15.4 2.4 GHz radio-band physical link, allowing up to 16 frequency channels spaced by a 5 MHz guard band [8]. The link layer provides deterministic slot-based access on top of time synchronization primitives that are carried out continuously during the whole network operation time. According to the specification, the size of a single time slot is 10 ms, and a central controller node coordinates routing and communication schedules to meet the robustness requirements of industrial applications. According to the routing schedule, each slot is assigned to a (sender, receiver) pair. For more reliable communication, each sender performs CCA [12] before the transmission. However, how to react when a channel returns a bad CCA result is not yet defined in the current standard. Due to the half-duplex operation of current wireless systems, collision detection cannot be provided. Hence, automatic CCA before each transmission and channel blacklisting can also be used to avoid specific areas of interference and to minimize interference to others. The 802.15.4 standard specifies that CCA may be performed using energy detection, preamble detection, or a combination of the two [5]. In addition, reliable channel estimation methods are available for the MAC layer to avoid erroneous communication over an unclear channel. CCA takes just 8 bit times, which occupies only a small part of the 10 ms slot. Channel probing can be accomplished by CCA, RTS/CTS, and so on. It must be mentioned that false alarms may result in performance degradation, but we assume that this can be disregarded, as the correctness of channel probing is not our concern. In any case, if the channel is detected not to be clear, the sender does not proceed, or it tries another recovery action, which is likewise not defined in the standard. The split-merge operation, defined on top of the WirelessHART protocol, enables a sender to try another channel if the CCA result of the channel on the primary schedule is not clear. For the destination node of each connection, a controller may reserve two alternative paths having a sufficient number of common nodes. The two paths split at some nodes and meet again afterwards. When two paths split, a node can select the path according to the CCA result within a single slot by switching to the channel associated with the secondary route. When two paths merge, the node can receive from either of the two possible senders by a timed switch operation, as shown in Figure 1. Besides, the WirelessHART protocol can
be reinforced by many useful performance enhancement schemes such as the virtual-link routing scheme that combines split-merge links and estimates the corresponding error rate to apply the shortest path algorithm [10].
Fig. 1. Merge operation (timelines of the primary sender, secondary sender, and receiver, showing the CCA instants, TsCCAOffset, TsRxOffset, and the switch guard time)
3 Routing and Scheduling Scheme
3.1 System Model
To begin with, under the channel hopping mechanism the same sender may change the channel at each slot boundary according to a preassigned hopping sequence. However, one hopping sequence can be considered a single frequency channel accessed by a TDMA protocol. We assume that each node installs a directional antenna, which can increase spatial reuse and reduce interference by directing the radio beam towards a desired direction [13]. That is, each node can tune or switch its channel frequency when necessary with only a small overhead compared with the slot length. Figure 2(a) shows the sample target mesh network architecture made up of 15 nodes (Node 0 through Node 14), where Node 0 is the gateway. This binary tree network has a 4-level hierarchy, and each leaf node, namely Node 7 through Node 14, periodically generates a message and sends it toward the gateway. We select this network because the maximum hop length is limited to 4 in the WirelessHART standard, but our scheme can be applied to any topology having a tree structure. Figure 2(b) shows the basic slot allocation. For simplicity, we assume that the messages generated in each leaf node have the same period and size. Each message fits in a single time slot, and 24 slots are needed to collect every message in a round. Hence, a superframe consisting of 24 slots is repeated during system operation. For a message to reach the gateway, 3 hops are needed; that is, the 3 time slots assigned to the respective hops must all be clear. It is true that some slots can overlap in the time axis; for example, the transmissions 7 → 3 and 2 → 0 can occur simultaneously if Node 2 and Node 0 are sufficiently far away from Node 3 and Node 7. However, we do not consider such allocation optimization, because our main concern is reliable transmission within a set of slots. In addition, even though we selected a binary tree structure, as it benefits most from our scheme, our scheme can work on any topology.
Fig. 2. Mesh architecture. (a) Sample mesh architecture: a 4-level binary tree rooted at gateway Node 0, whose children are Nodes 1 and 2; Nodes 3, 4 and Nodes 5, 6 are the children of Nodes 1 and 2, respectively, and Nodes 7–14 are the leaves. (b) Sample slot allocation: the primary senders of slots 0–23 are 7, 8, 9, 10, 11, 12, 13, 14, 3, 3, 4, 4, 5, 5, 6, 6, 1, 1, 1, 1, 2, 2, 2, 2, and the corresponding secondary senders are 8, 7, 10, 9, 12, 11, 14, 13, 4, 4, 3, 3, 6, 6, 5, 5, 2, 2, 2, 2, 1, 1, 1, 1.
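To make the allocation of Fig. 2(b) concrete, the sketch below (our illustration, not code from the paper) derives the 24-slot superframe for the 4-level binary tree, pairing each primary sender with its sibling as the secondary sender; the function names are ours.

```python
# Sketch (not from the paper): derive the 24-slot superframe of Fig. 2(b).
# Tree: node 0 is the gateway; the children of node k are 2k+1 and 2k+2; leaves are 7..14.

def parent(node):
    return (node - 1) // 2

def sibling(node):
    return node + 1 if node % 2 == 1 else node - 1

def build_schedule():
    """Return a list of (primary_sender, secondary_sender, receiver) per slot."""
    schedule = []
    # Slots 0-7: leaves 7..14 send to their parents (3..6).
    for leaf in range(7, 15):
        schedule.append((leaf, sibling(leaf), parent(leaf)))
    # Slots 8-15: nodes 3..6 relay twice each (two buffered leaf messages).
    for node in (3, 3, 4, 4, 5, 5, 6, 6):
        schedule.append((node, sibling(node), parent(node)))
    # Slots 16-23: nodes 1 and 2 relay four times each toward the gateway.
    for node in (1, 1, 1, 1, 2, 2, 2, 2):
        schedule.append((node, sibling(node), parent(node)))
    return schedule

if __name__ == "__main__":
    for t, (p, s, r) in enumerate(build_schedule()):
        print(f"slot {t:2d}: primary {p}, secondary {s}, receiver {r}")
```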
3.2 Slot Operation
In Figure 2(b), each slot is assigned to a node that sends or relays a message to its parent node. A message travels from a leaf node to the gateway node hop by hop. For example, a message generated at leaf Node 7 is transmitted from Node 7 to Node 3 in Slot 0, from Node 3 to Node 1 in Slot 8, and finally from Node 1 to Node 0 in Slot 16. In each slot, the channel status between the sender and its parent node must be good for the message to be transmitted successfully. A message loss results not only in wasted time slots in the subsequent schedule but also in wasted power already consumed to carry the message from the leaf node. The schedule shown in Figure 2(b) assigns slots according to the hop distance from the gateway, and the children of the same parent node have adjacent slots. This allocation ensures that each node and its sibling always have a message to send in their time slots. In addition, below the basic schedule, Figure 2(b) also specifies secondary senders. It must be mentioned that the primary and secondary senders of a time slot have the same parent. The primary sender, the secondary sender, and the receiver of each slot work cooperatively to improve reliability. The transmissions from the two senders to the receiver are allocated different channel frequencies. Let P(t) and S(t) denote the primary and secondary senders at time slot t, respectively. At t, P(t), having a message to transmit, runs CCA to sense the channel status on its frequency channel and sends when the result is clear. Otherwise, it simply skips its transmission and can retry the discarded message in the slot where it is specified as the secondary sender. Meanwhile, S(t), also having a message to send, behaves in the same way as P(t) on its own frequency channel; however, its message transmission starts only after the TsRxWait interval specified in the standard. At the slot boundary, the receiver first listens on the channel assigned to the
primary sender. If a message appears within TsRxWait, it receives the message. Otherwise, it switches its receiving frequency to that of the secondary sender. In this way, slot t can be used by the secondary sender even when the primary sender cannot transmit in it. Afterwards, it is possible to find a slot t′ which satisfies the following condition: P(t) = S(t′) and P(t′) = S(t).
For example, consider the case t = 0, where P(0) = 7 and S(0) = 8. Then t′ is 1, as P(1) = 8 and S(1) = 7. The distance between t and t′ in the slot allocation depends on the distance from the gateway: for a leaf node, |t − t′| = 1, while for a node one hop above the leaf level, |t − t′| = 2, and so on. At t′, if P(t′) has already transmitted its message, the chance is given to S(t′), which was formerly the primary sender of t but missed its transmission. In this way, time slot waste can be avoided and the delivery ratio can be much improved. A frequency channel switch is performed at each slot boundary anyway when the communication system runs the channel hopping mechanism; the proposed scheme needs just one more channel switch per time slot, when necessary, on the receiver side. Moreover, if both channels are in good condition, the secondary sender transmits unnecessarily, leading to wasteful power consumption. However, in wireless mesh networks reliability is generally more important than power efficiency.
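The slot-level behavior just described can be summarized in the following sketch (our illustration, not code from the paper); `channel_clear` abstracts the CCA outcome on a given sender-receiver channel, and the wasteful parallel transmission of the secondary sender when both channels happen to be clear is not modeled, since only the delivered message matters here.

```python
# Sketch (not from the paper): delivery outcome of one slot under split-merge.
def run_slot(primary, secondary, receiver, buffers, channel_clear):
    """buffers: dict node -> list of pending messages;
    channel_clear(sender, receiver) -> True if CCA reports a clear channel."""
    # Primary sender transmits if it has a message and its channel is clear.
    if buffers[primary] and channel_clear(primary, receiver):
        buffers[receiver].append(buffers[primary].pop(0))
        return "primary"
    # Otherwise the receiver, hearing nothing within TsRxWait, switches to the
    # secondary sender's channel (one extra receive-side channel switch).
    if buffers[secondary] and channel_clear(secondary, receiver):
        buffers[receiver].append(buffers[secondary].pop(0))
        return "secondary"
    return "idle"    # the slot is wasted: neither sender got its message through
```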
4 Performance Measurement
This section measures the performance of the proposed sensor data collection scheme via simulation using SMPL, which provides abundant functions and libraries for discrete event scheduling and is easily combined with commonly used compilers such as gcc and Visual C++ [14]. In most experiments, the main performance parameter is the slot error rate, which depends on the data size and the channel error rate distribution. Here, we employ the Gilbert-Elliott error model [7], which is quite simple but lets us easily set the average error rate we want for an experiment. In addition, the experiment follows the network architecture shown in Figure 2, where each leaf node generates a message that takes one slot time every 24 slot times. For a message to reach the controller node, it needs 3 hops. By the Default scheme, we mean the slot assignment schedule in Figure 2(b); we will compare the performance of the proposed scheme with this Default scheme. First, Figure 3 shows the delivery success ratio according to the slot error rate. When the error rate is 0, both schemes successfully deliver every message from the leaf nodes to the controller. Both schemes show an almost linear decrease in the delivery ratio as the slot error rate increases. The success ratio of the default scheme roughly corresponds to the cube of the per-slot success probability (one minus the slot error rate), as it needs three successful hop transmissions. By contrast, the proposed scheme can mask errors by switching the slot allocation between the adjacent relay or leaf nodes. At each hop, a node gets one more slot chance for the message transmission toward its parent node. The performance gap gets larger according to the increase
Fig. 3. Delivery success ratio vs. slot error rate for the "Default" and "Proposed" schemes. Fig. 4. Power save: wasteful transmission per message vs. slot error rate for the "Default" and "Proposed" schemes.
of the slot error rate, reaching 35.4 % when the slot error rate is 0.3, where the proposed scheme can still deliver more than 70 % of the 3-hop messages. In addition, suppose that the transmission from Node 7 to Node 3 succeeds but that from Node 3 to Node 1 fails. Then the message created at Node 7 ultimately fails, and the power used for the first transmission is wasted. This situation is more harmful when the last-hop transmission, namely from Node 1 to Node 0 or from Node 2 to Node 0, fails. The enhanced end-to-end delivery ratio can minimize such power loss. Figure 4 shows how much the proposed scheme can reduce this wasteful power consumption. Over the whole slot error range, the proposed scheme keeps the waste below 12 %, while the loss of the default scheme goes up to 50 %. Even though the experiment assumes that every link has the same quality, it is possible to make the links closer to the controller more reliable by using stronger signal power; in that case, both schemes can reduce the power waste resulting from delivery failures at intermediary hops. The performance improvement cannot be achieved without any power cost. The receiver node first attempts to receive from the primary sender, which is one of its two child nodes. If it does not see a message, it tunes the frequency to the secondary sender, which is the other child node. Here, the receiver can
Fig. 5. Additional receive operations per slot vs. slot error rate. Fig. 6. Additional transmit operations per slot vs. slot error rate.
perform 2 receive operations in a slot, while some senders must send twice. Figures 5 and 6 show this overhead. Figure 5 plots the average number of additional receive operations per slot. The overhead is roughly proportional to the slot error rate, as more transmission errors lead to more channel switches. In addition, Figure 6 shows the additional transmit operations per slot. When there is no error, this overhead is 0.5: for each child pair, the first node is the primary sender and the second is the secondary sender in the first of the two slots, and their roles are switched in the next slot. Without communication errors, the primary sender always succeeds in the first slot while the secondary sender makes a wasteful transmission; by contrast, in the next slot there is no wasteful transmission, as the secondary sender has no message left to send. Hence, the additional transmission ratio is 0.5. The wasteful transmission decreases as the error rate increases, since a delivery failure at an early stage may leave a node at a higher tree level with no message to send; in addition, the auxiliary transmission is selected by the receiver. After all, the additional transmission ratio reaches 0.22 when the slot error rate is 0.3.
Fig. 7. Average consecutive losses vs. slot error rate for the "Default" and "Proposed" schemes. Fig. 8. Maximum consecutive losses vs. slot error rate for the "Default" and "Proposed" schemes.
In the sensor network, sensor messages are collected and processed by the controller. A sensor message generally contains the current sensor value and is input to the control logic. Even if one message is missed, the controller can still operate correctly if the next message arrives. Hence, consecutive message losses are critical to the correctness of the system operation. Figures 7 and 8 show, respectively, the average and the maximum number of consecutive losses of messages from the same sender. As shown in Figure 7, the average consecutive loss for the default scheme rises to 2.0 when the slot error rate is 0.3, while the proposed scheme keeps the consecutive losses below 0.41. Figure 8 plots the maximum number of consecutive losses. Up to a slot error rate of 0.05, both schemes have fewer than 5 consecutive losses. However, at a slot error rate of 0.3 the default scheme misses a message up to 30 times in a row, while the proposed scheme misses up to 12. Such a result is not acceptable for general sensor applications, so an error compensation scheme is needed that gives precedence to messages whose predecessors have been lost more often.
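As an illustration of how such curves can be produced, the following sketch (ours, not the authors' SMPL code) draws per-slot channel errors from a two-state Gilbert-Elliott model and computes consecutive-loss statistics from a per-round delivery log; all numeric parameters and the simplified delivery rule are illustrative only.

```python
import random

# Sketch (not the authors' SMPL simulator): slot errors from a two-state
# Gilbert-Elliott channel and consecutive-loss statistics for one sender.

def gilbert_elliott_errors(n_slots, p_good_to_bad=0.05, p_bad_to_good=0.3,
                           err_good=0.01, err_bad=0.8, seed=0):
    """Return a list of booleans, True meaning the slot suffers a channel error."""
    rng = random.Random(seed)
    bad, errors = False, []
    for _ in range(n_slots):
        # State transition, then an error draw conditioned on the current state.
        bad = (rng.random() < p_good_to_bad) if not bad else (rng.random() >= p_bad_to_good)
        errors.append(rng.random() < (err_bad if bad else err_good))
    return errors

def consecutive_loss_stats(delivered):
    """delivered: per-round booleans for one sender (True = message reached the gateway).
    Returns (average loss-run length, maximum loss run); the paper does not spell out
    its exact averaging, so this is one plausible reading of the metric."""
    runs, run = [], 0
    for ok in delivered:
        if ok:
            if run:
                runs.append(run)
            run = 0
        else:
            run += 1
    if run:
        runs.append(run)
    return (sum(runs) / len(runs) if runs else 0.0), (max(runs) if runs else 0)

# Toy usage: pretend each round needs 3 consecutive clear slots for one sender.
errors = gilbert_elliott_errors(9000)
delivered = [not any(errors[i:i + 3]) for i in range(0, 9000, 3)]
print(consecutive_loss_stats(delivered))
```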
5 Conclusions
This paper proposes an efficient message collection scheme for wireless mesh networks and measures its performance. Built upon the WirelessHART protocol, which provides slot-based network access as well as the split-merge operation capable of switching channels within a single slot, the proposed scheme allocates two consecutive slots to each pair of child nodes of a receiver. In addition to the primary sender, the secondary sender tries to send a message, if it has one, after the predefined interval (additional transmit), while the receiver selects the frequency channel bound to either sender according to the clear channel assessment result within a single slot (additional receive). The simulation results, obtained from a discrete event scheduler targeting the 4-level binary tree topology, show that the proposed scheme can improve the message delivery ratio by up to 35.4 % and reduce the wasteful transmission by up to 38 % over the given slot error range, compared with the non-switching scheme. In addition, the proposed scheme reduces the average and maximum number of consecutive losses, which are another important performance metric for sensor network applications, from 2.0 to 0.3 and from 30 to 15, respectively, while it brings overhead of up to 0.25 additional receive and 0.22 additional transmit operations per slot. As future work, we are planning to investigate how to compensate a node that experiences many consecutive losses. To this end, a fairness issue must be considered when deciding the slot schedule and its primary and secondary senders. More specifically, the primary and secondary senders will be decided dynamically [15].
References
1. Akyildiz, I., Wang, X.: A survey on wireless mesh networks. IEEE Radio Communications, 23–30 (2005)
2. Bucciol, P., Li, F.Y., Fragoulis, N., Vandoni, L.: ADHOCSYS: Robust and service-oriented wireless mesh networks to bridge the digital divide. In: IEEE Globecom Workshops, pp. 1–5 (2007)
3. Jaap, S., Bechler, M., Wolf, L.: Evaluation of routing protocols for vehicular ad hoc networks in city traffic scenarios. In: Proceedings of the 5th International Conference on Intelligent Transportation Systems Telecommunications (2005)
4. Loscrì, V.: MAC protocols over wireless mesh networks: problems and perspective. Journal of Parallel and Distributed Computing 68, 387–397 (2008)
5. Gislason, D.: ZIGBEE Wireless Networking. Newnes (2008)
6. IEEE 802.11-1999: Part 11 - Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications (1999), http://standards.ieee.org/getieee802
7. Bai, H., Atiquzzaman, M.: Error modeling schemes for fading channels in wireless communications: A survey. IEEE Communications Surveys, 2–9 (2003)
8. Song, S., Han, S., Mok, A.K., Chen, D., Nixon, M., Lucas, M., Pratt, W.: WirelessHART: Applying wireless technology in real-time industrial process control. In: The 14th IEEE Real-Time and Embedded Technology and Applications Symposium, pp. 377–386 (2008)
9. Han, S., Song, J., Zhu, X., Mok, A.K., Chen, D., Nixon, M., Pratt, W., Gondhalekar, V.: Wi-HTest: Compliance test suite for diagnosing devices in real-time WirelessHART network. In: The 15th IEEE Real-Time and Embedded Technology and Applications Symposium, pp. 327–336 (2009)
10. Lee, J., Song, H., Mok, A.K.: Design of a reliable communication system for grid-style traffic control networks. Accepted at the 16th IEEE Real-Time and Embedded Technology and Applications Symposium (2010)
11. Wan, P., Huang, S., Wang, L., Wan, Z., Jia, X.: Minimum latency aggregation scheduling in multihop wireless networks. In: MobiHoc, pp. 185–193 (2009)
12. Ramchandran, I., Roy, S.: Clear channel assessment in energy-constrained wideband wireless networks. IEEE Wireless Magazine, 70–78 (2007)
13. Dai, H., Ng, K., Wu, M.: An overview of MAC protocols with directional antennas in wireless ad hoc networks. In: Proc. International Conference on Computing in the Global Information Technology, pp. 84–91 (2006)
14. MacDougall, M.: Simulating Computer Systems: Techniques and Tools. MIT Press, Cambridge (1987)
15. Ben Salem, N., Hubaux, J.-P.: A Fair Scheduling for Wireless Mesh Networks. In: The First IEEE Workshop on Wireless Mesh Networks (WiMesh) (2005)
A Novel Approach Based on Fault Tolerance and Recursive Segmentation to Query by Humming

Xiaohong Yang, Qingcai Chen, and Xiaolong Wang

Department of Computer Science and Technology, Key Laboratory of Network Oriented Intelligent Computation, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, China
{yxh2008,qingcai.chen}@gmail.com, [email protected]
Abstract. With the explosive growth of digital music, content-based music information retrieval, especially query by humming/singing, has been attracting more and more attention and has become a popular research topic over the past decade. Although query by humming/singing provides a natural and intuitive way to search for music, a retrieval system still confronts many issues, such as key modulation, tempo change, and note insertion, deletion or substitution, which are caused by the users and by query transcription, respectively. In this paper, we propose a novel approach based on fault tolerance and recursive segmentation to solve the above problems. Music melodies in the database are represented in a specified manner and indexed using an inverted index. A query melody is first segmented into phrases recursively with a musical dictionary. Then an improved edit distance, pitch deviation, and overall bias are employed to measure the similarity between phrases and indexed entries. Experimental results reveal that the proposed approach can achieve high recall for music retrieval.

Keywords: Query by humming, Melody partition, Edit distance, Fault tolerance, Recursive segmentation.
1 Introduction
As is well known, the prevalent music search engines such as Google [1] and Yahoo [2] primarily take textual metadata such as song title, singer name, composer, album or lyrics as keywords to retrieve music, and then return a list of link addresses of related music to users. However, current text-based music retrieval approaches may not be convenient for users when they forget the metadata associated with the desired music. Therefore, alternative music retrieval methods are necessary when people wish to search for music and only remember some short parts of its melody. Content-based music information retrieval (MIR) can be regarded as a good complementary music searching method that is more natural and intuitive to users. Hence, content-based MIR has been attracting widespread attention and has become an important research topic with the explosive expansion of digital music on the World Wide Web (WWW) over the past decades. Content-based
MIR systems ([3], [4]) usually utilize extracted features such as pitch, duration, rhythm, contour, and chord to represent music melody and retrieve music from a large-scale music database. Query by humming (QBH) is one of the most natural content-based MIR methods; it takes a fragment of melody hummed, sung, or whistled by users via a microphone as the query to search for music ([5], [6], [7]). A QBH system mainly contains the following components: i) an automatic query transcription module, which transforms the acoustic input into a note sequence or melody contour; ii) a melody representation and indexing module for the music in the database; and iii) a searching and matching module, which retrieves related melodies from the database, carries out similarity measurement between the query and the indexed melodies, and then returns a ranked list of candidates to users. The framework of the proposed QBH system is illustrated in Fig. 1, where component i belongs to the front-end (FE) module and the back-end (BE) module includes components ii and iii.
Fig. 1. The framework of the proposed QBH system (query input → front-end module → note sequence → back-end module, which consults the music database and returns ranked candidates)
Although the QBH method can provide an intuitive way to search for music from a short piece of melody remembered in the user's mind, its practical implementation raises many issues that require better techniques to resolve. In the front-end module, the quality of the queries and the accuracy of the automatic query transcription affect the final performance of a QBH system [8]. Due to the wide variety of musical backgrounds, with or without musical training, people often cannot sing a song with the same key and rhythm as the original music. Consequently, users usually produce singing errors such as key modulation, variable rhythm, wrong notes, and note insertion or omission when singing or humming pieces of songs as input queries without accompaniment. Even with perfect queries, it is still difficult for the automatic query transcription module to extract accurate pitches. Most errors appearing in pitch extraction are caused by capturing double or half pitches, which results in notes being split into several notes or merged into one note [9]. To summarize the above, the searching and matching methods used in the back-end module must be robust enough to deal with the singing errors and note segmentation errors passed on by the front-end module. This work proposes a novel approach based on fault tolerance and recursive segmentation to handle the aforementioned problems. The query transcription module tracks the fundamental frequency F0 of the input query and converts it into a note
sequence. We use only the pitch information and represent the query and the music melodies in the database as melody strings. As in Chinese word segmentation, the query is partitioned into several phrases using a musical dictionary composed of repeating patterns, which denote motives or musical themes. Music melodies in the database are segmented in the same way and indexed. When matching the query against the indexed melodies, an edit distance with a fault-tolerant mechanism is employed to measure similarity. Because of imperfect recall of the desired music, pitches often deviate from their true values; therefore, when calculating the edit distance of two phrases, if the absolute value of the pitch difference between two notes is smaller than or equal to a predefined threshold, the proposed approach deems the two notes identical. Since note insertion or omission in the query leads to inaccurate partitioning, we employ recursive segmentation to overcome this problem and search for music. The remainder of the paper is organized as follows. Section 2 describes melody representation and partition with the musical dictionary. Section 3 first introduces query transcription and then describes the proposed approach based on fault tolerance and recursive segmentation. Experimental results are reported in Section 4. The conclusions of this work are summarized in Section 5.
2 Melody Representation
So far, content-based MIR systems have primarily employed a note-based or a symbol-based manner to represent melody. In the former systems, music melodies are directly extracted from digital music stored in MP3, MIDI, wav, or score formats and transformed into note sequences by extracting pitch and duration features. The latter utilize a relative-value-based melody representation to reduce the difference between the hummed query and the melodies in the database [3]. Relative pitch can be obtained by subtracting the previous pitch from the current pitch for each note, and relative duration can be calculated by dividing the duration of the current note by that of the previous note.
2.1 Melody Representation
This work integrates the note-based and symbol-based methods to represent melody, and employs melody strings to represent the transcribed query and the music melodies in the database. When measuring similarity between the query and the music melodies, relative pitch is used to calculate the edit distance. In addition, this work follows the format of ABC music notation to describe pitch names in different octaves. In order to represent the twelve semitones of every octave more conveniently, extra letters such as R, S, T, U, and V are introduced into the pitch-naming scheme. Therefore, following the format of ABC music notation, the semitones in different octaves are defined as follows. Twelve uppercase letters (C R D S E F T G U A V B) denote the semitones of the central octave, and one comma is appended at the bottom-right corner of each letter for those below the central
octave. More commas are added for lower-octave semitones by analogy. Similarly, lowercase letters (c r d s e f t g u a v b) are used to represent the semitones above the central octave, and more commas are appended at the top-right corner of each letter for higher-octave semitones. Table 1 shows an example of this melody representation for short pieces of some songs. Music melodies are first extracted from the digital music in the database, then parsed and presented in the above format.

Table 1. An example of melody representation (SongId, melody string)
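As an illustration of this naming scheme (our sketch, not code from the paper), the mapping below converts MIDI note numbers into the twelve per-octave letters; the trailing commas and apostrophes are our rendering of the paper's "comma at the bottom-right / top-right corner" convention for lower and higher octaves.

```python
# Sketch (not from the paper): MIDI note number -> pitch letter in the paper's scheme.
# Central octave (MIDI 60-71) uses C R D S E F T G U A V B; e.g. 69 -> 'A', 72 -> 'c'.
UPPER = "CRDSEFTGUAVB"          # central octave
LOWER = UPPER.lower()           # octave above the central one

def midi_to_name(midi):
    octave, step = divmod(midi - 60, 12)      # octave 0 = central octave
    if octave == 0:
        return UPPER[step]
    if octave > 0:
        return LOWER[step] + "'" * (octave - 1)   # apostrophes stand in for top-right marks
    return UPPER[step] + "," * (-octave)          # commas mark octaves below the central one

# Example consistent with Table 2: [69, 72, 70, 69, 68] -> "AcVAU".
print("".join(midi_to_name(m) for m in (69, 72, 70, 69, 68)))
```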
Most musical works are composed according to musical forms that contain rules such as repetition, comparison, transposition, and extension for developing musical themes [10]. In particular, the repetition rule means that specific note sequences, known as motives, appear repeatedly in movements [11]. As shown in Table 1, note sequences like "F-c-c-c-V-A" and "c-V-A-G" appear repeatedly in the melody string of SongId 1030. Any sequence of notes appearing more than once in a music melody is considered a repeating pattern. Therefore, repeating patterns usually stand for motives or musical themes and can be used for theme mining. Based on the ground-truth MIDI files used for the MIREX 2008 Query-by-Singing/Humming evaluation task and the Essen Folk Song Database [12], which is a collection of more than 8400 European and Chinese folk songs written in ABC music notation, all melodies are first parsed into the format described in the previous subsection, and then the frequencies of the distinct repeating patterns are calculated. If the statistical frequency of a distinct repeating pattern satisfies a predefined threshold, the repeating pattern is selected as a candidate lexical item. In addition, the statistics on the frequencies of repeating patterns show that the frequency of occurrence decreases as the length of the repeating pattern increases. Therefore, it is not sensible to select overly long repeating patterns as lexical items. The same is true for very short repeating patterns, since their semantic meanings are ambiguous and, like stopwords, they appear too frequently among music melodies to be good discriminators for musical works. If the length of a candidate falls in the preset range, it is chosen as a final word of the musical dictionary, which is stored in a hash table structure.
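A minimal version of this dictionary-building step might look as follows (our sketch, not the authors' implementation). For simplicity it assumes one character per note, counts pattern occurrences over the whole corpus rather than first extracting within-song repetitions, and uses placeholder values for the frequency threshold and length range, which the paper does not give exactly.

```python
from collections import Counter

# Sketch (not from the paper): collect frequent repeating patterns as dictionary words.
def build_dictionary(melodies, min_len=3, max_len=10, min_freq=2):
    """melodies: iterable of melody strings; returns a hash-table (set) of words."""
    counts = Counter()
    for melody in melodies:
        for n in range(min_len, max_len + 1):
            for i in range(len(melody) - n + 1):
                counts[melody[i:i + n]] += 1
    # Keep only patterns that repeat often enough and fall in the length range.
    return {pattern for pattern, freq in counts.items() if freq >= min_freq}
```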
2.3 Melody Partition
Analogous to Chinese information retrieval, transcribed queries and music melodies, which have no obvious separators, should be partitioned into phrases with the musical dictionary before retrieval. For each melody in the database, the optimal longest repeating string (OLRS) is extracted first. The melody is partitioned into several passages according to the OLRS by recursion, which terminates when no OLRS remains or the length of the OLRS is smaller than a predefined threshold. Then word segmentation, in other words repeating-pattern partition, is carried out for every passage with the musical dictionary using the backward maximum matching method.
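The backward maximum matching step can be illustrated as follows (our sketch, not the authors' implementation); `dictionary` is assumed to be the hash table of words built above, and unmatched single notes are emitted as one-note tokens, which is our own fallback choice.

```python
# Sketch (not from the paper): backward maximum matching over a melody passage.
def backward_maximum_matching(passage, dictionary, max_word_len=10):
    """Segment a passage (string of note symbols) into words, scanning right to left."""
    words, end = [], len(passage)
    while end > 0:
        start = max(0, end - max_word_len)
        # Try the longest candidate ending at `end` first.
        for begin in range(start, end):
            if passage[begin:end] in dictionary:
                words.append(passage[begin:end])
                end = begin
                break
        else:
            words.append(passage[end - 1:end])   # fall back to a single note
            end -= 1
    words.reverse()
    return words
```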
Fig. 2. An example of melody partition and inverted index construction (melody string → melody segmentation → word segmentation → inverted index construction)
An inverted index is constructed over all words in order to speed up music searching and matching. The inverted index structure consists of two elements: the lexicon, which is the set of all distinct words appearing in the music melodies, and the hits list, which records the occurrences of a word in a music melody, including position, frequency, and other necessary information. Fig. 2 depicts an example of melody partition and inverted index construction for the melody string of SongId 1030 in Table 1.
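A minimal inverted index of this kind can be sketched as follows (our illustration; the exact hit-list fields used by the authors are not specified beyond position and frequency, so only those are stored here).

```python
from collections import defaultdict

# Sketch (not from the paper): lexicon + hits list keyed by word.
def build_inverted_index(segmented_melodies):
    """segmented_melodies: dict song_id -> list of words (output of the segmentation step)."""
    index = defaultdict(list)          # word -> list of (song_id, position) hits
    for song_id, words in segmented_melodies.items():
        for position, word in enumerate(words):
            index[word].append((song_id, position))
    return index

# Usage: candidate songs for a query phrase are those appearing in its hit list.
# index = build_inverted_index({"1030": ["FcccVA", "cVAG"]})
```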
3 Proposed QBH Approach
In this section, we first introduce query transcription and similarity measurement, and then describe the proposed approach based on fault tolerance and recursive segmentation to music retrieval.
3.1 Query Transcription
As illustrated in Fig. 1, the queries hummed or sung by users via a microphone are acoustic signals. The front-end module of a QBH system should first convert the queries
into a note sequence [13]. For this task, the automatic transcription component in the FE module adopts the YIN algorithm, based on the well-known autocorrelation method, to estimate the fundamental frequencies of the query signals [14]. First, we estimate the fundamental frequency of each frame using the YIN algorithm in a 25 ms analysis window with a 5 ms interval between successive frames. If the difference of the fundamental frequencies between consecutive frames is small enough to satisfy the preset threshold, these consecutive frames are merged and their average frequency is regarded as their fundamental frequency. Then, following the MIDI standard format, all frequencies are transformed into semitones using a logarithmic function; the semitone number is calculated by the following equation:

Semitone = 69 + (12 / log 2) × log(f0 / 440)    (1)
where f0 indicates the fundamental frequency. Even though the YIN algorithm has high precision in detecting the fundamental frequency, it is unavoidable that users or the query transcription introduce errors; therefore, post-processing is necessary. After obtaining the average of all semitones in a query, if the difference between a semitone and the average is equal to or greater than one octave, the semitone is set to 0.
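Equation (1) and the post-processing rule translate directly into code (our sketch; frame-level F0 estimation itself is assumed to come from a YIN implementation and is not shown).

```python
import math

# Sketch (not from the paper): Eq. (1) plus the octave-outlier post-processing.
def hz_to_semitone(f0):
    """MIDI-style semitone number; e.g. 440 Hz -> 69."""
    return 69 + 12 * math.log2(f0 / 440.0)

def clean_semitones(semitones):
    """Zero out semitones lying an octave or more away from the query average."""
    avg = sum(semitones) / len(semitones)
    return [0 if abs(s - avg) >= 12 else s for s in semitones]

print(round(hz_to_semitone(440)))   # 69
```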
3.2 Similarity Measurement
Edit distance (ED) is a common distance metric for measuring the similarity between two symbol sequences. On the one hand, our QBH system employs melody strings to represent the input query and the database entries; on the other hand, in order to compensate for user singing or humming errors, relative pitch is better suited to portray the coarse melody contour. Therefore, the ED method is suitable for calculating the similarity between the transcribed query and the indexed melodies using relative pitch. The edit distance between two sequences is defined as the minimal number of note insertions, deletions, and substitutions necessary to transform the source sequence into the target sequence. By selecting an appropriate cost function, the ED can take user errors into consideration: the insertion cost covers extra hummed notes, the deletion cost accounts for skipped notes [9], and the substitution cost penalizes wrong notes relative to the expected reference notes. The ED approach uses dynamic programming to calculate the melody distance D(Q, P) between the transcribed query Q = q1 q2 ... qM and an indexed entry P = p1 p2 ... pN by completing an (M+1) × (N+1) distance matrix D. The cell D(i,j) denotes the minimal melody distance between the two prefix sequences Q(1..i) and P(1..j). For 1 ≤ i ≤ M and 1 ≤ j ≤ N, the value of each cell is calculated by the following recursive formula:

D(i,j) = min{ D(i,j−1) + 1, D(i−1,j) + 1, D(i−1,j−1) + Cost(i,j) }    (2)
with initial conditions D(0,0) = 0, D(i,0) = i for i = 1, ..., M, and D(0,j) = j for j = 1, ..., N. The above recursive formula defines a constant penalty of 1 for note insertion and deletion, while the cost for note substitution is 0 if the i-th note of Q is equal to the j-th note of P and 1 otherwise:

Cost(i,j) = 0 if q_i = p_j, and 1 otherwise    (3)

The final edit distance between the two sequences Q and P is the value of the last cell D(M,N) of the distance matrix.
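The dynamic program of Eqs. (2)-(3) is small enough to state in full (our sketch, not the authors' code); the substitution cost is kept pluggable so that the fault-tolerant cost of the next subsection can be dropped in.

```python
# Sketch (not from the paper): edit distance with a pluggable substitution cost.
def edit_distance(q, p, cost=lambda a, b: 0 if a == b else 1):
    m, n = len(q), len(p)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i                      # deletions only
    for j in range(1, n + 1):
        d[0][j] = j                      # insertions only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i][j - 1] + 1,                 # insertion
                          d[i - 1][j] + 1,                 # deletion
                          d[i - 1][j - 1] + cost(q[i - 1], p[j - 1]))  # substitution/match
    return d[m][n]

# With the tolerant cost of Section 3.3 this reproduces the example of Fig. 4:
tolerant = lambda a, b: 0 if abs(a - b) <= 1 else 1
print(edit_distance([-4, -2, -2, 2, -1, 2, 2], [-1, -2, -2, 2, 5], tolerant))  # 3
```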
3.3 Fault Tolerance Mechanism
On account of imperfect memory or a lack of professional knowledge about music, people are usually unable to keep the exact key and tempo when humming or singing. This means that errors like key modulation and tempo change caused by users are inevitable. In order to alleviate the impact of these faults, and of other flaws such as note insertion, deletion, and substitution, on the final retrieval performance, it is indispensable for a QBH system to apply fault-tolerant approaches that promote the robustness of the retrieval system.

Pitch Deviation. Even if music is hummed faster or slower than the correct version, or sung in a higher or lower key, it can still be recognized easily by listeners. This phenomenon indicates that music is time-scalable and tone-shiftable [15]. Since relative pitch, which is commonly employed to portray the melody contour, is insensitive to the music key, it is better than absolute pitch for describing the melody. In the same way, relative duration, in other words the duration ratio, is used for time comparison due to its insensitivity to the music tempo. Consequently, in this work relative pitch is used to calculate the distance between the transcribed query and the indexed entries.

Table 2. Relative pitches of phrases

Phrases        | AcVAU           | AAcVAG
Absolute pitch | 69 72 70 69 68  | 69 69 72 70 69 67
Relative pitch | +3 -2 -1 -1     | 0 +3 -2 -1 -2
After the transcribed query and the music melodies, represented by melody strings, are partitioned into phrases, the relative pitch sequence of each phrase is calculated by subtracting the previous pitch from the current pitch. An example of the relative pitches of
phrases is shown in Table 2. According to our analysis of the relative pitches of phrases, the absolute difference between two corresponding relative values of two phrases that are deemed the same is usually smaller than or equal to one halftone. For the phrases "AcVAU" and "AAcVAG" in Table 2, we can consider the two phrases the same even though the latter inserts a zero and substitutes -2 for -1, since the pitch difference is just 1. Users inevitably introduce pitch deviation, but a difference satisfying a predefined threshold is permitted. In the proposed approach, the threshold of admitted pitch deviation is set to 1, namely one halftone. Therefore, we adjust the cost function as follows when calculating the edit distance between two phrases:

Cost(i,j) = 0 if |Q_i − P_j| ≤ 1, and 1 if |Q_i − P_j| > 1    (4)

where Q_i indicates the i-th value of the relative pitch sequence Q and P_j is the j-th value of the relative pitch sequence P. Although a pitch deviation of one semitone is admissible, the number of relative pitches that differ by one halftone is restrained; otherwise, two phrases could not be regarded as the same if too many of their relative pitches deviated. For example, the relative pitch sequences "+3 -2 -1 -1" and "+2 -2 0 -2" are not considered identical, since three relative pitches carry a semitone error while the phrases consist of only five notes. Therefore, the proposed approach specifies an upper bound on the number of relative pitches with halftone deviation in terms of the phrase length.

Overall Bias. If a numerical value is inserted into or deleted from a relative pitch sequence and the difference between it and its previous or following pitch is large, the inserted or deleted value alters the original melody contour. Similarly, replacing one value with a very different number in the relative pitch sequence has the same effect. These cases often take place when users input a query by humming or singing a short part of a piece of music: being unfamiliar with the music, users modify the melody with errors like the note insertion, deletion, or substitution demonstrated in Fig. 3. In order to tolerate these faults, we allow the preceding problems to happen, but the difference between a relative pitch inserted, deleted, or substituted by the user or the query transcription module and its adjacent values must be in the range [-1, 1]. This restriction ensures that the melody contour does not vary drastically. While calculating the edit distance between two phrases over their relative pitch sequences according to equations (2) and (4), the overall bias caused by note insertion, deletion, and substitution is accumulated. Given relative pitch sequences S = s_1 s_2 ... s_M and T = t_1 t_2 ... t_N for the query and indexed phrases, respectively, an edit distance matrix D and a flag matrix F with M+1 rows and N+1 columns are constructed first. In the flag matrix, the symbols '←' and '↑' stand for insertion and deletion operations, respectively, and '↖' denotes a substitution if |S_i − T_j| > 1, or a match (the two relative pitches deemed the same) if |S_i − T_j| ≤ 1. Along with the recursive computation of the edit distance between S and T, the flag matrix
Fig. 3. Insertion, deletion and substitution of relative pitch (e.g., insertion: +3 -2 -1 -1 2 → +3 -2 (+1) -1 -1 2; deletion: +3 -2 (-1) -1 2 → +3 -2 -1 2; substitution: +3 -2 -1 (-1) 2 → +3 -2 -1 (-2) 2)
is filled step by step. For 1 ≤ i ≤ M and 1 ≤ j ≤ N, the matrix F is completed according to the following formula:

F(i,j) = '←' if D(i,j) = D(i,j−1) + 1; '↑' if D(i,j) = D(i−1,j) + 1; '↖' if D(i,j) = D(i−1,j−1) + Cost(i,j)    (5)

with initial conditions F(i,0) = '↑' for i = 1, ..., M and F(0,j) = '←' for j = 1, ..., N.
In particular, when two or all three of the conditions D(i,j) = D(i,j−1) + 1, D(i,j) = D(i−1,j) + 1, and D(i,j) = D(i−1,j−1) + Cost(i,j) hold in one step, we give them the following priority: D(i,j) = D(i,j−1) + 1 first, D(i,j) = D(i−1,j) + 1 second, and D(i,j) = D(i−1,j−1) + Cost(i,j) last. When the recursion is over, both the edit distance matrix and the flag matrix are filled. Starting from the last cell F(M,N) of the flag matrix F, the optimal alignment path can be traced back along the direction of the arrows, and the insertion, deletion, and substitution operations can be recognized from the arrow symbols on that path. The overall bias can then be summed up using the aforementioned method. For the source sequence "-4 -2 -2 2 -1 2 2" and the target sequence "-1 -2 -2 2 5", Fig. 4 illustrates the calculation of the edit distance and the filling of the flag matrix under the proposed approach; the edit distance is 3 and the shaded cells indicate the optimal alignment path.

Dynamic Threshold. Obviously, as the length of the query increases, the number of errors like note insertion, deletion, and substitution becomes larger. In order to deal with this issue, the maximal length of words in the musical dictionary is limited to ten notes. Simultaneously, the three thresholds on the edit distance, the number of pitch deviations, and the overall bias vary dynamically with the number of notes in the query phrase: the longer the query phrase, the greater the thresholds, which enhances the effect of fault tolerance.
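Putting Eq. (5), the tie-breaking priority, and the traceback together gives the compact sketch below (ours, not the authors' code). How exactly the overall bias is accumulated from the recovered operations is not fully specified in the text, so only the operation classification is shown.

```python
# Sketch (not from the paper): fill the flag matrix of Eq. (5) and trace back the path.
def align_operations(s, t, tol=1):
    m, n = len(s), len(t)
    cost = lambda a, b: 0 if abs(a - b) <= tol else 1
    D = [[0] * (n + 1) for _ in range(m + 1)]
    F = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0], F[i][0] = i, "up"
    for j in range(1, n + 1):
        D[0][j], F[0][j] = j, "left"
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(D[i][j - 1] + 1, D[i - 1][j] + 1,
                          D[i - 1][j - 1] + cost(s[i - 1], t[j - 1]))
            # Tie priority from the text: left (insertion), then up (deletion), then diagonal.
            if D[i][j] == D[i][j - 1] + 1:
                F[i][j] = "left"
            elif D[i][j] == D[i - 1][j] + 1:
                F[i][j] = "up"
            else:
                F[i][j] = "diag"
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if F[i][j] == "left":
            ops.append(("insert", t[j - 1])); j -= 1
        elif F[i][j] == "up":
            ops.append(("delete", s[i - 1])); i -= 1
        else:
            kind = "match" if cost(s[i - 1], t[j - 1]) == 0 else "substitute"
            ops.append((kind, s[i - 1], t[j - 1])); i -= 1; j -= 1
    return D[m][n], list(reversed(ops))

# Fig. 4 example: distance 3 for these two relative pitch sequences.
print(align_operations([-4, -2, -2, 2, -1, 2, 2], [-1, -2, -2, 2, 5])[0])  # 3
```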
3.4 Recursive Segmentation
Inspired by Chinese information retrieval, in which documents are segmented into words and keywords are then used to search for relevant information, in our QBH system
Fig. 4. Edit distance calculation and flag matrix for the source sequence "-4 -2 -2 2 -1 2 2" (rows S_i) and the target sequence "-1 -2 -2 2 5" (columns T_j); the resulting edit distance is 3, and the shaded cells mark the optimal alignment path.
an analogous method is employed to process music search by humming or singing. Prior to performing retrieval, segmentation is carried out to partition the melody string of the query into phrases, which is rather different from Chinese word segmentation. The purpose of recursive segmentation is to tolerate, as far as possible, faults caused by the partition, since it is difficult to judge whether a partition is right or not. Starting from the beginning of the melody string, a phrase of maximal length, namely ten notes, is extracted and its relative pitch sequence is calculated. According to the length of this phrase, some indexed entries are selected as preliminary candidates. The edit distance, the number of pitch deviations, and the overall bias described in the previous subsection are calculated between this phrase and every preliminary candidate according to the fault-tolerance principle. If all of them satisfy the predefined dynamic thresholds, the music containing the preliminary candidate is regarded as a relevant result, and the end position of the phrase is added to a position set; otherwise, the music is deemed irrelevant to the query. Subsequently, one note is removed from the end of the above phrase, which generates a new phrase, and the same operations are applied to the new phrase to search for relevant candidate music. This recursion continues until the length of the new phrase becomes shorter than or equal to a prespecified threshold. Then the position set is sorted in ascending order, the minimal element is selected as the new position from which recursive segmentation continues, and at the end of each recursion this minimal element is crossed out. When the selected element of the position set is equal to or greater than the end of the query melody, the whole partition of the input query terminates.
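A simplified rendering of this procedure is given below (our sketch, not the authors' implementation); `match_candidates` stands for the index lookup plus the threshold tests of Section 3.3 and is assumed rather than shown.

```python
# Sketch (not from the paper): recursive segmentation of the query melody string.
def recursive_search(query, match_candidates, max_len=10, min_len=3):
    """match_candidates(phrase) -> (set of relevant song ids, bool matched)."""
    results, positions = set(), {0}
    while positions:
        start = min(positions)
        positions.discard(start)
        if start >= len(query):
            break
        # Shrink the phrase one note at a time, testing candidates at each length.
        for length in range(min(max_len, len(query) - start), min_len - 1, -1):
            phrase = query[start:start + length]
            songs, matched = match_candidates(phrase)
            if matched:
                results |= songs
                positions.add(start + length)   # remember where this phrase ends
    return results
```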
4 Experiments
4.1 Dataset
Our experiments use the corpus of the MIREX 2008 Query by Singing or Humming evaluation task to evaluate the proposed approach. The dataset consists of 154 monophonic ground-truth MIDI files (MIDI format 0 or 1) provided by Roger Jang and ThinkIT, together with 2000 non-target songs, as the music database. The non-target songs are randomly selected from the Essen Folk Song Database in ABC music notation format. Music melodies are extracted from the MIDI or ABC files and represented by melody strings in the manner described in subsection 2.1. All melody strings are segmented into phrases, on which the inverted index is built. The acoustic input of our QBH system comprises 355 queries from the ThinkIT corpus. There is no "singing from the beginning" guarantee for these queries; in other words, the queries may be sung or hummed from any position in a song. Moreover, the singing styles are diverse: most queries are sung with lyrics, a small portion are hummed with syllables such as /Di/ and /Da/, and a very few are hummed with a nasal voice. The singing languages include Mandarin and Cantonese. All queries are digitized as 8 kHz, 16-bit, 128 kbps PCM.
4.2 Environment
The experiments are conducted on a personal computer with an Intel(R) Core(TM) 2 Quad 2.40 GHz processor and 2 GB of memory, running the MS Windows XP operating system. All code is written in ANSI C++ and compiled with all optimization options.
4.3 Performance
In order to investigate the effect of the proposed approach with the three dynamic thresholds on edit distance (ED), pitch deviation (PD), and overall bias (OB), we carry out several experiments. The results are shown in Table 3, and the performance is measured by retrieval recall. In the table, "ED 1-4: 0, 5-6: 1, 7-10: 2" means that when the number of notes in the query phrase is 1-4, 5-6, or 7-10, the threshold on ED is 0, 1, or 2, respectively; the specifications of the other two thresholds, PD and OB, are similar. Due to users' imperfect memory and reproduction of melodies, there are more errors as the length of the query increases; consequently, the admitted thresholds on ED, PD, and OB increase gradually. All parameter values are obtained through repeated comparisons and experiments. From the results we find that the retrieval recall of the proposed approach is very high, reaching up to 99.2%; it is directly proportional to the threshold on PD and inversely proportional to those on ED and OB. Without loss of generality, we repeat the experiments by randomly selecting another 2000 songs from the Essen Folk Song Database and building the inverted index for the new dataset. Further, the experimental parameters are modified
more concretely, especially the threshold on pitch deviation. The results of the first and second experiments in Table 4 indicate that pitch deviation is very common in queries and that the recall can be improved to 99.4% by enlarging this parameter. However, changing only the PD threshold usually produces more noise; therefore, it is necessary to decrease the other two thresholds to eliminate irrelevant data. A comparison between the second and third experimental results demonstrates that the performance can be raised to 99.7% by restraining the thresholds on ED and OB more strictly while simultaneously enlarging that on PD.

Table 4. Experimental results-II (ED: edit distance, PD: pitch deviation, OB: overall bias)
5 Conclusions
Without singing experience, or being unfamiliar with the desired music, users usually introduce many faults such as key alteration, tempo change, and note insertion, deletion, and substitution. Even query transcription with high precision may cause errors like note splitting or unification. In order to solve the above issues, this work proposes a novel approach based on fault tolerance and recursive segmentation for a QBH
system. Since users cannot remember the melody accurately, they often produce pitch deviation when singing or humming. Therefore, we permit pitch deviations within a semitone when calculating the edit distance between a query phrase and the indexed entries using relative pitch. However, the proposed approach restricts the number of notes with pitch deviation; otherwise, the melody contour would change drastically. Moreover, the overall bias caused by note insertion, deletion, and substitution is not allowed to be large, since that would introduce more errors. In order to enhance the robustness of the QBH system, the thresholds on ED, PD, and OB vary dynamically with the length of the query phrase. Recursive segmentation is employed to partition the query melody and search for relevant music. Experimental results show that the performance of the proposed approach is outstanding. As a next step, the duration of notes will be added to the calculation of the edit distance to improve the retrieval precision.

Acknowledgments. The authors would like to thank the anonymous reviewers for helpful suggestions. This investigation is supported in part by the National Natural Science Foundation of China (No. 60703015 and No. 60973076).
References
1. Google, http://www.google.cn/music/homepage
2. Yahoo, http://music.cn.yahoo.com/
3. Heo, S.P., Suzuki, M., Ito, A., Makino, S.: An Effective Music Information Retrieval Method Using Three-Dimensional Continuous DP. IEEE Trans. on Multimedia 8(3), 633–639 (2006)
4. Suyoto, I.S.H., Uitdenbogerd, A.L., Scholer, F.: Searching Musical Audio Using Symbolic Queries. IEEE Trans. on Audio, Speech and Language 16(2), 372–381 (2008)
5. Pauws, S.: CubyHum: A Fully Operational Query by Humming System. In: Proceedings of the International Society for Music Information Retrieval Conference (ISMIR) (2002)
6. Unal, E., Narayanan, S., Chew, E., Georgiou, P.G., Dahlin, N.: A Dictionary Based Approach for Robust and Syllable-Independent Audio Input Transcript for Query by Humming Systems. In: Proceedings of International Conference of ACM Multimedia 2006, pp. 37–43 (2006)
7. Ryynanen, M., Klapuri, A.: Query by Humming of MIDI and Audio Using Locality Sensitive Hashing. In: Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 2249–2252 (2008)
8. Mulder, T.D., Martens, J.P., Pauws, S., Vignoli, F., Lesaffre, M.: Factors Affecting Music Retrieval in Query-by-Melody. IEEE Trans. on Multimedia 8(4), 728–739 (2006)
9. Unal, E., Chew, E., Georgiou, P.G., Narayanan, S.S.: Challenging Uncertainty in Query by Humming Systems: A Fingerprinting Approach. IEEE Trans. on Audio, Speech and Language 16(2), 359–371 (2008)
10. Hsu, J.L., Liu, C.C., Chen, A.L.P.: Discovering Nontrivial Repeating Patterns in Music Data. IEEE Trans. on Multimedia 3(3), 311–325 (2001)
11. Liu, N.-H., Wu, Y.-H., Chen, A.L.P.: An efficient approach to extracting approximate repeating patterns in music databases. In: Zhou, L.-z., Ooi, B.-C., Meng, X. (eds.) DASFAA 2005. LNCS, vol. 3453, pp. 240–252. Springer, Heidelberg (2005)
12. Essen Folk Song Database, http://www.esac-data.org/
13. Rho, S., Han, B.J., Hwang, E., Kim, M.: MUSEMBLE: A Novel Music Retrieval System with Automatic Voice Query Transcription and Reformulation. The Journal of Systems and Software, 1065–1080 (2008)
14. de Cheveigné, A., Kawahara, H.: YIN, a Fundamental Frequency Estimator for Speech and Music. Journal of the Acoustical Society of America 111(4), 1917–1930 (2002)
15. Kosugi, N., Sakurai, Y., Morimoto, M.: SoundCompass: A Practical Query-by-Humming System. In: Proceedings of ACM SIGMOD 2004 (2004)
Chinese Prosody Generation Based on C-ToBI Representation for Text-To-Speech

Byeongchang Kim

School of Computer & Information Communications Engineering, Catholic University of Daegu, Gyeongbuk, South Korea
[email protected]
Abstract. Prosody modeling is critical in developing text-to-speech (TTS) systems, where speech synthesis is used to automatically generate natural speech. In this paper, we present a prosody generation architecture based on Chinese Tone and Break Index (C-ToBI) representation. ToBI is a multi-tier representation system based on linguistic knowledge for transcribing events in an utterance. A TTS system that adopts ToBI as an intermediate representation is known to exhibit higher flexibility, modularity, and domain/task portability than direct prosody generation TTS systems. We model Chinese prosody generation as a classification problem and apply conditional Maximum Entropy (ME) classification to this problem. We empirically verify the usefulness of various natural language and phonology features for building well-integrated features for the ME framework.
1 Introduction
One of the most critical problems in text-to-speech (TTS) systems is the appropriate control of prosodic features such as prosodic phrases, fundamental frequency (F0, pitch) contours, and segmental duration patterns. In TTS systems especially, prosodic phrasing and fundamental frequency contour generation are the most important tasks for producing natural speech. Chinese is a tonal language, and a syllable is normally taken as the basic prosodic element in processing. Each syllable has a tone and a relatively steady pitch contour. However, in spontaneous speech the pitch contours are transformed from their isolated forms and influence each other in accordance with the contextual information. Therefore, Chinese pitch accent prediction is considered a complicated problem. Many machine learning techniques have been introduced to predict pitch accent, including HMMs, neural networks, decision trees, bagging, and boosting. Xuejing Sun [1] proposed an ensemble decision tree approach to four-class English pitch accent labeling. Their method shows a high accuracy of 80.50% when using only text features; the improvement over the baseline accuracy is 12.28%. Michelle L. Gregory and Yasemin Altun [2] proposed a CRF-based approach to two-class English pitch accent labeling. They reported an accuracy of 76.36%, an improvement of 16.86% over the baseline accuracy.
C-ToBI, as an intermediate representation, normally increases system-level modularity, flexibility, and domain/task portability, but it should be implemented without performance degradation. However, labeling C-ToBI on a speech corpus is normally very laborious and time-consuming, so we propose an automatic C-ToBI labeling method. We treat both phrase break prediction and pitch accent prediction as classification problems and apply a conditional maximum entropy model based on C-ToBI labels. Various kinds of linguistic and phonological information are represented in the form of features. The remainder of this paper is organized as follows. In Section 2, we discuss previous research on prosody modeling. In Section 3, we present our C-ToBI-based phrase break prediction, pitch accent label prediction, and pitch contour generation method. The effectiveness of the proposed method is verified by experimental results in Section 4. Finally, some conclusions are provided in Section 5.
2 Approaches for Intonation Modeling
To address various theoretical issues and build successful applications, many intonation models have been proposed in the past. Intonation models are generally divided into two kinds: phonological models and phonetic models. The goal of a phonological model is to study the universal organization and underlying structure of intonation. Complex intonation patterns are compressed into a set of highly abstract vocabulary with wide coverage, and these symbols are regarded as the basic entities for representing intonation. Such models include the Autosegmental-Metrical (AM) model, the Tone and Break Indices (ToBI) model, and the Institute of Perception Research (IPO) model. A phonetic model uses a set of continuous parameters to describe the intonation patterns observable in an F0 contour. An important goal is that the model should be capable of reconstructing F0 contours faithfully when appropriate parameters are given; moreover, to be functional, a phonetic model should also be linguistically meaningful. Such models include the Fujisaki model, the INternational Transcription System for INTonation (INTSINT) model, the Tilt model, the Möhler and Conkie (PaIntE) model, the Pitch Target Approximation model, template models, and neural network models [3]. Phonetic models can precisely represent various fundamental frequency contours, but it is very difficult to extract the necessary features. Phonological models describe the changes of intonation and generate the fundamental frequency from the intermediate representation. A.W. Black and A.J. Hunt used a ToBI labeling system and a linear regression model [4]. This model requires systematic intonation labels based on phonology and needs to convert the classified intonation patterns into fundamental frequency. The model is simple, but it depends not only on the quality of the speech corpus but also on the accuracy of the labels, so various large corpora are needed for training. Much of the ToBI-related prosody modeling research has been conducted on ToBI label prediction and fundamental frequency contour generation from ToBI
A.W. Black and A.J. Hunt predicted fundamental frequency contours using a linear regression function trained on a manually constructed ToBI-labeled corpus. They obtained a root mean squared error (RMSE) of 34.8 Hz and a correlation coefficient of 0.62 against the originals on the test data, a significant improvement over previous rule-driven methods [4]. They applied the same technique to J-ToBI (Japanese ToBI) and obtained an RMSE of 20.9 Hz and a correlation coefficient of 0.70 [4]. However, these methods require a massive amount of manually constructed ToBI-labeled data. Matthias Jilka proposed a model that predicts ToBI labels and fundamental frequency using prescriptive rules, obtaining an RMSE of 32.4 Hz and a correlation coefficient of 0.605 [5]. Jinseok Lee et al. proposed an automatic corpus-based prosody modeling method that applies a probabilistic method integrated with error correction. They used not only morphological features but also syntax-based features to predict K-ToBI phrase break labels and to generate fundamental frequency contours. The performance was an RMSE of 22.21 Hz and a correlation coefficient of 0.595 [6].
3 C-ToBI-Based Prosody Prediction Model

Fig. 1. Overall architecture of the proposed prosody prediction
Figure 1 shows the overall architecture of our prosody prediction model. The model includes phrase break prediction, pitch accent prediction, and fundamental frequency (F0) contour generation. In our research, we used a previously developed word segmentation and POS tagging system called POSTAG/C [7], an unknown word prediction system [8], and a Chinese pinyin generation system [9].
3.1 Phrase Break Prediction
Linguistic research has suggested that Mandarin is structured in a prosodic hierarchy with mainly three levels of prosodic units: the prosodic word (PW), the prosodic phrase (PP), and the prosodic group (IU) [10]. This differs little from the break index tier of C-ToBI. Since a minor phrase and a major phrase are not sharply distinguishable to listeners, we combined them into a single prosodic phrase. We used four types of boundaries to identify the three levels of Mandarin prosodic structure. Figure 2 shows a prosodic boundary annotation example. In this example, B0 represents no boundary, B1 a prosodic word boundary, B2 a prosodic phrase boundary, and B3 a prosodic group boundary.
Fig. 2. Phrase break tagged example
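For concreteness, a break-tagged sentence can be represented in code as word/label pairs, as in the purely hypothetical sketch below; the example words and the placement of the break indices are invented and are not taken from Figure 2.

```python
# A hypothetical break-tagged sentence: each lexical word is paired with the
# boundary label that follows it (B0 = no boundary, B1 = prosodic word,
# B2 = prosodic phrase, B3 = prosodic group).
tagged_sentence = [
    ("我们", "B0"),
    ("今天", "B1"),
    ("去", "B0"),
    ("北京", "B2"),
    ("开会", "B3"),
]

# The prediction task is to recover the labels from textual features of the words.
words = [w for w, _ in tagged_sentence]
labels = [b for _, b in tagged_sentence]
```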
Using the previously described linguistic analysis systems, we can extract two kinds of features for phrase break prediction. We then use a conditional maximum entropy (ME) model, whose parameters are estimated by the L-BFGS method [11][12], to predict the phrase breaks (see Figure 3). The two kinds of features we employed are as follows:

1. Syntactic features:
– Lexical word features: the current, previous, and next lexical words: W0, W-1 and W1. They are binary features.
– POS tag features: POS tag information is one of the most widely used predictors in prosody modeling. The same word may take different part-of-speech (POS) tags, and these carry different prosodic information. In this paper, POS is divided into 43 categories to capture the variations of prosodic features. We include the current, previous two, and next two POS tags: P0, P-2, P-1, P1 and P2. The POS tag features are binary features.

2. Numerical features:
– Length features: In Chinese speech, the word is the basic pronunciation unit, so the word is the basic unit of a phrase. Some of the prosodic characteristics of a phrase, such as the final lengthening effect, also appear at the word level. Therefore, for an n-syllable word, the word length n is used to obtain the corresponding prosodic information. The length features are the word lengths, in characters, of the current, previous two, and next two words: WLEN0, WLEN-2, WLEN-1, WLEN1 and WLEN2. Word lengths are floating-point features, normalized as follows:

$$\text{Normalized word length} = \frac{\text{Current word length}}{\text{Maximum word length}} \quad (1)$$
Fig. 3. Phrase break prediction process
– Distance features: the distances, in characters, from the current point to the beginning (dis start) and to the end (dis end) of the sentence. Distance features are also floating-point features, normalized as follows:

$$\text{Normalized distance} = \frac{\text{Distance}}{\text{Sentence length}} \quad (2)$$

A small sketch of these normalized numerical features is given below.
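As a concrete illustration of the two normalized numerical features in Eqs. (1) and (2), here is a minimal Python sketch; the function names and the example values are ours, not part of the original system.

```python
# Hypothetical helpers for the normalized numerical features of Section 3.1.

def normalized_word_length(word, max_word_length):
    """Eq. (1): word length in characters, scaled by the longest word in the corpus."""
    return len(word) / max_word_length

def normalized_distances(char_index, sentence_length):
    """Eq. (2): distances (in characters) to sentence start and end, scaled by sentence length."""
    dis_start = char_index / sentence_length
    dis_end = (sentence_length - char_index) / sentence_length
    return dis_start, dis_end

# Example: a two-character word and the 3rd character position of a 12-character sentence.
print(normalized_word_length("北京", max_word_length=4))       # 0.5
print(normalized_distances(char_index=3, sentence_length=12))  # (0.25, 0.75)
```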
3.2 Pitch Accent Prediction
In the early 20th century, tone and intonation research for Chinese entered a new phase thanks to two phoneticians, Dr. Liu Fu (also known as Ban-nong) and Dr. Chao Yuan-ren (Y.R. Chao). Chao pointed out that syllabic tone patterns can be modified by the sentential attitudinal intonation, just like "small ripples riding on top of large waves." This clearly explains the relationship between syllabic tone patterns and sentential intonation contours. Lexical tones in Chinese are known for their sliding pitch contours. When produced in isolation, these contours are well defined and quite stable. When produced in context, however, the tonal contours undergo certain variations depending on the preceding and following tones. We used six classes to identify the pitch accent. Figure 4 shows a pitch accent annotation example.

Using the previously described linguistic analysis systems, the Chinese pinyin generation system, and the phrase break prediction system, we can extract five kinds of linguistic features: phonetic features, POS tag features, phrase break features, position features, and length features. We then use a conditional maximum entropy (ME) model, whose parameters are again estimated by the L-BFGS method [11][12], to predict the pitch accent (see Figure 5). The ME model can be viewed as Maximum Likelihood (ML) training for exponential models and, like other ML methods, is prone to overfitting the training data. Among the several smoothing methods proposed for ME models, we adopt a Gaussian prior. The Gaussian prior is a powerful tool for smoothing general ME models and works well in language models [13].

Fig. 5. Pitch accent prediction process

The features used in our model can be divided into two groups: an isolated feature group and a co-occurrence feature group.

Isolated feature group: the following uni-gram features are used. They include four kinds of features:

1. Phonetic features: Although the tonal contours undergo certain variations depending on the preceding and following tones, the phonetic features still contain substantial prosodic information, and phonetic information is one of the most widely used predictors for pitch accent. Phonetic features include the pinyin of the current syllable, the left syllable's pinyin, the right syllable's pinyin, and the current syllable's consonant, vowel, and tone. There are 38 different consonants, 21 different vowels, 5 different tones, and 1,231 different pinyins in our synthesis DB.
2. Syntactic features:
– POS tag features: the current, previous, and next POS tags: P0, P-1 and P1.

3. Phrase break features: non-break, prosodic word, prosodic phrase, and prosodic group.

4. Numerical features:
– Position features: the position of the syllable in the sentence, the phrase, and the word, respectively. In general, the pitch contour of a sentence, phrase, or word follows an intonation pattern; for example, the F0 contour declines in a declarative sentence. This implies that the syllable position in the sentence, phrase, and word affects the prosodic information.
– Length features: the current word length, the next word length, and the sentence length.

Co-occurrence feature group: the following co-occurrence feature pairs are used.

1. Current POS tag - position of syllable in a word
2. Current phrase break - position of syllable in a word
3. Current syllable's pinyin - position of syllable in a word
4. Current POS tag - next POS tag
5. Left syllable's pinyin - current syllable's pinyin
6. Current syllable's pinyin - right syllable's pinyin
7. Left syllable's pinyin - current syllable's pinyin - right syllable's pinyin (triple feature)

A sketch of how such features can feed the ME model is given below.
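Since a conditional ME model with Gaussian-prior smoothing trained by L-BFGS is equivalent to L2-regularized multinomial logistic regression, the sketch below uses scikit-learn rather than the toolkit of [12]; the feature dictionaries, feature names, and example labels are invented for illustration only.

```python
# Minimal sketch of an ME-style pitch accent classifier over mixed binary/numerical features.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# One feature dictionary per syllable; string-valued entries become binary
# indicator features, float-valued entries stay numerical.
train_features = [
    {"pinyin0": "zhong1", "pinyin-1": "<s>", "pos0": "n", "break0": "B1", "pos_in_word": 0.0},
    {"pinyin0": "guo2", "pinyin-1": "zhong1", "pos0": "n", "break0": "B2", "pos_in_word": 1.0},
]
train_labels = ["H-H", "L-L"]  # two of the six pitch accent classes used in the paper

model = make_pipeline(
    DictVectorizer(sparse=True),
    # The L2 penalty plays the role of the Gaussian prior; larger C means weaker smoothing.
    LogisticRegression(solver="lbfgs", penalty="l2", C=1.0, max_iter=1000),
)
model.fit(train_features, train_labels)
print(model.predict([{"pinyin0": "ren2", "pos0": "n", "break0": "B0", "pos_in_word": 0.0}]))
```

Co-occurrence features can be added to the same dictionaries as concatenated keys, e.g. {"pos0+pos1": "n+v"}.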
In several experiments (Section 4), we will analyze each feature's effects on both phrase break and pitch accent prediction.

3.3 Fundamental Frequency Contour Generation
In most synthesizers, the task of generating prosodic tones with a ToBI label system consists of two sub-tasks: the prediction of intonation labels from the input text, and the generation of a fundamental frequency contour from those labels and other information. We used a popular linear regression method to generate the fundamental frequency from C-ToBI labels [4]. This method does not require any additional rules for label types and is general enough for many other languages. Our prediction formula is as follows:

$$\text{target} = w_1 f_1 + w_2 f_2 + \dots + w_n f_n + I \quad (3)$$

where each $f_i$ is a feature that contributes to the fundamental frequency, and the weights $w_1 \sim w_n$ and the intercept $I$ are determined by simple linear regression. We applied the above formula to every syllable and obtained a target value of the fundamental frequency. We prepared pitch values extracted from a speech file, divided them into five sections for each syllable, and predicted the fundamental frequency at every point.
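The following is a minimal numerical sketch of Eq. (3) using ordinary least squares; the feature matrix and targets are random placeholders, not the paper's corpus.

```python
# Fit the weights w_1..w_n and the intercept I of Eq. (3) by least squares.
import numpy as np

rng = np.random.default_rng(0)
n_syllables, n_features = 200, 7
X = rng.normal(size=(n_syllables, n_features))                    # placeholder features
f0_target = rng.normal(loc=200.0, scale=30.0, size=n_syllables)   # placeholder F0 targets (Hz)

X1 = np.hstack([X, np.ones((n_syllables, 1))])    # extra column of 1s for the intercept I
coef, *_ = np.linalg.lstsq(X1, f0_target, rcond=None)
weights, intercept = coef[:-1], coef[-1]

predicted = X1 @ coef   # target = w1*f1 + ... + wn*fn + I for every syllable
```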
4 Experimental Results

4.1 Corpus Analysis
The experiments were performed on a commercial Chinese database provided by Voiceware Inc. The database has 2,197 sentences and 52,546 Chinese characters, which consist of 25,974 Chinese lexical words. The database is POS tagged, pinyin annotated, break-labeled with the four-class prosodic structure, and pitch accent labeled with six classes. The occurrence probabilities of the break indices are shown in Table 1.

Table 1. Occurrence probabilities of break indices in the corpus

B0       B1       B2      B3
36.81%   45.78%   9.58%   7.83%
The occurrence probabilities of the pitch accent classes are shown in Table 2. We divided the database into 10 parts and conducted a 10-fold cross validation.

Table 2. Occurrence probabilities of tones in the corpus

Pitch Accent Classes       H-H     H-L     L-L     L-H     H      L
Occurrence Probabilities   16.0%   11.7%   43.4%   22.8%   2.2%   3.9%

4.2 Performance Measures
1. Performance for phrase break prediction

The performance is assessed with reference to N, the total number of junctures (spaces in the text, including any type of phrase break), and to B, the total number of real phrase breaks (only B1, B2 and B3) in the test set. The errors can be divided into insertions, deletions, and substitutions. An insertion (I) is a break inserted in the test sentence where there is no break in the reference sentence. A deletion (D) occurs when a break is marked in the reference sentence but not in the test sentence. A substitution (S) counts the cases where we correctly recognize that a break exists but assign the wrong level, such as tagging B1 as B3, or B3 as B2. We used the following performance measures based on these definitions [14]:

$$\text{Break Correct}\ (B_C) = \frac{B - D - S}{B} \times 100\% \quad (4)$$

$$\text{Juncture Correct}\ (J_C)\ (\text{Accuracy}) = \frac{N - D - S - I}{N} \times 100\% \quad (5)$$
For more extensive comparisons, we used another performance measure, the adjusted score, which expresses the prediction accuracy in proportion to the total number of phrase breaks [15]:

$$\text{Adjusted Score}\ (A_S) = \frac{J_C - NB}{1 - NB} \quad (6)$$

where $NB = \frac{N - B}{N}$ is the proportion of no-breaks among the inter-word spaces, and $J_C$ here is the Juncture Correct divided by 100.

2. Performance for F0 contour generation

A numerical evaluation of the generated intonation contours was carried out in order to provide concrete, objective numbers for assessing their quality and to allow a comparison with other methods of intonation generation. This is commonly achieved by determining the root mean squared error (RMSE) and the correlation coefficient. The RMSE is a simple distance measure between two F0 contours, calculated as in Equation 7, while the correlation coefficient indicates whether the two contours exhibit the same tendencies by measuring the deviation from the mean F0 at regularly spaced points in time, calculated as in Equation 8. If the two contours rise or fall in the same positions, the correlation coefficient is close to 1; if the F0 movements are uncorrelated, it equals 0; and if they are contrary, it is negative. Thus, the correlation coefficient is an appropriate measure of the degree of agreement between the pitch movements of two contours [16].

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\tilde{P}_i - P_i\right)^2} \quad (7)$$

$$\rho = \frac{\sum_{i=1}^{n}(\tilde{P}_i - \bar{\tilde{P}})(P_i - \bar{P})}{\sqrt{\sum_{i=1}^{n}(\tilde{P}_i - \bar{\tilde{P}})^2}\ \sqrt{\sum_{i=1}^{n}(P_i - \bar{P})^2}} \quad (8)$$
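These measures reduce to a few lines of code. The sketch below assumes the error counts N, B, I, D, S have already been obtained from aligned reference and predicted break sequences; variable and function names follow the paper's notation but are otherwise ours.

```python
import numpy as np

def break_correct(B, D, S):
    """Eq. (4): percentage of real breaks that are neither deleted nor mislabelled."""
    return (B - D - S) / B * 100.0

def juncture_correct(N, D, S, I):
    """Eq. (5): percentage of all junctures labelled correctly."""
    return (N - D - S - I) / N * 100.0

def adjusted_score(jc_percent, N, B):
    """Eq. (6): juncture accuracy rescaled by the no-break proportion NB = (N - B) / N."""
    nb = (N - B) / N
    return (jc_percent / 100.0 - nb) / (1.0 - nb)

def rmse(f0_pred, f0_ref):
    """Eq. (7): root mean squared error between two F0 contours (Hz)."""
    f0_pred, f0_ref = np.asarray(f0_pred), np.asarray(f0_ref)
    return float(np.sqrt(np.mean((f0_pred - f0_ref) ** 2)))

def correlation(f0_pred, f0_ref):
    """Eq. (8): Pearson correlation coefficient between the two contours."""
    return float(np.corrcoef(f0_pred, f0_ref)[0, 1])
```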
4.3 Phrase Break Prediction

We performed two experiments to evaluate the phrase break prediction results of our ME-based method [17]. First, we used several types of feature combinations to find the best feature selection for our ME framework. Second, for comparison at various levels, we used a POS bigram statistical model, an HMM-based model, and manual heuristic rules to predict phrase breaks on the same corpus. We evaluate with four boundary classes (B0, B1, B2 and B3) and also with two boundary classes (B0 and B123), where the three classes B1, B2 and B3 are merged into B123.

1. Results of the feature selection

In the experiments, we can draw the following conclusions on the effect of feature selection for phrase break prediction [17]:
– The POS tag is the baseline feature, and a window size of 5 is the best value for this class (A_S = 0.741).
– Adding lexical word features is helpful, and a window size of 3 is the best value for this class (A_S = 0.762).
– Length is a useful feature, and a window size of 5 is slightly better than a window size of 3 in our experiments (A_S = 0.809).
– The distance feature actually decreases the performance, but after normalization of the feature values it becomes helpful for phrase break prediction (A_S = 0.810).

2. Comparison with other methods

In comparing our ME-based method with the two previous methods, HMM and POS bigram, we also used about 85 manual prediction rules for Chinese prosodic segmentation [18]. We want to demonstrate that the ME framework can systematically incorporate the heuristic manual rules at the feature selection level. As shown in Table 3, both the accuracy of the POS bigram method and the accuracy of the HMM-based method are lower than that of the ME-based method. Moreover, the ME-based method alone outperforms the heuristic rule-based error correction of the POS bigram and HMM results. Error correction rules are less useful for the ME-based method because the rules are already effectively integrated into the ME features. This is another benefit of the ME-based method, since generating handcrafted prediction rules is laborious work. In the table, Acc4 represents the accuracy (Juncture Correct) for the four boundary classes, while Acc2 is the Juncture Correct for the two boundary classes.

Table 3. Comparison with other methods
Method           B_C %   Acc4 (J_C) %   A_S     Acc2 (J_C) %
bigram           81.82   78.79          0.703   81.82
bigram + rules   85.80   81.02          0.732   85.86
HMM              75.62   75.05          0.623   75.62
HMM + rules      79.80   79.06          0.676   79.81
ME               86.80   86.48          0.810   90.33
ME + rules       85.92   86.55          0.812   90.54

4.4 Pitch Accent Prediction
We performed three experiments to evaluate the pitch accent prediction results of our ME-based method [19]. In the first experiment, we used isolated feature combinations to explore the best feature selection. In the second experiment, we used co-occurrence feature combinations to explore the best combined feature selection. In the third experiment, we tested several Gaussian prior settings for smoothing the maximum entropy model. The performance measure is simply defined as

$$Acc = \frac{c}{N} \times 100\% \quad (9)$$
where c is the number of sample cases correctly predicted and N is the total number of sample cases.

1. Results of isolated feature selection

From the experiments, we can draw the following conclusions on the effect of isolated feature selection for pitch accent prediction [19]:
– The phonetic features are the most important features; they include the current pinyin, previous pinyin, next pinyin, consonant, vowel, and tone (Acc = 64.67%).
– The POS tag is a useful feature, and a window size of 3 is the best in our experiment.
– Adding the phrase break feature is helpful.
– The position features and length features are also slightly useful for pitch accent prediction.

2. Results of co-occurrence feature selection

The experiments on co-occurrence feature selection show that the co-occurrence features are also useful for pitch accent prediction. However, adding the last two co-occurrence features actually decreases the performance because of data sparseness. Nevertheless, with Gaussian smoothing we obtain the best performance (Acc = 70.10%) when the left pinyin - current pinyin - right pinyin co-occurrence features are finally added.

3. Results of smoothing

The Gaussian smoothing experiments show that a Gaussian prior of 0.4 gives the best performance of 70.10%. As shown in Table 4, the improvement in performance is impressive compared with the previous systems, especially since we generate all the training corpora by the automatic tone-labeling process. The baseline is based on each class's frequency (the occurrence probability of L-L is 43.4%). The improvements are significantly useful for application in Chinese TTS systems.
Table 4. Performance improvement

Class Number   Baseline (%)   Acc (%)   Improvement (%)
6              43.40          70.10     26.70

4.5 Fundamental Frequency Generation
We predicted five pitch values for each syllable and generated the fundamental frequency contour by interpolation based on the predicted values. Pitch values were detected every 16 ms from the speech with esps/Xwaves. For training and testing, we extracted a feature set from an automatically created file in which pitch sequences are aligned with phone sequences. Furthermore, we eliminated items with a '0' pitch value, treating them as noise when constructing the model. We used the following seven categorical features, including the pitch accent predicted from the C-ToBI representation, to construct the linear regression model:
– Current syllable's consonant;
– Current syllable's vowel;
– Current syllable's tone;
– Current syllable's POS tag;
– Current syllable's phrase break index;
– Syllable's position in the current sentence;
– Pitch accent.
Since we extracted five pitch values from each syllable, we constructed a linear regression model for each pitch point, obtaining a total of five models. Since the predicted response is a vector, we computed the RMSE and correlation coefficient for each element of the vector and averaged them. Table 5 shows the result of our generation model with C-ToBI and its comparison with direct generation without the C-ToBI representation: the C-ToBI-based prediction model improves the fundamental frequency generation performance.
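As a rough illustration of this setup (not the original implementation), the sketch below fits one linear regression model per pitch point on placeholder data and interpolates the five predicted anchor pitches into a per-syllable contour.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_syllables, n_features, n_points = 500, 7, 5
X = rng.normal(size=(n_syllables, n_features))              # placeholder syllable features
Y = rng.normal(200.0, 30.0, size=(n_syllables, n_points))   # five pitch values per syllable (Hz)

# One model per pitch point within the syllable.
models = [LinearRegression().fit(X, Y[:, k]) for k in range(n_points)]

def syllable_contour(x, samples_per_syllable=16):
    """Predict the five anchor pitches for one syllable and interpolate between them."""
    anchors = np.array([m.predict(x.reshape(1, -1))[0] for m in models])
    t_anchor = np.linspace(0.0, 1.0, n_points)
    t_dense = np.linspace(0.0, 1.0, samples_per_syllable)
    return np.interp(t_dense, t_anchor, anchors)

contour = syllable_contour(X[0])
```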
Table 5. The result of fundamental frequency generation

                          Without C-ToBI   Using C-ToBI   Improvement
RMSE                      47.920           40.552         -7.368
Correlation Coefficient   0.531            0.621          +0.090

5 Conclusion and Future Works
This paper presented a prosody generation method based on the C-ToBI system for Chinese. We proposed a conditional maximum entropy model for the Chinese phrase break and pitch accent prediction tasks and analyzed several combinations of linguistic and phonetic features in order to identify the best candidates for ME-based phrase break and pitch accent prediction. Moreover, we generated the pitch accent prediction training corpora using a fully automatic tone-labeling process. The results obtained from the proposed system show that the selected best feature sets guarantee the success of the prediction method. The ME model's feature selection is more flexible than that of other machine learning models, since the ME model allows experimenters to encode various dependencies freely in the form of features, and the Chinese phrase break and pitch accent related features are dependent on one another. As shown in our results, the performance of prosody generation with the C-ToBI system is better than that without it. For future work, we will explore more sophisticated features that benefit Chinese phrase break and pitch accent prediction and apply further smoothing techniques to improve the performance.
Acknowledgements

The author thanks Voiceware Inc. for providing the Chinese speech database used in our experiments.
References

1. Sun, X.: Pitch accent prediction using ensemble machine learning. In: Proceedings of ICSLP 2002, Denver, Colorado, pp. 561–564 (2002)
2. Gregory, M.L., Altun, Y.: Using conditional random fields to predict pitch accents in conversational speech. In: Proceedings of the 42nd Annual Meeting of the ACL (2004)
3. Sun, X.: The Determination, Analysis, and Synthesis of Fundamental Frequency. PhD thesis, Northwestern University (2002)
4. Black, A.W., Hunt, A.: Generating F0 contours from ToBI labels using linear regression. In: Proceedings of the International Conference on Spoken Language Processing (ICSLP), pp. 1385–1388 (1996)
5. Jilka, M., Mohler, G., Dogil, G.: Rules for the generation of ToBI-based American English intonation. Speech Communication 28, 83–108 (1999)
6. Lee, J., Kim, B., Lee, G.G.: Automatic corpus-based tone and break-index prediction using K-ToBI representation. ACM Transactions on Asian Language Information Processing (TALIP) 1, 207–224 (2002)
7. Ha, J.H., Zheng, Y., Lee, G.G.: Chinese segmentation and POS-tagging by automatic POS dictionary training. In: Proceedings of the 14th Conference of Korean and Korean Information Processing, Korea, pp. 33–39 (2002) (in Korean)
8. Ha, J.H., Zheng, Y., Lee, G.G.: High speed unknown word prediction using support vector machine for Chinese text-to-speech systems. In: Proceedings of the International Joint Conference on Natural Language Processing, Korea, pp. 509–517 (2004)
9. Zhang, H., Yu, J., Zhan, W., Yu, S.: Disambiguation of Chinese polyphonic characters. In: Proceedings of the 1st International Workshop on Multimedia Annotation (MMA), Tokyo, Japan (2001)
10. AiJun, L.: Chinese prosody and prosodic labeling of spontaneous speech. In: Proceedings of Speech Prosody, Aix-en-Provence, France, pp. 39–46 (2002)
11. Malouf, R.: A comparison of algorithms for maximum entropy parameter estimation. In: Proceedings of CoNLL 2002, Taipei, Taiwan, pp. 49–55 (2002)
12. Le, Z.: Maximum Entropy Modeling Toolkit for Python and C++. Natural Language Processing Lab, Northeastern University, China (2004), http://www.nlplab.cn/zhangle/software/maxent/manual/
13. Chen, S.F., Rosenfeld, R.: A Gaussian prior for smoothing maximum entropy models. Technical Report CMU-CS-99-108, CMU (1999)
14. Taylor, P., Black, A.W.: Assigning phrase breaks from part-of-speech sequences. Computer Speech and Language 12, 99–117 (1998)
15. Sanders, E.: Using probabilistic methods to predict phrase boundaries for a text-to-speech system. Master's thesis, University of Nijmegen (1995)
16. Dusterhoff, K., Black, A.W.: Generating F0 contours for speech synthesis using the Tilt intonation model. In: The ESCA Workshop on Intonation: Theory, Models and Applications, Athens, Greece, pp. 107–110 (1997)
17. Zheng, Y., Kim, B., Lee, G.G.: Using multiple linguistic features for Mandarin phrase break prediction in maximum-entropy classification framework. In: Proceedings of the International Conference on Spoken Language Processing, Korea, pp. 737–740 (2004)
18. Cao, J.: Syntactic and lexical constraint in prosodic segmentation and grouping. In: Proceedings of Speech Prosody, Aix-en-Provence, France (2002)
19. Kim, B., Lee, G.G.: C-ToBI-based pitch accent prediction using maximum-entropy model. In: Proceedings of the International Conference on Computational Science and Its Applications, United Kingdom, pp. 21–30 (2006)
CAS4UA: A Context-Aware Service System Based on Workflow Model for Ubiquitous Agriculture

Yongyun Cho1, Hyun Yoe1, and Haeng-Kon Kim2

1 Information and Communication Engineering, Sunchon National University, 413 Jungangno, Suncheon, Jeonnam 540-742, Korea
{yycho,yhyun}@sunchon.ac.kr
2 Department of Computer Information & Communication Engineering, Catholic Univ. of Daegu, Korea
[email protected]
Abstract. Practical automation in agriculture can save cultivation time and increase productivity. Because the agricultural environment is highly changeable, natural elements have to be considered in an autonomic cultivation process. Recently, the workflow model has been used as an effective service automation model in traditional business environments, Web service environments, and ubiquitous computing environments. This paper proposes CAS4UA, a context-aware service system based on a workflow model for agriculture. The suggested system uses a workflow model to offer automatic, context-aware services for ubiquitous agriculture and to support ubiquitous agriculture based on USN/RFID. To do this, the system uses a workflow model that includes various kinds of agricultural context information. Through the workflow model, the suggested system can dynamically and automatically control a service flow according to the changing conditions of the agricultural environment. Therefore, the proposed system can be useful for developing smart services and for automating work processes in ubiquitous agriculture.
1 Introduction
Most work in agricultural environments is very arduous, so work automation often produces such positive results as increased productivity, competitive prices, reduced work time, and so on. The workflow service model has been one of the most successful service automation models in various computing environments such as distributed computing environments, Web business environments, ubiquitous computing environments, and so on.
This research was supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Advancement) (IITA2009-(C1090-0902-0047)).
Currently, most work automation models or systems in agriculture are based on USN/RFID, so they need various contexts sensed from real sensor networks for service automation. The workflow model has successfully been used for service automation in various computing environments, so it can reasonably be used as a service automation model in ubiquitous agriculture as well. Recently, many studies have used workflow as a service automation model in ubiquitous or pervasive computing environments such as the u-city, u-campus, u-home, u-agriculture, and so on [1,2,3,4,5,6,7]. Service automation means carrying out service processes without human intervention, according to changes in the surrounding situation information [1]. A context-aware service is a kind of smart service that can be processed automatically according to various situation information. A context in a ubiquitous environment means any information that can be used to characterize the situation of an entity [1,2]. In agriculture, the situation information may include context information such as time, temperature, humidity, air condition, and so on. Therefore, an automation model or system for ubiquitous agriculture has to use the changeable environmental conditions as data for executing services automatically. This paper introduces CAS4UA, a context-aware service system based on a workflow model for agriculture. The suggested system uses an expanded version of uWDL [2,11,14], an academic context-aware workflow language, as the workflow model for work automation. A developer can execute an agricultural context-aware service with the various workflow models of the expanded uWDL and can easily describe the agricultural contexts in the workflow. Therefore, a developer using the suggested system can easily develop various context-aware applications for ubiquitous agriculture. Furthermore, because a developed context-aware workflow can easily be reused for other context-aware services in agriculture, developers can improve development efficiency and reusability.
2 Related Work
To date, there have been many attempts to use workflow technology in various areas of existing computing environments. POESIA [7] is a pragmatic approach that uses ontology-based workflow models and activity models to compose Web services in agriculture. To do this, the approach uses a metamodel, which extends the workflow reference model of the WfMC, with domain-specific multidimensional ontologies for the agricultural domain. POESIA suggested a new approach to context-aware workflow through the various implementation issues of the approach. The conceptual system suggested in [15] is a framework for designing and managing context-aware workflows. The framework suggests methods to integrate contexts with workflows to support context-aware or smart services, and it introduces a context-activity architecture as its basis. CAME [12] stands for a novel Context-aware Workflow Management Engine. CAME is a kind of middleware that controls and coordinates devices connected to
each other across ubiquitous networks. CAME uses a workflow model based on Business Process Management (BPM) to manage the devices and to offer context-aware services using them. CAWE [13] is a framework for the management of context-aware workflow systems based on Web services. CAWE supports a synthetic and extensible specification of context-sensitive workflows, which can be executed by standard workflow engines. Its authors described an initial prototype of the CAWE framework using jBPM, a business process management system based on the Petri Net model and implemented in Java. However, even though it proposes a systematic and valuable architecture for context-aware workflow services, it does not offer an explicit method for expressing contexts in a workflow scenario as service execution conditions, and it does not sufficiently describe how it can use the user's situation information or contexts, which can occur dynamically in real ubiquitous environments. AGENT WORK [16] is a workflow management system that supports dynamic and automatic workflow adaptation. AGENT WORK uses a rule-based approach and a temporal estimation method to revise specific parts of a running workflow dynamically and autonomically according to changes in conditions. The research in [17] uses workflow technology for smart workflows and suggests a method to narrow the gap between the low-level provisioning of context and the concepts needed in smart workflows; it shows how to adopt context information for various semantic domains in the workflow integration process. MAPE [18] stands for the monitoring, analysis, planning and execution model used to modify running workflows; the work describes a process and a method by which the MAPE model of the Autonomic Computing community is adopted.
3 A Context-Aware Workflow Service System for Ubiquitous Agriculture

3.1 A Conceptual Architecture of the Suggested Service System
For work automation and context-aware services in agriculture, this paper introduces a context-aware service system that uses a workflow model based on the expanded uWDL, which refers to WSDL and the WfMC reference model, as its service automation model. The workflow model offers a method to describe the changeable conditional contexts of the agricultural environment directly and explicitly as context information in a context-aware workflow. Based on the workflow scenario, the suggested system can therefore support context-aware services and work automation without human intervention in agriculture. Figure 1 shows a conceptual architecture of the suggested system. As shown in Figure 1, the architecture of the suggested system consists of a context model, a workflow development environment, a Web service pool, and a context-aware workflow engine. The Workflow Model is based on an RDF-based triplet model structure, which consists of <subject>, <predicate>, and