Advances in Computer and Information Sciences and Engineering
Advances in Computer and Information Sciences and Engineering. Edited by Tarek Sobh, University of Bridgeport, CT, USA
Editor Dr. Tarek Sobh University of Bridgeport School of Engineering 221 University Avenue Bridgeport CT 06604 USA
[email protected]
ISBN: 978-1-4020-8740-0
e-ISBN: 978-1-4020-8741-7
Library of Congress Control Number: 2008932465 c 2008 Springer Science+Business Media B.V. No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Printed on acid-free paper 9 8 7 6 5 4 3 2 1 springer.com
To Nihal, Omar, Haya, Sami and Adam
Contents
Preface ...........................................................................................................................................xv
Acknowledgements .....................................................................................................................xvii
1.
A New Technique for Unequal-Spaced Channel-Allocation Problem in WDM Transmission System .....................................................................................................................................1 A.B.M.Mozzammel Hossain and Md. Saifuddin Faruk
2.
An Algorithm to Remove Noise from Audio Signal by Noise Subtraction ............................................5 Abdelshakour Abuzneid et al.
3.
Support Vector Machines Based Arabic Language Text Classification System: Feature Selection Comparative Study................................................................................................................11 Abdelwadood. Moh’d. Mesleh
4.
Visual Attention in Foveated Images ....................................................................................................17 Abulfazl Yavari, H.R. Pourreza
5.
Frequency Insensitive Digital Sampler and Its Application to the Electronic Reactive Power Meter..........................................................................................................................................21 Adalet N. Abiyev
6.
Non-Linear Control Applied to an Electrochemical Process to Remove Cr(VI) from Plating Wastewater............................................................................................................................................27 Regalado-Méndez, A. et al.
7.
Semantics for the Specification of Asynchronous Communicating Systems (SACS) ..........................33 A.V.S. Rajan et al.
8.
Application Multicriteria Decision Analysis on TV Digital .................................................................39 Ana Karoline Araújo de Castro et al.
9.
A Framework for the Development and Testing of Cryptographic Software .......................................45 Andrew Burnett, Tom Dowling
10.
Transferable Lessons from Biological and Supply Chain Networks to Autonomic Computing...........51 Ani Calinescu
11.
Experiences from an Empirical Study of Programs Code Coverage.....................................................57 Anna Derezińska
12.
A Secure and Efficient Micropayment System .....................................................................................63 Anne Nguyen and Xiang Shao
13.
An Empirical Investigation of Defect Management in Free/Open Source Software Projects...............68 Anu Gupta, Ravinder Kumar Singla
14.
A Parallel Algorithm that Enumerates all the Cliques in an Undirected Graph ....................................74 A. S. Bavan vii
15.
Agent Based Framework for Worm Detection......................................................................................79 El-Menshawy et al.
16.
Available Bandwidth Based Congestion Avoidance Scheme for TCP: Modeling and Simulation ......................................................................................................................................85 A. O. Oluwatope et al.
17.
On the Modeling and Control of the Cartesian Parallel Manipulator....................................................90 Ayssam Y. Elkady et al.
18.
Resource Allocation in Market-Based Grids Using a History-Based Pricing Mechanism ...................97 Behnaz Pourebrahimi et al.
19.
Epistemic Structured Representation for Legal Transcript Analysis .................................................101 Tracey Hughes et al.
20.
A Dynamic Approach to Software Bug Estimation ............................................................................108 Chuanlei Zhang et al.
21.
Soft Biometrical Students Identification Method for e-Learning........................................................114 Deniss Kumlander
22.
Innovation in Telemedicine: an Expert Medical Information System Based on SOA, Expert Systems and Mobile Computing..............................................................................................119 Denivaldo Lopes et al.
23.
Service-Enabled Business Processes: Constructing Enterprise Applications – An Evaluation Framework ..........................................................................................................................................125 Christos K. Georgiadis, Elias Pimenidis
24.
Image Enhancement Using Frame Extraction Through Time.............................................................131 Elliott Coleshill et al.
25.
A Comparison of Software for Architectural Simulation of Natural Light.........................................136 Evangelos Christakou and Neander Silva
26.
Vehicle Recognition Using Curvelet Transform and Thresholding ....................................................142 Farhad Mohamad Kazemi et al.
27.
Vehicle Detection Using a Multi-Agent Vision-Based System ..........................................................147 Saeed Samadi et al.
28.
Using Attacks Ontology in Distributed Intrusion Detection System...................................................153 F. Abdoli, M. Kahani
29.
Predicting Effectively the Pronunciation of Chinese Polyphones by Extracting the Lexical Information.......................................................................................................................159 Feng-Long Huang et al.
30.
MiniDMAIC: An Approach for Causal Analysis and Resolution in Software Development Projects ................................................................................................................................166 Márcia G. S. Gonçalves et al.
31.
Light Vehicle Event Data Recorder Forensics ....................................................................................172 Jeremy S. Daily et al.
32.
Research of Network Control Systems with Competing Access to the Transfer Channel..................178 G.V.Abramov et al.
33.
Service-Oriented Context-Awareness and Context-Aware Services ..................................................184 H. Gümüşkaya, M. V. Nural
34.
Autonomous Classification via Self-Formation of Collections in AuInSys........................................190 Hanh H. Pham
35.
Grid Computing Implementation in Ad Hoc Networks ......................................................................196 Aksenti Grnarov et al.
36.
One-Channel Audio Source Separation of Convolutive Mixture........................................................202 Jalal Taghia, Jalil Taghia
37.
Extension of Aho-Corasick Algorithm to Detect Injection Attacks....................................................207 Jalel Rejeb, and Mahalakshmi Srinivasan
38.
Use of Computer Vision During the Process of Quality Control in the Classification of Grain .........213 Rosas Salazar Juan Manuel et al.
39.
Theoretical Perspectives for E-Services Acceptance Model...............................................................218 Kamaljeet Sandhu
40.
E-Services Acceptance Model (E-SAM) ............................................................................................224 Kamaljeet Sandhu
41.
Factors for E-Services System Acceptance: A Multivariate Analysis ................................................230 Kamaljeet Sandhu
42.
A Qualitative Approach to E-Services System Development .............................................................236 Kamaljeet Sandhu
43.
Construction of Group Rules for VLSI Application ...........................................................................242 Byung-Heon Kang et al.
44.
Implementation of an Automated Single Camera Object Tracking System Using Frame Differencing and Dynamic Template Matching..................................................................................245 Karan Gupta, Anjali V. Kulkarni
45.
A Model for Prevention of Software Piracy Through Secure Distribution .........................................251 Vaddadi P. Chandu et al.
46.
Performance Enhancement of CAST-128 Algorithm by Modifying Its Function ..............................256 Krishnamurthy G.N et al.
47.
A Concatenative Synthesis Based Speech Synthesiser for Hindi........................................................261 Kshitij Gupta
48.
Legibility on a Podcast: Color and Typefaces.....................................................................................265 Lennart Strand
49.
The Sensing Mechanism and the Response Simulation of the MIS Hydrogen Sensor .......................268 Linfeng Zhang et al.
50.
Visual Extrapolation of Linear and Nonlinear Trends: Does the Knowledge of Underlying Trend Type Affect Accuracy and Response Bias?..............................................................................273 Lisa A. Best
51.
Resource Discovery and Selection for Large Scale Query Optimization in a Grid Environment.......279 Mahmoud El Samad et al.
52.
Protecting Medical Images with Biometric Information.....................................................................284 Marcelo Fornazin et al.
53.
Power Efficiency Profile Evaluation for Wireless Communication Applications ..............................290 Marius Marcu et al.
54.
Closing the Gap Between Enterprise Models and Service-Oriented Architectures ............................295 Martin Juhrisch, Werner Esswein
55.
Object Normalization as Contribution to the Area of Formal Methods of Object-Oriented Database Design..................................................................................................................................300 Vojtěch Merunka, Martin Molhanec
56.
A Novel Security Schema for Distributed File Systems .....................................................................305 Bager Zarei et al.
57.
A Fingerprint Method for Scientific Data Verification.......................................................................311 Micah Altman
58.
Mobile Technologies in Requirements Engineering ...........................................................................317 Gunnar Kurtz et al.
59.
Unsupervised Color Textured Image Segmentation Using Cluster Ensembles and MRF Model.........................................................................................................................................323 Mofakharul Islam et al.
60.
An Efficient Storage and Retrieval Technique for Documents Using Symantec Document Segmentation (SDS) Approach ...........................................................................................................329 Mohammad A. ALGhalayini and ELQasem ALNemah
61.
A New Persian/Arabic Text Steganography Using “La” Word ..........................................................339 Mohammad Shirali-Shahreza
62.
GTRSSN: Gaussian Trust and Reputation System for Sensor Networks............................................343 Mohammad Momani, Subhash Challa
63.
Fuzzy Round Robin CPU Scheduling (FRRCS) Algorithm ...............................................................348 M.H. Zahedi et al.
64.
Fuzzy Expert System In Determining Hadith Validity .......................................................................354 M. Ghazizadeh et al.
65.
An Investigation into the Performance of General Sorting on Graphics Processing Units .................360 Nick Pilkington, Barry Irwin
66.
An Analysis of Effort Variance in Software Maintenance Projects ....................................................366 Nita Sarang, Mukund A Sanglikar
67.
Design of Adaptive Neural Network Frequency Controller for Performance Improvement of an Isolated Thermal Power System.................................................................................................372 Ognjen Kuljaca et al.
68.
Comparing PMBOK and Agile Project Management Software Development Processes ...................378 P. Fitsilis
69.
An Expert System for Diagnosing Heavy-Duty Diesel Engine Faults................................................384 Peter Nabende and Tom Wanyama
70.
Interactive Visualization of Data-Oriented XML Documents ............................................................390 Petr Chmelar et al.
71.
Issues in Simulation for Valuing Long-Term Forwards......................................................................394 Phillip G. Bradford, Alina Olteanu
72.
A Model for Mobile Television Applications Based on Verbal Decision Analysis............................399 Isabelle Tamanini et al.
73.
Gene Selection for Predicting Survival Outcomes of Cancer Patients in Microarray Studies ............405 Tan Q et al.
74.
Securing XML Web Services by using a Proxy Web Service Model .................................................410 Quratul-ain Mahesar, Asadullah Shah
75.
O-Chord: A Method for Locating Relational Data Sources in a P2P Environment ............................416 Raddad Al King et al.
76.
Intuitive Interface for the Exploration of Volumetric Datasets ...........................................................422 Rahul Sarkar et al.
77.
New Trends in Cryptography by Quantum Concepts .........................................................................428 SGK MURTHY et al.
78.
On Use of Operation Semantics for Parallel iSCSI Protocol ..............................................................433 Ranjana Singh, Rekha Singhal
79.
BlueCard: Mobile Device-Based Authentication and Profile Exchange.............................................441 Riddhiman Ghosh, Mohamed Dekhil
80.
Component Based Face Recognition System......................................................................................447 Pavan Kandepet and Roman W. Swiniarski
81.
An MDA-Based Generic Framework to Address Various Aspects of Enterprise Architecture..........455 S. Shervin Ostadzadeh et al.
82.
Simulating VHDL in PSpice Software................................................................................................461 Saeid Moslehpour et al.
83.
VLSI Implementation of Discrete Wavelet Transform using Systolic Array Architecture.................467 S. Sankar Sumanth and K.A. Narayanan Kutty
84.
Introducing MARF: A Modular Audio Recognition Framework and its Applications for Scientific and Software Engineering Research ...................................................................................473 Serguei A. Mokhov
85.
TCP/IP Over Bluetooth .......................................................................................................................479 Umar F. Khan et al.
86.
Measurement-Based Admission Control for Non-Real-Time Services in Wireless Data Networks.....................................................................................................................................485 Show-Shiow Tzeng and Hsin-Yi Lu
87.
A Cooperation Mechanism in Agent Organization .............................................................................491 W. Alshabi et al.
88.
A Test Taxonomy Applied to the Mechanics of Java Refactorings ....................................................497 Steve Counsell et al.
89.
Classification Techniques with Cooperative Routing for Industrial Wireless Sensor Networks .............................................................................................................................................503 Sudhir G. Akojwar, Rajendra M. Patrikar
90.
Biometric Approaches of 2D-3D Ear and Face: A Survey .................................................................509 S. M. S. Islam et al.
91.
Performance Model for a Reconfigurable Coprocessor ......................................................................515 Syed S. Rizvi et al.
92.
RFID: A New Software Based Solution to Avoid Interference ..........................................................521 Syed S. Rizvi et al.
93.
A Software Component Architecture for Adaptive and Predictive Rate Control of Video Streaming....................................................................................................................................................526 Taner Arsan, Tuncay Saydam
94.
Routing Table Instability in Real-World Ad-Hoc Network Testbed...................................................532 Tirthankar Ghosh, Benjamin Pratt
95.
Quality Attributes for Embedded Systems..........................................................................................536 Trudy Sherman
96.
A Mesoscale Simulation of the Morphology of the PEDT/PSS Complex in the Water Dispersion and Thin Film: the Use of the MesoDyn Simulation Code...............................................540 T. Kaevand et al.
97.
Developing Ontology-Based Framework Using Semantic Grid .........................................................547 Venkata Krishna. P, Ratika Khetrapal
98.
A Tree Based Buyer-Seller Watermarking Protocol ...........................................................................553 Vinu V Das
99.
A Spatiotemporal Parallel Image Processing on FPGA for Augmented Vision System.....................558 W. Atabany, and P. Degenaar
100. Biometrics of Cut Tree Faces..............................................................................................562
W. A. Barrett
101. A Survey of Hands-on Assignments and Projects in Undergraduate Computer Architecture Courses.....................................................................................................................566
Xuejun Liang
102. Predicting the Demand for Spectrum Allocation Through Auctions ..................................571
Y. B. Reddy
103. Component-Based Project Estimation Issues for Recursive Development .........................577
Yusuf Altunel, Mehmet R. Tolun
Author Index................................................................................................................................583
Subject Index ...............................................................................................................................587
Preface
This book includes Volume I of the proceedings of the 2007 International Conference on Systems, Computing Sciences and Software Engineering (SCSS). SCSS is part of the International Joint Conferences on Computer, Information, and Systems Sciences, and Engineering (CISSE 07). The proceedings are a set of rigorously reviewed world-class manuscripts presenting the state of international practice in Innovations and Advanced Techniques in Computer and Information Sciences and Engineering. SCSS 07 was a high-caliber research conference that was conducted online. CISSE 07 received 750 paper submissions and the final program included 406 accepted papers from more than 80 countries, representing the six continents. Each paper received at least two reviews, and authors were required to address review comments prior to presentation and publication. Conducting SCSS 07 online presented a number of unique advantages, as follows: •
All communications between the authors, reviewers, and conference organizing committee were done online, which permitted a short six-week period from the paper submission deadline to the beginning of the conference.
• PowerPoint presentations and final paper manuscripts were available to registrants for three weeks prior to the start of the conference.
• The conference platform allowed live presentations by several presenters from different locations, with the audio and PowerPoint transmitted to attendees over the internet, even on dial-up connections. Attendees were able to ask both audio and written questions in a chat room format, and presenters could mark up their slides as they saw fit.
• The live audio presentations were also recorded and distributed to participants along with the PowerPoint presentations and paper manuscripts on the conference DVD.
The conference organizers and I are confident that you will find the papers included in this volume interesting and useful. We believe that technology will continue to infuse education thus enriching the educational experience of both students and teachers. Tarek M. Sobh, Ph.D., PE Bridgeport, Connecticut June 2008
Acknowledgements
The 2007 International Conference on Systems, Computing Sciences and Software Engineering (SCSS) and the resulting proceedings could not have been organized without the assistance of a large number of individuals. SCSS is part of the International Joint Conferences on Computer, Information, and Systems Sciences, and Engineering (CISSE). CISSE was founded by Professor Khaled Elleithy and myself in 2005, and we set up mechanisms that put it into action. Andrew Rosca wrote the software that allowed conference management, and interaction between the authors and reviewers online. Mr. Tudor Rosca managed the online conference presentation system and was instrumental in ensuring that the event met the highest professional standards. I also want to acknowledge the roles played by Sarosh Patel and Ms. Susan Kristie, our technical and administrative support team. The technical co-sponsorship provided by the Institute of Electrical and Electronics Engineers (IEEE) and the University of Bridgeport is gratefully appreciated. I would like to express my thanks to Prof. Toshio Fukuda, Chair of the International Advisory Committee and the members of the SCSS Technical Program Committee including: Abdelaziz AlMulhem, Alex A. Aravind, Ana M. Madureira, Mostafa Aref, Mohamed Dekhil, Julius Dichter, Hamid Mcheick, Hani Hagras, Marian P. Kazmierkowski, Low K.S., Michael Lemmon, Rafa Al-Qutaish, Rodney G. Roberts, Sanjiv Rai, Samir Shah, Shivakumar Sastry, Natalia Romalis, Mohammed Younis, Tommaso Mazza, and Srini Ramaswamy. The excellent contributions of the authors made this world-class document possible. Each paper received two to four reviews. The reviewers worked tirelessly under a tight schedule and their important work is gratefully appreciated. In particular, I want to acknowledge the contributions of the following individuals: Ahmad Almunayyes, Alexander Alegre, Ali Abu-El Humos, Amitabha Basuary, Andrew Burnett, Ani Calinescu, Antonio José Balloni, Baba Ahmed Isman Chemch Eddine, Charbel Saber, Chirakkal Easwaran, Craig Caulfield, Cristian Craciun, Emil Vassev, Evangelos Christakou, Francisca Márcia Gonçalves, Geneflides Silva, Hanh Pham, Harish CL, Imran Ahmed, Jose Maria Pangilinan, Khaled Elleithy, Leticia Flores, Ligia Chira Cremene, Madiha Hussain, Michael Horie, Miguel Barron-Meza, Mohammad ALGhalayini, Mohammed Abuhelalh, Phillip Bradford, Rafa Al-Qutaish, Rodney G. Roberts, Seppo Sirkemaa, Srinivasa Kumar Devireddy, Stephanie Chua, Steve Counsell, Uma Balaji, Vibhore Jain, Xiaoquan Gao, and Ying-ju Chen Tarek M. Sobh, Ph.D., PE Bridgeport, Connecticut June 2008
A New Technique for Unequal-Spaced Channel-Allocation Problem in WDM Transmission System A.B.M.Mozzammel Hossain and Md. Saifuddin Faruk Dept. of Electrical & Electronic Engineering, Dhaka University of Engineering & Technology, Gazipur, Bangladesh
Abstract - Wavelength-division multiplexing (WDM) is currently being deployed to achieve high capacity in long-haul fiber-optic transmission systems that support multiple high-speed channels. It allows information to be transmitted on different channels at different wavelengths. Four-wave mixing (FWM), however, is one of the major problems that must be taken into account when designing a high-capacity, long-haul WDM transmission system. Recently, unequal-spaced channel-allocation techniques have been studied and analyzed to reduce FWM crosstalk. Finding a solution with the proposed channel-allocation technique requires two parameters: the minimum channel spacing and the number of channels used in the WDM system. To obtain good results, the minimum channel spacing has to be selected carefully.
INTRODUCTION
A WDM system, which allows information on various channels to be transmitted at different wavelengths, fully exploits the vast bandwidth provided by optical fiber. If the frequency separation of any two channels of a WDM system is different from that of any other pair of channels, no FWM waves will be generated at any of the channel frequencies, thereby suppressing FWM crosstalk. When three carrier frequencies f1, f2 and f3 copropagate in a fiber, they produce a fourth wave at the frequency f = f1 + f2 - f3. In FWM, two signals (f1, f2) also mix to produce two new frequencies: f3 = 2f1 - f2 and f4 = 2f2 - f1. Two techniques that can determine (and mathematically prove) the total number of FWM signals falling onto the operating band and onto each channel for unequal-spaced WDM systems are: (1) the Frequency Difference Triangle (FDT) and (2) the Frequency Difference Square (FDS). These techniques are also applicable to equal-spaced systems. Knowing the two numbers, one can adjust the system parameters, such as the minimum channel spacing, in order to reduce the adverse effects of FWM crosstalk and interchannel interference, or avoid assigning channels at the locations with the most severe crosstalk. A design methodology for channel spacing is presented to satisfy the above requirement. The method is a generalization of what had been proposed in the 1950s to reduce the effect of third-order intermodulation interference in radio systems. The use of proper unequal channel spacing keeps FWM waves from coherently interfering with the signals. Nevertheless, the FWM waves are still generated at the expense of the transmitted power, giving rise to pattern-dependent depletion of the channels. In this paper, a "simplified algebraic" framework for finding solutions to the unequal-spaced channel-allocation problem is reported. The proposed algorithms provide a fast and simple alternative for solving the problem.
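As a quick illustration of the mixing products named above, the short Python snippet below lists the frequencies generated by three co-propagating carriers. It is our own illustration (the carrier values are assumed, not taken from the paper).

# FWM products generated by three co-propagating carriers (illustrative sketch).
f1, f2, f3 = 193.1, 193.2, 193.4   # example carrier frequencies in THz (assumed values)

mixing_products = {
    "f1 + f2 - f3": f1 + f2 - f3,
    "2*f1 - f2": 2 * f1 - f2,      # degenerate product of f1 and f2
    "2*f2 - f1": 2 * f2 - f1,
}
for name, freq in mixing_products.items():
    print(f"{name} = {freq:.1f} THz")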
PROPOSED CHANNEL ALLOCATION TECHNIQUE
The proposed channel-allocation technique is based on the analytical framework of the FDT and FDS. Finding a solution with this technique requires two parameters: the minimum channel spacing and the number of channels used in the WDM system. To obtain good results, the minimum channel spacing has to be selected carefully. Although unequal-spaced channel-allocation techniques greatly reduce FWM crosstalk, the number of unequal-spaced WDM channels supported is always less than that of a conventional equal-spaced WDM system when the operating optical bandwidth and the minimum channel spacing of both kinds of systems are the same [1]. In other words, the minimum channel spacing in an unequal-spaced system has to be reduced, and is thus less than the channel spacing of an equal-spaced system, when the same number of channels is accommodated in a fixed operating bandwidth. The effect of interchannel interference gets worse as the minimum channel spacing decreases [2]-[5]. However, to reduce the impact of interchannel interference, the minimum channel spacing must be large enough. The two requirements counteract each other, and a good balance between them is needed when designing an unequal-spaced WDM system with a fixed operating bandwidth and a pre-determined number of channels. As a result, some FWM crosstalk may sometimes be unavoidable in order to keep the interchannel interference within an acceptable level [3]. While the effects of interchannel interference and FWM crosstalk in WDM systems have been studied and understood [6], [4], [7], [8], [9], it would be helpful to system designers to have a fast tool that measures the strength of the FWM crosstalk in the operating band as well as in each channel, instead of going through complex analyses. In the present work it is found that, to obtain the optimum solution and keep the FWM crosstalk falling onto each channel to a minimum, the following should be kept in mind when using the proposed algorithm:
01. The minimum channel spacing for an N = 2^p channel system is

    n = 2^p + p ± q,   p ≥ 2,   q = −2, −1, 0, 1, 2, 3, …        (1)

02. A minimum channel spacing n = (3P − 1)/2, a number of unequal-spaced WDM channels N = P + 1 and a total number of occupied slots S = P(2P − 1) are constructed algebraically for a given prime number P [1] (a small numerical check of this construction is sketched after this list).
03. Many other solutions can be obtained by adjusting the minimum channel spacing (n) and the number of channels (N) used in the WDM system.
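The following short sketch (our own illustration in plain Python, not part of the original paper) evaluates the algebraic construction in item 02 for the first few primes; it reproduces the N, S and n values that appear later in Table 2.

# Check of the prime-based construction: n = (3P - 1)/2, N = P + 1, S = P(2P - 1).
def prime_construction(P):
    """Return (N, S, n) for a given prime P, following item 02 above."""
    n = (3 * P - 1) // 2          # minimum channel spacing
    N = P + 1                     # number of unequal-spaced WDM channels
    S = P * (2 * P - 1)           # total number of slots occupied
    return N, S, n

for P in (3, 5, 7, 11, 13, 17, 19):
    N, S, n = prime_construction(P)
    print(f"P={P:2d}  N={N:2d}  S={S:3d}  n={n:2d}")
# Expected output matches Table 2, e.g. P=7 gives N=8, S=91, n=10.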
ALLOCATION ALGORITHMS
First choose the number of channels (N) that you want to use in your WDM system. Then choose an arbitrary minimum channel spacing (n). The maximum channel spacing is M = N + n - 2. Starting with 0, the unequal channel-spacing sequence is 0, n, n+1, n+2, n+3, ..., n+N-3, n+N-2. If N > 6 and n > 6, split the resulting sequence into two equal sets: the second set contains the first m sequential spacings (excluding 0), where m = N/2, and the first set contains 0 and the rest of the sequential spacings. Otherwise, if N ≤ 6 and n ≤ 6, keep the spacing sequence as a single set.

EXAMPLE
In the first case, let N = 8 and n = 10. Then M = 8 + 10 - 2 = 16 and m = 8/2 = 4. The channel-spacing sequence is 0, 10, 11, 12, 13, 14, 15, 16; the first set is [0, 14, 15, 16] and the second set is [10, 11, 12, 13]. The resulting unequal channel positions for WDM are: n1 = 0, n2 = n1 + 14 = 14, n3 = n2 + 15 = 29, n4 = n3 + 16 = 45, n5 = n4 + 10 = 55, n6 = n5 + 11 = 66, n7 = n6 + 12 = 78, n8 = n7 + 13 = 91.
In the second case, let N = 4 and n = 4. Then M = 4 + 4 - 2 = 6 and the channel-spacing sequence is 0, 4, 5, 6. The resulting unequal channel positions are: n1 = 0, n2 = n1 + 4 = 4, n3 = n2 + 5 = 9, n4 = n3 + 6 = 15.

Table 1. Computation of the slot vector n by the proposed algorithm.

Example   N    n    M    S     Example of slot vector n
01        3    1    2    3     [0,1,3]
02        3    3    4    7     [0,3,7]
03        3    4    5    9     [0,4,5]
04        4    2    4    9     [0,2,5,9]
05        4    3    5    12    [0,3,7,12]
06        4    4    6    15    [0,4,9,15]
07        4    5    7    18    [0,5,11,18]
08        6    5    9    35    [0,5,11,18,26,35]
09        6    6    10   40    [0,6,13,21,30,40]
10        6    7    11   45    [0,10,21,28,36,45]
11        6    8    12   50    [0,11,23,31,40,50]
12        8    1    7    28    [0,5,11,18,19,21,24,28]
13        8    4    10   49    [0,8,17,27,31,36,42,49]
14        8    10   16   91    [0,14,29,45,55,66,78,91]
15        8    12   18   105   [0,16,33,51,63,76,90,105]
16        10   13   21   153   [0,18,37,57,78,91,105,120,136,153]
17        12   7    17   132   [0,13,27,42,58,75,82,90,99,109,120,132]
18        16   20   34   405   0,28,57,87,150,183,217,236,258,280,303,327,352,378,405
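A compact sketch of the allocation procedure just described is given below. This is our own illustrative Python rendering (function and variable names are ours), written directly from the steps above; it reproduces the two worked examples, [0, 14, 29, 45, 55, 66, 78, 91] for N = 8, n = 10 and [0, 4, 9, 15] for N = 4, n = 4.

# Sketch of the proposed unequal-spaced channel-allocation procedure.
def allocate_channels(N, n):
    """Return the slot vector for N channels and minimum spacing n.

    Follows the steps stated in the text: build the spacing sequence
    n, n+1, ..., M with M = N + n - 2, optionally split it into two sets
    when N > 6 and n > 6, then accumulate the spacings starting from 0.
    """
    M = N + n - 2
    spacings = list(range(n, M + 1))     # n, n+1, ..., M  (N - 1 values)
    if N > 6 and n > 6:                  # split rule as stated in the text
        m = N // 2
        second_set = spacings[:m]        # first m spacings, excluding 0
        first_set = spacings[m:]         # remaining spacings (applied first)
        ordered = first_set + second_set
    else:
        ordered = spacings
    channels = [0]
    for s in ordered:
        channels.append(channels[-1] + s)
    return channels

print(allocate_channels(8, 10))   # [0, 14, 29, 45, 55, 66, 78, 91]
print(allocate_channels(4, 4))    # [0, 4, 9, 15]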
Table 1.1. The number of FWM products falling onto each channel of the unequal- and equal-spaced WDM systems with N = 3 shown in Table 1 above.

Channel   Equal spaced [0,1,2]   Example 1   Example 2   Example 3
N1        1                      0           0           0
N2        1                      0           0           0
N3        1                      0           0           0
Total     3                      0           0           0

Table 1.2. The number of FWM products falling onto each channel of the unequal- and equal-spaced WDM systems with N = 4 shown in Table 1 above.

Channel   Equal spaced [0,2,4,6]   Example 4   Example 5   Example 6   Example 7
N1        2                        0           0           0           0
N2        3                        0           0           0           0
N3        3                        0           0           0           0
N4        2                        0           0           0           0
Total     10                       0           0           0           0

Table 1.3. The number of FWM products falling onto each channel of the unequal- and equal-spaced WDM systems with N = 6 shown in Table 1 above.

Channel   Equal spaced [0,5,10,15,20,25,30]   Example 8   Example 9   Example 10
N1        6                                   0           0           0
N2        8                                   0           0           0
N3        9                                   0           0           0
N4        9                                   0           0           0
N5        8                                   0           0           0
N6        6                                   0           0           0
Total     46                                  0           0           0

Table 1.4. The number of FWM products falling onto each channel of the unequal- and equal-spaced WDM systems with N = 8 shown in Table 1 above.

Channel   Equal spaced [0,5,10,15,20,25,30,35]   Example 13   Example 14   Example 15
N1        12                                     0            0            0
N2        15                                     1            0            0
N3        17                                     1            0            0
N4        18                                     1            0            0
N5        18                                     0            0            0
N6        17                                     1            0            0
N7        15                                     0            0            0
N8        12                                     0            0            0
Total     124                                    4            0            0

Table 1.5. The number of FWM products falling onto each channel of the unequal- and equal-spaced WDM systems with N = 10 and N = 12 shown in Table 1 above.

Channel   Equal spaced [0,5,10,15,20,25,30,35,40,45]   Example 16   Example 17
N1        20                                           0            2
N2        24                                           0            1
N3        27                                           0            2
N4        29                                           1            3
N5        30                                           0            2
N6        30                                           0            4
N7        29                                           1            2
N8        27                                           0            2
N9        24                                           0            2
N10       20                                           1            1
N11       ×                                            ×            1
N12       ×                                            ×            2
Total     260                                          3            24
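The per-channel FWM counts in Tables 1.1-1.5 can be checked with a small brute-force enumeration of the mixing products f_i + f_j - f_k. The Python sketch below is our own illustration, not the FDT/FDS machinery of the paper; with the counting convention shown it reproduces, for example, the equal-spaced columns of Tables 1.1 and 1.2.

# Brute-force count of FWM products landing on each channel of a slot vector.
def fwm_hits(channels):
    """Count products f_i + f_j - f_k that coincide with a channel frequency.

    Degenerate products (i == j) are included; k must differ from i and j.
    """
    hits = {f: 0 for f in channels}
    n = len(channels)
    for i in range(n):
        for j in range(i, n):              # unordered pair, i == j allowed
            for k in range(n):
                if k == i or k == j:
                    continue
                product = channels[i] + channels[j] - channels[k]
                if product in hits:
                    hits[product] += 1
    return hits

print(fwm_hits([0, 1, 2]))      # {0: 1, 1: 1, 2: 1}      -> Table 1.1, equal spaced
print(fwm_hits([0, 2, 4, 6]))   # {0: 2, 2: 3, 4: 3, 6: 2} -> Table 1.2, equal spaced
print(fwm_hits([0, 1, 3]))      # all zeros                -> Example 1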
Table 2. Some results obtained from the proposed constructions, where P is a prime number, N is the number of unequal-spaced WDM channels, S is the total number of slots occupied by these channels, n is the minimum channel separation and M is the maximum channel separation.

P    N    S     n    M    Example of slot vector n                                                      FWM products
3    4    15    4    6    [0,4,9,15]                                                                    0
5    6    45    7    11   [0,10,21,28,36,45]                                                            0
7    8    91    10   16   [0,14,29,45,55,66,78,91]                                                      0
11   12   231   16   26   [0,22,45,69,94,120,136,153,171,190,210,231]                                   3
13   14   325   19   31   [0,26,53,81,110,140,171,190,210,231,253,276,300,325]                          16
17   18   561   25   41   [0,34,69,105,142,180,219,259,300,325,351,378,406,435,465,496,528,561]         25
19   20   703   28   46   [0,38,77,117,158,200,243,287,332,378,406,435,465,496,528,561,595,630,666,703] 12

Although the proposed algorithm is simpler than the other algorithms that exist at present, it has some limitations. For higher numbers of channels it is not always able to give the optimum solution for a WDM system, but the algorithm described above does give solutions that maintain minimum FWM crosstalk. In the present work it was not possible to overcome all of the algorithm's limitations. Moreover, in this paper the algorithm is investigated only by software simulation based on the FDT and FDS; overcoming its limitations would require practical implementation facilities instead of a purely algebraic analytical process, and such facilities were not available for the present research. The author hopes that, Allah willing, future research will overcome all of the algorithm's limitations and make it a unique channel-allocation algorithm for WDM systems.

CONCLUSION
To reduce four-wave mixing crosstalk in high-capacity, long-haul, repeaterless WDM transmission systems, the proposed technique is used to find solutions to the unequal-spaced channel-allocation problem. The problem has been formulated algebraically, and programming code has been provided for very fast solution. Numerical examples have been given to illustrate the constructions. The construction provides a fast and simple alternative to the existing methods for solving the problem. The proposed technique places no bound on the unequal-spaced channel-allocation problems that can be addressed, but the results are only valid with respect to the limitations discussed above.

REFERENCES
[1] Wing C. Kwong and Guu-Chang Yang, "An algebraic approach to the unequal-spaced channel-allocation problem in WDM lightwave systems," IEEE Transactions on Communications, vol. 45, no. 3, March 1997.
[2] B. Hwang and O. K. Tonguz, "A generalized suboptimum unequally spaced channel allocation technique—Part I: In IM/DD WDM systems," IEEE Trans. Commun., vol. 46, pp. 1027–1037, Aug. 1998.
[3] O. K. Tonguz and B. Hwang, "A generalized suboptimum unequally spaced channel allocation technique—Part II: In coherent WDM systems," IEEE Trans. Commun., vol. 46, pp. 1186–1193, Sept. 1998.
[4] L. G. Kazovsky, "Multichannel coherent optical communications systems," J. Lightwave Technol., vol. LT-5, pp. 1095–1102, Aug. 1987.
[5] M. O. Tanrikulu and O. K. Tonguz, "The impact of crosstalk and phase noise on multichannel coherent lightwave systems," IEICE Trans. Commun., vol. E78-B, no. 9, pp. 1278–1286, Sept. 1995.
[6] F. Forghieri, R. W. Tkach, A. R. Chraplyvy, and D. Marcuse, "Reduction of four-wave mixing crosstalk in WDM systems using unequally spaced channels," IEEE Photon. Technol. Lett., vol. 6, no. 6, pp. 754–756, June 1994.
[7] F. Forghieri, R. W. Tkach, and A. R. Chraplyvy, "WDM systems with unequally spaced channels," J. Lightwave Technol., vol. 13, pp. 889–897, May 1995.
[8] K. Inoue, H. Toba, and K. Oda, "Influence of fiber four-wave mixing on multichannel direct detection transmission systems," J. Lightwave Technol., vol. 10, pp. 350–360, Mar. 1992.
[9] K. Inoue and H. Toba, "Theoretical evaluation of error rate degradation due to fiber four-wave mixing in multichannel FSK heterodyne envelope detection transmissions," J. Lightwave Technol., vol. 10, pp. 361–366, Mar. 1992.
An Algorithm to Remove Noise from Audio Signal by Noise Subtraction Abdelshakour Abuzneid, Moeen Uddin, Shaid Ali Naz, Omar Abuzaghleh Department of Computer Science University of Bridgeport
Abstract- This paper proposes an algorithm for removing noise from an audio signal. Filtering is achieved by recording the pattern of the noise signal. In order to validate our algorithm, we have implemented it in MATLAB 7.0. We investigated the effect of the proposed algorithm on human voice and compared the results with existing related work, most of which employs simpler algorithms and affects the voice signal. The results show that the proposed algorithm supports Voice over IP communication with less noise in comparison to similar existing algorithms.
I. INTRODUCTION
Whenever we speak into a microphone, it does not pick up only the waves coming from our mouth; it also picks up waves from other sources such as a fan, a vacuum cleaner or a ringing phone, and combines this noise with the real voice signal. The input signal to the microphone becomes noisy. To remove this noise we have to know the characteristics of the noise and of the wanted voice signal, so that we can separate the noise from the original voice signal. The three main characteristics of signals are:
A. Amplitude
This is the strength of the signal. It can be expressed in a number of different ways (e.g. volts or decibels). The higher the amplitude, the stronger (louder) the signal. The decibel is a popular measure of signal strength [8].
B. Frequency This is the rate of change the signal undergoes every second, expressed in Hertz (Hz), or cycles per second. A 30Hz signal changes thirty times a second. In speech, we also refer to it as the number of vibrations per second. As we speak, the air is forced out of our mouths, being vibrated by our voice box. Men, on average, tend to vibrate the air at a lower rate than women, thus tend to have deeper voices. A cycle is one complete movement of the wave, from its original start position and back to the same point again. The number of cycles (or waves) within one second time interval is called cycles-per-second, or Hertz [8].
Typical sound levels are:
40 dB   normal speech
90 dB   lawn mowers
110 dB  shotgun blast
120 dB  jet engine taking off
120 dB  rock concerts
It has been discovered that exposure to sounds greater than 90 dB for a period exceeding 15 minutes causes permanent damage to hearing. Our ability to hear high notes is affected. As young babies, we have the ability to hear quite high frequencies; this ability declines as we age. It can also be affected by too much noise over sustained periods. Ringing in the ears after being exposed to loud noise is an indication that hearing loss may be occurring [8].
C. Phase This is the rate at which the signal changes its relationship to time, expressed as degrees. One complete cycle of a wave begins at a certain point, and continues till the same point is reached again. Phase shift occurs when the cycle does not complete, and a new cycle begins before the previous one has fully completed [8].
Figure 1: Signal 1 recorded at home
Generating Noise in the Background We need to generate background noise. Background noises come in many different shapes and sizes (figuratively speaking). In layman’s terms, Background noise is often described as “office ventilation noise”, “car noise”, “street noise”, “cocktail noise”, “background music”, etc. Although this classification is practical for human understanding, the algorithms that model and produce comfort noise see things in more mathematical terms. The most basic and intuitive property of Background noise is its loudness. This is referred to as the signal’s energy level. Another less obvious property is the frequency distribution of the signal. For example, the hum of a running car and that of a vacuum cleaner can have the same energy level, yet they do not sound the same: these two signals have distinctly different spectrums. The third property of BGN is the variability over time of the first two properties. When a Background noise’s energy level and spectrum are constant in time, it is said to be stationary. Some environments are prone to contain non-stationary BGN. The best example is street noise, in which cars come and go [10].
Figure 2: Signal 2 recorded at computer lab
By magnifying a portion of the signal 1 we can see the noise strip more clearly.
Good algorithms must cope well with all types of background noises. The regenerated noise must match the original signal as closely as possible. II. SUGGESTED ALGORITHM Step 1: We recorded two voice signals in two distinct environments and plotting the signal on graph, we examined that mostly the continuous noise comes in the range between -1.5 to +1.5 as you can see in the following snapshots.
Figure 3: Magnified portion of the Signal in figure 1
Recording this noise strip to a different variable and drawing on graph as shown in the snapshot.
Figure 4: Recorded noise strip
This noise appears when there is no voice signal. If we subtract this signal from the noise signal we get the output signal shown in figure4.
Figure 7: Output signal after subtraction of noise from Signal 2.
We played these signals and examined that there was no noise when there was no recorded voice but there was noise during the voice signal. It is because of the recorded noise empty spaces recorded during the voice where the range of signal was not between -1.5 and +1.5. Step 2: We filled these empty spaces by copying the pattern of noise, on those places where the noise is 0 or straight line. And the recorded noise strip becomes:
Figure 5: Output signal after subtraction of noise from Signal 1.
Figure 8: Noise strip with filled spaces
This whole strip in the above figure is mostly similar to actual noise. By subtracting this noise strip from the first recorded signals we can get the noise free signal as shown in the following snapshot.
Figure 6: Magnified portion of above Signal
Figure 9: Output signal 1 after subtracting noise strip
Figure 11: Output signal 1
Figure 10: Output signal 2 after subtracting noise strip
Figure 12: Output signal 2
We played these signals and found that there was no noise either during the voice or in the silent parts. Unfortunately, the voice signal is a little bit affected; it seems that the voice signal lost some of its characteristics.
Step 3: We solved this problem by putting specific ranges on the noise subtraction. We subtracted noise from the signal range between -1.5 and +1.5, and from the signal above +8 and below -10.
We played these signals and observed that the final output signals were mostly noise free. We have chosen the range +8 to -10; this is a critical point, since the clarity of the voice depends on the range we choose.
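Before the authors' MATLAB listings below, Steps 1-3 can be summarized compactly. The following NumPy sketch is our own paraphrase (the thresholds ±1.5, +8 and -10 are taken from the text; the function and array names are ours), not the authors' implementation.

import numpy as np

def subtract_noise(a, low=-1.5, high=1.5, pos_cut=8.0, neg_cut=-10.0):
    """Range-gated noise subtraction, paraphrasing Steps 1-3 of the text."""
    a = np.asarray(a, dtype=float)
    # Step 1: treat small negative-amplitude samples as the noise pattern.
    noise = np.where((a <= 0) & (a >= low), a, 0.0)
    # Step 2: fill the gaps in the noise strip by recycling the recorded pattern.
    pattern = noise[noise != 0]
    if pattern.size:
        gaps = np.flatnonzero(noise == 0)
        noise[gaps] = pattern[np.arange(gaps.size) % pattern.size]
    # Step 3: subtract the noise only inside the chosen ranges.
    out = a.copy()
    neg = (a < 0) & ((a > low) | (a < neg_cut))
    pos = (a > 0) & ((a < high) | (a > pos_cut))
    out[neg] = a[neg] - noise[neg]
    out[pos] = a[pos] + noise[pos]
    return out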
III. FLOW CHART
The processing flow is: Input Voice -> Add noise to Voice -> Separate noise from recorded signal -> Noise Signal -> Subtract noise from Noisy signal -> Noise free signal.

IV. MATLAB CODE
The algorithm has three modules which are coded in MATLAB 7.0. We used a microphone to record 3 seconds of 22-kHz, 8-bit, mono data.

A. Module 1 code
% Recording Audio Signal
micrecorder = audiorecorder(22050,8,1);
record(micrecorder,3);
a = getaudiodata(micrecorder, 'int8');
[m,n] = size(a);
j=1; k=1;

% Recording Noise
for i = 1 : m
    if (a(i,1) <= 0) && ( a(i,1) >= -1.5 )
        b(i,1) = a(i,1);
    else
        b(i,1) = 0;
    end
end

% Subtracting Noise
for i = 1 : m
    if ( a(i,1) < 0 )
        r(i,1) = a(i,1) - b(i,1);
    elseif ( a(i,1) > 0 )
        r(i,1) = a(i,1) + b(i,1);
    end
end

% Plotting Signals
figure, plot(a); figure, plot(b); figure, plot(r);
x = audioplayer(a,22050,8); y = audioplayer(b,22050,8); z = audioplayer(r,22050,8);

% Playing Signals
play(x); play(y); play(z);

B. Module 2 code
% Recording Audio Signal
micrecorder = audiorecorder(22050,8,1);
record(micrecorder,3);
a = getaudiodata(micrecorder, 'int8');
[m,n] = size(a);
j=1; k=1;

% Recording Noise
for i = 1 : m
    if (a(i,1) <= 0) && ( a(i,1) >= -1.5 )
        b(i,1) = a(i,1);
        c(j,1) = a(i,1);
        j = j+1;
    else
        b(i,1) = c(k,1);
        k = k+1;
    end
end

% Subtracting Noise
for i = 1 : m
    if ( a(i,1) < 0 )
        r(i,1) = a(i,1) - b(i,1);
    elseif ( a(i,1) > 0 )
        r(i,1) = a(i,1) + b(i,1);
    end
end

% Plotting Signals
figure, plot(a); figure, plot(b); figure, plot(c); figure, plot(r);
x = audioplayer(a,22050,8); y = audioplayer(b,22050,8); z = audioplayer(r,22050,8);

% Playing Signals
play(x); play(y); play(z);

C. Module 3 code
% Recording Audio Signal
micrecorder = audiorecorder(22050,8,1);
record(micrecorder,3);
a = getaudiodata(micrecorder, 'int8');
[m,n] = size(a);
j=1; k=1;

% Recording Noise
for i = 1 : m
    if (a(i,1) <= 0) && ( a(i,1) >= -1.5 )
        b(i,1) = a(i,1);
        c(j,1) = a(i,1);
        j = j+1;
    else
        b(i,1) = c(k,1);
        k = k+1;
    end
end

% Subtracting Noise
for i = 1 : m
    if ( (a(i,1) < 0 && a(i,1) > -1.5) || a(i,1) < -10 )
        r(i,1) = a(i,1) - b(i,1);
    elseif ( (a(i,1) > 0 && a(i,1) < 1.5) || a(i,1) > 8 )
        r(i,1) = a(i,1) + b(i,1);
    else
        r(i,1) = a(i,1);
    end
end

% Plotting Signals
figure, plot(a); figure, plot(b); figure, plot(c); figure, plot(r);
x = audioplayer(a,22050,8); y = audioplayer(b,22050,8); z = audioplayer(r,22050,8);

% Playing Signals
play(x); play(y); play(z);

V. CONCLUSION
We presented an algorithm for noise removal from a voice signal through a stepwise study methodology. We notice a significant improvement in the voice signal when it is filtered with this algorithm. The algorithm works well on uniform noise. We are currently working on other techniques for removing the noise from the signal so that it does not affect the characteristics of the original voice. We are extending our work so that we can implement our algorithm in Voice over IP, messengers or cellular phones.

REFERENCES
[1] Jaroslav Koton and Kamil Vrba, "Pure Current-Mode Frequency Filter for Signal Processing in High-Speed Data Communication," Issue Date: April 2007, p. 4.
[2] Yu-Jen Chen, Chin-Chang Wang, Gwo-Jia Jong and Boi-Wei Wang, "The Separation System of the Speech Signals Using Kalman Filter with Fuzzy Algorithm," Issue Date: August 2006, pp. 603-606.
[3] Jafar Ramadhan Mohammed, "A New Simple Adaptive Noise Cancellation Scheme Based On ALE and NLMS Filter," Issue Date: May 2007, pp. 245-254.
[4] Jian Zhang, Qicong Peng, Huaizong Shao and Tiange Shao, "Nonlinear Noise Filtering with Support Vector Regression," Issue Date: October 2006, pp. 172-176.
[5] Xingquan Zhu and Xindong Wu, "Class Noise Handling for Effective Cost-Sensitive Learning by Cost-Guided Iterative Classification Filtering," Issue Date: October 2006, pp. 1435-1440.
[6] Carlos Sanchez-Lopez and Esteban Tlelo-Cuautle, "Symbolic Noise Analysis in Gm-C Filters," Issue Date: September 2006, pp. 49-53.
[7] Daniele Rizzetto and Claudio Catania, "A Voice over IP Service Architecture for Integrated Communications," May-June 1999.
[8] http://www.cs.ntu.edu.au/sit/resources/dc100www/dc1009.htm
[9] http://www.bcae1.com/sig2nois.htm
[10] http://www.commsdesign.com/design_corner/OEG20030303S0036
Support Vector Machines Based Arabic Language Text Classification System: Feature Selection Comparative Study Abdelwadood. Moh’d. Mesleh Computer Engineering Department, Faculty of Engineering Technology, Balqa’ Applied University, Amman, Jordan. Abstract- feature selection (FS) is essential for effective and more accurate text classification (TC) systems. This paper investigates the effectiveness of five commonly used FS methods for our Arabic language TC System. Evaluation used an in-house collected Arabic TC corpus. The experimental results are presented in terms of macro-averaging precision, macroaveraging recall and macro-averaging F1 measure.
I. INTRODUCTION
It is known that the volume of Arabic information available on Internet is increasing. This growth motivates researchers to find some tools that may help people to better managing, filtering and classifying these huge Arabic information resources. TC [1] is the task to classify texts to one of a prespecified set of categories or classes based on their contents. It is also referred as text categorization, document categorization, document classification or topic spotting. TC is among the many important research problems in information retrieval (IR), data mining, and natural language processing. It has many applications [2] such as document indexing, document organization, text filtering, word sense disambiguation and web pages hierarchical categorization. TC has been studied as a binary classification approach (a binary classifier is designed for each category of interest), a lot of TC training algorithms have been reported in binary classification e.g. Naïve Bayesian method [3,4], k-nearest neighbors (kNN) [4,5,6], support vector machines (SVM) [7], decision tree [8], etc. On the other hand, it has been studied as a multi classification approach e.g. boosting [9], and multiclass SVM [10,11]. In TC tasks, supervised learning is a very popular approach that is commonly used to train TC systems (algorithms). TC algorithms learn classification patterns from a set of labeled examples, given an enough number of labeled examples (training set), and the task is to build a TC model. Then we can use the TC system to predict the category (class) of new (unseen) examples (testing set). In many cases, the set of input variables (features) of those examples contains redundant features and do not reveal significant input-output (documentcategory) characteristics. This is why FS techniques are essential to improve classification effectiveness. The rest of this paper is organized as follows. Section 2 summarizes the Arabic TC and FS related work. Section 3
describes the TC design procedure. Experimental results are shown in section 4. Section 5 draws some conclusions and outlines future work II. ARABIC TC AND FS RELATED WORK Most of the TC research is designed and tested for English language articles. However, some TC approaches were carried out for other European languages such as German, Italian and Spanish [12], and some others were carried out for Chinese and Japanese [13,14]. There is a little TC work [15] that is carried out for Arabic articles. To our best knowledge, there is only one commercial automatic Arabic text categorizer referred as “Sakhr Categorizer” [16]. Compared to other languages (English), Arabic language has an extremely rich morphology and a complex orthography; this is one of the main reasons [15,17] behind the lack of research in the field of Arabic TC. However, many machine learning approaches have been proposed to classify Arabic documents: SVM with CHI square feature extraction method [18,19], Naïve Bayesian method [20], k-nearest neighbors (kNN) [21,22,23], maximum entropy [17,24], distance based classifier [25,26,27], Rocchio algorithm [23] and WordNet knowledge based approach [28]. It is quit hard to fairly compare the effectiveness of these approaches because of the following reasons: • Their authors have used different corpora (because there is no publicly available Arabic TC corpus). • Even those who have used the same corpus, it is not obvious whether they have used the same documents for training/testing their classifiers or not. • Authors have used different evaluation measures: accuracy, recall, precision and F1 measure. For English language TC tasks, the valuable studies [29,30] have presented extensive empirical studies of many FS methods with kNN and SVM, it has been reported that X 2 square statistics (CHI) and information gain (IG) [29] FS methods performed most effective with kNN classifier. On the other hand, it has been shown that mutual information (MI) and term strength (TS) [29] performed terribly. However, IG [30] is the best choice to improve SVM classifier performance in term of precision.
To our best knowledge, the only work that investigated the usage of some FS methods for Arabic language TC tasks is [23]. IG, CHI, document frequency (DF), odd ratio (OR), GSS and NGL FS methods have been evaluated using a hybrid approach of light and trigram stemming. It has been shown that the usage of any of those FS methods separately gave near results, NGL performed better than DF, CHI and GSS with Rocchio classifier in term of F1 measure (it was noticed that when using IG and OR, the majority of documents contain non of the selected terms). It has been concluded that a hybrid approach of DF and IG is a preferable FS method with Rocchio classifier. It is clear that authors of [23] have not reported the comparison results of the mentioned FS methods in terms of recall, precision and F1 measure, and they have not considered SVM which was already known to be superior to their classifiers. In this paper, we have restricted our study of TC on binary classification methods and in particular to SVM and only for Arabic language articles. On the other hand, through fair comparison experiments, we have investigated the performance of the well-known FS methods with SVM for Arabic language TC tasks. III. TC DESIGN PROCEDURE TC system design usually compromises the following three main phases [7]: • Data pre-processing and FS phase is to make the text documents compact and applicable to train the text classifier. • The text classifier, the core TC learning algorithm, shall be constructed, learned and tuned using the compact form of the Arabic dataset. • Text classifier evaluation: the text classifier shall be evaluated (using some performance measures). At the end of the above procedure, the TC system can implement the function of document classification. The following subsections are devoted to Arabic dataset preprocessing, FS methods, text classifier and TC evaluation measures. A. Arabic Dataset Preprocessing Since there is no publicly available Arabic TC corpus to test our classifier, we have used an in-house collected corpus from online Arabic newspaper archives, including Al-Jazeera, AlNahar, Al-hayat, Al-Ahram, and Ad-Dustour as well as a few other specialized websites. The collected corpus contains 1445 documents that vary in length. These documents fall into Nine classification categories that vary in the number of documents. In this Arabic dataset, each document was saved in a separate file within the corresponding category’s directory, i.e. the documents of this dataset are single-labeled. Table 1 shows the number of documents for each category. One third of the articles is randomly specified and used for testing and the remaining articles are used for training the Arabic text classifier
TABLE 1. ARABIC DATA SET

Category      Training Articles   Testing Articles   Total Number
Computer      47                  23                 70
Economics     147                 73                 220
Education     45                  23                 68
Engineering   77                  38                 115
Law           65                  32                 97
Medicine      155                 77                 232
Politics      123                 61                 184
Religion      152                 75                 227
Sports        155                 77                 232
Total         966                 479                1445
As mentioned before, Arabic dataset preprocessing aims at transforming the Arabic text documents to a form that is suitable for the classification algorithm. In this phase, the Arabic documents are processed according to the following steps [5,11,28]: • Each article in the Arabic dataset is processed to remove digits and punctuation marks. • We have followed [15] in the normalization of some Arabic letters: we have normalized letters “( ”ءhamza), “ ( ” ﺁaleph mad), “( ” أaleph with hamza on top), “( ” ؤhamza on w), “” إ (alef with hamza on the bottom), and “( ”ئhamza on ya) to “( ”اalef). The reason for this normalization is that all forms of hamza are represented in dictionaries as one form and people often misspell different forms of aleph. We have normalized the letter “ ”ىto “ ”يand the letter “ ”ةto “”ﻩ. The reason behind this normalization is that there is not a single convention for spelling “ ”ىor “”يand “ ”ةor “ ”ﻩwhen they appear at the end of a word. • Arabic stop words (such as “”ﺁﺧﺮ, “”أﺑﺪا, “ ”أﺣﺪetc.) were removed. The Arabic function words (stop words) are the words that are not useful in IR systems e.g. pronouns and prepositions. • All non Arabic texts were removed. • The vector space model (VSM) representation [31] is used to represent the Arabic documents. In VSM, term frequency (TF) concerns with the number of occurrences a term i occurs in document j while inverse document frequency (IDF) concerns with the term occurrence in a collection of texts and it is calculated by IDF(i) = log(N /DF(i)) , where N is the total number of training documents and DF is the number of documents that term i occurs in. In IR, it is known that TF makes the frequent terms more important. As a result, TF improves recall (see TC evaluation measures subsection for recall definition). On the other hand, the IDF makes the terms that are rarely occurring in a collection of text more important. As a result, IDF improves precision (see TC evaluation measures subsection for precision definition). Using VSM in [32] shown that combining TF and IDF to weight terms ( IDF.TF ) gives better performance. In our Arabic dataset, each document feature vector is normalized to unit length and the IDF.TF is calculated.
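The TF.IDF weighting and unit-length normalization just described can be written down compactly. The sketch below is our own illustration in plain Python (the toy documents and names are invented for the example); it is not the authors' code.

import math

def tfidf_vectors(docs):
    """Build unit-length TF.IDF vectors, with IDF(i) = log(N / DF(i)) as in the text."""
    N = len(docs)
    vocab = sorted({t for d in docs for t in d})
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    vectors = []
    for d in docs:
        tf = {t: d.count(t) for t in set(d)}
        v = [tf.get(t, 0) * math.log(N / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0       # normalize to unit length
        vectors.append([x / norm for x in v])
    return vocab, vectors

# Toy example with already-tokenized, normalized placeholder documents.
docs = [["economy", "bank", "market"], ["bank", "loan", "loan"], ["match", "team", "goal"]]
vocab, vecs = tfidf_vectors(docs)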
We have not performed stemming, because it is not always beneficial for TC tasks [33], and because it has been empirically shown not to be beneficial for Arabic TC tasks either [18,19]. In fact, depending on the context, the same Arabic root may be derived from multiple Arabic words; on the other hand, the same word may be derived from several different roots. Table 2 shows some words that are derived from the same root 'ktb' (كتب), while Table 3 shows some roots that are derived from the same word 'ayman' (ايمان).
B. Feature Selection Methods
FS is a process that chooses a subset of the original feature set such that the selected subset is sufficient to perform the classification task. It is one of the important research problems in data mining [34], pattern recognition [35], and machine learning [36]. FS methods have been widely applied to TC tasks [2,37,38,39,40,41]. The basic FS steps are [36,42,43]:
• Feature generation: a candidate subset of features is generated by some search process.
• Feature evaluation: the candidate feature subset is evaluated using an evaluation criterion. (This step measures the goodness of the produced features.)
• Stopping: using some stopping criterion, decide whether to stop, e.g. whether a predefined number of features has been selected or a predefined number of iterations has been reached.
• Feature validation: using a validation procedure, decide whether a feature subset is valid. (This step is not part of the FS process itself, but in practice we need to verify the validity of the FS outcome.)
Generally, FS algorithms follow one of the following common approaches [43,44]:
• A filter-based method [34]: a subset of features is selected by filtering, based on the scores assigned by a specific weighting method (a minimal sketch of this approach is given after this list).
• A wrapper approach [36]: the subset of features is chosen based on the accuracy of a given classifier.
• A hybrid method [45]: takes advantage of both the filter and the wrapper methods.
The major disadvantage of the wrapper method [34,46] is its computational cost, which makes wrapper methods impractical for large classification problems; instead, filter methods are often used. In TC tasks, because the number of features is huge, care must be taken to select the right FS method so that the learning task is efficient and accurate [42]:
• FS improves the performance of the TC task in terms of learning speed and effectiveness (building the classifier is usually simpler and faster when fewer features are used), and it reduces the data dimension and removes irrelevant, redundant, or noisy data.
• On the other hand, FS may decrease accuracy (the over-fitting problem [1,44], which may arise when the number of features is large and the number of training samples is relatively small).
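A minimal sketch of the filter approach (a hypothetical helper; `score` stands for any of the weighting functions discussed later, e.g. those in Table 4):

```python
def filter_select(terms, category, score, k):
    """Filter-based FS: score every candidate term for the given category
    with the chosen weighting method, rank the terms by that score, and
    keep the k best ones as the feature subset."""
    ranked = sorted(terms, key=lambda term: score(term, category), reverse=True)
    return ranked[:k]
```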
TABLE 2
DIFFERENT ARABIC WORDS FROM THE ROOT 'KTB'

Arabic Word   Arabic Pronunciation   English Meaning
كاتب          katib                  writer
كتابه         kitaba                 the act of writing
كتاب          kitab                  some writing, book
In addition to the classical FS methods [29] (document frequency thresholding (DF), the χ² statistic (CHI), term strength (TS), information gain (IG) and mutual information (MI)), other FS methods have been reported in the literature, such as odds ratio [47], NGL [48], GSS [49], etc. Table 4 contains the functions of the FS methods commonly used for TC [2], where t_k denotes a term and c_i denotes a category. DF for a term t_k is the number of documents in which t_k occurs. Probabilities are interpreted on events of the training document space; for example, P(t_k, \bar{c}_i) denotes the probability that a term t_k occurs in a document x that does not belong to class c_i, and P(\bar{c}_i) is estimated as the number of documents that do not belong to class c_i divided by the total number of training documents. The functions are specified "locally" to a specific category c_i; to assess the value of a term t_k in a global category sense, either the sum \sum_{i=1}^{|C|} f(t_k, c_i), the weighted sum \sum_{i=1}^{|C|} P(c_i) f(t_k, c_i), or the maximum \max_{i=1}^{|C|} f(t_k, c_i) of the category-specific values f(t_k, c_i) is usually computed. In this paper, we have restricted our study to only five FS methods, in particular CHI, NGL, GSS, OR and MI.
C. Text Classifier
SVM-based classifiers are binary classifiers, originally proposed by [50]. Based on the structural risk minimization principle, an SVM seeks a decision hyper-plane that separates the training data points into two classes and makes decisions based on the support vectors, which are carefully selected as the only effective elements of the training dataset. The SVM classifier is formulated for two different cases: the separable case and the non-separable case. In the separable case, where the training data are linearly separable, the optimization of the SVM is to minimize

\min \frac{1}{2}\|w\|^2 ,    (1)
s.t.\ \forall i,\; y_i (x_i \cdot w + b) - 1 \ge 0 .
In the non-separable case, where the training data are not linearly separable, the optimization of the SVM is to minimize

\min \frac{1}{2}\|w\|^2 + C \sum_i \xi_i ,    (2)
s.t.\ \forall i,\; y_i (x_i \cdot w + b) - 1 + \xi_i \ge 0 ,\quad \forall i,\; \xi_i \ge 0 .

TABLE 3
DIFFERENT ARABIC ROOTS FROM THE WORD 'AYMAN'

Arabic Root   Arabic Pronunciation   English Meaning
أمن           eman                   peace
أيم           ayyiman                two poor people
مان           ayama'nu               will he give support

TABLE 4
COMMONLY USED FS METHODS

CHI(t_k, c_i) = \frac{N\,[P(t_k, c_i)P(\bar{t}_k, \bar{c}_i) - P(t_k, \bar{c}_i)P(\bar{t}_k, c_i)]^2}{P(t_k)P(\bar{t}_k)P(c_i)P(\bar{c}_i)}

NGL(t_k, c_i) = \frac{\sqrt{N}\,[P(t_k, c_i)P(\bar{t}_k, \bar{c}_i) - P(t_k, \bar{c}_i)P(\bar{t}_k, c_i)]}{\sqrt{P(t_k)P(\bar{t}_k)P(c_i)P(\bar{c}_i)}}

GSS(t_k, c_i) = P(t_k, c_i)P(\bar{t}_k, \bar{c}_i) - P(t_k, \bar{c}_i)P(\bar{t}_k, c_i)

OR(t_k, c_i) = \frac{P(t_k \mid c_i)\,(1 - P(t_k \mid \bar{c}_i))}{(1 - P(t_k \mid c_i))\,P(t_k \mid \bar{c}_i)}

MI(t_k, c_i) = \log \frac{P(t_k, c_i)}{P(t_k)P(c_i)}

D. TC Evaluation Measures
TC performance is always evaluated in terms of computational efficiency and categorization effectiveness. When categorizing a large number of documents into many categories, the computational efficiency of the TC system must be considered; this includes the FS method and the classifier learning algorithm. TC effectiveness [51] is measured in terms of precision, recall and the F1 measure. Denote the precision, recall and F1 measure for a class Ci by Pi, Ri and Fi, respectively. We have:

P_i = TP_i / (TP_i + FP_i) ,    (3)
R_i = TP_i / (TP_i + FN_i) ,    (4)
F_i = 2 P_i R_i / (R_i + P_i) = 2 TP_i / (FP_i + FN_i + 2 TP_i) ,    (5)

where:
• TPi: true positives; the set of documents that both the classifier and the previous judgment (as recorded in the test set) classify under Ci.
• FPi: false positives; the set of documents that the classifier classifies under Ci, but the test set indicates that they do not belong to Ci.
• TNi: true negatives; both the classifier and the test set agree that the documents in TNi do not belong to Ci.
• FNi: false negatives; the classifier does not classify the documents in FNi under Ci, but the test set indicates that they should be classified under Ci.
To evaluate the classification performance for each category, precision, recall, and the F1 measure are used. To evaluate the average performance over many categories, the macro-averaging F1 (F_1^M), micro-averaging F1 (F_1^\mu), macro-averaging precision (macroP), micro-averaging precision (microP), macro-averaging recall (macroR) and micro-averaging recall (microR) are used, defined as follows:

F_1^M = \frac{2 \sum_{i=1}^{|C|} R_i \sum_{i=1}^{|C|} P_i}{|C|\,(\sum_{i=1}^{|C|} R_i + \sum_{i=1}^{|C|} P_i)} ,    (6)

F_1^\mu = \frac{2 \sum_{i=1}^{|C|} TP_i}{\sum_{i=1}^{|C|} FP_i + \sum_{i=1}^{|C|} FN_i + 2 \sum_{i=1}^{|C|} TP_i} ,    (7)

macroP = \sum_{i=1}^{|C|} P_i / |C| ,    (8)

macroR = \sum_{i=1}^{|C|} R_i / |C| ,    (9)

microP = \sum_{i=1}^{|C|} TP_i / \sum_{i=1}^{|C|} (TP_i + FP_i) ,    (10)

microR = \sum_{i=1}^{|C|} TP_i / \sum_{i=1}^{|C|} (TP_i + FN_i) .    (11)
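The averaged measures (3)-(11) can be computed with a small sketch (not the authors' evaluation script; it assumes the per-category TP, FP and FN counts are already available and that at least one class has a nonzero precision or recall):

```python
def averaged_measures(tp, fp, fn):
    """Per-category precision/recall (Eqs. 3-4) and their macro- and
    micro-averages (Eqs. 6-11); tp, fp, fn are lists with one count per
    category."""
    C = len(tp)
    P = [tp[i] / (tp[i] + fp[i]) if tp[i] + fp[i] else 0.0 for i in range(C)]
    R = [tp[i] / (tp[i] + fn[i]) if tp[i] + fn[i] else 0.0 for i in range(C)]
    macroP, macroR = sum(P) / C, sum(R) / C                       # Eqs. (8), (9)
    macroF1 = 2 * sum(R) * sum(P) / (C * (sum(R) + sum(P)))       # Eq. (6)
    microP = sum(tp) / (sum(tp) + sum(fp))                        # Eq. (10)
    microR = sum(tp) / (sum(tp) + sum(fn))                        # Eq. (11)
    microF1 = 2 * sum(tp) / (sum(fp) + sum(fn) + 2 * sum(tp))     # Eq. (7)
    return dict(macroP=macroP, macroR=macroR, macroF1=macroF1,
                microP=microP, microR=microR, microF1=microF1)
```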
Macro-averaging F1 treats every category equally and calculates the global measure as the mean of all categories' local measures. Micro-averaging F1, on the other hand, computes the overall global measure by giving the categories' local performance measures different weights based on their numbers of positive documents. To compute macroP and macroR, the precision and recall of each individual category are computed locally and then averaged over the different categories; to compute microP and microR, precision and recall are computed globally over all the testing documents of all categories. In this paper, we will focus on the macro-averaging F1, macroP and macroR.

IV. EXPERIMENTAL RESULTS

In our experiments, we have used the above Arabic dataset for training and testing our Arabic text classifier. In addition to the preprocessing steps described earlier, we have filtered out all terms whose term frequency TF is below a threshold (the threshold is set to three for positive features and to six for negative features in the training documents). We have used an SVM package, TinySVM (downloaded from http://chasen.org/~taku/); the soft-margin parameter C is set to 1.0 (other values of C showed no significant changes in the results). To study the effect of FS, we have run a classification experiment without any FS method, i.e. all the features are used (the result of this experiment is referred to as the original classifier). In order to fairly compare the five FS methods (CHI, NGL, GSS, OR and MI), we have conducted three groups of experiments. For each group and for each text category, we have randomly selected one third of the articles for testing, while the remaining articles were used for training the Arabic classifier. For each FS method, we have conducted three experiments: the first experiment selects the best 180 features, the second experiment selects the best 160 features, and the third experiment selects the best 140 features.

Fig. 1 shows the macroP values of the SVM classifier with the five different FS methods at different feature set sizes. Compared with the original classifier (without feature selection, i.e. all 78699 features are used for training the SVM classifier), only CHI and NGL perform better. CHI is more stable than NGL (CHI outperforms the original classifier at all three feature sizes); however, the best SVM classification macroP result is obtained with the NGL FS method (93.14 when selecting the best 180 features).

Fig. 1. Macro-averaging precision values for the SVM classifier with the five FS methods at different sizes of features.

In Fig. 2, we show the macroR results. It is observed that all the FS methods outperform the original classifier. CHI, NGL and GSS performed much better than OR and MI. The best classification macroR result is obtained with CHI (84.00 when selecting the best 140 features).

Fig. 2. Macro-averaging recall values for the SVM classifier with the five FS methods at different sizes of features.

Fig. 3 shows the macro-averaging F1 results. It is clear that all the FS methods outperformed the original classifier. CHI, NGL and GSS performed much better than OR and MI. However, CHI outperformed NGL and GSS, and achieved its best macro-averaging F1 result when selecting the best 160 features.

Fig. 3. Macro-averaging F1 values for the SVM classifier with the five FS methods at different sizes of features.

V. CONCLUSIONS

We have investigated the performance of five FS methods with SVM, evaluated on an Arabic dataset. CHI, NGL and GSS were the most effective, whereas OR and MI were less effective; CHI performed best. In the future, we would like to study more FS methods for our SVM-based Arabic TC system, and to investigate in depth the effect of the FS methods on small categories (such as Computer).

ACKNOWLEDGMENT

Many thanks to Dr. Raed Abu Zitar for kind advice.

REFERENCES

[1] C. Manning, and H. Schütze, "Foundations of Statistical Natural Language Processing", MIT Press (1999).
[2] F. Sebastiani, "Machine Learning in Automated Text Categorization", ACM Computing Surveys, Vol. 34, No. 1, 2002, pp. 1-47.
[3] A. McCallum, and K. Nigam, "A comparison of event models for naïve Bayes text classification", AAAI-98 Workshop on Learning for Text Categorization, 1998, pp. 41-48.
[4] Y. Yang, and X. Liu, "A re-examination of text categorization methods", 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'99), 1999, pp. 42-49.
[5] G. Guo, H. Wang, D. Bell, Y. Bi, and K. Greer, "An kNN Model-based Approach and its Application in Text Categorization", Proceedings of the 5th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing-2004, LNCS 2945, Springer-Verlag, 2004, pp. 559-570.
[6] Y.M. Yang, "Expert network: Effective and efficient learning from human decisions in text categorization and retrieval", Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'94), 1994, pp. 13-22.
[7] T. Joachims, "Text categorization with Support Vector Machines: learning with many relevant features", Proceedings of the European Conference on Machine Learning (ECML'98), Berlin, 1998, pp. 137-142, Springer.
[8]
D. Lewis, and M. Ringuette, “A comparison of two learning algorithms for text categorization”, The Third Annual Symposium on Document Analysis and Information Retrieval, 1994, pp.81-93. [9] R. Schapire, and Y. Singer, “BoosTexter: A boosting-based system for text categorization”, Machine Learning, Vol. 39, No.2-3, 2000, pp.135168. [10] S. Gao, W. Wu, C-H. Lee, and T-S. Chua, “A maximal figure-of-merit (MFoM)-learning approach to robust classifier design for text categorization”, ACM Transactions on Information Systems, Vol. 24, No. 2, 2006, pp. 190-218. [11] J. Zhang, R. Jin, Y.M. Yang, and A. Hauptmann, “A modified logistic regression: an approximation to SVM and its applications in large-scale Text Categorization”, Proceedings of the Twentieth International Conference (ICML 2003), August 21-24, 2003, Washington, DC, USA, pp. 888-895. [12] F. Ciravegna, et.al., “Flexible Text Classification for Financial Applications: the FACILE System”, Proceedings of PAIS-2000, Prestigious Applications of Intelligent Systems sub-conference of ECAI2000, 2000, pp.696-700. [13] F. Peng, X. Huang, D. Schuurmans, and S. Wang, “Text Classification in Asian Languages without Word Segmentation”, Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages (IRAL 2003), Association for Computational Linguistics, July 7, Sapporo, Japan, 2003, pp. 41-48. [14] J. He, A-H. TAN, and C-L. TAN, “On Machine Learning Methods for Chinese document Categorization”, Applied Intelligence, 2003, pp. 311322. [15] A.M. Samir, W. Ata, and N. Darwish, “A New Technique for Automatic Text Categorization for Arabic Documents”, Proceedings of the 5th Conference of the Internet and Information Technology in Modern Organizations, December, Cairo, Egypt, 2005, pp. 13-15. [16] Sakhr company website: http://www.sakhr.com. [17] A.M. El-Halees, “Arabic Text Classification Using Maximum Entropy”, The Islamic University Journal, Vol. 15, No. 1, 2007, pp 157-167. [18] A.M. Mesleh, “CHI Square Feature Extraction Based SVMs Arabic Language Text Categorization System”, Proceedings of the 2nd international Conference on Software and Data Technologies, (Knowledge Engineering), Volume 1, Barcelona, Spain, July, 22—25, 2007, pp. 235-240. [19] A.M. Mesleh, “CHI Square Feature Extraction Based SVMs Arabic Language Text Categorization System”, Journal of Computer Science, Vol. 3, No. 6, 2007, pp. 430-435. [20] M. Elkourdi, A. Bensaid, and T. Rachidi, “Automatic Arabic Document Categorization Based on the Naïve Bayes Algorithm”, Proceedings of COLING 20th Workshop on Computational Approaches to Arabic Script-based Languages, Geneva, August 23rd-27th , 2004, pp. 51-58. [21] G. Kanaan, R. Al-Shalabi, and A. AL-Akhras, “KNN Arabic Text Categorization Using IG Feature Selection”, Proceedings of The 4th International Multiconference on Computer Science and Information Technology (CSIT 2006), Vol. 4, Amman, Jordan, April 5-7, 2006, Retrieved March 20, 2007, from http://csit2006.asu.edu.jo/proceedings. [22] R. Al-Shalabi, G. Kanaan, M. Gharaibeh, “Arabic Text Categorization Using kNN Algorithm”, Proceedings of The 4th International Multiconference on Computer Science and Information Technology (CSIT 2006), Vol. 4, Amman, Jordan, April 5-7, 2006, Retrieved March 20, 2007, from http://csit2006.asu.edu.jo/proceedings. [23] M. Syiam, Z. Fayed, and M. Habib, “An Intelligent System for Arabic Text Categorization”, International Journal of Intelligent Computing and Information Ssciences, Vol.6, No.1, 2006, pp. 1-19. [24] H. Sawaf, J. 
Zaplo, and H. Ney, “Statistical Classification Methods for Arabic News Articles”, Paper presented at the Arabic Natural Language Processing Workshop (ACL2001), Toulouse, France. ( Retrieved from Arabic NLP Workshop at ACL/EACL 2001 website: http://www.elsnet.org/acl2001-arabic.html). [25] R.M. Duwairi, “A Distance-based Classifier for Arabic Text Categorization”, Proceedings of the 2005 International Conference on Data Mining (DMIN2005), Las Vegas, USA, 2005, pp.187-192. [26] L. Khreisat, “Arabic Text Classification Using N-Gram Frequency Statistics A Comparative Study”, Proceedings of the 2006 International Conference on Data Mining (DMIN2006). Las Vegas, USA, 2006, pp.78-82. [27] R.M. Duwairi, “Machine Learning for Arabic Text Categorization”, Journal of American society for Information Science and Technology, Vol. 57, No. 8, 2006, pp.1005-1010.
[28] M. Benkhalifa, A. Mouradi, and H. Bouyakhf, “Integrating WordNet knowledge to supplement training data in semi-supervised agglomerative hierarchical clustering for text categorization”, International Journal of Intelligent Systems, Vol. 16, No. 8, 2001, pp. 929-947.. [29] Y.M. Yang, and J.O. Pedersen, “A Comparative Study on Feature Selection in Text Categorization”, In J. D. H. Fisher, editor, The 14th International Conference on Machine Learning (ICML’97), Morgan Kaufmann, 1997, pp.412-420. [30] G. Forman, “An Extensive Empirical Study of Feature Selection Metrics for Text Classification”, Journal of Machine Learning Research, Vol. 3, 2003, pp. 1289-1305. [31] G. Salton, A. Wong, and C.S. Yang, “A Vector Space Model for Automatic Indexing”, Communications of the ACM, Vol. 18, No. 11, 1975, pp. 613-620. [32] G. Salton, and C. Buckley, “Term weighting approaches in automatic text retrieval”, Information Processing and Management, Vol. 24, No. 5, 1988, pp. 513-523. [33] T. Hofmann, “Introduction to Machine Learning”, Draft Version 1.1.5, November 10, 2003. [34] M. Dash, M., K. Choi, P. Scheuermann, and H. Liu, “Feature Selection for Clustering-a Filter Solution”, Proceedings of the second International Conference of Data Mining, 2002, pp. 115-122. [35] P. Mitra, C.-A. Murthy, and S.-K. Pal, “Unsupervised Feature Selection Using Feature Similarity”. IEEE Transaction of Pattern Analysis and Machine Intelligence, Vol. 24, No. 3, 2002, pp. 301-312. [36] R. Kohavi, G.H. John, “Wrappers for feature subset selection”, Artificial Intelligence, Vol. 97, No. 1-2, 1997, pp.273-324. [37] E. Leopold, and J. Kindermann, “Text Categorization with Support Vector Machines. How to Represent Texts in Input Space?”, Machine Learning, Vol. 46, 2002, pp. 423-444. [38] K. Nigam, A.K. Mccallum, S. Thrun, and T. Mitchell, “Text Classification from Labeled and Unlabeled Documents Using EM”, Machine Learning, Vol. 39, 2000, pp. 103—134. [39] D. Mladenic, “Feature subset selection in text learning”, Proceedings of European Conference on Machine Learning (ECML), 1998, pp. 95-100. [40] H. Taira, and M. Haruno, “Feature selection in SVM text categorization”, Proceedings of AAAI-99, 16th Conference of the American Association for Artificial Intelligence (Orlando, US, 1999), 1999, pp. 480-486. [41] D. Lewis, “Feature Selection and Feature Extraction for Text Categorization”, Proceedings of a workshop on speech and natural language, San Mateo, CA: Morgan Kaufmann, 1992, pp. 212-217. [42] M. Dash, and H. Liu, “Feature selection for classification”, Intelligent Data Analysis, Vol. 1, No. 3, 1997, pp.131-156. [43] H. Liu, and L. Yu, “Toward integrating feature selection algorithms for classification and clustering”, IEEE Transaction on Knowledge and Data Engineering, Vol. 17, No. 4, 2005, pp. 491-502. [44] H. Liu, “Evolving feature selection”, IEEE Intelligent Systems, 2005, pp.64-76. [45] S. Das, “Filters, wrappers and a boosting-based hybrid for feature selection”, Proceedings of the Eighteenth International Conference on Machine Learning, 2001, pp. 74-81. [46] A.L Blum, and P. Langley, “Selection of Relevant Features and Examples in machine Learning”, Artificial Intelligence, Vol. 97, No. 1-2, 1997, pp. 245-271. [47] D. Mladenic, and M. Grobelnik, “Feature Selection for Unbalanced Class Distribution and Naïve Bayes”, Proceedings of the Sixteenth International Conference on Machine Learning (ICML), 1999, pp. 258267. [48] H.T. Ng, W.B. Goh, and K.L. 
Low, “Feature Selection, Perceptron Learning, and A usability Case Study for Text Categorization”, Proceedings of SIGIR-97, 20th ACM International Conference on Research and Development in Information Retrieval, Philadelphia, PA, 1997, pp. 67-73. [49] L. Galavotti, F. Sebastiani, and M. Simi, “Experiments on the Use of Feature Selection and Negative Evidence in Automated Text Categorization”, Proceedings of ECDL-00, 4th European Conference on Research and Advanced Technology for Digital Libraries, Lisbon, Portugal, 2000, pp. 59-68. [50] V. Vapnik, “The Nature of Statistical Learning Theory”, SpringerVerlag, New York, 1995. [51] R. Baeza-Yates, and B. Rieiro-Neto, “Modern Information Retrieval”, Addison-Wesley & ACM Press, 1999.
Visual Attention in Foveated Images

Abulfazl Yavari¹, H.R. Pourreza²
¹ Jahad Daneshgahi Institute of Higher Education of Kashmar, Kashmar, Khorasan Razavi, IRAN
² Ferdowsi University of Mashad, Mashad, Khorasan Razavi, IRAN
E-mails: Abulfazl_yavari@yahoo.com, [email protected]
Abstract – In this paper, we present a new visual attention system that is able to detect attentive areas in images with non-uniform resolution. Since one of the goals of visual attention systems is to simulate human perception, and the human visual system processes foveated images, visual attention systems should be able to identify the salient regions in such images. We test the system on two types of images: real-world and artificial images. The real-world images include cloth images with some defects, and the system should detect these defects. In the artificial images, one element differs from the others in only one feature, and the system should detect it.
Index Terms – Visual Attention, Foveated Images
I. INTRODUCTION
A. Visual Attention
Visual attention is a selective process that enables a person to act effectively in his complicated environment [1]. We all frequently use the word "attention" in our everyday conversations. A definition of attention is given in [2]: "attention defines the mental ability to select stimuli, responses, and memories". These stimuli can be anything; for instance, think of a situation in which you are studying and concentrating on the text you are reading. Suddenly you are interrupted by a loud sound, or you smell something burning, and your attention is attracted. Similarly, there are stimuli in the environment that affect our vision, for example a moving object, a picture on the wall, or a bright area in a dark room. These are examples of cases that automatically attract our attention without any predetermined goal. This type of attention is called bottom-up attention [3] (Fig. 1).
Fig.1 A picture on the wall attracts our attention (bottom-up attention)
There are other cases in which we are looking for a particular object in the environment, and everything that has features similar to that object will attract our attention. Assume, for instance, that we are looking for a red pen on a table. Anything with a red color or with a shape like a pen will attract us, and so we may find the desired object at the first or one of the next focuses. This type of attention is called top-down attention [3] (Fig. 2).
B. Foveated Images
The human vision system (HVS) is a space-variant system [4]. This means that, moving away from the gaze point, the resolution gradually decreases and only the overall structure of the scene survives. Images that have this property are called foveated images; the area with the highest resolution is called the fovea [5] (Fig. 3). We can find the source of this behaviour by studying the eye's structure. There are two kinds of vision cells on the retina: cone cells and rod cells. Cone cells are far fewer in number than rod cells, but they are sensitive to color, and each cone cell is individually connected to a nerve. Cone cells are gathered in the fovea area of the retina. Rod cells are far more numerous than cone cells and lie in the area around the fovea; multiple rod cells are connected to a single shared nerve, and they are sensitive to light [6]. Because humans have non-uniform vision and their brains in fact perform their processing on foveated images, in this paper we concentrate on visual attention in foveated images. Up to now, visual attention has been studied only on normal images (uniform resolution).
Fig.2 When we are looking for a red pen on the table, regions that have a red color attract our attention (top-down attention)
Fig.3 Foveated image

Section 2 is dedicated to related work, where we describe what is currently done in the field of visual attention. In Section 3 we present our work in detail, and the tests and evaluation results are discussed in Section 4. Finally, we present the conclusions and our future work in Section 5.
II. RELATED WORK
The increased interest in research on visual attention and the simulation of human behaviour, together with the increased processing power of computers, has led to the development of a wide variety of models and systems of visual attention. In this section, we first introduce several important models of visual attention, and then we explain the VOCUS system, on which our system is based.
A. Basic Models
The first computational architecture of visual attention was introduced by Koch and Ullman [7]. This model is based on the Feature Integration Theory (FIT) [8] and has provided useful algorithms for many current systems. This bottom-up architecture is able to detect the salient regions of images by using three features: color, orientation and intensity. Another system of visual attention was introduced by Milanese [9][10]. This model is based on the Koch and Ullman model and uses filter operations to compute the feature maps [1]. It uses the following features: orientation, local curvature, color, and intensity. One important aspect of this model is that it uses a conspicuity operator to identify the salient regions in a scene. This operator acts like the On-Off and Off-On cells in the human vision system and computes the contrasts in images. Another system, which is one of the best currently existing systems, is the Neuromorphic Vision Toolkit (NVT) [11]. The features used in this system are color, orientation, and intensity, and all the computations are performed on image pyramids. The idea of using a weight function to combine feature maps, and also the center-surround mechanism, are used in this system.
In the original version of NVT, only bottom-up attention was considered, but Navalpakkam and Itti introduced another version which was able to use top-down attention as well. The main idea was to have a system capable of learning the values of a target's features from a set of training images. A drawback of this version was that nothing was said about the simultaneous use of these two types of attention.
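The center-surround mechanism mentioned above can be illustrated with a small sketch (an approximation for illustration only, not NVT's or VOCUS's actual code; it replaces the image pyramid with Gaussian blurs of increasing σ, which is a simplification):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def intensity_on_off_map(gray, center_sigma=1.0, surround_sigmas=(4.0, 8.0)):
    """Center-surround intensity contrast: the 'center' is a lightly smoothed
    version of the image, the 'surround' a strongly smoothed one, and the
    on-off map keeps only positive center-minus-surround differences.
    The per-scale maps are added up pixel by pixel."""
    img = np.asarray(gray, dtype=float)
    center = gaussian_filter(img, center_sigma)
    on_off = np.zeros_like(img)
    for sigma in surround_sigmas:
        surround = gaussian_filter(img, sigma)
        on_off += np.clip(center - surround, 0.0, None)  # on-off contrast only
    return on_off
```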
B. VOCUS System
In this section, we introduce VOCUS (Visual Object detection with a CompUtational attention System), developed by Frintrop [1]. It has many similarities to the NVT system; it uses feature maps, a saliency map and Inhibition Of Return (IOR), which are also used in many other current systems. Our proposed system is very similar to VOCUS from the implementation point of view and utilizes many of its ideas. However, as one of the goals of visual attention systems is to simulate human perception and the human vision system, which works on foveated images, it is important to note that VOCUS is unable to handle foveated images, whereas our system is capable of locating the target in such images. As described in [1], three different feature dimensions are computed on the input image: intensity, orientation, and color. The channel for the intensity feature extracts regions with strong intensity contrasts from the input image. First, the color image is converted into gray-scale. From the gray-scale image, a Gaussian image pyramid with five different scales s0 to s4 is computed. The intensity feature maps are computed by a center-surround mechanism extracting intensity differences between image regions and their surround, similar to ganglion cells in the human visual system [12]. The maps for each center-surround variation are then summed up by across-scale addition: first, all maps are resized to scale s2, and then the maps are added up pixel by pixel. After summing up the scales, this yields two intensity maps. Similarly, orientation maps are computed from oriented pyramids [13]. The oriented pyramid in fact consists of four pyramids, one for each of the orientations 0°, 45°, 90° and 135°. The orientations are computed by Gabor filters detecting bar-like features of the specified orientation. The orientation scale maps are summed up by across-scale addition for each orientation, yielding four orientation feature maps at scale s2, one for each orientation. To compute the color feature maps, the color image is first converted into an LAB image and an LAB image pyramid is generated. From this pyramid, four color pyramids are generated for the distinct colors red, green, blue, and yellow. On these pyramids, the color contrast is computed by the center-surround mechanism. The maps of each color are rescaled to scale s2 and summed up into four color feature maps. The fusion of the feature maps, and then of the conspicuity maps, is done by a weighted average: the maps are first weighted by a weight function and then summed up. This function is:
X = X / m ,    (1)
in which X denotes the map and m is the number of local maxima. Finally, the saliency map shows the attentive areas in the input image.
III. SYSTEM DESCRIPTION
In this section we introduce the details of our system, which tries to find regions of interest in foveated images. The architecture of the system shares the main concepts with the standard models of visual attention, especially the model of Koch and Ullman. The implementation is roughly based on VOCUS, one of the best-known attention systems currently available. However, there are several differences in the implementation details and in the type of the input images (foveated images). The structure of our visual attention system is shown in Fig. 4. As illustrated in the figure, the input of the system is a foveated image, which may be generated by software or hardware. We use software tools to generate images with the fovea at the centre of the image; in fact, such an image is similar to a human's perception of a scene (non-uniform resolution). In the first step, pre-processing is performed on the input image. It consists of applying a 3×3 Gaussian filter to the input image in such a way that the filter is applied multiple times at the center of the image and at a decreasing rate as we move from the center to the periphery. At any location in the image, the local variance of the image is used to determine the number of times the filter is applied at that location; moving from the center to the periphery, the local variance decreases, so the number of iterations is also reduced. In the basic version of the system, this step was distributed over all steps of the system; by collecting the similar processes into one single step, we obtained a considerable improvement in performance. In addition, because in some cases it is possible to continue the work with VOCUS after performing this step on the input images, we named this step "pre-process" (a minimal sketch is given below). To obtain better and more accurate results, we needed to change some details of the implementation of VOCUS. For example, we needed to change the weight function to decrease the value of unimportant maps and increase the value of the other maps. We also changed the values of the thresholds to make the system able to work under the new conditions.
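The pre-processing step just described might look roughly as follows (an illustrative sketch only, not the authors' MATLAB implementation; the window size, iteration count and σ are assumptions):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter

def preprocess_foveated(img, max_iters=4, window=7):
    """Apply a small Gaussian repeatedly, with the number of passes at each
    pixel driven by the local variance, which is highest at the fovea and
    falls off towards the periphery."""
    img = np.asarray(img, dtype=float)
    mean = uniform_filter(img, window)
    var = uniform_filter(img * img, window) - mean * mean        # local variance
    iters = np.rint(max_iters * var / (var.max() + 1e-12)).astype(int)
    blurred = [img]                                              # 0..max_iters passes
    for _ in range(max_iters):
        blurred.append(gaussian_filter(blurred[-1], sigma=0.8))  # ~3x3 Gaussian
    stack = np.stack(blurred)
    rows, cols = np.indices(img.shape)
    return stack[iters, rows, cols]                              # pick per-pixel pass count
```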
Fig.4 structure of the system
IV. EXPERIMENTS AND RESULTS
In this section, we present experimental results on real-world and artificial images. We implemented our system in the MATLAB programming language. Our sample pictures include real images of cloth and tile with some defects, which the system should detect. In the artificial images, one element differs from the others in only one feature, and the system should detect it. In each experiment, we first show that VOCUS is unable to find the defect and then show that our system detects it correctly. It is worth noting that the small green circle indicates the approximate location of the defect.
A. Experiment 1
This image is a tile image with a black dot near its top (Fig. 5). As illustrated in the figure, our system detected the defect, but VOCUS was not able to detect it. Since the resolution in the center of the image is better than in the periphery, VOCUS worked incorrectly.
Fig.5. (a) The defect is detected by our system; (b) but not by VOCUS.
B. Experiment 2
This image is a cloth image with a vertical red defect on it; the defect is detected by our system, but not by VOCUS (Fig. 6).

Fig.6. (a) The defect is detected by our system; (b) but not by VOCUS.

C. Experiment 3
This image is an artificial (pop-out) image. There is a red blob among blue ones; only one feature, color, is different (Fig. 7).

Fig.7. (a) The red blob is detected by our system; (b) but not by VOCUS.

D. Experiment 4
This image is also a cloth image, with a yellow defect on it (Fig. 8).

Fig.8. (a) The horizontal line is detected by our system; (b) but not by VOCUS.

V. CONCLUSION AND FUTURE WORKS
We presented a new visual attention system which is able to detect attentive areas in images with non-uniform resolution, something that no other currently existing system is capable of. Following the standard attention model of Koch and Ullman, and VOCUS, we concentrate on three features (color, intensity, orientation), since they belong to the basic features of human perception and are rather easy to compute. Of course, this can only approximate human behaviour, where many other features attract attention, e.g., size, curvature, and motion [1]. In future work, we intend to complete the top-down part of the system. In addition, we should improve the system's speed to make it usable in real-time applications. We will also study the addition of the motion feature, which has an important role in attracting attention.

ACKNOWLEDGEMENT
The authors would like to thank Simone Frintrop for useful comments on this paper.

REFERENCES
[1]
Frintrop, S: VOVUS: a Visual Attention System for Object Detection and Goal Directed Search. PhD thesis University of Bonn Germany. January 9, 2006 [2] Corbetta, M.: Frontoparietal cortical networks for directing attention and the eye to visual locations: Identical, independent, or overlapping neural system? Proc. Of the National Academy of sciences of the United States of America, 95:831-838. 1990 [3] Desimone, R. and Duncan, J.: Neural mechanism of selective Visual attention. Annual reviews of Neuroscience, 18:193-222. 1995 [4] Coltekin, A.: Foveation for 3D visualization and Stereo Imaging. PhD thesis Helsinki University of Technology Finland. February 3, 2006. [5] Chang, E.: Foveation Techniques and Scheduling Issues in Thinwire Visualization. PhD thesis New York University. 1998. [6] Gonzales, R. C. and Woods, R. E.: Digital image processing. AddisonWesley Publishing Company, 1992 [7] Koch, C. and Ullman, S.:Shifys in selective visual attention: Towards the underlying neural circuitry. Human Neurobiology 4 (4, 1985) 219227. [8] Treisman, A. M. and Gelade, G.: A feature integration theory of attention. Cognitive Psychology 12 (1980) 97–136. [9] Milanese, R.:Detecting Salient regions in an Image: From Bilogical Evidence to Computer Implimentation. PhD thesis, University of Geneva, Switzerland. 1993. [10] Milanese, R., Wechsler, H., Gil, S., Bost, J., and Pun, T.: Integration of bottom-up and top-down cues for visual attention using non-linear relaxation. In proc. Of the IEEE Conference on Computer Vision and Pattern Rcognition, pages 781-785. 1994. [11] Itti, L., Koch, C. and Niebur, E.: A model of saliency-based Visual Attention for Rapid Scene Analysis. IEEE Trans. On PAMI 20, pages 1254-1259, 1998. [12] Palmer, S. E.: Vision Science, Photons to Phenomenology the MIT Press 1999. [13] Greenspan, H., Belongie, S., Goodman, R., Perona, P., Rakshit, S., and Anderson, C.: Overcomplete steerable pyramid filters and rotation invariance. IEEE Computer Vision and Pattern Recognition (CVPR), pages 222-228, 1994.
Frequency Insensitive Digital Sampler and Its Application to the Electronic Reactive Power Meter

Adalet N. Abiyev, Member, IEEE
Department of Electrical and Electronics Engineering, Girne American University
[email protected]
Abstract – This paper presents new reactive power (RP) measurement algorithms and their realization with electronic components. The proposed Walsh function (WF) based algorithms simplify the multiplication procedure needed to evaluate the reactive component from the instantaneous power signal. The advantages of the proposed algorithms have been verified by experimental studies. One of these advantages is that, in contrast to known existing methods, which involve a phase-shift operation on the input signal, the proposed technique does not require delaying the current signal by π/2 with respect to the voltage signal. Another advantage is related to the computational savings of the proposed algorithms, which come from using a Walsh transform (WT) based signal processing approach. It is shown that immunity of the measurement results to input frequency variation can be obtained if the sampling interval of the instantaneous power is derived from the input power frequency. The validity and effectiveness of the suggested algorithms have been tested with simulation tools developed in the "Matlab" environment.
Keywords – coherent data acquisition, digital sampler, distortion power, frequency insensitive measurement, phase shift, reactive power measurement, signal processing, Walsh function.
I. INTRODUCTION
Accurate measurement and evaluation of RP is currently dictated by strong demands for electrical energy savings during transmission and distribution. The evaluation of RP is also one of the important tasks in the electric power industry, especially in electrical energy quality estimation and control. RP directly influences the power factor and, as a result, overloads the cables connecting electrical energy sources to energy users; it also plays a vital role in the stable operation of power systems [1]. Over the last ten years, various methods have been developed for RP measurement under both sinusoidal and multi-harmonic conditions (i.e., in the presence of harmonic distortion). The extension of the wavelet transform to the measurement of the RP component through the use of broadband quadrature phase-shift networks is demonstrated in [2]; this wavelet-based power metering system requires a phase shift of the input voltage signal. In [3], the application of a new frequency-insensitive quadrature phase-shifting method to reactive power measurement has been verified using a time-division-multiplier type wattmeter. An electronic shifter based on stochastic signal processing, for a simple and cost-effective digital implementation of a reactive power and energy meter, was developed in [4]. A computer algorithm for calculating reactive (quadrature) power is proposed in [5]. The development of a method using artificial neural networks to evaluate the instantaneous reactive power is described in [6]; in this method, a back-propagation neural network is used to approximate the reactive power evaluation function. In [7], digital infinite impulse response filters are used to measure the reactive power. Although the proposed algorithm allows evaluation of the harmonic components of the RP, the method is still complex because of the filtering procedures it performs. A new application of the least error squares estimation algorithm for identifying the reactive power from available samples of the voltage and current waveforms in the time domain, for sinusoidal and non-sinusoidal signals, is proposed in [8]. Most of the known research works are based on averaging the product of the current samples and the voltage samples, with one of the sample sequences (current or voltage) shifted by a quarter period relative to the other. Fourier transform (FT) based digital or analogue filtering algorithms allow the evaluation of RP without a shifting operation, but a large number of multiplication and addition operations is required when applying FT algorithms to RP evaluation; for example, a 16-point DFT requires 16² = 256 complex multiplications and 16 × 15 = 240 complex additions. Various algorithms (for example the FFT, known as the Cooley–Tukey algorithm) have been developed to reduce the number of multiplication and addition operations by exploiting the computational redundancy inherent in the DFT; unfortunately, FT-based algorithms are still computationally complex. The authors in [10] analyzed WT algorithms applied to the energy measurement process and showed that the Walsh method has intrinsically high accuracy due to the coefficient characteristics in the energy staircase representation. In [11] it is stated that a decimation algorithm based on the fast WT (FWT) has better performance due to the elimination of multiplication operations, and low or comparable hardware complexity because of the FWT transform kernel. New 2-dimensional digital FIR filtering based algorithms for measuring the RP are proposed in [12], where an existing WF-based RP measurement algorithm is also cited. The basic idea of that WF-based algorithm consists in first resolving the voltage and current signals separately along the WFs, and then obtaining the RP as the difference of the products of the quadrature components. The need for at least four multiplication-integration, two multiplication, and one
summation operation for RP evaluation makes this algorithm comparatively complex and less convenient for implementation.
The aim of this paper is to evaluate the RP component from the instantaneous power signal, without a phase shift of π/2 between the voltage and current waveforms and with relatively low computational demands. This objective is achieved by using the WF. The attraction of the WF-based approach to RP evaluation comes from key advantages such as: (a) the WT analyzes signals into rectangular waveforms rather than sinusoidal ones and is computed more rapidly than, for example, the FFT [9]; a WT-based algorithm contains additions and subtractions only and, as a result, considerably simplifies the hardware implementation of RP evaluation; (b) the requirement of the IEEE/IEC definition of a phase shift of π/2 between the voltage and current signals, typical for reactive power evaluation [3], is eliminated from the signal processing operation.
The paper is organized as follows. Section two describes the derivation of the WF-based analogue and digital signal processing algorithms for RP evaluation. The realization of the proposed algorithm, as well as the frequency-insensitive electronic RP meter, is presented in section three. The simulation results of the WF-based algorithms are given in section four. Section five concludes the paper.
II. ANALOG AND DIGITAL SIGNAL PROCESSING ALGORITHMS FOR RP MEASUREMENT

A. Analogue Measurement Algorithm
Assume that in a single-phase circuit the source voltage u(t) and the current i(t) flowing through the load are pure sinusoidal signals. Then the instantaneous power p(t) is given by [5],[8]

p(t) = P - [P\cos 2\omega t + Q\sin 2\omega t] ,    (1)

where P = UI\cos\varphi, Q = UI\sin\varphi, φ is the phase angle between the voltage u(t) and current i(t) waveforms, ω = 2π/T, and T is the period of u(t). A timing diagram of the right-side terms of (1) is shown in Fig. 1, and Fig. 2 shows the third-order WF Wal(3, t) with the normalized period T [9].

Fig.1. Graphical interpretation of the power components defined by (1): ω = 2π/T, T = 1/f, f = 50 Hz, u(t) = 2.4 sin ωt, i(t) = 0.5 sin(ωt − 0.5).

Fig.2. Graphical interpretation of the third-order Walsh function.

Multiplying both sides of (1) by the third-order WF Wal(3, t) gives

p(t)\,Wal(3,t) = P\,Wal(3,t) - P\cos 2\omega t\,Wal(3,t) - Q\sin 2\omega t\,Wal(3,t) .    (2)

A timing diagram of the right-side terms of (2) is shown in Fig. 3. It is important to note that the multiplication of (1) by Wal(3, t) results in the full-wave rectification of the oscillating reactive component of the instantaneous power p(t) (Fig. 3, curve 1).

Fig.3. Graphical representation of the right-side terms of (2).

As the next step, we integrate both sides of (2) over the period T:

\frac{1}{T}\int_0^T p(t)\,Wal(3,t)\,dt = \frac{1}{T}\int_0^T P\,Wal(3,t)\,dt - \frac{1}{T}\int_0^T P\cos 2\omega t\,Wal(3,t)\,dt - \frac{1}{T}\int_0^T Q\sin 2\omega t\,Wal(3,t)\,dt .    (3)

As can be seen from Fig. 3 (curves 2 and 3), the average values of P·Wal(3, t) and of P cos 2ωt·Wal(3, t) over the period T are equal to zero. Thus, the first and second integrals on the right side of (3) vanish, and (3) is rewritten as follows:
\frac{1}{T}\int_0^T p(t)\,Wal(3,t)\,dt = -\frac{1}{T}\int_0^T Q\sin 2\omega t\,Wal(3,t)\,dt .    (4)

By close inspection of Fig. 3 (curve 1), this equation can be rewritten as

\frac{1}{T}\int_0^T p(t)\,Wal(3,t)\,dt = -\frac{1}{T}\int_0^T Q\,|\sin 2\omega t|\,dt .    (5)

Solving (5) for Q results in the newly proposed algorithm for measuring RP:

Q = -\frac{\pi}{2T}\int_0^T p(t)\,Wal(3,t)\,dt .    (6)

A careful look at curve 1 (Fig. 3) and at (6) allows us to conclude that the average value of the oscillating reactive power can be measured with the derived algorithm without a phase shift of π/2 between the voltage and current signals. The efficiency of this algorithm is verified in section four.

B. Digital Measurement Algorithm
To derive the digital algorithm for RP evaluation, (1) can be rewritten in discrete form as

p(n) = P - \left[P\cos\!\left(\frac{4\pi}{N}n\right) + Q\sin\!\left(\frac{4\pi}{N}n\right)\right] ,    (7)

where n = 0, 1, 2, ..., N−1 and N, a power of two, is the number of samples. N is determined in accordance with the sampling theorem [9]: N = T/T_s, where T_s is the sampling period. In order to derive the digital algorithm for RP evaluation, we need the discrete expression of the WF given by [13],[14]:

Wal(i, \beta_k) = (-1)^{\sum_{k=1}^{m}(\omega_{m-k+1}\oplus\omega_{m-k})\beta_k} ,    (8)

where i is the order of the WF in the WF system, i = 0, 1, 2, ..., N−1; β_k is the argument of the WF and defines the bit (digit) coefficients of β represented in binary code, β = (β_1 β_2 ... β_k)_2, β_k = 0, 1; ω_m is the bit (digit) coefficient of ω represented in binary code, ω = (ω_0 ω_1 ω_2 ... ω_m)_2, ω_m = 0, 1; and m is the binary representation of the highest-order WF serial number in the WF system. For the evaluation of the reactive power component we use the third-order WF Wal(3, β_k). For the third-order Walsh function ω = 3; since 3 = (0000011)_2, only ω_6 = 1 and ω_5 = 1, and the remaining bit coefficients ω_m, m = 1, 2, 3, 4, are equal to zero. Consequently, the third-order WF is given by

Wal(3, \beta_2) = (-1)^{(\omega_5\oplus\omega_4)\beta_2} = (-1)^{(1\oplus 0)\beta_2} = (-1)^{\beta_2} .    (9)

The argument β_k changes with the normalized time T = 0.02 s as shown in Fig. 4; Fig. 5 depicts β_2 and Wal(3, β_k).

Fig.4. Time representation of the final three bit coefficients of β_k.

Fig.5. Time representation of β_2 and the third-order discrete WF, Wal(3, t).

Multiplying both sides of (7) by (9) and summing over n = 0, 1, 2, ..., N−1 results in

\frac{1}{N}\sum_{n=0}^{N-1} p(n)(-1)^{\beta_2} = \frac{1}{N}\sum_{n=0}^{N-1} P(-1)^{\beta_2} - \frac{1}{N}\sum_{n=0}^{N-1} P\cos(4\pi n/N)(-1)^{\beta_2} - \frac{1}{N}\sum_{n=0}^{N-1} Q\sin(4\pi n/N)(-1)^{\beta_2} .    (10)

Because (-1)^{\beta_2} is periodic and orthogonal to \cos(4\pi n/N) over n = 0, 1, 2, ..., N−1, the first and second terms on the right side of (10) become zero. Thereby (10) is given by

\frac{1}{N}\sum_{n=0}^{N-1} p(n)(-1)^{\beta_2} = -\frac{1}{N}\sum_{n=0}^{N-1} Q\sin(4\pi n/N)(-1)^{\beta_2} .    (11)

As can be seen from Fig. 5, the third-order discrete WF with the normalized period T takes the discrete values

(-1)^{\beta_2} = \begin{cases} +1, & \text{on the intervals } [0, N/4) \text{ and } [N/2, 3N/4), \\ -1, & \text{on the intervals } [N/4, N/2) \text{ and } [3N/4, N-1]. \end{cases}

Hence (-1)^{\beta_2} is in phase with Q\sin(4\pi n/N). Considering this fact, (11) is rewritten as

\frac{1}{N}\sum_{n=0}^{N-1} p(n)(-1)^{\beta_2} = -\frac{1}{N}\sum_{n=0}^{N-1} Q\,|\sin(4\pi n/N)| .    (12)

Similarly to (5), solving (12) for Q results in the new digital algorithm for measuring RP:

Q = -\frac{\pi}{2N}\sum_{n=0}^{N-1} p(n)(-1)^{\beta_2} .    (13)
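As a quick sanity check of the discrete algorithm, the sketch below generates one period of p(n) from (7), applies the (−1)^β₂ sign pattern and evaluates (13). The amplitudes, phase and N are illustrative choices, not the authors' test case, and the sign convention follows (1) with Q = UI sin φ:

```python
import numpy as np

N = 4096                                   # samples per period, a power of two
U, I, phi = 4/np.sqrt(2), 2/np.sqrt(2), np.deg2rad(35.0)
P, Q = U*I*np.cos(phi), U*I*np.sin(phi)

n = np.arange(N)
p = P - (P*np.cos(4*np.pi*n/N) + Q*np.sin(4*np.pi*n/N))       # Eq. (7)

# (-1)**beta_2: +1 on [0, N/4) and [N/2, 3N/4), -1 elsewhere (third-order WF)
wal3 = np.where(n % (N//2) < N//4, 1.0, -1.0)

Q_est = -(np.pi/(2*N)) * np.sum(p * wal3)                     # Eq. (13)
print(round(Q, 4), round(Q_est, 4))   # the estimate converges to Q as N grows
```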
The realization of this algorithm with electronic components and its verification are described in the next section.
III. REALIZATION OF THE DEVELOPED RP MEASUREMENT ALGORITHMS
This section describes the WF-based electronic RP meter; its block diagram is shown in Fig. 6. The voltage and current signals are fed to the inputs of the analog multiplier (AD633), which produces a time-continuous output (Fig. 7a) proportional to the instantaneous power p(t) defined by (1). The output of the AD633 is fed to the analog-to-digital converter (ADC0804), controlled by the control logic (CL). To achieve frequency-insensitive measurement, the digital sampler (DS) generates sampling signals (Fig. 7b) derived from the period of the input voltage signal u(t). The ADC0804 converts the input signal p(t) to the digital output p(n); the digital samples p(n) given by (7) are shown in Fig. 7a. The output of the ADC0804 is fed to the inputs of the up-down counter (UDC) through a multiplexer. The multiplexer connects the output of the ADC0804 to either the up input or the down input of the UDC in accordance with the discrete WF-based RP measurement algorithm given by (13). Thus, the first and third quarter parts of p(n), i.e. n = 1, 2, 3, 4, 5, 6, 7, 8, 17, 18, 19, 20, 21, 22, 23, 24 (see Fig. 5 and Fig. 8a), are entered at the "UP" input, and the second and fourth quarter parts of p(n), i.e. n = 9, 10, 11, 12, 13, 14, 15, 16, 25, 26, 27, 28, 29, 30, 31, 32 (see Fig. 5 and Fig. 8b), are entered at the "DOWN" input of the UDC.
Fig.6. Block diagram of the electronic power meter.
Fig.7. Output waveforms: a) analog multiplier and ADC; b) input u(t) and DS output signals.
Fig.8. Sampled power signal p(n): a) first and third quarter parts; b) second and fourth quarter parts.
In accordance with (13), the number remaining in the UDC at the end of the second period of the input signal is thus equal to the RP of the investigated circuit. The display indicates the binary output of the UDC. An important component of the electronic RP meter, which provides the independence of the measurement results from input signal frequency variation, is the DS (Fig. 9). The DS produces sampling signals whose frequency is correlated with the input signal frequency. A zero-crossing detector (Fig. 9) produces pulses with a cycle period proportional to the period T of the input voltage signal u(t). During the first, so-called preparation period T, the CL enables the output pulses of the clock to pass through the binary ripple counter (BRC) only to the input of the binary storage counter (BSC). The counter capacity N of the BRC is defined in accordance with the Shannon criterion for the sampling rate of the instantaneous power signal p(t): N = 2^m, where m is the number of bits of the BRC. The number of pulses stored in the BSC by the end of the preparation interval T is given by

M = \frac{f_c T}{N} ,    (14)

where f_c is the clock frequency. Starting at the beginning of the second period T, the CL enables the output pulses of the clock to pass only to the input of the binary counter (BC). The code comparator (CC) compares the parallel outputs of the BSC with the BC outputs. When the number of clock pulses counted by the BC becomes equal to M, the CC produces an output pulse; this pulse resets the BC to zero and becomes the first output sampling pulse of the DS. The clock pulses continue to enter the input of the BC over the second full period T (Fig. 7b). When the number of pulses counted by the BC again becomes equal to M, the CC produces the second output pulse of the DS, resetting the BC to zero, and so forth. The time interval T_s between two neighboring output pulses of the DS is given by

T_s = \frac{M}{f_c} ,    (15)

where T_s is the sampling interval (see Fig. 7b) and f_s = 1/T_s is the sampling frequency. Substituting (14) into (15) gives the expression relating the sampling interval T_s to the input signal period T:

T_s = \frac{T}{N} .    (16)

So the repetition interval T_s of the DS output pulses is N times smaller than T. This relationship can also be written in terms of the input and output frequencies as

f_s = N \cdot f ,    (17)

where f = 1/T is the input signal frequency. Thus, the sampling period T_s becomes a function of the period T of the input signal p(t) being sampled. A careful look at the derived expressions (16) and (17) shows that the designed electronic power meter meets the requirement of coherent data acquisition [15] and is thereby a good solution for preventing energy leakage between spectral components [9] when the input signal frequency f varies because of power distribution system instability.

Fig.9. Digital sampler.

IV. SIMULATION RESULTS

The simulation circuit was developed with the "Matlab 6.5" simulation tools. During the experimental study, the input voltage u(t) and current i(t) signals were taken as u(t) = 4 sin(314t) and i(t) = 2 sin(314t − φ), where φ is the phase angle between the voltage u(t) and current i(t) signals shown in Fig. 10. The phase angle φ was varied in the interval φ = 0–90°. The time representation of Wal(3, t), p(t) and the product signal p(t)·Wal(3, t) is shown in Fig. 10. The product signal p(t)·Wal(3, t) is integrated and converted to digital form using the integrated ADC.

Fig.10. Timing diagram of the signals related to the simulation circuit.
The results of the experimental verification are presented in Table I. Fig. 11 shows the error that appears as the phase angle φ changes in the interval from 0 to 90°. Examination of Table I and Fig. 12 proves the validity of the proposed algorithm (6) and its simple hardware realization using commercially available electronic elements. The agreement between the calculated results and the simulation circuit outputs shows that, in contrast to known existing methods for RP evaluation, the proposed method does not require a phase shift of π/2 between the voltage and current signals. The phase-shift operation requires corresponding hardware [3], which may introduce additional measurement error.

TABLE I
SIMULATION RESULTS OF THE RP MEASUREMENT ALGORITHMS

φ, °   Results of calculating    Outputs of the simulation circuit   Percentage error
       P, W       Q, VAR         P, W        Q, VAR                  δ, %
0      4.000      0.000          4.002       -0.0127                 100
5      3.985      -0.348         3.987       -0.3613                 3.558
15     3.864      -1.035         3.866       -1.047                  1.169
25     3.626      -1.690         3.627       -1.702                  0.724
35     3.277      -2.293         3.278       -2.305                  0.508
45     2.830      -2.827         2.83        -2.837                  0.342
55     2.296      -3.275         2.296       -3.284                  0.259
65     1.693      -3.624         1.693       -3.629                  0.131
75     1.038      -3.863         1.036       -3.867                  0.103
85     0.352      -3.985         0.3469      -3.986                  0.037
90     0.003      -4.000         0.0008      -4.0000                 0.000
Fig.11. Relation between the measurement error ΔQ and the phase-shift angle ϕ.

V. CONCLUSION
The WT based algorithms proposed in this paper for the measurement of reactive power reduce the volume of computing operations in comparison with algorithms based on the decomposition of signals into harmonics (trigonometric components). The multiplication of the sampled values of the power signal by the third-order WF reduces to a sign alteration of the sampled values from +1 to -1 during the even quarters of the input signal period.
The requirement of the IEEE/IEC definition for a phase shift of π/2 between the voltage and the current signals, typical for reactive power measurement under sinusoidal conditions, is eliminated from the signal processing operation. As a result, an increased efficiency of the computational operations and of the hardware implementation is achieved. It is shown that immunity of the measurement results to input frequency variation can be obtained if the sampling interval of the instantaneous power is derived from the input power frequency. The designed electronic RP meter meets the requirements of coherent data acquisition and is thereby a good solution for preventing energy leakage between spectral components when the input signal frequency f varies because of power distribution system instability. This solution can find wide application where immunity to input signal frequency variation is of prime concern.

REFERENCES
[1] Fairney, W., "Reactive power - real or imaginary?", Power Engineering Journal, Volume 8, Issue 2, April 1994, pp. 69-75.
[2] W.-K. Yoon and M.J. Devaney, "Reactive Power Measurement Using the Wavelet Transform", IEEE Trans. Instrum. Meas., vol. 49, pp. 246-252, April 2000.
[3] Branislav Djokic, Eddy So, and Petar Bosnjakovic, "A High Performance Frequency Insensitive Quadrature Phase Shifter and Its Application in Reactive Power Measurements", IEEE Trans. Instrum. Meas., vol. 49, pp. 161-165, February 2000.
[4] S.L. Toral, J.M. Quero and L.G. Franquelo, "Reactive power and energy measurement in the frequency domain using random pulse arithmetic", IEE Proc.-Sci. Meas. Technol., vol. 148, No. 2, pp. 63-67, March 2001.
[5] Makram, E.B., Haines, R.B., Girgis, A.A., "Effect of harmonic distortion in reactive power measurement", IEEE Trans. Industry Applications, Volume 28, Issue 4, July-Aug. 1992, pp. 782-787.
[6] T.W.S. Chow and Y.F. Yam, "Measurement and evaluation of instantaneous reactive power using neural networks", IEEE Transactions on Power Delivery, Vol. 9, No. 3, July 1994, pp. 1253-1260.
[7] A. Ozdemir and A. Ferikoglu, "Low cost mixed-signal microcontroller based power measurement technique", IEE Proc.-Sci. Meas. Technol., Vol. 151, No. 4, July 2004, pp. 253-258.
[8] Soliman S.A., Alammari R.A., El-Hawary M.E., Mostafa M.A., "Effects of harmonic distortion on the active and reactive power measurements in the time domain: a single phase system", Power Tech Proceedings, 2001 IEEE Porto, Volume 1, 10-13 Sept. 2001, 6 pp.
[9] Emmanuel C. Ifeachor, Barrie W. Jervis, Digital Signal Processing: A Practical Approach, Second Edition, Prentice Hall, 2002.
[10] Brandolini A., Gandelli A., Veroni F., "Energy meter testing based on Walsh transform algorithms", Instrumentation and Measurement Technology Conference, 1994, Conference Proceedings, 10-12 May 1994, pp. 1317-1320, vol. 3.
[11] Bai Liyun, Wen Biyang, Shen Wei, and Wan Xianrong, "Sample rate conversion using Walsh-transform for radar receiver", Microwave Conference Proceedings, 2005, APMC 2005, Asia-Pacific Conference Proceedings, Volume 1, 4-7 Dec. 2005, 4 pp.
[12] M. Kezunovic, E. Soljanin, B. Perunicic, S. Levi, "New approach to the design of digital algorithms for electric power measurements", IEEE Trans. on Power Delivery, vol. 6, No. 2, pp. 516-523, Apr. 1991.
[13] Gonorovsky N.S., Radio Engineering Circuits and Signals: Studies, M.: Sov. Radio, 1977, 608 p.
[14] Trachtman A.M., Trachtman V.A., Fundamentals of the Discrete Signals' Theory on the Finite Intervals, M.: Sov. Radio, 1975, 208 p.
[15] P. Carbone and D. Petri, "Average power estimation under nonsinusoidal conditions", IEEE Trans. Instrum. Meas., vol. 49, No. 2, pp. 333-336, Apr. 2000.
Non-Linear Control Applied to an Electrochemical Process to Remove Cr(VI) from Plating Wastewater Regalado-Méndez, A.(a), Tello-Delgado, D. (a), García-Arriaga H. O. (b) Universidad del Mar-Puerto Ángel, (a) Instituto de Industrias, (b) Instituto de Ecología Ciudad Universitaria S/N, Km. 1.5 Carretera Puerto Ángel – Zipolite Puerto Ángel, 70902, Oaxaca, México
Abstract- In this work the process to reduce hexavalent Chromium (Cr(VI)) in an electrochemical continuous stirred tank reactor is optimized. Chromium (VI) is toxic and potentially carcinogenic to living beings. It is produced by many industries and commonly discharged without any pretreatment. The electrochemical reactor by itself cannot satisfy the Mexican environmental legislation, which establishes a maximum discharge limit for Chromium of 0.5 mg L-1. In order to comply with this restriction, a proportional non-linear control law is implemented. Numerical simulations show that the proposed control law has an acceptable performance, because the setpoint concentration is reached regardless of the inlet concentration. It is also observed that the stabilization time of the reactor in closed loop is about 2.6 times shorter than in open loop.
Keywords: Electrochemical reactor, Chromium(VI) reduction, control gain.
I. INTRODUCTION
One challenge of modern chemical engineering is to satisfy the requirements of the international industrial sector regarding the development of processes which reach standards not only in production, health and safety, but also in quality and environmental protection. In the last decades the Mexican industry has experienced a vigorous transformation and growth which, together with the strengthening of environmental policies and the need to increase market competitiveness, has led many industrialists to invest in research and in the implementation of technology in the productive process [1]. The metallurgical industry has been one of the main investors, not only in the process itself but also in waste handling [2, 3]. A common waste generated in that sector is hexavalent Chromium (Cr(VI)), which is toxic to the environment and potentially carcinogenic to living beings [4-5]. The Mexican environmental legislation, specifically the norm NOM-001-SEMARNAT-1996,
establishes a maximum permissible discharge limit of 0.5 mg L-1 [6]. The traditional treatment method used to reduce this pollutant to its trivalent form consists of the addition of ferrous sulfate (FeSO4) directly to the residual matrix. Once the reaction has occurred, the addition of sodium hydroxide (NaOH) is necessary to precipitate the reduced Chromium [7]. The main drawback of this technique is the large amount of sludge generated. According to the Mexican norm NOM-052-SEMARNAT-1996 [8], this residue has hazardous characteristics and therefore requires special handling. Other techniques used to treat Cr(VI) are bisulfite reduction, evaporation, ion exchange and electrochemical reduction [9]. Electrochemical reduction is an alternative process that has been widely studied and applied with success and represents a viable option for Chromium treatment [11]. In 2003, Rodríguez et al. [10] published the design parameters of an electrochemical continuous stirred reactor that reduces Cr(VI) from industrial plating wastewater. This reactor has many advantages: sludge generation is minimized, so treatment and final disposal costs are reduced; the treatment time is short; and it is versatile and has a low operational cost. Nevertheless, its main disadvantage is that it works with low inlet flows and concentrations. When these variables increase, the global efficiency decreases and the outlet Chromium concentration of 0.5 mg L-1 is not reached. In this work, a control law for this reactor is proposed, which consists of a feedback proportional non-linear controller that, through the manipulation of variables such as the dilution rate and the composition, keeps the outlet concentration close to the design specifications and objectives, that is, in the neighborhood of the norm concentration value.
Feedback control reduces the effect of disturbances on the system, keeping the controlled variable close to the setpoint by using a measuring device that reads the variable and compares it with the reference value until the difference is zero or close to it. Afterwards, the final control element, usually a valve, acts on the process to correct the deviation [12-14]. Non-linear proportional controllers help to make better use of the design parameters, give the system more stability, make the process more sensitive and robust against disturbances and setpoint changes, reduce the residence time of the reactor (that is, the time that the reactor needs to reach the steady state), and reduce operational problems such as oversaturation and instability [12, 15-18]. In the following section, the dynamic equations that represent the electrochemical reactor behavior and the design of the control law are presented.
The block diagram of the control law is shown in Fig. 1.
Fig. 1. Scheme of the control law implemented.
The control law starts from the general equation of a proportional controller [19, 20]:

θ = θ̄ + kp (Cs − C(t))    (3)
II. SYSTEM DESCRIPTION
A. Kinetic model
The dynamic model that represents the behavior of the electrochemical reactor developed by Rodríguez et al. [11] is the following:

dC(t)/dt = θ (C0 − C(t)) − rCr(VI)    (1)

where

rCr(VI) = k1 C(t) / (1 + K2 C(t))    (2)

The variables are described in Table 1.

TABLE 1
VARIABLE DESCRIPTION
Variable   Description
C          Concentration of Chromium (VI) in solution (mg L-1), the controlled variable
C0         Inlet concentration of Chromium (VI) in solution (mg L-1)
rCr(VI)    Chromium (VI) reduction rate (mg L-1 min-1)
k1         Rate constant (0.78 min-1)
K2         Absorption rate constant (0.155 min-1)
V          Volume of reactor (L)
F          Volumetric flow rate (m3 min-1)
θ          Inverse of residence time or dilution rate = F V-1 (0.052 min-1)
t          Time (min)
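A minimal sketch of how equations (1)-(2) can be integrated numerically with the Table 1 values is given below. The inlet concentrations mirror those used later in Fig. 2; the initial condition C(0) = C0 and the 80-minute horizon are assumptions of this sketch, not values stated in the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Open-loop simulation of the reactor model, eqs. (1)-(2), with Table 1 values.
k1, K2, theta = 0.78, 0.155, 0.052   # rate constant, absorption constant, dilution rate

def reactor(t, C, C0):
    r = k1 * C[0] / (1.0 + K2 * C[0])          # eq. (2): Cr(VI) reduction rate
    return [theta * (C0 - C[0]) - r]           # eq. (1)

for C0 in (60.0, 130.0, 220.0, 300.0):         # mg Cr(VI) / L, as in Fig. 2
    sol = solve_ivp(reactor, (0.0, 80.0), [C0], args=(C0,), max_step=0.1)
    print(f"C0 = {C0:5.1f} mg/L -> C(80 min) = {sol.y[0, -1]:6.2f} mg/L")
```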
III. CONTROL LAW DESIGN
The control law is designed to give the reactor robustness against disturbances and to keep the outlet variable in the neighborhood of the setpoint, which is the norm concentration value, regardless of the inlet concentration. The non-linearity is implicit in the dilution rate, θ, which is a function of the concentration.
The sign function is related to the proportional control gain, kp, according to the following equations:

kp = k1  if sgn(e) ≠ sgn(ė)
kp = k0  if sgn(e) = sgn(ė)    (4)

sgn(e) = e / |e|    (5)

The sign functions of the error and of its derivative take values of -1 or +1, with the conditions that t > 0 and e ≠ 0. The control gain values k0 and k1 are calculated according to the relations proposed by Regalado and Álvarez-Ramirez [20]:

k0 = (1/α − 1) kpr    (6)
k1 = β k0    (7)
The kpr value, which corresponds to the process gain, is calculated using the Internal Model Control (IMC) technique. This technique requires calculating the slope of the tangent line of the relation between the dilution rate and the concentration:

kpr = dθ/dC ≅ Δθ/ΔC    (8)
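As a small illustration of (8), the slope can be approximated from two neighbouring operating points on the θ-C relation. The two points used below are hypothetical placeholders; the authors report kpr = 0.1834 from their own bifurcation map (Fig. 3).

```python
# Finite-difference approximation of the process gain in eq. (8).
# The two (C, theta) operating points are hypothetical illustrations only.
def process_gain(c_a, theta_a, c_b, theta_b):
    """kpr ~ (theta_b - theta_a) / (c_b - c_a): slope of theta versus C."""
    return (theta_b - theta_a) / (c_b - c_a)

print(process_gain(0.60, 0.010, 0.70, 0.028))   # ~0.18 (cf. the reported 0.1834)
```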
Variables used in those equations are explained in table 2.
TABLE 2
VARIABLE DESCRIPTION
Variable   Description
θ̄          Optimum dilution rate (min-1)
Cs         Setpoint concentration (mg L-1)
kp         Proportional control gain (L mg-1 min-1)
kn         Control gain (L mg-1 min-1)
α          Controller acceleration constant
β          Controller acceleration constant
kpr        Process gain (min-1)

In the following section, the numerical simulation of the reactor dynamic model in open loop and the implementation of the designed control algorithm are shown.

IV. RESULTS AND DISCUSSION

A. Reactor kinetic equations modeling
Fig. 2 shows the open-loop numerical simulation of the dynamic model represented by equations (1) and (2). The simulations were carried out using different inlet concentrations of Cr(VI). As can be seen, when this variable increases, the time to reach the steady state increases too and the reduction efficiency decreases.

Fig. 2. Performance of the dynamic model proposed to describe Cr(VI) removal in the electrochemical continuous stirred reactor with different inlet concentrations (60, 130, 220 and 300 mg/L).

Fig. 3 shows the bifurcation map of the process, which is a very useful tool to observe the variation of the Cr(VI) outlet concentration in relation to the dilution rate (θ). This figure shows that the dilution rate should be near zero to bring the Cr(VI) concentration down to the level established by the Mexican legislation. This means that the reactor has a batch behavior, which would significantly reduce the benefits of continuous operation at industrial scale. To modify this behavior and operate the reactor continuously, a control strategy must be implemented.

Fig. 3. Bifurcation map of Cr(VI).

B. Control law implementation
A proportional non-linear control algorithm acting on the dilution rate, θ, is implemented. The value of the process gain, kpr, was calculated with equation (8) using the bifurcation map, which was also needed to calculate the optimum dilution rate, θ̄. Using numerical simulations and keeping the previous parameters fixed, the control law is implemented in the electrochemical reactor dynamic equation. Different values of the control acceleration constants, α and β, were used in order to find which of them provides the controller with the best performance.
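To illustrate how the control law (3)-(7) can be wrapped around the reactor model (1)-(2), here is a rough closed-loop simulation sketch. The gains follow (6)-(7) with the reported values kpr = 0.1834, α = β = 0.1 and θ̄ = 0.0025, and the setpoint is the 0.5 mg L-1 norm value. The error definition e = Cs − C, the use of a discrete difference for sgn(ė), the saturation of θ at zero and the explicit-Euler integration are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

# Hedged closed-loop sketch: proportional non-linear control law, eqs. (3)-(7),
# acting on the reactor model, eqs. (1)-(2). See the lead-in for assumptions.
k1_rate, K2 = 0.78, 0.155                       # kinetic constants (Table 1)
Cs, kpr, alpha, beta, theta_opt = 0.5, 0.1834, 0.1, 0.1, 0.0025

k0 = (1.0 / alpha - 1.0) * kpr                  # eq. (6)
k1_gain = beta * k0                             # eq. (7)

def closed_loop(C0=130.0, t_end=100.0, dt=0.002):
    C, t, e_prev = C0, 0.0, Cs - C0
    while t < t_end:
        e = Cs - C
        # eq. (4): switch the gain according to the signs of e and its change
        kp = k1_gain if np.sign(e) != np.sign(e - e_prev) else k0
        theta = max(theta_opt + kp * e, 0.0)    # eq. (3), saturated at zero (cf. Fig. 7)
        dC = theta * (C0 - C) - k1_rate * C / (1.0 + K2 * C)   # eqs. (1)-(2)
        C, e_prev, t = C + dt * dC, e, t + dt
    return C, theta

C_final, theta_final = closed_loop()
print(f"C(100 min) = {C_final:.2f} mg/L, theta = {theta_final:.4f} 1/min")
```

With these values the outlet concentration settles near the 0.5 mg L-1 setpoint after roughly half an hour, in line with the closed-loop behavior reported in Fig. 5.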
Fig. 4(a) shows that, using a constant value of β = 0.04 and values in the interval 0.1 ≤ α ≤ 1.0, different control performances are obtained. In this interval the acceleration is only noticeable before the steady state is reached, where a variation on the order of tenths of a minute is observed, with the best performance for the smaller values of α. For this reason, the value α = 0.1 is chosen as the optimum acceleration parameter. In Fig. 4(b), α is kept constant (α = 0.1) and β is varied in the interval 0.001 ≤ β ≤ 0.1. The curves show a behavior similar to the previous figure; however, for this acceleration parameter the performance changes on the order of minutes. The optimum value is β = 0.1, the same value as that of α.
Fig. 4. Performance of the control model using different α and β values, maintaining one of them constant: (a) β = 0.04 fixed and α = 0.1, 0.3, 0.5, 0.7; (b) α = 0.1 fixed and β = 0.1, 0.01, 0.001 (kpr = 0.1834, Cr(VI)0 = 130 mg/L).

Optimum values of all the control variables implemented in the control law are presented in Table 3.

TABLE 3
OPTIMUM CONTROL VARIABLE VALUES
Variable   Value
θ̄          0.0025
kp         0.1834
α          0.1000
β          0.1000

Fig. 5 presents a comparison between the simulations in open and closed loop for an initial inlet concentration of 130 mg Cr(VI) L-1. It is observed in these simulations that the time needed to reach the steady state in open loop is around 90 minutes, while the time in closed loop is approximately 35 minutes, which is nearly 2.6 times less than the open-loop interval. It is important to mention that the setpoint concentration is never reached in open loop; on the other hand, when the control law is implemented, the outlet concentration is equal to the desired value. The closed-loop curve approaches the required setpoint smoothly, without overshoot or oscillation.

Fig. 5. Model performance in open and closed loop (α = 0.1, β = 0.1, kpr = 0.1834, θ̄ = 0.0025, Cr(VI)0 = 130 mg/L).

In Fig. 6 it is possible to note that the estimated values of the reaction rate converge to the real values of the kinetic term. In Figs. 5 and 6 it is observed that the steady state is reached at the same time.

Fig. 6. Performance of the reaction rate when the control law is applied.
In Fig. 7 a saturation of the dilution rate (θ) is observed, for an initial inlet concentration of 130 mg Cr(VI) L-1, in the interval 0 < t < 33. In this period the controller decreases the dilution rate to zero to enable the Chromium content inside the reactor to be electrochemically reduced; later on, the dilution rate reaches a constant value of 0.4 mg min-1. This does not modify the setpoint concentration. The reactor always works in continuous mode; when the system saturates, the feedback module supplies the inlet concentration. The zoom in the figure shows how the dilution rate is increased.

Fig. 7. Performance of the non-linear model (α = 0.1, β = 0.1, kpr = 0.1834, θ̄ = 0.0025, Cr(VI)0 = 130 mg/L).

Fig. 8 shows numerical simulations using different inlet concentration values. It is possible to note that for the different inlet Chromium concentration values (60 to 300 mg Cr(VI)/L), the controller maintains the outlet concentration in the neighborhood of the setpoint. The residence time in the reactor is a function of the inlet concentration, and even at high concentration values it is relatively short.

Fig. 8. Performance of the control law using different inlet concentrations (60, 95, 130, 170, 210 and 300 mg Cr(VI)/L).

V. CONCLUSIONS
The control law developed for this process shows a fine performance: it is robust to setpoint and inlet concentration changes, and the outlet concentration is always maintained in the neighborhood of the value required by the Mexican legislation, 0.5 mg Cr(VI) L-1. The reactor must have a certain period of time to reduce the Chromium content, during which the controller reduces the inlet dilution rate to zero; afterward the controller increases that value as a function of the inlet concentration. The stability is also adequate, because a correlation exists between the concentration profile and the dilution rate.
Apparently, α is not a very significant parameter, as its variations only affect the response on the order of tenths of a minute, whereas β seems a more significant parameter. Nevertheless, the reduction in the time needed to operate in steady state represents for the industry, even on the order of minutes, an important increase in net profit, because the consumption of energy, water, technical and natural resources diminishes.
The optimized model shows cost-effective benefits if implemented. Moreover, the studied process can comply with the Mexican legislation.
It would be particularly interesting to carry out the physical construction of the electrochemical reactor and to verify the validity of the modelled results by implementing the proposed control law.

VI. ACKNOWLEDGMENT
The authors wish to thank their colleagues Prof. Carlos Rugerio, Dr. Miguel Manzanares and Dr. A. Aizpuru, who checked the text.

VII. REFERENCES
[1] Bair, J., Gereffi, G., "Local Clusters in Global Chains: The Causes and Consequences of Export Dynamism in Torreon's Industry", World Development, vol. 29, no. 11, 2001, pp. 1885-1903.
[2] Volke-Sepulveda, T., Velasco-Trejo, J.A., Rosa-Perez, D.A., "Suelos Contaminados por Metales y Metaloides: Muestreo y Alternativas para su Remediación", INE-SEMARNAT, México, 2005, pp. 19-31.
[3] Carabias-Lillo, J., Provencio, E., Azuela, A., "Gestión Ambiental Hacia la Industria: Logros y Retos para el Desarrollo Sustentable 1995-2000", SEMARNAP-ENE-PROFEPA, México, 2000, pp. 2078.
[4] Martínez, S.A., Rodríguez, M.G., Aguilar, R., Soto, G., "Removal of Chromium Hexavalent from Rinsing Chromating Waters Electrochemical Reduction in a Laboratory Pilot Plant", Water Science and Technology, vol. 49, no. 1, 2004, pp. 115-122.
[5] National Institute for Occupational Safety and Health, "Occupational Exposure to Hexavalent Chromium (CrVI)", NIOSH Comments to OSHA, 29 CFR Part 1910 Docket No. H-0054a, November 20, 2002.
[6] Secretaría del Medio Ambiente y Recursos Naturales, NOM-001-SEMARNAT-1996, México, DF, México, 2007.
[7] Eckenfelder, W., "Industrial Water Pollution Control", McGraw-Hill, USA, 1989, pp. 87-123.
[8] Secretaría del Medio Ambiente y Recursos Naturales, NOM-052-SEMARNAT-1996, México, DF, México, 2007.
[9] Martínez, S.A., Rodríguez, "Dynamical Modeling of the Electrochemical Process to Remove Cr(VI) from Wastewaters in a Tubular Reactor", J. Chem. Technol. Biotechnol., vol. 82, 2007, pp. 582-587.
[10] Rodríguez, M.G., Aguilar, R., Martínez, S.A., Soto, G., "Modeling an Electrochemical Process to Remove Cr(VI) from Rinse-Water in a Stirred Reactor", J. Chem. Technol. Biotechnol., vol. 78, 2003, pp. 371-376.
[11] Martínez, S.A., Rodríguez, M.G., Aguilar, R., Soto, G., "Removal of Chromium Hexavalent from Rinsing Chromating Waters Electrochemical Reduction in a Laboratory Pilot Plant", Water Science and Technology, vol. 49, no. 1, 2004, pp. 115-122.
[12] Stephanopoulos, G., "Chemical Process Control: An Introduction to Theory and Practice", PTR Prentice Hall, 1st Ed., USA, 1984, pp. 110, 113-126, 241, 297.
[13] Smith, C.A., Corripio, A.B., "Control Automático de Procesos, Teoría y Práctica", Limusa, México, 1994, pp. 17-25, 177-223, 419, 505.
[14] Ogata, K., "Introducción a los Sistemas de Control", Ingeniería de Control Moderna, Prentice Hall, 3rd Ed., México, 1997, pp. 1-11.
[15] Dullerud, G.E., Paganini, F.G., "A Course in Robust Control Theory: A Convex Approach", University of Illinois and University of California, pp. 1-15, 97-124.
[16] Stephanopoulos, G., "Control Systems with Multiple Loops", in Chemical Process Control, Prentice Hall, USA, 1984, pp. 125-148.
[17] Arjit-Sagale, A., Pushpavanam, S., "A Comparison of Control Strategies for a Nonlinear Reactor-Separator Network Sustaining an Autocatalytic Isothermal Reaction", Ind. Eng. Chem. Res., vol. 41, 2002, pp. 2005-2012.
[18] Alvarez-Ramirez, J., Morales, A., "Robust Stabilization of Jacketed Chemical Reactors by PID Controllers", Chem. Eng. Sci., vol. 56, 2001, pp. 2775-2787.
[19] Regalado-M., A., "Control Lineal de Composición en Reactores Continuos con Estructuras Básicas de Control", Tesis de Maestría en Ciencias en Ingeniería Química, Universidad Autónoma Metropolitana - Iztapalapa, 2003.
[20] Regalado M., A. y Álvarez-Ramirez, J., "Composition Linear Control in Stirred Tank Chemical Reactors", New Mathematics and Natural Computation, Vol. 3, No. 3, pp. 385-398.
Semantics for the Specification of Asynchronous Communicating Systems (SACS) A.V.S. Rajan, S. Bavan, G. Abeysinghe School of Computing Science Middlesex University The Borroughs, London NW4 4BT UK Abstract –The objective of the paper is to describe the formal definitions for the Specification of Asynchronous Communicating System (SACS). This is a process algebra which is a descendent of the synchronous variant of Calculus of Communicating Systems (CCS) known as Synchronous Calculus of Communicating Systems (SCCS). To this end, we present Structured Operational Semantics (SOS) for the constructs of SACS using Labelled Transition Systems (LTS) in order to describe the behaviour of the processes. Also, we discuss the semantic equivalences for SACS, especially bisimulation, which provides a method for verifying the behaviour of a specified process. Keywords: Calculus of Communicating Systems (CCS), Synchronous Calculus of Communicating Systems (SCCS), Structured Operational Semantics (SOS), Labelled Transition System (LTS), bisimulation.
I. INTRODUCTION Since the development of Calculus of Communicating Systems (CCS) [5] and other process algebras, many extensions to these process algebras have been proposed to model different aspects of concurrent processing [8]. The semantic equivalences based on bisimulation are defined for these process algebras whose behaviours are described by structured operational semantics and expressed as labelled transition systems. Two programs are considered semantically equivalent if they cannot be distinguished. Semantic equivalences are used to abstract away the internal structure of the programs that cannot be observed. They have also provided a successful method to verify program behaviours [4]. Verifying a program means to show that the program is behaviourally equal to its specification. The Specification of Asynchronous Communicating System (SACS) [1, 2], a point-to-point message passing system, is a process algebra. This is an asynchronous variant of Synchronous Calculus of Communicating Systems (SCCS) [4], which in turn is a synchronous variant of Calculus of Communicating Systems (CCS) [5]. SACS uses the same syntax as that of SCCS but its semantics are different and governed by four design rules. The main aim of SACS is to separate communication from computation so that these two activities can proceed independently. By applying restrictions to the manner in which the SCCS is used, SACS has been derived to specify an asynchronous message passing system that uses point-to-point communication.
This paper presents formal definitions for the syntax and semantics of SACS. Firstly, formal syntax for SACS has been defined. From the defined formal syntax, the Structured Operational Semantics (SOS) for SACS has been demonstrated. Labelled Transition Systems are used to describe the operational semantics of SACS. This paper also investigates the concept of equivalence. This discussion is focussed on bisimulation equivalence for SACS as it is a powerful technique to verify the behaviour of the processes. The paper is structured as follows: Section II describes the syntactic categories of SACS. Section III describes the SOS for SACS. Section IV discusses the bisimulation equivalence for SACS. Section V concludes the discussion. II. SYNTACTIC CATEGORIES OF SACS The basic elements of SACS are listed below and the major syntactic Categories for SACS are listed in Table 1. 1. (channel, port) names: a, b, c, …. where ports are the observable parts of an agent/process which support either the sending or the receiving of information and channels are individual paths through which data (signals) can flow. 2. co-names: input channels ::= a?, b?, c?... where “?” used to denote that the channel is waiting for input. output channels ::= a!, b!, c!,… where “!” is used to denote sending the output. Ports having the same names synchronise/interact. 3. silent action: τ ::= 1, ∂ where 1 denotes an idle event which introduces delay in the system for synchronising the send/receive pairs. When a delay has to be introduced in a process, an idle event is introduced with a suffix “:” followed by the rest of the process body. ∂ introduces a delay before a value is received or sent by the channel. ∂a is replaced by 1: ∂a + a (e.g. ∂a! = 1 : ∂a + a!). 4. prefix ::= τ | a? ( . | : | + ) b? ( . | : | + ) c? ( . | : | + ) … where a? ( . | : | + ) b? ( . | : | + ) c? ( . | : | + ) … is a guard where a, b, c are the input ports. 5. Process_agent/node ::= @ | GP1 + GP2 + GP3 + ... + GPk where @ denotes inaction – do nothing. This is a stop action in CCS.
GP1, GP2, GP3, …, GPk are the guarded processes in a Process_agent/node.
6. Guarded Process: GP1 ::= αP1
performs action α and then behaves like P1; it is the same as α → P in CSP and CCS, where α is (input_ch1_1 | 1)[( . | : | + ) input_ch1_n] and P is the process body followed by (output_ch1 | 1)[( . | : | + ) output_ch1_n] : (@ | Process_agent).
7. System ::= P x Q x R x …
where P, Q, R, … are Process_agents/nodes and x is the concurrency operator. It is the same as P|Q in CCS and P||Q in CSP, except that P x Q is asynchronous message passing.
8. Operators:
'x' - concurrency operator
'.' - simultaneous AND operator
':' - sequential AND operator
'+' - OR operator.

TABLE I
SYNTACTIC CATEGORIES OF SACS

As in the case of CCS and SCCS, the α and β of SACS are
i. Disjoint ( α ∩ β = φ ), and
ii. in Bijection via the complementation function (on κ, where α ⊆ κ and β ⊆ κ, such that a!? = a and a? ≠ a! for all a ∈ κ, a? ∈ α, and a! ∈ β).
The set α ∪ β is the visible set of actions. Let $ be the operator used to denote recursion; $P denotes recursion where P is the recursive process. That is, $P = P x $P = P x P x $P = … = P x P x … x $P, where P is reproduced as many times as needed and the number of reproductions is finite.

III. STRUCTURED OPERATIONAL SEMANTICS FOR SACS
Structured Operational Semantics is considered to be the standard way of giving formal descriptions to concurrent programs and systems, which include process algebras. In SOS, the behaviour of processes is modelled using transition relations called a Labelled Transition System (LTS). From the derivation of the SOS it is clear that these transition relations are created by inference rules that follow the syntactic structure of the processes.
Definition 1 (Labelled Transition System):
A Labelled Transition System (LTS) is a triplet {S, K, T} where:
• S is a set of states,
• K is a set of labels, K = { k | k ∈ K },
• T = { ⎯k→ , k ∈ K } is a transition relation, where ⎯k→ is a binary transition relation on S.
We will write s ⎯k→ s′ instead of (s, s′) ∈ ⎯k→.
An LTS is a set of inference rules used to specify the operational semantics of the calculus. It is defined using the syntactic structure of the term defining the processes and it describes the observational semantics [6].
The transitions of SACS are defined by the inference rules shown in Fig. 1.

Fig. 1. Rules for Generating the LTS of SACS
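As a concrete reading of Definition 1, the sketch below represents an LTS as a triplet (S, K, T) in Python and looks up the successors of a state under a label. The two-state example is a placeholder, not one of the SACS specifications from this paper.

```python
# Hedged sketch of Definition 1: an LTS as a triplet (S, K, T), where T maps a
# label k to a binary relation on S. The example states/labels are placeholders.
from typing import Dict, Set, Tuple

State, Label = str, str
LTS = Tuple[Set[State], Set[Label], Dict[Label, Set[Tuple[State, State]]]]

def successors(lts: LTS, s: State, k: Label) -> Set[State]:
    """All s' with s --k--> s' in the transition relation for label k."""
    _, _, T = lts
    return {dst for (src, dst) in T.get(k, set()) if src == s}

# Toy example: P --a?--> P1 --b!--> P
example: LTS = ({"P", "P1"}, {"a?", "b!"},
                {"a?": {("P", "P1")}, "b!": {("P1", "P")}})
print(successors(example, "P", "a?"))   # {'P1'}
```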
Example 1: Consider a vending machine that requires a user to insert a coin and press a button, after which the machine will make and serve a drink. This scenario can be further extended by allowing a user to order more than one drink. In this case, the user may repeat the above behaviour a number of times or, alternatively, order all the drinks (by continuing to insert coins and press buttons) before removing any served drinks, allowing them to queue up in order and then removing the drinks as they are delivered. This means that the vending machine, namely VENDING_MACHINE1, has six processes: Init (host), Customer, Machine_Interface1, Machine_Interface2, Coffee_Machine and Tea_Machine. These could be combined into a single process, but at the expense of losing vital information that is important for understanding the behaviour of the intended system. The Init process sends the trayEmpty signal to the Coffee_Machine and Tea_Machine and then terminates. A pictorial specification for this problem is depicted in Figure 2. Dotted lines show the optional actions, which can take place one at a time. Let CUST denote CUSTOMER, MAC_INT1 denote MACHINE_INTERFACE1, MAC_INT2 denote MACHINE_INTERFACE2, COF_MAC denote COFFEE_MACHINE and TEA_MAC denote TEA_MACHINE.
Fig. 2. Network Diagram for VENDING_MACHINE1

Fig. 3. SACS Specification for VENDING_MACHINE1
Fig. 3 shows the SACS specification and Fig 4, the Structured Operational Semantics generated from the rules for LTS for Vending Machine1.
Fig. 4. SOS for VENDING_ MACHINE1
IV. EQUIVALENCE RELATION FOR SACS
Two expressions are said to be equivalent when no observation can distinguish the differences between them and they both describe the same system.
Definition 2 (Equivalence Relation): An equivalence relation between two processes P, Q ∈ ℜ, written P ≡ Q, is a binary relation such that it is:
i. Reflexive: P ≡ P
ii. Symmetrical: if P ≡ R then R ≡ P
iii. Transitive: if P ≡ Q and Q ≡ S then P ≡ S.
Based on the structure of the transition relation, many different types of equivalences can be defined on processes. The most commonly studied process equivalences on LTS include Milner's simulation equivalence, bisimulation equivalence derived from simulation [7], and trace equivalence. This paper focuses on bisimulation equivalence as it is a suitable equivalence when reasoning about concurrent processes [3].

A. Bisimulation Equivalence
Definition 3 (Simulation): A binary relation B ⊆ ℜ x ℜ is a simulation whenever P B Q, if P ⎯α1→ P′ then there is Q ⎯α1→ Q′ such that P′ B Q′, where P, Q ∈ ℜ and α1 ∈ α.
Two processes are bisimulation equivalent if they have the same traces and the states that they reach are still equivalent. Any relation is an equivalence relation if it can be shown to be reflexive, symmetrical and transitive. In SACS, the bisimulation relation is an equivalence, as it is:
1. Reflexive: P B P
2. Symmetrical: P B Q ⇒ Q B P
3. Transitive: (P B Q) ∧ (Q B R) ⇒ P B R
where P, Q, R ∈ ℜ and B denotes the bisimulation relation.
An algorithm is needed to prove that two processes/systems in SACS are in bisimulation. In the following section we define the strong and weak bisimulations.

B. Strong Bisimulation Equivalence
Strong bisimulation checks whether two agents are equivalent for all their actions, both internal and external.
Definition 4 (Strong Bisimulation): A binary relation over the set of states of an LTS is a strong bisimulation equivalence relation whenever P ~ Q, for P, Q ∈ ℜ and α1 ∈ α:
- if P ⎯α1→ P′ then, for some Q′, Q ⎯α1→ Q′ such that P′ B Q′; and conversely,
- if Q ⎯α1→ Q′ then, for some P′, P ⎯α1→ P′ such that Q′ B P′.
'~' is used to refer to bisimulation: P ~ Q specifies that P and Q are strongly bisimilar.
Example 2: Consider a modified version of the VENDING_MACHINE1 of Section III, namely VENDING_MACHINE2, in which the two machine interfaces are replaced with one machine interface. This means that there are five processes: Init (host), Customer, Machine_Interface, Coffee_Machine and Tea_Machine. Let CUST be CUSTOMER, MAC_INT be MACHINE_INTERFACE, COF_MAC be COFFEE_MACHINE and TEA_MAC be TEA_MACHINE. Fig. 5 shows the network diagram for this version.

Fig. 5. Network Diagram for VENDING_MACHINE2

Let VENDING_MACHINE1 and VENDING_MACHINE2 be denoted as V1 and V2 respectively. The SOS using the LTS for V1 has been generated from its SACS specification and is shown in Section III. Let us define one possible SACS specification for V2 and derive its SOS specification. Fig. 6 and Fig. 7 show the SACS and SOS specifications for V2. Prove that V1 and V2 are strongly bisimilar.
Fig. 6. SACS Specification for VENDING_ MACHINE2
Fig. 7. SOS for VENDING_ MACHINE2
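Strong bisimilarity of finite LTSs such as V1 and V2 can also be checked mechanically. The sketch below computes the greatest strong bisimulation between two transition tables by repeatedly discarding pairs that violate Definition 4; the toy coin/serve systems are placeholders rather than the actual vending-machine transition sets.

```python
# Hedged sketch: naive greatest-fixed-point computation of strong bisimilarity
# (Definition 4) on finite labelled transition systems. The toy transition
# tables below are placeholders, not the vending machine LTSs of Example 2.
def strong_bisimulation(trans1, trans2):
    """trans: dict state -> set of (label, next_state). Returns the largest
    relation B in which related states can match each other's moves."""
    B = {(p, q) for p in trans1 for q in trans2}
    changed = True
    while changed:
        changed = False
        for (p, q) in set(B):
            ok = (all(any(a == b and (p2, q2) in B for (b, q2) in trans2[q])
                      for (a, p2) in trans1[p]) and
                  all(any(a == b and (p2, q2) in B for (b, p2) in trans1[p])
                      for (a, q2) in trans2[q]))
            if not ok:
                B.discard((p, q))
                changed = True
    return B

# Toy check: two coin/serve loops with the same observable behaviour.
V1 = {"s0": {("coin", "s1")}, "s1": {("serve", "s0")}}
V2 = {"t0": {("coin", "t1")}, "t1": {("serve", "t0")}}
print(("s0", "t0") in strong_bisimulation(V1, V2))   # True
```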
The bisimulation is presented using the LTS, as these are the most common structures upon which bisimulation is studied. Having defined their SOS specifications, the LTSs have to be defined in order to be able to compare the transitions/actions.
Let the LTS of V1 =def (S1, K1, T1), where
S1 =def {INIT, CUST, MAC_INT1, MAC_INT2, COF_MAC, TEA_MAC}
K1 =def {trayEmpty, coin, Deliver_coffee, Deliver_Tea, Coffee, Tea}
Transitions for V1, derived from the SOS of V1 shown in Fig. 4, are listed below:
T1 =def {t1_1, t1_2, t1_3, t1_4, t1_5, t1_6, t1_7, t1_8}
Let the LTS of V2 =def (S2, K2, T2), where
S2 =def {INIT, CUST, MAC_INT, COF_MAC, TEA_MAC}
K2 =def {trayEmpty, coin, Deliver_coffee, Deliver_Tea, Coffee, Tea}
Transitions for V2, derived from the SOS of V2 shown in Fig. 7, are listed below:
T2 =def {t2_1, t2_2, t2_3, t2_4, t2_5, t2_6, t2_7, t2_8}
Proof: To prove that V1 ~ V2 we must prove that they simulate each other. Starting from their initial states, we walk through their actions.
Checking (INIT of V1, INIT of V2): t1_1 in V1 ⇒ t2_1 in V2 and t2_1 in V2 ⇒ t1_1 in V1, so {(INIT of V1, INIT of V2)} ⊆ ~ if INIT′ of V1 ~ INIT′ of V2.
Checking (CUST of V1, CUST of V2), which has two sub-transitions CUST_AGT1 and CUST_AGT2: t1_2, t1_3 and t1_4 in V1 ⇒ t2_2, t2_3 and t2_4 in V2, and t2_2, t2_3 and t2_4 in V2 ⇒ t1_2, t1_3 and t1_4 in V1;
{(CUST_AGT1 of V1, CUST_AGT1 of V2)} ⊆ ~ if CUST_AGT1′ of V1 ~ CUST_AGT1′ of V2;
{(CUST_AGT2 of V1, CUST_AGT2 of V2)} ⊆ ~ if CUST_AGT2′ of V1 ~ CUST_AGT2′ of V2;
⇒ {(CUST of V1, CUST of V2)} ⊆ ~ if CUST1 of V1 ~ CUST1 of V2.
Therefore, {(INIT of V1, INIT of V2), (CUST of V1, CUST of V2)} ⊆ ~.
Checking (MAC_INT1 of V1, MAC_INT1 of V2): t1_5 in V1 ⇒ t2_5 in V2 and t2_5 in V2 ⇒ t1_5 in V1, so {(MAC_INT1 of V1, MAC_INT1 of V2)} ⊆ ~ if MAC_INT1′ of V1 ~ MAC_INT1′ of V2.
Checking (MAC_INT2 of V1, MAC_INT2 of V2): t1_6 in V1 ⇒ t2_6 in V2 and t2_6 in V2 ⇒ t1_6 in V1, so {(MAC_INT2 of V1, MAC_INT2 of V2)} ⊆ ~ if MAC_INT2′ of V1 ~ MAC_INT2′ of V2.
Therefore, {(INIT of V1, INIT of V2), (CUST of V1, CUST of V2), (MAC_INT1 of V1, MAC_INT1 of V2), (MAC_INT2 of V1, MAC_INT2 of V2)} ⊆ ~.
Checking (COF_MAC1 of V1, COF_MAC1 of V2): t1_7 in V1 ⇒ t2_7 in V2 and t2_7 in V2 ⇒ t1_7 in V1, so {(COF_MAC1 of V1, COF_MAC1 of V2)} ⊆ ~ if COF_MAC1′ of V1 ~ COF_MAC1′ of V2.
Therefore, {(INIT of V1, INIT of V2), (CUST of V1, CUST of V2), (MAC_INT1 of V1, MAC_INT1 of V2), (MAC_INT2 of V1, MAC_INT2 of V2), (COF_MAC of V1, COF_MAC of V2)} ⊆ ~.
Checking (TEA_MAC1 of V1, TEA_MAC1 of V2): t1_8 in V1 ⇒ t2_8 in V2 and t2_8 in V2 ⇒ t1_8 in V1, so {(TEA_MAC1 of V1, TEA_MAC1 of V2)} ⊆ ~ if TEA_MAC1′ of V1 ~ TEA_MAC1′ of V2.
Therefore, {(INIT of V1, INIT of V2), (CUST of V1, CUST of V2), (MAC_INT1 of V1, MAC_INT1 of V2), (MAC_INT2 of V1, MAC_INT2 of V2), (COF_MAC of V1, COF_MAC of V2), (TEA_MAC of V1, TEA_MAC of V2)} ⊆ ~.
Each action in each state of each process of V1 can be simulated by an action of V2 in an associated state, and so we can conclude that V1 ~ V2. We now proceed to check that S is a strong bisimulation, but it is clear from the construction of S that it must be a strong bisimulation.

C. Weak/Observational Bisimulation Equivalence
A binary relation over a set of states of the LTS system of SACS is a weak bisimulation if and only if, whenever P ≈ Q, where P, Q ∈ ℜ and α1 ∈ α:
i. if P ⎯α1→ P′, then either
   (a) α1 = 1 and P′ ≈ Q, or
   (b) for some Q′, Q ⇒ Q′ and P′ ≈ Q′;
and conversely,
ii. if Q ⎯α1→ Q′, then either
   (a) α1 = 1 and P ≈ Q′, or
   (b) for some P′, P ⇒ P′ and P′ ≈ Q′.
This derivation from Gray's [4] weak bisimilar/observational equivalence can be used to study system equivalence under weak bisimulation.

V. CONCLUSION
This paper presents the Structured Operational Semantics for SACS, a process algebra proposed for asynchronous communication. The study of equivalence relations shows that they can be used to define complex systems and can also be employed in constructing the algebra used to progress a design. Various kinds of equivalence were studied for SACS; out of all the equivalence relations, strong bisimulation and weak/observational bisimulation are discussed. If two systems are proved to be strongly bisimilar, they are also weakly bisimilar, but the converse does not always hold. While strong bisimulation compares both internal and external behaviours, weak bisimulation compares only the external behaviours of the processes. This property is the most widely used to study concurrent systems. The formal semantics defined for SACS not only specify the asynchronous communication that takes place in LIPS but also serve as a reference manual for developers and implementers of the language.

REFERENCES
[1] S. Bavan, E. Illingworth, A. Rajan, and G. Abeysinghe, 2007. "Specification of Asynchronous Communicating Systems (SACS)". In: Proceedings of the 2007 IADIS Conference on Applied Computing, Salamanca, Spain, 17-20 Feb 2007.
[2] S. Bavan and E. Illingworth, 2000. "Design and Implementation of Reliable Point-to-Point Asynchronous Message Passing System". In: Proceedings of the 10th International Conference on Computing and Information ICCI'2000, Kuwait, 18-21 Nov 2000.
[3] W. Fokkink, 2000. "Introduction to Process Algebra". 1st ed. Springer-Verlag New York, Inc.
[4] D. Gray, 1999. "Introduction to the Formal Design of Real-Time Systems". Springer Verlag.
[5] R. Milner, 1982. "A Calculus of Communicating Systems". Springer Verlag, New York Inc.
[6] E. Tuosto, 2003. "Non Functional Aspects of Wide Area Network Programming". PhD Thesis, Dipartimento di Informatica, University of Pisa.
[7] E. Tuosto, 2006. "Research Reports in Computer Science - Concurrency and Mobility". University of Leicester: School of Mathematics & Computer Science.
[8] V.C. Galpin, 1998. "Equivalence semantics for concurrency: comparison and application", Ph.D. Thesis, Department of Computer Science, University of Edinburgh, LFCS report ECS-LFCS-98-397, http://www.lfcs.informatics.ed.ac.uk/reports.
Application Multicriteria Decision Analysis on TV Digital Ana Karoline Araújo de Castro, Plácido Rogério Pinheiro, Gilberto George Conrado de Souza University of Fortaleza (UNIFOR) – Graduate Course in Applied Computer Science (ACS) Av. Washington Soares, 1321 - Bl J Sl 30 - 60.811-341 - Fortaleza – Brasil
[email protected],
[email protected],
[email protected] Abstract - In domains (such as digital TV, smart home, and tangible interfaces) that represent new paradigms of interactivity, deciding the most appropriate interaction design solution is a challenge. There is not yet any work that explores how to consider the users’ experience with technology to analyze the best solution(s) to the interaction design of a mobile TV application. In this study we applied usability tests to evaluate prototypes of one application to mobile digital TV. The main focus of this work is to develop a multicriteria model for aiding in decision making for the definition of suitable prototype. We used M-MACBETH and HIVIEW as modeling tools. Keywords – Mobile Digital TV, Usability, Multicriteria.
I. INTRODUCTION
In domains (such as digital TV, smart home, and tangible interfaces) that represent new paradigms of interactivity, deciding the most appropriate interaction design solution is a challenge. Researchers in the Human-Computer Interaction (HCI) field have promoted in their works the validation of alternative design solutions with users before producing the final solution. Taking into account user satisfaction and their preferences is an action that has also gained ground in these works when designers are analyzing the appropriate solution(s). Recent research reveals that the understanding of subjective user satisfaction is an efficient parameter for evaluating interface [10]. In the domain of interaction design for digital TV, we claim that it is necessary to consider both international aspects for supporting the accessibility for all and digital contents for supporting a holistic evaluation (content and user interface) of the TV applications that show content through their user interfaces. Structured methods for tasks generally consider quantitative variables (such as: quantity of errors, number of times that the user consulted help, time taken to find a new function, etc). Research has evaluated qualitative aspects (like user satisfaction and emotion with technology) through evaluator observations and comments during usability tests [7]. The users generally are encouraged to judge the attractiveness of the interface, and from these comments, evaluators produce qualitative texts [12]. The aesthetic quality of a product influences users’ preferences but other qualitative aspects influence judgments that transcend the aesthetic appearance [1].
When dealing with Digital Television applications, new interface project and evaluation paradigms have been developed, as shown by Angeli [1]. No work however has integrated qualitative criticisms in order to obtain a ranking of interface solutions. In addition to understanding subjective questions, another problem deliberated in this research is about traditional means of evaluation. They are quite rigid and not flexible to the emergence of new project alternatives and new ways of considering these alternatives. For example, designers evaluated two interface solutions applying usability tests, and chose one to implement. During the development of a system, three additional design solutions arose as a result of new usability patterns. How can designers consider these new alternatives? How can we evaluate whether a new pattern is better than an old solution? In traditional means, usability tests should be applied to all alternatives. With a multicriteria model, these decisions are efficient and only some alternatives would be evaluated. In this project, three interface solutions for Mobile Digital Television Application were evaluated qualitatively by applying Multi-Criteria Decision Analysis (MCDA). This strategy adequately mapped user preferences and furnished information which helped to judge solutions for the project. It also provided a holistic evaluation of interaction situations and more information to understand and organize subjective questions. The analyses of design solutions become more flexible. A ranking generated by the model is a tool which makes it easy to insert new alternatives and judgments for interfaces. In order to be able to use the model, hypotheses were elaborated. These consider important characteristics in the project for interaction with mobile digital TV. From these hypotheses, criteria were established as well as usability tests applied in order to obtain information on user preferences. The main focus of this work is to develop a multicriteria model for aiding in decision making for the suitable prototype for Digital Mobile TV. At the end of this paper we have provided a ranking with the classification of these prototypes. This ranking is composed of the construction of judgment matrixes and constructing value scales for each Fundamental Point of View already defined [6]. The construction of cardinal value scales was implemented through HIVIEW and MACBETH.
II. AN APPLICATION TO MOBILE DIGITAL TV PORTAL
The application of this study was a portal for mobile digital TV. This portal allows manipulation of the received information of the system of transmission of digital TV. This information is presented to the user in a visual form and easily illustrates the interaction during the choosing and reception of the services and applications that are available. During the definition of the application, different ways of implementing a given navigating solution appeared (aid with navigating among the existing options on a screen and among the other screens of the system). We investigated the criteria that influenced the user preferences in determining solutions. Proposals were constructed for the portal application of access for PDA mobile devices. The proposals used in the mobile version of TV portal are: The first proposal is similar to digital TV applications. The second proposal is similar to Palm applications. The third proposal is similar to Desktop applications. Prototype 1 was developed similar to the Portal TV application for digital TV [11]. It does not have the scroll function, but rather has arrows and a slower transition similar to TiVo [17] and for navigation arrows are used in the same way as for portal TV. It is an application for people that have seen or have some experience with digital TV (figure 1).
Fig.2 – System Operational Palm OS
Fig. 3 - Prototype 2: Similar to Palm applications
Prototype 3 was developed in a similar way to computer applications (desktop applications). The application for navigation and the buttons to close the screen is similar to computer applications, such as the "X", which closes the open screen and returns to the previous one the user had open (see figure 4).
Fig. 4 - Prototype 3: Similar to Desktop applications Fig.1 - Prototype 1: Similar to TVD applications
In developing prototype 2 we used applications for mobile devices. We can see also the main systems operations of mobile devices with the Palm OS, for example. The operational system for the Palm OS is used in almost 80% of the PDA market (figure 2). The PDAs use elements common to that of the desktop PC, such as, buttons, icons, menus, etc. These sets were adapted for the access touchscreen and the possibility of using the set with only one hand. It has a vertical scroll, icons and texts. People that are accustomed to using palm pilots have an easier time using scroll because even on the main PDA screen (home) (see figure 2) there is a scroll and icons. Going through screens on the PDA is faster because those who use palm pilots want faster results with the click of a pen. Moving through screens is done by clicking and navigation is quickly done through scrolling (see figure 3).
III. MULTICRITERIA MODEL
According to [4], in decision making it is necessary to look for elements that can answer the questions raised in order to clarify and make recommendations or increase the coherency between the evolution of the process and the objectives and values considered in the environment. In studies developed by [2] the application of a multicriteria model to aid in decision making for standards of usability that should be employed in the mobile TVD application were carried out. The study used the ZAPROS III method [14] which belongs to the Verbal Decision Analysis framework [8]. The result of the application of the model was satisfactory for this work, since the ZAPROS III method assisted the usability specialists to understand the order of preference among the criteria which are commonly used for interface projects using standards. However, the need came up to validate this model using another multicriteria methodology with characteristics different from those of ZAPROS III [13]. Thus, the MACBETH method (Measuring Attractiveness by a Categorical Based Evaluation Technique) was chosen,
developed by [3] and [6]. This method belongs to the MCDA family. This methodology was chosen because it possesses a tool that makes the construction of the judgment matrix easier. It is an interactive technique that helps in the construction, with a set S of stimulus or potential actions, numerical scales of intervals that quantify the attractiveness of the elements of S in the opinion of the actors, based on semantic judgment of difference in attractiveness between two actions [16]. Another point that influenced in the choice of this methodology, is that keeping the consistency of all judgment values during the evaluation of the matrix, is not easy, but the MACBETH method supplies suggestions that can be accepted by decision makers when the identification of inconsistencies occurs. The MCDA is a way of looking at complex problems that are characterized by any mixture of objectives, to present a coherent overall picture to decision makers. It is important to consider that the purpose of multicriteria methods is to serve as aids to thinking and decision-making, but not to make the decision. As a set of techniques, MCDA provides different ways of measuring the extent to which options achieve objectives. A substantial reading on MCDA methods can be found in [4], [5], [6], [16] and [15], where the authors address the definitions and the problems that are involved in the decision making process. In this study, we used the MCDA tool to help in the solution of the problem: M-MACBETH (www.mmacbeth.com). The evaluation process is composed of the construction of judgment matrixes and constructing value scales for each Fundamental point of view (FPV) already defined. The construction of cardinal value scales will be implemented through the MACBETH methodology developed by [6]. Figure 5 shows the tree corresponding to the FPVs that are used in evaluation of prototypes to mobile digital TV application.
2. Attractiveness of the task to be carried out; If the standard allows for good visibility of the content, does the user prefer this standard in relation to the standard that has a familiar appearance to the applications that he is used to using? 3. Locomotion of the user during the manipulation of the interface; If the standard allows for good spatial orientation, which doesn’t demand much of the user’s attention to manipulate, will it be preferred in relation to the standard that has a familiar appearance to the applications that he is used to using, and at the same time allow for excellent viewing of the content? From the family of FPVs it is possible to evaluate the attractiveness of the options for each interest. Although the definition of the describers of impact is a difficult task, its decisiveness contributes for a good formation of judgments and a justified and transparent evaluation [4]. IV.
DESCRIBERS
An FPV is operational in the moment that has a set of levels of associated impacts (describers). These impacts are defined for Nj, that can be ordered in decreasing form according to the describers [16]. In this step of construction of the describers, the decisions were made during the meetings with the specialists in the usability of mobile TVD of the Usability and Quality of Software Laboratory (LUQS, of the University of Fortaleza), involved in the process. For the evaluation of each FPV, the possible states were defined. Each FPV has a different quantity of states. These states were defined according to attractiveness of the patterns involved for each describer. It is important to remember that the describers have a structure of complete pre-order, otherwise, a superior level is always preferable a lesser level. For the evaluation of each FPV, 3 possible states were defined. Tables 1, 2 and 3 show the describers of these FPVs with 3 levels of impact. TABLE I
Fig. 5 - Problem value tree
These FPVs were defined with the assistance of specialists in the usability of mobile TVD of the Usability and Quality of Software Laboratory (LUQS, of the University of Fortaleza). The specialists wished to analyze the aspects that had the greatest influence in the choice of a determined interface project: 1. Familiarity of the user with a determined technology; If a standard is similar to a determined technology familiar to the user, this standard is preferable to him, since it is easier for him to use.
NI N3 N2 N1
Describer for the FPV 1 Description
Order
No familiarity is required with similar applications of determined technology Requires little user familiarity with applications of determined technology
1º
Manipulation of the prototype is fairly easy when the user is familiar with similar applications
3º
2º
TABLE II Describer for the FPV 2
NI N3 N2 N1
Description Allows high accessibility to the content Allows medium accessibility to the content Accessibility to the content is quite difficult
Order 1º 2º 3º
42
CASTRO ET AL. TABLE III Describer for the FPV 3
NI
Description
Order
N3
The user was not hindered in any way when manipulating the prototype while moving
1º
N2
The user was occasionally confused when manipulating the prototype while moving
2º
N1
The spatial orientation of the application is hindered when the user is moving
3º
V.
EVALUATION
Here we display the results obtained from the software used to generate necessary information for supporting the decision as to which prototype should to be used for mobile digital TV. First the levels of good and neutral for each point of view were determined and then the construction of the semantic value judgment matrixes were carried out, obtaining the local preference rankings (MACBETH) and the referred value function. After the construction of the matrixes, the action impact matrix was made. Finally, Hiview software was used for evaluation.
A. Good and neutral levels for each Point of View
The authors of the MACBETH approach [6] recommend that the Good and Neutral levels for each describer be determined before the FPV value functions are constructed. Determining these levels is necessary so that the decider does not feel attracted or repulsed towards a given action when judging the points of view and constructing the substitution rates. These levels were therefore used in determining the substitution rates and in constructing the value function for each FPV; the MACBETH scale was then recalibrated, assigning the level Good a value of 100 points and the level Neutral a value of 0 (zero) points, in order to perform the cardinal independence test. The Good and Neutral levels are shown in the table below:

TABLE IV
Good and Neutral levels for each FPV
FPV | Neutral | Good
FPV 1 | N2 | N3
FPV 2 | N2 | N3
FPV 3 | N2 | N3

B. Construction of the semantic matrixes for value judgments and value functions
After the structuring phase of the problem under study, which finishes with the construction of the describers for each Fundamental Point of View, the semantic matrixes of decider value judgments are constructed. These matrixes express the differences of attractiveness between the describer impact levels and generate a scale of local preferences through the MMACBETH tool. The preference of the decider in relation to the impact levels, as well as the corresponding value functions, is presented graphically in this section. The following are the judgment matrixes for each FPV. After the construction of these matrixes, we used the MMACBETH program to obtain the local preference scale for each FPV. Using the scales generated by MMACBETH, we can also identify the FPV value functions, which enable us to evaluate impact levels that were not described. These are represented by values or intervals which can be projected on a function graph showing their value according to the scale, as shown in figures 5, 6 and 7.

Fig. 5 - Judgment of the FPV1
Fig. 6 - Judgment of the FPV 2
Fig. 7 - Judgment of the FPV 3

After evaluating the alternatives of all the FPVs individually, an evaluation of all the FPVs in a single matrix was carried out. For this, a judgment matrix was created in which an ordering is defined according to the preference of the decision maker. This order of preference was obtained by analyzing the preferences among these FPVs (Table V).

TABLE V
Matrix ordination of all the FPVs
     | FPV1 | FPV2 | FPV3 | SUM | ORDER
FPV1 |  -   |  0   |  1   |  1  |  2°
FPV2 |  1   |  -   |  1   |  2  |  1°
FPV3 |  0   |  0   |  -   |  0  |  3°

On the first line of Table V we can observe the preference of FPV1 in relation to FPV3, thus assuming a value of 1. However, FPV1 is not preferable to FPV2, thus assuming a value of 0. In the second line, we can see that FPV2 is preferable to both FPV1 and FPV3. Finally, on the third line, we see that FPV3 is not preferable to any of the other FPVs. After this analysis, we concluded that FPV2 is the most preferable (with 2 points), followed by FPV1 (with 1 point), and the least preferable was FPV3 (with no points). By ordering all the FPVs, a value judgment matrix was constructed in order to determine the preference scale for the FPVs (figure 8); a small sketch of this ordination computation is given after the figure captions below.

Fig. 8 - Judgment of all the FPVs
Fig. 9 - The global evaluation of the actions of all the FPVs
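To make the ordination step of Table V concrete, the following minimal Java sketch (illustrative only, not the authors' implementation) derives the SUM and ORDER columns from the binary pairwise-preference matrix; the matrix entries are those of Table V, while the class and variable names are ours.

// Illustrative sketch only: derives the SUM and ORDER columns of Table V
// from a binary pairwise-preference matrix (1 = row FPV preferred to column FPV).
public class FpvOrdination {
    public static void main(String[] args) {
        String[] fpv = {"FPV1", "FPV2", "FPV3"};
        // prefers[i][j] = 1 if fpv[i] is preferred to fpv[j] (diagonal unused)
        int[][] prefers = {
            {0, 0, 1},   // FPV1: not preferred to FPV2, preferred to FPV3
            {1, 0, 1},   // FPV2: preferred to FPV1 and FPV3
            {0, 0, 0}    // FPV3: preferred to none
        };
        int n = fpv.length;
        int[] sum = new int[n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (i != j) sum[i] += prefers[i][j];
        // Rank: an FPV's order is 1 plus the number of FPVs with a strictly larger sum.
        for (int i = 0; i < n; i++) {
            int order = 1;
            for (int j = 0; j < n; j++)
                if (sum[j] > sum[i]) order++;
            System.out.println(fpv[i] + "  SUM=" + sum[i] + "  ORDER=" + order);
        }
    }
}

Running this reproduces the ranking FPV2, FPV1, FPV3 reported above.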
C. Action Impact Matrix
Once the judgment matrixes are constructed and the local evaluations for each describer have been carried out, the impact of each action on the FPVs can be verified. Table VI shows this impact for each of the FPVs, alongside their Good and Neutral levels. The decider identified 3 actions (prototypes) for calculating the impact of the describers. The actions are real potential actions and were cited in section II of this article.
Fig. 10 - Criteria Contribution for each prototype
Figure 10 shows how the total weighted score for each option is built up from the individual criteria.
TABLE VI
Action Impact Matrix
FPV | Neutral | Good | Prototype 1 | Prototype 2 | Prototype 3
FPV1 | N2 | N3 | N1 | N2 | N3
FPV2 | N2 | N3 | N1 | N3 | N3
FPV3 | N2 | N3 | N2 | N3 | N1
VI. EVALUATION RESULTS
In this section we present the sensitivity and dominance analyses of the potential actions in the process of evaluating the prototype to be used for mobile digital TV. The HIVIEW software [9] was used to carry out these analyses. It is one of the instruments used in decision support processes, essentially for the evaluation of models obtained through multicriteria decision support methodologies, because it uses an additive aggregation function and is therefore compatible with the procedures developed in this study. Figure 9 shows the global evaluation of the actions when they are confronted with the three FPVs. It was found that FPV1 has a total participation (weight) of 38%, FPV2 of 54% and FPV3 of 8%. With these weights, action (prototype) 3 proved to be potentially the best with 92 points; action prototype 2 had 81 points, and action prototype 1 had 3 points.
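The additive aggregation behind these global scores can be summarised in a few lines. The sketch below is a hedged illustration only: the weights are those reported above (38%, 54%, 8%), but the local 0-100 scores are hypothetical placeholders rather than the MACBETH scales actually obtained, so the printed totals are not the published figures.

// Illustrative sketch of additive aggregation: V(a) = sum_i w_i * v_i(a),
// with local value scales recalibrated so that Good = 100 and Neutral = 0.
public class GlobalEvaluation {
    public static void main(String[] args) {
        double[] weights = {0.38, 0.54, 0.08};              // FPV1, FPV2, FPV3 (from the text)
        String[] prototypes = {"Prototype 1", "Prototype 2", "Prototype 3"};
        // localScore[p][i] = hypothetical local value of prototype p on FPV i;
        // placeholders only, loosely following the impact pattern of Table VI.
        double[][] localScore = {
            {-50, -50, 0},
            {-50, 100, 100},
            {100, 100, -50}
        };
        for (int p = 0; p < prototypes.length; p++) {
            double global = 0.0;
            for (int i = 0; i < weights.length; i++)
                global += weights[i] * localScore[p][i];
            System.out.printf("%s: global score = %.1f%n", prototypes[p], global);
        }
    }
}

Because the scales are recalibrated so that Neutral = 0, impacts below the Neutral level (such as N1 in Table VI) contribute negatively to an action's global score.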
Fig. 11 - Sensitivity Down for the Root Node
The sensitivity down window (figure 11) displays a summary of weight sensitivity for all criteria below the selected node. It is most useful to display this window for the root node and get a summary of sensitivity throughout the whole model. The sensitivity summary shows that the most preferred option at the root node, given current scores and weights, is prototype 3. The sensitivity down window is used to direct further analysis of the model. The sensitivity up graph displays the sensitivity of the overall results to a change in the weight of a selected criterion or node over the entire range of 0 to 100. When used to show the sensitivity of a criterion, it shows more detail than the sensitivity down screen, providing information about how the most preferred option at the top of the tree will change over the 0-100 weight range. In the figure 12, the sensitivity up graph shows a red bar to the left of criterion FPV 1. It also shows a green bar to the right of this criterion. We will investigate this further. The most preferred option at any cumulative weight has the highest y-value. The upper surface of the graph always shows the most preferred option for a given cumulative
weight. At the vertical red line, prototype 3 has the highest y-value.
Fig. 12 - Sensitivity Up Graph for FPV 1
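To make the weight-sensitivity idea concrete, here is a hedged sketch of a sensitivity-up style calculation: the weight of one criterion is varied from 0 to 100% while the remaining weights are rescaled proportionally, and the most preferred prototype is reported at each point. This is an illustrative reconstruction of the kind of analysis Hiview performs, not its actual algorithm, and the local scores reuse the hypothetical placeholders from the previous sketch.

// Illustrative sketch of a sensitivity-up analysis on the weight of FPV1.
public class SensitivityUp {
    static double global(double[] w, double[] scores) {
        double v = 0.0;
        for (int i = 0; i < w.length; i++) v += w[i] * scores[i];
        return v;
    }
    public static void main(String[] args) {
        double[] baseWeights = {0.38, 0.54, 0.08};          // FPV1, FPV2, FPV3
        String[] prototypes = {"Prototype 1", "Prototype 2", "Prototype 3"};
        double[][] score = { {-50, -50, 0}, {-50, 100, 100}, {100, 100, -50} }; // placeholders
        int varied = 0;                                     // vary the weight of FPV1
        double restBase = 1.0 - baseWeights[varied];
        for (int pct = 0; pct <= 100; pct += 10) {
            double wVaried = pct / 100.0;
            double[] w = new double[baseWeights.length];
            for (int i = 0; i < w.length; i++)
                w[i] = (i == varied) ? wVaried
                                     : baseWeights[i] * (1.0 - wVaried) / restBase;
            int best = 0;
            for (int p = 1; p < prototypes.length; p++)
                if (global(w, score[p]) > global(w, score[best])) best = p;
            System.out.printf("FPV1 weight %3d%% -> most preferred: %s%n", pct, prototypes[best]);
        }
    }
}

With these placeholder scores the preferred prototype switches as the FPV1 weight grows, which is exactly the kind of crossover the sensitivity up graph is designed to reveal.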
VII. CONCLUSION
It is important to point out that our intention was not to compare navigation techniques (such as scrollbars, tap-and-drag, and so on) on mobile devices in order to identify the best one when users are performing navigation and selection tasks. Our goal was to help designers understand how criteria related to the users' experience could influence their preference for a solution. In addition, we showed how to integrate two different areas (HCI and OR - Operational Research), describing an approach for evaluating interaction design from the subjective perspective of OR. This means that researchers interested in a qualitative analysis of the interaction that leads to more objective results can use this proposal. The Multicriteria method for Decision Support (MCDA) is constituted by a group of methods and techniques built on theories that give preference to the human element, through its values and convictions. This proved to be an advance over traditional operational research, whose model solutions had considered only the technical and operative aspects. With respect to the evaluation phase, the M-MACBETH and HIVIEW software both proved to be compatible instruments for the analyses that were carried out. In the global evaluation, it was also found that action prototype 3 presented itself as potentially better than the others, since it obtained 92 points as opposed to 81 points for action prototype 2 and 3 points for action prototype 1. We can therefore see the importance of formally evaluating the subjective aspects involved in the analysis of which usability standards should be used for this mobile TVD application. Prototypes using new usability standards are being developed at LUQS, so the ranking supplied by this research will be augmented and new contributions will be analyzed in future work.
ACKNOWLEDGMENT
The authors are thankful to Celestica of Brazil for the support they have received for this project.
REFERENCES
[1] A. Angeli, A. Sutcliffe, J. Hartmann. Interaction, Usability and Aesthetics: What Influences Users' Preferences?, Symposium on Designing Interactive Systems, Proceedings of the 6th ACM Conference on Designing Interactive Systems, pp. 271-280, 2006.
[2] A. L. Carvalho, M. Mendes, M. E. S. Furtado, P. R. Pinheiro. Analysis of the Interaction Design for Mobile TV Applications based on Multi-Criteria. In: The IFIP International Conference on Research and Practical Issues, 2007, Beijing, China. Lecture Notes in Computer Science. Berlin: Springer Verlag, 2007.
[3] C. A. Bana & Costa, L. Ensslin, E. C. Correia, J. C. Vansnick. Decision support systems in action: integrated application in a multicriteria decision aid process, European Journal of Operational Research, v. 133, p. 315-335, 1999.
[4] C. A. Bana & Costa, E. Beinat, R. Vickerman. Introduction and Problem Definition, CEG-IST Working Paper, 2001.
[5] C. A. Bana & Costa, E. C. Correa, J. M. D. Corte, J. C. Vansnick. Facilitating Bid Evaluation in Public Call for Tenders: A Social-Technical Approach. OMEGA, v. 30, p. 227-242, 2002.
[6] C. A. Bana & Costa, J. M. D. Corte, J. C. Vansnick. MACBETH. LSE-OR Working Paper, 56, 2003.
[7] E. Furtado, F. Carvalho, A. Schilling, D. Falcão, K. Sousa, F. Fava. Projeto de Interfaces de Usuário para a Televisão Digital Brasileira. In: SIBGRAPI 2005 - Simpósio Brasileiro de Computação Gráfica e Processamento de Imagens, Natal, 2005.
[8] J. Figueira, S. Greco, M. Ehrgott (Eds.). Multiple Criteria Decision Analysis: State of the Art Surveys. International Series in Operations Research & Management Science, Vol. 78, XXXVI, 1045 p., 2005.
[9] KRYSALIS. Hiview for Windows. London: Krysalis, 1995.
[10] K. Chorianopoulos and D. Spinellis. User interface evaluation of interactive TV: a media studies perspective, Univ Access Inf Soc, 5: 209-21, 2006.
[11] M. Soares Mendes, E. Furtado. Mapeamento de Um Portal de Acesso de Televisão Digital em Dispositivos Móveis. In: IHC'2006, Natal. IV IHC'2006, 2006.
[12] N. Tractinsky, A. S. Katz, D. Ikar. What is beautiful is usable, Interacting with Computers, Volume 13, Issue 2, pp. 127-145, 2000.
[13] O. Larichev and H. Moshkovich. Verbal Decision Analysis for Unstructured Problems, Boston: Kluwer Academic Publishers, 1997.
[14] O. Larichev. Ranking Multicriteria Alternatives: The Method ZAPROS III, European Journal of Operational Research, Vol. 131, 2001.
[15] P. Goodwin, G. Wright. Decision Analysis for Management Judgment, 2nd ed., John Wiley & Sons, Chichester, 1998.
[16] P. R. Pinheiro, G. G. C. Souza. A Multicriteria Model for Production of a Newspaper. Proc. of the 17th International Conference on Multiple Criteria Decision Analysis, Canada, pp. 315-325, 2004.
[17] TIVO. Available at: . Accessed on 03/20/2007.
SOFTWARE
Hiview Software, Catalyze Ltd., www.catalyze.co.uk
M-Macbeth, www.m-macbeth.com
A Framework for the Development and Testing of Cryptographic Software Andrew Burnett
Tom Dowling
Department of Computer Science National University of Ireland, Maynooth Maynooth, Co. Kildare, Ireland Email: [email protected]
Department of Computer Science National University of Ireland, Maynooth Maynooth, Co. Kildare, Ireland Email: [email protected]
Abstract— With a greater use of the Internet and electronic devices in general for communications and monetary transactions it is necessary to protect people's privacy with strong cryptographic algorithms. This paper describes a framework for the development and testing of cryptographic and mathematical software. The authors argue this type of software needs a development and testing framework that is better tailored to its needs than other more general approaches. The proposed framework uses the symbolic mathematics package, Maple, as a rapid prototyping tool, test oracle and also in test case refinement. Finally, we test our hypothesis by evaluation of systems developed with and without the framework using quantitative techniques and a multi component qualitative metric.
I. INTRODUCTION
The proliferation of electronic transactions and the growth of pervasive computing mean it is increasingly necessary to protect people's privacy with strong cryptographic algorithms. This paper describes a framework for the development and testing of cryptographic and mathematical software. As with other domain-specific areas, there are certain characteristics that make cryptographic and mathematical software different from regular software. We describe these characteristics next. Firstly, mathematical and cryptographic software deals with complicated mathematical theory that a regular software engineer would not encounter in his daily work. Cryptographic software is also characterised by its use of very large numbers and its intense computations with them. Another reason cryptographic software is different is that it suffers from the testing oracle problem [1], [5] (a testing oracle is a formal method of determining whether the system under test has performed correctly for a given input). The testing oracle problem states that while it is theoretically possible to define an oracle for the program, it is too difficult in practice because the numbers involved are so large. This can be tackled by using specific or simple data to test the program, so that it is easy to ascertain whether it is correct. This method, however, tells little about how the program functions with larger, more complex data, which tends to be more error ridden. Another method to address the oracle problem is to use a previous version of the software under test, one that has been shown to be correct, as a testing oracle. These characteristics create a unique set of challenges for developing and testing such software. The authors feel a development and
testing framework tailored to cryptographic software would be beneficial. The proposed framework uses the symbolic mathematics package Maple [2] as a rapid prototyping tool, as a test oracle and also in test case refinement. This Maple prototype is first used to aid the development of the final Java [3] application and is then further used to test this application. The framework addresses two main target groups involved in the development of cryptographic software. Firstly, we have the situation of a mathematician without the necessary training in software engineering producing unstructured, difficult to maintain software. Secondly, there is the case of the software engineer with an incomplete understanding of the mathematical basis of the software, leading to unforeseen errors in its operation. The framework described in this paper allows both these situations to be resolved satisfactorily. In the first case the mathematician is enabled to design and develop in a more structured manner. The engineer is also able to produce the software faster and of better quality through the use of the symbolic mathematics package. The developer with both the mathematical and the software engineering knowledge can also benefit from the framework, as it allows him to work in a consistently structured way. This paper demonstrates the validity of the framework by using sample cryptographic prototypes to demonstrate the process. Also, using these samples we show how the framework decreases development time, increases the dependability of the software and consequently makes it more secure.
II. DESCRIPTION OF FRAMEWORK
In this section the framework will be presented in three parts. Firstly, Section II-A looks at the development of a rapid software prototype with Maple. Sections II-B and II-C describe how this prototype is used in the testing phase, providing a test oracle and assisting with test generation. An overview of the framework can be seen in Figure 1.
A. Rapid Prototype
The first stage in the framework is the development of a rapid software prototype of the cryptographic application in question. It was decided that the symbolic mathematics package, Maple, would be used to produce this prototype. We continue by discussing some reasons for choosing Maple.
Fig. 1. Framework Overview
The first and most important reason for picking Maple is its comprehensive in-built set of complex mathematical functions. This is essential in order to rapidly develop a prototype of a cryptographic algorithm, as the complex mathematics can be treated as being high-level in the implementation. For example:
• The extended Euclidean algorithm for polynomials can be calculated by the function gcdex(A,B) (where A and B are polynomials).
• The power function with remainder can be found with the powmod(a,n,b) function (i.e. a^n mod b, where a, b are polynomials and n ∈ Z).
• The inverse of a finite field element can be found simply by a^(-1) mod p, where a is the element whose inverse is required and p is the order of the field.
The authors believe this gives Maple the advantage over other similar packages (Mathematica, Matlab, etc.) for our particular use. The second target group mentioned in Section I (i.e. the engineer lacking full mathematical training) will find this particularly useful, as some of the mathematics can be abstracted to a high level. A working prototype can be developed quickly without complete knowledge of the underlying mathematics, giving greater understanding of the algorithm and of what is required to implement the final version. This Maple prototype would be classed as a throw-away prototype [4], as it does not directly evolve into the final version. For the purposes of this paper the final version will be in the Java language. The function of the prototype is to gain a greater understanding of the algorithm; later it is also used to test the final Java version. The next two sections cover the prototype's further use in the testing phase.
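For the integer and prime-field analogues of these operations, the Java side also offers them out of the box through java.math.BigInteger, which is one reason a Maple prototype maps fairly directly onto a final Java version; the polynomial variants, by contrast, have no standard-library equivalent and are exactly the code the framework flags for extra testing. The snippet below is only an illustration of those integer analogues, not code from the authors' package, and the numeric values are arbitrary examples.

import java.math.BigInteger;

// Integer/prime-field analogues of the Maple calls mentioned above.
// Illustrative only; the polynomial variants would require a dedicated class.
public class FieldOps {
    public static void main(String[] args) {
        BigInteger p = BigInteger.valueOf(1_000_003);   // a prime modulus (example value)
        BigInteger a = BigInteger.valueOf(123_456);
        BigInteger n = BigInteger.valueOf(65_537);

        BigInteger power   = a.modPow(n, p);            // a^n mod p  (cf. powmod)
        BigInteger inverse = a.modInverse(p);           // a^(-1) mod p
        BigInteger gcd     = a.gcd(p);                  // greatest common divisor

        System.out.println("a^n mod p  = " + power);
        System.out.println("a^-1 mod p = " + inverse);
        System.out.println("gcd(a, p)  = " + gcd);
    }
}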
B. Test Oracle Once the Maple prototype is complete the Java version can be designed and implemented using the knowledge gained from the prototyping process. As discussed in Section I this kind of software suffers from the testing oracle problem [6] and one solution is to use a previous version of the program as an oracle. Therefore, in our case the prototype is transformed into this type of testing oracle. The problem with this is that we must show the previous version works properly. Clearly, in our case the previous version is the Maple prototype. Since this will be typically much smaller than the final version due to the in-built functionality of Maple it is feasible to test the prototype by human inspection or walk through. The Maple programming language is primarily procedural. This means the standard procedural inspection and walk through techniques can be applied [6]. With the prototype fully tested by hand and the developer having a high confidence in its correct operation it can be used as an oracle. The oracle can be used in many ways to verify the correct operation of the final version. For example, it could be used simply as a black box taking inputs and producing outputs. In addition to just a single output it could also produce various intermediate results at crucial points through the program. In the next section we look at how the oracle is used in the generation of test cases for the Java version. C. Test Case Generation The Maple prototype is used once again in the generation of test cases [6] for the final version. The prototype is used in both white and black box testing. For black box testing the definition of the equivalence partitions and boundary values
Fig. 2. Evaluation Overview
[6] are done using the Maple prototype. It is then used to define the actual input and output values for the test cases, so that all partitions and boundary values are covered in an optimal fashion. White box testing uses the structure of the program to define its test cases so that the program code is covered to a certain degree [6]. The prototype can again be used in the white box testing of the final version. Although the structure of the code will be different from that of the final Java version, there will be similarities that can be utilized. The prototype can be used to test these portions of similar code by providing test cases that cover the code to the required degree. The internal structure of the in-built Maple functions is hidden. Therefore, the methods of the final version that correspond to these cannot be white box tested with the help of the prototype. However, the prototype can be used to black box test these functions in a modular fashion using the same techniques mentioned above. As mentioned in Section II-A, the in-built functions of Maple allow the developer to treat the complex mathematics of the algorithm in a high-level fashion, leaving the rest of the program relatively straightforward. When it comes to the final version in Java, it is highly likely that these complex mathematical portions of the code will contain errors. These errors arise because these pieces of code can no longer be treated in a high-level way and must be implemented by the developer. It is these sections of code, corresponding to the in-built Maple functionality of the prototype, that must be given special attention during testing. As mentioned above, the
prototype can only aid in black box testing these methods. It can be used to flag these methods in the Java version so that they can be thoroughly tested and maximum code coverage can be obtained.
D. Evaluation
The framework is evaluated in a number of ways, split into quantitative and qualitative techniques. The two quantitative methods used are outlined in Figure 2. Method A establishes the effectiveness of the framework as a testing mechanism. This method can be thought of as one system against many test suites. Using a specification of a cryptographic algorithm, the framework is used to produce Java code and an accompanying test suite. Two to three generic test case generators are then applied to the code to produce additional test suites. Mutation testing techniques are then used to introduce errors into the Java code. All the test suites are applied to the mutated code and the results of each test suite are analysed. The framework is deemed to be more effective if its test suite finds more errors than the other test suites. Method B evaluates the framework's effectiveness as a development process. This method can be thought of as two systems against a single test suite. As in Method A, a specification of a cryptographic algorithm is used by the framework to produce Java code and an accompanying test suite, although the test suite has no further use in this method. Another Java application conforming to the same specification, but developed by another party using a standard Java development
process, is acquired. Both Java applications are subjected to two generic test case generators and the resulting test suites are applied to the relevant code. The results are again analysed. Finding fewer errors in the code resulting from the framework indicates success. The third technique is the use of metrics to quantify the benefit, in terms of time and work, that the framework may or may not have over general techniques. The relative mathematical complexity of the algorithm involved is also established using metrics. Unfortunately, at the time of writing the evaluation stage is not yet complete; because of this, no specific evaluation data can be presented in this paper.
III. APPLICATION ON SAMPLE CRYPTOGRAPHIC ALGORITHMS
This section looks at some sample applications that were developed by the authors using the framework. Section III-A is concerned with the development of a Java polynomial arithmetic package [7]. The development of Schoof's algorithm, a method for determining the number of points on an elliptic curve [8], is covered in Section III-B.
A. Polynomial Arithmetic
Polynomial arithmetic is used in many areas of mathematics and cryptography, for example elliptic curve cryptography [9] and the McEliece cryptosystem [10]. The authors' requirements for the polynomial arithmetic package were that it be implemented in Java, cater for all commonly used polynomial operations, such as multiplication, addition, greatest common divisor and modular arithmetic, and also be readily extendible. As the framework prescribes, a Maple prototype was created first. This prototype contained all the polynomial functionality of the required Java version. Maple has extensive in-built support for polynomial arithmetic; all that was required to implement the prototype was to wrap the in-built procedures into the required interface.

## Maple v9.03 ##
mul := proc(A, B, p)
    return (A * B mod p);
end proc;

Fig. 3. Polynomial arithmetic prototype's modular polynomial multiplication routine.
Figure 3 shows the prototype's implementation of the multiplication of two polynomials A and B modulo some integer p; clearly this is made very simple using Maple. Similarly, Figure 4 shows the implementation of the greatest common divisor of two polynomials A and B modulo an integer p.

## Maple v9.03 ##
greatest_common_divisor := proc(A, B, p)
    return Gcd(A, B) mod p;
end proc;

Fig. 4. Polynomial arithmetic prototype's polynomial modular greatest common divisor routine.

Once the prototype was complete it was inspected by hand for errors. This procedure was quite straightforward, as most of the code was based on in-built Maple procedures. At this stage the design and development of the final version was under way. As we now had our testing oracle, we went about generating
the test-cases for the final version, focusing on covering the Maple procedures. In Table I we give a few examples of the test-cases for the final version's GCD method, equivalent to Figure 4. Once a full test-suite was generated for the Java version based on the prototype, the final version could be thoroughly tested, ensuring greater dependability and security for the package.
B. Elliptic Curve Point Counting
The security of elliptic curve cryptosystems (ECC) [9], [11] is based on the difficulty of the problem of computing discrete logarithms in the elliptic curve group. The best known algorithms for solving this problem for a random elliptic curve have a running time proportional to the largest prime factor dividing the group order. Therefore, for the system to be resilient to known attacks it is necessary to know this group order [12]. The first deterministic polynomial time algorithm for computing this group order was developed by Schoof [8], [13]. As in Section III-A, we required the application to be implemented in Java. ECC, and Schoof's algorithm in particular, use a great deal of polynomial arithmetic, so the arithmetic package described in the previous section can be used in the final version to implement the polynomial arithmetic functionality. The Maple prototype of Schoof's algorithm was more complex than the prototype developed in the previous section because it required more than just the in-built functionality of Maple. Elliptic curve arithmetic with polynomials needs to be implemented, as Maple has no in-built support for it. Division polynomials, which are bivariate polynomials (polynomials in two variables) defined by a recurrence based on the elliptic curve, must also be implemented. Division polynomials are closely linked with torsion points [9]. The implementation of these constructs in Maple is still easier than building them from scratch, as they can be composed of in-built Maple functions.

## Maple v9.03 ##
mpMul := proc(A, B, dp, p)
    return modp1(Rem(modp1(Multiply(A, B), p), dp), p);
end proc;

Fig. 5. Schoof polynomial multiplication.

Figure 5 shows the code for the multiplication of polynomials in the algorithm; they are all multiplied modulo an integer
p and a division polynomial dp. More detailed information on the Schoof prototype can be found in [14]. As before, the finished prototype was inspected by hand. Even though Schoof's algorithm is more complex than the polynomial package of the previous section, it is still a relatively small piece of Maple code and it was feasible to inspect it by hand. The test oracle was particularly useful in the development of this application, as the authors found there to be very little test data generally available to work with. Common practice for testing this kind of algorithm would be to use a black box technique: given some elliptic curve with a known number of points, the curve is given as input and the number of points is output by the program. The problem with this is that you need to rely on available data or another point counting program. The Maple oracle fits perfectly here. The disadvantage of black box testing this kind of algorithm is that it does not cater for false positives (i.e. getting the right answer from incorrect calculations). The Maple oracle again solves this problem by being able to produce any intermediate data that may be required to further test and debug the final Java version. Table II shows some sample test-cases used to test the final version. Developing and testing the algorithm in this way allows it to become more dependable and secure. In Section IV we return to this algorithm and look at how the framework improves the performance of the final Java version.
IV. APPRAISAL OF APPLICATIONS
The preceding sections of the paper have described our proposed framework and examples of its use with sample applications. In this section we discuss a specific application.
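Returning to the oracle-based testing of the Schoof implementation described in Section III-B: one hedged sketch of how Maple-generated expected values might be consumed on the Java side is given below. The PointCounter interface, the countPoints signature and the oracle file format are hypothetical names introduced for illustration; the general pattern is simply to compare the Java output (and, where exported, intermediate values) against values produced by the Maple prototype.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.math.BigInteger;

// Hedged sketch of an oracle-driven check. The PointCounter interface stands in
// for the final Java implementation (hypothetical), and the oracle file is
// assumed to hold lines "a b p expectedCount" exported from the Maple prototype.
public class OracleCheck {
    public interface PointCounter {
        BigInteger countPoints(BigInteger a, BigInteger b, BigInteger p);
    }

    public static int check(PointCounter impl, String oracleFile) throws IOException {
        int failures = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(oracleFile))) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] t = line.trim().split("\\s+");
                if (t.length < 4) continue;                  // skip malformed lines
                BigInteger a = new BigInteger(t[0]);
                BigInteger b = new BigInteger(t[1]);
                BigInteger p = new BigInteger(t[2]);
                BigInteger expected = new BigInteger(t[3]);
                BigInteger actual = impl.countPoints(a, b, p);
                if (!actual.equals(expected)) {
                    failures++;
                    System.out.println("MISMATCH for p=" + p + ": expected "
                            + expected + ", got " + actual);
                }
            }
        }
        return failures;
    }
}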
Most of the analysis of the framework up to this point has been with the Schoof algorithm application described in Section III-B. We present data based on the development of the application and show how the use of the framework aided development. Originally the elliptic curve point counting application was developed within another project of the author's research group. This initial implementation was constructed without the use of any design pattern or development framework; it was implemented directly in Java from textbook descriptions. Due to the complex mathematics involved, the author found it difficult to test the application with the data that was publicly available. At this time the notion of a framework for developing and testing cryptographic and mathematical software arose. This framework was then used on the Schoof algorithm application. The Maple prototype was developed as discussed in Section III-B and the Java version was tested using test-cases generated from the prototype. Figure 6 displays the various testing iterations of this Java version, by showing the performance in seconds for elliptic curves over progressively larger finite fields [9], [11]. The initial version is the plot labeled 'Version 1' on the graph. The 'Version 2' plot is the Java version after being tested with the first sequence of test-cases, checking some intermediary calculations. This led to a great improvement in performance by highlighting some implementation errors involving the greatest common divisor and modulo operations. Further improvement was gained from more testing by extending the test-cases to incorporate more intermediate calculations. Errors in the mod routine of the polynomial arithmetic package, which the Schoof application uses, were discovered.
Fig. 6. Testing Progression of Java version of Schoof's Algorithm.
The recurrence relation used in the construction of the division polynomials was also found to contain errors at this stage. The improvement can be seen as the 'Version 3' plot of Figure 6. This section has indicated that the framework can improve the quality, performance and dependability of the software in question and consequently improve its security by reducing errors.
V. CONCLUSION
In this paper we have discussed how cryptographic and mathematical software should be developed within a structured framework more suited to the specific requirements of that kind of software, in order to improve dependability and security. We have described such a framework and how it could be applied to sample applications. The value of the framework was also presented with results from the sample applications. We conclude that, based on our initial results, the framework contributes to error reduction and improved performance.
REFERENCES
[1] T. H. Tse, T. Y. Chen, and Z. Zhou, "Testing of large number multiplication functions in cryptographic systems," in Proceedings of the 1st Asia-Pacific Conference on Quality Software (APAQS 2000). IEEE Computer Society, 2000, pp. 89-98.
[2] "Maple, version 9.03," http://www.maplesoft.com.
[3] "The Java programming language," http://java.sun.com.
[4] I. Sommerville, Software Engineering, 6th ed. Addison Wesley, 2001.
[5] L. Baresi and M. Young, "Test oracles," 2001, preprint for ACM Surveys.
[6] G. J. Myers, T. Badgett, T. M. Thomas, and C. Sandler, The Art of Software Testing, 2nd ed. John Wiley & Sons, 2004.
[7] C. Whelan, A. Burnett, A. Duffy, and T. Dowling, "A Java API for polynomial arithmetic," in Proceedings of Principles and Practices of Programming in Java (PPPJ). ACM Press, 2003.
[8] R. Schoof, "Elliptic curves over finite fields and the computation of square roots mod p," Math. Comp., vol. 44, pp. 483-494, 1985.
[9] I. Blake, G. Seroussi, and N. Smart, Elliptic Curves in Cryptography, 1st ed. Cambridge University Press, 1999.
[10] R. J. McEliece, "A public-key cryptosystem based on algebraic coding theory," JPL Deep Space Network Progress Report, Tech. Rep. 42-44, 1978.
[11] J. Silverman, The Arithmetic of Elliptic Curves, ser. Graduate Texts in Mathematics. Springer-Verlag, 1986, vol. 106.
[12] A. Menezes, S. Vanstone, and T. Okamoto, "Reducing Elliptic Curve Logarithms to Logarithms in a Finite Field," ACM Transactions, 1991.
[13] R. Schoof, "Counting points on elliptic curves over finite fields," J. Théorie des Nombres de Bordeaux, vol. 7, pp. 219-254, 1995.
[14] A. Burnett and T. Dowling, "Rapid prototyping and performance analysis of elliptic curve cryptographic algorithms using Java and Maple," in Proceedings of the IASTED Conference on Software Engineering and Applications (SEA '04), 2004.
Transferable Lessons from Biological and Supply Chain Networks to Autonomic Computing Ani Calinescu Computing Laboratory, University of Oxford, Wolfson Building, Parks Road, Oxford OX1 3QD, UK Abstract Autonomic computing undoubtedly represents the solution for dealing with the complexity of modern computing systems, driven by ever increasing user needs and requirements. The design and management of autonomic computing systems must be performed both rigorously and carefully. Valuable lessons in this direction can be learned from biological and supply chain networks. This paper identifies and discusses but a few transferable lessons from biological and supply chain networks to autonomic computing. Characteristics such as structural and operational complexity and the agent information processing capabilities are considered for biological and supply chain networks. The relevance of the performance measures and their impact on the design, management and performance of autonomic computing systems are also considered. For example, spare resources are often found in biological systems. On the other hand, spare resources are frequently considered a must-not property in designed systems, due to the additional costs associated with them. Several of the lessons include the fact that a surprisingly low number of types of elementary agents exist in many biological systems, and that architectural and functional complexity and dependability are achieved through complex and hierarchical connections between a large number of such agents. Lessons from supply chains include the fact that, when designed and managed appropriately, complexity becomes a value-adding property that can bring system robustness and flexibility in meeting the users’ needs. Furthermore, information-theoretic methods and case-study results have shown that an integrated supply chain requires in-built spare capacity if it is to remain manageable.
I. INTRODUCTION The fundamental aim of autonomic computing is to emulate essential structural and operational characteristics of biological systems, in a heterogeneous computing environment, whilst also meeting performance objectives usually associated with supply chains. The tremendous growth in the complexity of managing existing IT network-based systems is one of the underlying reasons for the autonomic computing vision. In this ambitious endeavour, complex issues need to be identified and addressed [1]. The design and management of autonomic computing systems must be performed both rigorously and carefully, and valuable lessons in this direction can be learned from biological and supply chain networks. Biological systems have the desirable properties, whilst supply chains represent the available implementation that best, although imperfectly, matches the desired goal. This paper identifies and discusses only a few from the potential wealth of transferable lessons from biological and supply chain networks to autonomic computing. Characteristics
such as structural and operational complexities and the agent information processing capabilities are considered for biological and supply chain networks. The relevance of the performance measures and their impact on the design, management and performance of autonomic computing systems are also considered. For example, spare resources are often found in biological systems. On the other hand, spare resources are frequently considered a must-not property in designed systems, due to the additional costs they would involve. One of the main lessons from supply chain networks is that, when designed and managed appropriately, complexity becomes a value-adding property that can bring system robustness and flexibility in meeting the users’ needs. The remainder of this paper is structured as follows. Section II briefly introduces the autonomic computing vision and main concepts used in this paper. Section III and Section IV present concepts and examples from biological and supply chain systems, respectively. Section V reviews and discusses the main ideas presented so far, comments on their applicability to autonomic computing, and raises several research questions. Section VI concludes the paper. II.
RELATED WORK ON AUTONOMIC COMPUTING
Autonomic computing consists of systems that can manage themselves when given high-level objectives from administrators. The aim of autonomic computing is to address the complexity crisis associated with managing complex software characterised by increasing computational, technological, communication and systemic requirements [1, 2]. These self-governing systems should be able to configure, heal, optimize and protect themselves without human intervention. Autonomic systems consist of autonomic elements, which are self-organising. “They can discover each other, operate independently, negotiate or collaborate as required, and organise themselves so that the emergent stratified management of the system as a whole reflects both the bottom-up demand for resources and the top-down business-directed application of those resources to achieve specific goals” [3]. In this paper, complex systems represent network-based systems characterised by feedback-driven flow of information, openness, self-organisation and emergence. A network represents a system which has a large number of components capable of interacting with each other and with the environment, and which may act according to rules that may
change over time and that may not be well understood by an external observer. Self-organisation represents the ability of the components of a system to make local decisions that have a coherent, organising impact on the system as a whole. The system displays organisation without the application of a central internal or external organising principle. Emergence represents the process of complex pattern formation from simpler rules; emergent properties are neither properties had by any parts of the system taken in isolation nor a resultant of a mere summation of properties of parts of the system. III. BIOLOGICAL SYSTEMS Biological systems are highly robust and dependable systems. Depending on their level of complexity, they range from highly decentralized systems to systems characterized by a combination of centralized and decentralized action and control. They are composed by simple entities that combine together to form more complex, self-contained structural and functional entities. The more evolved a biological system is, the higher the diversity, complexity and heterogeneity of its elements. Selforganisation exists, however, even in simple homogeneous systems. The clue to this is embedded in the agents’ information transmission and processing capabilities, and in their ability to act based on this information. Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic instructions for the development and functioning of living organisms. Genes are not only DNA segment carriers, but also carriers of fundamental information that can be read, processed and used by any part of an organism, and that is transmitted from one generation to another. Whether the DNA repetition and redundancy within the genome is functional rather than truly redundant is still an open and controversial issue [4]. A Self-organisation and emergence in biological systems Self-organisation is intrinsically related to emergence [5]. Within biological systems, we can distinguish between individual self-organisation and collective self-organisation. Examples of individual self-organisation in biological systems include the immune system of mammalians, the regeneration (self-healing) of cells, and the brain behaviour [5]. For collective self-organisation, a fundamental question is how large-scale patterns are generated by the actions and interactions of the individual components [6]. Examples of mathematical modelling being used to model and validate collective self-organisation include [6]: • The wave-like front of migrating wildebeest herds; • The generation and use of collective trail systems of animals, including humans; • The collective behaviour of humans within crowds; • Fish schools and bird flocks.
A qualitative insight obtained through modelling consists of the fact that there are costs and benefits associated to individuals being in groups. Grouping may benefit foraging by allowing transfer of information about resources. It may also decrease an individual’s risk of being targeted by a predator. On the other hand, a group is more visible than an individual. Emergence may be illustrated by the optimality of the path followed by the ants foraging for food [5], or by the impact of various grouping strategies chosen by the individuals on the group [6]. For example, group size was shown to be contextdependent in fish populations: individuals increase their probability of being in smallest group sizes when food is around (detected through food odour), and they increase their probability of being in larger group sizes in the presence of alarm substance [6]. Therefore, the fish seemed to be trading off foraging benefits and safety in their movement decisions. B. Brain-Mind Much of the complex human behaviour has been achieved through evolution, growth, selection and learning, at both individual and species levels [7, 8]. Brain architecture: The nervous system has central and peripheral parts [7, 8, 9]. It contains only two main categories of cells: neurons and glial cells. Neurons represent the information-processing and signalling elements and come in a variety of sizes, but they are all variations of the same theme. The glial cells play a variety of supporting roles. “The human brain is composed of approximately 100 billion neurons, and a typical neuron has approximately 5000-200,000 synapses [9]”. There might be as many as 10,000 types of neurons. Furthermore, diversity exists between individual cells within a neuronal subtype. “However, how and when neuronal diversity is generated remains unknown [9]”. Brain functionality: The selective and switching attentions are types of activities often performed by the brain. The long-term and working memory concepts are used by specialists to explain these types of activities. Algorithms prior learned by the brain are performed. Neuroimaging techniques have been used to determine the neural areas selected during certain cognitive tasks. Such patterns of activation provide a map of brain regions that mediate task performance. “The complexity of the human brain permits the development of sophisticated behavioural repertoires, such as language, tool use, self-awareness, symbolic thought, cultural learning and consciousness. Partly owing to the uniqueness of the neuronal heterogeneity and interconnections in our brains, each human is different [9]”. Robustness and dependability: Sleep and self-healing are two main features that determine robustness and dependability in biological systems. A relatively recent finding has shown that sleep and memory are interdependent [10]. Sleep represents a behavioural state that takes about a third of human lives. Sleep fosters conservation of energy and information and repair of injury. Sleep or rest are encountered in organisms of various degrees of complexity, ranging from fish and birds to mammals. Furthermore, sleep deprivation results in the disruption of metabolic and caloric homeostatis and of
immuno-competence, and, if prolonged, in death. Also, the hypothesis whether a similarity between the nervous and immune systems exists has been explored in recent studies [9]. Biological systems are consistently dependable at converting their global objectives into local objectives, at continuously monitoring themselves, detecting when they have deviated from the desired objectives, and at defining and taking real-time corrective actions. IV. SUPPLY CHAINS Supply chains (SC) are defined as designed, network-based systems aimed at providing services or products to end customers [11]. A supply chain instance contains entities such as suppliers, manufacturers and service providers, distributors, retailers, providers and end customers. Bi-directional material, information and money flows are established between the elements of the supply chain. Furthermore, contractual, relationship and cultural links control the behaviour of the chain. Each entity may belong to more than one supply chain. Physical distribution and globalisation are important features which have been facilitated by the Internet. Main concepts used in modern supply chain management include Just-in-Time, agile and lean manufacturing, and mass customisation. These have been further facilitated by the advent and expansion of the Internet. The supply chain objectives are to maximise added-value and optimize quality, cost, speed, dependability and flexibility [12]. These objectives are translated into specific performance measures such as resource utilisation, resource flexibility (agility), volume of work in progress, customer satisfaction, and product and process quality measures. The main global sources of complexity in supply chain systems are variety and uncertainty [13, 14, 15]. Two main types of complexity, structural and operational, have been identified [13, 16]. Structural complexity is associated with the amount of information needed to schedule the required products on the existing resources. Operational complexity is associated with the uncertainty at the operational stage (an illustrative sketch of an entropy-based index of this kind is given below). Variety classes include product, resource and task variety. Uncertainty classes include resource breakdowns, unreliable suppliers, customer order changes and quality problems. These appear at all stages of supply chains and, due to the close interdependencies propagated through the network structure, can, if not properly detected and controlled, negatively affect the whole supply chain. The human dimension further contributes to the complexity and flexibility of supply chains, through decision-making, subjective interpretation and understanding of rules and absenteeism. Unless the systems are fully automated, people such as managers and schedulers monitor the system, analyse its performance, and make online decisions to correct or amend its behaviour, if needed. In many cases they may use Information Systems to a greater or lesser extent during these stages.
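The structural and operational complexity notions introduced above are quantified in the cited work with entropy-based measures. The following is a simplified, illustrative sketch only: it computes a Shannon-entropy index over resource states, with hypothetical state probabilities, and deliberately glosses over the finer distinctions between planned and unplanned states made in [13, 16].

// Illustrative sketch: a Shannon-entropy index in the spirit of the
// entropy-based complexity measures cited in the text. The states and their
// probabilities are hypothetical placeholders.
public class ComplexityIndex {
    // H = -sum_i p_i * log2(p_i), in bits; higher H means more information is
    // needed to describe (or schedule around) the resource's behaviour.
    static double entropyBits(double[] probabilities) {
        double h = 0.0;
        for (double p : probabilities) {
            if (p > 0.0) h -= p * (Math.log(p) / Math.log(2.0));
        }
        return h;
    }

    public static void main(String[] args) {
        // Hypothetical states of one resource: producing, idle, broken down, set-up.
        double[] plannedStates  = {0.85, 0.10, 0.00, 0.05};  // scheduled behaviour
        double[] observedStates = {0.60, 0.15, 0.15, 0.10};  // behaviour with disturbances
        System.out.printf("Planned-state index:  %.2f bits%n", entropyBits(plannedStates));
        System.out.printf("Observed-state index: %.2f bits%n", entropyBits(observedStates));
    }
}

The observed (disturbed) profile yields a higher index than the planned one, which is the sense in which uncertainty at the operational stage adds complexity.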
Recent results used information theory to identify and support methods for managing operational complexity in the supply chain, such as [14]: • Exporting operational complexity to other organisations; • Charging for the service of coping with imported complexity, through penalty charges or premium rates for satisfying variations in customer order; • Investing in precautionary systems that work to avoid complexity generation, such as holding stock, schedule with excess capacity, investing in a more advanced Information System, or more frequent and effective decision-making; • Investing in sufficient resources to absorb complexity. Contractual agreements should be used to define procedures to be used for acknowledging, quantifying and controlling these complexity transfers at the supplier-customer interfaces. Furthermore, Frizelle and Efstathiou [16] have shown that greater supply chain integration reduces operational complexity, but may increase structural complexity. A possible way to guard against the latter is to increase the frequency of information transfers. Greater integration also reduces inventory levels. These results were validated in two major British supply chain case-studies [14, 15, 16]. The concepts of absorbing and exporting complexity were also illustrated by these results. An important finding of these case-studies is that an integrated supply chain requires in-built spare capacity if it is to remain manageable. In terms of performance measures, structurally integrated chains impose the customer’s measures on the supplier. However, the location of bottlenecks can destroy integration. Bottlenecks need to be dynamically identified and appropriately managed – in order to be stable, every resource utilised by the supply chain must never be loaded to capacity. A. Control Various Information Systems (IS) have been proposed throughout the years for managing and controlling individual entities or overall supply chains [17, 18]. Examples of functions that such systems contain include forecasting, planning, scheduling and performance analysis. The implementation of such systems usually takes many years and costs significantly more than initially estimated. Commitment and trust are essential to their success. Many lessons can be learned from both the successful and unsuccessful Information System implementation stories [19, 20]. Several of the difficulties and problems associated with the introduction of a new Information System include [19, 20, 21]: • Choice of IS based on an incomplete specification of the company’s or supply chain’s needs; • Inaccurate data being supplied to the new IS; • Overall implementation costs, time and resources significantly larger than originally planned or estimated; • Poor implementation; • No or insufficient end-user involvement at various implementation stages and under-committing resources, which lead to lack of trust and motivation.
B. Self-organisation and Emergence in Supply Chains
Within supply chains, people often use the information locally available and their available connections to establish and influence social and work relationships and to determine the performance of supply chains. The people are the self-organising component of supply chains. The Bullwhip effect represents the amplification of demand fluctuations from the bottom to the top of the supply chain, i.e. from retailer to factory. The variance of orders may be larger than that of sales, and the distortion tends to increase as one moves up the supply chain. As demand increases, firms order more in anticipation of further increases, artificially communicating high levels of demand. Therefore, the Bullwhip effect is an unexpected and rather undesirable example of human-induced self-organisation in supply chains [11, 22]. Causes of the Bullwhip effect include batch ordering, demand forecasting, rationing and shortage gaming, and price fluctuations and promotions (a toy numerical illustration of this amplification is sketched at the end of this subsection). The Bullwhip effect can be reduced by providing more accurate information on orders and lead time, through Electronic Data Interchange (EDI) and Computer-Aided Ordering, demand data sharing, mixed order transfers, and smaller and more frequent ordering, in order to reduce demand variability. Mechanisms for reinforcing customer honesty include more stringent time fencing, i.e. placing restrictions or penalties on customers who change their orders. Another example of self-organisation in supply chains is the informal communication exchanges between people at various supply chain levels. Examples of these include transmitting information related to unplanned order changes, when an order may be needed later, earlier, or in a different volume/mix than originally planned. Such information exchanges and negotiations often take place, with the facility manager or scheduler deciding to stop an existing job and reschedule a new one in order to satisfy the customers. Various understandings of this type of behaviour may exist within organizations. For example, the scheduler, who is at the interface with the customers, perceives such changes as positive, and considers that the ultimate aim is to satisfy the customers. The company managers, however, may consider that these changes destabilise the system and have a negative impact on other company-specific performance measures, such as schedule adherence and resource utilisation. As a consequence, conflicts may occur and need to be addressed. The underlying reason for these problems lies in the manner in which company objectives and measures are translated, understood and imposed or adopted at local level. Are these objectives feasible? Are the measures transparent and directly related to the specified objectives? If changes in customer orders have to be implemented, who meets the additional costs involved by rescheduling, management of work-in-progress jobs or overtime?
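Returning to the Bullwhip effect described earlier in this subsection, the amplification itself is easy to reproduce in a toy model. The sketch below is not one of the models from the cited literature; it simply chains three stages that each over-react to the latest change in the orders they receive, and prints the empirical order variance at each stage, which grows as one moves up the chain.

import java.util.Random;

// Toy illustration of the Bullwhip effect (not a model from the cited literature).
// Each upstream stage over-reacts to changes in the orders it receives:
// order(t) = incoming(t) + THETA * (incoming(t) - incoming(t-1)), floored at zero.
public class BullwhipToy {
    static final double THETA = 0.8;   // hypothetical over-reaction factor

    static double[] stageOrders(double[] incoming) {
        double[] out = new double[incoming.length];
        out[0] = incoming[0];
        for (int t = 1; t < incoming.length; t++) {
            out[t] = Math.max(0.0, incoming[t] + THETA * (incoming[t] - incoming[t - 1]));
        }
        return out;
    }

    static double variance(double[] x) {
        double mean = 0.0, sq = 0.0;
        for (double v : x) mean += v;
        mean /= x.length;
        for (double v : x) sq += (v - mean) * (v - mean);
        return sq / x.length;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[] demand = new double[2000];
        for (int t = 0; t < demand.length; t++) demand[t] = 100 + rnd.nextGaussian() * 10;

        System.out.printf("Customer demand variance: %.1f%n", variance(demand));
        double[] flow = demand;
        for (String name : new String[]{"Retailer", "Wholesaler", "Factory"}) {
            flow = stageOrders(flow);
            System.out.printf("%s order variance: %.1f%n", name, variance(flow));
        }
    }
}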
V. TRANSFERABLE LESSONS TO AUTONOMIC COMPUTING
This section summarises, compares and emphasises some of the main lessons that can be transferred from biological and supply chain networks to autonomic computing. As stated in [23], “It is incredibly important that we bring the two disciplines of information technology and life sciences together”. The first lesson that deserves mentioning is the elegant simplicity and generic features found in biological systems. For example, there are just two main types of cells in the human nervous system. Autonomy and self-organisation are embedded at each level. Modularity and layers are hidden architectural properties. What is visible and accessible is a coherent, synchronised and dependable functionality, rather than the overall architectural and networking details. Biological systems solutions are scalable, generic, and easy to model and describe. Nevertheless (and unsurprisingly), the architectural and functional complexity of the nervous system is not yet fully understood. Concepts that have more meaning in supply chain networks than in biological systems include customers, the need to maximise the added value and to reduce waste. On the other hand, biological systems are more focused and selfless in their quest for survival. Growth and evolution are embedded in such systems, whilst reward is also a natural feature. Whilst these features cannot be translated meaningfully into autonomic computing systems, the methods used to translate global objectives into local objectives, and to identify and solve problems in biological systems are features which are worth exploring and understanding better. Biological systems perform consistently more reliably than designed systems. They are better at identifying and managing critical situations and conflict. One of the main differences between biological and supply chain networks is that the first take a holistic, unified approach to a given objective, whilst the latter often fail in converting the global objective into local tasks. A myriad of sensors exist in biological systems. They collect information continuously, process it in parallel and in real-time, and decisions are made, both locally and globally. Furthermore, even the simplest forms and instances of biological systems exhibit self-organisation, self-repair and adaptability. Both a holistic and rigorous approach and a thorough understanding of the issues to be modelled and implemented are prerequisites for the success of this journey of defining, validating and implementing autonomic computing systems. This vision builds on the ideas presented in enthusiastic and inspirational research papers such as [1, 23]. In more advanced biological systems, the unity of objectives is obtained via centralized control. The global objectives are transferred and interpreted into local objectives and actions. This is facilitated by the fact that all the resources are contained in a unified manner in a single entity, the body. A dependable and effective operation of autonomic computing systems is envisaged [2, 3]. Complex and diverse issues need to be considered in order to implement this objective. For example, a functional equivalent of sleep should
be programmed and embedded in autonomic computing systems. A feasible and practical solution would need to be found if the systems were required to perform on a continuous basis. Another question worth considering is what the feasible and required level of autonomy in autonomic computing systems would be. Human supervision is not necessarily a negative feature, and acknowledging this aspect sooner rather than later could only help.
An information set similar to DNA in biological systems should be identified and formally defined for autonomic computing systems. A library of such information sets could then be defined and updated, the elements of which would formally define the properties of the autonomous elements that use such a DNA-like information set. This would solve some aspects of the generic-solution issue in a cost-effective way, and would allow informed design decisions to be made. Standards and policies for connecting various types of autonomic computing systems would be defined based on this crucial DNA-like information. The value of information cannot be overstated for either biological or supply chain networks.
The trade-offs between generic and specific solutions also need to be considered. How generic could and should the agents be? What information should be transmitted and processed, and how should this be done? How should actions be taken based on this information? Should either or both of centralised and decentralised information, decision-making and control be used? How generic should the autonomic computing system as a whole be? Economic and computational costs are important issues, and the trade-offs between these and the desired functionality need to be considered carefully.
A framework that would allow modelling and reasoning about emergence would represent a major step forward towards modelling both the relationship between individual agent states and the system’s emergent behaviour, and the reverse process of identifying the possible agent states starting from an emergent system state. A prerequisite for the existence of emergence is that the system exhibits at least two observable levels. This is confirmed by [4], which emphasises that “…causation does not simply run in one direction […]. Without understanding the higher level logic in its own right, the lower level data will often be just that: masses and masses of unexplained data.” Could we define generic principles which would describe how we should connect two or more autonomic computing systems so as to preserve and possibly enhance their autonomous properties? What do they need to learn about each other in order to enhance their functionality?
In terms of qualitative insights, there is a limit to the extent to which the performance of a poorly designed network can be improved. Also, it is hard to completely separate the structural and operational properties, in either the design or analysis stages, as these properties are closely connected. There are several trade-offs that need to be considered when designing a system, examples of which include flexibility versus complexity versus costs. These trade-offs have been superbly
solved, implemented and refined in natural systems through evolution. Possible research questions focused more on the agent-based level of interaction would need to address issues related to the formal specification of agents at the structural and behavioural level, such as: How should an agent be defined? How is information represented? What are the information processing capabilities of an agent? What are the rules that agents use? What are the features distinguishing agents at different levels: access to different sources of information, different decision-making algorithms, and different channel capacities? Should novel computational concepts such as quantum computing be considered to meet some of these challenging requirements [24]?
V. CONCLUSIONS AND FURTHER WORK DIRECTIONS
This paper has explored but a few transferable lessons from biological and supply chain networks. Examples of such lessons include the elegant simplicity of the fundamental elements of biological systems and the richness and essential role of DNA as the main mechanism of transmitting key genetic information. Many questions have been raised and would need to be addressed. The next stage should utilise the current knowledge and the modelling and reasoning capabilities to address more complex design questions, such as: What are the cost, performance and control implications, in terms of optimality and robustness, of different design and management policies? What mechanisms may be implemented to detect, react to and recover from critical situations? The fundamental aim of this work is to identify and pursue means of bridging the gap between the science of biological systems and computer science, and to advance towards a formal theory of emergence.
REFERENCES
[1] Jeffrey O. Kephart, “Research challenges in autonomic computing”, Proceedings of the 27th International Conference on Software Engineering, 2005, pp. 15-22, http://portal.acm.org/citation.cfm?id=1062455.1062464
[2] Jeffrey O. Kephart and D. M. Chess, “The vision of autonomic computing”, IEEE Computer, vol. 36, January 2003, pp. 41-50.
[3] D. Johnson-Watt, “Under new management”, ACM Queue, vol. 4, iss. 2, pp. 50-58, March 2006.
[4] E. J. Crampin, M. Halstead, P. Hunter, P. Nielsen, D. Noble, N. Smith and M. Tawhai, “Computational physiology and the physiome project”, Experimental Physiology, The Physiological Society, vol. 89, iss. 1, pp. 1-26, 2004.
[5] G. di M. Serugendo, M.-P. Gleizes and A. Karageorgos, “Self-organisation and emergence in MAS: An overview”, Informatica, vol. 30, pp. 45–54, 2006.
[6] I. D. Couzin and J. Krause, “Self-organization and collective behavior in vertebrates”, Advances in the Study of Behavior, vol. 32, pp. 1–75, 2003.
[7] J. Nolte, The Human Brain: An Introduction to its Functional Anatomy, Mosby, Inc., Elsevier Science, USA, 2nd ed., 2002.
[8] L. R. Squire, F. E. Bloom, S. K. McConnell, J. L. Roberts, N. C. Spitzer and M. J. Zigmond, Eds., Fundamental Neuroscience, Academic Press, Elsevier Science, 2nd ed., 2003.
[9] A. R. Muotri and F. H. Gage, “Generation of neuronal variability and complexity”, Nature, vol. 441, pp. 1087–1093, June 2006.
[10] J. A. Hobson and E. F. Pace-Schott, “Sleep, Dreaming and Wakefulness”, Chapter 42 in Fundamental Neuroscience, Academic Press, Elsevier Science, 2nd ed., pp. 1085–1108, 2003.
[11] D. Simchi-Levi, P. Kaminsky and E. Simchi-Levi, Designing and Managing the Supply Chain: Concepts, Strategies and Case Studies, McGraw Hill Higher Education, 2000.
[12] N. Slack, S. Chambers and R. Johnston, Operations Management, Prentice Hall, London, UK, 2001.
[13] A. Calinescu, Manufacturing Complexity: An Integrative Information-Theoretic Approach, D.Phil. Thesis, University of Oxford, 2002.
[14] S. Sivadasan, J. Efstathiou, A. Calinescu and L. Huaccho Huatuco, “Policies for managing operational complexity in the supply chain”, in Tackling Industrial Complexity: the ideas that make a difference, G. Frizelle and H. Richards, Eds., 9–10 April 2002, Cambridge University Institute for Manufacturing, UK, pp. 549-555, http://www.ifm.eng.cam.ac.uk/mcn/pdf_files/part9_4.pdf.
[15] Y. Wu, G. Frizelle and J. Efstathiou, “A study on the cost of complexity in customer-supplier systems”, International Journal of Production Economics, vol. 106, pp. 217–229, 2007.
[16] G. Frizelle and J. Efstathiou, “The urge to integrate”, Manufacturing Engineer, vol. 82, pp. 10–13, August/September 2003.
[17] K. C. Laudon and J. P. Laudon, Management Information Systems: Managing the Digital Firm, 7th ed., Prentice-Hall International, 2002.
[18] J. R. K. Rainer, E. Turban and R. E. Potter, Introduction to Information Systems: Supporting and Transforming Business, John Wiley & Sons, 2007.
[19] D. Simchi-Levi, P. Kaminsky and E. Simchi-Levi, “Information Technology for Supply Chain Management”, in Designing and Managing the Supply Chain: Concepts, Strategies and Case Studies, McGraw Hill Higher Education, pp. 261–292, 2000.
[20] E. Turban, J. R. K. Rainer and R. E. Potter, Chapter 8: “How FoxMeyer Went Out of Business, Due to a Failed ERP System”, Introduction to Information Technology, 3rd ed., Wiley Higher Education, 2004, http://www3.interscience.wiley.com:8100/legacy/college/turban/047134809/add_material/ch08/foxmeyer.pdf.
[21] R. M. Kanter, “The ten deadly mistakes of wanna-dots”, Harvard Business Review, vol. 79, pp. 91–100, January 2001.
[22] H. L. Lee, V. Padmanabhan and S. Whang, “Information distortion in a supply chain: The bullwhip effect”, Management Science, vol. 43, iss. 4, pp. 546–558, 1997.
[23] C. Kovac, “Computing in the age of genome”, The British Computer Society Computer Journal, vol. 46, iss. 6, pp. 593–597, 2003.
[24] R. Srinivasan and H. P. Raghunandan, “On the existence of truly autonomic computing systems and the link with quantum computing”, pp. 1-13, 2004, http://arxiv.org/abs/cs.LO/0411094.
Experiences from an Empirical Study of Programs Code Coverage
Anna Derezińska
Institute of Computer Science, Warsaw University of Technology
15/19 Nowowiejska Street, Warsaw, 00-665, Poland
Abstract – The paper is devoted to functional and structural testing of programs. Experimental results for a set of programs are presented. The experiments cover the selection of functional tests, the analysis of function and line coverage, and the optimization of test suites. The comparison of code coverage results and the selection of the most effective tests are discussed in relation to the test-first approach to program development. The types of code not covered by the tests are classified into different categories.
I. INTRODUCTION
Analysis of code coverage supports different activities of software development, such as assessment of test adequacy, regression testing, test suite minimization, test case prioritization, etc. [1-6]. It was shown that an increase in code coverage is likely to increase software reliability [7], although this correlation does not always hold [8]. Experimental studies showed that test suites built to satisfy coverage criteria are better than random test suites as far as cost-effectiveness is concerned [6]. The fault detection effectiveness of a test suite depends not only on its size, but also on its coverage. Code coverage increases as a function of the number of test cases executed, but the increase can have different characteristics [9].
This work reports on empirical studies investigating code coverage conducted on a group of programs implemented in the C or C++ language. For each program a basic set of functional test cases was prepared on the basis of its requirements. Tests were devoted to typical usage of a program and to erroneous situations. The function and line code coverage was measured for the test suites using the IBM Rational PureCoverage tool [10]. It was analyzed to what extent the functional tests, which could be designed before code implementation, also satisfied code coverage criteria. With tool support, additional tests were developed in order to increase code coverage. The number of additional test runs and the achieved coverage improvement are discussed.
Achieving high coverage levels (e.g., close to 100%) seems to be effective in terms of fault detection [6]. A high level of coverage is especially important for faults that are highly difficult to detect. In the literature, encouragement for achieving high coverage can be found. However, we cannot easily determine which coverage level is sufficiently high. The limitations of possible and reasonable coverage improvement were investigated in the experiments.
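As a minimal illustration of the two measures used throughout the paper (and defined formally in Sec. III.B), function and line coverage can be computed from the sets of items executed by a test suite. The sketch below uses hypothetical names and numbers; it is not the output format of the tool.

def coverage_percent(covered, total):
    """Coverage for one metric: |covered items| / |all items|, in percent."""
    return 100.0 * len(covered) / len(total)

# Hypothetical developer code of one program (library code excluded).
all_functions = {"main", "parse", "evaluate", "report"}
all_lines = set(range(1, 101))            # 100 executable lines
# Hypothetical items executed by a test suite.
hit_functions = {"main", "parse", "evaluate"}
hit_lines = set(range(1, 88))             # lines 1..87 executed

print("FCov = %.1f%%" % coverage_percent(hit_functions, all_functions))   # 75.0%
print("LCov = %.1f%%" % coverage_percent(hit_lines, all_lines))           # 87.0%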
Reduction of test suites based on coverage criteria was studied in many works using different filtering techniques [4,5]. The general idea is to select a subset of tests that is as good as the whole original test suite, under given requirements. The selection of effective test suites that satisfy different criteria is especially important when the same tests are run many times, e.g., in incremental program development and regression testing. Computing the optimal solution can be cost-ineffective, because the problem is an instance of an NP-complete problem [11]. In the experiments we studied the quasi-optimization of test suites, which have a minimal (or near-minimal) number of tests but assure high code coverage.
Test-driven development (TDD) has gained popularity as one of the elements of the Extreme Programming (XP) method [12]. It is claimed to improve test coverage [13] and testing productivity [14]. On the other hand, the average quality of such programs is not evidently higher than that of programs developed using other approaches [14,15]. One of the key practices of TDD is that programmers write functional tests before code implementation, the so-called test-first approach. This testing strategy can be used not only in the context of agile methodologies like XP. We did not directly study TDD methods in the experiments, but some aspects of the test-first method were visible in the context of the code coverage results. It was shown that test-first development was not sufficient to obtain high code coverage. Moreover, an essential part of the tests included in the optimized test suites was developed after code implementation. It seems that combining different approaches, e.g., test-first and iterative test-last, can give better results than using only one.
In the experiments, each test was equivalent to a single testing run of a program, possibly consisting of many testing steps. This differs from unit testing, in which many unit tests are performed in one testing run. In the analysis of the results, the efficiency of the testing process was compared in terms of the number of testing runs (tests, in short), and not in terms of the number of conditions checked within one run.
The scenario of the experiments and basic information about the examined programs are given in the next section. In Sec. III the tests of the programs are presented. Next, the results of the code coverage experiments and the improvement of test suites are analyzed. Sec. VI comprises a classification of the types of uncovered code. Conclusions and a discussion of threats to validity end the work.
II. EXPERIMENT CONFIGURATION
This section describes the steps performed during the experiments and the programs used in them.
A. Experiment scenario
Examination of the programs was performed according to the following scenario:
1. Preparation of the basic functional tests.
2. Compilation of the projects developed in the C or C++ language.
3. Execution of tests using the code analyzer IBM Rational Purify [10].
4. Correction of the faults found in the previous step (if necessary).
5. Realization of functional tests under control of a code coverage analyzer - IBM Rational PureCoverage [10]. Comparison of the numbers of calls, the numbers of called and missed functions and code lines, as well as the percentage of dynamic coverage for particular modules, files and functions. Inspection of uncovered code lines in the source code.
6. Comparison of the merged coverage results (from all test runs) with the results for selected subsets of tests.
7. Design of new test cases in order to increase code coverage (if necessary and possible). Statement of the maximal obtained code coverage.
8. Evaluation of the quasi-optimized test suite - a set of tests that has a minimal number of test cases and guarantees the maximal code coverage, i.e., that obtained in the previous step (see the notation in Sec. III.B).
9. Analysis of the uncovered code.
10. Execution of tests under control of IBM Rational Quantify [10]. Analysis of program performance, critical paths in graphs of program realization, and identification of execution bottlenecks.
11. Attempt to improve program performance (execution time) assuming the same functionality.
In the rest of this paper the results of the code coverage analysis (steps up to 9) will be discussed. Using performance analysis, the execution time was shortened only for a few programs. The execution times were generally highly influenced by the library functions used in the programs.
B. Subject programs
As subjects for the experiment we used 50 C or C++ programs. Most of the programs (34) were prepared by students of the 6th semester of Computer Science as projects in different courses, e.g., courses on compilation techniques, analysis of algorithms, object-oriented programming, programming of interactive applications, etc. Other programs (10) were prepared by students, not as course projects, but for fun or at work for professional purposes. The rest of the programs (9) were open-source projects taken from the Internet.
Analyzing the code coverage, we were mainly interested in the code written directly by the developers of the applications. Therefore we distinguished between the whole code of an application and the developer code without libraries, which
also constitute part of the application. The developer code of a program had on average about 40 functions (or methods) and about 430 lines of code (including comments). Most of the programs were small, up to 500 LOC (lines of code); nine programs were in the range of 500-745 LOC, and five programs had from 1000 to 1550 LOC. For comparison, the same programs including all library functions had on average 257 functions (or methods) and 1067 LOC. The excluded libraries were mostly the standard libraries of the C or C++ language. System calls realized during program execution were not excluded from the consideration of program coverage.
III. TESTS OF PROGRAMS
This section discusses the different types of tests that were included in the basic suites and created additionally after the preliminary code coverage analysis.
A. Basic functional tests
Basic test suites comprised functional tests prepared before analyzing the code coverage of a program, independently of the discussed experiments. These sets of tests usually consisted of 1 to 24 test cases (6.5 on average), but the considered programs were not big (Sec. II). Furthermore, a single test case corresponded to one program run and usually was defined by a whole testing scenario comprising a series of steps. Therefore the cardinalities of the suites are not comparable to typical suites of unit tests, which usually have a bigger number of smaller tests.
Tests from the basic suites were prepared by the authors before coding of an application, as a part of the project specifications (for the different courses). Other tests, and the tests for the programs from the Internet, were developed for already implemented programs. However, all the tests were created based on specifications of the program functionality and information about the program input/output interface. No detailed knowledge about the program implementation was used; therefore all tests from the basic suites can be treated as satisfying the test-first paradigm.
Basic sets of tests consisted mainly of tests checking the conditions of correct functionality of the programs. This referred to 80.6% of all basic tests (Tab. I).

TABLE I
KINDS OF TESTS (% of tests / % of programs)

                                                Basic suites   Additional tests   Extended suites
Functional tests for correct behavior           80.6 / 100     45.0 / 60          67.0 / 100
Checking of incorrect behavior:
  Incorrect functioning                         12.3 / 22      40.5 / 50          23.1 / 60
  Incorrect input/output files                   3.1 / 10      23.0 / 26           6.3 / 34
  Incorrect input parameters                     0.9 /  4      10.0 / 26           4.4 / 26
Additional, e.g. help, program identification    2.5 /  8       6.5 / 16           4.0 / 22
Performance and stress tests                     1.5 / 10         -  /  -          1.0 / 10
Many tests were created using the concepts of equivalence partitioning in black-box testing. All kinds of algorithms realized by the programs, different combinations of input parameters and different ways of communication with the programs were taken into account in the basic tests. About 13% of the tests specifically checked some boundary conditions of the data.
Tests from the basic suites were also devoted to checking program behavior in incorrect situations. In 12.3% of the tests the possibility of an erroneous realization of algorithms was checked, but this applied to only 22% of the programs. In some cases program resistance against faulty input/output files (3.1% of tests, in 10% of programs) or faulty input parameters (0.9% of tests, in 4% of programs) was checked. For some programs (8%) the basic sets also included tests checking additional program functionality, like usage of help, etc. Only 1.5% of the basic tests (in 10% of programs) were dedicated to performance measurement or stress testing under an excessive load.
B. Notation of test sets
Let us assume the following notation for the different test suites defined for a program P:
BT(P) - set of basic functional tests defined for program P,
AT(P) - set of additional tests created for program P,
FT(P) - set of all tests defined for program P,
OT(P) - selected optimized set of tests of program P, i.e., one of the possible quasi-optimal sets of tests of program P (Sec. V.B).
For any program P the following relations between the different test suites hold:
FT(P) = BT(P) ∪ AT(P),   (1)
OT(P) ⊆ FT(P),   (2)
LCov(OT(P)) = LCov(FT(P)) and FCov(OT(P)) = FCov(FT(P)),   (3)
where LCov(X(P)) is the line coverage obtained after executing program P against the set of its tests X(P), and FCov(X(P)) is the function coverage obtained after executing program P against the set of its tests X(P).
It can be observed that (1) and (2) directly imply:
|FT(P)| = |BT(P)| + |AT(P)| (as the basic and additional tests are disjoint),   (4)
|OT(P)| ≤ |FT(P)|,   (5)
where |X(P)| denotes the number of tests of program P in the set X(P).
C. Additional tests improving code coverage
Additional tests (AT) were created after analysis of the code that was not covered by the basic tests. The distribution of the types of these extra tests is shown in Tab. I. An important part of the tests dealt with testing of correct functionality that was not taken into account in the basic tests (45% of additional tests, in 60% of programs). It should be noted that additional test cases often combined different tested subjects (e.g., checking both correct and incorrect program behavior). In such cases the same test can be counted under different test types in Tab. I. Boundary conditions were checked in 22.5% of
test cases (and 48% of programs), which is more than in the basic tests. More than half of the additional tests referred to the checking of faulty situations. Errors in the realization of algorithms were checked using 40.5% of the tests (in 50% of programs); in 23% of the tests the input/output files were checked (in 26% of programs); and 10% of the tests were devoted to checking of call parameters (also in 26% of programs). This showed that faulty situations were not sufficiently considered in the basic test suites. In order to trigger the faulty situations, some additional tests required changes to the running environment, system settings, etc.; for example, executing a program when a swap file was blocked/locked, or with limited access rights. In order to increase code coverage, other program options were also checked by additional tests (16% of tests, in 4% of programs).
Additional tests of a program, AT(P), were often created by modifying, enhancing and combining existing basic tests of the program, BT(P). In further steps this resulted in an easier optimization of the number of testing runs necessary to obtain the desired code coverage. The two last columns of Tab. I summarize the types of tests for the entire extended sets of tests FT(P).
IV. RESULTS OF CODE COVERAGE ANALYSIS
A. Code coverage for basic test suites
Function/method coverage was high even for the basic test suites (BT), equal to 92.4% on average (Tab. II). For half of the programs (52%) all functions were covered. Not full coverage, but above 90%, was obtained for 20% of the programs; coverage in the range of 80-90% for 12% of the programs; and in the range of 70-80% also for 12% of the programs. Only for two programs was the coverage lower (63% and 50%, respectively).
Line coverage for the basic test suites was 87.2% on average. Almost half of the programs (46%) had line coverage higher than 90%, but only one program reached 100%. Line coverage in the range of 80-90% was observed for 34% of the programs. In the rest of the programs (20%) less than 80% of the code lines were covered.
B. Maximal code coverage
The code coverage was in most cases increased using the additionally created tests (AT). The maximal function/method coverage obtained for the extended test suites was equal to 95.7% on average. Most of the programs were covered above 90%, including 58% of programs fully covered (100%) and 25% not fully (less than 100%). Coverage in the range of 80-90% was obtained for 8 programs, and only 3 programs had less than 80% of functions covered.

TABLE II
CODE COVERAGE IN [%]

                        Average coverage for     Average coverage for     Coverage
                        basic test suites        extended test suites     increase
                        Function    Line         Function    Line         Function    Line
All programs            92.4        87.2         95.7        93.6         6.7         6.5
Open-source programs    86.2        83.0         95.0        93.0         13.3        10.0
The maximal line coverage obtained for the programs was equal to 93.6% on average. Full line coverage (100%) was achieved in 10% of the programs. Most of the programs (68%) had most of the code covered (above 90%, but not fully). Coverage of 20% of the programs was in the range of 80-90%, and only one program had less than 80% of lines covered. As noted in Sec. II, code coverage was analyzed and optimized only for the code developed by the programmer, without library code. The maximal code coverage obtained for the whole programs (i.e., including the library code) was equal to 84.5% for functions and 77.8% for code lines.
C. Improvement of code coverage
The averaged improvement of code coverage is shown in the two last columns of Tab. II. They include the differences between the code coverage obtained for all tests and for the basic tests. On average the coverage increased by 3.2% for functions and by 6.4% for code lines, respectively. However, if the coverage was already equal to 100% for the basic tests of a program, no improvement was possible. Therefore the increase was also calculated only over the remaining programs. This effective, averaged increase of coverage was equal to 6.7% for functions and 6.5% for code lines. As expected, the biggest improvement (above 10%) was obtained for the programs that had coverage below 80% for the basic tests. In most cases, the lower the initial line coverage LCov(BT(P)), the higher the increase of coverage. However, there were also programs with uncovered code (coverage of about 60 or 70%) for which neither function coverage nor line coverage was improved.
Most of the programs (41) were tested by their authors or by colleagues who could cooperate with them. Only the open-source programs taken from the Internet (9) were tested without any interaction with the program designers. Coverage results for the basic tests were worse in the latter case (Tab. II). However, after the creation of additional tests the averaged results for the programs from the Internet were about 95.0% for functions and 93.0% for code lines, i.e., similar to the results for the other programs.
V. MINIMIZATION OF THE NUMBER OF TESTS IN TEST SUITES
The number of test cases in a test suite is a simple but not very precise measure of a testing process. The complexity of the test cases differed even among the tests of the same program. Furthermore, the low number of tests made a statistical analysis of the collected results hard. Comparison of the number of tests gave, however, a basic estimation of the effort required to perform the program tests, because each test corresponded to one execution run of a program. During the experiments the following questions were considered:
- How many more tests have to be executed in order to increase (maximize) the code coverage of a program?
- How many tests can be omitted without losing the maximal obtained code coverage?
- To what extent are the additional tests (i.e., those designed in order to increase code coverage) used in the optimized test suites?
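The greedy, coverage-preserving selection described in Sec. V.B below can be sketched as follows. This is an illustrative approximation of the underlying set-cover heuristic, with hypothetical test names and line sets; it is not the exact procedure or tooling used in the experiments.

def minimize_suite(coverage):
    """Greedy approximation of a minimal test subset that preserves the
    line coverage of the full suite.  Exact minimization is an instance
    of set cover (NP-complete), so the test covering the most
    still-uncovered lines is picked repeatedly.

    `coverage` maps a test name to the set of lines it covers.
    """
    remaining = set().union(*coverage.values())    # lines covered by all tests
    selected = []
    while remaining:
        best = max(coverage, key=lambda t: len(coverage[t] & remaining))
        selected.append(best)
        remaining -= coverage[best]
    return selected

# Hypothetical per-test line coverage for one program.
coverage = {"t1": {1, 2, 3, 4, 5},
            "t2": {4, 5, 6},
            "t3": {6, 7},
            "t4": {2, 3}}
optimized = minimize_suite(coverage)
print(optimized)                                                 # ['t1', 't3']
full = set().union(*coverage.values())
print(set().union(*(coverage[t] for t in optimized)) == full)    # True

Function coverage can be preserved in the same way by replacing the line sets with sets of covered functions.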
A. Basic and extended suites of tests
The average numbers of tests are shown in Tab. III. Basic sets of functional tests comprised 6.5 test cases on average (from 1 to 24 tests) (column 1 in Tab. III). Only for one program did the basic set give 100% function coverage and 100% line coverage. The participants of the experiments tried to design additional tests for all remaining programs.

TABLE III
IMPROVEMENT OF TEST SUITES

Average number of tests:
  1. Basic suite: 6.5
  2. Extra tests: 4.0
  3. Sum of tests: 10.5
  4. Optimized suite: 7.3
  5. Extra tests in optimized suites: 3.1
Percentages:
  6. (4/3): 70%
  7. (5/3): 30%
  8. (5/4): 42%
For most of the programs, few additional tests were created (on average 4 per program). Only for 14% of the programs were no additional tests built that improved the code coverage. The maximal obtained code coverage was achieved with 10.5 tests per program on average (from 2 to 26 tests). These extended test suites can be treated as the “best” suites according to the code coverage criteria.
B. Optimized suites of tests
The optimized suites OT(P) were intended to include the minimal possible number of test cases. At the same time these suites should give the same maximal code coverage as obtained for all tests in the extended test suite. In general, more than one test suite satisfying the above conditions can exist for a program. In such cases one of these minimal test suites was considered.
Selecting a minimal-size, coverage-maximizing subset of a test suite is in general an instance of the set-cover problem and is therefore NP-complete [11]. In practice, different heuristics can be used [4,5]. In the discussed experiments the numbers of tests per program were small; consequently it was easy to compare the coverage for different, representative test subsets. When the number of different combinations of tests was too high, test selection was based on a greedy approach. We started with a core of tests covering the biggest number of lines (or functions, respectively). Next, only those tests were added to the optimized sets that were indispensable to cover the remaining uncovered parts of the code. In case of many possibilities, the tests covering the bigger parts of the code were selected first. As a result, the optimized suites can be treated as near-minimal sets that give the same coverage as was obtained using all tests from the extended and improved test suites.
For 26% of the programs the optimized sets were the same as the full ones, OT(P) = FT(P). For the rest of the programs a lower number of test runs was sufficient to obtain the same code coverage. The average number of tests, and therefore of test runs, in these optimized suites was equal to 7.3 (column 4 of Tab. III). In comparison to the basic tests, only 1.3 additional test runs were necessary in order to obtain the desired code coverage. For 24% of the programs the obtained optimized suites had even fewer tests than the basic test suites. This was sometimes achieved by
making a single test case more complex, which resulted in a lower number of program runs.
The essential part of the additional tests was included in the optimized test suites (on average 3.1 tests out of 4 - column 5 of Tab. III). For most of the programs (64%) all additional tests were used in the optimized suites. However, it should be noted that many additional tests were designed as modifications of some basic tests (Sec. III.C). The numbers of tests are compared in columns 6, 7 and 8 of Tab. III. Column 6 gives the ratio of the number of optimized tests to all tests. On average, 70% of all tests were counted into the optimized suites. The ratio of the additional tests included in the optimized suites to all tests is calculated in column 7. It shows that 30% of the tests that were indispensable for the maximal code coverage were created after analysis of the implemented code. This value varies from 0 to 80% for particular programs. The last column indicates that many of the tests (42%) in the optimized suites were additional tests.
VI. ANALYSIS OF UNCOVERED CODE
The amounts of uncovered code are summarized in Tab. IV. After testing with the basic tests, 7.6% of functions and 12.8% of lines (for the code without libraries) remained uncovered. After extension with the additional tests, 4.3% of functions and 6.4% of code lines were still uncovered. The types of uncovered code were classified into 6 categories (Tab. V). The column “How many?” shows the number of programs in which particular types of code remained uncovered. The last column gives the ratio of the programs classified into that category to all programs having uncovered code.
In most cases, the uncovered code was very hard or impossible to cover. An essential part of this code related
to the handling of different faults. For dealing with faults, exception handling (category I) or other checking mechanisms were used. A typical programming solution is checking the values returned by system functions and handling an unsuccessful result (e.g., lack of memory) with exceptions. However, forcing such exceptions can be very inconvenient. Therefore the exception-handling code remained uncovered in 17 programs (37.8% of those with uncovered code). Attempting to cover such code may not be economically justified. This part of the code was usually limited and could be verified using fault injection methods or code inspection.

TABLE IV
UNCOVERED CODE

                        Uncovered functions        Uncovered lines
                        Number      [%]            Number      [%]
Basic test suites       4.2         7.6            59.3        12.8
Extended test suites    2.7         4.3            36.0        6.4
The biggest part of the uncovered code corresponded to code that cannot be covered by any test case (category III). Such a situation occurred in 23 of the programs with uncovered code (51.1%). This code resulted partially from the evolution of the programs (III.1). Obsolete, unused code does not disturb program execution, but it can decrease testing efficiency and raise the cost of program maintenance. In some such cases code reengineering could be beneficial. However, much of the unused code is foreseen to be possibly used in the future (III.2, III.5). Unused code can also result from the automatic generation of code (or code templates) or from the re-usage of code. Such unused code cannot be covered, but classifying code into this category can be a non-trivial task.
TABLE V
TYPES OF UNCOVERED CODE (FOR EXTENDED AND FOR OPTIMIZED TEST SUITES)

Type of uncovered code                                                     Number of programs
                                                                           How many?   [%]
I    Exception handling (especially control of unsuccessful calls of
     system and library functions), including:                            17          37.8
I.1    Control of unsuccessful open, read, write of a file or
       input/output stream                                                 5
I.2    Lack of operational memory, disc memory                             4
II   Fault handling (without exception mechanism), code run only in
     case of a fault occurrence                                            12          26.7
III  Code never used (inaccessible), including:                            23          51.1
III.1  Surplus code (because of construction errors or program
       evolution)                                                           5
III.2  Unused code that can be used in the future (constructors,
       operators, other functions)                                         12
III.3  Automatically generated constructors, destructors                    2
III.4  Repeated (redundant) verification                                    3
III.5  Functions without body (stubs)                                       2
IV   Other difficult or conditionally accessible code, including:          15          33.3
IV.1   Branches of decisions never taken, e.g. the default branch in
       switch instructions, else constructions in menu handling             4
IV.2   Not evaluated parts of conditional expressions, e.g. in
       if (con1 && con2), if con1 is false then con2 is not evaluated       2
IV.3   Configuration-dependent code (other operating system,
       additional hardware)                                                 2
IV.4   Diagnostic functions for code development, e.g. #ifdef _DEBUG,
       dump functions                                                       7
IV.5   Code dependent on randomly generated data                            1
IV.6   Returning of a function value, e.g., WinMain                         1
V    Other code, possible but inconvenient to test (e.g., tangled
     code, help windows)                                                    3           6.7
VI   Others                                                                 3           6.7
A part of the uncovered code could be covered but was difficult to access (category IV). For example, a program would have to be run on two different operating systems or using special hardware. Code used only during development is not used in normal test runs and therefore will not be covered. The last category (others) includes, for example, virtual functions defined in classes; this is a direct consequence of programming in an object-oriented language.
VII. CONCLUSIONS
The paper presented the results of code coverage experiments on 50 C/C++ programs. The results indicated the possibility of test suite improvement and gave arguments both for and against creating tests before coding. The following observations can be derived from the presented results:
- Basic test suites (that could be developed before coding, based on requirements specifications) gave quite high code coverage: on average 92.4% for functions and 87.2% for code lines.
- Higher code coverage (95.7% for functions and 93.6% for lines) was easily obtained by inexperienced programmers supported by the tools.
- Many of the test cases (47%) included in the optimized test suites (i.e., effective for regression testing) belonged to the additional tests. They were specified only after running a program with the basic tests, and not before coding.
- The number of tests could be limited while still satisfying the same maximal code coverage. Optimization of the number of test runs can be especially beneficial for repeated testing during incremental development and regression testing.
- The code not covered by the test runs often corresponded to error handling (especially exception handling - 37.8% of programs). A part of the code was also not accessible by any tests (in 51.1% of programs) or very hard to access (33.3%).
Tools supporting code coverage analysis indicated the parts of uncovered code. Tool-supported classification of the uncovered code would be very helpful in optimizing the testing effort and in the efficient design of additional tests, where possible. It could identify the parts of the code that should be verified by other methods, e.g., fault seeding or code inspection. It would also be beneficial to distinguish the code that cannot be covered.
While performing the experiments, several threats to validity should be taken into account. The threat of mono-operational bias was lowered by conducting the experiments on many different programs. The coverage results were gathered from a bigger population than 50 programs. In the final analysis the subjects that did not conform to the defined process were discarded. The external validity of the results could be limited since the experiments were performed by students. However, it was
shown in [16] that students can be used in such experiments. Moreover, all participants of the experiment had some industrial experience (at least half a year), mostly in program development. All programs were relatively small, but the procedure scales up. The small number of tests allowed the experiments to be performed within reasonable time limits and under similar conditions.
ACKNOWLEDGMENT
This work was supported by the Polish Ministry of Science and Higher Education under grant 4297/B/T02/2007/33.
REFERENCES
[1] B. Beizer, Software Testing Techniques, 2nd ed., Van Nostrand Reinhold, 1990.
[2] S. Elbaum, D. Gable, G. Rothermel, “The impact of software evolution on code coverage information”, Proc. of IEEE Intern. Conf. on Software Maintenance, 7-9 Nov. 2001, pp. 170-179.
[3] J. J. Li, D. Weiss, H. Yee, “Code coverage guided prioritized test generation”, Information and Software Technology, no. 48, 2006, pp. 1187-1198.
[4] W. Masri, A. Podgurski, D. Leon, “An empirical study of test case filtering techniques based on exercising information flows”, IEEE Transactions on Software Engineering, vol. 33, no. 7, July 2007, pp. 454-477.
[5] D. Jeffrey, N. Gupta, “Improving fault detection capability by selectively retaining test cases during test suite reduction”, IEEE Transactions on Software Engineering, vol. 33, no. 2, Feb. 2007, pp. 108-123.
[6] J. H. Andrews, L. C. Briand, Y. Labiche, A. S. Namin, “Using mutation analysis for assessing and comparing testing coverage criteria”, IEEE Transactions on Software Engineering, vol. 32, no. 8, Aug. 2006, pp. 608-624.
[7] F. Del Frate, P. Garg, A. P. Mathur, A. Pasquini, “On the correlation between code coverage and software reliability”, Proc. of 6th Intern. Symp. on Software Reliability Engineering, 24-27 Oct. 1995, pp. 124-132.
[8] L. C. Briand, D. Pfahl, “Using simulation for assessing the real impact of test-coverage on defect-coverage”, IEEE Transactions on Reliability, vol. 49, no. 1, March 2000, pp. 60-70.
[9] S. S. Gokhale, R. E. Mullen, “From test count to code coverage using the lognormal failure rate”, Proc. of 15th Intern. Symposium on Software Reliability Engineering, ISSRE'04, 2004.
[10] IBM Rational tools: http://www-306.ibm.com/software/rational/
[11] M. R. Garey, D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman and Company, 1979.
[12] K. Beck, Extreme Programming Explained: Embrace Change, 2nd ed., Addison-Wesley, Boston, MA, USA, 2004.
[13] D. Astels, Test-Driven Development: A Practical Guide, Prentice Hall, NJ, USA, 2003.
[14] H. Erdogmus, M. Morisio, M. Torchiano, “On the effectiveness of the test-first approach to programming”, IEEE Transactions on Software Engineering, vol. 31, no. 3, March 2005, pp. 226-237.
[15] M. Siniaalto, P. Abrahamsson, “Does test-driven development improve the program code? Alarming results from a comparative case study”, Proc. of 2nd IFIP Central and East European Conference on Software Engineering Techniques, CEE-SET, Poznań, 2007, pp. 125-136.
[16] M. Höst, B. Regnell, C. Wohlin, “Using students as subjects - A comparative study of students and professionals in lead-time impact assessment”, Empirical Software Engineering, vol. 5, 2000, pp. 201-214.
A Secure and Efficient Micropayment System
Anne Nguyen and Xiang Shao
School of Information Technology, Gold Coast Campus, Griffith University, QLD 9726, Australia
Corresponding Author: [email protected]
Abstract: In this paper, we analyze some key issues concerning systems for processing Internet micropayments. At the heart of the matter is the trade-off between security and cost considerations. A set of criteria is proposed, against which a number of previously proposed and currently operational micropayment systems are reviewed. We also propose a model, called WebCoin, embodying a number of innovations that may enhance micropayment systems currently in use.
I. CHARACTERISTICS OF MICROPAYMENTS Micropayments are very small sums of money, typically used to pay for content access or for small quantities of network resources [1]. While there is no consensus as to exactly how small these sums should be to qualify as micropayments, and some authors do include payments as small as fractions of a cent in their consideration, most discussions have revolved around the range of one cent to a dollar as the typical size of a micropayment, compared with a range of one dollar to ten dollars as that of “minipayments”. The proliferation of content providers over the global information infrastructure and the easy access to this network have combined to offer countless opportunities for mutually advantageous transactions involving snippets of information and other digital material, such as multimedia contents, news items, health updates, images, database content, and so on. To date, providers of such material have generally not charged their users in direct connection with usage, preferring to recoup their costs instead via other means, such as advertising or subscription. A key reason for this has been the fact that frequently the cost of processing the relevant micropayment would far exceed the nominal value of the payment itself. This also explains why most payment systems that have been developed over the past decade or so to facilitate the fast-growing volume of e-commerce are unsuitable for micropayments. For example, payment schemes based on credit cards have become widely used, but they typically require authorization of each payment via contact with an issuing bank, thus adding a time delay as well as increasing the processing costs in purely monetary terms. It has been estimated that it costs about 25 cents to process a credit card transaction [2]. Similarly, the processing costs of e-checks are about 10 cents per transaction. It is clear that for micropayments to be viable on a pay-per-use (also known as
pay-as-you-go, or pay-by-click) basis, their processing costs need to be much lower than above.
II. SOME KEY ISSUES CONCERNING MICROPAYMENT SYSTEMS
For any payment system, security is always an essential issue, for without adequate security the system will ultimately break down from fraudulent usage by consumers, overcharging by vendors, or abuses by unauthorized third parties. As summarized by [3], [4] and others, the main aspects of security include confidentiality, authentication, data integrity and non-repudiation. It is recognized, however, that security is to some extent a subjective matter, in that different users will have different notions and requirements as to what a secure payment system entails [5].
The payment system also has to be able to gain acceptance by the users [6], and this often depends on the costs and time factor associated with using the system. This issue becomes critically important in the context of micropayments, as the values of the purchases themselves are typically tiny and can easily be outweighed by the transaction costs involved. Thus, micropayment systems (MPSs) need to involve minimal operational costs and inconvenience, or the consumer may simply opt not to proceed with the transaction.
The trade-off between these two major competing requirements (low cost and security) lies at the heart of the difficulties in designing a practical and effective micropayment system. In principle, a more secure system usually requires higher processing costs. There is a need, therefore, for the right balance between security and cost considerations. In general, a set of payments can be considered safe if the costs of breaking its security are higher than the value of the payments involved.
Although the literature relating to the characteristics of an ideal MPS has become quite voluminous, upon reflection, it would appear that four requirements stand out as essential:
(1) The micropayment system in question must minimize per-transaction processing costs.
(2) It must be capable of minimizing user fraud, such as forgery and double-spending.
(3) It must be easy and convenient for the customer to use.
(4) It must be able to perform transactions without long delays.
Criteria 1, 3 and 4 tend to be consistent with each other, but often conflict with Criterion 2. Criteria 1 and 2 are of
utmost importance to vendors. Criteria 3 and 4 are of more relevance to customers. In the next section we will review some of the most popular MPSs that have been proposed and/or implemented in recent years, keeping in mind the four criteria listed above.
III. EXISTING MICROPAYMENT SYSTEMS
The origins of MPSs can be traced back to the latter half of the 1990s, when providers such as PayWord, MilliCent, NetBill, MicroMint, NetCard, DigiCash, etc., began to offer services but generally failed to generate enough business [6]. Some of the proposed systems never even went into real operation. It appears that these first-generation MPSs failed largely because of user resistance at the time: consumers had become used to getting online content for free and resisted the idea of having to pay. In addition, the transaction costs involved were generally high. Thus, the development of MPSs became stagnant until the last several years. A number of new companies, dedicated to processing micropayment transactions on the Web with low costs, have introduced commercial services. These second-generation MPSs are often run as an essentially local or national operation, such as PepperCoin, BitPass (both of these being focused on the US market), FirstGate (Germany), PayStone (Canada), PayAsYouClick (U.K.) and so on.
A. First-generation MPSs
MilliCent (see [7] and [8]) was developed by Digital Equipment and its public trial started in 1997. Its fraud prevention capabilities (Criterion 2) are good, but it is not simple to use (Criterion 3), as the customer is required to buy vendor-specific scrip (tokens) for each new vendor that s/he wishes to deal with.
NetBill (see [9] and [10]) has been trialled locally at Carnegie-Mellon University since 1997. Its performance on Criterion 2 is very high, due to comprehensive security measures, including online verification with the system's server for each transaction. As a result, however, the system performs poorly on Criterion 4 (speed of processing) and on Criterion 1 (low cost of processing).
MicroMint was proposed by Rivest and Shamir [11] to provide a reasonable degree of security (Criterion 2) and yet a very fast processing speed (Criterion 4). However, it involves high initial costs, especially as the system's broker would need to mint large quantities of coins at the beginning of each period, to be sold to the user. Inevitably many of these will be unused, and can be returned, but the processing costs would already have been incurred and wasted.
NetCard [12] was developed and demonstrated at Cambridge University. Its security performance is high, due to a range of measures and features, including the use of smart cards. In practice, however, smart cards have not gained wide acceptance and therefore this prerequisite becomes a detraction from its ease of use (Criterion 3).
Micro-iKP [13] was proposed by IBM and makes use of its iKP system. It provides a very high security level, due partly to the extensive use of RSA public-key cryptography
and frequent online authorizations. Not surprisingly, these also imply high processing costs (Criterion 1).
The NetCents system [14] has a trusted Arbiter to guarantee delivery of purchased goods and help resolve complications that may arise from time to time. Its performance on Criteria 2 and 3 is good, but the added burden of communication and computation detracts from Criterion 4 and raises processing costs.
B. Second-generation MPSs
The NetPay system [15] was built on the PayWord architecture proposed by [11]. The philosophy of PayWord is to reduce the number of public key operations required per payment by using hash functions. Signed messages (called paywords) are sent from the customer to the vendor for redeeming with the broker. A major deficiency of the PayWord system is that paywords are vendor-specific, so that the customer must perform multiple public key operations to correspond with the various vendors involved. Another deficiency is that the system provides only a low level of protection against double-spending by the customer across vendors. NetPay addresses both of those deficiencies in the PayWord architecture by using touchstones signed by the broker and indexes signed by each vendor, to be passed on to another vendor, to relay essential information about the status of the customer's spending relative to its limit. NetPay is characterized by de-centralized off-line processing, customer anonymity and relatively high performance with respect to Criteria 2, 3 and 4, but there is scope for enhancement with respect to Criterion 1.
BitPass [16] is a Web-based MPS developed in 2002 at Stanford University. It is a pre-paid account-based system that incorporates a “virtual card” payment concept, like a prepaid phone card, allowing the user to make payments for purchases of digital content. Its performance on Criterion 3 is very high, as it provides both the seller and the consumer with ease of use. For consumers, the BitPass virtual card allows them to buy items as they go without the need to keep a physical stored-value card or to have an e-wallet installed on their home PCs: the BitPass service integrates into the consumer's browser without requiring any plug-ins or applications to be downloaded. For online sellers, BitPass offers simplicity in setting up: they only need to download a single file, upload it to their web server and change a few permissions to make BitPass enabled. However, a key disadvantage of BitPass is that it requires the consumer to know in advance how much he or she will eventually want to pay online [17]. Further, it is somewhat vulnerable to threats of forgery and double-spending, thus performing relatively poorly on Criterion 2.
PepperCoin [18], founded by the Massachusetts Institute of Technology in 2002, is another micropayment processor that is built on the existing payment infrastructure, based on cryptography, digital certificates and mathematical algorithms first introduced by Rivest in 1997 [19]. It aggregates many small purchases into larger credit transactions. To cut cost, it waits until there are enough
purchases to make it worthwhile to submit the charge to the credit card company [20]. Its performance on Criteria 1 and 2 is very good, but the system is relatively complicated to set up, requiring both consumers and merchants to download and successfully install software. Further, credit card information is required at the user's account sign-up.
Besides those examples, a number of other systems have been put into operation, such as the Europe-based ClickandBuy payment system [21] developed by FirstGate, Payloadz's payloadz.com system, and Paystone's paystone.com system. Several systems have also been recently proposed, including Auletta's web-based system [22], Lee's web-based micropayment solutions [23], and Parhonyi's micropayment system [24].
From the brief review above, it appears that technical enhancements can be made to existing MPSs in two respects. First, there are instances where public-key operations can be avoided or replaced with computationally cheaper hashing operations, without unduly compromising security. Second, there are ways in which the number of hashing operations itself can be reduced. In the next section we outline a proposed system model, called WebCoin, which would offer these benefits.
IV. PROPOSED MODEL INCORPORATING ENHANCEMENTS: WEBCOIN
Signature schemes and hash functions are two of the most important cryptographic techniques in developing an e-payment system. Hash functions such as SHA-1 [25] take as input the message to be verified (of arbitrary length) and return a fixed-length message digest. If the receiver's calculation results in an identical digest, this is an indication that the message's integrity remains intact. Digital signatures involve message digests as well as the use of cryptography to ensure authentication and enhance security. One of the best known algorithms for generating public-private key pairs suitable for such purposes is the RSA algorithm (Rivest et al., 1978) [25], which involves the computation of modular exponentiations. As Rivest pointed out, hash operations are about 100 times faster than RSA signature verification, and 10,000 times faster than RSA signature generation. Thus, to reduce processing costs, the hash function should be applied wherever public key cryptography is not really essential, keeping in mind that the sums at stake in micropayments are generally tiny and the associated risks of financial loss correspondingly small.
Secondly, it is possible to significantly reduce the number of hash operations required when a relatively large-value payment occurs. Whereas PayWord and NetPay provide for only one hashed payword chain (a coin-chain) at a time, we suggest using a pair of coin-chains and coin indexes to keep track of them. $C_i^{one}$ and $C_j^{ten}$, with their indexes i and j, represent a one-unit coin and a ten-unit coin, respectively. (Without loss of generality, it may be assumed that the unit is one cent.) The one-unit chain is used for a sum of less than
10 cents, while the ten-unit chain is used for sums involving multiples of ten cents. The use of the latter chain can substantially reduce the total number of hash computations required when, say, 99 cents are involved: the single-chain approach requires 99 coins, while the double-chain approach requires only 18 coins. Using coin indexes is another way to reduce the number of hash operations. Coins in each chain are created in sequence by the user as needed, starting from the root coin made by the broker, and ending at the last spent coin - the indexed coin. The latter, in turn, serves as the root coin for generating the next coin at a subsequent purchase. During coin verification, instead of passing a whole chain of coins, only the last coin and its index need to be sent to the vendor. The vendor then verifies the last coin value by using the previously recorded index, and at the same time stores the valid value and the current index for future verification.
The WebCoin Model
The design of the proposed WebCoin micropayment system incorporates the above innovations. We now provide a brief outline of its overall architecture, main components and working principles. For further detail, see [26]. There are three main players in the system: user, vendor and broker. The system's net server acts as a broker. WebCoin provides an offline micropayment system using lightweight hashing-based encryption. Only users and vendors are involved in each payment transaction. The broker is responsible for the registration of users and vendors, the detection of users' double-spending and vendors' over-charging, and the maintenance of vendors' and users' accounts.
At the beginning, the broker sets up accounts for the vendor and the user, who has to place a minimum deposit into his account. Then the broker imposes a maximum spending limit (Max) for the user and creates two root coins - $C_0^{one}$ and
C_0^ten, by randomly picking and hashing two large numbers x and y:

    C_0^one = h(x),   C_0^ten = h(y)                                                  (1)
These one-cent and ten-cent coins will serve both as root coins for the user (to mint further coins) and as coin authenticators for the vendor. The broker subsequently issues the user with a certificate (U_Cerf). Signed with the broker's secret key (SK_B), the certificate contains the broker's identification (B_ID), the user's identification (U_ID), the user's public key (PK_U), the expiration date (E), and information about the coin-chains (C):

    U_Cerf = {B_ID, U_ID, PK_U, E, C} SK_B                                            (2)
where C contains root coins and Max. It is defined by

    C = (C_0^one, C_0^ten, Max)                                                       (3)

To initiate a purchase, the user has to sign a purchase commitment message (PC) and send it to the vendor. The
message contains information such as IDs of the user and vendor (V_ID), the purchase date (D) as well as the user certificate, and is signed with user's secret key (SK_U).

    PC = {U_ID, V_ID, U_Cerf, D} SK_U                                                 (4)
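The purchase commitment in (4) is an ordinary public-key signature over a serialised message. The following sketch (Python with the cryptography package; the key size, padding scheme, hash and field encoding are illustrative choices and are not taken from the paper) shows the signing and verification steps:

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    # User's key pair; PK_U would be carried in the broker-signed certificate U_Cerf.
    sk_u = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    pk_u = sk_u.public_key()

    # Illustrative serialisation of PC = {U_ID, V_ID, U_Cerf, D}; a real encoding will differ.
    pc = b"U_ID=alice|V_ID=shop42|U_Cerf=<broker-signed certificate>|D=2008-05-01"

    signature = sk_u.sign(pc, padding.PKCS1v15(), hashes.SHA256())   # user signs PC with SK_U
    pk_u.verify(signature, pc, padding.PKCS1v15(), hashes.SHA256())  # vendor checks with PK_U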
Fig. 1. Message flows in each transaction between the user (U) and the vendor (V): 1. send purchase commitment (PC); 2. verify and store PC in the vendor's database of users' PCs; 3. send encrypted URL; 4. send payment message (PM); 5. verify and store PM; 6. send key (CSK) for the encrypted contents; 7. decrypt URL, record coins and index.
The message holds the broker-signed certificate so the vendor can verify the user's legitimacy. For multiple shopping, the user has to send a different PC to each new vendor, who will store the PC and deliver in encrypted form the URL requested by the user. The encryption involves a customer shared key (CSK), which will be delivered after the payment is received. Fig. 1 shows the message flows in each transaction. To make a payment, the user creates two coin-chains with appropriate lengths to match the transaction amount. This process is based on the original root coins in the user's certificate, or the current root coins, that is, the recorded last-spent coins from the previous payment. For an n-cent payment as the first transaction, the user needs to create i one-cent coins and j ten-cent coins, where
    i = mod(n, 10),   j = int(n / 10)                                                 (5)
Each coin is the hash value of the previous coin. Thus two coin chains are generated,

    C_1^one = h(C_0^one), C_2^one = h(C_1^one), ..., C_i^one = h(C_(i-1)^one)         (6)

    C_1^ten = h(C_0^ten), C_2^ten = h(C_1^ten), ..., C_j^ten = h(C_(j-1)^ten)         (7)
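To make the chain construction in (5)-(7) concrete, here is a minimal sketch in Python. It is not the WebCoin implementation: SHA-1 is assumed only because the paper mentions it, and the root secrets and payment amount are illustrative.

    import hashlib, os

    def h(coin: bytes) -> bytes:
        """One application of the one-way hash h(.)."""
        return hashlib.sha1(coin).digest()

    def extend(coin: bytes, times: int) -> bytes:
        """Mint the coin reached after `times` further hash applications."""
        for _ in range(times):
            coin = h(coin)
        return coin

    # Broker: two root coins from random secrets x and y, as in (1).
    c_one_0 = h(os.urandom(20))
    c_ten_0 = h(os.urandom(20))

    # User: a first payment of n cents needs i one-cent and j ten-cent coins, as in (5).
    n = 47
    i, j = n % 10, n // 10
    c_one_i = extend(c_one_0, i)     # last coin of the one-cent chain, as in (6)
    c_ten_j = extend(c_ten_0, j)     # last coin of the ten-cent chain, as in (7)

    # Vendor: knowing the root coins and the indexes i and j, the received coins
    # are checked by rehashing and comparing, as described for PM verification.
    assert extend(c_one_0, i) == c_one_i and extend(c_ten_0, j) == c_ten_j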
For the next m-cent payment, the user needs to take the last-spent coins, C_i^one and C_j^ten, as new root coins to extend the existing chains by running g hash computations as in (8) and k hash computations as in (9):

    C_(i+g)^one = h(h(...h(C_i^one)))                                                 (8)

    C_(j+k)^ten = h(h(...h(C_j^ten)))                                                 (9)

where g = mod(m, 10) and k = int(m / 10). At this point, C_(i+g)^one and C_(j+k)^ten become the last coins in the respective coin-chains, and will serve as root coins at the next payment. The user then needs to apply his secret key to sign the payment message (PM) before sending it to the vendor. The PM contains the last-spent coins with their corresponding indexes and a timestamp T:

    PM = {g, k, C_(i+g)^one, C_(j+k)^ten, i, j, T} SK_U                               (10)

The vendor verifies and authenticates the payment message PM by applying the user's public key extracted from his certificate U_Cerf (signed by the broker and sent previously within the purchase commitment PC along with other information such as the expiry date E, Max, C_0^one, C_0^ten and so on). By rehashing C_0^one (i+g) times and C_0^ten (j+k) times and then comparing their values with C_(i+g)^one and C_(j+k)^ten, the
vendor could assess the validity of the payment. If it is valid, the key for decrypting the requested URL will be sent to the user; otherwise, the service is abandoned. The vendor then stores the last valid coins and indexes for use at the next verification involving the same user. This avoids duplicated hash computations. At the end of the day, the vendor sends the broker signed redemption messages, which include the users' certificates signed by the broker as well as all PMs signed by the customers. The broker verifies the last coins in each PM by performing hash operations in the same way as the vendor did. He also checks for double-spending at this stage. If all coins are valid, the broker updates the users' and vendors' accounts.

V. SUMMARY

Micropayment systems have become popular in recent years as the idea of paying for low-value, high-volume transactions involving music, clip-art, video and other media has gained wider acceptance. This paper analyzes the key requirements of a successful micropayment system, reviews existing systems, and presents a payment model which offers technical enhancements with respect to efficiency without undue loss in security. In particular, the proposed WebCoin model incorporates a double hash-chain and coin indexes in coin creation and verification. In the process, one-way hash functions and cryptographic signature schemes are integrated to reduce computation costs and enhance both security and performance.

REFERENCES
[1] RSA Laboratories (2007) "What Are Micropayments", RSA Security Crypto FAQ, Chapter 4, Applications of Cryptography: 4.2 Electronic Commerce. Retrieved 19 Oct 2007 from http://www.rsa.com/rsalabs/node.asp?id=2289
[2] Micali, S. and Rivest, R. (2002) "Micropayments Revisited", in Preneel, B. (ed.), Progress in Cryptology – CT-RSA 2002, vol. 2271 of Lecture Notes in Computer Science, Springer Verlag, Heidelberg, pp. 149-163.
[3] O'Mahony, D., et al. (2001) Electronic Payment Systems for E-Commerce, 2nd edition, Artech House.
[4] Stallings, W. (1999) Cryptography and Network Security, Principles and Practice, 2nd edition, Upper Saddle River, N.J., Prentice Hall.
[5] Centeno, C. (2002) "Building security and consumer trust in Internet payment – The potential of soft measures", Background paper No. 7 of the Electronic Payment Systems Observatory (ePSO) project, Institute for Prospective Technological Studies (IPTS), Joint Research Centre (JRC), European Commission.
[6] Kniberg, H. (2002) "What Makes a Micropayment Solution Succeed", Masters Thesis, KTH, Institution for Applied Information Technology, Stockholm, November.
[7] Glassman, S., et al. (1995) "Millicent Protocol for Inexpensive Electronic Commerce", World Wide Web Journal, vol. 1, no. 1, pp. 603-618.
[8] Manasse, M. (1995) "The Millicent Protocols for Electronic Commerce", in Proceedings of the First USENIX Workshop on Electronic Commerce, USENIX Press, New York, pp. 117-123.
[9] Sirbu, M. and Tygar, J.D. (1995) "NetBill: An Internet Commerce System Optimised for Network Delivered Services", IEEE Personal Communication, vol. 2, no. 4, pp. 34-39.
[10] Cox, B., Tygar, J.D., and Sirbu, M. (1995) "NetBill Security and Transaction Protocol", in Proceedings of the First USENIX Workshop on Electronic Commerce, USENIX Press, New York, pp. 77-88.
[11] Rivest, R. and Shamir, A. (1996) "PayWord and MicroMint: Two Simple Micropayment Schemes", in Proceedings of the Fourth Cambridge Workshop on Security Protocols, Cambridge, pp. 69-87.
[12] Anderson, R., Manifavas, C., and Sutherland, C. (1996) "NetCard – A Practical Electronic Cash System", in Cambridge Workshop on Security Protocols, Springer-Verlag, LNCS.
[13] Hauser, R., Steiner, M., and Waidner, M. (1996) "Micro-payments Based on iKP", Technical Report RZ 2791, IBM Research.
[14] Poutanen, T.J., Hinton, H. and Stumm, M. (1998) "NetCents: A Lightweight Protocol for Secure Micropayments", in Proceedings of the Third USENIX Workshop on Electronic Commerce, USENIX Press, Massachusetts, pp. 25-36.
[15] Dai, X. and Grundy, J. (2007) "NetPay: An off-line, decentralized micro-payment system for thin-client applications", Electronic Commerce Research and Applications, volume 6, pp. 91-101.
[16] McBride, J. (2003) "BitPass: A New Way To Pay And Charge For Online Access", Revenue Strategies, Central Coast Communication, Inc. Retrieved 29 Jan 2007 from http://www.digimarket.net/artman/publish/article_12.shtml
[17] Bielski, L. (2005) "A New Shot for Micropayments", ABA Banking Journal, Vol. 97, pp. 27-30.
[18] Rivest, R. (2004) "PepperCoin Micropayments", in Proceedings of Financial Cryptography, Springer, Key West, USA, February. Retrieved 16 Dec 2006 from http://citeseer.ist.psu.edu/676873.html
[19] Rivest, R. (1997) "Electronic Lottery Tickets as Micropayments", in Proceedings of the First Financial Cryptography Conference (FC 97), Springer-Verlag, Anguilla, British West Indies, pp. 145-150.
[20] Jewell, M. (2004) "Credit cards enter the micropayment game", The Associated Press, published June 28, 2004, USA Today News. Retrieved 16 May 2006 from http://www.usatoday.com/tech/news/2004-06-28-micropay_x.htm
[21] ClickandBuy (2005) "Why should you choose Click&Buy from FirstGate?". Retrieved 1 Aug 2007 from http://firstgate.com/EU/en/downloads/Vorteile.pdf
[22] Auletta, V., Blundo, C., and Cimato, S. (2006) "A Web Service Based Micro-payment System", in Proceedings of the 11th IEEE Symposium on Computers and Communications (ISCC'06), Sardinia, Italy, pp. 328-333.
[23] Lee, M. and Kim, K. (2002) "A Micro-payment System for Multiple-Shopping", in Proceedings of the Symposium on Cryptography and Information Security 2002 (SCIS 2002), Shirahama, Japan.
[24] Parhonyi, R., et al. (2005) "An interconnection architecture for micropayment systems", in Proceedings of the Seventh International Conference on Electronic Commerce, Xian, China, pp. 633-640.
[25] Rivest, R., Shamir, A. and Adleman, L. (1978) "A Method for Obtaining Digital Signatures and Public-Key Cryptosystems", Comm. ACM 21(2), pp. 120-126.
[26] Shao, X., Nguyen, A.T. and Muthukkumarasamy, V. (2004) "WebCoin: A Conceptual Model of a Micropayment System", in Proceedings of the 9th Australian World Wide Web Conference (AusWeb 04), Gold Coast, Australia, pp. 322-331.
An Empirical Investigation of Defect Management in Free/Open Source Software Projects

Anu Gupta, Ravinder Kumar Singla
Department of Computer Science and Applications, Panjab University, Chandigarh, India.
[email protected], [email protected]

Abstract- Free/Open Source Software (F/OSS) is a novel way of developing and deploying software systems on a global basis. Collaborative efforts of developers and users towards identifying and correcting defects as well as requesting and enhancing features form the major foundation of this innovative model. An empirical study using a questionnaire survey has been conducted among F/OSS developers to gain insight into defect management in various F/OSS projects. The present paper focuses on exploring the extent of use of defect tracking systems among F/OSS projects and evaluating the perception of F/OSS developers about the effectiveness of user participation and the efficiency of defect resolution. The findings indicate that defect management activities vary significantly among F/OSS projects depending on project size, type of user participation and severity of defects. The findings of the present study can help in future to determine the impact of such variations in defect management on the overall quality of F/OSS products by using defect data extracted from publicly accessible F/OSS repositories.
I. INTRODUCTION Free/Open Source Software (F/OSS) represents an incredible and innovative way of software development in the area of software engineering [1-3]. Initially the distribution and use of F/OSS systems was limited to academia, research laboratories and technical user groups only. Today F/OSS systems are being developed and designed for mass consumption [4]. In F/OSS development, a large number of developer-turned users come together without monetary compensation to cooperate under a model of rigorous peer-review and take advantage of parallel debugging that leads to innovation and rapid advancement in developing and evolving software products [5]. The potential of many eyeballs tend to eliminate defects in the collaborative project and in turn greatly improve the software quality [1]. According to an article published by eWeek Labs, F/OSS organizations in general respond to problems more quickly and openly, while proprietary software vendors instinctively cover up, deny and delay [6]. As F/OSS project is a continuous process of users cum developers reporting defects, fixing defects, adding features and releasing software [7]. Thus a large amount of data regarding all these issues keeps on flowing back and forth among the developers and the users of the F/OSS projects [8]. To keep track of upcoming issues and their updated status a defect tracking system is employed at a central site to support the development team in most of the F/OSS projects. Defect management is a broad area which covers defect handling as well as focuses on preventing defects and ensuring
that the remaining defects, if any, will cause minimal disruption or damage [9]. Defect handling deals with recording, tracking and resolving various kinds of defects. The way defects are being recorded; consistency and completion in recording defects; kind of user participation, period taken to resolve defects can vary from project to project in F/OSS. So it becomes necessary to gather information relating to various such practices from natural environment before initiating any research in the concerned area. Defect analysis can be used to assess software quality, create comparison framework, track product reliability and evaluate responsiveness towards users [10]. In Free/Open Source Software projects, qualitative analysis can be supplemented with the quantitative data because of availability of huge amount of information with a great variety in size, programming languages, programming tools, programming methods etc [11]. The present paper aims to provide a preliminary exploration to improve our understanding of defect management among various F/OSS projects. In the following section we briefly highlight the concerted efforts in the related areas. In Section III we describe the methodology used in the present study. In Section IV we present our quantitative results and in section V we offer a discussion of these results. We then conclude in Section VI by summarizing the contribution of the paper and propose some ideas for future work. II. LITERATURE REVIEW F/OSS development model is based on the ideas of intensive communication among developers, large dependency on peer reviews and frequent release of source code [1]. Developers believe that reliance on these ideas would accelerate discovery and fixing of defects and substantially enhance the quality of the produced software. In order to prove the claims of F/OSS proponents a few studies have been conducted so far [12]. A little is known about how people in these communities coordinate software development across different settings or about what software processes, work practices and organizational contexts are necessary to their success [13]. A comprehensive study quantified a number of the software quality assurance activities used in the F/OSS development, in an attempt to either support or debunk some of the claims made by the F/OSS proponents [14]. Another study provided the general characterization of defect handling in medium and large F/OSS projects. The study focused on defect recording aspect i.e. concept of defect, subject of defect recording,
consistency and completeness in recording defects etc. [10] A study of popular Apache web server and Mozilla web browser quantified aspects of developer participation, core team size, code ownership, productivity, defect density and problem resolution intervals by using e-mail archives of source code change history and defect reports [15]. A study of various F/OSS projects investigated the relationship of F/OSS process maturity with the success of the project and found processes connected to coordination and communication vital for F/OSS projects [16]. We conducted an empirical study to explore the use of defect tracking systems, to determine the perception of F/OSS developers about effectiveness of user participation and efficiency of defect resolution among F/OSS projects. III. METHODOLOGY A. Aim of Study The specific research questions which we aim to address in the present study are as follows: RQ1. What is the extent of use of defect tracking systems among F/OSS projects especially in terms of consistency and completion in reporting defects? Is that similar across different projects? RQ2. How do F/OSS developers perceive participation of users towards development and maintenance of F/OSS projects? Does the effectiveness of user participation depend upon the size of the F/OSS project? RQ3. How long does it take to resolve defects of varying severity in F/OSS projects? Does the resolution period of a defect depend upon team size of the F/OSS project? What percentage of defects remains unresolved on an average? B. Survey Format In order to gather data to answer above mentioned questions, an online questionnaire was designed and implemented using phpESP, an open source survey tool hosted on Sourceforge [17]. The survey questionnaire was based upon surveys conducted earlier in the related areas but modified and enhanced for the current study [10,14]. The questionnaire consisted of 25 questions and the most of the questions are of multiple-choice formats. A respondent was free to fill in the additional information or add comments, wherever required. Such format helps to standardize the responses as well make statistical analysis easier. All the questions were made compulsory in order to avoid incomplete responses. The complete questionnaire can be found online [18]. A pilot study was conducted with five F/OSS projects. A draft of the online questionnaire was sent to the software developers engaged with these projects and comments were solicited on ease of understanding the questionnaire, the usability of the web based form and the time it takes to complete. Then the questionnaire was revised on the basis of this feedback. C. Data Collection Methodology To gather data 250 projects were identified on the basis of prior knowledge and information available on collaborative development websites and various other websites. These
projects constitute a diverse mix of system size, nature of application and targeted end user type. The projects are publicly accessible allowing participation of users on global basis. There are a number of projects like Apache, Mozilla etc. that include a number of subprojects. Each of such subprojects is considered independent because defect management process may vary among them. The target population for the survey is the F/OSS development community i.e. the people engaged with F/OSS projects as administrator, manager, developer, tester, defect fixer, documenter etc. The F/OSS users are not involved as they have very less knowledge about the defect management process. The mailing lists and forums used for discussing development, defects, quality etc. were subscribed. The communication to F/OSS development community was made by sending e-mails to them. Periodically reminders were also sent. Many respondents appreciated the work and offered their support for future also while a few complained about redundant mails. Finally 188 responses were received from January 1, 2007 to April 30, 2007. Out of these, 23 responses were received from projects, which were not initially targeted. Either they learnt about the survey from the Internet or other targeted F/OSS developers refer to them. Out of 188 responses, 11 responses cannot be used for the analysis due to either corrupted responses or not satisfying the project selection criterion. So 177 responses received from 137 projects are currently being used for analysis. Many popular and wellknown F/OSS projects participated in the survey. A complete list of such projects can be made available. The analysis was performed at two levels, the individual and project level. This suggests a multi-level approach to analysis. When reporting individual-level results, all of the respondents’ records were used. For project-level analysis, individual responses in each project were aggregated. For aggregation, the most experienced developer’s response was selected to represent the values for the project. The data collected was then analyzed and interpreted using spreadsheet package and statistical analysis tool. IV. QUANTITATIVE RESULTS Through the present study we attained knowledge on general characteristics of F/OSS development, use of defect tracking systems, developers’ perception about effectiveness of user participation and efficiency of defect resolution. Accordingly the results are presented in following subsections. A. General Characterization of F/OSS Development All the projects surveyed have been classified into three categories depending upon their size (Lines of Code). The projects with size below 50000 lines of code are considered small. Medium projects have their lines of code between 50000 to 100000. The projects above 100000 lines of code are termed as large projects. These project categories have been used to determine its relationship with various parameters. Majority of F/OSS projects sampled have small team size actively involved in the project (overall 55% of the projects have less than 10 members, 19% have 10 to 20 members, 12%
have 20 to 100 members). Only 14% of the projects have team size more than 100 actively engaged with the F/OSS project development. It is also found that 49% of respondents are engaged with F/OSS projects as developer whereas 27% are working as administrator or project manager. The remaining responses belong to people performing other roles like designer, documenter, tester and defect fixer. 58% of the respondents have experience of more than five years in the field of software development whereas only 10% of the respondents have experience less than one year. About 33% of all the projects sampled have more than 50000 users. Only 15% of the projects are having users less than 100. Most of the projects i.e. 80% of all the projects have average release cycle six monthly or shorter. About 11% projects are released yearly while 9% of the projects are found to have irregular release cycle. It has also been found that less than 20% of code is changed in about 69% of the projects with every major release. While 20% of the projects have average change ranging from 21% to 30% of the project code. Only 10% projects change their code more than 30%. A relationship was investigated between the release frequency and the amount of change by calculating Spearman’s rank correlation coefficient, which came out to be 0.267 at 0.01 level of significance. Therefore it is argued that there is generally weak positive correlation between the release frequency and the amount of average change in each release. But no significant relationship of release frequency and amount of average change could be observed with project size. B. Extent of Use of Defect Tracking System We observe (from row 1 of Table I) that overall 88% of the F/OSS projects use one or the other defect tracking system. About 96% of large F/OSS projects are found to be using defect tracking systems while this number is slightly low in case of medium and small projects. A defect is reported in an F/OSS project mostly when a problem or failure is encountered or a feature is to be requested. Certain other issues are also reported relating to usability, documentation, security, installation, support, porting etc. From row 2 of Table I, we find that among 80% of large projects all these types of issues are being reported whereas small and medium projects differ a lot in this regard. Among 86% of all the F/OSS projects, prerelease as well as post release defects are recorded. Half of such projects make clear distinction between pre-release and post-release defects while others not. Percentage of projects recording both kinds of defects (with or without distinction) is comparatively higher among large F/OSS projects as shown in row 3 of Table I. But this was found to be lower in case of small and medium projects. If a defect tracking system is being used during the development and maintenance of a project then all the related issues must be entered with complete details on regular basis. Moreover defect database can be used for analysis only when all the defect records are recorded consistently with their complete details. It is found that about 65% of the projects record their defects very consistently or almost consistently. The degree of consistency is found to be higher in case of large
projects (i.e. 82% of large projects, 67% of medium projects and about 47% of small F/OSS projects, as shown in row 4 of Table I). The same is indicated in Fig. 1. A typical defect report includes several fields of information. These various fields reveal various aspects of a defect such as the related software components, type of problem, priority, error message etc. 51% of all the projects claim to be complete in their defect recording. The completeness level is also found to be higher in the case of large projects as compared to small projects, as shown in row 5 of Table I. The completeness level among the various project size categories is indicated in Fig. 2. To determine the degree of relationship between the completeness and the consistency as well as their relationship with project size, Spearman's rank correlation coefficients were computed, which are tabulated in Table II along with their significance values. Looking across the first row, we find that there is a strong positive correlation between the project size and the consistency in reporting defects (r = 0.558, p = 0.000), whereas there is a weak positive correlation between the project size and the completeness in reporting defects (r = 0.227, p = 0.008). Therefore it is concluded that a higher degree of consistency is strongly related to larger project size, whereas a higher degree of completeness is weakly related to larger project size. The second row shows that there is a weak positive correlation between the consistency and the completeness in reporting defects (r = 0.214, p = 0.012). 79% of all the projects regularly update their version control system to keep track of changes made in the source code. Use of a version control system is also found to be slightly higher in the case of large F/OSS projects as compared to small and medium projects, as shown in row 6 of Table I.
TABLE I
SUMMARY OF F/OSS PROJECTS

     Particulars                                                       Small F/OSS   Medium F/OSS   Large F/OSS   Overall
 1.  Use of defect tracking system                                     80%           85%            96%           88%
 2.  Reporting of all types of issues (related to change, problem,
     feature request, usability, documentation etc.)                   56%           59%            80%           66%
 3.  Reporting of pre-release as well as post-release defects
     (with or without distinction)                                     85%           74%            93%           86%
 4.  Consistency in reporting defects (fully or almost)                47%           67%            82%           65%
 5.  Completeness in reporting defects (fully or almost)               38%           44%            67%           51%
 6.  Updating version control system                                   76%           78%            82%           79%
TABLE II
SPEARMAN'S RANK CORRELATIONS

                                                   Consistency in          Completeness in
                                                   Reporting Defects       Reporting Defects
 Project Size               Correlation (r)        0.558(a)                0.227(a)
                            Significance (p)       0.000                   0.008
 Consistency in             Correlation (r)                                0.214(b)
 Reporting Defects          Significance (p)                               0.012

 a. Correlation is significant at the 0.01 level (2-tailed)
 b. Correlation is significant at the 0.05 level (2-tailed)
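Coefficients of the kind reported in Table II can be reproduced with standard statistical tooling. The sketch below (Python with scipy assumed) uses purely hypothetical ordinal codes rather than the survey data:

    from scipy.stats import spearmanr

    # Hypothetical per-project ordinal codes: size (1 = small, 2 = medium, 3 = large)
    # and consistency in reporting defects (1 = not consistently ... 4 = very consistently).
    project_size = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3]
    consistency  = [1, 2, 2, 2, 3, 3, 3, 4, 4, 4]

    rho, p = spearmanr(project_size, consistency)
    print(f"Spearman's rho = {rho:.3f}, p = {p:.3f}")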
Fig. 1. Consistency in recording defects versus project size (percentage of projects reporting defects very consistently, almost consistently, not very consistently or not consistently, for small, medium and large projects).
TABLE III
CROSS TABULATION OF PROJECT SIZE AND RATING OF USER PARTICIPATION FOR FEATURE REQUEST

 Project Size                Excellent   Very Good   Good    Fair    Poor    Total
 Small      Count            13          14          19      11      8       65
            Percentage       20.0        21.5        29.2    16.9    12.3    100.0
 Medium     Count            2           7           15      8       2       34
            Percentage       5.9         20.6        44.1    23.5    5.9     100.0
 Large      Count            17          16          30      12      3       78
            Percentage       21.8        20.5        38.5    15.4    3.8     100.0
 Total      Count            32          37          64      31      13      177
            Percentage       18.1        20.9        36.2    17.5    7.3     100.0
Fig. 2. Completeness in recording defects versus project size (percentage of projects whose defect records are always complete, almost always complete, sometimes complete or rarely complete, for small, medium and large projects).

TABLE IV
PEARSON CHI-SQUARE TESTS

                                  Value     Asymptotic Significance (2-sided)
 For Feature Request              9.625     0.293
 For Defect Identification        9.212     0.325
 For Patch Submission             10.008    0.264
C. Effectiveness of User Participation

In F/OSS projects, user participation and feedback constitute one of the backbones of the F/OSS model [1]. F/OSS users participate in the development and maintenance of F/OSS projects by identifying defects, requesting new features, submitting patches and so on. The respondents were asked to rate the effectiveness of all kinds of user participation on a five-point scale ranging from poor to excellent in order to determine their perception about user participation and feedback. Table III shows the count and percentage of respondents' ratings of user participation for feature requests among the various project size categories. To analyze whether the developers' perception about user participation and feedback is independent of project size, we conducted a Chi-squared test using a contingency table approach for each kind of user participation. The null hypothesis is that the effectiveness of user participation is independent of the size of the F/OSS project. The computed test statistics are shown in Table IV. Observing the values in Table IV, the null hypothesis is accepted, since the level of significance p > 0.05 in each case. A conclusion is made that the effectiveness of user participation for feature request, defect identification and patch submission is independent of the size of the F/OSS project. Examining the pattern of numbers in Fig. 3, we observe that feature requests have been rated as good to very good across all project size categories.
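The statistics in Table IV are Pearson chi-square values computed from contingency tables such as Table III. The following sketch (Python with scipy assumed) shows the computation for the feature-request ratings and should, up to rounding, reproduce the 9.625 reported above:

    from scipy.stats import chi2_contingency

    # Observed counts from Table III (rows: Small, Medium, Large;
    # columns: Excellent, Very Good, Good, Fair, Poor).
    observed = [
        [13, 14, 19, 11, 8],
        [ 2,  7, 15,  8, 2],
        [17, 16, 30, 12, 3],
    ]

    chi2, p, dof, _ = chi2_contingency(observed)
    print(f"chi-square = {chi2:.3f}, dof = {dof}, p = {p:.3f}")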
Fig. 3. Rating of users' feature request by project size (percentage of projects rating user participation as excellent, very good, good, fair or poor, for small, medium and large projects).
A similar kind of analysis was carried out for defect identification as well as patch submission. It is found that defect identification is rated as very good to excellent by a large number of the projects, i.e. about 40% of all the projects. User participation in the form of patch submission is perceived as poor to good among the majority of projects irrespective of project size.

D. Efficiency in Defect Resolution

Defects have been categorized as critical, major and minor depending upon their severity and the urgency to resolve them. The respondents were asked to indicate the period normally taken to resolve these different categories of defects. Examining the pattern of numbers in Fig. 4, it is noted that in two-thirds of all the projects critical defects are resolved within a period of 5 days of defect reporting. About 58% of the projects resolve their major defects within a period of 10 days, while resolution of minor defects is generally deferred in most of the projects. Only 20% of the projects resolve their minor defects within a period of 10 days of defect reporting.
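When defect data are available directly, for example as an export from a project's defect tracking system, resolution periods of the kind summarised in Fig. 4 can be computed rather than surveyed. A minimal sketch with invented records (Python with pandas assumed):

    import pandas as pd

    # Hypothetical tracker export; a real dump would have many more fields.
    defects = pd.DataFrame({
        "severity": ["critical", "critical", "major", "major", "minor"],
        "opened":   pd.to_datetime(["2007-01-02", "2007-02-01", "2007-01-03",
                                    "2007-01-20", "2007-01-04"]),
        "closed":   pd.to_datetime(["2007-01-04", "2007-02-04", "2007-01-12",
                                    "2007-02-01", "2007-02-20"]),
    })

    defects["resolution_days"] = (defects["closed"] - defects["opened"]).dt.days
    print(defects.groupby("severity")["resolution_days"].describe())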
To investigate the relationships among the periods taken to resolve the various types of defects as well as their relationship with team size, further analysis was carried out. As the values for these different variables are in ranking order, Spearman's rank correlation coefficients were computed. The relevant values along with their significance are shown in Table V. Looking across the first and second columns of the matrix, we observe that there is a strong positive correlation between the defect resolution period of major defects and the defect resolution period of critical defects (r = .675, p = .000) as well as the defect resolution period of minor defects (r = .596, p = .000), while there is a weak positive relationship between the defect resolution period of critical defects and the defect resolution period of minor defects (r = .214, p = .004). Thus a smaller defect resolution period for critical defects is related to a smaller defect resolution period for major as well as minor defects, although there is a much weaker relationship between the defect resolution periods for critical and minor defects. Looking in the last column, we can conclude that there is a weak positive relationship of team size with major defects (r = .163, p = .030) as well as with minor defects (r = .193, p = .010), whereas the relationship between team size and the defect resolution period for critical defects is statistically insignificant. We can conclude that the efficiency in resolving critical defects is independent of the team size.
Fig. 4. Defect resolution period by defect severity (percentage of responses for 1-2, 3-5, 6-10, 11-30 and more than 30 days, for critical, major and minor defects).

TABLE V
SPEARMAN'S CORRELATION COEFFICIENTS

                                                   How quickly major     How quickly minor     Team Size
                                                   defects are being     defects are being
                                                   resolved?             resolved?
 How quickly critical       Correlation (r)        .675(a)               .214(a)               .100
 defects are being          Significance (p)       .000                  .004                  .186
 resolved?
 How quickly major          Correlation (r)                              .596(a)               .163(b)
 defects are being          Significance (p)                             .000                  .030
 resolved?
 How quickly minor          Correlation (r)                                                    .193(b)
 defects are being          Significance (p)                                                   .010
 resolved?

 a. Correlation is significant at the 0.01 level (2-tailed)
 b. Correlation is significant at the 0.05 level (2-tailed)
In the survey, the respondents were also questioned about average percentage of unresolved defects at any point of time. We observe that the overall 76% of projects have unresolved defects below 20%. This percentage is slightly higher in case of small projects in comparison to medium and large projects. The major reasons for unresolved defects were also questioned. It is found that the reported defects cannot be reproduced many times and it requires more information. Many time resolution of defect is postponed. The lack of manpower to resolve the defects has been identified as another major reason behind pending defects. To fix a minor problem, excessive change is required several times, which may further introduce many more defects. As the project size grows the code complexity also increases, which is another big reason for increasing number of unresolved defects. V. DISCUSSION We observe that F/OSS projects typically consist of a small number of active developers who contribute most of the code and oversee the design and evolution of the project. These active developers are surrounded by a vast pool of users. A subset of such users use the latest releases and contribute defect reports or feature requests and many times submit patches also. During the past few years, adoption of F/OSS has increased considerably. This growth may be attributed to factors that encouraged users to favor F/OSS over proprietary software, such as cost savings, performance and security. It has been observed that most of the projects irrespective of their size happen to produce the small fixes or increments at very quick interval. The communities surrounding the project use as well test the software and submit their feedback promptly that way project is being enhanced. So it is clearly seen that the F/OSS projects follow the premise “Release Early, Release Often” [1]. Based on the results reported in section 5 we attempt to address our research questions. RQ1. What is the extent of use of defect tracking systems among F/OSS projects especially in terms of consistency and completion in reporting defects? Is that similar across different projects? Answer: The use of defect tracking system to support collaborative and distributed software development is found to be quite high as a whole. However it is felt that these numbers might be confounded by the fact that the collaborative development sites provide the necessary infrastructure. The practice of reporting all types of issues related to problems, failures, feature request etc. differs significantly among F/OSS projects. Level of consistency and completeness in defect reporting are positively correlated to each other and found to be higher in case of larger projects as compare to smaller ones. In many projects developers are able to correct a defect immediately without further investigation. In such a case defect may not be recorded in the defect database. Many times a defect record may contain enough information to reproduce or fix a defect, all the fields may not completed. The collaborative development sites can have a degree of impact in educating the F/OSS developers and users and effectively shaping the F/OSS development model.
RQ2. How do F/OSS developers perceive participation of users towards development and maintenance of F/OSS projects? Does the effectiveness of user participation is independent of the size of the F/OSS project? Answer: The F/OSS users contribute towards project development and maintenance of F/OSS projects in form of defect identification, feature request and patch submission. The quantitative evidence shows that F/OSS developers perceive defect identification ability of users quite high. While their participation for feature request is considered reasonably well. While user participation in form of patch submission is not welcomed much. Users may lack a global view of broader system constraints that may be affected by any proposed change. Most of times core team has to work hard on such proposed changes for their possible integration into central code. The rating given by F/OSS developers for various kinds of participation does not really depend upon the project size. This is found to be quite similar across different project size categories. RQ3. How long does it take to resolve defects of varying severity in F/OSS projects? Does there exist some relationship among defects of varying severity as well as with team size of the F/OSS project? What percentage of defects remains unresolved on an average? Answer: The defect resolution period is found to be dependent upon the severity of defects. Critical defects gain quick attention while major defects are also resolved within reasonable period. However resolution of minor defects is generally deferred. Higher efficiency in resolving critical defects is related to higher efficiency in resolving major defects also. It has been found that the size of F/OSS development team has no impact on resolution efficiency of critical defects while it has some positive relationship with the efficiency of resolution of major and minor defects. It was also found that only a few percentages of defects remain unresolved in majority of the projects. The reported defects cannot be reproduced many times and it requires more information. Many time resolution of defect is postponed. The lack of manpower to resolve the defects has been identified as another major reason behind pending defects. To fix a minor problem, excessive change is required several times, which may further introduce many more defects. VI. CONCLUSIONS AND FUTURE WORK This paper makes a contribution to the developing body of empirical research on F/OSS by identifying the extent of use of defect tracking systems among F/OSS projects and determining the developers’ perception about effectiveness of user participation and efficiency of defect resolution. Overall defect management is considered significant in F/OSS projects but in practice only large projects record on regular basis with complete details. The effectiveness of user participation is found to be dependent upon the type of user participation rather than being dependent upon the project size. The efficiency in resolving defects is related with the severity of the defects and it has little relationship with the team size. The findings and outcome of present study can help in future to
determine the impact of defect management on the overall quality of F/OSS products. Future work will therefore focus on analyzing data from publicly accessible defect repositories to carry out an in-depth study of a limited number of projects of diverse nature. Publicly accessible defect data will be supplemented with additional data about the projects and their defects to perform analysis and provide valuable feedback to further improve software quality.

ACKNOWLEDGMENTS

We are very thankful to the 188 Free/Open Source developers who spent their precious time to respond to the survey and gave their valuable comments and suggestions.
REFERENCES
[1] E.S. Raymond, "The Cathedral and the Bazaar", http://www.firstmonday.org/issues/issue3_3/raymond/
[2] Free Software definition, http://www.fsf.org/licensing/essays/freesw.html
[3] Open Source Initiative, http://www.opensource.org
[4] A. Boulanger, "Open-source versus proprietary software: Is one more reliable and secure than the other?", IBM Systems Journal, 2005, Vol. 44, No. 2.
[5] Stefan Koch, "Evolution of Open Source Software Systems – A Large-Scale Investigation", Proceedings of the First International Conference on Open Source Systems, Geneva, 11th-15th July 2005.
[6] Jim Rapoza, "Open Source Quicker at Fixing Flaws", eWeek Labs, http://www.eweek.com/article2/0,3959,562226,00.asp
[7] K. Crowston, H. Annabi, J. Howison, "Defining Open Source Software Project Success", Twenty-Fourth International Conference on Information Systems, Seattle, Washington, December 14-17, 2003, http://floss.syr.edu/publications/icis2003success.pdf
[8] Bug Tracking, http://www.bug-tracking.net/tracking-source.htm
[9] Defect management, http://www.defectmanagement.com/
[10] A.G. Koru, J. Tian, "Defect Handling in Medium and Large Open Source Projects", IEEE Software, Jul/Aug 2004, Vol. 21, No. 4.
[11] G. Robles, J.M. González-Barahona, C. José, M. Vicente, R. Luis, "Studying the Evolution of Libre Software Projects Using Publicly Available Data", International Conference on Software Engineering, 3rd Workshop on Open Source Software Engineering, Portland, Oregon, May 3-11, 2003, http://opensource.ucc.ie/icse2003/3rd-WS-on-OSSEngineering.pdf
[12] R. Glass, "Is open source software more reliable? An elusive answer", The Software Practitioner, 2001, Vol. 11, No. 6.
[13] Walt Scacchi, "Software Development Practices in Open Software Development Communities: A Comparative Case Study (Position Paper)", Workshop on Open Source Software Engineering, Toronto, Ontario, May 2001.
[14] L. Zhao and S. Elbaum, "Quality Assurance Under the Open Source Development Model", Journal of System and Software, Apr. 2003, Vol. 66, No. 1, pp. 65-75.
[15] A. Mockus, R.T. Fielding, and J. Herbsleb, "Two Case Studies of Open Source Software Development: Apache and Mozilla", ACM Transactions on Software Engineering and Methodology, 2002, Vol. 11, No. 3, pp. 309-346.
[16] M. Michlmayr, "Software Process Maturity and the Success of Free Software Projects", Software Engineering: Evolution and Emerging Technologies, 2005, Vol. 130, pp. 3-14.
[17] Survey Management Tool, http://phpesp.sourceforge.net/
[18] Survey Questionnaire, http://anu.puchd.ac.in/
A Parallel Algorithm that Enumerates all the Cliques in an Undirected Graph A. S. Bavan School of Computing Science Middlesex University The Borroughs, London NW4 4BT UK Email: [email protected]
Abstract-In this paper a new algorithm that guarantees to enumerate all the existing cliques, including the maximal clique, based on set operations is presented. The first advantage of the algorithm is that it can be parallelized to improve performance. The second is that it guarantees to find all the cliques. The algorithm cannot be considered radically different from some of the existing algorithms, except that it uses simple set operations explicitly to find the cliques, thus making the computation simpler. This also allows us to verify the results.
I. INTRODUCTION

The maximal clique problem is concerned with determining a largest set of vertices that are fully connected in a graph. Given an undirected graph G(V, E), where V = {v0, ..., vn} is the set of all vertices and E = {e0, ..., el} is the set of all edges (vi, vj), a maximal clique in G is a subset of fully connected vertices Sv = {sv0, ..., svm} with maximum cardinality. This means that there is an edge between each pair of vertices in the subset. Finding such a subset in a graph is an NP-complete problem [7]. The maximal clique in a graph has many applications, which include graph colouring [3, 6], optimization [1], layout testing [8] and communication networks. A number of algorithms have been developed to date for finding maximal cliques, the most popular ones being based on branch and bound [2, 4, 5, 11]. In [12] a recursive algorithm that determines the maximum independent set is presented. A survey paper by Pardalos and Xue [10] identifies and discusses the key issues in a branch-and-bound algorithm for the maximal clique problem. Our algorithm is similar to the work that deals with parallelisation in [9], which is based on an exact algorithm previously developed by one of the authors.
In this paper we present an algorithm based on simple intersection that can be parallelised. The algorithm is based on the notion that all the fully connected subsets (cliques) that include a particular vertex (the target) can be formed by intersecting the row representing that vertex in an adjacent matrix with the rest of the rows in the matrix that are connected to the target, and subsequently repeating the process using each row as the target in the reduced matrix. This is an exact approach in which subsets containing fully connected vertices are guaranteed to be generated. This algorithm generates all the existing non-trivial cliques. It is also parallelizable using a farmer-worker approach. The algorithm is currently implemented in Java and is expected to be slower than a C version.

In the rest of the paper, section 2 presents the algorithm. In section 3 an example based on a graph of size 7 is used to demonstrate the technique. Section 4 presents some performance results using the DIMACS benchmark data for graphs of size 200 with varying densities and maximal clique sizes. Finally, in section 5, a conclusion is presented.

II. THE ALGORITHM

The algorithm consists of two stages. The first stage eliminates those nodes that have less than 3 edges by calculating the degree of each vertex (node) and repeating this process until no more elimination is possible. This process is illustrated in the following example. Consider the graph in figure 1.

Figure 1. The undirected example graph on seven vertices (0-6).
Its adjacent matrix, with the diagonal elements set to 1, is as follows:
R/ C
0 1
2
3
4
5
6
deg
0 1 2 3 4 5 6
1 1 0 0 0 0 0
0 1 1 1 1 0 0
0 1 1 1 1 1 0
0 1 1 1 1 1 0
0 1 0 1 1 1 1
0 1 0 0 0 1 1
1 6 3 4 4 4 2
1 1 1 1 1 1 1
Remove 0 and 6 since the degree <3
R/ C
0 1
2
3
4
5
6
deg
0 1 2 3 4 5 6
1 1 0 0 0 0 0
0 1 1 1 1 0 0
0 1 1 1 1 1 0
0 1 1 1 1 1 0
0 1 0 1 1 1 1
0 1 0 0 0 1 1
1 6 3 4 4 4 2
1 1 1 1 1 1 1
3
4
5
6
1
2
3
4
5
deg
1
1
1
1
1
1
4
2
1
1
1
1
0
3
3
1
1
1
1
1
4
4
1
1
1
1
1
4
5
1
0
1
1
1
3
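The first stage described above, repeatedly discarding vertices that have fewer than three edges, can be expressed compactly in code. The following is an illustrative sketch in Python, not the author's Java implementation; the adjacency used in the usage example is our reading of the matrix above and, as in the example, the first pass removes vertices 0 and 6.

    def prune_low_degree(adjacency, min_degree=3):
        """Repeatedly remove vertices whose degree falls below min_degree."""
        adj = {v: set(nbrs) for v, nbrs in adjacency.items()}
        changed = True
        while changed:
            changed = False
            for v in [u for u, nbrs in adj.items() if len(nbrs) < min_degree]:
                for u in adj.pop(v):          # drop v and detach it from its neighbours
                    if u in adj:
                        adj[u].discard(v)
                changed = True
        return adj

    # Our transcription of the stage-1 example matrix (without the diagonal entries).
    graph = {0: {1}, 1: {0, 2, 3, 4, 5, 6}, 2: {1, 3, 4}, 3: {1, 2, 4, 5},
             4: {1, 2, 3, 5}, 5: {1, 3, 4, 6}, 6: {1, 5}}
    print(sorted(prune_low_degree(graph)))    # [1, 2, 3, 4, 5], as in the reduced matrix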
The second stage involves simple intersection of rows in the adjacent matrix. In here, the diagonal elements of the adjacent matrix are set to 1. The reason for this will become clear during the demonstration of the algorithm using an example in section 3. Firstly a candidate node (vi) with largest number of edges is selected. Secondly a reduced matrix is formed by removing all the rows that do not have a 1 in column vi. This means that any node that is not connected to vi cannot be a part of a clique that includes vi. Once the reduced matrix is formed, row vi is intersected with all the other rows vk. The resulting rows vk now becomes the new vk and the row vi is discarded to form the new reduced matrix. Now, a row with the largest number of 1’s (edges) is selected and intersected with all the other rows. This time the intersection process is a selective process as follows:
Given row x
Calculate the degree of the remaining nodes.
2
R/C
R/ C
0 1
deg
0
1 1
0
0
0
0
0
1
1 2
1 1 0 1
1 1
1 1
1 1
1 0
1 0
4 3
3 4 5
0 1 0 1 0 1
1 1 0
1 1 1
1 1 1
1 1 1
0 0 1
4 4 3
6
0 1
0
0
0
1
1
2
0 1
1 0
.. ..
x 1
.. ..
y 1
.. ..
n 0
1 0
.. ..
x 1
.. ..
y 1
.. ..
n 1
and row y
Since all the remaining nodes > two degrees, no more reduction is possible. So the reduced graph is represented by the following adjacent matrix.
0 0
Column x and y in each row must have 1’s as shown above. Then the resulting row becomes the new row y. If the rows x and y do not have matching 1’s as described above, then the original row y becomes the new row y. Once the intersection involving vi with all the other rows is completed, vi is discarded EXCEPT if vi failed to intersect with at least 1 other row. In this case vi is kept as part of the new reduced matrix and the process of intersection is repeated with the next row in the reduced matrix that has the largest number of 1’s. This process is continued until all the rows are intersected with each other. The final matrix will contain all the cliques that include
node vi. The cliques are extracted by simply selecting a row at a time and forming a set with the columns that have a non-zero entry. The whole process is repeated for other candidate nodes until a maximum set is found. A pseudo code version of the algorithm where the adjacent matrix for graph is [Aii] is as follows:
R/ C
0
1
2
3
4
5
6
Aii = 1;
0 1
1 1
1 1
0 1
0 0
0 0
0 1
0 1
AT. = max(∑n Ai.) Bi. = AT. ∩ Ai. For all i ≠ T
2 3
0 0
1 0
1 1
1 1
1 1
1 1
0 0
4 5 6
0 0 0
0 1 1
1 1 0
1 1 0
1 1 0
1 1 1
0 1 1
Begin
s1:
Adjacent Matrices:
s2: Bx. = max(∑m Bi.) flag = 0; for all i ≠ x { if (Bx. Intersectable_with Bi.) { Ci. = Bx. ∩ Bi. ; flag=1; } else Ci. = Bi. }
Candidate node is 5, so eliminate all rows whose column 5 is 0.
B= C; if (all_ Bx. _done) store_maxclique for this node T goto label_s2; if (all_ AT. _done) Extract from store the maxclique. goto label_s1; End.
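One possible reading of the selective intersection used in stage two, with each row held as a bit pattern over the columns and Intersectable_with as named in the pseudo code, is sketched below in Python. This is an interpretation for illustration, not the paper's implementation.

    def intersectable(row_x: int, row_y: int, x: int, y: int) -> bool:
        """Both rows must contain 1s in columns x and y before they may be intersected."""
        mask = (1 << x) | (1 << y)
        return (row_x & mask) == mask and (row_y & mask) == mask

    def intersect(row_x: int, row_y: int) -> int:
        """The intersection keeps only the columns set in both rows (bitwise AND)."""
        return row_x & row_y

    # Example: rows over columns 0..6, with column 0 as the lowest bit.
    r_x, r_y = 0b0110110, 0b0111100
    if intersectable(r_x, r_y, x=2, y=5):
        r_y = intersect(r_x, r_y)             # r_y becomes the new row y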
R/ C
0
1
2
3
4
5
6
1 2
1 0
1 1
1 1
0 1
0 1
2 2
1 0
3 4 5
0 0 0
0 0 1
1 1 1
1 1 1
1 1 1
2 2 2
0 0 1
6
0
1
0
0
0
2
1
III. AN EXAMPLE

To demonstrate the second stage of the algorithm, a graph of size 7 is used, as shown in figure 1. The sub-graph consisting of nodes 2, 3, 4, and 5 forms the maximal clique and is marked with thick edges. Intersect row 5 with all other rows:
0 1
2 4
R/ C
0
1
2
3
4
5
6
1
0
2
1
0
0
2
1
2
0
1
1
1
1
2
0
3
0
0
1
1
1
2
0
4
0
0
1
1
1
2
0
6
0
1
0
0
0
2
1
6 5 Figure 2.
3
Intersect r1 and with the rest if intersectable… r2 has c1 set =>ok r3 has c1 unset => no – r3 unchanged. r4 has c1 unset => no– r4 unchanged r6 has c1 set and r1 has c6 set => ok
R/ C
0
1
2
3
4
5
6
2 3 4 6
0 0 0 0
2 0 0 2
2 1 1 0
0 1 1 0
0 1 1 0
2 2 2 2
0 0 0 2
Intersect r2 and with the rest if intersectable r3 has c2 set, not c1(2) => no– r3 unchanged. r4 has c2 set, not c1(2) => no– r4 unchanged. :: r6 has c2 & c6 unset => no– r6 unchanged. Since r2 does not intersect with any, leave r2 and the rest. And go to r3.
R/C
0 1
2
3
4
5
6
2
0 2
1
0
0
2
0
3
0 0
1
1
1
2
0
4
0 0
1
1
1
2
0
6
0 2
0
0
0
2
2
r4 has c3and c4 set => ok r6 has c3 unset and r3 has c6 unset => no – r6 unchanged.
R/ C
0
1
2
3
4
5
6
2
0
2
1
0
0
2
0
4
0
0
1
2
2
2
0
6
0
2
0
0
0
2
1
R4 does not intersect with R6. So the cliques are:
[1, 2, 5] [2, 3, 4, 5] [1, 5, 6]
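The enumerated cliques can be cross-checked against a textbook maximal-clique enumerator such as Bron-Kerbosch [4]. The sketch below (Python) uses an edge list that is our transcription of the example graph, so it should be treated as illustrative rather than as part of the paper:

    def bron_kerbosch(r, p, x, adj, out):
        """Classic Bron-Kerbosch recursion: report r when p and x are both empty."""
        if not p and not x:
            out.append(frozenset(r))
            return
        for v in list(p):
            bron_kerbosch(r | {v}, p & adj[v], x & adj[v], adj, out)
            p = p - {v}
            x = x | {v}

    edges = [(0, 1), (1, 2), (1, 5), (1, 6), (2, 3), (2, 4),
             (2, 5), (3, 4), (3, 5), (4, 5), (5, 6)]
    adj = {v: set() for v in range(7)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    cliques = []
    bron_kerbosch(set(), set(adj), set(), adj, cliques)
    with_node_5 = sorted(sorted(c) for c in cliques if 5 in c)
    print(with_node_5)   # expected: [[1, 2, 5], [1, 5, 6], [2, 3, 4, 5]]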
IV. PERFORMANCE

To test the algorithm, DIMACS benchmark data was used and the results for a uni-processor are presented in Table I with the execution times.

TABLE I
Single Intel Xeon processor, 2.8 GHz; graph size = 200

 File           Clique size   Density   Time (sec)
 Brock200_1     21            0.75      0.827
 Brock200_4     17            0.66      0.031
 Brock200_3     15            0.61      0.062
 Brock200_2     12            0.50      0.010
 c-fat200_5     58            0.43      1.726
 c-fat200_2     24            0.16      0.393
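Runs of the kind summarised in Tables II and III, on random graphs of a given size and density, are easy to script. The sketch below is illustrative only; it times networkx's built-in maximal-clique enumerator rather than the algorithm presented here:

    import random, time
    import networkx as nx

    def random_graph(n: int, density: float, seed: int = 1) -> nx.Graph:
        rng = random.Random(seed)
        g = nx.Graph()
        g.add_nodes_from(range(n))
        g.add_edges_from((u, v) for u in range(n) for v in range(u + 1, n)
                         if rng.random() < density)
        return g

    for density in (0.4, 0.5, 0.6):
        g = random_graph(100, density)
        start = time.perf_counter()
        largest = max(nx.find_cliques(g), key=len)   # enumerate maximal cliques, keep the largest
        print(f"density={density}: max clique size {len(largest)}, "
              f"{time.perf_counter() - start:.3f} s")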
Table II and Table III list the execution times for random graphs of size 100 and 200 with varying density.

TABLE II
Graph size = 100, maximal clique size = 10

 Density    Time (sec)
 0.4        0.031
 0.45       0.047
 0.5        0.047
 0.55       0.062
 0.6        0.062
 0.65       0.078
 0.7        0.079
 0.75       0.110
 0.8        0.126
 0.85       0.140
TABLE III
Graph size = 200, maximal clique size = 14

 Density    Time (sec)
 0.4        0.344
 0.45       0.390
 0.5        0.437
 0.55       0.516
 0.6        0.578
 0.65       0.656
 0.7        0.766
 0.75       0.922
 0.8        1.094
 0.85       1.344

Parallelisation: Since the algorithm virtually takes each node and enumerates all the cliques that include the chosen node, it is an ideal candidate for the farmer-worker approach, where the different nodes and the adjacent matrix are passed to the workers and the host receives the enumerated cliques and determines the maximal clique. When the same data files given in Table I were executed on two processors, we were able to make a 50% improvement, as shown in Table IV.

TABLE IV
Two Intel Xeon processors, 2.8 GHz and 2.79 GHz; graph size = 200

 File           Clique size   Density   Time (sec)
 Brock200_1     21            0.75      0.380
 Brock200_4     17            0.66      0.012
 Brock200_3     15            0.61      0.029
 Brock200_2     12            0.50      0.0004
 c-fat200_5     58            0.43      0.712
 c-fat200_2     24            0.16      0.122

V. CONCLUSION

In this paper, a new algorithm for enumerating all the cliques, including the maximal clique, using simple set intersection is presented. In order to reduce the number of intersections and avoid unnecessary intersections involving nodes with two or fewer edges, a pre-processing method is employed. The major advantages of this algorithm are that it guarantees to enumerate all the cliques and that it is parallelisable. In addition, it has scope for improvement in performance. The current implementation is in Java and, based on our estimates, a C version will be comparable to existing fast algorithms. Development of this algorithm has also given us new ideas to tackle the problem of selectively choosing the target nodes so that the necessity of using every node in the graph can be avoided. This is even more important when only the maximal clique is required.

REFERENCES
[1] Brayton, R.K., Hachtel, G.D., Sangiovanni-Vincentelli, A.L., Multilevel Logic Synthesis, Proc. IEEE, Vol. 78, No. 2, February 1990, pp. 264-300.
[2] Balas, E., Yu, C.S., Finding a Maximal Clique in an Arbitrary Graph, SIAM J. Computing 15(4), 1986, pp. 1054-1068.
[3] Biggs, N., Some Heuristics for Graph Coloring, in: R. Nelson and R.J. Wilson, eds, Graph Colourings, Longman, New York, 1990.
[4] Bron, C., Kerbosch, J., Finding all Cliques of an Undirected Graph, Comm. ACM 16(9), 1973, pp. 575-577.
[5] Carraghan, R., Pardalos, P.M., An Exact Algorithm for the Maximum Clique Problem, Operational Research Letters, 9, 1990, pp. 375-382.
[6] Franklin, M., Saluja, K.K., Hypergraph Coloring and Reconfigured RAM Testing, IEEE Transactions on Computers, Vol. 43, No. 6, June 1994, pp. 725-736.
[7] Garey, M.R., Johnson, D.S., Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, 1979.
[8] Lengauer, T., Combinatorial Algorithms for Integrated Circuit Layout, John Wiley & Sons, 1990.
[9] Pardalos, P.M., Rappe, J., Resende, M.G.C., An Exact Parallel Algorithm for the Maximum Clique Problem, High Performance Algorithms and Software in Nonlinear Optimization, R. De Leone, A. Murli, P.M. Pardalos and G. Toraldo (eds.), Kluwer, Dordrecht, 1998, pp. 279-300.
[10] Pardalos, P.M., Xue, J., The Maximum Clique Problem, J. Global Optimization, 4, 1994, pp. 301-328.
[11] Wood, D.R., An Algorithm for Finding a Maximum Clique in a Graph, Operational Research Letters, 21, 1997, pp. 211-217.
[12] Tarjan, R.E., Trojanowski, A.E., Finding a Maximum Independent Set, SIAM J. Computing 6(3), 1977, pp. 537-546.
Agent Based Framework for Worm Detection El-Menshawy, A.M.; Faheem, H.M.; Al-Arif, T.; Taha, Z.
Abstract  Worms constitute a major source of Internet delay. A new worm is capable of replicating itself to vulnerable systems in a very short time, infecting thousands of computers across the Internet before any human response. A new worm floods the Internet and halts most Internet-related services, which harms the Internet economy. Therefore, detecting new worms is considered crucial and should be given the highest priority. Most research effort has been dedicated to modeling worm behavior; recently, defending against worms has received more interest, but the defense against Internet worms is still an open problem. The purpose of this paper is to describe a framework for a multiagent-based system for detecting new worms, auto-generating their signatures, and distributing these signatures. This goal can be achieved through a set of distributed agents residing on computers, routers, and servers. A new worm floods the Internet at very high speed, while the human role in detecting a new worm and generating its signature takes a long time. This gives worms a good chance to flood the whole Internet before any reaction. An autonomous, reliable, adaptive, responsive and proactive system is needed to detect new worms without human intervention; these features characterize agents. A framework for an automated multiagent-based system for worm detection and signature generation, deployed on routers, computers, and servers, is proposed in this paper.

Keywords: Multi Agent-based systems, Worm detection.

1 Introduction

An Internet worm is a self-replicating program that can infect any vulnerable system. A worm floods the Internet and infects servers very quickly, so it represents a very severe threat to the Internet world [1-8]. Since the Morris worm, the first Internet worm, which appeared in 1988 and could bring down the Internet in hours [3], [11], we still have weaknesses in preventing a new worm outbreak. Although worm outbreak spread is well understood and has been studied several times in the past, we still cannot protect ourselves from worms and remain exposed to new worm outbreaks. There are some solutions for worm threats:
To have software free of bugs, but this is impossible. Software bugs increase by time as computer system get more complicated. To patch all software vulnerabilities, but this is impractical. Most of people don’t keep an eye on patch releases.
3.
To use regular anti-virus and intrusion detection systems or to configure routers, and firewall, but this detects only known worms and fail to prevent new worm outbreak. All signature based solutions fail to detect unknown worms. 4. To use anomaly based solution, but it has its drawback, it has high false negative ratio. Anomaly based solution depends on detecting anomaly behavior or symptoms to detect new worm existence. It misses detects normal behavior as worm symptom, which causes a lot of troubles to the network flow. Some Architecture appeared (Distributed Anti Worm) DAW Architecture [3]. This architecture tries to slow down worm spread. Another localized solution appeared (Double Honeypot system) [8], which isolates attack traffic from normal traffic. Recent research concentrates on worm propagation models [3], [9], [10], [12], [14]. The defense against worm attacks is largely an open problem due to the fact that worms are able to spread substantially faster than human can respond. In most cases, the defense against worm attacks can only be done reactively after the damage has already happened. Some trials were made to overcome worm problem, Moore et al has suggested new system that depends on address blacklisting and content filtering. This enables such system to react quickly in matter of minutes and block all internet paths against worm [3], [10]. Also Williamson proposed to modify the network stack so that the rate of connection requests to distinct destinations is bounded [3], [13]. Both solutions are very effective and capable of preventing worm from spread, but their effectiveness depends completely on majority of implementation. They must be implemented on world wide basis and if not they will not be effective enough to detect new worms and prevent its spread. In this paper we describe a framework for multiagent-based system for detecting worms. We are focusing on implementing fully automated system, to be able to make decision without any human intervention. This will accelerate worm detection process. Worm detection related difficulties come from the long time needed to study worm log file and generate its signature. Fully automating worm detection process will enable us to detect new worms faster. This can be achieved only by: integrating different worm detection mechanisms together, monitoring network traffic from different points (machines and routers), inspecting worm activities at different locations, applying different levels of analysis on the packets, and judging worm symptoms through different phases (machines, routers, and servers). All these factors will help us in eliminating the gap
between the speed at which a new worm spreads and the time needed for worm detection and signature generation.

Agent-based systems are ideally suited to implementing such a proactive system because of the features that characterize agents: robustness, autonomy, independence, adaptation, reliability, and proactiveness. Agent programs are of four major types: simple reflex agents, agents that keep track of the world, goal-based agents, and utility-based agents [15], [16].

2 System Design Considerations

The characteristics that best describe worms include the packet delivery failure rate and anomalous changes in the network behavior pattern. Correlation between received and sent packets provides a further measure for analyzing and detecting possible worm infection.

2.1 Packet Delivery Failure Rate

Part of the life cycle of an infected computer is to infect more computers. As a machine gets infected, it starts searching for new machines to infect: it scans the whole network by sending packets to randomly selected network addresses. Most of these addresses do not exist, so when a packet fails to reach its destination the sending machine receives a reply indicating the delivery failure. An infected machine therefore receives a large number of TCP RST packets and ICMP PORT UNREACHABLE messages. Tracing the number of received packets that indicate delivery failure helps detect an infected machine that is scanning the network for other vulnerable machines.
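To make the delivery-failure symptom concrete, the following Python sketch counts failure indications (TCP RST and ICMP port-unreachable replies) inside a sliding time window; the window length, the threshold, and the event representation are illustrative assumptions and not part of the proposed framework.

from collections import deque
import time

class DeliveryFailureMonitor:
    """Count delivery-failure indications (TCP RST, ICMP port unreachable)
    seen within a sliding time window and flag the host as suspicious when
    the count exceeds a threshold."""

    FAILURE_KINDS = {"TCP_RST", "ICMP_PORT_UNREACHABLE"}

    def __init__(self, window_seconds=10.0, threshold=50):
        self.window = window_seconds   # assumed window length
        self.threshold = threshold     # assumed failure-count threshold
        self.events = deque()          # timestamps of failure indications

    def observe(self, kind, timestamp=None):
        """Record one received packet; return True if the host looks suspicious."""
        now = timestamp if timestamp is not None else time.time()
        if kind in self.FAILURE_KINDS:
            self.events.append(now)
        while self.events and now - self.events[0] > self.window:
            self.events.popleft()      # drop events outside the window
        return len(self.events) > self.threshold

# Example: 60 failure replies arriving within one second trip the monitor.
monitor = DeliveryFailureMonitor(window_seconds=10.0, threshold=50)
suspicious = any(monitor.observe("TCP_RST", timestamp=i * 0.01) for i in range(60))
print(suspicious)   # True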
2.2 Network Behavior Pattern Change

An infected machine exhibits anomalous changes in its network behavior. Maintaining a model of the normal network behavior and keeping an eye on the instantaneous behavior gives a good indicator of unacceptable changes in the instantaneous system behavior. If an anomalous change is detected, its cause can be analyzed to determine whether it is a worm infection or just an irregular change in network behavior.

2.3 Correlation Between Sent and Received Packets

In all infected machines there is a correlation between sent and received packets: a machine compromised by a worm keeps sending to other hosts the same packet that it received and that infected it. Detecting such correlation is useful for detecting worm infection activity, and it can be done at the machines and/or at the routers.
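As a hedged illustration of this symptom (not the authors' implementation), the sketch below fingerprints payloads and flags those that were first received and later re-sent many times; the hashing scheme and the repetition threshold are assumptions made for the example.

import hashlib
from collections import Counter

def payload_fingerprint(payload: bytes) -> str:
    """Hash a payload so repeated content can be counted cheaply."""
    return hashlib.sha256(payload).hexdigest()

def correlated_payloads(received, sent, min_repeats=5):
    """Return fingerprints of payloads that were received and then sent at
    least min_repeats times, a sign of worm-style self-replication."""
    seen = {payload_fingerprint(p) for p in received}
    sent_counts = Counter(payload_fingerprint(p) for p in sent)
    return {fp for fp, n in sent_counts.items() if fp in seen and n >= min_repeats}

# Example: one received payload re-sent to many destinations is flagged.
incoming = [b"GET /exploit", b"normal data"]
outgoing = [b"GET /exploit"] * 8 + [b"other traffic"]
print(correlated_payloads(incoming, outgoing))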
3 Multiagent-Based Framework

The proposed multiagent-based framework consists of three components: a computer component, a router component, and a server component.

The computer component monitors network traffic at the computer. It prevents infected packets from going to or coming from that computer and helps detect the activities of a new spreading worm at the computer side. It tries to detect local worm symptoms such as anomalous network behavior, packet delivery failures, and correlation between sent and received packets. It sends suspected packets to the server component to be double-checked, and it sends detected infected packets to the router component, to prevent the worm from spreading further in the network, and to the server component, to generate the worm's signature.

The router component monitors network traffic at the router. It prevents infected packets from being routed in the network and helps detect the activities of a new spreading worm at the router side by trying to detect correlation among routed packets. It sends suspected packets to the server component to be double-checked, and it sends detected infected packets to the computer component, to protect it from the spreading worm, and to the server component, to generate the worm's signature.

The server component receives suspected packets from the computer and router components, which enables it to monitor the whole network. It analyzes the suspected packets to decide whether they are infected, generates worm signatures from the infected packets, and distributes the signatures to the computer and router components. The components of the system are shown in fig. 1.

4 Computer Component System Structure

The structure of the computer component is shown in fig. 2. The infected packets receiver agent receives infected packets from the router component or from another computer component, as well as worm signatures from the server component; all received worm signatures are stored in the worm signature database to be used by the other agents. The received and sent packets capturer agents capture all sent and received packets and pass them to the traffic in and traffic out blocker agents.
Fig. 1 System Components
The blocker agents use the worm signature database to block any known or previously recognized worm. All permitted packets are stored in buffers, and all packets in these buffers are subjected to investigation for different worm symptoms in order to detect new worms, i.e., worms that have not been recognized before and do not exist in the signature database. The investigated symptoms are: anomalous change in network traffic, packet delivery failure, and correlation between sent and received packets.

The traffic pattern model generator agents model the network traffic behavior: the large window model represents the regular traffic pattern, while the short window model represents the immediate traffic. The traffic pattern comparator agent compares these two models to detect anomalous changes in the immediate behavior; a suspected change is passed to the anomaly behavior infection decision agent, which decides whether it is an infection or not. An infected packet recognized by this agent is passed to the categorizer agent, which analyzes it to decide the infection category and takes the proper action toward the infection.

The packet delivery failure suspicion decision agent is responsible for counting the number of received packets that indicate delivery failure. If the count of delivery failure packets increases beyond some value within some time, deeper analysis is applied to the sent packets and the suspected packet is passed to the infection decision agent, which analyzes the packet that caused the increase in delivery failures and decides whether it is infected or not. A packet recognized as infected is passed to the categorizer agent, which analyzes it to determine its type and takes the proper action toward the infection.

Table 1 Computer Component Agents

Traffic In Blocker Agent. Percepts: incoming traffic packets. Actions: filter all incoming packets using the infected packets and worm signature database. Goals: allow only safe packets to be received, and send them to be analyzed again for any vulnerability. Type: simple reflex agent.

Traffic Out Blocker Agent. Percepts: outgoing traffic packets. Actions: filter all outgoing packets using the infected packets and worm signature database. Goals: allow only safe packets to be sent, and send them to be analyzed again for any vulnerability. Type: simple reflex agent.

Large Window Traffic Pattern Model Generator Agent. Percepts: incoming and outgoing packets. Actions: analyze incoming and outgoing packets to generate a traffic pattern that describes the networking behavior over a long period of time. Goals: generate a model describing the common traffic pattern. Type: agent that keeps track of the world.

Small Window Traffic Pattern Model Generator Agent. Percepts: incoming and outgoing packets. Actions: analyze incoming and outgoing packets to generate a traffic pattern that describes the networking behavior over a short period of time. Goals: generate a model describing the current traffic pattern. Type: agent that keeps track of the world.

Traffic Pattern Comparator Agent. Percepts: common and current traffic patterns. Actions: compare the regular pattern with the current pattern. Goals: detect any change in network behavior that may result from worm spread. Type: goal-based agent.

Anomaly System Behavior Symptom Infection Decision Agent. Percepts: the strange network behavior. Actions: analyze the input to determine whether it is an infected packet or not. Goals: decide whether the anomalous behavior is an infection or not. Type: goal-based agent.

Anomaly System Behavior Symptom Categorization Agent. Percepts: the infection network behavior. Actions: analyze the behavior to determine the type of infection. Goals: determine the type of infection. Type: utility-based agent.

Delivery Failure Packets Suspicion Decision Agent. Percepts: received packets. Actions: count the number of packets that failed to be delivered within a window of time. Goals: decide whether there is an anomalous increase in the failed-to-be-delivered packets. Type: agent that keeps track of the world.

Delivery Failure Packet Infection Decision Agent. Percepts: increased failed-to-be-delivered packets. Actions: analyze these packets to check whether a new worm is spreading. Goals: decide whether the packet is infected or not. Type: goal-based agent.

Delivery Failure Packet Categorization Agent. Percepts: infected packet. Actions: analyze the infected packet to decide the type of infection. Goals: decide the type of worm. Type: utility-based agent.

Correlation Symptom Decision Agent. Percepts: received packets. Actions: count the repetition in the sent and received packets. Goals: detect anomalously repeated packets among the sent and received packets. Type: agent that keeps track of the world.

Correlation Infection Decision Agent. Percepts: anomalously repeated packets. Actions: analyze the repeated packets to decide about the suspected packet. Goals: decide whether the packet is infected or not. Type: goal-based agent.

Correlation Categorization Agent. Percepts: infected packet. Actions: analyze the infected packet to decide the type of infection. Goals: decide the type of worm. Type: utility-based agent.
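The blocker agents in Table 1 are simple reflex agents that act only on the current packet and the signature database. The sketch below shows that filtering step; representing a signature as a byte substring is an assumption made for illustration, not a detail taken from the paper.

def permit(packet: bytes, signature_db) -> bool:
    """Simple reflex rule: drop a packet if any known worm signature appears
    in its payload, otherwise let it through to the investigation buffers."""
    return not any(sig in packet for sig in signature_db)

signatures = {b"\x90\x90\x90\x90", b"worm-marker"}    # assumed signature database
buffered = [p for p in (b"hello", b"xxworm-markerxx") if permit(p, signatures)]
print(buffered)    # only b"hello" survives the blocker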
The correlation symptom decision agent tries to detect correlation between sent and received packets. If it detects any suspected correlated packets, it sends them to the infection decision agent, which determines whether they are infected or not. Infected packets are passed to the categorizer agent, which determines the type of infection and takes the proper action toward it. Any infected packets are sent to other computer components and to router components by the infected packets sender agent, and any suspected packets are sent to the server component by the suspected packets sender agent. Table 1 gives a brief description of each agent, with its percepts, actions, goals, and type.

5 Router Component System Structure

The structure of the router component is shown in fig. 3. The infected packets receiver agent receives all infected packets from the computer components and the worm signatures from the server component; they are stored in the worm signature database to be used by the other agents. The routed packets capturer agent captures all routed packets and passes them to the blocker agents. The blocker agent blocks all infected packets and prevents their routing (blocked packets are previously recognized worms). All permitted routed packets are stored in a time-window buffer, and all routed packets in the buffer are subject to worm activity investigation. The correlation suspicion decision agent tries to detect any correlation between the sent and received packets in the buffers; any suspected packet is passed to the infection decision agent. If a packet is recognized as infected, it is passed to the categorizer agent, which determines the type of infection and the proper action to be taken toward it. The infected packets sender agent sends all infected packets to the computer components, and the suspected packets sender agent sends all suspected packets to the server component. Table 2 gives a brief description of each agent, with its percepts, actions, goals, and type.

6 Server Component System Structure

The structure of the server component is shown in fig. 4. The suspicion packets receiver agent receives all suspected packets from the router and computer components; all suspected packets are stored in the suspected packets storage. The infection packets receiver agent receives infected packets from the router and computer components. The infection decision agent analyzes the suspected packets and stores infected packets in the infection packet storage together with the received infected packets. The infection categorizer agent analyzes the infected packets to determine the type of infection, and the signature generator agent generates a worm signature from the infected packets. The worm signature sender agent sends the worm signature to the router and computer components, and the infection packet sender agent sends the infection packets to the router and computer components. Table 3 gives a brief description of each agent, with its percepts, actions, goals, and type.
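The paper does not specify how the signature generator agent derives a signature, so the following sketch is only a hedged stand-in: it extracts the longest common substring of the infected payloads, a naive form of automatic signature extraction.

def longest_common_substring(payloads):
    """Naive longest-common-substring over byte payloads, used here only to
    illustrate automatic signature extraction from infected packets."""
    if not payloads:
        return b""
    base = min(payloads, key=len)
    best = b""
    for i in range(len(base)):
        for j in range(len(base), i + len(best), -1):
            candidate = base[i:j]
            if all(candidate in p for p in payloads):
                best = candidate
                break
    return best

infected = [b"abcWORM-CODE-123xyz", b"qqWORM-CODE-123pp", b"WORM-CODE-123tail"]
print(longest_common_substring(infected))    # b"WORM-CODE-123"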
7 Example

In our proposed agent-based framework for worm detection, we assume that the agents deployed on computers and routers have a signature database of previously recognized worms and use it to identify any previously recognized infected packet. An unknown worm will not be identified by our agents on a router and will be routed to its destination; it will also not be detected by the computer agents and will infect the computer. Identifying unknown worms is carried out through several steps. As a computer gets infected, its behavior changes: it starts searching for new vulnerable computers to infect, and sending infection packets from this machine to other machines makes its network behavior go through an unacceptable change. Investigating the computer's network behavior and comparing the immediate behavior against the regular behavior helps identify this anomalous behavior; upon identifying any anomalous change, deeper analysis is applied to the packets in the received buffer to identify the packet causing the change. While the infected computer tries to infect other computers, it sends packets to random network addresses; most of these packets do not reach a destination, so the infected computer receives a large number of delivery failure packets. By keeping an eye on the number of received delivery failure packets, we can detect that the machine is going through strange behavior, and deeper analysis can then be applied to the received and sent packets to identify the packets that caused the infection. Infected computers also exhibit correlation between sent and received packets; by investigating this correlation we can determine exactly which packet in the sent and received buffers is the infection packet. Agents on routers investigate the packets routed within a window of time, trying to find any correlation between sent and received packets; upon identifying such correlation, deeper analysis is applied to identify the infection packet. If the computer agents and router agents cannot determine whether a packet is an infection packet or not, the packet is sent to the server. The server receives suspected packets from all computers and routers, which gives it a view of a large scope of the network and makes it more capable of separating infection packets from merely suspicious ones. Once an infection packet is identified, it is sent to all other computers and routers so that they update their signature databases; these computers will then not be infected by the worm, and the routers will not route such packets, stopping the worm's spread. The proper action is also taken to disinfect the infected computer.

8 Conclusion
A framework for a multiagent-based worm detection system has been introduced. The framework is simple and
extendable. The proposed framework sets the rules for multiagent-based systems that can be used in a proactive worm detection mechanism. It provides a functional-level description of each agent, including its percept sequence, actions, and goals. The system design also supports enhancements, either by modifying an agent's goals or by integrating a new agent into the system. This paper is considered an initial step in standardizing worm detection architectures based on agent-oriented technology. The system could easily be implemented in distributed systems because of its way of operation, since most of the processing and analysis is applied locally; this of course allows for its deployment in large networks. Guided by the rapid developments in artificial intelligence, and keeping in mind that agents are ideally suited to the implementation of proactive systems, we believe that the system is a step towards a complete multiagent-based system for fighting viruses and worms.

References:
[1] S. Staniford, V. Paxson, and N. Weaver, "How to 0wn the Internet in Your Spare Time," in Proceedings of the 11th USENIX Security Symposium (Security 2002), San Francisco, California, USA, Aug. 2002.
[2] D. Moore, C. Shannon, G. M. Voelker, and S. Savage, "Internet Quarantine: Requirements for Containing Self-Propagating Code," in Proceedings of the 22nd Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM 2003), San Francisco, California, USA, Apr. 2003.
[3] S. Chen and Y. Tang, "Slowing Down Internet Worms," in Proceedings of the 24th International Conference on Distributed Computing Systems (ICDCS 2004), Tokyo, Japan, Mar. 2004.
[4] C. Kruegel and G. Vigna, "Anomaly Detection of Web-based Attacks," in Proceedings of the 10th ACM Conference on Computer and Communication Security (CCS 2003), Washington D.C., USA, ACM Press, Oct. 2003, pp. 251-261.
[5] D. Moore, V. Paxson, S. Savage, C. Shannon, S. Staniford, and N. Weaver, "Inside the Slammer Worm," IEEE Magazine of Security and Privacy, pp. 33-39, July 2003.
[6] C. Cowan, C. Pu, D. Maier, J. Walpole, P. Bakke, S. Beattie, A. Grier, P. Wagle, Q. Zhang, and H. Hinton, "StackGuard: Automatic Adaptive Detection and Prevention of Buffer-Overflow Attacks," in Proceedings of the 7th USENIX Security Conference (Security 1998), San Antonio, Texas, USA, Jan. 1998, pp. 63-78.
[7] M. Eichin and J. Rochlis, "With Microscope and Tweezers: An Analysis of the Internet Virus of November 1988," in Proceedings of the 1989 IEEE Symposium on Security and Privacy, Oakland, California, USA, May 1989, pp. 326-344.
Fig. 2 Computer Component
Fig. 3 Router Component
Fig. 4 Server Component
Table 2 Router Component Agents

Routing Out Blocker Agent. Percepts: all packets routed to the gateway. Actions: filter packets routed to the gateway using the infected packets and worm signature database. Goals: allow only safe packets to be routed. Type: simple reflex agent.

Routing In Blocker Agent. Percepts: all packets routed from the gateway. Actions: filter packets routed from the gateway using the infected packets and worm signature database. Goals: allow only safe packets to be routed. Type: simple reflex agent.

In Re-Routing Agent. Percepts: allowed routed packets. Actions: reroute the packets. Goals: reroute the packets. Type: simple reflex agent.

Out Re-Routing Agent. Percepts: allowed routed packets. Actions: reroute the packets. Goals: reroute the packets. Type: simple reflex agent.

Correlation Suspicion Decision Agent. Percepts: buffered routed packets. Actions: compare the routed in and out packets. Goals: detect any anomalous correlation between routed packets. Type: agent that keeps track of the world.

Correlation Infection Decision Agent. Percepts: correlated packets. Actions: analyze the anomalous packets to decide whether they are infected or not. Goals: detect the infection packet. Type: goal-based agent.

Correlation Categorization Agent. Percepts: infected packets. Actions: analyze the infected packets to determine the type of infection. Goals: determine the type of infection. Type: utility-based agent.

Table 3 Server Component Agents

Infection Decision Agent. Percepts: suspected packets. Actions: analyze suspected packets and detect correlation between suspected packets from different machines. Goals: detect infection packets. Type: goal-based agent.

Infection Categorizer Agent. Percepts: infected packets. Actions: analyze the packet to determine the type of infection. Goals: determine the type of infection. Type: utility-based agent.

Signature Generator Agent. Percepts: infection packet and infection type. Actions: analyze them to generate the signature of the infection. Goals: get the signature of the infection. Type: goal-based agent.
[8] Y. Tang and S. Chen, "Defending against Internet Worms: A Signature-Based Approach," in Proc. of IEEE INFOCOM 2005, Miami, Florida, May 2005.
[9] M. Liljenstam, Y. Yuan, B. Premore, and D. Nicol, "A Mixed Abstraction Level Simulation Model of Large-Scale Internet Worm Infestations," in Proc. of the 10th IEEE/ACM Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), October 2002.
[10] D. Moore, C. Shannon, G. M. Voelker, and S. Savage, "Internet Quarantine: Requirements for Containing Self-Propagating Code," in Proc. of IEEE INFOCOM 2003, March 2003.
[11] J. Rochlis and M. Eichin, "With Microscope and Tweezers: The Worm from MIT's Perspective," Communications of the ACM, 32(6):689-698, June 1989.
[12] S. Staniford, V. Paxson, and N. Weaver, "How to Own the Internet in Your Spare Time," in Proc. of the 11th USENIX Security Symposium, San Francisco, August 2002.
[13] M. M. Williamson, "Throttling Viruses: Restricting Propagation to Defeat Malicious Mobile Code," in Proc. of the Annual Computer Security Applications Conference (ACSAC '02), December 2002.
[14] C. C. Zou, W. Gong, and D. Towsley, "Code Red Worm Propagation Modeling and Analysis," in Proc. of the 9th ACM Conference on Computer and Communications Security (CCS '02), November 2002.
[15] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach, Prentice Hall Inc., New Jersey, USA, 1995.
[16] H. M. Faheem, "A Multiagent-Based Approach for Managing Security Policy," Wireless and Optical Communications Networks, 2005, Second IFIP International Conference, March 2005, pp. 351-356.
Available Bandwidth Based Congestion Avoidance Scheme for TCP: Modeling and Simulation A. O. Oluwatope, G. A. Aderounmu, E. R. Adagunodo, O. O. Abiona, and F. J. Ogwu# Comnetlab, Department of Computer Science and Engineering Obafemi Awolowo University Ile-Ife, Osun-State, 220001, Nigeria email: {aoluwato, oabiona, gaderoun, eradagun}@oauife.edu.ng, #
# Department of Computer Science, University of Botswana, Botswana. email: [email protected]

Abstract - Available bandwidth, the capacity of the tightest link along a data path established by the virtual circuit service paradigm, is proposed here as the basis of an efficient congestion avoidance scheme for TCP to be deployed on a grid system. To achieve our goal, we modified the congestion avoidance segment of the Reno version of TCP, the multiplicative decrease, by replacing it with a scheme based on available bandwidth. The proposed scheme was modeled using finite automata theory and simulated in Matlab 7.0 across a 10-hop data path of 1 Gbps per link. To estimate available bandwidth, the size of the probing packet train was set to 100 packets using a non-intrusive probing method. We observed that our estimation algorithm converged at most at the third iteration, and that at steady state the TCP current window size increased by 3000-4000% at a 95% confidence interval.

I. INTRODUCTION

The transmission control protocol, TCP, has performed credibly well and sustained the Internet since its inception, but it has not been without challenges. Over time, researchers have been working round the clock to provide solutions to these challenges, and these efforts have given rise to variants of TCP, some of which have been deployed on the Internet. TCP was originally designed and optimised for short-lived connections. Short-lived TCP connections are typical of a local area network, LAN, of 100 Mbps capacity, C, and 5 ms delay, D, or of a slow wide area network, WAN, of 2 Mbps and 40 ms. In such networks, at steady state, the TCP congestion control mechanism attains full capacity utilisation, after congestion avoidance, in approximately 120 ms in the worst case. We assume TCP-Reno throughout, except where otherwise stated. Typical long-lived TCP connections, on the other hand, such as those on a grid, have capacities of 1-10 Gbps and delays from 120 ms. The existing control mechanism in TCP uses additive increase multiplicative decrease, AIMD, as its base congestion avoidance algorithm [9]. AIMD recovers (i.e., regains full capacity utilisation) sluggishly from a packet loss in a long-lived TCP connection, making TCP unacceptable to grid applications. This notion of TCP recovery is termed responsiveness, i.e., the measure of how fast the source can get back to using the network link at its full capacity after suffering a packet loss and invariably having halved its current utilisation. To estimate responsiveness, the model (ΔCapacity × Delay) / (8 × 2 × inc_tcp) is used, where inc_tcp is the TCP additive increase, approximately 1500 bytes on Ethernet according to RFC 2581. For example, it will take a typical LAN about 11 round-trip times (rtt) to attain full utilisation from 50 Mbps, whereas a source needs about 2500 rtt, i.e., 5 minutes, to attain full capacity utilisation from 500 Mbps in a worst-case grid, and 50 minutes in a best-case grid. The responsiveness in a grid is therefore poor. Secondly, TCP floods the network with packets in transit. The maximum number of packets in transit (NPT) is an approximate measure of the maximum number of bytes in transit per TCP maximum segment size, MSS (≈ 1500 × 8 bits): NPT_max ≈ (Capacity × Delay) / (8 × MSS). In a typical LAN of 10 Mbps and 2 ms, NPT ≈ 2, which implies that the source has sent at most two packets by the time a duplicate ACK indicating packet loss is received, so router buffers are unlikely to fill up frequently along the data path between source and destination. In a worst-case grid of 1 Gbps and 120 ms, however, NPT = 10000, and in a best-case grid of 10 Gbps, NPT = 100000. If we assume a data path of 11 hops with each router's buffer holding about 200 packets, then by the arrival of the duplicate ACK the routers must have been flooded with about 10000/11 ≈ 909 packets per router. By the time the source receives a signal to reduce its current window size, it is too late to remedy the situation, and the situation is even worse for the best-case grid.

Grids, from the users' view, are computer systems existing within different organisations and management domains, in an ad-hoc network that enables users to share computing resources across multiple organisations. They are characterised by huge file transfers spread over different countries. Further motivated by the fact that the tools and middleware used by grid developers, such as the Globus Toolkit and GridFTP [7], are built on top of TCP, we found TCP inadequate and
therefore incapable of driving grid applications, whether now or in the future.

A. Previous Works

From the discussion above, TCP is not well suited to grid environments deployed over networks with high bandwidth-delay products, BDP. The problem lies in the TCP flow control algorithm, which floods the routers' buffers with excess packets in transit. This problem is well known, and researchers have addressed it by setting buffer sizes manually [1], [2]. Manual buffer size setting requires specialized networking experts using network measurement tools, which may not always be available. Schemes such as Auto-NcFTP [4], Enable [5] and DrsFTP [3] attempted to solve this problem by adjusting buffer sizes dynamically, thereby eliminating the need for a network expert. Drs-FTP sets the buffer size to an estimated bandwidth (i.e., it divides the number of packets sent or received by the rtt). Although interesting, the problem with this approach is how to determine the interval rtt; also, using packets sent or received does not account for packet losses or retransmissions. M. Herbert and P. Vicat-Blanc/Primet [6] proposed another solution to the problem of BDP in grid environments, called network of queues, NoS: a congested queue informs a neighbouring router of its situation so that it delays the sending of packets to its buffer until there is free buffer space (back-pressure). NoS, while solving the problem at hand, introduces another one, additional latency in the system, and there is also the age-old head-of-line (HOL) blocking issue [6] to contend with. In the work of J-P Martin-Flatin and S. Ravot [8], Jacobson's additive increase algorithm [9] was replaced with an algorithm whose main idea is to use binary search to speed up the rate increment until the increment size goes down to one MSS. The ratio of the difference between the last two available bandwidth, Abw, measurements and the link capacity is used to reduce the current window size after a packet loss, after which the binary-search-based algorithm throttles the bandwidth. The issue of concern with this scheme is that it assumes that, when booting a new TCP connection, the value of Abw should be the estimated link capacity; this assumption might be over-ambitious at the start of a new connection and may not be practicable at all times.

B. Contribution and Paper Layout

We propose in this work to address the issue of the large number of packets in transit by employing the available bandwidth information of a data path to optimally set the source's new send rate after a packet loss has occurred. Our contribution lies in the substitution of Jacobson's multiplicative decrease algorithm [9] with the link's available bandwidth information. The rest of the paper is organized as follows: section two presents the modeling of the proposed scheme, section three discusses the simulation scenario and results, and section four concludes the paper and provides future research directions.
II. THE PROPOSED SCHEME DESCRIPTION
Our initial objective is to build a congestion control scheme that preempts the formation of the large queues that require complex management techniques in the routers. This is achieved by performing a link Abw lookup before booting a new TCP connection, and during the congestion avoidance and fast retransmit phases. This scheme has two challenges: first, an Abw estimation algorithm that is reliable and efficient; second, how frequently the lookups should be done during the lifespan of a connection.

A. What is Available Bandwidth, Abw?

Two bandwidth metrics commonly associated with a path P are the link capacity and the available bandwidth. The capacity C of a path is described as

C = min_{i=1..H} C_i    (1)

C models the minimum transmission rate along a path P, where H is the number of hops along the path and C_i is the capacity of the i-th hop (link). The hop with the minimum capacity is the narrow link on the path [10]. Available bandwidth (Abw), on the other hand, is the unused or spare capacity of a link during a certain period of time. If U_i is the utilization of link i over an interval (0, t), the average spare capacity of the link is modeled as C_i(1 − U_i). Thus, the Abw on path P over the same interval is modeled as

A_bw = min_{i=1..H} [C_i(1 − U_i)]    (2)

The model in [10] describes the tight link as the link with the minimum available bandwidth, which then determines the available bandwidth along the path. Figure 1 shows that the first link is the narrow link, C1, while the third is the tight link, A3. The width of each pipe corresponds to the relative capacity of the corresponding link; the shaded area of each pipe shows the utilized part of that link's capacity, while the unshaded area shows the spare capacity. It should be noted that the narrow link may occur on a different link from the tight link, as demonstrated in figure 1, or at other times they may occur on the same link.

Figure 1. Pipe Model with fluid traffic for 4-hop network path
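Equations (1) and (2) translate directly into code. The sketch below assumes the per-hop capacities and utilisations are already known (for example from probing); the numeric values are placeholders chosen so that the narrow link and the tight link fall on different hops, as in figure 1.

def path_capacity(capacities):
    """Equation (1): the path capacity is the capacity of the narrow link."""
    return min(capacities)

def available_bandwidth(capacities, utilisations):
    """Equation (2): the available bandwidth is set by the tight link,
    i.e. the minimum spare capacity C_i * (1 - U_i) over all hops."""
    return min(c * (1.0 - u) for c, u in zip(capacities, utilisations))

caps = [100e6, 1e9, 1e9, 155e6]      # per-hop capacities in bit/s (assumed)
utils = [0.10, 0.05, 0.95, 0.20]     # per-hop utilisations (assumed)
print(path_capacity(caps))                # 100 Mbps: hop 1 is the narrow link
print(available_bandwidth(caps, utils))   # 50 Mbps: hop 3 is the tight link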
B. How is Abw Measured?

The technique used here to measure a link's Abw is a modified version of the Train of Packet Pairs (TOPP) [10]. Suppose that a finite number of TCP packets, n, are sent from the source, with time stamps, to a destination. The destination receives these packets one after the other, acknowledging arrival times t_1, ..., t_n. Our scheme assumes a route that is optimal and fixed. The link's Abw is taken from the first set of m packets that arrived contiguously, without a break, at times t_{1+α}, ..., t_{m+α}, such that n ≥ m. This process is called link probing, and it is repeated for as long as m is not constant. Usually m is rounded down, i.e., A_bw = ⌊m⌋.

Algorithm 1.0
Variables: R(i x j): Boolean; n, m: Integer; t(k: 1,...,n): Real
Output: m; t(k: 1,...,m)
repeat {
    // At the sender:
    Rij ← Routing_matrix();         // compute routing matrix
    n ← Probe_Size;                 // set probe packet size in packets
    t(k: 1,...,n) ← time_stamps();  // set time stamps
    call send_probe(n, Rij);        // probe link with n packets
    // At the receiver:
    call receive_probe(m, Rij);     // catch m packets
    t(k: 1,...,m) ← time_stamps();  // set time stamps
    n ← m;                          // set n ← m if n ≠ m
} until (n − m ≈ 0)
C. How is rtt Estimated?

To estimate the rtt, the delays t_{1+α} − t_1, ..., t_{m+α} − t_m are computed per iteration and, at the end, their mean is taken as the approximate one-way link delay. The estimated rtt is assumed to be twice the one-way delay.
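A small sketch of this estimate: average the per-probe one-way delays and double the mean, assuming a symmetric path; the timestamp lists below are illustrative values for one probing round.

def estimate_rtt(send_times, recv_times):
    """Mean one-way delay over the contiguously received probe packets,
    doubled to give the rtt estimate used by the scheme."""
    delays = [r - s for s, r in zip(send_times, recv_times)]
    one_way = sum(delays) / len(delays)
    return 2.0 * one_way

print(estimate_rtt([0.000, 0.010, 0.020], [0.060, 0.071, 0.079]))   # about 0.12 s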
D. The Proposed Scheme Finite Automata Modeling

The algorithm is modeled as a 5-tuple deterministic finite state automaton (Q, Σ, δ, q0, F) where:
- Q = {q0, qss, qca, qfr/fr, qfi}
- Σ = {A ← dupack, B ← ack, C ← ssthresh, D ← (retr_buff < Cwnd), E ← (retr_buff = 0)}
- the start state is q0
- F = {q0}
- δ: Q × Σ → Q

Figure 2: State Transition of the Proposed Scheme

TABLE 1 TABLE DEFINING STATE TRANSITION

δ        q0      qss     qca     qfr/fr   qfi
A        q0      q0      qfr/fr  qfr/fr   qfr/fr
B        qss     qss     qca     -        qfi
B∧C      -       qca     qfi     -        -
B∧D      -       -       -       qfi      -
B∧E      -       -       -       -        qca
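A hedged sketch of the transition function δ as a Python dictionary, following the table above; treating a blank cell as "stay in the current state" is an assumption, since the table leaves those transitions undefined.

Q0, QSS, QCA, QFR, QFI = "q0", "qss", "qca", "qfr/fr", "qfi"
A, B, BC, BD, BE = "A", "B", "B&C", "B&D", "B&E"

DELTA = {
    (Q0, A): Q0,  (QSS, A): Q0,  (QCA, A): QFR, (QFR, A): QFR, (QFI, A): QFR,
    (Q0, B): QSS, (QSS, B): QSS, (QCA, B): QCA, (QFI, B): QFI,
    (QSS, BC): QCA, (QCA, BC): QFI,
    (QFR, BD): QFI,
    (QFI, BE): QCA,
}

def step(state, symbol):
    """Apply delta; remain in place for cells the table leaves blank (assumed)."""
    return DELTA.get((state, symbol), state)

state = Q0
for symbol in (B, BC, A, BD, BE):    # connection, slow start, loss, recovery...
    state = step(state, symbol)
print(state)    # ends in qca after the recovery sequence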
E. State q0
State q0 models the TCP connection phase, in which a chosen path between a source on the grid and a destination is being established. The path is made up of a set of routers {R_i} with known capacities C_i and utilisations U_i. Algorithm 1.0 is executed to estimate the available bandwidth Abw along this path, and two control variables are set, Cwnd and ssThreshold:

Cwnd = A_bw × γ × rtt    (3)

ssThreshold = min( A_bw , (1/m) Σ_{j=1}^{m} A_bw(n − j) )    (4)
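A small sketch of this initialisation: cwnd follows equation (3) and ssThreshold follows equation (4); the value of γ and the Abw history below are placeholders for illustration.

def init_control_variables(abw_history, rtt, gamma=0.2):
    """Equations (3) and (4): Cwnd from the latest Abw estimate, ssThreshold
    from the smaller of that estimate and the mean of the last m estimates."""
    abw = abw_history[-1]                        # most recent estimate
    cwnd = abw * gamma * rtt                     # equation (3)
    mean_abw = sum(abw_history) / len(abw_history)
    ssthreshold = min(abw, mean_abw)             # equation (4)
    return cwnd, ssthreshold

history_bps = [7.2e8, 6.9e8, 7.5e8]              # assumed Abw estimates (bit/s)
print(init_control_variables(history_bps, rtt=0.120))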
F. State qss
State qss models the phase of the session in which cwnd increases carefully in tune with successfully sent packets (no duplicate acknowledgments). In this state, as long as acks keep arriving as confirmation of received packets, cwnd increases stepwise.

G. State qca
State qca models a situation during a connection session
in which the system send rate has attained a threshold, i.e., Cwnd = ssThreshold. In this state the system re-estimates the prevalent Abw; if it is within the qca tolerance, i.e., ±δ of ssThreshold, the system maintains cwnd as long as acks keep coming. Else, if Abw becomes much greater than ssThreshold, then

Cwnd ← Cwnd + (1/φ)(A_bw − ssThreshold)

Otherwise, set

Cwnd ← Cwnd − (1/ζ)(A_bw − ssThreshold)

When a dupack occurs, the system transits to the fast retransmit and fast recovery state. The essence of adjusting cwnd is to ensure robustness against any eventuality.

H. State qfr/fr
In this state the system re-adjusts the source send rate by setting cwnd and ssThreshold to new values based on equations (3) and (4), and further re-adjusts to Cwnd = ⌊μ × A_bw⌋. Then, as retransmitted packets arrive at the source, cwnd is increased gradually according to (5):

Cwnd_i = (α + μ) × A_bw × γ × rtt    (5)

Otherwise, the system re-adjusts the source send rate downward gradually according to (6):

Cwnd_i = (α − μ) × A_bw × γ × rtt    (6)

With further stability in the source send rate, i.e., lost packets being successfully retransmitted across the network, the system checks the retransmit buffer size; if sizeof[retr_buff] < cwnd, the system transits to the next state, qfi, else it remains in state qfr/fr.

I. State qfi
In this state the system compensates for lost time by sending new packets in addition to retransmitting lost packets. In the event of dupacks, the system transits to qfr/fr; otherwise it checks the retransmit buffer for the condition sizeof[retr_buff] = 0 and, if true, resets the ssThreshold value using equation (4) and transits to qca.
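A hedged sketch of the qca adjustment rule, written exactly as the update equations above state it: hold cwnd inside the ±δ band around ssThreshold, grow it by (Abw − ssThreshold)/φ when Abw is well above the threshold, and apply the ζ-scaled update otherwise. The parameter values mirror those later used in the simulation section.

def qca_adjust(cwnd, abw, ssthreshold, delta=0.25, phi=20.0, zeta=4.0):
    """State-qca rule: keep cwnd within the +/- delta tolerance band,
    otherwise nudge it using (Abw - ssThreshold) scaled by phi or zeta."""
    if abs(abw - ssthreshold) <= delta * ssthreshold:
        return cwnd                                   # within tolerance band
    if abw > ssthreshold:
        return cwnd + (abw - ssthreshold) / phi       # Abw well above threshold
    return cwnd - (abw - ssthreshold) / zeta          # update as stated in the text

print(qca_adjust(cwnd=50.0, abw=90.0, ssthreshold=60.0))   # 51.5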
III. MODEL VALIDATION

A. Simulation Setup
We described the proposed congestion control algorithm in Matlab 7.0 over a simple network path consisting of ten routers of 1 Gbps each, and modeled the routers' instantaneous utilisations as a normal random distribution per rtt. To estimate Abw, probe_size was set to 100 TCP packets, sent at regular intervals. In addition, we set the following parameters: γ = 0.2, δ = 25%, φ = 20, ζ = 4.

B. Results and Discussions
We carried out about 10 simulation runs per simulation setup, first with [α = 0.5 and μ = (0.1, 0.15, 0.2)] and then with [α = 0.75 and μ = (0.05, 0.1, 0.15, 0.2)]. In figures 3 and 4 we plot the mean of cwnd against the run instance, showing the lower and upper mean per run. We also computed the maximum likelihood estimate (MLE) of the means of cwnd and the variance at a 95% confidence interval; the MLE results reveal the scenario that produces the largest cwnd with the least variance. The proposed scheme (ABCC) performed best with α = 0.5 and 0.75 and μ = 0.15. Although the scheme also appears interesting at α = 0.5 and μ = 0.2, at α = 0.75 the variance becomes significantly wide again. The mean cwnd at μ = 0.15 settles to a value between 40 and 50 packets per simulation time-step, from an initial value of less than 10, which suggests that the optimal operating domain of α is in the neighbourhood of 0.75. Figures 5 and 6 show the system behaviour at α = 0.75, μ = 0.15 and at α = 0.5, μ = 0.15, respectively.

Figure 3: Simulation Results of ABCC
Figure 4: Simulation Results of ABCC
Figure 5: ABCC Scheme Response Graph
Quite interestingly, we also observed that our Abw estimation algorithm converges by the end of the second or third iteration. From figure 5, ABCC ensures that cwnd starts from near zero, rises steadily to about 50 (slow start), then continues to rise to about 70 (congestion avoidance) and oscillates around 70 thereafter when the upper bound is considered. Our future effort is directed towards curtailing the oscillation width considerably.
Figure 6: ABCC Scheme Response Graph
IV. CONCLUSION AND FURTHER WORK
In networks characterised by a high bandwidth-delay product, BDP, TCP is highly sensitive to packet loss, mainly as a result of misinterpreting transient link errors as congestion, thereby introducing inefficiency into the network performance. TCP inefficiency is also traceable to the complex queue management schemes employed in some variants such as TCP-RED [12] and TCP-REM [11]. In this paper we have shown from our model that it is possible to build a congestion control scheme for long-lived TCP connections in grids, using the link available bandwidth as a quality-of-service guarantee and as a means to preempt the formation of complex queues. Our scheme also shows that it is possible to increase the aggregate number of packets sent per rtt by over 1000%. The limitations of our simulation model include the inability to include an rtt estimation model and a detailed protocol implementation; another challenge is the large variance observed in the response graphs of figures 5 and 6. In future we want to implement the detailed protocol in ns-2 [13] so that we can systematically investigate the performance of our newly proposed scheme and benchmark it against selected similar protocols. We will also investigate fairness and convergence efficiency when the scheme is subjected to multiple long-lived TCP connections.
REFERENCES
[1] Pittsburgh Supercomputing Center, Enabling High-Performance Data Transfers on Hosts, http://www.psc.edu/networking/perf_tune.html.
[2] B. Tierney, "TCP Tuning Guide for Distributed Applications on Wide-Area Networks," in USENIX SAGE Login, http://www-didc.lbl.gov/tcp-wan.html, 2001.
[3] A. Engelhart, M. K. Gardner, and W.-C. Feng, "Re-Architecting Flow-Control Adaptation for Grid Environments," IEEE International Parallel and Distributed Processing Symposium (IPDPS 2004), Santa Fe, New Mexico, USA, April 2004.
[4] J. Liu and J. Ferguson, "Automatic TCP Socket Buffer Tuning," in Proceedings of SC2000: High-Performance Networking and Computing Conference (research gem), http://dast.nlanr.net/Projects/Autobuf, November 2000.
[5] B. Tierney, D. Gunter, J. Lee, and M. Stoufer, "Enabling Network-Aware Applications," in Proceedings of the IEEE International Symposium on High-Performance Distributed Computing, August 2001.
[6] M. Herbert and P. Vicat-Blanc/Primet, "A Case for Queue-to-Queue, Back-Pressure-Based Congestion Control for Grid Networks," Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, PDPTA '04, Las Vegas, Nevada, USA, 2004, pp. 769-774.
[7] Globus Project, The GridFTP Protocol and Software, http://www.globus.org/datagrid/gridftp.html.
[8] J-P. Martin-Flatin and S. Ravot, "TCP Congestion Control in Fast Long-Distance Networks," Technical Report CALT-68-2398, California Institute of Technology, USA, http://netlab.caltech.edu/FAST/publications/caltech-tr68-2398.pdf, July 2002.
[9] V. Jacobson and M. J. Karels, "Congestion Avoidance and Control," ACM Computer Communication Review, 18(4), 1988, pp. 314-329.
[10] R. S. Prasad, M. Murray, C. Dovrolis, and K. C. Claffy, "Bandwidth Estimation: Metrics, Measurement Techniques and Tools," IEEE Network, 17(6), 2003.
[11] S. Athuraliya, V. H. Li, S. H. Low, and Q. Yin, "REM: Active Queue Management," IEEE Network, 2001.
[12] S. Floyd and V. Jacobson, "Random Early Detection Gateways for Congestion Avoidance," IEEE/ACM Transactions on Networking, 1(4), August 1993, pp. 397-413.
[13] K. Fall and K. Varadhan, The ns Manual, The VINT Project, UC Berkeley, LBL, USC/ISI and Xerox PARC, http://www.isi.edu/nsnam/ns/nsdocumentation.html, July 2005.
On the Modeling and Control of the Cartesian Parallel Manipulator Ayssam Y. Elkadya, Sarwat N. Hannab and Galal A. Elkobrosyb (a) Dept. of computer science and engineering, faculty of engineering, University of Bridgeport, USA. (b) Dept. of engineering mathematics and physics, faculty of engineering, Alexandria University, Egypt.
Abstract-The Cartesian Parallel Manipulator (CPM), which was proposed by Han Sung Kim and Lung-Wen Tsai [1], consists of a moving platform that is connected to a fixed base by three limbs. Each limb is made up of one prismatic and three revolute joints, and all joint axes are parallel to one another. In this way, each limb provides two rotational constraints to the moving platform, and the combined effects of the three limbs lead to an over-constrained mechanism with three translational degrees of freedom. The manipulator behaves like a conventional X-Y-Z Cartesian machine due to the orthogonal arrangement of the three limbs. In this paper, the dynamics of the CPM is presented using the Lagrangian multiplier approach to give a more complete characterization of the model dynamics. The dynamic equation of the CPM has a form similar to that of a serial manipulator, so the vast control literature developed for serial manipulators can easily be extended to this class of manipulators. Based on this approach, four control algorithms are formulated: simple PD control with reference position and velocity only, PD control with gravity compensation, PD control with full dynamic feedforward terms, and computed torque control. Simulations are then performed using Matlab and Simulink to evaluate the performance of the four control algorithms.
I. INTRODUCTION Parallel manipulators are robotic devices that differ from the more traditional serial robotic manipulators by virtue of their kinematic structure. Parallel manipulators are composed of multiple closed kinematic loops. Typically, these kinematic loops are formed by two or more kinematic chains that connect a moving platform to a base. This kinematic structure allows parallel manipulators to be driven by actuators positioned on or near the base of the manipulator. In contrast, serial manipulators do not have closed kinematic loops and are usually actuated at each joint along the serial linkage. Accordingly, the actuators that are located at each joint along the serial linkage can account for a significant portion of the loading experienced by the manipulator. This allows the parallel manipulator links to be made lighter than the links of an analogous serial manipulator.
Fig. 1: Assembly drawing of the CPM.
Hence, parallel manipulators can enjoy the potential benefits associated with light weight construction such as high-speed operation [2]. Han Sung Kim and Lung-Wen Tsai [1] presented a parallel manipulator called CPM (figure 1) that employs only revolute and prismatic joints to achieve translational motion of the moving platform. They described its kinematic architecture and discussed two actuation methods. For the rotary actuation method, the inverse kinematics provides two solutions per limb, and the forward kinematics leads to an eighth-degree polynomial. Also, the rotary actuation method results in many singular points within the workspace. On the other hand, for the linear actuation method, there exists a one-to-one correspondence between the input and output displacements of the manipulator. Also, they discussed the effect of misalignment of the linear actuators on the motion of the moving platform. They suggested a method to maximize the stiffness to minimize the deflection at the joints caused by the bending moment because each limb
structure is exposed to a bending moment induced by the external force exerted on the end-effector.

In this paper, using the Lagrange formulation, we develop the dynamic equation of the CPM. Based on the dynamic model, we reformulate four basic control algorithms: simple PD control with reference position and velocity only, PD control with gravity compensation, PD control with full dynamic feedforward terms, and computed torque control. Simulation is then used to evaluate the performance of each control algorithm. The paper is organized as follows. In Section II, the kinematic relations are developed. In Section III, we use Lagrange's equations of motion to derive the dynamic equation of the CPM. Four control algorithms are reviewed in Section IV. Simulation is described in Section V, simulation results are presented in Section VI, and concluding remarks follow in Section VII.

II. PROBLEM FORMULATION

The kinematic structure of the CPM is shown in figure 2, where a moving platform is connected to a fixed base by three PRRR (prismatic-revolute-revolute-revolute) limbs. The origin of the fixed coordinate frame is located at point O, and a reference frame XYZ is attached to the fixed base at this point. The moving platform is symbolically represented by a square of side length 2L defined by B1, B2, and B3, and the fixed base is defined by three guide rods passing through A1, A2, and A3. The three revolute joint axes in each limb are located at points Ai, Mi, and Bi and are parallel to the ground-connected prismatic joint axis. The first prismatic joint axis lies on the X-axis, the second prismatic joint axis lies on the Y-axis, and the third prismatic joint axis is parallel to the Z-axis. Point P represents the center of the moving platform. The link lengths are L1 and L2. The starting point of a prismatic joint is defined by d0i and the sliding distance is defined by di − d0i. The schematic diagrams of the three limbs of the CPM are sketched in figures 3-a, 3-b and 3-c.

Fig. 2: Spatial 3-PRRR parallel manipulator
Fig. 3.a: Description of the joint angles and link lengths for the first limb.
Fig. 3.b: Description of the joint angles and link lengths for the second limb.
Fig. 3.c: Description of the joint angles and link lengths for the third limb.

The relationships for the three limbs are written for the position P[x, y, z] in the coordinate frame XYZ; differentiating them with respect to time yields equation 1:

[\dot{\theta}_{11}, \dot{\theta}_{21}, \dot{\theta}_{31}]^T = \Gamma\, [\dot{x}, \dot{y}, \dot{z}]^T    (1)

where

\Gamma = -\begin{bmatrix}
0 & \frac{y - L_1\cos\theta_{11} - L}{((y-L)\sin\theta_{11} - z\cos\theta_{11})L_1} & \frac{z - L_1\sin\theta_{11}}{((y-L)\sin\theta_{11} - z\cos\theta_{11})L_1} \\
\frac{x - L_1\sin\theta_{21} - L}{L_1(z\sin\theta_{21} + (L-x)\cos\theta_{21})} & 0 & \frac{z - L_1\cos\theta_{21}}{L_1(z\sin\theta_{21} + (L-x)\cos\theta_{21})} \\
\frac{x - L_1\cos\theta_{31}}{(x\sin\theta_{31} + \cos\theta_{31}(y-D+L))L_1} & \frac{y - D + L_1\sin\theta_{31} + L}{(x\sin\theta_{31} + \cos\theta_{31}(y-D+L))L_1} & 0
\end{bmatrix}

Here \dot{\theta}_{11}, \dot{\theta}_{21} and \dot{\theta}_{31} are the time derivatives of θ11, θ21 and θ31, and \dot{x}, \dot{y} and \dot{z} are the x, y and z components of the velocity of point P on the moving platform in the reference frame. Differentiating equation 1 with respect to time,

[\ddot{\theta}_{11}, \ddot{\theta}_{21}, \ddot{\theta}_{31}]^T = \Gamma\, [\ddot{x}, \ddot{y}, \ddot{z}]^T + \frac{d\Gamma}{dt}\, [\dot{x}, \dot{y}, \dot{z}]^T    (2)

III. DYNAMICS OF THE CLOSED-CHAIN MECHANISM

The Lagrange formulation [4] is used to find the actuator forces required to generate a desired trajectory of the manipulator. In general, the Lagrange multiplier approach involves solving the following system of equations:

\frac{d}{dt}\left(\frac{\partial L}{\partial \dot{q}_j}\right) - \frac{\partial L}{\partial q_j} = Q_j + \sum_{i=1}^{k} \lambda_i \frac{\partial f_i}{\partial q_j}    (3)

for j = 1 to n, where j is the generalized coordinate index, n is the number of generalized coordinates, i is the constraint index, qj is the j-th generalized coordinate, k is the number of constraint functions, L is the Lagrange function L = T − V, T is the total kinetic energy of the manipulator, V is the total potential energy of the manipulator, fi is a constraint equation, Qj is a generalized external force, and λi is the Lagrange multiplier.

Theoretically, the dynamic analysis could be accomplished using just three generalized coordinates, since this is a 3-DOF manipulator; however, this would lead to a cumbersome expression for the Lagrange function, due to the complex kinematics of the manipulator. So we choose three redundant coordinates, θ11, θ21, and θ31, beside the generalized coordinates x, y, and z. Thus we have θ11, θ21, θ31, x, y, and z as the generalized coordinates. Equation 3 then represents a system of six equations in six variables, where the six variables are λi for i = 1, 2, and 3, and the three actuator forces Qj for j = 4, 5, and 6. The external generalized forces Qj for j = 1, 2, and 3 are zero since the revolute joints are passive. This formulation requires three constraint equations, fi for i = 1, 2, and 3, written in terms of the generalized coordinates.

It can be assumed that the first link of each limb is a uniform rod of mass m1. The mass m2 of the second rod of each limb is evenly divided between and concentrated at joints Mi and Bi, so the two particles Mi and Bi have the same mass, 0.5 m2. This assumption can be made without significantly compromising the accuracy of the model, since the concentrated-mass model of the connecting rods does capture some of the dynamics of the rods. The total kinetic energy of the manipulator T is given by:

T = \frac{1}{2}(m_1 + 2m_2 + m_3 + m_4)(\dot{x}^2 + \dot{y}^2 + \dot{z}^2) + \left(\frac{m_1}{6} + \frac{m_2}{4}\right) L_1^2 (\dot{\theta}_{11}^2 + \dot{\theta}_{21}^2 + \dot{\theta}_{31}^2)    (4)

where m4 is the mass of the tool and m3 is the mass of the prismatic joint Ai and its actuator. The total potential energy V of the manipulator, calculated relative to the plane of the stationary platform of the manipulator, is found to be:

V = -\frac{m_1 + m_2}{2}\, g L_1 (\sin\theta_{11} + \cos\theta_{21}) - (m_1 + 2m_2 + m_3 + m_4)\, g z    (5)

The Lagrange function is L = T − V, i.e.,

L = A(\dot{x}^2 + \dot{y}^2 + \dot{z}^2) + B(\dot{\theta}_{11}^2 + \dot{\theta}_{21}^2 + \dot{\theta}_{31}^2) + C(\sin\theta_{11} + \cos\theta_{21}) + Ez    (6)

where

A = \frac{1}{2}(m_1 + 2m_2 + m_3 + m_4), \quad B = \left(\frac{m_1}{6} + \frac{m_2}{4}\right) L_1^2, \quad C = \frac{m_1 + m_2}{2}\, g L_1, \quad E = (m_1 + 2m_2 + m_3 + m_4)\, g

Taking the derivatives of the Lagrange function with respect to the three generalized coordinates θ11, θ21, and θ31, we obtain

2B\ddot{\theta}_{11} - C\cos\theta_{11} = \lambda_1    (7)
2B\ddot{\theta}_{21} + C\sin\theta_{21} = \lambda_2    (8)
2B\ddot{\theta}_{31} = \lambda_3    (9)

Rearranging equations 7, 8, and 9 and substituting into equation 2 yields

[\lambda_1, \lambda_2, \lambda_3]^T = 2B\Gamma\, [\ddot{x}, \ddot{y}, \ddot{z}]^T + 2B\frac{d\Gamma}{dt}\, [\dot{x}, \dot{y}, \dot{z}]^T + C\, [-\cos\theta_{11}, \sin\theta_{21}, 0]^T    (10)

Taking the derivatives of the Lagrange function with respect to the three generalized coordinates x, y, and z, we obtain

2A\ddot{x} = F_x - \Gamma_{11}\lambda_1 - \Gamma_{21}\lambda_2 - \Gamma_{31}\lambda_3    (11)
2A\ddot{y} = F_y - \Gamma_{12}\lambda_1 - \Gamma_{22}\lambda_2 - \Gamma_{32}\lambda_3    (12)
2A\ddot{z} - E = F_z - \Gamma_{13}\lambda_1 - \Gamma_{23}\lambda_2 - \Gamma_{33}\lambda_3    (13)

where Fx, Fy and Fz are the forces applied by the actuators of the first, second and third limbs, respectively, and Γij is the (i, j) element of the Γ matrix. Rearranging equations 11, 12 and 13 and using equation 10 yields

[F_x, F_y, F_z]^T = -[0, 0, E]^T + \Gamma^T C\, [-\cos\theta_{11}, \sin\theta_{21}, 0]^T + (2AI + \Gamma^T 2B\,\Gamma)\, [\ddot{x}, \ddot{y}, \ddot{z}]^T + \Gamma^T 2B\,\frac{d\Gamma}{dt}\, [\dot{x}, \dot{y}, \dot{z}]^T

The dynamic equation of the whole system can therefore be written as

F = M(q)\ddot{q} + G(q, \dot{q})\dot{q} + K(q)    (14)

where F = [F_x, F_y, F_z]^T, q = [x, y, z]^T, \dot{q} = [\dot{x}, \dot{y}, \dot{z}]^T, \ddot{q} = [\ddot{x}, \ddot{y}, \ddot{z}]^T,

M(q) = 2AI + \Gamma^T 2B\,\Gamma, \quad G(q, \dot{q}) = \Gamma^T 2B\,\frac{d\Gamma}{dt}, \quad K(q) = -[0, 0, E]^T + \Gamma^T C\, [-\cos\theta_{11}, \sin\theta_{21}, 0]^T

Here q is the vector of joint displacements, \dot{q} is the vector of joint velocities, F is the vector of applied force inputs, M(q) is the manipulator inertia matrix, G(q, \dot{q}) is the manipulator centripetal and Coriolis matrix, and K(q) is the vector of gravitational forces.
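To make equation (14) concrete, the sketch below evaluates F = M(q)q̈ + G(q, q̇)q̇ + K(q) numerically with NumPy. The Γ matrix and its time derivative are passed in as precomputed 3×3 arrays, because their closed forms depend on the joint angles, and the mass and length values are placeholders rather than parameters from the paper.

import numpy as np

def cpm_dynamics(qdd, qd, theta11, theta21, Gamma, dGamma_dt,
                 m1=1.0, m2=0.5, m3=2.0, m4=0.3, L1=0.25, g=9.81):
    """Evaluate F = M(q)*qdd + G(q, qd)*qd + K(q) for the CPM (equation 14)."""
    A = 0.5 * (m1 + 2 * m2 + m3 + m4)
    B = (m1 / 6.0 + m2 / 4.0) * L1 ** 2
    C = 0.5 * (m1 + m2) * g * L1
    E = (m1 + 2 * m2 + m3 + m4) * g
    M = 2 * A * np.eye(3) + Gamma.T @ (2 * B * Gamma)     # inertia matrix
    G = Gamma.T @ (2 * B * dGamma_dt)                     # centripetal/Coriolis matrix
    K = np.array([0.0, 0.0, -E]) + Gamma.T @ (
        C * np.array([-np.cos(theta11), np.sin(theta21), 0.0]))
    return M @ qdd + G @ qd + K

# Illustrative call with placeholder kinematic matrices.
F = cpm_dynamics(qdd=np.array([0.1, 0.0, 0.0]), qd=np.zeros(3),
                 theta11=0.3, theta21=0.4,
                 Gamma=np.eye(3), dGamma_dt=np.zeros((3, 3)))
print(F)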
IV. CONTROLLER DESIGN

In this section we review four basic control algorithms for control of the CPM.

A. PD Control with Position and Velocity Reference

The joint position error and the joint rate error of the closed-chain system are used to compute the joint control force F:

F = K_P e + K_D \dot{e}    (15)

where e = q_d − q is the vector of position errors of the individual actuated joints and \dot{e} = \dot{q}_d − \dot{q} is the vector of velocity errors, q_d and \dot{q}_d are the desired joint positions and velocities, and K_P and K_D are 3×3 diagonal matrices of position and velocity gains. Although this type of controller is suitable for real-time control, since it requires very few computations compared with the complicated nonlinear dynamic equations, it has a few downsides: using a local PD feedback law at each joint independently does not consider the coupling of dynamics between robot links, and as a result this controller can cause the motors to overwork compared with the other controllers presented next.
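For illustration, the sketch below implements the PD law of equation (15) alongside the computed-torque law of equation (18) that appears later in this section, reusing the dynamics terms M, G and K defined above; the gain values are placeholders chosen to satisfy the critical-damping relation.

import numpy as np

def pd_control(q, qd, q_des, qd_des, Kp, Kd):
    """Equation (15): F = Kp*e + Kd*edot with e = q_des - q."""
    e = q_des - q
    edot = qd_des - qd
    return Kp @ e + Kd @ edot

def computed_torque_control(q, qd, q_des, qd_des, qdd_des, M, G, K, Kp, Kd):
    """Equation (18): F = M(q)[qdd_des + Kd*edot + Kp*e] + G(q, qd)*qd + K(q)."""
    e = q_des - q
    edot = qd_des - qd
    return M @ (qdd_des + Kd @ edot + Kp @ e) + G @ qd + K

wn = 10.0                          # assumed natural frequency
Kp = (wn ** 2) * np.eye(3)         # equation (20): wn = sqrt(Kp)
Kd = 2.0 * np.sqrt(Kp)             # critical damping, Kd = 2*sqrt(Kp)

q = np.zeros(3); qd = np.zeros(3)
q_des = np.array([0.6, 0.425, 0.3]); qd_des = np.zeros(3)
print(pd_control(q, qd, q_des, qd_des, Kp, Kd))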
Consider the case when a constant equilibrium posture is assigned for the system as the reference input vector q_d. It is desired to find the structure of the controller which ensures global asymptotic stability of the above posture. The control law F is given by:
F = K_P e + K_D \dot{e} + K(q_d)    (16)
It has been shown [5] that the system is asymptotically stable, but this has only been proven for constant reference trajectories. With time-varying desired trajectories, this type of controller cannot guarantee perfect tracking performance. Hence, more dynamic modeling information needs to be incorporated into the controller.

C. PD Control with Full Dynamics Feedforward Terms
This type of controller augments the basic PD controller by compensating for the manipulator dynamics in a feedforward way. It assumes full knowledge of the robot parameters. The key idea behind this type of controller is that if the dynamic model is correct, the force generated by the controller will also be correct. The controller has the form
F = M(q_d)\ddot{q}_d + G(q_d, \dot{q}_d)\dot{q}_d + K(q_d) + K_P e + K_D \dot{e}    (17)
If the dynamic knowledge of the manipulator is accurate, and the position and velocity error terms are initially zero, the applied force F is sufficient to maintain zero tracking error during motion.

D. Computed Torque Control
This controller uses a model of the manipulator dynamics to estimate the actuator forces that will result in the desired trajectory. Since this type of controller takes into account the nonlinear and coupled nature of the manipulator, its potential performance is quite good. The disadvantage of this approach is that it requires a reasonably accurate and computationally efficient model of the inverse dynamics of the manipulator in order to function as a real-time controller. The controller computes the dynamics online, using the sampled joint position and velocity data. The key idea is to find an input vector F, using the following force law as described by Lewis [3], which is capable of realizing an input/output relationship of linear type. The goal is not a local linearization but a global linearization of the system dynamics, obtained by means of a nonlinear state feedback:
F = M(q)\left[\ddot{q}_d + K_D \dot{e} + K_P e\right] + G(q, \dot{q})\dot{q} + K(q)    (18)
To show that the computed torque control scheme linearizes the controlled system, the force computed by equation 18 is substituted into equation 14, yielding:
M(q)\ddot{q} = M(q)\ddot{q}_d + M(q)\left[K_D \dot{e} + K_P e\right]
Multiplying each term by M^{-1}(q) and substituting the relationship \ddot{e} = \ddot{q}_d − \ddot{q} provides the following linear relationship for the error:
\ddot{e} + K_D \dot{e} + K_P e = 0    (19)
This relationship can be used to select the gains so as to obtain the desired closed-loop error response, since the solution of equation 19 is that of a second-order damped system with natural frequency ω_n and damping ratio ζ, where
\omega_n = \sqrt{K_P}, \quad \zeta = \frac{K_D}{2\sqrt{K_P}}    (20)
The natural frequency ωn determines the speed of the response. It is customary in robot applications to take the
damping ratio ζ = 1 so that the response is critically damped; this produces the fastest non-oscillatory response. So, the values of the gain matrices K_D and K_P are determined by maintaining the relationship
K_D = 2\sqrt{K_P}    (21)

V. SIMULATION

In controlling the manipulator, any sudden change in the desired joint angle, velocity, or acceleration can result in a sudden change of the commanded force, which can damage the motors and the manipulator. Here, the manipulator is given the task of moving along a carefully preplanned trajectory without any external disturbances or interaction with the environment. The sample trajectory of the end-effector is chosen to be a circular path (see figure 4) with a radius of 0.175 meters centered at O(0.425, 0.425, 0.3). The path is designed to be completed in 4 seconds, when the end-effector reaches the starting point P1(0.6, 0.425, 0.3) again, with constant angular velocity ω = 0.5π rad/s. The desired end-effector position is x = 0.425 + 0.175cos(ωt) m along the x-axis, y = 0.425 + 0.175sin(ωt) m along the y-axis, and z = 0.3 m (constant) along the z-axis, where t is the time in seconds. The performance of each control method is evaluated by comparing the tracking accuracy of the end-effector, measured by the Root Square Mean Error (RSME). The end-effector error is defined as
E_{xyz} = \sqrt{e_x^2 + e_y^2 + e_z^2}    (22)
where e_x, e_y, and e_z are the position errors along the x-, y-, and z-axes given in the manipulator's workspace coordinates, and
RSME = \sqrt{\frac{\sum E_{xyz}^2}{n}}    (23)
where n is the number of samples. The simulation is used to find the set of proportional gains K_P and derivative gains K_D that minimizes the RSME. It must be taken into account that the actuators cannot generate forces larger than 120 Newtons. The values of the physical kinematic and dynamic parameters of the CPM are given in tables 1 and 2.
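To make the simulation setup concrete, the following Python sketch generates the desired circular trajectory and evaluates the RSME of equations 22 and 23 from logged end-effector positions. It is an illustrative reconstruction, not the authors' simulation code; the sample period dt and the placeholder "actual" trajectory are assumptions introduced only for the example.

import numpy as np

omega = 0.5 * np.pi                  # constant angular velocity, rad/s
dt = 0.001                           # sample period in seconds (assumed)
t = np.arange(0.0, 4.0, dt)          # the path is completed in 4 seconds

# Desired end-effector position (meters), as specified in Section V.
x_d = 0.425 + 0.175 * np.cos(omega * t)
y_d = 0.425 + 0.175 * np.sin(omega * t)
z_d = np.full_like(t, 0.3)
p_desired = np.column_stack([x_d, y_d, z_d])

def rsme(p_actual, p_desired):
    # Equation 22: Euclidean position error of the end-effector at each sample.
    e_xyz = np.linalg.norm(p_actual - p_desired, axis=1)
    # Equation 23: root of the mean squared error over the n samples.
    return np.sqrt(np.mean(e_xyz ** 2))

# p_actual would come from integrating the closed-loop dynamics (equation 14)
# under one of the four controllers; a perfect-tracking placeholder is used here.
p_actual = p_desired.copy()
print(rsme(p_actual, p_desired))     # prints 0.0 for perfect tracking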
Fig. 4: End-effector path for the circular trajectory.

TABLE 1
KINEMATIC PARAMETERS OF THE CPM
Parameters:  L (m) = 0.105,  L1 (m) = 0.5,  L2 (m) = 0.373,  D (m) = 0.9144

TABLE 2
DYNAMIC PARAMETERS OF THE CPM
Parameters:  m1 (kg) = 1.892994,  m2 (kg) = 0.695528,  m3 (kg) = 0.2,  m4 (kg) = 0.3

VI. SIMULATION RESULTS

In this section, some results are presented for the four control algorithms implemented on the CPM. The simulation results are summarized in table 3.

A. PD Control with Position and Velocity Reference

It was required that the robot achieve the desired trajectory with a position error of less than 3x10^-3 m after 0.3 seconds. Although this controller is easy to implement and no knowledge of the system is needed to develop it, its tracking ability is very poor (especially along the z-axis, because of the weight of the limbs) compared to the rest of the controllers used in this paper. The position and velocity errors of the end-effector obtained from this controller are shown in figures 5 and 6. To improve the performance, the proportional gain KP would have to be increased, but this is impossible because of the limitations of the actuators.

B. PD Control with Gravity Compensation

It was required that the robot achieve the desired trajectory with a position error of less than 3x10^-4 m after 0.3 seconds. The implementation of the PD controller with gravity compensation requires partial dynamic modeling information to be incorporated into the controller. The simulation results show a significant improvement in tracking ability over the simple PD controller (see figures 7 and 8).

C. PD Control with Full Dynamics Feedforward Terms

It was required that the robot achieve the desired trajectory with a position error of less than 10^-5 m after 0.3 seconds. Model-based controllers such as this one and the computed torque controller can generate force commands more intelligently and accurately than simple non-model-based controllers. The position errors are approximately zero after 0.4 seconds, while the velocity errors are approximately zero after 0.3 seconds (see figures 9 and 10).

D. Computed Torque Control

The initial conditions of the error and its derivative for our sample end-effector trajectory are e(0) = [0 0 0]^T and \dot{e}(0) = [0\ \dot{e}_0\ 0]^T, so the solution of equation 19 is
e = \dot{e}_0\, t\, e^{-0.5 K_D t}    (24)
Equation 24 suggests that the derivative gain K_D should be as
large as possible to achieve the desired critical damping, but the actuator force cannot exceed 120 Newtons. According to equation 24, the position errors along the x- and z-axes are zero because the initial velocity errors along the x- and z-axes are zero. After 0.2 seconds, the position and velocity errors are approximately zero (see figures 11 and 12). The simulation results show that the computed torque controller gives the best performance. This is a result of the computed torques canceling the nonlinear components of the controlled system.

TABLE 3
THE PERFORMANCE OF VARIOUS CONTROLLERS
Controller                                        KP        KD    Position RSME   Velocity RSME
PD Control with Position and Velocity Reference   12691     436   2.7x10^-3       0.0223
PD Control with Gravity Compensation              8507      436   3.4804x10^-4    0.021
PD Control with Full Dynamics Feedforward         7053      436   3.0256x10^-4    0.0182
Computed Torque Control                           2550.25   101   2.3469x10^-4    0.0161
Fig. 5: Position error of the end-effector obtained from the Simple PD Controller.
Fig. 6: Velocity error of the end-effector obtained from the Simple PD Controller.
Fig. 7: Position error of the end-effector obtained from the PD Controller with Gravity Compensation.
Fig. 8: Velocity error of the end-effector obtained from the PD Controller with Gravity Compensation.
Fig. 9: Position error of the end-effector obtained from the PD Controller with Full Dynamics Feedforward terms within the first 0.4 seconds.
Fig. 11: Position error of the end-effector obtained from the Computed Torque Controller within the first 0.25 seconds.
Fig. 12: Velocity error of the end-effector obtained from the Computed Torque Controller within the first 0.25 seconds.

VII. CONCLUSION

In this paper, using the Lagrangian multiplier approach, a model of the dynamics of the manipulator is developed which has a form similar to that of a serial manipulator. We have then presented four control algorithms for the CPM. The performance of these controllers is studied and compared. As expected, complete mathematical modeling knowledge is needed to give the controller full advantage in motion control. The model-based control schemes perform better than the non-model-based controllers. Hence, studying the dynamics of robot manipulators, as well as having a good understanding of the various basic motion control theories, is important in designing and controlling the motion of the robot to achieve the highest quality and quantity of work.

ACKNOWLEDGMENT

The authors would like to thank Han Sung Kim for his valuable suggestions during this work.

REFERENCES

[1] H. S. Kim and L. Tsai, "Design optimization of a Cartesian parallel manipulator", in Proceedings of DETC'02, ASME 2002 Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Montreal, Canada, September 29 - October 2, 2002.
[2] R. E. Stamper, "A Three Degree of Freedom Parallel Manipulator with Only Translational Degrees of Freedom", Ph.D. dissertation, University of Maryland at College Park, 1997.
[3] F. Lewis, C. Abdallah, and D. Dawson, "Control of Robot Manipulators", MacMillan Publishing Company, 1993.
[4] L. W. Tsai, "Robot Analysis: The Mechanics of Serial and Parallel Manipulators", John Wiley & Sons, 1999.
[5] M. W. Spong, "Motion Control of Robot Manipulators", University of Illinois at Urbana-Champaign, 1996.
Resource Allocation in Market-Based Grids Using a History-Based Pricing Mechanism Behnaz Pourebrahimi, S. Arash Ostadzadeh, and Koen Bertels Computer Engineering Laboratory, Delft University of Technology Delft, The Netherlands {behnaz, arash, koen}@ce.et.tudelft.nl
Abstract-In an ad-hoc Grid environment where producers and consumers compete for providing and employing resources, trade handling in a fair and stable way is a challenging task. Dynamic changes in the availability of resources over time makes the treatment yet more complicated. Here we employ a continuous double auction protocol as an economic-based approach to allocate idle processing resources among the demanding nodes. Consumers and producers determine their bid and ask prices using a sophisticated history-based dynamic pricing strategy and the auctioneer follows a discriminatory pricing policy which sets the transaction price individually for each matched buyer-seller pair. The pricing strategy presented generally simulates human intelligence in order to define a logical price by local analysis of the previous trade cases. This strategy is adopted to meet the user requirements and constraints set by consumers/producers. Experimental results show waiting time optimization which is particularly critical when resources are scarce.
I. INTRODUCTION
In High Performance Computing (HPC) terminology, Grid refers to an environment with the aim of hooking many independent or loosely coupled tasks to available idle processing resources provided by the workstations in the system. Condor [1] is a typical example of such systems that manages pools of hundreds of workstations around the world and allows the utilization of idle CPU cycles among them. Due to heterogeneities present in Grid environments, resource management is often based on approaches which are both system and user centric. System centric approaches are traditional ones which attempt to optimize system-wide measure of performance such as overall throughput of the system. On the other hand, user centric approaches concentrate on providing maximum utilization to the users of the system based on their QoS requirements, i.e., a guarantee of certain levels of performance based on the attributes that the user finds important such as the deadline by which the jobs have to be completed [2]. Economic-based approaches provide an appropriate background in order to encourage resource owners to contribute their processing supplies to the Grid environment. This is the base of user centric performance, where the service received by each individual node tailored for one’s own requirements and preferences, is considered in addition to the utilization of the system as a whole. Nimrod-G [3] is an
instance of an economic-based system which introduces the concept of computational economy in managing and scheduling resources in Grids. Meeting QoS constraints together with maintaining an acceptable level of system performance and utilization is the primary problem to tackle in ad-hoc Grid environments, where the availability of resources and workloads changes dynamically. This dynamic behavior in turn provokes competition between the consumers and producers of resources in order to get hold of the required resources or tasks. In such a volatile environment, delivering an appropriate degree of utilization both individually and globally is critical. In economic-based approaches, scheduling and resource management decisions are made dynamically at run time and are directed by end-user preferences and requirements. Economic-based models have been used widely in resource allocation algorithms [4] [5]. A suitable platform for resource allocation in Grids is an auction model, as auctions allow consumers and producers of resources to compete in a dynamic environment where no global information is available and the price is fixed based on local knowledge. Several studies have been reported on auction mechanisms for resource allocation in Grids [6][7][8][9][10]. Our proposed strategy differs from the previous related approaches in two respects. We introduce a sophisticated history-based dynamic pricing strategy, adopted by consumers and producers to determine the preferred prices based on their requirements, rather than using a static reservation price. We also investigate the market-based approach in a dynamic network, where the resources are not dedicated and the number and availability of the resources may change at any given time. We utilize an economic-based model for scheduling and managing resources in market-based Grids. A Continuous Double Auction (CDA) protocol with a discriminatory pricing policy is used as the basic platform for matchmaking, where consumers and producers trade. Our proposed method is distinguished from a previously introduced strategy [11] in that a rational analysis of the previous trading cases, weighing the matching time for each individual node, is conducted to settle a new price. Budget constraints for nodes are also applied in this implementation.
We examine accesses to the Grid resources for individual nodes in our model and compare it with a non-economic approach. The elapsed time for matching is calculated for comparative studies. The results are promising and demonstrate a higher degree of waiting time optimization for individual nodes. Furthermore, system performance measurements demonstrate that the new strategy outperforms the previous methods: it provides more or less the same task utilization and is quicker in matchmaking. The paper is organized as follows. Section II introduces the basic market-based Grid model and the proposed pricing mechanism. The experimental results concerning the waiting time for requests/offers are presented in section III. Finally, section IV summarizes the concluding remarks.

II. MARKET-BASED GRID MODELING

A conventional market-based Grid model includes three different types of agents: Consumer (buyer), Producer (seller) and Auctioneer (matchmaking coordinator). There is one consumer/producer agent per node. A consumer/producer agent submits its request/offer to the auctioneer. The auctioneer agent manages the market, adopting a particular auction protocol. Fig. 1 depicts the components contained in such a system.

Fig. 1. Market-based Grid components

Continuous Double Auction (CDA) with a discriminatory pricing policy is used as the economic-based protocol for matchmaking. CDA supports simultaneous participation of producers and consumers, observes resource/request deadlines and can accommodate the variations in resource availability which occur in ad-hoc Grids. According to this market model, buy orders (bids) and sell orders (asks) may be submitted at any given time during the trading period. Whenever there are open bids and asks that can be matched or are compatible in terms of price and requirements (e.g. quantity of resources), a trade is executed immediately. The auctioneer seeks correspondence between buyers and sellers by matching offers (starting with the lowest price and moving up) with requests (starting with the highest price and moving down). When a task query arrives at the market place, the protocol searches all available resource offers and returns the best match which satisfies the task constraints, namely resource quantity, time frame and price. When a resource becomes available and several tasks are waiting, the one with the highest price bid is processed first. If no match is found, the task query is placed in a queue. The queries are kept in the queue until the defined Time-To-Live (TTL) field expires or a match is found. The transaction price is determined as the average of the bid and ask prices.
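As an illustration of this matching rule, the following Python sketch pairs the highest-priced open bids with the lowest-priced open asks and sets each transaction price to the average of the two. It is a simplified sketch based on the description above, not the authors' J2EE/JBoss implementation; the dictionary-based order representation and the price-only compatibility test are assumptions made for the example.

def match_orders(open_asks, open_bids):
    """One continuous-double-auction matching pass.

    Orders are dicts with 'price' and 'quantity'. Offers are scanned from the
    lowest price up, requests from the highest price down, and each matched
    pair trades at the average of the ask and bid prices.
    """
    open_asks.sort(key=lambda o: o["price"])                 # cheapest offers first
    open_bids.sort(key=lambda o: o["price"], reverse=True)   # highest bids first
    trades = []
    while open_asks and open_bids and open_bids[0]["price"] >= open_asks[0]["price"]:
        ask = open_asks.pop(0)
        bid = open_bids.pop(0)
        trades.append({
            "price": 0.5 * (ask["price"] + bid["price"]),    # discriminatory: set per pair
            "quantity": min(ask["quantity"], bid["quantity"]),
        })
    return trades

trades = match_orders(
    [{"price": 2.0, "quantity": 10}, {"price": 5.0, "quantity": 8}],
    [{"price": 6.0, "quantity": 10}, {"price": 1.0, "quantity": 4}],
)
print(trades)   # one match: price 4.0, quantity 10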
A. Request / Offer Specifications

Each request/offer submitted by a consumer/producer has a specification which contains different segments. These segments determine the request or offer details, requirements and constraints.
• Request. A request message contains three segments:
1. Task Details include information about the task such as execution time and Task ID. Execution time is an estimate of the processing time needed for execution. As different nodes have different hardware architectures, this time is calculated based on a reference processing unit. Task ID is a unique number denoting each task and is used in the case of multiple requests from the same consumer.
2. Task Deadline specifies the deadline for the task execution.
3. Price Constraints contain the buyer price and the buyer budget. Buyer price is the upper-bound price the consumer is willing to pay for each unit of resource. Budget denotes the maximum budget currently available to the consumer.
• Offer. An offer message comprises three segments:
1. Resource Details contain information about the resource characteristics, such as CPU speed.
2. Resource Deadline indicates the time interval during which the resource is available.
3. Price Constraint denotes a seller price, that is, the value of each unit of resource, set as a lower bound.

B. History-based Dynamic Pricing Strategy

Consumer and producer agents join the market with an initial predefined price and dynamically update it over time using the intelligent pricing strategy presented in this work. The price is defined as the value of each unit of resource at which the consumer and producer agents are willing to buy or sell. There is an upper limit for the consumer price (bid price) which is rationally defined by the individual node budget, as bidPrice * resourceQuantity <= Budget. We also set a minimum value for producers below which they are not willing to offer their resources. The agents perceive the demand and supply of the resources through their previous experiences and update their prices accordingly by careful inspection of the matching times in their former cases. Based on this strategy, ask and bid prices
are defined respectively for producers and consumers as follows:
p_a(t) = max{ p_min, p_a(t−1) + Δp_a }    (1)
p_b(t) = min{ p_max, p_b(t−1) + Δp_b }    (2)
where p(t) is the new price and p(t−1) denotes the previous price. p_min is the minimum acceptable value for producers and p_max is the maximum affordable price for consumers, which is defined as
p_max = Budget / resourceQuantity    (3)
resourceQuantity is the quantity of the needed resource. For example, it can refer to job execution time when CPU time is considered as the resource. Δp indicates whether or not the price is increasing. Δp for the seller and the buyer is defined based on the previous history of resource/task utilizations and timings at the corresponding seller or buyer:
Δp_a = (u_r(t) − u_thR) · α · p_a(t−1)    (4)
Δp_b = (u_thT − u_t(t)) · β · p_b(t−1)    (5)
α and β indicate the factors corresponding to the rates at which the prices are increased or decreased. u_thT and u_thR are threshold values denoting the lower bound for task/resource utilizations. In other words, u_thT and u_thR can be considered as satisfaction thresholds for the agents. Low values for these parameters imply that the agent is satisfied with a low usage of its resources or a low completion rate of its tasks. High values, on the contrary, denote more demanding expectations of the agents. The resource and task utilizations, u_r(t) and u_t(t), define respectively the level of previous utilization of the producer and consumer from the Grid market. They are computed locally by each agent considering the time spans of previous matchmakings in the history of each node. To make the utilization factor even more logical, we weigh the matchmakings made quickly (with small or no delay time after the request is posted) more heavily than the ones prolonged by the current parameter specifications. The utilization formula is as follows:
u(t) = \frac{\sum_{i=t_1}^{t_2} x(i) \cdot \frac{t_{TTL}(i)}{t_{match}(i)}}{\sum_{i=t_1}^{t_2} \frac{t_{TTL}(i)}{t_{unit}(i)}}    (6)
x(i) denotes one instance of request/offer within the time interval [t1…t2]. It has either the value '1' for a matched request/offer or '0' for an unmatched one. [t1…t2] is considered as the time interval during which n efforts have been made by the agent to buy or sell resources. In fact, n defines the number of former experiences considered for the utilization estimate in the pricing strategy. This value can be dynamically set by the user to indicate the capacity of the matchmaking history memory. In our experiments, n is considered to be 10. t_TTL indicates the Time-To-Live field for each offer of the seller agent or the required deadline set by the buyer agent for a resource request, and t_unit refers to one unit of time in a particular resource request. The agreement between a matched ask-bid pair is made at a transaction price which is defined as:
p = k · p_a + (1 − k) · p_b    (7)
where p_a and p_b are the prices offered by the seller and the buyer, respectively. This is the common definition of the k-double auction pricing rule [12]. For a fair trade, we have assumed k = 0.5; however, the user is free to set this value according to individual requirements and preferences.
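A compact Python sketch of the price-update rules above follows. It is illustrative only: the tuple-based history representation and the parameter names are assumptions made for the example, while n = 10 and k = 0.5 follow the values stated in the text.

def utilization(history):
    """Equation 6. history holds (matched, t_ttl, t_match, t_unit) tuples for the
    agent's last n requests/offers (n = 10 in the paper's experiments)."""
    num = sum(t_ttl / t_match for matched, t_ttl, t_match, _ in history if matched)
    den = sum(t_ttl / t_unit for _, t_ttl, _, t_unit in history)
    return num / den if den else 0.0

def new_ask_price(p_prev, history, p_min, alpha, u_thR):
    """Producer price update, equations 1 and 4."""
    delta = (utilization(history) - u_thR) * alpha * p_prev
    return max(p_min, p_prev + delta)

def new_bid_price(p_prev, history, budget, quantity, beta, u_thT):
    """Consumer price update, equations 2, 3 and 5."""
    p_max = budget / quantity                      # equation 3
    delta = (u_thT - utilization(history)) * beta * p_prev
    return min(p_max, p_prev + delta)

def transaction_price(p_ask, p_bid, k=0.5):
    """k-double auction pricing rule, equation 7; k = 0.5 gives a fair split."""
    return k * p_ask + (1 - k) * p_bid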
III. EXPERIMENTAL RESULTS

To perform our experiments, we set up a Grid-like environment based on a LAN in which our application test-bed is developed using J2EE and Enterprise Java Beans. The JBoss application server is used to implement the auctioneer. The network consists of three types of agents: consumers have certain tasks to perform for which they demand resources, producers have idle resources to offer, and the auctioneer takes care of the matching process. CPU time is considered as the resource in our system. For simplicity, we ignore allocation requirements and assume that tasks need CPU time only for execution. Whenever a consumer needs CPU time for performing a task, it sends a request to the auctioneer and, similarly, a producer announces its idle CPU time by sending an offer. The simulations are performed in an environment with 60 nodes. Each node is assigned a specific budget when joining the Grid. Each node creates a number of requests or offers during the simulation process. There are two user-centric requirements which should be fulfilled by the system, namely deadline and budget. We compare our method with a non-economic First-Come First-Served (FCFS) approach under similar conditions. It should be noted that in FCFS, no pricing constraint is defined and resources are allocated based on the resource quantity and deadline constraint. We conduct the experiments in a condition where resources are scarce and the number of generated tasks exceeds the available resources. This condition has been chosen in order to investigate the waiting time behavior when resources are scarce, which resembles a critical situation for Grid users. Matching time analysis for each request/offer provides a valuable criterion for performance measurement. Using our proposed method, the time span between a request/offer issue and a corresponding match is expected to be minimized, which results in quicker matchmaking. In this respect, the available resources in the Grid are not wasted and the throughput of the system is increased. Besides, the demanding nodes are not kept blocked with long delays in request satisfaction. Figure 2 depicts the elapsed waiting time for consumers after submitting a request until a match is found. The graph demonstrates the cases for FCFS and CDA. As inferred from the figure, the waiting time in the CDA approach is considerably
lower than with FCFS. This is not the case for producers, where both approaches show more or less the same elapsed waiting times. The reason is the scarce number of available resources and the excessive number of potential tasks, which implies that a match can be found quickly for producers, so they are not required to wait much.

Fig. 2. Elapsed waiting time for consumers
IV. CONCLUDING REMARKS

In this paper, a market-based resource allocation model for ad-hoc Grids is introduced. In our proposed dynamic model, consumers and producers adopt a sophisticated history-based pricing strategy in order to reasonably update the prices based on their previous matchmaking experiences. The distribution of the tasks among the resources in the Grid is studied with the new approach under conditions where the resources are assumed to be scarce and the availability of tasks and resources varies over time. Grid resource access is evaluated with respect to the consumer/producer waiting time in the system and is compared with a conventional non-economic FCFS approach. The results show that our proposed strategy optimizes the waiting time for all submitted requests/offers, while FCFS demonstrates a low level of equitable utilization.
REFERENCES

[1] M. Litzkow, M. Livny, and M. Mutka, "Condor - a hunter of idle workstations", In Proceedings of the 8th International Conference of Distributed Computing Systems, June 1988.
[2]
R. Buyya, D. Abramson, and S. Venugopal, The grid economy. In Special Issue on Grid Computing, vol. 93, pp. 698-714, 2005. [3] R. Buyya, D. Abramson, and J. Giddy, “Nimrod-G: An architecture for a resource management and scheduling system in a global computational grid”, In Proceedings of The 4th Int. Conf. on High Performance Computing in Asia-Pacific Region, USA, 2000. [4] R. Wolski, J. Brevik, J. S. Plank, and T. Bryan, Grid resource allocation and control using computational economies, In Grid Computing: Making The Global Infrastructure a Reality, John Wiley & Sons, 2003. [5] R. Buyya, D. Abramson, J. Giddy, and H. Stockinger, Economic models for resource management and scheduling in grid computing, Concurrency and Computation: Practice and Experience, vol. 14(13-15), pp.15071542, 2002. [6] C. Weng, X. Lu, G. Xue, Q. Deng, and M. Li, “A double auction mechanism for resource allocation on grid computing systems”, In GCC, page 269, 2004. [7] J. Gomoluch and M. Schroeder, Market-based resource allocation for grid computing: A model and simulation, 2003. [8] U. Kant and D. Grosu, “Double auction protocols for resource allocation in grids”, In Proceedings of the Int. Conf. on Information Technology: Coding and Computing, pp. 366-371, 2005. [9] D. Grosu and A. Das, “Auction-based resource allocation protocols in grids”, In Proceedings of the 16th IASTED Int. Conf. on Parallel and Distributed Computing and Systems, pp. 20-27, November 2004. [10] M. D. de Assuncao and R. Buyya, “An evaluation of communication demand of auction protocols in grid environments”, In Proceedings of the 3rd Int. Workshop on Grid Economics & Business, Singapore, 2006. [11] B. Pourebrahimi, K. Bertels, G. Kandru, and S. Vassiliadis, “Marketbased resource allocation in grids”, In proceedings of 2nd IEEE Int. Conf. on e-Science and Grid Computing, page 80, 2006. [12] M. Satterthwaite and S. Williams, “The Bayesian theory of the k-double auction”, The Double Auction Market: Institutions, Theories and Evidence, Santa Fe Institute Studies in the Sciences of Complexity, pp. 99-123, 1991.
Epistemic Structured Representation for Legal Transcript Analysis

Tracey Hughes, Ctest Laboratories, [email protected]
Cameron Hughes, Ctest Laboratories, [email protected]
Alina Lazar, Youngstown State University, [email protected]
Abstract - HTML based standards and the new XML based standards for digital transcripts generated by court recorders offer more search and analysis options than the traditional CAT (Computer Aided Transcription) technology. The LegalXml standards are promising opportunities for new methods of search for legal documents. However, the search techniques employed are still largely restricted to keyword search and various probabilistic association techniques. Rather than keyword and association searches, we are interested in semantic and inferencebased search. In this paper, a process for transforming the semistructured representation of the digital transcript to an epistemic structured representation that supports semantic and inferencebased search is explored.
1. Introduction HTML based standards and the new XML based standards for digital transcripts generated by court recorders offer more search and analysis options than the traditional CAT (Computer Aided Transcription) technology. The LegalXml standards are promising opportunities for new methods of search for legal documents. The HTML and LegalXML standards allow judges, lawyers and other interested parties to analyze digital transcripts with additional and increasingly sophisticated search techniques. However, the search techniques employed are still largely restricted to keyword search of the digital transcripts and various probabilistic association techniques[1]. Rather than keyword and association searches, we are interested in semantic and inference-based search[2]. In this paper, a process for transforming the semi-structured XML/HTML version of the digital transcript to an epistemic structured representation suitable for semantic and inference-based analysis is explored. This representation allows us to search for implicit knowledge. Implicit knowledge has higher visibility in this epistemic structure than it does in its semi-structured XML/HTML version. The epistemic structures presented in this paper are knowledge representation schemes used for constructing a model of the transcript domain. We are interested in the idea of viewing the digital transcript as a knowledge space[7]. Specifically, the arguments of the attorneys, the testimony of witnesses, and statements of defendants are converted from their semistructured representation to epistemic structures that collectively form the knowledge space of the trial. Once the transcript is converted, queries can be posed that can be
answered using semantic processing techniques[8] rather than probabilistic or keyword search techniques. The answers are ultimately derived from knowledge that is explicitly or implicitly given during the course of a trial. In particular, the questions and answers that occur during the examinations and cross-examinations by the attorneys of the witnesses and defendant(s), as well as unimpeached opening arguments and certain classes of objections are all used as micro-sources for contingent truth. Here we use the notion of contingent truth as it is used in modal logic introduced by Lewis[3] and some of its variations formalized by Kripke[4] and Hintikka[5]. We also take advantage of Hintikka’s formalizations for statements that are simply believed to be true[6]. Many statements made during the course of a trial are only possibly true and therefore answers to queries have to be qualified. Reference [4] and [5] give formalizations for this kind of qualification. In the HTML/XML version of a legal transcript, our candidates for contingent truth (e.g. questions, answers, examinations, cross examinations, etc.) are tagged. For instance, question and answer pairs are conveniently coded as: Q. Was Oren in fact a medical doctor? A. Yes, he’s a surgeon, doctor. The Q. and A. are tags placed in the HTML that conveniently identify the start and stop of a question and answer pair. Together the question and the answer can imply one or more propositions. In this case the proposition: Oren is a surgeon. can be inferred. The truth of this proposition is subject to the credibility, integrity and possibly the belief of the person answering the question or in the trial. This is an example of what we mean by as contingent truth. Each island of contingent truth in the transcript is considered for acceptance as a node in a concept graph[9]. If it is accepted it becomes a weighted part of the transcript’s knowledge space which makes it more visible to a user’s query. The proposition: Oren is a surgeon. is only available by making an inference from the combination of the question and answer pair and is therefore not visible to keyword queries of the semi-structured XML/HTML version
of the transcript. Reference[9] initially provided the most detailed treatment of conceptual graphs. We use this treatment of conceptual graph to facilitate our inference-based search to locate knowledge that can only be inferred through chains of implication. For instance, the answer to the question:
Was the testimony of Jones impeached?

is not explicitly given in the text of the transcript and cannot be found by keyword or associative search techniques. Instead, trial information is searched to determine whether Jones was a witness or a defendant. The appropriate examination and cross-examination information is searched to see what statements Jones has made. Then the statements of others in relation to the statements that Jones made are considered. The exhibits that relate to the statements that Jones made are examined. Finally, the statements that Jones made in relation to other statements that he has made are also considered. At that point the question of whether Jones' testimony was impeached or not is answered. The search is done by traversing the nodes in the conceptual graphs.

The combination of Q&A pairs into inferred propositions and the building of a conceptual graph model for the transcript are parts of our five-step process that transforms the semi-structured text into an epistemic structured representation. The five steps in the process are:
Step 1: Using the HTML/XML and other structured rules of the transcript, convert the entire transcript into a tagged corpus.
Step 2: Create a model theoretic semantic of the corpus[10], where the models are captured both as predicates and frames[11].
Step 3: Convert the Q&A pairs to propositions.
Step 4: Construct frames from the propositions generated in Step 2.
Step 5: Using the structures created in Steps 2, 3, and 4, instantiate the structure that represents the knowledge space of the transcript/trial.
While this is somewhat of a simplification, these five steps capture the basic idea behind the transformation of the original transcript. Predicate Calculus and First Order Logic (FOL) are used as the primary representations for our frames and propositions in Steps 2 through 5. The two primary challenges discussed in this paper are:
• Exploiting the Q&A, opening and closing arguments, objections and other tagged patterns of legal transcripts to take advantage of explicit and implicit knowledge
• Deriving the epistemic structures for the knowledge space of the transcript

2. Background

Our transcript conversion process is part of the NOFAQS system. NOFAQS is an experimental system currently under development at Ctest Laboratories. NOFAQS is designed to answer questions and draw inferences from interrogative domains (e.g. interviews, trials, surveys/polls, and interrogations). NOFAQS uses rational agents and Natural Language Processing (NLP) to aid users during the process of deep analysis of any corpus derived from any context that is question and answer intensive. In this paper, we discuss how the system was used on a set of court room transcripts of a famous court case that lasted 76 days.

2.1. Corpus

The corpus was obtained from the Internet as a collection of Hypertext Markup Language (HTML) files. It consisted of:
• 76 files formatted in standard HTML (one for each day of the trial)
• 25,979 questions
• 25,930 answers
• 461,938 lines of domain-relevant text
• 1,854,242 domain-relevant words
In this case, the corpus is a collection of digital transcripts generated by a scopist and the process of Computer-Aided Transcription (CAT). A scopist edits a transcript translated by CAT software into the targeted language, correcting any mistakes and putting it into the appropriate standard format. The CAT system is computer equipment and software that translates stenographers notes into the targeted language, provides an editing system which allows translated text to be put into a final transcript form, and prints the transcript into the required format. 2.2. Model Theoretic Semantic of Transcript Step 2 of our five step process generated a model theoretic semantic representation of the transcript. Here the model is described as: M = (D,F) where M is a model theoretic semantic representation of all the language that is contained in the trial corpus. M consists of a pair (D,F) where D is the Domain which is the set of people and things referenced in the corpus (e.g. defendants, jurors, attorneys, witnesses) plus the relations (e.g, lawyer(X,Y), trial_day(N), etc.) between those people and things. F is an Interpretation Function which maps everything in the language onto something in the domain [12]. F is
implemented using the Lambda Calculus operator λ and βconversion. The initial model theoretic representation serves as part of the background knowledge for the Inductive Logic Programming (ILP) learning process. The results of our ILP learning process augments the initial M. 2.3. Important Target Predicates Inductive Logic Programming (ILP) techniques were used to help with the process of classifying question types[13] and in determining whether a given Q&A pair constituted a legitimate proposition. ILP is a form of programming that combines machine learning and logic programming[14]. We used ILP to learn two of our important target predicates: question_classification(Q, Class) ques_ans_classification(QA,Resolved)
The question_classification(Q, Class) predicate represents a learned program that, when given a question Q, returns Class, the classification of Q. The ques_ans_classification predicate is a learned program that, when given a question and answer pair QA, returns whether or not the question was actually answered. If the question was resolved, then the question and answer pair can be considered as a candidate for contingent knowledge. The first target predicate used simple question classification[15]. Table 1 contains the question classifications used.

TABLE 1. Classifications of questions
Class        Interrogative Indicator                                            Example
person       who                                                                You tell the jury who you raised money from and what the money was for?
information  how, what                                                          And how many helicopters did the group shoot at? What other weapons did you receive training in?
explanation  why                                                                Tell the jury why it is you chose to leave New York to go to Peshawar in Pakistan?
location     where                                                              Where did you get the $250,000 to buy the farm?
temporal     when                                                               When did those four people go back to the Sudan?
yes/no       can, do, did, is, were, are, was, will, would, does, could, have   Can you tell the jury what happened?
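The classification itself is learned with ILP in our case, but a simple keyword heuristic over the indicator column of Table 1 already illustrates the intended classes. The following Python sketch is that heuristic, not the learned question_classification predicate; the priority given to a leading auxiliary verb and the fallback class for unmatched questions are assumptions made for the example.

YES_NO = {"can", "do", "did", "is", "were", "are", "was",
          "will", "would", "does", "could", "have"}
WH_CLASSES = [("who", "person"), ("how", "information"), ("what", "information"),
              ("why", "explanation"), ("where", "location"), ("when", "temporal")]

def classify_question(question):
    """Keyword-based stand-in for question_classification(Q, Class)."""
    words = question.lower().rstrip("?").split()
    if words and words[0] in YES_NO:      # leading auxiliary verb: yes/no question
        return "yes/no"
    for w in words:                       # otherwise the first interrogative word wins
        for indicator, label in WH_CLASSES:
            if w == indicator:
                return label
    return "yes/no"                       # assumed fallback for unmatched questions

print(classify_question("Where did you get the $250,000 to buy the farm?"))   # location
print(classify_question("Can you tell the jury what happened?"))              # yes/no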
The target predicates: question_classification(Q, Class)
103
ques_ans_classification(QA,Resolved)
are important to our process because they are used to build the a posteriori knowledge of the transcript (Step 3). We also used ILP to learn viability of a question and answer pair as a candidate for contingent knowledge. In particular, we use a variation of Shapiro’s model inference system from Reference [14] and the basic FOIL algorithm to learn our ques_ans_classification (QA,Resolved) target. The two target predicates and basic NLP parsing was used to construct propositions in Step 3 of our five step process. 2.4. Epistemic Representation The term epistemic, as used in this paper, is taken from the field of Epistemology. Epistemology provides a framework for the theory of knowledge[16]. Throughout this paper our use of the term knowledge is consistent with its use in epistemology. Knowledge must have as its constituents: justification, truth, and commitment[17]. These constituents differentiate knowledge from information. First Order Logic (FOL) and predicate calculus is used as our primary vehicle for knowledge and query representation. Here a posteriori knowledge is represented using FOL. In particular, the opening and closing arguments and all courtroom testimony are represented as modal propositions. A priori knowledge such as concepts about trials, juries, attorneys, etc., are represented as frames. 2.5. The Epistemic Structure The structure referred to in Step 5 is our epistemic structure formally denoted by ES. Let ES be the structure: ES = Where: G1 is a Graph of a priori propositions G2 is a Graph of a posteriori propositions J is a Set of justification propositions C is a Vector of Commitment F is a non monotonic truth maintenance function on ES For each logical domain in the transcript we have a distinct ES. Therefore, we denote the knowledge space of the transcript formally as Ks where: KS =
ΣEs
In the NOFAQS system each rational agent Ai uses Ks as a search space in the resolution of a query posed by a user[18]. Each query is presented to Ai as either a complete interrogative sentence or a FOL query. Ai is a function implemented as:
104
HUGHES ET AL.
function: SearchAgent returns response class begin FOLv = parse(InterrogativeSentence) FrameNode = map(FOLV) Response = search(Ks,FrameNode) return Response
Here, FOLv is a predicate calculus representation of the user’s query and FrameNode is a partial frame that captures the attributes of the user’s query. The search method of Ai selects a graph traversal search based on the type of FrameNode returned by the map method. This graph search is then applied to Ks . This results in a search by Ai of the concept nodes in G1 and G2. The response is then given a weight and certainty based on how G1 and G2 are mapped into C. So for a typical search in Ks we have: Response = norm (ΣAi(S))
where S is the user’s interrogative sentence. When we decompose ES for our legal transcript representation, we have: G1 = {V,E} where V is a set of nodes that contain propositions and frames representing the non-testimony elements in the transcript (e.g. distribution of attorneys, opening & closing arguments, identity of judge, etc.) E is the set of relationships on set V. G2 = {V,E} where V is a set of nodes that contain propositions and frames representing the testimony elements of the transcript (e.g. questions, answers, direct examinations, objections). E is the set of relations on set V. J = {X | X is non challenged proposition in G1 that provides weight to propositions from G2} C = V[G(m,n)] where V is a vector that represents the agents level of commitment to propositions or concept nodes found in G1 or G2 F(ES) is function that weights propositions as they are added to G1 , G2 or F 3. Method The 76 HTML files were considered as raw data that would require data cleaning, pruning, and noise removal. Further, the files consisted of semi-structured text that is not conducive to inferential analysis. In the second stage of processing, the semi-structured text required that it be transformed into its Model Theoretic Semantic (MTS) representation. This transformation produced 23 classes and 40 basic relations between objects in the 23 classes. Table 2 contains samples of the 23 classes and 40 relations. So for M, where M = (D,F), we have concepts such as exhibits, cross examinations, witnesses in D as well as
relations such as heard, knew, and observed. The MTS representation was used as the baseline for the a posteriori knowledge. The a posteriori knowledge was taken primarily from the approximately 29,000 question and answer pairs that were given during the course of the trial. Its this a posteriori knowledge and its representation as FOL[19] that allows deep analysis against the trial corpus. While the question and answer pairs provided a legitimate source of contingent knowledge, we could not use them until they were classified[20].
TABLE 2 Classes and Sample Relations Classes Attorneys Defendants Witnesses Arguments Testimony Court Questions Recesses Counts (charges) The record
Trial Side-bar Objections Dates Exhibits Examinations Answers Adjournments Jurors Evidence
Relations answered did said told objected recalled saw traveled knew heard
asked do tell examined aware recognize meet know show hear
3.1. Frames for G1 and G2 The frames are schematic models of domain elements in the trial. For instance, the defense attorney frame contains slots and facets such as: frame: DefenseAttorney name: is-a: Attorney role: (lead or support) default: lead objections_sustained: facet: if_needed execute (get_sustained) objections_overruled: facet: if_needed execute(get_overruled) client: facet: if_needed execute(get_client) questions_on_direct: Type-of- List end DefenseAttorney frame There is at least one frame for each main domain element in the trial. Each frame consists of one or more slots. A slot may have one or more facets. The slots represent attributes of the frame. For instance, one attribute of an attorney is the type (either defense or prosecution). A facet represents some kind of special constraint or trigger, or processing for an attribute. For example, the fact that there was an objection during a cross examination may or may not be needed. The facet specifies what to do if it is needed. The slots for the frames for G1 and G2 typically require the values to be filled in from a priori and a posterior knowledge. This means prior to a user query, the frames are partially filled in. Once a query has
EPISTEMIC STRUCTURED REPRESENTATION FOR LEGAL TRANSCRIPT ANALYSIS been posed, the inference process fills in whatever slots are needed to answer the user’s query. 3.2. Q&A Pair Contingent Propositions The responses to the questions had to be classified as either answers or not answers before we could determine whether a Q&A pair was a candidate for contingent truth. The transcripts had many instances of witnesses and defendants that eluded answering the questions by such responses as: “I don’t know”, “I don’t remember”, “I can’t recall”, etc. Other responses were unrelated to the questions. Before a Q&A pair could be considered as a modal proposition the negative responses had to be classified and filtered. Three simple classifications of responses were used for our Q&A pairs: • • •
answered, not_answered answered_not_certain
The computer language Prolog was used for the hypothesis language and to represent the background knowledge in our ILP programming. Our target predicate: ques_ans_classification(QA,Resolved)
was implemented using the clause: tuple( [Q], [A], (X,Y))
where Q is list of words representing the question, A is the list of words representing the response and (X,Y) is the class pair representing the classification of the Q&A. X is the class of Q and Y is the class of A. tuple() is then implemented by hypothesis and answer predicates: hypothesis(question([FrameGrammar]), answer( [FrameGrammar]), class(Q,A) )
where question([FrameGrammar]) is a learned generalization of the Q, answer([FrameGrammar]) is a learned generalization of A, and class(Q,A) is a pair representing the class of Q and A. Each of the generalization of simple frame grammar representations are commonly used in natural language processing. For example: Question: Do you know what time of year it was Mr. X was arrested? Answer: I know it was during Ramadan, but I can’t remember the day. Here is the Prolog representation of the training set tuple and the learned target predicate:
tuple([do,you,know,what,time,of year...], [i,know,it,was,during,ramadan,but...], (yesno,answered_not_certain)) hypothesis(question([Av,Pn,V,Dpn,....]), answer([PPn,V,C,Av,...,i,cant,remember]), (yesno,answered_not_certain))
The background predicates (knowledge) consisted of a lexicon of the parts of speech in English in horn clausal form, and a simple Phrase Structure Rule (PSR) grammar. For instance, the lexicon clauses took the form: adjective(X). interrogative_pronoun(X) transitive_verb(X)
... 4. Discussion The epistemic representation of the transcript provides the user with the ability to pose queries about the content of the transcript that can only be inferred. For instance we are able to ask questions such as: • • • •
Was John Smith’s testimony impeached? Which witnesses were evasive in their answers? Were the opening and closing arguments supported by witness testimony? Did the prosecuting attorney lead the witness X on cross examination
The answers to these types of questions evade keyword and associative search techniques. It would also be difficult to convert the semi-structured transcript into database form and expect answers to these kinds of questions. On the other hand graph search and traversal techniques are well suited for data that can be represented in predicate calculus or as Frames. This motivates the reasoning behind our epistemic representation approach to legal transcript analysis. The fact that the HTML/XML standards for legal transcripts provide tagged elements that make the question and answer analysis process feasible, differentiates Legal transcripts from other types of HTML/XML semi-structured documents. The tagging found in the original transcript facilitates the automation of the conversion process in Step 1. As the new LegalXML standards are refined, the tagging will improve the mapping process between the a priori knowledge in the transcripts and the frames used for the concept nodes. 5. Conclusion Exposing the epistemic structure of a semi-structured transcript allows the agent to apply graph traversal and FOL search techniques to the knowledge space of the transcript. Since the agent is dealing directly with the knowledge space, the agent’s responses will be semantically and pragmatically
105
106
HUGHES ET AL.
related to the original query. This means that the agent’s responses can be logically derived from the knowledge space. While providing meaningful and relevant answers to questions is highly desirable, generating the epistemic structure of a semi-structured transcript is computationally expensive. The original HTML or XML transcript must be completely restructured. This is usually not practical for the general case. The size of the original HTML or XML transcript is increased by a factor of 4. While the resulting format can be expressed in standard FOL or horn clause form, these forms are not immediately available to the many HTML and XML browsers in use. While most of the conversion from semistructured to epistemic structured is automated, the process still requires manual intervention. In addition to these issues, the generation of the epistemic structure is not instantaneous. The generation requires both time and space. However, for some vertical applications and narrow segments of users, the cost of generating the epistemic structure of a semi-structured transcript is offset by the ability to perform deep analysis and get meaningful, relevant and accurate responses to a query. There is at least some temptation to compare the type of knowledge space search discussed in this paper with keyword and probabilistic search used in many of today’s common search engines. That temptation should be resisted. The keyword search engines can successfully process single words, or tokens. Our knowledge space search techniques require either complete interrogative sentences or FOL queries. There is also a shift in responsibility for answer resolution in the two approaches. In the approach discussed in this paper, the search agent is responsible for deriving the answer to the user’s interrogative sentence. Conversely, in keyword and associative type search techniques the user has most of the responsibility in extracting the answer from the results returned from the search. Also the two techniques serve very different audiences with different goals and objectives. Keyword-type approaches are primarily interested in text extraction or resource retrieval, where as the epistemic representation is aimed at text interpretation, inference and knowledge representation of the semantics in the source document. 6. Future Work Our process for converting interrogative sentences into FOL queries and then mapping those queries to the frames of our domain needs much refinement. First, FOL is only capable of representing a subset of the possible interrogative sentence forms that the user might make against a transcript. We’ve used inductive logic programming in attempts to learn a heuristic that can be used to bridge the gap between the variety of query forms and our FOL representation. But this process falls short and needs work. Second, the process that matches FOL to the Frames of our domain degrades rapidly when presented with valid nondeterminism among Frame selection. To offset these issues we will explore the addition of semantic headers and
discourse representation structures to our FOL representation of the original interrogative sentence. Further, we will investigate the use of ILP against the Q&A pairs to determine if there are structures that can be used to supplement our frame-based representation in the concept nodes. In addition to this work, we are interested in automating the the conversion process from semi-structured to epistemic structured as much as possible. 7. References [1] Li, Ping, and Kenneth W. Church, “A Sketch Algorithm for Estimating Two-Way and Multi-Way Associations”, Association for Computational Linguistics, Vol. 33. No 3, 2007. [2] Minker, Jack, “Control Structure of a Pattern-Directed Search System”, SIGART Newsletter, No. 63, 1977.
[3] Lewis, C.I., “A Survey of Symbolic Logic,” Berkeley: University of California Press , 1918. [4] Kripke, S., “Semantical Considerations on Modal Logic,” Acta Philosophica Fennica, pp. 16, 83-94, 1963. [5] Hintikka, J., “Individuals, Possible Worlds and Epistemic Logic,” Nous, pp. 1, 33-62, 1967. [6] Hintikka, J., “Knowledge and Belief: An Introduction to the Logic of the Two Notions,” Cornell: Cornell University Press, 1962. [7] Doignon, J.P., and J.C. Falmagne, “Knowledge Spaces,” Hiedelberg; Springer ISBN-3-540-64501-2, 1999. [8] Nirenburg, S., and V. Raskin, “Ontological Semantics,” MIT Press, 2004. [9] Sowa, J.F., “Semantics of Conceptual Graphs,” Proceeding of the 17th Annual Meeting of the Association for Computation Linguistics, pp. 39-44, 1979. [10] Brasil, S.M., and B.B. Garcia, “Modeling Legal Reasoning in a Mathematical Environment through Model Theoretic Semantics,” ICAIL ‘03. Edinburgh, Scotland, UK, 2003. [11] Minsky, M., “A framework for representing knowledge,” The psychology of computer vision. New York: McGraw-Hill, pp. 211-277, 1975. [12] Covington, M., “Natural Language Processing For Prolog Programmers,” Prentice Hall, 1994. [13] Cussens, J., and S. Pulman, “Incorporating Linguistics Constraints into Inductive Logic Programming,” Proceedings of CoNLL-2000 and LLL-2000, pp. 184-193. Lisbon, Portugal, 2000. [14] Bergadano, F., and D. Gunetti, “Inductive Logic Programming From Machine Learning to Software Engineering,” MIT Press, 1996.
[15] Lehnert, W.G., "The Process of Question Answering," Lawrence Erlbaum Associates, Inc., Publishers, 1978.
[16] Gettier, E., "Is Justified True Belief Knowledge?," Analysis, Vol. 23, pp. 121-123, 1963.
[17] Dancy, J., "Contemporary Epistemology," Basil Blackwell Inc., 1986.
[18] Chen, G., J. Choi, and C. Jo, "The New Approach to BDI Agent-Based Modeling," ACM Symposium on Applied Computing '04, Nicosia, Cyprus, 2004.
[19] Blackburn, P., and J. Bos, "Representation and Inference for Natural Language," CSLI Publications, 2005.
[20] Fagin, R., J. Halpern and M. Vardi, "A Model-Theoretic Analysis of Knowledge," Journal of the Association for Computing Machinery, Vol. 38, No. 2, 1991.
A Dynamic Approach to Software Bug Estimation
Chuanlei Zhang+, Hemant Joshi+, Srini Ramaswamy*, Coskun Bayrak*
+Applied Science Department, *Department of Computer Science, University of Arkansas at Little Rock, Little Rock, AR 72204
{cxzhang, hmjoshi, srini, cxbayrak}@ualr.edu
Abstract-There is an increase in the number of software projects developed in a globally distributed environment. As a consequence of such dispersion, it becomes much harder for project managers to make resource estimations with limited information. In order to assist in the process of project resource estimations as well as support management planning for such distributed software projects, we present a methodology using software bug history data to model and predict future bug occurrences. The algorithm works in a two-step analysis mode: Local and Global Analysis. Local analysis leverages the past history of bug counts for a specific month. On the other hand, global analysis considers the bug counts over time for each individual component. The bug prediction achieved by the algorithm is close to actual bug counts for individual components of Eclipse software.
I. INTRODUCTION
Traditionally, most software projects employ centralized software development processes, normally performed at a single location. All of the developers usually work in the same city, or even in the same building. Project management and resource allocation are made based on well-established product plans. This form of a centralized software development process and project management has been successfully used for decades. The globalization of the software industry has led to products being developed globally, where the software development teams and the project management teams work in dispersed geographical locations [1]. Despite the fact that global software development can bring benefits such as improving the product quality by hiring top-rated international IT professionals and lowering the development costs by using services from countries with a cheaper cost of labor / living, it creates several software engineering challenges due to the impact of temporal, geographical and cultural differences among distributed development and management teams [2] [3]. One of the challenges in global software development is project management and appropriate resource allocation: (1) since the management team and the development team might work at different locations, problem identification, issue resolution, and decision making are not as straightforward as in the centralized software development scenario; (2) the tasks assigned to distributed teams must be clear and specific, because distributed teams are not as flexible as centralized teams. The hiring of the distributed developers is based on their expected expertise and potential to fit the foreseen tasks in order to minimize development costs. Therefore, in global software development, it is critical to
accurately identify the task and estimate the cost of software development at different sites and allocate appropriate resources accordingly [4]. Software repositories, such as source control systems, defect tracking systems, and communication archives, contain rich records of the software development history. Such data is available for most software projects and can be used to extract valuable information to guide software development and project management. For example, archived e-mail can be used to study the communication network of the project [5] [6], change logs can be used to study the evolution of the source code, and so on [7]. One type of valuable information that can be extracted from repositories is change propagation, which is to determine the group of components that should be changed together because of the interdependencies between software components [8]. For example, Zimmermann et al. [9] applied data mining to software version history and implemented a tool to detect related changes in open-source software including Eclipse¹, GCC, and Python. Ying et al. [10] applied association mining techniques on Eclipse and Mozilla. Their association rules can be used to predict future co-changes of software components. The prediction of software change propagations can assist the developers to correctly maintain a software product and the managers to accurately estimate the amount of work involved in a maintenance task and efficiently allocate the necessary resources. Another type of valuable information that can be extracted from software repositories is bug patterns. Bugs are an inevitable consequence of a software product development process. The correct prediction of the number of bugs and the type of bugs in a software product can greatly assist in project estimation and management [11] [12]. In this paper, we describe how bug history logs can be used to model the occurrence of bugs and how this pattern can assist the project management to plan for and estimate resource needs in a globally distributed development environment. The rest of the paper is organized as follows. Section 2 reviews related work on mining bug repositories. Section 3 details the underlying algorithm for bug prediction. In section 4, we discuss the results of the prediction. Finally, section 5 presents our conclusions and avenues for future research work.
¹ http://www.eclipse.org
II. RELEVANT WORK
To remain competitive in the fast growing and changing world of software development, project managers must optimize the use of their limited resources to maximally deliver quality products on time and within budget [13]. Software bug repositories have been recognized as data assets that can provide useful information to assist with software development and project management. Considerable research has been performed in this area. One line of research is to characterize the information stored in bug repositories. Anvik et al. [14] analyzed the information stored in a bug repository and discussed how to use it. Sandusky et al. [15] identified bug report networks, which are groupings of bug reports due to duplication, dependency, or reference relations. Mockus et al. [16] used bug information to determine the various roles people played in the Apache and Mozilla projects. Crowston and Howison [24] used bug repository information to analyze the social structure of open source projects. Another line of research is to model the bug patterns. Askari and Holt [18] analyzed the data from open source repositories, including the defect tracking system, and developed three probabilistic models to predict which files will have bugs. These include: (i) Maximum Likelihood Estimation (MLE), which simply counts the number of events, i.e., changes or bugs, that happen to each file and normalizes the counts to compute a probability distribution. (ii) Reflexive Exponential Decay (RED), which assumes that the predictive rate of modification in a file is incremented by any modification to that file and decays exponentially. (iii) The third model, the RED-Co-Change model, which, with each modification to a given file, not only increments its predictive rate but also increments the rate for other files that are related to the given file through previous co-changes. The performance of the different prediction models was evaluated using an information-theoretic approach. Ostrand and Weyuker [19] [20] [21] [22] developed a statistical model that uses historical fault information and file characteristics to predict which files of a system are likely to contain the largest numbers of faults. This information can be used to prioritize the testing of components to make the testing process more efficient and the resulting software more dependable. Still another line of research is to predict the number of bugs and types of bugs in a software product. Gravel et al. [23] used project change history to predict fault incidences. Menzies et al. [25] performed a case study on NASA's Metrics Data. Through mining the bug history, they built defect detectors to predict the potential number of bugs for different software components. Their findings can help project planning and resource allocation. Williams and Hollingsworth [17] presented a method to create bug-finding tools that is driven by the analysis of previous bugs. They showed that the past bug history of a software project can be used as a guide in determining what types of bugs should be expected in the
future. Moreover, their approach can also help identify which groups of bug reports are more reliable.
III. BUG PREDICTION
The lifetime of a bug in a post-delivery product can be divided into two stages: the hidden stage and the visible stage. The hidden stage starts as soon as a product is released. (Although bugs might enter a software product in different phases of the development process, in this study, only bugs in the released product are considered.) When a bug is identified and reported, it enters into the visible stage. When the bug is killed (the fault is corrected), the visible stage ends [26]. One parameter to measure the bug life is the bug hidden time, which is the duration that a bug is in the hidden stage. Bug hidden time is affected by many factors. For example, a bug in a widely used product may be quickly detected and reported by users, and accordingly has a shorter hidden time. Clearly, the number of bugs in a released product is fixed. Different bugs might have different hidden times, and accordingly, the number of bugs detected and reported each month after the product release is affected by many factors, such as the bug's concealment, the extent of usage of the product, and the user skill level. However, most of these affecting elements are not measurable. Therefore, there is a need for an empirical approach that can predict bugs based on the bug-finding history rather than on these unmeasurable factors. In this paper, we present a bug prediction algorithm. The algorithm is intended to be used to predict the number of bugs to be detected and reported every month. First, we make the hypothesis that the bug prediction of any given component for a given month i is most impacted by the bug-count reported for the previous month, (i-1). This can be represented as:
BugCount_i ∝ BugCount_{i-1}

So the bug prediction for any month mainly relies on the bug-count of the previous month. We introduce the correction factor alpha (α), which is a measure of how much the predicted bug-count deviates from the previous month. Hence,

BugCount_i = α_i + BugCount_{i-1}    (1)

The correction factor α is used as an offset adjustment to predict how much the bug-count will vary in the month under consideration from its predecessor. In order to decide the correction factor α, we take into account the bug-count history of all available years and also use local corrections. Thus, we study the bug prediction problem by analyzing the historical bug data both locally and globally.

IV. CASE STUDY
The case study was performed on the Eclipse bug repositories [27]. We downloaded over 70,000 bugs related to the Eclipse project over the past 6 years. We were particularly interested in the 32 components listed in Table 1.

Table 1. 32 components list
Equinox.Bundles, Equinox.Framework, Equinox.Incubator, Equinox.Website, JDT.APT, JDT.Core, JDT.Debug, JDT.Doc, JDT.Text, JDT.UI, PDE.Build, PDE.Doc, PDE.UI, Platform.Ant, Platform.Compare, Platform.CVS, Platform.Debug, Platform.Doc, Platform.IDE, Platform.Releng, Platform.Resources, Platform.Runtime, Platform.Scripting, Platform.Search, Platform.SWT, Platform.Team, Platform.Text, Platform.UI, Platform.Update, Platform.UserAssistance, Platform.WebDAV, Platform.Website
A. Global Analysis
Let us look at the global bug-count history first. The reported bug-count for each month is calculated by adding all reported bugs for that component in the particular month period. For simplicity, we do not consider the severity of the bug but rather consider all bugs for that component and time criteria. As stated above, we believe that the previous months' reported bug-count is very significant in predicting the current month's bug-count. We refer to this as the 'recency-weighting' approach, where weights are arranged in a manner that boosts the weight of the most recently known monthly bug-count for that component. To model recency, we use a power decay function (base 2) given by

f(x) = 1/2^x, where x ≥ 0    (2)

The power decay function shown in equation (2) is simple, yet a very powerful technique for weighting the most recent bug-count more than any of its predecessor months. Consider Figure 1, which shows the decreasing weights for 72 months (6 years). The value of the function increases over time (x values start with 0 and go to 71).
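As an illustration only (our own sketch, not part of the original paper), the recency weights of equation (2) can be generated in a few lines of Python; the convention that x = 0 denotes the most recent month is an assumption consistent with the recency-weighting idea:

```python
# Base-2 power decay weights of equation (2): f(x) = 1 / 2**x.
# Assumption: x = 0 is the most recently reported month, x = 71 the oldest.
weights = [1 / 2 ** x for x in range(72)]
print(weights[:4])  # [1.0, 0.5, 0.25, 0.125] -- each older month counts half as much
```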
Fig. 1. Plot of values over the power decay weighting function.

Let us consider a sample monthly bug count for the Eclipse component Platform.Debug (Figure 2).

Fig. 2. Monthly bug count reported for Eclipse component Platform.Debug over 6 years.

The X axis shows 72 months beginning with January 2001 until December 2006 and the reported bug-count values for each month. Platform.Debug is one of the components about which a number of bugs have been reported each month for almost all of the last 6 years. The fluctuation in the reported bug-counts is natural because, as a newer version of Eclipse is released, new bugs are likely to be reported. The bug prediction task is to predict whether the trend of the reported bug-count will be upwards or downwards from the 72nd month. In order to determine this trend, we study the overall trend. So, we measure the difference (called the slope, m) between two subsequent bug counts as

m_i = BugCount_i − BugCount_{i-1}    (3)

The slope, m, is the indicator of the trend of the reported bug-count. If the value of m_i is positive, it means an increase in the bug-count over the previous month. Similarly, a negative value of m_i indicates a downward trend. The roller-coaster bug-counts are then smoothed to find out the overall trend of all components. Figure 3 shows the slope of the bug-count for the Platform.Debug component.

Fig. 3. Slope, m, of subsequent bug-counts for Eclipse component Platform.Debug.
A negative slope simply indicates a reduction in reported bugs, and high positive peaks indicate a dramatic increase in the number of bugs reported that month for the corresponding Eclipse component, Platform.Debug. We combine the weight function f(x) for each month with the slope of each month to calculate the overall weight function W(x), given by

W(x) = m_x · (1/2^x), where x ≥ 0    (4)

We can calculate the average weighted function W(x) for each month i for the last 6 years as

W_i = ( Σ_{i=0..71} m_i · (1/2^i) ) / 72    (5)

for all months between January 2001 and December 2006. The weighted monthly plot for the Platform.Debug component is shown in Figure 4. The spike near the 72nd month indicates how the slopes were weighted using the recency-weighting algorithm to give more importance to slopes in the most recent months for the reported bug-count of the Platform.Debug component.

Fig. 4. Weighted slope for monthly bug-count for Platform.Debug component using recency weighting.
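A minimal Python sketch of the global analysis of equations (3)–(5) might look as follows; the function name and the assumption that the input list is ordered oldest-first are ours, not the paper's:

```python
def global_weight(counts):
    """Recency-weighted average slope, equations (3)-(5).

    counts: 72 monthly bug counts for one component, oldest month first.
    """
    n = len(counts)
    # Equation (3): slope between subsequent months.
    slopes = [counts[i] - counts[i - 1] for i in range(1, n)]
    # Equation (4): weight each slope by 1/2**x, where x counts months back
    # from the most recent slope (x = 0 for the latest month).
    weighted = [m / 2 ** (len(slopes) - 1 - i) for i, m in enumerate(slopes)]
    # Equation (5): average of the weighted slopes over the 6-year history.
    return sum(weighted) / n
```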
The average of the weighted products for the 72 months starting January 2001 until December 2006 for Platform.Debug is 0.788, which indicates that the overall bug-count for this component is usually high (value close to 1). Although the global analysis sheds significant light on understanding the overall bug-count trend (high or low), it does not include any information about trends that might be specific to a particular month under consideration. As the bug-count is the 'reported' bug-count, it may be assumed that the bug-count gets affected by seasonal holidays, vacations or some other local factors. In the next sub-section, we add a local analysis factor for bug prediction.

B. Local Analysis
Reported bug-counts are not specific to the month under consideration. For example, for the month of December of any year, the bug-count may be expected to be lower due to local factors such as holidays. It could also be possible that no releases were made during December of any year. While the global analysis justifies the overall trend of the bug-count, it cannot sufficiently model local parameters that need to be taken into account. In this sub-section we present two such techniques considered for this paper.

1. Monthly analysis: We consider all the months of January over the past 6 years to predict the number of bugs for January 2007. This is important as it provides a local analysis perspective, since it counts the number of bugs reported. So, similar to the global analysis, we consider six weight functions (W_j) for Platform.Debug to calculate the average January weight factor for any year, as shown in equation (6):

W_january = ( Σ_{j=Jan-2001..Jan-2006} W_j ) / 6    (6)

The average slope in January is important to understand the local trend during January for the last 6 years. The average weight of all January months is 0.08 for the Platform.Debug component of Eclipse.

2. Second, we also take into account the correction factor of the past 2 years' January months, apart from the all-January analysis done in step 1. This is important to account for our recency-weighting approach. Accordingly, the two most recent 'previous' months of January have the most impact on bug prediction. The weight factor according to this step is 0.24902 for the Platform.Debug component. Equation (7) shows the weight factor W' for January 2005 and January 2006:

W'_january = (W_jan-2005 + W_jan-2006) / 2    (7)
where W_jan-2005 and W_jan-2006 are calculated according to equation (5).

C. Results
In order to make monthly bug predictions for the 32 components mentioned in Table 1, we calculate the α factor for the month of January 2007 as

α_i = W_i + W_january + W'_january    (8)

where W_january and W'_january are obtained using equations (6) and (7), respectively. Finally, we can predict the bug-count for January 2007 for each component. Using equations (1) and (8), we calculate the predicted bug-count for each component as

BugCount_jan-2007 = α_january + BugCount_dec-2006    (9)

Table 2 presents the predicted bug-count for the aforementioned 32 components for the month of January 2007. The comparison with the actual bugs reported for those components suggests that our predictions are very close to the actual bug-count values. The last column in Table 2 indicates the absolute difference. The predicted bug-count is fairly close to the actual bugs reported for the various components. We believe that the Platform.UI component, whose prediction was 98 bugs less than the actual bug-count, is an exceptional case. Most of the other predictions are accurate, with an average accuracy of 67.14% for the 32 components (including Platform.UI) and an average accuracy of 77% without the Platform.UI component. For most of the components the absolute difference between the actual bug-count and the predicted value is in single digits, indicating a robust algorithm. Using this algorithm, we were placed no. 1 in the challenge [28][29], which was to predict the bug counts for February 2007 to May 2007.
Table 2. Predicted vs. Actual Bug Count for January 2007

Component | Actual Bug Count | Predicted Bug Count | Absolute Difference
Equinox.Bundles | 12 | 3 | 9
Equinox.Framework | 35 | 25 | 10
Equinox.Incubator | 5 | 12 | 7
Equinox.Website | 0 | 0 | 0
JDT.APT | 5 | 6 | 1
JDT.Core | 73 | 50 | 23
JDT.Debug | 37 | 31 | 6
JDT.Doc | 4 | 1 | 3
JDT.Text | 22 | 16 | 6
JDT.UI | 79 | 49 | 30
PDE.Build | 9 | 7 | 2
PDE.Doc | 2 | 0 | 2
PDE.UI | 89 | 59 | 30
Platform.Ant | 6 | 10 | 4
Platform.Compare | 19 | 22 | 3
Platform.CVS | 29 | 17 | 12
Platform.Debug | 43 | 32 | 11
Platform.Doc | 4 | 0 | 4
Platform.IDE | 43 | 18 | 25
Platform.Releng | 24 | 8 | 16
Platform.Resources | 21 | 12 | 9
Platform.Runtime | 19 | 11 | 8
Platform.Scripting | 0 | 1 | 1
Platform.Search | 1 | 5 | 4
Platform.SWT | 105 | 67 | 38
Platform.Team | 17 | 15 | 2
Platform.Text | 24 | 9 | 15
Platform.UI | 242 | 144 | 98
Platform.Update | 19 | 32 | 13
Platform.UserAssistance | 0 | 0 | 0
Platform.WebDAV | 0 | 0 | 0
Platform.Website | 1 | 2 | 1
Sum | 989 | 664 |
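For illustration only, a minimal Python sketch of the local correction and the final prediction step (equations (6)–(9)) could look as follows; it reuses the global_weight() helper sketched in the global analysis above, and the function and parameter names, as well as the assumption that the per-January weight factors are supplied precomputed, are ours rather than the paper's:

```python
def predict_january_2007(counts, january_weights):
    """Predicted bug count for the next January, equations (6)-(9).

    counts: 72 monthly bug counts (Jan 2001 .. Dec 2006), oldest first.
    january_weights: the six per-January weight factors W_j (2001..2006),
                     oldest first, computed as described in the paper.
    """
    w_global = global_weight(counts)                          # eq. (5)
    w_january = sum(january_weights) / len(january_weights)   # eq. (6)
    w_recent = sum(january_weights[-2:]) / 2                  # eq. (7): Jan 2005, Jan 2006
    alpha = w_global + w_january + w_recent                   # eq. (8)
    return alpha + counts[-1]                                 # eq. (9): add Dec 2006 count
```

Rounding the returned value to the nearest integer would give a figure comparable to the 'Predicted Bug Count' column of Table 2.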
V. CONCLUSIONS & FUTURE WORK
In this paper, we presented an algorithm which uses a bug repository to predict future bug counts. It can assist project management planning and help to estimate future resource needs among dispersed development teams. The presented algorithm leverages the local and global bug trends for individual components. Future research directions include: component stability prediction, longer-term future predictions, change predictions and accounting for versioning impacts. We would also like to study the impact of the bug counts of core components on the bug counts of the dependent components of the software.

REFERENCES
[1] J. D. Herbsleb and D. Moitra, "Guest editors' introduction: Global software development," IEEE Software, vol. 18, no. 2, 2001, pp. 16–20.
[2] E. Carmel, Global Software Teams, Prentice Hall, 1999.
[3]
E. Carmel and R. Agarwal, “Tactical approaches for alleviating distance in global software development,” IEEE Software, vol. 18, no. 2, 2001, pp. 22–29. [4] S. Cherry and P. Robillard, “Communication problems in global software development: Spotlight on a new field of investigation,” Proceedings of the International Workshop on Global Software Development at the 26th International Conference on Software Engineering, Edinburgh, Scotland, May 2004, pp. 48–52. [5] P. A. Wagstrom, J. D. Herbsleb, and K. Carley, “A social network approach to free/open source software simulation,” Proceedings of the 1st International Conference on Open Source Systems, Genova, Italy, July 2005, pp 16–23. [6] C. Bird, A. Gourley, P. Devanbu, M. Gertz, and A. Swaminathan, “Mining email social networks,” Proceedings of the 3rd International Workshop on Mining Software Repositories, Shanghai, China, May 2006, pp.137– 143. [7] K. Chen, S. R. Schach, L. Yu, G. Z. Heller, and Jeff Offutt, “Open-source changelogs,” Empirical Software Engineering, vol. 9, no. 3, 2004, pp. 197–210. [8] A. Mockus and L. G. Votta, “Identifing reasons for software changes using historic database,” Proceedings of 2000 International Conference on Software Maintenance, San Jose, California, October 2000, pp. 120130. [9] T. Zimmermann, P. Weißgerber, S. Diehl, and A. Zellers,”Mining version histories to guide software changes,” IEEE Transactions on Software Engineering vol. 31, no. 6, 2005, pp. 429–445. [10] A. T. T. Ying, R. Ng, M. C. Chu-Carroll, and G. Murphy, “Predicting source code changes by mining change history,” IEEE Transactions on Software Engineering vol. 30, no. 9, 2004, pp. 574–586. [11] G. Canfora and L. Cerulo, “How software repositories can help in resolving a new change request,” Proceedings of the Workshop on Empirical Studies in Reverse Engineering, Budapest, Hungary, September 2005. [12] D. Cubranic and G. C. Murphy. “Automatic bug triage using text classification,” Proceedings of 2004 International Conference on Software Engineering and Knowledge Engineering, Banff, Canada , June 2004, pp. 92–97. [13] A. E. Hassan and R. C. Holt, “The top ten list: dynamic fault prediction,” Proceedings of the 21st International Conference on Software Maintenance, Budapest, Hungary, September 2005, pp. 263–272. [14] J. Anvik, L. Hiew, and G. C. Murphy, “Coping with an open bug repository,” Proceedings of the 2005 OOPSLA workshop on Eclipse technology eXchange, San Diego, California, October 2005, pp. 35–39. [15] R. Sandusky, L. Gasser, and G. Ripoche, “Bug report networks: Varieties, strategies, and impacts in a F/OSS development community,” Proceedings of the International Workshop on Global Software Development at the 26th International Conference on Software Engineering, Edinburgh, Scotland, May 2004, pages 80–84, 2004. [16] A. Mockus, R. T. Fielding, and J. D. Herbsleb, “Two case studies of open source software development: Apache and Mozilla,” ACM Transactions on Software Engineering and Methodologies, vol. 11, no. 3, 2002, pp. 309–346. [17] C. C. Williams and J. K. Hollingsworth, “Bug Driven Bug Finders,” Proceedings of the 1st ICSE Workshop on Mining Software Repositories, Edinburgh, Scotland, May 2004, pp. 70–74. [18] M. Askari and R. Holt, “Information Theoretic Evaluation of Change Prediction Models for Large-Scale Software,” Proceedings of the 3rd ICSE Workshop on Mining Software Repositories, Shanghai, China, May 2006, pp. 126–132. [19] T. Ostrand and E. J. 
Weyuker, “The distribution of faults in a large industrial software system,” Proceedings of 2002 ACM International Symposium on Software Testing and Analysis, Rome, Italy, July 2002, pp. 55–64. [20] T. Ostrand, E. J. Weyuker, and R. M. Bell. “Where the bugs are?” Proceedings of 2004 ACM International Symposium on Software Testing and Analysis, Boston, MA, July 2004, pp. 86–96. [21] T. J. Ostrand and E. J. Weyuker, “A tool for mining defect-tracking systems to predict fault-prone files,” Proceedings of the 1st ICSE Workshop on Mining Software Repositories, Edinburgh, Scotland, May, May 2004, pp. 85–89. [22] T. J. Ostrand, E. J. Weyuker, and R. M. Bell, “Predicting the location and number of faults in large software systems,” IEEE Transactions on Software Engineering, vol. 31, no. 4, 2005, pp. 340-355.
[23] T. L. Gravel, A. F. Karr, J. S. Marron, and H. Siy, "Predicting fault incidence using software change history," IEEE Transactions on Software Engineering, vol. 26, no. 7, 2000, pp. 653–661. [24] K. Crowston and J. Howison, "The social structure of free and open source software development," First Monday, vol. 10, no. 2, 2005. [25] T. Menzies, J. S. Di Stefano, and C. Cunanan, "Mining repositories to assist in project planning and resource allocation," Proceedings of the 1st ICSE Workshop on Mining Software Repositories, Edinburgh, Scotland, May 2004, pp. 75–79. [26] L. Yu and K. Chen, "Evaluating the post-delivery fault reporting and correction process in closed-source and open-source software," submitted for publication. [27] Eclipse Bugzilla Home, https://bugs.eclipse.org/bugs/ [28] MSR Mining Challenge 2007, http://msr.uwaterloo.ca/msr2007/challenge/ [29] H. Joshi, C. Zhang, S. Ramaswamy, and C. Bayrak, "Local and global recency weighting approach to bug prediction," Proceedings of the 4th International Workshop on Mining Software Repositories, Minneapolis, Minnesota, May 2007.
Soft Biometrical Students Identification Method for e-Learning
Deniss Kumlander
Department of Informatics, Tallinn University of Technology, Raja St. 15, 12617 Tallinn, Estonia
[email protected]
Abstract-The paper describes a soft, biometrical-characteristics-based approach to the students' identification process to be used mainly in e-learning environments. This approach is designed to increase the security of the examination process from the attendees' identification point of view and should improve the overall security of relatively weakly protected e-learning systems. The approach is called "soft" as it doesn't require any special systems to be used other than software embedded in the e-learning pages. The paper discusses how the approach can be applied and what kind of methods should be used together with the proposed one to produce a complete identification system for e-learning.
I. INTRODUCTION
The importance of electronic learning as an alternative to the traditional education scheme has increased considerably in recent years with the development of electronic channels and world globalisation [1, 2]. There are millions of students attending e-courses in developed countries and this number will increase as more countries become involved. The main problem that arises in e-learning, especially in the examination process, is students' identification [1]. There are several methods that can be applied to solve this problem, including certified centres and "hard" biometrical methods such as a requirement to attend exams using web-cameras. At the same time, those approaches mostly require special hardware, which is sometimes unavailable to students, or the students are not technically skilled enough to install and use it, so as a result they tend to be afraid of e-learning, i.e. the whole process. Summarising the above, the current students' identification methods sometimes prevent people from using e-courses. Therefore, in this paper, we are about to propose soft student identification methods, which perform identification without requiring any special hardware or attending courses in special centres.

II. E-LEARNING ENVIRONMENT – THE PROBLEM STATEMENT
The problem statement for identifying students and using the online exam environment greatly depends on the auditorium to which e-courses are targeted. In other words, before we discuss whether remote exams are an acceptable choice, we need to consider "who and why": who is studying and why, i.e. what are their goals during the study process. The main problem we are going to research here is the security risk that the person doing the exam is not the one who gets the mark.
The first and most typical class of students includes those who come from the previous educational level (schools, colleges etc.) right after that level is completed. Typically the teacher deals here with a lot of students – consider for example the bachelor level in universities, i.e. the auditorium is "massive". The learning process for them is a possibility to achieve a much better level than they have now, and many of them will try to pass exams using any possible techniques and methods. Notice that the current level of such students is normally below average, so the examination results will surely affect their future. Another class of students includes people who are going to participate in courses in order to obtain modern knowledge or change their professions. Those people have already obtained some level of education and are not as young and easy-minded as students from the first group. One more interesting class of students includes students who try to achieve the highest educational level. The number of such students is usually small or very small. Sometimes the learning process is directed to a particular person rather than to a broad auditorium. Consider for example PhD students. The risk that those students will not "show" themselves during exams (asking for somebody else's help to pass the exam) is much smaller, as the level they are doing exams at requires quite sophisticated knowledge to be obtained by the exam time. On the other hand, positive exam results give them much more than they give the previous groups. Therefore students of this category should be closely monitored. The second question we should consider is why we need to employ any students' identification system and, actually, the core question – why we need to have an e-learning (i.e. distance) environment at all. First of all, it makes it possible to learn the subject from any place. Often it is the only reason that is given in the literature, although it is not always the only one. Anyway, it brings the learning process out from the physical location and considerably increases the students' auditorium. Besides, it gives opportunities for many students to attend the best courses, which are probably given somewhere else (other countries, cities, universities and so forth). The second reason is again bringing the course out – not from a room this time, but from the time dimension. The course is normally given in a certain place and at a certain time. Some courses are provided as records, so it is possible for students to
choose a time when they would like to study. There are a lot of people who cannot concentrate just on studying, but have a lot of other duties like having to work, look after their children etc. Off-time lectures are the only way for them to study. Moreover, this benefit can be used by the lecturer as well. Sometimes the lecturer cannot give lectures "when" the students (university) would like, but he is an expert with significant knowledge in the field, so replacing the lecturer is not a good choice for the subject. In that case s/he can record the lecture at any time s/he wants and publish it to the lectures e-environment. The final reason to provide knowledge over electronic channels is addressing self-learning people in a country. Society's goal is to keep the country's work-force at a good level. This doesn't necessarily require workers to attend full courses each year and is not always achieved just by educating young students. One way of achieving that goal could be making modern knowledge available on the web, so any citizen of the country can access it. They will use only the lectures and exercises that are interesting for them, probably picking out just some relatively small number of topics, either since they know the rest or don't care about subjects they are not using in their everyday work. Finally, it is possible to classify the e-learning process by the degree of e-technology involved, and the following classes can be identified. Generally speaking, we can identify the following three core steps: 1. provide information (course knowledge); 2. a preliminary check (how well students were able to understand it); 3. a final check – the exam. At the same time it is possible to combine those steps in different variations, so we are about to look into the alternatives more closely. The first alternative we are going to mention includes the e-based learning process as a supporting technology to the main course learning process. It means that there could be some materials or labs published on the web, but all main activities including lectures and the examination process will happen in a class, i.e. in the "real-world" environment, where each party is present personally (or "normally", as opposed to being present virtually). The second alternative is a full e-based knowledge transfer from a teacher to the class with a "personal" examination as described above. There are two main reasons for having this type of e-learning. First of all, the examination is done face to face when the teacher is afraid that a student could use non-allowed materials during the exam and s/he would not have the possibility to control or prohibit (avoid) that. This alternative is often applied if the examination results will give something important to the student (career, financial benefits and so forth). Therefore, if a student will not derive such benefits, i.e. is studying just for himself, then there is no reason to control or identify him/her during the examination. The second main reason for having this type of studying is a significant distance
between the teacher and students. It is not efficient or even possible to ask students to come to a classroom, for example each week, especially if they would have to cover thousands of miles from their homes. At the same time, they would have to do that once – to attend the final exam personally to get their final marks. In some countries (for example in Sweden) students don't have to go to universities, but can attend the examination in a certified educational centre, where the teacher's deputies control the process. Notice that despite the fact that the final exam is not done in the virtual environment, many teachers will still like to get to know their students by giving them virtual lectures and watching how they progress in labs, and the final mark will be just a way to summarize the teacher's opinion (or to confirm his/her opinion about the student to avoid mistakes). So the real exam will not contribute a lot to the teacher's final decision. The final alternative is: the whole learning process is e-based. The teacher and the student will probably never meet face-to-face, or at least will not in the scope of that subject. Of course there could be an alternative to study this subject in the "real" environment, but the e-learning one doesn't require physical presence in any particular place.

III. BIOMETRICAL MARKERS AND THEIR CURRENT USE
Biometrics is a relatively novel approach to computer security that uses either physical or behavioral characteristics of a person to identify that person. The major idea is to teach computers to identify persons as people do in everyday communication at home, on streets, in offices and so forth. The best-known system of identification uses fingerprints, and it has been in use for longer than electronic computers have existed. Actually, there are many more types of computer security approaches relying on biometrical methods nowadays. Consider, for example, the following: face recognition, eye pattern recognition, and signature or voice recognition [3, 4]. Unfortunately, most biometric approaches require special systems (like the ones you have probably used in airports or offices) and therefore are not good enough to be used in e-learning, as students rarely have such devices or are ready to buy them. Notice that we do not consider here e-learning via certified educational centres, as student identification there is as simple as at a real education site and is therefore not a problem to be solved. The approach we are going to propose for use in the e-learning process is based on keystroke or typing patterns and does not require any special client-side equipment, as it relies exclusively on the information gathered by the client and server applications (software) and data mining of that information. Keystroke dynamics refers to a behavioral characteristic where the intervals between keystrokes are measured and captured in the form of a pattern, which is assigned to that person for later use. The method is based on the fact that every user has a specific typing pattern and is
Fig. 1. Biometrical identification systems architecture.
usually used in combination with other approaches for strengthening the main method. The biometrical security system is relatively easy to implement and use. It is possible to vary the factors to be measured and by which users will be verified. So the system can evolve by implementing more and more features. At the same time, the general approach to be defined below remains the same and is similar to other computer-based recognition systems. Typically there are three major elements in the system, as shown in Figure 1. The training part of the system is responsible for registering users in the system by producing patterns for each user and assigning those to them. The sensor produces initial data, and the feature extractor produces data which is: 1. bound to a particular measure (like time intervals or used words); 2. pre-processed and stored in a compact way, ready for later use. So there could be several sub-processes inside the biometrical measure extraction for each measured characteristic. The registered patterns are saved into a patterns database, which is the second part of the system. After all this is done, the user can go into the system and use it, so the third element of the system activates, which is user recognition. During that step, which is called matching, the current user characteristic is compared to the saved one, and if they are the same then the system accepts the user. Notice that the similarity level of the compared patterns can be set slightly below 100% to meet statistical deviations of those characteristics over time. The verification can be just a verification of the person during the login phase or long-term user monitoring. Notice that Figure 1 presents a system which is trained before use. It is possible to formulate a similar system which will adapt and evolve together with people. It means that a person's keystroke dynamics, used words and phrases are not
Biometrical measure extraction
Biometrical measure matching
Identification process result
Patterns database Person pattern adjustment Fig. 2. In time adjustable biometrical identification systems architecture.
something static; they do change in time. Therefore the system should also allow migration of the patterns assigned to each person, or it should be re-trainable during the verification process. Consider the system architecture shown in Figure 2. Here the extracted biometrical characteristics are used both for person identification and to adjust the existing pattern, so that the pattern evolves with that person. The keystroke pattern recognition system monitors, first of all, the keystroke typing dynamics. It is well known among telegraph operators that writing/typing dynamics are assumed to be unique to a large degree among different people [5]. Actually, keystroke typing dynamics can be measured by the following sub-characteristics [2, 5]:
1. Duration of the keystroke, hold time;
2. Latency between consecutive keystrokes;
3. Overall typing speed;
4. Habits of using additional keys on the keyboard, like typing numbers via the number pad;
5. Frequency of errors and backspace vs. delete key usage;
… and some others.
The key press, key up and key down events can be easily captured in modern programming languages. Therefore it is possible to find out when a key was pressed and released and to calculate the interval for which the button was pressed (which probably also reflects the strength of pressing). It is also possible to calculate intervals between keys. Besides, the same keyboard events provide the pressed key, distinguishing between symbols that appear the same on the screen, for example a number 1 typed on the highest keyboard row versus the number 1 typed on the number pad. It is also possible to cluster time intervals by keys and have even more sophisticated patterns (a minimal illustrative sketch of such timing capture and matching is given after the list below). Research on keystroke recognition systems started in the 1980s [6] and by now includes many types of recognition methods, such as [2]:
1. Statistical methods [6, 7, 8];
2. Artificial intelligence methods / machine learning and data mining, including but not restricted to neural
networks, graphs and Euclidean distance metrics, decision trees, etc. [9, 10];
3. Genetic algorithms [11] and fuzzy classification methods [12].
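As promised above, the following is a minimal, purely illustrative Python sketch (our own, not taken from the paper or its references) of how a keystroke-timing pattern could be extracted and matched; the event format, the distance threshold and the optional pattern-adjustment step are all assumptions:

```python
# Each event is (key, press_time, release_time) in seconds, as captured by
# the client-side software; the exact capture mechanism is assumed here.
def extract_pattern(events):
    holds = [release - press for _, press, release in events]      # hold times
    latencies = [events[i + 1][1] - events[i][2]                    # key-to-key gaps
                 for i in range(len(events) - 1)]
    return holds + latencies

def matches(stored, observed, threshold=0.35):
    """Euclidean distance between two patterns; below the threshold counts as a match."""
    distance = sum((a - b) ** 2 for a, b in zip(stored, observed)) ** 0.5
    return distance < threshold

def adjust(stored, observed, rate=0.1):
    """Optional in-time adjustment (cf. Figure 2): drift the stored pattern slowly."""
    return [(1 - rate) * a + rate * b for a, b in zip(stored, observed)]
```

In a real system the threshold would be tuned per user, and a failed match would only raise an alert rather than an automatic failure, as discussed in Section IV.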
Sometimes it is necessary to record more than one pattern for each user, since the pattern depends on the keyboard used (one's own at home or another one somewhere else), mood or environment (compare working at home or on a train). One more method we would like to propose for students' identification, and which is also bound to a person's metaphysical characteristics, is the set and frequency of words that a person uses to formulate an answer. This type of measure can be applied if a student has to write a free text to answer exam questions. It is well known nowadays that a person's vocabulary greatly depends on his/her background, skills, intelligence and the family s/he comes from. Therefore it is possible to distinguish persons based on the typical words and phrases they use, and this method has been quite successfully applied to analyse different authors' texts (novels) and identify the authors of some unsigned texts. This method is much more reliable than the keystroke recognition approaches, as it depends less on the environment in which the exam is done (or the pattern was recorded), the person's mood at that time, etc. In other words, it depends much less on the moment the biometrical factor is measured and it is much more stable over time. At the same time, it does require much more text to be produced by the person to be identified in order to apply the method.
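To illustrate the idea (our own sketch, not part of the paper), one simple way to compare the vocabulary of a submitted answer against a student's stored word-frequency profile is cosine similarity; the alert threshold mentioned in the comment is an assumption:

```python
from collections import Counter
from math import sqrt

def profile(text):
    """Relative word frequencies of a piece of free text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def similarity(p, q):
    """Cosine similarity between two word-frequency profiles."""
    words = set(p) | set(q)
    dot = sum(p.get(w, 0.0) * q.get(w, 0.0) for w in words)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

# Example policy (assumed): raise an alert if the similarity between the stored
# profile and the exam answer falls below, say, 0.5.
```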
IV. APPLYING BIOMETRICAL MARKERS AND A SOFT STUDENTS IDENTIFICATION SYSTEM
The first technical problem that appears when applying the proposed methods is how the database of patterns can be produced. Patterns should be recorded and verified prior to use, i.e. prior to identifying students. Depending on the environment, purposes and teacher/student relations, the process of collecting biometrical information can be either hidden or visible to the person to be identified. One idea can be to record the data during the study period of the subject, for example while the student connects to the course site or while s/he is doing labs. Another idea requires more support from the educational institution management and certain centralization, but eventually can be much more efficient. The biometrical data can be centralized for the whole university, so a course can use data captured in the past (i.e. during previous courses that the student has passed). The data will be especially valuable if the previous course was done in a controlled environment, i.e. when the student and the teacher had visual contact. This type of recorded pattern is reliable for later use. Therefore one approach can be to have web-cam-based exams during the first courses (for example the first year), capturing biometrical characteristic data, and thereafter transfer students to the full
e-learning environment and use there those captured (reliable) biometrical patterns. Notice that different biometrical factors should be used for different exam cases / environments / teaching technology types. For example, a test-like exam will likely use the time intervals for selecting answers and the count of re-selects (changes of mind), which are both applicable only to this type of exam and are personal characteristics, so they can be used as biometrical measures. Therefore it is important to formulate the goal (markers and factors to be used) right from the start by analyzing the used e-learning concept and how the final step will be done. The process of recording database data can be organised via a set of small post-lecture checks, when the teacher asks students to answer questions to verify how the knowledge was memorized. Normally students are reluctant to attend such quizzes (which are normally 3-5 minutes long) and never try to cheat (as they know that the results will not affect the final mark), and therefore they show their nature (i.e. their biometrical characteristics). Notice that the proposed security system is sensitive to the amount of information flowing through the verification subsystem. The more data flows through it, the more exact an answer the verification procedure can produce, and therefore the multiple-choice exam is the worst choice for a good e-learning environment. Finally, we would like to mention that the soft students identification and verification methods do not include a protection subsystem against students that are using lecture materials although they are not supposed to do that. The reason is simple – it is always possible to organise questions so that they will not require citing from lectures, but instead will require knowledge to formulate answers. The correct approach is to check an understanding of the learned material. All this should create enough space for applying the soft students' identification approach during the online examination process of the e-learning environment. As was mentioned before, the biometrical factors are normally used to strengthen other approaches. The reason is simple and lies in the reliability of the verification system. The soft biometrical students' identification approach must produce no more than alerts, i.e. it should never produce a strict result. An automatic "failed" mark cannot be given, since the examination stress and a possibly uncomfortable environment should be considered. Therefore only an alert is raised, showing a suspicion that somebody else is acting, and thereafter an additional routine is run. Here we have several alternatives to follow. First of all, it is possible to react to the alert directly and demand a re-examination on site. If a person cheats, then the re-examination requirement (even a short version) will lead either to a situation where this person does not appear, or s/he will prepare as s/he should have prepared before, and the final result (the student has obtained the transferred knowledge) will be achieved. Thirdly, it is possible to use extra methods (non-standard
verification ones, like an automatic face recognition system) to ensure the reliability of exam results; these are described below. The second additional approach is to divide the exam into two parts. The first one is a long, complex online exam, which can be seen as the main part of the examination. The other one is a very short face-to-face check to compare the long exam answers with the person's knowledge and to verify the exam's end result. Notice that it is similar to the previous situation after an alert, but the short exam is always done and is slightly longer in case of alerts. Anyway, the face-to-face exam doesn't necessarily mean an exam in a particular place, but can be organised using a web-cam or a certified educational centre. The major goal of applying the e-exam here is to decrease the time the teacher has to spend on exams. The third approach is to address questions to a particular student. This method requires clustering students and keeping a database of progress for each student. The approach requires using intermediate knowledge checks, during which the student's correct answers are saved into a database. After that, some of those questions are presented during the main exam, and if the student no longer knows the correct answer then an alert should be issued. It is possible to check the vocabulary of the answer as well and see how the student formulates answers to questions that were well known to him/her so far. Finally, it is possible to use another verification system and produce a strict verification result if both systems produce alerts. For example, a web-cam or a certified centre can be used while applying the soft biometrical methods to increase the security level. Here the soft biometrical approach is used in order to ensure that the person who came into the certified centre is exactly the person whose documents are checked (i.e. to avoid a situation where another person goes to take the exam, as the examiner is not a policeman and doesn't have enough knowledge to check identification cards). The beauty of the soft students' identification methods is that they don't require special hardware, so they are invisible to the people being checked and are therefore a good supporting identification approach.

V. CONCLUSION
A students' identification system based on the analysis of biometric keystrokes and used words can produce considerable benefits in any e-learning environment as a part of e-exams. The proposed approach can be seen as a soft security strengthening method as it doesn't require any special hardware. This eliminates certain requirements for attending e-courses that are set by other ("hard") biometrical verification and identification methods, such as, for example, face recognition. The proposed system implementation is quite simple, but it does require careful planning and organisation to function adequately and produce meaningful alerts. In any case, the system is not a prohibiting-type system but a warning one, and it is therefore advised to use it in combination with the other methods described in the paper.
REFERENCES [1]
E. González-Agulla, E. Argones-Rúa, C. García-Mateo, and Ó. W. M. Flórez, "Development and Implementation of a Biometric Verification System for E-learning Platforms", EDUTECH, Computer-Aided Design Meets Computer-Aided Learning, IFIP 18th World Computer Congress, 2004, pp. 155-164.
[2] O. Guven, S. Akyokus, M. Uysal, and A. Guven, "Enhanced password authentication through keystroke typing characteristics", Proceedings of the 25th IASTED International Multi-Conference: Artificial Intelligence and Applications, 2007, pp. 317-322.
[3] A. K. Jain, A. Ross, and S. Prabhakar, "An Introduction to Biometric Recognition", IEEE Trans. on Circuits and Systems for Video Technology, vol. 14, no. 1, 2004, pp. 4-19.
[4] A. K. Jain, L. Hong, and S. Pankanti, "Biometric Identification", Communications of the ACM, vol. 43, no. 2, 2000, pp. 91-98.
[5] J. Illonen, "Keystroke dynamics", in Advanced Topics in Information Processing Lectures, 2003.
[6] R. Gaines, W. Lisowski, S. Press, and N. Shapiro, "Authentication by Keystroke Timing: some preliminary results", Rand Report R-256-NSF, Rand Corporation, 1980.
[7] R. Joyce and G. K. Gupta, "Identity Authentication Based on Keystroke Latencies", Communications of the ACM, vol. 33, no. 2, 1990, pp. 168-176.
[8] I. Sogukpinar and L. Yalçin, "User identification via keystroke dynamics", Ist. Üniv. Journal of Electrical and Electronic Engineering, vol. 4, no. 1, 2004, pp. 995-1005.
[9] S. Bleha, C. Slivinsky, and B. Hussien, "Computer-access security systems using keystroke dynamics", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, 1990, pp. 1217-1222.
[10] Y. Sheng, V. V. Phoha, and S. M. Rovnyak, "A parallel decision tree-based method for user authentication based on keystroke patterns", IEEE Transactions on Systems, Man, and Cybernetics, vol. 35, no. 4, 2005, pp. 826-833.
[11] E. Yu and S. Cho, "GA-SVM wrapper approach for feature subset selection in keystroke dynamics identity verification", Proceedings of the International Joint Conference on Neural Networks, vol. 3, 2003, pp. 2253-2257.
[12] B. Hussien, R. McLaren, and S. Bleha, "An application of fuzzy algorithms in a computer access security system", Pattern Recognition Letters, vol. 9, no. 1, pp. 39-43.
Innovation in Telemedicine: an Expert Medical Information System Based on SOA, Expert Systems and Mobile Computing
Denivaldo Lopes, Bruno Abreu and Wesly Batista
Laboratory of Software Engineering and Computer Network – LESERC, Federal University of Maranhão – UFMA, São Luís - MA, Brazil
{dlopes, babreu, wbatista}@dee.ufma.br
Abstract-This paper presents the Expert Medical Information System (EMIS), which supports medical information management, such as electronic health records (EHR), and helps in the diagnostic process. EMIS is designed using Service Oriented Architecture (SOA). All functionalities of EMIS are provided as services that are consumed by end clients or other services. A service layer masks the business logic and database tiers that execute the functionalities. The business logic contains an electronic health record and an expert system based on PROLOG for improving the diagnostic process. We present an illustrative example of using EMIS to diagnose hemorrhagic dengue.
I. INTRODUCTION
In the last decades, telemedicine has taken a step forward thanks to the advances in electronic devices and telecommunication technologies that have improved mobility, provided low-cost microprocessors and interconnected devices through fast data communication infrastructures. For example, a smartphone or a handheld can be used to transfer biometric signals to a diagnostic center where a physician can provide a diagnosis based on these biometric signals [1]. Nowadays, Medical Information Systems (MIS) and Computer-Aided Diagnosis (CAD) can take advantage of new software platforms, e.g. Web Services and Java 2 Micro Edition (J2ME), and new software architectures, e.g. Service Oriented Architecture (SOA). On the one hand, new platforms such as Web Services, J2ME and dotNET Mobile have provided a hardware/operating-system-independent platform to develop software systems for mobile devices. On the other hand, old platforms such as PROLOG can be exposed as services and can continue co-existing with new systems. PROLOG can be used to create expert systems to aid in a disease diagnostic process. In this paper, we present the Expert Medical Information System (EMIS), which is based on SOA and brings together a MIS to support medical information management, such as electronic health records (EHR), and a CAD to support disease diagnosis, such as an expert system for hemorrhagic dengue detection.
This paper is organized as follows. Section 2 presents an overview of concepts such as medical information systems, Web Services and expert systems. Section 3 presents our SOA solution for a medical information system and for CAD. Section 4 concludes our paper, presenting limitations and trends of our research.
II. OVERVIEW
A. Medical Information System and Telemedicine
A Medical Information System (MIS) is an information system that supports the medical domain through the use of computers, particularly software, and a telecommunication infrastructure to transmit, retrieve and store medical information. Nowadays, MIS are distributed systems interconnecting medical organizations (hospitals, medical centers and so on) and health professionals using the Web as environment [6]. Telemedicine can be defined as the use of electronic information and communication technologies to provide and support health care when distance separates the participants [5].
B. Web Services and SOA
The concept of services was introduced before Web Service technologies. In fact, this concept has been used for a long time by OSF's Distributed Computing Environment (DCE) [3], OMG's CORBA [8], Sun's Java RMI, and Microsoft's Distributed Component Object Model (DCOM) or COM+. A service is an abstraction of programs, business processes and other software artifacts defined in terms of what it does, while Service-Oriented Architecture (SOA) [9] describes how a system composed of services can be built. SOA is also a form of distributed system architecture based on the concept of services that is characterized by the following properties [9]: logical view, message orientation, description orientation, granularity and platform neutrality.
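As a rough illustration of this service layering (not the EMIS code itself, and assuming a JAX-WS-style Java stack of the kind provided by JWSDP), a piece of business logic can be exposed as a SOAP web service in a few lines; the class name, method and URL below are hypothetical:

import javax.jws.WebMethod;
import javax.jws.WebService;
import javax.xml.ws.Endpoint;

// Minimal sketch: the annotated class becomes a SOAP web service whose WSDL is
// generated automatically, so any consumer able to read WSDL and exchange SOAP
// messages can invoke it, independently of its own platform.
@WebService
public class HealthRecordService {

    @WebMethod
    public String getPatientSummary(String patientId) {
        // A real service layer would delegate to the business logic and database tiers.
        return "Summary for patient " + patientId;
    }

    public static void main(String[] args) {
        // Publishes the service (and its WSDL) at an illustrative local address.
        Endpoint.publish("http://localhost:8080/emis/healthRecord", new HealthRecordService());
    }
}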
Fig. 1. Service Oriented Architecture (SOA)
Figure 1 presents a simplified SOA model. An AgentProvider has Services. These Services are described through a metadata representation, i.e. a ServiceDescription. Afterwards, the AgentProvider stores information about its services in a Registry. An AgentRequester searches the Registry for a specific service following a determined criterion. The Registry returns information about the desired Service. The AgentRequester finds the metadata of this service and uses it to exchange messages with the service.
C. Prolog and Expert Systems
"PROgrammation en LOGique" (PROLOG) is a declarative programming language based on Robert Kowalski's procedural interpretation of Horn clauses [2] [7]. Horn clauses consist of first-order predicate logic and can be classified into two types: facts and rules. Facts can be seen as rules that are always true. Rules consist of calls to predicates that must be satisfied for the rule to be true; otherwise the rule is false. The Prolog approach is based on known facts, relationships about problems (i.e. rules) and an inference engine that executes questions in order to prove rules. The main features of PROLOG are [7]: rule-based programming, built-in pattern matching and backtracking execution. A Prolog system consists of a text editor, an inference engine and a shell. The text editor allows writing a Prolog program. The inference engine executes the questions, performing pattern matching and backtracking in order to solve a problem (question). The shell is an interface between the inference engine and the user. Several systems provide a Prolog engine and shell, such as SWI-Prolog (http://www.swi-prolog.org/), LPA's Win-Prolog (http://www.lpa.co.uk/) and GNU Prolog (http://www.gprolog.org/). Expert Systems (ES) are computer applications which embody some non-algorithmic expertise for solving certain types of problems [7]. In other words, an ES consists of computer programs that simulate a human expert in a specific domain, i.e. it must have subject-specific knowledge and analytical skills similar to those of human experts. ES have been developed by researchers in artificial intelligence since the 1960s and have been applied in systems for financial planning decisions, medical diagnostic systems, fault detection tools and simulation of chemical processes. Figure 2 illustrates a generic expert system (based on [7]). A Domain Expert is a human who possesses specific knowledge and skills. A Knowledge Engineer
interacts, through the Knowledge Extraction use case, by means of interviews and other mechanisms to capture the specific knowledge and skills of a Domain Expert. This knowledge and these skills are translated into a format that can be stored in the Knowledge Base. A User utilizes the User Interface to interact with the Inference Engine and issue queries against the Knowledge Base. During the utilization of the expert system, the particular data being used to solve a problem are stored in the Working Storage.
Fig. 2. Expert Systems: use case diagram (actors: Domain Expert, Knowledge Engineer, User; components: User Interface, Inference Engine, Knowledge Base, Working Storage)
Expert systems for supporting disease diagnosis can be designed and programmed in Prolog using an inference approach called goal-driven reasoning, or backward chaining. This approach uses IF...THEN rules to break a goal into smaller sub-goals that are easier to prove. Figure 3 presents a rule in IF...THEN...ENDIF form and the equivalent Prolog syntax following the goal-driven reasoning approach.
(a) IF firstPremise and secondPremise and thirdPremise and ... and nPremise THEN conclusion ENDIF
(b) conclusion :- firstPremise, secondPremise, thirdPremise, ..., nPremise.
Fig. 3. Rules: (a) IF...THEN and (b) equivalent Prolog syntax
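To make the goal-driven reasoning concrete outside of Prolog, the following minimal Java sketch (illustrative only; the facts and rule names are invented, and EMIS itself relies on a Prolog engine) proves a goal by finding a rule that concludes it and recursively proving that rule's premises:

import java.util.*;

public class BackwardChainer {
    record Rule(String conclusion, List<String> premises) {}

    private final Set<String> facts = new HashSet<>();
    private final List<Rule> rules = new ArrayList<>();

    void fact(String f) { facts.add(f); }
    void rule(String conclusion, String... premises) { rules.add(new Rule(conclusion, List.of(premises))); }

    // A goal holds if it is a known fact, or if some rule concludes it and all of
    // that rule's premises (the sub-goals) can be proven in turn.
    boolean prove(String goal) {
        if (facts.contains(goal)) return true;
        for (Rule r : rules) {
            if (!r.conclusion().equals(goal)) continue;
            boolean allHold = true;
            for (String premise : r.premises()) {
                if (!prove(premise)) { allHold = false; break; }   // fall through to the next rule (backtracking)
            }
            if (allHold) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        BackwardChainer kb = new BackwardChainer();
        kb.fact("feverInLast7Days");
        kb.fact("rashesAndJointPain");
        kb.fact("visualHemorrhage");
        kb.rule("suspectDengue", "feverInLast7Days", "rashesAndJointPain");
        kb.rule("hemorrhagicDengueSuspected", "suspectDengue", "visualHemorrhage");
        System.out.println(kb.prove("hemorrhagicDengueSuspected"));   // prints: true
    }
}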
III. EXPERT MEDICAL INFORMATION SYSTEM
The Expert Medical Information System (EMIS) aims to support medical information management and disease diagnosis. Figure 4 presents EMIS, which is based on:
Fig. 4. Expert Medical Information System: main components
• Service Oriented Architecture (SOA): functionalities are provided as services and the whole system is organized as services, a service provider and service consumers. According to Figure 4, EMIS provides its functionalities as services available on the Internet. These services can be found in a UDDI registry that provides the location of their definitions, i.e. WSDL documents. A service consumer can then build a stub based on the WSDL document of a service, and interactions are carried out through SOAP messages. The service consumer can be a lightweight application running on a mobile device such as a PDA or a smartphone, or a Web application that generates Web pages accessed through Web browsers. The business logic of the electronic health records and the disease diagnosis support are provided as services.
• Expert System: simulates a human expert in a specific domain, providing solutions to problems according to a knowledge base. Currently, EMIS has an expert system to support the diagnosis of hemorrhagic dengue.
• Mobile computing: provides mobility to health professionals outside medical centers. According to Figure 4, a user with a mobile device can access the electronic health records that contain information about patients, or can access the expert system for help in a disease diagnosis. The communication between the mobile device and the service provider is realized using a wireless telecommunication infrastructure (e.g. cellular network, IEEE 802.11 or Bluetooth).
A. Architecture of EMIS
EMIS can be organized into the following components:
• Service Provider: contains services that mask the business logic and databases of the electronic health records and the expert system.
• Service Consumer: is responsible for providing a lightweight application for health agents. It can be of two types: a consumer for mobile devices and a web page generator. The consumer for mobile devices is developed to run on PDAs and smartphones, while the web page generator is a Web application that provides Web pages as the interface to end users using a Web browser.
Service Provider
Figure 5 presents the proposed service provider, which is composed of:
• UDDI Server: stores information about services. Each service is registered in this UDDI server, which can be used to find services. Although a user can make a search, access to services depends on the authorization, authentication and access control of EMIS.
• Web Services: provide a layer that masks the business logic and databases of EMIS. The business logic is realized by components that are responsible for processing the medical information and executing the disease diagnosis support.
• Medical Process Engine: has the responsibility of coordinating the processes involved in EMIS. For this purpose, it coordinates the following components: Medical Information Manager, Resource Manager and Expert System Handler.
• Configuration Manager: provides the functionalities to configure and control the execution of the other components. It is responsible for setting the options of each component, managing user accounts, setting access control and so on, while the Medical Process Engine controls the other components to accomplish the business logic.
• Configuration Interface: is a graphical user interface for configuring the system.
• Expert System Handler: realizes the interconnection between the Expert System and the rest of the EMIS components.
• Expert System (Prolog Engine): supports disease diagnosis based on a knowledge base created from expert knowledge and skills. This expert system is executed by a Prolog engine that executes queries based on facts and rules.
• Knowledge Base: contains facts and rules that map the expert knowledge and skills to a format that can be understood by a Prolog engine.
• Resource Manager: manages resources such as medical images, clinical analysis, laboratory exams and so on.
• Resource: contains medical images, clinical analysis, laboratory exams and so on.
• Medical Information Manager: manages information about patients and health agents.
• Medical Information: contains information about patients and health agents.
• Security Service: provides authentication, access control, data confidentiality, data integrity and non-repudiation. These services are important for securing the system against malicious attacks.
Fig. 5. Architecture of EMIS: Service Provider and Web Page Generator (components include the UDDI Server, Web Service layer, Medical Process Engine, Security Service, Expert System Handler, Expert System (Prolog Engine), Resource Manager, Medical Information Manager, Configuration Manager and Interface, WebPage Generator, and the Medical Information, Resource and Knowledge Base databases)
Service Consumer
The consumer called Web Page Generator creates Web pages for users and interacts with the services. Users can then use a Web browser to access the EMIS system. This client can run on laptops, desktops or workstations, so it has no hardware dependence. The consumer for mobile devices is a lightweight application running on PDAs and smartphones, so health agents can access the electronic health records and the disease diagnosis support anytime and anywhere. The following considerations are relevant in the development of this lightweight application for mobile devices:
• Resource poor: processing power and memory capacity are, and will continue to be, smaller in mobile devices (e.g. PDAs and smartphones) than in static computers.
• Limited data persistence: a lightweight application should locally manage only a small amount of data; in general, the heavy processing and large amounts of data should be maintained by a remote application. Moreover, data in a mobile device are prone to be lost when the device is broken or stolen.
• Limited communication infrastructure: mobile devices can easily move from one geographic area to another and are connected to the Internet through a GPRS/EDGE telecommunication infrastructure that has a maximum data transfer rate of 384 kbit/s.
• Information security: a mobile device is in general a personal instrument for storing a small amount of data or realizing simple processing. However, a mobile device is more vulnerable to theft than a static computer, and the information stored in a stolen mobile device can be read by anybody.
Fig. 6. Architecture of EMIS: service consumer for mobile devices (components: User Interface, Patient Information Management, Diagnostic Management, Expert System Interface, Notification, Resource Management, Web Service Stub, Local Patient Information, Resource Storage)
Figure 6 presents our service consumer for mobile devices. It has the following components:
• User Interface: interacts with the user, providing the functionalities of the other components, such as Diagnostic Management.
• Diagnostic Management: provides local support for realizing a disease diagnosis. For this purpose, the mobile device requests the aid of the expert system installed in the Service Provider of EMIS.
• Resource Management: manages medical images such as X-ray, mammography, ECG and so on. Due to the reduced screen dimensions of a mobile device, the analysis of large medical images such as X-rays and mammographies is generally very difficult. However, other images such as ECGs can be analyzed by a health agent using a small screen like those of mobile devices. Although mobile devices have limited screen dimensions, some, like the Nokia N95, can receive, store and reproduce images on a TV monitor. A health agent visiting a patient at home can then use the patient's TV monitor to visualize large medical images.
• Expert System Interface: is responsible for mediating between the User Interface and the Expert System running on the Service Provider.
• Notification: as mobile devices are not always connected to the Internet or a local network, a notification facility is desirable to signal when an event occurs, e.g. an X-ray or blood exam becomes available in a hospital center. A health agent can then download the resource or information reported in this notification.
• Web Service Stub: supports the communication via SOAP with the Service Provider.
• Patient Information Management: manages the patient information locally. A health agent can remotely access the
health records of a patient or can store these records in the mobile device for future access.
• Local Patient Information: stores patients' health records locally.
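The interplay between the Diagnostic Management, Expert System Interface and Web Service Stub components might look roughly as follows; every type and method name here is hypothetical and only illustrates the question/answer loop between the mobile client and the remote expert system:

import java.util.function.Predicate;

// Hypothetical interfaces sketching the mobile client's diagnostic loop.
interface ExpertSystemStub {                       // stands in for the SOAP stub to the Service Provider
    String start(String patientId);                // first question posed by the remote expert system
    String answer(String patientId, boolean yes);  // next question, or a final diagnosis line
}

class DiagnosticManagement {
    private final ExpertSystemStub stub;
    DiagnosticManagement(ExpertSystemStub stub) { this.stub = stub; }

    // askUser is supplied by the User Interface; it turns each question into a yes/no answer.
    String runSession(String patientId, Predicate<String> askUser) {
        String prompt = stub.start(patientId);
        while (prompt != null && !prompt.startsWith("Diagnose:")) {
            prompt = stub.answer(patientId, askUser.test(prompt));
        }
        return prompt;   // e.g. "Diagnose: Suspect of hemorrhagic dengue"
    }

    public static void main(String[] args) {
        // Canned stub standing in for the real remote service, for illustration only.
        ExpertSystemStub canned = new ExpertSystemStub() {
            public String start(String patientId) { return "Patient presents fever in the last 7 days?"; }
            public String answer(String patientId, boolean yes) {
                return yes ? "Diagnose: Suspect of hemorrhagic dengue" : "Diagnose: It is not hemorrhagic dengue";
            }
        };
        System.out.println(new DiagnosticManagement(canned).runSession("patient-42", question -> true));
    }
}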
B. Electronic Health Record
An Electronic Health Record (EHR) contains an individual health record for each patient in a digital format that can be accessed easily and quickly using computers. Our proposed EHR is stored in two databases: Medical Information and Resource. Figure 7 presents a fragment of the Medical Information schema using UML notation.
Fig. 7. EHR: schema for Medical Information (fragment). The classes include ElectronicHealthRecord, Patient, HealthAgent (Doctor, Nurse, Physician), MedicalExamination, IllnessOccurrence, HealthData, LaboratoryTest, Diagnosis, ExpertSystemDiagnosis, Symptom, MedicalProcedures, Treatment and PrescribedMedication.
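As a rough illustration of how a fragment of this schema could be carried into the business logic tier, a few entity classes might look as follows; the class and field names follow the diagram, but the multiplicities and attribute types are simplifying assumptions:

import java.util.ArrayList;
import java.util.List;

// Sketch only: not the EMIS data model, just an illustration of the schema fragment.
class Patient {
    String id;
    String name;
    ElectronicHealthRecord record;                 // assumed one record per patient here
}

class MedicalExamination {
    String healthAgent;                            // a Doctor, Nurse or Physician
    String diagnosis;
    final List<String> symptoms = new ArrayList<>();
    final List<String> prescribedMedication = new ArrayList<>();
}

class ElectronicHealthRecord {
    final List<MedicalExamination> examinations = new ArrayList<>();
}

class EhrDemo {
    public static void main(String[] args) {
        Patient p = new Patient();
        p.id = "patient-42";
        p.record = new ElectronicHealthRecord();
        MedicalExamination exam = new MedicalExamination();
        exam.symptoms.add("fever");
        p.record.examinations.add(exam);
        System.out.println(p.id + " has " + p.record.examinations.size() + " examination(s)");
    }
}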
C. Expert System
Our proposed expert system supports disease diagnostic decisions. To illustrate its application, we chose hemorrhagic dengue as the malady to be diagnosed. Dengue is an infectious disease characterized by rash and aching head and joints [4]. It is transmitted by the mosquito Aedes aegypti when it is infected with one of four virus serotypes (DEN-1, DEN-2, DEN-3 and DEN-4) of the genus Flavivirus. These mosquitoes are typical of the tropics, so there is a large incidence of dengue in tropical countries such as India, Indonesia and Brazil. The most severe type of dengue is dengue hemorrhagic fever, which is characterized by fever, hemorrhagic tendency, thrombocytopaenia (< 100,000 platelets per mm³) and evidence of plasma leakage. In order to confirm the diagnosis of dengue, the doctor proceeds with a clinical diagnosis and can afterwards request a laboratory exam of serology or Polymerase Chain Reaction (PCR). The treatment includes augmenting oral fluid intake to prevent dehydration. It is very important to avoid medicines such as aspirin and non-steroidal anti-inflammatories. Figure 8 presents some steps that are generally employed in clinical diagnosis to detect hemorrhagic dengue. Listing 1 presents a fragment of the source code of our expert system developed in Prolog.
Fig. 8. Activity diagram used to detect hemorrhagic dengue (fragment). The decision steps check the symptoms (fever, headache, rashes, pain in muscles and joints), fever in the last 7 days, a past case of dengue, visual hemorrhage and the tourniquet test (blood points >= 20), leading to "It is not hemorrhagic dengue", "Suspect of hemorrhagic dengue: request a laboratory exam" or "Hemorrhagic dengue detected: request hospital care".
Listing 1. Source code in Prolog (fragment)
helpDengue :-
    print('hemoDengue - Expert System to support'), nl,
    print('Hemorrhagic Dengue Diagnose'), nl.
hemoDengue :-
    print('Patient presents fever, rashes, pain in muscles and in joints?'), nl,
    not(symptoms),
    print('It is not hemorrhagic dengue'), nl.
hemoDengue :-
    print('Patient presents fever in the last 7 days?'), nl,
    not(fever),
    print('It is not hemorrhagic dengue.'), nl.
hemoDengue :-
    print('Past case of dengue?'), nl,
    not(pastCaseOfDengue),
    print('It is not hemorrhagic dengue'), nl.
hemoDengue :-
    print('Patient presents visual hemorrhage?'), nl,
    hemoSignals, detectedHemoDengue.
hemoDengue :-
    print('Make the tourniquet test.'), nl,
    print('The blood points are >= 20?'), nl,
    moreTwenty, detectedHemoDengue.
hemoDengue :-
    print('Diagnose: Suspect of hemorrhagic dengue'), nl,
    print('Action: Request a laboratory exam'), nl,
    suggestion.
detectedHemoDengue :-
    print('Diagnose: Hemorrhagic dengue.'), nl,
    print('Action: Request hospital care'), nl,
    suggestion.
symptoms :- read(Resp), yes(Resp), nl.
fever :- read(Resp), yes(Resp), nl.
hemoSignals :- read(Resp), yes(Resp), nl.
moreTwenty :- read(Resp), yes(Resp), nl.
pastCaseOfDengue :- read(Resp), yes(Resp), nl.
suggestion :- print('Suggestion: You can treat the symptoms but you must not use aspirin and non-steroidal anti-inflammatory.').
yes(yes).
yes(y).
Fig. 9. Running EMIS: Expert System
D. Prototyping
The prototype is being developed according to the following description:
• The Service Provider is being developed using the Java language with JDK version 1.5, JDBC, MySQL server version 5.0, SWI-Prolog version 5.6.30, InterProlog version 2.1.2, the Java Web Service Developer Pack (JWSDP) version 2.0 from Sun and the Tomcat container.
• The Web Page Generator is being developed using the Tomcat container version 5.5 and a Web Service stub written with JWSDP version 2.0.
• The Service Consumer is being developed using the Sun Java Wireless Toolkit for CLDC version 2.5 with CLDC 1.1 and MIDP version 2.0, and the Web services API and library for mobile devices provided with J2ME.
Figure 9 presents the utilization of the expert system for supporting a health agent in taking a preliminary decision about the presence of hemorrhagic dengue in a patient.
IV. CONCLUSION
This paper presented EMIS, a medical information system powered with the functionalities of an EHR and CAD. EMIS is based on SOA, so it can be reused by other systems without concern about the platforms used in its implementation. In addition, EMIS is designed to have lightweight clients such as Web browsers on desktops or applications running on mobile devices. EMIS facilitates health care in remote places by providing access to health records and supporting disease diagnosis. In future research work, we aim to realize tests of EMIS, introduce robust security services and elaborate questionnaires for users such as nurses and doctors.
ACKNOWLEDGMENT
The author Denivaldo Lopes would like to thank CTInfo/MCT/CNPq and FAPEMA for supporting this research work and providing a research fellowship.
REFERENCES
[1] A. V. Halteren, D. Konstantas, R. Bults et al., "MobiHealth: Ambulant Patient Monitoring Over Next Generation Public Wireless Networks", eHealth: Current Status and Future Trends: Studies in Health Technology and Informatics, 106:107–122, 2004.
[2] M. Bramer, Logic Programming With Prolog, Springer, 2005.
[3] DCE, Distributed Computing Environment (DCE), available at http://www.opengroup.org/tech/dce/, accessed October 2007.
[4] D. J. Gubler, "Dengue and Dengue Hemorrhagic Fever", Clinical Microbiology Reviews, 11(3):480–496, 1998.
[5] M. J. Field, Telemedicine: A Guide to Assessing Telecommunications in Health Care, National Academy Press, 1996.
[6] M. Barhamgi, D. Benslimane and P. Champin, "A Framework for Data and Web Services Semantic Mediation in Peer-to-Peer Based Medical Information Systems", CBMS'06: Proceedings of the 19th IEEE Symposium on Computer-Based Medical Systems, pp. 87–92, 2006.
[7] D. Merritt, Building Expert Systems in Prolog, Springer-Verlag, New York, 1989.
[8] OMG, The Common Object Request Broker Architecture: Core Specification, Version 3.0.6, formal/02-12-02, December 2002.
[9] W3C, Web Services Architecture (WSA), February 2004, available at http://www.w3c.org/TR/2004/NOTE-ws-arch-20040211/.
Service-Enabled Business Processes: Constructing Enterprise Applications – An Evaluation Framework Christos K. Georgiadis Department of Applied Informatics, University of Macedonia, Thessaloniki, Greece [email protected]
Elias Pimenidis School of Computing and Technology University of East London, UK [email protected]
Abstract — Web services that aim to spontaneously integrate services from different providers in order to offer users the best possible experience face considerable challenges in materialising that notion of improved experience. The interrelationships between the concepts of security, performance and interoperability, while critical to any web service, pose the biggest test and constitute the key features of the proposed evaluation framework. The key objective of this framework is to establish a means of evaluating how close a web service can get to achieving best practice without one of the three key features overriding the others.
Index Terms — Business Process Integration, Web Services, Evaluation, Transactions, Service-oriented Architecture.
I. INTRODUCTION
As Web Services (WS) technology matures, complete business-level interoperability across trader middleware platforms is certainly a realistic perspective. A suite of standards for conducting electronic business using XML over the Internet is in progress. The goal is to support messaging, metadata description and discovery, secure and/or transacted interactions, and service compositions capable of realizing and managing business processes. A business process specifies the potential execution order of operations from a collection of WS, the data that is shared between these composed WS, which partners are involved, and how they are involved in this context. A WS platform architecture may act as a supporting infrastructure for business processes, the main reason being that WS are capable of integrating business application logic owned by different organizations in order to provide a loosely coupled architecture for building distributed e-commerce systems with universal interoperability. Therefore, WS-based environments may be considered as logical collections of WS whose publication and use depend on limitations characteristic of business service delivery [1]. WS have been widely adopted in industry as a standard platform-independent middleware technology [2].
Cooperative services are capable of intelligent interaction and are able to discover and negotiate with each other, mediate on behalf of their users and compose themselves into more complex services. These exact properties of WS are the ones that pose most challenges and raise the issue of guaranteeing a given "quality" of business processes. This concept of quality is expressed in terms of functional and non-functional requirements, such as performance or security, and it is these two particular attributes that attract researchers in their quest for means of evaluating WS.
II. BACKGROUND – RELATED WORK
In reviewing the work of other researchers in the field of WS evaluation, the authors find some common threads in most published work. Most researchers share the belief that the interoperability of WS must come with a considerable performance penalty [3], [4]. Thus almost all of the proposed models or frameworks used to evaluate aspects of WS concentrate on the following three factors: interoperability, security and performance. One common finding is that all three could be affected by the automatic choice of partners in forming a WS and that all three could mutually affect each other. Casola et al. [5] argue that although a service provider is able to guarantee a predefined service level and a certain security level, the mere nature of service integration and interoperability and the utilisation of up-to-date technologies do not allow for automatic measurement of such attributes. The most common approach amongst service providers is the "Service Level Agreement" (SLA). At the state of the art, SLAs are primarily related to the quality of service and not to security. To reach the aim of SLA dynamic management to support the interoperability among services, a formalized approach is required both for defining the different quality factors of a service in a SLA and for evaluating them automatically. Policy languages
are starting to gain popularity, but they do not specify how to automatically evaluate and compare the related security and quality provisions [5], [6]. Casola et al. [5] propose a two-component framework for evaluating WS security. Their framework consists of a SLA policy meta-model providing a flexible approach to define quality criteria and formalize them in an unambiguous way through policies. The policy meta-model can be instantiated by customers or providers in order to define the specific quality models on which their policies will be based. At the same time, Chen et al. [7] remain sceptical, noting that while WS security enhances the security of WS, it may also introduce additional performance overheads to standard WS due to additional CPU processing and larger messages transferred. They propose to extend their existing WS performance model by taking the extra WS security overheads into account. The most attractive feature of WS is their interoperability in heterogeneous environments and the exposure of existing applications (as WS increase their reach) to different client types. A further advantage is that WS have flexible communication patterns, as an invocation to a WS can be either synchronous or asynchronous. Liu and Gordon [8] focus on the combination of component technologies with WS, which further extends the set of architecture choices, with each providing different levels of performance. Thus it is important for an architect to understand these performance implications in the early stages of the architecture design. They analyse the performance of four alternative architectures using J2EE and WS, concluding that the WS-based architecture has significant performance degradation due to the large amount of service time spent on parsing XML information in a WS Description Language (WSDL) file. Han et al. [9] argue that, as a consequence of the rapid growth of WS applications and the abundance of service providers, the consumer is faced with the predicament of selecting the 'right' service provider. They conclude that there are no common evaluation criteria for WS because WS can be applied to many domains and different end users' desires for quality of WS may vary considerably. They propose a new evaluation model for WS which supports customizing evaluation factors dynamically. It computes the weight of each evaluation factor using a machine learning algorithm based on samples from domain experts. This model can be used in WS management, WS selection and WS composition.
III. COORDINATING AND COMPOSING WEB SERVICES
A. Web Services Infrastructures as Integration Middleware
A technological resemblance to WS environments is that of application servers, where middleware functions such as messaging, transaction management, etc. assist the interoperation of application components. In the same way, WS environments may provide similar middleware functions to service consumers. Yet, unlike application servers, the business environment in which WS evolve
constrains service delivery. Business requirements (from the consumer side) restrict how services are discovered, authenticated, monitored, and mediated. From the provider side, suppliers require regulations on how services are published, brokered, and repurposed through composition with other services. Also, unlike application servers, the middleware itself consists of services that are potentially outsourced in the WS environment [1]. Current WS environments usually have at least one broker who is responsible for delivering services in compliance with the providers' constraints, such as authentication, and quality in collecting input and returning output. To address existing and future business model unpredictability, WS environments have to let third parties become brokers and mediators. Service providers require methods to integrate their services with other services and brokers [10], [1]. Supporting business processes via WS initially requires the establishment of a generic WS coordination and composition architecture. Developing a composite WS requires a specification that provides details about the component WS that it contains: for example, it identifies and triggers the component WS, it describes their execution order, it manages the data dependencies among them, and it triggers corrective strategies for exception handling. When composing WS, the business logic of the consumer is implemented by numerous services. A service consumer invoking a composite service can itself be exposed as a WS. This allows the definition of gradually more complex structures by increasingly aggregating services at advanced levels of abstraction [11], [12].
B. Service Coordination and Transactions
Supporting transacted interactions depends significantly on the approach used to connect WS together to form meaningful business processes. Coordination-oriented specifications define the complementary system layer that is necessary to ensure that the multiple WS achieve the desired results of the application and that the cooperation of multiple WS from whatever source (local or remote) produces predictable behaviour despite system failure and leaves the system in a known state [13]. Currently, there are two major coordination specifications: WS-Coordination (WS-C) and the OASIS WS-Composite Application Framework (WS-CAF). In these specifications, specific standards aim at the introduction of transactions on WS implementations and attempt to bring the best features of transaction processing to WS [14], [15], [16]:
- The Web Services Transactions (WS-T) standard (part of WS-C): it supports both short-running atomic transactions (WS-AtomicTransaction specification), which are executed within limited trust domains, and business activities (WS-BusinessActivity specification), which are potentially long-lived transactions spread over different domains of trust.
- The Web Services Transaction Management (WS-TXM) standard (part of WS-CAF): it supports traditional ACID transactions, non-ACID long running actions,
and business process transactions structured as a collection of transactions of the previous types with the responsibility to perform some application-specific work.
Business transactions (also known as WS-based transactions) call for an extended transaction model that builds on existing WS standards and supports both long-period transaction execution and relaxation of isolation levels. An additional requirement is the ability to negotiate transaction guarantees at runtime [17].
C. Service Composition and Business Processes
A service composition combines services following a certain composition pattern to achieve a business goal, solve a scientific problem, or provide new service functions in general [18]. The focus is on tying multiple WS together to create multi-step applications, such as filling a purchase order or resolving an insurance claim. Regarding business process support, the following distinctions between WS compositions are identified [19], [20]:
- Choreography is typically associated with the public message exchanges that occur between multiple WS, rather than a specific business process that is executed by a single party. While for orchestration the process is always controlled from the perspective of one of the business parties, choreography is more collaborative in nature: each party involved in the process simply describes the part it plays in the interaction. The Web Services Choreography Description Language (WS-CDL) is the current specification for this type of service composition.
- Orchestration describes how WS can interact with each other at the message level, including the business logic and execution order of the interactions. These interactions may cross applications and/or organizations and actually formulate a transactional, long-lived, multi-step process model. WS orchestration describes the way WS coordinate complex multi-party business processes. The Business Process Execution Language (WS-BPEL) specification focuses on supporting the definition of specific WS-based business processes [21], [22]. It is an extensible workflow-based language that aggregates services by orchestrating secure service interactions and transactions. It provides a long-running transaction model that allows increasing the consistency and reliability of WS applications. Correlation mechanisms, which allow identifying stateful instances of business processes based on business properties, are supported.
Orchestration and choreography may be applied simultaneously. Orchestration is the ideal composition approach for internal business processes with strongly coupled internal actors, while choreography is in general preferred for developing cross-organizational business processes between loosely coupled partners [23].
IV. DEFINING THE PROPOSED EVALUATION FRAMEWORK
Service-oriented enterprises are quite concerned with
possible interruption of services. Changing and negotiating with service providers leads to the emergence of service mediators. Transactions between parties might be complicated, because service intermediaries have several responsibilities when trying to associate providers, consumers and even other intermediaries. The multiple roles that service intermediaries are asked to play provide a useful set of technical and user-oriented criteria for evaluating the efficiency, usability and effectiveness of WS usage for business processes. Generally, the effectiveness of WS-based interactions and transactions is influenced by the major particular characteristics of the service-oriented architecture. Among them, we may emphasize the following [21], [24]:
- Need for interoperability and for ease of integration: WS-based systems wish to support dynamic service substitution by assembling different services developed separately (and often offered by competing service suppliers) into one working application.
- Distributed character: different WS are frequently distributed across distant geographical positions, and this raises all the related issues concerning performance.
- Decentralized maintenance process: WS form a distributed environment which is not placed within a single trust area; this complicates the security issues and makes it unrealistic to count on a single authority to coordinate all participating parties.
The set of design concerns which we have selected is obviously not an exhaustive list of evaluation criteria. Still, they are sufficient to cover a broad variety of areas under consideration. We may distinguish two sub-sets of criteria: in the technical category we group all decisive factors related to system behaviour, leaving out user-oriented issues, which are grouped in the corresponding category.
A. Technical Criteria
Briefly speaking, interoperable and secure services of high quality are necessary for supporting WS-based processes. We have to notice here that interoperability across multiple transaction protocols is much more easily achieved when the overall system architecture follows an "open standards" philosophy of specifications and avoids inconsiderately adopting proprietary and "closed" schemes. Regarding QoS considerations, the most significant issues are those related to performance. An additional criterion refers to scalability, and to the level of adoption of best practices about the recommended granularity of interactions, focused specifically on achieving the proper level of reuse and composability. Among the most significant aspects of security are those of ensuring the confidentiality (secrecy), integrity, and availability of the involved interactions. Moreover, an essential part of security considerations refers to accountability. Based on mechanisms capable of ensuring the identification-authentication of service consumers (by providing a single definition of users and a single user sign-on
for heterogeneous service architectures), and to ensure proper authorizations regarding service invocations (by establishing flexible access control rules), responsibility is promoted and repudiation problems are unlikely to take place.
B. User-oriented Criteria
Users involved in service-based processes (such as software developers who either build applications that consume specific services or actually build service implementations) may appreciate (or may seek) special characteristics capable of cultivating user trust and loyalty. User trust can be achieved mainly by pre-qualifying service providers based on past experiences. Additionally, user satisfaction is increased when their interactions are designed taking into consideration the discovery of the best (registered) services based on parameters provided by them. On the other hand, system administrators look for abilities to monitor and report on the security and performance of the application in use. Of special interest is the monitoring of availability issues and the capability to isolate and diagnose problems in the service-based environment.
TABLE 1. EVALUATION CRITERIA
A. Technical Criteria: 1. Interoperability; 2. Quality (2.1 Performance, 2.2 Scalability); 3. Security (3.1 Confidentiality, 3.2 Integrity, 3.3 Availability, 3.4 Accountability).
B. User-oriented Criteria: 1. Participant-related (1.1 Cultivate user trust, 1.2 Cultivate user loyalty); 2. Administrator-related (2.1 Monitor and Report, 2.2 Isolate and Diagnose problems).
C. Context-aware Processes
Contextual information may play a decisive role in supporting the whole set of the previously mentioned criteria. So, a central issue related to the evaluation criteria is the capability of managing the context information properly. Flexible composition schemes demand from the WS, before agreeing to an engagement, to consider their internal execution state and to question whether they would be rewarded for the engagement [25]. To reduce their current limitations, WS must measure both their existing capabilities and their in-progress commitments. In addition, they must evaluate their surrounding environment prior to binding to
any composition. In fact, WS must be context-aware. Context is the information that characterizes the interactions between users, applications, and the environment. By developing context-aware WS it would be possible to consider the aspects of the environment in which the WS are to be executed. These aspects are multiple and can be related to user profiles, computing resources, time of day, and physical locations [26], [27]. The composition of WS using current standards is only conducted at the level of message interactions. This is hardly sufficient, since composition must also be conducted at the level of message semantics. To achieve semantic composition that includes the context of the exchanged data, context information must be provided. This context information is any metadata that explicitly describes the meaning of the data to be exchanged between WS. When WS inputs and outputs are explicitly described using metadata, they can be automatically transformed from one context to another during WS execution.
V. USING THE PROPOSED CRITERIA
A. Policies
All these aspects can be mapped onto policies, which would in essence define a WS's acceptable behaviour. Policies may be considered as external, dynamically modifiable sets of rules that cause the system to adjust to administrative decisions and changes in the execution environment. Thus, policies specify actions that must be performed according to detected events and context changes, and are indeed particularly appropriate for WS composition [25]. Adopting policies introduces the possibility of changing WS' behaviour without altering a composition specification. By using policies, WS deployers can continuously adjust multiple aspects, such as WS' conflict-resolution mechanisms, to accommodate variations in the environment.
B. Reflections on the Proposed Evaluation Factors
Security measures are not something that can be added to a system's architecture without having thought about them and designed them at the very early stages. Since successfully addressing a number of security concerns (criteria A.3, B.1.1 and B.2.1) is among the proposed criteria, the basic model for describing all major issues regarding the utilization, exploitation and management of the evaluation criteria should operate above all as (or should facilitate the operation of) a rigid security-oriented mechanism, capable of:
- Ensuring that the whole process (along with its component WS, the participants of the process) makes use of state-of-the-art encryption technologies in order to protect its (and their) confidentiality, integrity, and availability in a way that every participating part (whether it is a WS or another architectural component) is accountable for its actions.
- Granting or denying permissions to WS for participating in WS-based compositions.
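A highly simplified sketch of the permission decision just listed (plain string assertions instead of WS-Policy documents; the assertion names are invented) is a set-containment check between what a composition requires and what a candidate WS declares:

import java.util.Set;

// Toy policy check: a WS may join a composition only if its declared policy
// assertions cover everything the composition's policy requires.
class PolicyCheck {
    static boolean mayParticipate(Set<String> declaredByService, Set<String> requiredByComposition) {
        return declaredByService.containsAll(requiredByComposition);
    }

    public static void main(String[] args) {
        Set<String> declared = Set.of("confidentiality:encrypted", "integrity:signed", "auth:x509");
        Set<String> required = Set.of("confidentiality:encrypted", "integrity:signed");
        System.out.println(mayParticipate(declared, required));   // true: grant participation
    }
}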
The main security-related WS specification (WS-Security) promotes the adoption of policies as the basis on which to interoperate. Policy language approaches (e.g. WS-Policy and OASIS XACML WSPL) are actually available [5], [6], [28] to define a rigid foundation for them. On the other hand, based on what has already been stated in the previous paragraph, policies are a promising way to support interoperability (criterion A.1) and to solve demanding mediation issues (criteria A.2.2 and B.1.2) in WS composition and transaction scenarios. Service mediation may be defined as the act of repurposing existing WS so they can cooperate in ways unanticipated during their development or deployment [1]. Thus, scalability undoubtedly benefits from service mediation capabilities. Mediation is usually achieved by intercepting, storing, transforming, and routing incoming and outgoing messages. Since the service interface is the only contact point between service providers and service consumers, designing a suitable service interface is an issue of critical importance. So, a prominent form of service mediation is service interface adaptation, where the goal is to keep interfaces as generic as possible while adapting them to the specificities that arise in each usage scenario. This adaptation may be defined to some extent on user preferences (explicit or implicit), supporting in this way customization and personalized interactions. Certainly, this increases user satisfaction and user loyalty. Policies indeed seem a proper means to exploit and manage WS evaluation criteria for supporting business processes, because they are capable of supporting the effectiveness of the security and interoperability technologies. This is done mainly by handling the contextual information efficiently. The integration of context into WS composition ensures that the requirements of (and also the constraints on) these WS (either security- or interoperability-oriented) are taken into account. Context may support WS in their decision-making process when it comes to accepting or rejecting participation in a business process. Predicting what will happen to a WS would also be feasible in case the past contexts (that is, what happened to a WS) are stored. WS can take advantage of the information that past contexts provide, so they adapt their behaviour for better actions and interactions with peers and users. Moreover, context permits tracing the execution of WS during exception handling. It would be possible to know what happened and what is happening with a WS at any time. This is of major importance, as it facilitates everyday administrative maintenance tasks (criterion B.2.2). Last but not least, we have to discuss the role of the performance criterion (A.2.1), which is of major importance. Performance is among the key factors in ensuring that the user is going to continue participating in particular WS-based processes. Certainly, the quality of WS-based processes serves as a benchmark to differentiate service providers [9], [29]. So, a further evaluation point is the effect the combination of interoperability and security would
have on performance. This would also link to the user-oriented criteria, and in particular that of user loyalty, as performance combined with a variety of choice would certainly affect a user's intention of participating in a particular WS-based process. An open issue, therefore, is how to establish a means of dynamically evaluating performance in relation to all the other criteria mentioned above (e.g. the performance and effectiveness trade-offs presented in [30]), and how to map such calculations onto properly structured policies.
C. Business Processes and the Role of Web Services Policies
Each of the WS specifications and approaches discussed previously addresses one specific concern and has a value in its own right, independently of any other specification. On the other side, all WS specifications are designed to be composed with each other to provide a rich set of tools for secure, transacted and orchestrated WS, capable of supporting business processes [31]. WS-Policy is at the start of the path toward standardization as a specification to represent and combine statements about quality of service properties. It provides an extensible framework that is intended to deal specifically with the definition of constraints and conditions that are associated with the use of WS. But WS-Policy by itself is just a building block of the overall architecture. At the heart of every WS-Policy are the policy assertions, which are the statements about the acceptable and supported values for relevant policy parameters. WS-Policy requires each policy domain to design and specify its own assertions. It relies on discipline-specific assertions to represent discipline-specific properties, such as security, transaction and business-specific policies [21], [28]. In this context, WS-SecurityPolicy is a set of assertions designed to describe requirements about encryption algorithms for confidentiality, authentication tokens, digital signature algorithms for integrity, etc. The primary goal of WS-SecurityPolicy is to define an initial set of assertions that describe how messages are secured on a communication path. The intent is to allow flexibility in terms of the tokens, cryptography, and mechanisms used, including leveraging transport security, while being specific enough to ensure interoperability based on assertion matching. Thus, WS-SecurityPolicy defines the model that WS use to document their WS-Security support and requirements for requesters. It is therefore an important building part of WS-Security, a specification which provides the foundation for security. WS-BPEL's place at the top of the WS platform stack allows it to inherit the expressiveness and power of the underlying standards for use in business processes. Underlying standards provide interoperability through standardized interfaces and messaging protocols. Moreover, as additional specifications are proposed for the WS framework, business processes are expected to take advantage of the new capabilities of those future standards by layering them on top of the business logic that the business process provides. But more important is that with
the modularity of the framework, additional functional aspects may easily be added to business processes. This work has presented a set of evaluation factors for WS composing business processes. The authors focus on how to relate these factors and their relative estimations to WS composition. This means that additional capabilities that business (BPEL) constructs do not provide must be defined. As these capabilities are, in a broader view, quality-of-service requirements (and thus outside of the business logic), using WS policies is the proper solution. In such situations, the non-functional requirements are better met by different lower pieces of the WS stack. The attachment of policies to the business process enables the separation of concerns that is desirable to keep the business logic intact. In order not to change the business process model, the addition of features or quality-of-service aspects may be realized by attaching WS policies to different parts of the process, the corresponding WSDLs, or both. As a first step, to meet the aim of policy-oriented dynamic management of evaluation factors for service-based business processes and applications, a formalized approach (in the form of assertions compatible with the WS-Policy specification) for defining those factors is required. Furthermore, a mechanism for negotiating and associating a service agreement is required, that is, a decision-making process to select a service by comparing service providers' offers and the service consumer's request regarding the evaluation factors.
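One way to picture such a decision-making process (purely illustrative: the criteria weights and scores are invented, and real offers would be expressed as policy assertions rather than bare numbers) is a weighted score over the Table 1 criteria for each provider offer:

import java.util.Map;

// Illustrative selection: score each provider offer against weighted evaluation
// criteria and pick the best-scoring one.
class ServiceSelector {
    static String select(Map<String, Map<String, Double>> offers, Map<String, Double> weights) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Double>> offer : offers.entrySet()) {
            double score = 0.0;
            for (Map.Entry<String, Double> w : weights.entrySet()) {
                score += w.getValue() * offer.getValue().getOrDefault(w.getKey(), 0.0);
            }
            if (score > bestScore) { bestScore = score; best = offer.getKey(); }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = Map.of("interoperability", 0.4, "security", 0.4, "performance", 0.2);
        Map<String, Map<String, Double>> offers = Map.of(
            "providerA", Map.of("interoperability", 0.9, "security", 0.6, "performance", 0.8),
            "providerB", Map.of("interoperability", 0.7, "security", 0.9, "performance", 0.7));
        System.out.println(select(offers, weights));   // providerB (0.78 vs 0.76)
    }
}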
VI. CONCLUSIONS
This work proposes a two-dimensional table of evaluation criteria as a first step towards establishing an evaluation framework for WS. In this approach, interoperability and security are recognised as essential to a user subscribing to a WS. Thus the proposed framework will evaluate performance in relation to all the criteria included in Table 1 above. The difficulty in establishing such an evaluation framework lies in the fact that all previous attempts at evaluation utilise independent protocols that address security or performance, but not both. A new set of protocols, or a major revision of the current ones, is therefore required. It is expected that the evaluation results may help all involved parties (software developers, managers, system administrators, etc.) to become aware of the differentiations between approaches and to adopt the various considerations judiciously in their implementations. This work is currently in development. The authors are also looking at means of capturing data that would be utilised with the new framework and improve the effectiveness of evaluation.
REFERENCES
[1] A. P. Barros and M. Dumas, "The Rise of Web Service Ecosystems", IT Professional 8, 5, pp. 31-37, Sep. 2006.
[2] T. Erl, Service-Oriented Architecture, New Jersey: Prentice Hall PTR, 2004.
[3] S. Chen, B. Yan, J. Zic, R. Liu, and A. Ng, "Evaluation and Modeling of Web Services Performance", Proc. of the IEEE Intern. Conf. on Web Services (ICWS'06), 2006.
[4] C. C. Marquezan, et al., "Performance Evaluation of Notifications in a Web Services and P2P-Based Network Management Overlay", Proc. of the IEEE 31st Annual Intern. Computer Software and Applications Conf. (COMPSAC 2007), 2007.
[5] V. Casola, A. R. Fasolino, N. Mazzocca, and P. Tramontana, "A policy-based evaluation framework for Quality and Security in Service Oriented Architectures", Proc. of the IEEE Intern. Conf. on Web Services (ICWS 2007), 2007.
[6] K. Tang, S. Chen, D. Levy, J. Zic, and B. Yan, "A Performance Evaluation of Web Services Security", Proc. of the 10th IEEE Intern. Enterprise Distributed Object Computing Conf. (EDOC'06), 2006.
[7] S. Chen, J. Zic, K. Tang, and D. Levy, "Performance Evaluation and Modeling of Web Services Security", Proc. of the IEEE Intern. Conf. on Web Services (ICWS 2007), 2007.
[8] Y. Liu and I. Gordon, "An Empirical Evaluation of Architectural Alternatives for J2EE and Web Services", Proc. of the IEEE 11th Asia-Pacific Software Engineering Conf. (APSEC'04), 2004.
[9] T. Han, H. Guo, D. Li, and Y. Gao, "An Evaluation Model for Web Service", Proc. of the IEEE First Intern. Multi-Symposiums on Computer and Computational Sciences (IMSCCS'06), 2006.
[10] M. Bichler and K.-J. Lin, "Service-Oriented Computing", IEEE Computer, Vol. 39, issue 3, pp. 99-101, 2006.
[11] S. Chatterjee and J. Webber, Developing Enterprise Web Services: An Architect's Guide, New Jersey: Prentice Hall PTR, 2004.
[12] S. Dustdar and W. Schreiner, "A Survey on Web Services Composition", Int. J. Web and Grid Services, Vol. 1, No. 1, pp. 1-30, 2005.
[13] M. Little, E. Newcomer, and G. Pavlik, "WS-CAF: Contexts, Coordination and Transactions for Web Services", unpublished, http://labs.jboss.com/fileaccess/default/members/jbosstm/freezone/resources/papers/ICDCS2005.pdf, 2005.
[14] C.K. Georgiadis and E. Pimenidis, "Web Services Enabling Virtual Enterprise Transactions", Proc. of the IADIS Int. Conf. E-Commerce, pp. 297-302, 2006.
[15] B. Lublinsky, "Transactions and Web Services", EAI Journal, Vol. 5, No. 1, pp. 35-42, 2003.
[16] L.F. Cabrera, et al., "Web Services Transactions Specification", Vers. 1.0, http://www-128.ibm.com/developerworks/library/specification/ws-tx/, 2005.
[17] M. Little, "Transactions and Web Services", Communications of the ACM, Vol. 46, No. 10, pp. 49–54, 2003.
[18] F. Curbera, R. Khalaf, N. Mukhi, S. Tai, and S. Weerawarana, "The Next Step in Web Services", Communications of the ACM, Vol. 46, No. 10, pp. 29–34, 2003.
[19] D. Kaye, Loosely Coupled: The Missing Pieces of Web Services, Marin County, California: RDS Press, 2003.
[20] J. Gortmaker, M. Janssen, and R.W. Wagenaar, "The Advantages of Web Service Orchestration in Perspective", Proc. of the ACM Sixth Intern. Conf. on Electronic Commerce (ICEC'04), pp. 506-515, 2004.
[21] S. Weerawarana, F. Curbera, F. Leymann, T. Storey, and D. Ferguson, Web Services Platform Architecture, Upper Saddle River, NJ: Prentice Hall, 2006.
[22] A. Alves, et al., "Web Services Business Process Execution Language", OASIS standard, vers. 2.0, http://docs.oasis-open.org/wsbpel/2.0/OS/wsbpel-v2.0OS.html, April 2007.
[23] M.P. Papazoglou and W.-J. v.d. Heuvel, "Business Process Development Life Cycle Methodology", Commun. ACM 50, 10, pp. 79-85, Oct. 2007.
[24] H.M. Teo and W.M.N.W. Kadir, "A Comparative Study of Interface Design Approaches for Service-Oriented Software", Proc. of the XIII Asia Pacific Software Engineering Conference (APSEC'06), 2006.
[25] Z. Maamar, D. Benslimane, and A. Anderson, "Using Policies to Manage Composite Web Services", IT Professional 8, 5, pp. 47-51, Sep. 2006.
[26] Z. Maamar, D. Benslimane, and N. C. Narendra, "What can context do for web services?", Commun. ACM 49, 12, pp. 98-103, Dec. 2006.
[27] D. Martin, "Putting Web Services in Context", Electronic Notes in Theoretical Computer Science, 146, pp. 3–16, Elsevier, 2006.
[28] A. Anderson, "Domain-Independent, Composable Web Services Policy Assertions", Proc. of the 7th IEEE Inter. Workshop on Policies for Distributed Systems and Networks, 2006.
[29] F.-J. Lin, K. Huarng, Y.-M. Chen, and S.-M. Lin, "Quality Evaluation of Web Services", Proc. of the IEEE Intern. Conf. on E-Commerce Technology for Dynamic E-Business (CEC-East'04), 2004.
[30] P. Katsaros, L. Angelis, and C. Lazos, "Performance and Effectiveness Tradeoff for Checkpoint in Fault Tolerant Distributed Systems", Concurrency and Computation: Practice and Experience, Vol. 19(1), pp. 37-63, John Wiley & Sons, January 2007.
[31] S. Bajaj, et al., "Web Services Policy Framework (WS-Policy)", Version 1.2, http://schemas.xmlsoap.org/ws/2004/09/policy, March 2006.
Image Enhancement Using Frame Extraction Through Time Elliott Coleshill University of Guelph CIS Guelph, Ont, Canada [email protected]
Dr. Alex Ferworn Ryerson University NCART Toronto, Ont, Canada [email protected]
Abstract – Image enhancement within machine vision systems has been performed in a variety of ways with the end goal of producing a better scene for further processing. The majority of approaches for enhancing an image scene and producing higher quality images employ only a single image. An alternate approach, yet to be fully explored, is the use of image sequences, utilizing multiple frames of information to generate an enhanced image for processing within a machine vision system. This paper describes a new approach to image enhancement for controlling lighting characteristics within an image called Frame Extraction Through Time (FETT). Using multiple image frames of data, FETT can be used to perform a number of lighting enhancement tasks such as scene normalization, nighttime traffic surveillance, and shadow removal for video conferencing systems.
Keywords: Traffic Monitoring, Vision Systems, Image Processing, Computer Vision
Introduction
Lighting characteristics within an image scene are often a major source of problems in machine vision systems today. Most vision systems will discard an image that has poor lighting [1] and take a new version after some amount of time. Other vision systems attempt to adjust the lighting within the image by using techniques such as histogram equalization and lighting normalization [2][3]. The difficulty with these and similar approaches is that they tend to lose key details and information required for post-processing techniques such as object or edge detection. One of the main reasons for this loss of detail and information is the lack of data provided to the system: most approaches employ only a single image for processing. There has been some investigation into using multiple images; however, this has been mainly in the area of background subtraction and motion detection [4]. The new FETT design employs multiple images, providing more data for generating a new image with superior lighting characteristics for post-processing purposes.

The Theory of FETT
Using a sequence of images taken through time from a video camera, a single image can be created of a scene with lighting characteristics which help to prevent failures in vision system detection algorithms. The use of FETT can be divided into the following main steps: acquiring images, pixel selection, and finally the generation of a superior image representing the scene with better lighting and contrast characteristics.

Step #1: Acquiring Images
A sequence of images is taken over time to provide the FETT algorithm with the changing lighting characteristics of the scene. The number of images required depends on the frequency of the change in lighting over the scene. Figure 1 shows a sample of input images for FETT.

Figure 1: Lighting Sequence Example
Step #2: Pixel Selection
Average pixel intensity values for each input image are used to generate a scale for extracting the necessary pixel intensities. The input images 1 through n are aligned and a grid is defined so that point [1,1] in image 1 corresponds to the same pixel position of the scene in image n. A scale is then generated from the pixel intensities using the following equations:

ϖ = R + G + B        (1)

ψ = ( Σ_{i=0}^{n} ϖ_i ) / (w × h)        (2)

Using (1), the Red, Green, and Blue intensities (R, G, B) at each pixel position are summed and used as the true pixel value ϖ for that position. The next step is to calculate the average ψ of the current pixel position over all the input images provided, using equation (2). This is done by summing the calculated pixel values across the input sequence and dividing by the size of the scene, where w is the width of the scene in pixels and h is the height in pixels.

Using the average input intensity calculated above, a scale is generated by selecting the high (τ) and low (σ) pixel values and then calculating the midpoint (ς) of the scale using equation (3):

ς = (τ − σ) / n        (3)

Step #3: Generation of FETT Image
The final step of the FETT algorithm is to extract all the "good" information from the input image sequence and reconstruct a new, optimized image of the scene with better lighting characteristics. This is accomplished by selecting the best pixel (φ) within the input sequence {ϖ_i .. ϖ_n}, namely the one that best represents the calculated midpoint (ς):

φ = (σ < ϖ < τ) ≅ ς        (4)

The selected pixel (φ) is then copied into the final image at the same grid position as defined in the input images.
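As an illustration only, the following sketch implements the three FETT steps described above for frames that are already aligned NumPy RGB arrays. The function name fett_enhance is ours, not the authors', and the midpoint of the scale is taken here as the centre of each pixel position's intensity range, which is one reading of equation (3); the per-image average ψ of equation (2) is not needed in this simplified form.

```python
import numpy as np

def fett_enhance(frames):
    """Minimal FETT sketch: frames is a list of aligned HxWx3 uint8 images
    taken over time; returns a single reconstructed image."""
    stack = np.stack([f.astype(np.float64) for f in frames])   # (n, H, W, 3)
    true_vals = stack.sum(axis=3)                               # eq (1): R + G + B per pixel, per frame
    lo = true_vals.min(axis=0)                                  # sigma: low value at each pixel position
    hi = true_vals.max(axis=0)                                  # tau: high value at each pixel position
    mid = lo + (hi - lo) / 2.0                                  # assumed midpoint of the scale
    # eq (4): for each pixel position, pick the frame whose true value is closest to the midpoint
    best = np.abs(true_vals - mid).argmin(axis=0)               # (H, W) frame indices
    h, w = best.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return stack[best, rows, cols].astype(np.uint8)             # copy selected pixels into the final image
```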
Testing Results
A series of controlled and uncontrolled datasets were generated to test and verify the new FETT theory. These datasets contained image sequences and videos ranging from simple scenes of individual objects, with controlled dynamic light transitioning across the object, to real-world applications such as space-based images, traffic surveillance, and video conferencing, where lighting changes cannot be controlled. Dynamic light, as defined herein, is bright light introduced within the scene that causes the camera to saturate and features to be lost. In order to measure the success of the FETT algorithm, a Root Mean Square Error (RMSE) measure and a histogram comparison were used. For each of the controlled test cases a picture was taken with standard lighting (i.e. no introduced dynamic light), called the "Normal". This Normal image was used as the ground truth for comparison. Figure 2 provides an example of a "Normal" image (b) against sample input images (a).
Figure 2: Normal Example

The RMSE was calculated for the average pixel intensity across the input images and the final FETT Optimized image and compared. From the graph in Figure 3 one can see that the RMSE values calculated for the FETT Optimized images
are greater than the average of the input images for most cases. For the cases where the FETT Optimized RMSE was less than the input average, it was determined that shadowing effects introduced during the FETT process were reducing the average pixel intensity values. Even so, all dynamic lighting was removed, and the FETT Optimized image's average pixel intensity was closer to the Normal than the average of the input images.
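For reference, the RMSE comparison against the Normal image can be computed as in the sketch below. This is an illustration under the assumption that the images are equally sized NumPy arrays; it is not the authors' code, and fett_enhance refers to the earlier illustrative sketch.

```python
import numpy as np

def rmse(image_a, image_b):
    """Root mean square error between two equally sized images."""
    diff = image_a.astype(np.float64) - image_b.astype(np.float64)
    return np.sqrt(np.mean(diff ** 2))

# Example comparison against the ground-truth "Normal" image:
# input_rmse = np.mean([rmse(f, normal) for f in frames])   # average over the input sequence
# fett_rmse = rmse(fett_enhance(frames), normal)            # FETT Optimized image
```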
Figure 3: RMSE Graph

A comparison of the image histograms was also performed. The Normal, the FETT Optimized, and one of the input images were used to compare the pixel graph, mean and standard deviation. Figures 4 and 5 show the graphed pixel intensity means and standard deviations for a sample set of the controlled dataset. Again, one can see that the overall mean and standard deviation of the intensity in the FETT Optimized image become closer to the values of the Normal image, showing that the dynamic light is being filtered out and the scene is being reconstructed to become similar to the Normal.

Figure 4: Histogram Mean Graph

Figure 5: Histogram Standard Deviation Graph

A simple review of the actual histogram pixel plots also shows the improvement in the overall scene lighting. Figure 6 contains histogram graphs of a single input image (a) and the FETT Optimized image (b). From these one can see that the spike of pixels at the white end of the scale is removed from the FETT Optimized image.

Figure 6: Histogram Pixel Graphs

Using the histogram pixel plots one can also see that the final FETT Optimized image plot is converging towards the Normal image plot. Figure 7 provides another test case showing the histogram plot of the Normal (b) against the plot of a FETT Optimized image (a). With the exception of the extra pixels
being distributed and replaced with better-represented pixels, the overall plot is very similar to the Normal, demonstrating that the FETT algorithm recreates the "Normal" image from the information in the image sequence.
Figure 7: FETT vs. Normal Histograms

Due to the nature of the uncontrolled dataset, a Normal image could not be produced. Therefore, standard edge detection was used to demonstrate the effects of FETT as a pre-processing tool. Edge detection was performed on a single input image and on the FETT Optimized image. Onboard the International Space Station there are several black and white dots (Figure 8) strategically placed around the structure. These dots are used as part of the Space Vision System (SVS) for determining the position and orientation of modules during installation.

Figure 8: SVS Target Dots

When saturated with light, the SVS target dots are undetectable using edge detection, as seen in Figure 9 (a). Using a sequence of images over time, FETT was applied as a pre-processing tool to produce a view with better lighting and contrast characteristics so that the SVS target dots could be detected correctly. As one can see in Figure 9 (b), all the faintly detected targets in (a) are represented better.

Figure 9: SVS Edge Detection Example (picture shown in inverse for better visibility)

Applicability and Future Enhancements
To date, the FETT algorithm has been tested in multiple application domains such as target detection on the International Space Station [5], traffic surveillance [6], and video conferencing systems [7]. In all these application domains we have successfully shown that FETT can remove dynamic lighting and reconstruct targets correctly, reconstruct nighttime traffic in a daytime setting, and remove a presenter's shadows cast onto presentation material. However, in all these test cases we have assumed no motion of the video camera. In the future we plan to enhance the FETT algorithm to handle repositioned cameras, multiple cameras at different angles, as well as moving objects within the scene.
Conclusion
This paper has described a novel method for image enhancement that uses a sequence of images through time to generate a view with better lighting and contrast characteristics. With Frame Extraction Through Time (FETT), dynamic lighting information can be extracted and a new image reconstructed, reducing lighting problems as a pre-processing step for machine vision applications without the loss of detail and information within the scene.
References
[1] S. Miller, "Eye in the Sky", Engineering Dimensions, November-December 1999.
[2] Y.-T. Kim, "Contrast Enhancement Using Brightness Preserving Bi-Histogram Equalization", IEEE Transactions on Consumer Electronics, 43(1), pp. 1-8, 1997.
[3] Y. Matsushita, K. Nishino, K. Ikeuchi, and M. Sakauchi, "Illumination Normalization with Time-Dependent Intrinsic Images for Video Surveillance", IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(10), pp. 1336-1347, 2004.
[4] M. Irani and S. Peleg, "Motion Analysis for Image Enhancement: Resolution, Occlusion, and Transparency", Journal of Visual Communication and Image Representation, 4(4), pp. 324-335, 1993.
[5] E. Coleshill, A. Ferworn, and D. Stacey, "Feature Extraction Through Time", 57th International Astronautical Congress, IAC-06-B4.4.03, Valencia, Spain, Oct 2-6, 2006.
[6] E. Coleshill, A. Ferworn, and D. Stacey, "Traffic Safety using Frame Extraction Through Time", SoSE 2007, San Antonio, TX, USA, April 16-18, 2007.
[7] E. Coleshill, A. Ferworn, and D. Stacey, "Obstruction Removal using Feature Extraction Through Time for Video Conferencing Processing", CISSE 2006, Dec 4-14, 2006, Online.
A Comparison of Software for Architectural Simulation of Natural Light
Evangelos Christakou and Neander Silva
Postgraduate Program, School of Architecture, University of Brasília, DF, Brazil
ABSTRACT
This paper reports a study of daylighting simulation in the architectural design process, through the evaluation of two simulation tools, ECOTECT 5.5 (as an interface to RADIANCE) and RELUX2006 VISION. The study was developed from the architect's point of view and in the context of the design process. The evaluation was conducted taking into account criteria such as User Interface, Geometry, Output, Daylight Parameters, Material Description, Processing, Validation, and User Support. Among these criteria, User Interface is especially important for the architect: it is the best way to overcome the natural barriers to using digital processes in architectural design, with the objective of reaching environmental comfort and energy efficiency. The methodology used includes a large number of simulations in a tropical climate, aiming at testing the simulation tools under different daylight conditions in Brazil. The results of this evaluation show a great potential for improving the use of simulation tools in the design process, through a better understanding of these tools.
KEYWORDS daylighting, computer simulation, architectural design
INTRODUCTION
The possibility to visualize internal spaces before their effective construction has always been every architect's dream. Computational visualization with synthetic images is therefore a powerful tool, which effectively helps the architect's design in many ways. Among them, we can mention the possibility of understanding and optimizing the architectural proposal from the point of view of daylighting (DL) use. DL prediction can contribute to environmental comfort and energy efficiency optimization in architecture. Reference [1] states that only visualizations of the space that are supported by physically based computational models of lighting, and that contain real-life radiometric values, are relevant when thinking about architectural design that takes DL into consideration. Most kinds of visualization software generate false impressions instead of an image that precisely represents the architectural space. One reason for this inaccuracy is the incorrect representation of light, especially DL. Some DL simulation software uses innovative algorithms and emphasizes the precision required by the simulation of the architectural space under various lighting conditions, such as the ones analyzed in this paper.
However, architects usually find it difficult to use these tools, due to their complexity, the lack of user-friendly interfaces, incomplete user manuals, and so on. For this reason, the use of simulation software in architectural design is still not common, especially in the Brazilian context [2]. DL implies a thermal load. As a result, in the context of the predominantly hot Brazilian climate, the greatest challenge regarding energy efficiency and environmental comfort is to find the balance between the optimization of DL and the decrease in the refrigeration load [3]. To overcome these challenges, the architect must use tools that are precise and, at the same time, interactive, to evaluate lighting choices and solutions throughout the architectural design process. DL simulation software can offer these facilities and should support architects in many steps of the design process, from the conception to the implementation of daylighting strategies and innovative techniques in real spaces [4]. Therefore, this paper reports a study of the computational simulation of DL applied to architectural design, through the evaluation of two simulation programs, ECOTECT 5.5 (as an interface to RADIANCE) and RELUX2006 VISION. It aims at promoting a better understanding of the computational simulation of DL by means of the analysis and evaluation of the tools, in accordance with the architect's point of view. The results of this evaluation show a great potential for a better use of simulation tools by architects, through a better understanding of these tools [5].
OBJECTIVES
Our objectives were:
• To establish a general picture of the use of DL simulation software in the various phases of architectural design;
• To establish an evaluation methodology for different kinds of DL simulation software, from the architect's point of view;
• To determine the differences between the chosen software and evaluate them, considering the adequate parameters to assess DL and its use by the architect;
• To identify possible problems of these tools in the context of DL simulation in the architectural project;
• To help encourage the use of computational DL simulation in Brazil.
METHODOLOGICAL PROCEDURES
The methodology of this research was based on the following phases (Figure 5):
• Study of the main existing DL simulation software and their applicability in architectural design;
• Selection of two simulation programs to be evaluated;
• Study of previous papers on the same topic;
• Definition of the evaluation criteria for the software chosen;
• Definition of the geometric model used in the simulations, called the standard space;
• Definition of the evaluation conditions of the simulations and their scoring;
• Planning and execution of the simulations;
• Analysis of the simulations using a matrix with the selected evaluation criteria.
The architect’s point of view was essential for the selection of the evaluation criteria which were developed based on previous studies on the subject of DL simulation software, and considered to be the state of the art.
RELATED PAPERS
The mathematical models used in Computer Graphics (CG) to generate synthetic images (from 3D models) aiming at predicting DL are based on Global Illumination models, which evaluate all the contributions from direct and indirect sources in light transport. Among these physically based models there are basically two types: scene-based methods, independent of the observer's point of view (e.g. Radiosity), and image-based methods, dependent on the observer's point of view (e.g. Ray Tracing). Both can produce numerical outputs and high-quality images, so as to allow quantitative and qualitative evaluations of DL in architectural spaces. In previous studies we found comparisons among various kinds of simulation software. Most of them, however, were interested in verifying the accuracy of the software and the output resulting from the simulation, the synthetic image. No attention was given to the applicability of the software in question in the process of architectural design. The most important papers studied are listed below:
Reference [6] used the Kimbell Art Museum, designed by L. Kahn, as the basis for their work: to test the physically based simulation software LIGHTSCAPE and RADIANCE. Their conclusions demonstrate that the algorithms used by RADIANCE are superior to the ones used by LIGHTSCAPE when calculating the transfer of light to surfaces that are not ideally diffuse. The solution obtained by the Radiosity algorithm is inadequate when it handles indirect light that comes through the ceiling, for example, where real materials deviate from Lambert's ideal characteristics. LIGHTSCAPE's interface is friendlier, and RADIANCE has extremely evident limitations in this respect. On the other hand, RADIANCE, due to its modular structure and its open code, allows the addition of external modules, which in turn allow an almost unlimited number of options for the most diverse tasks. Reference [6] concluded that both programs show vast possibilities for professional use. Reference [7] made an experimental comparison of two methods used to calculate the distribution of global lighting: the deterministic Radiosity method and the stochastic Monte Carlo method based on standard (forward) ray tracing. The programs used were LIGHTSCAPE and TBT (Turbo Beam Tracing), software developed by INTEGRA. The comparison was made both in a simple scene, a cube, for which the theoretical results were accessible, and in complex scenes of architectural spaces. The criteria used were the time spent for simulation and image precision; the flaws in the synthesized images, known as artifacts, were also discussed. The results showed that the Ray Tracing - Monte Carlo algorithm is adequate for technical applications that are basically interested in the numerical results of the simulation. Considering the visual quality of the images, the deterministic Radiosity algorithm was better in terms of the images generated, which were smoother. On the other hand, the Ray Tracing - Monte Carlo algorithm presented other advantages. We can highlight the fact that it supports reflection models with a wider scope, for example curved refractors and reflectors, and non-Lambertian diffusers. It also allows greater control over the accuracy of the simulation and visualization of intermediate results at any given moment. Reference [7] believes that these are all strong arguments to deem the Ray Tracing - Monte Carlo algorithm the main method of global illumination analysis. Reference [8] discusses the technical characteristics of physically based computational simulation in the research of lighting in architecture. The potential and limitations of the software LIGHTSCAPE, DESKTOP RADIANCE and RADIANCE SIS (Synthetic Imaging System) are evaluated in terms of inputs, algorithms, outputs and analysis. The paper
EVALUATION CRITERIA
Two simulation programs were selected among all the simulation software available at LACAM, the Laboratory of Ambient Control (University of Brasilia): ECOTECT 5.5 and RELUX2006 VISION. The selection criteria took into account, especially:
• Flexibility, as regards its possible adaptability to the architect's workflow;
• Use of state-of-the-art algorithms;
• Numerical accuracy (validation);
• Possibility of access by a Brazilian architect.
The criteria were established based on previously published papers, such as [6], [7], [8] and [9]. The criteria adopted and the respective weights for the evaluation of the simulation software, considering the workflow and its importance, are as follows (Table 1):

Table 1: Evaluation criteria and weights
1 - Modeling - Input of geometry: 1
2 - User interface: 2
3 - Output: 1
4 - Daylighting parameters inputs: 1
5 - Optical properties of the surfaces: 1
6 - Processing - Efficiency of simulations: 1
7 - Validation - Precision of output: 1
8 - Support to the user: 1
SIMULATIONS AND MODEL DATA
The model's geometry is a parallelogram-shaped room with dimensions of 6.60 x 3.30 meters, with a central window of 1.65 x 1.65 meters and a light shelf measuring 1.65 x 0.40 meters placed 2.20 meters above the floor. The ceiling is 2.75 meters high and measures 9.60 x 6.30 meters. The material properties (Table 2) and the climate conditions of Brasilia, Brazil (Table 3) are as follows:

Table 2: Optical properties of materials applied to surfaces
(Surface: Reflectance / Specularity / Roughness / Transmittance)
Walls (white paint): 65.6% / 0% / 0% / 0%
Floor (gray granite): 35% / 0% / 0% / 0%
Ceiling (white paint): 68% / 0% / 0% / 0%
Light shelf (white paint): 65.6% / 0% / 2.68% / 1%
Clear glass, 8 mm: 82% / 89.9%
Furniture: table in beige, brown and metallic colors; chair in orange and metallic colors.
Climate and daylighting data
Table 3: Simulation site: Brasilia, Brazil
Latitude: 15º 52' S
Longitude: 47º 55' W
Altitude: 1060 m
Sky conditions:
March/September: cloudy sky, varying from 3,000 to 100,600 lux
June: clear sky, varying from 8,000 to 90,000 lux
December: cloudy sky, varying from 3,000 to 20,600 lux
DISCUSSION
This article presents DL simulation in architectural design, through the evaluation of two simulation tools, RELUX2006 VISION and ECOTECT 5.5, from the architect's point of view and in the context of the design process. Accordingly, it intends to contribute to a better understanding of DL simulation through the analysis and evaluation of the software mentioned above. The evaluation was carried out taking into account criteria such as User Interface, Geometry, Output, Daylight Parameters, Material Description, Processing, Validation and User Support. Among these criteria, User Interface is especially important for the architect, since it is the best way to overcome the natural obstacles to using digital processes in architectural design, aiming at reaching environmental comfort and energy efficiency. The results of this evaluation show great potential in promoting improved use of simulation tools by architects, because they promote a better understanding of these tools. In order to evaluate the simulation programs, a 3D model (standard space) of a very simple space was built with precise characteristics and located in the city of Brasilia. This model was used with the objective of studying the performance of these tools in daylighting prediction in architectural design.
This evaluation, following the criteria previously established, generated the following results: RELUX2006 VISION, 84 points, and ECOTECT, 67 points (Table 4). Below is a brief discussion of the qualities, limitations, difficulties and errors that it was possible to identify. ECOTECT (as an interface to RADIANCE) and RELUX2006 VISION are based on the calculation engine of RADIANCE SIS, a validated and precise algorithm. They produced synthetic images in the simulation of the standard space with qualitative and quantitative consistency. There are small variations between their results, which stem from the adjustments and parameters available in the interface of each one. RELUX2006 VISION, according to its manufacturer, uses a calculation engine based on RADIANCE SIS, but slightly modified; it also uses the Radiosity algorithm for limited calculations of the diffuse lighting generated by an overcast sky.

Table 4: Evaluation score (ECOTECT / RELUX VISION)
1 - Geometry input: 6 / -2
2 - User interface: 26 / 62
3 - Output: 21 / 18
4 - Daylighting parameters: 3 / 3
5 - Optical properties of surfaces: 4 / 4
6 - Processing / Efficiency of simulations: 1 / 1
7 - Validation: 2 / 2
8 - User support: 4 / -4
TOTAL (points): 67 / 84
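The totals in Table 4 are the plain sums of the per-criterion scores. The following snippet simply verifies this arithmetic; the variable names are illustrative only.

```python
# Per-criterion scores in the order listed in Table 4.
ecotect = [6, 26, 21, 3, 4, 1, 2, 4]
relux = [-2, 62, 18, 3, 4, 1, 2, -4]
assert sum(ecotect) == 67 and sum(relux) == 84  # matches the reported totals
```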
RELUX VISION (Figures 1 and 2): The user interface integrates well with the architect's workflow, which makes it easier to learn, besides being available in Portuguese. It offers a large number of possibilities for qualitative and quantitative analyses of the behavior of light in the architectural space, with graphical and numerical outputs for almost all the parameters necessary for this task. RELUX VISION obtained 84 points in its assessment; its strong point is its interface, while its relatively weakest performance is in "User support" and "Geometry input".
Figure 2: RELUX 2006 luminances

ECOTECT (Figures 3 and 4): This simulation software combines editing and modeling tools with the luminous evaluation and the thermal consequences of daylight use. It also treats the architectural project as a combination of various factors, giving the architect the chance to integrate knowledge by using accessible digital processes.
Figure 3: ECOTECT luminances isocontour
Figure 4: ECOTECT illuminances iso-contour

Among the programs analyzed, RELUX2006 VISION is probably the most adequate for an architect to use; ECOTECT has a user-friendly interface, but it is not as intuitive. RELUX VISION combines the accuracy of its calculation engine with a correct interface, which is user-friendly and extremely simple. The interaction with the user, an architect who is not a specialist in CG, is very simple. For this user, who intends to use the tool to support the design process of the architectural space, it provides easy learning and applicability, without excluding the possibility of correct calculations and precise synthetic images. Its applicability in the design process is sufficiently promising; therefore it has all the conditions and capacity to support the various phases of architectural design.
Figure 1: RELUX output

CONCLUSION
DL simulation in architectural design is linked to environmental comfort studies and energy conservation
strategies. The difficulty in finding simple methods or a suitable tool for the architect is the main obstacle to the use of simulation in the design process. In order to evaluate the simulation programs, a 3D model of a simple space with precise characteristics, located in Brasilia, was used with the objective of studying the performance of these tools in daylighting prediction in architectural projects. The evaluation demonstrated the need for the improvement of DL simulation software, especially as regards the user interface. The methodology adopted for evaluating the software proved to be adequate. Finally, we can observe that there is great potential in the use of simulation software in the architectural design process. Computational tools, i.e. simulation software, must make it easy for architects to use them. On the other hand, architects must be adequately prepared to deal with these tools. This will certainly contribute to producing a new generation of architects who are aware of the importance of DL and energy efficiency concepts and able to deal with new technologies in their design.
ACKNOWLEDGMENT
This paper was made possible by the support of CAPES / Ministry of Education / Brazil.
REFERENCES
[1] C. K. Erlich, "Computer Aided Perception: A Method to Evaluate the Representation of Glare in Computer Graphics Imagery", Masters Dissertation, University of California, Berkeley, USA, 2002.
[2] C. N. D. Amorim, "Iluminação natural, conforto ambiental e eficiência energética no projeto de arquitetura: estratégias projetuais e tecnológicas para climas tropicais", Research Project (Edital CT Energ/CNPq 01/2003), Brasilia, 2003.
[3] N. Mendes, R. Lamberts, and J. A. B. C. Neto, "Building Simulation in Brazil", IBPSA, August 2001.
[4] International Energy Agency (IEA), Daylight in Buildings: A Source Book on Daylighting Systems and Components, Report of IEA Task 21 / ECBCS Annex 29, July 2000.
[5] E. D. Christakou, "A simulação computacional da luz natural aplicada ao projeto de arquitetura", Masters Dissertation, Brasilia, DF, Brazil, 2004.
[6] K. Altman and P. Apian-Bennewitz, "Report on an Investigation of the Application and Limits of Currently Available Programme Types for Photorealistic Rendering of Light and Lighting in Architecture" (The Kimbell Art Museum as a Case Study for Lightscape, Radiance and 3D-Studio MAX), April 7, 2001.
[7] A. Khoudolev, "Comparison of Two Methods of Global Illumination Analysis", Keldysh Institute of Applied Mathematics, Russian Academy of Sciences, Moscow, 1996.
[8] M. N. Inanici, "Application of the State-of-the-Art Computer Simulation and Visualization in Architectural Lighting Research", Building Simulation - Seventh International IBPSA Conference, Rio de Janeiro, Brazil, August 2001.
[9] G. G. Roy, "A Comparative Study of Lighting Simulation Packages Suitable for Use in Architectural Design", Murdoch University, Australia, 2000. http://wwweng.murdoch.edu.au/FTPsite/daylighting.html
Figure 5: Methodology workflow (CHRISTAKOU, 2004)
Vehicle Recognition Using Curvelet Transform and Thresholding Farhad Mohamad Kazemi¹, Hamid Reza Pourreza ², Reihaneh Moravejian ³, Ehsan Mohamad Kazemi 4 ¹ Young Researchers Club, Islamic Azad University, Mashad, IRAN, ² Computer Department Ferdowsi University Mashad, IRAN, 3,4 Young Researchers Club, Islamic Azad University, Mashad, IRAN [email protected], [email protected], [email protected], [email protected]
Abstract- This paper evaluates the performance of a new algorithm for a vehicle recognition system. The recognition system is based on features extracted by applying the curvelet transform to the image and taking the standard deviation of the curvelet coefficient matrices at different scales and various orientations. The curvelet transform is a multiscale transform with frame elements indexed by location, scale and orientation parameters; it has the time-frequency localization properties of wavelets but also shows a very high degree of directionality and anisotropy. The classifier used in this paper is the k-nearest-neighbor classifier. In addition, the proposed recognition system was evaluated using the information of different scales as the feature vector, so we could identify the scales carrying the most useful information. The results show that the recognition rate for the vehicle's model is about 95% when the information of scales 2, 3 and 4 of the curvelet coefficient matrix is used. We gathered a data set of 300 images from 5 different classes of vehicles: PEUGEOT 206, PEUGEOT 405, Pride, Renault 5 and Peykan. We used 230 images as the training set and 70 images as the test set.
I. INTRODUCTION
Recently, vehicle-based access control systems for buildings, outdoor sites and even housing estates have become commonplace. Additionally, various traffic monitoring and control systems that depend on user (man + vehicle) identification, such as congestion charging, would also benefit by augmenting existing number-plate recognition with an additional authentication mechanism. Given an image containing a backward view of a vehicle (car), a system is proposed here that determines its exact class (model). The aim is to obtain reliable classification of a vehicle in the image from a multitude of possible classes (vehicle types) using a limited number of prior examples. Although classification of road-going vehicles has been a subject of interest in the past, e.g. for traffic control systems and toll levy automation, vehicle type recognition has not hitherto been considered at this level of accuracy. Instead, most of the existing systems either detect (classify vehicle or background) or classify vehicles into broad categories such as cars, buses, heavy goods vehicles (HGVs), etc. [5, 2, 6, 7, 3, 1]. Kato et al. [5] propose a vehicle detection and classification method based on the multi-clustered modified quadratic discriminant function (MC-MQDF) that is reported to exhibit high levels of detection of a range of vehicle types against road environment
backgrounds. For the same application, Matthews et al. [7] propose a region-of-interest designator based on simple horizontal and vertical edge responses and shadow detection, followed by a Principal Component Analysis (PCA) feature extractor. Features are input into a multi-layer perceptron (MLP) network that discriminates between vehicle and background [7]. Parameterized 3D deformable models of vehicle structure are another approach used for classification between broad categories of vehicles [2, 6, 3]. Ferryman et al. [2] use a PCA description of manually sampled geometric data to define a deformable model of a vehicle's 3D structure. By fitting this model to an image, both the pose and structure of the vehicle can be recovered and used to discriminate between the different vehicle categories. An extension of this approach uses MLP networks to perform the classification based on the model parameters [3]. A similar approach is adopted by Lai et al. [6], where a deformable model is fit to the image in order to obtain vehicle dimensions, which are the basis for discrimination between vehicle categories. A related approach using deformable templates for vehicle segmentation and recognition is employed in [1]. V. S. Petrovic and T. F. Cootes [8] demonstrate that a relatively simple set of features extracted from sections of a car's frontal image can be used to obtain good verification and recognition performance for vehicle type. The recognition system proposed in that case is based on recognizing rigid structure samples obtained using specific feature extraction techniques from an image of the object (vehicle). Recognition is initiated through an algorithm that locates a reference segment on the object, in this case the front number plate. The location and scale of this segment are used as a reference to define a region of interest in the image from which the structure is sampled. A number of feature extraction algorithms that perform this task, including direct and statistical mapping methods, are investigated. Feature vectors are finally classified using simple nearest-neighbor classification. Louka Dlagnekov [9] developed an LPR (License Plate Recognition) system that achieves a high recognition rate without needing a high-quality video signal from expensive hardware. He also explored the problem of car make and model recognition for the purpose of searching surveillance video archives for a partial license plate number combined with a visual description of a car. His proposed methods will provide
valuable situational information for law enforcement units in a variety of civil infrastructures. Farhad M. Kazemi et al. applied 3 different kinds of feature extractors to recognize and classify 5 models of vehicles: the fast Fourier transform, the discrete wavelet transform and the discrete curvelet transform. Their results show that the recognition rate for the vehicle's model, when using the curvelet transform with all curvelet coefficients, is 100% [15, 16]. The recognition system proposed in this paper is based on the new Fast Curvelet Transform together with the standard deviation statistic. While Fourier analysis works well on periodic structures (such as textures), and wavelet analysis works well on singularities (such as corners), neither can reconstruct edges in a sparse manner. Curvelets were originally introduced in [4] as a non-adaptive transform that achieves near-optimal m-term approximation rates for twice-continuously differentiable curves. The performance rates of the curvelet transform are quite good, and in this paper we use this new transform to obtain invariant features. The kinds of features and the sizes of the feature vectors are very important to the recognition process. On the other hand, because a large number of curvelet coefficients is produced for the backward view of a vehicle, feeding all of these coefficients to the classifier would slow the recognition system down. To overcome this problem, we compute local standard deviations of the curvelet coefficients at different angles and scales; as a result, the size of the feature vector is drastically reduced. We then feed the final feature vector to the k-nearest-neighbor (kNN) classifier. Our results show that the combination of the Fast Curvelet Transform, the standard deviation and the kNN classifier provides a suitable vehicle recognition system. As there is no standard data set for this task, we gathered a data set of 300 images from 5 different classes of vehicles: PEUGEOT 206, PEUGEOT 405, Pride, Renault 5 and Peykan. We used 230 images as the training set and 70 images as the test set.

II. CURVELET TRANSFORM
The Continuous Curvelet Transform has gone through two major revisions. The first Continuous Curvelet Transform [4] (commonly referred to as the "Curvelet '99" transform) used a complex series of steps involving the ridgelet analysis of the radon transform of an image, and its performance was exceedingly slow. The algorithm was updated in 2003 in [10]. The use of the Ridgelet Transform was discarded, reducing the amount of redundancy in the transform and increasing its speed considerably. In this new method, curvelets are approached as tight frames. Using tight
frames, an individual curvelet has frequency support in a parabolic wedge of the frequency domain (as seen in Figure 1). Using the theoretical basis in [10] (where the continuous curvelet transform is defined), two separate digital (or discrete) curvelet transform (DCT) algorithms are introduced in [13]. The first algorithm is the unequispaced FFT transform, in which the curvelet coefficients are found by irregularly sampling the Fourier coefficients of an image. The second algorithm is the wrapping transform, which uses a series of translations and a wraparound technique. Both algorithms produce the same output, but the wrapping algorithm is both more intuitive and faster to compute. Because of this, the unequispaced FFT method is ignored in this paper and the focus is solely on the wrapping DCT method. The curvelet transform is a multiscale transform with frame elements indexed by location, scale and orientation parameters; it has the time-frequency localization properties of wavelets but also shows a very high degree of directionality and anisotropy. More precisely, we here use the new tight frame of curvelets recently developed in [10].
Fourier Approximation:
||f − f_m^F||²_{L²} ≅ O(m^(−1/2))        (4)

Wavelet Approximation:
||f − f_m^W||²_{L²} ≅ O(m^(−1))        (5)

Curvelet Approximation:
||f − f_m^C||²_{L²} ≅ O((log m)³ · m^(−2))        (6)
As seen from these m-term approximations, the curvelet transform offers the closest m-term approximation to the lower bound. Therefore, in images with a large number of C² curves (i.e. an image with a great number of long edges), it is advantageous to use the curvelet algorithm.
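For a rough numerical illustration of these rates (ignoring constant factors and taking natural logarithms, both assumptions of this example rather than statements from [13]): with m = 10,000 retained terms, the bounds scale as m^(−1/2) = 10^(−2) for the Fourier case, m^(−1) = 10^(−4) for wavelets, and (log m)³ · m^(−2) ≈ 7.8 × 10^(−6) for curvelets, i.e. more than an order of magnitude smaller than the wavelet bound at this m.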
III. THE PROPOSED RECOGNITION METHOD
In this paper we propose two algorithms, one without thresholding and one with thresholding.

A. The proposed algorithm without thresholding
a. Normalization: The size of all test and training images is normalized to 128 x 128 pixels.
b. Feature extraction: To obtain the best feature vectors from the backward view of each vehicle, we first apply the Fast Curvelet Transform to each normalized image. This yields curvelet coefficients at 4 scales and various angles. At this stage the curvelet coefficients could themselves be regarded as the feature vector, despite their large number; however, the coefficients obtained for each picture are numerous, and feeding all of them to a classifier is not practical. To extract the best features and reduce the size of the feature vector for each picture, we use the standard deviation statistic: we compute local standard deviations over the curvelet coefficient matrices at the different scales and angles. As a result, the number of features per image is reduced from roughly 100,000 to 81. The resulting feature vector for each picture is then passed to the classifier. The results achieved with this algorithm are discussed in the experimental results section.
c. Classification: The classifier used in this algorithm is the k-nearest-neighbor classifier. The results of applying this classifier to the obtained feature vectors are presented in the next section [14].

B. The proposed algorithm with thresholding
After applying the Fast Curvelet Transform to each picture, a thresholding step keeps 10 percent of the curvelet coefficients and sets the remaining coefficients to zero. Then, exactly as in the first algorithm, we compute the standard deviation of the corresponding curvelet coefficient matrices for each picture and finally feed the feature vectors to the classifier. The results obtained with this algorithm are presented in the experimental results section.

Experimental results
To test the performance of the proposed algorithms, we examined 5 common classes of vehicles in Iran: PEUGEOT 206, PEUGEOT 405, Pride, Renault 5 and Peykan. The total data set includes 300 images of the backward view of these vehicles; the training set contains 230 images and the test set 70 images. The results reported below for Table 1 were produced by the first algorithm, with curvelet coefficients computed at 4 scales. As discussed above, the first algorithm was also applied with a threshold as the second algorithm; the results obtained with this second policy are given in Table 2.
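As a rough illustration of the pipeline of Section III (summarized in Fig. 2 below), the following Python sketch computes one standard-deviation feature per curvelet subband, optionally after keeping only the largest 10% of coefficients, and classifies with k-nearest neighbors. It is not the authors' code: curvelet_subbands is a placeholder for whatever discrete curvelet implementation is available (e.g. a CurveLab wrapper), and the value of k is not specified in the paper.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def curvelet_subbands(image):
    """Placeholder for a discrete curvelet transform; expected to return a list of
    2-D coefficient arrays, one per (scale, angle) subband."""
    raise NotImplementedError("plug in a curvelet implementation here")

def extract_features(image, keep_fraction=None):
    """One local standard deviation per curvelet subband; with keep_fraction set,
    only the largest-magnitude fraction of coefficients is retained first."""
    feats = []
    for band in curvelet_subbands(image):
        coeffs = band.ravel()
        if keep_fraction is not None:
            k = max(1, int(keep_fraction * coeffs.size))
            thresh = np.sort(np.abs(coeffs))[-k]                 # k-th largest magnitude
            coeffs = np.where(np.abs(coeffs) >= thresh, coeffs, 0.0)
        feats.append(coeffs.std())
    return np.array(feats)

# Hypothetical usage with 128x128 normalized images (train_imgs, train_labels, test_imgs):
# X_train = np.stack([extract_features(im) for im in train_imgs])
# X_test = np.stack([extract_features(im, keep_fraction=0.1) for im in test_imgs])
# knn = KNeighborsClassifier(n_neighbors=3)   # k is an assumption of this sketch
# knn.fit(X_train, train_labels)
# predictions = knn.predict(X_test)
```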
Fig. 2. Recognition process block diagram: an image of the backward view of the vehicle is normalized, the curvelet transform is applied, local standard deviations of the curvelet coefficient matrices are computed at the different scales and angles, and the resulting feature vector is passed to the k-nearest-neighbour classifier.
The recognition rates of the proposed system when we kept 10%, 50% and 80% of the curvelet coefficients and set the others to zero are given in Table 2. According to Table 2, when we kept 10% of the curvelet coefficients and set the others to zero, the recognition percentages did not change much in comparison with the first algorithm (especially when the curvelet coefficients of scales 1, 2 and 3 were used). These results show that using all of the curvelet coefficients is not always necessary, and that the coefficients with the largest values can be used as the feature vector. This raises the question of which scales are the most important in terms of useful information. According to Table 1, when the information of scale 4 is used to build the feature vector, the recognition rate improves, so scale 4 is the best single scale in terms of useful information. According to Table 1, the scales ordered by the amount of useful information they carry are 4, 3, 2, 1. A further question is whether combining the information of different scales improves the recognition rate, and which scales should be combined to produce the feature vector.

Table 1: The proposed algorithm without thresholding (Scale: recognition rate)
1: 49%
2: 76%
3: 81%
4: 84%
1,2: 81%
2,3: 92%
3,4: 73%
2,3,4: 95%
1,2,3: 86%
1,2,3,4: 86%
Table 2: The proposed algorithm with thresholding (Scale: recognition rate when keeping 10% / 50% / 80% of the coefficients)
1: 49% / 49% / 49%
2: 76% / 76% / 76%
3: 81% / 81% / 81%
4: 77% / 81% / 84%
1,2: 81% / 81% / 81%
2,3: 91% / 92% / 92%
3,4: 70% / 73% / 73%
2,3,4: 95% / 95% / 95%
1,2,3: 85% / 86% / 86%
1,2,3,4: 85% / 86% / 86%

In answer to these questions, the results in Tables 1 and 2 show that using the information of scales 2, 3 and 4 improves the recognition rate. One reason for this is that the image's information is then used at partitions of different sizes and at various scales. Note that scale 1, whose larger partitions contain only general image information and no details, carries little useful information.
IV. CONCLUSION
In this paper we showed that the backward view of vehicles can be used to obtain feature vectors for a vehicle recognition system. A new algorithm for recognizing a vehicle's model, with two approaches, was proposed. In the first approach, the curvelet coefficients are used to create the feature vector; in the second approach, after thresholding, only the larger curvelet coefficients are used to create the feature vector. According to the results, using all of the curvelet coefficients is not necessarily needed to build a recognition system.

Figure 3. PEUGEOT 405, PEUGEOT 206, Renault 5, Pride, and Peykan.

REFERENCES
[1] M. P. Dubuisson-Jolly, S. Lakshmanan, and A. Jain, "Vehicle segmentation and classification using deformable templates", IEEE Transactions on Pattern Analysis and Machine Intelligence, 18(3), pp. 293-308, 1996.
[2] J. Ferryman, A. Worral, G. Sullivan, and K. Baker, "A generic deformable model for vehicle recognition", British Machine Vision Conference, pp. 127-136, British Machine Vision Association, 1995.
[3] W. Wei, Q. Zhang, and M. Wang, "A method of vehicle classification using models and neural networks", IEEE Vehicular Technology Conference, 2001.
[4] E. J. Candes and D. L. Donoho, "Curvelets - A surprisingly effective nonadaptive representation for objects with edges", Curve and Surface Fitting, Vanderbilt Univ. Press, 1999.
[5] T. Kato, Y. Ninomiya, and I. Masaki, "Preceding vehicle recognition based on learning from sample images", IEEE Transactions on Intelligent Transportation Systems, 3(4), pp. 252-260, 2002.
[6] A. Lai, G. Fung, and N. Yung, "Vehicle type classification from visual-based dimension estimation", IEEE Intelligent Transportation Systems Conference, pp. 201-206, 2001.
[7] N. Matthews, P. An, D. Charnley, and C. Harris, "Vehicle detection and recognition in greyscale imagery", Control Engineering Practice, 4(4), pp. 472-479, 1996.
[8] V. S. Petrovic and T. F. Cootes, "Analysis of Features for Rigid Structure Vehicle Type Recognition", 2003.
[9] L. Dlagnekov, "Video-based Car Surveillance: License Plate, Make, and Model Recognition", Masters Thesis, University of California, San Diego, 2005.
[10] E. J. Candes and D. L. Donoho, "New Tight Frames of Curvelets and Optimal Representations of Objects with Smooth Singularities", Technical Report, Stanford University, 2002.
[11] D. L. Donoho, "De-noising by soft-thresholding", IEEE Transactions on Information Theory, 1995.
[12] B. S. Kashin and V. N. Temlyakov, "On best m-term approximations and the entropy of sets in the space L1", Mathematical Notes 56, pp. 1137-1157, 1994.
[13] E. J. Candes, L. Demanet, D. L. Donoho, and L. Ying, "Fast Discrete Curvelet Transforms", Technical Report, Cal Tech, 2005.
[14] S. Theodoridis and K. Koutroumbas, Pattern Recognition, Academic Press, New York, 1999.
[15] F. M. Kazemi and S. Samadi, "Vehicle Recognition Based on Fourier, Wavelet and Curvelet Transforms - a Comparative Study", IEEE International Conference on Information Technology (ITNG'07), USA, pp. 939-940, 2007.
[16] F. M. Kazemi and S. Samadi, "Vehicle Recognition Based on Fourier, Wavelet and Curvelet Transforms - a Comparative Study", IJCSNS International Journal of Computer Science and Network Security, Vol. 7, No. 2, pp. 130-135, 2007.
Vehicle Detection Using a Multi-Agent Vision-Based System 1
Saeed Samadi 1, Farhad Mohamad Kazemi2, Mohamad-R. Akbarzadeh-T ³ Department of Electronics, Khorasan Science and Technology Park (KSTP), Mashad, IRAN, 2 Young Researchers Club, Islamic Azad University, Mashad, IRAN, ³Department of Electrical Engineering, Ferdowsi University, Mashad, IRAN [email protected],[email protected], [email protected]
Abstract- In this paper we propose a multi-agent system for vehicle detection in images. The goal of this system is to localize vehicles in a given image. The developed agents are capable of detecting pre-specified shapes by processing the image. Cooperation involves communicating hypotheses and resolving conflicts between the interpretations of individual agents. Specifically, in the proposed system eight processing agents, consisting of edge, contour, wheel, LPL (License-Plate Line), LPR (License-Plate Rectangle), PCV (Plate-Candidates Verification), symmetry and vehicle agents, were developed for vehicle detection in various outdoor scenes. The testing data contain 500 car blobs and 100 non-car blobs. We show through experiments that our system is 90.16% effective at detecting vehicles in various outdoor scenes.
I. INTRODUCTION
As in general object detection, approaches to vehicle detection are divided into two branches: the feature-based approach and the appearance-based approach. The feature-based approach extracts vehicle candidates using vehicle features such as edges, intensity, etc. On the other hand, the appearance-based approach uses pre-designed vehicle templates. There is a great deal of previous research on vehicle detection using a vision sensor. Betke et al. [1] developed a system that can extract edge information on the road, track vehicles using a pre-designed objective function, and detect vehicles by rear-light or headlight at night. Sukthankar [2] proposed a vehicle detection and distance estimation method for night time using the taillight pair. The symmetry feature is also widely used in vehicle detection and validation [3], [4]. Clady [5] and Sotelo [6] used road and lane information for extracting vehicle candidates. Clady created a binary image to eliminate the non-road region using a road intensity model and then extracted vehicle candidates by analysing the dark areas, while Sotelo extracted vehicle candidates from edge information extending across the lane boundary and validated them over several frames. This paper deals with a new methodology for computer vision which tries to enhance three specific aspects: (i) integration of knowledge and uncertainty; (ii) cooperation between visual tasks; (iii) enhancement of image processing tasks. We will then present our design of a general multi-agent object detection system. The advantages of multi-agent image interpretation are [7]:
• Possibility of separate knowledge representation from different image domains.
• Possibility of separating image processing algorithms from control strategies, accommodating a large variety of control heuristics and selecting the best processing methods for different data situations.
• Ease of construction and maintenance.
• Ability to benefit from parallel architectures.
• Focusing ability: not all knowledge is needed for all tasks; both spatial and interpretive focus are possible.
• Heterogeneous problem solving.
• Reliability: an agent may be corrected by others.
Several multi-agent image interpretation systems have been reported in the literature. In Ref. [8] a system is presented with low-level agents that produce partial data-driven edge-based segmentations of medical images which are merged into a global result. Agents have only indirect influence on the global result and have, for instance, no possibility to negotiate about segmentation methods and parameter settings with neighboring agents facing similar segmentation problems. Rodin et al. [9] present a parallel image processing system based on simple reactive agents. Agents act according to a perception-action model without problem solving or deliberation. The system is applied to the detection of concentric striae, like the year rings of trees, with darkening and lighting agents. Agents in the system by Liu et al. in Ref. [10] can either "breed" or "diffuse" (search) with the aim of producing segmentation results similar to split-and-merge algorithms. Agents are purely reactive, with predefined behavior determined by the contrast, mean and variance of pixel intensities within a certain region. The agents are claimed to be more robust and efficient than split-and-merge algorithms in the case of complex shapes. Clouard et al. [11] describe the automatic assembly and control of a chain of image processing operators using high-level knowledge to achieve a given image processing task. Their system (Borg) is comparable to the image processing task of a single agent in our system. The goal of this system is always only to detect a single type of object, and possible interaction with other kinds of objects is beyond the scope of the system, as is semantic interpretation. Issues like occlusion and conflicting interpretations are not considered. Bovenkamp et al. [12] present a novel multi-agent image interpretation system which is markedly different from previous approaches, especially in its elaborate high-level
knowledge-based control over low-level image segmentation algorithms. Agents dynamically adapt segmentation algorithms based on knowledge about global constraints, contextual knowledge, local image information and personal beliefs. The agent knowledge model is general and modular, to support easy construction and addition of agents for any image processing task. Each agent in the system is further responsible for one type of high-level object and cooperates with other agents to come to a consistent overall image interpretation. Cooperation involves communicating hypotheses and resolving conflicts between the interpretations of individual agents. That system has been applied to IntraVascular UltraSound (IVUS) images, which are segmented by five agents specialized in lumen, vessel, calcified-plaque, shadow and sidebranch detection. In Section II the system architecture is introduced, while in Section III we present the system control module. In Section IV we present the structure of the agents, and in Section V the behaviors of the agents. Finally, we present the experimental results on test images of various outdoor scenes in Section VI; the conclusion is presented in Section VII.

II. SYSTEM ARCHITECTURE
A multi-agent system can be a very good architecture for an image interpretation system. The agent platform and a global view of the multi-agent system architecture as applied to vehicle detection are shown in Figure 1 and Figure 2.
Fig. 1. Agent platform: a gray-scale image feeds the Edge, Contour, Wheel, LPL, Symmetry, LPR and PCV agents, whose results pass through conflict resolution to the Vehicle agent for vehicle detection.
Fig. 2. Global view of the multi-agent system architecture as applied to vehicle detection: input images, image processing platform, agent platform, and image processing results.
III. SYSTEM CONTROL
In this system we have identified the following four basic decisions that an agent can make (a minimal interface sketch follows this list):
• Image processing: object detection, object adjustment, ….
• Communication: all operators for communication between agents (send, receive, process-message, etc.).
• Organization: storage, retrieval and removal of image objects.
• Conflict resolution: resolving interpretation conflicts between agents.
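The paper does not prescribe an interface for these four decision types, so the following minimal Python sketch is only an illustration; every identifier here is an assumption.

```python
from abc import ABC, abstractmethod

class Agent(ABC):
    """Illustrative agent interface covering the four decision types listed above."""

    @abstractmethod
    def process_image(self, image):
        """Image processing: object detection, object adjustment, ..."""

    @abstractmethod
    def communicate(self, message):
        """Communication: send, receive and process messages from other agents."""

    @abstractmethod
    def organize(self, image_object):
        """Organization: storage, retrieval and removal of image objects."""

    @abstractmethod
    def resolve_conflict(self, other_hypothesis):
        """Conflict resolution: reconcile interpretation conflicts with other agents."""
```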
IV. THE STRUCTURE OF AGENTS
A. Edge Agent
The purpose of this agent is to enrich the edge features, which improves the success rate of the LPL (License-Plate Line) and LPR (License-Plate Rectangle) agents. The algorithms used sequentially in this agent are graying, normalizing and histogram equalization. After obtaining a grey-scale image, we use Sobel filters to extract the edge image and then threshold it to a binary one. The resulting images are used as inputs for the Contour, LPL, LPR and Wheel agents (see Figure 3); a code sketch of this chain is given below.
B. Contour Agent
In order to detect regions of plate-candidate images, we apply the Contour agent to detect closed-boundary objects. However, this agent has difficulties processing bad-quality images, for example those degraded by scratches or plug-in helixes. In these cases the Contour agent produces incomplete closed boundary lines that do not correctly contain the plate images; the Edge and Contour agents therefore cooperate to solve this problem (see Figure 5).
a. Trace contour
First, we trace the outline of an object in the binary image; nonzero pixels belong to an object and zero pixels constitute the background. Suppose A is a two-element vector specifying the row and column coordinates of the point on the object boundary where we want the tracing to begin. We specify the initial search direction for the next object pixel connected to A.
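A rough sketch of the Edge Agent chain described in subsection A (graying, histogram equalization, Sobel filtering, thresholding to a binary edge map) is given here with OpenCV; the kernel size and threshold value are assumed, not taken from the paper.

```python
import cv2

def edge_agent(image_bgr, thresh=60):
    """Sketch of the Edge Agent chain: gray -> histogram equalization -> Sobel -> binary edges."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)       # graying
    eq = cv2.equalizeHist(gray)                               # histogram equalization
    gx = cv2.Sobel(eq, cv2.CV_64F, 1, 0, ksize=3)             # horizontal gradient
    gy = cv2.Sobel(eq, cv2.CV_64F, 0, 1, ksize=3)             # vertical gradient
    mag = cv2.convertScaleAbs(cv2.magnitude(gx, gy))          # gradient magnitude as 8-bit image
    _, binary = cv2.threshold(mag, thresh, 255, cv2.THRESH_BINARY)  # assumed threshold value
    return binary                                             # input for Contour, LPL, LPR, Wheel agents
```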
Fig. 3. (a) Original image. (b) Edge of original image.
Fig. 5. (a) Original image. (b) Contour of original image.
We use codes such as ‘0’ for east and ‘1’ for northeast to specify the direction; Figure 4 illustrates all the possible direction values. We obtain a certain number of boundary pixels for each region (normalization).
C. LPL Agent (License-Plate Line Agent)
This agent detects lines in images. The Hough Transform (HT) is applied to the image to extract lines from object images [13,14]. In the parameterized version of the Hough Transform due to Duda and Hart, if we consider a line whose normal makes an angle θ with the x axis and lies at a distance ρ from the origin, the equation of the line for any point (x, y) on it is
ρ = x cos θ + y sin θ.   (1)
The coordinates of each line, along with its ρ and θ values, are examined and a chain of connected line segments is formed. Any chain formed by the extracted line segments is a candidate license plate if it defines a closed loop and the number of its constituent line segments is 4. Ideally, closed loops consisting of 4 line segments are declared license plates if they meet the following criteria:
• Alternate line segments have the same peak value, which indicates that they are of the same size.
• Alternate line segments have the same value of θ, which indicates that they are parallel to each other (we look for two parallel lines whose contained region is considered a plate candidate).
• Adjacent line segments have a θ separation of 90°, forming the right-angled corners of the license plate.
• The ratio of the non-parallel edges is a constant.
Each candidate is verified by the Plate-Candidates Verification Agent. The main limitation of this agent is the time required, since the Hough transform is applied to a usually large number of pixels: the larger the image, the slower the agent. The speed of the algorithm may be improved by thinning the image before applying the Hough transform; however, the thinning algorithm is also slow. A simplified sketch of the line-pairing step follows.
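The sketch below is a hedged illustration of the first LPL step only: extracting (ρ, θ) lines with the standard Hough transform and pairing near-parallel lines as plate-strip candidates. The accumulator threshold and angle tolerance are assumed values, and the full four-segment closed-loop test and 90° corner check listed above are omitted.

```python
import cv2
import numpy as np

def parallel_line_pairs(edge_img, angle_tol_deg=3.0, hough_threshold=120):
    """Extract lines rho = x*cos(theta) + y*sin(theta) and pair near-parallel ones."""
    lines = cv2.HoughLines(edge_img, 1, np.pi / 180, hough_threshold)  # threshold is an assumed value
    if lines is None:
        return []
    lines = lines[:, 0, :]                                   # each row is (rho, theta)
    pairs = []
    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            (r1, t1), (r2, t2) = lines[i], lines[j]
            if abs(np.degrees(t1 - t2)) < angle_tol_deg:     # same theta -> parallel segments
                pairs.append(((r1, t1), (r2, t2)))           # strip between them: plate candidate
    return pairs
```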
For vehicle tracking purposes, on successful detection of a license plate, a new window of interest is defined around it and all further processing is limited to this window. If no license plate is detected in the present frame, the previously defined window is used for processing in the next iteration. If no vehicle is found within the window of interest for 3 successive iterations, the whole image is defined as the window of interest and the system then tries to find the vehicle in the entire image (see Figure 6).
D. LPR Agent (License-Plate Rectangle Agent)
We want to detect patterns that characterize rectangles; for this purpose we defined the LPR Agent (License-Plate Rectangle Agent) [15]. In this section we explore geometric characteristics of a rectangle in the domain of the Hough Transform, and such characteristics are used for rectangle detection directly in Hough space. The proposed agent works for rectangles with unknown dimensions and orientations, and does not require the extraction and/or grouping of linear segments (i.e., it is applied directly to the edge map). Some work has been done on shape description in Hough space; Rosenfeld and Weiss proved that a convex polygon is uniquely determined by the peaks of its Hough transform (in fact, these peaks form the convex hull of the polygon).
Fig. 4. All the possible values for directions.
Fig. 6. LPL Agent. (a) Accumulation array from the Hough transform; the vertical and horizontal axes are ρ (pixels) and θ (degrees), respectively. (b) Raw image with detected lines. (c) Raw image with detected line segments.
However, we face a different problem: detecting rectangles in images containing several objects. We want to detect patterns in Hough space that characterize rectangles. For that purpose, we note that a rectangle has specific geometric relations that can be detected directly in Hough space. The algorithm proposed for this agent is explained in [15].
E. PCV Agent (Plate-Candidates Verification Agent)
One of the agents for candidate evaluation is applied to the candidates obtained from the LPL and LPR agents to separate the plate object (the Plate-Candidates Verification Agent). Our plate-candidate evaluation agent is based on one main module, which evaluates the ratio between the width and height of a candidate (see Figure 7).
a. Evaluate ratio between width and height of candidate: In this module we select only candidates whose width-to-height ratio satisfies a pre-defined constraint: minWHRatio < W/H < maxWHRatio. Since there are two main types of Iranian plates, 1-row and 2-row (Figure 7), we have two corresponding constraints: 3.2 < W/H < 4.2 for 1-row plate candidates and 0.7 < W/H < 1.3 for 2-row plate candidates. The candidates that satisfy one of these two constraints are selected and passed to the next module.
F. Vehicle Agent
The Vehicle Agent (vehicle detection) consists of three modules: the preprocessing module working on the raw input image, the vehicle candidate extraction module based on a shadow region and a template, and the validation module based on prior knowledge [16].
a. Preprocessing module: We apply histogram equalization below the vanishing line in the raw image. This process widens the gap between the dark road and other objects on the road, so the shadow region used as the first feature of the vehicle in the daytime can be extracted easily. After this process, we create a binary image with a low threshold that eliminates the bright regions.
b. Vehicle Candidate Extraction module: The shadow region between the vehicle and the road appears in the daytime regardless of weather or low-light conditions, so we use the shadow region as the first feature of a vehicle. The shadow region is defined as the area where the intensity value drops below a certain threshold. As in Figure 8, a region with a certain width and height is scanned from bottom to top until it satisfies the constraint.
First, the histogram profile along the x direction is calculated in the input image I(x, y) using a horizontal scan, and then the vertical position of the shadow region Vs is extracted using a variable shadow height by a vertical scan, as in Figure 8(a). The final shadow region is determined by an AND operation between Vs, which represents indices with a high intensity change, and the dark area in the preprocessed binary image. After transforming the extracted shadow regions to world coordinates by the inverse perspective transform (IPT), these regions are split among the different lanes, while some regions outside the region of interest are filtered out, as in Figure 8(b). If no lane information exists, for example because lane detection failed or due to an intersection, three virtual lanes with the proper lane width, estimated from the ego-vehicle position, are generated instead and used for the splitting and filtering of the shadow regions. Although one vehicle changing lanes may be clustered into two vehicles, this problem is resolved by the symmetry scanning process.
c. Vehicle Candidate Validation module: Intensity changes caused by road repairs, road signs, guardrails, oil spills and the shadows of non-vehicle objects at the road side may be mistaken for vehicle shadows. Thus validation of the vehicle candidates is required; in real experiments there are many false-positive errors caused by shadow patterns at the road boundary and at the bottom of guardrails. Because this shadow pattern has a regular histogram over some continuous section, we eliminate it before the validation step. First, symmetry scanning is used to extract the exact left and right boundaries. For symmetry scanning, we construct an edge angle map using Sobel operators and scan the left and right regions about the center line of the current vehicle position. The window size for symmetry scanning is the pixel width of the current vehicle. Then we find the symmetry axis with the maximum symmetry rate, defined as follows:
Symmetry rate = 2s / n,
where s is the number of absolutely matching edge angles and n is the total number of edges. By checking the two vertical edges at both sides of the vehicle, the system finally validates the candidates using the vertical edge histogram: if the vertical histograms at both sides exceed a certain ratio of the pixel width of a vehicle, the candidate is valid.
G. Wheel Agent
This section presents an agent for circle detection (wheel detection) (Figure 9) in images of the lateral (next) side of the vehicle. The proposed agent also belongs to the multipoint transform category. Theoretically, to evaluate circle parameters for point triplets in an edge image containing n points, C(n,3) enumerations of the edge points have to be examined.
If specific relations among the circle points are sought, however, the required number of enumerations can be reduced enormously. The idea of the proposed agent comes from a property of the circle: for any three points lying on it, if two of the points form a diameter, the third point forms a right-angled triangle with them. Using such a criterion to form specific sets of feature-point triplets, the number of enumerations can be reduced to C(n,2) only. The performance of the proposed agent is promising in that it is both fast and memory saving compared with conventional Hough transform methods. For more information refer to [17].
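To make the counting argument concrete, the sketch below hypothesises every pair of edge points as a diameter, so only C(n,2) enumerations are needed, and accumulates votes on (centre, radius) hypotheses. The radius bounds are assumptions, and the right-angle verification step of [17] is omitted.

```python
import numpy as np
from collections import defaultdict

def circle_candidates(points, min_radius=10, max_radius=80):
    """Pair-as-diameter circle voting: each point pair proposes a centre (midpoint) and radius."""
    votes = defaultdict(int)
    pts = np.asarray(points, dtype=float)
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):          # C(n,2) pairs instead of C(n,3) triplets
            centre = (pts[i] + pts[j]) / 2.0
            radius = np.linalg.norm(pts[i] - pts[j]) / 2.0
            if min_radius <= radius <= max_radius:
                key = (int(round(centre[0])), int(round(centre[1])), int(round(radius)))
                votes[key] += 1
    return sorted(votes.items(), key=lambda kv: kv[1], reverse=True)  # strongest hypotheses first
```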
H. Symmetry Agent
Symmetry is one of the basic features characterizing natural shapes and objects. It is believed to enhance recognition and reconstruction, and is likely to be employed in pre-attentive vision. This section presents an agent that extends the normalized-cut (n-cut) segmentation algorithm to find symmetric regions in natural images. We use an existing algorithm to quickly detect possible symmetries in an image; the detected symmetries are then individually verified using the modified n-cut algorithm to eliminate spurious detections. The weights of the n-cut algorithm are modified to include both symmetric and spatial affinities, and a global parameter models the trade-off between spatial coherence and symmetry. We refer the reader to [19] for a description of the n-cut algorithm. This agent is based on a modified version of the n-cut algorithm that includes spatial and symmetry affinities; the blue axis is the principal axis of symmetry. For more information refer to [18]. The symmetric structure of the wheels causes them to fall into the symmetric segment (Fig. 10).
V. BEHAVIOR OF AGENTS
The internal control system is implemented by a rule system. (1) Feature extraction: each agent stores different types of information in a local database. (2) Exploration: agents explore the environment (image). (3) Cooperation: agents cooperate with each other to enhance the quality of their merged plan.
A. Cooperation
This section presents the cooperative behaviors between agents. From the extracted edge image, we use the Contour agent to detect closed boundaries of objects. These contour lines are transformed to Hough coordinates to find two interacting pairs of parallel lines (one pair of parallel lines crossing the other and establishing a parallelogram-shaped object), which are considered a plate candidate. Since there are relatively few (black) pixels in the contour lines, transforming these points to Hough coordinates requires much less computation; hence, the speed of the algorithm is improved significantly without a noticeable loss of accuracy (Edge agent + Contour agent + LPL agent). However, some images may include other objects such as glasses, head lights and decorative items. These objects may also have the shape of two interacting pairs of parallel lines and are therefore falsely detected as plate candidates. To reject such incorrect candidates, we implement an agent for evaluating whether a candidate is a plate or not (the PCV Agent). The LPR and LPL agents are validated by the PCV Agent (Figure 1). Then the PCV and Vehicle agents cooperate to detect a vehicle. The conclusions obtained by the PCV and Vehicle agents may contradict each other; all such contradictions are resolved with if-then-else rules. For detecting the wheels, the Symmetry and Wheel agents must cooperate, and at this stage the Contour and Wheel agents play the same role. If the outcomes obtained by the Wheel and Contour agents contradict each other, these contradictions are likewise resolved with if-then-else rules.
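The actual if-then-else rules are not listed in the paper, so the fragment below is purely hypothetical; it only illustrates the kind of rule that could reconcile contradictory conclusions of the PCV and Vehicle agents.

```python
def resolve_pcv_vehicle_conflict(pcv_confirms_plate, vehicle_agent_confirms_vehicle):
    """Hypothetical if-then-else reconciliation of PCV and Vehicle agent hypotheses."""
    if pcv_confirms_plate and vehicle_agent_confirms_vehicle:
        return "vehicle detected"
    elif pcv_confirms_plate:
        return "re-run Vehicle agent inside the plate's window of interest"
    elif vehicle_agent_confirms_vehicle:
        return "accept vehicle; plate may be occluded or missing"
    else:
        return "no vehicle"
```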
VI. EXPERIMENTAL RESULTS
We tested our system on images captured in various outdoor scenes for vehicle detection. The data set includes 600 images, divided into 250 images of the front or back side of a vehicle, 250 images of the next (lateral) side of a vehicle and 100 images without any vehicle. Table 1 shows the system performance on the various images, and Figure 11 shows the vehicles detected by our system in four images; in Figure 11(d) two vehicles were not detected. The experiments show that our system is effective in detecting vehicles in various outdoor scenes; these results also show that agents with different image processing behaviors can extract complementary information from the image.
VII. CONCLUSION
The main contribution of this work is a new architecture for vehicle detection using a vision-based multi-agent system. We showed the efficiency of the system's different agents. Our experience with the current system has been that it is easy to improve and modify individual agents and to add new agents to the system. In general, control over the image processing task is also very satisfactory. Evaluation results show that the system can detect various vehicles in various scenes.

Table 1. System performance on the various images
Image set   Camera position                    Number of images   Correct accuracy   Error rate
A           Front or back side of vehicle      250                91.2%              8.8%
B           Next (lateral) side of vehicle     250                89.2%              10.8%
C           Non-car scene                      100                90%                10%
A+B+C                                          600                90.16%             9.84%

REFERENCES
[1] Margrit Betke, Esin Haritaoglu, Larry S. Davis, “Real-time multiple vehicle detection and tracking from a moving vehicle,” Machine Vision and Applications, vol. 12, no. 2, pp. 69-83, 2000.
[2] Rahul Sukthankar, “RACCOON: A Real-time Autonomous Car Chaser Operating Optimally at Night,” IEEE Intelligent Vehicles ‘93 Symposium, pp. 37-42, July 1993.
[3] Thomas Zielke, Michael Brauckmann and Werner von Seelen, “Intensity and Edge-Based Symmetry Detection with an Application to Car-Following,” Computer Vision and Image Understanding, vol. 58, issue 2, pp. 177-190, September 1993.
[4] A. Broggi, P. Cerri and P.C. Antonello, “Multi-Resolution Vehicle Detection using Artificial Vision,” IEEE Intelligent Vehicles 2004 Symposium, pp. 310-314, June 2004.
[5] X. Clady, F. Collange, F. Jurie and P. Martinet, “Cars Detection and Tracking with a Vision Sensor,” IEEE Intelligent Vehicles 2003 Symposium, pp. 593-598, June 2003.
[6] Miguel Angel Sotelo and Jose E. Naranjo, “Vision-based Adaptive Cruise Control for Intelligent Road Vehicles,” Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 64-69, September 2004.
[7] D. Crevier, R. Lepage, “Knowledge-based image understanding systems: A survey,” Computer Vision and Image Understanding 67 (2) (1997) 161-185.
[8] C. Spinu, C. Garbay, J. Chassery, “A cooperative and adaptive approach to medical image segmentation,” in: M.S.P. Barahona, J. Wyatt (Eds.), Lecture Notes in Artificial Intelligence, Springer, Berlin, 1995, pp. 379-390.
[9] V. Rodin, F. Harrouet, P. Ballet, J. Tisseau, “oRis: Multiagents Approach For Image Processing,” in: H. Shi, P.C. Coffield (Eds.), SPIE Conference on Parallel and Distributed Methods for Image Processing II, Vol. 3452, SPIE, San Diego, CA, 1998, pp. 57-68.
[10] J. Liu, Y. Tang, “Adaptive image segmentation with distributed behavior-based agents,” IEEE Trans. Pattern Anal. Mach. Intell. 21 (6) (1999) 544-551.
[11] R. Clouard, A. Elmoataz, C. Porquet, M. Revenu, “Borg: A knowledge-based system for automatic generation of image processing programs,” IEEE Trans. Pattern Anal. Mach. Intell. 21 (2) (1999) 128-144.
[12] E. G. P. Bovenkamp, J. Dijkstra, J. G. Bosch and J. H. C. Reiber, “Multi-agent segmentation of IVUS images,” Pattern Recognition, Volume 37, Issue 4, April 2004, Pages 647-663.
[13] Kamat, V.; Ganesan, S.; “An efficient implementation of the Hough transform for detecting vehicle license plates using DSPs,” Real-Time Technology and Applications Symposium, 1995. Proceedings, 15-17 May 1995, Page(s): 58-59. Digital Object Identifier 10.1109/RTTAS.1995.516201.
[14] Tran Duc Duan; Duong Anh Duc; Tran Le Hong Du; “Combining Hough transform and contour algorithm for detecting vehicles’ license-plates,” Intelligent Multimedia, Video and Speech Processing, 2004. Proceedings of 2004 International Symposium on, 20-22 Oct. 2004, Page(s): 747-750. Digital Object Identifier 10.1109/ISIMP.2004.1434172.
[15] Jung, C.R.; Schramm, R.; “Rectangle detection based on a windowed Hough transform,” Computer Graphics and Image Processing, 2004. Proceedings, 17th Brazilian Symposium on, 17-20 Oct. 2004, Page(s): 113-120.
[16] SamYong Kim; Se-Young Oh; JeongKwan Kang; YoungWoo Ryu; Kwangsoo Kim; Sang-Cheol Park; KyongHa Park; “Front and rear vehicle detection and tracking in the day and night times using vision and sonar sensor fusion,” Intelligent Robots and Systems, 2005 (IROS 2005). 2005 IEEE/RSJ International Conference on, 2-6 Aug. 2005, Page(s): 2173-2178. Digital Object Identifier 10.1109/IROS.2005.1545321.
[17] Lam, W.C.Y.; Yuen, S.Y.; “Efficient technique for circle detection using hypothesis filtering and Hough transform,” Vision, Image and Signal Processing, IEE Proceedings, Volume 143, Issue 5, Oct. 1996, Page(s): 292-300.
[18] Gupta, A.; Prasad, V.S.N.; Davis, L.S.; “Extracting Regions of Symmetry,” Image Processing, 2005. ICIP 2005. IEEE International Conference on, Volume 3, 11-14 Sept. 2005, Page(s): 133-136. Digital Object Identifier 10.1109/ICIP.2005.1530346.
[19] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Trans. Pattern Anal. and Machine Intell., vol. 22, no. 8, pp. 888-905, 2000.
Using Attacks Ontology in Distributed Intrusion Detection System F. Abdoli1, 2, M. Kahani2 1.Ferdowsi University of Mashhad, 2.Communication and Computer Research Center [email protected], [email protected]
Abstract- In this paper we discuss utilizing methods and techniques of the semantic web in Intrusion Detection Systems. We study the use of an ontology in a Distributed Intrusion Detection System for extracting semantic relations between computer attacks and intrusions. We used the Protégé software to build an ontology specifying computer attacks and intrusions. Our Distributed Intrusion Detection System is a network containing several systems, each with an individual Intrusion Detection System, and a special central system that contains our proposed attacks ontology. Every time any system detects an attack or a new suspected situation, it sends a detection report to the central system. With this ontology the central system can extract the semantic relationships among computer attacks and suspected situations in the network; it can therefore decide about them better and consequently reduce the false positive and false negative rates of Intrusion Detection Systems.
I. INTRODUCTION
Since intrusion detection was introduced in the mid-1980s, intrusion detection systems have been developed for almost twenty years to enhance computer security. High false negative and false positive rates prevent intrusion detection systems from being used practically. Over these years computer scientists have attempted to solve this problem and reduce these false rates. They have used many methods and techniques to improve these systems, such as data mining [13], state transition diagrams [12], clustering [11], classification [10] and neuro-fuzzy methods [14], attempting to reduce the false rates and increase reliability. The techniques and methods that semantic webs use to achieve their goals, like the concepts of “content” and “ontology”, can be used in many fields of computer science. In 2002 Victor Raskin and his team opened a new field in information security: they discussed using the concept of “ontology” in information security [4] and its advantages, describing ontology as an extremely promising new paradigm in this field.
They say that by using an ontology we have a strong classification tool for unlimited events. Every method that provides information security and uses the concept of “content” for its tasks can utilize the methods and techniques of the semantic web; Intrusion Detection Systems are an example. Some research has been done in this field in recent years, trying to improve Intrusion Detection Systems by utilizing semantic web methods and techniques such as the concept of “ontology”. The main goal of this paper is to utilize these methods to improve Intrusion Detection Systems and reduce their false rates. The remainder of this paper is organized as follows. Section 2 presents related work on using semantic web methodology in Intrusion Detection Systems. Section 3 presents our proposed model, the proposed ontology and the standard format that we use for communication in our network; at the end of that section we discuss the advantages and disadvantages of our proposed system. Section 4 is about future work, and the last section concludes.
II. Related work
Using semantic web methodology in Intrusion Detection Systems is young; the first research in this area was done in 2003 [1], [2]. So far little research has been done in this domain, and the existing works use ontology in different ways, which we discuss below. The first research in this area was done by Jeffrey Undercoffer and his team in 2003 [1], [2]; they produced an ontology specifying a model of computer attack. Their ontology is based upon an analysis of over 4,000 classes of computer intrusions and their corresponding attack strategies, and it is categorized according to the system component targeted, means of attack, consequence of attack and location of the attacker. They argue that any taxonomic characteristics used to define a computer attack should be limited in scope to those features that are observable
and measurable at the target of the attack. They present their model as a target-centric ontology. Moreover, specifying an ontological representation decouples the data model defining an intrusion from the logic of the intrusion detection system. The decoupling of the data model from the Intrusion Detection System logic enables non-homogeneous Intrusion Detection Systems to share data without a prior agreement as to the semantics of the data. The second work presents an ontology to describe relationships among features observed by multiple sensors. There are two kinds of nodes in the ontology: value nodes and attribute nodes. By assigning weights to the edges between value nodes and their parent attribute nodes, they provide a more flexible matchmaking method for intrusion detection. At the same time, the relationship between attribute nodes and their parents can indicate the locality of the desired information. An ontology-based cooperative detection function is also given in this work [6]. The third work describes an ontology-supported outbound intrusion detection architecture that organizes agents into execution sub-environments called agent cells. The peer-to-peer arrangement of the cells provides a robust non-hierarchical agent structure, and the cells themselves constitute a way of dealing with the malicious-host problem. An attacker-centric ontology serves as a common-knowledge layer for all agents. Traffic and process signatures are generated and matched by independent cells that provide full intrusion detection functionality. Correlators fuse diagnoses from multiple cells in order to provide more accurate detection. They state that an ontology for an intrusion detection architecture is necessary to enable more intelligent behavior in agents, to optimize communication content and interpretation, and to give formalism to the way the components of an architecture interact [3], [7]. Another research effort uses a semantic ontology for the security domain of Intrusion Detection Systems and, by utilizing it, integrates raw alarms coming from heterogeneous Intrusion Detection Systems; in fact this method is used for extracting attack scenarios [5]. In the last project the authors propose a novel Breadth and Depth Bayesian classifier and an inference probabilistic algorithm. The inference algorithm is applied over well-defined conceptual information integrated in a hybrid Intrusion Detection System by means of ontologies. They state that the attempt to combine both semantic modeling and probabilistic modeling might be exploited for attack prediction in the Pervasive Computing paradigm [9].
In many areas of Intrusion Detection Systems we can use semantic web techniques, for example in analyzing user behavior and system activities, identifying known attack patterns, and analyzing abnormal behavior and activity of systems. In this paper we utilize the concept of ontology for extracting the semantic relationships among attacks, intrusions and suspect activities that occur in different systems in our network, and attempt to reduce the false rates of Intrusion Detection Systems.
III. Proposed model
A. Introduction of the proposed model
The main deficiencies of current Intrusion Detection Systems are high false negative and false positive rates. What is the reason? Occasionally Intrusion Detection Systems make mistakes in their detections, and sometimes the inefficient detection is partly caused by insufficient audit data. Many Intrusion Detection Systems depend on only one kind of source: network data or host data. However, many intrusions show their character in both of these data sources. For these reasons we can extend the analysis and investigation scope and utilize more than one system for detecting intrusions and suspect activities in the network. In this research our goal is to present an Ontology-based Distributed Intrusion Detection System (ODIDS). In our ODIDS we have several different systems, each of which can have its own Intrusion Detection System, depending on its requirements, so each can perform its detection process individually. We also have a central system, which is equipped with our proposed ontology; with this ontology the central system can extract the semantic relationships among attacks, intrusions and suspect activities that occur in different systems in the network. In this network each system is a single node, regardless of its Intrusion Detection System. Whenever a node encounters new attacks, intrusions or suspect activities in its local system, it analyzes the situation and then sends its detection report to the central system. Before sending a report to the central system, it must convert the report to the standard format that is understandable by the central system. Every time the central system receives a report from another system, it studies the detected situation and finds its status in the existing ontology. In the next part of this section we discuss the proposed ontology.
After analyzing the detected situation in the current ontology, the central system can reach one of these results:
• The detected situation is part of an attack which occurred on other nodes.
• The detected situation is the same as an attack which occurred on other nodes.
• The detected situation is a kind of attack which occurred on other nodes.
• The detected situation is associated with an attack which was detected on other nodes.
• There is no similarity or relation between the detected situation and other situations or attacks detected on the other nodes.
After studying the status of the detected situation in the current ontology, and with due attention to the obtained result, the central system sends the necessary alarm to the related systems so that they can take suitable action; if necessary, the central system can also update its ontology. In our network we have systems with different kinds of Intrusion Detection Systems, and they must communicate with the central system. The key problem for the central system is how to understand and correlate different kinds of information and reports in order to evaluate them. In this paper we present a standard format for communication between the central system and the other systems in the network; we discuss this issue in part C. Generally, using a standard format has many advantages. For example, in a distributed system, using a standard format for communicating and sending data and information yields high information portability, and utilizing the concept of metadata for messages facilitates the interpretation of data and information and increases compatibility with the information available in the system; we also gain the capability to reuse information in our system [15], [16]. In this paper, by using a standard format for message communication we reduce the message size and increase the semantic load of messages; therefore the central system can analyze and interpret messages coming from other systems and reuse their information.
B. Proposed Ontology
There are several ways to build an ontology for a special domain. For example, we can reuse an old ontology that is available in that domain, rebuilding and completing it. Another way is to use an available taxonomy in that domain and build the related ontology based on that taxonomy [18]. Because the use of the concept of ontology in the intrusion detection domain is young, there are few ontologies in this domain, and those ontologies are neither comprehensive nor suitable for our purpose. For this reason we chose the second way to build our proposed ontology. For this purpose we use the taxonomy proposed by Simon Hansman and Ray Hunt in 2005 [17]. Their proposed taxonomy consists of four dimensions which provide a holistic taxonomy to deal with inherent problems in the computer and network attack field. The first dimension covers the attack vector and the main behavior of the attack; in this dimension attacks can be categorized into the following groups: viruses, worms, buffer overflows, denial of service attacks, network attacks, password attacks, Trojans, etc. The second dimension allows classification of the attack targets; it gives the target of an attack, for example hardware or software. Vulnerabilities are classified in the third dimension and payloads in the fourth. For designing our proposed ontology we use the Protégé software, which is free and open source and is available on the web [20]. For attack instances we use the KDD Cup 99 database [21].
Figure 1. High Level Illustration of the Proposed Ontology
Figure 1 presents a high-level graphical illustration of our proposed ontology. The designed ontology has two main classes: the Vector class and the Target class. These classes have their own subclasses and branches. For example, each instance of the Vector class can be categorized into one of these subclasses: Viruses, Worms, Buffer overflow, Denial of service attacks, Network attacks, Password attacks and Trojans (Figure 2). Each of these subclasses has its own subclasses that facilitate attack reasoning through the proposed ontology; for example, through the subclasses of the Viruses class the proposed ontology can more easily find an attack that destroyed files. Figure 3 illustrates the Viruses class and the relations between its subclasses; each subclass has an “a kind of” relation to the main class (the Viruses class). Every time the central system receives a report from another system, it considers the properties of the attack or detected situation and, if necessary, updates its ontology and sends alarms to the related system or systems. Whenever the central system receives a report of an attack from another system, it updates its ontology in consideration of the attack properties; therefore if it later receives the same report as a suspected behavior from another system, it can easily identify the attack, and vice versa.
Figure 3. Illustration of the Viruses class
Figure 2. Illustration of the Vector class
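As a minimal sketch of the class skeleton implied by Figures 1-3, the fragment below uses Python inheritance to stand in for the “a kind of” relation between the Vector subclasses and their parent; the deeper subclass is hypothetical, and the real ontology was built in Protégé rather than in code.

```python
# Top-level classes of the proposed ontology and the attack-vector subclasses named in the text.
class Vector: ...
class Target: ...

class Viruses(Vector): ...
class Worms(Vector): ...
class BufferOverflow(Vector): ...
class DenialOfService(Vector): ...
class NetworkAttacks(Vector): ...
class PasswordAttacks(Vector): ...
class Trojans(Vector): ...

class FileDestroyingVirus(Viruses):
    """Hypothetical deeper subclass: the kind of node that lets the central system
    reason about, e.g., an attack that destroyed files."""
```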
C. Standard format for sending messages
In our proposed Intrusion Detection System, every time a system in our network detects a new suspected situation, it sends a report to the central system; and every time the central system updates the ontology or detects a new attack, it sends a suitable message to the other systems in the network. Also, every system has its own Intrusion Detection System, suitable for its requirements; therefore the reports sent to the central system may differ from each other, depending on the methodology used for intrusion detection. There are different types of Intrusion Detection Systems, such as those based on signatures, anomalies, correlation, network monitoring, host monitoring, or application monitoring. The standard format design must attempt to accommodate these diverse approaches by concentrating on conveying “what” an Intrusion Detection System has detected rather than “how” it detected it. Hence we define a standard message format for sending messages over our network, and every system, before sending a message or report to the central system, must convert it to this standard format, which is understandable by the central system. The message header in our standard message format contains [19]: <Message type>, <Sender>, <Receiver>.
<Message type>: the type of the message can be observed from this part of the header. The message type can be:
• A report of a suspected situation or a newly detected attack in one of the systems in our network.
• An alarm from the central system to the other systems in our network; in this paper we have five kinds of alarm.
<Sender> and <Receiver>: every system in our network can be the sender or the receiver of a message. Sometimes the message receiver is not a single system; for example, the central system can send a message to all systems or to a special group of systems in our network.
The main part of the message in our standard message format is the data. For example, the data part of a report received by the central system contains fields and information that are suitable and usable for locating the newly detected situation in the ontology. This part of the standard message format contains fields that give the IDS type of the sender system and the properties of the detected situation and its signature in the sender system; this signature contains details about the detected situation such as the protocol used, the intruder method, etc. Alarm messages are sent by the central system to the other systems; their message body (data) is about the central system's detections. Here we have five types of alarms (a sketch of the message structure follows this list):
• The detected situation is part of an attack which occurred on other nodes; in this case the alarm is sent to all the systems on which parts of the attack have been detected.
• The detected situation is the same as an attack which occurred on other nodes. In this case the message body contains the attack and failure properties, and the alarm is sent to the system on which the situation was detected.
• The detected situation is a kind of attack which occurred on other nodes. In this case the message body contains the attack and failure properties, and the alarm is sent to the system on which the situation was detected.
• The detected situation is associated with an attack or a detected situation from other nodes; in fact they are connected to each other. In this case the alarm is sent to all the systems that are present in the attack.
• There is no similarity or relation between the detected situation and other situations or attacks detected on the other nodes. In this case the message contains no alarm and is only for notification.
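The paper does not give a concrete syntax for the standard message format, so the sketch below is only one possible rendering of the described header (<Message type>, <Sender>, <Receiver>), data part and five alarm kinds; all field names are assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional

class MessageType(Enum):
    REPORT = "report"   # suspected situation or new attack detected by a node
    ALARM = "alarm"     # one of the five alarm kinds sent by the central system

class AlarmKind(Enum):
    PART_OF_ATTACK = 1
    SAME_AS_ATTACK = 2
    KIND_OF_ATTACK = 3
    ASSOCIATED_WITH_ATTACK = 4
    NO_RELATION = 5     # notification only

@dataclass
class IDSMessage:
    message_type: MessageType
    sender: str
    receivers: List[str]                                      # one node, a group, or all nodes
    ids_type: str = ""                                        # e.g. signature- or anomaly-based (reports)
    signature: Dict[str, str] = field(default_factory=dict)   # protocol, intruder method, ... (reports)
    alarm_kind: Optional[AlarmKind] = None                    # set only for alarm messages
```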
D. Advantages and disadvantages
Our proposed ODIDS can be criticized as below:
• single point of failure
• time wasting
The central analyzer is a single point of failure: if an intruder can somehow prevent it from working (for example, by crashing or slowing down the host where it runs), the whole network will be unprotected. But in our network every system has its own Intrusion Detection System; with this property we mitigate the single point of failure. The other criticism of our system is time wasting. Making connections between the central system and the other systems and sending and receiving messages among them requires additional time; in this paper we use a standard message format for these messages in order to reduce this overhead. The central system may also need considerable time for processing and extracting the semantic relationships among the attacks and new intrusions. To address this problem, we can improve our proposed ontology to facilitate its processing and knowledge extraction, and we can also improve the hardware of the central system. In our network every system effectively utilizes two Intrusion Detection Systems: its own, and the ontology-based decision making on the central system. This characteristic is an important property of our network, and we expect it to reduce the false rates. In addition, when a node finds a new attack or suspected situation using its own Intrusion Detection System, the central system can help it and send a suitable alarm. Because the central system extracts the semantic relationships among attacks, a special situation on one system may not be an attack by itself, but together with another situation occurring on another system it may form an attack; or a situation may be unknown to the local Intrusion Detection System, which cannot decide whether it is an attack or not, while the central system finds that a similar situation detected on another system of the network was an intrusion. Therefore, by studying this ontology the central system can easily find whether a special situation is misuse behavior or not, and so the false
negative and false positive rates of the detection system are reduced.
IV. Future work
Our future work will focus on improving the proposed attack ontology in the intrusion detection domain; that is, we want to extend it to contain more attacks. We will also try to find a new approach to speed up our proposed Distributed Intrusion Detection System.
V. Conclusion
This paper introduced a novel Distributed Intrusion Detection System that uses a special attack ontology to find semantic relationships among attacks and intrusions. Every system in our network has its own Intrusion Detection System, suitable for its requirements, and also follows the central system's alarms; therefore we can reduce the false rates of our proposed system. We used the Protégé software to design our proposed ontology. Because each node in our network uses an individual Intrusion Detection System, we mitigate the single point of failure in our network. In addition, we use a standard message format for sending and receiving messages in our network.
REFERENCES
[1] J. Undercoffer, A. Joshi, J. Pinkston, “Modeling Computer Attacks: An Ontology for Intrusion Detection”, Springer, pp. 113-135, 2003.
[2] J. Undercoffer, A. Joshi, T. Finin, and J. Pinkston, “A target centric ontology for intrusion detection: using DAML+OIL to classify intrusive behaviors”, Knowledge Engineering Review, Cambridge University Press, pp. 23-29, January 2004.
[3] S. Mandujano, A. Galván, J. A. Nolazco, “An Ontology-based Multiagent Architecture for Outbound Intrusion Detection”, 3rd ACS/IEEE International Conference on Computer Systems and Applications, AICCSA ‘05, vol. 1, pp. 120-128, Cairo, Egypt, January 2005.
[4] V. Raskin, C. Helpenmann, K. Triezenberg, and S. Nirenburg, “Ontology in information security: a useful theoretical foundation and methodological tool”, New Security Paradigms Workshop, ACM Press, pp. 53-59, Cloudcroft, NM, 2001.
[5] W. Yan, E. Hou, N. Ansari, “Extracting and querying network attack scenarios knowledge in IDS using PCTCG and alert semantic networks”, IEEE International Conference, 2005.
[6] H. Yanxiang, C. Wei, Y. Min and P. Wenling, “Ontology Based Cooperative Intrusion Detection System”, Network and Parallel Computing, Springer, 2004.
[7] S. Mandujano, “An Ontology-supported Intrusion Detection System”, Taiwanese Association for Artificial Intelligence, 2005.
[8] M. Klaus, “IDS - Intrusion Detection System”, 2005.
[9] T. Anagnostopoulos, C. Anagnostopoulos, S. Hadjiefthymiades, “Enabling attack behavior prediction in ubiquitous environments”, Pervasive Services, ICPS ‘05, 2005.
[10] J. Gomez, D. Dasgupta, “Evolving Fuzzy Classifiers for Intrusion Detection”, Proceedings of the 2002 IEEE Workshop on Information Assurance, United States Military Academy, West Point, NY, June 2001.
[11] Y. Guan, A. Ghorbani and N. Belacel, “Y-means: A Clustering Method for Intrusion Detection”, Proceedings of the Canadian Conference on Electrical and Computer Engineering, Montreal, Quebec, Canada, May 4-7, 2003.
[12] K. Ilgun, R.A. Kemmerer, and P.A. Porras, “State Transition Analysis: A Rule-Based Intrusion Detection Approach”, IEEE Transactions on Software Engineering, 21(3), March 1995.
[13] W. Lee, S.J. Stolfo, K. Mok, “A data mining framework for building intrusion detection models”, Proceedings of the IEEE Symposium on Security and Privacy, pp. 120-132, 1999.
[14] M. Mohajerani, A. Morini, M. Kianie, “NFIDS: A Neuro-Fuzzy Intrusion Detection System”, IEEE, 2003.
[15] L.R. Lait, E.R. Nash, P.A. Newman, “The df - A proposed data format standard”, NASA Goddard Space Flight Center, Mar 1, 1993.
[16] 09-Ashbindu-GEAS_19 October - The advantage of standard format alerts. www.oasis-open.org/events/ITU-T
[17] S. Hansman, R. Hunt, “A taxonomy of network and computer attacks”, Computers & Security, Elsevier, 24 (2005), 31-43.
[18] D.L. McGuinness, “Ontologies Come of Age”, Spinning the Semantic Web, 2003.
[19] Y. Du, H. Wang, Y. Pang, “Design of A Distributed Intrusion Detection System Based on Independent Agents”, IEEE, 2004.
[20] http://protege.stanford.edu
[21] kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
Predicting Effectively the Pronunciation of Chinese Polyphones by Extracting the Lexical Information Feng-Long Huang, Shu-Yu Ke, Qiong-Wen Fan Department of Computer Science and Information Engineering National United University No. 1, Lienda, Miaoli, Taiwan, R. O. C. 36003 {flhuang,U9324029,U9324016}@nuu.edu.tw
Abstract One of the difficult tasks in Natural Language Processing (NLP) is the sense ambiguity of characters or words in text, such as polyphones, homonymy and homography. This paper addresses the ambiguity issues of Chinese polyphonic characters and disambiguation approaches for them. The Sinica ASCED is used as the dictionary for matching Chinese words. Furthermore, we propose two voting schemes, preference and winner-take-all scoring, for solving these issues. The approach of unifying several methods is discussed in order to obtain better performance. The final precision rate of the experimental results reaches 92.72%. Keywords: Word Sense Disambiguation, Language Model, Voting Scheme.
I. Introduction
In recent years, natural language processing (NLP) has been studied and applied in many fields, such as machine translation, speech processing, lexical analysis, information retrieval, spelling prediction, handwriting recognition, and so on [1][2]. In computational models, syntactic parsing, word segmentation and the generation of statistical language models have been focal tasks. In general, no matter what kind of natural language, there is always a phenomenon of ambiguity among characters or words in sentences, such as polyphones, homonymy, homographs, and combinations of them; these issues are so-called sense ambiguity [3][4]. Resolving word sense disambiguation (WSD) can alleviate the problems of ambiguity in NLP. This paper addresses dictionary matching, N-gram language models and a voting scheme, which includes two scoring methods (preference and winner-take-all), to retrieve Chinese lexical knowledge. The lexical information is employed to perform WSD on Chinese polyphonic characters. There are nearly 5700 frequent unique characters, and among them nearly 1000 characters have 2 or more different pronunciations; these are the polyphonic characters. Table 2 shows the 10 such characters, selected randomly, that are discussed in this paper. The remainder of the paper is organized as follows: related works on WSD are presented in Section II; the proposed methods are described in Section III; the experimental results are listed and then analyzed in Section IV; the conclusions and future work are given in the last section.
II. Related works
Resolving word sense ambiguity automatically can enhance language understanding, which is useful in several fields, such as information retrieval, document categorization, grammar analysis, speech processing and text preprocessing. In past decades, ambiguity issues were always considered AI-complete. With the generation of large amounts of machine-readable text, WSD has become one of the important tasks in NLP. The approaches [10] to WSD are categorized as follows:
1) Machine-Readable Dictionaries (MRD): relying on the word information in a dictionary for disambiguation.
2) Computational lexicons: employing lexical information, such as the well-known WordNet [11], which contains lexical clues of characters and a lattice among related characters.
3) Corpus-based methods: using features such as part-of-speech (POS), frequency and location of characters and words [12].
There are many works addressing WSD, and several methods have been proposed so far. Because of the unique features of the Chinese language, such as Chinese word segmentation, more than two different features are used to achieve a higher prediction rate; therefore, two methods are further combined.
III. Description of Proposed Methods
In this paper, several methods are proposed to disambiguate Chinese polyphonic characters: dictionary matching, language models and a voting scheme.
3.1 Dictionary Matching
In order to predict the pronunciation category of polyphones correctly, dictionary matching is exploited for the ambiguity issue. Within a Chinese sentence, the location of the polyphonic character Cp is
set as the centre, and we extract the right and left substrings based on the centre Cp; the two substrings are denoted CHL and CHR. Within a window, all possible substrings of CHL and CHR are segmented and matched against the lexicons in the dictionary. If words exist in both substrings, we decide the pronunciation of the polyphone by the priority of longest word and highest word frequency: length of word first, frequency of word second. In this paper the window size is 6 Chinese characters, i.e., LEN(CHL) = LEN(CHR) = 6. The Chinese dictionary is publicly available and contains nearly 130K Chinese words (zhong1 wen2 ci2, 中文詞). Each Chinese word may be composed of 2 to 12 Chinese characters (zhong1 wen2 zi4, 中文字). Every word in the dictionary carries its frequency, POS, and pronunciation (Juu4 yin1 fu2 hau4, 注音符號), which decides the correct pronunciation of the polyphonic character within that word. The algorithm of dictionary matching is described as follows:
step 1. Read in the sentence containing a polyphone Cp.
step 2. Based on the location of Cp, segment and extract all possible substrings of CHL and CHR within the window (size = 6), and compare them with the lexicons in the Chinese dictionary.
step 3. If any word is found in both substrings, go to step 4; otherwise go to step 5.
step 4. Decide the pronunciation of the polyphone by the priority of longest word first and highest word frequency second. The process ends.
step 5. The pronunciation of the polyphone Cp is decided by the subsequent methods; that is, the next method starts to disambiguate it.

3.2 Language Models - LMs
In recent years, statistical language models have been widely used in NLP. Suppose W = w1, w2, w3, ..., wn, where wi denotes the ith Chinese character in a sentence (0 < i <= n) and n the number of characters. Using the chain rule,
P(W) = P(w1, w2, ..., wn) = P(w1) P(w2|w1) P(w3|w1 w2) ... P(wn|w1 ... wn-1) = ∏k=1..n P(wk | w1 ... wk-1),    (1)
where w1 ... wk-1 denotes the string w1, w2, w3, ..., wk-1.
In formula (1) we calculate the probability starting from w1, using the substring w1, w2, w3, ..., wk-1 to predict the occurrence probability of wk. For longer strings, a large amount of corpus is necessary to train a language model with good performance, which costs much labor and time. In general, unigram, bigram and trigram (N <= 3) models [5][6] are generated. An N-gram model calculates P(.) of the Nth event from the preceding N-1 events rather than from the whole string w1, w2, w3, ..., wN-1. In short, an N-gram is the so-called (N-1)th-order Markov model, which calculates the conditional probability of successive events: the probability of the Nth event given that the preceding N-1 events occur. Basically, the N-gram language model is expressed as follows:
P(W) ≈ ∏k=1..n P(wk | wk-N+1 ... wk-1).    (2)
N = 1: unigram, or zero-order Markov model. N = 2: bigram, or first-order Markov model. N = 3: trigram, or second-order Markov model.
In formula (2), the relative frequency is used for calculating P(.):
P(wk | wk-N+1 ... wk-1) = C(wk-N+1 ... wk) / C(wk-N+1 ... wk-1),    (3)
where C(w) denotes the count of event w occurring in the training data. The probability P(.) obtained from formula (3) is called the Maximum Likelihood Estimation (MLE). When predicting the pronunciation category, we predict based on the probability of each category t (1 <= t <= T), where T denotes the number of categories of the polyphonic character. The category with the maximum probability Pmax(.) is the target, and the correct pronunciation of the polyphonic character is then decided accordingly.
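A minimal sketch of the count-based MLE in Eqs. (2)-(3) for the bigram case (N = 2) follows. In the paper's setting one such model would be trained per pronunciation category of a polyphone; the tokenised-sentence data layout assumed here is only illustrative.

```python
from collections import defaultdict

def train_bigram(sentences):
    """Collect unigram and bigram counts C(.) from tokenised training sentences."""
    unigram, bigram = defaultdict(int), defaultdict(int)
    for tokens in sentences:
        for i, w in enumerate(tokens):
            unigram[w] += 1
            if i > 0:
                bigram[(tokens[i - 1], w)] += 1
    return unigram, bigram

def bigram_mle(prev, word, unigram, bigram):
    """Eq. (3) with N = 2: P(word | prev) = C(prev, word) / C(prev)."""
    return bigram[(prev, word)] / unigram[prev] if unigram[prev] else 0.0

def sentence_prob(tokens, unigram, bigram):
    """Eq. (2) with N = 2: product of bigram probabilities over the sentence."""
    p = 1.0
    for i in range(1, len(tokens)):
        p *= bigram_mle(tokens[i - 1], tokens[i], unigram, bigram)
    return p
```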
3.3 Voting Scheme
In contrast to the N-gram models above, we propose a voting scheme with a concept similar to elections in human society. Basically, each voter votes for one candidate, and the candidate with the maximum number of votes is the winner. In the real world more than one candidate may win an election, while in the disambiguation process only one category of the polyphone will be the final target with respect to the pronunciation. The voting scheme can be described as follows: each token in the sentence plays the role of a voter and votes for its favorite candidate based on the probability calculated from the lexical features of the tokens. The total score S(W) accumulated from all voters for each category is obtained, and the candidate category with the highest score is the final winner. In this paper there are two voting methods:
1) Winner Take All:
In this voting method, the probability is calculated as follows:
Pt(w) = Ct(w) / C(w),    (4)
where C(w) denotes the occurrences of token w in the training corpus and Ct(w) denotes the occurrences of token w in category t. In formula (4), Pt(w) is regarded as the probability of w on category t. In winner-take-all scoring, the category with the maximum probability wins the ticket: it wins one ticket (score 1) while all other categories are assigned no ticket (score 0). Therefore each voter has just one ticket to vote with. The voting rule is as follows:
score_t(w) = 1, if Pt(w) is the maximum over all categories; score_t(w) = 0, for all other categories.    (5)
Based on formula (5), the total score for each category can be accumulated over all tokens in the sentence:
S(W) = P(w1) + P(w2) + P(w3) + ... + P(wn) = Σi=1..n P(wi).    (6)
2) Preference Scoring:
The other voting method is called preference. For a token in the sentence, the sum of the probabilities over all the categories of a polyphonic character equals 1. Let us show an example (E1) for the two voting methods. As presented in Table 1, the polyphonic character 卷 has three different pronunciations: 1 ㄐㄩㄢˋ, 2 ㄐㄩㄢˇ and 3 ㄑㄩㄢˊ. Suppose the occurrences of the token 白卷 (blank examination) in these categories are 26, 11 and 3, with a total of 40. The score for each category under the two scoring methods can then be calculated.
教育社會方面都繳了白卷 (E1)
Government handed over a blank examination paper in education and society.
Table 1: Example of the two scoring schemes.
count
preference
w-t-all
1 ㄐㄩㄢˋ
26
26/40=0.65
40/40=1
2 ㄐㄩㄢˇ
11
11/40=0.275
0/40=0
3 ㄑㄩㄢˊ
3
3/40=0.75
0/40=0
Total ∑
40
1 score
1 score
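The two scoring rules can be sketched directly from Table 1. The function names below are ours; the printed values reproduce the 白卷 example (with 3/40 = 0.075).

```python
def preference(counts):
    """Preference scoring: each token splits its single vote in proportion to the counts."""
    total = sum(counts.values())
    return {cat: c / total for cat, c in counts.items()}

def winner_take_all(counts):
    """Winner-take-all (Eq. 5): the most frequent category gets the whole ticket."""
    best = max(counts, key=counts.get)
    return {cat: (1.0 if cat == best else 0.0) for cat in counts}

# Occurrences of the token 白卷 in the three categories of 卷 (Table 1).
counts = {"1 ㄐㄩㄢˋ": 26, "2 ㄐㄩㄢˇ": 11, "3 ㄑㄩㄢˊ": 3}
print(preference(counts))       # {'1 ㄐㄩㄢˋ': 0.65, '2 ㄐㄩㄢˇ': 0.275, '3 ㄑㄩㄢˊ': 0.075}
print(winner_take_all(counts))  # {'1 ㄐㄩㄢˋ': 1.0, '2 ㄐㄩㄢˇ': 0.0, '3 ㄑㄩㄢˊ': 0.0}
```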
3.4 Unknown events - zero count
As shown in Eq. (4), C(.) of a novel event, which does not occur in the training corpus, may be zero because the training data are limited while language is infinite; it is always hard to collect sufficient data. The potential issue of MLE is that the probability of unseen events is exactly zero. This is the so-called zero-count problem. It is obvious that a zero count will
lead to a zero probability P(.) in Eqs. (3) and (4). There are many smoothing approaches [7][8][9]. This paper adopts additive discounting to calculate p* as follows:
p* = (c + δ) N / (N + Bδ),    (7)
where δ denotes a small value (δ < 0.5) which is added to all known and unknown events. This smoothing method alleviates the zero-count issue in the language model.
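A small sketch of the add-δ discounting follows. It returns the normalised probability (c + δ)/(N + Bδ), which is the reconstructed Eq. (7) divided by N; δ = 0.1 is an assumed setting (δ < 0.5).

```python
def additive_smoothing(count, total, vocab_size, delta=0.1):
    """Add-delta smoothing: every known and unknown event receives a small extra count delta."""
    return (count + delta) / (total + vocab_size * delta)

# An unseen event (count 0) still receives a small non-zero probability.
p_unseen = additive_smoothing(count=0, total=10000, vocab_size=5000, delta=0.1)
```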
3.5 Classifier - predicting the categories
Suppose a polyphone has T categories, 1 <= t <= T. How can we predict the correct target t̂? As shown in formula (8), the category with the maximum probability or score is the most probable target:
t̂ = argmax_t Pt(W), or t̂ = argmax_t St(W),    (8)
where Pt(W) is the probability of W in category t, which can be obtained from Eq. (1) for the LMs, and St(W) is the total score based on the voting scheme from Eq. (6).
IV. Experiment Results
In this paper, 10 Chinese polyphones are selected randomly from the more than 1000 polyphones in the Chinese language. All the possible pronunciations are listed in Table 2 and Table 3: four characters each with 2 or 3 categories, and one character each with 4 or 5 categories, respectively.
4.1 Dictionary and Corpus
The Academia Sinica Chinese Electronic Dictionary (ASCED) contains more than 130K Chinese words composed of 2 to 11 characters. Each word in ASCED is annotated with part-of-speech (POS), frequency and the pronunciation of each character. The experimental data are collected from the Sinica corpus and from China Times news. Sentences containing one of the 10 polyphone characters are collected randomly. There are 9070 sentences in total, divided into two parts: 8030 (88.5%) sentences for training and 1040 (11.5%) sentences for outside testing.
4.2 Experiment Results
Three LMs are generated: unigram, bigram and trigram. The Precision Rate (PR) is defined as:
PR = (number of correctly predicted sentences) / (total number of testing sentences).    (9)
Method 1: Dictionary Matching
The predicted results are shown in Table 3. 69 sentences are processed by the matching phase and 7 of them are predicted incorrectly, so the PR reaches 89.86%. Several examples are presented in the following.

我們回頭看看中國人的歷史。 (E2)
We look back at the history of the Chinese people.

Based on the matching algorithm, the two substrings CHL and CHR of the polyphone 中 for (E2) are:
CHL = "們回頭看看中",
CHR = "中國人的歷史".
Upon word segmentation, the Chinese words and pronunciations are as follows:
CHL: 看中 (83) ㄓㄨㄥˋ
CHR: 中國 (3542) ㄓㄨㄥ; 中國人 (487) ㄓㄨㄥ
According to the priority of the longest word first, 中國人 (Chinese people) decides the pronunciation of 中 as ㄓㄨㄥ.

看中文再用廣東話猜發音。 (E3)
Read the Chinese and then guess the pronunciation in Cantonese.
CHL: 看中 (83) ㄓㄨㄥˋ
CHR: 中文 (343) ㄓㄨㄥ

峰迴路轉再看中國方面. (E4)
The path winds along mountain ridges; then look again at the China side.
CHL: 看中 (83) ㄓㄨㄥˋ
CHR: 中國 (3542) ㄓㄨㄥ

中央研究院未來的展望。 (E5)
The future prospects of Academia Sinica.
CHL: no segmented word
CHR: 中央 (2979) ㄓㄨㄥ; 中央研究院 (50) ㄓㄨㄥ
In example (E5), only CHR contains segmented words; there is no word in CHL.

Method 2: Language Models (LMs)
The experimental results of the three models (unigram, bigram, trigram) are listed in Table 4. Bigram reaches 92.58%, the highest among the three models.

Method 3: Voting Scheme
1) Winner take all: three models (unitoken, bitoken and tritoken) are generated, as shown in Table 5. Bitoken achieves the highest PR of 90.17%.
2) Preference: three models (unitoken, bitoken and tritoken) are generated, as shown in Table 6. Bitoken achieves the highest PR of 92.72%.

4.3 Word 2007 Precision Rate
MS Office is a famous and widely used package around the world. MS Word 2007 was run on the same testing sentences; the PR is shown in Table 7 and reaches 89.8% on average. In the following, two examples are shown for correct and wrong prediction by Word 2007:

教育社會方面都繳了白卷 (ㄐㄧㄠˋ ㄩˋ ㄕㄜˋ ㄏㄨㄟˋ ㄈㄤ ㄇㄧㄢˋ ㄉㄡ ㄐㄧㄠˇ ˙ㄌㄜ ㄅㄞˊ ㄐㄩㄢˋ)
The government handed over a blank examination paper in education and society. (correct prediction)

傍若無人般自言自語 (ㄅㄤˋ ㄖㄨㄛˋ ㄨˊ ㄖㄣˊ ㄅㄢ ㄗˋ ㄧㄢˊ ㄗˋ ㄩˇ)
Talking to oneself as if nobody is around. (wrong prediction)

4.4 Results Analysis
In this paper, preference, winner take all and language models are proposed. We compare these methods with MS Word 2007. Preference bitoken achieves the highest PR among these models and reaches 92.72%. It is apparent that the proposed methods are all superior to Word 2007.

V. Conclusion
In this paper, we address the ambiguity of Chinese polyphones. Three methods are proposed for disambiguation: dictionary matching, language models and a voting scheme; the last has two scoring methods, winner take all and preference. Compared with the well-known MS Word 2007, our methods are superior and reach a 2.92% higher precision rate. When the dictionary matching method is applied first, the results shown in Table 8 do not demonstrate that the combination is more effective.
In the future, several research directions should be pursued:
• Collecting more corpora and extending the proposed methods to other Chinese polyphones.
• Using more lexical features, such as location and semantic information, to enhance the precision rate.
• Improving the smoothing techniques.
References
[1] Yan Wu, Xiukun Li and Caesar Lun, "A Structural-Based Approach to Cantonese-English Machine Translation", Computational Linguistics and Chinese Language Processing, Vol. 11, No. 2, June 2006, pp. 137-158.
[2] Brian D. Davison, Marc Najork and Tim Converse, "SIGIR Workshop Report", Vol. 40, No. 2, 2006.
[3] F. Oliveira, F. Wong and Y.-P. Li, "An Unsupervised & Statistical Word Sense Tagging Using Bilingual Sources", Proceedings of the 2005 International Conference on Machine Learning and Cybernetics, Vol. 6, 18-21 Aug. 2005, pp. 3749-3754.
[4] E. Agirre and P. Edmonds, Word Sense Disambiguation: Algorithms and Applications, Springer, 2006.
[5] D. Jurafsky and J. H. Martin, Speech and Language Processing, Prentice Hall, 2000.
[6] Jui-Feng Yeh, Chung-Hsien Wu and Mao-Zhu Yang, "Stochastic Discourse Modeling in Spoken Dialogue Systems Using Semantic Dependency Graphs", Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, 2006, pp. 937-944.
[7] Stanley F. Chen and Ronald Rosenfeld, "A Survey of Smoothing Techniques for ME Models", IEEE Transactions on Speech and Audio Processing, Vol. 8, No. 1, January 2000, pp. 37-50.
[8] K. W. Church and W. A. Gale, "A Comparison of the Enhanced Good-Turing and Deleted Estimation Methods for Estimating Probabilities of English Bigrams", Computer Speech and Language, Vol. 5, 1991, pp. 19-54.
[9] Stanley F. Chen and Joshua Goodman, "An Empirical Study of Smoothing Techniques for Language Modeling", Computer Speech and Language, Vol. 13, 1999, pp. 359-394.
[10] Nancy Ide and Jean Véronis, "Word Sense Disambiguation: The State of the Art", Computational Linguistics, 24(1), 1998, pp. 1-41.
[11] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross and Katherine J. Miller, "WordNet: An On-line Lexical Database", International Journal of Lexicography, 3(4), 1990, pp. 235-244.
[12] Kenneth W. Church and Robert L. Mercer, "Introduction to the Special Issue on Computational Linguistics Using Large Corpora", Computational Linguistics, 19(1), 1993, pp. 1-24.
Table 2: The 10 Chinese polyphone characters, their phonetics and meanings.
target | phonetics | word | English | han yu pin yin
中 | ㄓㄨㄥ | 中心 | center | zhong xin
中 | ㄓㄨㄥˋ | 中毒 | poison | zhong du
乘 | ㄔㄥˊ | 乘法 | multiplication | cheng fa
乘 | ㄕㄥˋ | 大乘 |  | da sheng
乾 | ㄍㄢ | 乾淨 | clean | gan jing
乾 | ㄑㄧㄢˊ | 乾坤 | the universe | qian kun
了 | ㄌㄜ˙ | 為了 | in order to | wei le
了 | ㄌㄧㄠˇ | 了解 | understand | liao jie
傍 | ㄆㄤˊ | 傍邊 | beside | pang bian
傍 | ㄅㄤ | 傍晚 | nightfall | bang wan
傍 | ㄅㄤˋ | 依山傍水 | near the mountain and by the river | yi shan bang shui
作 | ㄗㄨㄛˋ | 工作 | work | gong zuo
作 | ㄗㄨㄛ | 作揖 |  | zuo yi
作 | ㄗㄨㄛˊ | 作興 |  | zuo xing
著 | ㄓㄜ˙ | 忙著 | busy | mang zhe
著 | ㄓㄠ | 著急 | anxious | zhao ji
著 | ㄓㄠˊ | 著想 | to bear in mind the interest of | zhao xiang
著 | ㄓㄨˋ | 著名 | famous | zhu
著 | ㄓㄨㄛˊ | 執著 | inflexible | zhuo
卷 | ㄐㄩㄢˋ | 考卷 | a test paper | kao juan
卷 | ㄐㄩㄢˇ | 卷髮 | curly hair | juan fa
卷 | ㄑㄩㄢˊ | 卷曲 | curl | quan qu
咽 | ㄧㄢ | 咽喉 | the throat | yan hou
咽 | ㄧㄢˋ | 吞咽 | swallow | tun yan
咽 | ㄧㄝˋ | 哽咽 | to choke | geng ye
從 | ㄘㄨㄥˊ | 從事 | to devote oneself | cong shi
從 | ㄗㄨㄥˋ | 僕從 | servant | pu zong
從 | ㄘㄨㄥ | 從容 | calm; unhurried | cong rong
從 | ㄗㄨㄥ | 從橫 | in length and breadth | zong heng
ps: the last column gives the han yu pin yin (Hanyu Pinyin) romanization.

Table 3: Dictionary matching; total processed sentences of outside testing and precision rate (PR).
 | 中 | 乘 | 乾 | 了 | 傍 | 作 | 著 | 卷 | 咽 | 從 | total/avg.
total sentences | 33 | 4 | 0 | 0 | 0 | 16 | 11 | 1 | 0 | 4 | 69
error no. | 2 | 0 | 0 | 0 | 0 | 1 | 3 | 1 | 0 | 0 | 7
precision rate | 93.94 | 100 | -- | -- | -- | 93.75 | 72.73 | 0 | -- | 100 | 89.86
ps: -- denotes that no sentence is processed by matching.
Table 4: Avg. PR of outside testing with the language models.
 | 中 | 乘 | 乾 | 了 | 傍 | 作 | 著 | 卷 | 咽 | 從 | avg.
unigram | 95.88 | 86.84 | 92.31 | 70.21 | 85.71 | 96.23 | 75.32 | 100 | 98 | 91.67 | 89.98
bigram | 96.75 | 84.21 | 96.15 | 85.11 | 92.86 | 94.34 | 81.17 | 96.30 | 100 | 93.52 | 92.58*
trigram | 80.04 | 57.89 | 61.54 | 58.51 | 78.57 | 52.83 | 60.39 | 62.96 | 88 | 71.30 | 70.50
PS: * best PR among the three models.
Table 5: Avg. PR of outside testing with winner-take-all scoring.
 | 中 | 乘 | 乾 | 了 | 傍 | 作 | 著 | 卷 | 咽 | 從 | avg.
unitoken | 96.96 | 84.21 | 80.77 | 57.45 | 71.43 | 94.34 | 58.44 | 85.19 | 84 | 87.04 | 84.69
bitoken | 96.75 | 86.84 | 96.15 | 79.79 | 85.71 | 92.45 | 68.83 | 100 | 98 | 93.52 | 90.17*
tritoken | 79.83 | 60.53 | 61.54 | 60.64 | 78.57 | 52.83 | 59.74 | 66.67 | 88 | 71.3 | 70.69
PS: * best PR among the three models.
Table 6: Avg. PR of outside testing with preference scoring.
 | 中 | 乘 | 乾 | 了 | 傍 | 作 | 著 | 卷 | 咽 | 從 | avg.
unitoken | 96.96 | 84.21 | 80.77 | 70.21 | 71.43 | 94.34 | 70.13 | 85.19 | 88 | 87.96 | 87.76
bitoken | 96.75 | 86.84 | 96.15 | 87.23 | 85.71 | 93.40 | 81.17 | 100 | 98 | 93.52 | 92.72*
tritoken | 80.04 | 60.53 | 61.54 | 60.64 | 78.57 | 52.83 | 59.74 | 66.67 | 88 | 71.30 | 70.78
PS: * best PR among the three models.
Table 7: Avg. PR of outside testing with MS Word 2007.
 | 中 | 乘 | 乾 | 了 | 傍 | 作 | 著 | 卷 | 咽 | 從 | avg.
Word 2007 | 93.37 | 76.47 | 76.67 | 83.65 | 78.57 | 93.70 | 78.33 | 82.76 | 100 | 91.51 | 89.80
Table 8: Employing dictionary matching first, then adopting preference bitoken scoring.
 | 中 | 乘 | 乾 | 了 | 傍 | 作 | 著 | 卷 | 咽 | 從 | avg.
precision rate | 96.75 | 86.84 | 96.15 | 87.23 | 85.71 | 93.40 | 81.82 | 96.30 | 98 | 93.52 | 92.72
Ps: the average PR is the same as that of the preference bitoken scoring in Table 6.
MiniDMAIC: An Approach for Causal Analysis and Resolution in Software Development Projects
Márcia G. S. Gonçalves et al.
Abstract. Handling problems and defects in software development projects is still a difficult matter in many organizations. The problem analyses, when performed, usually do not focus on the problems' sources and root causes. As a result, bad decisions are taken and the problem is not solved, or it can even be aggravated by rework, dissatisfaction and cost increases due to lack of quality. These difficulties make it hard for organizations that adopt the CMMI model to implement the Causal Analysis and Resolution (CAR) process area in software projects, as projects usually have to deal with very limited resources. In this context, this work proposes an approach, called MiniDMAIC, for analyzing and resolving defect and problem causes in software development projects. The MiniDMAIC approach is based on Six Sigma's DMAIC methodology and the Causal Analysis and Resolution process area from CMMI Level 5.
1. Introduction
The search for better products and services leads many companies to research and adopt a large number of techniques, tools and models that can improve their products' quality. In this context, initiatives such as CMMI [1] and Six Sigma [2] are seen as solutions for cost reduction and quality improvement. Achieving the highest CMMI maturity level demands much effort and dedication from everyone in the organization. One of the biggest challenges is the use of the Causal Analysis and Resolution (CAR) process area in software projects, since analyzing and solving problems is a difficult and costly activity and projects usually have very limited resources.
The DMAIC methodology provides a systematic process that defines how problems should be addressed. While grouping some of the main quality tools, it also defines a standardized routine for problem resolution that has proved effective in many software organizations. Many organizations that want to achieve higher CMMI maturity levels have used Six Sigma/DMAIC in combination with CMMI practices. The benefits of using the Six Sigma methodology together with CMMI are very clear, but most of the time the implementation of the Causal Analysis and Resolution process area in software projects is not feasible for the following reasons:
• DMAIC projects usually last from three to six months, whereas projects require a faster resolution of their problems.
• DMAIC projects can become too expensive due to the intensive use of statistical tools. For software projects, the obtained benefits may be less than the cost of achieving the improvements. Projects also have more limited resources, which makes this issue even more critical.
• A DMAIC team must be highly qualified in statistics. Software development projects, on the other hand, benefit more from other abilities, such as project management and business domain knowledge.
• DMAIC uses complex tools, with a high implementation cost and a relatively slow return on investment, which makes it infeasible for most projects.
In this context, this work presents a DMAIC-based methodology, called MiniDMAIC, to implement the Causal Analysis and Resolution CMMI process area in software development projects. The goal of this methodology is to reduce the disadvantages of using DMAIC listed above.
2. Six Sigma and the DMAIC Methodology
Six Sigma is a methodology that has been largely used and has brought many benefits to the companies that have adopted it. The goal of Six Sigma is to reduce or eliminate errors, defects and failures in a process. It also aims at reducing process variability and can be applied in most sectors of economic activity [3]. To achieve Six Sigma means reducing defects, errors and failures(1) to zero. The methodology combines a rigorous statistical focus with a set of tools that are used to characterize the sources of variability in order to control and improve process results [5]. This work focuses on one of the main Six Sigma methodologies: DMAIC (Define, Measure, Analyze, Improve and Control).
(1) In Six Sigma, defects, errors and failures are any deviation of a characteristic that may cause client dissatisfaction [4].
According to Tayntor [2], DMAIC is the most used Six Sigma tool and also the most suitable for software development. The DMAIC methodology consists of five phases. In the Define phase, the problem is identified and the available opportunities for solving it are determined according to the customer's requirements. In the Measure phase, the current status is verified through quantitative performance measurements so that decisions can be taken based on facts, not assumptions. In the Analyze phase, the causes of the current performance are determined and the improvement opportunities are analyzed. The improvement opportunities are implemented in the Improve phase. In the Control phase, performance control of the implemented process is used to ensure the effectiveness of the improvement. Before the first phase (Define) is started, the improvement project must be selected. The appropriate selection of an improvement project may have a huge impact on the organization's business, as its process can become much more efficient in a three- to six-month period [6]. According to Rotondaro [7], two essential aspects in project selection are the existence of a strong relation between the project and a customer-specified requirement, and a clear economic advantage. In addition to these aspects, Pande [8] defines three other characteristics that make a problem suitable for a DMAIC project:
• There is a gap between the actual and the necessary or desired performance.
• The cause of the problem is not clear or not understood.
• There is no predetermined solution and there is no apparent optimal solution.
3. Causal Analysis and Resolution in CMMI The Capability Maturity Model Integration (CMMI) [9] is a maturity model for product development processes. Many organizations have adopted CMMI because it guides the implementation of continuous improvements to the development process through its five maturity levels: initial, managed, defined, quantitatively managed and optimizing.
3.1. Maturity Level 5 The focus of maturity level five of CMMI is the continuous process improvement. At level four, the focus is on special causes of process variation while at level five the organization addresses common causes of
process variation and improves the process. Measures are used to select improvements, estimate the costs and benefits of the proposed improvements and justify improvement efforts [10]. The improvements must be introduced in a disciplined and ordered way so that process stability is managed and ensured. Level 5 is composed of two process areas: Organizational Innovation and Deployment (OID) and Causal Analysis and Resolution (CAR). In the present work, only the CAR process area will be detailed, as it is the focus of the presented approach.
3.2. Causal Analysis and Resolution The goal of this process area (Causal Analysis and Resolution – CAR) is to identify defects and other problems causes and to take actions to prevent their occurrence in the future. There is a strong assumption that the processes must be stable so that CAR can be applied, as the effect evaluation practice requires a process performance and capability change validation. Table 1 presents the relationship between specific goals (SG) and their respective specific practices (SP) for this process area.
Table 1. Causal Analysis and Resolution in CMMI [1]
SG 1 - Determine Causes of Defects: SP 1.1 Select Defect Data for Analysis; SP 1.2 Analyze Causes
SG 2 - Address Causes of Defects: SP 2.1 Implement the Action Proposals; SP 2.2 Evaluate the Effect of Changes; SP 2.3 Record Data
The effective implementation of CAR practices demands an organization with a quantitatively understood and managed process.
4. MiniDMAIC
MiniDMAIC is a strategy that simplifies the DMAIC model in order to address problem causes and resolutions in software development projects in a faster and more practical manner, reducing risks and costs, preventing future recurrences and implementing process improvements to increase customer satisfaction. The proposed approach is based on the steps defined by Tayntor [2] for the DMAIC model in a software development context. The main characteristics of MiniDMAIC are:
• Short duration (1 to 6 weeks) and low cost;
• Requires only basic statistics knowledge;
• Associated with risks;
• Specific to software development projects.
The problems that need to be addressed in a more rigorous manner and require a MiniDMAIC approach can be defined at the organizational level and then refined during project planning. The difference between problems that require a simple and immediate action and problems that are better addressed by a MiniDMAIC approach must be clear. Examples of project problems that are suitable for the MiniDMAIC approach:
• The project is out of control, considering critical customer requirement indicators and deviations from the organization's goals (e.g., productivity, schedule deviation, defect density).
• Problems whose root cause is uncertain.
• Problems related to acceptance criteria.
Before initiating a MiniDMAIC effort, the measurements to be performed must be defined based on the critical customer requirements. It may be necessary to tailor the organizational measurements or to define new measurements for the project. Plans for collecting, analyzing, reporting and storing the measurement results must be established according to the defined process for the Measurement and Analysis (MA) process area. Appropriate tools, such as spreadsheets or project management tools, can be used to implement the methodology's steps.
The next sections detail the MiniDMAIC phases and their respective steps. The phases must be strictly followed so that the work is performed on the problem's actual causes and not on its consequences or symptoms. Sometimes aspects are not considered because of lack of time or judgment mistakes; this leads to incorrect decisions, and the problems usually occur again in the future. In order to avoid this, a phase should begin only after the end of its predecessor. This approach allows a better comprehension of the processes, which makes problem resolution and process improvement easier.

4.1 Define
The Define phase consists of the definition of the problem, the sources of information and the goals, and the allocation of the team. The steps of this phase are detailed in Table 2.

Table 2. Steps of the "Define" phase
Step 1 - Define the problem: The problem to be addressed must be defined so that its relevance is clearly stated and the goals of the effort are determined. Similar problems that have already been addressed in other projects should be identified; this may bring knowledge and agility to the MiniDMAIC effort. It is important to describe the impact or consequences of the problem for the project. This description should focus on symptoms, not on causes or solutions.
Step 2 - Determine the problem source: This step identifies the information source that revealed the problem. Examples of sources in a software development project are: customer satisfaction surveys; customer claims; project indicators; benchmarking research; quality audits. The organization may have a predefined list of problem sources.
Step 3 - Establish goals: In this step, the goals of the MiniDMAIC effort are defined. The goals should be quantitatively defined.
Step 4 - Allocate the team: In a MiniDMAIC effort there is no need for a Black Belt leader. As the problems are simpler and directly related to the project, basic knowledge about Six Sigma and a MiniDMAIC training should be enough. For a MiniDMAIC project, domain and management knowledge matter most; therefore, it is appropriate that the project manager be the MiniDMAIC leader. The size of the MiniDMAIC team can vary according to the complexity of the problem to be addressed; in some situations, the project manager and one other project team member should be enough. Of course, other people can take part in specific steps.

4.2 Measure
The Measure phase consists of analyzing the measurements that are related to the problem and calculating the current Sigma Level. Table 3 details the steps of the Measure phase.

Table 3. Steps of the "Measure" phase
Step 1 - Conduct measurements: Measurements that provide further evidence about the problem to be addressed must be conducted in this step. Usually, the measurements will already be in progress according to the defined process for the Measurement and Analysis process area. Brainstorming and results from other projects or other organizations may be useful inputs.
Step 2 - Calculate current Sigma Level: In this step, the current Sigma Level for the problem to be addressed must be calculated. This step can be omitted if it is not possible to calculate the current Sigma Level (e.g., for some customer claims) or due to lack of resources.
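The paper does not spell out how the Sigma Level is obtained; one widely used Six Sigma convention derives it from the defect rate (defects per million opportunities, DPMO) with the customary 1.5-sigma shift. The sketch below follows that convention purely as an illustration; the function name and the example figures are assumptions, not part of MiniDMAIC.

# Illustrative sketch: converting a defect rate to a Sigma Level using the common
# DPMO convention with a 1.5-sigma shift. The numbers are invented examples and
# the convention itself is an assumption, not something defined by MiniDMAIC.
from statistics import NormalDist

def sigma_level(defects, opportunities, shift=1.5):
    """Sigma Level corresponding to the observed defects per million opportunities."""
    dpmo = defects / opportunities * 1_000_000
    return NormalDist().inv_cdf(1 - dpmo / 1_000_000) + shift

# e.g. 8 defective work products found in 400 inspected opportunities
print(round(sigma_level(8, 400), 2))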
4.3 Analyze
The Analyze phase consists of defining the problem category, determining its causes and potential actions for addressing them, and evaluating the risks associated with the problem's root cause, as shown in Table 4.

Table 4. Steps of the "Analyze" phase
Step 1 - Define the problem category: In this step, the problem category must be defined. The organization may have a predefined list of categories to facilitate this activity, standardize the work of the participants and speed up searches of the organizational knowledge repository. Some examples of categories are processes, technology and people.
Step 2 - Determine the problem causes: This is one of the most important steps in MiniDMAIC because its purpose is to identify the problem's root causes. If this activity is not correctly performed, the MiniDMAIC effort may not be effective, as all the following steps are based on this one. People with deep knowledge about the problem must join the MiniDMAIC team to provide information on the problem causes. Examples of techniques for determining the problem causes are brainstorming, cause-and-effect diagrams and the five whys method.
Step 3 - Determine possible actions to address the problem causes: In this step, a brainstorming session should be performed to find possible actions to address the problem. This step will be complemented in the next phase, when these actions and any other actions that may be identified will be prioritized.
Step 4 - Evaluate risks: In this step, the associated risks will be evaluated. This evaluation must comply with the defined process for the Risk Management (RSKM) process area.

4.4 Improve
The Improve phase consists of prioritizing and obtaining commitment to the proposed actions, elaborating and executing the action plans, and monitoring their results. Table 5 details the steps of the Improve phase.

Table 5. Steps of the "Improve" phase
Step 1 - Prioritize proposed actions: In the Analyze phase, some actions have already been proposed; however, in this step new actions can still arise. The prioritization of the proposed actions must be conducted according to the defined process for the Decision Analysis and Resolution (DAR) process area, which establishes criteria for selecting the proposed solutions. If only one action is proposed to address the problem, a feasibility analysis of its execution must be conducted.
Step 2 - Obtain commitment: Before the next steps are performed, an impact analysis of the implementation of the proposed actions must be conducted. This analysis summarizes the positive and negative impacts of the proposed actions. The information must be submitted to high-level management for approval.
Step 3 - Elaborate and execute the action plan: After the actions are prioritized and approved, an action plan for their execution will be elaborated. This plan must contain the following information: the tasks to be performed; the person responsible for each task; the estimated effort to perform each task; and the due date for each task. The plan is finished when all its tasks have been performed.
Step 4 - Monitor results: In this step, the results achieved through the execution of the tasks will be monitored. This provides visibility of the problem resolution process. The results must be monitored according to the defined process for the Project Monitoring and Control (PMC) process area.

4.5 Control
The Control phase consists of calculating the final Sigma Level, evaluating the results, and reporting the results and lessons learned. Table 6 details the steps of the Control phase.

Table 6. Steps of the "Control" phase
Step 1 - Calculate final Sigma Level: In this step, the final Sigma Level will be calculated in order to analyze the achieved improvement. In case the current Sigma Level has not been calculated in the Measure phase, this step does not need to be performed.
Step 2 - Evaluate results: During the evaluation of the achieved results, it must be analyzed whether the goal defined in the Define phase has been achieved. The cost/benefit of using MiniDMAIC for addressing the problem must be analyzed as well.
Step 3 - Report main results and lessons learned: At the end of the MiniDMAIC effort, the achieved results must be reported to the whole organization. These results may be useful as inputs for addressing similar problems in other projects. Before this communication, however, it is necessary to verify the existence of confidential information that must not be made available outside the project. The communication must be done according to the defined process for the Organizational Process Focus (OPF) process area, which establishes how the organization's lessons learned should be shared. When potential improvements to the organizational process are identified, they must be sent to the Engineering Process Group (EPG) so that they can be analyzed and handled properly.
5. MiniDMAIC and CAR
For a better understanding of the relationship between MiniDMAIC and the Causal Analysis and Resolution (CAR) process area, a mapping between MiniDMAIC steps and CAR specific practices is presented in Table 7. The existence of a relationship between a step and a practice does not mean that the practice is completely addressed by MiniDMAIC, as the practices and their subpractices apply at both the project and organizational levels, while MiniDMAIC is specific to projects.

Table 7. Relationship between MiniDMAIC steps and CAR specific practices
Phase | MiniDMAIC (Steps) | CAR (Specific Practices) | Comments
Define | Step 1 - Define the problem | SP 1.1 - Select Defect Data for Analysis | Related to process area Quantitative Project Management
Define | Step 2 - Determine the problem source | - |
Define | Step 3 - Establish goals | - |
Define | Step 4 - Allocate team |  | Related to GP 2.7 - Identify and Involve Relevant Stakeholders
Measure | Step 1 - Conduct measurements |  | Related to process area Quantitative Project Management
Measure | Step 2 - Calculate current Sigma Level |  |
Analyze | Step 1 - Define the problem category | SP 1.2 - Analyze Causes |
Analyze | Step 2 - Determine the problem causes | SP 1.2 - Analyze Causes |
Analyze | Step 3 - Determine possible actions to address the problem causes | SP 1.2 - Analyze Causes |
Analyze | Step 4 - Evaluate risks | SP 1.2 - Analyze Causes |
Improve | Step 1 - Prioritize proposed actions | SP 2.1 - Implement the Action Proposals |
Improve | Step 2 - Obtain commitment |  | Related to GP 2.10 - Review Status with Higher Level Management
Improve | Step 3 - Elaborate and execute action plan | SP 2.1 - Implement the Action Proposals |
Improve | Step 4 - Monitor results | SP 2.2 - Evaluate the Effect of Changes | Related to process area Project Monitoring and Control
Control | Step 1 - Calculate final Sigma Level |  |
Control | Step 2 - Evaluate results | SP 2.2 - Evaluate the Effect of Changes | Related to process area Project Monitoring and Control
Control | Step 3 - Report main results and lessons learned | SP 2.3 - Record Data |

6. Final Considerations
For decades, organizations have used a large number of different tools and methodologies for addressing problems, with varying effectiveness. However, none of them aims specifically at problems related to software development projects. The goal of the MiniDMAIC approach proposed in this work is to fill this gap. Besides this, the proposed approach can make the implementation of the CMMI Causal Analysis and Resolution process area less complex and less expensive, which can make the adoption of this process area more feasible for software development projects. MiniDMAIC can help organizations achieve higher maturity levels, increase their customers' satisfaction and reduce process variation in their search for operational excellence. However, the use of MiniDMAIC is not enough to comply with the whole Causal Analysis and Resolution process area, as it is not a goal of MiniDMAIC to address organizational problems. In those cases, the use of DMAIC would be more suitable.

7. References
[1] CMMI-DEV. “CMMI for Development”, V1.2 model, CMU/SEI-2006-TR-008, Software Engineering Institute, 2006. [2] Tayntor, C. B., “Six Sigma Software Development”, Flórida, Auerbach, 2003. [3] Smith, B.; Adams, E., “LeanSigma: advanced quality”, Proc. 54th Annual Quality Congress of the American Society for Quality, Indianapolis, Indiana, 2000. [4] Blauth, R., “Seis Sigma: uma estratégia para melhorar resultados”, Revista FAE Business, nº 5, 2003. [5] Watson, G. H., “Cycles of learning: observations of Jack Welch”, ASQ Publication, 2001. [6] Cabrera, Á., “Dificuldades de Implementação de Programas Seis Sigma: Estudos de casos em empresas com diferentes níveis de maturidade”, MsC dissertation, USP-SP, São Carlos, 2006. [7] Rotondaro, G. R., Ramos, A. W., Ribeiro, C. O., Miyake, D. I., Nakano, D., Laurindo, F. J. B, Ho, L. L., Carvalho, M. M., Braz, A. A., and Balestrassi, P. P., “Seis Sigma: Estratégia Gerencial para Melhoria de Processos, Produtos e Serviços”, São Paulo, Atlas, 2002. [8] Pande, S. “Estratégia Six Sigma: como a GE, a Motorola e outras grandes empresas estão aguçando seu desempenho”. Rio de Janeiro, Qualitymark, 2001. [9] Chrissis, M. B., Konrad, M., and Shrum, S., “CMMI: Guidelines for Process Integration and Product Improvement”, 2nd edition, Boston, Addison Wesley, 2006. [10] Kulpa, M. K., and Johnson, K. A., “Interpreting the CMMI: a process improvent approach” Florida, Auerbach, 2003.
Light Vehicle Event Data Recorder Forensics Jeremy S. Daily, Nathan Singleton, Beth Downing, Gavin W. Manes* University of Tulsa 600 S. College Ave. Tulsa, OK 74104 918-631-3056 [email protected] nathan-singleton @utulsa.edu
Digital Forensics Professionals, Inc. 401 S. Boston Ave. Ste. 1701 Tulsa, OK 74103 918-856-5337 [email protected] [email protected]*
*to whom correspondence should be addressed ABSTRACT While traffic crash reconstruction focuses primarily on interpreting physical evidence, the proper generation and preservation of digital data from Event Data Recorders (EDRs) can provide invaluable evidence to crash reconstruction analysts. However, data collected from the EDR can be difficult to use and authenticate, as exemplified through the analysis of a General Motors 2001 Sensing and Diagnostic Module (SDM). Fortunately, advances in the digital forensics field and memory technology can be applied to EDR analysis in order to provide more complete and usable results. This paper presents a developmental model for EDR forensics, centered on the use of existing digital forensic techniques to preserve digital information stored in automobile event data recorders.
KEYWORDS
Automobile Event Data Recorder, Digital Forensics, Evidence Production, Civil Procedure, Crash Reconstruction

1. INTRODUCTION
Vehicle collisions cause significant financial and personal damage on a daily basis. The importance of determining their cause has led to the development of techniques and methods for traffic crash reconstruction. This field comprises the scientific gathering and interpretation of all evidence and the use of analytical tools to determine the events that precipitate a collision. Traditional evidence includes tire marks, vehicle damage, and other position-based measurements. Newer vehicles, however, may augment the traditional evidence with digital evidence. The advent of event data recorders (EDRs) in passenger vehicles has provided digital information concerning traffic accidents and has given crash reconstruction professionals a key analytic tool. Most research concerning EDRs has focused on the use of data coming from the recorder rather than the process of collecting data from them. All methods of retrieving data stored in the EDR vehicle modules are proprietary, either through the use of the Bosch Crash Data Retrieval Tool (CDR) or Hexadecimal Translation Tools (HTT); the former is only usable on some GM, Ford, and partner company automobiles, while the latter are produced by the EDR manufacturers.
Originally, EDRs were included in vehicles to understand "real-world" crash dynamics and diagnose airbag deployments. EDRs may record any of the following: (1) pre-crash vehicle performance data and system status, (2) accelerations during the crash, (3) safety restraint system use, and (4) driver control inputs. In 2003, the National Highway Traffic Safety Administration (NHTSA) estimated that between 67% and 90% of new passenger vehicles were equipped with an EDR [5]. As vehicle and EDR technologies advance, additional data will be collected and new requirements will be mandated by regulatory agencies. While their existence is acknowledged, event data recorders and third party data loggers designed for monitoring and fleet management are beyond the scope of this paper.
1.2 MODERN EVENT DATA RECORDERS
Bosch produces the only third-party tool capable of obtaining and interpreting data stored in EDRs as of 2007. Only cars from GM and Ford have easily obtainable crash data due to those manufacturers' licensing agreements with Bosch. GM has contracted with Bosch to decode the binary data stored in many SDMs from 1994 to the present, and Ford has released its technology to Bosch for use with select vehicles.
Accident information is gathered by placing volatile memory within a data loop through which the information of interest passes. When an accident or other catastrophic event occurs, the computer automatically dumps the last five seconds of data from volatile to solid-state memory, where it can then be downloaded and investigated with proprietary cables and software. A good explanation of this process can be found in Chidester, et al. [1]. The information currently captured by domestic and foreign passenger automobiles varies by the car's make, model, and age. For example, only select GM cars from 1990 to the present contain storage for braking computer and air bag deployment information. Due to the increased inclusion of computers in automobiles for operational purposes, there is a large amount of untapped data that could be collected. Advances in memory technology also allow for greater storage capacities, either by increasing the number of inputs or by collecting data over a larger time span.
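As a purely conceptual illustration of the recording scheme just described (volatile memory in a data loop that is frozen to non-volatile storage when a crash trigger fires), the sketch below models it in a few lines; the buffer length, sample rate and trigger are invented assumptions, not details of any actual EDR.

# Conceptual sketch of an EDR-style circular buffer: samples continuously
# overwrite the oldest data until a trigger event freezes the last few seconds.
# Sizes, rates and the trigger are illustrative assumptions only.
from collections import deque

class EventBuffer:
    def __init__(self, seconds=5, hz=10):
        self.samples = deque(maxlen=seconds * hz)   # ring buffer of recent samples
        self.frozen = None                          # copy persisted at trigger time

    def record(self, sample):
        self.samples.append(sample)                 # oldest sample drops off automatically

    def trigger(self):
        # On a crash trigger, dump the volatile ring buffer to "non-volatile" storage.
        if self.frozen is None:
            self.frozen = list(self.samples)

buf = EventBuffer()
for speed in range(100):
    buf.record({"speed_kph": speed})
buf.trigger()
print(len(buf.frozen), buf.frozen[-1])   # 50 samples, last pre-trigger reading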
When considering the release of event data, concerns exist regarding accuracy, reliability and privacy. Therefore, the Society of Automotive Engineers (SAE) and the Institute for Electrical and Electronics Engineers (IEEE) joined with NHSTA to form working groups to discuss these issues [4]. The majority of the scientific inquiries concerning EDRs have focused on using data once recovered and decoded [2]. Most of the current research in this area has centered on the reliability of data only as it pertains to supporting the physical evidence of the case [6]. 1.3 EVENT DATA RECORDER REGULATIONS The NHTSA’s Title 49 CFR Part 563 defines what constitutes an EDR, standards for information collection and survivability, and accessibility / availability requirements. This rule was originally published August 28, 2006 for comments, and is expected to be finalized in the fourth quarter of 2008. Part 563 is solely concerned with “light-duty vehicles” consisting of models for sale to the general public with a maximum vehicle weight of 8,500 lbs. Initially, the request for comments stated that information contained on the EDRs be made openly available to the public. Due to concerns raised by the manufacturers, it was changed to state the following: Each manufacturer of a motor vehicle equipped with an EDR shall ensure by licensing agreement or other means that a tool(s) is commercially available that is capable of accessing and retrieving the data stored in the EDR that are required by this part. [7] These tool(s) are required for all model year 2011 vehicles, which will begin production starting September 2010. For vehicles equipped with an EDR, this rule defines a minimal data set that must be recorded (see Appendix A). Additionally, Part 563 defines data elements to be stored if the device(s) are capable of recording them, or if certain features are installed in the automobile such as multi-stage frontal airbags. Currently, there are no standards concerning the collection of information stored in other locations, such as the anti-lock brake system (ABS). At the 2007 Highway Vehicle Event Data Recorder Symposium, the NHSTA stated that it would look at other sources of data in vehicles once Part 563 has been finalized. The Society of Automotive Engineers (SAE) is working closely with the American Trucking Association (ATA) to prepare SAE Standard J2728, which contains standards for medium and heavy vehicle EDRs. Much of the J2728 uses the ATA Technology & Maintenance Council Recommended Practice guideline, TCM RP 1214: Guidelines for Event Data Collection, Storage and Retrieval. NHTSA has indicated that they will develop a ruling after completion of light-duty vehicle EDR work to cover medium and heavy vehicles. This ruling is expected to resemble TCM RP 1214 and J2728. 2.0 CRASH DATA RETRIEVAL TOOL (CDR) The CDR is currently the only publicly available product that has the capability to download data from an EDR. This system was developed in participation with GM, and has licensing agreements with GM, Ford, and [soon] Chrysler. At
present, the CDR is only capable of downloading data from select vehicles manufactured since 1994.
1994–2007: Buick, Cadillac, Chevrolet, GMC, Hummer, Isuzu, Oldsmobile, Pontiac, Saab, Saturn
2001–2007: Ford, Lincoln, Mercury
TABLE 1 – A brief list of possible CDR supported vehicles. The list of supported vehicles is small, but information can be gathered from vehicles not included on this list through the investigation of persistent memory, which is common in many Electronic Control Modules (ECMs). Many of these ECMs are used by manufacturers for product improvement, and although not centralized, their data can be accessed and can be recovered for use in crash reconstruction. 2.1 USING THE CDR TOOL The connection between the computer and CDR Interface Module is a standard 9-pin RS-232 cable. To connect the Interface Module to the EDR, a special 15-wire cable must be used. The connection to the Interface Module has a standard serial 15-pin connector, but only contains the pins required to make the connection: for most EDRs, two lines are necessary for power and one for signal. The other end of the cable has a specialized connector for the EDR or a vehicle’s OBD-II or DLC port (described later).
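For readers unfamiliar with the plumbing, the host side of such a link is an ordinary serial port. The snippet below is a generic sketch using the pyserial library; the port name, baud rate and timeout are assumptions, since the paper does not state the CDR's serial parameters, and the request bytes are only the example values quoted later in the text.

# Generic sketch of opening an RS-232 link of the kind used between the host PC
# and the CDR Interface Module. Port name, baud rate and timeout are assumed
# values for illustration; they are not specified in the paper.
import serial  # pyserial

def open_cdr_port(port="COM1", baudrate=9600, timeout=2.0):
    return serial.Serial(port=port, baudrate=baudrate,
                         bytesize=serial.EIGHTBITS,
                         parity=serial.PARITY_NONE,
                         stopbits=serial.STOPBITS_ONE,
                         timeout=timeout)

if __name__ == "__main__":
    with open_cdr_port() as link:
        link.write(bytes.fromhex("53564710"))   # example request bytes quoted in the text
        print(link.read(16).hex(" "))           # raw response bytes for later decoding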
Figure 1 – A schematic showing an example CDR/EDR connection. Connecting to the EDR can be accomplished in two ways. In the field, the primary method is to connect through the OBDII or DLC diagnostic ports, typically located under the driver’s side dashboard. This technique is only possible if electrical power to the EDR can be restored. The second method requires direct access to the EDR, and is used if the electrical system has been compromised or if the examination is taking place in a laboratory environment. Although there is no standard location for the EDR, it is typically located inside
the vehicle cabin near the centerline of the vehicle or under/in one of the front seats. Extraction of the EDR typically requires disassembly of the interior. Once the method of connection to the EDR is determined, the CDR Interface Module is connected to the EDR through one of the provided special purpose cables and to the computer via the RS-232 cable. Power is applied to the EDR through a lead attached to the CDR Interface Module. An example CDR/EDR connection is presented in Figure 1.
2.3 CDR AND EDR COMMUNICATION
Initially, the CDR software communicates with the serial communications port, then checks for the presence of a CDR Interface Module and begins a new case. The investigator must then enter case-specific information such as the Vehicle Identification Number (VIN), investigation date, etc. For example, when an EDR from a 2001 GMC Sierra 1500 is connected, the CDR software sends 53 56 47 10, and the EDR returns the code 88 59 91 17 00 00 77. The 91 17 sequence corresponds to the specific SDMG2000 EDR, which is a GM product, where G2000 refers to the version. This EDR variant identifier code is used to determine the type of EDR installed, which dictates the cable to be used, the specific commands needed, and the setup requirements for the CDR Tool. After the EDR model information is obtained, the CDR sends a dump command, which varies according to the EDR model. The dump instruction downloads the nonvolatile memory from the EDR into the CDR Interface Module; when the process is complete, the sequence D0 56 47 93 is sent to the CDR software. At this point, the information is stored in volatile memory on the CDR Interface Module. The CDR software then sends a command to ship approximately half of the downloaded data for processing. This returns the following hexadecimal data:
EF D6 01 91 17 00 00 A7 18 41 53 30 33 34 30 4B
46 33 42 39 32 00 15 76 31 80 A3 A5 A4 F8 AC 00
03 A4 34 80 83 81 85 70 FF 00 FA FA FA FA FA FA
FA FA FA FA FA FA FA FA FF 02 00 00 00 FF FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF 81
The beginning of file, EF D6 01, and the end of file, 81, are not part of the information downloaded by the CDR, but are appended to the data in the CDR Interface Module. After the beginning of the file marker, the code 91 17 appears, which corresponds to the EDR type. The rest of the data is downloaded with sequence EF 5A 01 1F 80 5E 00 B9. For this set, 80 marks the start position and 5E marks the end. The CDR Interface Module sends the following:
EF B4 01 FF FF FF FF FF FF 80 00 00 FF 80 FE FF
BF FF FF FF FF FF FF FF FF FF FF 7C 04 03 01 01
02 00 00 00 00 00 00 00 00 FF FF FF FF FF 0A 10
00 61 70 70 6E 6C 6A 00 80 00 00 73 73 73 73 00
20 20 20 20 20 00 F8 25 FE 00 00 00 04 00 FF FF
FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF 98
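Both halves of the download arrive with the same simple framing: a three-byte begin-of-file marker appended by the Interface Module, the payload, and a one-byte end-of-file marker. The following is a minimal, purely illustrative parser for that framing; it is a sketch based on the description in this section, not an implementation of the Bosch CDR software.

# Illustrative parser for the framed responses described above: a 3-byte
# begin-of-file marker (e.g. EF D6 01 or EF B4 01), a payload, and a trailing
# end-of-file byte (e.g. 81 or 98). This mirrors the description in the text;
# it is not the actual Bosch CDR implementation.
def strip_framing(frame: bytes):
    """Return (marker, payload, end_byte) for one framed response block."""
    if len(frame) < 4 or frame[0] != 0xEF:
        raise ValueError("not a framed CDR response")
    return frame[:3], frame[3:-1], frame[-1]

block = bytes.fromhex("EFD60191170000A71881")
marker, payload, end = strip_framing(block)
print(marker.hex(" "), payload.hex(" "), hex(end))   # payload begins with the 91 17 EDR type code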
Similar to the first portion of the data above, EF B4 01 is used for the beginning of file marker and 98 is the end of file marker, both of which are appended to the data by the CDR Interface Module. The entire retrieval process, beginning with the CDR sending the EDR a dump command, is repeated two more times for a total of three passes. Once the CDR software completes the retrieval process it analyzes the EDR data and generates a report. The report is placed in a temporary file, which will be deleted unless saved prior to exiting the program. 2.4 CDR ANALYSIS, FILES, AND REPORT Once collected, the CDR software analyses the EDR data and generates a report. For example, acceleration data is passed through an algorithm to generate cumulative change in velocity. Once complete, the CDR Tool saves reports as *.CDR or *.pdf. The *.CDR file contains the data displayed in the actual report, formatting data, and error checking data in the form of a hash value and field size counts. Current analysis of the CDR file shows that the CDR data uses a simple file format. A common element in the hex data is the sequence 0D 0A which is used as both a delimiter to separate the fields inside the form and as the Carriage Return/Line Feed. Additionally, hex 20 is used as the space character and to fill fields of set size within the form. Three sources of data are contained within the dump. The User Entered Data includes information such as VIN, investigator’s name, case number, comments, etc. While the user-entered fields “Investigator”, “Case Number”, “Investigation Date”, and “Crash Date” have a maximum of 64 characters, the “Comments” field varies based on the amount of data entered by the user. Preceding the actual data is a data size marker of two bytes. EDR-supplied data size will vary depending on the amount of data within the EDR; therefore, a size marker for the field will be annotated at the beginning of the data. There are also two hash values located in the hex data. The last hex code value is the Reporting Program Verification Number and the Collecting Program Verification Number, both of which are displayed in the CDR report. The hash value that appears toward the middle of the hex is used to ensure the data has not been altered. The actual EDR data from a GMC Sierra 1500 appears later in the file and is preceded by a size field in byte big-endian. Here, DE 00 computes to 222 bytes, which is the number of bytes stored on the EDR including the initial padding of six sets of zeros. The reason for the padding is unknown. 00 00 00 00 00 00 91 17 00 00 A7 18 41 53 30 33 34 30 4B 46 33 42 39 32 00 15 76 31 80 A4 A6 A5 F8 AD 00 03 A4 34 80 84 81 85 70 FF 00 FA FA FA FA FA FA FA FA FA FA FA FA FA FA FF 02 00 00 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
FF FF FF FF FF FF FF FF FF FF 80 00 00 FF 80 FE FF BF FF FF FF FF FF FF FF FF FF FF 7C 04 03 01 01 02 00 00 00 00 00 00 00 00 FF FF FF FF FF 0A 10 00 61 70 70 6E 6C 6A 00 80 00 00 73 73 73 73 00 20 20 20 20 20 00 F8 25 FE 00 00 00 04 00 FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF
Within the file there are two hash values used to verify data. The exact hash functions are unknown. Based on experimentation including attempts at data manipulation and determining the hash functions used, it appears that the data is rehashed every time the CDR Tool is asked to open the document. The new hash would then be compared with the old and if they are not the same, then an error is expunged to the user and the program exits. Although this file format appears to be simple, no public description exists and there is no method to review several of these fields within the CDR software. Furthermore, the reports do not contain key data a digital forensics examiner would come to expect, such as hash values or file verification data. 3. EDRS AS EVIDENCE Current techniques for removing and preserving automobile data do not meet the standards for electronic evidence as outlined by the Federal Rules of Civil Procedure. However, in the past, courts have recognized data from EDRs during trial. The defendant in one of the first trials was General Motors [3], meaning that the tools available from General Motors may have met the requirements. Unfortunately, the only tool available to crash investigators may contain gaps in the binary data record. The presence of any kind of discrepancy or lack of authentication opens the door for acquittals or overturned verdicts. As stated previously, the Bosch/Vetronix CDR is the only publicly available tool to download EDRs. As with many newer systems, there are unresolved issues such as discrepancies in the data between passes and registry numbers being incompletely listed in the output. Since the digital form of the evidence has some issues, the concept of “best evidence” must be used when dealing with actual cases. For an EDR, the best evidence is the first copy of the report generated by the investigator. This means proper evidence handling of the event data recorder with a well documented chain of custody is required. Furthermore, data gathered from the EDR should also corroborate physical evidence. Addressing the accuracy of the data contained in an EDR is beyond the scope of this paper. 3.1 MISSING AND UNINTERPRETED DATA The information presented to the user by the CDR is difficult to understand and presents some issues with data consistency. The report includes downloaded data in hexadecimal format, as well as the presumed registry numbers the data has been pulled from. The registries from the aforementioned 2001 GMC Sierra 1500 contain 6 bytes of data, but others have less. There is no reason given for this difference in the
associated documentation, and there appear to be missing segments of these registers in the program. It is unclear whether the registers simply do not contain any information (which seems unlikely due to the lack of the FF sequence), or even more unlikely they do not exist in the EEPROM. Another issue with the presentation of information is that the printout also displays data which is not converted by the program. It is likely that this data is proprietary manufacturer information (i.e. deployment thresholds) and constitutes what is presumed to be the beginning of the data. However, when the data transfer is monitored with a sniffer, discrepancies appear between the different passes. The system’s method of choosing or inserting data is unknown, making it difficult to determine the effectiveness and repeatability of these results. 3.2 DATA COLLECTION STANDARDS In comparing the hexadecimal data from the three data dump passes, discrepancies were noted between each pass in the form of differing bytes. If multiple downloads were performed in short order (each download taking three to five minutes) the number of discrepancies increased. These changed bytes have been found in the first portion of the download from the CDR Interface Module. Examination shows the values of the bytes to be random and not a counter as might be expected. For example in six runs of the software the first two runs showed the same data being sent by the CDR Interface Module. However, there were discrepancies when comparing the three passes in each run. The next four runs also contained discrepancies in the hexadecimal values. When the CDR compiles the report, a “final” copy of the hexadecimal value is included. A comparison of the hexadecimal values from the passes and the report for the first two runs is how the discrepancies were discovered. In some cases, the report copy of the hexadecimal code was a match for different portions of one pass. In other cases, the bytes displayed in the report did not match any of the passes. Comparing the last four runs with the first two shows the discrepancies to occur in the different passes and in the report’s displayed hexadecimal data. Additionally, nowhere in the *.CDR file are the actual three passes which the CDR collects from the EDR. This implies that once the data is read from an EDR the CDR software performs any required calculations and presents a “final version” of the EDR data in its report with the original data apparently being deleted. 3.3 MULTIPLE LOCATIONS FOR DATA Vehicles may contain multiple ECMs to control the different higher–level functions of the vehicle (i.e. ABS, Traction Control, and Power Train). Many of these devices may contain data that is invaluable to the investigator. While GM has taken an central approach to recording pre-crash data in the SDM, Ford takes a distributed approach and stores imperative data in two different modules: (1) the Restraint Control Module (RCM) and (2) the Powertrain Control Module (PCM).
Ford’s RCM makes airbag deployment decisions and runs the necessary system diagnostics. In the event of an accident the RCM will typically record acceleration data, cumulative velocity change (Delta V), seatbelt usage, and other restraint related information similar to the GM EDRs. However, Ford records vehicle speed, brake and throttle usage, and related information in a separate module, the PCM, but only on post– 2003 vehicles installed with Electronic Throttle Control (ETC). The primary function of the PCM is to run engine diagnostics while controlling fuel–air mixture and spark. This data was only retrievable from the PCM by Ford until the recent introduction of support in the Bosch CDR tool. The Ford PCM continuously records data in a circular buffer which holds 25 seconds of data recorded in 200 ms intervals. When the 25 second limit is reached the data begins recording over the data starting at the initial memory location. In the event of a crash, data is written for an additional 5 seconds, assuming power has not been interrupted, and then stops. Since the PCM has no accelerometer, it relies on the “RDI_FLG” bit coming from the RCM. If the RDI_FLG bit is set to 1 then an airbag deployment has been commanded. If the RDI_FLG is 0, then the airbag status is not deployed or unknown. The data is locked in memory and cannot be overwritten for a predetermined number of key cycles (or application of power to the unit). However, in the event of a power loss during a crash, all recording in the PCM stops. The data is not locked and can be overwritten upon the next application of power. Because of this, Ford warns investigators that PCM data can be lost if the vehicle ignition cylinder is turned to the on position, so investigators should never attempt to utilize the vehicle’s power system. This presents a challenge for digital forensics specialists since a very detailed accounting of the status of the PCM is required to generate acceptable evidence in court. A complete examination and interpretation of the data contained in the Ford modules is available by sending the modules to Ford (at a cost). The engineers at Ford retrieve the data and send back a report. Currently, authentication of the data, which includes matching the modules to the vehicle, is performed by the evidence technicians. For example, on a MY2005 Ford Crown Victoria, the digital record from the PCM apparently does not contain any vehicle identification information. This may raise the scepter of doubt in a trial setting. [9] 3.4 UNKNOWN METHODS The final data produced in the report is passed through an algorithm prior to being displayed. As stated at the front of the CDR report: Once the crash data is downloaded, the CDR tool mathematically adjusts the recorded algorithm forward velocity data to generate an adjusted algorithm forward velocity change that may more closely approximate the forward velocity change ... The SDM Adjusted Algorithm Forward Velocity Change may not closely approximate what the sensing system experienced in all types of events. There is no description of the algorithm or how it knows what is “more closely approximate” to what the vehicle experienced. A common approach for analyzing the crash
acceleration pulse is to use a modified haversine function. This function closely approximates the accident pulse and is used to help smooth out these curves, but it is unknown whether it is applied here. 3.5 EVIDENCE IDENTIFIER Initially it was thought the CDR software used information from the VIN, World Manufacturer Identifier, Vehicle Attributes, and Model Year (MY) to determine what type and version of EDR was being read. However, it was discovered that any information could be entered as long as the World Manufacturer Identifier corresponded to a manufacturer supported by the CDR and followed the VIN formatting requirements. [8]. Figure 2 shows the breakdown of VIN codes for vehicle manufacturers producing more than 500 cars per year. 1
Figure 2 - VIN Breakdown: positions 1-3 World Manufacturer Identifier; positions 4-8 Vehicle Attributes; position 9 Check Digit; position 10 Model Year (MY); position 11 Plant Code; positions 12-17 Sequential Number.
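As an illustration of the VIN structure in Figure 2 and of the kind of check the CDR software apparently performs, the following is a minimal sketch. It is not the Bosch implementation: the transliteration table and check-digit weights are the standard ones from 49 CFR 565, while the WMI table and example values are a small hypothetical subset chosen for illustration.

# Minimal VIN sanity check: format, check digit (49 CFR 565), and WMI lookup.
TRANSLIT = {c: v for c, v in zip("ABCDEFGHJKLMNPRSTUVWXYZ",
                                 [1,2,3,4,5,6,7,8,1,2,3,4,5,7,9,2,3,4,5,6,7,8,9])}
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]
KNOWN_WMI = {"1G1": "General Motors (Chevrolet)", "1FA": "Ford"}   # illustrative subset

def vin_value(ch):
    return int(ch) if ch.isdigit() else TRANSLIT[ch]

def check_digit(vin):
    total = sum(vin_value(c) * w for c, w in zip(vin, WEIGHTS))
    r = total % 11
    return "X" if r == 10 else str(r)

def describe(vin):
    assert len(vin) == 17 and all(c not in "IOQ" for c in vin), "bad VIN format"
    return {"wmi": vin[0:3],
            "manufacturer": KNOWN_WMI.get(vin[0:3], "unsupported"),
            "attributes": vin[3:8],           # positions 4-8
            "check_ok": vin[8] == check_digit(vin),
            "model_year_code": vin[9],
            "plant": vin[10],
            "serial": vin[11:17]}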
3.6 CLEAN SOURCE MEDIA
In the most egregious exception to evidentiary requirements, there are no indications that the CDR Interface Module memory is wiped between downloads. In fact, during experimentation, power was applied to the CDR Interface Module and the data in memory was requested prior to performing a download of an EDR. The CDR Interface Module returned the following data.
Sent: EF 5A 01
Received: EF D6 01, followed by a short header (1F 00 80 00 17 00) and the byte values 00 through 7F
Sent: EF 5A 01
Received: EF B4 01, followed by a short header (1F 80 5E 00 B9 80) and the byte values 80 through DF
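A minimal sketch of the kind of pre-download check this finding argues for is shown below: read the interface module's buffer before it is ever connected to an EDR, record a cryptographic hash of that read for the case file, and refuse to proceed if the buffer is not in a known wiped state. The serial command shown (EF 5A 01) is the one observed above; the wipe pattern and the read function are assumptions introduced only for illustration.

import hashlib

WIPE_BYTE = 0x00          # assumed "clean" fill value; vendor-specific in practice

def verify_clean_source(read_module_memory):
    """read_module_memory() -> bytes read from the CDR Interface Module
    before any EDR download (e.g., in response to command EF 5A 01)."""
    pre_image = read_module_memory()
    digest = hashlib.sha256(pre_image).hexdigest()   # log this in the case notes
    is_clean = all(b == WIPE_BYTE for b in pre_image)
    return {"sha256": digest, "bytes": len(pre_image), "clean": is_clean}

# Example with a stand-in reader returning stale data, as observed above:
if __name__ == "__main__":
    stale = bytes(range(0x00, 0x80))                 # mimics the 00..7F response
    report = verify_clean_source(lambda: stale)
    print(report)        # clean == False -> document, wipe, re-verify before use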
So it seems that at initial turn-on the CDR Interface Module memory is set to unknown predefined data, and that data is available for download as if it were authentic data.
4. FUTURE WORK
The aforementioned pitfalls of the CDR software can be addressed by applying long-standing digital forensics principles and methodologies. Although the collection of automobile data for the purposes of accident reconstruction is a relatively new field, many lessons can be taken from the evolution of the digital forensics process. It is relatively simple to apply the same process of creating and displaying a hash value for collected data, and implementing the currently required chain-of-custody documentation, to automobile data collection events. The computer types and amount of data available for automobiles vary widely when compared to standard laptop or desktop computers, or smaller digital devices such as cell phones and PDAs. Automobile recorders currently store only a minute amount of information, 1 kilobyte or less, which serves to shorten the collection process. Many of the standard digital forensics tools could be used on automobiles, and less cumbersome programs could be implemented due to the small amount of data being analyzed. Lessons can also be taken from the airline industry and National Transportation Safety Board (NTSB), where the recovery of black box data is both standardized and well-researched. Legislation has been proposed to mandate a common set of information recorded on all model year 2011 and later vehicles. In the event of such legislation, sound practices and procedures must be in place in order to most effectively transition to the widespread use of such information. The possibilities for future work in the field of accident reconstruction are significant. Identification of automotive systems beyond those identified by Bosch GmbH, including EDRs and ECMs other than those used by GM or Ford, would go a long way towards the ability to reconstruct a variety of accidents. Additionally, it is important to develop techniques to decode the data obtained from the different vehicle systems without the use of proprietary decoders. The development of these techniques would be facilitated by the establishment of a lab that can reproduce the signals needed to decode the various EDRs. This lab would use both computer simulation of vehicles and mechanical actuators to reproduce crash scenarios. Of course, the necessity to provide some kind of authentication for automobile data is quite important for presenting this information in court. This includes the verification of the write process from the RAM to the EEPROM. This verification of data is one of the most important components of the future of automobile accident reconstruction.
6. CONCLUSION
The accident reconstruction industry has the important task of finding facts regarding automobile collisions. Currently, digital evidence extraction from automobiles is not conducted with the same rigor as in more traditional digital forensics applications. This paper addresses some of these current shortcomings in the data collection technique. Lessons learned from the fields of airline crash investigations and digital forensics can provide models for the future of the event data recorder in the accident reconstruction industry.
7. REFERENCES
[1] Chidester, A., Hinch, J., Mercer, T.C., Schultz, K.S., “Recording Automotive Crash Event Data,” International Symposium on Transportation Recorders, National Highway Traffic Safety Administration, May 3-5, 1999.
[2] Fay, R., Robinette, R., Deering, D., Scott, J., “Using Event Data Recorders in Collision Reconstruction,” Society of Automotive Engineers, Warrendale, PA, SAE Technical Paper 2002-01-0535.
[3] Harris v. General Motors Corp., Electronic Citation: 2000 FED App. 0039P (6th Cir.), File Name: 00a0039p.06.
[4] IEEE Project 1616 Draft Standard Site, http://grouper.ieee.org/groups/1616/home.htm, last accessed 9 April 2007.
[5] “Preliminary Regulatory Evaluation, Event Data Recorders,” Docket No. NHTSA-18029, December 2003. Available from http://dmses.dot.gov/docimages/pdf89/283747_web.pdf, last accessed 8 April 2007.
[6] Sebastian A. B. van Nooten, James R. Hrycay, “The Application and Reliability of Commercial Vehicle Event Data Recorders for Accident Investigation and Analysis,” Society of Automotive Engineers, Warrendale, PA, SAE Technical Paper 2005-01-1177.
[7] “Event Data Recorders,” 49 CFR Part 563, Docket No. NHTSA-2006-25666.
[8] Code of Federal Regulations, Title 49, Chapter V, Part 565, Vehicle Identification Number Requirements.
[9] West, Orin, Presentation: “Ford EDR: Current and Future,” NHTSA and SAE Highway Event Data Recorder Symposium, Washington D.C., September 2007.
Research of Network Control Systems with Competing Access to the Transfer Channel
G.V. Abramov, A.E. Emelyanov, M.N. Ivliev
Abstract: An original method of modelling a network control system as a continuous-discrete system with random structure is presented in this article. The method makes it possible to take into account the stochastic character of data transfer via the channel, to determine the areas of stability of control systems, and to choose the most effective control algorithms for distributed technological systems. Moreover, the application of the method to the analysis and calculation of the stability areas of network control systems with competing access to the data transfer channel is shown using standard Ethernet as an example.
Keywords: network control systems, modelling, stability, Ethernet.
I. INTRODUCTION
Increasing the operating efficiency of modern open information control systems in industrial enterprises demands the most productive communication channels. Information networks using wired and wireless Ethernet technologies (Ethernet, Fast Ethernet, Gigabit Ethernet) currently offer the best ratio of data transfer speed to price. The multiple-access method for the data transfer medium allows information to be transferred in the most effective way. In this connection a number of large international firms (Motorola, Freescale, etc.) have begun industrial production of field devices for Ethernet-based control systems. At the same time, many articles devoted to the synthesis and analysis of such control systems have been published. Some researchers developing these systems try to exclude the uncertainty of the transfer time by using protocols with guaranteed transfer time [1]; as a result the efficiency of the systems is sharply reduced. Others base the design on a bound for the maximum information transfer time and do not take into account the stochastic character of delays [2], which does not allow them to
1 Sergey Orlov. Ethernet in systems of industrial automation. LAN, 2002, Vol. 6, pp. 7-12.
2 Georges J.-P., Vatanski N., Rondeau E., Jamsa-Jounela S.-L. Use of upper bound delay estimate in stability analysis and robust control compensation in networked control systems. In: 12th IFAC Symposium on Information Control Problems in Manufacturing (INCOM 2006), Vol. 1, pp. 107-112.
guarantee the stability of control system functioning and reduces their efficiency. In distributed control systems using a network with multiple access, the packet delivery time has a random character. In this connection it has been proposed to model the elements of such a control system as “device - stochastic network system” pairs, taking into account the collisions in the communication channel [3]. In this way the distribution law of the information delivery time in a control system has been determined. An original method of research and modelling of a control system as a continuous-discrete system with random structure, changing at the moments of data transfer, is offered here. It allows the stochastic character of the data transfer time via the channel to be taken into account, the areas of stability of control systems to be determined, and the most effective control algorithms for distributed technological systems to be synthesized.
II. THE MATHEMATICAL MODEL OF A NETWORK CONTROL SYSTEM
Consider a network control system (NCS) in which the data from a digital sensor (DS) to a digital regulator (DR), or controller, are passed via a multiple access channel (MAC). The function scheme of the system is shown in Fig. 1.
3 Abramov G.V., Emelyanov A.E., Kolbaya K.Ch. Mathematical Model of Information System with Multiple Access to the Data Link. In: Control Systems and Information Technology, 2006, Vol. 4 (26), pp. 5-8.
Fig. 1. Function scheme of NCS with information transfer via MAC.
DS reads out the output signal of the object of regulation (OR), i.e. the process, at certain moments of time, which in general can be random. The received data are passed by DS to DR over the MAC. As the MAC we shall consider an Ethernet network realizing the method of carrier-sense multiple access with collision detection (CSMA/CD). A characteristic feature of this method is the non-deterministic time of data transfer via the network from one terminal to another. It is connected with the possibility of collisions, when several devices begin data transfer simultaneously. Assume that the MAC can be used by several NCS; then DR obtains the data from DS with some random delay. DR uses the received data to develop the control influence on OR. For the analysis of the functioning of the given NCS it is necessary to develop a mathematical model of the MAC. We shall consider the MAC as an NCS element. The signal z(t) from DS enters its input, and its output signal s(t) enters the DR input. We shall represent the DS output signal z(t) as a signal of constant level changing at the moments when the OR output signal y(t) is read. The MAC output signal s(t) we shall model as a signal of constant level changing at the moments when DR receives data from DS. Then it is possible to present the change of the signals y(t), z(t), s(t) in time as follows (Fig. 2).
Fig. 2. Change of the NCS signals in time; t_i are the random moments of data transfer from DS to DR via the MAC.
On the basis of such a description of the signals, the signal s(t) can be written as:
s(t) = … + z(t_0)·[u(t − t_0) − u(t − t_1)] + z(t_1)·[u(t − t_1) − u(t − t_2)] + … = Σ_{i=−∞}^{∞} z(t_i)·[u(t − t_i) − u(t − t_{i+1})],   (1)
where u(t) is the unit step function and z(t_i) is the value of the DS output signal at the moment t_i of data transfer to DR.
Representing (1) in another form and finding the Laplace transform of the signal s(t), we obtain:
S(p) = (1/p) · Σ_{i=−∞}^{∞} [z(t_i) − z(t_{i−1})] · e^{−t_i·p}.   (2)
Taking into account that the MAC output signal s(t) is constant on the time interval T = [t_{i−1}; t_i] and is equal to the DS output signal at the previous moment of data receipt, we have z(t_{i−1}) = s(t_i)^−. The designation s(t_i)^− denotes the value of s(t) as t tends to t_i from the left. The DS output signal at the moment of data transfer to DR is also constant: z(t_i) = z(t_i)^−. Then (2) can be presented as follows:
S(p) = (1/p) · Σ_{i=−∞}^{∞} [z(t_i)^− − s(t_i)^−] · e^{−t_i·p} = (1/p) · Σ_{i=−∞}^{∞} e(t_i)^− · e^{−t_i·p},   (3)
where e(t_i) = z(t_i) − s(t_i). We introduce the following designation:
E*(p) = Σ_{i=−∞}^{∞} e(t_i)^− · e^{−t_i·p}.   (4)
Then:
e*(t) = Σ_{i=−∞}^{∞} e(t_i)^− · δ(t − t_i) = e(t)^− · Σ_{i=−∞}^{∞} δ(t − t_i) = e(t)^− · g_к(t),
where e(t) = z(t) − s(t) and g_к(t) = Σ_{i=−∞}^{∞} δ(t − t_i) is the sequence of delta pulses arising at the moments of data transfer to DR. On the basis of (3) it is possible to offer the following block diagram for the MAC as an NCS element (Fig. 3), where T_i^к = t_{i+1} − t_i, the period between quantizations of the signal, is assumed to be random with the probability density f_к(T), which characterizes the process of data transfer over the MAC from DS to DR and takes into account the possibility of collisions in the channel.
Fig. 3. Block diagram of the multiple access channel as an NCS element.
The basic difference of this diagram from the one traditionally accepted in the theory of pulse systems is that after the quantizer the signal E*(p) corresponding to (4) enters the input of the integrator.
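The behaviour described by (1)-(4), the MAC acting as a hold element whose level is updated at random transfer instants, can be illustrated with a short simulation sketch. This is only an illustration of the signal model: the exponential inter-transfer times, the time step and the test signal are arbitrary assumptions, not parameters from the paper.

import random

def simulate_mac_hold(z, t_end=10.0, dt=0.01, mean_transfer_interval=0.5, seed=1):
    """Model s(t): a constant-level signal updated to the current DS output z(t)
    at random transfer instants t_i (drawn here with exponential inter-arrival times)."""
    random.seed(seed)
    t, s = 0.0, 0.0
    next_transfer = random.expovariate(1.0 / mean_transfer_interval)
    trace = []
    while t < t_end:
        if t >= next_transfer:               # data from DS reaches DR
            s = z(t)                         # hold the delivered value
            next_transfer += random.expovariate(1.0 / mean_transfer_interval)
        trace.append((t, z(t), s))
        t += dt
    return trace

# Example: DS output is a step test signal; s(t) is its randomly sampled
# zero-order-hold image as seen by the regulator.
trace = simulate_mac_hold(lambda t: 1.0 if t > 2.0 else 0.0)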
The differential equation for the MAC output signal can be presented as ṡ(t) = [z(t)^− − s(t)^−]·g_к(t). To preserve a uniform approach to the development of the mathematical NCS model, and taking into account all of the above, we shall represent DS by the following block diagram (Fig. 4).
Fig. 4. Block diagram of the digital sensor (DS): W_s(p) is the transfer function of the continuous part of the sensor (the sensitive element), E_s*(p) = Σ_{i=−∞}^{∞} e_s(t_i)^− · e^{−t_i·p}, e_s*(t) = e_s(t)^− · g_s(t), and g_s(t) is the sequence of delta pulses appearing at the moments of quantization of the output signal of the continuous part of the sensor.
The quantization period of the DS, T_i^s, is assumed to be a random variable with probability density f_s(T). It should be noted that quantization with a constant period can be considered as a special case of stochastic quantization. Assume that the digital regulator influences the object of regulation with a small enough quantization step, which allows DR to be treated as an analog regulator with transfer function W_r(p). Thus, it is possible to present the NCS with information transfer via the multiple access channel by the following block diagram (Fig. 5).
Fig. 5. NCS scheme with information transfer via MAC.
Assume that the object of regulation and the sensitive element of the sensor are described by linear equations; then, designating the phase variables by y_i(t) and the regular inputs by z_i(t), it is possible to write down the vector-matrix differential equation for the considered NCS in the following way:
Ẏ(t) = C·Z(t) − Σ_{j=1}^{2} B_j·Y(t)^−·g_j(t) − A·Y(t) + D·V(t),   (5)
where A, B_j, C are matrices of dimension n × n; Y(t) is the n-dimensional vector of phase coordinates; Z(t) is the n-dimensional vector of regular influences; V(t) is an n-dimensional vector of white noises with spectral density matrix S(t); D is a matrix of dimension n × n; and g_j(t) are the sequences of delta pulses: g_1(t) = g_s(t), g_2(t) = g_к(t). To simplify the notation, here and further the common dependence of all variables on time t is omitted where possible. Note that the vector of phase coordinates includes not only the coordinates of the system but also the coordinates of the forming filters.
Analysis of (5) shows that it is stochastic. However, the theory of systems with random structure deals with Markov random processes. To make (5) Markov, the random discrete sequences g_j(t) must have a Poisson distribution. Usually, however, the distribution laws f_к(T) and f_s(T) are not of Poisson nature. To bypass this difficulty, we approximate them by generalized Erlang laws of the corresponding order [4]. It is then possible to distinguish a finite number of states, in each of which the system is described by an equation of the form (5), and the discrete sequences in them have a Poisson character. Each of these laws can be represented by its state graph, on the basis of which a common graph of the system states can be constructed with regard to possible statistical dependence between the moments of quantization in the streams g_j(t).
4 Artemyev V.M. The theory of dynamic systems with random changes of structure. Minsk: Visshaya shkola, 1979. 160 p.
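A small sketch of the approximation step described above, replacing an empirical inter-transfer-time distribution by an Erlang law whose order and rate are chosen from the sample mean and variance, is given below. The moment-matching rule is a standard choice and is an assumption here; the paper does not specify how the order is selected, and the measured intervals are hypothetical.

import math, random

def fit_erlang(samples):
    """Pick an Erlang(k, lam) law matching the sample mean and variance:
    mean = k/lam, var = k/lam^2  =>  k ~ mean^2/var, lam = k/mean."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    k = max(1, round(mean * mean / var))
    lam = k / mean
    return k, lam

def sample_erlang(k, lam, rng=random):
    # Sum of k independent exponential stages with rate lam.
    return sum(rng.expovariate(lam) for _ in range(k))

# Example: hypothetical measured packet delivery intervals (seconds).
measured = [0.012, 0.015, 0.011, 0.020, 0.014, 0.013, 0.018, 0.016]
k, lam = fit_erlang(measured)
resampled = [sample_erlang(k, lam) for _ in range(5)]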
Development of a common model which takes into account all varieties of possible graphs appears to be rather difficult. Proceeding from this, we shall develop the mathematical model of the NCS for independent moments of quantization of the signals in g_j(t). This corresponds to the operating mode of the NCS in which DS reads out the OR output signal at random moments of time, and the data are passed to the regulator over the MAC at the moments when the channel is free. Thus both the new data and the data received earlier are passed to the regulator. Assume that the distribution law f_s(T) can be approximated by a generalized Erlang law of order α, and the law f_к(T) by one of order β; then the state graph for the law f_s(T) has α states, and that for the law f_к(T) has β states. Assume that quantization of the signal, both in the first and in the second stream, occurs at the moment when the system enters the first state of the corresponding graph. Write down Kolmogorov's equations for the corresponding graphs. For the graph with α states:
Ṗ_i^α = −λ_i · P_i^α + Σ_{k=1, k≠i}^{α} λ_{ki} · P_k^α,  i = 1, …, α,  λ_i = Σ_{j=1, j≠i}^{α} λ_{ij},   (6)
where λ_{ij} is the intensity of transition from state i to state j of the α-graph. For the graph with β states:
Ṗ_j^β = −γ_j · P_j^β + Σ_{m=1, m≠j}^{β} γ_{mj} · P_m^β,  j = 1, …, β,  γ_j = Σ_{i=1, i≠j}^{β} γ_{ji},   (7)
where γ_{ji} is the intensity of transition from state j to state i of the β-graph. We introduce a vector (i, j) to characterize the system state, where i is a state of the α-graph and j a state of the β-graph. Thus the system has α·β states. The probability that the system is in the state (i, j) is equal to
P_{ij} = P_i^α · P_j^β.   (8)
In addition, the normalization conditions must be satisfied:
Σ_{i=1}^{α} P_i^α = 1;  Σ_{j=1}^{β} P_j^β = 1.   (9)
We shall develop a common model for the given mode of NCS functioning. We introduce additional transition intensities λ_{11} and γ_{11}, for which λ_{11} = 0 at α > 1 and γ_{11} = 0 at β > 1. Using the technique stated in [5], we obtain the Kolmogorov-Feller equations for the probability density f(Y, t) of the system phase coordinates in all possible states:
ḟ^{(1,1)}(Y) = −(∂/∂Y)^T · Π^{(1,1)}(Y) − ν_1 · f^{(1,1)}(Y) + Σ_{j=1}^{α} λ_{j1} · F_1^{−1} · f^{(j,1)}(F_1^{−1}Y) + Σ_{j=1}^{β} γ_{j1} · F_2^{−1} · f^{(1,j)}(F_2^{−1}Y);
ḟ^{(1,i)}(Y) = −(∂/∂Y)^T · Π^{(1,i)}(Y) − ν_i · f^{(1,i)}(Y) + Σ_{j=1}^{α} λ_{j1} · F_1^{−1} · f^{(j,i)}(F_1^{−1}Y) + Σ_{j=1, j≠i}^{β} γ_{ji} · F_2^{−1} · f^{(1,j)}(Y),  i = 1, …, β;
ḟ^{(l,1)}(Y) = −(∂/∂Y)^T · Π^{(l,1)}(Y) − ν_l · f^{(l,1)}(Y) + Σ_{j=1, j≠l}^{α} λ_{jl} · f^{(j,1)}(Y) + Σ_{j=1}^{β} γ_{j1} · F_2^{−1} · f^{(l,j)}(F_2^{−1}Y),  l = 1, …, α;
ḟ^{(k,m)}(Y) = −(∂/∂Y)^T · Π^{(k,m)}(Y) − ν_k · f^{(k,m)}(Y) + Σ_{j=1, j≠k}^{α} λ_{jk} · f^{(j,m)}(Y) + Σ_{j=1, j≠m}^{β} γ_{jm} · f^{(k,j)}(Y),  k = 2, …, α,  m = 2, …, β,   (10)
where
ν_1 = Σ_{j=1}^{α} λ_{1j} + Σ_{j=1}^{β} γ_{1j};
ν_l = Σ_{j=1, j≠l}^{α} λ_{lj} + Σ_{j=1}^{β} γ_{1j},  l = 1, …, α;
ν_i = Σ_{j=1}^{α} λ_{1j} + Σ_{j=1, j≠i}^{β} γ_{ij},  i = 1, …, β;
ν_k = Σ_{j=1, j≠k}^{α} λ_{kj} + Σ_{j=1, j≠m}^{β} γ_{mj},  k = 2, …, α,  m = 2, …, β;
Π^{(i,j)}(Y) is the vector of the probability density stream for the state (i, j); F_j = I − B_j.
Statistical research of dynamic systems with random structure is reduced to determining the probabilistic moments of the phase coordinates: the mean values and the correlation moments, the differential equations for which have the form:
Ṁ(t) = ∫_{−∞}^{∞} Y(t) · ḟ(Y, t) dY,   (11)
Θ̇(t) = ∫_{−∞}^{∞} [Y(t) − M(t)] · [Y(t) − M(t)]^T · ḟ(Y, t) dY.   (12)
The technique of integrating the moments of probability for systems with random structure is known [5]. The evaluations of the phase coordinates over all system states are determined by the formulas:
5 Artemjev V.M., Ivanovskii A.V. Discrete control systems with random period of quantization. M.: Energoatomizdat, 1986. 96 p.
M(t) = Σ_{i=1}^{α} Σ_{j=1}^{β} M^{(i,j)},
Θ(t) = Σ_{i=1}^{α} Σ_{j=1}^{β} [ Θ^{(i,j)} + (M − M^{(i,j)}/P_{ij}) · (M − M^{(i,j)}/P_{ij})^T · P_{ij} ].   (13)
The vector-matrix differential equations (11) and (12), together with (6) and (7), and taking into account (8), (9) and (13), represent a mathematical model of a network control system with information transfer via the channel of multiple access for the mode under consideration. The moments of probability, from which the quality of functioning can be analysed, are determined by solving the system of differential equations describing the network control system under the given initial conditions M(t_0), Θ(t_0), Z(t_0), P_i^α(t_0) and P_j^β(t_0).
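The state-probability part of this model, equations (6)-(9), can be illustrated with a short numerical sketch that integrates the Kolmogorov equations for the two graphs with a simple Euler step and forms the joint probabilities of (8). The transition intensities below are arbitrary illustrative numbers, not values from the paper.

def kolmogorov_step(P, rates, dt):
    """One Euler step of dP_i/dt = -lambda_i*P_i + sum_k rates[k][i]*P_k,
    where rates[k][i] is the transition intensity from state k to state i."""
    n = len(P)
    out_rate = [sum(rates[i][j] for j in range(n) if j != i) for i in range(n)]
    dP = [-out_rate[i] * P[i] + sum(rates[k][i] * P[k] for k in range(n) if k != i)
          for i in range(n)]
    return [P[i] + dt * dP[i] for i in range(n)]

# Illustrative alpha-graph (2 states) and beta-graph (3 states) intensities:
lam = [[0.0, 5.0],
       [5.0, 0.0]]
gam = [[0.0, 8.0, 0.0],
       [0.0, 0.0, 8.0],
       [8.0, 0.0, 0.0]]

Pa, Pb, dt = [1.0, 0.0], [1.0, 0.0, 0.0], 1e-3
for _ in range(5000):
    Pa = kolmogorov_step(Pa, lam, dt)
    Pb = kolmogorov_step(Pb, gam, dt)

# Joint state probabilities P_ij = P_i^alpha * P_j^beta, as in equation (8):
P_joint = [[pa * pb for pb in Pb] for pa in Pa]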
III. THE NETWORK CONTROL SYSTEM OF ILLUMINATION
The system for maintaining illumination intensity, shown in Fig. 6, can serve as an example of a network control system realization. The control system functions as follows: the photocell sensor transforms the illumination created by a bulb into a voltage. This signal then goes to the microcontroller, which converts the analog signal to digital form and passes it as a data packet via an Ethernet network, through a hub, to the regulator. The regulator's function is to apply a controlling influence to the electric bulb in order to stabilize the illumination under the disturbances acting on the system. Not less than 10 stations are connected to each other through the hub to generate traffic in the network.
Fig. 6. Structure of the network control system: 1 - electric bulb, 2 - photocell, 3 - interference (external influence), 4 - microcontroller MC9S12NE64 of Freescale Semiconductor, 5 - station with the functions of a regulator, 6 - hub, 7 - n computers for generation of the traffic in the network.
For the considered network control system of illumination, the object of regulation is described by the transfer function Wo(p) = ko/(T·p + 1); the transfer function of the sensor is Ws(p) = ks; as the regulator we choose the P-law (proportional law) with transfer function Wr(p) = kp; the intensity of sensor quantization is υ_1, that of the data network is υ_2, and the system input influence is x = 1. Then (11) and (12) in vector-matrix form become:
Ṁ(t) = C·Z(t) − A·M(t) − Σ_{j=1}^{2} υ_j·B_j·M(t),   (14)
Θ̇(t) = −(A + Σ_{j=1}^{2} υ_j·B_j)·Θ(t) − Θ(t)·(A + Σ_{j=1}^{2} υ_j·B_j)^T + Σ_{j=1}^{2} υ_j·B_j·Θ(t)·B_j^T + Σ_{j=1}^{2} υ_j·(B_j·M(t))·(B_j·M(t))^T + D·S(t)·D^T.   (15)
Of all these moments, the greatest interest for the analysis lies in the mean value (the first moment) and the deviation from the mean value (dispersion) of the OR output. In the case of the P-regulator, the change in time of the mean value and of the dispersion of the OR output, depending on the parameters of the control system, can have the form shown in Fig. 7, Fig. 8, Fig. 9 and Fig. 10.
Fig. 7. Mean value of the OR output.
Fig. 8. Mean value of the OR output.
Fig. 9. Dispersion of the OR output.
Fig. 10. Dispersion of the OR output.
As seen from Figures 7-10, the stability of the network control system depends on its parameters. To determine the areas of stability of the network control systems, the Hurwitz criterion is used. The results of the calculation are shown in Fig. 11; the areas of parameters where the NCS is stable both in the mean value and in the dispersion correspond to the areas shaded with points.
Fig. 11. Network control system areas of stability in the space of various parameters.
While searching for the areas of stability, a transition to the following dimensionless values has been carried out: K = ks·kp·ko is the common gain of the network control system, equal to the product of the gains of the sensor ks, the P-regulator kp and the OR ko; T1 = υ_1·T = Td/T describes the ratio of the average sensor quantization time Td to the controlled object time constant T; T2 = υ_2·T = Tmac/T describes the ratio of the average multiple access channel quantization time Tmac to the controlled object time constant T. Fixing one of these values, the border of system stability was determined in the plane of the two other parameters, both by the mean value and by the deviation from the mean value (dispersion). The figures show that under a certain choice of parameters it is possible to achieve stability of an NCS with competing access to the transfer channel. Using the given technique, we can obtain transients and determine areas of stability of network control systems applying other (for example, PI and PID) laws of regulation.
IV. RESULTS AND CONCLUSION
As a result of structural and mathematical modelling, an NCS model with transfer of information via a MAC has been developed as a continuous-discrete system with random structure, which allows the stochastic character of the data transfer time via the channel to be taken into account, the areas of stability of control systems to be determined, and the most effective control algorithms for distributed technological systems to be synthesized. The suggested model can be applied to evaluate the possibility of using standard Ethernet for data exchange between devices of network control systems for a concrete technological process.
REFERENCES
Bauer, P., Schitiu, M., Lorand, C. and Premaratne, K. (2001). "Total Delay Compensation in LAN Control Systems and Implications for Scheduling," In: Proceedings of the American Control Conference, pp. 4300-4305.
Cruz, R. (1991). A Calculus for Network Delay, Part I: Network Elements in Isolation. IEEE Transactions on Information Theory, Vol. 37, pp. 114-131.
Georges, J.-P., Divoux, T., and Rondeau, E. (to appear in 2005). Confronting the performances of a switched Ethernet network with industrial constraints by using the Network Calculus, International Journal of Communication Systems (IJCS).
Göktas, F. (2000). Distributed control of systems over communication networks, Ph.D. dissertation, University of Pennsylvania.
Greifeneder, J. and Frey, G. (2005). Probabilistic Delay Time Analysis in Networked Automation Systems. In: Proc. of the 10th IEEE ETFA 2005, Catania, Italy, Vol. 1, pp. 1065-1068.
Poulard, G. (2003). Modeling and simulation of a control architecture over Ethernet with TCP/IP protocol, Mémoire de DEA, ENS Cachan (F).
Richard, J.-P. (2003). Time-delay systems: an overview of some recent advances and open problems. Automatica, Vol. 39, pp. 1667-1694.
Service-Oriented Context-Awareness and Context-Aware Services * H. Gümüşkaya, M. V. Nural Department of Computer Engineering, Fatih University 34500 Büyükçekmece, Istanbul, Turkey [email protected], [email protected] Abstract. This paper describes the design, implementation, deployment, and performance evaluation of a Service-Oriented Wireless Context-Aware System (SOWCAS), a distributed pervasive system comprising computers, a context-aware middleware server and mobile clients that support context-awareness and high adaptation to context changes for distributed heterogeneous indoor and outdoor mobile ubiquitous systems and environments. The client SOWCAS application runs on different mobile platforms and exploits the modern wireless communication facilities provided by new state-of-the-art mobile devices. The client architecture is designed and implemented with attention to supportability features, i.e., being understandable, maintainable, scalable, and portable, to real-time constraints, and to service-oriented and object-oriented design principles and design patterns. The SOWCAS Server provides basic and composite services for mobile clients and handles indoor and outdoor context information. The implementations were tested at the Fatih University campus, and typical mobile context-aware scenarios from the campus life of students and academicians are given in the paper.
I. INTRODUCTION
New mobile communication devices that combine several heterogeneous wireless access technologies, such as cellular (UMTS/GSM and GPRS/EDGE), GPS, as well as wireless LAN data communication technologies like IEEE 802.11 and Bluetooth, are becoming widely available. In addition to advances in mobile devices and wireless communications, the recent developments in pervasive computing and context-aware systems provide a strong motivation to develop new software techniques and mobile services for both outdoor and indoor environments. Context-Aware Computing (CAC) is a relatively recent branch of mobile computing and refers to a general class of mobile systems that can sense their context of use and adapt their behavior accordingly. The idea of CAC was one of the early concepts introduced in one of the pioneering works on ubiquitous computing research [1] and has been the subject of extensive research since. CAC helps users interact better with their environment. Although a complete definition of context is difficult, it can be taken to include who (identity), what (activity), status, where (location), when (time), nearby people, devices, lighting, noise level, network availability, and even the social situation; e.g., whether you are with your family or a friend from school. _________________ * This work is supported by the Fatih University Research Fund (Grant No. P50050702, Frameworks for Context-Aware Pervasive Systems─FCAPSYS)
In this paper we propose a context-aware framework, the Service-Oriented Wireless Context-Aware System (SOWCAS), that eliminates much of the complexity and cost associated with integrating new emerging mobile devices into highly distributed enterprise systems. Our primary focus in SOWCAS is to develop a service-oriented indoor and outdoor context-aware system that benefits from the heterogeneity of wireless access technologies. We tested the first implementation at the Fatih University campus. II. RELATED WORK Location is the most explored item of context information, and many context-aware systems may be viewed as being built over positioning systems, extending them to provide task-specific context-aware information. A context-aware system may be characterized by its underlying infrastructure, its positioning model, and its application specification. User positioning is the first prerequisite for a context-aware framework. Today, the various forms of classical Triangulation (TN) and the well-known K-Nearest Neighbor (KNN) algorithm are used to estimate the position of a mobile user for outdoor and indoor environments respectively. Triangulation has been widely used by various known location systems including GPS. KNN was first used in Microsoft's RADAR system based on WLAN positioning techniques [2]. Service-Oriented Computing (SOC) is a distributed computing paradigm based on the Service-Oriented Architecture (SOA), which is an architectural style for building software applications that use services [3]. SOC and SOA are not completely new concepts; other distributed computing technologies like CORBA and RMI have been based around similar concepts. SOA and SOC are extensions of these existing concepts, and new technologies, like XML and Web Services, are being used to realize platform-independent distributed systems. A SOA-based service is self-contained, i.e., the service maintains its own state. A service consists of an interface describing operations accessible by message exchange. Services are autonomous and platform-independent and can be described, published, dynamically located, invoked and (re-)combined and programmed using standard protocols. SOA promotes loose coupling between software components. The building block of SOA is SOAP (Simple Object Access Protocol). SOAP is an XML-
based messaging protocol defining a standard mechanism for remote procedure calls. The Web Service Description Language (WSDL) defines the interface and details service interactions. The Universal Description Discovery and Integration (UDDI) protocol supports publication and discovery facilities. Finally, the Business Process Execution Language for Web Services (BPEL4WS) is exploited to produce a service by composing other services. SOA appears to be an ideal paradigm for mobile services. However, it is currently focused only on enterprise and business services. In addition, most SOA research has been focused on architectures and implementations for wired networks. There are many challenges that need to be addressed by wireless middleware based on SOA. Wireless middleware will play an essential role in managing and provisioning service-oriented applications. Requirements for context-aware systems and frameworks have been widely discussed and described in [4], [5], [6]. Principles and guidelines for designing middleware for mobile computing have been published in the literature [7], [8]. A few researchers have also published service-oriented computing imperatives for wireless environments [9]. However, there are few examples of real-world research-based deployments of pervasive technologies [10]. In [11], we proposed a supportable, i.e. understandable, maintainable, scalable and portable, meta software architecture and some design principles. We applied this meta architecture and these design principles in our WLAN indoor positioning system WiPoD (Wireless Position Detector) [12]. We investigated possible implementations of WLAN positioning on WiPoD using different algorithms and examined whether the implementation could achieve a level of accuracy acceptable for use in real-life cases.
WiPoD can estimate a user’s location to within a few meters of his/her actual location with high probability. After WiPoD, we developed a generic Service-Oriented Reflective Wireless Middleware (SORWiM) [13] for indoor and outdoor applications. It provides a set of basic and composite services for information discovery and dissemination. III. SOWCAS SYSTEM ARCHITECTURE We designed and deployed a distributed architecture composed of a central middleware SOWCAS Server, PC Proxies and mobile SOWCAS clients as seen in Fig. 1 to collect and process context information. The SOWCAS Proxy software runs on Proxy PCs located in classes, laboratories and some special locations in which students and professors are found frequently. These PCs are equipped with WLAN 802.11 and Bluetooth cards. These PCs and wireless access points located in buildings are used to locate and track the positions of students and faculty members in the university. We used TN and KNN algorithms in our system to estimate the position of a mobile user. SOWCAS is currently using WiPoD as its positioning engine. Other open source positioning systems such as Place Lab [14] can also be easily integrated to our system because of our design methodology and service-oriented architecture of SOWCAS. The SOWCAS wireless positioning system locates and tracks the user having an IEEE 802.11, Bluetooth and GPS supported device across the coverage area of WLAN and outdoors. It operates by processing received signal information from multiple APs, Bluetooth devices and GPS data. Besides indoor and outdoor location context information, we keep other context information such as status, activity, weekly schedules, detailed personal information for students and faculty members in the database as shown in Table 1.
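To make the KNN step concrete, the following is a minimal sketch of WLAN fingerprint positioning in the KNN style used by RADAR-like systems. The radio-map entries, AP names and RSSI values are made up for illustration; this is not the WiPoD implementation.

import math

# Radio map: known positions (x, y in meters) -> RSSI per access point (dBm).
RADIO_MAP = {
    (0.0, 0.0):  {"ap1": -40, "ap2": -70, "ap3": -65},
    (5.0, 0.0):  {"ap1": -55, "ap2": -60, "ap3": -70},
    (0.0, 5.0):  {"ap1": -60, "ap2": -45, "ap3": -72},
    (5.0, 5.0):  {"ap1": -68, "ap2": -50, "ap3": -58},
}

def knn_position(observed, k=3):
    """Average the k calibration points whose signal vectors are closest
    (Euclidean distance in signal space) to the observed RSSI vector."""
    def dist(fingerprint):
        return math.sqrt(sum((observed.get(ap, -95) - rssi) ** 2
                             for ap, rssi in fingerprint.items()))
    nearest = sorted(RADIO_MAP.items(), key=lambda kv: dist(kv[1]))[:k]
    x = sum(p[0] for p, _ in nearest) / k
    y = sum(p[1] for p, _ in nearest) / k
    return x, y

print(knn_position({"ap1": -50, "ap2": -62, "ap3": -69}))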
Figure 1. SOWCAS system architecture.
TABLE I. TYPICAL CONTEXT INFORMATION FOR STUDENTS AND FACULTY MEMBERS
Status: Offline / Online, Available / Not Available, Away, Busy, Invisible.
Activity: Coming to University Campus, Leaving the University Campus, Attending Class / Lab, At Meeting, Dining at Cafeteria, Resting, Studying at Office/Class/Library, Using Vending/ATM Machine.
Other Information: Weekly Schedules, Personal Data.
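A minimal sketch of how such context records might be represented on the server side is given below. The field names and enumerations simply mirror Table I and are illustrative only; they are not the actual SOWCAS database schema.

from dataclasses import dataclass, field
from enum import Enum
from typing import Optional

class Status(Enum):
    OFFLINE = "Offline"
    ONLINE = "Online"
    AVAILABLE = "Available"
    NOT_AVAILABLE = "Not Available"
    AWAY = "Away"
    BUSY = "Busy"
    INVISIBLE = "Invisible"

@dataclass
class UserContext:
    user_id: str
    status: Status = Status.OFFLINE
    activity: Optional[str] = None          # e.g. "Attending Class / Lab"
    location: Optional[tuple] = None        # (latitude, longitude) or (building, room)
    weekly_schedule: dict = field(default_factory=dict)
    personal_data: dict = field(default_factory=dict)

# Example record for the scenarios described in Section V:
ctx = UserContext(user_id="student-42", status=Status.ONLINE,
                  activity="Coming to University Campus",
                  location=(41.02, 28.57))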
Limited resources, heterogeneity, and a high degree of dynamism are the most common properties that exist in mobile devices such as PDAs, phones, and sensors. Mobile devices generally don’t have powerful CPUs, large amount of memory and high-speed I/O and networking capabilities compared to desktop PCs. The degree of dynamism present in ubiquitous computing does not exist in traditional servers and workstations. A PDA, for example, interacts with many devices and services in different locations, which implies many changing parameters such as the type of communication network, protocols, and security policies. Therefore, because we can’t predict all possible combinations, the software running on a PDA device must be able to adapt to different scenarios to cope with such dynamism. All these properties affect the design of the wireless middleware required for mobile computing. Conventional middleware platforms, such as CORBA/RMI are not appropriate, because, first of all, they are too big and inflexible for mobile devices. A wireless middleware should be lightweight as it must run on hand-held, resource-scarce devices. Conventional platforms expect static connectivity, reliable channels, and high bandwidth that are limited in resource-varying wireless networks. Wireless middleware should support an asynchronous form of communication, as mobile devices connect to the network opportunistically and for short periods of time. It should be built with the principle of awareness in mind, to allow its applications to adapt its own and the middleware behavior to changes in the context of execution, so as to achieve the best quality of service and optimal use of resources. Hiding network topologies and other deployment details from distributed applications becomes both harder and undesirable since applications and middleware should adapt according to changes in location, connectivity, bandwidth, and battery power. IV. SOWCAS CLIENT AND SERVER ARCHITECTURES We use middleware approach in both the client and server parts of the SOWCAS architecture as shown in Fig 2. This approach provides integration and interoperability of applications and services running on heterogeneous computing and communication devices, and simplifies distributed programming.
The new state of the art Pocket PCs and smart phones, such as IPAQ 914, Asus MyPAL A636N, and Sony Ericsson 990 which were our mobile devices in the implementations, have heterogeneous wireless access technologies such as WLAN, Bluetooth, GPS, UMTS/GSM and GPRS/EGDE. The programming access to these wireless technologies is generally provided by the third party products and software development environments as C/C++ APIs (Application Programming Interfaces) which are system dependent as given as OS Adaptation Layer in the SOWCAS Client architecture as shown in Fig 2 (a). We prefer to use the Java programming language in both the client and server parts of SOWCAS as much as possible. In the mobile client side we used a Java service-oriented platform. When it is not possible or difficult to use Java technologies for some mobile systems, we use the Microsoft’s .NET platform and Sybase’s PocketBuilder to develop web service client applications. The SOWCAS Client application architecture is shown in Fig. 3 and based on PCMEF layered architecture [15]. The layers include Presentation, Control, Mediator, Entity, and Foundation. A layered architecture is a system containing multiple, strongly separated layers, with minimal dependencies and interactions between the layers. Such a system has good separation of concerns, meaning that we can deal with different areas of the application code in isolation, with minimal or no side effects in different layers. By separating the system’s different pieces, we make the software understandable, maintainable, scalable, and portable, so that we can easily change and enhance it as requirements change, and port it to different mobile devices. Each layer communicates with only lower and upper layer. The top layer Presentation includes the user interface classes. The Control layer is responsible for functionalities such as determining the mobile user location, creating the radio map, retrieving the data from Mediator and passing this data to Presentation layer. The location determination algorithms, KNN and TN, are implemented in this layer. Mediator intervenes between hardware layer classes and high-level application layer classes. While Entity manages the wireless data objects currently in the memory, Mediator ensures that Control subsystem gets access to these data objects. At the bottom of system hierarchy, Foundation provides a communication link between the network interface card and Mediator. In this layer, different wireless technologies such as GPS, IEEE 802.11, Bluetooth, and Cellular Phone are supported in order to provide different location positioning infrastructures and multiple communication paths for the client. This package includes parts which are written in both C and Java. We used NDIS (Network Driver Interface Specification) to make queries to the network adapter. RawEther framework was used to access NDIS network interface drivers and the JNI (Java Native Interface) was used to access the native DLL code produced by RawEther. By writing programs using the JNI, we ensure that our code is completely portable across platforms.
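A stripped-down sketch of this layering discipline is shown below, with each layer holding a reference only to the layer(s) directly beneath it. The class and method names are illustrative stand-ins, not the real Java/C client code described above.

class Foundation:                          # talks to the network interface card
    def scan_signal_strengths(self):
        return {"ap1": -52, "ap2": -61}    # stubbed hardware reading

class Entity:                              # wireless data objects held in memory
    def __init__(self): self.fingerprints = []

class Mediator:                            # gives Control access to Entity data,
    def __init__(self):                    # pulling fresh readings via Foundation
        self.foundation, self.entity = Foundation(), Entity()
    def latest_fingerprint(self):
        fp = self.foundation.scan_signal_strengths()
        self.entity.fingerprints.append(fp)
        return fp

class Control:                             # location estimation (KNN/TN) lives here
    def __init__(self): self.mediator = Mediator()
    def locate(self):
        fp = self.mediator.latest_fingerprint()
        return ("estimated-position", fp)  # placeholder for the real estimator

class Presentation:                        # UI layer talks only to Control
    def __init__(self): self.control = Control()
    def show_position(self):
        print("Position:", self.control.locate())

Presentation().show_position()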
Figure 2. SOWCAS Client (a) and Server (b) architectures.
Figure 3. SOWCAS client application architecture.
The SOWCAS Server has basic services such as Event, Messaging, Location, and Redirection as shown in Fig 2 (b). The composite services such as LBRComp and LBNComp are created using these basic services. Each service is depicted by a WSDL document, which describes service accessing rules. Web services are registered to the central UDDI database. The client searches the UDDI to find out the service it needs, fetches the WSDL file, and generates the stub with the WSDL to stub code generator provided by the web service toolkit, and starts calling remote methods. Reflective middleware presents a comprehensive solution to deal with ubiquitous computing [7]. Reflective middleware system responses are optimized to changing environments, including mobile interconnections, power levels, network bandwidth, latency/jitter, and dependability needs. Reflection is the ability of a program to observe and possibly modify its structure and behavior. In SOWCAS, reflection was used in the service level and we developed reflective services that take decisions based on context information.
The Event service provides authentication, preferences, and system and user information. The Messaging service is used to create, send and receive messages. The Location service searches and updates locations of mobile users. It is used by other services for giving decisions and providing active context-awareness that autonomously changes the application behavior according to the sensed location information. The Redirection service is responsible for client redirection to different servers and server reference translation. We used JCAF [16] which is a Java-RMI based contextawareness framework in the first SOWCAS prototype. The RMI feature makes the JCAF client Java language dependent. Since we have one more layer, web services layer on top of JCAF as shown in Fig 2 (b), our system is platform independent. Since JCAF has several restrictions, we have started to develop our own context-awareness framework. V. SERVICE ORCHESTRATION AND MOBILE SCENARIOS To demonstrate and evaluate the development of mobile applications using the SOWCAS services, several web service application programs were implemented to emulate several wireless application scenarios as shown in Fig 4. Some application scenarios using the basic and five composite services, Location Based Redirection Composite service (LBRComp), Location Based Notification Composite service (LBNComp), Map Composite service (MapComp), GPS Composite service (GPSComp), and WLAN Composite service (WLAN Comp), will be given below. In our scenario, Mustafa, a senior computer engineering student at Fatih University, has a Pocket PC equipped with GPRS/EDGE, WLAN and GPS technologies. At the beginning of our scenario, Mustafa’s Pocket PC has a connection to SOWCAS Server through the WLAN connection at his home.
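The way composite services are built from the basic ones can be sketched as follows. LBNComp is reduced here to its essential logic: register an interest, and when the Location service reports a matching update, use the Messaging service to notify. All service internals and transports are stubbed out and are assumptions for illustration only.

class LocationService:
    def __init__(self): self.listeners = []
    def subscribe(self, callback): self.listeners.append(callback)
    def update(self, user, place):
        for cb in self.listeners:
            cb(user, place)

class MessagingService:
    def send(self, recipient, text):
        print(f"to {recipient}: {text}")          # stand-in for the real transport

class LBNComp:
    """Location Based Notification composite service (sketch)."""
    def __init__(self, location, messaging):
        self.messaging, self.rules = messaging, []
        location.subscribe(self._on_location)
    def notify_when(self, watcher, user, place):
        self.rules.append((watcher, user, place))
    def _on_location(self, user, place):
        for watcher, u, p in self.rules:
            if u == user and p == place:
                self.messaging.send(watcher, f"{user} appeared at {place}")

loc, msg = LocationService(), MessagingService()
lbn = LBNComp(loc, msg)
lbn.notify_when("Haluk", "Mustafa", "campus")
loc.update("Mustafa", "campus")          # -> Haluk gets the notification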
Figure 4. Mobile application scenarios.
When Mustafa has left his home and moved out from his home’s WLAN coverage area, his mobile device keeps its connection to SOWCAS through the GPRS/EDGE connection. The WLAN-to-GPRS handover is initiated by the user. LBRComp maintains transparent redirection according to changes in the mobile location using basic Location and Redirection services. This service first obtains the device location and redirects to the available addresses. When LBRComp decides that a WLAN-to-GPRS handover is required, it sends a handover-required message and an available endpoint (IP, port number, and service name) of the server to the mobile device. The user then performs a handover from the WLAN to the GPRS connection. When Mustafa arrives in Fatih University, and his mobile device discovers a wireless access point near the university entrance, the mobile device sends the location information to LBRComp through the GPRS connection. SOWCAS then sends the endpoint of the server that is accessible in the university. The mobile device performs a handover again and switches from GPRS to the WLAN connection again. The screen for this last event is shown in Fig 4 (a). Mustafa sets the new location as Home and gets a response message from SOWCAS. The response is a new connection end point for the WLAN network. The old and new end points are also shown in the lower part of the screen in Fig 4 (a). After Mustafa enters the university, he wants to go to a place where he does not know its location on the campus. He uses the Map Service composite service MapComp. The SOWCAS server first sends the campus map to Mustafa’s Pocket PC. Mustafa then enters the destination name. The SOWCAS Server finds the shortest path to the destination, and sends the destination coordinates to the Pocket PC. Finally the Pocket PC client program draws the path to destination on the screen. While Mustafa is moving to the destination, the real time location of Mustafa is shown as a moving blinking red circle as he walks on the Campus as shown in Fig 4 (b). The outdoor positioning is performed by
the GPS receiver on his device and the SOWCAS GSP software developed for the Pocket PC. The Mustafa’s realtime location information is also sent periodically to the SOWCAS Server using GPSComp service. After Mustafa gets the destination he enters the building. This time the indoor positioning using the WLAN infrastructure in that building takes place, and Mustafa’s new location is sent to the SOWCAS Server using the WLANComp service. LBNComp which composes the services of Notification, Messaging, Event and Location, notifies all registered users who are interested in a specific event. Imagine the following last scenario using this composite service. Halûk, Mustafa’s Professor at the Fatih University, is waiting for Mustafa’s arrival to the campus. He types a command which means “notify me when Mustafa appears on campus” in his Pocket PC which has a connection to the SOWCAS Server. When Mustafa enters the campus, immediately a notification is sent to Halûk’s Pocket PC as a message as shown in Fig 4 (c). VI. PERFORMANCE EVALUATION OF SOWCAS There are a number of research studies for performance evaluations of web services and SOAP compared to middleware technologies such as RMI/CORBA for wired networks [17], [18], [19]. In these studies, a comparison is made based on the performance of a web service implementation that calls a simple method and an RMI, CORBA, or other implementation that calls the same method. We performed similar benchmarking tests to measure the performance of SOWCAS in WLAN environments. The tests were performed on small, medium and large data values in remote method calls, and the averages were calculated for each test. The RMI implementations were faster compared to web service implementations for small batches of documents, but performance degrades rapidly with larger batches and higher latency in our tests. The web services have a high initial cost, but show little or no change with larger batches. Higher latency creates a greater initial cost, but performance
is still independent of batch size. As latency increases, the performance benefits of the document-oriented approach increase significantly. This is relevant when in some real world wireless communication scenarios, latency may even be minutes as for disconnected or asynchronous operations. The large amount of XML metadata in SOAP messages is the main reason that Web services will require more network bandwidth and CPU times than RMI. All numeric and other data in web services are converted to text. Meta-data, defining structure, are provided as XML mark-up tags. XML parsers allow client and server implementations to construct their distinct but equivalent representations of any data structures. The use of HTTP, and XML text documents, supports increased interoperability but also represents a significant increase in run-time cost for web service solutions. The XML formatted documents are inherently more voluminous than the binary data traffic of the other approaches. More data and control packets have to be exchanged across the network. When considering performance alone, web services provide value when the overhead of parsing XML and SOAP is outweighed by the business logic or computation performed on the server. Although web services generally don’t provide value in performance, but do provide a convenient way to provide user interface, automatic firewall protection, mobility and heterogeneity of applications, transparent proxies, and thin clients. The most natural designs for distributed objects are easy to use but scale poorly, whereas web services have good scaling properties. VII. CONCLUSION Real-world deployment of pervasive computing technologies will often face the challenges of heterogeneity especially in the client side—that is, mobile client software systems need to run in heterogeneous environments involving various networks, protocols, hardware, and operating systems. Hence, design for heterogeneity is fundamental for deployable systems. As a result of the service-oriented development strategy we used in SOWCAS, we designed and implemented loosely coupled subsystems that can run (and fail) independently of each other. For example, since the server was implemented using RMI in [16], the clients must have been also implemented using RMI which makes their system Java dependent. In our approach, the mobile clients can be implemented using different programming languages. In SOWCAS’s SOA middleware approach which provides a loosely coupled system, each subsystem can accommodate and utilize the peculiarities of different platforms, networks, and languages and still work together as one system. Changing or updating one component doesn’t affect the entire system, and a breakdown in one component only affects part of the system’s functionality. Moreover, debugging a loosely coupled component is easier, and we were able to handle scalability issues for each subsystem. Finally, a loosely coupled system lets us use different languages depending on the challenge the particular subsystem presents.
A related technical strategy mitigating the challenges of robustness, fault tolerance, and performance is to use stateless components, which SOA also provides. In case of a serious error, it's possible to restart and reinitialize stateless components with no loss of state information. Ensuring stability by restarting stateless processes has been a simple but effective strategy in a pervasive computing environment. We observed in our test results that although the web service implementations are slower compared to RMI, the performance is acceptable for many real-time operations of a wireless middleware. The performance can be improved further if the issues described above are taken into consideration in middleware server designs based on SOA.
REFERENCES
[1] M. Weiser, "The Computer for the 21st Century," Scientific American, vol. 265, no. 3, pp. 66-75, September 1991.
[2] P. Bahl and V. N. Padmanabhan, "RADAR: An In-Building RF-Based User Location and Tracking System," Proceedings of IEEE Infocom 2000, Tel Aviv, Israel, vol. 2, pp. 775-784, March 2000.
[3] T. Erl, Service-Oriented Architecture: Concepts, Technology, and Design, Prentice Hall, 2005.
[4] A. Dey, G. D. Abowd and D. Salber, "A Conceptual Framework and a Toolkit for Supporting the Rapid Prototyping of Context-Aware Applications," Human-Computer Interaction, vol. 16, pp. 97-166, 2001.
[5] A. Harter, A. Hopper, P. Steggles, A. Ward and P. Webster, "The Anatomy of a Context-Aware Application," Wireless Networks, vol. 8, no. 2-3, pp. 187-197, 2002.
[6] F. Hohl, L. Mehrmann, and A. Hamdan, "A Context System for a Mobile Service Platform," Trends in Network and Pervasive Computing, LNCS, Springer Verlag, vol. 2299, pp. 21-33, March 2002.
[7] M. Roman, F. Kon and R. Campbell, "Reflective Middleware: From your Desk to your Hand," IEEE Distributed Systems Online, vol. 2, no. 5, 2001.
[8] C. Mascolo, L. Capra, W. Emmerich, "Principles of Mobile Computing Middleware," Middleware for Communications, Wiley, 2004.
[9] R. Sen, R. Handorean, G. C. Roman and C. Gill, "Service Oriented Computing Imperatives in Ad Hoc Wireless Settings," Service-Oriented Software System Engineering: Challenges and Practices, pp. 247-269, Idea Group Publishing, April 2005.
[10] T. R. Hansen, J. E. Bardram and M. Soegaard, "Moving Out of the Lab: Deploying Pervasive Technologies in a Hospital," IEEE Pervasive Computing, vol. 5, no. 3, pp. 24-31, 2006.
[11] H. Gümüşkaya, "An Architecture Design Process Using a Supportable Meta-Architecture and Roundtrip Engineering," Lecture Notes in Computer Science, Springer-Verlag, vol. 4243, pp. 324-333, 2006.
[12] H. Gümüşkaya and H. Hakkoymaz, "WiPoD Wireless Positioning System Based on 802.11 WLAN Infrastructure," Enformatika, vol. 9, pp. 126-130, 2005.
[13] B. Yurday and H. Gümüşkaya, "A Service Oriented Reflective Wireless Middleware," Lecture Notes in Computer Science, Springer-Verlag, vol. 4294, pp. 545-556, 2006.
[14] Place Lab Project: http://www.placelab.org.
[15] L. Maciaszek and B. L. Liong, Practical Software Engineering: A Case Study Approach, Addison Wesley, 2004.
[16] J. E. Bardram, "The Java Context Awareness Framework (JCAF) - A Service Infrastructure and Programming Framework for Context-Aware Applications," The 3rd International Conference on Pervasive Computing, LNCS, Springer Verlag, 2005.
[17] D. Davis and M. Parashar, "Latency Performance of SOAP Implementations," 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid, pp. 407-412, 2002.
[18] M. B. Juric, I. Rozman, B. Brumen, M. Colnaric and M. Hericko, "Comparison of Performance of Web Services, WS-Security, RMI, and RMI-SSL," The Journal of Systems and Software, vol. 79, no. 5, pp. 689-700, 2006.
[19] W. R. Cook and J. Barfield, "Web Services versus Distributed Objects: A Case Study of Performance and Interface Design," Proc. of the IEEE International Conference on Web Services (ICWS), Sept. 18-22, 2006.
Autonomous Classification via Self-Formation of Collections in AuInSys Hanh H. Pham State University of New York at New Paltz [email protected]
Abstract - This paper describes a mechanism, called SIF, for autonomous and unsupervised hierarchical classification of information items in AuInSys, a pro-interactive information system, where each single information element is represented by a compact software agent. The classification is carried out through the self-organization of information items which form the collections incrementally. The information items compete with each other through the self-interactions of their agents which lead to incremental grouping of items into collections. The proposed mechanism provides an automatic, more flexible, and dynamic partitioning of information compared with what is used in traditional file systems.
I. INTRODUCTION
Modern information systems overwhelm us with huge amounts of heterogeneous information. Because it is time-consuming, costly, difficult or sometimes even impossible to classify and use all available information and data, we need new and effective ways to classify and manage them. Information clustering plays a central role in this task: it divides the information items in a repository into groups, or clusters, that contain items with similar contents. Clustering is an unsupervised classification process in which groups of information items must be found without knowing the group structures in advance. It is based on the assumption that information items with similar contents are relevant to the same topics, keywords, or query. Information classification and clustering can be done completely or partially by human experts and users, as when computer users sort files into folders or directories, or automatically by machines, as when web search engines cluster web pages. This paper proposes a self-organized approach to automatic information classification, called self-clustering incremental formation (SIF), for clustering information items that are in text format or are indexed with text. The proposed method SIF is based on the use of software agents for capturing and representing every single information item in the repository. The interactions between these agents can lead to the virtual division or virtual combination of the information items they represent. This concept was also the foundation of AuInSys, an autonomic information management system described in [2]. AuInSys uses an autonomic approach to representing and processing information in which
heterogeneous and distributed information items become proactive entities capable of self-discovering, self-integrating, and self-organizing. The rest of this paper is organized as follows. Section 2 reviews related work on automatic information classification. Section 3 describes the classification problem in AuInSys, the autonomic information system in which the proposed clustering mechanism takes place. Section 4 explains in detail how SIF works. The effects of selection conditions and distance measurements are studied in Section 5. Finally, a brief assessment and possible directions of future work are discussed in Section 6.
II. RELATED WORKS
The foundation of information clustering is the clustering hypothesis, which states that documents with similar contents tend to be relevant to the same query [3]. In information classification, a repository of information items is partitioned into meaningful groups, or collections, of items with similar contents [1]. This can be done via (i) unsupervised methods, which cluster items into groups without knowing the structures of the groups [6], or (ii) supervised methods, which categorize items into groups using predefined patterns, structures, or specifications of group semantics [7]. Clustering is about discovering possibly related groups of items, while categorization is about deciding whether or not an item belongs to given categories. In both cases, information items can be evaluated using (i) similarity measures such as Cosine, Dice, or Jaccard, or (ii) distance measures such as Metric, Euclidean or Mutual Neighbor distances. Clustering mechanisms can be divided into (i) hierarchical algorithms, which produce nested groups or clusters, and (ii) non-hierarchical partitioning algorithms, which produce flat partitions of the repository. Hierarchical mechanisms are slower than non-hierarchical ones but can provide classifications with relationships between the clusters. Non-hierarchical partitioning algorithms include the k-means clustering family, such as batch k-means and incremental k-means, which assigns each item or point to the cluster whose center (the centroid, defined as the average of all points in the cluster) is nearest. Hierarchical mechanisms can be based on (i) a decompositional (top-down, divisive) approach that breaks the repository into clusters, such as Scatter/Gather [5], or (ii)
an agglomerative (bottom-up) approach that builds clusters from individual items, such as the Ward method. A comparison of five representatives of the bottom-up hierarchical approach is provided in [4] for the single link, complete link, group average, weighted average, and Ward methods. The single link method is fast and easy to implement but usually forms big clusters. First, the similarity matrix is formed, where the similarity of clusters is based on their most similar points. Then the two most similar information items, or the most similar points of two clusters, are joined. In each step, the next two most similar and still-free items from the remaining pool join a cluster, until no item in the repository is left free. The complete link method is also fast, although a bit more difficult to implement, and it forms smaller clusters than the single link method. The other difference is that the similarity of clusters is based on their least similar points, while the way the clusters are formed remains the same as in the single link method. The group average method defines the similarity of information items based on the average similarity of the points or components of the clusters. The weighted average method adds weights when calculating the average similarity. The Ward method uses the Ward ratio to define similarity, basing the cluster analysis on the analysis of variance instead of distances. To deal with time constraints and the long computation time of the hierarchical approach, several fast clustering algorithms [10] were proposed for browsing through collections by topic [11]. For web pages, fast clustering mechanisms such as Suffix Tree Clustering were proposed, in which clusters are created from the phrases shared between web pages. Other methods utilize projection techniques to speed up the distance calculations of clustering [9]. In this paper, our proposed clustering method belongs to the hierarchical bottom-up, or agglomerative, approach. The information items, represented by agents, are incrementally grouped into collections through the interactions of their agents.
III. CLASSIFICATION PROBLEM IN AUINSYS
The proposed hierarchical clustering method SIF is designed for pro-active information systems such as AuInSys. AuInSys can be viewed as a repository of information items which can have different formats, but we assume that each information item Iu, u = 1..n, can be indexed with terms or keywords. The key feature of AuInSys is that each information item is captured and represented by a generic agent, an autonomous and compact program that can monitor the item and act without human supervision. The repository population Po(t) = {I1, I2, …, In} is the set of all information items in the system at time t. Each item Ik is represented and pro-actively managed by a generic agent Ak. The generic agents in the agent set A = {A1, A2, …, An} are compact and uniform software agents which can run and operate independently. Each information item Iu, u = 1..n, can be indexed or characterized by a set of weighted terms called its item index, denoted DEX(Iu) = {(k1, β1), (k2, β2), .., (kx, βx)}, where the terms k1, k2, .., kx ∈ Ќ (Ќ being the set of all terms) and the weights β1, β2, .., βx ∈ Ds (Ds being the domain of weight coefficients). These weights can be defined from the term frequencies as follows:
β_m = f_m / ∑_{h=1}^{X} f_h,   for a term k_m, m = 1..X,

where X is the number of terms considered in the given information item Iu and f_h is the frequency of term k_h in Iu, h = 1..X.
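As an illustration of this weighting, here is a minimal sketch (not part of AuInSys; it assumes a trivial whitespace tokenizer and treats every distinct token as a term k_m):

```python
from collections import Counter

def dex_index(text):
    """Compute DEX(I_u): term weights beta_m = f_m / sum_{h=1..X} f_h."""
    terms = text.lower().split()           # assumed trivial tokenizer
    freqs = Counter(terms)                 # f_m for each term k_m
    total = sum(freqs.values())            # sum of all term frequencies
    return {term: f / total for term, f in freqs.items()}

# Example: weights of the terms of one information item
print(dex_index("grid agent grid item agent grid"))
# {'grid': 0.5, 'agent': 0.333..., 'item': 0.166...}
```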
Fig. 1. Classification in AuInSys (input: item indexes DEX(Iu) of the repository Po(t) = {I1, I2, …, In}, each item managed by an agent; output: collections of classified items)
Thus, DEX(Iu) shows which terms (k1, k2, .., kx) represent the given information item Iu and how much (β1, β2, .., βx) the item relates to each of these terms. Since the information items in AuInSys are dynamic, i.e. their contents may be modified or replaced at any time, the agent Au which represents Iu monitors it and updates its index DEX(Iu) whenever a change occurs. In order to manage the information items in AuInSys we need to group them into collections of related contents. A collection Cv can have a number Sv of members, and not only information items but also other collections can be members of a collection. In other words, information items play the role of "files" and collections play the role of "folders/directories" in a regular file system. However, unlike in a file system, where human users classify files into folders, in AuInSys the information items and the collections themselves need to be clustered into collections automatically. The requirements of the clustering problem in AuInSys are illustrated in Fig. 1. Given n information items, each characterized by its index DEX(Iu), u = 1..n, these n items must be grouped into collections based on how they relate to each other, without any predefined structure or characteristics of the collections, including their number. With such requirements the clustering method should be dynamic and hierarchical.
IV. SELF-CLUSTERING INCREMENTAL FORMATION (SIF)
We propose a self-clustering and incremental mechanism called SIF to solve the clustering problem in AuInSys and similar information systems, where information items relate to each other in multiple dimensions, through a number of terms (k1, k2, .., kx). In AuInSys, in order to group information items into collections, the agents representing these items must first independently define the item indexes DEX; then these agents must cooperatively build up the collections and incrementally add more items or collections to the existing ones. Our algorithm for self-clustering incremental formation can be described as follows:
Step 1: For u = 1..n, the agent Au of each information item Iu defines the item index DEX(Iu) = {(ku1, βu1), (ku2, βu2), .., (kuXu, βuXu)}, where Xu is the number of terms used in the given information item Iu. Since each agent represents one item, these agents can work in parallel, which reduces the system's indexing time for n items. The indexing results are:
DEX(I1) = {(k11, β11), (k12, β12), .., (k1X1, β1X1)}
DEX(I2) = {(k21, β21), (k22, β22), .., (k2X2, β2X2)}
…
DEX(In) = {(kn1, βn1), (kn2, βn2), .., (knXn, βnXn)}
For example, with a repository of 10 items using 10 terms, we have the following weights (β) in the indexes:

      k1    k2    k3    k4    k5    k6    k7    k8    k9    k10
I1    5.6   0.0   2.0   2.4   1.0   7.8   0.0   3.8   6.0   0.0
I2    6.2   0.0   5.3   5.4   6.3   0.0   0.0   9.1   8.5   5.5
I3    0.0   3.1   0.0   5.3   0.0   5.8   1.1   6.9   5.3   6.2
I4    0.0   0.0   6.0   3.8   0.0   7.6   0.0   7.0   8.4   1.8
I5    0.0   0.0   0.0   1.1   6.5   5.5   4.2   6.1   2.8   6.0
I6    4.2   0.0   1.3   7.5   0.0   9.0   7.2   8.9   0.0   0.0
I7    6.1   2.3   1.8   5.1   5.4   6.4   3.2   9.9   4.8   6.7
I8    0.0   2.9   6.1   6.6   4.6   1.5   0.9   0.0   0.0   8.8
I9    0.0   1.3   0.0   4.5   0.0   0.9   7.8   2.0   9.3   0.4
I10   9.8   0.0   0.0   0.0   7.3   6.8   0.1   4.9   0.0   5.1
Fig. 2. Weights (β) for 10 items and 10 terms

Step 2: The agents cooperate with each other to define the similarity Sim(Ig, Id) between each pair (Ig, Id) of information items based on their indexes. First, the agents define the common terms which have non-zero weights in the indexes of both Ig and Id:
Com(Ig, Id) = { (k_z, β_z) : k_z ∈ DEX(Ig) and k_z ∈ DEX(Id), β_z = min{β_zg ∈ Ig, β_zd ∈ Id} }

For example, in the given repository:
Com(I1, I2) = {(k1, 5.6), (k3, 2), (k4, 2.4), (k5, 1), (k8, 3.8), (k9, 6)}

Then, the similarity is defined as the sum of all shared weight portions of the common terms:
Sim(Ig, Id) = ∑_{k_r ∈ Com(Ig, Id)} min{β_{g,k_r}, β_{d,k_r}}
In the given example: Sim(I1, I2) = 5.6 + 2 + 2.4 + 1 + 3.8 + 6 = 20.8.
Step 3: Initially, assign each item to its own primary collection. Thus we have n collections, each with only one item. The distances between these collections are the same as the distances between the items in them.
Step 4: Find the most similar pair of collections and merge them into a new collection. Here, the similarity between two collections is defined by the maximal similarity between any member of one collection and any member of the other,

Sim(Cu, Cv) = max_{Ig ∈ Cu, Id ∈ Cv} Sim(Ig, Id),

and the pair with the maximal Sim(Cu, Cv) among all pairs of collections is merged.
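A small sketch of Step 2 (plain Python dictionaries stand in for the agents' item indexes; the function names are hypothetical) that reproduces the worked example above:

```python
def com(dex_g, dex_d):
    """Common terms of two item indexes, each with the minimum of the two weights."""
    shared = set(dex_g) & set(dex_d)
    return {k: min(dex_g[k], dex_d[k]) for k in shared
            if dex_g[k] > 0 and dex_d[k] > 0}

def sim(dex_g, dex_d):
    """Sim(I_g, I_d): sum of the shared weight portions of the common terms."""
    return sum(com(dex_g, dex_d).values())

# The I1 and I2 rows of Fig. 2 (only non-zero weights shown)
I1 = {'k1': 5.6, 'k3': 2.0, 'k4': 2.4, 'k5': 1.0, 'k6': 7.8, 'k8': 3.8, 'k9': 6.0}
I2 = {'k1': 6.2, 'k3': 5.3, 'k4': 5.4, 'k5': 6.3, 'k8': 9.1, 'k9': 8.5, 'k10': 5.5}
print(sim(I1, I2))   # 20.8, matching the worked example above
```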
Step 5: Calculate the similarities between the new collection and each of the existing collections. Repeat steps 4 and 5 until all n information items in AuInSys are grouped into a single collection of size n. The result of this clustering is a hierarchical set of core collections from which the final collections of items can be built according to user needs. For instance, for the example data in Fig. 2 we obtain the hierarchical clustering results shown in Fig. 3. Based on this partitioning and a depth threshold for collections Ts (1:n), we can take several cuts or slices, S1, S2, S3, S4, and S5, which lead to different options for defining the final collections of items in AuInSys, as follows:
For the S1 cut: C1 = {7,10}, C2 = {2}, C3 = {9,8}, C4 = {4}, C5 = {3}, C6 = {5}, C7 = {6}, C8 = {1}
For the S2 cut: C1 = {2,7,10}, C2 = {9,8}, C3 = {4,3}, C4 = {5}, C5 = {6,1}
For the S3 cut: C1 = {2,7,10}, C2 = {9,8}, C3 = {4,3,5}, C4 = {6,1}
For the S4 cut: C1 = {2,7,10,9,8}, C2 = {4,3,5}, C3 = {6,1}
For the S5 cut: C1 = {2,7,10,9,8}, C2 = {4,3,5,6,1}
It is noticeable that the higher the cut (S5), the smaller the number of collections (2) and the bigger the clusters (5 items), whereas at the opposite end (S1) the number of collections is much higher (8) and the clusters are much smaller (1 or 2 items).
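The merge loop of Steps 3-5 can be sketched on a single machine as follows (the agent interactions are replaced by an ordinary loop, and `sim` is the pairwise item similarity of Step 2; this is an illustration, not the AuInSys implementation):

```python
def sif_merge(items, sim):
    """Steps 3-5: start with singleton collections and repeatedly merge the most
    similar pair; the merge history is the hierarchy later cut at a threshold Ts."""
    collections = [{i} for i in items]        # Step 3: one item per primary collection
    history = []
    while len(collections) > 1:
        # Step 4: collection similarity = maximal item-to-item similarity
        a, b = max(((x, y) for x in collections for y in collections if x is not y),
                   key=lambda pair: max(sim(g, d) for g in pair[0] for d in pair[1]))
        collections = [c for c in collections if c is not a and c is not b]
        merged = a | b
        collections.append(merged)            # Step 5: similarities recomputed next pass
        history.append(merged)
    return history                            # cutting this hierarchy yields S1..S5
```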
Fig. 3. Results of clustering

Thus, by changing the collection threshold for slicing (Ts) we can adjust the collection granularity and the density of items in the collections. With this feature of SIF we can build the "folders/directories" in AuInSys dynamically instead of having them fixed as in traditional file systems.
V. EFFECTS OF CLUSTERING DIRECTIONS
In our hierarchical clustering method, the condition for selecting the next winner agent/item to interact with, and so make its item join a cluster, plays a decisive role in how the items are grouped. In addition, as always, the distance measurement also significantly changes the clustering results. In this section we study the effects of both (i) the selection conditions and (ii) the distance measurement on the clustering results of the proposed self-clustering incremental formation (SIF) method. We ran SIF with three different selection conditions in combination with two kinds of distance measurements on the same repository of 50 items. The three conditions for selecting the winner are:
• Distance by the most similar member
• Distance by the least similar member
• Distance by the average of the members
The two distance measurements are the Euclidean and Manhattan distances. The experimental results for the six combinations are shown in Figs. 4-9.
Fig. 4. Clustering with Average Similar and Euclidean distance
Fig.7. Clustering with Average Similar and Manhattan distance
Fig. 5. Clustering with Least Similar and Euclidean distance
Fig.8. Clustering with Least Similar and Manhattan distance
Fig. 6. Clustering with Most Similar and Euclidean distance
Fig.9. Clustering with Most Similar and Manhattan distance
VI. CONCLUSIONS
We have presented a hierarchical clustering mechanism called SIF for information classification in dynamic and pro-active information systems such as AuInSys [2]. This mechanism carries out the clustering through incremental grouping of information items, implemented by the interactions of the agents representing these items. The advantages of this mechanism are that: (i) it provides dynamic partitioning and dynamic collections of items, which makes the information system, AuInSys in this case, more flexible and more powerful than traditional file systems; (ii) the classification is carried out through agents which can work in parallel, so it is well suited to distributed information systems; (iii) it is responsive to any changes in the information items, as the agents autonomously monitor their contents. In the future we plan to build and add an intelligent mechanism that defines the most suitable partitioning of information items into collections for AuInSys users, based on their tasks and on the hierarchical clustering results provided by SIF.
REFERENCES
[1] Fabrizio Sebastiani, "Classification of text, automatic," The Encyclopedia of Language and Linguistics, Vol. 14, Elsevier Science Publishers, 2005.
[2] Hanh Pham, "AuInSys: an Autonomic Information System," in Proceedings of the 2004 International Conference on Information and Knowledge Engineering (IKE'04), pp. 102-108, ISBN 1-932415-27-0, CSREA Press, Las Vegas, USA, June 21-24, 2004.
[3] C. J. van Rijsbergen, Information Retrieval, Butterworths, 1979.
[4] Peter Willett, "Recent Trends in Hierarchic Document Clustering: A Critical Review," Information Processing and Management, Vol. 24, No. 5, pp. 577-597, 1988.
[5] Douglass Cutting, David R. Karger, Jan O. Pedersen and John W. Tukey, "Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections," in Proceedings of ACM/SIGIR, pp. 318-329, 1992.
[6] S. H. Srinivasan, "Features for unsupervised document classification," in Proceedings of the 6th Conference on Natural Language Learning, pp. 1-7, August 31, 2002.
[7] Thorsten Joachims, "Text Categorization with Support Vector Machines: Learning with Many Relevant Features," in Machine Learning: ECML-98, 10th European Conference on Machine Learning, Proceedings, pp. 137-142, 1998.
[8] Douglass Cutting, David R. Karger, Jan O. Pedersen and John W. Tukey, "Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections," in Proceedings of ACM/SIGIR, pp. 318-329, 1992.
[9] Hinrich Schutze and Craig Silverstein, "Projections for efficient document clustering," SIGIR Forum (ACM Special Interest Group on Information Retrieval), pp. 74-81, 1997.
[10] Mehran Sahami, Salim Yusufali and Michelle Q. W. Baldonado, "Real-time full-text clustering of networked documents," in Proceedings of the National Conference on Artificial Intelligence, p. 845, 1997.
[11] Javed Aslam, Katya Pelekhov and Daniela Rus, "Using Star Clusters for Filtering," CIKM 2000.
Grid Computing Implementation in Ad Hoc Networks
Aksenti Grnarov, Bekim Cilku, Igor Miskovski, Sonja Filiposka and Dimitar Trajanov
Dept. of Computer Sciences, Faculty of Electrical Engineering and Information Technology, University Ss. Cyril and Methodious Skopje, Skopje, R. Macedonia
[email protected]
Abstract – The development of ubiquitous computing and mobility opens challenges for the implementation of grid computing in ad hoc network environments. In this paper, a new grid computing implementation for ad hoc networks is proposed. The proposed addition to the ad hoc network protocol suite offers an easy and effective way to exploit the computing power of the network nodes. The model is implemented in the NS-2 network simulator, providing the possibility to investigate its performance and tune the grid parameters of the network.
Index Terms – Grid Computing, Ad Hoc Networks, Service Deployment, Resource Discovery.
I. INTRODUCTION
Not long ago, the target of Grid researchers in their projects and explorations was wired networks. Recently, however, there has been a fair amount of research on using computational grids over mobile environments [1][2], driven by the expansion of mobile networks and mobile technology and the steady rise of their performance and computational capabilities. Potential ad hoc grid applications would allow nodes to accomplish specific tasks beyond an individual node's computing capacity. Ad hoc networks are usually used for emergency applications where results are needed immediately; however, some calculations are time consuming and need to be done in the harshest environments. In order to obtain the results more quickly, these types of applications can be deployed in an ad hoc grid. Typical applications lie in the fields of disaster management, e-healthcare emergency, resource sharing, collaborative computing, etc. Current work on ad hoc grid implementation can be divided into three groups. The first group deals with the challenges of establishing ad hoc mobile grids. In [3] various challenges in ad hoc grids, such as resource description, resource discovery, coordination systems, trust establishment and clearing mechanisms, are identified. In [4] different problems, such as finding the resources, scheduling the tasks, node discovery, node property assessment, service deployment, service security and Quality of Service, are analyzed. The second group deals with the challenges of resource discovery. Several papers concerning resource discovery and task execution in the mobile Grid environment have been
published. In [5] the use of service advertisement as a method for resource discovery is proposed, yet this method is clearly inadequate for dynamic resources such as CPU load, available memory, battery lifetime and the available time of the nodes. A different solution for resource discovery is proposed in [6]. It is based on reactive protocols and is characterized by a new module for finding resources with the help of the existing AODV routing protocol. However, this approach shows only the architecture of the process responsible for finding the necessary services, and another drawback is that the AODV packets are changed, which means modifying a standardized protocol. In [7] reactive routing protocols are used and the nodes exchange messages about their own resources and their neighbors' resources. In [8] agents are used to find the nodes that will execute certain tasks. The last two models overload the network with messages even though some nodes will not be used in the execution process. The third group deals with task scheduling in the ad hoc grid environment. One existing solution that addresses this problem is the DICHOTOMY protocol, whose main function is to allow resource provisioning to be scheduled among the most resourceful nodes in the mobile grid while mitigating the overhead of the discovery messages exchanged among them [9]. Having in mind the needs of the wired Grid, an ad hoc implementation needs to offer: (1) finding resources only when there is a need to execute a given task, (2) minimum execution time, (3) no influence on the upper or lower layers, (4) rescheduling if a certain node goes down, (5) implementation on every platform and (6) data protection. It must also have the following characteristics: (1) keep pace with the dynamic nature of the network, (2) not overload the network, (3) use different scheduling algorithms in order to improve the overall execution time, (4) intervene in case of node failure, (5) authenticate and authorize the nodes when they join the network. Starting from the needs stated above, we have created a new implementation of Grid in ad hoc networks realized as a separate layer. This new independent Ad Hoc Grid Layer (AHGL) provides scalability without affecting the other layers of the network architecture. The AHGL reduces unnecessary traffic in the ad hoc grid environment because it includes only the nodes that satisfy the criteria for the execution of certain jobs. This paper
also illustrates how services are deployed at the nodes in the ad hoc network. The rest of the paper is organized as follows. Section 2 defines the position of the AHGL in the network architecture and describes the services implemented in the AHGL. Section 3 gives a sequential description of the AHGL in master mode, while Section 4 describes the AHGL in slave mode. Section 5 presents the implementation of the AHGL in the network simulator. Section 6 presents the simulated scenario and analyzes the obtained results. Section 7 presents the conclusion and future work.
II. AD HOC GRID LAYER (AHGL)
The Ad Hoc Grid Layer is designed as a separate layer in the network architecture of all the nodes in the ad hoc grid environment (Fig. 1). Depending on the function of the node, the layer can be in Master or Slave mode. The AHGL of a given node is in master mode when that node is supposed to find free resources throughout the ad hoc network, to schedule jobs and to dispatch them to the rest of the nodes. The AHGL of nodes that are meant to receive jobs, execute them and reply with the results is in slave mode.
Fig. 1 Position of AHGL in the ad hoc grid network architecture (between the application layer and the ad hoc routing protocol; master services: application interface, resource finder, scheduler, dispatcher; slave services: resource responder, task execution)
All of these layer functions are realized by means of different services, and depending on the role of the node the particular services are activated. Six services are implemented in the AHGL: application interface, resource finder, scheduler, dispatcher, resource responder and job execution.
A. Application Interface Service
The first goal of the AHGL is to divide the application into small grid jobs with the structure shown in Fig. 2. This is done by the application interface service. Taking into consideration that every node may have several applications to execute, they are distinguished by the field IDapp. Jobs which are part of the same application must also be distinguished; this is done using the IDjob field. The field IDSW specifies the software that is needed for proper execution of the job, and the data needed to execute the job is in the DATA field.
Fig. 2 Representation of a job
According to the specified software (IDSW), the application interface service calculates the size and the number of instructions of every job. These parameters are used to check whether the job can be placed in the available memory of a node and to calculate the time needed for job execution. The output of this service is a table of parameters (Fig. 3). The first and second columns (IDapp and IDjob) identify the jobs, the third column holds the length of the job in bytes, and the fourth the number of instructions (Nins).
Fig. 3 Example of the parameters table
For example, the first row presents the parameters of a job that belongs to the first application; the job has ID = 12, a size of 4378 KB and 212 instructions.
B. Resource Finder
The dynamic properties of ad hoc networks do not allow a central register node. This is the reason why we have implemented a service which discovers free resources across the network at the moment when there are incoming jobs to be executed. This is done by the resource finder service, which generates an AHGL request packet and broadcasts it to all available nodes in the network. The structure of the AHGL request packet is shown in Fig. 4.
Fig. 4 AHGL request packet
The first field is the destination of the packet (IDnode), while the second field (SEQ) is used to distinguish different requests from the same node and to avoid endless loops. To reduce traffic and to exclude the nodes that cannot execute even the smallest jobs, we use a minimalism criterion. The following fields are parameters that every node must satisfy in order to be considered part of the ad hoc grid environment: the minimum processor speed in GHz (CPU), the minimum free memory in KB (MEM) and the minimum amount of battery (BAT). Nodes that satisfy the given criterion reply with the AHGL reply packet shown in Fig. 5. Using this packet, the nodes announce their processor speed (CPU), amount of free memory (MEM) and battery status (BAT). For the later calculation of the time needed to transmit a packet we also count the number of hops needed to reach the node (H). Nodes that offer free resources are available for a certain period of time; this free period starts at TS and ends at TE. During this period, the node can be used for a given effective interval (TEFF).
Fig. 5 AHGL reply packet
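To make the packet formats of Figs. 4 and 5 concrete, here is a hedged sketch of the request/reply fields and of the minimalism criterion described above (the Python types and class names are assumptions; the paper does not prescribe an encoding):

```python
from dataclasses import dataclass

@dataclass
class AHGLRequest:                 # fields of the AHGL request packet (Fig. 4)
    id_node: int                   # destination of the packet (IDnode)
    seq: int                       # SEQ, distinguishes requests from the same node
    cpu: float                     # minimum processor speed in GHz
    mem: int                       # minimum free memory in KB
    bat: float                     # minimum battery level

@dataclass
class AHGLReply:                   # fields of the AHGL reply packet (Fig. 5)
    id_node: int
    cpu: float
    mem: int
    bat: float
    hops: int                      # H, hops needed to reach the node
    t_start: float                 # TS, start of availability
    t_end: float                   # TE, end of availability
    t_eff: float                   # TEFF, effective interval offered

def satisfies(req: AHGLRequest, cpu: float, mem: int, bat: float) -> bool:
    """Minimalism criterion: a node replies only if it meets all of the minimums."""
    return cpu >= req.cpu and mem >= req.mem and bat >= req.bat
```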
The resource finder service creates the resource table, presented in Fig. 6, from the received AHGL reply packets; it is afterwards used by the next service (the scheduler) as input for the execution-time optimization.
Fig. 6 Example of resource table
For example, the first row presents the information obtained from the node with ID = 5, which has a processor speed of 2.5 GHz and free memory of 284850 KB and is reachable in two hops from the master. It is available in the time interval between 935 s and 8625 s, offers an effective interval of 2500 s, and has a round trip time (RTT) of 1.7452 seconds.
C. Scheduler
The purpose of this service is to find the optimal time for the execution of all jobs using a given scheduling algorithm, where the process is influenced by different parameters. Such parameters are: the start and end time of node availability (TS, TE), the effective time (TEFF), the processor speed (CPU), the number of instructions (Nins), the job size, the network throughput (R), the number of hops, the round trip time (RTT) and the job priority. All these parameters are read from the resource table and the parameters table. The scheduling process first checks the availability time of the node (TEFF). It also checks whether the node offers enough time for the execution of the job and whether the free memory of the node is big enough for the size of the job. If these criteria are met, the job is assigned to that node and the node's effective time is updated. The result of the algorithm is a dispatch table (Fig. 7) announcing which job is to be executed at which node.
Fig. 7 Example of dispatch table
For example, the first row represents a job with ID = 14 that belongs to the first application and is assigned to node 12 for execution.
D. Dispatcher
The dispatcher is the fourth service in the AHGL and its job is to dispatch the jobs to the appropriate nodes and to forward the obtained results to the application interface service. This service uses the dispatch table, the parameters table and the queue of jobs, and it creates the job packet (Fig. 8). The parameters table is used by the dispatcher to send jobs to the assigned node as long as the node has a free amount of memory. The packets are transferred to the lower layers and, using the ad hoc routing protocol, sent to the destination nodes.
Fig. 8 Job packet
The first field, IDnode, is used as the destination of the job packet, while the other fields represent the job (IDapp, IDjob, IDSW and DATA). After the execution of the job, the node generates a result packet (Fig. 9).
Fig. 9 Result packet
The first field (IDnode) is the ID of the master node, and the fields IDapp and IDjob represent the application and the job for which the results are intended. Because of the nature of ad hoc networks, there is a real possibility that a node will not send a result, as a consequence of a node failure or a connection failure. In this case, the jobs assigned to that node must be rescheduled.
E. Resource Response Service
This service monitors the available resources of a given node and responds when AHGL request packets are received. First it checks the sequence number (SEQ) of the request, then it compares the fields processor (CPU), free memory (MEM) and battery energy (BAT) from the packet with the actual values of the node. If the values of the node are larger, the criteria are satisfied and the node generates an AHGL reply packet (Fig. 5), which is then transferred to the lower ad hoc routing layer.
F. Job Execution Service
Job execution is the last service and is activated when a job arrives at the slave node. The job is forwarded to the upper layer for execution, and the obtained results are packed and sent to the master node.
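A rough sketch of the scheduler's feasibility checks, using the First Come First Served order that the authors later use in their simulations; the cost model `n_ins / cpu` and the memory bookkeeping are simplifying assumptions, not the paper's exact algorithm:

```python
def schedule_fcfs(jobs, resources):
    """Assign jobs to nodes in First-Come-First-Served order, checking the node's
    effective time (TEFF) and free memory, as described above.
    Returns a dispatch table of (id_app, id_job, id_node) rows."""
    dispatch = []
    for job in jobs:                                   # job: dict with id_app, id_job, size, n_ins
        for node in resources:                         # node: dict with id_node, cpu, mem, t_eff
            exec_time = job["n_ins"] / node["cpu"]     # assumed cost model
            if node["t_eff"] >= exec_time and node["mem"] >= job["size"]:
                dispatch.append((job["id_app"], job["id_job"], node["id_node"]))
                node["t_eff"] -= exec_time             # update the node's effective time
                node["mem"] -= job["size"]             # assumed memory bookkeeping
                break
    return dispatch
```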
III. AHGL IN MASTER MODE
The work flow of a node in master mode is shown in Fig. 10. In step 1, after receiving an application for grid computing, the application interface service separates the application into smaller jobs, forming a queue. In the next step the service calculates the parameters of every job and forms the parameters table. In step 3 the resource finder service is activated. This service searches for free resources in the network using an AHGL request packet, which is broadcast to all the nodes in the network (step 4). The nodes that receive the packet send an AHGL reply packet with information on the free resources that they have (step 5). Using all of the collected replies (step 6), the service generates a resource table with the free resources of the nodes that satisfy the criterion on the minimal amount of free memory, processor speed and battery status. This table, together with the parameters table, is then used by the scheduler (step 7). The scheduler uses a given algorithm to find the optimal execution time of all the jobs of a certain application by deciding which job will be executed at which node. The output of the scheduler is a dispatch table which describes which job will be executed at which node (step 8). The dispatch table, the parameters table and the job queue are the input of the fourth service, the dispatcher (step 9). This service is responsible for dispatching the jobs to the corresponding slave nodes (step 10). After the completion of the jobs, in step 11, the dispatcher waits for the results from the slave nodes. All of the results are sent to the application interface service (step 12). The last step is performed when the application interface service transfers the result to the upper layer (step 13).
Fig. 10 AHGL in the master mode
IV. AHGL IN SLAVE MODE
The AHGL in slave mode is much simpler. The slave node has to announce its free resources if it fulfills the given criteria, and it executes the jobs which are dispatched to it. This is achieved with the activation of the last two services: the resource responder and the job execution service. The work flow of this mode is shown in Fig. 11.
Fig. 11 AHGL in slave mode
In step 1, the node receives an AHGL request packet generated by the master. It generates an AHGL reply packet
filled with its free resources (step 2), which is then sent to the master node. Upon receiving a packet with a job description (step 3), the job execution service transfers the job to the application layer (step 4). When the job is finished, the results are sent back to the job execution service (step 5). The result is then packed into a result packet and sent back to the master node (step 6).
V. IMPLEMENTATION IN NS-2
For the purpose of analyzing the performance of the ad hoc grid, the previously introduced AHGL was implemented as an extension to the commonly used NS-2 network simulator [10]. The AHGL is implemented using two separate application-layer agent types derived from the Agent class, FindResources and ExecuteTasks, which simulate the behavior of the corresponding resource finder and task execution services, respectively. For each of the agents the analogous packet types and header control fields are defined. The agents also keep track of the specific characteristics of the node where they are activated. In this way, we are able to use the existing routing and other lower-level protocols without making any changes to their implementation. The FindResources agent is started at each node at the start of the simulation. When working in master mode, the agent floods the ad hoc network by broadcasting a resource request packet, after which it awaits the responses from the slaves. For each received reply, the agent updates the resource table with the received parameters of the packet's source slave node. In order to compensate for packets lost in the network, after a certain wait time the agent stops awaiting responses and goes into an idle state. The agent activated in slave mode is programmed to create a reply packet when it receives a resource request packet and it satisfies the required characteristics concerning the processor, battery and memory. The slave
fills out the reply packet with the necessary information about its characteristics and then unicasts the packet to the request source, i.e. the master, after which it goes into an idle state. After replying to the master, the slave continues the network flooding process; thus, upon receiving another request packet (because of the network flooding process), it drops the packet with no other action. The ExecuteTasks agent is also started at each node and runs in two modes, master and slave, depending on the node type. At the master, the agent reads the resource table and the scheduled tasks table in order to obtain information about the next task that needs to be executed. For each task that needs to be executed, the agent determines the destination, creates one or several packets that contain the task and sends them to the slave for processing, after which it marks the slave as busy. When the master receives a reply from one of the slaves informing it of the completion of the task and containing the corresponding response, the master checks whether there are more tasks for the given slave and, if this is the case, creates new packets with the next task that is to be executed at that slave. The constraints regarding the availability time of the slaves are taken into consideration by scheduling all task packets to be sent in the time interval when the corresponding slave is known to be available, which is done using NS-2's own Scheduler class. When in slave mode, this agent responds to a received task packet by creating a reply packet which contains a bogus result; the reply is scheduled to be sent to the master after the calculated delay, to simulate the time needed to compute the actual result. The total time needed for the execution of each task in the ad hoc grid is calculated at the master as the difference between the time stamp of the sent task packet and the time stamp of the received reply for the given task from the slave. In order to simulate the work of the ad hoc grid in a smooth, fully featured manner, an AHGTool was developed. The AHGTool first reads the values of the adjustable parameters, such as the number of nodes, the number of tasks, the scheduling algorithm to be used and so on, and afterwards creates tcl scripts which are used for the simulation of the ad hoc grid behavior in NS-2. First the ad hoc network topology is defined, then the FindResources agent is activated at each node and the resource finder service is simulated, afterwards the scheduler activity is performed and, in the end, the ExecuteTasks agents are activated and the task execution service is simulated. After all of the ad hoc network activity is over, the results are automatically processed in order to obtain the desired values of the observed parameters of the network.
VI. SIMULATION
This section presents practical results of the implementation of the AHGL in NS-2. The simulations were done in order to analyze the effect of the node density on the time needed for the execution of a certain application. The application contains 100 jobs, and the networks have different numbers of nodes ranging from 20 to 50. The nodes are randomly scattered on a 600 x 600 m terrain. The CPU speed is in the range from 0.5 to 3.5 GHz and every node has a physical memory of 512 MB. The job size is from 5 to 30 KB and the number of instructions of one job is calculated as the square of the job size. As the scheduling algorithm for dispatching the jobs among the nodes we used the First Come First Served algorithm. In order to observe the influence of the network density on the time needed for the execution of the application, we measured the speedup, defined as the ratio between the execution time of the application on one node and its execution time in an ad hoc grid with a given number of nodes. The time for the execution of all the jobs is measured from the moment the first job is dispatched in the AHGL until the last result is received. The speedup calculated from the simulations is shown in Fig. 12. It can be concluded that by increasing the number of nodes, the execution time of a certain application decreases. When the number of nodes reaches some threshold, the obtained speedup saturates or grows with less intensity.
Fig. 12 Speedup achieved in the Grid environment relative to one node (x-axis: number of nodes / number of jobs)
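For reference, the speedup metric used in Fig. 12 can be computed as below (the numbers in the example are illustrative only, not taken from the simulations):

```python
def speedup(t_single_node: float, t_grid: float) -> float:
    """Speedup = execution time on one node / execution time in the ad hoc grid."""
    return t_single_node / t_grid

# e.g. an application that needs 840 s on one node and 150 s on the ad hoc grid
print(speedup(840.0, 150.0))   # 5.6 (illustrative values)
```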
VII. CONCLUSION
In this paper we presented a new layer for grid computing in ad hoc networks called the Ad Hoc Grid Layer (AHGL). This layer is the result of implementing the grid services in an ad hoc network. Its position in the network architecture makes it independent of the upper and lower layers and allows new services to be added without changing the other layers. The AHGL was created in a manner that corresponds to the specific features of ad hoc networks. It is designed to adapt to the dynamic network environment with a high degree of fault tolerance. The proposed AHGL is implemented in the NS-2 simulator. The implementation allows us to investigate the advantages and possible weaknesses of the model. It also gives us the opportunity to optimize the different parameters that influence the performance of the ad hoc grid. The scheduler service of the AHGL offers a large number of different parameters of the network and the nodes
that allow the implementation of different scheduling algorithms. In our future work we intend to conduct research to discover how new scheduling algorithms affect the performance of the AHGL. The main goal of the research will be to analyze when grid computing is effective in the ad hoc environment.
REFERENCES
[1] J. Hwang and P. Aravamudham, "Middleware services for P2P computing in wireless grid networks," IEEE Internet Computing, 8(4):40-46, 2004.
[2] S. Kurkovsky, Bhagyavati, and A. Ray, "A collaborative problem-solving framework for mobile devices," in ACM Southeast Regional Conference (ACMSE), pages 5-10, 2004.
[3] M. Waldburger and B. Stiller, "Toward the mobile grid: Service provisioning in a mobile dynamic virtual organization," in IEEE International Conference on Computer Systems and Applications (ICCSA), pages 579-583, 2006.
[4] Matthew Smith, Thomas Friese and Bernd Freisleben, "Towards a Service-Oriented Ad Hoc Grid," in Third International Symposium on Parallel and Distributed Computing / Third International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks (ISPDC/HeteroPar'04), pp. 201-208.
[5] R. Baldoni, R. Beraldi, and A. Mian, "Survey of service discovery protocols in mobile ad hoc networks," Technical Report MidLab 7/06, Dip. Informatica e Sistemistica, Universita di Roma, 2006.
[6] Zhuoqun Li, Lingfen Sun and Emmanuel C. Ifeachor, "Challenges of mobile ad hoc grid and their application in e-healthcare," University of Plymouth, UK.
[7] Imran Ihsan, Muhammad Abdul Qadir and Nadeem Iftikhar, "Mobile Ad-Hoc Service Grid - MASGRID," ENFORMATIKA V5, 2005, ISSN 1305-5313.
[8] Zhi Wang, Bo Yu, Qi Chen and Chunshan Gao, "Wireless Grid Computing over Mobile Ad-Hoc Networks with Mobile Agent," 07695-2534-2/05, IEEE, 2006.
[9] Antonio Tadeu A. Gomes and Artur Ziviani, "DICHOTOMY: A Resource Discovery and Scheduling Protocol for Multihop Ad hoc Mobile Grids," in Proceedings of the Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGRID), pages 719-724, 2007.
[10] The network simulator - NS-2. Available: http://www.isi.edu/nsnam/ns
One-Channel Audio Source Separation of Convolutive Mixture
Jalal Taghia (1) and Jalil Taghia (2)
(1) Dept. of Electrical and Computer Engineering, Shahid Beheshti University, Tehran, Iran; Email: [email protected]
(2) Dept. of Electrical Engineering, Shahed University, Tehran, Iran; Email: [email protected]
Abstract-Methods based on one-channel audio source separation are more practical than multi-channel ones in real-world applications. In this paper we propose a new method to separate audio signals from a single convolutive mixture. The method works in the subband domain to blindly segregate the mixture and is composed of three stages. In the first stage, the observed mixture is divided into a finite number of subbands through filtering with a parallel bank of FIR band-pass filters. The second stage employs empirical mode decomposition (EMD) to extract intrinsic mode functions (IMFs) in each subband; we then obtain independent basis vectors by applying principal component analysis (PCA) and independent component analysis (ICA) to the vectors of IMFs in each subband. In the third stage we perform a subband synthesis process to reconstruct the fullband separated signals. We have produced experimental results using the proposed separation technique, and the results show that the proposed method successfully separates speech and an interfering sound from a single mixture.
I. INTRODUCTION
Separation of mixed audio signals has many potential applications, such as music transcription, speaker separation in video conferencing and robust audio indexing in multimedia analysis. A recording made with multiple microphones enables techniques which use the spatial location of the sources in the separation [1, 2], which often makes the separation task easier. However, often only a one-channel recording is available, and methods based on one-channel audio source separation are more practical in real-world applications. Some algorithms for single-mixture audio source separation exist [3, 4]. In this paper we consider the blind source separation (BSS) of a single convolutive mixture of audio signals in a real environment. In a real environment, the signals are mixed with their reverberations; in order to separate such a complicated mixture, we need to estimate unmixing filters of several thousand taps. Moreover, in a real environment an impulse response does not remain unchanged even for several seconds. Early BSS approaches focused on treating the problem entirely in the time domain. Still, by having to employ fairly long unmixing filters to reach an adequate level of separation, such techniques are inherently slow and computationally inefficient for long reverberation times. Lured by the potential of substantially reducing excessive computational requirements, many authors have recently suggested carrying
out separation in the frequency [5, 6] or the subband domain [7, 8, 12]. It has been shown in [9] that the performance of frequency-domain BSS becomes poor when a frame as long as several seconds of speech is used to estimate a long unmixing filter that can cover realistic reverberations. Motivated by these facts, we use a BSS method that employs subband processing (subband BSS). With this method, the observed signal is converted into the subband domain with a filter bank, so that we can choose a moderate number of subbands and maintain a sufficient number of samples in each subband. Moreover, the unmixing filter in each subband is shorter, and therefore we are able to employ a time-domain ICA algorithm efficiently in the separation process. The proposed separation process includes empirical mode decomposition (EMD) [10] in each subband. The EMD decomposes the mixture signal into a collection of oscillatory basis components, termed intrinsic mode functions (IMFs), that satisfy some basic properties [10]. EMD can be applied to any nonlinear and nonstationary signal. In addition, EMD uses only a single mixture (obtained from one sensor) to extract the IMFs, and thus it is suitable for one-channel source separation. After applying EMD, we consider the IMFs as observations. The principal component analysis (PCA) is then applied to the observations to produce some uncorrelated and dominant basis vectors, and independent component analysis (ICA) is employed to make those basis vectors independent in each subband. Finally, we use a Kullback-Leibler divergence (KLD) based K-means clustering algorithm to cluster the independent vectors [11]. The unmixed signals related to each cluster are then synthesized to obtain the fullband unmixed signals (the estimated source signals). Experimental results substantiated the strong potential of the proposed method for single-mixture audio source separation.
II. SUBBAND BASED BSS
In this paper we use a subband BSS system that is formed of three stages [12]: (1) a subband analysis stage, (2) a separation stage, and (3) a subband synthesis stage. These stages are shown in Fig. 1.
Fig. 1. The proposed subband model consisting of a subband analysis stage, a separation stage and a subband synthesis stage in the two sources and one sensor case.
A. Subband Analysis Stage
In the first stage, the observed single mixture x(t) is divided into a finite number K of subbands through filtering with a parallel bank of FIR band-pass filters. To execute BSS
on real-valued signals, we used the subband analysis filter bank as the cosine-modulated version of a prototype filter h_0(n) of length N and cutoff frequency ω_c = π/K [16]:

h_k(n) = h_0(n) cos[(k − 1/2) nπ/K],  k = 1, 2, ..., K        (1)
h_0(n) = sinc(2nπ/N) w(n)                                     (2)

As shown in Fig. 2, h_0(n) is a truncated sinc(·) function weighted by a Hamming window with

w(n) = 0.54 − 0.46 cos(2nπ/N),  n = 1, 2, ..., N.             (3)
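A small NumPy sketch of Eqs. (1)-(3) (assuming the unnormalized sinc convention, i.e. sinc(x) = sin(x)/x, which is how np.sinc is invoked below; this is an illustration, not the authors' code):

```python
import numpy as np

def prototype_filter(N=512):
    """Hamming-windowed, truncated sinc prototype h0(n) of Eqs. (2)-(3)."""
    n = np.arange(1, N + 1)
    w = 0.54 - 0.46 * np.cos(2 * n * np.pi / N)        # Hamming window, Eq. (3)
    # np.sinc(x) = sin(pi*x)/(pi*x), so sinc(2*n*pi/N) in the sin(x)/x sense is np.sinc(2*n/N)
    h0 = np.sinc(2 * n / N) * w                        # Eq. (2)
    return h0

def analysis_filter_bank(h0, K=32):
    """Cosine-modulated analysis filters h_k(n) of Eq. (1), one row per subband k."""
    N = len(h0)
    n = np.arange(1, N + 1)
    return np.vstack([h0 * np.cos((k - 0.5) * n * np.pi / K) for k in range(1, K + 1)])

# Example: the K = 32 band-pass filters used by the subband analysis stage
H = analysis_filter_bank(prototype_filter(N=512), K=32)   # shape (32, 512)
```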
The prototype filter h_0(n) is used to derive the analysis and the corresponding synthesis filter banks depicted in Fig. 1. After the subband filtering stage, the effective bandwidth of the decomposed observed signal in each subband is reduced by a factor of 1/K compared to the wider bandwidth of the original fullband signal. Then we employ decimation at the down-sampling rate M (M < K) to reduce the aliasing problem. Generally, the subband analysis stage yields the real-valued signals

x(k, τ) = ∑_{n=1}^{N} h_k(n) x(τ − n)        (4)
where x(k, τ) is the down-sampled and band-limited signal in the k-th subband (k = 1, 2, ..., K), and τ = lM denotes the time index at the reduced sampling rate for some integer l.
B. Separation Stage
Empirical Mode Decomposition
The empirical mode decomposition (EMD) [10] is a signal processing technique capable of extracting all the oscillatory modes present in a signal at different length scales. EMD decomposes a time series into components by empirically
identifying the physical time scales intrinsic to the data. Each extracted mode, named an intrinsic mode function (IMF), satisfies some basic properties [10]: over the whole data set, the number of extrema (minima and maxima) and the number of zero crossings must be equal or differ at most by one, and the mean value of the envelope defined by the local maxima and minima is always zero. The decomposition of a signal, e.g. X, is obtained by an iterative procedure in six steps:
Step 1. Identification of all maxima and minima of the series X.
Step 2. Generation of the upper and lower envelopes of X.
Step 3. Point-by-point averaging of the two envelopes to compute the local mean series m.
Step 4. Subtraction of m from the data to obtain an IMF candidate c = X − m.
Step 5. Check of the properties of c: if c is not an IMF, replace X with c and repeat the procedure from step 1. If c is an IMF, evaluate the residue r = X − c.
Step 6. Repetition of the procedure from step 1 to step 5 by sifting the residual signal. The process ends when the range of the residue is below a predetermined level or the residue has a monotonic trend.
At the end, we obtain a collection of B components c_i (i = 1, 2, ..., B) and a residue r. The signal X can be exactly reconstructed as the linear combination

X = ∑_{i=1}^{B} c_i + r.
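The sifting procedure above can be sketched as follows; this is a deliberately simplified illustration (no boundary handling, crude IMF test and stopping criteria), not a faithful EMD implementation:

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.interpolate import CubicSpline

def emd(x, max_imfs=14, max_sift=50):
    """Simplified sketch of the six-step EMD sifting procedure described above."""
    imfs, residue = [], x.astype(float)
    t = np.arange(len(x))
    for _ in range(max_imfs):
        h = residue.copy()
        for _ in range(max_sift):
            maxima = argrelextrema(h, np.greater)[0]        # step 1
            minima = argrelextrema(h, np.less)[0]
            if len(maxima) < 3 or len(minima) < 3:          # too few extrema: stop
                return imfs, residue
            upper = CubicSpline(maxima, h[maxima])(t)       # step 2
            lower = CubicSpline(minima, h[minima])(t)
            m = (upper + lower) / 2.0                       # step 3
            h = h - m                                       # step 4
            if np.abs(m).mean() < 1e-8:                     # crude IMF test (step 5)
                break
        imfs.append(h)                                      # accept c_i
        residue = residue - h                               # step 6: sift the residue
        if np.all(np.diff(residue) >= 0) or np.all(np.diff(residue) <= 0):
            break                                           # monotonic residue: done
    return imfs, residue                                    # x ≈ sum(imfs) + residue
```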
The EMD can be applied to any nonlinear and nonstationary signal. Moreover, it uses only a single mixture (obtained from one sensor) to extract the IMFs. In this paper we use the EMD method to extract IMFs from the audio mixture.
Separation Algorithm
The proposed algorithm for single-mixture source separation is based on employing the EMD method. In the
first step, we applied the EMD algorithm to the down-sampled and band-limited signal x(k, τ) (k = 1, 2, ..., K) in each subband to extract the IMFs c_i^(k) (i = 1, 2, ..., B, where B is the number of IMFs) related to the k-th subband. The dimensionality of c_i^(k) is 1 × L (L is the length of the data). Fig. 3 shows the first four IMFs obtained by applying the EMD algorithm to the signal of the first subband, x(1, τ). In the second step we formed an observation matrix Y^(k)_{L×B} in the k-th subband:

Y^(k)_{L×B} = [c_1^(k); c_2^(k); ...; c_B^(k)]^T,   (k = 1, 2, ..., K)        (5)

Fig. 2. Impulse response of the low-pass real-valued prototype FIR filter h_0(n) with length N = 512.
where T stands for the transpose operator and B indicates the number of IMFs in each subband. Then we applied principal component analysis (PCA), implemented by the singular value decomposition (SVD) [13], to the observation matrix Y^(k)_{L×B} in each subband. Consequently, the matrix PC^(k)_{L×2} is obtained, which is composed of two uncorrelated basis vectors:

PC^(k)_{L×2} = PCA[Y^(k)_{L×B}]        (6)
where L is the length of the data and B is the number of IMFs. The two basis vectors obtained by PCA are only uncorrelated, not statistically independent. To derive independent basis vectors, we need to employ independent component analysis (ICA); we used the EFICA algorithm [14]. EFICA is based on maximizing a non-Gaussianity measure, which can be measured using the marginal entropy. After applying the EFICA algorithm, the independent basis vectors are obtained as

IC^(k)_{2×L} = EFICA([PC^(k)_{L×2}]^T)        (7)

where IC^(k)_{2×L} is composed of two independent vectors in the k-th subband. Each of these independent vectors belongs to one of the sources.
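A hedged sketch of Eqs. (5)-(7) for one subband, using SVD for the PCA step and scikit-learn's FastICA as a stand-in for EFICA (an assumption; the paper uses EFICA [14]):

```python
import numpy as np
from sklearn.decomposition import FastICA

def subband_basis(imfs):
    """From the B x L matrix of IMFs of one subband, keep the two dominant
    uncorrelated components (PCA via SVD, Eq. (6)) and make them independent
    with ICA (Eq. (7)). FastICA is used here in place of EFICA."""
    Y = np.asarray(imfs)                     # shape (B, L): the IMF observations
    Yc = Y - Y.mean(axis=1, keepdims=True)   # remove means before PCA
    U, s, Vt = np.linalg.svd(Yc, full_matrices=False)
    pc = Vt[:2]                              # two dominant basis vectors, shape (2, L)
    ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
    ic = ica.fit_transform(pc.T).T           # independent basis vectors, shape (2, L)
    return ic
```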
Fig. 3. The first four IMFs (out of 14) obtained after applying the EMD to the audio mixture (speech and music) in the first subband.
C. Subband Synthesis Stage
After the separation stage, we have a total of 2K independent components (K is the number of subbands; see Fig. 1), half of which belong to source 1 (e.g. the speech signal) and the rest to source 2 (e.g. the music signal). Thus we need to cluster the independent vectors into two groups. In this paper we used a Kullback-Leibler divergence (KLD) based K-means clustering algorithm for the grouping process (introduced in [11]). The KLD is used to measure the information-theoretic distance between two independent vectors during K-means clustering. The vectors are automatically grouped into two clusters according to the entropy contained in the individual vectors. Since the KLD is an information-theoretic measure, it performs better for the clustering of independent vectors [11]. We therefore obtain two groups: all independent vectors belonging to source 1 are placed in the first group, and we denote the one related to the k-th subband by u_1^(k); the remaining independent vectors, which are related to source 2, are placed in the second group, and we denote each of them by u_2^(k).
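One plausible reading of the KLD-based K-means grouping is sketched below; treating the magnitude-normalized vectors as distributions and using a symmetric KLD are assumptions, since the paper defers the details to [11]:

```python
import numpy as np

def kld(p, q, eps=1e-12):
    """Symmetric Kullback-Leibler divergence between two magnitude-normalized vectors."""
    p = np.abs(p) + eps; p = p / p.sum()
    q = np.abs(q) + eps; q = q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def kld_kmeans(vectors, iters=20):
    """Group the 2K independent vectors into two clusters with a KLD-based
    K-means loop (a simplified stand-in for the algorithm of [11])."""
    centers = [vectors[0], vectors[1]]        # assumed initialization
    labels = np.zeros(len(vectors), dtype=int)
    for _ in range(iters):
        labels = np.array([np.argmin([kld(v, c) for c in centers]) for v in vectors])
        for j in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = np.mean(members, axis=0)
    return labels   # label j marks the vectors assigned to source j+1
```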
For reconstructing the fullband separated signals we performed the following steps: all independent vectors belonging to each group (u_1^{(k)}, u_2^{(k)}) are first up-sampled by the interpolation factor M and then filtered by the synthesis filters g_k(n):

g_k(n) = g_0(n) \cos\left[\left(k - \frac{1}{2}\right)\frac{n\pi}{K}\right],    (8)

where the baseband synthesis filter g_0(n) is a time-reversed copy of the analysis prototype filter h_0(n) (Eq. 2),

g_0(n) = h_0(N - n - 1),    (9)

with n = 1, 2, ..., N and k = 1, 2, ..., K. The subband synthesis stage is indicated in Fig. 1. Finally, the fullband separated signals at the output of the subband synthesis stage are given by

u_j(t) = \sum_{k=1}^{K} \sum_{n=1}^{N} g_k(n) \, u_j^{(k)}(t - n)    (10)
for j = 1, 2, with t denoting the time index at the restored sampling rate, such that t = l/M.

III. EXPERIMENTAL RESULTS

To investigate the potential of the proposed subband-based method for achieving one-channel speech separation in challenging real environments, a few examples of mixture streams of two audio signals are used. All of the mixtures are recorded from one sensor, and these mixtures are denoted mix.1, mix.2 and mix.3. They are the mixtures of a male speech with music, a male and a female speech, and two male speeches together, respectively.

TABLE I
THE EXPERIMENTAL SEPARATION RESULTS (IN TERMS OF ISNR) OF THE PROPOSED METHOD

Mixtures                              ISNR (dB) of source1    ISNR (dB) of source2
Mix1 (male speech & music sound)      7.61                    8.23
Mix2 (male & female speeches)         5.46                    7.04
Mix3 (two male speeches)              4.91                    5.30
TABLE II
THE EXPERIMENTAL SEPARATION RESULTS (IN TERMS OF OSSR) OF THE PROPOSED METHOD

Mixtures                              OSSR of source1 (OSSR1)    OSSR of source2 (OSSR2)
Mix1 (male speech & music sound)      0.3810                     0.1956
Mix2 (male and female speeches)       -0.4932                    -0.2213
Mix3 (two male speeches)              -0.4771                    -0.5266
Fig. 4. Upper two waveforms indicate two real audio signals (speech and music), middle one indicates mixed signal (observed from one sensor) and lower two waveforms are separated signals (estimated using the proposed method).
All of the audio signals (each three seconds in duration) are recorded at a sampling rate of 11.025 kHz with 16-bit amplitude resolution. For subband analysis and synthesis, we used the polyphase filter bank generated by the prototype filter h_0(n) (shown in Fig. 2), whose length is 512 (i.e., N = 512). In our subband model, the number of subbands K was 32 and the down-sampling rate M was 8. We chose this down-sampling rate (M = 8) to reduce the aliasing problem. To extract IMFs, we used the EMD algorithm in each subband, and 14 IMFs were generated. In this paper, in order to measure the distortion between the original sources and the estimated sources, we used the improvement of signal-to-noise ratio (ISNR) as the quantitative measure of separation performance [15]. The ISNR is the difference between the input and output SNRs. The input SNR (SNR_I) is defined as

SNR_I = 10 \log \frac{\sum_t s_j(t)^2}{\sum_t [x(t) - s_j(t)]^2}    (11)

where x(t) is the mixed signal and s_j(t) is the original signal of the jth source. If u_j(t) is the separated signal (estimated source signal) corresponding to the jth source, the output SNR (SNR_O) is defined as

SNR_O = 10 \log \frac{\sum_t s_j(t)^2}{\sum_t [s_j(t) - u_j(t)]^2}.    (12)
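Eqs. (11) and (12) can be computed directly; the following NumPy sketch (with hypothetical signal arrays x, s_j and u_j, and log taken to base 10 for dB values) also returns their difference, which is the ISNR performance measure used below.

```python
import numpy as np

def snr_input(x, s_j):
    """Eq. (11): input SNR of source j in the mixture x, in dB."""
    return 10 * np.log10(np.sum(s_j ** 2) / np.sum((x - s_j) ** 2))

def snr_output(s_j, u_j):
    """Eq. (12): output SNR of the estimate u_j against the original s_j, in dB."""
    return 10 * np.log10(np.sum(s_j ** 2) / np.sum((s_j - u_j) ** 2))

def isnr(x, s_j, u_j):
    """Improvement of SNR: the difference between the output and input SNRs."""
    return snr_output(s_j, u_j) - snr_input(x, s_j)
```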
Then we employed ISNR (dB) = SNR_O − SNR_I as the performance measure. Table I shows the ISNR of each signal for every mixture (mix.1, mix.2 and mix.3). Source 1 represents the male speech, and source 2 indicates the other signal, i.e. the music, the female speech or another male speech. The ISNR represents the degree of suppression of the interfering signals, improving the quality of the target one [15]. We also considered another measure to quantify the separation efficiency [11, 15]. In this measure the average value of the running short-term relative energy between the original and the separated signals is used. This measure is termed the original-to-separated signal ratio (OSSR) and is defined as

OSSR_j = \frac{1}{T} \sum_t \log_{10} \left( \frac{\sum_{\tau=1}^{w} s_j^2(t+\tau)}{\sum_{\tau=1}^{w} u_j^2(t+\tau)} \right), \qquad j = 1, 2    (13)
where s_j(t) is the jth original signal and u_j(t) is the jth separated signal (j = 1, 2), T is the length of the mixture signal and w is a frame of 20 ms length. It is noticeable that for zero energy in a particular window no OSSR measurement is obtained, and when the original and the separated signals are exactly the same the OSSR takes the value 0. Table II shows the OSSR of each signal for every mixture (mix.1, mix.2 and mix.3). Finally, the waveforms of two real audio signals (speech and music), the mixed signal (observed from one sensor) and the separated signals (estimated using the proposed method) are shown in Fig. 4.

IV. CONCLUSION

A new method of single-mixture audio source separation is presented in this paper. One of the important advantages of the proposed method is its ability to separate a single convolutive audio mixture obtained from one sensor in a real environment. In a real environment, the signals are mixed with their reverberations; to separate such complicated mixtures we need to estimate unmixing filters of several thousand taps. Moreover, in a real environment an impulse response does not remain unchanged even for several seconds. Since both the time domain and the frequency domain have problems in separating such mixtures, we used a subband-domain BSS method. With this method, the observed mixed signal is converted into the subband domain with a filter bank, so that we can choose a moderate number of subbands and maintain a sufficient number of samples in each subband. Moreover, the length of the unmixing filter in each subband is shortened, and therefore we are able to employ a time-domain ICA algorithm efficiently in the separation process. In the separation process we used the empirical mode decomposition (EMD) to extract intrinsic mode functions (IMFs) in each subband. In fact, employing the EMD enabled us to separate the single audio mixture. With the aid of experiments we showed that the proposed method is applicable and useful in single-mixture audio source separation.
REFERENCES
[1] D.E. Dudgeon and R.M. Mersereau, Multidimensional Digital Signal Processing, Prentice Hall, Englewood Cliffs, USA, 1984.
[2] K. Torkkola, Blind separation of delayed and convolved sources, John Wiley & Sons, 2000.
[3] G.J. Jang and T.W. Lee, "A maximum likelihood approach to single channel source separation," Journal of Machine Learning Research, vol. 4, pp. 1365-1392, 2003.
[4] G.J. Jang, T.W. Lee, and Y.H. Oh, "Single channel signal separation using time-domain basis functions," IEEE Signal Process. Lett., vol. 10, pp. 168-171, 2003.
[5] L. Parra and C. Spence, "Convolutive blind separation of non-stationary sources," IEEE Trans. Speech and Audio Process., vol. 8, pp. 320-327, 2000.
[6] P. Smaragdis, "Blind separation of convolved mixtures in the frequency domain," Neurocomp., vol. 22, pp. 21-34, 1998.
[7] K. Kokkinakis and P.C. Loizou, "Subband-based blind signal processing for source separation in convolutive mixtures of speech," IEEE International Conf. on Acoustics, Speech and Signal Processing, Honolulu, HI, pp. 917-920, 2007.
[8] N. Grbic, X.J. Tao, S.E. Nordholm, and I. Claesson, "Blind signal separation using overcomplete subband representation," IEEE Trans. Speech and Audio Process., vol. 9, pp. 524-533, 2001.
[9] S. Araki, S. Makino, T. Nishikawa, and H. Saruwatari, "Fundamental limitation of frequency domain blind source separation for convolutive mixture of speech," IEEE Trans. Speech and Audio Process., pp. 2737-2740, 2001.
[10] N.E. Huang, Z. Shen, S.R. Long, M.C. Wu, H.H. Shih, Q. Zheng, N.C. Yen, C.C. Tung, and H.H. Liu, "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society of London A, pp. 903-995, 1998.
[11] K.I. Molla, K. Hirose, and N. Minematsu, "Separation of mixed audio signals by decomposing Hilbert spectrum with modified EMD," IEICE Trans. on Fundamentals of Electronics, Communications and Computer Sciences, pp. 727-734, 2006.
[12] S. Araki, S. Makino, R. Aichner, T. Nishikawa, and H. Saruwatari, "Subband-based blind separation for convolutive mixtures of speech," IEICE Trans. on Fundamentals of Electronics, Communications and Computer Sciences, pp. 3593-3603, 2005.
[13] M.E. Wall, A. Rechtsteiner, and L.M. Rocha, "Singular value decomposition and principal component analysis," Modeling, Algorithms, and Informatics Group (CCS-3), Los Alamos, New Mexico 87545, USA, 2003.
[14] Z. Koldovský, P. Tichavský, and E. Oja, "Efficient variant of algorithm FastICA for independent component analysis attaining the Cramér-Rao lower bound," IEEE Trans. on Neural Networks, vol. 17, pp. 1265-1277, 2006.
[15] E. Vincent, R. Gribonval, and C. Fevotte, "Performance measurement in blind audio source separation," IEEE Trans. on Speech Audio Process., vol. 14, pp. 1462-1469, 2005.
[16] K. Nayebi, T.P. Barnwell, and M.J.T. Smith, "Time-domain filter bank analysis: A new design theory," IEEE Trans. Signal Process., vol. 40, pp. 1412-1429, 1992.
Extension of Aho-Corasick Algorithm to Detect Injection Attacks Jalel Rejeb, and Mahalakshmi Srinivasan Department of Electrical Engineering, San José State University, One Washington Square, San José, CA 95110. USA email addresses: [email protected]
Abstract: In this paper we propose an extension to the Aho-Corasick algorithm to detect injection of characters introduced by a malicious attacker. We show how this is achieved without significantly increasing the size of the finite-state pattern matching machine of the Aho-Corasick algorithm. Moreover, the machine detects a match only if the number of stuffed characters is within a specified limit, so that the number of false positives remains low. A comparison of the CPU time consumption between the Aho-Corasick algorithm and the proposed algorithm is provided. It was found that the proposed algorithm can outperform Aho-Corasick while ignoring the stuffed characters and still detecting a valid match.
I. INTRODUCTION

Recently, there has been compelling and renewed interest in the efficiency of string matching algorithms, as they are increasingly being used, for instance, in the implementation of intrusion detection systems (IDS). In an IDS, string matching can be one of the most expensive tasks in Internet packet processing, since it must fetch the content of each arriving packet to identify or detect malicious messages without compromising the speed performance of the network [1] [2]. The Aho-Corasick (AC) algorithm is probably the most popular choice of string matching algorithm in the design of IDS, since it offers simple multi-pattern matching at considerably faster searching speed when compared to other conventional string matching algorithms [2], [3], [4]. Due to its appealing features, the AC algorithm has received a great amount of attention to enhance its implementation on both software and hardware platforms. For instance, in [6] bitmap
compression and path compression were proposed to reduce the size of the AC data structure, which leads to memory space reduction. In bitmap compression, every state of the Aho-Corasick machine uses a single pointer instead of 256 next-state pointers to point to a valid next state, similar to the tree-bitmap implementation in the Eatherton algorithm [7]. The offset from the next-state pointer is calculated from the number of bits set prior to the matched character. An algorithm with a finite state model using data compression was proposed in [8]. The compression is achieved using Huffman codes, thus reducing the delay. However, in this method, 49 Huffman trees were required to reduce the time to 47% and the text size to 43%. To overcome this problem, another approach was proposed in [9]. In this approach, the states of the Aho-Corasick pattern matching machine are rearranged to achieve CPU cache efficiency. This is done by renumbering or rearranging the frequently used states that are close to the initial state in a breadth-first order and then collecting them in Random Access Memory (RAM). In [10] Ternary Content Addressable Memory (TCAM) was proposed to achieve high-speed memory access with a parallel multi-character scheme. Other hardware-enhanced implementations of AC were also provided in [2]. However, the focus of these implementations was mainly on improving the speed of AC, where the pattern for matching is assumed to be known and fixed, that is, without injections. In this paper, we show how to extend the AC algorithm to include the detection of character injection with a possibly varying pattern. The injections are not assumed to be known to the IDS designer. We will present how this is achieved without significantly increasing the size of the finite-state pattern matching machine of the Aho-Corasick
algorithm. The rest of this paper is organized as follows. The next section provides an overview of the AC algorithm, where examples are given to illustrate the vulnerability of AC to injection attacks. Section 3 describes the proposed algorithm. Section 4 summarizes our test and simulation results. The conclusion is given in Section 5.

II. AHO-CORASICK STRING MATCHING ALGORITHM

The original Aho-Corasick algorithm uses a finite state pattern matching machine which locates all occurrences of a set of patterns in a target string. The machine takes a text string as an input, compares it with the set of patterns and then provides the locations at which a match is found [4]. The machine is realized using three functions, namely a Goto function, a failure function and an output function. In this work, the original AC algorithm is modified to detect one or more characters injected by a malicious attacker and hence to find a valid match in the incoming packet stream. The Goto function and failure function are implemented in a very similar way to that of the original AC algorithm, while the pattern matching machine is altered. An example of the Goto function, and of the Goto function with its failure function, is shown in Figure 1 and Figure 2, respectively, and Table 1 depicts the corresponding output function. In this case 'here' and 'there' are the patterns which need to be detected or fetched in the payload of Internet packets. In this example, possible injection attacks would include payloads with malicious patterns such as "h$ere", "h ere", or "h%ere", which won't be detected by the conventional AC algorithm, since these patterns are not initially known attacks and no corresponding matching states were constructed in the Goto function. For instance, the well-known IDS package Snort [3], which is based on the AC algorithm, cannot detect these types of injections unless all possible injections (modifications) of each pattern are determined and then added as separate new patterns. As a result, detection of these injections would considerably increase the number of states and thereby the memory usage of the algorithm. In the next section an alternative approach is presented which can detect the injected patterns without increasing the initial number of states of the AC.
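As a concrete illustration of the three functions just described, the following minimal Python sketch builds the Goto, failure and output functions for a set of patterns and runs the conventional matching loop. It is only an illustrative reimplementation of the classic algorithm (the authors' evaluation code was written in Java), and all function and variable names are our own.

```python
from collections import deque

def build_ac(patterns):
    """Build the Aho-Corasick Goto, failure and output functions for a pattern set."""
    goto, fail, output = [{}], [0], [set()]
    for pat in patterns:                      # Goto function: a trie over the patterns
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({}); fail.append(0); output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pat)                # Output function: pattern ends in this state
    queue = deque()
    for ch, s in goto[0].items():             # depth-1 states fail back to the root
        fail[s] = 0
        queue.append(s)
    while queue:                              # failure function, built breadth-first
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            output[s] |= output[fail[s]]      # inherit outputs along the failure link
    return goto, fail, output

def ac_search(text, goto, fail, output):
    """Report (end_position, pattern) for every occurrence (conventional AC)."""
    state, hits = 0, []
    for i, ch in enumerate(text, 1):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in output[state]:
            hits.append((i, pat))
    return hits

# With build_ac(["here", "there"]), searching "say there" reports both 'there' and
# 'here' ending at the same position, matching the output function of Table 1.
```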
Figure 1. Goto Function of the Aho-Corasick Pattern Matching Machine
Figure 2. Goto Function with Failure Function
TABLE 1
OUTPUT FUNCTION OF THE AHO-CORASICK MACHINE

i    output(i)
5    {there, here}
9    {here}
III. PROPOSED AC ALGORITHM In this section, the original AC algorithm is modified to detect one or more characters injected by a malicious attacker and find a valid match in the incoming packet stream. The Goto function and failure function are implemented in a very similar way to that of original AC algorithm while the pattern matching machine is altered. When a character that causes a transition is encountered, the machine moves to the next state. On the other hand, if a character that does not cause a transition is found, the machine remains in the same state until it encounters a character that causes a transition. With this
simple added feature of the algorithm, there is also a limit to the number of characters for which the machine remains in the same state. If the number of characters not causing a transition exceeds this limit, the machine declares that there is no match and proceeds to the next set of characters. Without losing any generality, the proposed algorithm is best illustrated through the following example.

In this example we consider 'there' and 'them' as the two patterns to be detected, and the incoming packet text is 'jkathekarethuem'. The main steps of the proposed algorithm are as follows.

a) First, we construct the AC state graph; the Goto and failure functions are shown in Figure 3 and Figure 4, respectively.

Figure 3. Goto Function of the Proposed Algorithm

Figure 4. Goto Function with Failure Function

b) Note that with this particular example the output function consists of two possible outputs: 'there' ending at position '10' and 'them' ending at position '15' within the inspected string.

c) The text string is now compared with the patterns. When the first three characters of the text string, 'jka', are encountered one by one, the algorithm remains in state '0'. This is shown in Figure 5.

Figure 5. Waiting for a match in state '0'

d) When the character 't' is found, there is a transition to state '1', as shown in Figure 6.

Figure 6. Transition to state '1'

e) Similarly, transitions happen until state '3' is reached, as shown in Figure 7.

Figure 7. Transition until state '3'

f) When the next character 'k' is encountered, the original AC algorithm takes the failure function path and goes back to state '0'. However, this is where the proposed algorithm differs from the original AC algorithm: it regards the character as a possible injected character and remains in state '3', as illustrated in Figure 8.

Figure 8. Waiting in state '3' ignoring injected character 'k'

g) An important observation is the need to set a tolerance level for the number of injections the string can have. Hence a variable called 'padCount' is provided to monitor the number of injected characters. The algorithm detects a match as long as the amount of injection is within the tolerance level; once it exceeds the level set by the user in the code, the algorithm does not consider it a valid match. This helps to keep the number of false positives minimal. A false positive is a condition in which a match is found when it actually does not exist. Hence, in the given example, the pattern matching machine remains in state '3' even when it encounters another injected character 'a', as shown in Figure 9.

Figure 9. Waiting in state '3' after 2 injected characters

h) Next, the algorithm finds the character 'r', which is a valid character to make a transition to the next state, state '4', and then to state '5' on encountering the character 'e', as shown in Figure 10.

Figure 10. Further transition until state '5' for 'there'

i) In a similar manner, the next match 'them' is detected, ignoring the stuffed character 'u' and resulting in a pad count of 1. The state transitions are shown in Figure 11.

Figure 11. Transition Sequence for 'them'

j) Finally, the outputs 'there' and 'them' are displayed with the positions at which they end in the suspicious string, i.e., at positions '10' and '15', respectively.

DISCUSSION

The following general observations can be made about the proposed algorithm when compared to the conventional AC one:

1. Type of Injection: The proposed algorithm can detect an injection irrespective of the type of the injected character. In other words, the injected character could be a number, a symbol or a letter. This makes the proposed algorithm more flexible, reducing the effort on the part of the user to specify the type of injections.

2. Location of Injection: The proposed algorithm detects a match at any location of the injection. All the characters that are injected before the pattern, in the middle of it, or after its end will be handled, whereas the original AC algorithm cannot detect a match if there are one or more injected characters in the middle of the pattern.

3. Implementation considerations: The construction of the state graph for every variation of the pattern, due to every possible injection, can render AC a less practical algorithm. In the proposed approach the pattern matching machine does not return to the failure state when an injected character is found; instead it remains in the same state until a valid character causing a transition is found. Moreover, the proposed algorithm ensures that a transition does not occur if the number of characters inserted exceeds the threshold. Thus, the injected characters are detected without the need to extend the number of states, making it more efficient than AC, as further illustrated in our simulation results.
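The matching behaviour described in steps c) to j) can be sketched compactly. The following Python fragment (the authors' evaluation used Java; this is only a simplified, hypothetical rendering) stays in the current state on a non-matching character and counts it against a user-set tolerance, reusing the Goto and output tables from the earlier build_ac sketch; failure-link closure and overlapping matches are omitted for brevity.

```python
def ac_search_with_injections(text, goto, output, pad_limit=2):
    """Match while tolerating up to pad_limit injected characters (a sketch of the
    proposed behaviour; goto/output come from the build_ac helper sketched earlier)."""
    state, pad_count, hits = 0, 0, []
    for i, ch in enumerate(text, 1):
        if ch in goto[state]:                 # valid character: take the transition
            state = goto[state][ch]
            for pat in output[state]:
                hits.append((i, pat))
            if output[state]:
                state, pad_count = 0, 0       # simplification: restart after a match
        elif state != 0:
            pad_count += 1                    # possible injected character: stay in place
            if pad_count > pad_limit:         # tolerance exceeded: declare no match
                state, pad_count = 0, 0
        # in state 0, non-matching characters are simply skipped
    return hits

# With patterns ['there', 'them'] and the text 'jkathekarethuem', this reports
# 'there' ending at position 10 (two injected characters tolerated) and
# 'them' ending at position 15 (one injected character), as in the example above.
```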
IV. TESTING AND SIMULATION RESULTS

The AC algorithm and the proposed algorithm were evaluated with several test cases using Java. The test cases consisted of several patterns with one, two, three and four injected characters. An example of the considered test cases, with two injections in the pattern, is shown in Table 2.

TABLE 2
POSSIBLE VARIATIONS FOR THE PATTERN "security" TO BE ADDED TO THE ORIGINAL AC ALGORITHM

Keyword: security    Number of injected characters: 2
Possible injections: s<>ecurity, se<>curity, sec<>urity, secu<>rity, secur<>ity, securi<>ty, securit<>y, s<e>curity, s<ec>urity, s<ecu>rity, s<ecur>ity, s<ecuri>ty, s<ecurit>y, seurity, se<cu>rity, se<cur>ity, se<curi>ty, se<curit>y, secrity, secity, secty, secy, secuity, secuty, secuy, securty, secury

The above table shows possible variations of the pattern 'security' with two injected characters. The patterns used for evaluation also included a set of virus and worm signatures. Three of the different virus/worm signatures were obtained from the work in [5] and are listed in Table 3.

TABLE 3
EXAMPLES OF VIRUS AND WORM NAME SIGNATURES

Worm CodeRed:
8bf450ff9590feffff3bf490434b434b898534feffffeb2a8bf48b88d68feffff518b9534feffff52ff9570feffff3bf490434b434b8b8d8d4cfeffff89848d8cfeffffeb0f8b9568feffff83c201899568feffff8b8568feffff0fbe0885c97402ebe28b9568feffff83c2018995

Trojan.URLspoof:
6c6f636174696f6e2e687265663d756e6573636170652827*3a2f2f*25303140*2729

DOS.Aardwolf.446:
0e1fb82135cd21891e????8c06????b821250e07babc00cd21b44abb3c008e06????cd21e8

The hardware and software setting used for our testing consists of an Intel® Pentium® M processor at 1.70 GHz with 504 MB of RAM, running Sun Java JDK 1.5 [6]. Using the JDK 1.5 package, the calculation of the time for building the state graph became easy, as the package contains a new method, 'System.nanoTime()', which helps in measuring time precisely in nanoseconds. We used this method to calculate the CPU time consumed by the modified version and by the proposed algorithm.

Figure 12 summarizes our simulation for the comparison of the CPU time analysis. In this figure, the number of injections in a given keyword is shown on the x-axis and the corresponding CPU times are plotted on the y-axis. The "Modified algorithm" refers to the conventional AC algorithm with added states to detect injections. In order to enable the conventional AC algorithm to detect the injections, we manually specified and added all possible injections to the AC states. From the figure we can observe that the proposed algorithm is considerably faster than the conventional one, since it requires a smaller number of states when comparing the target string with the injected patterns. Moreover, with the proposed algorithm there is no need to specify the variations of the primary keyword while building the state graph. In the case of no injection the overall CPU time of both approaches is about the same.

Figure 12. Comparison of CPU time consumptions

In the second part of our experiment we focused on the memory usage of the conventional and the proposed AC algorithms. The analysis includes the number of instances and the size occupied by each of the instances in the program code. The packages considered for the memory analysis were the default package, the edu.sjsu.ee.alg package and the java.lang package. The chart in Figure 13 shows the memory analyses for the test cases that we considered. In this figure, the x-axis represents the number of injections in a keyword and the corresponding bytes consumed by the algorithms are plotted on the y-axis. As expected, the proposed algorithm consumes many fewer instances than the modified version of the original AC algorithm. As the number of pattern variations increases, the number of instances invoked for building the state graph also increases, which results in a higher amount of RAM needed by the input-modified AC algorithm.

Figure 13. Basic memory analysis chart of the input-modified and proposed algorithms
V. CONCLUSION

In this work, the Aho-Corasick algorithm was extended to detect injection attacks without increasing the number of states. The CPU time consumed by the conventional AC algorithm and by the proposed algorithm was compared. The results indicate that the proposed algorithm is much faster than the input-modified AC algorithm. The proposed algorithm would be very suitable for the detection of SQL injections.

REFERENCES
[1] J. McHugh, A. Christie, and J. Allen, "Defending yourself: The role of intrusion detection systems," Software, vol. 17, no. 5, pp. 42-51, September 2002.
[2] K.K Tseng, Y.C. Lai, Y.D. Lin, and T.H. Lee, “A Fast Scalable Automaton Matching Accelerator for Embedded Content Processors,” ACM SIGARCH Computer Architecture News, Vol. 35 Issue 3, pp 38-43, June 2007 [3] Snort, http://www.snort.org [4] A. V. Aho and M. J. Corasick, “Efficient string matching: An aid to bibliographic search,” Communications of the ACM, vol. 18, no. 6, pp. 333–340, 1975. [5] M. Alicherry, M. Muthuprasanna, and V. Kumar, “High speed pattern matching for network IDS/IPS,” in Proc. IEEE Int. Conf, Netw. Protocols, Santa Barbara, CA, Nov. 2006, pp. 187-196. [6] N. Tuck, T. Sherwood, B. Calder, and G. Varghese, “Deterministic memory-efficient string matching algorithms for intrusion detection,” in Proc. IEEE INFOCOM, Hong Kong, China, Mar. 2004, pp. 2628–2639. [7] W. Eatherton, Z. Dottier, and G. Varghese, “Tree bitmap: Hardware/ Software IP lookups with incremental updates”. Unpublished, 2002. [8] T. Miyazaki. “Speed-up of string pattem matching using Huffman codes with finite state model” (in Japanese). Master Thesis, Kyushu Institute of Technology, 1997. [9] T. Nishimura, S. Fukamachi, and T. Shinohara. “Speed-up of Aho-Corasick pattern matching machines by rearranging states,” in Proc. IEEE String Processing and Info. Retrieval, Laguna de San Rafael, Chile, Nov. 2001, pp. 175-185. [10] F. Yu, R. Katz, and T. V. Lakshman, “Gigabit rate packet pattern matching using TCAM,” in Proc. IEEE Int. Conf. Netw. Protocols, Berlin, Germany, Oct. 2004, pp. 174– 183.
Use of Computer Vision During the Process of Quality Control in the Classification of Grain
Rosas Salazar Juan Manuel, Guzmán Ruiz Mariana, Valdes Marrero Manuel Alejandro
Universidad del Mar Campus Puerto Escondido (UMAR), Carretera Oaxaca Vía Sola de Vega Km. 1.5, Puerto Escondido, Oax., México C.P. 71980
Email: [email protected], [email protected], [email protected]

Abstract - In this work the data collected are the result of the development of a computer vision system operating during the quality control stage of the process of classifying grain sizes. This is important as the machines may need to be adjusted if samples of produce are found to have differing percentages of grain quality during the process. This study focussed on the machine process involved in the production of rice for small and medium-sized companies, referred to in this study as S&MC. This production process is made up of different stages; the first is the acquisition of images of the grains of rice, which are then condensed to obtain better results in the segmentation process. The segmentation process uses images in which only the outlines of the grains are sought. This study took these images, developed them further, and programmed algorithms in order to segment the whole rice grains and those of three-quarter size, using techniques of characterization and grain classification based on the morphological properties of these grains. Finally, it was possible to segment and to classify 90.92% of the analyzed images correctly.
I. INTRODUCTION At the present time the schemes of quality control applied to the industry, in which the amount of production is marked by a high demand of the products, add even more competition for producers and require that producers do more and more to establish stricter standards to increase the quality of their products. The objective of this research is to equip an S&MC with this system in order to improve their methods of analysis of samples of grain in the quality control stage and to comply with the strict national and international standards. This will allow the S&MC to compete successfully in the market with its products. The implementation of computer vision in the segmentation process will eventually increase the volumes of packing and grain production, and will overall represent an economic improvement for the producers of rice of the state of Veracruz like first instance. The process of analysis of samples for the quality control stage in the adjustment of grain machines can be utilized by different S&MCs in the same line of business. However, the selection process of most S&MCs is done manually and visually which limits working times as between each production process there are “dead” times which affect the whole chain of production, including the transport of the product to the different markets. The developed system includes a mechanism of selection of grain samples (fraction and three quarters) from computerized strategies of computer vision and the automation of the analysis of rice samples.
II. DEVELOPMENT

The computer vision system used as a quality control measure in the process of adjusting the machines for grain classification was generated from segmentation strategies. First, a thresholding method was applied to the images to make out the outline of each grain. Secondly, an object detection algorithm is applied, and finally the classification is performed for each sample of rice analysed in the laboratory of the S&MC. The complete system consists of four stages: acquisition, segmentation, detection and classification. These stages are the following (a small code sketch of the thresholding and counting steps is given after this list):

1) Acquisition. An HP PSC 1200 series scanner was used to obtain the images in TIFF format. This format yields monochrome images of 8 bits (gray scale) or 24 bits (true colour), with a resolution of 1200 DPI.

2) Segmentation process. In this stage the contours that make up each grain of rice are defined. The following algorithms are implemented:
• HSI extraction
• Lookup table
• Threshold metric
• Gaussian

It is considered important that, even when two rice grains are joined, a contour-oriented algorithm is applied so that each pixel can be associated with its respective object in the image, which avoids the merging of grains.

3) Detection of objects. This stage is based on a technique called particle analysis. A particle is an area of pixels with the same logical state. All pixels that belong to a particle are in the foreground state; the remaining pixels are in the background state. In a binary image, background pixels have values equal to zero, whereas each non-zero pixel is part of a particle.

4) Classification. The classification stage identifies an unknown sample by comparing a set of significant characteristics with a set of characteristics that conceptually represent classes of well-known samples. The classification implies two phases:
• Training. The vision constructor is shown the types of samples that are to be classified. It is possible to train any number of samples to create a trainer, which is compared later in the process to the unknown samples during the classification phase.
• Classifying. In this phase a sample is classified according to its similarity, based on the characteristics obtained in the training phase.
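The sketch below gives a rough idea of how the thresholding, particle detection and grain counting described above could be implemented with NumPy and SciPy. It is not the system used by the authors (which was built with vision tools rather than hand-written code); the threshold is taken as the average of the two highest histogram peaks, as described later in the paper, an 8-bit grayscale image with grains brighter than the background is assumed, and the default average grain area of 51 pixels² is the experimental value reported later. All names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def count_grains(gray, avg_grain_area=51.0):
    """Threshold a grayscale rice image, label the particles and estimate the
    number of grains per region from the region area (simplified sketch)."""
    # Binarization threshold: average of the two highest histogram peaks
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))   # assumes an 8-bit image
    mid = int(gray.mean())
    dark_peak = np.argmax(hist[:mid])                        # background mode
    bright_peak = mid + np.argmax(hist[mid:])                # rice mode
    threshold = 0.5 * (dark_peak + bright_peak)
    binary = gray > threshold                                # particles: non-zero pixels

    # Particle analysis: connected regions of foreground pixels
    labels, n_regions = ndimage.label(binary)
    areas = ndimage.sum(binary.astype(np.uint8), labels,
                        index=np.arange(1, n_regions + 1))

    # Grains per region: single grains count as one, agglomerates by area ratio
    grains = [max(1, int(round(a / avg_grain_area))) for a in areas]
    return n_regions, sum(grains)
```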
III. TYPE OF IMAGES USED

A specific type of image was used for the design of the algorithm. Fixed image: the TIFF format without compression was used, since it can store any colour depth from 1 to 32 bits and is, without a doubt, the ideal format for working with the images (see Fig. 1).

Fig. 1. Original image

ALGORITHM FOR THE ADJUSTMENT OF THE IMAGE

The objective of the adjustment of an image is to diminish the noise present in the image and to heighten the characteristics of interest, such as the edges. In order to obtain this, the following components were used:

1) HSI extraction. Its name is defined by the three parameters it takes into account: the colour H (hue), the saturation or chroma S (amount of white) and the luminance I (amount of black in the colour). In this case, the saturated tones are located on the contour of the hexagon corresponding to S = 1 and L = 0.5.

2) Lookup table. In this step a non-linear substitution is made in which each byte of Fig. 1 is replaced with another one according to a lookup table; in other words, the intensity of the dark zones is reduced to obtain better luminosity in the areas of the rice (see Fig. 2).

Fig. 2. Lookup Table

3) Threshold metric. A threshold is applied to the processed image (Fig. 2), calculating the optimal threshold, which depends on the surfaces that represent the initial gray scale, while the metric technique is used. This technique is used to measure the visibility of an object in a scene.

With the illumination used, the histogram of the image should approach the bimodal form, but in practice a histogram such as the one observed in the graph in Fig. 3 is obtained. The data provided by this histogram are difficult to process with any one of the afore-mentioned methods. For that reason it is necessary to pre-process these data, and the algorithm used is the combination of two operations:

• A grouping of several gray levels, choosing the average value, which produces a reduction of gray levels.
• A filtering to smooth the histogram.

Fig. 3. Histograms (a) before and (b) after the processing

The result is a bimodal histogram, in which it is possible to distinguish the object (the abnormal edges and pieces) and the background, as can be seen in the second graph in Fig. 3(b). In this way the binarization threshold is calculated as the average of the highest peaks of the histogram. The binary image is seen in Fig. 4.

Fig. 4. Threshold image of the rice

4) Gaussian algorithm. This is a filter that uses the two-dimensional Gaussian function to calculate the coefficients of a mask, as a discrete approximation [2]. The two-dimensional Gaussian function is given by Equation (1) [3]:

G(x, y) = \frac{1}{2\pi\sigma^2} e^{-(x^2 + y^2)/(2\sigma^2)}    (1)

Discrete approximations of the Laplacian of Gaussian filter, starting from the Gaussian algorithm, were used as convolution masks, taking the deviation into account [1]. Table I shows the convolution mask used to implement the Laplacian of Gaussian filter with σ = 2.5.

TABLE I
7 x 7 CONVOLUTION MASK FOR THE LAPLACIAN OF GAUSSIAN FILTER

 0.0032   0.0024   0.0016   0.0012   0.0016   0.0024   0.0032
 0.0024   0.0008  -0.0010  -0.0017  -0.0010   0.0008   0.0024
 0.0016  -0.0010  -0.0035  -0.0046  -0.0035  -0.0010   0.0016
 0.0012  -0.0017  -0.0046  -0.0058  -0.0046  -0.0017   0.0012
 0.0016  -0.0010  -0.0035  -0.0046  -0.0035  -0.0010   0.0016
 0.0024   0.0008  -0.0010  -0.0017  -0.0010   0.0008   0.0024
 0.0032   0.0024   0.0016   0.0012   0.0016   0.0024   0.0032

The results obtained are shown in Fig. 5.

Fig. 5. Laplacian of Gaussian

5) Counting algorithm. The grains of rice within an image are small regions that are identified by the area they occupy, as well as by their gray level. Cases exist where the area of the regions is sufficient for the identification, as in a scene with dispersed grains. In the proposed algorithm, the characteristic that allows the regions to be identified is the gray level, since regions exist which contain agglomerated grains whose gray values contrast with the background.

Analysing the images, it was observed that the size of the rice varies from one sample to another. Therefore, for each sample it is necessary to obtain an average based on the single grains of rice. Nevertheless, certain disadvantages exist when the number of single grains is not significant, because they can have deformities, which may result in an erroneous average.

Another disadvantage appears when no single grains of rice appear in the image. In this case an average of 51 pixels² of area per grain, determined on the basis of experimental results, is used. The grains of rice are then counted for each region of interest (RI); the number of grains is obtained by means of the area of the identified region and the average number of grains in the image (see Fig. 6).

Fig. 6. (a) Detection of shining objects; (b) average of rice; (c) total number of rice

6) Extraction of characteristics and classification of regions. The objective of the classification of regions consists of applying different types of processing to obtain the number of rice grains in each region. The procedure to separate dispersed grains from the patches is based on the area of the regions.

The parameters of area, structure factor, compactness and contour length that allow the regions to be classified are obtained from the analysis of different real images. With a simple algorithm, the minimum and maximum size of a grain was determined from a random sample of images. The remaining parameters are generated with the same images; the process consists of obtaining the values of these characteristics for each identified region. The values shown in Fig. 7 are those that provide better results based on the form of the regions.

Fig. 7. (a) Determination of max and min; (b) result of the classification; (c) resulting image of the classification
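As an illustration of Eq. (1) and of the Laplacian of Gaussian mask of Table I, the following sketch builds a 7 x 7 LoG kernel for σ = 2.5 from the analytic Laplacian of the Gaussian and applies it by convolution. It uses the standard LoG formula, so the exact scaling and sign convention of the published mask may differ; all names are hypothetical.

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_2d(x, y, sigma):
    """Eq. (1): the two-dimensional Gaussian function."""
    return np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)

def log_kernel(size=7, sigma=2.5):
    """Discrete size x size approximation of the Laplacian of Gaussian (cf. Table I)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Analytic Laplacian of the Gaussian of Eq. (1)
    kernel = ((x ** 2 + y ** 2 - 2 * sigma ** 2) / sigma ** 4) * gaussian_2d(x, y, sigma)
    return kernel - kernel.mean()          # zero mean so flat regions give zero response

def log_filter(gray, sigma=2.5):
    """Convolve the image with the LoG mask to highlight grain edges."""
    return convolve2d(gray, log_kernel(7, sigma), mode="same", boundary="symm")
```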
IV. RESULTS OF THE PROCESS OF GRAIN CLASSIFICATION

The sample population used for the statistics described consists of 95 images of rice samples acquired between 4:00 a.m. and 6:30 p.m. on different days. Within the total, 9 images were identified as defective due to faults of the process (be it by the increase of whiteness, the lack of maintenance and/or human errors); these are not included in the population, which was therefore reduced to 86 valid samples. Of these, 77 samples, divided into two groups of 26 and one group of 25 (one group per schedule), were selected at random. The results obtained are the following (see Table II and Fig. 8):

TABLE II
EFFECTIVENESS OF THE SYSTEM BASED ON THE ENVIRONMENTAL CONDITIONS

Results Acquired (Normal)    Images Recognised    Images Not Recognised    Percentage
Morning        25            22                   3                        92
Afternoon      26            24                   2                        92.30
Night          26            22                   4                        88.46
Total          77            69                   8                        90.92

Fig. 8. Global Effectiveness

The 9 samples not recognised, equivalent to 9.08% of the population, are caused by different procedures that compose the process of analysis (see Fig. 9). The problems which were noticed when viewing the results were:

• When raising the whiteness level, the counterbalance of the rice is warmed up more than normal and therefore there is more breakage.
• The maintenance of the machines, or the usage of the machines, has not been adequate.
• The lack of supervision in the vertilles.
• The adjustment of the machines is wrongly aligned or badly calibrated.
• There could be human errors, such as not having changed the adjustment of the machine for the filling process.

Fig. 9. Causes of Errors (whiteness 33%, maintenance 56%, human error 11%)

In the periods of evaluation of the samples with the system, results for the manual process in the vertilles (V) were obtained as shown in Tables III and IV. It should be taken into account that the greatest part of the samples were taken at 4 p.m., when the people who do this process are usually visually tired; moreover, this type of manual process is usually tedious (see Table V).

TABLE III
MANUAL PROCESS RESULTS

Vertille    Broken    Weight    Three quarters    Weight    Time
V4          15 %      2.8g      18 %              1.3g      198 sec
V9          14 %      2.6g      16 %              1.1g      190 sec
V11         51 %      9.8g      52 %              0.8g      165 sec
V14         40 %      11.4g     45 %              0.2g      205 sec
TABLE IV
RESULTS OF THE SYSTEM

Vertille    Broken      Weight    Three quarters    Weight    Time
V4          14.66 %     2.87g     5.57 %            1.3g      15 sec
V9          13.100 %    2.62g     5.601 %           1.12g     18 sec
V11         48.02 %     9.804g    4.04 %            0.808g    14 sec
V14         57.079 %    11.48g    1.0395 %          0.208g    12 sec
TABLE V
OBSERVATIONS OF THE PROCESSES

Vertille    Actual Broken Weight    Actual Weight    Observations
V4          2.8g                    1.3g             The correct result of the process is 15 – 20, and in the manual process there was an error of 2 % in the three quarters.
V9          2.6g                    1.1g             The correct result of the process is 13 – 18, and in the manual process there was 14 – 16.
V11         9.8g                    0.8g             The result is 48 – 52.
V14         11.4g                   0.2g             The result is 57 – 58.

The obtained results demonstrate a significant increase in the quality of the process of analysis, as much in the results as in the samples.

V. CONCLUSIONS
The most important part of this work was the process of the design and implementation of a system for the analysis of rice within the laboratory of an S&MC, and in this way the ability to adjust the machines (vertilles) so that the final product is least affected, with the aim of obtaining the best quality product. The analysis of images of the rice process within the laboratory presents the following comparative advantages over traditional methods:

• Greater objectivity at the time of discriminating elements of interest.
• Greater speed of processing.
• Greater precision in the counting of specific structures.
• Greater general effectiveness.
• Possibility of evaluating fields by different observers.
• Possibility of keeping the data.
• Possibility of exporting the data to statistical software in direct form, which avoids transcription errors.
• Possibility of the automation of the procedures of capture, optimisation, selection of structures, etc. Any operator can do this without training, which frees the personnel specialized in the final evaluation from having to obtain the material for analysis.
In summary, having a tool such as image analysis means having an automated process which is significantly objective, moderately fast and whose results can easily be accumulated, analysed and presented statistically and graphically. The possibility of storing the data allows its future re-analysis. All this makes it possible to change from an individual and almost artisanal evaluation of a process to a process that yields results with smaller variation due to the human operator, facilitating and optimising the comparative analysis of results.
REFERENCES
[1] Castleman, K. (1996), Digital Image Processing, pp. 667, Prentice Hall, New Jersey.
[2] Haralick, R. and Shapiro, L. (1993), Computer and Robot Vision, Addison-Wesley.
[3] Trucco, E. and Verri, A. (1998), Introductory Techniques for 3-D Computer Vision, Prentice Hall.
Theoretical Perspectives for E-Services Acceptance Model Kamaljeet Sandhu School of Computer and Information Science University of South Australia Mawson Lakes, SA 5095 Australia [email protected] Abstract: Web-based e-service user adoption and continuance requires an understanding of consumer contextual factors. In this context, e-service is referred to (interactive) service being provided on the website. This paper examines specific factors such as user experience and perception, motivation, support, control, and usage frequency in the online/offline user task environment and its affect on web-based e-service adoption and continuance. I. INTRODUCTION In recent years, the Internet has been identified as the world’s fastest growing marketplace with seemingly limitless opportunities for marketing products and services [1]. Web-based e-service delivery provides consumers with the opportunities of a virtual marketplace that are cost efficient, have 24/7 accessibility, lack geographic limitation, are interactive, and enable real time delivery of services. The developing web-based e-service domain seems limitless and expanding. From simple web-based e-service function to complex “service multiplier effect” [2] are being developed as the core function of web-based e-service. Any e-business creates a demand for pre-sales and after-sales service activities [3]. Hewlett Packard, for instance, is rapidly transforming their after-sales service to web-based e-service business unit, providing consumers the chance to interact in real time. Organizations engaged in e-businesses like banking, airlines, car rental, management consulting, music, software and educational institutions are increasingly opting for on-line services delivery to meet e-customer demand [4]. Today web-based e-services range from the electronic provision of traditional services (services with an “e” in front), such as investing and airline ticketing, intelligent interactivity in post-sales and pre-sales product support to online education course delivery [3]. As innovations in electronic service are rapidly emerging, it is yet unknown how the consumers are reacting and adjusting to this new web-based e-service function. Nevertheless, customer adoption and continuance are arguably a critical success factor in realizing the potential of web-based e-services and its future direction. From here onwards the terms end-users, users (referred in Information Systems literature) and consumers, and customers (referred in Services literature) will be used interchangeable and
imply the same meaning; the theoretical understanding is borrowed from these two disciplines.

II. THEORETICAL PERSPECTIVES FOR THE MODEL

Integrating the Service and Information Systems literature in exploring the web-based e-service perspective (see Figure 1) will provide a better understanding of how consumers evaluate web-based e-services. An innovation is an idea, practice, or object that is perceived as new by an individual or other unit of adoption [5].
Figure 1: Web-based e-service perspective Adoption of web-based e-service by end-user may be treated as technology adoption. The Technology Acceptance Model (TAM) of Davis [6] represents an important theoretical contribution toward understanding IS usage and IS acceptance behavior [7 and 8]. Applying the TAM model to investigate the end-user requirements may provide insight into web-based e-service usefulness and user friendliness. TAM focuses on two such beliefs: perceived usefulness (PU) and perceived ease of use (PEOU). The former would enhance the end-user job performance, while the latter would be easy to use. The two instruments of TAM have shown to have success in past IS studies; recent studies suggest it applies also to ecommerce and to adoption of the Internet technology [9]. Jackson et al., [10] used TAM to present a holistic framework and examine the various constructs that lead to behavioral intention to use an IS. TAM has its limitation of being employed beyond the workplace and the user task environment is not fully reflected [11]. Davis et al.’s [7] measurement of perceived enjoyment does not reflect a comprehensive set of intrinsic motivation factors, such as perceived enjoyment or perceived fun, as a research construct needs further theoretical validation. Malhotra and Galleta [12] pointed out that TAM is incomplete in one important respect: it does not account for social influence in the adoption and utilization of new information systems. Dishaw and Strong [13] also suggest a weakness of TAM, in
its lack of task focus. Davis [6], himself argues that future technology acceptance research needs to address how other variables affect usefulness, ease of use, and user acceptance. Important conclusions can be derived from prior studies using the TAM model when applying it to the web-based eservice end-user adoption. First, merely instrumental usefulness is rather an insufficient basis to develop widespread end-user adoption of web-based e-service. Second, the integration of end-users perception with usefulness and user friendliness may not explain the webbased e-service adoption and continuance factor holistically.
Figure 2. Suggested triangulation approach for understanding the consumer perspective in web-based e-service

Considering end-user, technology and adoption in the consumer context provides a triangulation approach (see Figure 2). The triangulation approach takes into account the other variables, which may have an impact on the consumer context. Adoption models traditionally focus on the "first purchase decision" [14]. These models provide understanding of the adoption process of separate consumer goods. However, web-based e-service end-user adoption is primarily related to real-time intangible service delivery on the web, successively followed by tangible products and services. The integration of this function into situation-specific motivation, consumer experience and perception, usefulness and user friendliness provides an important contextual basis for understanding the end-user adoption, continuance or rejection process. Unlike any other service, web-based e-service characteristics predominantly originate from the traditional offline environment involving human-to-human interaction (i.e., a face-to-face interface). In the online environment this is replaced by a human-to-web interface. The consumer contextual factors influencing web-based e-service usage are considered to be the same whether online or offline. The information available to web-based e-service consumers is one of the determinants which directs the consumer in achieving the desired objectives that form the purpose of using the web-based e-service. This exploration assumes that end-users of web-based e-services are already consumers of services in the traditional (offline) environment. Understanding the traditional service complexities and transforming them to the online web-based e-service task is a challenge for both practitioners and researchers. The dimension and scale of such complexity in terms of technology and adoption further develops into a consumer-oriented situational task, which is central to understanding web-based e-service adoption and continuance patterns. Defining a web-based e-service task is not an easy and straightforward process; rather, an approach to studying the process on the basis of web-based e-service and consumer interaction is suggested. This paper specifically investigates issues related to web-based e-service end-user adoption and the process involving successive usage and continuation of web-based e-services. Technology is constantly changing, and this subsequently has an impact on the consumer's experience, usage and adoption behavior. This is followed by a discussion of consumers' experience and perception in adopting the web-based e-service task.

III. END-USER EXPERIENCE AND PERCEPTION
This exploration defines experience developed in the context of users activities and task conducted in an offline/online environment. Users engaged in web-based e-service activities tend to focus on prior perception and experience, especially from the offline environment. Web-based e-services being relatively new and developing, the scope of users interpretation of web-based e-services is reflected on the basis of prior service experience. The impact of web-based e-service experience on first time users compared to the frequent users will vary, a user with no experience of web-based e-service can form high (or low) perception, especially via word of mouth communications. Such perceptions may behave differently from those developed via experience. As Davidow and Uttal [15] pointed: “service expectations are formed by many uncontrollable factors, from the experience of customers…to a customer’s psychological state at the time of service delivery.” Understanding users experience and perception solely on the basis of online experience or offline experience would tend to limit the research dimension; rather a combination approach is adopted. The level of experience gained by the end-user while conducting traditional service or web-based e-service provides an opportunity in retaining that experience to be applied in future similar situation. Such experience depends on previous interface involving combination of allied factors such as motivation, support, self-service, control, and usage frequency. The following discussion explores the role of these factors. III. MOTIVATION Hoffman and Novak [16] in their model of Network Navigation focus on extrinsic and intrinsic motivation characteristics, which impact end-users involvement and focused attention. The motivation characteristics impacts the activity task the user undertakes, it is unknown if the same motivational characteristics are retained on the return visits (continuance) to the website to use web-based e-service. This may also be true in a situation when the end-user may have a different set of motivational characteristics when visiting another website to use e-service. Prior research [16, 17, 18, 19, and 20] suggests that users involvement is characterized by situational involvement and experience. There are no substantial studies that suggest that end-users come with a set of motivational characteristics online and
220
SANDHU
retain them until the task is completed. Due to the multiple visits made to the website and user getting accustomed to the mechanical process of conducting repeated web-based eservices task, it is of significance to know the state of motivation in user task interface. Intrinsic motivation in a task environment has also been associated with willingness to spend more time with the task, lower levels of anxiety, positive mood, and greater learning e.g. [17]. The sequential process associated with learning followed by experience retention impacts the end-user in adopting motivational characteristics, which positively affect the web-based e-service task. Gattiker [21], in a framework of training, suggested that individual motivation had an impact on substantive complexity. Based on intrinsic motivation research and user acceptance research, it can be expected that for a given objective level of effort, greater levels of intrinsic motivation during training had a favourable impact on perceptions of effort (perceived ease of use) [22]. Another element of motivation is willingness or readiness to act [23]. Whether or not the end-users are dissatisfied with the offline service, not all end-users will equally willing or ready to adopt web-based e-service. The literature on innovation adoption suggests that innovators and early adopters are more willing to try new innovations than late adopters and laggards. Zaltman et al, [24] studied attitudinal dimensions related to an innovation and the user willingness to consider its use. Mohr [25] pointed out that willingness is an important precursor to actual adoption of the innovation. It is conclusive that the user in a state of willingness will opt to try the web-based e-service, but will continue its use is unknown. Although the willingness to use will depend on other factors such as its relative advantage to offline service, cost, risk, efficiency, compatibility, and complexity of the web-based e-service task. The affect of motivation mechanisms on decision-making in online and offline environment involves the user in personalizing the web-based e-service needs. An important difference between online and offline markets is that for attributes for which information can be obtained in both media search costs are typically lower online than offline [26 and 27]. The decision-making process the user undergoes is perceived from a similar prior situation. Web-based eservice information that is easier to search will be more “available” and hence, will have a larger influence on overall product evaluation [28]. Motivation in alliance with other situational specific personalization factors has a tendency to strongly influence end-users web-based e-service decisionmaking process. The degree of motivation (intrinsic or extrinsic) with other situational specific personalization factors directs the user in achieving the task. Several studies have demonstrated that when information is presented in a suitable format (e.g., brand x attribute x matrix) as in Peapod (Peapod offers an online grocery subscription service in seven metropolitan areas), it facilitates both user information acquisition and comprehension [27, 29, and 30]. Not many studies [19, 31, and 27] investigated into users’ behavior in understanding
repeat online purchase and the users motivation behind it. BCG Report [19] instead suggested online retailers to encourage consumers to share personal information and leverage this customer knowledge to make the experience more meaningful for the customers. Creating a compelling reason for the end-user to make the first online purchase is one of the obstacles facing the researchers and practitioner. The study further suggests that to create a differentiated and low risk first purchase, engineer a flawless purchase experience, offer fast and reliable service, prove the integrity of online process, enable consumers to customize delivery options, and relieve anxiety over returns. Some of these factors, if not implemented in an online environment, tend to demotivate the end-user, and form poor or low expectation thus affecting experience [32] which is similar to the reflection of prior offline experience, when the end-user didn’t receive good service, goods were faulty, and the purchasing experience had a negative effect. This may discourage the users from acceptance and continuation of web-based e-services. Motivation can have a positive or negative effect on the endusers in an online environment, if a user had a positive experience shopping in an offline environment, the same positive belief will reflect on the willingness to using the online channel for web-based e-services and vice versa. But if a user of online web-based e-service had a negative experience, to know what caused the negative effect is low, compared to the offline environment where the human interaction is present. This has been reported in BCG Report [19], for a small but significant number of consumers, a single bad experience can put an end to their online shopping careers. This leads in understanding the level of online support required by end-users in conducting the web-based e-service task. IV. SUPPORT Online/offline support has been one of the crucial decision making determinants in web-based e-service end-user adoption. Similar to the offline situation, end-users require a high level of support in consumer life cycle. The need for support seems more important in conducting web-based eservice task due to mediated interaction between the enduser and the web interface and absence of online human support. Not much is being done to integrate online support into the core functions of web-based e-services. The overall support strategy adopted by organizations offering webbased e-service is to allow support through telephone (call centers), frequently asked questions (FAQ) on websites, or through email support. The effect of support on end-users may form a positive/negative perception towards using webbased e-service. Researchers have consistently identified good support through well-experienced staff and effective communication with end-users as critical for the success of the business. In an online web-based e-service environment such an interaction is most wanted but appears to be absent. Meuter et al [33] in their online study found that 80% of the customer complaints were made in person to the company, either by phone or by visiting a service facility. This suggests that when the user is confronted with a problematic web-based e-service experience, and to resolve the issue the
user adopts the traditional approach of face-to-face interaction, rather than the online approach. When the service fails it has fallen outside the customer’s zone of tolerance [34]. So far nothing is known about users tolerance levels of web-based e-services online, or the customers propensity to complain about online service failures [35], and customers reaction to it. Zeithaml, Parasuraman, and Malhotra [36] concluded that customers have no web-based e-service expectations, customers have been found to compare web-based e-service to competitor’s services and to brick and mortar stores [33, 37]. Thus users’ tolerance level for web-based e-services, their immediate reactions to the service failure and their consecutive behavior, are intertwined and forms part of the experience and perception the user develops. V. CONTROL Studies in consumer behavior suggest end-user require a higher degree of control on their activity [16, 17 and 20]. The role of control in online consumer behavior is considered equally significant in understanding end-user web-based e-service adoption. Control in this exploration refers in the environmental and behavioral context of endusers of web-based e-service. End-user control of environment relates in terms of the flexibility and freedom provided in conducting the web-based e-service task. To retain interest in a web-based e-service, there should be enjoyment on the part of the user during the interaction with the site. The end-user has the potential to control the eservice interaction with the medium on a commercial website. They are in charge, choosing to progress, explore or exit the content of the interaction [39]. Control is facilitated by the medium adapting to feedback from the individual, and also by providing explicit choices among alternatives [38]. Environmental control has been considered as one of the main discriminator between the classes of on-line purchasers and non-purchasers, as the main factor restraining the adoption of on-line shopping [40]. End-users opting for online shopping experience higher control structures online and lesser flexibility compared to offline environment. In such a state the end-users draw a perception of their prior shopping experience offline and compare with the online experience. This usually is reflected in terms of ‘what they can do offline’ compared to ‘what they cannot do online’. The ability to transfer the offline environmental characteristics to online is a task that involves considerable understanding of the web-based e-service end-user usage behavior in offline environment. In terms of end-user behavior this is reflected from the activities in traditional offline environment and provides useful information in determining the end-user activity and control characteristics. The development of intelligent agents guiding the user in conducting the task from start to completion tends to improve the user interface and reducing uncertainty and problematic experience, providing lesser control. An agent’s ability to act on its own implies some sort of knowledge base and intelligence, but the level of intelligence may vary from the simple (e.g. instruction to run a procedure at a specific time) to the complex (e.g. flexible reasoning systems) [41]. This has led to smart software’s taking over the task and
reducing users interactivity with the task and completion within a few ticks and clicks of a mouse (e.g., online banking, online bill payment). The whole process tends to be reduced, removing the intricacies the user can encounter, and at the same time standardizing the web-based e-service across all domain and user developing a positive experience. It has been proposed that agents will likely develop from simple, understandable systems to more complex and intelligent systems as users’ become accustomed to delegating activities and develop trust in the agent’s abilities [41, 42 and 43]. Often the role of shop assistants is to resolve any consumer query in traditional off line environment. Aberg and Shahmehri [44] in their study investigated into the role of human web assistants in electronic commerce. They designed and implemented a human web assistant system concentrating on three properties 1) human intelligence 2) individual user centered 3) user interface with different needs. There are few companies (e.g., liveperson.com, facetime.org) exploring the role of human assistance in websites. The development of a smart system able to interface human communication to machine understood communication would require the integration of end-users experience with the machine intelligence. Merging the two would be a difficult task, developments made would be considered useful. VI. USAGE FREQUENCY End-user web-based e-service usage frequency is another determinant in understanding web-based e-service continuance after successful adoption. In order to fully understand web-based e-service usage frequency as a success measure it is suggested here to reflect on IS usage studies. The debate on IS usage as a success measure is ongoing. Gelderman [45], argues that a rational for the application of IS usage as a success measure is the idea that it does not contribute to performance if it is not used (and will contribute to performance when it is). An alternative rationale Gelderman [45] states that users are able to assess the value of the IS and will use if they conclude that the benefits (rewards) will outweigh the costs (efforts) [8, 45, and 46]. Both rationales assume that more usage is better, which is not necessarily the case. This is not true for web information systems, where end-users may engage for considerable period of time leading to high usage, without any specific purpose (such as surfing the Internet for fun). Gelderman [45] argues that it is unclear what exactly is the amount usage of an IS and subjective measurement of usage maybe influenced by social desirability and usage measurement may suffer from time-dependent noise. In terms of web-based e-service usage this may be related to more usage the better. The application of usage as successful web-based e-service adoption and continuance determiner provides a basis in understanding successful and unsuccessful user visits for conducting the web-based eservice task. Though the number of visits to e-service websites cannot solely provide complete information about the success of web-based e-service adoption and continuance.
Lane and Koronios [47] investigated user requirements in business-to-consumer web information systems. Business-to-consumer web information systems are IS based on web technology that support consumer interaction, such as online shopping, through integration with databases and transaction processing systems [48]. The development of effective mass information systems such as web information systems is critical to the success of web-based electronic commerce [47 and 49]. Capturing user requirements is particularly important in the development of web information systems, where user acceptance is often critical to their success [47, 49, and 50]. It is unknown which web-based e-service features consumers (individual and group) often use and why, or how frequently end-users move from one section to another; this is further complicated by consumer diversity, by first-time, second-time, or repeat visits to the website, and by the differentiation between satisfied and dissatisfied consumers. Collecting and analyzing information on end-user movement frequency in web-based e-service usage on the World Wide Web is central to understanding consumer behavior in an online environment. Vast amounts of information exist in the form of data, which remain meaningless until they are formatted and processed to reveal end-user movement frequency [47]. More recently, several online tools, such as access logs, agent logs, error logs, referrer logs, cookies, and user login IDs, have become available to capture and analyze user behavior and activity in web information systems, and they can reveal useful information for analyzing end-user movement frequency. A number of studies have assessed the suitability of these tools in the development of web information systems, e.g. [50, 51, and 52]. These tools provide researchers with knowledge about user requirements and may be integrated into websites. Even with the availability of such tools, web-based e-service systems pose challenges regarding the capture of user requirements.
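As an illustration only, and not something reported in the study, the sketch below shows how data from one of the tools listed above (a web-server access log in the standard Apache/NCSA Common Log Format) might be summarised into a crude per-visitor activity count. The log file name, the page-view heuristic, and the use of the client address as a stand-in for an individual user are assumptions made for the example.

```python
# Minimal sketch: summarise a Common Log Format access log into per-client
# request and page-view counts, as a rough proxy for "movement frequency".
import re
from collections import Counter

LOG_LINE = re.compile(r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)"')

requests_per_client = Counter()
pages_per_client = Counter()

with open("access.log") as log:                      # hypothetical log file
    for line in log:
        match = LOG_LINE.match(line)
        if not match:
            continue                                 # skip malformed lines
        host = match.group("host")
        requests_per_client[host] += 1
        parts = match.group("request").split()       # e.g. "GET /enrol.html HTTP/1.0"
        # crude page-view heuristic: ignore requests for images, styles and scripts
        if len(parts) >= 2 and not parts[1].endswith((".gif", ".jpg", ".png", ".css", ".js")):
            pages_per_client[host] += 1

for host, hits in requests_per_client.most_common(10):
    print(host, hits, pages_per_client[host])
```

In practice such counts would need to be combined with login IDs or cookies to separate first-time from repeat visitors, as the discussion above notes.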
VII. CONCLUSIONS
In line with the preceding discussion, it is suggested that web-based e-service end-user adoption takes into account the individual end-user, the technology, and adoption in the consumer context. Though prior studies have adopted general technology-user models such as TAM, which is of significance, they do not take into account consumer-context issues on a commercial-situation basis. Studying web-based e-service end-user adoption from a consumer-centred context, and combining it with adoption and acceptance models, would enhance understanding. To explore the context further, issues related to situation-specific personalization of individual end-users (motivation, support, control, self-service, and usage frequency in online and offline environments) may be used to produce evaluation guidelines that would facilitate the adoption and continuation process. Earlier studies [34] investigated the adoption of web-based e-services in different contexts, but do not provide insight into acceptance and continuance of web-based e-services by consumers. Though a consumer may use a web-based e-service for the first time, it is continued usage on which success rests. Understanding and developing end-users' continuous usage behavior in web-based e-services has illustrated the same perspective and provided similar insights into what determines IS acceptance and continuance. This study will concentrate on those factors and at the same time reveal their core relationship to web-based e-service end-user behavior. From the preceding discussion it is clear that web-based e-service end-user adoption is not a simple and straightforward process. Rapid development in web-based e-service technology delivery is gradually shaping consumer uptake and usage of this new innovation. The shift of interaction from traditional services to web-based e-services, and the simultaneous use of both, has laid out a new set of implications for organizations, government, consumers, practitioners, and researchers.
[Figure 3 components: End-user E-Service Experience and Perception in Control, Self-service, and Support; End-user E-Service Situation Specific Motivation; End-user E-Service Perceived Usefulness; End-user E-Service Perceived Ease of Use; End-user E-Service Adoption; and End-user E-Service Usage Frequency and Continuance, linked by hypotheses H1-H5.]
Figure 3: Conceptual framework for the Web-based E-Services Adoption Model (E-SAM). The model complements Davis's [6] main characteristics (perceived usefulness and perceived ease of use) with significant contexts and factors. Whether the end-user adoption decision for web-based e-services emanates from end-users' experience and perception, situation-specific motivation, or the acceptance and continuance contexts of usage frequency needs investigation.

REFERENCES
[1] Domains, C. (1999). “Business on the Internet.” http://www.cleverdomains.com/business.htm,CleverDomains. [2] Aberdeen Group. (1999). “Web-based e-service: Using the Internet to manage customers.” http://www.servicesoft.com/presskit-whitepaper.html, [3] Ruyter, K. D., Wetzels, M., and Kleijnen, M. (2001). “Customer adoption of web-based e-service: an experimental study.” Inl Journal of Service Industry Mgt 12(2): 184-207 [4] Forrest, E., and Mizerski, R. (1996). “Interactive Marketing, the future present.” American Mark. Assoc. [5] Rogers, E. (1983). Diffusion of Innovation (2nd Edition). [6] Davis, F.D. (1989). Perceived usefulness, perceived ease of use, and user acceptance of IT. MIS Quart, 13(2): 319-40 [7] Davis, F. D., Bagozzi, R.P., and Warshaw, P.R. (1992). “Extrinsic and Intrinsic Motivation to use computers in the workplace.” Jou. of Applied Social Psych 22(14): 1111-132. [8] Robey, D. (1979). “User attitudes and management information system use.” Academy of Management Journal 22(3): 527-38. [9] Gefen, D., and Straub, D. (2000). “The relative importance of perceived ease of use in IS adoption: A study of e-commerce adoption.” Journal of the Ass. for IS 1 Article 8: 1-20. [10] Jackson, C. M., Chow, C., and Leitch, R.A. (1997). “Toward an understanding of the behavioral intention to use an information system.” Decision Sciences 28(2): 357-90. [11] Moon, J., and Kim, Y. (2001). “Extending the TAM for a world-wide-web context.” Inf and Mgt 38: 217-30.
THEORETICAL PERSPECTIVES FOR E-SERVICES ACCEPTANCE MODEL [12] Malhotra, Y., and Galletta, D. F. (1999). “Extending the technology acceptance model for social influence: theoretical bases and empirical validation.” in Proceedings of the 32nd ICSS, 1999. [13] Dishaw, M. T., and Strong, D.M. (1999). “Extending the technology acceptance model with task-technology fit constructs.” Information and Management 36: 9-21. [14] Mahajan, V., and Muller, E. (1990). “New product diffusion models in marketing: A review and directions for research.” Journal of Marketing Research. 54(1): 1-27 [15] Davidow, W. H., and Uttal, B. (1989). “Service companies: focus or falter.” Harvard Business Review (July/August): 17-34. [16] Hoffman, D. L. and Novak, T, P (1996). “Marketing in Hypermedia Computer-Mediated Environments: Conceptual Foundations.” Journal of Marketing Research. 60(7): 50-68. [17] Csikszentmihalyi, M. (1988). “The future of flow, in optimal experience: Psychological Studies of flow in Consciousness, Mihalyi Csikszentmihalyi and Isabella Selega Csikszentmihalyi, eds., Cambridge, Cambridge University Press”. [18] Csikszentmihalyi, M. and LeFevre, J. (1989). “Optimal experience in work and leisure” Journal of Personality and Social Psychology 56(5): 815-22. [19] Abdelmessih, N., Yolles, E., Grevler, C., Kuo, L., Sterling, K and Davidson, K (2000). Winning of the Online Consumer: Insights Into Online Consumer Behavior, The Boston Consulting Group. [20] Ramaswami, S. N., Strader, T. J., and Brett, K. (2001). “Determinants of on-line channel use for purchasing financial products.” Intl. Journal of E-Comm 5(2): 95-118. [21] Gattiker, U. E. (1992). “Computer skills acquisition: A review and future directions for research.” Journal of Management 18(3): 547-574. [22] Venkatesh, V (1999). “Creation of favorable user perceptions: Exploring the role of intrinsic motivation.” MIS Quarterly 23(2): 239-60. [23] MacInnis, D. J., Moorman, C., and Jaworski, B.J. (1991). “Enhancing and measuring consumers' motivation, opportunity, and ability to process brand information from ads.” Journal of Marketing 55(4): 32-53. [24] Zaltman, G., Duncan, R., and Holbeck, J. (1985). “Innovations and Organiz..” Malabar, FL: Robert E. Krieger. [25] Mohr, L. (1969). “Determinants of innovations in organizations.” American Policy Science Rev. 63: 111-26. [26] Bakos, J. Y. (1997). “Reducing buyer search costs: Implications for electronic marketplaces.” Management Sciences 43: 1676-92. [27] Degeratu, A. M., Rangaswamy, A., and Wu, J. (2000). “Consumer choice behavior in online and traditional supermarkets: The effects of brand name, price, and other search attributes.” Working Paper. The Smeal College of Business, Penn State University; and A.B. Freeman School of Business, Tulane University: 1-45. [28] Kisielius, J., and Sternthal, B. (1984). “Detecting and explaining vividness effects in attitudinal judgments.” Journal of Marketing Research. 21: 54-64. [29] Bettman, J. R., and Kakkar, P. (1977). “Effects of information presentation format on consumer information acquisition strategies.” Journal of Consumer Research 3: 233-40. [30] Russo, J. E., Staelin, R., Nolan, C. A., Russel, G. J., and Metcalf, B. L. (1986). “Nutrition information in the supermarket.” Journal of Consumer Res. 13 (June): 48-70. [31] Velli, A., Lisboa, P.J.G., and Meehan, K (2000). “Quantitative characterization and prediction of on-line purchasing behavior: A latent variable approach.” International Journal of Electronic Commerce 4(4): 83-104. [32] Johnson, C. 
and Mathews, B.P. (1997). “The influence of experience on service expectations.” International Journal of Service Industry Management 8(4): 290-305. [33] Meuter, M. L., Ostrom, A.L, Roundtree, R.I., and Bitner, M.J. (2000). “Self-service technologies: Understanding customer
satisfaction with technology-based service encounters.” Journal of Marketing 64: 50-64. [34] Zeithaml, V. A., Berry, L., and Parasuraman, A. (1993). “The nature and determinants of customer expectations of service.” Journal of the Ac. of Marketing Sc. 21(1): 1-12. [35] Riel, A. C. R., Liljander, V., and Jurriens, P (2001). “Exploring consumer evaluations of web-based e-services: a portal site.” Intl Jou of Service Industry Mgt 12(4): 359-377. [36] Zeithaml, V. A., Parasuraman, A., and Malhotra, A (2000). “A conceptual framework for understanding e-service quality: Implications for future research and managerial practice.” Working Paper. Report No. 00-115. [37] Szymanski, D. M., and Hise, R.T. (2000). “E-satisfaction: an initial examination.” Journal of Retailing 76(3): 309-22. [38] Webster, J., Trevino, L.K., and Ryan, L. (1993). “The dimensionality and correlates of flow in human-computer interactions.” Computer in Human Behavior 9: 411-26. [39] Trevino, L. K., and Webster, J. (1992). “Flow in computermediated communication.” Communication Research 19(5): 539573. [40] Velli, A., Lisboa, P.J.G., and Meehan, K (2000). “Quantitative characterization and prediction of on-line purchasing behavior: A latent variable approach.” International Journal of Electronic Commerce 4(4): 83-104. [41] Sproule, S. and Archer, N. (2000). “A buyer behavior framework for the development and design of software agents in ecommerce.” Internet Research: Electronic Networking Applications and Policy 10(5): 396-405. [42] Grosof, B.N. (1997). “Building commercial agents: an IBM research perspective (invited talk)”, Exhibition on practical applications on intelligent agents and multi-agent technology (PAAM 97), http://www.research.ibm.com/igents/paps/rc20305.pdf. [43] Etzioni, O. (1997). “Moving up the information food chain.” AI Magazine, 11-18 [44] Aber, J., and Shahmehri, N. (2000). “The role of human web assistants in e-commerce: an analysis and a usability study.” Internet Research: Electronic Networking Applications and Policy 10(2): 114-25. [45] Gelderman, M. (1998). “The relation between user satisfaction, usage of information systems and performance.” Information and Management 34: 11-18. [46] Markus, L. M., and Robey, D. (1983). “The organizational validity of management information systems.” Human Relations 36(3): 203-26. [47] Lane, M. S., and Koronios, A. (2001). “A balanced approach to capturing user requirements in business-to-consumer web information systems.” Australian Journal of Information Systems 9(1): 61-69. [48] Isakowitz, T., Bieber, M., and Vitali, F. (1998). “Web information systems.” Communications of the ACM 41(7): 78-80. [49] Hansen, H. R. (1995). “Conceptual framework and guidelines for the implementation of mass information systems.” Information and Management 28(2): 125-42. [50] Bauer, C., and Chong, S. (1999). “Electronic markets development: Using marketing research to determine user requirements, in: Global Networked Organizations.” Proceedings of 12th BLED Electronic Commerce Conference, Bled, Slovenia. 40823. [51] Bertot, J. and McClure, C.R. (1997). “Web usage statistics: measurement issues and analytical techniques.” Government Information Quarterly 14(4): 373-90. [52] Wu, K. L., Yu, P.S. and Ballman, A. (1998). “Speed tracer: A web usage mining and analysis tool.” IBM Systems Journal 37(1): 89-106.
E-Services Acceptance Model (E-SAM)
Kamaljeet Sandhu
School of Computer and Information Science, University of South Australia, Mawson Lakes, SA 5095, Australia
[email protected]
Abstract - This paper reports on the development of a framework for a Web-based E-Services Acceptance Model (E-SAM). The paper argues that user experience, user motivation, perceived usefulness, and perceived ease of use may influence user acceptance of web-based eServices. The model was examined at a university where students and staff use services that have moved from a paper-based to a Web-based eService system. The findings from the data analysis suggest that user experience is strongly related to perceived ease of use, and perceived usefulness to user motivation, in user acceptance of Web-based eServices. The bonding of these variables, and the lack of strong relationships between the other components, suggests the model's applicability to eServices and highlights a research framework.
I. INTRODUCTION
This paper reports a study of a university based eService providing interactive services on a website. EServices are a form of information delivery on websites involving commercial and non-commercial activities. EServices websites offer services, provide information, are mostly interactive and offer the user the capacity to conduct commercial activity. As innovation in electronic service are rapidly emerging, there is a need to know how consumers are reacting and adjusting to this new web-based function. This paper reveals a study of the component design that impact on the usefulness and success of a web based eService. In this context the Web-based eService Acceptance Model is a conceptual framework for understanding user’s attributes when using websites, including attributes such as user’s experience, user’s motivation, and user’s continued use of the web-based eService. The conceptual model developed in this paper attempts to map user-based factors in determining acceptance and continuous use of eServices on websites. Nevertheless, user acceptance and continued use are arguably a critical success factor in realizing the potential of web-based eServices and its future direction. In a study by Accenture [1] on Web-based eServices it was discovered: 89% of executives are familiar with web-based eServices; 79% of them are evaluating web-based eServices; 76% ranked web-based eServices as either a high or a medium priority. The study reveals that the executives are evaluating Web-based eServices as a current option, and not as a Research and Development or an investment only in future capabilities. Although organisations are developing eServices capabilities little is understood in terms of user acceptance. A study by Datamonitor, Inc. [2] suggests that as much as $6.1 billion in potential Web sales were lost in 1999 due to inadequate eService. Web-based eServices are developing rapidly. Organisations and consumers alike are moving closer to its acceptance. The two studies have something in common: that is the concern for lost market opportunities in Web-based eServices.
Web-based eServices are believed to be superior to services delivered through ordinary channels because of their convenience, interactivity, relatively low cost, and high degree of customization or personalization, among other advantages [3]. Web-based eServices are enabling business capabilities that companies have been struggling to provide for years: connecting business processes that span different information systems, or even different organisations [4]. There is little understanding of user acceptance of Web-based eServices, or of the criteria that demonstrate how eServices are accepted and used on the basis of user-centred factors. Characteristics such as users' experience, motivation, and continued use of websites may be the primary criteria in conducting a web-based activity.
II. THEORETICAL FRAMEWORK AND MODEL COMPONENTS
Current research on web-based eServices focuses on understanding those factors that influence how successfully and rapidly users adopt the eService. Much of that research utilizes the Technology Acceptance Model (TAM) of Davis [5]. TAM represents an important theoretical contribution toward understanding IS usage and IS acceptance behaviour [6] and has been applied in a variety of end-user studies on the world-wide-web [7, 8, 9, and 10].
[Figure 1 components: external variables, perceived usefulness, perceived ease of use, attitude towards using, behavioural intention to use, and actual system use.]
Figure1. The technology acceptance model (Davis. 1989) These studies investigated the application of TAM in conjunction with one or more factors (i.e., experience, motivation, and usage frequency). TAM focuses on two constructs: perceived usefulness (PU) and perceived ease of use (PEOU) (Fig.1). These two instruments of TAM have been shown to have success in past IS studies in explaining user acceptance of technology. Recent studies suggest that the TAM model also applies to user acceptance of e-commerce and to Internet technology [8]. To further enhance the model Venkatesh [9] suggests using an adjustment-based theoretical model, including control, intrinsic motivation, and emotion as variables within the ease of use dimension of TAM construct. Venkatesh [9] argued that there are multiple factors not directly related to the user-system interaction that are more
important to TAM. These are control, intrinsic motivation, and emotion. Similarly, Steer et al. [11] used TAM in a web-based environment to study users' usage behaviour. It is suggested that a broader range of complex factors is needed to assess influence on acceptance behaviour. Morris and Turner [12] suggest that until the quality-of-experience scale is related to a relevant measure of IT use, the utility of the construct cannot be determined. It may be argued that the quality-of-experience scale is a variable that may not stay constant in user-specific situations on websites. Davis (1989) himself argues that future technology acceptance research needs to address how other variables affect usefulness, ease of use, and user acceptance. The research model for Web-based electronic services (E-SAM) proposes that six constructs - user experience, user motivation, user perceived usefulness [5 and 6], user perceived ease of use [5 and 6], user acceptance of the eService, and user continued use of the eService - are the main determinants of user acceptance of web-based eServices (Figure 2).
[Figure 2 components: End-user E-Service Experience and Perception in Control, Self-service, and Support (EXP); End-user E-Service Situation Specific Motivation (MOT); End-user E-Service Perceived Usefulness (PU); End-user E-Service Perceived Ease of Use (PEOU); End-user E-Service Acceptance; and End-user E-Service Continued Usage (CU), linked by hypotheses H1a, H1b, H2a, H2b, H3a, H3b, and H4.]
Figure 2: Web-based Electronic Services Acceptance Model (E-SAM)
III. THE FRAMEWORK FOR ESERVICES A. Users Experience IS developers are invariably concerned with success. User’s interaction with new technology has however been accompanied by unknown obstacles resulting in technology rejection. Such obstacles focus the user attention on the downside of technology usage. Capturing such obstacles at the time of their occurrence on websites is a challenge. Webbased eServices are believed to be quick and the period of time users spend o the task is significantly small. Often it is difficult to capture the factors behind the rejection due to insufficient information about user interface activity. Website activities offer challenges and skills that lead users in achieving the task. It may be in terms of website location, task completion, information search, website navigation, and any other activity that facilitates user’s involvement in interface process. Control offers flexibility to users in the form that they can conduct the task easily. The user’s interaction skills invariably get better every time a website is visited. This develops into a learning process with users practising and improving on their experience. Flow formalises and extends a sense of playfulness [13, 14, 15, and 16] that is incorporated within the web activity. Websites offering higher levels of flow are shown to have
higher retention of users, perceived sense of control over the website interactions, focus their attention, and find it cognitively enjoying [17]. It has been observed that users, when successful in a task on website, feel confident in using for other tasks [18]. Accordingly it can be hypothesized that: H1a: User Experience affects Perceived Usefulness (PU) to use EServices H1b:
User Experience affects Perceived Ease of Use (PEOU) to use EServices
B. Users Motivation Prior studies suggest that user motivation is also an important component in acceptance or rejection of a technology [19 and 9]. There are two main categories of motivation: extrinsic and intrinsic [20]. Extrinsic motivation drives the user in to perform behaviour to achieve specific objectives [21], while the intrinsic motivation relates to user perception of pleasure and satisfaction from performing the behaviour [20]. Though TAM has not explicitly included or excluded the motivation component, studies suggest it be included in the perceived usefulness and perceived ease of use construct [e.g. 9]. Hoffman and Novak [19] included the motivational characteristics in developing a user process model of network navigation in hypermedia. In a web-based environment there may be number of unknown factors that shape user motivation to use an eService on the website. Often it is intriguing to know why some users are not motivated to complete the web-based process while others drop out. Interestingly, the focus on user motivation in technology usage remains dominant [9 and 19]. Whether it is the intrinsic or extrinsic motivation that drives user behaviour in the web-interface process is unknown. The level of motivation the user expresses may also shape the perception towards the web-based task. If the user had a negative experience (hence low motivation) it’s more likely to be reflected in the future visits to the website. Accordingly it can be hypothesized that: H2a: Perceived Usefulness (PU) affects User Motivation to use EServices H2b: User Motivation affects Perceived Ease of Use (PEOU) to use Eservices
C. Usage Frequency User’s visits to the website may also have an influence on the usage frequency (i.e. continued use). In terms of how easy the website is to use, may relate users attitude towards the task. If the task relates to user getting more experienced it will be reflected on the usage. Users may experience high level of flow in conducting the task. The easy navigation features, better control, self-service, good support on the website facilitates the usage process. Heijden [7] suggests that ease of use does not seem to directly influence website usage, but indirectly through usefulness and perceived entertainment value. The construct of entertainment has been found to be an effective evaluator of usage. However, it is still not clear whether those characteristics are website driven or user drive. The entertainment context on a website may have a different affect among different user groups. (i.e. a teenager will have a different influence in
comparison to mature age user). Hoffman and Novak [19] argued that attracting users to a website and generating repeated visits to be one of the main challenges. Moore and Benbasat [22] report that mandatory use of Information Technology has a positive impact on usage and that in situations of mandated use other factors tend to have less ability to explain the acceptance and use. Accordingly it can be hypothesized that: H3a: Perceived Usefulness (PU) affects User Acceptance of eServices H3b: Perceived Ease of Use (PEOU) affects User Acceptance of eServices H4: User Acceptance affects Continued Use (CU) of eServices
IV. RESEARCH METHODOLOGY Pitkow and Recker [23] explain the benefits of an online survey method. The online survey method provides better control on data structures, saves time, low cost, reduces chances of mixing complete and incomplete data into the main data stream, transferring and processing of data in an electronic format is made easy. The survey investigated user’s perception towards the university website. The users were asked to complete all questions in a web-based survey. Staff and Students of the University of Australia (not the real name) participated in an online survey. Staff and students were selected as participants because of their easy access to computers and use of eServices on university websites. The author programmed the website such that if the form was incomplete, and the user clicked on the submit button, they were alerted to complete the section. This provided the advantage of not receiving incomplete responses, and as a result all 403 responses received over a three-week period and were all complete. Collected data was received online and transferred to excel files, and later to the SPSS software package for analysis. The instrument derives two of its constructs (perceived usefulness and perceived ease of use) from TAM [5 and 6]. Reflective items were used to measure all the constructs, consistent with [3]. An effort was made to use items validated from literature review and previous case study. For some construct (i.e. pre-acceptance desires and expectations), formative items were developed based on a review of literature [24 and 25] and a belief elicitation process [3]. The purpose was to identify each stream of factors and study its affect on the acceptance and continued use of web-based eServices. The items were demarcated into factor categories - user experience, user motivation, user perceived usefulness, user perceived ease of use, user acceptance, and continued use (Table 1). Questions were direct and related to user interaction with the university website. The order of the questions in the survey was randomized, but each section probed the user about their specific experience, motivation, perceived usefulness, perceived ease of use, acceptance, and continued use of eServices on the university website. A total of 30 questions were asked, attention was given to keeping the questionnaire short and to the point. The users were asked to put a click on boxes that best matched their responses. This ranged from 1 to 6, varying from 1 Strongly disagree; 2 disagree; 3 Not sure; 4 Agree; 5 Strongly Agree; 6 Not applicable.
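For concreteness, a minimal sketch of how such six-option responses could be prepared for analysis is shown below. The file and column names are hypothetical (the item labels follow Table 4), and the decision to treat option 6 ("Not applicable") as missing data is an assumption; the paper does not state how those answers were handled.

```python
# Hypothetical preparation of the 403 web-survey responses for analysis.
# Assumption: items are coded 1-5 from "Strongly disagree" to "Strongly agree",
# and 6 ("Not applicable") is treated as missing data.
import pandas as pd

raw = pd.read_csv("esam_survey_export.csv")          # hypothetical export of the responses
item_cols = [c for c in raw.columns if c.startswith(("EX", "M", "PU", "PE", "UF"))]

items = raw[item_cols].where(raw[item_cols] != 6)    # recode 6 -> NaN (not applicable)
print(items.describe())                              # quick check of item distributions
```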
The questionnaires were tested, reframed to check for consistency, normality and validation among users. An initial test batch of 50 responses was analysed to ensure construct validity. A previous case study of “Implementing Web-based Electronic Services” [26] had been used to gather the constructs, evaluate them and build into the survey. However, further validation of the instruments in the context of the study and using the pilot study data necessitated changes in the final instrument used which is consistent with [27]. The data analysis was performed using partial least square to map the path structures in the model [28]. The PLS procedure [29] has gained interest and use among researchers in recent years because of its ability to model latent constructs under conditions of nonnormality and small to medium sample sizes [3, 28, and 30]. The results relate to how effective each of the construct is in the model and its confirmation or disconfirmation of hypothesis that are empirically tested. PLS represents an attempt to make latent variable path modelling using the Partial Least Square (PLS) approach accessible and convenient to all interested parties [28]. Partial Least Square is used to assess the structural model. Hypotheses are tested by examining the size, the sign, and the significance of the path coefficients and the weights of the dimensions of the constructs respectively which is consistent with [3 and 27].
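The estimation itself is not reproduced in the paper, so the following is only an illustrative approximation of the inner (structural) model: construct scores are taken as the mean of each block's items and the paths are fitted by ordinary least squares on standardised scores, rather than by the full PLS algorithm used by the author. Block memberships follow the item labels in Table 4; the data file name is hypothetical.

```python
# Approximate the E-SAM structural paths (Figure 2 / Table 1) with standardised
# OLS regressions on construct scores; an illustration, not the paper's PLS procedure.
import numpy as np
import pandas as pd

df = pd.read_csv("esam_survey_export.csv")           # hypothetical file (see above)

blocks = {                                            # item labels as in Table 4
    "EXP":    ["EX1", "EX2", "EX3", "EX4", "EX5", "EX6"],
    "MOT":    ["M1", "M2", "M3", "M4", "M5", "M6", "M7"],
    "PU":     ["PU1", "PU2", "PU3", "PU4"],
    "PEOU":   ["PE1", "PE2", "PE3", "PE4", "PE5"],
    "ACCEPT": ["UF1", "UF2", "UF6"],
    "CU":     ["UF3", "UF4", "UF5", "UF7", "UF8"],
}
scores = pd.DataFrame({name: df[items].mean(axis=1) for name, items in blocks.items()})
scores = scores.dropna()                              # drop respondents with no usable items
z = (scores - scores.mean()) / scores.std(ddof=0)     # standardise the construct scores

def path_coefficients(dependent, predictors):
    """Standardised regression (path) coefficients of `dependent` on `predictors`."""
    beta, *_ = np.linalg.lstsq(z[predictors].to_numpy(), z[dependent].to_numpy(), rcond=None)
    return dict(zip(predictors, np.round(beta, 3)))

# Structural relations as read from Figure 2 and Table 1
print("PU     <-", path_coefficients("PU", ["EXP"]))
print("MOT    <-", path_coefficients("MOT", ["PU"]))
print("PEOU   <-", path_coefficients("PEOU", ["EXP", "MOT"]))
print("ACCEPT <-", path_coefficients("ACCEPT", ["PU", "PEOU"]))
print("CU     <-", path_coefficients("CU", ["ACCEPT"]))
```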
V. MODEL TESTING AND VALIDATION FOR ELECTRONIC SERVICES USER ACCEPTANCE
The causality model of Figure 3 summarizes the various structural constructs of the E-SAM framework. The path coefficients are the standardised regression coefficients, and the R² values are also shown. These coefficients are reported in Tables 1 and 2 (Path Coefficients and Inner Model).
Figure 3: The causality model.
The outer weights for the latent variables are at significant levels and are shown next to the path coefficients in Figure 3, with the significant arrows depicted in bold. The latent variables User Experience (Exp) and User Perceived Ease of Use (PEOU) have a significant impact on User Acceptance (Accept). However, the most important impact on User Acceptance comes from User Experience (0.667), followed by User Perceived Ease of Use (0.576). User Motivation (0.129) is weakly rated and has less influence on acceptance and continued use.

Table 1: Path Coefficients
              User Exp   User Mot   User PU   User PEOU   User Accept   User CU
User Exp       0.000      0.000      0.000      0.000       0.000        0.000
User Mot       0.000      0.000      0.546      0.000       0.000        0.000
User PU        0.432      0.000      0.000      0.000       0.000        0.000
User PEOU      0.667      0.129      0.000      0.000       0.000        0.000
User Accept    0.000      0.000      0.306      0.576       0.000        0.000
User CU        0.000      0.000      0.000      0.000       0.372        0.000
Note: Path significance levels: p<.001; p<.01; p<.05.
The least influence comes from User Motivation on User Perceived Ease of Use (0.129), which contradicts the expected ability of motivation to influence the perceived ease of use component (see Table 1). It is not surprising that User Experience is much more important for eServices acceptance and continued use than some traditional offline characteristics. User Experience has a strong impact on User Perceived Ease of Use (0.667) but less on Perceived Usefulness (0.432), thereby validating the TAM constructs as extended into the electronic services acceptance model (E-SAM). User Acceptance is an important factor in web-based eServices; it depends mainly on User Experience acting through Perceived Ease of Use. It is interesting to note that User Perceived Usefulness has a direct influence on Motivation (0.546), indicating that the Usefulness and Motivation constructs may directly influence the user-task process and the flow of high or low motivation the user expresses. Table 2 decomposes the R² into the contribution of each explanatory variable to the prediction of the dependent variable, which makes sense when the correlation coefficients (see Table 3) support the underlying assumptions of the working model. The R² for User Acceptance is 0.586, which is very satisfactory given the diversity of the model. The R² value for Continued Use is 0.138 and is weak, indicating that continued use may not be a function of user acceptance of eServices alone and may be affected by other uncontrollable factors beyond the user's first-time acceptance.

TABLE 2: INNER MODEL
Latent Var.        Mean     Location   Mult. RSq   Av ResVar   Av Comm   Av Redund
User Exp           0.0000   0.0000     0.0000      0.5949      0.4051    0.0000
User Mot           0.0000   0.0000     0.2972      0.6063      0.3937    0.1172
User PU            0.0000   0.0000     0.1869      0.2073      0.7927    0.1482
User PEOU          0.0000   0.0000     0.5386      0.3629      0.6371    0.3431
User Acceptance    0.0000   0.0000     0.5862      0.4212      0.5788    0.3393
User Cont Use      0.0000   0.0000     0.1383      0.7536      0.2464    0.0341
Average                                0.2913      0.5163                0.1439
TABLE 3: CORRELATION OF LATENT VARIABLES
Latent Var.    User Exp   User Mot   User PU   User PEOU   User Accept   User CU
User Exp        1.00
User Mot        0.445      1.00
User PU         0.432      0.546      1.00
User PEOU       0.725      0.426      0.455      1.00
User Accept     0.572      0.489      0.568      0.715       1.00
User CU         0.215      0.313      0.298      0.213       0.372        1.00
Extraction communalities are estimates of the variance in each variable accounted for by the factors (or components) in the model. User Perceived Usefulness rates highest on the scale (0.7927), followed by User Perceived Ease of Use (0.6371), User Acceptance (0.5788), User Experience (0.4051), User Motivation (0.3937), and User Continued Use (0.2464). Small values indicate variables that do not fit well with the model and should possibly be dropped from the analysis. If communalities below 0.5 are ignored, that leaves User Perceived Usefulness, Perceived Ease of Use, and User Acceptance (see Table 2). It is interesting to note that the stronger influence on using Web-based eServices comes from User Perceived Usefulness (0.7927). This can be assigned to user task attributes that significantly affect user attitudes of perceived usefulness towards the eService task. The role of continued use in eServices depends on users valuing the different task characteristics that go into performing the task after initial acceptance (i.e. first-time use), and to a large extent user experience acts as a catalyst in the user task acceptance process. The performance of eService task acceptance is not reflected in continued use; rather, a weak path link (0.372) suggests that the user's tendency to continue the eService task may be related to a broader range of factors indirectly influencing continued use of eServices. The coefficients of determination point to this assumption: for example, the R² values for Perceived Ease of Use (0.539) and User Acceptance (0.586) are acceptably high for drawing these assumptions. It can be predicted that User Acceptance is, to a lesser extent, a predictor of Continued Use. User Motivation has no direct effect on Continued Use, but an indirect one through Perceived Usefulness. Similarly, User Experience has no direct effect on Continued Use, but an indirect one through Perceived Usefulness and Perceived Ease of Use in the acceptance of eServices (see the path coefficients in Figure 2). The decomposition of the outer model (Table 4) provides a good insight into the inner workings of the model. Items have reasonably high loadings (≥ 0.50), with the majority greater than 0.70, demonstrating convergent validity. All items were found to be significant at the 0.01 level. The majority of the manifest variables load significantly high, with the exception of a few (see Table 4): for example, EX3, M7, UF4, UF7 and UF8 scored below 0.5, indicating that these items require further validation in the instrument. The highest loadings are for Perceived Usefulness, followed by Perceived Ease of Use, User Motivation, User Experience, and User Acceptance. Notably, the loadings are significantly high for User Experience and User Motivation, once again indicating the effect of these variables on the overall model structure.
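As a consistency check (a calculation added here, not one reported in the paper), the average communality of a block in Table 2 can be recovered as the mean squared loading of that block's indicators. Using the Perceived Usefulness loadings from Table 4:

\[
\bar{c}_{\mathrm{PU}} \;=\; \tfrac{1}{4}\bigl(0.8450^{2} + 0.9022^{2} + 0.9199^{2} + 0.8925^{2}\bigr) \;=\; \tfrac{1}{4}\bigl(0.7140 + 0.8140 + 0.8462 + 0.7966\bigr) \;\approx\; 0.7927,
\]

which agrees with the average communality reported for User PU in Table 2.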
Though user motivation is influenced by perceived usefulness, it has no effect on perceived ease of use. There is no indication that perceived usefulness drives user acceptance, whereas perceived ease of use does drive the user acceptance process for web-based e-services. There is no support for any relationship between user acceptance and continued use of web-based e-services. This may mean that user acceptance of a technology does not always carry over into continued use after initial acceptance; rather, it is followed by a user-centred decision-making process. The user motivation variable demonstrates less influence on user acceptance, yet analysis of the latent variable structure reveals strong correlations within its own latent variable: items such as M1 (0.794), M3 (0.713), and M4 (0.748) load significantly high on their own latent variable, indicating that User Motivation as a construct can still be validated by effectively manipulating this construct within the model.
Table 4: Outer Model
Variable   Weight    Loading   Location   ResidVar   Communality   Redundancy
User Experience (EXP), outward
EX1        0.2907    0.7298    0.0000     0.4674     0.5326        0.0000
EX2        0.3256    0.7658    0.0000     0.4136     0.5864        0.0000
EX3        0.1237    0.4455    0.0000     0.8015     0.1985        0.0000
EX4        0.2024    0.5087    0.0000     0.7412     0.2588        0.0000
EX5        0.3340    0.7321    0.0000     0.4640     0.5360        0.0000
EX6        0.2409    0.5643    0.0000     0.6816     0.3184        0.0000
User Motivation (MOT), outward
M1         0.3592    0.7940    0.0000     0.3696     0.6304        0.1877
M2         0.2571    0.5803    0.0000     0.6633     0.3367        0.1003
M3         0.2134    0.7126    0.0000     0.4923     0.5077        0.1512
M4         0.2584    0.7477    0.0000     0.4409     0.5591        0.1665
M5         0.1601    0.5622    0.0000     0.6840     0.3160        0.0941
M6         0.1793    0.5169    0.0000     0.7329     0.2671        0.0795
M7         0.1011    0.3723    0.0000     0.8614     0.1386        0.0413
User Perceived Usefulness (PU), outward
PU1        0.2887    0.8450    0.0000     0.2860     0.7140        0.1335
PU2        0.2857    0.9022    0.0000     0.1860     0.8140        0.1522
PU3        0.2827    0.9199    0.0000     0.1538     0.8462        0.1582
PU4        0.2669    0.8925    0.0000     0.2035     0.7965        0.1489
User Perceived Ease of Use (PEOU), inward
PE1        0.1744    0.7898    0.0000     0.3763     0.6237        0.3359
PE2        0.0776    0.7392    0.0000     0.4537     0.5463        0.2943
PE3        0.1910    0.7423    0.0000     0.4490     0.5510        0.2968
PE4        0.2000    0.7704    0.0000     0.4064     0.5936        0.3197
PE5        0.5456    0.9331    0.0000     0.1293     0.8707        0.4690
User Acceptance (Accept), outward
UF1        0.5108    0.8303    0.0000     0.3106     0.6894        0.4041
UF2        0.3339    0.6468    0.0000     0.5816     0.4184        0.2453
UF6        0.4540    0.7928    0.0000     0.3714     0.6286        0.3685
User Continued Use (CU), outward
UF3        0.5895    0.6980    0.0000     0.5128     0.4872        0.0674
UF4        0.0791    0.3244    0.0000     0.8948     0.1052        0.0146
UF5        0.6276    0.7284    0.0000     0.4694     0.5306        0.0734
UF7        0.2761    0.3201    0.0000     0.8976     0.1024        0.0142
UF8       -0.2168   -0.0800    0.0000     0.9936     0.0064        0.0009
Table 6: Summary of Hypothesis Tests
Hypothesis                                     Support
H1a: User Experience → User PU                 No
H1b: User Experience → User PEOU               Yes
H2a: User PU → User Motivation                 Yes
H2b: User Motivation → User PEOU               No
H3a: User PU → User Acceptance                 No
H3b: User PEOU → User Acceptance               Yes
H4: User Acceptance → User Continued Use       No
VI. CONCLUSIONS
The evidence from the PLS analysis strongly supports the TAM construct (i.e. perceived ease of use and perceived usefulness). The consistent pattern suggests that the measurement scale has high validity and reliability. It may also be suggested that user acceptance for Web-based eServices is likely to increase if the users experience a higher perceived ease of use within the website. Such features enhance user’s ability with higher flexibility within the website.
Note: All loadings are significant at .001
The inner model explains that Perceived Ease of Use is the most important variable in the prediction of User Acceptance, contributing 40% of the R², followed by User Experience (30%), User Perceived Usefulness (21%), and User Motivation (9%) (Table 5). Such interpretation of path coefficients provides vital information on the working structure of the model; any generalisation should be interpreted with caution and should take into account non-significant path coefficients, as these can arise from multicollinearity problems [31].

Table 5: Explanatory Variables for User Acceptance
Explanatory Variable   Path Coefficient   Correlation   Percentage based on path
User Experience             0.432             0.572              30%
User Motivation             0.129             0.489               9%
User PU                     0.306             0.568              21%
User PEOU                   0.576             0.715              40%
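The "percentage based on path" column is not defined in the text; one reading that reproduces the reported figures (an observation, not a formula stated by the author) is each predictor's path coefficient taken as a share of the sum of the four coefficients. For Perceived Ease of Use, for example,

\[
\frac{0.576}{0.432 + 0.129 + 0.306 + 0.576} \;=\; \frac{0.576}{1.443} \;\approx\; 40\%,
\]

and the same calculation gives 30% for User Experience, 21% for User PU, and 9% for User Motivation.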
The analysis then suggested acceptance of three of the hypotheses and rejection of four (Table 6). The evidence reveals that perceived usefulness is not affected by user experience, whereas perceived ease of use is.
In contrast, user experience does not influence perceived usefulness, which may be explained by website characteristics being perceived differently by users. It is interesting to note the high correlations of user experience (0.572) and perceived ease of use (0.715) with user acceptance, which can further enhance user acceptance of eServices. The pattern of user motivation aligning with perceived usefulness may be related to users finding higher-level motivation when the perceived usefulness within websites is high. User acceptance of eServices may increase when users experience a higher level of motivation in conducting the task, and higher motivation may account for the predictability of revisits to the websites. This research tested the acceptance of web-based eServices in relation to factors such as user experience, motivation, usage acceptance, and continued use. The TAM factors (perceived ease of use and perceived usefulness) measured in a consistent pattern. User Experience and Motivation measured strongly alongside the TAM constructs, indicating that a pattern existed for further extension of TAM. Although perceived ease of use influences user acceptance of eServices, it has no effect on continued use, whereas perceived usefulness neither influences user acceptance nor affects continued use. This suggests that the application of TAM to Web-based eService acceptance can be enhanced by the integration of other constructs. The TAM constructs provide a structure for the E-SAM model. There is no support in the study for continued use of eServices based on users' initial acceptance. This is surprising
as there is evidence that user acceptance of technology is based on prior usage. In this context initial acceptance alone cannot be counted as a basis for continued use, and a range of other prominent factors needs to be considered.
REFERENCES [1] Accenture (2003). Innovation Delivered. www.accenture.com/webservices [2] Datamonitor (2000). The U.S. market for Internet-based customer service. Report No. DMTC0681. April 2000. www.datamonitor.com/~60a60daeaab049e191c7be489b90b 82f~/all/reports/product_summary.asp?pid=DMTC0681 [3] Khalifa, M., and Liu, V. (2003). “Satisfaction with Internet-Based Services: The Role of Expectations and Desires”. International Journal of Electronic Commerce, 7(2):31-49. [4] Kaltenmark, J (2003). Innovation Delivered. Accenture Report on Web Services. www.accenture.com/webservices [5] Davis, F. D. (1989). “Perceived usefulness, perceived ease of use, and user acceptance of information technology.” MIS Quarterly 13(2): 319-40. [6] Davis, F.D., Bagozzi, R.P., and Warshaw, P.R. (1989). “User Acceptance of computer technology: A comparison of two theoretical models.” Management Science 34(8): 9821002. [7] Heijden, H (2000). “Using the Technology Acceptance Model to predict website usage: Extensions and Empirical test. Research Memorandum 2000-25, Vrije Universiteit, Amsterdam. [8] Gefen, D., and Straub, D. (2000). “The relative importance of perceived ease of use in IS adoption: A study of e-commerce adoption.” Journal of the Association for Information Systems 1 Article 8: 1-20. [9] Venkatesh, V., and Morris, M.G. (2000). Why don’t mean ever stop to ask for directions? Gender, social influence, and their role in technology acceptance and usage behavior. MIS Quarterly, 24: 115-139. [10] Wright, K.M., and Granger, M.J. (2001). Using the web as a strategic resource: an applied classroom exercise. Proceedings of the 16th Annual Conference of the Intl. Academy for Inf. Management, New Orleans, Louisiana [11] Steer, D., Turner, P., and Spencer, S. (2000). “Issues in adapting the Technology Acceptance Model (TAM) to investigate non-workplace usage behavior on the worldwide-web. School of Inf. Systems, University of Tasmania. [12] Morris, M.G., and Turner, J.M. (2001). Assessing users’ subjective quality of experience with the World Wide Web: an exploratory examination of temporal changes in technology acceptance. Intl. Journal of Human-Computer studies. 54: 877-901. [13] Csikszentmihalyi, M. (2000). Flow: The psychology of optimal experience. New York: Harper and Row. [14] Bowman, R.F., Jr. (1982). A Pac-Man Theory of Motivation: Tactical implications for classroom instruction. Educational Technology 22(9): 14-16. [15] Csikszentmihalyi, M. and LeFevre, J. (1989). “Optimal experience in work and leisure.” Journal of Personality and Social Psychology 56(5): 815-22. [16] Miller, S. (1973). Ends, Means, and Galumphing: Some Leitmotifs of Play, American Anthropologist 75:87-98.
[17] Webster, J., Trevino, L.K., and Ryan, L. (1993). “The dimensionality and correlates of flow in human-computer interactions.” Computer in Human Behavior 9: 411-26. [18] Ruyter, K. D., Wetzels, M., and Kleijnen, M. (2001). “Customer adoption of eService: an experimental study.” Intl. Journal of Service Industry Management 12(2): 184-207 [19] Hoffman, D. L., and Novak, T.P. (1995). “Marketing in hypermedia computer-mediated environments: Conceptual foundations.” Research Program on Marketing in ComputerMediated Environments: 1-36. [20] Vallerand, R.J. (1997). Toward a hierarchical model of intrinsic and extrinsic motivation. Adv. Experiment. Soc. Psych. 29: 271-360 [21] Deci, E.L., and Ryan, R.M. (1985). Intrinsic Motivation and Self-Determination in Human Behavior, NY: Plenum Pr. [22] Moore, G., and Benbasat, I. (1991). ‘Development of an instrument to measure the perceptions of adopting an Information Technology innovation.” IS Res 2(3): 192-221. [23] Pitkow, J.E., and Recker, M.M., (1995). “Using Web as a survey tool: Results from the second WWW user survey. Computer Networks and ISDN Systems, 27(6): 809-822 [24] Corea, S. (2000). “Technological capacitation in customer service work: A sociotechnical approach. In P.DeGross and J.I.DeGroass (eds), Proceedings of the Twelfth-First ICIS. Atlanta: Association for Information Systems, 45-47. [25] Zeithaml, V.A., Parasuraman, A., and Malhotra, A. (2000). “A Conceptual framework for understanding eService Quality: Implications for future Research and managerial Practice.” Marketing Science Institute, Working paper, Report no: 00-115 [26] Sandhu, K., and Corbitt, B (2002). “Exploring an understanding of Electronic Service end-user adoption,” The International Federation for Inf. Processing, WG8.6, Sydney. [27] Karahanna, E; Straub, D.W., and Chervany, N.L. (1999). “Information Technology Adoption Across Time.” MIS Quarterly, 23(2): 183-213. [28] Chin, W.W., and Lee, M.K.O. (2000). A proposed model and measurement instrument for the formation of IS satisfaction: Proc. of the ICIS, Atlanta: AIS, 553-563. [29] Wold, H. (1985). “Partial Least Squares”, in Encyclopedia of Statistical Sciences, vol.6, Kotz, S and Johnson, N.L. (eds), John Wiley and Sons, NY, 581-591 [30] Compeau, D.R., and Higgins, C.A. (1995). “Application of social cognitive theory to training for computer skills. IS Res, 6(2): 118-143 [31] Chatelin, Y.M., Vinzi, V.E., and Tenenhaus, M. (2002). “State-of-art on PLS Path Modelling through the available software.” HEC School of Bus and Mgt, France.
Factors for E-Services System Acceptance: A Multivariate Analysis Kamaljeet Sandhu School of Computer and Information Science University of South Australia Mawson Lakes, SA 5095 Australia [email protected] Abstract- This study investigates factors that influence the acceptance and use of e-Services. The research model includes factors such as user experience, user motivation, perceived usefulness and perceived ease of use in explaining the process of e-Services acceptance, use and continued use. The two core variables of the Technology Acceptance Model (TAM), perceived usefulness and perceived ease of use, are integrated into the Electronic Services Acceptance Model (E-SAM).
I. INTRODUCTION E-Services system continuance at the individual user level is central to the survival of many electronic commerce firms, such as Internet service providers (ISPs), online retailers, online banks, online brokerages, online travel agencies, and the like [1]. When information is not readily available to individual users in online and offline environments, users engage in a search process to meet their information requirements. If users are successful in the search process, this forms a positive perception and becomes part of the user's experience. An e-Services system on a website is generally perceived as being successful, but insufficient evaluation has been conducted on how well websites meet their users' primary information requirements [2]. Accordingly, it is argued that in the pre-adoption process users evaluate the extent to which their expectations were met, and in the post-adoption process they make a decision about continuing or discontinuing use of the system [2]. An individual's intention to adopt (or continue to use) Information Technology (IT), including e-Services, is determined by two basic sets of factors: one set reflects personal interests and the other reflects social influence [3, 4, 5, and 6]. With regard to personal factors, attitudes toward adopting (or continuing to use) an IT system or innovation reflect an individual's positive and negative evaluations of it (i.e. experience, motivation, perceived usefulness, and perceived ease of use). II. THE RESEARCH MODEL - ELECTRONIC SERVICES ACCEPTANCE MODEL (E-SAM) The research model includes e-Services acceptance and continued use of e-Services as dependent variables. The dependent variable acceptance of e-Services relates to 1) the intention to navigate the e-Services system, which makes way for user interaction; 2) the intention to use the e-Services system more for work, indicating that user interaction is facilitated; and 3) the likelihood of finding information quickly, which meets user objectives. The other dependent variable is 'continued use' of the e-Services system in relation to international students' use in areas other than work, success or failure in work, task focus, screen design personalisation, and the effects of incomplete information and unfriendly features. All of the variables were measured on six-point Likert scales.
[Figure 1: Electronic Services Acceptance Model (E-SAM). The model links User E-Service Experience, User E-Service Perceived Usefulness, User E-Service Perceived Ease of Use, User E-Service Motivation, User E-Service Acceptance, and User E-Service Continued Usage through the hypothesised paths H1-H7.]
A. EASE OF USE FACTOR Ease of use (factor 1) was determined by international student responses to items about how to use the e-Services system, learning to use those services, the flexibility in using the e-Services system, the level of effort that was required, the ease of using the e-Services system, navigation of the e-Services system, movement relating to the accessibility of web pages on the e-Services system, and information on the e-Services system's web pages. B. USEFULNESS FACTOR Usefulness (factor 2) was determined from measures of e-Services system support, ability to do work quickly, work performance, work productivity, work effectiveness, ease of use, usefulness, and work usage of the e-Services system. C. EXPERIENCE FACTOR International student experience (factor 3) was determined by measures of control when interacting with the e-Services system, feelings of confusion in use, feelings of calmness when interacting with the e-Services system, feelings of frustration when faced with problems when using the system, ability to locate the e-Services system, the process of becoming skilled when interacting with the e-Services system, and knowing about e-Services system availability on the university's website. D. MOTIVATION FACTOR Motivation (factor 4) was determined from measures of student inspiration, encouragement to use e-Services based on past and present interactions, the level of expectations in using the e-Services system, and peer influence. The acceptance of e-Services (factor 5) was determined by
measures about monitoring time when using the e-Services system, money spent when using the system, and proposed benefits that positively or negatively affect acceptance of the e-Services system. E. ACCEPTANCE AND CONTINUED USAGE FACTOR The continued usage of e-Services (factor 6) was determined by measures that focused on e-Services system features that the international students used when revisiting the e-Services system. These were work accomplishment in terms of the tasks they were able to do, perception of the level of support, feelings of independence when using the e-Services system, e-Services system screen displays, the e-Services system's personalised welcome, levels of non-work usage, and concentration levels when interacting with the e-Services system. The continued use of e-Services was assumed to have developed over a period of time as students formed a perception of their successful and unsuccessful activities on the e-Services system, of the concentration needed to interact with the e-Services system, of the state of the customised screens, and of other features of the e-Services system that assisted them in their work. III. HYPOTHESES From the analysis of previous research it was hypothesized that, in the acceptance, use and continued use of the e-Services system: H1: user experience positively affects perceived usefulness, H2: user experience positively affects perceived ease of use, H3: perceived usefulness positively affects user motivation, H4: perceived ease of use positively affects user motivation. The Technology Acceptance Model (TAM) provides a useful comparison between perceived usefulness and perceived ease of use. Based on the usefulness of the e-Services system, the user will have a favorable perception towards accepting e-Services if it results in some form of improvement or increase in work productivity directly related to the use of e-Services. The useful outcomes from using e-Services are dependent on e-Services acceptance. Similarly, if the user's perception of the e-Services system is that it is easy to use, this will be reflected in the level of acceptance of the system. Therefore, it was also hypothesized that, in the acceptance, use and continued use of the e-Services system: H5: perceived usefulness positively affects user acceptance, H6: perceived ease of use positively affects user acceptance, H7: user acceptance positively affects continued use of e-Services. The continued use of e-Services is formed on the basis that the user has accepted the system as a productive tool in their work. IV. DATA ANALYSIS For the data analysis the study used a two-step estimation approach, in which the first step analyses the measurement model and the second estimates the structural relationships. The two-step estimation details the distinctive characteristics of the relationship between indicators (observed variables) and the underlying variables. After completing an estimation of the measurement model through an exploratory factor
analysis, the next step involves a path analysis (structural model) using Partial Least Squares [7]. Before conducting a factor analysis it is important to test the data for sampling adequacy and the degree of relatedness of the variables. The Kaiser-Meyer-Olkin Measure of Sampling Adequacy is a statistic which indicates the proportion of variance in the variables that is common variance, i.e. variance which might be caused by underlying factors (SPSS 2003). Table 1 shows that the empirical value for the Kaiser-Meyer-Olkin Measure of Sampling Adequacy is 0.915, which indicates that a factor analysis will be useful with the data (SPSS 2003). Bartlett's test of sphericity indicates whether the correlation matrix is an identity matrix, which would mean that the variables are unrelated (SPSS 2003). The significance value for Bartlett's test of sphericity is 0.000; very small values (less than .05) indicate that there are probably significant relationships among the variables (SPSS 2003). The next step was to conduct a factor analysis using SPSS (2003).
Table 1: Kaiser-Meyer-Olkin Measure and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy: 0.915
Bartlett's Test of Sphericity: Approx. Chi-Square 5515.282; df 703; Sig. .000
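As an illustration only, the two adequacy checks reported in Table 1 can be reproduced with Python's factor_analyzer package; the data file and item layout below are assumptions, not the study's actual materials.

```python
# Sketch only: KMO and Bartlett tests on the survey items; the CSV file name and
# item layout (one column per questionnaire item, one row per respondent) are assumed.
import pandas as pd
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

items = pd.read_csv("eservices_survey_items.csv")   # hypothetical data file

# Bartlett's test of sphericity: is the correlation matrix an identity matrix?
chi_square, p_value = calculate_bartlett_sphericity(items)

# Kaiser-Meyer-Olkin measure: proportion of variance that may be common variance.
kmo_per_item, kmo_overall = calculate_kmo(items)

print(f"Bartlett chi-square = {chi_square:.3f}, p = {p_value:.4f}")
print(f"Overall KMO = {kmo_overall:.3f}")   # ~0.9 or above supports factor analysis
```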
V. FACTOR ANALYSIS An exploratory factor analysis was conducted with Varimax rotation to test the construct validity of the survey questions. Rummel [8] suggests that a hypothesis regarding dimensions of user attitude can be measured using factor analysis, because the meaning associated with 'dimension' is that of a cluster or group of highly inter-correlated characteristics or behaviours, and factor analysis may be used to test for their empirical existence. The author further suggests that those characteristics or behaviours which should, in theory, be related to a particular dimension can be postulated in advance, and statistical tests of significance can then be applied to the factor analysis results. Factor analysis is therefore also used in this study to test for patterns within the variables. The testing of the hypotheses is intended to demonstrate further validation of the instrumentation, as discussed earlier. If the variables perform as predicted by theory, then we can infer that the measurement of the variables is nomologically valid [9]. For scale assessment, a combination of confirmatory factor analysis and reliability analysis is used. Confirmatory factor analysis is used to assess the validity of the variables considered in this research. Follow-up reliability analysis is used to further assess the stability of the scales used. To test the reliability of the measures, Cronbach's alpha [10] values for the multiple items were calculated. All variables (factors) that emerged from the factor analysis showed high Cronbach's alpha values (except for continued use), establishing the reliability of the instrument. Perceived usefulness had a Cronbach's alpha of .908, perceived ease of use .899, motivation .731, experience .7213, e-Services acceptance .735, and e-Services
continued usage .519. This analysis helps reinforce the validity and reliability of the scales used in this study. According to Nunnally [11], a reliability value of 0.8 or greater is desired, but 0.7 or greater is acceptable. On this criterion the reliability of all the measures except continued usage is acceptable. Two sets of factor analysis were conducted (Factor Analysis 1 and 2). The first set included only the TAM variables of usefulness and ease of use (Tables 2 and 3). The purpose of first conducting a factor analysis with TAM variables was to analyse how the TAM items in a factor perform. The second factor analysis includes other variables, namely international students' experience, their motivation, e-Services acceptance, and continued usage of e-Services, as well as the TAM variables of usefulness and ease of use (Tables 4 and 5). The comparison of the two sets of factor analysis provides information about the relationship of the TAM variables to the other variables introduced in this study and enables examination of the performance of the TAM variables that were adopted into the Electronic Services Acceptance Model (E-SAM). VI. FACTOR ANALYSIS 1 Table 2 shows the variance explained for the TAM model factors, perceived ease of use and perceived usefulness. Only two factors have been extracted with eigenvalues above 1; in total the two factors account for 60.351% of the variance in the items and therefore appear to be a good representation of the original data set. Factor 1 (perceived ease of use) explains 30% of the variance and factor 2 (perceived usefulness) also explains 30% of the variance.
Table 2: Total Variance Explained (Extraction Method: Principal Component Analysis)
Factor 1: Initial eigenvalues - Total 7.390, % of Var 46.187, Cumul % 46.187; Extraction sums of squared loadings - Total 7.390, % of Var 46.187, Cumul % 46.187; Rotation sums of squared loadings - Total 4.845, % of Var 30.279, Cumul % 30.279
Factor 2: Initial eigenvalues - Total 2.266, % of Var 14.164, Cumul % 60.351; Extraction sums of squared loadings - Total 2.266, % of Var 14.164, Cumul % 60.351; Rotation sums of squared loadings - Total 4.812, % of Var 30.073, Cumul % 60.351
Factor 3: Initial eigenvalues - Total .744, % of Var 4.653, Cumul % 65.004
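For illustration, the extraction and rotation behind Tables 2 and 3 can be sketched as follows; the DataFrame of the sixteen TAM items and its file name are hypothetical, not the study's actual data.

```python
# Sketch only: principal-component extraction with Varimax rotation on the TAM
# items, mirroring the procedure behind Tables 2 and 3. The file name and item
# columns are assumed.
import pandas as pd
from factor_analyzer import FactorAnalyzer

tam_items = pd.read_csv("tam_items.csv")   # hypothetical: 16 ease-of-use/usefulness items

fa = FactorAnalyzer(n_factors=2, rotation="varimax", method="principal")
fa.fit(tam_items)

eigenvalues, _ = fa.get_eigenvalues()                      # initial eigenvalues
ss_loadings, prop_var, cum_var = fa.get_factor_variance()  # rotation sums of squared loadings

loadings = pd.DataFrame(
    fa.loadings_,
    index=tam_items.columns,
    columns=["Factor 1 (ease of use)", "Factor 2 (usefulness)"],
)
print(loadings.round(3))   # compare with the rotated component matrix in Table 3
print(cum_var.round(3))    # the two factors should account for roughly 60% of the variance
```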
The factor loadings for each item measuring these two factors, ease of use and usefulness, are shown in Table 3. To test the construct validity of ease of use and usefulness, factor loadings were assessed; loadings above .50 are shown. A high loading value indicates that a strong relationship exists between that item and either ease of use or usefulness. Factor 1 relates to ease of use and Factor 2 to usefulness. A. EASE OF USE FACTOR The first factor consists of items that measured: understanding how to use the system, learning to use the e-Services system, the flexibility in using the e-Services system, the level of effort required to use the e-Services system, the ease of using the e-Services system, navigation of the e-Services system on the web, movement relating to the accessibility of web pages on the e-Services system, and finding information on the e-Services system. The result of the analysis shows that most items load highly on their associated factors.
B. USEFULNESS FACTOR The second factor consists of items that measured: the level of support on the e-Services system, impact on working quicker, effect on work in terms of performance, productivity, and effectiveness, usage of the system, ease of use of the system, and usefulness in work. Several items in the 'usefulness' factor had loadings above 0.8. The items measuring students' work performance (.849), work productivity (.841), and work effectiveness (.842) emerged with the highest loadings. The results reveal that these variables are valid and strongly influence the usefulness factor. The highest loading items in the ease of use variable are the ease of use of the e-Services system (.818), learning to use the e-Services system (.779), and understanding how to use the system (.772). These measures are valid and have a strong influence on ease of use (factor 1, Table 3). These items in the ease of use factor are important and form a reliable scale for this variable. For both factors the composite reliability coefficient values exceeded 0.7 and are thus considered valid according to Nunnally [11]. Thus, the measures of these variables are reliable.
Table 3: Rotated Component Matrix for the student user group (loadings on Factor 1, ease of use, and Factor 2, usefulness)
Ease of use items: Understandability .772 / .215; Learning .779 / .148; Flexible .716 / .217; Effort .750 / .250; Easy to use .818 / .249; Navigation .769 / .259; Movement .711 / .095; Find .580 / .275
Usefulness items: e-Services system support .065 / .639; Quicker .330 / .728; Performance .185 / .849; Productivity .234 / .841; Effectiveness .179 / .842; Easier .290 / .751; Useful .249 / .679; Work usage .225 / .593
Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. Rotation converged in 3 iterations.
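The Cronbach's alpha values used above to judge scale reliability follow directly from the item data; a minimal sketch, assuming the items for one scale sit in a pandas DataFrame with the hypothetical column names shown, is given below.

```python
# Sketch only: Cronbach's alpha computed from its definition as a check on the
# internal consistency of a scale. The data file and column names are hypothetical.
import pandas as pd

def cronbach_alpha(scale: pd.DataFrame) -> float:
    k = scale.shape[1]                              # number of items in the scale
    item_vars = scale.var(axis=0, ddof=1)           # variance of each item
    total_var = scale.sum(axis=1).var(ddof=1)       # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

items = pd.read_csv("eservices_survey_items.csv")   # hypothetical data file
usefulness_cols = ["support", "quicker", "performance", "productivity",
                   "effectiveness", "easier", "useful", "work_usage"]
print(round(cronbach_alpha(items[usefulness_cols]), 3))   # the paper reports .908
```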
The results from the factor analysis (Table 3) support the positioning of the TAM variables (i.e. perceived ease of use and perceived usefulness) in the Electronic Services Acceptance Model (E-SAM). The consistent data pattern suggests that the measurement scale has high validity and reliability. VII. FACTOR ANALYSIS 2 A second factor analysis was then undertaken which included the other factors, namely student experience and motivation, acceptance and continued usage of the e-Services system, along with the TAM factors, perceived usefulness and ease of use. Table 4 displays the loadings in a matrix for the international student user group. Factor 1 represents ease of use, factor 2 usefulness, factor 3 student experience, factor 4 student motivation, factor 5 acceptance
of e-Services, and factors 6, 7, 8, and 9 relate to continued usage of the e-Services system. Factorial validity is concerned with whether ease of use and usefulness form distinct variables [16]. Table 4 shows the factors, loadings, and resultant scale reliabilities for ease of use, usefulness, experience, motivation, acceptance and continued use of the e-Services system. As expected, the ease of use and usefulness factors measured strongly in this study. A. EASE OF USE FACTOR The ease of use factor tests the students' abilities in learning how to use the e-Services system. Learning to use the e-Services system is a reasonable starting point for student user interaction with the e-Services system to begin. Similar findings about learning to use a system have been reported by [12]. This study arrived at similar findings: users perceive the e-Services system to provide some form of understanding about how to use it, which may form a basic criterion determining user interaction with the e-Services system. This finding also consolidates previous research findings [e.g. 13, 14, and 15] that users in a task environment associate it with a greater willingness to learn how to use those systems. B. USEFULNESS FACTOR Factor 2 measured the usefulness of e-Services. The most influential item in this factor is the student user's work effectiveness. It is reasonable to argue that the usefulness of a system, amongst other things, may be determined on the basis of how effective an e-Services system is in the user's work. This item was similarly recorded as the highest measuring item in Davis's study [16] when he first theorized TAM, and it is a fundamental element of the usefulness of a system. The finding on user work effectiveness in the perceived usefulness factor is equally validated in this study. This study has also demonstrated the robustness of this item and its application and validation for the e-Services system, and has thereby tapped into an important aspect of the usefulness of the e-Services system. C. EXPERIENCE FACTOR Factor 3 reports on the student users' experience in using the e-Services system. The influential item captured in this factor is the user's feelings of frustration. Feelings of frustration in using the system scored among the top three items in Davis' study [16] and occupies first place in this study. This item is considered an important aspect of users expressing their experience with the e-Services system. However, this item was measured in Davis' study [16] as a perceived ease of use item and not as user experience, as has been adopted in this study. Nevertheless, the ranking of this item in first place in the formation of the user experience factor is considered influential in e-Services system use. The two items having negative values in factor 3 of user experience are users feeling confused when using the e-Services system and users feeling frustrated; these are shown in Table 4. These two items were rotated in SPSS to attain a positive valuation in order to be included in the experience factor. One other
interesting finding that emerged in the data is that the item 'users feeling confused using the system' ranked seventh in the Davis study [16], but ranked second among the items in this study. The scores and rankings of these two items, user frustration and user confusion in using the e-Services system, improved when adopted in the user experience factor rather than in the perceived ease of use factor. D. MOTIVATION FACTOR Factor 4 reports on student motivation to use the e-Services system. Previous research in technology adoption by Venkatesh [15] suggests that user motivation includes the desire to perform an activity based on perceptions of past and present efforts. The item 'user encouragement to use e-Services based on past and present interactions' is the most influential item in the motivation factor of this study. Bandura [17] suggests two types of expectations, efficacy and outcome, that best determine user behaviour. An outcome expectancy is defined as a "person's estimate that a given behaviour will lead to certain outcomes" [17]. Users' understanding of interaction with the e-Services system can therefore largely be based on the assumption that users are encouraged to use the system and are driven by the outcomes of those perceptions, which are formed from the past and present outcomes the user has had in using the system. Koufaris [9] and Heijden [18] found that perceived enjoyment of websites, which is associated with user motivation, is influential in driving continued use of, and visits to, those websites. E. ACCEPTANCE OF E-SERVICES FACTOR Factor 5 reports on acceptance of the e-Services system. The influential item in this factor is users monitoring time. This item was dropped in Davis' study [16] after it was placed eleventh in the item list for perceived usefulness. The wording of this item was improved in this research to 'users monitoring time' from the Davis study [16] phrasing 'saves me time'. The item was adopted here in the user acceptance factor, whereas Davis' study [16] treated it as a perceived usefulness item. The placement of this item at the first rank of the user acceptance factor also demonstrates that it performs well when adopted in user acceptance of e-Services and better explains user perceptions towards e-Services system use. F. CONTINUED E-SERVICES USAGE FACTOR Factors 6, 7, 8 and 9 report on continued e-Services usage. The item 'users inadequate support' measured as the most influential item in continued use of the e-Services system. This item performs well when adopted in the continued use factor, whereas in Davis' study [16] it was ranked twelfth on the item list and was not retained in the perceived ease of use factor; although it performs well for continued use of the e-Services system in this study, it was subsequently dropped from perceived ease of use in the Davis study [16]. The wording of this item was improved to 'users feeling inadequate support on the e-Services system', which in Davis' study [16] was phrased as 'provides guidance'. This item, after scoring number one on the item list, also
demonstrates that it best fits in the continued use factor in this study.
Table 4: Rotated Component Matrix for the student user group (Extraction Method: Principal Component Analysis; Rotation Method: Varimax with Kaiser Normalization; rotation converged in 9 iterations). Each item's loading on its own factor is shown; cross-loadings on the other factors are omitted here.
Factor 1 (Ease of use): Understandability .686; Learning .770; Flexible .672; Effort .766; Easy to use .713; Navigation .704; Movement .627; Find .635
Factor 2 (Usefulness): Website support .536; Quicker .721; Performance .826; Productivity .840; Effectiveness .842; Easier .742; Useful .704; Work usage .609
Factor 3 (Experience): Control .531; Confused -.730; Calm .546; Frustrated -.731; Skilled .590; Know .583
Factor 4 (Motivation): Inspiration .523; Encouragement .693; Expectations .689; Peers influence .650
Factor 5 (Acceptance): Monitor time .864; Dollar spending .787; Proposed benefits .688
Factors 6-9 (Continued usage): Website features .683; Accomplishment .759; Service perception .731; Inadequate support .773; Independent .653; Screens .748; Welcome .731; Non-work usage .609; Concentration .611 (each item's highest loading falls on one of factors 6-9)
In order to retain an item, it must have a significant factor loading. Hair et al. [19] suggest that a factor loading of 0.3 should be considered significant, while a factor loading of 0.4 can be considered more significant; a factor loading that exceeds 0.5 is considered very significant for convergent validity [19]. The items that had a loading value below 0.5 were e-Services system enjoyment (0.466), confidence to use the e-Services system (0.442), improved self-service to use the e-Services system (0.383), finding support on the e-Services (0.425), motivation by improved work performance (0.413), training to use the e-Services system (0.383), and new features of the e-Services system (0.463). These items were dropped to improve the factor scores; all the remaining items achieved convergent validity. The result of the new factor analysis is shown in Table 4. Eigenvalues above the threshold level (1.0) were used as the criterion for the factor analysis, and the factor analysis now explains 63.315% of the variance among the items. As shown in Table 5, the independent variables converged well into four factors: perceived ease of use (factor 1) explains 14% of the variance, perceived usefulness (factor 2) explains 14% of the variance, user experience (factor 3) explains 8% of the variance, and user motivation (factor 4) explains 5% of the variance, verifying that each item measures one of the four variables representing that specific factor.
Table 5: Total Variance Explained
Factor 1 (Ease of use): Initial eigenvalues - Total 10.64, % of Var 28.78, Cumul % 28.780; Extraction sums of squared loadings - Total 10.64, % of Var 28.78, Cumul % 28.780; Rotation sums of squared loadings - Total 5.256, % of Var 14.206, Cumul % 14.206
Factor 2 (Usefulness): Initial - 3.255, 8.798, 37.578; Extraction - 3.255, 8.798, 37.578; Rotation - 5.248, 14.185, 28.390
Factor 3 (Experience): Initial - 1.876, 5.071, 42.649; Extraction - 1.876, 5.071, 42.649; Rotation - 3.052, 8.249, 36.639
Factor 4 (Motivation): Initial - 1.720, 4.648, 47.297; Extraction - 1.720, 4.648, 47.297; Rotation - 2.135, 5.770, 42.410
Factor 5 (Acceptance): Initial - 1.430, 3.865, 51.162; Extraction - 1.430, 3.865, 51.162; Rotation - 2.069, 5.591, 48.000
Factor 6 (Continued use): Initial - 1.303, 3.521, 54.684; Extraction - 1.303, 3.521, 54.684; Rotation - 1.558, 4.210, 52.210
Factor 7 (Continued use): Initial - 1.139, 3.079, 57.763; Extraction - 1.139, 3.079, 57.763; Rotation - 1.468, 3.969, 56.179
Factor 8 (Continued use): Initial - 1.044, 2.820, 60.583; Extraction - 1.044, 2.820, 60.583; Rotation - 1.414, 3.822, 60.001
Factor 9 (Continued use): Initial - 1.011, 2.732, 63.315; Extraction - 1.011, 2.732, 63.315; Rotation - 1.226, 3.314, 63.315
Factor 10: Initial - .961, 2.598, 65.913
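As a sketch of the item-screening step described just before Table 5 (dropping items whose strongest loading falls below 0.5 and refitting with the eigenvalue-greater-than-one criterion), one possible implementation is shown below; the DataFrame name is hypothetical and this is not the study's actual code.

```python
# Sketch only: drop items whose highest loading is below the 0.5 convergent-validity
# threshold, then refit with the number of factors given by the eigenvalue > 1 rule.
# `all_items` is a hypothetical DataFrame containing every survey item.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

def refit_dropping_weak_items(all_items: pd.DataFrame, threshold: float = 0.5):
    # First pass: how many factors have eigenvalues above 1?
    fa = FactorAnalyzer(rotation="varimax", method="principal")
    fa.fit(all_items)
    eigenvalues, _ = fa.get_eigenvalues()
    n_factors = int((eigenvalues > 1.0).sum())

    # Second pass: fit with that many factors and find each item's strongest loading.
    fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax", method="principal")
    fa.fit(all_items)
    strongest = np.abs(fa.loadings_).max(axis=1)
    kept = all_items.columns[strongest >= threshold]

    # Final pass on the retained items only.
    fa_final = FactorAnalyzer(n_factors=n_factors, rotation="varimax", method="principal")
    fa_final.fit(all_items[kept])
    return list(kept), fa_final

# kept_items, fa_final = refit_dropping_weak_items(all_items)
```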
Similarly, the factor analysis of the dependent variables converged well into five factors: user acceptance (factor 5) explains 5% of the variance, while continued use, which consists of factors 6, 7, 8 and 9, explains only around 3-4% of the variance per factor, as shown in Table 5. The data show that the independent variables perceived ease of use, perceived usefulness, and user experience have a greater impact on the factor scores affecting the dependent variables (user acceptance and use of e-Services), and that user motivation and continued use have the least impact on user acceptance and use of the e-Services system. VIII. SUMMARY The evaluation of the factors is reported in detail above; however, six key conclusions have emerged. The variables perceived ease of use and perceived usefulness are significant in the E-SAM model in explaining those factors which influence the acceptance, use and continued use of the e-Services system by international students. The data show that the greatest impact on user acceptance, use and continued use of the e-Services system comes from perceived ease of use, perceived usefulness, and user experience, and that perceived usefulness is more significant than perceived ease of use in this study. Student experience and motivation to use e-Services are also significant factors in terms of influencing the acceptance, use, and continued use of e-Services. This also suggests that perceived usefulness is an important guiding factor for user acceptance of the e-Services system. Student evaluation of e-Services acceptance is based on the time they spent doing work on the e-Services system, the money (dollars) spent in using e-Services, and the proposed benefits of using e-Services in students' work. Finally, it can also be concluded that continued use of e-Services is primarily influenced by e-Services system features and by work accomplishment that assists users in continuously using
them in their work, with users determining their importance based on meeting their work requirements. Because the validity and reliability of the measures are within acceptable levels, the next step was to perform a path analysis using PLS [7] to test the formulated research hypotheses and evaluate some degree of causality. Factor analysis is not explanatory; it only gives an indication of the factors. The factor analysis shows the validity of the factors that have been measured on the scale, but it tells little about the working structure of the E-SAM model. To test the factors in a model, PLS was therefore used to determine how the factors derived from the factor analysis (user experience, user motivation, perceived ease of use, perceived usefulness, user acceptance, and continued use of e-Services) behave in the E-SAM model, and to establish the influence of user experience, user motivation, perceived ease of use and perceived usefulness on user acceptance, use and continued use of e-Services. PLS provides the fundamental basis for understanding the relationships of the factors in the model and for distinguishing important factors from unimportant ones. The evidence from the PLS analysis supports the E-SAM model variables of user experience, perceived ease of use, and user acceptance, and shows that they are important in explaining acceptance, use and continued use of the e-Services system. The consistent pattern suggests that the measurement scale has high validity and reliability. In addition, student acceptance of the e-Services system is likely to increase if students perceive significant ease of use within the e-Services system. Such features enhance the ability of students, endowing them with higher flexibility within the e-Services system. In contrast, experience does not appear to influence perceived usefulness. This is perhaps explained by e-Services system characteristics being perceived differently from a student's perspective. It is proposed that student experience affects the usage pattern for the e-Services system; better control features, self-service ability and good support within the e-Services system are likely to increase usage frequency. It is interesting to note that a high correlation exists between student experience (0.572) and perceived ease of use (0.715), which can further enhance student acceptance of e-Services. Table 6 shows a summary of the hypotheses tested in this study. Student experience does not affect perceived usefulness of e-Services, whereas it does affect perceived ease of use. Perceived usefulness affects student motivation but does not affect acceptance of e-Services, which is rather surprising, since there is also no correlation between motivation and acceptance of e-Services.
Table 6: Summary of hypothesis tests
H1: Student experience positively affects perceived usefulness of e-Services - Not supported
H2: Student experience positively affects perceived ease of use of e-Services - Supported
H3: Perceived usefulness positively affects student motivation to use e-Services - Supported
H4: Perceived ease of use positively affects student motivation to use e-Services - Not supported
H5: Perceived usefulness positively affects student acceptance of e-Services - Not supported
H6: Perceived ease of use positively affects student acceptance of e-Services - Supported
H7: Student acceptance positively affects continued use of e-Services - Not supported
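The structural paths summarised in Table 6 were estimated with PLS; purely as a simplified, illustrative stand-in (not the PLS algorithm itself), the hypothesised paths H1-H7 can be approximated with ordinary least-squares regressions on standardised factor scores. The variable names below are hypothetical.

```python
# Sketch only: approximate the E-SAM structural paths (H1-H7) with OLS on
# standardised factor scores. This is a rough stand-in for the PLS estimation
# reported in the paper; `scores` and its column names are hypothetical.
import pandas as pd
import statsmodels.api as sm

structural_paths = {
    "usefulness":    ["experience"],                 # H1
    "ease_of_use":   ["experience"],                 # H2
    "motivation":    ["usefulness", "ease_of_use"],  # H3, H4
    "acceptance":    ["usefulness", "ease_of_use"],  # H5, H6
    "continued_use": ["acceptance"],                 # H7
}

def estimate_paths(scores: pd.DataFrame) -> dict:
    results = {}
    for outcome, predictors in structural_paths.items():
        model = sm.OLS(scores[outcome], sm.add_constant(scores[predictors])).fit()
        for p in predictors:
            results[f"{p} -> {outcome}"] = (model.params[p], model.pvalues[p])
    return results

# scores = pd.read_csv("esam_factor_scores.csv")   # hypothetical factor scores
# for path_name, (beta, p) in estimate_paths(scores).items():
#     print(f"{path_name}: beta = {beta:.3f}, p = {p:.3f}")
```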
There is no obvious link in this data between perceived usefulness, motivation, and acceptance; rather, the link exists only between perceived usefulness and motivation. It is more surprising to note that perceived ease of use affects acceptance of e-Services but does not affect student motivation. There is no evidence to support the notion that e-Services acceptance will lead to continued use of e-Services.
REFERENCES
[1] Bhattacherjee, A. (2001). "Understanding Information Systems continuance." MIS Quarterly 25(3): 351-370.
[2] D'Ambra, J., and Rice, R.E. (2001). "Emerging factors in user evaluation of the world wide web." Inf. and Mgt 38: 373-84.
[3] Karahanna, E., Straub, D.W., and Chervany, N.L. (1999). "Information Technology Adoption Across Time: A Cross-Sectional Comparison of Pre-Adoption and Post-Adoption Beliefs." MIS Quarterly 23(2): 183-213.
[4] Khalifa, M., and Liu, V. (2003). "Satisfaction with Internet-Based Services: The Role of Expectations and Desires." International Journal of Electronic Commerce 7(2): 31-49.
[5] Xue, M., Harker, P.T., and Heim, G.R. (2004). "Incorporating the dual customer roles in e-Service design." Working paper series, The Wharton Financial Institutions Center.
[6] Hsu, M., and Chu, C. (2004). "Predicting electronic service continuance with a decomposed theory of planned behaviour." Behaviour and Information Technology 23(5): 359-373.
[7] Chin, W. W. (1998). "The partial least squares approach for structural equation modelling." Lawrence Erlbaum Associates.
[8] Rummel, R. J. (1970). "Applied factor analysis." Evanston, ILL: Northwestern University Press.
[9] Koufaris, M. (2002). "Applying the Technology Acceptance Model and Flow Theory to Online Consumer Behaviour." Info. Systems Research 13(2): 205-223.
[10] Cronbach, L. J. (1971). "Test validation. Education Measurement." American Council on Education, Washington D.C.
[11] Nunnally, J. C. (1978). "Psychometric theory." McGraw-Hill.
[12] Dillon, A., and Gabbard, R. (1998). "Hypermedia as an Educational Technology: A Review of the Quantitative Research Literature on Learner Comprehension, Control, and Style." Review of Educational Research 68(3): 322-349.
[13] Csikszentmihalyi, M. (1975). "Beyond boredom and anxiety." Jossey-Bass, San Francisco.
[14] Venkatesh, V., and Speier, C. (2000). "Computer technology training in the workplace: A longitudinal investigation of the effect of mood." Organisational Behaviour and Human Decision Processes, forthcoming.
[15] Venkatesh, V. (1999). "Creation of favourable user perceptions: Exploring the role of intrinsic motivation." MIS Quarterly 23(2): 239-60.
[16] Davis, F. D. (1989). "Perceived usefulness, perceived ease of use, and user acceptance of information technology." MIS Quarterly 13(2): 319-40.
[17] Bandura, A. (1986). "Social foundations of thought and action: A social cognitive theory." Prentice-Hall, Englewood Cliffs, NJ.
[18] Heijden, H. (2000). "Using the Technology Acceptance Model to predict Website Usage: Extension and empirical test." Research Memorandum, Vrije Universiteit Amsterdam.
[19] Hair, J. F., Anderson, R.E., Tatham, R.L., and Black, W.C. (1995). "Multivariate data analysis with readings." Prentice Hall, Englewood Cliffs, NJ.
[20] Gelderman, M. (1998). "The relation between user satisfaction, usage of information systems and performance." Information and Management 34: 11-18.
[21] Bianchini, D., De Antonellis, V., Pernici, B., and Plebani, P. (2006). "Ontology-based methodology for e-service discovery." Information Systems 31: 361-380.
A Qualitative Approach to E-Services System Development Kamaljeet Sandhu School of Computer and Information Science University of South Australia Mawson Lakes, SA 5095 Australia [email protected] Abstract- This paper reports a case study of user activities in electronic services systems development. The findings of this study explain the acceptance of an e-services system that is being implemented for processing students' admission applications on a university website. The users' interface with the systems development provides useful information on the characteristics of e-services. The ease of use construct in doing the task is an important feature of e-services; the other attributes are the user control characteristics of an e-services system.
I. INTRODUCTION
The paper reports qualitative research conducted at the international student recruitment department of an Australian university. The department has introduced an electronic service system to process student applications. The users of the system are the university staff, who process the applications directly on the university website, and students, who apply for admission. The service process can be completed on the website. Currently the staff are resisting the adoption of e-services because the system does not offer adequate control over task management. The study examines the user-based control factors in e-service systems development arising from the end-user context. Effective user interface design is critical in the End-User Information System (EUIS) environment [1, 2, 3, and 4]. The level of control a user has in conducting a web-based electronic service task affects the user's experience and subsequent usage. Control is a construct that reflects situational enablers of, or constraints on, behavior [5 and 6]. Venkatesh [7] argues that control relates to an individual's perception of the availability of the knowledge, resources, and opportunities required to perform a specific behavior. Similarly, end-user control in a web-based e-service systems environment may be defined in terms of the degree of effort the user has to exert to complete a task [3 and 4]. In order to have a higher degree of control, users expect the system to be flexible enough to meet the task requirements. In this environment web electronic service software is merely a tool to get business tasks done; accomplishing the purpose of the task is not an end in itself, but it defines the users' parameters in doing the task. The use of web-based electronic systems software should be transparent to users, and it should not distract users' attention from the business task [1]. Working with EUIS software should feel natural and not require special concentration [1]. Unfortunately, the significance of this point is easily lost among IS developers when developing systems [3 and 4]. Technical experts need to gather all
information about the user interfaces and embed it in the software tools for the best results [3 and 4]; "how much" and "to what extent" remain open to debate. Effective user interface design focuses attention on two basic human factors principles: (1) to learn a software program, users develop a conceptual model of the interface; and (2) a software program should allow users to control the dialogue. II. CONCEPTUAL FRAMEWORK
The discussion is based on theoretical concepts for analyzing the emerging evidence and its implications for web-based electronic service systems development from a user control-centered adoption perspective. Rogers (1983) argues that attributes that have an indirect effect on innovation adoption may also play an important role. User control attributes may influence the user's adoption decision for web-based e-services. The Technology Acceptance Model (TAM) of Davis [8 and 10] represents an understanding of IS usage and acceptance behavior [9]. TAM has been applied in a variety of end-user studies of the world-wide-web [11, 12, 13, 14, and 15]. Applying the TAM model to investigate end-user control may provide insight into what makes a web-based e-service useful and user friendly. TAM focuses on two such beliefs, perceived usefulness (PU) and perceived ease of use (PEOU), and it may be suggested that integrating an end-user control factor into TAM may influence users' acceptance of web-based e-services [7 and 16, see Fig. 1].
[Figure 1: User Task Control (Sandhu and Corbitt, 2002). The figure groups the factors of user task control in web-based e-services systems: provide higher control on information; reduce multiple logins and increase the time-out period; remove multiple screens to one; provide a higher level of task control; provide user self-service; less reliance on paper documents; remove task duplicity; test users' task competency; arrange tasks in a sequence; install intelligence in the system to detect task errors; inform users of new task tools; regular task training for users; add compulsory fields for documents; avoid missing information; remove manual entry of task data; provide user manuals, FAQs and online help; flexibility in editing documents; improve screen layout and provide user flexibility to change screen features.]
In the TAM study [8] the "control over work" factor was retained; although it ranked fairly low, it was argued to be an important aspect of usefulness. In other studies [7, 17, 5, and 6] control is treated as a perceptual construct in understanding user behavior. The control characteristics (see Figure 1) in systems development have gained the attention of researchers and practitioners alike. Emerging evidence subjectively points towards the end-user control factor in defining the current problem within the web-based electronic service systems framework. The users were experiencing a considerable degree of constraint in managing the electronic task. These were the early perceptions that were gradually shaping user attitudes towards the new web-based system [18]. Attitude towards using a technology is omitted in TAM [7], albeit the users' early attitude towards controlling the system (and not vice versa) played a pivotal role in its adoption and, consequently, its diminishing usage. Research with the model found that the users experienced working within fixed standards, and that personalizing task features on an individual basis was not available. There is a gap between users personalizing the task (control of which rested in the system's features and not with the users) and the scope of the electronic service assisting the user with the task. The users' early perceptions were therefore that the new system was without ease of use. Senior management and the systems development team, when implementing the system, worked on the concept of standardizing the web-based electronic service output. This developed into end-user resistance to adopting the web-based electronic service within the set parameters. The end-users found that the control factor resided within the system and not with themselves (i.e. the end-users), and this made the electronic task difficult to perform. The end-users based their expectations on past experience, when task control rested with the user and not the system. To some extent the users felt isolated from the system, and the disparity was believed to be growing. Regan and O'Connor [1] suggest several techniques for developing and reinforcing the users' conceptual model of the interface in systems development. These include using metaphors, avoiding modes, ensuring consistency, making the interface user driven, and making the interface transparent. Providing users with consistency in interacting with metaphors that convey the same meaning enhances user control over the electronic service task. In this paper we focus on easing the user task interface in the web-based environment (see Figure 1). III. RESEARCH METHOD The problem facing the international admission department is that the staff are not adopting the web electronic service for processing student admission applications; instead they are continuing with the old system, a traditional paper-based service. Printing documents, storing them in folders, and processing correspondence with students through traditional mail were central to the workflow system.
Reliance on the paper-based service tended to duplicate and increase the task load, leading to errors and confusion. As a result, the department lagged behind in providing good service to its clients (students), resulting in a considerable backlog. The department introduced the web electronic service to catch up with the increasing backlog and to improve its service to students. The staff accounted for their resistance to adopting the web electronic service on grounds such as: it added additional load to their current task, there was a lack of confidence in the web electronic service, there was fear of providing wrong information on the web, and there was a resistance to seeking help when required. This case study examined evidence from multiple sources: documentation, open-ended interviews, and participant observation [19]. The discussions and interviews were open-ended; the researchers provided the topic at the beginning, and the respondents were probed for their opinions about the events. The respondents were asked to explore their understanding of exercising control in the web-based e-service system. Each interview was taped and subsequently transcribed for analysis. A reasonable approach was taken to verify the responses against information from other sources. The respondents were encouraged to provide their own insight into the problem, and this was later converged with responses from other respondents and sources. The researchers avoided following only a fixed sequence of questions, as this would have limited the scope of the study and might not have provided important and rich information. IV. CASE ANALYSIS
The users were asked about their perception of task control in web-based e-services. The degree of control related to what the users were able to do and what they could not, and the amount of effort exerted in doing a similar task offline. There was a strong tendency among users to draw comparisons when the web-based e-service system did not meet their expectations. The following statements from a number of participants support this. "We need more control…we can get report, but that's only in numbers, whereas the other databases has more information providing us with more control over information…this makes the work easier…" (Participant A) Information that is important to the user, specific in nature and delivered in a meaningful form makes sense and was considered to be useful and easily remembered; anything else brings complexity to the user task. The importance of this point was expressed by one of the participants: "Due to time out period that disconnect from the system, the user has to reenter all the information once again…this creates duplicity of information for us...as the same user is reapplying again and it is hard to differentiate between the same application..." (Participant B)
The users experienced that the ease of use of the new web-based system offered little control in terms of managing the information flow; they demanded complete control and flexibility over information in doing the task, and anything short of that resulted in an attitude of resistance towards further usage. Similarly, Rogers [20] claims that adoption is a function of a variety of factors, including the relative advantage and ease of use of the innovations [21]. The relative advantage in controlling the information was essential for users. Regan and O'Connor [1] suggest that there should be consistency when users interface with the system, reinforcing the users' conceptual model of the software. Consistency in a web-based task environment relates to the screen layout for different tasks, to the personalizing of screen features, and to an understanding of users' interaction with different features on the screen; it thus relates to offering control in the task that was necessary but was either unknown to the user or missing, so that the control factor had little or no prominence in the systems development [3 and 4]. Giving the user control of the web-based electronic system necessitated a web-based electronic platform that would provide the user with control leverage in accomplishing the task. Making the interface user-driven is considered necessary to support the user's work environment and goals [2]. A task analysis could reveal not only what users want to do and how, but also factors such as the amount of variability in processing sequences, exception conditions, problems, and interfaces with other systems or manual procedures. It was believed that such a feature, if available, would have added to the ease of doing the task and offered usefulness [8 and 10]. General models of employee behavior suggest that behavior is determined by clarity of role, ability to perform, and motivation to perform [22]. Similarly, user behavior in a service production and delivery situation will be facilitated when (1) users understand their roles and know how they are expected to perform, with their task clearly defined; (2) users are able to perform as expected, that is, meet the minimum standard expected for the task; and (3) there are valued rewards for performing as expected [23], that is, users will be motivated individually and collectively for their achievements. While working with both systems, users had lost sight of their actual roles within each system. One participant disclosed this: "When marketing managers were told to handle an inquiry, the reaction was, it wasn't their job" (Participant B) There was confusion among employees in the case study in understanding who was responsible for controlling and delegating the tasks originating from the web-based environment. This demonstrated that either the users were shifting their tasks to their subordinates or they did not understand their contribution and responsibility in the web electronic task system. No minimum standard was set for each task; rather, the complete service output was viewed as the final delivery of service. This also reduced the sense of ownership and the minimum standard expected in doing the task. It was not known whether the users were proficient in using the web electronic service database system. This was
assumed on the basis of their past experience and interface with the paper-based system. Working with a database system requires a certain level of expertise, for which the users were not tested. It may not have been possible for the users to understand the essential requirements of the task and what was expected from it. One participant noted: "We can't change or edit letter templates, can't do anything, everything is fixed...whereas the other system lets you do" (Participant C) These were some basic, useful control characteristics that the system should contain to enhance ease of use [7]. Understanding the system's usefulness from the users' perspective was given little importance, even when further upgrading the system. Some features, if tested before deployment, might have enhanced the system's capability to meet users' expectations and usefulness [8 and 10]. Users were in control when they were able to switch from one activity to another (i.e. from one screen to another), change their minds easily, and stop activities they no longer wanted to continue. Another participant noted: "When I am in between different tasks I am logged off...have to login number of times...due to time out period, loose work when login back…it's irritating…" (Participant D) Users were disconnected from the system within the idle-time period, especially between multiple tasks. Instead, users should have been given the option to cancel or suspend the task any time they wished without causing disastrous results. A further consequence of disconnecting users from the system was that users formed the habit of disconnecting early, before the system disconnected them, and hence this resulted in lower usage, even though usage could have been increased between tasks. There was fear among the users of providing wrong information to the clients. One user noted: "We are afraid to providing wrong information to our clients.…" (Participant E) When users are in control, they should be able to explore the system without fear of causing any irreversible mistakes [1]. If the users had been offered a positive, reinforcing, learn-by-exploring environment to use the system for its usefulness and effectiveness, it may have improved user task ability. Mistakes should not cause serious or irreversible results. Another participant went a step further and said: "They do not seek help when needed…" (Participant F) This user did not want to take responsibility for a mistake and ask for help because of fear. Seeing the screen features helps users to visualize, rather than have to recall, how to proceed. Both the presentation of features and the user interaction should be transparent [1]. One participant noted that: "If any information is missed, there is no way to check, there are no compulsory fields to inform of missing information…" (Participant G)
Whenever possible, the users should be alerted to missing information. Such features, if present, may offer the user better control in using the system. The dialogue between users and the web interface should provide feedback whenever possible. Users may be prompted to perform an action with visual feedback, audible feedback, or both. The flexibility offered in terms of choosing color, emphasis, and other presentation techniques shows users which choices they can select, when a choice has been selected, and when a requested action has been completed. Such features introduce usefulness and ease of use into users' tasks [2]. One participant disclosed this in one of the interviews: "Screen is fine, fonts are too small, and its hard trying to fit everything in one page…people whose first language is not English would like the fonts to be bigger in size..." (Participant H) This point is important: the web-based system was not designed and developed with the diverse user base in mind. The user base comprised both local and international users. Different users understood the web-based task differently based on their competency in the English language. This may have been reflected in users experiencing a diminishing interest in presented information that was difficult to read. Users had been using the data entry screens and, by comparison, considered those features friendly. A data entry screen (form) such as a database input screen created complexity for the user when such friendly features were not available on the web-based system. The screen design allowed input of data into the system, but its interaction behind the user screen was not understood. Screen design principles are related to paper-based form designs [1]. Screen design involves many principles, such as character size, spacing, and ease of use [8 and 10]. One participant stated: "Site needs to be improved with better features and functionality that will make it easier for us to use…" (Participant I) The users constantly voiced their need for a system that was easier to use. Further down the system it got more complicated, not in terms of doing the task but in the rigidity of the control offered over the task. What then constitutes user friendliness? Many end users list performance support aids [1]. One participant reported such aids as unknown or missing: "Mine has been chucked out...haven't seen one…" (Participant J) When using a software program, the user inevitably comes across situations where they do not know what to do when faced with a problem [1]. Sometimes error messages are confusing and not meaningful. Users frequently find software manuals, even when well written, of limited value in solving the problem. The reason may be
due, at least in part, to the fact that these manuals generally contain everything that anyone would ever need to know about a software package, but they help little in solving the problems of different user categories. Since new users have limited experience on which to discriminate, or even to formulate their questions, it is often difficult for them to locate the applicable sections of the manual. The level of control in overcoming a problem situation in the task may require a careful understanding of user expertise based on the user category (i.e. novice, intermediate, or advanced) [3 and 4]. The department in this Case Study outsourced its major development work for web-based electronic systems. Outside programmers had a technical perspective rather than a user perspective, and as a result the central focus of designing the system for the user was forgotten. It did not include well-written descriptions of the system and directions on how the end-user could accomplish specific tasks. System developers knew the system well; the assumption that others (i.e. users) could use it equally easily was a common misconception. Manuals may effectively document how things are supposed to work, but they omit the exceptions, what can go wrong, and what to do about it. Increasingly, large software development projects are separating the responsibility for documentation writing from the program writing by hiring a technical writer to work with the programmer. At the time of the study an outside consultant was brought in and worked closely with the department IT Manager in developing an electronic workflow. It was hoped that the resulting documentation would then be more easily understood by the end-user. One participant noted: There is no internal search engine, if there was one it would have helped in the task… (Participant K) When the documentation is part of the software program, it is considered a help facility [1]. On-screen help facilities offer assistance to the user at the time the problem occurs [24]. Some systems have a specific function key that lists help options alphabetically, requiring the user to specify the type of assistance needed. Other, context-sensitive help facilities assume that the user needs assistance at the point where the help is asked for, such as the role of human web assistants [24]. Such systems then automatically take the user to information that relates to the task in progress. A search engine within the website offering this functionality would have had a positive effect on users [25]. These help features were missing in the system studied in this case. Such support features not only offer the user better control over the task but also form a positive perception among users towards using those aids for subsequent tasks. In other words, such features need to be developed at the same time as the system and integrated with task operations.
Martin [26] suggests that many information systems carry a lot of data but little information. Information that is not controlled and presented to the user in a meaningful way may create complexity in the user's task. There is a tendency among users to reject data from information systems when it has little chance of helping them make decisions. When the information received by the users was incomplete or
missing, the users' attitude towards the system was negative, for example: It is inadequate in conducting the task…the system is not 100% ready… (Participant L) The users were helpless in exercising the degree of control that was very much needed in doing the task. In that situation, making sense of incomplete or missing information increased the task load and added to user frustration. The task features (see Figure 1) were expected to improve the level of user control in doing the task. This may not be the ultimate balance of overall task management, but it may develop into a specific area that can be applied in knowing whether the electronic task meets the user-control requirement. Demonstrating the need for such a requirement in task control may also point towards how easy or difficult it is to perform the task on an electronic platform, depending on the user category. Zipf [27] developed a model of human behavior that presumes humans have a desire to escape sustained effort. Zipf's formal hypothesis was that "as a task becomes more difficult, fewer humans attempt that task". The more difficult the task (in effort and time), the more people will abandon it after they have started. This may apply to some extent to the task control factor: when users cannot exercise the degree of control that is needed in doing the task, they may abandon it completely. In such situations the user might also find known ways to do the task; this will depend on the effort the user is willing to exert in completing it. The effort exerted in having control over the task may also change with the location (or situation) of the task. One participant revealed: At home I am more motivated in using the system, as I am relaxing; at work I am hurrying as I have to do this quick…do that… (Participant M) Users will not exert the same level of effort on different tasks in different locations (or situations), and there will be a degree of variance in the effort exerted. If the control factor is fixed, some users might complete the task while others might drop out if it is difficult. If the control factor were variable (i.e. could be adjusted), the chances of completing the task among different user categories (i.e. novice, intermediate, advanced) might increase. This may not always hold, but it may prove to be a guiding factor that can be considered when implementing systems. We may debate "what is the degree of control the users demand"; the answer may not be straightforward and would differ from system to system, from user to user, and from task to task. There may not (yet) be a general scale that can be applied across the web-based electronic systems platform. This can only be understood as the system is developed and pilot tested among different users to discover which component functionality offers the optimum degree of control in doing the task. This information can then be integrated into the system and later checked for measuring
the degree of control performance that is needed by the users in the task. Effective control in doing the task also requires an investment of time and the different inputs needed to complete the task. This may require an analysis of whether such characteristics exist in the web-based electronic platform facilitating the task process. A task process that is completed entirely electronically will require an infrastructure that supports the different phases of the task.
V. CONCLUSIONS
The preceding discussion was based on a case study where the management and the systems development team attempted to improve user control over the task in a web-based electronic service environment without implementing such characteristics in the systems. Developments in technology are intended to provide users with ease of use and better control over the task. System control is expected to enhance information control, reduce paperwork, remove duplication of tasks, organize task sequence, reduce manual records, and improve task efficiency. Delegating better control over the task to different user categories needs a careful understanding of the web-based electronic systems infrastructure. This requires outlining the scope of the system and what is expected of it in the service process. Systems undergo changes and restructuring as part of the systems development life cycle, and an improved control structure in upgrades that meets the users' requirements is of relevance. It is important to consider the degree of control users have in doing the task; often that degree of control is less or more than expected. Getting the right balance in user control will require an understanding of the task and technology processes. For this purpose there should be close coordination between the systems development team and the end-users of the system. Detailing the task process and developing the system from the perspective of the user is important, and this criterion is easily forgotten when developing the system. Significant improvement in task control structures will equally outline what the system is capable of doing within its parameters. At times certain tasks may require human coordination that is beyond the scope of the software interface; in such situations the user's involvement in developing the service may require a moderate amount of input, and the output will depend on it. The control characteristics available to users in the system are relevant to what the user can do within the given parameters. Similarly, the control factors existing within the web-based electronic service will perform those functions that are built in, based on what the users want to achieve within the available flexibility. There may be scope for varying the system's structure depending on the users' control requirements. For the control structure to be varied, the system should be capable of being restructured, which at times may be difficult or too expensive. Accommodating restructuring in the system will require a corresponding restructuring in the users' control application. Sometimes this might complicate things further; but there are aspects of the technology that can absorb complexity, leaving the user with the simpler task. Such aspects, if illustrated in systems
development may improve control structures in the system and in user application.
REFERENCES
[1] Regan, E. A., and O’Connor, B.N. (1994). End-user Information Systems, Perspectives for Managers and Information Systems Professionals, Macmillan Publishing Company.
[2] Goodhue, D.L. (1998). “Development and Measurement Validity of a Task-Technology Fit Instrument for User Evaluations of Information Systems.” Decision Sciences 29(1): 105-138.
[3] Sandhu, K., and Corbitt, B. (2002). End-user control in web-based electronic services: case study. Working paper.
[4] Sandhu, K., and Corbitt, B. (2002). “Exploring an understanding of Electronic Service end-user adoption.” The International Federation for Information Processing, WG8.6, Sydney.
[5] Ajzen, I. (1985). From intentions to actions: A theory of planned behavior. In J. Kuhl and J. Beckmann, eds., Action Control: From Cognition to Behavior. Springer Verlag, New York, 11-39.
[6] Ajzen, I. (1991). The theory of planned behavior. Organizational Behavior and Human Decision Processes 50(2): 179-211.
[7] Venkatesh, V. (2000). “Determinants of Perceived Ease of Use: Integrating Control, Intrinsic Motivation, and Emotion into the Technology Acceptance Model.” Information Systems Research 11(4): 342-65.
[8] Davis, F. D. (1989). “Perceived usefulness, perceived ease of use, and user acceptance of information technology.” MIS Quarterly 13(2): 319-40.
[9] Davis, F. D., Bagozzi, R.P., and Warshaw, P.R. (1989). “User Acceptance of computer technology: A comparison of two theoretical models.” Management Science 34(8): 982-1002.
[10] Davis, F. D. (1993). “User Acceptance of information technology: systems characteristics, user perceptions and behavioral impacts.” International Journal of Man-Machine Studies 38(3): 475-87.
[11] Heijden, H. (2000). “Using the Technology Acceptance Model to predict website usage: Extensions and empirical test.” Research Memorandum 2000-25, Vrije Universiteit, Amsterdam.
[12] Gefen, D., and Straub, D. (2000). “The relative importance of perceived ease of use in IS adoption: A study of e-commerce adoption.” Journal of the Association for Information Systems 1, Article 8: 1-20.
[13] Steer, D., Turner, P., and Spencer, S. (2000). “Issues in adapting the Technology Acceptance Model (TAM) to investigate non-workplace usage behavior on the world-wide-web.” School of Information Systems, University of Tasmania.
[14] Moon, J., and Kim, Y. (2001). “Extending the TAM for a world-wide-web context.” Information and Management 38: 217-30.
[15] Wright, K.M., and Granger, M.J. (2001). Using the web as a strategic resource: an applied classroom exercise. Proceedings of the 16th Annual Conference of the International Academy for Information Management, New Orleans, Louisiana.
[16] Davies, T. (1999). “Going with the Flow: Web sites and customer involvement.” Internet Research: Electronic Networking Applications and Policy 9(2): 109-116.
[17] Taylor, S., and Todd, P.A. (1995). Understanding Information Technology usage: A test of competing models. Information Systems Research 6(2): 144-176.
[18] Karahanna, E., Straub, D.W., and Chervany, N.L. (1999). “Information Technology Adoption Across Time: A Cross-Sectional Comparison of Pre-Adoption and Post-Adoption Beliefs.” MIS Quarterly 23(2): 183-213.
[19] Yin, R. K. (1994). Case Study Research, Design and Methods, Sage Publications.
[20] Rogers, E.M. (1983). Diffusion of Innovations, Free Press, New York, NY.
[21] Adams, D.A., Nelson, R.R., and Todd, P.A. (1992). “Perceived Usefulness, Ease of Use, and Usage of Information Technology: A Replication.” MIS Quarterly, June.
[22] Zeithaml, V.A., and Bitner, M.J. (2000). Services Marketing: Integrating Customer Focus Across the Firm, McGraw-Hill Higher Education, 2nd edition.
[23] Schneider, B., and Bowen, D.E. (1993). “The Service Organization: Human Resources Management is Crucial.” Organizational Dynamics, Spring, pp. 39-52.
[24] Åberg, J., and Shahmehri, N. (2000). “The role of human Web assistants in e-commerce: an analysis and a usability study.” Internet Research: Electronic Networking Applications and Policy 10(2): 114-125.
[25] Zeithaml, V.A., Parasuraman, A., and Malhotra, A. (2000). “A Conceptual framework for understanding eService Quality: Implications for future Research and managerial Practice.” Marketing Science Institute, Working paper, Report no: 00-115.
[26] Martin, M. P. (1991). Analysis and design of business information systems, Macmillan Publishing Company.
[27] Zipf, G.K. (1949). Human Behavior and the Principle of Least-Effort. Addison-Wesley, Cambridge, MA.
[28] Bianchinia, D., Antonellisa, V., Pernicib, B., and Plebanib, P. (2006). “Ontology-based methodology for e-service discovery.” Information Systems 31: 361-380.
Construction of Group Rules for VLSI Application Byung-Heon Kang, Dong-Ho Lee, and Chun-Pyo Hong Dept. of Computer and Communication Engineering, Daegu University, Kyungsan, 712-714, Korea [email protected], [email protected], [email protected]
Abstract-Applications of cellular automata (CA) have been proposed for building parallel processing systems, cryptographic hashing functions, and VLSI technology. Recently, Das et al. reported a characterization of reachable/non-reachable CA states. Their algorithm only offers the characterization under a null boundary CA (NBCA). However, for simplicity of hardware implementation, a periodic boundary CA (PBCA) is suitable for constructing cost-effective structures such as a linear feedback shift register (LFSR) because of its circular property. Kang et al. therefore provided an algorithm for deciding group/non-group CA based on the periodic boundary condition, but they did not provide an algorithm for constructing group CA rules. This paper presents an efficient evolutionary algorithm for constructing group CA rules based on the periodic boundary condition. We expect that the proposed algorithm can be used efficiently in a variety of applications, including cryptography and VLSI technology.
I. INTRODUCTION
Study of the homogeneous structure of CA and its evolutions was initiated in the early 1950s as a general framework for modeling complex structures capable of self-reproduction and self-repair, and the compiled work was reported in [1]. A new phase of activities started with Wolfram, who pioneered the investigation of CA as mathematical models for self-organizing statistical systems [2]. He identified several characteristic features of self-organization in uniform three-neighbourhood finite CA with two states per cell, and reported on one-dimensional, periodic boundary additive CA with the help of polynomial algebra [3]. Studies of null and periodic boundary CA and some experimental observations have also been reported by Pries et al. [4]. In the early phase of development, applications of CA were proposed for building parallel multipliers [5] and prime number sieves [6]. Subsequently, in the early 1970s, CA were proposed as parallel processing computers [7] and for building a sorting machine. In particular, two-dimensional CA have been used extensively for image processing and VLSI technology [8]. A group CA has been projected as a generator of pseudorandom numbers of high quality, and a class of non-group CA has been established to be an efficient hashing function generator. Recently, characterizations of reachable/non-reachable CA states and group/non-group CA were reported by Das et al. [9]. Their algorithms only offered the characterization under a null boundary CA (NBCA). However, in terms of hardware implementation cost, a periodic boundary CA (PBCA) is suitable for constructing cost-effective algorithms such as a linear feedback shift register (LFSR) structure because of its circular property. Therefore Kang et al. provided an algorithm for deciding group/non-group CA based on the periodic boundary condition [10]. However, they did not provide an algorithm for constructing group CA rules. Thus, in this paper, we provide algorithms to decide group/non-group CA and to construct group CA rules based on the periodic boundary condition. Our algorithm can be used in applications such as pseudorandom number generation (PRNG) and VLSI technology. In the next section, we give a brief description of a CA. Sections 3 and 4 provide the evolutionary algorithms. Finally, conclusions are given in Section 5.
II. CELLULAR AUTOMATA
Cellular automata (CA) are collections of simple cells, usually arranged in a regular fashion, forming a dynamic system in which space and time are discrete. The value of each cell in the next stage is determined by the value of the cell and its neighbour cells in the current stage under the local rule. The state of an n-cell CA at time t can be represented by a characteristic polynomial [3],

P^{(t)}(x) = \sum_{i=0}^{n-1} q_i^{(t)} \cdot x^i     (1)

where the value of the cell at position i is the coefficient of x^i and q_i^{(t)} is the output state of the ith cell at the tth time step. Periodic boundary conditions in a CA are implemented by reducing the characteristic polynomial modulo the fixed polynomial (x^n - 1) at all stages, as per the relation

\sum_{i=0}^{n-1} q_i \cdot x^i \bmod (x^n - 1) = \sum_i \Big( \sum_j q_{i+j\cdot n} \Big) x^i     (2)

Time evolution in a CA can be represented by multiplication of the characteristic polynomial P(x) by a fixed polynomial T(x) containing both positive and negative power terms. Thus,

P^{(t)}(x) = T(x) \cdot P^{(t-1)}(x) \bmod (x^n - 1)     (3)

where arithmetic is performed in GF(2). The time evolution of CA states has been represented by iterative multiplication of polynomials [3]. Extensive analysis of a CA in terms of topological characterization of the state-transition diagram, based on irreversibility, cyclic components, depths, and so on, has been carried out with polynomial algebra.
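To make the periodic-boundary evolution concrete, the short sketch below (our own illustration, not code from the paper; the function name and the test state are assumptions) applies equation (3) for the commonly used rule 90, whose fixed polynomial T(x) = x^{-1} + x^{+1} is discussed in the next section: multiplying by T(x) modulo (x^n - 1) over GF(2) simply XORs each cell's two neighbours with circular wrap-around.

```python
# Minimal sketch of eq. (3) for a uniform rule-90 PBCA: multiplying the state
# polynomial by T(x) = x^-1 + x^+1 modulo (x^n - 1) over GF(2) amounts to
# XOR-ing each cell's left and right neighbours with circular wrap-around.
def rule90_step(q):
    """q: list of 0/1 cell values, i.e. the coefficients q_i of P(x)."""
    n = len(q)
    return [q[(i - 1) % n] ^ q[(i + 1) % n] for i in range(n)]

if __name__ == "__main__":
    state = [0, 1, 0, 1]          # an example 4-cell state <0 1 0 1>
    for t in range(5):
        print(t, state)
        state = rule90_step(state)
```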
In the positional notation, a cell is represented by x^0, while its left and right neighbours are represented as x^{-1} and x^{+1}, respectively. The fixed polynomial T(x) corresponding to rule 90 is x^{-1} ⊕ x^{+1} (⊕ denoting the modulo-2 operator). Rule 90 involves only XOR logic. If the rule of a CA cell includes only XOR logic, it is called a linear rule. Otherwise, a rule involving XNOR logic, such as rule 195, is referred to as a complemented rule. If all the cells of a CA have linear rules, it is called a linear CA. A CA having a combination of XOR and XNOR rules is called an additive CA, whereas a CA having AND-OR rules is a non-additive CA. A CA is called a uniform CA if all the CA cells have the same rule; otherwise it is called a hybrid CA. If the left (right) neighbour of the leftmost (rightmost) terminal cell is connected to the 0-state, the CA is called a null boundary CA (NBCA). Otherwise, if the extreme cells are adjacent to each other, the CA is called a periodic boundary CA (PBCA).
If a CA contains only cyclic states, it is called a group CA. Each state of a group CA has only one predecessor, so all states of a group CA are reachable states. If a CA contains both cyclic and non-cyclic states, it is called a non-group CA. Any state of a non-group CA has either no predecessor or a number of predecessors; therefore, states of a non-group CA may be reachable or non-reachable. The proposed algorithms report the construction of group CA rules based on the periodic boundary condition.

III. AN ALGORITHM FOR STATE TRANSITION

In this section, we provide an algorithm for state transition. A state having n cells is presented by <S0 S1 ⋅⋅⋅ Sn-2 Sn-1>. First, we assign each possible neighbourhood configuration to each rule and present the rules as rule[n]. Second, we classify four cases according to the values of the first and last cells of a state. The four cases are (i) <0 S1 ⋅⋅⋅ Sn-2 0>, (ii) <0 S1 ⋅⋅⋅ Sn-2 1>, (iii) <1 S1 ⋅⋅⋅ Sn-2 0> and (iv) <1 S1 ⋅⋅⋅ Sn-2 1>. The values of the leftmost and rightmost cells of an NBCA are all 0. Therefore, if those values are 0, the next value of cell 1 and cell 4 is represented as d (don't care).

TABLE I. STATE TRANSITION OF RMTS
RMT at ith rule    RMT at (i+1)th rule
0 or 4             0, 1
1 or 5             2, 3
2 or 6             4, 5
3 or 7             6, 7

The following Algorithm 1 classifies a state case according to the value of the first and last cell. A Rule Min Term (RMT) is the decimal value of the present states [9]. Table 1 gives the set of RMTs S′_{i+1} at the (i+1)th cell rule on which the cell can change its state, for an RMT chosen at the ith cell rule for state change. A two-dimensional array with n rules is referred to as Rule[n][8].

TABLE II. STATE TRANSITION ARRAY
RMT      111(7) 110(6) 101(5) 100(4) 011(3) 010(2) 001(1) 000(0)   Rule
Cell 1     0      1      0      1      d      d      d      d       90
Cell 2     1      1      0      0      1      1      0      0      204
Cell 3     0      0      1      1      1      1      0      0       60
Cell 4     d      1      d      0      d      1      d      0      102

For example, consider (i) a CA having the rules (90, 204, 60, 102), and (ii) the state <0 1 0 1>, whose first and last cells are 0 and 1, respectively. Table 2 shows the state transition obtained from Algorithm 1.

Input: rule[n], state[n], n
Output: Rule[n][8]
Step 1. Convert each decimal value of a rule to binary values.
Step 2. Classify four cases on the values of the first and last cell of a state.
Case 1: If state[0]=0 and state[n-1]=0, Rule[0][i]=‘d’, where i=4,5,6,7; Rule[3][i]=‘d’, where i=1,3,5,7
Case 2: If state[0]=0 and state[n-1]=1, Rule[0][i]=‘d’, where i=0,1,2,3; Rule[3][i]=‘d’, where i=1,3,5,7
Case 3: If state[0]=1 and state[n-1]=0, Rule[0][i]=‘d’, where i=4,5,6,7; Rule[3][i]=‘d’, where i=0,2,4,6
Case 4: If state[0]=1 and state[n-1]=1, Rule[0][i]=‘d’, where i=0,1,2,3; Rule[3][i]=‘d’, where i=0,2,4,6
Fig. 1. Algorithm 1. DecideStateCase
Fig. 3. Example of Step 3: (a) previous (b) next

IV. CONSTRUCTING ALGORITHM FOR GROUP CA RULES
This section provides an algorithm for constructing group CA rules using Algorithm 1. We organize the rules for a group CA based on the periodic boundary condition in Algorithm 2. For illustration, we report the process for a group CA having rules based on the periodic boundary condition. (i) We select the first rule, 90, having the condition S[0] = {5, 7} and S[1] = {4, 6}. (ii) When i = 1, we determine the next rule, 204, because S′[0] = {0, 1, 4, 5} = {0, 1}, S′[1] = {2, 3, 6, 7} = {2, 3}, S′[2] = {0, 1, 4, 5} = {0, 1}, S′[3] = {2, 3, 6, 7} = {2, 3}. The numbers of elements in S[0] = {0, 1} and S[1] = {2, 3} are the same, so the PBCA is a group up to this phase.
(iii) When i = 2, if we choose rule 60, S′[0] = {0, 1}, S′[1] = {2, 3}, S′[2] = {2, 3}, S′[3] = {0, 1}. Then we get S[0] = {0, 1} and S[1] = {2, 3}. So the PBCA is a group up to this phase. (iv) Finally, we select the last rule, 102, because S′[0] = {3}, S′[1] = {1}, S′[2] = {3}, S′[3] = {1}. Then we get S[0] = {1, 3} and S[1] = {1, 3}. The numbers of elements in S[0] and S[1] are the same, so the PBCA having the rules (90, 204, 60, 102) is a group.
Input: n
Output: An n-cell group CA's rules.
Step 1. Select the first rule having the condition that |S[0]| = 2 for Rule[0][j] = 0 and |S[1]| = 2 for Rule[0][j] = 1, j = 4, ⋅⋅⋅, 7.
Step 2. For i = 1 to n-1
  Step 2.1 For j = 0 to 1
    (a) Determine the RMTs for the next level using Table 1 for S[j].
    (b) Distribute these RMTs into S′[j] and S′[j+1], such that S′[j] and S′[j+1] contain the RMTs whose values in Rule[i][k] (k = 0, ⋅⋅⋅, 7) are 0 and 1, respectively.
    (c) If S′[j] and S′[j+1] contain 4, 5, 6 and 7, replace them by 0, 1, 2 and 3, respectively.
    (d) Remove duplicated sets from S′ and assign the sets of S′ to S.
    (e) Select the next rule having the condition that |S[0]| = |S[1]|.
Step 3. For j = 0 to 1
    (a) Determine the RMTs for the next level using Table 1 for S[j].
    (b) Distribute 0 and 1 randomly in the last rule.
Step 4. Report the PBCA as an n-cell group CA.
Fig. 2. Algorithm 2. ConstructGroupCARules
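The group property that Algorithm 2 is designed to guarantee can also be checked by brute force for small n, which is a convenient way to test constructed rule vectors. The sketch below is our own illustration (not part of the paper's algorithms); it simulates a hybrid three-neighbourhood PBCA directly from the Wolfram rule numbers and declares it a group CA when the global transition map is a bijection over all 2^n states.

```python
# Hedged sketch: brute-force verification that a hybrid PBCA is a group CA.
from itertools import product

def next_state(state, rules):
    """One synchronous step of a 3-neighbourhood CA with periodic boundaries.

    state : tuple of 0/1 cell values
    rules : one Wolfram rule number per cell (hybrid CA)
    """
    n = len(state)
    out = []
    for i in range(n):
        left, centre, right = state[(i - 1) % n], state[i], state[(i + 1) % n]
        rmt = (left << 2) | (centre << 1) | right      # rule min term, 0..7
        out.append((rules[i] >> rmt) & 1)              # bit 'rmt' of the rule number
    return tuple(out)

def is_group_ca(rules):
    """A CA is a group CA iff every state has exactly one predecessor."""
    n = len(rules)
    images = {next_state(s, rules) for s in product((0, 1), repeat=n)}
    return len(images) == 2 ** n                       # surjective on a finite set => bijective

if __name__ == "__main__":
    print(is_group_ca((90, 204, 60, 102)))             # the PBCA from the example
```

Running it on the rule vector (90, 204, 60, 102) from the example confirms that this PBCA is a group CA.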
V. CONCLUSION
The construction of a group CA is proposed in this paper, and two evolutionary algorithms are introduced. These efficient algorithms are based on the periodic boundary condition. The proposed schemes have the same time complexity as Das et al.'s scheme, which only offers the characterization under null boundary conditions, while our schemes are based on periodic boundary conditions. Moreover, we have reduced the number of loops in comparison with Algorithm 2 of Das et al.'s algorithm: we removed one loop in our Algorithm 2 by proceeding simultaneously with steps 3 and 4 of Das et al.'s algorithm, and therefore our scheme reduces the program processing time. The execution time of our algorithms depends on n; hence, the complexity of our programs is O(n). A class of non-group CA has been established to be an efficient hashing function generator, while a group CA has been projected as a generator of pseudorandom numbers of
high quality. Moreover, for hardware implementation, a PBCA is suitable for the construction of cost-effective schemes such as an LFSR structure because of its circular properties. Thus, we expect that our algorithm can be used in VLSI technology.
REFERENCES
[1] J. V. Neumann, The Theory of Self Reproducing Automata, University of Illinois Press, Urbana and London, 1996.
[2] S. Wolfram, “Statistical Mechanics of Cellular Automata”, Reviews of Modern Physics, APS Physics, 1983, Vol. 55, pp. 601-644.
[3] O. Martin, A. M. Odlyzko, and S. Wolfram, “Algebraic properties of cellular automata”, Communications in Mathematical Physics, Springer-Verlag, New York, 1984, Vol. 93, pp. 219-258.
[4] W. Pries, A. Thanailakis, and H. C. Card, “Group properties of cellular automata and VLSI application”, IEEE Transactions on Computers, IEEE, 1986, Vol. 35, No. 12, pp. 1013-1024.
[5] A. J. Atrubin, “A One-Dimensional Real-Time Iterative Multiplier”, IEEE Transactions on Computers, IEEE, 1965, Vol. EC-14, No. 13, pp. 394-399.
[6] P. C. Fischer, “Generation of Primes by a One-Dimensional Real-Time Iterative Array”, Journal of ACM, ACM Press, New York, 1965, Vol. 12, No. 3, pp. 388-394.
[7] F. B. Manning, “An Approach to Highly Integrated, Computer-Maintained Cellular Arrays”, IEEE Transactions on Computers, IEEE, 1977, Vol. C-26, No. 6, pp. 536-552.
[8] A. Rosenfeld, Picture Language, Academic Press, New York, 1979.
[9] S. Das, B. K. Sikdar and P. P. Chaudhuri, “Characterization of Reachable/Nonreachable Cellular Automata States”, Lecture Notes in Computer Science, Springer-Verlag, New York, 2004, LNCS 3305, pp. 813-822.
[10] B. H. Kang, J. C. Jeon and K. Y. Yoo, “Decision Algorithms for Cellular Automata States Based on Periodic Boundary Condition”, Lecture Notes in Computer Science, Springer-Verlag, New York, 2006, LNCS 4173, pp. 104-111.
Implementation of an Automated Single Camera Object Tracking System Using Frame Differencing and Dynamic Template Matching Karan Gupta1, Anjali V. Kulkarni2 1 [email protected], National Institute of Technology, Nagpur, India 2 [email protected], Indian Institute of Technology, Kanpur, India
Abstract - In the present work the concepts of dynamic template matching and frame differencing have been used to implement a robust automated single object tracking system. In this implementation a monochrome industrial camera has been used to grab the video frames and track a moving object. Using frame differencing on frame-by-frame basis, a moving object, if any, is detected with high accuracy and efficiency. Once the object has been detected it is tracked by employing an efficient Template Matching algorithm. The templates used for the matching purposes are generated dynamically. This ensures that any change in the pose of the object does not hinder the tracking procedure. To automate the tracking process the camera is mounted on a pan-tilt arrangement, which is synchronized with a tracking algorithm. As and when the object being tracked moves out of the viewing range of the camera, the pan-tilt setup is automatically adjusted to move the camera so as to keep the object in view. Here the camera motion is limited by the speed of the stepper motors which mobilize the pan-tilt setup. The system is capable of handling entry and exit of an object. Such a tracking system is cost effective and can be used as an automated video conferencing system and also has application as a surveillance tool. I. INTRODUCTION
Object tracking is central to any task related to vision systems. Tracking objects can be complex [1] due to loss of information caused by projection of the 3D world on a 2D image, noise in images, complex object motion, the non-rigid or articulated nature of objects [2], partial and full object occlusions, complex object shapes, scene illumination changes, and real-time processing requirements. Tracking is made simple by imposing constraints on the motion and/or appearance of objects. In our application we have tried to minimize the number of constraints on the motion and appearance of the object. The only constraint on the motion of the object is that it should not make sudden changes in the direction of its motion while moving out of the viewing range of the camera. Unlike other algorithms [3] the present algorithm is capable of handling the entry and exit of an object. Also, unlike [2], [4], no colour information is required for tracking an object. There is no major constraint on the appearance of the object, though an object which is a little brighter than the background gives better tracking results.
There are three key steps in the implementation of our object tracking system: • Detection of interesting moving objects, • Tracking of such objects from frame to frame, • Analysis of object tracks to automate the pan-tilt mechanism A. Frame Differencing The system first analyses the images, being grabbed by the camera, for detection of any moving object. The Frame Differencing algorithm [5], [6] is used for this purpose, which gives as output the position of the moving object in the image. This information is then used to extract a square image template (of fixed size) from that region of the image. The templates are generated as and when the appearance of the object changes significantly. B. Dynamic Template Matching The newly generated template is then passed on to tracking module, which starts tracking the object taking the template as the reference input. The module uses template-matching [7],[4] to search for the input template in the scene grabbed by the camera. If the object is lost while tracking (signifying that the object has changed its appearance) a new template is generated and used. Since the image templates, being used for matching, are generated dynamically, the process is called Dynamic Template Matching [8]. C. Pan-Tilt Mechanism The movement of the object is analyzed for automation of the Pan-Tilt mechanism [9]. Depending upon the movement of the object, the pan-tilt mechanism is operated to keep the object in the camera’s view. This paper is organized as follows. Section II describes in detail the algorithm employed in our automated tracking system. The hardware setup of the pan-tilt setup is discussed briefly in Section III. In Section IV, we present the experimental results of a single person tracking example. Section V is devoted to conclusions.
II. THE ALGORITHM
In this section we will explain and analyze the algorithm employed in our automated tracking system. The discussion and analysis of the algorithm is divided into four subsections.
A. Object Detection using Frame Differencing
Identifying moving objects from a video sequence is a fundamental and critical task in an object tracking vision application. The approach we chose was to perform frame differencing [5] on consecutive frames in the image acquisition loop, which identifies moving objects from the portion of a video frame that differs significantly from the previous frame. The frame differencing method basically employs the image subtraction operator. The image subtraction operator [10] takes two images as input and produces as output a third image whose pixel values are simply those of the first image minus the corresponding pixel values from the second image. The subtraction of two images is performed straightforwardly in a single pass. The output pixel values are given by:
Q(i, j) = P1(i, j) – P2(i, j)    (1)
There are many challenges in developing a good frame differencing algorithm for object detection. First, it must be robust against changes in illumination. Second, it should avoid detecting non-stationary background objects such as moving leaves, rain, snow, and shadows cast by moving objects. In our efforts to develop a high performance algorithm for object tracking we have tried to overcome these difficulties in our object detection module. The images obtained from the camera are preprocessed to eliminate unwanted disturbances. This includes the application of a thresh-holding operation [10] followed by the use of an iterative morphological erosion operation [10][11]. By performing these operations we were able to desensitize the object detection to inaccuracies in the background. The goal of the object detection module is to provide the object tracking module with the positional information of the moving object.
Algorithm for Object Detection Module:
1. Grab the ith frame fi (8-bit gray-scale).
2. Retrieve the (i-3)th frame fi-3 from the image buffer. (The image buffer is an array of image variables which are used for temporary storage of frames. The array is programmed to behave as a queue with only three elements at any given point of execution.)
3. Perform the Frame Differencing Operation on the ith frame and the (i-3)th frame, where the resultant image is represented as fr (8-bit gray-scale):
fr = fi – fi-3    (2)
(Here it is noticeable that instead of subtracting the ith frame from the (i-1)th frame we are subtracting the ith frame from the (i-3)th frame. This has been done taking into consideration that even slow moving objects should be detected. It has been observed that image subtraction on consecutive frames detects only fast moving objects, i.e. objects whose position changes noticeably from one frame to the other; such a method fails to detect slow moving objects. Therefore, to remove this limitation, we subtract the ith frame from the (i-3)th frame to ensure detection that is independent of speed.)
4. Perform the Binary Thresh-holding Operation on fr, separating the pixels corresponding to the moving object from the background. This operation also nullifies any inaccuracies introduced due to camera flickering. The result of this operation is a binary image, fb, wherein only those pixels which correspond to the moved object are set to ‘1’. In the thresh-holding technique a parameter called the brightness threshold (T) is chosen and applied to the image f[m,n] as follows:
IF f[m,n] ≥ T THEN fb[m,n] = object = 1, ELSE fb[m,n] = background = 0
This version of the algorithm assumes that we are interested in light objects on a dark background. For dark objects on a light background we would use:
IF f[m,n] ≤ T THEN fb[m,n] = object = 1, ELSE fb[m,n] = background = 0
While there is no universal procedure for threshold selection that is guaranteed to work on all images, there are a variety of alternatives. In our case we are using a fixed threshold (a threshold that is chosen independently of the image data). As our main objective is to separate the object pixels from those of the background, this approach gives fairly good results.
5. Perform an Iterative Mathematical Morphological Erosion Operation on fb to remove really small particles from the binary image. The result of this step is again a binary image, fbb. This step ensures that small insignificant movements in the background are ignored, ensuring better object detection.
6. Calculate the center of gravity (COG) of the binary image fbb. The result of this operation is a set of two integers C(cog_x, cog_y) which determines the position of the moving object in the given scene. The COG is calculated by:
cog_x = cog_x + x    (3a)
cog_y = cog_y + y    (3b)
Total = Total + 1    (3c)
for each object pixel, where x, y is the current pixel location. The resulting COG is then divided by the Total value:
cog_x = cog_x / Total    (3d)
cog_y = cog_y / Total    (3e)
to result in the final x, y location of the COG.
7. Transfer the positional information C(cog_x, cog_y) to the object tracking module.
8. Store the frame fi in the image buffer and discard the frame fi-3.
9. Increment i by 1.
10. Goto Step 1.
Fig. 1 shows the different stages of application of the above algorithm to a set of consecutive frames.
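The detection loop above maps naturally onto array operations. The following is a minimal sketch rather than the authors' implementation: it assumes the grabbed frames are NumPy arrays, takes the absolute difference instead of a signed one, uses an illustrative fixed threshold, and replaces the full morphological erosion with a crude neighbour test.

```python
# Hedged illustration of steps 3-6: frame differencing, thresholding, a crude
# erosion, and centre-of-gravity computation. Names and constants are assumptions.
import numpy as np

def detect_moving_object(frame_i, frame_i3, threshold=40, erosion_iters=2):
    """Return the (x, y) centre of gravity of the moved region, or None."""
    # Step 3: difference between frame i and frame i-3 (absolute value kept here)
    fr = np.abs(frame_i.astype(np.int16) - frame_i3.astype(np.int16))
    # Step 4: binary thresholding (light object on dark background)
    fb = (fr >= threshold).astype(np.uint8)
    # Step 5: iterative erosion stand-in - a pixel survives only if its 4 neighbours are set
    for _ in range(erosion_iters):
        shifted = (np.roll(fb, 1, 0) & np.roll(fb, -1, 0) &
                   np.roll(fb, 1, 1) & np.roll(fb, -1, 1))
        fb = fb & shifted
    # Step 6: centre of gravity of the remaining object pixels
    ys, xs = np.nonzero(fb)
    if xs.size == 0:
        return None
    return int(xs.mean()), int(ys.mean())
```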
has changed its appearance) a new template is generated and used. As the templates for searching are generated dynamically, the object tracking module is said to be employing the dynamic template matching algorithm. Further on, the (x, y) co-ordinates of the tracked object are calculated and passed on to the pan-tilt module for analysis and automation.
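Before the tracking steps are listed, the two operations they rely on (cutting a 200x200 template around the detected centre of gravity and searching for it in a later frame) can be sketched as follows. This is our own illustration, not the authors' code: a practical system would use a fast correlation-based matcher as in [10] rather than this brute-force sum-of-squared-differences search, and the function names, step size, and border handling are assumptions.

```python
# Hedged sketch of template extraction and matching for the tracking module.
import numpy as np

def extract_template(frame, cog, half=100):
    """Cut a (2*half) x (2*half) template centred on the COG (assumes it fits in the frame)."""
    cx, cy = cog
    return frame[cy - half:cy + half, cx - half:cx + half].copy()

def match_template(frame, template, step=4):
    """Return the top-left (x, y) of the best match by minimising the SSD on a coarse grid."""
    th, tw = template.shape
    fh, fw = frame.shape
    best, best_xy = None, None
    t = template.astype(np.float32)
    for y in range(0, fh - th + 1, step):
        for x in range(0, fw - tw + 1, step):
            patch = frame[y:y + th, x:x + tw].astype(np.float32)
            ssd = float(np.sum((patch - t) ** 2))
            if best is None or ssd < best:
                best, best_xy = ssd, (x, y)
    return best_xy
```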
Algorithm for the Tracking Module: (a) Sample Frame i
(b) Sample Frame i-3
1 Get the positional information C(cog_x ,cog_y) of the object from the Object Detection Module.
(c) Result of image subtraction on frame i and frame i-3
(d) Result of thresh-holding operation on image 1 (c)
2 Generate a image template Ti by extracting a square image from the last frame grabbed by the camera. The template is extracted in the form of a square whose image coordinates are given by -Top, Left corner: (cog_x - 100, cog_y - 100) -Top, Right corner: (cog_x + 100, cog_y - 100) -Bottom, Left corner: (cog_x - 100, cog_y + 100) -Bottom, Right corner: (cog_x + 100, cog_y + 100) 3 Search the generated template Ti in the last frame grabbed by the camera. This is done by using an efficient template matching algorithm as in [10].
(e) Result of morphological erosion operation on image 1(c)
(f) Center of gravity determination
Fig. 1 Different stages in application of Frame Differencing algorithm to detect a moving object.
B. Object Tracking using Dynamic Template Matching The aim of our object tracker module is to generate the trajectory of an object over time by locating its position in every frame of the video. In our implementation the tasks of detecting the object and establishing correspondence between the object instances across frames are performed simultaneously (i.e multithreaded). The possible object region in every frame is obtained by means of our object detection module, and then our tracker module corresponds to the object across frames. In our tracking approach the object is represented using the Primitive Geometric Shape Appearance Model [1],[7] (i.e the object is represented as a rectangle). The limitation of the frame differencing algorithm in not being able to detect stationary objects is the reason why the template matching process must be coupled with it to create a good tracking system. The object tracking module takes the positional information C(cog_x ,cog_y) of the moving object as an input from the object detection module. This information is then used to extract a square image template (whenever required) from the last acquired frame. The module keeps on searching it in the frames captured from that point. Whenever found it displays a red overlaid rectangle over the detected object. If the template matching doesn’t yield any result (signifying that the object
4 IF the template matching is successful THEN IF the tracker has NOT detected motion of the object AND the detector has THEN goto STEP 1 (get a new template) ELSE goto STEP 5 (get the x, y position) ELSE goto STEP 1 (get a new template ) 5 Obtain the position P(x, y) of the match and pass it on to the pan-tilt automation module for analysis. 6 Goto STEP 3 The output of Object Tracking is further utilized by the Pan-Tilt Automation module to operate the pan or tilt motors according to the behavior of the object. Fig. 2 shows the dynamically generated template in correspondence with the center of gravity co-ordinates determined in the Fig 1(f) C. Pan-Tilt Automation The Pan-Tilt Automation module serves the purpose of operating the pan-tilt motors depending upon the motion behavior of the object being tracked. The goal of the module is to operate the motors such that the object remains in view all the time. This module takes the position C(cog_x, cog_y) of the tracked object as the input for analysis. Depending upon the direction of movement of the object the pan-tilt mechanism is operated. Any object is allowed to enter the viewing range of the camera. This means that the
mechanism is not activated for an object which enters the scene; it is only activated when the object under consideration tries to leave the scene. In such a scenario the camera is panned or tilted in the direction of the movement of the object. This ensures that the object stays in view of our tracking system even if it tries to leave the viewing range of the camera.
Fig. 2. A template generated dynamically based on the Center of Gravity information of the moving object
Algorithm for the Pan-tilt Automation Module:
1 Get the position P(x, y) of the tracked object from the Object Tracking Module.
2 Get the horizontal direction of movement (DH) of the tracked object (as shown in Fig. 3).
3 Get the vertical direction of movement (DV) of the tracked object (as shown in Fig. 3).
4 IF (x > 700) {region C} AND (DH = RIGHT) THEN Pan Right
ELSE IF (x < 100) {region A} AND (DH = LEFT) THEN Pan Left
ELSE IF (y < 100) {region B} AND (DV = UP) THEN Tilt Up
ELSE IF (y > 500) {region D} AND (DV = DOWN) THEN Tilt Down
ELSE goto STEP 1
The commands Pan Left, Pan Right, Tilt Up and Tilt Down have the following meanings:
- Pan Left: the pan stepper motor is turned by a fixed angle anticlockwise.
- Pan Right: the pan stepper motor is turned by a fixed angle clockwise.
- Tilt Up: the tilt stepper motor is turned by a fixed angle anticlockwise.
- Tilt Down: the tilt stepper motor is turned by a fixed angle clockwise.
The regions corresponding to the conditions above are shown in Fig. 4.
Fig. 3. Values of DH and DV depending upon the direction of motion
Fig. 4. Diagram showing different regions of the camera view and the pan-tilt directions
One important thing to be noted is that during the time when the motors are turning, object tracking is stopped. The system assumes that after the fixed-angle rotation the object comes back into view. This limitation is due to the fact that the frame differencing algorithm has been employed for object detection. Algorithms such as those described in [8], [9] (which can track objects while the camera is in motion) can be used, but they are computationally expensive. Another thing to be noted is that, for the object being tracked, the maximum speed at which it can be tracked successfully is limited by the maximum speed at which the pan/tilt motors can operate. Faster motors give better results.
D. Merging the three modules
Having discussed all the three main modules of our object tracking system, we will now show how the three different processes have been brought together to form a single algorithm. The three processes are multithreaded for concurrent execution. Hence, instead of executing sequentially, they execute parallel to each other. The data exchanged between the three processes is shown in Fig. 5. This approach has a major advantage: the simultaneous execution of the three processes ensures that the object in question is detected and tracked immediately, i.e. there is no lag between detection and tracking of the object.
III. HARDWARE SETUP
This section deals with the pan-tilt arrangement of the tracking system. The system utilized two stepper motors to
account for the pan and tilt motion of the camera. As presented in Fig. 6, each motor is operated using an electronic stepper motor driver circuit. The motor driver circuits are in turn controlled by the tracking program through the parallel port interface. The tilt motion of the camera is governed by the stepper motor installed vertically, whereas the pan motion of the camera is governed by the motor installed horizontally. The driver circuit used to operate the motors has three inputs which are connected to the parallel port of the computer. These three inputs are:
1. Motor On/Off: the motor is turned on as long as a +5 V signal is given to this line.
2. Direction Control: a zero volt signal defines clockwise rotation whereas +5 V defines the anticlockwise direction.
3. Speed Control: a square wave is provided to this input line. The frequency of the square wave determines the speed of rotation of the motor.
Fig. 7. Actual pan-tilt setup
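As a rough illustration of how the tracking program might drive these three inputs through the parallel port (our own sketch, not the authors' driver code; the bit positions are arbitrary assumptions and actual port I/O is omitted):

```python
# Hypothetical bit layout for one driver's control lines (assumed, not from the paper):
# bit 0 = Motor On/Off, bit 1 = Direction (0 = clockwise, 1 = anticlockwise),
# bit 2 = Speed line (toggled externally as a square wave).
ON, ANTICLOCKWISE, SPEED = 0x01, 0x02, 0x04

def control_byte(on, anticlockwise):
    """Compose the parallel-port data byte for one stepper driver."""
    value = ON if on else 0
    if anticlockwise:
        value |= ANTICLOCKWISE
    return value

# Example: 'Pan Left' = pan motor on, turned a fixed angle anticlockwise
print(bin(control_byte(on=True, anticlockwise=True)))   # 0b11
```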
IV. EXPERIMENTS AND RESULTS
Fig. 5. Concurrent execution of the ‘Object Detection’, ‘Object Tracking’ and the ‘Pan-Tilt Automation’ modules
Fig. 6 A schematic diagram of the Pan-Tilt setup
Results are presented of the experiment performed using the tracking setup shown in Fig 7. The experiment was performed using a dual core Pentium machine fitted with a high performance image acquisition card. The camera used was a monochrome industrial camera configured to take 640x480 resolution images at 30frames/second. Using the given setup the tracking system was found to work well under different lighting conditions. Also the pan-tilt mechanism responded as desired to the movement of the object being tracked. The experiment involved the tracking of a single person walking inside a room. The results of the experiment are presented in Fig 8. Please note that the faint red square represents the result of the template matching module and has been superimposed by the tracking module on the original frame. In the given sequence of images the camera remains stationary from Fig 8(a) to Fig 8(b) as the person remains within the viewable limits. In Fig 8(f) the person is occluded as it reaches the end of the video frame. It can be seen that even in such a condition the person is being tracked. Also in Fig 8(f) the object has crossed the end limit. To keep the person in camera’s view the pan-tilt mechanism pans the camera to left. This is evident from Fig 8(g), where the background has changed. Another noticeable point in Fig 8(g) is that the red square is no longer visible. This is due to the fact that in our algorithm an object is not tracked when the camera is in motion. As can be seen in Fig 8(h) the system starts tracking the person again. This happens when the camera becomes stationary again.
The system has been automated using a pan-tilt setup which is synchronized with the algorithm. The pan-tilt setup, together with the pan-tilt software module, ensures that a moving object is always kept in the camera’s view. The maximum speed of the pan/tilt motor determines the maximum speed of the pan/tilt motion of the camera, which in turn determines the maximum speed below which an object can be tracked successfully. The present object tracking system can be used in applications where accurate tracking is required but high performance hardware is either not available or is not affordable. The system is very much applicable to areas like surveillance and video conferencing. Future work focuses on tracking multiple objects at the same time as well as on improving tracker accuracy during camera motion. ACKNOWLEDGMENT
Fig. 8. Experimental results from real-time tracking of a person walking inside a room
V. CONCLUSION
A robust and efficient automated single object tracking system is presented. The system has been implemented using an algorithm which is a combination of two algorithms i.e. the frame differencing algorithm and the dynamic template matching algorithm. The algorithm is efficient enough to be implemented effectively on a standard desktop computer and requires no colour information to track an object. This makes implementing the tracking system very cost effective. The algorithm does not require any initial information about the object it is going to track and can well handle the entry and exit of a single moving object. Apart from that the algorithm has experimentally been shown to be quite accurate and effective in detecting a single moving object under bad lighting conditions or occlusions.
Thanks are due to Dr. Bhaskar Dasgupta of ME Department for giving the idea of developing an object tracking system. Mr. Rajendra’s help is duly acknowledged in the hardware setup development. We also acknowledge the contributions made by Miss Suvi Srivastava in developing the motor driver circuits. We are also thankful to Mr. Vikash Gupta for the support he provided in conducting the experiments. REFERNCES [1] YILMAZ, A., JAVED, O., AND SHAH, M. 2006: Object tracking: A survey. ACM Comput. Surv. 38, 4, Article 13 [2] RICHARD Y. D. XU , JOHN G. ALLEN , JESSE S. JIN, 2004: Robust real-time tracking of non-rigid objects, Proceedings of the Pan-Sydney area workshop on Visual information processing, p.95-98 [3] BAR-SHALOM, Y. AND FOREMAN, T. 1988: Tracking and Data Association. Academic Press Inc [4] FIEGUTH, P. AND TERZOPOULOS, D. 1997: Color-based tracking of heads and other mobile objects at video frame rates. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 21–27. [5] JAIN, R. AND NAGEL, H. 1979: On the analysis of accumulative difference pictures from image sequences of real world scenes. IEEE Trans. Patt. Analy. Mach. Intell. 1, 2, 206–214. [6] HARITAOGLU, I., HARWOOD, D., AND DAVIS, L. 2000: W4: realtime surveillance of people and their activities. IEEE Trans. Patt. Analy. Mach. Intell. 22, 8, 809–830. [7] COMANICIU, D., RAMESH, V., AND MEER, P. 2003: Kernel-based object tracking. IEEE Trans. Patt. Analy. Mach. Intell. 25, 564–575. [8] S. YOSHIMURA AND T. KANADE, 1994: Fast Template Matching Based on the Normalized Correlation by Using Multiresolution Eigenimages, Proc IROS ‘94, Munich, Germany. [9] D. MURRAY AND A. BASU, 1994:Motion tracking with an active camera , IEEE Trans. Pattern Anal. Machine Intell., vol. 16, pp. 449–459. [10] RAFAEL C. GONZALEZ, RICHARD E. WOODS, 2002: Digital Image processing, Second Edition. Prentice Hall International,Ch 9, 10 . [11]EDWARD R. DOUGHERTY , 1993 : Mathematical Morphology in Image Processing. CRC Press, Ch 1, 2 . [12]MONNET, A., MITTAL, A., PARAGIOS, N., AND RAMESH, V. 2003: Background modeling and subtraction of dynamic scenes. In IEEE International Conference on Computer Vision (ICCV). 1305–1312. [13] ZHONG, J. AND SCLAROFF, S. 2003: Segmenting foreground objects from a dynamic textured background via a robust kalman filter. In IEEE International Conference on Computer Vision (ICCV). 44–50. [14] SCHWEITZER, H., BELL, J. W., AND WU, F. 2002: Very fast template matching. In European Conference on Computer Vision (ECCV). 145-148 [15]L. WANG, W. HU, AND T. TAN.,2002 : Face tracking using motion guided dynamic template matching. In ACCV, 2002.
A Model for Prevention of Software Piracy Through Secure Distribution Vaddadi P. Chandu University of Maryland College Park, MD [email protected]
Karandeep Singh University of Maryland College Park, MD [email protected]
Ravi Baskaran Capitol College Laurel, MD
Abstract Although many anti-piracy methods are prevalent today in the software market, still software piracy is constantly on the rise. The cause of this can be attributed to (a) either these methods are very expensive to be implemented (b) or are easy to be defeated (c) or are simply not very convenient to use. Anti-piracy methods usually consist of lock-and-key policy where the software to be protected is locked using some encryption method (lock) and this lock requires a key to be unlocked. The key is called as registration code and the mechanism is called as registration mechanism in the software parlance. The registration mechanism may be implemented in many ways- software, hardware, or a combination of both. The way it is implemented makes the protection scheme more vulnerable or less vulnerable to piracy. The method of implementation also affects the user convenience to use the software. Some mechanisms are very convenient to the user but are more vulnerable to piracy. Some others are least vulnerable to piracy but are very inconvenient to the user. Different vendors choose different ways to implement the lock and unlock mechanism to make it as hard as possible to pirate their software. In this paper, we discuss several anti-piracy methods, the security issues related to them, and present a model which allows prevention of software piracy through a secure mechanism for software distribution.
1. Introduction In today’s software marketplace, piracy has become a major threat to intellectual property. With the easy availability of inexpensive optical disc readers and writers, bit-by-bit copying of optical discs is becoming increasingly trivial. According to [20], 35% of the software installed on the personal computers was illegal. As per the study in [1], the median piracy among 97 countries in 2005 was 64%, which means, for every two US dollar worth of software purchased, one US dollar worth was obtained illegally. In global sales, the current market for software for use with personal computers is cited to be around US
$50b [1]. However, due to piracy activities, the losses are estimated to be around US $30b. In 2005, the piracy rate is seen to be lowest at 22% in North America and highest at 69% in Eastern Europe. Countries such as Vietnam and Zimbabwe show a piracy rate of 90% [1]. During the last decade, the losses due to software piracy tend to be following an upward trend world-wide. The losses from software piracy force the government and software companies to enforce protection in order to fence the intellectual property from illegal users. These methods include (a) legal protection such as Intellectual Property laws which enforce civil punishments for illegal software users, and (b) Technological protection which applies software and hardware locks to the software and thereby makes it very hard for the intruders to break into it. Several inventors [7, 10, 8, 9, 12, 2, 6, 17, 11] have developed protection techniques (Technological) to prevent software piracy through (a) software approach (where the software requires an encryption code to install or execute, and this code is distributed only to legal owners of the software), (b) hardware approach (where the software requires a hardware device, which cannot be duplicated easily, with an encryption code to install or execute. This device is distributed only to legal owners of the software, and/or (c) a combined approach. This paper discusses these methods, shows how each one of them can be defeated, and also presents a model to prevent software piracy. The remainder of this paper is organized as follows. Section 2 presents existing and next generation methods for software protection, including the ways on how to break them. Section 3 discusses our model for prevention of software piracy, and finally, Section 4 presents conclusions and future research.
2. Existing methods for preventing software piracy A straightforward solution to preventing piracy is to make the software totally immune to intrusion.
However, given sufficient time and resources, any scheme can be broken through (a) bit-by-bit copying using advanced optical disc readers and writers, (b) instruction camouflaging [13], or (c) key generation through brute-force methods. No scheme can be considered totally invincible. However, piracy can be controlled by making a scheme very difficult to break and thereby deterring the intruder's desire to break into the software. The following subsections discuss several methods that have been proposed to curb piracy and have been prevalent since the introduction of optical discs.
2.1 Using registration codes as an anti-piracy technique This is one of the most widely used techniques due to its (a) ease of use, (b) low cost, and (c) faster time-to-market. A registration code is a unique string of alphanumeric and/or special characters generated using proprietary encryption algorithms. A registration code is provided to each legal owner of the software by the software vendor, and the software requires this code at the time of installation in order to properly install and execute. This mechanism definitely provides a certain level of protection. However, most intruders use a brute-force approach either to re-generate the registration code, or to identify the part of the software which, when bypassed, eliminates the need for the registration code. In both cases, this method can be easily defeated. In another scenario, if the legal owner herself wants to pirate the software, the task is trivial: the owner can copy the original optical disc bit-by-bit to a blank disc using a simple optical disc writer and use the same registration code that came with the original disc. In other words, the owner creates a clone of the original disc.
2.2 Using online activation as an anti-piracy technique
This method prevents users from using software from cloned optical discs. In online activation, the software vendor maintains a central authenticating agent (CAA). The CAA is typically a combination of a server and a database in which all the legal registration codes that are in use are stored. A customer is required to submit the registration code online to the CAA at the time of installation. Once the code is submitted, it is (a) checked for a known pattern or a specific value, and (b) checked against the other registered codes that exist in the database. Part (a) ensures that the registration code has not been tampered with, and part (b) ensures that the registration code is not duplicated. If the code fails either of these tests, registration is denied and the software cannot be installed or executed. This method is, no doubt, a very effective means to control piracy. However, if there is a pattern in the registration code (which is usually the case), a brute-force method can be used to understand that pattern and generate new codes that comply with it. Alternatively, the part of the software that requires activation can be bypassed through brute force to remove the online activation altogether (typically known as cracked software). Hence, this method can also be defeated. Additionally, it comes with the strict requirement that the customer be connected to the internet in order to install the software. Moreover, if there are any CAA outages, the customer is penalized by not being able to install and/or execute the software.
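To make the two CAA-side checks concrete, the following is a minimal Python sketch of the server-side logic under stated assumptions: the code format, the regular expression, and the in-memory set standing in for the CAA database are hypothetical placeholders for illustration, not part of any vendor's actual scheme.

import re

# Hypothetical stand-in for the CAA database of registration codes already
# in use; a real CAA would keep this in persistent server-side storage.
ACTIVATED_CODES = set()

# Assumed code format (five groups of five characters); the real pattern or
# checksum rule is vendor-specific and kept secret.
CODE_PATTERN = re.compile(r"^[A-Z0-9]{5}(-[A-Z0-9]{5}){4}$")

def caa_activate(registration_code):
    """Return True if activation is granted, False otherwise."""
    # Check (a): known pattern / specific value -- detects tampered codes.
    if not CODE_PATTERN.match(registration_code):
        return False
    # Check (b): compare against registered codes -- detects duplicated codes.
    if registration_code in ACTIVATED_CODES:
        return False
    ACTIVATED_CODES.add(registration_code)
    return True

print(caa_activate("ABCDE-12345-FGHIJ-67890-KLMNO"))   # True on first use
print(caa_activate("ABCDE-12345-FGHIJ-67890-KLMNO"))   # False: duplicate code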
2.3 Using media protection as an anti-piracy technique
Media protection is a technique that provides (a) copy protection, which prevents users from using copied software or software emulated through a virtual drive, and (b) hack protection, which prevents memory lifts and automated circumvention attacks that are used to create unprotected executable versions. This is achieved through specialized software and/or by introducing random errors into the media in order to make the copy action fail. For example, in [14], this method is implemented by encrypting the program executable in an access control wrapper which in turn requires authentication against its signature. Another implementation of this technology is [19], where optical disc cloning is prevented by adding a unique non-copyable electronic key code on the optical disc during the glass mastering process. Although this technique has been very successful, there are some issues associated with it: the need for support from manufacturers, higher manufacturing cost, and (in some cases) compatibility issues. For example, [19] requires a specialized manufacturing process because the electronic key code is imprinted during glass mastering, and [14] comes with compatibility problems and cannot be used with all operating systems.
2.4 Using hardware-based protection as an anti-piracy technique Hardware-based protection has proven to be the most effective technique to date for stopping piracy. In this method, the software vendor supplies a piece of hardware along with the software product. This device contains an unalterable, hard-coded registration code, and the software requires the device to be plugged in, in order to be installed and executed. Although this method is the most secure, it is not the best from the user's convenience point of view. The hardware device has to be plugged in at all times in order for the software to execute. If a customer needs to use multiple software products, all of which employ a similar protection technique, the work environment becomes very messy. Moreover, since these devices are plugged into a computer through a USB, serial, or parallel port, the maximum number of such software products that a customer can use simultaneously is limited by the number of ports (USB + serial + parallel) available on her computer. Additionally, the cost of developing the hardware device is high and can be justified only if the price of the software is very high. Typically, this method is used where the price of the software is above a few thousand US dollars. For example, Model Technology [16] employs this method to protect its software for computer-aided design of integrated circuits.
2.5 Using re-writable optical discs and unique keys as an anti-piracy technique This is a recently patented method [5] in which the inventors propose to use a re-writable optical disc with a physical and a logical key to achieve piracy protection. At the time of manufacturing the disc, a physical key Kphysical is impressed in the ATIP (Absolute Time In Pregroove) signal [4] of the disc. A logical key Klogical corresponding to the physical key Kphysical is associated with each disc. Using Kphysical and Klogical, a new key is generated which is used to encrypt the software contained on the disc. To read the software on the disc, both keys, Kphysical and Klogical, are required. The software creates a decryption key Kdecryption through some combination of Kphysical and Klogical and uses it to decrypt the software contained on the disc. If either key is incorrect, there will be a mismatch and Kdecryption will be an incorrect key. Since the key Kphysical is imprinted in the ATIP signal at the time of manufacture (and cannot be reproduced), such a disc cannot be cloned through bit-by-bit copying. This method offers a very high level of software protection, but requires support from the manufacturer.
3. Proposed model: using a Universal Serial Bus memory stick to prevent piracy Our model for attacking software piracy is based on hardware and software support through the use of a flash memory device and a microprocessor. With the advent of economical and fast flash memory devices and microprocessors, we believe this combination should be easy to implement. Our model executes in two steps: for any software that needs to be installed, it first checks whether the software is present on an authentic hardware device, and second, it checks whether the software is currently installed on another machine. Fig. 1 shows the flowchart of the model for a single license.
Fig.1. Flowchart for the proposed model (single license case)
The flow can be divided into two parts: (a) the part that performs the hardware check, and (b) the part that performs the software check. Part (a) ensures that the software is not a bit-by-bit copy, and part (b) ensures that the software is not currently installed on any other machine. In order to understand the model better, we now raise some questions and arguments, and try to answer them. The first question is: how can authentic software be distinguished from copied software in part (a) of the flow chart? The answer is straightforward. Assign a unique hardware ID to each hardware device, and pass the device ID on to the software. Upon a request for software installation, the ID in the software can be compared to the actual hardware device ID; any non-copied software should pass this test. However, an argument may arise here: if the device ID is spoofed to the software, this scheme can be defeated. There is a simple solution to this problem: implement a gateway and make it the only means of communication between the device and the external world.
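The hardware check just described, part (a) of the flow, can be illustrated with a minimal Python sketch. The class and method names are hypothetical, and a real gateway would run on the device itself rather than as host-side code.

class Gateway:
    """Hypothetical single gateway in front of the protected device.

    Every request from the host passes through this object and nothing else
    can read the device, so the ID it reports cannot be spoofed from outside.
    """

    def __init__(self, device_id, software_image):
        self._device_id = device_id            # unique ID imprinted at manufacture
        self._software_image = software_image  # protected software stored on the device

    def request_install(self, embedded_id):
        # Part (a) of the flow: the ID carried inside the software must match
        # the ID of the device it is being read from.
        if embedded_id != self._device_id:
            raise PermissionError("software is not on its designated device")
        return self._software_image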
No outside software should be able to access the device information or the software as such. Since all requests are routed through the gateway, the device ID cannot be spoofed. This scheme is shown in Fig. 2.
Fig.2. Single Gateway method (Prevents device ID spoofing)
The second question is: how can it be detected whether an installation is fresh or not (part (b))? In other words, how can it be detected whether the software is currently installed elsewhere? To detect this, it is sufficient to know the state of the software, by which we mean information pertaining to an earlier installation. After each installation, copy the MAC address of the computer to a specific location (LocationState) in the device. Upon each un-installation, reset the value at LocationState to a known value such as 0. Since a value is present at LocationState only between installation and un-installation, at each request for an installation the value at LocationState can be compared to the reset value to see whether the software is currently installed elsewhere. This raises two arguments: first, the MAC address can be spoofed, and second, the location can be reset externally by a malicious attacker. We first handle the second argument, which automatically makes the first one moot. Since all access to the device goes through a single gateway (Fig. 2), the LocationState where this value is written is not accessible to malicious attackers. And since LocationState is not accessible to attackers, it does not matter whether the MAC address is original or spoofed, because in either case there will be only one installation at any given point in time. At this point, it is safe to say that this model works against both bit-by-bit copying and illegal licensing. In the next subsection, we propose an implementation of the model.
3.1 Proposed implementation of the model In this model, we make the following assumptions: (a) a rewritable device/memory with a permanent hardware ID to store the software and the state information, and (b) an intelligent gateway to identify access restrictions and deny illegitimate accesses, communicate with the external world to install and uninstall the software, and load and store the state of the software. These two assumptions can easily be realized through (a) a flash memory with a hardware ID imprinted during manufacture, for the rewritable device, and (b) a microprocessor such as a PIC or x86, for the gateway. The microprocessor needs to implement the following: (a) the USB protocol to communicate with the computer, (b) load/store functionality for the specific location that stores the installation state, and (c) reading the software from its location and sending it out on the wire. The setup is shown in Fig. 3 below. When this setup is plugged into a computer, the computer detects a USB device and tries to communicate with it. The microprocessor responds to the USB signals from the computer and asks the user to allow it to install or uninstall the application. Once the user issues a command for installation, the microprocessor checks for copied software and duplicate licensing; if both checks pass, it reads the software from the flash memory, installs the software, and, upon completion, records the state. It is essential to record the state at the end to avoid phantom installations, where the installation state gets changed but the actual installation fails. Similarly, if the user issues a command for un-installation, the microprocessor simply uninstalls the application from the computer and resets the installation state of the software.
Fig.3. Proposed implementation of the model
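The install/uninstall behaviour described in Section 3.1 can be summarised in a short Python sketch. This is an illustration only: the field layout, names, and reset value are assumptions, and actual firmware would be implemented on the PIC/x86 gateway itself.

UNINSTALLED = b"\x00" * 6   # assumed reset value stored at LocationState

class TokenFirmware:
    """Sketch of the gateway microprocessor's install/uninstall logic."""

    def __init__(self, device_id, image):
        self.device_id = device_id         # permanent hardware ID
        self.image = image                 # software stored in flash
        self.location_state = UNINSTALLED  # MAC of host with an active install

    def install(self, host_mac, embedded_id, install_on_host):
        if embedded_id != self.device_id:
            return "denied: copied software"               # hardware check (a)
        if self.location_state != UNINSTALLED:
            return "denied: already installed elsewhere"   # software check (b)
        install_on_host(self.image)        # may raise if the install fails
        # Record the state only after a successful install to avoid
        # "phantom installations" where the state changes but the
        # installation itself failed.
        self.location_state = host_mac
        return "installed"

    def uninstall(self, uninstall_on_host):
        uninstall_on_host()
        self.location_state = UNINSTALLED
        return "uninstalled"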
4. Conclusion and future research We presented a robust model and showed that it works against software piracy attacks through a combination of hardware and software techniques. The proposed model prevents casual copiers from bit-by-bit copying and also from illegal licensing. The model is explained in this paper for the case of a single license, but it is very straightforward to extend it to multiple licenses. This can be achieved by programming the microprocessor to handle multiple licenses instead of a single license during the store and check operations. If a computer is replaced without uninstalling the old installation, this model will not allow the software to be installed on the new computer; in such a situation, the user will have to obtain a fresh setup from the software vendor after appropriate verification. The only downside of our model, to our understanding, is that it requires support from a hardware manufacturer, but given that hardware development is getting less expensive every day, once the process is established it should be easy to manufacture. Currently, the lack of hardware manufacturer support has prevented us from implementing our model, but our future research will be to find alternative ways to develop a prototype of this model.
References
[1] Third Annual BSA and IDC Global Software Piracy Study, May 2006.
[2] R. Akiyama, M. Yoshioka, and Y. Uchida. Software copying system. Sept. 1998. US Patent no. 5805699.
[3] Alladin. HASP HL Hardware Key with 128-bit AES Encryption. http://www.alladin.com/HASP/HaspHL.asp.
[4] Andy McFadden. CD-Recordable FAQ, Section 2. http://www.cdrfaq.org.
[5] J. A. Barnard, M. A. Inchalik, and B. L. Ha. Copy protection using multiple security levels on a programmable CD-ROM. June 2006. US Patent no. 7057993.
[6] A. N. Chandra, L. D. Comerford, and S. R. White. Implementing a shared higher level of privilege on personal computers for copy protection of software. Feb. 1987. US Patent no. 4644493.
[7] J. P. DeMont. Method and apparatus for distributing information products. Nov. 1999. US Patent no. 5982889.
[8] B. A. Fite, M. L. Mitchell, R. A. Kunz, and C. R. Brannon. CD-ROM with machine-readable I.D. code. Mar. 1995. US Patent no. 5400319.
[9] B. A. Fite, M. L. Mitchell, R. A. Kunz, and C. R. Brannon. Using defect read from a disk to represent a machine readable code. Sept. 1998. US Patent no. 5805549.
[10] T. Hasebe, R. Akiyama, and M. Yoshioka. Storage medium for preventing illegal use by third party. Sept. 1996. US Patent no. 5555304.
[11] R. S. Indeck, W. Marcel, L. George, and A. L. Hege. Method and apparatus for improved fingerprinting and authenticating various magnetic media. Apr. 1998. US Patent no. 5740244.
[12] H. Kanamaru. Information recording method and apparatus, function recording method and apparatus, and information reproducing method and apparatus. Aug. 1999. US Patent no. 5940505.
[13] Y. Kanzaki, A. Monden, M. Nakamura, and K. Matsumoto. A Software Protection Method Based on Instruction Camouflage. Electronics and Communications in Japan, 89(1), 2006. Translated from: Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J87-A, No. 6, June 2004, pp. 755-767.
[14] Macrovision. Game Disc Protection. http://www.macrovision.com.
[15] MAI Digital Security. USB Port Dongle. http://www.keylok.com/usb dongle.html.
[16] Mentor Graphics. ModelSim: a comprehensive simulation and debug environment for complex ASIC and FPGA designs. http://www.model.com/.
[17] C. H. O'Connor and J. J. Pearce. Method of securing CD-ROM data for retrieval by one machine. Apr. 1998. US Patent no. 5745568.
[18] Rockey. ROCKEY4 Hardware Based Security. http://www.rockey.com.my.
[19] Sony. Copy Control for CD-ROM, with SecuROM. http://www.sonydadc.com/.
[20] Fourth Annual BSA and IDC Global Software Piracy Study, 2007.
Performance Enhancement of CAST-128 Algorithm by Modifying Its Function Krishnamurthy G.N, Ramaswamy V, Leela G.H, Ashalatha M.E, Bapuji Institute of Engineering and Technology, Davangere-577 004, India. Abstract-There has been tremendous progress in the field of cryptography, which manipulates plaintext so that it becomes unreadable and less prone to hackers and crackers. In this regard, we have made an attempt to modify the CAST-128 algorithm, a secret-key block cipher, so as to enhance performance by modifying its function; the result is not only secure but also reduces the total time taken for encryption and decryption. This paper attempts to improve performance without violating the memory requirements, security, and simplicity of the existing CAST-128 algorithm. The proposed modification is limited to a change in the implementation of the function F of CAST-128's Feistel network. Because the change in the total time taken for encryption and decryption cannot be observed in a software implementation, we have implemented a VHDL application to show the difference in delay.
Key words: Cryptography; Plaintext; Ciphertext; Encryption; Decryption; Secret-key; Feistel network; P-array; S-box; Function. I. INTRODUCTION CAST-128 [3] is a symmetric encryption algorithm developed using the CAST design procedure by Carlisle Adams and Stafford Tavares. CAST has a classical Feistel network with 16 rounds. It operates on 64-bit blocks of plaintext to produce 64-bit blocks of ciphertext. The key size varies from 40 bits to 128 bits in 8-bit increments. In this paper, we have tried to improve the existing CAST-128 algorithm by modifying its function (F) through parallel evaluation of two operations. The parallel execution is efficient, requiring only 66.66% of the time taken by the original function. As the algorithm uses 16 iterations, this time is saved 16 times for every encryption/decryption.
Encryption CAST-128 is a Feistel network consisting of 16 rounds (Fig. 1). The input is a 64-bit data element. The plaintext is divided into two 32-bit halves, L0 and R0. We use the variables Li and Ri to refer to the left and right halves of the data after round i is completed. The ciphertext is formed by swapping the output of the sixteenth round; in other words, the ciphertext is the concatenation of R16 and L16.
L0 || R0 = Plaintext
For i = 1 to 16 do
  Li = Ri-1;
  Ri = Li-1 XOR Fi[Ri-1, Kmi, Kri];
Ciphertext = R16 || L16
The function F uses four 8 x 32 S-boxes, the left circular rotation function, and four operations that vary depending on the round number, which we label f1i, f2i, f3i and f4i. We use I to refer to the intermediate 32-bit value after the left circular rotation, and the labels Ia, Ib, Ic and Id to refer to the 4 bytes of I, where Ia is the most significant and Id the least significant byte. With these conventions, the function F is defined as shown in TABLE I:
CAST employs two subkeys in each round, namely a 32-bit masking subkey (Kmi) and a 5-bit rotate subkey (Kri). The function F depends on the round. CAST-128 has the structure of a classical Feistel network with 16 rounds of operation; it is a variable-length-key, 64-bit block cipher. The algorithm consists of two parts: a subkey generation part and a data encryption part. CAST-128 uses four primitive operations: addition (+) and subtraction (-) using modulo 2^32 arithmetic, bitwise exclusive-OR (^), and left circular rotation (<<<).
Fig. 1. CAST-128 Original Encryption Scheme.
TABLE I
DEFINITION OF ORIGINAL FUNCTION
Rounds 1,4,7,10,13,16:  I = ((Kmi + Ri-1) <<< Kri);  F = ((S1[Ia] ^ S2[Ib]) - S3[Ic]) + S4[Id]
Rounds 2,5,8,11,14:     I = ((Kmi ^ Ri-1) <<< Kri);  F = ((S1[Ia] - S2[Ib]) + S3[Ic]) ^ S4[Id]
Rounds 3,6,9,12,15:     I = ((Kmi - Ri-1) <<< Kri);  F = ((S1[Ia] + S2[Ib]) ^ S3[Ic]) - S4[Id]
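For readers who prefer code to tables, the following Python sketch implements the original round function F of TABLE I together with the 16-round encryption loop given earlier. It is an illustration rather than a reference implementation: the S-box contents and the subkey schedule are omitted and are assumed to be supplied by the caller (Km and Kr as 1-indexed lists).

MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    # left circular rotation of a 32-bit word; only the low 5 bits of n are used
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32

def bytes_of(i32):
    """Split a 32-bit word into Ia..Id, Ia being the most significant byte."""
    return (i32 >> 24) & 0xFF, (i32 >> 16) & 0xFF, (i32 >> 8) & 0xFF, i32 & 0xFF

def f_original(round_no, r, km, kr, S1, S2, S3, S4):
    """Original CAST-128 round function as listed in TABLE I."""
    if round_no % 3 == 1:       # rounds 1,4,7,10,13,16
        i = rotl32((km + r) & MASK32, kr)
        ia, ib, ic, id_ = bytes_of(i)
        return (((S1[ia] ^ S2[ib]) - S3[ic]) + S4[id_]) & MASK32
    elif round_no % 3 == 2:     # rounds 2,5,8,11,14
        i = rotl32(km ^ r, kr)
        ia, ib, ic, id_ = bytes_of(i)
        return (((S1[ia] - S2[ib]) + S3[ic]) ^ S4[id_]) & MASK32
    else:                       # rounds 3,6,9,12,15
        i = rotl32((km - r) & MASK32, kr)
        ia, ib, ic, id_ = bytes_of(i)
        return (((S1[ia] + S2[ib]) ^ S3[ic]) - S4[id_]) & MASK32

def encrypt_block(l0, r0, Km, Kr, S1, S2, S3, S4):
    """16-round Feistel loop from the Encryption section (Km, Kr 1-indexed)."""
    l, r = l0, r0
    for i in range(1, 17):
        l, r = r, l ^ f_original(i, r, Km[i], Kr[i], S1, S2, S3, S4)
    return r, l                 # ciphertext = R16 || L16 (halves swapped)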
Substitution Boxes
There are eight 32-bit S-boxes with 256 entries each: S1[0], S1[1], ..., S1[255]; S2[0], S2[1], ..., S2[255]; S3[0], S3[1], ..., S3[255]; S4[0], S4[1], ..., S4[255]; S5[0], S5[1], ..., S5[255]; S6[0], S6[1], ..., S6[255]; S7[0], S7[1], ..., S7[255]; S8[0], S8[1], ..., S8[255]. Four of these, namely S-box 1 through S-box 4, are used in the encryption and decryption process. The remaining four, S-box 5 through S-box 8, are used in subkey generation. Each S-box is an array of 32 columns by 256 rows; the 8-bit input selects a row in the array, and the 32-bit value in that row is the output. All of the S-boxes contain fixed values.
Generating the Subkeys
Subkey generation is a complex process. To begin, label the bytes of the 128-bit key as follows: x0x1x2x3x4x5x6x7x8x9xAxBxCxDxExF, where x0 represents the most significant byte and xF the least significant byte. Also use the following definitions:
• Km1, ..., Km16: sixteen 32-bit masking subkeys.
• Kr1, ..., Kr16: sixteen 32-bit rotate subkeys, of which only the least significant 5 bits of each are used.
• z0, ..., zF: intermediate (temporary) bytes.
• K1, ..., K32: intermediate (temporary) 32-bit words.
The values K1 through K32 are calculated from the key using S-boxes 5 through 8. Then the subkeys are defined by:
for i = 1 to 16 do
  Kmi = Ki;
  Kri = K16+i;
Fig. 2. CAST-128 Original F function.
Decryption
Decryption for CAST-128 [4] is relatively straightforward. Decryption works in the same algorithmic direction as encryption, beginning with the ciphertext as input; however, as expected, the subkeys are used in reverse.
Function F
The function F is designed to have good confusion, diffusion and avalanche properties. It uses S-box substitutions, modulo 2^32 addition and subtraction, exclusive-OR operations and key-dependent rotation. The strength of the F function is based primarily on the strength of the S-boxes, but the further use of arithmetic, Boolean and rotate operations adds to its strength.
II. PROPOSED MODIFICATION
Without violating the security requirements, the CAST-128 function F can be modified as shown in TABLE II below.
TABLE II
DEFINITION OF MODIFIED FUNCTION
Rounds 1,4,7,10,13,16:  I = ((Kmi + Ri-1) <<< Kri);  F` = (S1[Ia] ^ S2[Ib]) - (S3[Ic] + S4[Id])
Rounds 2,5,8,11,14:     I = ((Kmi ^ Ri-1) <<< Kri);  F` = (S1[Ia] - S2[Ib]) + (S3[Ic] ^ S4[Id])
Rounds 3,6,9,12,15:     I = ((Kmi - Ri-1) <<< Kri);  F` = (S1[Ia] + S2[Ib]) ^ (S3[Ic] - S4[Id])
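A corresponding Python sketch of the modified function F` of TABLE II is given below; it reuses rotl32 and bytes_of from the previous sketch. The left/right intermediate names are only there to highlight the two independent S-box combinations that can be evaluated in parallel.

def f_modified(round_no, r, km, kr, S1, S2, S3, S4):
    """Modified round function F` per TABLE II."""
    if round_no % 3 == 1:       # rounds 1,4,7,10,13,16
        i = rotl32((km + r) & MASK32, kr)
        ia, ib, ic, id_ = bytes_of(i)
        # the two halves are independent and can be computed in parallel
        left, right = S1[ia] ^ S2[ib], (S3[ic] + S4[id_]) & MASK32
        return (left - right) & MASK32
    elif round_no % 3 == 2:     # rounds 2,5,8,11,14
        i = rotl32(km ^ r, kr)
        ia, ib, ic, id_ = bytes_of(i)
        left, right = (S1[ia] - S2[ib]) & MASK32, S3[ic] ^ S4[id_]
        return (left + right) & MASK32
    else:                       # rounds 3,6,9,12,15
        i = rotl32((km - r) & MASK32, kr)
        ia, ib, ic, id_ = bytes_of(i)
        left, right = (S1[ia] + S2[ib]) & MASK32, (S3[ic] - S4[id_]) & MASK32
        return (left ^ right) & MASK32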
Fig. 3. CAST-128 Modified Encryption Scheme.
This modification (Fig. 3) supports the parallel evaluation of two operations, for example by using threads. Because the two inner S-box combinations are independent, the time needed for those two operations is reduced to the time of one, and as the algorithm uses 16 iterations, this saving is repeated 16 times for every encryption/decryption. This is a considerable improvement. Also, since the security of CAST-128 lies in its use of a variable key, this modification does not make the algorithm vulnerable in any way that would ease cryptanalysis. However, true parallelism cannot be achieved on a uniprocessor system, so the effect of the modification can be seen only on a multiprocessor system with at least two processors, and the modified function is best suited to a hardware implementation of the algorithm. In the hardware implementation, the modified function F (Fig. 4) requires only two levels of computation, whereas the original function F (Fig. 2) requires three levels.
Fig. 4. CAST-128 Modified F function.
The generation of subkeys is done in the same way as in the existing CAST-128 algorithm, i.e., the same eight S-boxes with 256 entries each are used to generate the subkeys.
III. SAMPLE WAVEFORMS AND RESULTS ANALYSIS The following simulation diagram (Fig. 5) shows the time required to execute the function F of the existing CAST-128 algorithm, as marked by the two thick lines: 55 ps - 5 ps = 50 ps.
The above modification does not require any change to be made in the original algorithm. The original algorithm works fine with the modified function for both encryption and decryption. The same four primitives, namely addition, subtraction, bitwise exclusive-OR and left circular rotation, are used in the modified function.
Fig. 5. Waveform for CAST-128 Original Function F
Fig. 6. Waveform for CAST-128 Modified Function F
Fig. 6 shows the simulation diagram for the time required to execute the modified function F of CAST-128, as marked by the two thick lines: 45 ps - 5 ps = 40 ps. The ratio of the time taken by the modified function to that of the existing function is 40/50 = 0.8; hence we have a 20% improvement in the performance of the function.
Fig. 7 shows the time taken to execute the CAST-128 algorithm with the original function. The time required for encryption is shown between the first two thick lines and the time taken for decryption between the last two thick lines. The time taken for encryption = 1285 - 0 = 1285 ps; the time taken for decryption = 4085 - 2800 = 1285 ps, which is the same as the time taken for encryption.
Fig. 7. Waveform for original CAST-128 Algorithm
Fig. 8 shows the time taken to execute the CAST-128 algorithm with the modified function, laid out in the same way. The time taken for encryption = 1125 - 0 = 1125 ps, and the time taken for decryption = 2625 - 1500 = 1125 ps (the same as encryption).
Fig. 8. Waveform for CAST-128 Algorithm with Modified Function
The ratio of time taken by the modified CAST-128 algorithm to that of the existing algorithm is 1125/1285 = 0.875; we thus have a 12.5% improvement in overall performance.
IV. CONCLUSION It is found that the modified algorithm increases the performance of the CAST-128 algorithm by reducing the total time from 1285 ps to 1125 ps. Using the VHDL implementation, it is observed that the reduction in time achieved for encryption and decryption is about 12.5% compared to the existing algorithm. The results are shown in the sample waveforms and results analysis.
REFERENCES
[1] Bruce Schneier, "Applied Cryptography: Protocols, Algorithms, and Source Code in C," New York: Wiley, 1996.
[2] William Stallings, "Cryptography and Network Security," 3rd ed., Pearson Education, 2003.
[3] Adams C, "The CAST-128 Encryption Algorithm," RFC 2144, May 1997.
[4] Krishnamurthy G N, V. Ramaswamy, Leela G H, "Performance Enhancement of Blowfish Algorithm by Modifying its Function," Proceedings, International Conference on CISSE, University of Bridgeport, USA, 2006.
[5] B. Schneier, "Description of a New Variable-Length Key, 64-Bit Block Cipher (Blowfish)," Fast Software Encryption, Cambridge Security Workshop Proceedings (December 1993), Springer-Verlag, 1994, pp. 191-204.
[6] Anne Canteaut (Editor), "Ongoing Research Areas in Symmetric Cryptography," ECRYPT, 2006.
[7] Gilles-Francois Piret, Block Ciphers: Security Proofs, Cryptanalysis, Design, and Fault Attacks, Ph.D. Thesis, UCL Crypto Group, 2005.
A Concatenative Synthesis Based Speech Synthesiser for Hindi Kshitij Gupta IIM Ahmedabad [email protected]
Abstract- This document presents a speech synthesis system based upon hidden Markov models and decision trees. The development follows the approach generally implemented for speech recognition systems. A front-end analysis of the database is done and the derived feature vectors are subjected to phonetic hidden Markov models, which are then clustered based upon a decision-tree approach that models the context of the phones being considered. The individual phone segments form an acoustic leaf sequence which divides the various contexts into equivalence classes. During synthesis, the phone sequence to be synthesised is translated into an acoustic leaf sequence by posing the questions associated with different nodes of the decision tree to the immediate context of each phone. The sequence of terminal nodes thus obtained is amalgamated to obtain the desired pronunciation.
I. INTRODUCTION
Speech synthesis has been developed along two different lines: a formant rule-based approach and a concatenation approach. Both forms have their advantages; however, for complex languages like Hindi, where it is difficult to formulate rules for each and every contextual consideration, the concatenation alternative is the better choice. Earlier research in the area of concatenative speech synthesis focussed primarily on manual segmentation and alignment of speech data. However, with time and further research, the applicability of speech recognition techniques to the reverse process has proved to be advantageous [1]. Statistical models in the form of hidden Markov models are not only automated but also trainable, allowing training directly on natural speech rather than on carefully crafted nonsense contexts for different phones. The use of hidden Markov models is further complemented by the incorporation of decision-tree-based clustering, which allows appropriate modelling of different contexts. Another possibility would have been the use of Gaussian Mixture Models (GMM); however, that approach is more suited to speech recognition, since such mixture models cannot be labelled with particular contexts, an aspect more important for speech synthesis than for recognition. Thus, HMMs coupled with decision trees form the basis of the proposed speech synthesis system. Although many efforts are being made nowadays to develop efficient speech synthesis systems, the HP system [6] and the IBM system [2] are notable amongst the present generation of speech synthesis software. The remainder of the document is structured as follows. Section II considers the Hindi phone set and relates important aspects of
the language for speech synthesis. The pre-processing of acoustic signals is considered in Section III, followed in Section IV by an explanation of the application of HMMs to the proposed approach. Section V explores the decision tree construction and the associated evaluation function. Lastly, Section VI concludes the proposal with suggestions for further research.
II. THE HINDI PHONE SET
The Hindi language is characteristically phonetic, since the written script follows the phonetic associations closely. Unlike in English, certain consonants in Hindi, especially stressed plosives, are a combination of multiple phones, e.g. /ddh/ and /dhh/. These need to be modelled separately, as they cannot be accounted for by simply bootstrapping the English phone set [3]. Another issue is the inclusion of the schwa. The Hindi language comprises 33 consonants, which can be used as full or half characters. While certain approaches treat them as a single case, we differentiate between the two on the basis of schwa inclusion. Half characters are represented by the simple phones, the schwa is the phone /a/, and full characters are simply a combination of the two, that is, the usual phone with a trailing schwa. However, when considering words, the inclusion becomes a totally different issue and certain considerations need to be made, which may be summarised in the following inherent vowel (schwa) suppression (IVS) rules, illustrated in the sketch below:
• No two successive characters undergo IVS.
• Characters present in the first position never undergo IVS.
• Last characters of a word always undergo IVS.
• Characters present in the middle of a word undergo IVS if the next character is not the last character or does not have a vowel other than /a/ associated with it.
IVS refers to the condition where the trailing schwa need not be pronounced. An example of a word following the above rules is /bh//aa//r//a//t/. Though rules may be defined to account for general behaviour as above, anomalies may exist that cannot be modelled through fixed rules. Thus, a pure rule-based approach for the Hindi language is unfeasible. Such anomalies may be dealt with through statistical learning methods such as HMMs. An instance of such an anomaly is the use of /sh/ and /ssh/, which cannot be modelled through rules. To this effect, concatenating individual utterances of the two wherever they appear is more credible than trying to formulate rules.
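The following Python sketch illustrates the IVS rules listed above. It is purely illustrative: the data representation (a list of consonant/vowel pairs) is an assumption, and the fourth rule is interpreted in the way that reproduces the /bh//aa//r//a//t/ example given in the text.

def apply_ivs(word):
    """Drop the inherent schwa where the stated IVS rules suppress it.

    `word` is a list of (consonant, vowel) pairs as written, where the vowel
    is '/a/' for the inherent schwa.
    """
    n = len(word)
    suppressed = [False] * n
    for i, (cons, vowel) in enumerate(word):
        if vowel != '/a/':
            continue                      # only the inherent schwa is ever suppressed
        if i == 0:
            continue                      # first characters never undergo IVS
        if i == n - 1:
            suppressed[i] = True          # last characters always undergo IVS
            continue
        next_vowel = word[i + 1][1]
        # no two successive characters undergo IVS; a middle character is
        # suppressed when the next character is not the last one or carries
        # a vowel other than /a/ (the reading consistent with the example)
        if not suppressed[i - 1] and (i + 1 != n - 1 or next_vowel != '/a/'):
            suppressed[i] = True
    phones = []
    for (cons, vowel), drop in zip(word, suppressed):
        phones.append(cons)
        if not drop:
            phones.append(vowel)
    return phones

# the written form of "Bharat" as consonant/vowel pairs
print(apply_ivs([('/bh/', '/aa/'), ('/r/', '/a/'), ('/t/', '/a/')]))
# -> ['/bh/', '/aa/', '/r/', '/a/', '/t/']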
Thus, the use of concatenation for Hindi is justified, and the issue of the schwa is illustrated as a point of contention when formulating the hidden Markov models and the decision tree. The Hindi phone set is presented in Fig. 1.
Fig. 1. The Hindi Phone Set
III. THE PREPROCESSING MODULE
A sampling frequency of 44.1 kHz is used with a pre-emphasis filter 1 - 0.97z^{-1}. The waveform is partitioned into frames, each frame constituting a 20 ms span, that is, 882 samples. A 5 ms overlap is maintained for consecutive frames. Smoothing is performed through application of a 20 ms Hamming window before the signal is subjected to a Fast Fourier Transform (FFT) to analyse it in the frequency domain. Spectral analysis reveals those speech signal features which are mainly due to the shape of the vocal tract. Spectral features of speech are generally obtained as the output of filter banks, which properly integrate a spectrum at defined frequency ranges. The most commonly used spectral analysis measure is the Mel Frequency Cepstral Coefficients (MFCC) [3], which can be applied as follows:
\text{MelFrequency} = 2595 \times \log\left(1 + \frac{\text{linear frequency}}{700}\right)    (1)
Acoustic modeling assumes that each acoustic vector is uncorrelated with its predecessors and successors. Since speech signals are continuous, this assumption is problematic. The traditional solution is to augment the cepstral vector with its first and second differentials. Since the Mel cepstral vector is 13 elements long, after appending the differentials and one energy component, the final acoustic vector becomes 40 elements in length. This concludes the pre-processing of the acoustic signal.
IV. APPLICATION OF HMM
Markov models represent a stochastic process whose output is a set of states, where each state corresponds to a physical (observable) event. When extended to the case where the observation is a probabilistic function of the state, the resulting process is a doubly embedded stochastic process with an underlying stochastic process that is not observable (hidden), but can only be observed through another set of stochastic processes that produce the sequence of observations; such hidden Markov models (HMMs) may be used for speech synthesis and recognition. Unlike speech recognition, where all three problems associated with HMMs need to be resolved (initialisation, training and finally recognition), speech synthesis only requires the training phase [4]. Given a model \lambda, the Baum-Welch (BW) algorithm estimates a new model \hat{\lambda} for which
L(D \mid \hat{\lambda}) \ge L(D \mid \lambda)    (2)
with equality occurring when the likelihood has reached a maximum. The weights used by the BW algorithm are a posteriori probabilities of each training vector being observed when the model is in each state, computed using the old model \lambda. The BW formulae for the new state mean vectors and covariance matrices are:
\hat{\mu}_j = \frac{\sum_{r=1}^{R}\sum_{t=1}^{T_r} \gamma_j^r(t)\, o_t^r}{\sum_{r=1}^{R}\sum_{t=1}^{T_r} \gamma_j^r(t)}    (3)
\hat{\Sigma}_j = \frac{\sum_{r=1}^{R}\sum_{t=1}^{T_r} \gamma_j^r(t)\,(o_t^r - \mu_j)(o_t^r - \mu_j)'}{\sum_{r=1}^{R}\sum_{t=1}^{T_r} \gamma_j^r(t)}    (4)
where \gamma_j^r(t) is the a posteriori probability of the HMM being in state j when generating the t-th vector in the r-th observation sequence. Also, \gamma_j^r(t) can be computed in terms of the forward (\alpha) and backward (\beta) variables as:
\gamma_j(t) = \frac{L(O, x(t) = j \mid \lambda)}{L(O \mid \lambda)} = \frac{\alpha_j(t)\,\beta_j(t)}{L(O \mid \lambda)}    (5)
where
\alpha_j(t) = L(o_1, \ldots, o_t, x(t) = j \mid \lambda), \qquad \beta_j(t) = L(o_{t+1}, \ldots, o_T \mid x(t) = j, \lambda)
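As an illustration of how the posteriors in (5) are obtained, the following is a minimal, unscaled forward-backward sketch in Python/NumPy for a discrete-output HMM; for the Gaussian observation densities used here one would replace the discrete emission lookup with the per-state likelihood of each feature vector, and a practical implementation would rescale or work in the log domain. All names are assumptions made for illustration.

import numpy as np

def gamma_posteriors(A, B, pi, obs):
    """State posteriors gamma_j(t) used as Baum-Welch weights.

    A   : (N, N) state transition probabilities
    B   : (N, M) emission probabilities over a discrete label alphabet
    pi  : (N,)   initial state distribution
    obs : list of T observed label indices
    """
    N, T = len(pi), len(obs)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))

    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]   # forward recursion

    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])  # backward recursion

    likelihood = alpha[T - 1].sum()          # L(O | lambda)
    gamma = alpha * beta / likelihood        # gamma_j(t) = alpha_j(t) beta_j(t) / L(O|lambda)
    return gamma, likelihood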
Thus, trained HMMs are obtained for each phone model. At this stage it is important to note that a few considerations need to be made for the Hindi language while formulating the models. The issue of full and half characters needs to be addressed here: either model half characters and a separate schwa and combine the two for full characters, or devise separate models for full characters as well. For the proposed approach, either choice should work properly, since inherent vowel suppression needs to be considered while performing the phonetic transcription of words. The choice made governs the scope of the training procedure, since separate models for full and half characters rule out the requirement for a schwa model. It is proposed to use this latter approach, as concatenating a half character and a schwa to produce a full-character representation may not always be spectrally feasible and may result in artifacts in the acoustic signal that eventually cause data misrepresentation.
V. CONTEXT MODELING USING DECISION TREES
The pronunciation of a word or a phone depends on its immediate context, for which several phonological rules have been developed over the years. Though relevant to speech applications such as recognition, this approach is not very successful in the field of speech synthesis, as it relies too much on human perception rather than acoustic reality. Moreover, such rules only identify gross changes in context and are unable to model more subtle, feature-level changes in the speech data. Therefore, it would be useful if the contextual variations of different phones were represented through models that relate to the speech data at the feature level. In line with this suggestion, a decision-tree-based approach is presented in this paper. While research has been done in this area, it has primarily concentrated on building triphone models and modelling context in terms of the immediately preceding and succeeding phones. The proposed decision tree implementation allows flexibility in the number of preceding and succeeding phones that may be considered for context modelling [5].
A. Pre-processing for tree construction
The frames extracted from the speech data are labelled by a vector quantiser using a predetermined alphabet. The frames are used to train individual phonetic hidden Markov models. Using the trained models and the Viterbi alignment procedure, the labelled speech is then aligned against the phonetic base forms. For each aligned phone obtained from the above process, a data record of the following form is created:
• Identity of the current phone – p
• The context – 'k' succeeding and preceding phones – pi
• Label sequence 'y' aligned against the current phone
Thus, by partitioning the entire data set on the basis of the identity of the phone, several instances of each phone being spoken in different contexts are obtained.
B. Tree construction
The objective is to combine contexts into equivalence classes and build a model for each class; this division should result in equivalence classes consisting of contexts that produce similar label strings. An effective way of implementing this is through binary decision trees. To begin construction, all the collected instances of the phone are taken at the root node and are then divided into two subsets, which are further sub-divided into two, and so forth. The split at each node is based upon the answers to binary questions relating to the context of the phone being modelled, with the question chosen by an evaluation function based on a probabilistic measure of the homogeneity of a set of label strings. Splitting stops when either of two stopping criteria is met:
• The number of samples at a node falls below a certain threshold.
• The evaluation function value falls below a certain threshold.
During synthesis, given a phone and its context, the decision tree of that phone is invoked and the answers to the questions associated with each node determine the final leaf, which in turn determines the model to be used for concatenation.
C. Questions
The question set consists of questions of the form [Is Pi ∈ S?], where S is a subset of the set of all phones. The subset S can range from singleton sets consisting of simple phones to groupings based on acoustic classification, e.g. affricates, plosives, etc. The division into different questions depends on the language being considered as well as individual discretion. The rules governing inherent vowel suppression need to be implemented and incorporated into the question set; these rules can also be presented as binary questions, as devised above, to maintain uniformity of the question set. Hindi is a phonetic language, and the contextual detail considered here is limited to only 2 phones on either side of the phone under consideration.
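A small Python sketch of how such context questions and tree nodes might be represented is given below; the class names, the offset-indexed context dictionary, and the leaf model identifiers are hypothetical choices made for illustration only.

class Question:
    """A binary context question of the form 'Is p_i in S?'."""
    def __init__(self, offset, phone_set):
        self.offset = offset                  # e.g. -1 = preceding phone, +2 = second following phone
        self.phone_set = frozenset(phone_set)

    def ask(self, context):
        # `context` maps offsets to phones, e.g. {-2: '/b/', -1: '/aa/', 1: '/a/', 2: '/t/'}
        return context.get(self.offset) in self.phone_set

class Node:
    """Internal node of a phone's decision tree; leaves carry a model id."""
    def __init__(self, question=None, yes=None, no=None, leaf_model=None):
        self.question, self.yes, self.no, self.leaf_model = question, yes, no, leaf_model

    def route(self, context):
        # During synthesis, answer the questions down the tree until a leaf
        # (acoustic model used for concatenation) is reached.
        if self.leaf_model is not None:
            return self.leaf_model
        branch = self.yes if self.question.ask(context) else self.no
        return branch.route(context)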
D. Evaluation functions
As previously mentioned, 'y' can be considered a string of acoustic labels generated by the vector quantiser. If this string corresponds to a single phone, then it can safely be assumed that the labels in the sequence are independent of each other. This simplification enables us to employ an evaluation function that is much easier to compute than the standard hidden Markov model approach. The compromise made in the loss of contextual information is acceptable for the Hindi language, since the difference in pronunciations does not depend primarily on the actual phones in the immediate neighbourhood, and no definite rules defining such possibilities have been formulated. As a result of the labels being independent of each other, the order in which they occur in the string becomes inconsequential, and 'y' can
be fully characterised by the number of times each label in the acoustic alphabet occurs in that string. Thus, a histogram is constructed and associated with each string. This computational efficiency is well suited to the Hindi language, which is hardly rule-based where pronunciations are concerned. At a node in the decision tree, the associated question 'q' divides the data at the node into two disjoint subsets. The objective of the evaluation function is to determine the particular question that results in the purest split. If yl and yr denote the label strings at the left and right successor nodes respectively, and Ml and Mr are the associated models, the evaluation function may be represented as follows:
m(q, n) = \sum_{i=1}^{F} \left\{ N_l\, \mu_{li} \log \mu_{li} + N_r\, \mu_{ri} \log \mu_{ri} - N_n\, \mu_{ni} \log \mu_{ni} \right\}    (6)
where
N_l = number of label strings in the left child node,
N_r = number of label strings in the right child node,
N_n = total number of strings in the current node = N_l + N_r,
\mu_{li} = mean rate of the left successor node for the Poisson model,
\mu_{ri} = mean rate of the right successor node for the Poisson model,
\mu_{ni} = mean rate of the current node for the Poisson model.
The mean rates may be calculated as follows:
\mu_{ni} = \frac{1}{N} \sum_{y \in Y_n} y_i, \qquad \text{for } i = 1, 2, \ldots, F    (7)
where 'F' is the size of the acoustic alphabet.
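The following Python sketch shows how the evaluation function (6), with mean rates computed as in (7), could be used to pick the purest split at a node. It assumes each training instance has been reduced to a label-count histogram of length F, and it reuses the hypothetical Question objects from the earlier sketch; all names are illustrative.

import math

def mean_rates(histograms, F):
    """Mean Poisson rates mu_i over a set of label-count histograms (eq. 7)."""
    n = len(histograms)
    return [sum(h[i] for h in histograms) / n for i in range(F)]

def split_gain(left, right, F):
    """Evaluation function m(q, n) of eq. (6) for the split produced by a
    question: `left` and `right` are the histograms routed to the two child
    nodes. Terms with a zero rate are skipped (x log x -> 0)."""
    node = left + right
    def term(hists):
        N = len(hists)
        return sum(N * mu * math.log(mu) for mu in mean_rates(hists, F) if mu > 0)
    return term(left) + term(right) - term(node)

def best_question(questions, data, F):
    """Pick the question giving the purest split; `data` is a list of
    (context, histogram) pairs collected for the phone at this node."""
    best, best_gain = None, float("-inf")
    for q in questions:
        left = [h for ctx, h in data if q.ask(ctx)]
        right = [h for ctx, h in data if not q.ask(ctx)]
        if not left or not right:
            continue
        g = split_gain(left, right, F)
        if g > best_gain:
            best, best_gain = q, g
    return best, best_gain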
Thus, the evaluation function can easily be computed at every node using the Poisson model, as demonstrated above. When the word to be synthesised is decomposed into its constituent phones, the decision trees corresponding to the phones are traversed by posing the same questions, finally arriving at the most appropriate allophonic form of each phone given the context and the IVS rules.
VI. CONCLUSION AND FUTURE WORK
This document presents a basic skeleton for developing a TTS system for Hindi based on decision trees and HMMs. The benefit of such a system is that a real-life continuous speech database can be used for training after building the entire TTS, and this speech data itself can be used to provide concatenative segments to produce words that may not even feature in the dictionary provided for phonetic transcription during the training phase. Thus, the concatenative approach serves as a better alternative to the formant rule-based approach. Though the system proposed in this document is efficient, certain aspects of speech synthesis may be further incorporated to enhance its performance.
ACKNOWLEDGMENT
The author thanks the Computer Vision and Pattern Recognition Unit (CVPRU) of the Indian Statistical Institute (ISI), Calcutta for their support and Prof. B. B. Chaudhuri for his able guidance during the course of the research and implementation of the project.
REFERENCES
[1] Bulyko, I., Ostendorf, M., "The Impact of Speech Recognition on Speech Synthesis," in Proc. of the IEEE Workshop on Speech Synthesis, 11-13 Sept. 2002, pp. 99-106.
[2] Pitrelli, J. F., Bakis, R., Eide, E. M., Fernandez, R., Hamza, W., Picheny, M. A., "The IBM Expressive Text-to-Speech Synthesis System for American English," IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 4, pp. 1099-1108, July 2006.
[3] Kumar, M., Rajput, N., Verma, A., "A Large Vocabulary Continuous Speech Recognition System for Hindi," IBM J. Res. & Dev., 48 (5/6), September/November 2004.
[4] Donovan, R. E., "Trainable Speech Synthesis," Ph.D. thesis, Cambridge University Engineering Department, 1996.
[5] Rajput, N., Subramanium, L. V., Verma, A., "Adapting Phonetic Decision Trees Between Languages for Continuous Speech Recognition," International Conference on Spoken Language Processing, vol. 3, pp. 850-852, October 16-20, 2000.
[6] Kalika Bali, Partha Pratim Talukdar, N. Sridhar Krishna, A. G. Ramakrishnan, "Tools for the Development of a Hindi Speech Synthesis System," in 5th ISCA Speech Synthesis Workshop, Pittsburgh, pp. 109-114, 2004.
Legibility on a Podcast: Color and Typefaces Lennart Strand Mälardalen University Information Design, IDP Box 325 631 05 Eskilstuna, Sweden
Abstract–As in printed matter, black text on white background is the most legible color combination on small computer screens like media players.
I. INTRODUCTION Color – on type and background – significantly affects legibility, both in print and on computer screens. Reference [1] writes that the most important consideration when working with type and color is to achieve an appropriate contrast between type and its background (p. 80, 2007): "It has long been considered that black type on a white background is the most legible (combination). While this combination remains an excellent choice, other alternatives may offer equal if not improved legibility due to improved digital and printing technologies, and the fact that color is a relative phenomenon. ––– Generally, all legibility guidelines related to working with color and type in print also apply to type appearing on a computer screen." Reference [2] writes that a text on a computer screen must have good legibility, and that information designers should use typefaces designed for screen display, such as Trebuchet and Verdana ("It Depends: ID – Principles and Guidelines", pp. 24-25, 2007). Reference [3] writes that the background color on a computer screen should be fairly light or fairly dark, depending on the content. The text should have an opposite color, fairly dark or fairly light. The best combination is black text on a white or yellow background. Reference [4] found, in a study about airport signage, that black text on a yellow background is superior compared to white on black, white on grey and black on white. Reference [5] studied the effects of six foreground/background color combinations, three font types (Arial, Courier New, and Times New Roman), and two word styles (italicized and plain) on the legibility of websites.
"Participants scanned simulated websites for a target word; readability was inferred from reaction time (RT). In general these results suggest that there is no one foreground/background combination, font, or wordstyle which leads to the fastest RT (i.e. best readability), but rather a designer must consider how each variable affects the other(s)." Note: the word "readability" in the study of reference [5] has the same meaning as "legibility" in this study.
II. SMALL COMPUTER SCREENS
In order to find out how color affects legibility and reading comfort on the small computer screens of media players that can show video, like an Apple iPod, I asked students in information design at Mälardalen University in Eskilstuna, Sweden, to study:
• five color combinations of typeface and background
• five different typefaces
• bold and regular typefaces
• lower case and upper case type
For the study I used an Apple iPod screen, measuring 40 mm x 52 mm. Alternatively, the examples could be studied as a video cast measuring 50 x 66 mm on a 17-inch computer screen. The text varied in all five examples, but the number of text lines was the same, four in all examples. The subjects were asked to study the color combinations thoroughly for legibility: Take all the time you need. Then rank the combinations on a scale from 1 to 5. Number 1 is the combination that you find the most comfortable to read on the iPod screen. Number 5 is the combination that is the least comfortable for you to read on this small computer screen. The color combinations were:
Black on yellow background
Yellow on black
Yellow on dark brown
Black on white
White on black
The typeface – Verdana Bold 50 points – was in this study the same on all of the tested backgrounds.
A. Five color combinations – type and background; which color combination is most comfortable for you to read?
In addition five different typefaces were studied for legibility on the same iPod-screen. The typed text was the same in all five examples, in order for the subjects to be able to concentrate on the typefaces. The text was black on a yellow background. The subjects were asked to study the typefaces thoroughly: Which one of these five typefaces are most comfortable for you to read on the iPod-screen? Please take all the time you need and study each typeface thoroughly. Then rank them on a scale from 1 to 5, with 1 as the type face that you find the most comfortable to read on this small computer screen and with 5 as the one that is the least comfortable for you to read.
33 subjects found that black text on white background had good legibility and that this color combination was the most comfortable combination to read on a small computer screen, out of the five examples. 25 subjects felt that white text on a black background was the most comfortable combination to read on a small computer screen.
Answers 1+2 combined:
Black on white: 33
White on black: 25
Black on yellow background: 15
Yellow on black: 14
Yellow on a brown-reddish background: 6
The typefaces were:
Verdana Bold
Georgia Bold
Futura Bold
Helvetica 75 Bold
Frutiger 65 Bold
B. Which typeface is most comfortable for you to read? Out of the five typefaces tested 30 subjects of 92 found Frutiger 65 Bold to be the most comfortable typeface to read on a small computer screen. 24 subjects found Helvetica 75 Bold to be the most comfortable typeface to read on a small computer screen.
The type size here was also 50 points.
Answers 1+2 combined:
Frutiger 65 Bold: 30
Helvetica 75 Bold: 24
Verdana Bold: 21
Georgia Bold: 10
Futura Bold: 7
I used bold typefaces in my examples in this study, though reference [6], in a study about headings in print (Rubriker, p. 45-47, 2003), found that regular or normal type was easier to read than bold typefaces. Because of this recommendation, I also asked the subjects to compare Verdana Bold with Verdana Regular on the iPod screen; I then asked the subjects to write down which was, to them, the most comfortable typeface to read – bold or regular. The subjects were also asked to study – on the same iPod screen or computer screen – one paragraph typed in lower case type, and the same paragraph typed in upper case type. The number of text lines was four, and the typeface used was Verdana Bold 50 points. The subjects were then asked whether they preferred lower case type or upper case type on this small computer screen. Subjects were also asked to mark their answers with an F for females, and with M for males. All subjects in this study were under 30 years of age. III. THE ANSWERS Overall, 93 students participated in the project. They were asked to carefully study the various variations, and rank them on a scale from 1 to 5, with 1 as the most preferred example, and 5 as the least preferred example. In order to get as close as possible to the overall preferred examples I combined ranks 1 and 2. I did this because some subjects had a difficult time choosing between combinations 1 and 2.
C. Do you prefer bold or regular type?
The typeface used here was Verdana, designed for screen. 27 of 46 subjects found it easier to read a regular typeface than a bold typeface; 19 subjects preferred the bold typeface.
Answers:
Verdana Regular: 27
Verdana Bold: 19
D. Do you prefer lower case type or upper case type?
The typeface used here was also Verdana. 45 of 47 subjects preferred lower case type to upper case type; 2 subjects preferred upper case type.
Answers:
Lower case type: 45
Upper case type: 2
E. What number of text lines is the most comfortable for you to read on a small computer screen like an iPod?
17 subjects preferred 4 text lines
11 subjects preferred 3 text lines
11 subjects preferred 5 text lines
There were no significant differences in the choices of color and type between men and women.
CONCLUSION/RECOMMENDATION The traditional choice, black type on a white background, is a good color combination for small computer screens like iPods, as are regular type and lower case type. Four is the preferred number of text lines. REFERENCES [1] R. Carter, B. Day, P. Meggs, "Typographic Design: Form and Communication," Fourth Edition. Hoboken, New Jersey, USA: John Wiley & Sons, Inc., 2007.
[2] Pettersson, R, “It Depends: ID – Principles and Guidelines,” Tullinge, Sweden: Institute for Infology. 2007. [3] Bradshaw, A. C. “Evaluation Visuals for Instruction,” In R.E. Griffin, W.J. Gibbs & V.S. Villiams (Eds.) 2000: Natural Vistas Visual Literacy & The World Around Us. Selected Readings of the International Visual Literacy Association. International Visual Literacy Association. [4] Waller, R, “Comparing typefaces for airport signs,” Information Design Journal Volume 15 number 1, 2007 p. 1-15. Amsterdam/Philadelphia: John Benjamins Publishing. [5] Hill, A, (supervised by Scharff, L. V.) “Readability of screen displays with various foreground/background color combinations, font styles, and font types,” 1997. Retrieved 2007-08-24. [6] Pettersson, R, (2003). “Rubriker,” Stockholm: Stiftelsen Institutet för Mediestudier, 2003.
The Sensing Mechanism and the Response Simulation of the MIS Hydrogen Sensor Linfeng Zhang1, Erik McCullen2, Lajos Rimai2, K. Y. Simon Ng3, Ratna Naik4, Gregory Auner2 1
Department of Electrical Engineering, University of Bridgeport, Bridgeport CT 06604; 2 Department of Electrical and Computer Engineering, Wayne State University, Detroit MI 48202; 3 Department of Chemical Engineering and Material Science, Wayne State University, Detroit MI 48202; 4 Department of Physics and Astronomy, Wayne State University, Detroit, MI 48202. Abstract-The Pd0.96Cr0.04 alloy gated metal-insulator-semiconductor (MIS) hydrogen sensor has been studied. A new sensing mechanism is proposed in which the sensor response is related to protons. The capacitance-voltage (CV) curve, conductance-voltage (GV) curve, and the sensor response are simulated.
I. INTRODUCTION

Hydrogen is a clean energy carrier with zero emission into the environment, and it is also widely used in the chemical and semiconductor industries. However, hydrogen is explosive if its concentration in air is more than 4%. Although mass spectrometry and gas chromatography are sensitive and selective to hydrogen, the elaborate gas handling system is complicated and bulky. To detect hydrogen leakage for safety and to measure the hydrogen concentration for process control, small sensors with high sensitivity and a wide dynamic range are necessary. Catalytic bead and electrochemical sensors are sensitive and selective to hydrogen, but they are not environment independent since oxygen is needed for the reaction with hydrogen. Resistive sensors are environment independent, but their signal-to-noise ratio is not good. Semiconductor hydrogen sensors with a metal-oxide-semiconductor (MOS) structure were first reported by Lundstrom in 1975 [1, 2, 3] and are fabricated on substrates such as silicon or silicon carbide. This type of sensor is generally operated at temperatures above ambient, and the gate material is Pd or a Pd alloy [4]. The sensor works as a capacitor, and the CV curve shifts in the presence of hydrogen. The sensing mechanism has been extensively studied to improve the performance, including the dynamic range, transient response [5], etc. However, there are still challenges for real applications. First, although the sensitivity of the sensor is very high and hydrogen can be detected even at 1 ppm, the response saturates at 60 ppm. Second, SiO2 as the insulator is not stable in some chemicals, especially in hydrogen at high concentration. The current sensing mechanism cannot explain the observed CV shift. In this paper, a different sensing mechanism is proposed, and the device characteristics (CV and GV) and the sensor response are simulated.
II. DEVICE FABRICATION AND CHARACTERIZATION
The III-V compound semiconductor aluminum nitride (AlN), due to its wide band gap (6.2 eV), is used as the insulator instead of SiO2 in our device. Moreover, AlN is very stable in chemicals, and its thermal expansion coefficient is similar to that of Si(111). In our device fabrication, n-Si(111) is used as the substrate, AlN is grown by plasma source molecular beam epitaxy with a base pressure of 10^-10 Torr, and the Pd0.96Cr0.04 metal gate and the aluminum back contact are deposited by magnetron sputtering with a base pressure of 10^-6 Torr. The AlN layer and the metal gate are both 500 Å thick, and the aluminum back contact is 2,000 Å thick. The electrical characterization of the sensors is conducted in an in-house designed and built testing chamber. The CV and GV curves are obtained with an HP4192A impedance analyzer, which is controlled by a PC with LabView programs. Figure 1 shows the CV (a) and GV (b) curves. As the frequency increases, the capacitance decreases and the conductance increases; this is related to the series resistance. In addition, in the CV curve of our device, only the accumulation and depletion regions can be observed. From the capacitance at 1 kHz in the accumulation region, the dielectric constant of AlN is calculated to be around 13.78, and the AlN is highly textured. Moreover, in the CV curves, the flat band voltages VFB are more positive at higher frequencies. Therefore, at high frequency, some negative charges are incorporated in the AlN during the CV sweep in either the forward or the backward direction. The above characterization results show that these devices are normal; thus they can be tested for the hydrogen response.
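As a rough illustration of how the dielectric constant is extracted from the accumulation capacitance, the parallel-plate relation C = ε0 εr A / d can be inverted. In the Python sketch below, only the 500 Å AlN thickness comes from the device description; the capacitance value and the gate area are assumed, illustrative numbers.

# A minimal sketch of the parallel-plate estimate eps_r = C_acc * d / (eps0 * A)
# used to extract the AlN dielectric constant from the accumulation capacitance.
# Both the capacitance and the gate area below are assumed illustrative values;
# only the 500 Angstrom thickness is taken from the device description.
EPS0 = 8.854e-12      # vacuum permittivity, F/m

c_acc = 2.44e-9       # assumed low-frequency accumulation capacitance, F
d_aln = 500e-10       # AlN thickness: 500 Angstrom, in metres
area = 1.0e-6         # assumed gate area: 1 mm^2, in m^2

eps_r = c_acc * d_aln / (EPS0 * area)
print(f"estimated AlN dielectric constant: {eps_r:.2f}")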
III. EXPERIMENTAL RESULTS AND DISCUSSION
In our experiment, the capacitance is held fixed with a PID feedback control loop and the voltage is measured as the response to hydrogen. Fig. 2 shows typical C-V curves before and after the exposure to hydrogen. The C-V curve shifts to the negative bias voltage side upon exposure to hydrogen. This hydrogen-induced shift has been reported for all MOS/MIS hydrogen sensors.
Figure 2. The C-V curves before and after the exposure to hydrogen (in N2 and in 1000 ppm H2).

Figure 1. The CV (a) and GV (b) curves from the Pd0.96Cr0.04 gated device at 80 °C, measured at 500 Hz, 900 Hz, 4 kHz, 15 kHz, 100 kHz, and 1 MHz.
A. SENSING MECHANISM

In the current model of the hydrogen sensing mechanism, the hydrogen response is attributed to the induced dipole moment of the hydrogen atoms, and the response can be calculated by considering the dipoles as two parallel, homogeneously charged sheets [6]. Thus the voltage shift is:
ΔV = ni μ / ε

where ni is the hydrogen concentration, μ is the effective dipole moment, and ε is the permittivity of the material in the dipole layer. An updated model includes the exothermic adsorption of hydrogen atoms on the outer metal surface and the dipoles at the metal/insulator interface [7]. However, there are some unresolved issues in this mechanism. If there is a dipole moment of the hydrogen atom, then since the separation of this dipole is negligible compared with the thickness of the MIS device, the dipole moment can be neglected and does not affect the thickness of the depletion region on the Si side. Even if there is a dipole layer, it will introduce another capacitance in series with the capacitance of the insulator:

Cdipole = ε / d
Here, d is the effective charge separation. From the CV curve, this additional series capacitance should shift the CV curve lower upon exposure to hydrogen, which is different from the experimental result, where the CV curve shifts to the negative bias side as in Figure 2. The only possible reason for the shift due to the exposure to hydrogen is some positive charge at the metal/insulator interface, but on the insulator side. If the hydrogen atom loses one electron to the conduction band of the metal and forms a proton layer at the interface, then the observed CV shift appears as the hydrogen response. Since the number of active sites on the metal/insulator interface is limited, once protons occupy all the sites the sensor shows saturation to hydrogen. To calculate the number of interface sites, the capacitance in the accumulation region can be obtained from the low-frequency CV curves without considering the series resistance effect. The surface charge can be calculated from the constant-capacitance voltage shift, Q = C0 ΔV. Thus the density of sites on the interface is 2.72×10^15 m^-2 for our device. Eriksson found that the number of active sites is related to the oxygen concentration on the surface of the insulator [2]. Fogelberg deduced that the number of adsorption sites on the Pd/SiO2 interface is 6×10^17 m^-2 under the assumption of a dipole moment of the hydrogen atoms, and obtained a surface site density of 1×10^19 m^-2 through experiment [1]. This difference in the interface site numbers is attributed to the different sensing mechanisms and metal/insulator interfaces.
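A minimal numerical sketch of this estimate is given below, in Python. The insulator parameters follow the device description, but the CV shift ΔV used here is an assumed, illustrative value rather than a number reported in the paper; with a shift of about 0.18 V the resulting density lands near the 2.72×10^15 m^-2 quoted above.

# Hedged sketch of the proton-layer estimate: surface charge from the constant-
# capacitance voltage shift, Q = C0 * dV, converted to an areal site density.
# The voltage shift is an assumed illustrative value; the insulator parameters
# follow the device description (500 Angstrom AlN, eps_r ~ 13.78).
E_CHARGE = 1.602e-19          # elementary charge, C
EPS0 = 8.854e-12              # vacuum permittivity, F/m

eps_aln = 13.78 * EPS0        # AlN permittivity, F/m
d_aln = 500e-10               # AlN thickness, m
c0_per_area = eps_aln / d_aln # accumulation capacitance per unit area, F/m^2

delta_v = 0.18                # assumed CV shift in volts (illustrative only)

sigma = c0_per_area * delta_v          # surface charge density, C/m^2
site_density = sigma / E_CHARGE        # proton sites per m^2
print(f"interface site density ~ {site_density:.2e} m^-2")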
B. CAPACITANCE-VOLTAGE SIMULATION
To understand the MOS/MIS device and the device testing, including the CV and GV measurements, a 1-D model is constructed by solving the Poisson equation, since the device thickness is much smaller than the other two dimensions. For a positive voltage, electrons accumulate at the surface. As a result, the MIS structure appears almost like a parallel-plate capacitor dominated by the insulator properties, Ci = εi/d. As the voltage becomes negative, the semiconductor surface is depleted, and a depletion-layer capacitance Cs is added in series with Ci. The frequency dependence of the CV curves is due to the different responses of the two types of carriers, namely electrons and holes, to the ac probe, as well as to the presence of series resistance [8]. Moreover, the experimental CV curves from our MIS sensors show no inversion region when the bias is very negative. This indicates that the holes have too long a response time and cannot follow the frequency of the applied ac voltage even at 1 Hz [9]. To obtain a correct CV curve at such frequencies, the hole density term should be removed from the charge neutrality equation in the calculation of the capacitance. However, this term is still needed in order to calculate the Fermi level in bulk Si. The details of the simulation are available in Ref. [9].
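To make the accumulation/depletion picture concrete, the sketch below computes a simplified low-frequency CV curve for an n-type MIS capacitor using the depletion approximation, with no inversion, interface states, or series resistance. It is not the 1-D Poisson solver of Ref. [9]; the doping density and flat-band voltage are assumed illustrative values.

# Simplified CV of an n-Si MIS capacitor in the accumulation/depletion picture:
# C = Ci in accumulation, and the series insulator + depletion capacitance for
# negative bias (depletion approximation, no inversion).  Doping and flat-band
# voltage below are illustrative assumptions, not parameters from the paper.
import numpy as np

Q = 1.602e-19                 # elementary charge, C
EPS0 = 8.854e-12
EPS_SI = 11.7 * EPS0          # silicon permittivity, F/m
EPS_I = 13.78 * EPS0          # insulator (AlN) permittivity, F/m

d_i = 500e-10                 # insulator thickness, m
n_d = 1e21                    # assumed donor density, m^-3 (illustrative)
v_fb = 0.0                    # assumed flat-band voltage, V (illustrative)

c_i = EPS_I / d_i             # insulator capacitance per unit area, F/m^2

def mis_capacitance(v_gate):
    """Capacitance per unit area; without inversion the depletion branch
    simply keeps deepening as the bias goes more negative."""
    if v_gate >= v_fb:
        return c_i
    arg = 1.0 + 2.0 * c_i**2 * (v_fb - v_gate) / (Q * EPS_SI * n_d)
    return c_i / np.sqrt(arg)

for v in np.linspace(-4.0, 1.0, 11):
    print(f"V = {v:+.1f} V   C = {mis_capacitance(v) * 1e5:.1f} nF/cm^2")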
Figure 3. The simulated CV curves at different frequencies (100 Hz, 10 kHz, 50 kHz, 100 kHz, and 1 MHz).
Figure 3 shows the simulated CV curves of the device. The difference between the ideal device and the real device is the fixed positive charge in the real device; this charge is most probably related to oxidation of the real device, and it shifts the whole curve to the left.
Figure 4. The comparison between the simulated and the experimental capacitance-frequency (a) and conductance-frequency (b) curves from the Pd-Cr gated device.
Figure 4(a) shows the relationship between the capacitance in the accumulation region (0 V bias) and the frequency. Due to the series resistance, the capacitance decreases as the frequency increases. At low frequency (< 10 kHz), the simulated result is relatively close to the experimental result, but there is a large difference at high frequency. Figure 4(b) shows the relationship between the conductance and the frequency. In the experimental data there is a peak in the curve; this is due to the interface states between the aluminum nitride and the silicon [8]. In the future, the interface states will be incorporated into this model. To simulate the hydrogen response, the proton diffusion in the thin film needs to be considered. An electrochemical method, “stripping”, is used to measure the diffusion coefficients in different metal thin films. Figure 5 shows how the diffusion
coefficient changes with the Cr concentration. For the 600 Å Pd-Cr films, there is not much difference in the diffusion coefficient between films with different Cr concentrations. However, for the 1000 Å Pd-Cr films, the diffusion coefficient decreases quickly with increasing Cr concentration. Because sensors with more than 10% Cr do not show any sensitivity to hydrogen, we focus on the thin films with low Cr concentration. Figure 6 shows the proton concentration profile in the Pd-Cr thin film at different times due to the proton diffusion; the steady state is approached very quickly.
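The proton concentration profile can be sketched with a simple 1-D finite-difference diffusion model. The boundary conditions below (a fixed surface concentration and a blocking metal/insulator interface) are assumptions of this sketch, and times are reported in units of the characteristic diffusion time τ = L²/D, so the qualitative approach to steady state is visible for any consistent choice of D.

# Rough sketch of 1-D proton diffusion into the metal film: fixed dimensionless
# concentration at the outer surface, zero-flux condition at the metal/insulator
# interface, explicit finite differences (FTCS).  The boundary conditions are
# assumptions of this sketch, not necessarily the model used in the paper.
import numpy as np

L = 1000e-10                  # film thickness (1000 Angstrom), m
D = 2.8e-11 * 1e-4            # diffusion coefficient, read here as cm^2/s -> m^2/s
tau = L**2 / D                # characteristic diffusion time, s

nx = 101
dx = L / (nx - 1)
dt = 0.4 * dx**2 / D          # stable explicit step (dt <= dx^2 / (2 D))

c = np.zeros(nx)              # dimensionless concentration, initially zero
c[0] = 1.0                    # outer surface held at the gas-side value

t = 0.0
for target in [0.05 * tau, 0.2 * tau, 0.5 * tau, 1.0 * tau]:
    while t < target:
        # interior update; numpy evaluates the whole right-hand side first
        c[1:-1] += D * dt / dx**2 * (c[2:] - 2.0 * c[1:-1] + c[:-2])
        c[-1] = c[-2]         # blocking interface: zero flux
        t += dt
    print(f"t = {t:.3g} s (~{t/tau:.2f} tau): interface concentration = {c[-1]:.2f}")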
Figure 5. The effect of the Pd-Cr composition on DH for the 600 Å and 1000 Å films.

Figure 6. Proton concentration distribution in the Pd alloy film at different times (D = 2.8E-11 cm2/s).

Figure 7. The left shift of the CV curve due to 1% hydrogen.
Currently, data on the chemical kinetics of hydrogen adsorption, dissociation, and desorption are not available. The response simulation is therefore based on our sensing mechanism and the steady-state response. Figure 7 shows the shift of the CV curve when the hydrogen concentration is 1%.
IV. CONCLUSIONS
A new sensing mechanism is proposed in which the sensor response is related to a proton layer on the metal/insulator interface. The CV and GV curves are simulated at different frequencies. With the model based on our sensing mechanism, the CV shift due to hydrogen is also simulated. In the future, the chemical kinetics and the device electronics will be combined into one model to simulate the sensor's transient response. Moreover, the interface states on the insulator/semiconductor interface will be considered as well.
REFERENCES
[1] J. Fogelberg, et al., “Kinetic modeling of hydrogen adsorption/absorption in thin films on hydrogen-sensitive field-effect devices - observation of large hydrogen-induced dipoles at the Pd-SiO2 interface,” Journal of Applied Physics, vol. 78(2), pp. 988-996, 1995.
[2] M. Eriksson, I. Lundstrom, and L. G. Ekedahl, “A model of the Temkin isotherm behavior for hydrogen adsorption at Pd-SiO2 interfaces,” Journal of Applied Physics, vol. 82(6), pp. 3143-3146, 1997.
[3] M. Eriksson, et al., “The influence of the insulator surface properties on the hydrogen response of field-effect gas sensors,” Journal of Applied Physics, vol. 98(3), pp. 034903-034903(6), 2005.
[4] L. Zhang, E. F. McCullen, et al., “Response to Hydrogen of a metal/AlN/Si thin film structure: effects of composition and structure of a combination Pd-Cr gate,” Sensors and Actuators B-Chemical, vol. 113(2), pp. 843-851, 2006.
[5] L. Zhang, E. F. McCullen, L. Rimai, et al., “Reverse Response Transients in a Pd-Ni/AlN/n-Si Hydrogen Sensor,” Sensors and Actuators B, vol. 123(1), pp. 277-282, 2007.
[6] I. Lundstrom, M. Armgarth, and L. G. Petersson, “Physics with Catalytic Metal Gate Chemical Sensors,” CRC Critical Reviews in Solid State and Materials Sciences, vol. 15(3), pp. 201-278, 1989.
[7] M. Eriksson, et al., “The influence of the insulator surface properties on the hydrogen response of field-effect gas sensors,” Journal of Applied Physics, vol. 98(3), pp. 034903-034903(6), 2005.
[8] E. F. McCullen, et al., “Electrical characterization of metal/AlN/Si thin film hydrogen sensors with Pd and Al gates,” Journal of Applied Physics, vol. 93(9), pp. 5757-5762, 2003.
[9] L. Zhang, Ph.D. Dissertation, Wayne State University, 2006.
Visual Extrapolation of Linear and Nonlinear Trends: Does the Knowledge of Underlying Trend Type Affect Accuracy and Response Bias? Lisa A. Best University of New Brunswick [email protected] Abstract- The purpose of these experiments was to examine the ability of experienced and inexperienced participants to predict future curve points on time-series graphs. To model real-world data, the graphed data represented different underlying trends and included different sample sizes and levels of variability. Six trends (increasing and decreasing linear, exponential, asymptotic) were presented on four graph types (histogram, line graph, scatterplot, suspended bar graph). The overall goal was to determine which types of graphs lead to better extrapolation accuracy. Participants viewed graphs on a computer screen and extrapolated the next data point in the series. Results indicated higher accuracy when variability was low and sample size was high. Extrapolation accuracy was higher for asymptotic and linear trends presented on scatterplots and histograms. Interestingly, although inexperienced participants made expected underestimation errors, participants who were aware of the types of trends they would be presented with made overestimation errors.
I. INTRODUCTION
It is nearly impossible to turn on the television or radio or read a newspaper without being exposed to some issue involving growth or decline. We are faced daily with population growth, economic growth, and stock market dips and rises [5, 10], and phenomena such as these can have an important bearing on human affairs; failures to understand these processes can have grave consequences. Although humans are quite accurate at predicting linear change, accuracy is much lower when nonlinear trends are presented [30, 33], and the rate of growth or decline is consistently underestimated [12, 13, 34, 35]. MacKinnon and Wearing [22] examined the ability to extrapolate future data points for different linear and nonlinear trends, using series that included 100 data points and a growth (or decay) rate of 6%. Participants were presented with a single data point and asked to extrapolate the value of the series in the next time period. After their last extrapolation, participants identified the function they had seen. Results showed that forecasting was quite accurate (with a slight bias towards underestimation), but identification ability was poor (only 25% of participants were able to identify the trend they had seen). Wagenaar and Timmers [34] suggested that time-series extrapolation is a two-step process. In the first step, one must identify the underlying series properties (e.g., the direction of the trend, amount of acceleration, etc.). To evaluate whether people can discriminate between different trends, Best, Smith, and Stubbs [3] presented participants with graphs of increasing and
decreasing asymptotic, exponential, and linear trends and found that discrimination accuracy was high when the level of variability was low and the sample size was high. Overall, participants made correct discriminations on 67% of exponential trials, 65% of asymptotic trials, and 54% of linear trials. Given a chance performance of 16.6%, it appears that participants could identify underlying trend properties. After the underlying trend is identified, one must use this information to extrapolate future series values. On extrapolation tasks accuracy is often low and underestimation errors are common [13, 16, 30]. The tendency towards underestimation has been found in many modifications of the forecasting paradigm [24]. Typical experiments present successive values and then ask some variant of the question: what will the value be if the growth continues as it is? Underestimation is not affected by the shape of the underlying trend [13, 19], the phrasing used to explain the task, the data presentation method [18, 19, 20], an awareness of the tendency of the bias [1], or the expertise of the judge [28, 33].

Graphical Analysis of Time Series Data

In many situations graphs induce people to see phenomena that were previously undetected. In particular, time series graphs present temporal variation in some discrete or continuous dimension [see 23] and can make salient important changes in natural processes. Typically, time series data are presented graphically as a line graph, with time on the horizontal axis and the “amount” of the variable in question on the vertical axis [17]. In order for such graphs to be effective, it is important to consider carefully what information has to be conveyed and the best way to convey this information [5, 6, 7, 31, 36]. Although some research indicates that people underestimate exponential growth even when shown graphs, the possibility that specific types of graphs are superior in revealing nonlinear change has not been fully investigated. Although one purpose of a graph is to allow a reader to access important data and draw accurate conclusions, the precision of these conclusions is dependent upon the task. Culbertson and Powers [8] found that bar graphs are particularly useful if observers are required to estimate specific quantities, while other researchers [3, 17] found that line graphs and scatterplots were more appropriate when it is necessary to identify the underlying trend. Best and her colleagues [3, 4] found that graph reading accuracy was affected by both the
type of graph presented and the underlying trend type—whereas line graphs best conveyed linear trends, bar graphs and suspended bar graphs were superior when nonlinear trends were presented.

The Effects of Variability and Sample Size

Despite the fact that it is generally accepted that presenting more data points enhances the ability to determine the trend presented on a graph [2, 4, 21], Wagenaar and Timmers [35] reported more underestimation when larger sample sizes were presented. In a classic study, Wagenaar and Timmers [34] examined the biases associated with exponential growth by presenting participants with a set of data points (n=3, 5, or 7) and asking them to determine the value that would be reached in k years. There was less underestimation when fewer data points were presented. The results of this study are consistent with Wagenaar and Timmers [35]. Andreassen and Kraus [1] proposed the salience hypothesis to explain this finding. This hypothesis can be illustrated using an example from Wagenaar and Timmers [34]. When five data points (3, 7, 20, 55, 148) were presented, there was more underestimation than when three points (3, 20, 148) were presented. The salience hypothesis explains the greater degree of underestimation in terms of the absolute differences between the individual data points. In the case of exponential growth, the steep increase is more salient if fewer points are present because there is a larger difference between the individual data points. These large differences lead participants to interpret the series as growing more quickly. When more data points are presented, the steep exponential increase is not as noticeable because the magnitude of the differences between the individual data points is not as great.

Statement of the Problem

The present experiments were designed to investigate several of the issues involved in the perception of nonlinear relationships. Under optimal conditions, human performance may rival that of statistical tests [2, 4]. However, if not enough data are presented [2, 4, 21, 27], if the level of variability is high [9, 11, 25, 26], or if the display method is inadequate [5, 32], accuracy can decline. In this study, participants were presented with one of six different trend types. The experimental stimuli included increasing and decreasing exponential, asymptotic, and linear trends. The purpose of these experiments was to integrate several independent variables in order to determine how each factor affects extrapolation accuracy. Participants were shown a graph (bar graph, scatterplot, line graph, or suspended bar graph) with different trends (with differing levels of underlying variability and sample size) and asked to predict the next data point.

II. EXPERIMENT ONE

Participants

Six participants served in the experiment. Before the experiment, participants were shown examples of the
underlying curves (standard deviation = 0) and samples of the curves with added variability. All participants had served in a study designed to examine the ability of participants to discriminate between different exponential, asymptotic, and linear trends [3]. As is typical with psychophysical experiments, the number of participants was small, with each participant making a large number of perceptual judgments.

Apparatus

All stimuli were presented on an IBM compatible computer (with a screen resolution of 640 by 480 pixels).

Procedure

Participants viewed graphs on a computer screen and were instructed to predict the next data point in the series (for sample stimuli, see Figure 1). Each participant was presented with 36 sessions of 54 trials, resulting in 1944 trials per participant.
Figure 1. Sample stimuli that were presented to participants. In the top row, an increasing exponential trend with a low level of variability is presented with 9, 17, or 33 data points. In the bottom row, a decreasing asymptotic trend is presented. The left panel illustrates the trend with low variability, the middle panel includes a moderate level of underlying variability, and the right panel presents a decreasing asymptotic curve with high variability.
The experimental stimuli were generated using basic formulae that defined the underlying trends. The formulae were as follows: decreasing asymptotic [Y = 1.75 - 0.75 (LOG(X-0.5) / LOG(10))]; decreasing exponential [Y = 0.804 + 0.75 (LOG(37.5 - (X+4)) / LOG(10))]; decreasing linear [Y = 2.02 - 0.41(X)]; increasing asymptotic [Y = 0.804 + 0.75 (LOG(X-0.5) / LOG(10))]; increasing exponential [Y = 1.75 - 0.75 (LOG(37.5 - (X+4)) / LOG(10))]; and increasing linear [Y = 0.54 + 0.04 (X+4)]. The formulae were chosen in order that each of the curves would have a similar start and end value. Variability was determined by calculating a standard normal deviate from a population with a mean of 0 and a standard deviation of 1 and transforming that to a deviation value to be added to or subtracted from the underlying curve point. The standard normal deviate was multiplied by .22 for low variability, by .44 for moderate variability, and by .66 for high variability. Each trend was presented with 9 (low sample size), 17 (intermediate sample size), or 33 (high sample size) data points.
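A hedged Python sketch of this stimulus-generation procedure is shown below. The trend formulae and variability multipliers are transcribed from the text; the x-values (1 to n) and the random seed are assumptions of the sketch, since the exact spacing used in the experiments is not specified here.

# Sketch of stimulus generation: each trend formula (transcribed as printed
# above) plus Gaussian "variability" scaled by 0.22 / 0.44 / 0.66.
import math
import random

def log10(x):
    return math.log(x) / math.log(10)

TRENDS = {
    "decreasing asymptotic":  lambda x: 1.75 - 0.75 * log10(x - 0.5),
    "decreasing exponential": lambda x: 0.804 + 0.75 * log10(37.5 - (x + 4)),
    "decreasing linear":      lambda x: 2.02 - 0.41 * x,     # slope as printed
    "increasing asymptotic":  lambda x: 0.804 + 0.75 * log10(x - 0.5),
    "increasing exponential": lambda x: 1.75 - 0.75 * log10(37.5 - (x + 4)),
    "increasing linear":      lambda x: 0.54 + 0.04 * (x + 4),
}
VARIABILITY = {"low": 0.22, "moderate": 0.44, "high": 0.66}

def make_stimulus(trend, variability, n_points, rng):
    """Return n_points (x, y) pairs; x-values 1..n are an assumption here."""
    f = TRENDS[trend]
    scale = VARIABILITY[variability]
    return [(x, f(x) + scale * rng.gauss(0.0, 1.0)) for x in range(1, n_points + 1)]

rng = random.Random(1)
for x, y in make_stimulus("increasing exponential", "low", 9, rng):
    print(f"x = {x:2d}, y = {y:.3f}")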
The trends were presented in four different graph types—line graph, histogram, scatterplot, and suspended bar graph. The result was a 4 (graph) x 3 (variability) x 3 (sample size) x 6 (trend type) repeated measures design. On each trial, participants were required to predict where the next data point in the series would fall. A data point appeared on the screen and participants adjusted its vertical position (using the up and down arrow keys) until they were satisfied with their estimate. The initial height of the data point to be adjusted was randomized to control for over- and underestimation biases based solely on initial location. After participants were satisfied with their prediction, they clicked “OK” to move on to the next trial. Participants were instructed to work through the trials quickly but to take as much time as they needed to be confident about their judgment.

Results and Discussion

Extrapolation Accuracy

To assess extrapolation ability, participant accuracy was calculated. Extrapolation error was defined as the absolute difference between the predicted point and the next point on the underlying curves. Smaller deviations indicated lower errors and higher extrapolation accuracy. A 4 (graph) x 3 (variability) x 3 (sample size) x 6 (trend type) repeated measures ANOVA was conducted on absolute accuracy. There was a statistically significant main effect for graph type, F(3, 15)=5.01, p<.05, η2 = .50. When histograms or suspended bar graphs were used to present the data, the mean error was .50. The corresponding error was .56 when scatterplots were used to present the data. When line graphs were used, error was highest at .73. Post hoc tests (in all instances, Tukey's LSD test was used to further evaluate statistically significant main effects and interactions) showed that accuracy was significantly lower for line graphs than for scatterplots or suspended bar graphs. There were no significant differences between line graphs and histograms. Thus, at least for extrapolation tasks, it appears that scatterplots and suspended bar graphs lead to higher extrapolation accuracy. As expected, overall error was lower when variability was low and increased as variability increased. The statistically significant interaction between variability and trend type, F(10, 50)=2.51, p<.05, η2 =.33, indicated that the effects of variability were more pronounced when exponential trends were presented. As can be seen in Figure 2a and Figure 2b, regardless of the level of variability, error was highest when exponential trends were presented. When variability was low or moderate, error was higher when increasing exponential trends were presented, but when variability was high, error was typically higher for all trend types.
Figure 2. Absolute extrapolation error of experienced participants for increasing and decreasing exponential, asymptotic, and linear trends as a function of variability (Panel A); error bars represent the standard error of the mean. Absolute extrapolation error of exponential, asymptotic, and linear trends as a function of variability (Panel B); there were no statistically significant differences in accuracy for the increasing and decreasing trends.
There was a statistically significant interaction between sample size and trend type, F(10, 50)=2.11, p<.05, η2 =.33. For exponential trends (increasing or decreasing), error was similar across all levels of sample size (see Figure 3a and Figure 3b) but when linear and asymptotic trends were presented, error was significantly lower when sample size was 33. Thus, it appears that increasing the number of data points on a graph had little effect on the extrapolation of exponential trends but led to improved accuracy when linear and asymptotic trends were presented.
Figure 3. Absolute extrapolation error of experienced participants for increasing and decreasing exponential, asymptotic, and linear trends as a function of sample size (Panel A). Absolute extrapolation error of exponential, asymptotic, and linear trends as a function of sample size. Again, there were no statistically significant differences in accuracy for the increasing and decreasing trends (Panel B).
Over- and Underestimation Biases

Extrapolation bias can be defined as the mean of the signed extrapolation errors. In this case, positive values indicate that the predicted point was greater than the actual curve point and negative values indicate that the prediction was lower than the actual value. Overall, participants made systematic over- and underestimation errors, F(5, 25)=9.2, p<.0001, η2 =.65. In all cases, the change in the trend was judged to be greater than the actual change, indicating overestimation errors. As can be seen in Figure 4, the degree of overestimation was greater when the trend was increasing and was closer to zero for decreasing trends. The overall bias was greater for exponential and linear trends and lower for asymptotic trends. There was a significant interaction between sample size and trend type, F(10, 50)=40.96, p<.05, η2 =.89. For asymptotic
276
BEST
and linear trends, sample size had very little effect on the extrapolation errors (see Figure 4a) and regardless of the number of data points, there was a tendency towards overestimation. This pattern did not hold for exponential trends (see Figure 4b). For both increasing and decreasing exponential trends, there was a tendency to overestimate when the sample size was 17 or 33. However, when the sample size was 9, this pattern reversed and the participants underestimated the change. For all exponential curves, post hoc tests revealed that extrapolations when N=9 were different from those made when N=17 or N=33.
Figure 4. Over- and underestimation errors of experienced participants as a function of trend type and sample size. Panel A shows the estimation errors associated with increasing and decreasing linear trends and Panel B shows the estimation errors associated with exponential trends.
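For concreteness, the following Python sketch computes the two measures used throughout these analyses: mean absolute extrapolation error (accuracy) and mean signed error (over-/underestimation bias), aggregated by trend type. The trial records in the example are made-up illustrative values, not data from the experiments.

# Mean absolute error (accuracy) and mean signed error (bias) per trend type.
from collections import defaultdict

# (trend, predicted y, actual next curve point) -- illustrative values only
trials = [
    ("increasing exponential", 1.95, 1.80),
    ("increasing exponential", 1.70, 1.80),
    ("decreasing asymptotic",  0.60, 0.65),
    ("decreasing asymptotic",  0.72, 0.65),
]

abs_err = defaultdict(list)
signed_err = defaultdict(list)
for trend, predicted, actual in trials:
    abs_err[trend].append(abs(predicted - actual))
    signed_err[trend].append(predicted - actual)

for trend in abs_err:
    n = len(abs_err[trend])
    mae = sum(abs_err[trend]) / n
    bias = sum(signed_err[trend]) / n
    print(f"{trend:25s} mean |error| = {mae:.3f}, mean signed error = {bias:+.3f}")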
III. EXPERIMENT TWO

Although much of the previous research [24] found that, on extrapolation tasks, participants underestimated future points on exponential trends, the results from Experiment 1 produced an opposite finding. Overall, participants overestimated the degree of change in both increasing and decreasing trends, and it is possible that prior knowledge about the six trend types affected predictions. In Experiment 1, participants were aware of the six trend types and had considerable experience with detecting the trends. This prior knowledge could have led to predictions that were higher than those made by inexperienced participants. The purpose of this experiment was to examine the forecasting accuracy and bias of inexperienced participants, using the same procedure as that described above.

Participants

Four students enrolled in an undergraduate class in Sensation and Perception served as participants. Participants were informed that they would be presented with increasing and decreasing linear and nonlinear trends, but no additional information was given.

Procedure

The procedure was identical to that outlined in Experiment 1.

Results

The purpose of this experiment was to examine the effects of variability, sample size, graph type, and trend type on the extrapolation accuracy of inexperienced participants. A 3 (variability) x 3 (sample size) x 4 (graph type) x 6 (trend type) repeated measures analysis of variance was conducted on extrapolation accuracy. As expected, there were significant main effects for variability (F(2,6)=20.66, p=.002, η2 =.88), sample size (F(2,6)=20.11, p=.002, η2 =.87), and trend type (F(5,15)=4.37, p=.012, η2 =.59). The effects of these variables
on extrapolation accuracy were similar for the inexperienced and experienced participants—accuracy was highest for asymptotic trends, dropped when linear trends were presented, and was lowest when exponential trends were presented. As expected, overall performance was best when variability was low and sample size was high. An overall comparison of the inexperienced and experienced participants revealed no significant accuracy differences, F(1,8)=.003, p=.96. The absolute accuracy was similar for each of the trends, and accuracy was highest for asymptotic, intermediate for linear, and lower for exponential trends. Interestingly, the participant type by sample size interaction was statistically significant, F(2, 16)=7.43, p=.005, η2 = .48. As was previously discussed, among experienced participants, accuracy was similar (and higher) when N=17 or N=33 and was lower when N=9. Inexperienced participant accuracy was higher (and virtually identical) when sample size was 17 or 33 and was significantly lower when sample size was 9. Thus, for experienced participants accuracy improved when the sample size increased from 9 to 17 points, but for inexperienced participants, accuracy was higher when the sample size was lower. The accuracy of the inexperienced participants was not affected by graph type, F(3,9)=1.74, p=.23, η2 = .37. Performance was similar regardless of whether histograms (M=.53), line graphs (M=.47), scatterplots (M=.50), or suspended bar graphs (M=.65) were used. The interaction between participant type and graph type was statistically significant, F(3, 24)=4.47, p=.013, η2 = .36. Extrapolation accuracy of inexperienced participants was similar for all graph types, but the error of experienced participants was lower when line graphs were used.

Over- and Underestimation Biases

In order to assess the effects of variability, sample size, graph type, and trend type on bias, a repeated-measures analysis of variance was conducted on the signed extrapolation errors for each trend type. Overall, results indicate no significant differences between experienced and inexperienced participants, F(1,8)=2.35, p=.17, and similar effects for variability, sample size, and graph type. Among the inexperienced participants there was a significant main effect for trend type, F(5, 15)=31.66, p=.001, η2 = .91. As can be seen in Figure 6, the bias of inexperienced participants was closest to zero when asymptotic trends were presented and higher for linear and exponential trends. When increasing trends were presented, extrapolation bias was lowest for asymptotic trends (M=-.12) and higher for linear (M=-.21) and exponential trends (M=-.32). When decreasing trends were presented, extrapolation bias was larger (asymptotic bias=.38; exponential bias=.58; linear bias=.46). In each case, these predictions represent an underestimation. There was a significant interaction between participant type and trend type, F(5, 40)=35.83, p=.000, η2 = .82. As can be seen in Figure 5, the bias of inexperienced and experienced participants varied according to trend type. Post hoc tests revealed statistically significant differences between
experienced and inexperienced participants for each trend type. Overall, experienced participants overestimated the amount of growth presented and inexperienced participants underestimated the amount of growth presented. Thus, the underestimation bias found by previous researchers may be dependent upon the experience level of the forecaster. When experienced forecasters have information about the trend presented in a time-series graph, there is a tendency towards overestimation, but when inexperienced forecasters (who are unaware of the underlying trend type) are forced to predict future points there is a tendency towards underestimation.
Figure 5. Over- and underestimation errors of inexperienced and experienced participants as a function of trend type. (DE = decreasing exponential; IE = increasing exponential; DA = decreasing asymptotic; IA = increasing asymptotic; DL = decreasing linear; IL = increasing linear).
There was an additional interaction between graph type and participant type, F(3,24)=4.95, p=.008, η2 = .38. Overall, the bias of the experienced participants was lower than that of the inexperienced participants. The differences were largest when histograms and suspended bar graphs were used to present the data and were quite small when line graphs and scatterplots were used. Post hoc tests revealed that the bias of experienced participants was significantly lower than that of inexperienced participants when histograms and suspended bar graphs were used to present the trends. When line graphs and scatterplots were used, the bias of experienced and inexperienced participants was similar, suggesting the importance of selecting graph types that maximize the ability levels of forecasters.

IV. GENERAL DISCUSSION

The Extrapolation of Future Points on Time Series Graphs

Experiment 1 examined the ability of experienced participants to predict future points on a time series graph. As expected, extrapolation accuracy was best when variability was low and sample size was high. Extrapolation accuracy was lowest for exponential curves and was higher when asymptotic trends were presented. Past research [13, 14, 15, 33] has shown that the prediction of sharp growth or decay is difficult, and the present results extended these findings to the case of graphically displayed trends, which were expected to provide favorable conditions for making such judgments. The extrapolation accuracy of experienced participants was highest when the data were presented on bar graphs,
scatterplots, and suspended bar graphs and was lower when line graphs were used. Performance of inexperienced participants did not depend on graph type, and accuracy was similar for all graph types. These findings partially confirm previous claims [36] that although line graphs may be more effective for discrimination of patterns or trends, other graphical formats may be optimal when the reader must predict specific data points. In these experiments, participants had two tasks: to determine the pattern of the underlying trend and, based on that pattern, to predict future points. Experienced participants [3] were aware of the types of trends that they would be presented with and, thus, their accuracy may not have been as dependent upon accurate trend discrimination. Inexperienced participants, on the other hand, were unaware of the trend types and their accuracy was dependent upon accurate discrimination. Previous studies [3, 36] have shown that line graphs effectively convey trend information, and it is possible that the accuracy of inexperienced participants was higher when line graphs were used because they made the underlying trend obvious. If extrapolation is indeed a two-step process, the optimal graph type may depend upon whether the reader is aware of and can base their predictions on the underlying trend.

Extrapolation Biases

Although much of the previous research found that, on extrapolation tasks, participants tend to underestimate future points on exponential trends, the current studies only partially confirmed these findings. Although inexperienced participants underestimated future curve points, experienced participants made overestimation errors. For all participants the estimation errors were greatest for exponential and linear trends and closer to zero for asymptotic trends. For inexperienced participants and in many real-world situations, deciding on the type of trend is difficult because there are, in principle, an unlimited number of possibilities. In Experiment 1, participants were familiar with the underlying trends and were aware that one of six possible trends was present, and this knowledge could have led participants to overestimate future data points. For example, participants were aware that exponential trends would be presented on one third of the trials. This awareness, paired with the knowledge that the last few data points on exponential trends represent a sharp increase or decrease, could have led participants to make judgments that were either too high for increasing trends or too low for decreasing trends. The exact mechanism by which such knowledge could have produced overestimations of change remains unclear, but it is possible that change-enhancing memory distortions play a role [29].

Underestimation and the Salience Hypothesis

There was one interesting exception to the finding of overestimation bias of experienced participants. When exponential trends with a small sample size were presented, participants underestimated the amount of change in future curve points. This finding contrasts with previous research on exponential trends [34, 35] and the results of Experiment 2 in which small sample sizes
reduced the underestimation and led to a greater perception of change. In an effort to explain the typical underestimation bias, Andreassen and Kraus [1] proposed the salience hypothesis. According to this hypothesis, smaller sample sizes result in larger point-to-point changes and hence make sharp increases more salient. However, it is apparent that this hypothesis cannot account for the bias of experienced participants with low sample sizes in the present research, given that the tendency in this case was to report less change rather than more. How best to account for these discrepancies is unclear, but it is possible that they stem from differences in the methods used to present the time series data to participants. In the previous research, participants typically viewed lists of numbers and gave their extrapolations as numerical values. In the present experiments, participants viewed graphs and made extrapolations by adjusting the last point on the graph.

General Conclusions

One of the goals of the studies reported here was to systematically examine how variability, sample size, and graph type affect discrimination and extrapolation accuracy. For both experienced and inexperienced participants, extrapolation accuracy was highest and bias was lowest when asymptotic trends were presented. Overall, more errors occurred when linear and exponential trends were presented. Performance was generally best when the level of underlying variability was low and sample size was high (with the exception of the lower accuracy of inexperienced participants when sample size was high). Interestingly, overall the performance of the experienced and inexperienced participants was similar.

ACKNOWLEDGMENT

I would like to thank D. Alan Stubbs and Laurence D. Smith for their invaluable work on this project. Without their guidance, the completion of these studies would have been difficult.

REFERENCES

[1] B. Andreassen and S. J. Kraus, “Judgmental extrapolation and the salience of change,” Journal of Forecasting, 9, 347-372, 1990.
[2] L. A. Best, L. D. Smith, D. A. Stubbs, and R. B. Frey, “Sensitivity and bias in graphical perception,” Paper presented at the Eastern Psychological Association Annual Meeting, Providence, RI, March 1999.
[3] L. A. Best, L. D. Smith, and D. A. Stubbs, “Perception of Linear and Nonlinear Trends: Using Slope and Curvature Information to Make Trend Discriminations,” Perceptual and Motor Skills, 104, 707-721.
[4] L. A. Best, L. D. Smith, R. B. Frey, and D. A. Stubbs, “Perception of graphs: Discriminability and variability,” Paper presented at the Maine Biological and Medical Sciences Symposium, Biddeford, ME, June 1999.
[5] J. M. Carroll, “Human-computer interaction: Psychology as a science of design,” Annual Review of Psychology, 48, 61-83, 1999.
[6] W. S. Cleveland, The elements of graphing data. Monterey, CA: Wadsworth, 1984.
[7] W. S. Cleveland, “A model for studying display methods of statistical graphics,” Journal of Computational and Graphical Statistics, 2, 323-343, 1993.
[8] H. M. Culbertson and R. D. Powers, “A study of graph comprehension difficulties,” Audio-Visual Communication Review, 7, 97-100, 1959.
[9] A. DeProspero and S. Cohen, “Inconsistent visual analyses of intrasubject data,” Journal of Applied Behavior Analysis, 12, 573-579, 1979.
[10] G. DeSanctis, “Computer graphics as decision aids: Directions for research,” Decision Sciences, 15, 463-487, 1984.
[11] M. J. Furlong and B. E. Wampold, “Intervention effects and relative variation as dimensions in experts' use of visual inference,” Journal of Applied Behavior Analysis, 15, 415-421, 1982.
[12] P. Goodwin and G. Wright, “Heuristics, biases and improvement strategies in judgmental time series,” Omega: The International Journal of Management Science, 22, 553-568, 1994.
[13] G. V. Jones, “Polynomial perception of exponential growth,” Perception and Psychophysics, 21, 197-200, 1977.
[14] G. V. Jones, “A generalized polynomial model for perception of exponential series,” Perception and Psychophysics, 25, 232-234, 1977.
[15] G. V. Jones, “Perception of inflation: Polynomial not exponential,” Perception and Psychophysics, 36, 485-487, 1984.
[16] G. Keren, “Cultural differences in the misperception of exponential growth,” Perception and Psychophysics, 34, 289-293, 1983.
[17] S. M. Kosslyn, Elements of graph design. New York: W. H. Freeman, 1994.
[18] M. J. Lawrence, R. H. Edmundson, and M. J. O’Connor, “An examination of the accuracy of judgmental extrapolation of time series,” International Journal of Forecasting, 1, 25-35, 1985.
[19] M. J. Lawrence, R. H. Edmundson, and M. J. O’Connor, “The accuracy of combining judgmental and statistical forecasts,” Management Science, 32, 1521-1532, 1986.
[20] M. Lawrence and S. Makridakis, “Factors affecting judgmental forecasts and confidence intervals,” Organizational Behavior and Human Decision Processes, 42, 172-187, 1989.
[21] S. Lewandowsky and I. Spence, “The perception of statistical graphs,” Sociological Methods and Research, 18, 200-242, 1989.
[22] A. J. MacKinnon and A. J. Wearing, “Feedback and the forecasting of exponential change,” Acta Psychologia, 76, 177-191, 1991.
[23] D. McDowall, R. McCleary, E. E. Meidinger, and R. A. Hay Jr., Interrupted time series analysis. Beverly Hills, CA: Sage, 1980.
[24] E. Mullett and Y. Cheminat, “Estimation of exponential expressions by high school students,” Contemporary Educational Psychology, 20, 451-456, 1995.
[25] K. J. Ottenbacher, “Reliability and accuracy of visually analyzing graphed data from single-subjects designs,” American Journal of Occupational Therapy, 40, 464-469, 1986.
[26] K. J. Ottenbacher, “Visual inspection of single-subject data: An empirical analysis,” Mental Retardation, 28, 283-290, 1990.
[27] H. S. Park, L. Marascuilo, and R. Gaylord-Ross, “Visual inspection and statistical analysis in single-case designs,” Journal of Experimental Education, 58, 311-320, 1990.
[28] N. R. Sanders and L. P. Ritzman, “The need for contextual and technical knowledge in judgmental forecasting,” Journal of Behavioral Decision Making, 5, 39-52, 1992.
[29] L. Silka, Intuitive judgments of change. New York: Springer-Verlag, 1989.
[30] H. Timmers and W. A. Wagenaar, “Inverse statistics and misperception of exponential growth,” Perception and Psychophysics, 21, 558-562, 1977.
[31] E. R. Tufte, The visual display of quantitative information. Cheshire, CT: Graphics Press, 1983.
[32] E. R. Tufte, Envisioning information. Cheshire, CT: Graphics Press, 1990.
[33] W. A. Wagenaar and S. D. Sagaria, “Misperception of exponential growth,” Perception and Psychophysics, 18, 416-422, 1975.
[34] W. A. Wagenaar and H. Timmers, “Extrapolation of exponential time series is not enhanced by having more data points,” Perception and Psychophysics, 24, 182-184, 1978.
[35] W. A. Wagenaar and H. Timmers, “The pond-and-duckweed problem: Three experiments on the misperception of exponential growth,” Acta Psychologia, 43, 239-251, 1979.
[36] A. Wallgren, B. Wallgren, R. Persson, W. Jorner, and J. Haaland, Graphing statistics & data: Creating better charts. London: Sage, 1996.
Resource Discovery and Selection for Large Scale Query Optimization in a Grid Environment Mahmoud El Samad, Abdelkader Hameurlain, Franck Morvan IRIT laboratory, Paul Sabatier University 118, route de Narbonne 31062 Toulouse Cedex 9, France { samad, hameur, morvan } @irit.fr Abstract- Current peer to peer (P2P) methods employing distributed hash tables (DHT) for resource discovery in a Grid environment suffer from these problems: (i) the risk of network congestion due to the sent messages for updating data on resources and (ii) the risk of the churn effect if a large number of nodes want to update their data at the same time. These problems form big challenges in a large scale dynamic Grid environment. In this paper we propose a method of resource discovery and selection for large scale query optimization in a Grid environment. The resource discovery extends the P2P system Pastry. DHT are used to save only static data on resources. We retrieve the dynamic properties of these resources during the resource selection by a monitoring tool. First, the originality of our approach is to delay the monitoring of resources to the phase of resource selection. This strategy will avoid a global monitoring of the system that is often employed in the current resource discovery systems. Second, the method is executed in a decentralized way in order to help the database optimizer to better discover and select resources.
I. INTRODUCTION
In recent years, the Grid has seen very important development for high-computation tasks (“computational grid”) and data storage (“data grid”). In both cases, resource scheduling has attracted the attention of the research and industrial communities. The main idea is to provide transparent access to heterogeneous resources distributed in a large-scale environment, where every user belongs to a virtual organization. The phases of resource discovery and selection are the first two phases of a Grid Scheduling Process (GSP), while the third is the execution phase [14]. To give an analogy, the phase of resource discovery can be compared to an Internet search where a number of potentially suitable references are returned, but these references might be unavailable due to network failures or outdated URLs. The second phase, resource selection, determines the actual suitability of a resource at the time of the selection process, as well as its agreement to participate in the forthcoming execution phase [10]. In most traditional systems, the information on the resources is statically defined in a centralized way. Initially, the Grid [8, 9] employed the Grid Resource Information Protocol (GRIP) for saving static data on resources (e.g. CPU speed, RAM capacity) in a centralized approach. Globus uses a directory service based on LDAP, named the Monitoring and Discovery System (MDS) [4], for saving static data on resources in a
centralized server called the Grid Index Information Server (GRIP). Unfortunately, these systems operate badly in a highly heterogeneous and dynamic large-scale environment such as a Grid, for the following reasons: (i) the centralization of the data on the resources can create a bottleneck due to the messages sent to a single point, and (ii) the information on the resources does not take into account state changes of the environment. Several solutions have been proposed for resource discovery in a Grid. The main motivation of these works is the fast discovery of the best subset of resources to execute a task. The resource discovery must consume a minimal quantity of bandwidth to avoid network congestion. A task executed in a Grid can require one or several machines with characteristics such as: CPU speed > 2 GHz, RAM > 512 MB, a hard disk space of 200 MB, and a CPU load of less than 25%. These attributes contain static data (e.g. CPU speed, RAM capacity) and dynamic data (e.g. CPU load). As an example, we cite the Resource Specification Language (RSL) [3] of Globus. In this system, a query is written in the form: search os-type=linux && 800MHz<=cpu-speed<=1000MHz. We classify the current methods for resource discovery in a Grid into two main categories: the methods based on the master-slave approach (or leader/worker approach) [13, 14] and the methods based on peer to peer (P2P) systems [1, 2, 12, 20]. In the master-slave approach, every master keeps up-to-date information for a set of resources belonging to a virtual organization. The problems with this approach arise when the number of slaves becomes very large [13], along with scalability issues [14]. P2P methods for resource discovery in a Grid are classified in a recent paper [20]. The principle of a part of these methods is the utilization of distributed hash tables (DHT). The DHT maintain static and dynamic data on resources. The main problems [1, 2, 20] when using such a technique are the following: (i) the risk of network congestion due to the messages sent for updating dynamic data on resources, and (ii) the risk of the churn effect if a large number of nodes want to update their data at the same time. These problems are big challenges in a large-scale dynamic Grid environment. It is important to note that most of these works address resource discovery for a task. In our case, a task is a relational operation (e.g. a join) of a database system. Resource discovery in a database system depends on data locality (e.g. a scan operation is placed where the relevant data
reside). Relational operations must be executed taking into account operand locations and the user location. The idea is to explore the neighboring nodes possessing the best characteristics to execute an operation. In this paper we are interested in resource discovery and selection for large scale query optimization in a Grid environment. These two phases are executed after the logical allocation of resources (i.e. the phase that assigns a set of processors with no precise identity to each operation). In the context of resource scheduling for parallel query processing on a Grid, we cite the works of [6, 7, 11, 17]. All these references suppose that data on resources can be found by a simple search function (e.g. getAvailableMachines() in [6], SearchAvailableSites() in [17]) with a centralized method. Centralized methods could generate a bottleneck because all messages are sent to a single point. So, it is important to define decentralized methods for resource discovery that are able to operate better in a large-scale environment. In this paper we propose a new method for resource discovery and selection for large scale query optimization in a Grid environment. The resource discovery extends the P2P system Pastry [15]. Contrary to current methods employing DHT to save static and dynamic data on resources, we use DHT to save only static data on resources. Each node in Pastry represents a machine with static properties. We retrieve the dynamic properties of these machines during the resource selection with a monitoring tool (e.g. NWS [21], the service presented in [5]). Resources are chosen according to operand locations and the user location. Our proposal guarantees the discovery of the best nodes nearest to the operands (this helps to use nodes connected with a high bandwidth to the node where the operand is present). First, the originality of our method is to move (or delay) the monitoring of resources to the phase of resource selection. This strategy avoids the global monitoring of the Grid that is often employed in current resource discovery systems. Second, the method is executed in a fully decentralized way in order to help the database optimizer to better discover and select resources. A database optimizer is responsible for generating an optimal execution plan (or one close to the optimal) for a query (e.g. an SQL query). The paper is organized as follows. Section 2 presents the state of the art on resource discovery in a Grid. In Section 3, we first briefly present the Pastry algorithm and then detail our proposal for resource discovery and selection for large scale query optimization. Section 4 discusses challenges, while Section 5 concludes and presents perspectives.

II. STATE OF THE ART

We classify current methods for resource discovery in a Grid into two main categories: the methods based on the master-slave approach (or leader/worker approach) [13, 14] and the methods based on P2P systems [1, 2, 12, 19]. As mentioned in the introduction, the problems of the master-slave approach arise when the number of slaves becomes very large [13], along with scalability issues [14].
Methods based on P2P systems for resource discovery are classified into two categories [20] depending on how peers are organized: structured methods (e.g. MAAN [1] based on Chord [18], and [2] based on Pastry [15]) and unstructured methods (e.g. [12, 19]). Structured P2P systems employ a rigid structure to interconnect peers and to organize file indices, while in unstructured methods each peer is randomly connected to a fixed number of other peers and there is no information about the location of files. Due to this lack of an underlying structure, the prevailing resource location method is “flooding”: a peer looking for a file issues a query which is broadcast in the network. Clearly flooding is not scalable since it creates a large volume of unnecessary traffic in the network [20]. Our proposal extends structured P2P systems. The basic functionality of structured P2P systems is to employ distributed hash tables (DHTs) for file location. A DHT is a table allowing storage and rapid access, in O(1), to pairs of the type <key, value>. This data structure provides basic operations like Lookup(key) (returning the value associated with the key in the DHT) and Store(key, value). DHTs improve on a classical hash table because they make it possible to run multiple operations in parallel. We quote Chord [18], one of the best known systems for file location in P2P environments. Chord provides support for just one operation: given a key, it maps the key onto a node. Another method named Pastry [15] is a variant of Chord (this method will be described in Section 3 since our method extends it). Many resource discovery techniques use a DHT to store static and dynamic data on resources. The main problems [1, 2, 20] in this case are: (i) the risk of network congestion due to the messages sent for updating dynamic data on resources and (ii) the risk of the churn effect if a large number of nodes want to update their data at the same time. We focused our study on works about resource discovery and selection for database systems. In this context, the works of [6, 7, 11, 17] are interested in resource scheduling for parallel query processing on Grids. [6, 7, 17] are interested in resource scheduling to enable partitioned parallelism for an operation. [11] proposes an algorithm for MOMD (Multiple Operation Multiple Data) parallelism of relational operations in a Grid. All these references suppose that data on resources (i.e. machines) can be found by a simple search function (e.g. getAvailableMachines() in [6], SearchAvailableSites() in [17]) with a centralized method. Indeed, centralized methods can generate a bottleneck because all messages are sent to a single point. It would therefore be very interesting to propose decentralized methods of resource discovery, extending P2P systems, for large scale query optimization. Such a method must have the following properties: (i) the ability to operate in a large-scale environment, (ii) the decentralization of the information on resources and (iii) the consideration of state changes of the environment. Our method is detailed in the next section.
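To make the Lookup/Store interface just described concrete, the following minimal Java sketch shows the two basic DHT operations. The class and method names are ours (not part of Chord or Pastry), and a single in-memory map stands in for the distributed overlay, purely for illustration.

  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;

  // Minimal illustration of the two basic DHT operations discussed above.
  // A real DHT (Chord, Pastry) distributes the table over many peers.
  public class SimpleDht {
      private final Map<String, String> table = new ConcurrentHashMap<>();

      // Store(key, value): publish a value under a key.
      public void store(String key, String value) {
          table.put(key, value);
      }

      // Lookup(key): return the value associated with the key, or null.
      public String lookup(String key) {
          return table.get(key);
      }

      public static void main(String[] args) {
          SimpleDht dht = new SimpleDht();
          dht.store("resource:node42", "os=linux;cpu=2GHz;ram=1024MB"); // static data only
          System.out.println(dht.lookup("resource:node42"));
      }
  }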
III. METHOD FOR RESOURCE DISCOVERY AND SELECTION

Our method for resource discovery extends the P2P system Pastry [15]. It is detailed in the following subsections.

A. Pastry [15]

Each node in the Pastry network has a unique identifier (nodeId) coded on 128 bits and is positioned on a virtual ring. When presented with a message and a key, a Pastry node efficiently routes the message to the node with a nodeId that is numerically closest to the key, among all currently live Pastry nodes. Each node in Pastry is responsible for the keys (the identifier of a data item is a key) numerically closest to its own identifier. In order to perform message routing in the network, each node maintains a routing table. It also maintains a set of neighbors on the identifier circle. Thus, each node in Pastry (Figure 1) is characterized by: an identifier, a set of keys closest to its identifier, a routing table and a set of neighbors (close to it in terms of IP proximity).
C. Encoding of static characteristics of a resource

In this section, we present how we encode the static data of a resource into a unique identifier (Figure 3). This identifier will be inserted into a virtual ring following the Pastry algorithm [15]. The characteristics are: the operating system (OS), the CPU time to perform a simple operation, the size in megabytes of RAM, the I/O bandwidth and the hard disk capacity. These characteristics are coded on n bits (we do not detail the number of bits for each characteristic). In order to complete the 128 bits, we use a hash function that takes as input the IP address of a machine and converts it into a code of 128 – n bits (a SHA-1 function can be used). Thus, we avoid having two resources with the same identifier. Our strategy differs from [2], where static and dynamic characteristics are encoded into one identifier.
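As an illustration of this encoding, the Java sketch below builds a 128-bit identifier from a few quantized static attributes followed by a SHA-1 hash of the IP address. The attribute list follows the description above, but the individual bit widths (and the choice n = 32) are assumptions made only for the example.

  import java.nio.charset.StandardCharsets;
  import java.security.MessageDigest;

  // Illustrative encoding of static resource characteristics into a 128-bit
  // identifier: the first n bits encode quantized static attributes, the
  // remaining 128 - n bits are taken from SHA-1(IP). Field widths are assumed.
  public class NodeIdEncoder {

      public static byte[] encode(int osCode, int cpuMhz, int ramMb,
                                  int ioBandwidthClass, int diskGb, String ip) throws Exception {
          // n = 32 bits of static attributes (assumed split: 4 + 8 + 8 + 4 + 8)
          long staticBits = 0;
          staticBits = (staticBits << 4) | (osCode & 0xF);
          staticBits = (staticBits << 8) | ((cpuMhz / 100) & 0xFF);   // coarse CPU class
          staticBits = (staticBits << 8) | ((ramMb / 64) & 0xFF);     // coarse RAM class
          staticBits = (staticBits << 4) | (ioBandwidthClass & 0xF);
          staticBits = (staticBits << 8) | (diskGb & 0xFF);

          byte[] id = new byte[16];                                   // 128 bits in total
          for (int i = 0; i < 4; i++) {                               // first 32 bits
              id[i] = (byte) (staticBits >>> (24 - 8 * i));
          }
          // Remaining 96 bits: prefix of SHA-1(IP), to keep identifiers unique.
          byte[] hash = MessageDigest.getInstance("SHA-1")
                                     .digest(ip.getBytes(StandardCharsets.UTF_8));
          System.arraycopy(hash, 0, id, 4, 12);
          return id;
      }
  }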
Figure 1. An example [15] of a node structure with Pastry.
Routing in Pastry: routing a key K to the node whose identifier is numerically closest to K, in a network of N nodes, is done in less than O(log N) steps when the network is operating normally. When a message with a key K is sent, each node receiving the message first determines the number m of digits its identifier has in common with the key K, then looks in its routing table for a node whose identifier has at least m + 1 digits in common with K, and transmits the message to it. This is routing by prefix correspondence; a minimal sketch of the prefix-match computation is given below.

B. The proposed method

The execution plan of a query is composed of a set of relational operations (e.g. selection, join), so we want to determine the optimal number of resources for each operation. We suppose that the logical allocation of resources is done before the resource discovery. Therefore, we propose the schema of Figure 2.
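The helper below is a small illustration of the prefix-correspondence routing described above (a sketch, not Pastry's actual implementation): it computes the number m of leading hexadecimal digits a node identifier shares with a key; the node then forwards the message to a routing-table entry sharing at least m + 1 digits with the key.

  // Sketch of the prefix-match computation used in Pastry-style routing.
  // Identifiers and keys are written as hexadecimal strings (e.g. 128-bit
  // identifiers as 32 hex digits).
  public class PrefixRouting {

      // Number of leading hex digits shared by the node identifier and the key.
      public static int sharedPrefixLength(String nodeId, String key) {
          int m = 0;
          int limit = Math.min(nodeId.length(), key.length());
          while (m < limit && nodeId.charAt(m) == key.charAt(m)) {
              m++;
          }
          return m;
      }

      public static void main(String[] args) {
          System.out.println(sharedPrefixLength("65a1fc04", "65b2ad11")); // prints 2
      }
  }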
Figure 2. Steps for query evaluation.
Figure 3. Encoding static characteristics of a resource.
D. Algorithm of resource discovery and selection

The idea is, for each operation Opi of a query Q, to discover the best set of resources with respect to the operand locations (i.e. the data necessary for the execution of an operation). The database optimizer determines the degree of parallelism di. It gives an approximate value for di because, in a Grid environment, determining the correct degree of parallelism is more complicated than in a parallel system, due to network and resource heterogeneity. The problem of resource scheduling on the Grid is actually more complicated than choosing the correct degree of parallelism: the database optimizer should decide not only how many machines should be used in total, but exactly which machines they are, and which parts of the query plan each machine is allocated [6, 15]. In addition, we assume that the database optimizer provides information on the preferred characteristics Cari for the di machines. This is done according to the operand profiles of an operation Opi.
The algorithm of resource discovery and selection for a query Q is described in an abstract way (Figure 4). Q is composed of a set of n operations. We suppose that the logical allocation is done before executing the algorithm of Figure 4. During the phase of resource discovery, we discover di*X machines, where X is a factor used to increase the chance of finding the least-loaded machines. For the phase of resource selection, we use a monitoring service (e.g. NWS [21]) to select the best di machines among those issued from the phase of resource discovery (i.e. the di*X machines). In the following, we detail the algorithm of resource discovery.

Resource discovery and selection for a query Q
  // n: the number of operations of Q
  For (i = 0; i < n; i++) {
    discover(Opi, di, X, Cari);            // resource discovery for operation Opi (Figure 5)
    selectBest(Opi, di, setOfDMachine);    // resource selection for operation Opi (Figure 6)
  }
Figure 4 Algorithm of resource discovery and selection for a query
E. Algorithm of resource discovery (Figure 5)

The method discover(Opi, di, X, Cari) forms the basis of our algorithm. The goal is to discover a set of machines with good characteristics to carry out an operation Opi. We assume that the database optimizer possesses two methods: getCarMachine(Opi) and getNbOptimalMachine(Opi, Cari). The first provides the preferable characteristics Cari for the execution of an operation Opi: machines with a high disk I/O rate are preferred for retrieving data, machines with high connection speeds are preferred when the query cost is network-bound, and machines with high CPU capacity are selected for the remaining, CPU-intensive operations [6]. The second determines the optimal number of machines di to execute Opi. During the initialization, we explore the neighbor nodes, named setOfDNeighbor, possessing the characteristics closest to Cari in order to better execute Opi. For this we have the method getNeighbor(Cari). This is feasible because Pastry provides a field for the neighboring nodes (see the Pastry subsection); getNeighbor(Cari) returns the best neighbors (we choose to take at least 4 neighbors among 8; see the example of a Pastry node in Figure 1). Hence, the algorithm is implemented taking into account the operand locations of an operation Opi. This choice guarantees discovering the best nodes nearest to an operand (in order to use nodes with a high bandwidth to the operand). If the operation does not require any special characteristics, getNeighbor(Cari) returns all the neighbors of the current node.
The loop discovers (within setOfDNeighbor) di * X machines for the operation Opi. We use the variable setOfDNeighborOfNeighbor to save the current neighbors of neighbors. At the end of the algorithm, each Opi possesses a set of discovered machines, setOfDMachine, having good characteristics to perform Opi. Pastry is employed in our system because of these advantages: (i) it easily identifies a resource with static characteristics in a decentralized way, and (ii) it gives rapid access to all the neighbors (and their characteristics) of a given node.

F. Algorithm of resource selection (Figure 6)

The resource selection algorithm determines for each operation Opi the best di resources among the di * X discovered resources. The method getStateMachine(machine) retrieves the current state of a machine by using a monitoring tool (e.g. NWS [21]). The returned state object contains all the dynamic characteristics (i.e. CPU load, I/O load and bandwidth load) of each discovered machine. A very important parameter is the bandwidth between the machine where the operands reside and the machine where an operation Opi will be executed. This parameter plays a fundamental role because data will be repartitioned (or sent) to the selected machines. Finally, the method selectBestDiMachines() selects the best di machines according to the state of the discovered machines, using a sort function.
Algorithm for resource discovery for an operation Opi

method: discover(Opi, di, X, Cari)
  Caract Cari = getCarMachine(Opi);                 // Get the characteristics to execute Opi
  int di = getNbOptimalMachine(Opi, Cari);          // Get the optimal number of resources to execute Opi
  // Initialization
  int cptDMachine = 0;                              // counts the number of discovered machines
  setOfDNeighbor = getNeighbor(Cari);               // Get the neighbors of the current node using Pastry
  setOfDMachine.add(machineIni);
  setOfDMachine.add(setOfDNeighbor);
  cptDMachine = cptDMachine + setOfDMachine.size();
  While (cptDMachine < di*X) {
    tempNeighbor = new tempNeighbor;
    For (j = 0; j < setOfDNeighbor.size(); j++) {
      setOfDNeighborOfNeighbor = setOfDNeighbor[j].getNeighbor(Cari);
      setOfDMachine.add(setOfDNeighborOfNeighbor);  // We save the newly discovered neighbors
      tempNeighbor.add(setOfDNeighborOfNeighbor);
      cptDMachine = cptDMachine + setOfDNeighborOfNeighbor.size();
    }
    setOfDNeighbor = tempNeighbor;
  }
  // setOfDMachine contains di*X discovered machines
Figure 5 Algorithm of resource discovery for an operation
Algorithm for resource selection for an operation Opi

method: selectBest(Opi, di, setOfDMachine)
  For (i = 0; i < di*X; i++) {
    // Invocation of monitoring information
    state[i] = getStateMachine(setOfDMachine[i]);
  }
  // selectBestDiMachines calls another sort method
  setOfSMachine = selectBestDiMachines(setOfDMachine, state);
Figure 6 Algorithm of resource selection for an operation
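The selection step of Figure 6 can be read as "monitor, score, sort, keep the best di". The Java sketch below illustrates this idea under assumptions of our own: the MachineState fields and the scoring formula are illustrative, not those of the paper or of NWS.

  import java.util.ArrayList;
  import java.util.Comparator;
  import java.util.List;

  // Illustrative resource selection: rank the di*X discovered machines by the
  // dynamic state returned by a monitoring tool and keep the best di of them.
  public class ResourceSelection {

      // Hypothetical snapshot of the dynamic characteristics of one machine.
      static class MachineState {
          String host;
          double cpuLoad;                    // 0.0 .. 1.0
          double ioLoad;                     // 0.0 .. 1.0
          double bandwidthToOperandMbps;

          MachineState(String host, double cpuLoad, double ioLoad, double bw) {
              this.host = host; this.cpuLoad = cpuLoad; this.ioLoad = ioLoad;
              this.bandwidthToOperandMbps = bw;
          }

          // Higher is better: favor high bandwidth to the operand and low load.
          double score() {
              return bandwidthToOperandMbps * (1.0 - cpuLoad) * (1.0 - ioLoad);
          }
      }

      static List<MachineState> selectBest(List<MachineState> discovered, int di) {
          List<MachineState> sorted = new ArrayList<>(discovered);
          sorted.sort(Comparator.comparingDouble((MachineState m) -> m.score()).reversed());
          return new ArrayList<>(sorted.subList(0, Math.min(di, sorted.size())));
      }
  }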
IV. DISCUSSIONS AND CHALLENGES
The proposal has the following advantages: (i) ease of maintenance of the DHT, because it contains only static data, and (ii) optimization of the global monitoring cost, because monitoring is done in the phase of resource selection, only for the set of discovered machines. However, a challenge is to estimate the X factor, which is essential given the dynamic nature of Grid resources. The idea is to increase the number of resources to discover in order to have a greater chance of finding the least-loaded machines. The X factor is set by the system administrator according to the state of the virtual organizations. Another solution is to assign a value of X dynamically according to a global function that monitors the Grid state. Many experiments and simulations must be done before assigning a value to the X factor, so we prefer not to fix a value before implementing our method.

V. CONCLUSIONS AND PERSPECTIVES

In this paper we proposed a new method of resource discovery and selection for large scale query optimization in a Grid environment. Our method possesses the following properties: (i) the ability to operate in a large-scale environment, (ii) the decentralization of the information on resources (to avoid the bottleneck) and (iii) the consideration of state changes of the environment. We are currently implementing the proposed method and have already installed and connected the monitoring tool NWS [21] to our platform. The perspectives of our work are:
− The performance evaluation of our proposal.
− The comparison of our method with DHT methods (storing static and dynamic data).
− The determination of the effective threshold of our method. It would be interesting to know in which cases our strategy is efficient, i.e. to determine for which metric values (e.g. CPU load, I/O bandwidth, memory load) our method is better than others.

REFERENCES
[1] Min Cai, Martin Frank, Jinbo Chen, Pedro Szekely, “MAAN: A Multi-Attribute Addressable Network for Grid Information Services”, Proceedings of the Fourth International Workshop on Grid Computing, Nov. 2003, pp. 184-191.
[2] Adeep S. Cheema, Moosa Muhammad, and Indranil Gupta, “Peer-to-peer Discovery of Computational Resources for Grid Applications”, Grid Computing Workshop 2005, IEEE, 2005.
[3] K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith, and S. Tuecke, “A resource management architecture for metacomputing systems”, Lecture Notes in Computer Science, 1459, 1998.
[4] K. Czajkowski, S. Fitzgerald, I. Foster, and C. Kesselman, “Grid information services for distributed resource sharing”, in 10th IEEE Symp. on High Performance Distributed Computing, 2001.
[5] M. El Samad, J. Gossa, F. Morvan, A. Hameurlain, J-M. Pierson, L. Brunie, “A monitoring service for large scale dynamic query optimization in a Grid environment”, International Journal of Web and Grid Services (IJWGS), to appear.
[6] A. Gounaris, R. Sakellariou, N. W. Paton, A. A. Fernandes, “Resource Scheduling for Parallel Query Processing on Computational Grids”, Proceedings of the Fifth IEEE/ACM International Workshop on Grid Computing (GRID’04).
[7] V. F. V. Da Silva, M. L. Dutra, F. Porto, B. Schulze, A. C. Barbosa and J. C. de Oliveira, “An adaptive parallel query processing middleware for the Grid”, Concurrency and Computation: Practice and Experience, 2006; 18:621-634, Wiley InterScience.
[8] I. Foster, C. Kesselman and S. Tuecke, “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”, International Journal on Supercomputer Applications, 15(3), 2001.
[9] I. Foster, “The Grid: A New Infrastructure for 21st Century Science”, Physics Today, Vol. 55 #2, p. 42, 2002.
[10] G. Kakarontzas, I. K. Savvas, “Agent-Based Resource Discovery and Selection for Dynamic Grids”, 15th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE’06), pp. 195-200.
[11] S. Liu, H. A. Karimi, “Grid query optimizer to improve query processing in grids”, Future Generation Computer Systems (2007), doi:10.1016/j.future.2007.06.003.
[12] M. Marzolla, M. Mordacchini, S. Orlando, “Resource Discovery in a Dynamic Grid Environment”, Proceedings of the 16th International Workshop on Database and Expert Systems Applications (DEXA’05), IEEE, 2005.
[13] A. Padmanabhan, S. Wang, S. Ghosh, and R. Briggs, “A Self-Organized Grouping (SOG) Method for Efficient Grid Resource Discovery”, Grid Computing Workshop 2005, IEEE.
[14] T. G. Ramos, A. C. M. A. de Melo, “An Extensible Resource Discovery Mechanism for Grid Computing Environments”, Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGRID’06), pp. 115-122.
[15] A. Rowstron, P. Druschel, “Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems”, Proc. of the 18th IFIP/ACM International Conference on Distributed Systems Platforms (Middleware 2001), Heidelberg, Germany, November 2001.
[16] J. M. Schopf, “Ten actions when Grid scheduling: the user as a Grid scheduler”, in Grid Resource Management: State of the Art and Future Trends, Kluwer Academic Publishers, Norwell, MA, USA, 2004, pp. 15-23.
[17] K. M. Soe, A. A. Nwe, T. N. Aung, T. T. Naing, N. L. Thein, “Efficient Scheduling of Resources for Parallel Query Processing on Grid-based Architecture”, APSITT 2005 Proceedings, 6th Asia-Pacific Symposium, IEEE, 2005.
[18] I. Stoica, R. Morris, D. Karger, M. F. Kaashoek, H. Balakrishnan, “Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications”, Proceedings of the 2001 ACM SIGCOMM Conference.
[19] D. Talia, P. Trunfio, “Peer-to-Peer protocols and Grid services for resource discovery on Grids”, in: L. Grandinetti (Ed.), Grid Computing: The New Frontier of High Performance Computing, Advances in Parallel Computing, vol. 14, Elsevier Science, 2005.
[20] P. Trunfio, et al., “Peer-to-Peer resource discovery in Grids: Models and systems”, Future Generation Computer Systems (2007), doi:10.1016/j.future.2006.12.003.
[21] R. Wolski, N. Spring, J. Hayes, “The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing”, Journal of Future Generation Computing Systems, Volume 15, Numbers 5-6, pp. 757-768, October 1999.
Protecting Medical Images with Biometric Information Marcelo Fornazin Unesp – Sao Paulo State University Av. Luiz E.C. Coube 14-01 Bauru – SP – Brazil +551431036079 [email protected]
Danilo B.S. Netto Jr. Unesp – Sao Paulo State University Av. Luiz E.C. Coube 14-01 Bauru – SP – Brazil +551431036079 [email protected]
Abstract- Medical images are private to doctor and patient. Digital medical images should be protected against unauthorized viewers. One way to protect digital medical images is using cryptography to encrypt the images. This paper proposes a method for encrypting medical images with a traditional symmetric cryptosystem. We use biometrics to protect the cryptographic key. Both encrypted image and cryptographic key can be transmitted over public networks with security and only the person that owns the biometrics information used in key protection can decrypt the medical image.
I. INTRODUCTION

Medical images can be stored and handled in digital format through a Picture Archiving and Communication System (PACS). A PACS is an integrated management system for archiving and distributing medical image data [1]. Images stored in a PACS should be private and available only to doctor and patient. Because of that, these images should be handled in a secure way, i.e., they should be stored and transmitted securely to prevent unauthorized access. The Digital Imaging and Communications in Medicine (DICOM) standard Part 15 (PS 3.15-2007) provides a standard for secure communication and digital signatures [2]. It provides mechanisms that can be used to implement security policies with regard to the interchange of DICOM objects between Application Entities. The DICOM standard defines four security profiles: secure use profiles, secure transport connection profiles, digital signature profiles and media storage secure profiles. The Health Insurance Portability and Accountability Act (HIPAA) [3] provides a conceptual framework for healthcare data security and integrity and sets out strict and significant federal penalties for non-compliance. However, the guidelines do not mandate specific technical solutions; rather, there is a repeated emphasis on the need for scalable compliance solutions appropriate to the variety of clinical scenarios covered by the HIPAA language. HIPAA currently addresses four key areas: electronic transactions and code sets, privacy, unique identifiers, and security. One way to secure medical images is to store and transmit them with cryptography. Medical images can be encrypted for storage and transmission and decrypted later by the doctor or patient. Also, it is possible to verify the
Marcos Antonio Cavenaghi Unesp – Sao Paulo State University Av. Luiz E.C. Coube 14-01 Bauru – SP – Brazil +551431036079 [email protected]
Aparecido N. Marana Unesp – Sao Paulo State University Av. Luiz E.C. Coube 14-01 Bauru – SP – Brazil +551431036079 [email protected]
authenticity of a medical image sent over a public network using digital signature schemes. Recent works have proposed methods to secure medical images with public key cryptography [4], which will be described in Section II.A. Traditional cryptosystems use cryptographic keys to protect data. Cryptographic keys are large random numbers and they need to be stored in protected environments, like smartcards or protected memories, and released by some authentication method; otherwise they can be stolen, lost or disclosed. Another method to protect cryptographic keys is to store them protected with some biometric information. Section II.C describes a method for biometric protection of cryptographic keys. Biometrics provides secure storage and authentication for cryptographic keys. It is also possible to provide authenticity of a medical image with biometrics. This paper presents an approach to medical image protection based on private key cryptography and biometrics.

II. BACKGROUND

This section presents the main concepts of cryptography and biometrics related to the new proposed method for medical image protection.

A. Cryptography

Cryptography is a method of storing and transmitting data with security. It uses code systems to scramble data so that the data cannot be understood by anyone other than the intended party [5]. Cryptography has two phases: encryption and decryption. Encryption transforms plain text (data that can be understood) into cipher text (data that has no meaning). Decryption recovers the plain text from the cipher text. Cryptography uses keys to encrypt and decrypt data, so only a person who has access to the key can decrypt data encrypted with that key. Due to this, encrypted data can be stored and transmitted with security. If an unauthorized person accesses encrypted data, he will not be able to understand the information contained in it. Cryptosystems are the mechanisms that perform encryption and decryption. Nowadays, there exist some secure cryptosystems. Two examples of secure cryptosystems are AES (Advanced
Encryption Standard) [6] and RSA (Ronald Rivest, Adi Shamir and Leonard Adleman) [7]. Private-key cryptography (symmetric cryptography) is a cryptographic model in which a single key is used to encrypt and decrypt data. Data encrypted with such a key can only be decrypted with the same key, so the key must be stored in a secure way: if an unauthorized person accesses the key, he can decrypt the data encrypted with it [5]. Public-key cryptography (asymmetric cryptography) is a cryptographic model in which there is a pair of keys: a public key and a private key. Data encrypted with the public key can only be decrypted with the corresponding private key. In this model, a person called Alice generates a pair of keys; she keeps the private key and distributes the public key to people who want to send data to her. Bob, who received the public key from Alice, can encrypt a message and send it to Alice. Once encrypted, the message can be decrypted only with the private key, so only Alice can decrypt it. If someone obtains the public key and the encrypted message, he cannot decrypt the message with the public key. Public-key cryptography can be used for data encryption or authenticity verification [5]. Symmetric cryptography has better performance for encryption and decryption, but has the disadvantage of key protection and key management: the key should be stored and transmitted in a secure way. Public key cryptography has the advantage of the key pair, where just the private key should be stored in a secure way and it is not necessary to transmit or share it. Many applications use asymmetric cryptography for key exchange and symmetric cryptography for data protection; in this way they can protect data with good performance and transmit symmetric keys in a secure way. Recent works have applied cryptography to secure medical images. In [4], Cao et al. used public key cryptography to protect medical images and verify their authenticity. Image authenticity is verified by a digital signature, where the sender generates a signature of the medical image with his private key and everyone can verify the authenticity of the medical image with the sender’s public key. Image security is provided by encrypting the medical image and signature with the receiver’s public key, so they can be decrypted only by the receiver, the owner of the private key. This work presents a method for medical image security, but the method has some performance issues and does not address the protection of the receiver’s private key. Other works address performance issues for medical images [8] and medical image authenticity verification [9].

B. Biometrics

Biometrics is the study of automated methods for uniquely recognizing humans based upon one or more intrinsic physical or behavioral traits, such as fingerprints, iris, retina, face, voice, signature, gait, etc. [9]. These traits are called biometric features.
Biometric features must have the following desirable properties: universality, uniqueness, permanence, collectability, performance, acceptability and circumvention. These properties match different biometric features to specific applications [9]. This makes biometrics more secure and convenient than traditional methods of authentication based on passwords. As a biometric feature, the fingerprint has good ratings on most of the biometric properties, which makes it a good candidate to be employed in automated authentication systems. Fingerprints are also broadly used due to the low cost of fingerprint sensors [11]. Biometrics can be used to protect other information. The usage of biometrics to protect data gives rise to biometric cryptosystems: cryptographic systems that use biometric keys to encrypt data. Besides the security of the encrypted data, they have to address additional issues, for example biometric signal variability [12].

C. Fuzzy Vault

The Fuzzy Vault [13] is a cryptographic scheme that protects a password and biometric information in a combination that does not reveal any detail about them. The scheme also addresses issues related to biometric comparison, such as signal variability and order variation of the signal elements. To overcome these issues, the usage of error correcting codes, particularly Reed-Solomon codes, is proposed. In the cryptographic construction named FVS, or simply Fuzzy Vault, a secret can be locked in a vault by a set A. Another set B, which substantially overlaps A, can unlock the vault and recover the secret successfully. Such an error-tolerant cryptographic algorithm can be useful in many circumstances in which exactitude is a drawback, such as secret protection using biometric data and other types of noisy data, biometric template protection, and so on. The Juels and Sudan [13] FVS defines the LOCK and UNLOCK algorithms, which are used to protect the password and the biometric information.

C1. Lock Algorithm

In the LOCK algorithm, the secret k is embedded in a polynomial p over x; for example, each element of k is a coefficient of p. So, if n is the length of k, p has degree n-1. Then the set A is used to evaluate p over the elements of A. This step generates a set G of points called genuine points. Then a set C of chaff points that do not lie on p is generated. G and C are joined and scrambled, generating the set V, which represents the vault encoding the secret k and the biometric information A. Fig. 1 shows the points generated by the Fuzzy Vault lock algorithm: (a) the polynomial p encodes the secret k; (b) the genuine points (black) are projected from the evaluation of the elements of A over p, plus the chaff points (white) added to the set; (c) the genuine and chaff points form the vault V.
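The sketch below illustrates the structure of the LOCK step just described. For readability the arithmetic is done modulo the prime 65537 rather than in the field GF(2^16) used by the cited implementations; the class is ours and is only an illustration of the construction, not FVLib code.

  import java.security.SecureRandom;
  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.HashSet;
  import java.util.List;
  import java.util.Set;

  // Illustrative LOCK step of a fuzzy vault (prime-field simplification).
  public class FuzzyVaultLock {
      static final long P = 65537;                    // prime modulus for this sketch
      static final SecureRandom RNG = new SecureRandom();

      static class Point { final long x, y; Point(long x, long y) { this.x = x; this.y = y; } }

      // Evaluate the polynomial whose coefficients encode the secret k (Horner's rule).
      static long evalPoly(long[] coeff, long x) {
          long y = 0;
          for (int i = coeff.length - 1; i >= 0; i--) {
              y = (y * x + coeff[i]) % P;
          }
          return y;
      }

      // Lock: genuine points from the biometric set A, plus chaff points that
      // do not lie on the polynomial, all scrambled together into the vault V.
      static List<Point> lock(long[] secretCoeff, long[] setA, int numChaff) {
          Set<Long> usedX = new HashSet<>();
          List<Point> vault = new ArrayList<>();
          for (long a : setA) {                                  // genuine points G
              vault.add(new Point(a, evalPoly(secretCoeff, a)));
              usedX.add(a);
          }
          while (vault.size() < setA.length + numChaff) {        // chaff points C
              long cx = Math.floorMod(RNG.nextLong(), P);
              if (usedX.contains(cx)) continue;
              long cy = Math.floorMod(RNG.nextLong(), P);
              if (cy == evalPoly(secretCoeff, cx)) continue;     // must not lie on p
              vault.add(new Point(cx, cy));
              usedX.add(cx);
          }
          Collections.shuffle(vault, RNG);                       // scramble G and C into V
          return vault;
      }
  }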
C2. Unlock Algorithm
To unlock V, a set B is acquired. It is used to detect candidate points in V: candidate points are found by taking the elements of V whose projection on x is equal to an element of B. The candidate points are treated as a corrupted message, and an error correction algorithm is used to reconstruct the polynomial p from the set of candidate points. The algorithm proposed by Juels and Sudan is Peterson-Berlekamp-Massey [13]. If the candidate points overlap sufficiently with A, the error correction algorithm reconstructs p; otherwise it returns null.
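As a companion to the LOCK sketch above (and reusing its Point class and prime-field simplification), the following sketch shows the candidate-point filtering and a Lagrange interpolation that recovers a polynomial from degree+1 candidate points. A real decoder tries many candidate subsets or runs an error-correcting decoder and validates each result; only the two basic building blocks are shown here.

  import java.util.ArrayList;
  import java.util.List;
  import java.util.Set;

  // Illustrative UNLOCK building blocks of a fuzzy vault (prime field P).
  public class FuzzyVaultUnlock {
      static final long P = 65537;

      static long modPow(long b, long e, long m) {
          long r = 1; b %= m;
          while (e > 0) { if ((e & 1) == 1) r = r * b % m; b = b * b % m; e >>= 1; }
          return r;
      }
      static long inv(long a) { return modPow(Math.floorMod(a, P), P - 2, P); } // Fermat, P prime

      // Keep the vault points whose abscissa appears in the query set B.
      static List<FuzzyVaultLock.Point> candidates(List<FuzzyVaultLock.Point> vault, Set<Long> setB) {
          List<FuzzyVaultLock.Point> c = new ArrayList<>();
          for (FuzzyVaultLock.Point p : vault) if (setB.contains(p.x)) c.add(p);
          return c;
      }

      // Lagrange interpolation: coefficients of the polynomial of degree
      // pts.size()-1 through the given points (distinct x assumed), modulo P.
      static long[] interpolate(List<FuzzyVaultLock.Point> pts) {
          int k = pts.size();
          long[] coeff = new long[k];
          for (int i = 0; i < k; i++) {
              long[] basis = {1};                              // running numerator polynomial
              long denom = 1;
              for (int j = 0; j < k; j++) {
                  if (j == i) continue;
                  basis = mulByLinear(basis, Math.floorMod(-pts.get(j).x, P)); // times (x - xj)
                  denom = denom * Math.floorMod(pts.get(i).x - pts.get(j).x, P) % P;
              }
              long scale = pts.get(i).y % P * inv(denom) % P;
              for (int d = 0; d < basis.length; d++)
                  coeff[d] = (coeff[d] + basis[d] * scale) % P;
          }
          return coeff;
      }

      // Multiply polynomial a(x) by (x + c) modulo P.
      static long[] mulByLinear(long[] a, long c) {
          long[] r = new long[a.length + 1];
          for (int d = 0; d < a.length; d++) {
              r[d] = (r[d] + a[d] * c) % P;
              r[d + 1] = (r[d + 1] + a[d]) % P;
          }
          return r;
      }
  }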
Fig. 2. Fuzzy Vault Encoding [14].
Fig. 1. a) Polynomial p from secret k; b) Genuine points (black) and chaff points (white); c) All points from vault V.
C3. Security of Fuzzy Vault

If the length of C is large enough, only someone who knows k or a significant number of elements of A can unlock the vault V. If somebody with no knowledge of k or A tries to recover k using a brute-force search on V, the computational time required makes it infeasible. Fig. 1 (c) shows that without any information from A or k it is infeasible to detect the genuine points in the vault V.

III. FUZZY VAULT FOR FINGERPRINTS

Previous works propose the implementation of the Fuzzy Vault using fingerprint minutiae data [14]. They use a concatenation of the x and y components of the minutiae to define the set A. Uludag et al. [14] proposed an implementation of the Fuzzy Vault for Fingerprints based on the first implementation of the Fuzzy Vault performed by Clancy et al. [15]. Their work uses the Galois field GF(2^16), which has 2^16 elements [14]. They use a polynomial of degree 8, and each coefficient has 16 bits; so they have 9 coefficients of 16 bits, which gives a 144-bit string. The first 8 coefficients of p are the secret k that will be encoded, and the last coefficient is a CRC that is used to verify the correctness of k. The need for a CRC is explained in the next paragraph. A diagram of the encoding process proposed by Uludag and Jain is presented in Fig. 2.
In the decode process, Uludag et al. use successive Lagrange interpolations over sets of 9 elements taken from the candidate points to unlock the vault V, instead of using Reed-Solomon decoding over all the elements of the candidate set. The Fuzzy Vault decoding proposed by Uludag et al. is presented in Fig. 3. To enable Lagrange interpolation, it is necessary to use a method to verify the correctness of the decoded polynomial. This method is a CRC appended to the secret. This change has been made because the computational time of Lagrange interpolation from subsets of B is the same as that of Reed-Solomon decoding from B [15]. Uludag and Jain have also appended some alignment information for fingerprints, called Helper Data. Helper Data are Orientation Field Flow Curves (OFFC) [11], whose coefficients represent the orientation of the fingerprint curves, helping the fingerprint alignment process during the decoding of the vault. It is important to observe that Helper Data do not reveal any information about the minutiae data encoded in the vault. Uludag and Jain [16] obtained 84.5% successful decoding on the Fingerprint Verification Competition 2002 database [17].
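The role of the appended CRC described above is simply to tell a correct candidate decoding from an incorrect one. The sketch below illustrates that check; note that it uses the standard CRC-32 from java.util.zip only to keep the example short, whereas the cited work appends a 16-bit CRC as the last polynomial coefficient.

  import java.nio.ByteBuffer;
  import java.util.zip.CRC32;

  // Illustration of validating a candidate decoding with an appended checksum.
  public class SecretCrcCheck {

      static long crcOf(byte[] secret) {
          CRC32 crc = new CRC32();
          crc.update(secret);
          return crc.getValue();
      }

      // Build the value to embed in the vault: secret bytes followed by the CRC.
      static byte[] appendCrc(byte[] secret) {
          ByteBuffer buf = ByteBuffer.allocate(secret.length + Long.BYTES);
          buf.put(secret).putLong(crcOf(secret));
          return buf.array();
      }

      // Accept a decoded candidate only if its embedded CRC is consistent.
      static boolean isValid(byte[] candidate) {
          if (candidate.length < Long.BYTES) return false;
          ByteBuffer buf = ByteBuffer.wrap(candidate);
          byte[] secret = new byte[candidate.length - Long.BYTES];
          buf.get(secret);
          long stored = buf.getLong();
          return stored == crcOf(secret);
      }
  }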
Fig. 3. Fuzzy Vault Decoding [14].
IV. PROPOSED METHOD

In this paper we propose a method that uses private key cryptography (symmetric cryptography) to encrypt medical images and protects the symmetric key with biometric information to avoid unauthorized access. The proposed method encodes the medical image with the AES algorithm and uses an implementation of the Fuzzy Vault for Fingerprints to encode the cryptographic key used in the AES encoding. Then, it is possible to safely transmit and store both the medical image and the AES key. With this method, only the person who has the same fingerprint used in the encoding process can decode the image.

A. Fuzzy Vault Lib (FVLib)

In order to implement this, we developed FVLib, a library that implements the FVS, allowing it to be integrated with real-world security applications. FVLib was implemented in C++ (Microsoft Visual C++), but a Java implementation is in progress, which will allow the use of FVLib with smartcard applications. These applications will allow humans to be authenticated based on their biometric characteristics, instead of a single password. In order to assess the proposed method, a system which uses FVLib was implemented. In this system, fingerprint characteristics are extracted using a U.are.U 4000 scanner and passed to FVLib as parameters. If the fingerprint matches the template fingerprint, the corresponding stored AES key is returned. The Helper Data support in FVLib is still in progress; the next release of FVLib will support Fuzzy Vault encoding and decoding with Helper Data.
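The AES part of the proposed method can be illustrated with the standard javax.crypto API, as in the sketch below: a 128-bit AES key is generated, the image file is encrypted with it, and the raw key bytes are handed to the fuzzy vault step. The mode/padding choice and the random IV are our assumptions for the example; the paper's own C++ implementation is not reproduced here.

  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.security.SecureRandom;
  import javax.crypto.Cipher;
  import javax.crypto.KeyGenerator;
  import javax.crypto.SecretKey;
  import javax.crypto.spec.IvParameterSpec;
  import javax.crypto.spec.SecretKeySpec;

  // Sketch of the AES encoding/decoding of a medical image file.
  public class MedicalImageEncryption {

      // ivOut must be a caller-supplied 16-byte array; it receives the random IV.
      public static byte[] encryptImage(Path imageFile, Path outputFile, byte[] ivOut) throws Exception {
          KeyGenerator kg = KeyGenerator.getInstance("AES");
          kg.init(128);                                   // step 1: generate a 128-bit AES key
          SecretKey key = kg.generateKey();

          new SecureRandom().nextBytes(ivOut);            // fresh IV for CBC mode
          Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
          cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(ivOut));

          byte[] plain = Files.readAllBytes(imageFile);   // step 2: encrypt the image
          Files.write(outputFile, cipher.doFinal(plain));

          return key.getEncoded();                        // step 3: key bytes go to the fuzzy vault
      }

      public static void decryptImage(Path encryptedFile, Path outputFile,
                                      byte[] keyBytes, byte[] iv) throws Exception {
          Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
          cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(keyBytes, "AES"),
                      new IvParameterSpec(iv));
          Files.write(outputFile, cipher.doFinal(Files.readAllBytes(encryptedFile)));
      }
  }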
V. MEDICAL IMAGES PROTECTION USING FVLib

The implemented system integrates an AES cryptosystem and the Fuzzy Vault for Fingerprints to protect medical images with the biometric features of the doctor or the patient. Odd situations, such as the death of the doctor or the patient, can be overcome by collecting both the doctor's and the patient's biometric features to protect the images; in this case, either the doctor or the patient can unlock the vault and retrieve it. This can be easily implemented to give some redundancy to the method. The system has two phases: encoding and decoding. The encoding phase protects the medical image with 128-bit AES cryptography and protects the AES key with biometric features. Then, both the encrypted image and the protected key can be securely stored. The decoding phase retrieves the medical image after successful decoding of the AES key, retrieved by the use of the biometric features. The encoding is carried out in the following steps:
1. An AES key is generated to encrypt the medical image;
2. The medical image is encrypted;
3. The AES key is protected with the Fuzzy Vault using a fingerprint of the person who will access the image (the doctor or the patient);
4. The encrypted image and the protected AES key are ready to be stored or transmitted securely.
Fig. 4 shows a diagram of the encoding process of the implemented system. The decoding is carried out in the following steps:
1. The AES key is decoded from the Fuzzy Vault with the fingerprint information of the person who intends to access the image (the query fingerprint);
2. If the query fingerprint matches the template fingerprint, the Fuzzy Vault decoding releases the AES key, otherwise it returns an error;
3. If the AES key is released, the encrypted image is decoded with the retrieved AES key.
Fig. 5 shows the diagram of the decoding process of the implemented system. The system can provide security for medical image storage or communication. The image can only be decrypted with the AES key used during encryption. The AES key is protected in the Fuzzy Vault with fingerprint information and can be accessed only by the person who owns the fingerprint used in the Fuzzy Vault encoding. Also, there is no way to change the encrypted medical image, the AES key or the fingerprint information. This method can be used for medical image protection in a PACS. When a doctor or patient asks for an image, it can be sent
through a public network without any privacy risk. The encrypted image can also be stored by the doctor or patient, protected by his biometric features.

VI. RESULTS

The implemented system has been tested with medical images and fingerprints to evaluate its overall performance, including the processing time for encoding and decoding. In our experiments, three medical images with different sizes were used: 5 MB, 27 MB and 108 MB. Each image was encoded and decoded 10 times and the processing time was recorded. The system runs on an Intel Core2 Duo T5250 processor at 1.5 GHz with 2 GB of RAM. Note that this is not a supercomputer.
TABLE I. ENCODING TIME

  Image Size (MB)              5        27       108
  Average Encoding Time (s)    0.659    2.565    10.100
  Standard Deviation           0.023    0.187    0.714

TABLE II. DECODING TIME

  Image Size (MB)              5        27       108
  Average Decoding Time (s)    0.660    2.573    10.057
  Standard Deviation           0.018    0.076    0.297
Fig. 4. Encoding of Medical Image and key protection with biometrics features.
Fig. 6. Encoding time of the implemented system with different image sizes.
Fig. 5. Decoding of AES key and medical image with biometrics features.
TABLE I and TABLE II show the encoding and decoding execution times of the implemented system for the three images. Fig. 6 and Fig. 7 show the times for encoding and decoding the images with different sizes. Encoding and decoding times are almost the same. Based on Fig. 6 and Fig. 7, it is possible to determine that the proposed system can encode and decode medical images at a rate of about 11 MB/s. The system proposed in [4] takes 40 seconds to encode or decode a 7 MB image and 2-3 minutes for a 36 MB image on a Sun Sparc 690MP multiprocessor machine.
Fig. 7. Decoding time of the implemented system with different image sizes.
VII. CONCLUSIONS

This paper presented a method to protect medical images through biometric cryptosystems. This method can be used to provide security for medical image storage and communication. In order to assess the method for medical image applications, a system using the FVLib library was implemented. The proposed system can protect a medical image with a traditional cryptosystem and protect the cryptographic key with biometric information. Only the owner of the biometric information can decode the encoded image, which prevents impostors from gaining access to the cryptographic keys and decoding the image. The experimental results showed that biometrics can provide security and authenticity for medical images. Besides, the proposed system has good performance for medical image encoding and decoding. Therefore, the experimental results showed that biometric cryptosystems can be used in security implementations of PACS.

ACKNOWLEDGMENTS

The authors would like to thank FAPESP (Grant Number 2006/07062-8) for the financial support.

REFERENCES
[1] H. K. Huang, Picture Archiving and Communication Systems: Principles and Applications, New York: Wiley, 1999.
[2] National Electrical Manufacturers Association (NEMA), Digital Imaging and Communications in Medicine (DICOM), Part 15: Security Profiles, PS 3.15-2007, Rosslyn, VA, 2007.
[3] US Department of Health and Human Services, HIPAA. [Online] http://aspe.os.dhhs.gov/admnsimp.
[4] F. Cao, H. K. Huang and X. Q. Zhou, “Medical image security in a HIPAA mandated PACS environment”, Computerized Medical Imaging and Graphics, Vol. 27, Elsevier, December 2003.
[5] D. R. Stinson, Cryptography: Theory and Practice, 2nd ed., Chapman & Hall/CRC, Ontario, Canada, 2002, p. 339, ISBN 1-58488-206-9.
[6] NIST, Advanced Encryption Standard (AES), 2001. [Online] http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf.
[7] R. Rivest, A. Shamir and L. Adleman, “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems”, Communications of the ACM, Vol. 21, No. 2, 1978, pp. 120-126.
[8] R. Norcen, M. Podesser, A. Pommer, H.-P. Schmidt, A. Uhl, “Confidential storage and transmission of medical image data”, Computers in Biology and Medicine, Vol. 33, No. 3, 2003, pp. 277-292.
[9] J. P. Smith, “Authentication of digital medical images with digital signature technology”, Radiology, Vol. 194, 1995, pp. 771-774.
[10] A. K. Jain, A. Ross, and S. Prabhakar, “An Introduction to Biometric Recognition”, IEEE Transactions on Circuits and Systems for Video Technology, Special Issue on Image- and Video-Based Biometrics, Vol. 14, No. 1, 2004, pp. 4-20.
[11] S. C. Dass and A. K. Jain, “Fingerprint Classification Using Orientation Field Flow Curves”, Proceedings of the Fourth Indian Conference on Computer Vision, Graphics & Image Processing, 2004, pp. 650-655.
[12] U. Uludag, et al., “Biometric cryptosystems: issues and challenges”, Proceedings of the IEEE, Special Issue on Enabling Security Technologies for Digital Rights Management, Vol. 92, 2004, pp. 948-960.
[13] A. Juels and M. Sudan, “A fuzzy vault scheme”, Proceedings of the IEEE International Symposium on Information Theory, p. 408.
[14] U. Uludag, S. Pankanti, and A. K. Jain, “Fuzzy vault for fingerprints”, AVBPA, Lecture Notes in Computer Science, Vol. 3546, 2005, pp. 310-319.
[15] T. C. Clancy, N. Kiyavash and D. J. Lin, “Secure smartcard-based fingerprint authentication”, WBMA’03: Proceedings of the 2003 ACM SIGMM Workshop on Biometrics Methods and Applications, ACM Press, New York, 2003, pp. 45-52.
[16] U. Uludag and A. K. Jain, “Securing Fingerprint Template: Fuzzy Vault with Helper Data”, Computer Vision and Pattern Recognition Workshop, 2006, pp. 163-163.
[17] D. Maltoni, D. Maio, A. K. Jain, and S. Prabhakar, Handbook of Fingerprint Recognition, New York: Springer, 2003.
Power Efficiency Profile Evaluation for Wireless Communication Applications Marius Marcu, Dacian Tudor, Sebastian Fuicu “Politehnica” University of Timisoara Computer Science and Engineering Department Timisoara, Romania Abstract— With the advances on semiconductor technologies, wireless communication applications are an emerging class of applications on battery powered devices. The multitude and complexity of components that implement a large spectrum of communication protocols on these devices requires closer evaluation and understanding in respect to power efficiency. As there is very little analysis on the relationship between wireless specific components and their power profile in the context of mobile wireless communication, we investigate the landscape of wireless communication efficiency methods. Next we apply power efficiency metrics on different wireless systems and analyze the experimental data we obtained through a monitoring application. Evaluation scenarios start with different wireless communication systems (WLAN, Bluetooth, GSM) and then analyze power consumption and efficiency in several scenarios on the WLAN communication system
I. INTRODUCTION
Two of the most important evolution directions of mobile as well as traditional computing systems during the last years have been towards power-efficient and unified communication systems. Among many others, Gartner made this assessment while publishing the “Top 10 Strategic Technologies for 2008” [1]. Gartner’s reports show that IT is responsible for 2% of carbon dioxide emissions, which can simply be lowered by half by reducing the processing core frequency by 20% and adding processing cores. On the other side, we are witnessing a trend towards unified communication systems over IP protocols, from desktop to handheld devices. On the handheld mobile devices where we focus our attention, there is a widening application landscape over IP services, which requires increased processing capabilities. In this domain, while most of the research focuses on power-efficient communication protocols and wireless sensor networks, there is little analysis on the relationship between application types and their power profile in the context of mobile wireless communication. One important fact is that the energy demands of battery powered devices and their applications substantially exceed the capacity of the actual batteries that power them [2]. To respond to this negative characteristic, the development of new architectures and design methodologies for battery-efficient, thermal-aware and power-aware systems has been pursued. Power consumption and heat dissipation have become key elements in the field of high-end integrated circuits, especially those used in mobile and battery powered embedded devices, because of the new problems introduced by the increase in power consumption [3].
Research on battery-efficient system design has largely been motivated by the need to improve battery lifetime, but it has mainly focused on minimizing average power consumption. However, recent research has established that minimizing average power does not necessarily translate to maximizing battery life. This is because, in practice, the amount of energy delivered by a battery can vary significantly, depending on the time profile of the current drawn from the battery [4]. Therefore, power profiling techniques should be used in order to optimize the power consumption of different applications. Furthermore, in order to obtain power-efficient mobile systems, the application level of the system should be designed for low power consumption [5]. The main idea of power-aware applications is that they should adapt and take corrective measures during their execution in order to optimize energy consumption and use resources efficiently. On the other hand, today’s popular IEEE 802.11 standard for wireless networks has changed the face of networking. The WLAN market has grown rapidly as wireless technology has evolved to meet the fundamental needs of businesses and technology consumers. Wireless links have intrinsic characteristics (variable bandwidth, corruption, channel allocation delays) which influence not only the performance of transport protocols but also the energy consumption of battery powered devices. In this paper we present the results of our study of wireless communication power efficiency on different battery powered devices. We first give a summary of existing advances in wireless power efficiency analysis and measurement and then present the evaluation application and the obtained results.

II. WIRELESS COMMUNICATION EFFICIENCY EVALUATION METHODS
One very popular application on battery powered mobile devices is multimedia content visualization over a wireless link. The authors in [6] evaluate the impact of three encoding parameters on the CPU cycles used during the decoding process. A total cost in terms of CPU cycles is introduced and a relationship to power consumption is suggested, but power consumption measurements are not performed, so the cost model cannot be verified in practice. A fine-grained benchmark with a low power wireless mobile computing platform (Consus) was conducted in [7], where Bluetooth and Wi-Fi components are compared. Although their power profiles look similar, the power consumption of the Wi-Fi radio
component represents 60% of the total power. This identifies the radio subsystem as a primary candidate for specific power management optimizations. A more generic approach towards application-level prediction of power consumption and battery dissipation has been taken in [8], where an estimation methodology is introduced: based on a set of drain-rate curves from a set of benchmarks, an estimate for an arbitrary program is composed. The estimation is limited to the CPU and memory dimensions and does not take into account the display, wireless communication or I/O components, but it could be extended to include basic wireless benchmarks to assess wireless communication applications. One of the implications of the application move towards unified communication systems is that most of them require secure communications, provided through cryptographic components realized in both hardware and software. As one might suspect, there should be a link between security protocols and energy consumption on battery powered embedded systems. The authors of [9] have conducted thorough measurements and proved that security processing has a significant impact on battery life. They have conducted an energy consumption analysis on a wide set of cryptographic algorithms, including a thorough analysis of the SSL protocol, where its energy consumption is broken into cryptographic and non-cryptographic components. Evidence of the increased usage of wireless mobile devices has been reported in [10], where wireless data transfer and battery lifetime for Wi-Fi connected mobile terminals have been logged in several networks in the US. It is striking that 99% of the survey participants were under cellular networks while 49% were under Wi-Fi networks in everyday life. Based on application usage type (email, web feeds, web browsing and multimedia) and usage interval time, a relationship between data-size transfer, transfer interval and battery life is presented from the obtained logged data.

III. POWER CONSUMPTION AND POWER-EFFICIENCY MONITORING APPLICATION
A. Power efficiency metrics

The monitoring application we used in our tests to evaluate application-level power efficiency is based on the power benchmark concept we defined in [11]. The power benchmark is an extension of the benchmarking concept obtained by applying power consumption metrics. Our proposed power benchmark software is based on the information measured or estimated by the battery driver or the operating system. Battery status is described by several parameters: battery capacity [mWh], maximum battery capacity [mWh], charge/discharge rate [mW], current drawn [mA], battery remaining lifetime [s], battery temperature [°C], etc. All these parameters provide an image of the application-level power consumption. As a battery discharges, its terminal voltage decreases from an initial voltage Vmax, given by the open circuit voltage of the fully charged battery, to a cut-off voltage Vcut, the voltage at which the battery is considered discharged. Because of the chemical reactions within the cells, the capacity of a battery
depends on the discharge conditions such as the magnitude of the current, the duration of the current, the allowable terminal voltage of the battery, temperature, and other factors. The efficiency of a battery is different at different discharge rates. When discharging at a low rate, the battery’s energy is delivered more efficiently than at higher discharge rates [4]. Battery lifetime under a certain load is given by the time taken for the battery terminal voltage to reach the cut-off voltage. The standard capacity of a battery is the charge that can be extracted from a battery when discharged under standard load conditions. The available capacity of a battery is the amount of charge that the battery delivers under an arbitrary load, and is usually used (along with battery life) as a metric for measuring battery efficiency [12]. The different power consumption metrics presented before can be used in the benchmark and can be grouped into two classes: power consumption metrics (battery discharge rate, current consumption) and remaining battery metrics (remaining lifetime, remaining battery capacity, etc.). For power efficiency we used the energy consumed to send/receive a bit of information, or the number of information bits transferred between two battery capacity values.

B. Wireless implications for TCP/IP

Wireless links are an important part of the Internet today. In the future the number of wireless or mobile hosts is likely to exceed the number of fixed hosts in the Internet. Despite their increasing usage, TCP/IP stack implementations are not optimized for wireless particularities and power management aspects. For example, the Collision Detection mechanism is suitable on a wired LAN, but it cannot be used on a WLAN for two reasons: a) implementing a collision detection mechanism would require the implementation of a full duplex radio; b) in wireless environments we cannot assume that all stations hear each other. Therefore 802.11 WLAN uses a Collision Avoidance (CA) mechanism together with a positive acknowledgement mechanism [13]. Error rates on wireless links are a few orders of magnitude higher than on fixed links. These errors cannot always be compensated by the Data Link Layer. The fundamental design problem for TCP is that a packet loss is always classified as a congestion problem: TCP cannot distinguish between a loss caused by a transmission error and a loss caused by a network overload. Therefore we can say that TCP/IP protocol stack implementations are not well optimized for the wireless communication medium and for power consumption management.

C. Monitoring application

In order to show how different types of wireless communication patterns influence the power consumption of a mobile device at the application level, we implemented a C++ application written in MS Visual Studio 2005. The same application source code was built for different Microsoft
Windows platforms: Win32, Windows Mobile 5.0 PocketPC and Windows Mobile 5.0 Smartphone. The monitoring application is composed of a number of specialized modules (Fig. 1):
- Battery monitor – a software module used to obtain real-time, on-line power consumption data from the battery device;
- CPU monitor – a software module used to monitor CPU parameters such as load, temperature, etc.;
- Wireless monitor – a software module implemented to monitor different parameters of the wireless communication: signal strength (RSSI), bandwidth, data transferred, etc.;
- Traffic generator – client/server classes used to generate TCP/UDP traffic on the wireless network card. The generator can be configured with different values for the communication parameters: data block size, data transfer direction (download/upload), transport protocol (TCP/UDP), data patterns, etc.;
- Data logger – a logging module that saves all the monitored values from all modules.
IV. MOBILE SYSTEMS POWER-EFFICIENCY CHARACTERIZATION OF WIRELESS COMMUNICATION We used the monitoring application we implemented and power benchmark we introduced in [11] to emphasis power consumption of wireless communication on mobile system. A. Wireless communication devices power consumption In order to measure the power consumption of different wireless communication devices existing in a mobile system we used the same LOOX T830 PocketPC. The power signatures obtained for this system when each of the following subsystems like GSM, Bluetooth and WLAN are respectively switched on are presented in Fig. 2. For each power signature in Fig. 2 the communication device was turned on but no communication traffic was initiated. It can be observed that the GSM communication hardware is very well designed for battery powered devices because its power consumption is quite the same with the GSM hardware deactivated. Bluetooth and WLAN hardware switched on imply more power consumption than with no communication hardware activated. Bluetooth device activation add around 13% to the total idle power consumption, and WLAN card activation consumes around 50% more than the idle state. All of the above results are shown in a more intuitive way in the chart depicted in Fig. 3.
application level OS and driver level
Battery monitor
CPU monitor
Wireless monitor
hardware level Fig. 1. Monitoring application architecture D. Experimental testcases In order to evaluate power efficiency profiles for wireless communication applications we elaborate a set of experiments, based on them we established a set of test cases. Every experiment ran for 30 minutes in the same environmental conditions. Three hardware devices we used in our tests: Fujitsu-Siemens LOOX T830 PocketPC and SmartPhone; Fujitsu-Siemens LOOX N560 PocketPC; Fujitsu-Siemens Intel Pentium IV dual core mobile 2000MHz laptop with 512 MB RAM. The proposed test cases try to cover different aspects of wireless communication applications: WLAN chipset consumption; WLAN compared to other kind of wireless communication standards; ad-hoc and infrastructure wireless communication; reliable (TCP) and non-reliable (UDP) wireless communication; wireless communication data patterns; wireless communication data block sizes; wireless communication direction (download vs. upload); wireless communication distances and transfer rates.
Fig. 2. Wireless communication devices power consumption
Fig. 3. Wireless communications power consumption
B. Wireless LAN power consumption
From Fig. 3 it can easily be observed that the WLAN chipset and WLAN communication represent an important part of the total power consumption of the device. In order to show the different aspects of WLAN chipset power consumption, the following experiments were executed on a LOOX N560 PocketPC:
WLAN chipset off; WLAN chipset on and not connected to the network; WLAN chipset on and trying to connect to the network; WLAN chipset on and connected to the network; WLAN chipset on and transferring data on the network. Relative to the idle-state power consumption with the WLAN chipset switched off, both switching the WLAN chipset on and performing WLAN communication increase the power consumption, as presented in Fig. 4. The WLAN chipset switched on has the same power consumption independent of the WLAN network status: not connected, trying to acquire a network connection, or connected.
Fig. 6. Power consumption for different transferred block sizes
Fig. 4. WLAN chipset power consumption
Fig. 7. Bandwidth for different transferred block sizes
Fig. 5. WLAN chipset current consumption
Current consumption of the PocketPC when the WLAN chipset is switched on is three times larger than when WLAN is off. When communication is started over the wireless link, current consumption is six times greater than when WLAN is off (Fig. 5).
C. WLAN communication power consumption
In this test case we tried to find out how different communication parameters influence power consumption and power efficiency. The first test was selected to emphasize the relationship between communication block sizes and power consumption and efficiency. This test involved two battery-powered devices: a laptop and a PocketPC. The monitoring application was deployed and configured as server on the laptop and as client on the PocketPC. The monitoring application ran for 30 minutes for each test and in each test a different block size was transferred: 1 KB, 4 KB, 10 KB and 100 KB. From Fig. 6 it can be observed that power consumption is the same no matter the size of the blocks transferred. Communication bandwidth, however, depends on the size of the blocks transferred (Fig. 7), and therefore it can be observed that the power efficiency of wireless communication is directly proportional to the communication bandwidth (Fig. 8).
Fig. 8. Power efficiency for different transferred block sizes
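For clarity, the following minimal sketch shows how the power-efficiency metric plotted in Fig. 8 could be computed from two battery readings taken before and after a transfer; the struct layout and the mAh unit are our own illustrative assumptions, not the benchmark's internal format.

```cpp
// Illustrative computation of the power-efficiency metric: information bits
// transferred between two battery-capacity readings.
#include <cstdio>

struct BatterySample {
    double    capacity_mAh;     // remaining battery capacity reported by the device
    long long bitsTransferred;  // cumulative information bits sent/received so far
};

// Bits of useful information transferred per mAh of battery capacity consumed.
double powerEfficiency(const BatterySample& start, const BatterySample& stop) {
    double consumed = start.capacity_mAh - stop.capacity_mAh;
    if (consumed <= 0.0) return 0.0;        // no measurable battery drain yet
    return (stop.bitsTransferred - start.bitsTransferred) / consumed;
}

int main() {
    BatterySample before = { 1200.0, 0 };
    BatterySample after  = { 1130.0, 3LL * 1024 * 1024 * 1024 * 8 };  // ~3 GB moved
    std::printf("power efficiency: %.0f bits/mAh\n", powerEfficiency(before, after));
    return 0;
}
```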
Another aspect of wireless communication addressed by a new test case is the distance between the communicating devices. In our tests power consumption was the same no matter the distance between sender and receiver, for both the laptop and the PocketPC tested, while power efficiency decreased when the distance between sender and receiver increased. Studying the results of this test we conclude that for both systems we used (the laptop with Windows XP and the PocketPC with WM5) the wireless communication drivers are not optimized for power management with regard to the distance between interconnected nodes. In this test case the power management settings of the PocketPC were activated. Other test cases we ran were: wireless network ad-hoc and infrastructure; uplink and downlink transfers; data transfer patterns.
For all these tests the same result was obtained: the power consumption of wireless transfers has the same values, but power efficiency is directly proportional to the transfer rate.
D. Wireless applications power consumption
When wireless communication is performed, more power is consumed than when only the communication hardware is turned on. Two types of wireless voice communication are presented in Fig. 9: GSM and WLAN VoIP using the Skype application. It can be observed that GSM communication is very well suited for mobile devices because switching the GSM chipset on does not increase the total system power consumption, and a voice call over GSM consumes less energy than WLAN VoIP communication.
Fig. 9. Wireless voice communication power consumption

V. CONCLUSIONS
Power benchmarks are software applications that monitor battery status parameters while running a certain workload. They are applicable to profiling mobile systems and their applications, evaluating and calibrating power management solutions, sizing batteries for given applications, and on-line battery monitoring. We presented in this paper a power benchmark implementation for wireless communication power consumption and power efficiency profiling. In our tests we observed that power management techniques have no effect on WLAN communication. Besides specific energy profile evaluations, one application of our proposed tests could be to help assess the power profile of TCP/IP stack implementations. This will be one of the directions of our future work.

REFERENCES
[1] Gartner Inc., "Top 10 Strategic Technologies for 2008," http://www.gartner.com/.
[2] K. Lahiri and S. Dey, "Efficient Power Profiling for Battery-Driven Embedded System Design," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 23, no. 6, Jun. 2004, pp. 919-932.
[3] S.H. Gunther, F. Binns, D.M. Carmean, and J.C. Hall, "Managing the Impact of Increasing Microprocessor Power Consumption," Intel Technology Journal, Q1, 2001.
[4] D. Linden and T. Reddy, Handbook of Batteries, McGraw-Hill, 2001.
[5] D. Brooks et al., "Wattch: A Framework for Architectural-Level Power Analysis and Optimization," Proc. of the International Symposium on Computer Architecture, 2000, pp. 83-94.
[6] C. Koulamas, A. Prayati, G. Lafruit, and G. Papadopoulos, "Measurements and modeling of resource consumption in wireless video streaming: the decoder case," WMuNeP '06: Proceedings of the 2nd ACM International Workshop on Wireless Multimedia Networking and Performance Modeling, pp. 67-72, ACM Press, 2006, Torremolinos, Spain.
[7] V. Raghunathan, T. Pering, R. Want, A. Nguyen, and P. Jensen, "Experience With A Low Power Wireless Mobile Computing Platform," Proceedings of the 2004 International Symposium on Low Power Electronics and Design (ISLPED '04), pp. 363-368, August 9-11, 2004, Los Angeles, CA.
[8] C. Krintz, Y. Wen, and R. Wolski, "Application-level Prediction of Battery Dissipation," ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), pp. 224-229, August 9-11, 2004, Newport Beach, CA.
[9] N.R. Potlapally, S. Ravi, A. Raghunathan, and N.K. Jha, "Analyzing the energy consumption of security protocols," Proceedings of the 2003 International Symposium on Low Power Electronics and Design, pp. 30-35, August 25-27, 2003, Seoul, Korea.
[10] A. Rahmati and L. Zhong, "Context-for-wireless: context-sensitive energy-efficient wireless data transfer," Proceedings of the 5th International Conference on Mobile Systems, Applications and Services (MobiSys '07), pp. 165-178, June 11-14, 2007, San Juan, Puerto Rico.
[11] D. Tudor and M. Marcu, "A Power Benchmark Experiment for Battery-Powered Devices," IEEE Int. Workshop on Intelligent Data Acquisition and Advanced Computing Systems, 6-8 Sep. 2007, Germany.
[12] M. Hachman, "SPEC Developing Benchmark To Measure Power," ExtremeTech, May 2006.
[13] C. Chaudet, D. Dhoutaut, and I. Guerin Lassous, "Experiments of some performance issues with IEEE 802.11b in ad hoc networks," Proceedings of the Second Annual Conference on Wireless On-demand Network Systems and Services, 2005.
Closing the Gap Between Enterprise Models and Service-Oriented Architectures
Martin Juhrisch, Project MIRO, Westfälische Wilhelms-Universität Münster, [email protected]
Werner Esswein, Chair for Information Systems, esp. Systems Engineering, Technische Universität Dresden, [email protected]
Abstract-In 2005, four German universities created a research program to improve and standardize their administrative processes. To this end, a large number of processes were analyzed and core functions identified. As a result, automatable core functions have been implemented as web-services using the service-oriented architecture (SOA) paradigm. To facilitate reuse, this functionality has been documented in a service catalog. However, the real advantage of SOA does not emerge until these services become configurable at the business level. We introduce a modeling grammar to model service function interfaces and standardized business object types, allowing the redesign of enterprise systems functionalities at the business level. We illustrate the usefulness of our approach with a pilot study conducted at the University of Münster (Germany). Index Terms—enterprise models, service-oriented architecture, state transition models, web services
I.
INTRODUCTION
Today’s German universities are faced with a modernization and performance gap. At the same time the requirements of students and researcher increase, the human resources necessary for the management of the university and its central institutions remain static or are even decreasing [1]. A consortium of four German universities has started to meet the challenge by analyzing its core processes in order to find reorganization potential of its structure. A large potential for modernizing the university administrations lies in process reorganization with information and communication technology (ICT). Within the scope of the IT reengineering project “Munster Information System for Research and Organization” (MIRO), we build upon the analysis outcomes and introduce new ICT-components. Additionally, we try to reengineer legacy information systems on the basis of a university wide service-oriented architecture (SOA) [2]. In contrast to previous software architectures, SOA offers the possibility to establish a closer connection of business process flow – represented by enterprise models – and information system infrastructure – represented by web-services [3]. The integration of SOA components to enterprise models follows this idea trying to close the gap between IT and organizational domain. The possibility of a direct link between business workflows and the supporting information system are often seen as the critical strength of the SOA paradigm. The connection between elements with a direct business reference like business objects
and elements of an information system architecture aims at the enabling of sustainable enterprise architectures. This raises the question in which way a connection between structural respectively process organization – documented in enterprise models – and a software system could be established? Classical approaches like the object oriented analysis and design (OOAD) failed with similar objectives in the past or enable an automated transformation between both domains only under ideal conditions [4]. Goal of this study is now to introduce a method allowing the configuration of SOA services at a business level and therewith carry out an adjustment between the target state organizational model and the information system (IS). The configuration demands on the one hand to be able to estimate the applicability of a service for a certain business scenario and on the other hand a high flexibility of the service, which matters regarding its integration in the information system architecture. Concerning the second aspect, industrial standards of the W3C cover already today all information necessary to access the web-service’s value. A deficit exists concerning the way of establishing a business access to a web-service in order to estimate its business value and its compatibility in a business scenario. Since we cannot automatically transform an informal (respectively semi-formal) service description into a formal WSDL, an automatic test on feasibility of the service configuration in an enterprise model is hampered. If we accept this task as a creative human decision, technologies become interesting that allow a semi-automatic support. The approach presented in this paper seizes this idea and proposes an integration framework based on a catalog system. The paper is structured as follows. The next section gives a brief theoretical introduction into the specifics of enterprise modeling and service-oriented architectures. Section 3 introduces a framework showing how enterprise and webservice models can be integrated using a web-service modeling approach embedded in the method engineering field. The paper ends with a discussion, summarizing the proposed ideas and exposing open questions regarding the realization of the application integration. II. BACKGROUND A. Enterprise modeling In the course of business process reengineering activities
modeling approaches are an important technique for an effective intervention into the problem domain [5]. Essential benefit is attained through reduction of complexity by abstraction which facilitates analysis of complex systems ([6]; [7]). Hence, within the scope of first stage problem analysis and documentation, visual models are established – so called enterprise models representing business requirements for the actual enterprise situation [8]. The semantic mightiness of enterprise modeling language constructs has to cover nonformal aspects supporting a deep understanding of the business domain as well as formal aspects in order to prepare the mapping of an organizational target state to a system implementation [9]. Thus, we need semi-formal languages to model problems, which are not well-structured, highly subjective, and finally not objectively well formed [10]. Most grammars, however, do not possess a sufficient number of language constructs to model all phenomena in a domain [11]. Consequently, we might need a multitude of modeling languages to model the dynamic changing business domains [12]. For process documentation purpose, e.g., we use modeling techniques like Petri-Nets [13] or the Event-Driven Process Chain (EPC) [14]. On the other hand in the course of the design of executable business process, languages like the Business Process Modeling Notation (BPMN) [15] have been developed. They enable business domain experts to model understandable graphical representations of business processes with the intention to generate executable Business Process Execution Language (BPEL) artifacts [16]. A BPEL file contains a XML based graph structure defining a flow of link elements to web-services. Hence, business processes can be performed by a defined flow of web-service invocations. In the course of an increasing number of enterprise modeling grammars, meta-CASE tools are needed to support language independent modeling. Software that supports language independent modeling is subsumed under the concept of metaCASE tools like MetaEdit+ [17] or Cubetto toolset [18]. B. Service-oriented architectures A SOA is a method to design system landscapes based on the concepts: contract, service, and interface. The SOA paradigm intends that a service is related to a semi or fully automated activity in a business process according to the terms of a contract, which determines the characteristics of the activity’s implementation [19]. The functions contractually agreed within the service are realized by the interface of an application. As service function we define the differentiated and independently useable functions of a service, which can be used by other services [20]. The SOA paradigm differs according to its service function concept from other software paradigm as a service function represents an entire business activity and its reuse is intended on system landscape level. This may be the case if a SOA has reached a stage of development where the premises for a reuse of software solutions in an organization are complied. Then we speak about service as the possibility for multi reuse of business
solutions including their implementation in software [19]. For detailed information about organizational premises for software reuse we reference [19]. This paper focuses on a web-service based realization of the SOA paradigm according to the recommendations of the W3C consortium [3]. W3C’s layered architecture for today’s webservice technology focuses on three principle core elements described in detail by [21]. The Simple Object Access Protocol (SOAP), the Web-Service Description Language (WSDL) as well as the Universal Description, Discovery and Integration Language (UDDI) [22] have become de facto standards for XML messaging, web-service description and registration. In addition to these main protocols web-service composition requires higher levels of description. In the course of this demand orchestration and choreography languages base upon several high level standards like WS-BPEL or WSFL (Webservice Flow Language) [23]. Compared to the core protocols this high level languages take a step forward by integrating web-services in business process models. Essence is the integration of business processes across enterprise’s boundaries by modeling web-services in directed graphs in the order of their chronological sequence. One main problem in implementing a SOA is the appropriate definition of service function granularity. The literature does not specify a generally accepted guideline [20]. The basis for this decision can be a supported business process, the derivable services and the importance of individually design objectives [20]. Hence, the decision is done individually, dependent on the characteristics of the particular business process. This leads to the situation that business activities are assigned to service functions of varying size in a non-automatable decision process. III. INTEGRATION FRAMEWORK As mentioned in [24], the establishment of a direct linking between organizational and IT domain can only be done with the provision that we use a set of entities simultaneously in both domains. Up to now, existing approaches are not suitable to the transformation from enterprise models to software design models (see [4]; [25]). The main problem from a modeling view is the use of language constructs originally intended exclusively for the software design [4]. Thus, design decisions may exert influence on the way the analysis of the organizational domain is modeled. In addition an automatic transformation demands a complete input containing all information relevant to the transformation. Though, the completeness of enterprise models cannot be subject independently assured. Furthermore, it is impossible to evaluate the correctness or significance of modeled information. Authors proclaiming an automatable transformation assume that functions of a web-service correspond to particular business activities in a business process. However, this idea may be arguable if we talk about functional requirements models and not about target state organizational models
(enterprise models). Functional requirements model reference the software system in a high-abstract manner. Hence, in reducing the level of abstraction we could assign design functionality to web-service functionality. This dependency implies a homomorphism between service functions of a webservice and business functions in a functional requirements model. However, this is admittedly not adaptive for our sense of integration between enterprise models and the SOA domain. The present paper proposes an approach, which supports a decision process between business problem description and IT solution but not an automatic transformation. The decision oriented approach is based upon the experiences of the software architect with adjustment between target state organizational models and reusable SOA components. Previous research is more likely to show that this decision process cannot be formalized and therewith automated, too [4]. Formalization would require that all variables with an impact to the dependency between solution alternatives and their consequences to the software design have been operationalized in a functional way. Enterprise models can inform business experts about the context of the information system; however they cannot serve as a formal input for a machine. Hence, we propose a business configuration of SOA services in a target state organizational model using a framework that consist of a catalog system for SOA services in conjunction with a modeling tool. Out of a bulk of documented software solutions the one is chosen that suits the best in a certain business context. A. Building service function interfaces At the University of Munster the semantic gap between IT and organizational domain is bridged by building up a service catalogue for all university web-services and their integration into a modeling CASE tool environment. As every web-service implements a certain web-service interface, and since the language WSDL does not consist of appropriate language constructs for a business level configuration, we require a complementary representation of service functions – respectively their signatures – at business level. Every service function signature represents real business value implemented by an automated business activity. If we could create more business like service function signatures, we would be able to configure SOA functionality in target state organizational models. The question is now, how to describe a business level signature of a service function? Business Objects are an approach to assign the object orientation paradigm not only to implementation level in software engineering but also to the level of enterprise modeling [26]. A business objects represents simultaneously a part of the enterprise model (outer perspective) and of the system design (inner perspective). Within the scope of business objects, representations of real world entities are basically characterized by a distinct name and a description of their semantic and aim in their business domain. The linking of services causes a mapping between input and output business
objects of different service functions. Semantic heterogeneity between business objects representing identical real-world concepts results in poor mapping possibilities, as each involved service has its own understanding of the business object. Hence, to reduce the semantic heterogeneity we propose a single, central definition of each business object in the SOA environment. This can be done in an enterprise service bus (ESB) [20] in terms of a central architecture control. We assume that all business objects are mapped to a set of attributes and that each linked service can either handle these objects or perform a mapping of these objects to its own. This leads to standardized sets of business object types (BOT) and attribute object types (AOT), which we can use to describe the service function interface. To foster the definition of BOTs we use a modeling language in the style of a UML object class diagram on the meta-level (see Figure 1).
Fig. 1. Modeling framework (the extended E³ meta-metamodel on the M3 level; the metamodels of the state transition and service function interface modeling languages, together with the AO and OST definitions, on the M2 level; and the state transition models and service function interface models, built from BOTs such as Person and Address and tested for feasibility against each other, on the M1 level).
These BOTs are subsequently used to define a service function interface on the object model level. Section 3.2 introduces the modeling language that is used to define the service function interface. A service function interface represents one-to-one a service function signature of a web-service and thereby establishes the reference between a business function at the business level and its implementation in IT. Afterwards we can configure existing service function implementations in an enterprise model (state transition models), as we have information about what a service function does. Admittedly, we cannot yet configure service functions depending on information about how a service function processes its business function. The BOTs only solve the problem of semantic heterogeneity of service functions and foster a higher degree of their university-wide understanding. Semantic heterogeneity means that web-service functions implement domain-specific concepts in different ways. Pragmatic heterogeneity, in contrast, refers to the varying understanding of an implementation of a business function [27].
B. Modeling approach
Our modeling approach supports distributed modeling of service function interfaces rather than a central modeler. The main
problem in a distributed modeling approach is the use of semi-formal modeling languages, which share strengths and weaknesses with natural languages. As a result of the high degree of freedom within the modeling activity, there is naturally no standardized set of modeling elements. This leads to a lack of comparability, which becomes an issue as we need to configure SOA services within a single enterprise model. Thus, to enable cross-model references we propose the use of BOTs, a limited set of business objects representing the consensus between software architects and domain experts. The software architect is therefore deliberately restricted in his expressive power when modeling a service function interface, as only the predefined set of BOTs can be instantiated by the assignment of AOTs. The method engineering is based upon the existing E³-Model (E³; [28]), classified as a meta-meta-model on the M3 level of the Meta-Object-Facility architecture [29] (see Figure 1). Abstracting from a particular notation, as a first step we define a standardized set of BOTs and AOTs on the M2 level. Secondly, we turn our attention to the construction of the metamodel of the service function interface modeling language. We assume that all contained constructs and the relationships between them are described via the E³-Model language conventions within the extended E³ meta-metamodel. Afterwards, we generate service function interfaces within a separate service function interface model on the object level (M1M). Thus, we obtain a separate modeling language that can be understood and adopted by the target group of users of the E³+WS method (see Figure 2).
Fig. 2. Interface model for service function "Publish PhD Thesis" (the service function "Publish dissertation" is connected via messageEdge_in and messageEdge_out constructs to message objects whose parameters are built from the BOTs Student, with the AOTs course of studies and matriculation number, and Dissertation, with the AOT publication date; the corresponding WSDL types, element, message and portType fragments are shown alongside).

The central element of our modeling approach is the service function. A service function interface consists of an appropriate parameter assignment. Therefore, we define special message objects that act as containers for input and output parameter objects. Messages are connected with service functions via so-called messageEdge_in or messageEdge_out constructs, depending on the role of the message with regard to its related service function. A service function can thereby be connected to at most one messageEdge_in and always exactly one messageEdge_out (see Figure 2). Hence, it is assumed that the possible service functions communicate either with a request-response or with a notification pattern [22]. To define a single message parameter, we use the predefined BOTs and AOTs, divisible into simple and complex objects. A BOT cannot be instantiated like ordinary object types, but is parameterized in the service function interface view using the predefined set of AOTs. Simple AOTs are assigned to simple data types. These types comprise, on the one hand, predefined XML Schema data types and, on the other hand, own data types declared within the WSDL types tag. With self-declared data types we are able to define, for example, appropriate report formats as potential result layouts for a migration operation. A complex AOT references a BOT or design activities of a method engineer. Therefore, an E³ data type is introduced to establish ties between complex AOTs and meta-model patterns. In a first step, we only tie E³ object types down to complex AOTs. BOTs and AOTs are connected via a non-directional edge. For more detailed information on the E³+WS modeling notation we refer to the work of Weller et al. [24]. The main requirement for this framework is the avoidance of any restriction on the ordinary modeling task at the business level. Both clients and modeling experts (model creators) should not notice the structure of a service function interface (see Figure 1). Usually the construction of enterprise models requires "a specific competence that is neither covered by software engineering nor by organization science" [9], so implementation-related information has to be kept away. Additionally, integrating implementation aspects after the analysis phase of the system engineering would not facilitate the adaptation process of enterprise systems functionality.
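To make these constructs a little more concrete, the following C++ sketch shows one possible in-memory representation of AOTs, BOTs, messages and a service function interface; the type names are ours and merely mirror the E³+WS notation described above, they are not part of the authors' tooling.

```cpp
// Hedged sketch of the interface-model constructs of Fig. 2 (names are ours).
#include <string>
#include <vector>

struct Aot {                       // attribute object type
    std::string name;              // e.g. "matriculation number"
    std::string xsdType;           // simple XML Schema type, or a reference to a BOT
};

struct Bot {                       // business object type, parameterized by AOTs
    std::string name;              // e.g. "Student", "Dissertation"
    std::vector<Aot> attributes;
};

struct Message {                   // container for input or output parameter objects
    std::vector<Bot> parameters;
};

struct ServiceFunctionInterface {  // one-to-one image of a web-service operation
    std::string name;              // e.g. "publishDissertation"
    Message input;                 // at most one messageEdge_in
    Message output;                // exactly one messageEdge_out
};
```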
C. Configuring web-services in enterprise models To sum up, the proposed integration framework tends to result in a set of business object types (BOTs), attribute object types (AOTs) and service function interfaces processing one or more BOTs. Using the interface models we can configure webservice functions in enterprise models in a way that allows an automatic mapping to the SOA. Therefore, we introduce a state transition modeling language based on the EPC in ARIS (see Figure 3).
Fig. 3. Integration of object structures in enterprise models (an extended EPC in which the events "Dissertation is present", "Dissertation is archived", "Student finished studies" and "Student removed" and the functions "Publish dissertation", "Archive dissertation" and "Removal from the registry of students" are annotated with the business object types Student, Dissertation and Person and their attribute object types, e.g. course of studies, matriculation number, publication status, name and user identification).
We extended the EPC with the concepts of BOT and AOT. Fig. 3 illustrates the extension of the construct "Event" with a BOT. Hence, an "Event" is defined as a certain business state represented by the state of the participating business object types. Thus, we propose the ARIS EPC as the target state organizational model and a modified EPC as the state transition model representing the result of the task automation decision process [4]. Automatable business logic is hence carried out by a certain sequence of state transitions of BOTs. A certain web-service function matches in a business context if an "Event" in the EPC model passes business objects whose attribute assignment is appropriate according to the service function interface model.
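The following sketch illustrates, under our own simplifying assumptions about the data structures, how this test on feasibility could be expressed: a service function matches an EPC event if every BOT and AOT required by its interface model is supplied by the event's business objects. It is an illustration, not code from the authors' tool.

```cpp
// Minimal sketch of the feasibility test between an EPC event and a
// service function interface model (data structures are illustrative).
#include <map>
#include <set>
#include <string>

// Attribute values the event makes available, grouped per business object type.
typedef std::map<std::string, std::map<std::string, std::string> > EventState;

// Attribute names required per BOT by the service function interface model.
typedef std::map<std::string, std::set<std::string> > InterfaceRequirement;

bool serviceFunctionMatches(const EventState& event,
                            const InterfaceRequirement& required) {
    for (InterfaceRequirement::const_iterator bot = required.begin();
         bot != required.end(); ++bot) {
        EventState::const_iterator found = event.find(bot->first);  // BOT present?
        if (found == event.end()) return false;
        for (std::set<std::string>::const_iterator attr = bot->second.begin();
             attr != bot->second.end(); ++attr)                     // every AOT set?
            if (found->second.find(*attr) == found->second.end()) return false;
    }
    return true;
}

// Example usage: the event "Dissertation is present" supplies Student and
// Dissertation objects; "Publish dissertation" requires course of studies and
// matriculation number. serviceFunctionMatches(event, required) then decides
// whether the web-service function can be bound to this process step.
```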
IV. DISCUSSION
The approach presented in this paper builds upon the ideas for a uniform specification of web-services and the standardization activity around UDDI. We realized a model-driven development of service function interfaces and their assignment to web-service functions in a UDDI repository. Regarding the configuration of a SOA within enterprise models, the presented modeling approach allows us to integrate web-service functionality into state transition models and to test for feasibility in terms of a meaningful semantic process chain. However, without information about the concrete implementation of a web-service function we cannot indicate how a certain service function executes its task. We rely on the knowledge of the software engineer who is modeling the service function interface. For now, we cannot be sure whether these process chains make sense from a pragmatic point of view. This is part of our future research.

REFERENCES
[1] J. Becker, L. Algermissen, T. Falk, D. Pfeiffer, and P. Fuchs, "Model Based Identification and Measurement of Reorganization Potential in Public Administrations - the PICTURE-Approach," presented at The Tenth Pacific Asia Conference on Information Systems (PACIS'06), 2006.
[2] P. Fremantle, S. Weerawarana, and R. Khalaf, "Enterprise Services: Examining the emerging field of Web Services and how it is integrated into existing enterprise infrastructures," Communications of the ACM, vol. 45, pp. 77-82, 2002.
[3] T. Erl, Service-Oriented Architecture: Concepts, Technology, and Design, Prentice Hall PTR, 2005.
[4] M. Karow and A. Gehlert, "On the Transition from Computation Independent to Platform Independent Models," presented at AMCIS Conference Proceedings, 2006.
[5] M. Hammer and J. Champy, Business Reengineering: A Manifesto for Business Revolution, vol. 1, New York: HarperBusiness, 1998.
[6] H. Balzert, Development of Software-Systems: Principles, Methods, Languages, Tools, Mannheim, Vienna, Zurich: Wissenschaftsverlag, 1994.
[7] O. K. Ferstl and E. J. Sinz, Grundlagen der Wirtschaftsinformatik, vol. 1, 2nd ed., München, Wien: Oldenbourg, 1994.
[8] P. Fettke and P. Loos, "Classification of reference models: a methodology and its application," Information Systems and e-Business Management, vol. 1, pp. 35-53, 2003.
[9] U. Frank, "Conceptual Modelling as the Core of the Information System Discipline - Perspectives and Epistemological Challenges," presented at Proceedings of the 5th America's Conference on Information Systems, AMCIS'99, Milwaukee, 1999.
[10] D. Harel and B. Rumpe, "Modeling languages: Syntax, semantics and all that stuff, part I: The basic stuff," Weizmann Institute of Science, 2000.
[11] R. Agarwal, P. De, and A. P. Sinha, "Comprehending object and process models: An empirical study," IEEE Trans. Software Engrg., vol. 25, pp. 541-556, 1999.
[12] Y. Wand and R. Weber, "Research Commentary: Information Systems and Conceptual Modeling - A Research Agenda," Information Systems Research, vol. 13, pp. 363-376, 2002.
[13] J. Desel, "Petri nets and business process management," Saarbrücken: Geschäftsstelle Schloss Dagstuhl, 1998.
[14] A.-W. Scheer, ARIS - Business Process Modeling, vol. 3, Berlin: Springer, 2000.
[15] S. White, "Using BPMN to Model a BPEL Process," BPTrends, vol. 3, pp. 1-18, 2005.
[16] IBM, "Business Process Execution Language for Web Services," 2002.
[17] S. Kelly, M. Rossi, and J. P. Tolvanen, "What is Needed in a MetaCASE Environment?," in Enterprise Modelling and Information Systems Architectures, U. Frank, Ed., 2005, pp. 22-35.
[18] Cubetto Toolset, Dresden: Semture GmbH, 2007.
[19] A. Dietzsch and T. Goetz, "Nutzen-orientiertes Management einer Service-orientierten Unternehmensarchitektur," presented at Wirtschaftsinformatik 2005, eEconomy, eGovernment, eSociety, 2005.
[20] K. J. Oey, H. Wagner, S. Rehbach, and A. Bachmann, "Mehr als alter Wein in neuen Schläuchen: Eine einführende Darstellung des Konzepts der serviceorientierten Architektur," in Unternehmensarchitekturen und Systemintegration, S. Aier, Ed., Berlin: GITO Verlag, 2005.
van der Aalst, “Pattern-based translation of BPMN process models to BPEL web services,” International Journal of Web Service Research (JWSR), 2007. M. Weske, “Business-Objekte: Konzepte, Architekturen, Standards,” Wirtschaftsinformatik, vol. 41, pp. 4-11, 1999. S. Overhage, “A Standardized Framework for the Specification of Software Components,” presented at Proceedings of the 5th Annual International Conference on Object-Oriented and Internet-Based Technologies, Concepts, and Applications for a Networked World (NODe 2004), Erfurt, 2004. S. Greiffenberg, Method Engineering in Business and Government. Hamburg: Dr. Kovac, 2004. O. M. Group, “Meta Object Facility (MOF) Specification, version 1.4,” 2002. “
[2]
student
“
metadata
Dissertation
Object Normalization as Contribution to the Area of Formal Methods of Object-Oriented Database Design
Vojtěch Merunka, Martin Molhanec
Faculty of Economics and Management, Czech University of Life Sciences Prague; Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic
[email protected], [email protected]
Abstract - This article gives an overview of the current status in the area of formal techniques for object database design. It discusses why relational normalization cannot easily be used in object databases. The article introduces various proposals of object normal forms and presents the authors' own evaluation of this area.
I. INTRODUCTION
Nowadays many different object databases are used in practice. Programmers and IT experts have much implicit knowledge and experience concerning the effective production of applications. Unfortunately, very little of their knowledge is expressed in the form of formal rules and procedures that would be widely accepted in the IT community. As a consequence of this situation, we can see incorrect usage of relationships and hierarchies among objects, breakneck tricks in code, etc. The problem of these applications is not that they do not work. Unfortunately, even really monstrous constructions work thanks to modern components and development systems, and this is why the discussion with designers about the need to rebuild their systems is very hard. Therefore we decided to discuss the formal techniques of object database design. A database is the foundation of almost all software applications, and object technology is very likely to become mainstream. Also, many myths exist in the community of object-oriented software vendors and developers; very popular, for example, is the myth that no normalization is needed, that programming is easy, etc.
II. TRADITIONAL APPROACH TO NORMALIZATION IN OBJECT DATABASES
We need formal techniques for object database design for the same reasons as in relational databases - for solving potential problems with data redundancy and consistency, for more flexibility, etc. Let us first look at why it is not possible to use the same techniques as in relational databases. For relational database design the following have been considered useful: 1. data normalization, 2. the method of attribute synthesis and 3. the method of data decomposition of attributes. We cannot use them because the ODM (Object Data Model) is not a continuation of the RDM (Relational Data Model). Both models differ in the following respects: 1. Objects have not only data components, as relational records do, but also methods. The question is how those objects should be described. The term behaviour, or attribute of an object, is used, which covers both data items and methods. Attributes are object characteristics seen from the outside; for attributes it is not differentiated whether they have their implementation origin in a data item or in a method. 2. The rules of relational data normalization are derived from the concept of functional dependency between attributes. The attributes in a relational database are the basic building unit and they are scalar and atomic values, of which the individual relations consist. But in the ODM we work with whole objects, which from the relational view compound attributes together. In addition, the values of these attributes can be other objects as well. This all means that the basic building unit of the object data model, in contrast to the relational one, is neither atomic nor scalar. 3. The concept of object identity is the third problem for a simple adoption of the relational technique. In the RDM, the identity of a record is given by the value of chosen attributes (primary keys). In the object data model the identity of an object is based on addresses in virtual memory and is independent of any value changes.
III. SOME LATTER-DAY APPROACHES TO OBJECT NORMALIZATION
Various papers appeared at the turn of the 1980s and 1990s. The first papers aimed at an enlargement of the relational techniques; other papers present new approaches to this problem.
1. Nootenboom's OONF. According to the Dutch author Henk Nootenboom [11], the first three normal forms are universally valid for relational and object databases. As a substitute for the fourth and fifth relational normal forms (and probably BCNF) he sets up the concept of one object normal form, with the following definition: a collection of objects is in OONF if it is in 3NF and contains meaningful data elements only. The universal validity of 1NF, 2NF and 3NF is an absolutely legitimate opinion, but we leave the OONF definition without comment.
2. Khodorkovsky's ONF, 4ONF, 5ONF and 6ONF. The paper [8] sets up the notion of an object normal form (ONF), which concerns the right relation between the data of an object and the methods of an object. The rules of the object normal form defined in this way complement the classical relational definitions of 4NF and 5NF (and 6NF, which is the author's refinement of 5NF). The author names these completions of the classical definitions 4ONF, 5ONF and 6ONF. The paper can be considered a more qualified formulation of ideas similar to the example above. According to the author, 1NF, 2NF and 3NF are common to relational and object databases.
3. "Chinese" ONF. The paper [15] sets up one object normal form as a substitute for all relational normal forms. This paper considers the object data model to be a tree hierarchy, in the way the XML language works with it. We suppose that it does not concern the problem of object database design as we understand it.
4. "Australian-Swiss" ONF. The authors of [12] set up one ONF with the help of several types of functional dependencies among objects. Concretely, path dependency concerns the composition of objects and navigability among objects, local dependency concerns the relations of internal object folders, and global dependency concerns the requirements on the application. An object structure is then in ONF if the users' requirements on the application are retrospectively deducible from the relations among the objects. In our opinion this is an interesting contribution to the problem of testing the accordance of a proposed object-oriented application model with its requirements. This problem is connected with the design of object databases, but it does not address a methodology for a correct design that prevents redundancy, inconsistency, etc.
5. Three Ambler-Beck object normal forms. Three object normal forms for object-oriented applications are set up on the Internet [2] and in books [1]; they are analogical to the first, second and third relational normal forms. The authors themselves define these normal forms as 1ONF, 2ONF and 3ONF and talk about them as a tool for the normalization of object classes, complementary to the technique of design patterns. Let us look at their proposals in detail:
1ONF. A class is in 1ONF when specific behaviour required by an attribute that is actually a collection of similar attributes is encapsulated within its own class. An object schema is in 1ONF when all of its classes are in 1ONF.
2ONF. A class is in second object normal form (2ONF) when it is in 1ONF and when "shared" behaviour that is needed by more than one instance of the class is encapsulated within its own class(es). An object schema is in 2ONF when all of its classes are in 2ONF.
3ONF. A class is in third object normal form (3ONF) when it is in 2ONF and when it encapsulates only one set of cohesive behaviours. An object schema is in 3ONF when all of its classes are in 3ONF.
In the third and last object normal form it is possible to recognize that it offers results analogous to the third relational normal form. It concerns the case where we have, within some objects, characteristics that should be interpreted as an independent object, so we single them out into a new independent object. If we do not do this, the objects will keep several mutually different groups of characteristics, which the authors describe as cohesive behaviours in the definition. The sense of this normal form is absolutely clear. The only issue is that it is defined in too programmer-oriented a language.
IV. OUR INTERPRETATION OF THE PROBLEM
All the papers mentioned here are very important contributions to the discussion of this topic. When we summarize what the community of analysts and designers probably expects from a technique of object normalization, we can draw the following conclusions: 1. It has to be very simple, precise and understandable, and it should work with a minimum of concepts, similarly to "classical" normalization. We suppose that the introduction of difficult definitions that distinctly exceed the scope of the classical normal forms, with many types of concepts and relations, is not the right way. 2. It should be focused concretely on database design, that is, on the structure of the objects which will serve for data storage and manipulation in database systems, and not on the objects which are responsible for the "operation" of
applications. We have design patterns for the correct development of application objects, and these do not need to be replaced. 3. It is possible that the object approach eventually becomes the universal approach to information system analysis, and relational technology limits itself to being only one of the possible implementation variants. So the present conditions can be turned into their opposite. It would be smart to have a new theory analogical to the entity-relationship modelling concept and relational normalization. In the best case, relational normalization (as a tool of relational technology) should be deducible from the new theory.
V. ONTOLOGICALLY BASED APPROACH TO OBJECT-ORIENTED NORMALIZATION
As mentioned in chapter II, the reason for data normalisation in a relational or object database is the same: it is the need to eliminate data redundancy, the source of many application errors. What is the origin of the principle of data normalisation? Both paradigmatic models, the relational and the object data model, are PDMs (Platform Dependent Models) in the sense of MDA (Model Driven Architecture) and can be derived by transformation from a PIM (Platform Independent Model). A suitable model with a high level of platform independence is a conceptual data model. The right conceptual model is a map of concepts and their relationships in the real world, the physical reality of everyday life which everyone experiences. The real world has one most important property: the non-existence of physical redundancy. Every object in the real world exists only as one particular entity. In addition, derived attributes do not exist in the real world either. A proper conceptual data model is based on a strong ontological origin, and relational and object data normalization can be derived from its ontological concepts. The basic ontological principles which affect data normalization in the relational and object areas are as follows:
• If any part of a property has its own sense, such a property is an individual concept. A property is itself indivisible.
• If some properties together have a specific sense, such a group of properties forms a new individual concept. A concept has only individual properties.
• Individual concepts are joined only by associations, such as inheritance, composition and relationship.
• A concept has every property only once. If a concept is joined by sense with multiple occurrences of an identical property, the concept and such properties are joined by a composition association.
• A particular property belongs to only one concept. A property joined by sense with more than one concept is a relationship.
The principles presented above originate from a conceptual data ontology conformable with the general ontology
introduced in GOL (General Ontology Language) [16]. We note that our interpretation of notions such as concept, property and association originates from this general ontology and not from the common sense used in object-oriented programming.
VI. OUR DEFINITION OF OBJECT NORMAL FORMS
We think that a modified form of the Ambler-Beck approach best fulfils all the requirements described above. This version has already been checked several times, it has been used in lessons and in practice in Czech companies, and we intend to work on it further. We suppose that this method should precede all possible further considerations about the use of inheritance, composition and other relations among objects in the course of modelling.
Fig. 1 - Classes in non 1ONF (the classes Order and Supply each contain the attributes supplier firstname, supplier surname, supplier address, client firstname, client surname, client address, order/supply date, payment mode, and the repeated groups first/second/third product name and price).
1ONF. A class is in first object normal form (1ONF) when its objects do not contain a group of repetitive attributes. Repetitive attributes must be extracted into objects of a new class. The group of repetitive attributes is then replaced by a link to a collection of the new objects. An object schema is in 1ONF when all of its classes are in 1ONF. Fig. 1 shows the example assignment in non-normalized form and Fig. 2 shows the same assignment in 1ONF. In contrast to the original approach, we do not assume that designers automatically recognize groups of repetitive attributes and single them out into an independent class. This rule is really necessary according to our experience from teaching and practice. The problem is not always as trivial as in the example mentioned; repeated attributes can exist under various names and are not easily visible at first sight.
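The following C++ sketch is our own rendering of this example and is not code from the paper: the repeated product attributes violate 1ONF and are extracted into a collection of Product objects.

```cpp
// Our illustrative C++ rendering of the Fig. 1 / Fig. 2 example.
#include <string>
#include <vector>

// Not in 1ONF: the product attributes repeat inside the class.
struct OrderNot1ONF {
    std::string supplierName, supplierAddress;
    std::string clientName, clientAddress;
    std::string paymentMode;
    std::string firstProductName;  double firstProductPrice;
    std::string secondProductName; double secondProductPrice;
    std::string thirdProductName;  double thirdProductPrice;
    // ... and so on for every further product
};

// In 1ONF: the repeating group becomes its own class, referenced as a collection.
struct Product {
    std::string name;
    double price;
};

struct Order1ONF {
    std::string supplierName, supplierAddress;
    std::string clientName, clientAddress;
    std::string paymentMode;
    std::vector<Product> products;   // any number of products, no repetition
};
```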
Fig. 2 - Classes in non 2ONF.
2ONF. A class is in second object normal form (2ONF) when it is in 1ONF and when its objects do not contain an attribute or group of attributes which is shared with another object. Shared attributes must be extracted into new objects of a new class, and in all objects where they appeared they must be replaced by a link to the object of the new class. An object schema is in 2ONF when all of its classes are in 2ONF. In our example this concerns the attributes supplier's first name, supplier's surname and address, client's first name, client's surname and address, and the method of payment. These attributes were common to a concrete order and supply, and it was therefore necessary to establish the new object class Contract.
Fig. 3 - Classes in non 3ONF.
3ONF. A class is in third object normal form (3ONF) when it is in 2ONF and when its objects do not contain an attribute or group of attributes which has an independent interpretation in the modelled system. These attributes must be extracted into objects of a new class, and in the objects where they appeared they must be replaced by a link to this new object. An object schema is in 3ONF when all of its classes are in 3ONF. In our example this concerns the data about the supplier and the client in the objects of the class Contract. These attributes represent persons and have an interpretation independent of contracts. And if it agrees with the setting, we can declare the same about the addresses.
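The following sketch (again our own C++ rendering, following Fig. 2 and Fig. 3, not code from the paper) shows the result of the last two steps: the attributes shared by Order and Supply are moved into Contract (2ONF), and the independently interpretable supplier/client data and addresses are moved into Person and Address (3ONF).

```cpp
// Illustrative 2ONF/3ONF decomposition of the Order/Supply example.
#include <memory>
#include <string>
#include <vector>

struct Address {                        // 3ONF: addresses are concepts of their own
    std::string street, houseNumber, residence, country;
};

struct Person {                         // 3ONF: suppliers and clients are persons
    std::string firstName, surname;
    std::shared_ptr<Address> address;
};

struct Product {
    std::string name;
    double price;
};

struct Contract {                       // 2ONF: data shared by Order and Supply
    std::shared_ptr<Person> supplier;
    std::shared_ptr<Person> client;
    std::string paymentMode;
};

struct Order {
    std::shared_ptr<Contract> detail;   // the "_detail" association of the figures
    std::vector<Product> products;
};

struct Supply {
    std::shared_ptr<Contract> detail;
    std::vector<Product> products;
};
```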
VII. CONCLUSION
The approach presented here raises many questions and themes for further thought. One of them is the question of how the IS-A relation and other known relations among objects fit into this approach. One possibility is that the correct use of IS-A relations could be the subject of a further, fourth normal form. When we focus on the process of object normalization, we can expect a practical analogy with the technique of applying design patterns and, primarily, with the refactoring technique. Our future research will focus on this direction, where we will try to define the rules of the object normal forms as a sequence of refactoring steps over an object database schema. It is a pity that such a promising and practically used database technology still has no understandable and universally accepted theoretical foundation and formal design techniques. Many research centres are undoubtedly interested in this theme, but coherent and universally accepted results have not yet been published. The absence of a common formal tool and technique causes the large incompatibility of present object databases, even though they are themselves very good practical products. Therefore we expect that the near future will bring a few alternative approaches.

VIII. ACKNOWLEDGMENT
This research has been supported by the Ministry of Education, Youth, and Sports under research programs MSM 6840770014 and MSM 6046070904.

REFERENCES
[1]
Ambler Scott: Building Object Applications That Work, Your StepBy-Step Handbook for Developing Robust Systems Using Object Technology, Cambridge University Press/SIGS Books, 1997, ISBN 0521-64826-2 [2] Ambler Scott: Object Orientation -- Bringing data professionals and application developers together, http://www.agiledata.org/essays/ [3] Barry D.: The Object Database Handbook: How to Select, Implement, and Use Object-Oriented Databases, ISBN 0471147184 [4] Beck K.: Agile Database Techniques- Effective Strategies for the Agile Software Developer, John Wiley & Sons; ISBN 0471202835 [5] Blaha M., Premerlani M.: Object-Oriented Modeling and Design for Database Applications, Prentice Hall, ISBN 0-13-123829-9 [6] Catell R. G.: The Object Data Normal: ODMG 3.0, ISBN 1558606475 [7] Gemstone Object Server -- documentation & non-commercial version download, http://www.gemstone.com [8] Khodorkovsky V. V.: On Normalization of Relations in Databases, Programming and Computer Software 28 (1), 41-52, January 2002, Nauka Interperiodica. [9] Kroha P.: Objects and Databases, McGraw Hill, London 1995, ISBN 0-07-707790-3. [10] Loomis M., Chaundri A.: Object Databases in Practice, ISBN 013899725X [11] Nootenboom Henk Jan: Nuts - a online column about software design. http://www.sum-it.nl/en200239.html
[12] Tari Zahir, Stokes John, Spaccapietra Stefano: Object Normal Forms and Dependency Constraints for Object-Oriented Schemata, ACM Transactions on Database Systems 513-569, Vol 22 Issue 4, December 1997. [13] Vanicek Jiri, Data Gathering for Science and Research, Agricultural Economics, 50, 2004 (1), 29-34. [14] Wai Y. Mok, Yiu-Kai Ng and David W. Embley, An Improved Nested Normal Form for Use in Object-Oriented Software Systems. Proceedings of the 2nd International Computer Science Conference: Data and Knowledge Engineering: Theory and Applications, pp. 446452, Hong Kong, December 1992. [15] Yonghui Wu, Zhou Aoying: Research on Normalization Design for Complex Object Schemes, Info-Tech and Info-Net, vol 5. 101-106, Proceedings of ICII 2001, Beijing. [16] W. Degen, B. Heller, H. Herre, and B. Smith. GOL: Towards an axiomatized upper level ontology. In Barry Smith and Nicola Guarino, editors, Proceedings of FOIS’01, Ogunquit, Maine, USA, October 2001. ACM Press.
A Novel Security Schema for Distributed File Systems
Bager Zarei, Dept. of Computer Engineering, Islamic Azad University, Shabestar Branch, Iran, [email protected]
Mehdi Asadi, Dept. of Computer Engineering, Islamic Azad University, Khamneh Branch, Iran, [email protected]
Saeed Nourizadeh, Dept. of Computer Engineering, Islamic Azad University, Shabestar Branch, Iran, [email protected]
Shapour Jodi Begdillo, Dept. of Computer Engineering, Islamic Azad University, Parsabad Mogan Branch, Iran, [email protected]
Abstract - Distributed file systems are key tools that enable collaboration in any environment consisting of more than one computer. Security is a term that covers several concepts related to protecting data. Authentication refers to identifying the parties involved in using or providing file services. Authorization is the process of controlling actions in the system, both at the user level, such as reading and writing, and at the administrative level, such as setting quotas and moving data between servers. Communication security addresses the integrity and privacy of messages exchanged over networks vulnerable to attack. Storage security concerns the safety of data "at rest" on disks or other devices. But these are mechanistic approaches to security; at a higher level, the trust of all parties in each other and in the system components controls the security decisions. This paper is an endeavor to provide security in distributed file systems. Its main objective is to examine in depth the basic concepts of distributed file systems, such as fault tolerance, fault recovery, scalability and, specifically, security.
Keywords: Security, DFS (Distributed File System), Fault Tolerance, Fault Recovery, Load Balancing, DES (Data Encryption Standard).

I. INTRODUCTION
To provide security in distributed file systems there are different protocols, such as SSL and Kerberos. These protocols use methods such as mutual authentication of client and server and authorization of clients to access and use file system services. In the authorization method, a client must hold a ticket or valid certificate to use file system services; the ticket includes information such as the username and the expiration time of the ticket. The rest of this paper is organized as follows: Section 2 discusses the background and main purposes of file systems. Section 3 presents file system design issues from different points of view. Section 4 covers the design requirements of a distributed file system. Secure distributed file systems are discussed in Section 5, and Section 6 concludes with our proposed solution for providing security in a distributed file system and a comparison of our proposed method with existing systems for securing a DFS.

II. FILE SYSTEM BACKGROUND
File systems serve several main purposes:
• Storing information
• Retrieving information
• Sharing information
File systems are useful for long-term and persistent storage of information and are a way to organize information, usually in the form of files and a file hierarchy. A file consists of a mapping from a name to a block of information accessed as a logical group. Many ways exist to implement file systems and many file systems have been implemented, but relatively few of these are widely used. Normally, file systems are implemented as part of the operating system's functionality, in the kernel. Applications use a common interface provided by the operating system for working with files, while the operating system controls the semantics of this file access.
III. FILE SYSTEM ISSUES
Some issues, common to most file systems, need addressing in the design of any new file system.

1. Sharing
Sharing files among users is important in any file system. Several ways exist to define the sharing semantics in a file system. UNIX semantics for file sharing dictates that users should immediately see the effects of all writes to a file. Thus, if user1 and user2 open the same file, user1 writes to the file, and user2 later reads from the file, user2 should see what user1 had written. Another method to implement file sharing is called session semantics. These semantics guarantee that changes to files are seen by other users when the file is closed. If multiple users have opened the same file, they could hold different or stale copies of the file when one of them writes to it. Another approach is to treat every file as read-only and immutable. Every change to a file essentially creates a new immutable file with a new file name to refer to the new copy. Because all files are read-only, caching and consistency become much simpler to implement.

Access Control
Access control in a file system is making sure users only access file resources they are allowed to access, including read access to some files and full write access to other files.
Once users are authenticated, a mechanism must exist to control the access to resources. In UNIX, file permissions can be set to any combination of read, write, and execute. Each file is associated with a single owner and a single group, and only the owner can set the permissions. Three groups of permissions exist: one for the owner, one for the group, and one for everyone else. UNIX access control is too difficult to use and does not promote sharing. We desired a better system for access control, preferring a system similar to AFS with access control lists, allowing normal users to create groups and set permissions on files.
2. Caching
Different media have different costs of access. Cache memories near processors are very fast, local disks are slower, and remote disks can be even slower to access. The basic idea of caching is to keep copies of recently accessed data on a faster medium to speed up repeated access to the same data. One copy somewhere is usually considered the master copy and every other copy is a secondary copy. All types of file systems need some form of caching. Local file systems cache disk blocks in memory to reduce the time needed to fetch blocks from a disk. Distributed file systems need to cache remote blocks or files locally to reduce the amount of network traffic needed. Cryptographic file systems need to cache decrypted blocks to operate on them.

Consistency
Changes made to a cache copy must eventually be propagated back to the master copy, and a cache copy can become out of date if the master copy is changed by another client. The general problem of keeping the cache copy consistent with the master copy is called cache consistency. To maintain consistency, a system needs a caching policy to determine what data is cached and when data is removed from the cache. The job of maintaining consistent caches can be given to either the clients or the server. The clients can be responsible for periodically checking the validity of their caches; this has the disadvantage of frequent checks that may not be necessary. Another approach is to have the server maintain information about what each client has cached. When the server detects that a client has something invalid in its cache, the server contacts the client and tells it to invalidate that particular file. One disadvantage of the server-centered approach is extra complexity in both the client and server code.
Write Policy
When a cache block is modified, the changes can be pushed back to disk or to a remote server at different times, corresponding to different cache write policies. One method, called write-through, is to immediately send all writes through the cache directly to the master copy. Another method is delayed-write, postponing writes until a future time. A type of delayed write, called write-on-close, writes out the cache when the file is closed.
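The difference between these policies can be sketched in a few lines of Python. The class and method names below are ours, and the "master copy" is abstracted as a plain dictionary; this is only an illustration of the idea, not part of any real file system:

```python
class CachedFile:
    """Toy block cache illustrating write-through vs. write-on-close policies."""

    def __init__(self, master, policy="write-through"):
        self.master = master          # stand-in for the disk or remote server
        self.cache = dict(master)     # local copies of blocks
        self.dirty = set()            # blocks modified but not yet written back
        self.policy = policy

    def write(self, block_id, data):
        self.cache[block_id] = data
        if self.policy == "write-through":
            self.master[block_id] = data          # propagate immediately
        else:                                     # delayed write
            self.dirty.add(block_id)

    def close(self):
        # Write-on-close: flush all delayed writes when the file is closed.
        for block_id in self.dirty:
            self.master[block_id] = self.cache[block_id]
        self.dirty.clear()
```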
3. Fault Tolerance
A system is fault tolerant if it can work properly even if problems arise in parts of the system. File systems can be characterized as either stateful or stateless; the less implicit state contained in each request, the more fault tolerant a system can be. Availability is how readily accessible files are despite possible problems such as servers becoming unavailable or communication failures. A fault tolerant system should aim to provide high availability. One common way of providing fault tolerance and increasing availability is to replicate files, either on multiple disks on the same machine or on multiple machines. Replication greatly complicates the system in maintaining consistency among the copies; complete consistency sometimes may be sacrificed for performance.
4. Scalability
Another design concern is scalability, which can be viewed in several different ways. Traditionally, people have looked at how many clients file servers can handle simultaneously. Once we leave local file systems and work with a network of machines, we start to care about the number of machines we can work with. Scalability is the ability of the system to handle a large number of active clients and servers and still provide high performance for all involved.

IV. DESIGN REQUIREMENTS OF A DFS
There are a number of requirements that need to be addressed in the design of a heterogeneous distributed file system, in particular in the areas of distribution, security and concurrency control.
• Distribution: A distributed file system should allow the distribution of the file service across a number of servers. In particular, the following distributed file system requirements should be addressed:
o Access transparency - clients should be unaware of the distribution of files across multiple servers.
o Location transparency - clients should see a uniform file name space, which is independent of the server where the files are located.
o Hardware and operating system heterogeneity - a distributed file system should provide a mechanism to allow multiple file servers, each running on a different file system type, to interact with each other, thus providing one global file system to its clients.
• Security: Due to the inherent insecurity of the Internet, which is vulnerable to both eavesdropping and identity falsification, there is a need both to secure all communication and to verify the identity of all parties in a secure manner. The security architecture of a distributed file system should provide authentication of both client and server, in addition to securing all communication between them.
• Concurrency: As the file server may be accessed by multiple clients, the issue of concurrent access to files needs to be addressed. A distributed file system should provide a locking mechanism which allows both exclusive and shared access to files.

V. DISTRIBUTED SECURITY SYSTEMS
Distributed systems are more vulnerable to security breaches than centralized systems, as the system is comprised of potentially untrustworthy machines communicating over an insecure network. The following threats to a distributed system need to be protected against:
• An authorized user of the system gaining access to information that should be hidden from them.
• A user masquerading as someone else, and so obtaining access to whatever that user is authorized to do. Any actions they carry out are also attributed to the wrong person.
• Security controls being bypassed.
• Eavesdropping on a communication line, thus gaining access to confidential data.
• Tampering with the communication between objects by modifying, inserting and deleting transmitted data.
• Lack of accountability due, for example, to inadequate identification of users.
These threats are prevented through the securing of all communication over the network, the correct authentication of both client and server, and the authorization of clients to access resources.

1. Secure Communication
Secure communication is provided through both the encryption and the digital signing of the data being passed over the network.

Encryption
Encryption prevents eavesdropping of sensitive data transmitted over an insecure network. There are two types of encryption algorithms: asymmetric and symmetric. Asymmetric, or public-key, algorithms have two keys, a public and a private one; data is encrypted with one key and decrypted with the other. Symmetric, or private-key, encryption uses just one key, agreed between the sender and recipient beforehand. This method is faster than asymmetric encryption. A common cryptographic technique is to encrypt each individual conversation with a separate key, called a session key. Session keys are useful because they eliminate the need to store keys between sessions, reducing the likelihood that the key might be compromised. However, the main problem with session keys is their exchange between the two parties. This is solved through the use of either a public-key cipher or a key agreement algorithm, such as Diffie-Hellman.

File Encryption
SSL provides session encryption by protecting against people listening in on the network. The files, however, are still seen in unencrypted form by the file server. In some situations, one may not want the system administrator of the file server to be able to read one's files. A user may also be concerned about someone breaking into the file server and gaining access to all of the files there. File encryption prevents the leak of information to someone who has access to the file server. Several significant differences exist between file encryption and session encryption. Files are persistent, so long-term security must be taken into account. Encrypting a file such that no one can break into it now is not enough; the risk must remain sufficiently low that the file will also stay safe in the future. Another difference is that files can require random access, whereas most encryption techniques simply cannot work with anything other than a sequential stream of data.

Digital Signatures
Digitally signing data prevents the unauthorized modification of data during transmission. Rather than sign the whole message, which would add too much overhead, a unique digest is signed instead. This message digest is produced by a secure hashing algorithm, which outputs a fixed-length hash, unique to that message. This ensures that if the message is modified in any way, the digest will change, which can then be detected at the receiving end. Digitally signing data is accomplished through the use of a public-key cipher. The message digest is encrypted with the private key of the sender, which the recipient deciphers using the sender's public key. If the data was modified, or the digest was encrypted with a different private key, then the original digest and the digest calculated by the recipient will differ. This ensures both that the data was not modified during transmission and that it originates from the true sender.
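As an illustration of the digest-then-sign scheme described above (not the paper's own implementation), the sketch below uses the third-party Python `cryptography` package with RSA-PSS; the key handling is deliberately simplified and the message content is made up:

```python
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.exceptions import InvalidSignature

message = b"block of file data sent over the network"

# Sender: hash the message and sign the digest with the private key.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH)
signature = private_key.sign(message, pss, hashes.SHA256())

# Recipient: recompute the digest and verify it against the signature using the
# sender's public key; verify() raises InvalidSignature if the data was tampered with.
public_key = private_key.public_key()
try:
    public_key.verify(signature, message, pss, hashes.SHA256())
    print("signature valid: data unmodified and from the claimed sender")
except InvalidSignature:
    print("signature check failed")
```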
2. Authentication
Before a client is allowed to access any of the data stored in the file server, it must be able to prove its identity to the server. A secure mutual identification scheme was designed which does not require the transmission of passwords over the network and provides authentication of both client and server. This scheme is similar to the model provided by Kerberos and is described below. Coulouris et al. state that a secret key is the equivalent of the password used to authenticate users in centralized systems. Therefore, possession of an authentication key based on the client's password can verify the identity of both the client and the server. The method used to convert the password into an encryption key is based on the method used in the PKCS #5 standard, whereby the key is generated by computing the hash of the password. A user identifier is used to identify the client for a particular session. This identifier can be encrypted with the authentication key to produce a token. This is illustrated in Figure 1.
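A minimal sketch of that key and token derivation using Python's standard library: PBKDF2 (the PKCS #5 key-derivation function) turns the password into an authentication key, and an HMAC over the user identifier stands in here for the symmetric encryption step the paper describes. All names and parameters are illustrative, not the authors' implementation:

```python
import hashlib
import hmac
import os

def derive_auth_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """PKCS #5 / PBKDF2: derive a 32-byte authentication key from the password."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)

def make_token(auth_key: bytes, user_id: str) -> bytes:
    """Bind the session's user identifier to the authentication key.
    (The paper encrypts the identifier; an HMAC is used as a simple stand-in.)"""
    return hmac.new(auth_key, user_id.encode("utf-8"), hashlib.sha256).digest()

salt = os.urandom(16)
key = derive_auth_key("correct horse battery staple", salt)
token = make_token(key, "session-42:alice")
```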
3. Authorization
Authorization is the granting of access to a particular resource. This prevents both unauthorized and authorized users from gaining access to information that should be protected from them. Authorization is commonly implemented using ACLs (Access Control Lists); an ACL is a list of users associated with each file in the system, specifying who may access the file and how.
Fig.1. Token Generation
4. Kerberos
The name "Kerberos" comes from a mythological three-headed dog that guarded the entrance to Hades. Kerberos is a network authentication protocol developed at MIT. It is designed to provide authentication to client/server systems using secret-key cryptography. Kerberos is based on the secret-key distribution model originally developed by Needham and Schroeder. Keys are the basis of authentication in Kerberos and are typically a short sequence of bytes used for both encryption and decryption:
Encryption: plainTxt + encryption key = cipherTxt
Decryption: cipherTxt + decryption key = plainTxt
It was implemented in C on a Linux system. The DES encryption algorithm has been used to encrypt the messages. DES is a reversible operation which takes a 64-bit block and a 64-bit key and produces another 64-bit block. Usually the bits are numbered so that the most significant bit, the first bit of each block, is numbered 1. UDP has been used as the transport layer protocol for Kerberos. Kerberos deals with three kinds of security objects:
Ticket: A token issued to a client by the Kerberos ticket-granting service for presentation to a particular server, verifying that the sender has recently been authenticated by Kerberos. Tickets include an expiry time and a newly generated session key for use by the client and the server.
Authenticator: A token constructed by the client and sent to the server to prove the identity of the user and the currency of any communication with a server. An authenticator can be used only once. It contains the client's name and a timestamp and is encrypted in the appropriate session key.
Session Key: A secret key randomly generated by Kerberos and issued to a client for use when communicating with a particular server. Encryption is not mandatory for all communication with servers; the session key is used for encrypting communication with those servers that demand it and for encrypting all authenticators.
Client processes must possess a ticket and a session key for each server that they use. It would be impractical to supply a new ticket and key for each client-server interaction, so most tickets are granted to clients with a lifetime of several hours, so that they can be used for interaction with a particular server until they expire.
• Advantages of Kerberos:
- Secure authentication
- Single sign-on
- Secure data flow
- Client and server mutual authentication
• Applications benefiting from Kerberos:
- File systems: NFS, AFS, DFS
- Shell access: login, rlogin, telnet and ssh
- File copy: FTP, RCP, SCP, SFTP
- Email: KPOP, IMAP
• Limitations of Kerberos:
- Does not explicitly protect against Trojan attacks.
- Is mainly intended for single-user workstations.
- The KDC can be a single point of failure.
5. Secure Sockets Layer
The SSL Handshake Protocol was developed by Netscape Communications Corporation to provide security and privacy over the Internet. The protocol supports server and client authentication. The SSL protocol is application independent, allowing protocols like HTTP, FTP, and Telnet to be layered on top of it transparently. The SSL protocol is able to negotiate encryption keys as well as authenticate the server before data is exchanged by the higher-level application. The SSL protocol maintains the security and integrity of the transmission channel by using encryption, authentication and message authentication codes. The SSL Handshake Protocol consists of two phases, server authentication and client authentication, with the second phase being optional. In the first phase, the server, in response to a client's request, sends its certificate and its cipher preferences. The client then generates a master key, which it encrypts with the server's public key, and transmits the encrypted master key to the server. The server recovers the master key and authenticates itself to the client by returning a message encrypted with the master key. Subsequent data is encrypted with keys derived from this master key. In the optional second phase, the server sends a challenge to the client. The client authenticates itself to the server by returning the client's digital signature on the challenge, as well as its public-key certificate. This is the standard protocol used by sites on the Web needing secure transactions.

Fig. 2. Secure Sockets Layer
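For context (not part of the paper), the sketch below shows how a client performs the modern descendant of this handshake, TLS, using Python's standard ssl module; the host name is only an example:

```python
import socket
import ssl

context = ssl.create_default_context()          # loads trusted CA certificates

# The handshake (certificate exchange, key negotiation) happens inside wrap_socket.
with socket.create_connection(("example.org", 443)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname="example.org") as tls_sock:
        print("negotiated protocol:", tls_sock.version())            # e.g. 'TLSv1.3'
        print("server certificate subject:", tls_sock.getpeercert().get("subject"))
```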
VI. CONCLUSIONS AND PROPOSED SOLUTION
To achieve security in distributed file systems there are different mechanisms, such as encryption, certificate authentication and digital signatures. DES is used for encryption and for protection of information against different kinds of attackers. In this paper several common methods for establishing secure communication between client and server, such as SSL, Kerberos and mutual authentication, have been explained. Finally, we propose a method for increasing security in distributed file systems that combines the methods presented in this paper. Figure 3 and Figure 4 give a view of our proposed schema, which is based on a certificate or TGS (Ticket Granting Server) and an AS (Authentication Server). In the end, a secure communication between client and server is established. We describe our system step by step as implemented in this paper:

1. The client sends a message to the AS to receive a certificate (the message includes the username, the Authentication Server name and the client's service request, sealed with the client's private key):
Message = {user-name, AS-name, SealedClientServiceRequest}

2. The AS checks its ACL (Access Control List) to determine the validity of the client. If the client is valid, the AS puts the client's request and username on a buffer (client authentication). The TGS then takes the request and username from the buffer, finds the client's private key from the username and decrypts the client's request. If the client is authorized to use the requested service, the TGS creates a certificate containing the username, the certificate expiration time and a session key:
Certificate = {user-name, certificate-expiration-time, session-key}

3. The TGS composes the message below, encrypts it with the server's private key and sends it to the server:
Message = {Certificate}

4. The TGS composes another message, encrypts it with the client's private key and sends it to the client:
Message = {session-key}

5. After receiving the message from the TGS, the client decrypts it with its own private key and obtains the session key. It then composes the message below and sends it to the server (the client encrypts its request with the session key previously obtained from the TGS):
Message = {user-name, Sealed-request}

6. After receiving the message from the client, the server checks the certificate associated with this client (received earlier from the TGS). If the expiration time of the client's certificate has not been reached, the server uses the session key contained in the certificate to decrypt the client's request; after performing the request it encrypts the result with the session key and sends it to the client. Finally, the client decrypts the result with the session key and uses it.

Our proposed system has all of the benefits of Kerberos, but it decreases the number of messages interchanged to establish a secure communication between client and server from six to five. Since network systems require rapid services, the speed of service increases as the number of messages decreases, which is what we want in distributed systems and networks. In the following table, two common methods used to create secure communications in networks are compared with our proposed architecture.

Fig. 3. Proposed security system architecture
Fig. 4. Message time order in the proposed architecture
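A sketch of this exchange in Python follows. The field names mirror the paper, but the key and request values are hypothetical, and seal() is only a placeholder that tags data so the message structure can be shown; it does not perform the DES encryption the paper uses:

```python
def seal(data, key):
    """Placeholder for the paper's DES-based sealing/encryption."""
    return {"sealed-with": key, "payload": data}

# 1. Client -> AS: request a certificate (request sealed with the client's private key).
msg1 = {"user-name": "alice", "AS-name": "AS-1",
        "SealedClientServiceRequest": seal("read /projects/report", "alice-private-key")}

# 2. AS validates the client against its ACL; the TGS then builds the certificate.
certificate = {"user-name": "alice",
               "certificate-expiration-time": "2008-01-01T12:00:00",
               "session-key": "K-session"}

# 3. TGS -> server: the certificate, encrypted with the server's private key.
msg3 = seal(certificate, "server-private-key")

# 4. TGS -> client: the session key, encrypted with the client's private key.
msg4 = seal({"session-key": "K-session"}, "alice-private-key")

# 5. Client -> server: the request, sealed with the session key obtained in step 4.
msg5 = {"user-name": "alice",
        "Sealed-request": seal("read /projects/report", "K-session")}
```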
Comparison Criteria | Kerberos | SSL | Our Proposed Architecture
Key | Private key | Public key | Private key
Time Synchronization | Synchronous | Asynchronous | Synchronous
Suitable Application | Network environments | Ideal for the WWW | Network environments
Security | Passwords reside in users' minds, where they are usually not subject to secret attack. | Certificates sit on a user's hard drive (even if they are encrypted), where they are subject to being cracked. | Passwords reside in users' minds, where they are usually not subject to secret attack.
Availability | Kerberos has always been open source and freely available. | Uses patented material, so the service is not free. | ------
Number of Interchanged Messages | 6 messages | ------ | 5 messages

Table 1 - Comparison of different systems to create secure communications
REFERENCES
[1] Marcus O'Connell, Paddy Nixon, "JFS: A Secure Distributed File System for Network Computers," 25th EUROMICRO '99 Conference, Informatics: Theory and Practice for the New Millennium, 8-10 September 1999, Milan, Italy. IEEE Computer Society, 1999.
[2] Benjamin C. Reed, Mark A. Smith, Dejan Diklic, "Security Considerations When Designing a Distributed File System Using Object Storage Devices." IEEE Computer Society, 2003.
[3] John R. Douceur and Roger P. Wattenhofer, "Optimizing File Availability in a Secure Serverless Distributed File System," 20th Symposium on Reliable Distributed Systems (SRDS 2001), 28-31 October 2001, New Orleans, LA, USA. IEEE Computer Society, 2001.
[4] Ted Anderson and Leo Luan, "Security Practices in Distributed File Systems," The Second International Workshop for Asian Public Key Infrastructures, Taipei, Taiwan, October 30 - November 1, 2002.
[5] Scott A. Banachowski, Zachary N. J. Peterson, Ethan L. Miller and Scott A. Brandt, "Intra-file Security for a Distributed File System," Proceedings of the 10th Goddard Conference on Mass Storage Systems and Technologies / 19th IEEE Symposium on Mass Storage Systems, College Park, MD, April 2002, pages 153-163.
[6] Kevin Fu, M. Frans Kaashoek, and David Mazieres, "Fast and Secure Distributed Read-Only File System," ACM Transactions on Computer Systems, February 2002.
[7] Austin Godber, "Distributed File System Security with Kerberos," http://uberhip.com/godber/cse531
[8] Q. Xin, E. L. Miller, and T. J. E. Schwarz, "Evaluation of Distributed Recovery in Large-Scale Storage Systems," in Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing (HPDC), pages 172-181, Honolulu, HI, June 2004.
A Fingerprint Method for Scientific Data Verification Micah Altman Institute for Quantitative Social Science, Harvard University 1737 Cambridge St. # 325 Cambridge, MA 02138 [email protected] Abstract - This article discusses an algorithm (called “UNF”) for verifying digital data matrices. This algorithm is now used in a number of software packages and digital library projects. We discuss the details of the algorithm, and offer an extension for normalization of time and duration data.
INTRODUCTION
In digital library systems, migrating digital objects among formats is necessary both for effective use and for preservation. However, a consequence of such reformatting is that we are often required to re-verify the intellectual content of the object, in order to avoid data loss and misinterpretation. The Universal Numeric Fingerprint (UNF) is an algorithmic tool used to verify that a data matrix or related digital object that is produced in one software environment and/or format has been correctly interpreted when moved to a different environment and/or format. In outline, the algorithm uses three steps to compute a fingerprint: First, an approximation algorithm is used to compute a reduced-fidelity version of the target digital object. Second, this reduced version is put into a normalized serial form. Third, a hash function is used to compute a unique fingerprint from the resulting serialized object. While these steps could in theory be applied to any digital object, in practice a specific normal form and approximation method must be defined for each type of object to be processed. The UNF implementation is specially tailored for use with scientific data. We describe the approach, implementation, and algorithmic details below.
RELATED APPROACHES
The UNF approach shares some similarities with published approaches to audio and video fingerprinting [1,2]. The UNF algorithm, however, is specifically tailored to scientific and statistical data vectors. Audio and video fingerprinting methods typically rely on type-specific feature extraction and are not directly applicable to scientific databases. Furthermore, unlike audio and video fingerprints, which typically compute a long sequence, UNF's use a more compact representation, suitable for use in scholarly citations.
Other approaches have been developed that attempt to verify that one object is a derivative of another object, regardless of file format. These methods operate through insertion or alteration of data in unused or unnoticed portions of the object, forming a digital watermark. Research into digital watermarks has produced algorithms that are designed to be robust to lossy transformations of the object, and hence some types of image objects can be identified as a derivative of another even when the derivative is manifested in a different file format. (For a survey see [3].) Watermarks have significant shortcomings when used to establish the semantic equivalence of two arbitrary digital objects. Watermarking algorithms cannot be used to establish that two independently created objects are semantically equivalent, since these will not share the same watermark. Conversely, two objects could have identical watermark information added, but contain completely different semantic content. Nor can watermarks be used to verify that a derivative is identical to a watermarked digital object, if the derivative was created from the original digital object before the watermark was applied. Furthermore, watermarks are not practical for some objects, such as numeric data and source code files, where the alterations created by the watermarking process tend to alter the semantic content of the digital object.
IMPLEMENTATION AND USE
The UNF was developed in the course of research into improving the numerical accuracy of statistical software [4]. It incorporates type-specific algorithms for scientific data such as numeric vectors, matrices, and related objects. Where possible, UNF algorithms are implemented as plug-ins for the software applications being used to manipulate or analyze the digital objects of interest. This strategy is used so that the UNF can be computed directly from the internal data structure used by the software product performing the data display and manipulation. This allows the UNF to be used not just to verify format migration, but also to verify that the data has been correctly interpreted by a software application. (Some of the errors we frequently encountered in [4], and which the UNF detects, included: omitted records, omitted
variables, recoding of missing values, and loss of numeric precision.) Furthermore, this implementation strategy obviates the need to write specific parsers for each data format, since the host software application parses the object before calculating the UNF. Instead, we need only supply standard interfaces for computing UNF's of vectors, matrices, and other related structures. The general algorithm was first described in [4] along with a sample implementation. Version 3 was the first version to be implemented in publicly available software: the UNF package for R was made available through the Comprehensive R Archive Network (CRAN), a code archive developed as part of the R Project [5]. (This version of the software also introduced the name "Universal Numeric Fingerprint".) UNF plug-ins are currently available for the statistical software packages R, SAS and Stata, along with a stand-alone application and C++ libraries. The UNF technology has been incorporated in a number of other digital library and preservation practices. UNF's are used to support format migration in the "Virtual Data Center" (VDC) digital library system for federated data sharing [6]. The successor of the VDC, the Dataverse Network Project (DVN) [7], which provides digital library software and virtual-archiving services, uses UNF's for data verification. Both systems now generate scholarly citations for all data collections, based on the standard for the citation of quantitative data developed in [8]. In this citation standard, the UNF is an integral part of the citation and is used to verify that a given data collection matches the version cited. The citation's combination of a global unique identifier, UNF, and bridge service URL combines to support permanence, verifiability, and accessibility. For use in citations to data, when displayed, the UNF is formatted so that it is self-identifying and can easily be printed. An example of a printed UNF is UNF:3:ZNQRI14053UZq389x0Bffg?==, where UNF: identifies the rest of the string as a UNF, :3 means that the fingerprint uses version 3 of the UNF and hash algorithm, and everything after the next colon is the actual fingerprint. For a particular algorithm and number of significant digits, the fingerprint is always the same length. Thus, the UNF includes enough self-identifying information that the algorithm may be updated to newer versions over time without disturbing old citations. UNF's are also used by The Data Preservation Alliance for the Social Sciences (Data-PASS), a partnership of six major U.S. archival institutions collaborating to acquire and preserve data at risk of being lost to the research community. To support this goal, Data-PASS developed shared technical infrastructure and best practices for metadata, confidentiality, security, processing and other elements of the archival management process. UNF's are used for citation and verification in the Data-PASS shared catalog infrastructure, and are part of the best practices for metadata identified by the alliance. (See [9,10].)
In these systems, standards, and practices, UNF’s are used as a replacement for or supplement to file-level cryptographic hashes and summary statistics. (In a sense, the UNF is a uniquely tailored summary statistic for the entire object.) Like a cryptographic hash, the UNF detects changes to the data object. Unlike a standard hash, it is sensitive only to semantically important changes – changes of format or insignificant changes of precision will not yield differing UNF’s.
ALGORITHMIC CONSIDERATIONS
To summarize, the abstract algorithm consists of the following steps: An approximation algorithm is used to compute the approximated semantic content of the digital object. This approximated content is then put into a normalized form. A hash function is used to compute a unique fingerprint for the resulting normalized, approximated object, and the hash is stored. When the object is reloaded into the same or another application, this process is repeated, and the value generated at load time is compared to the stored value. These steps are discussed in more detail in the remainder of this section. The first part of what is needed is a normalization function f() that maps each sequence of numbers (or, more commonly, blocks of bits) to a single value:

f(x1, x2, ..., xn) -> v        (1)

To verify the data, we would need to compute f once when initially creating the matrix, and then recompute it after the data has been read into our statistics package. For robust verification, we should choose f() such that small changes to the sequence are likely to yield different values of f(). Normalization of objects alone, while it can be used as a basis for establishing identity across formats in limited cases, is inapplicable when reformatting of the object changes the precision, accuracy, or level of detail of the object in trivial ways. This is a well-known issue in video and audio formats and in reformatting complex text documents, and it surprisingly occurs commonly even in reformatting purely numerical databases. Any type-specific approximation function A() may be employed. (For audio and video, type-specific approximation is used, such as decimation, spatial or frequency downsampling, and/or numerical cutoff filtering; for examples of decimation, downsampling and cutoff algorithms see [11], [12].) This approximation process, A(), accepts as input a digital object, O, of a specified type, and an approximation-level parameter, k.
A() should be chosen to capture the semantics of the approximation. That is, if O' = A(O, k), then O' should be semantically close to O, since the UNF does not otherwise make direct use of the semantic content of the object. (Note that this does not necessarily imply the inverse, that for two semantically similar objects A(O, k) = A(O', k) for some k.) Two other conditions on A() are notable:
1. [Monotonicity] If ki < kj then, for some measure of semantic distance, A(O, kj) ≠ A(O', kj) implies A(O, ki) ≠ A(O', ki), for j > i.
2. [Nesting] If k ≥ k0 then A(A(O, k), k0) = A(O, k0).
When A() satisfies the monotonicity condition, greater values of k yield less accurate approximations. When A() satisfies the nesting condition, UNF's can be used to determine whether two derivative objects O' and O'' are approximations of an object O, without knowing the UNF of O. To do this, generate a sequence of UNF's for both objects at increasing levels of approximation. If the resulting UNF's match for a particular approximation level, the two objects are approximately semantically identical. (Alternatively, if the conditions on A() above are not satisfied, then it is possible that no UNF computed on O' will match the UNF computed on O'' at any approximation level.)

TYPE-SPECIFIC NORMALIZATIONS AND APPROXIMATIONS FOR NUMERIC VECTORS
Versions three and four of the UNF algorithm operate on data vectors (a vector is defined as an ordered sequence of values of uniform type; vectors may include missing values) and involve the following steps:
1. Each element in the numeric vector is rounded to k significant digits, using the IEEE 754 'round toward zero' rounding mode. The default value of k is 7, the maximum expressible in single-precision floating point calculations.
2. Each element is then converted to a character string. Unless the element is missing or not a finite value, it is represented in exponential notation, in which non-informational zeros are omitted. This notation is constructed such that, if step 1 satisfies the nesting property above, the resulting character string will also satisfy the nesting property. For example, the number pi, at five digits, is represented as "+3.1415e+" and the number 300 is represented as the string "+3.e+2". Specifically, this notation comprises:
(a) A sign character in {+,-}
(b) A single leading digit.
(c) A decimal point, represented by a period character: `.`
(d) Up to k-1 digits following the decimal, comprised of the remaining k-1 digits of the number, omitting trailing zeros.
(e) A lower case 'e'
(f) A sign character.
(g) The digits of the exponent, omitting trailing zeros.
3. If the element is missing, it is represented as a string of three null characters. If the element is an IEEE 754 non-finite floating point special value, it is represented as the signed lower-case IEEE minimal printable equivalent (i.e., +inf, -inf, +nan).
4. Character strings representing non-missing values are terminated with a POSIX end-of-line character.
5. Each character string is encoded in a Unicode bit encoding (see [13]). (Versions 1-4 use UTF-32BE; versions 4.1 and greater use UTF-8 for increased performance.)
6. The vector of character strings is combined into a single sequence, with each character string separated by a POSIX end-of-line character and a null byte. A hash is computed on the resulting sequence. Version 4 uses SHA256 [14] as the hashing algorithm. (Versions one through three use a 64-bit checksum, a 64-bit CRC, and MD5, respectively, as the hashing algorithm.)
7. The resulting hash is then base64 encoded (see [15]), using big-endian byte order, for printing.
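To make the steps concrete, here is a compact Python sketch of this normalization and hashing for a numeric vector. It is illustrative only: it is not the official UNF implementation, it glosses over edge cases, and the function names and the "UNF-like:" prefix are ours.

```python
import base64
import hashlib
import math
from decimal import Decimal, ROUND_DOWN

def normalize_element(x, k=7):
    """Steps 1-4 for one element: round toward zero to k significant digits,
    render in the exponential notation above, terminate with a newline."""
    if x is None:                                   # missing value: three null bytes
        return b"\x00\x00\x00"
    if isinstance(x, float) and math.isnan(x):
        return b"+nan\n"
    if isinstance(x, float) and math.isinf(x):
        return b"+inf\n" if x > 0 else b"-inf\n"
    sign = "-" if x < 0 else "+"
    d = Decimal(repr(abs(x)))
    if d == 0:
        return (sign + "0.e+\n").encode("utf-8")
    e = d.adjusted()                                # decimal exponent of the leading digit
    mantissa = d.scaleb(-e).quantize(Decimal(1).scaleb(-(k - 1)), rounding=ROUND_DOWN)
    int_part, frac_part = str(mantissa).split(".")
    digits = int_part + "." + frac_part.rstrip("0")
    exponent = ("-" if e < 0 else "+") + str(abs(e)).rstrip("0")
    return (sign + digits + "e" + exponent + "\n").encode("utf-8")

def unf_like_fingerprint(vector, k=7):
    """Steps 5-7: UTF-8 encode, separate elements with a null byte,
    hash with SHA-256, and base64-encode the digest for printing."""
    stream = b"".join(normalize_element(x, k) + b"\x00" for x in vector)
    digest = hashlib.sha256(stream).digest()
    return "UNF-like:" + base64.b64encode(digest).decode("ascii")

# normalize_element(math.pi, k=5) == b"+3.1415e+\n"; normalize_element(300) == b"+3.e+2\n"
print(unf_like_fingerprint([3.1415, 300, None, 1.0e-5]))
```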
ALGORITHMIC VARIATIONS
Character vectors. Vectors of character values are treated similarly: in step 1, each character string is converted to Unicode Normal Form C and then truncated to k characters rather than rounded to k digits (the default k for characters is 128). Step 2 is skipped. Steps 3 through 7 are the same.
Mixed vectors. UNF's can be computed on vectors of mixed type by normalizing each element as per its type, and then computing the hash on the resulting byte stream.
Unordered sets. To compute a UNF for an unordered set, we convert it to a vector by sorting the approximated elements as follows: (1) first, apply the type-specific approximation to each element in the vector; (2) second, sort the approximated elements in POSIX order; (3) third, apply type-specific normalization to each element; (4) compute the hash.
Composite objects. UNF's can be computed on composite objects that comprise a hierarchy of other objects, by computing a UNF of the UNF of each component recursively. This proceeds as follows:
1. Form a vector of mixed primitive values by iterating across each component and transforming it according to its type:
(a) If the component is a single primitive type, convert it to its approximate canonical representation as above, but do not compute a UNF.
(b) If the component is a vector or set, compute its UNF as above.
(c) If the component is a compound object, compute a UNF for it, using steps 1-3.
2. For each UNF in the resulting vector, convert it to a base64 encoded character string.
3. If there is no explicit, intrinsically defined ordering to the object components, sort the resulting values using a POSIX locale sort order.
4. Compute the UNF of the resulting vector of character strings.
For example, the UNF's for multiple variables can be combined to form a UNF for an entire data frame, and UNF's for a set of data frames can be combined to form a single UNF representing an entire research study. In this way, UNF's can be formed for any hierarchical composite object. For simplicity in the self-documenting representation, one should use a consistent version and level of precision across the individual UNF's being combined.
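Continuing the earlier sketch (and with the same caveats, including a simplified treatment of the character-vector step), a composite such as a data frame of named columns could be fingerprinted roughly as follows; it reuses unf_like_fingerprint from the previous example:

```python
import base64
import hashlib

def unf_like_composite(columns, k=7):
    """Combine per-column fingerprints into one fingerprint for the whole object.
    `columns` maps column names to numeric vectors; ordering is made explicit
    by sorting the component fingerprints before hashing."""
    component_unfs = sorted(unf_like_fingerprint(v, k) for v in columns.values())
    stream = b"".join(s.encode("utf-8") + b"\n\x00" for s in component_unfs)
    digest = hashlib.sha256(stream).digest()
    return "UNF-like:" + base64.b64encode(digest).decode("ascii")

study = {"age": [23, 45, None], "income": [50000.0, 61234.5, 39000.0]}
print(unf_like_composite(study))
```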
EXTENSIONS IN VERSION 5
In this section we introduce improvements to the current UNF algorithm, which together comprise version 5 of the algorithm. To summarize briefly, these changes affect the following steps in the algorithm:
- Allowing the hash to be truncated after computing.
- Providing a new default rounding mode, rounding to significant digits.
- Normalization forms for date, time, duration, bitstring, and logical values.
- Extensions to the printable form to compactly indicate any non-default options used.
Replacement Hash Function
Version three of the UNF used the MD5 algorithm in its digest step. This produced a relatively manageable printable hash; however, the algorithm was later found to have significant flaws [16]. Version 4 of the UNF algorithm replaced MD5 with SHA256. For automated applications this was a desirable trade-off: a great increase in security was gained by a minor decrease in performance. However, when printing citations, the length of the SHA256 hash may have a significant impact on readability and usability. Version 5 of the UNF strikes a more balanced trade-off among security, efficiency, and human usability by allowing truncation of the SHA256 hash using the method described in [17]:
- A full hash (in this case SHA256) is computed.
- To truncate to N bits, the N leftmost (most significant) bits are used; the least significant 256-N bits are discarded.
Unlike the MD5 algorithm, truncated SHA-256 is not subject to any known attacks. (The discovery of near-collisions in SHA-256 would cast doubt on the security of the truncated hash; none have so far been reported.) Finally, SHA-256 always computes a 256-bit hash internally for security, but the result can be truncated to either 196 or 128 bits for printing and storage. This means that truncated hashes may be compared to their untruncated counterparts to determine that the truncated hash represents the same object. (Note that this method of truncating SHA256 is not identical to the official SHA224 implementation, since the official SHA224 uses different initial values.) Thus a truncated SHA256 yields a substantial benefit for human usability in printed citations, and significantly improves security, at the cost of a small reduction in performance relative to MD5.

Alternative Approximation Method for Numeric Vectors
Rounding toward zero ensures that nesting holds, but can produce more rounding error than rounding toward nearest. The maximum log relative error (LRE) for the former is (digits-1), while the maximum LRE for the latter is digits. Hence, one may wish to use one more significant digit when computing UNF's than when rounding significant digits for presentation or storage. More important, rounding toward zero also does not guarantee inverse semantic similarity, since two numbers that are close numerically can yield different approximate values. In particular, numeric imprecision that occurs near whole-number boundaries can yield different approximations. Version 5, by default, uses rounding to k significant digits. This proceeds by rescaling the value (dividing by a power of 10) so that the kth digit is immediately to the left of the decimal, applying IEEE round-to-nearest rounding, and rescaling by the original scale factor. This violates nesting, but assures that two values that are numerically close will have the same approximate representation, which makes the UNF more robust to numeric imprecision. As an option, the previous rounding mode may still be used.
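A brief Python sketch of both version-5 changes (hash truncation and round-to-nearest significant digits) follows; the function names are ours and edge cases are glossed over:

```python
import hashlib
import math
from decimal import Decimal, ROUND_HALF_EVEN

def truncated_sha256(data: bytes, n_bits: int = 128) -> bytes:
    """Keep the n_bits leftmost (most significant) bits of the SHA-256 digest."""
    assert n_bits % 8 == 0 and 0 < n_bits <= 256
    return hashlib.sha256(data).digest()[: n_bits // 8]

def round_significant(x: float, k: int = 7) -> float:
    """Round to k significant digits: rescale so the kth digit sits just left of
    the decimal point, round to nearest (ties to even), then rescale back."""
    if x == 0 or not math.isfinite(x):
        return x
    d = Decimal(repr(x))
    scale = Decimal(10) ** (d.adjusted() - (k - 1))
    return float((d / scale).to_integral_value(rounding=ROUND_HALF_EVEN) * scale)

# truncated_sha256(b"data", 128) keeps the first 16 bytes of the full digest;
# round_significant(123456789, 7) == 123456800.0
```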
Normalization of Date, Time, Duration, Bitfield and Logical Values
Previous implementations of the UNF algorithm addressed vectors of numeric and character data. While dates and times can be represented in either form, additional normalization is required to ensure that semantically equivalent vectors of other types yield the same UNF. This section describes the additional normalization forms. (ASN.1 [19] defines a 'distinguished' encoding form for sets and sequences of values, which somewhat resembles the UNF normalization representation for numbers, developed independently. However, we did not find the ASN.1 normalization forms suitable for scientific data representation, for a number of specific and general reasons: ASN.1 has no way of representing missing or infinite values, nor are its normalization forms designed to be compatible with partial value representations, such as partial dates. More important, ASN.1 normalization forms are defined on the basis of format rather than semantic content (e.g., integers and reals have different normalization types but may represent the same value). Generally, ASN.1 is a complex standard most suitable for systems data exchange; it addresses neither the construction of an approximate form of an object nor the construction and representation of fingerprints.)
In version 5, boolean values are represented by converting to one of three numeric values: {0, 1, missing}. No rounding is applied. Bit fields are normalized by: (1) converting to big-endian form, (2) truncating all leading empty bits, (3) aligning to a byte boundary by re-padding with leading zero bits, and (4) base64 encoding to form a character string representation. No rounding is applied. Missing values are represented by three null bytes. Time, date and duration normalization is more complex. Time and date standards, such as ISO 8601 [18], are helpful in determining whether two elements are semantically equivalent, but are not sufficient for use in the UNF process since they fail to define a single canonical representation. (ISO 8601 permits many representations of time and date values, including variation in separators, time zone representation, fractional times in different bases, week dates and ordinal dates. Thus it is possible to represent the same moment in time in many different ways and still be compliant with the standard.) We now describe a unique character-based representation for times, calendar dates, and durations. This representation can be used in step 2 of the UNF algorithm described in the previous section, in order to form UNF's of vectors of time/date elements. This normalization method is essentially a single unambiguous representation selected from the many described in the ISO 8601 standard; it thus complies with the standard and can easily be implemented using standard libraries for manipulating ISO date/time values. To form a UNF for a data element comprising a calendar date, the date is converted to a character string in the following form: "YYYY-MM-DD". This comprises zero-padded
numbers representing year, month (in [01,12]) and day. Partial (imprecise) dates, where only the year or only the year and month are known, are formed as "YYYY" and "YYYY-MM" and are permitted. The UNF method for representing time is based on the ISO 8601 extended format: "hh:mm:ss.fffff". The quantity ".fffff" represents fractions of a second; it must contain no trailing (non-significant) zeroes and must be omitted altogether if valued at zero (thus "12:01:01.0" is not legal for UNF's). Other fractional representations (such as fractional minutes and hours) are not permitted in the UNF representation. The UNF representation permits times in one time zone only. If the time zone of the observation is known, the time value must be converted to the UTC time zone and a "Z" must be appended to the time representation: "02:01:04.003Z". Elements that comprise a combined date and time are formed by concatenating the (full) date representation, "T", and the time representation, as in: "2004-12-28T02:01:04.003Z". Partial date representations must not be used for combined date/time values. Type-specific approximation proceeds by deleting entire components of the time, date, or combined time/date. Components should be deleted in the following order: fractional seconds, seconds, minutes, hours, day, time zone indicator (if any), and month. Durations are represented by the format P[n]Y[n]M[n]DT[n]H[n]M[n]S, where each [n] represents the number of years, months, days, hours, minutes, and seconds (respectively) in the duration. Fractional values of seconds (only) are permitted, in the form "nnn.fffff". Where n=0, the "0" is required. All other leading and trailing zeroes, fractional hours and minutes, and truncated values are prohibited. Durations may be used only where the actual start time is not known; otherwise a time interval must be used.
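As an illustration (not part of the UNF specification itself), a timezone-aware Python datetime can be rendered in the combined canonical form described above as follows:

```python
from datetime import datetime, timezone

def normalize_datetime(dt: datetime) -> str:
    """Render an aware datetime as YYYY-MM-DDThh:mm:ss[.fffff]Z (UTC, no trailing zeros)."""
    dt = dt.astimezone(timezone.utc)
    s = dt.strftime("%Y-%m-%dT%H:%M:%S")
    if dt.microsecond:
        # fractional seconds without trailing (non-significant) zeros
        s += ("%.6f" % (dt.microsecond / 1_000_000))[1:].rstrip("0")
    return s + "Z"

# normalize_datetime(datetime(2004, 12, 28, 2, 1, 4, 3000, tzinfo=timezone.utc))
# returns '2004-12-28T02:01:04.003Z'
```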
Extensions to the Printed Representation
Version 5 extends the printed representation to include other ways of documenting alternate approximation levels and algorithmic options. This extension is forwards compatible and allows for a more compact printable representation when only a subset of approximation levels deviate from their defaults. These fields are required only where the approximation level or algorithm parameter used differs from the default for that UNF version. The following additional representations of approximation levels are permitted: (1) "###,###,###" (where ### is an integer value), indicating non-default levels of approximation for numeric, character, and time values; (2) any combination and ordering of 'N###', 'X###', 'T###', 'H###', 'R###', separated by commas, indicating approximation values for numeric, character, and time values, the hashing truncation, and the numeric rounding mode. Values associated with H may be in {256,196,128}, to indicate different levels of truncation, and Tiger 128 correspondingly; R may take on {1,2}, representing the truncation (round-toward-zero) and significant-digit rounding modes, respectively.
SUMMARY
The abstract UNF procedure can be thought of as a "semantic checksum". Technically, it is formed by creating a cryptographic hash of the normalized content of a digital object that has been approximated at an acceptable level of semantic fidelity. The UNF algorithm itself is adapted for statistical and scientific data. It provides a method for generating normalized representations of approximated vectors (and more complex objects formed from vectors) comprising numeric, character, time, date and duration values. UNF's are now used in statistical applications to prevent misinterpretation of data, in archiving and citation practices for verification of content, and in digital library systems to verify format migration. In these systems and standards, UNF's are typically used as a replacement for, or supplement to, file-level cryptographic hashes and simple descriptive statistics. A notable aspect of the UNF implementation is that UNF's are calculated from within the software application acting on the digital object. This use can detect misinterpretation of the object, as well as corruption. In a sense, the UNF acts as a uniquely tailored summary statistic for the entire object.
ACKNOWLEDGMENTS Thanks to Gary King and Akio Sone for comments and suggestions. This research was supported in part by an award (PA#NDP03-1) from the Library of Congress through its National Digital Information Infrastructure and Preservation Program (NDIIPP).
REFERENCES
[1] P. Cano, E. Batlle, T. Kalker, J. Haitsma, "A Review of Algorithms for Audio Fingerprinting," IEEE Workshop on Multimedia Signal Processing, IEEE Press, 2002, pp. 169-173.
[2] J. Oostveen, T. Kalker, J. Haitsma, "Feature Extraction and a Database Strategy for Video Fingerprinting," Lecture Notes in Computer Science, vol. 2314, Heidelberg: Springer Berlin, 2002.
[3] P. Meerwald and A. Uhl, "A Survey of Wavelet-Domain Watermarking Algorithms," in Proceedings of SPIE, Electronic Imaging, Security and Watermarking of Multimedia Contents III, vol. 4314, 2001, pp. 506-516.
[4] M. Altman, J. Gill, M. P. McDonald, Numerical Issues in Statistical Computing for the Social Scientist. New York: John Wiley & Sons, 2003.
[5] R Development Core Team, "R: A language and environment for statistical computing." R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. 2004.
[6] M. Altman, L. Andreev, M. Diggory, M. Krot, G. King, D. Kiskis, E. Kolster, A. Sone, S. Verba, "An Introduction to the Virtual Data Center Project and Software," in Proceedings of the First ACM+IEEE Joint Conference on Digital Libraries. ACM Press, New York, 2001.
[7] G. King, "An Introduction to the Dataverse Network as an Infrastructure for Data Sharing," Sociological Methods and Research. Forthcoming 2007.
[8] M. Altman, G. King, "A Proposed Standard for the Scholarly Citation of Quantitative Data," D-Lib 13(3/4), 2007.
[9] M. Altman, J. Crabtree, D. Donakowski, M. Maynard, "Data Preservation Alliance for the Social Sciences: A Model for Collaboration." Paper presented at DigCCurr 2007, Chapel Hill, N.C., 2007.
[10] M. Vardigan, C. Whiteman, "ICPSR meets OAIS: applying the OAIS reference model in the social science archive context," Archival Science, 7(1), 2007, pp. 73-87.
[11] K. J. Renze, J. H. Oliver, "Generalized Unstructured Decimation," IEEE Computer Graphics and Applications, vol. 16, pp. 24-32.
[12] IEEE, Programs for Digital Signal Processing, New York: IEEE Press, 1979.
[13] Unicode Consortium, The Unicode Standard, Version 4.0.0. Boston: Addison-Wesley, 2003.
[14] National Institute of Standards and Technology (NIST), "Secure Hash Algorithm," NIST FIPS 180-2, 2002.
[15] S. Josefsson, "The Base16, Base32, and Base64 Data Encodings," RFC 3548, 2003.
[16] A. Joux, "Multicollisions in Iterated Hash Functions. Application to Cascaded Constructions," in Lecture Notes in Computer Science 3152. Springer Berlin: Heidelberg, 2004, pp. 206-317.
[17] Q. Dang, Recommendation for Using Approved Hash Algorithms, NIST Special Publication 800-107. National Institute of Standards and Technology. (Draft, July 2007.)
[18] International Organization for Standardization, Data elements and interchange formats - Information interchange - Representation of dates and times. ISO Standard 8601 (3rd ed.), 2004.
[19] International Organization for Standardization, Abstract Syntax Notation One (ASN.1).
Mobile Technologies in Requirements Engineering Gunnar Kurtz, Michael Geisser, Tobias Hildenbrand, Thomas Kude Department of Information Systems I University of Mannheim, 68131 Mannheim, Germany {gkurtz, mgeisser, thildenb, tkude}@mail.uni-mannheim.de Abstract - This paper presents the current capabilities of mobile technologies to support the requirements engineering phase of software development projects. After a short insight into the state of the art of requirements engineering, different areas of application of mobile requirements engineering processes and tools - such as Arena-M or the Mobile Scenario Presenter - are being presented. The paper concludes with a critical statement regarding the benefit of current mobile tools in real-life requirements engineering.
I. INTRODUCTION

In the last couple of years mobile technologies (MT) have advanced significantly and have become reliably available to a broad public. As MT offer the possibility to use computing devices such as PDAs, smart-phones or Blackberries anywhere and anytime to access the internet or a company's intranet, various industries have been trying to use these technologies to support their business processes. Some companies have done so quite successfully and gained significant advantages from MT; e.g. [29, p. 660] found that "mobile computing is one of the most attractive IT solutions that can be used in construction to reduce the rework and money waste". Consequently, the question arises how processes in software development can be supported by MT as well.

Out of the phases of the classical waterfall model (requirements engineering, system design and specification, coding, testing, delivery, deployment and maintenance – cf. [19]), requirements engineering (RE) exhibits considerable potential for utilizing MT successfully: Recent studies, e.g. by [26] and [12], show that most systems are delivered with only 42% to 67% of the expected requirements. Reference [12] identifies a domino effect, created by "misunderstood, wrong, even slightly skewed requirements", as one of the most important reasons for the failure of software projects. As [10, p. 58] put it, "getting requirements right might be the single most important and difficult part in a software project". References [4] and [11] further illustrate the importance of proper requirements: According to their studies, the effort of fixing a yet undiscovered error is multiplied by ten when moving to the next phase of the development process. This can lead to substantial additional costs in software projects which encompass several man-months of work and extensive specifications. Consequently, valid and sound requirements contribute considerably to the economic success of software projects.

Reference [1] identifies communication problems during the RE phase as a major factor in the delay and failure of software projects. MT do not only support communication between stakeholders, but also allow analysts to define requirements directly at the user's desk in the prospective working environment. Therefore the potential benefits of using MT in the RE process are immense.

The goal of this paper is to analyze and elaborate the capabilities and opportunities of MT to support RE processes. Therefore, different mobile RE tools and processes as well as their possible areas of application are evaluated with respect to their benefits for RE. In addition, an overview of the state of the art of RE in general and its success factors, especially in distributed and mobile settings, is presented.

This paper is divided into three parts - an introduction to RE in general, the presentation of capabilities of MT, and a conclusion: To give an overview of RE in general, chapter II elucidates the fundamentals of RE. Furthermore, the most important success factors and recent developments of RE are shown. Chapter III deals with MT and their potential to support the RE process. After exploring these possibilities, the two state-of-the-art solutions Arena-M and the Mobile Scenario Presenter are presented. Finally, the use of MT in the RE process is exemplified with the help of real-life cases. Chapter IV concludes this paper with a summary and suggestions for further work.

II. REQUIREMENTS ENGINEERING

A. Requirements and the Requirements Engineering Process

According to [25, p. 5], Requirements Engineering is the process of "discovering, documenting, and maintaining a set of requirements for a computer-based system", whereas the requirements of a system or product can be defined as "the descriptions of the services provided by the system and its operational constraints" [24, p. 118] or as "something the product must do or a quality it must have" [18, p. 9]. Requirements are usually further sub-divided into functional, non-functional and domain requirements. Functional requirements describe what the system is supposed to do, e.g. its reaction to a particular input, or what types of services are to be provided. Non-functional requirements define system properties and constraints, e.g. maximum response times. Domain requirements are determined by the software's domain and may be functional as well as non-functional [24, p. 119–120].

Usually, the RE process consists of the following five sub-processes [24, p. 143]:
- Feasibility Study: "Assessing whether the system is useful to the business"
- Elicitation and Analysis: "Discovering requirements"
- Specification: "Converting these requirements into some standard form"
- Validation: "Checking that the requirements actually define the system that the customer wants"
- Management: Managing changing requirements, e.g. due to modifications in the system's hardware, software or organizational environment

As the final sub-process suggests, requirements change and have to be re-negotiated throughout the development of the project. Thus, RE is not a single straightforward process, but can be considered a spiral as well: Three activities (requirements elicitation, requirements specification and requirements validation) are organized around a spiral, whereas the amount of time and effort of each activity depends on the current stage of the overall process [24, p. 144]. Another more hands-on RE process is presented by [10]. It includes the identification of major stakeholders and domain boundaries, the examination of system artifacts and source material from current and previous systems, and the frequent feedback of stakeholders and experts. In an empirical study, [10] found that "successful teams performed on average three iterations of the RE process".

As RE defines the software system that is being produced, errors in this part of the job may - and certainly will - have severe consequences for the rest of the development process. This importance is supported by [10, p. 58], who identifies "deficient requirements as the single biggest cause of software project failure". As [4] showed, inaccurate requirements have severe consequences on the entire development process: Depending on the size of the software project and the degree of formalism, the costs of fixing an error get multiplied by a factor of 10 for each phase in the development process. In a study by [11], a specifications document had an average of 5 errors per page. For a 500-page document and a project duration of 18 months, these errors led to additional costs of 11.5m Euro. In addition, the development time could be reduced by 40% with an optimized RE process. Consequently, correct and sound requirements contribute substantially to the economic success of software projects.

B. RE in different Software Development Models

Establishing system and software requirements is the first step in the classical waterfall model originally presented by [19]. Even though new models have been introduced, the waterfall model is still quite popular: Reference [16, p. 42] reports that 35% of its study's participants still used the waterfall model. Nevertheless, more recent software development models still include some kind of RE as well. However, the degree of formalism is quite diverse. In Extreme Programming, requirements are not specified as a list of required system functions, but "the system customer is part of the development team and discusses scenarios with other team members" [24, p. 399]. At the other extreme, when developing critical systems, requirements are expressed in detail using e.g. an algebraic or a model-based approach [24, p. 222]. In conclusion, RE is an integral aspect of any software development model and, due to its impact on the subsequent phases in each model, one of the most important ones.
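To make the cost-escalation figures quoted in section II.A concrete, the following back-of-the-envelope calculation (our own illustration, not taken from [4] or [11]) shows how the factor-of-10 rule compounds across phases:

```latex
% Illustrative only: relative cost of fixing one requirements defect,
% assuming the factor-of-10 escalation per phase cited from [4].
\[
  C_{\text{phase } n} = C_{\text{RE}} \cdot 10^{\,n}
\]
% A defect costing 1 unit to fix during RE thus costs roughly 10 units
% in design, 100 in coding, 1000 in testing, and 10000 in maintenance.
```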
C. Success Factors

Reference [10, p. 65] has identified several best practices for successful RE. Concerning stakeholder interaction, the most important ones mentioned are:
- "Allocate 15 to 30 percent of total project effort to RE activities"
- "Involve customers and users throughout RE"
- "Identify and consult all likely sources of requirements"
- "Prioritize requirements"

Following these best practices, customers and users are actively involved in the RE process and help to prioritize requirements. This is a crucial aspect of the entire process, as these stakeholders know best what the software is supposed to be used for and which features are needed most. To handle communication among all stakeholders, most projects employed an internal website, where the requirements were posted and maintained [10]. Studies by [1] and [17] both identify communication problems (e.g. one-way communication channels, different types of notations, organizational barriers, traceability problems) between stakeholders as an important issue in the RE process as well. To improve communication, [17] suggests the implementation of a communication protocol that regulates messages and the information exchange between RE tasks.

D. Recent Developments

Research by [16] has shown that for requirements elicitation the currently dominant techniques are scenarios & use cases, focus groups, and informal / semiformal modeling. As mentioned above, RE teams usually use an internal website or other online tools to ease the communication among all stakeholders. According to [16], websites and support systems for distributed teams in general are becoming increasingly important. In general, due to the high number of mergers & acquisitions between companies and the consequent composition of multi-national, physically distributed development teams, the demand for group support systems for the collaborative development of software is high. Concerning the highly interactive RE process, distributed technologies such as Arena-II [21] or DisIRE [5] have emerged.

Furthermore, the globalization and offshoring of software development have had an impact on RE. Reference [2] presents four characteristics of software development projects that are outsourced to offshore locations: First, the vendor is faced with two groups of stakeholders (the client's IT group and its business community). Secondly, multiple RE tools and processes usually exist across the different locations. Thirdly, globalized projects are characterized by different working and communication cultures. Finally, the ad hoc staffing of teams leads to multiple transitions between the different locations. All of these characteristics have to be taken into account when establishing an RE process for an outsourced project that is being developed offshore [2, p. 38–39]. Another important development is the popularity of embedding software in all kinds of products, such as cell phones,
cars or television sets [27]. To get valid and stringent requirements for embedded or mobile software, [22, p. 17] suggests: "If your system behavior depends strongly on the environment, go there and explore it." This new way of discovering requirements became possible through the advancement of MT and will be covered in chapter III.

III. MOBILE TECHNOLOGIES' BENEFIT TO REQUIREMENTS ENGINEERING
A. Potential of Mobile Technologies in the Domain of Requirements Engineering

As pointed out in chapter II, RE is one of the most important phases of the software development process. However, it is one of the most difficult to manage: A lot of stakeholders (developers, business analysts, future users, etc.) have to be involved to produce sound requirements. Yet, e.g. due to time limitations, stakeholders are often not available to be involved in the RE process. Additionally, in many cases they have problems expressing their expectations because of the phenomenon of tacit knowledge. Reference [6, p. 5] emphasizes this: "Experience shows that simply asking managers what they want often works poorly. They do not (usually) know what is technically feasible, and cannot accurately describe what their workers and clients really do [...]." Consequently, the requirements suffer.

However, MT offer substantial opportunities for addressing these problems and improving the requirements elicitation process in particular: Using a mobile device, a requirements analyst can capture requirements anytime and anywhere, optimally directly at the prospective user's desk. Being directly at the user's desk, problems and challenges of the immediate work environment can be identified as well. Users no longer have to try to articulate their tacit knowledge, but the requirements analyst can directly observe the user's working habits. This leads to a profound comprehension of the intended purpose of the software, less media disruption, and therefore to better requirements [22].

B. Range of Application

It is not only the RE of business software with a limited set of stakeholders that can profit from MT. Another major area of application of MT is the elicitation of wide audience requirements. As software is embedded in mass-products (such as home-entertainment systems, mobile devices, cars, etc.) to a great extent, the need to analyze the requirements of wide audiences has emerged [27]. MT allow requirements analysts to visit end-users even in their homes and analyze the usage behavior of e.g. a television set. Another area of application is the usage of MT in outdoor environments, e.g. in the construction or utilities industries [29, 14].

C. Available Software Tools

i. RE-specific vs. general-purpose Tools

For employing MT in the RE process, two categories of software are available: One can use general-purpose tools (such as Pocket Word or mobile web logs) to capture requirements. On the other hand, one can utilize sophisticated tools that specifically support RE and are integrated with their
desktop equivalents. Usually, general-purpose tools are already shipped with most mobile devices and are conveniently available to a lot of users. In contrast, mobile RE tools are just being developed and do not have many features yet [13]. However, [13, p. 49] strongly recommends sophisticated tools, because "real value comes from integrating a PDA's capabilities into an application that supports specialized RE tools for mobile requirements discovery". Yet, mobile tools are not supposed to replace their desktop counterparts, but should complement existing environments. Integration of mobile and desktop tools is critical, as users of both systems should have access to the same workspace [23]. Two of the most popular mobile RE tools are Arena-M and the Mobile Scenario Presenter.

ii. Arena-M

Arena-M (Anytime, Anyplace Requirements Negotiation Aids - Mobile) is a mobile negotiation assistant and part of the Arena-II system. As a web application - running on mobile browsers - it provides all Arena-II features to mobile stakeholders. Arena-II itself has been developed by the Johannes Kepler University of Linz and provides support for distributed requirements negotiations [21]. It is based on EasyWinWin, "a requirements negotiation approach that has originally been designed to support face-to-face group interactions using a Group Support System (GSS)" [21, p. 87]. EasyWinWin - presented by [3] in 2000 - involves all success-critical stakeholders in the negotiation process and encourages them to share their different expectations. Originally, all stakeholders had to be physically available in a meeting room, where issues could be resolved face-to-face. To be able to integrate stakeholders more easily, Arena-II allows stakeholders to be physically distributed, connected via the internet on a desktop PC. Arena-M can be used on mobile devices and therefore allows requirements negotiations to be performed directly in the work environment of future system users [21].

iii. Mobile Scenario Presenter

The Mobile Scenario Presenter (MSP) has been developed by the Johannes Kepler University of Linz, Austria, and the City University London, UK. It is based on ART-Scene and allows "both mobile analysts and future system users to acquire requirements systematically and in situ using structured scenarios" [13, p. 47]. ART-Scene (Analyzing Requirements Trade-offs: Scenario Evaluation) is a scenario-driven approach to elicit stakeholder requirements. The MSP allows analysts to walk through different scenarios by generating normal and alternative courses from use case specifications. When walking through each scenario with the corresponding stakeholders, the analyst can add, edit or remove events and requirements. The MSP does not only support plain text descriptions of requirements, but also allows them to be augmented with drawings and audio notes [20].

D. Evaluation

Recent studies have shown that Arena-M and the MSP do support requirements analysts quite successfully: Reference [21, p. 5] shows "the potential of Arena-M and its benefits for
supporting negotiators directly in their work context". Reference [14] reports positive effects on the documentation and discovery of requirements as well: In an empirical study, analysts discovered up to almost 30 requirements per hour with the MSP, which is three times the rate of using the desktop Scenario Presenter. However, as with most emerging technologies, certain limitations have to be considered as well:

i. Technological Limitations

Although MT are a quite mature technology, [21] and [14] have reported some issues users had when utilizing their mobile devices for RE: By comparing the desktop-based Arena-II with Arena-M, users complained about awkward data input with a pen control, poor system performance (e.g. slow response times) and problems with the Wi-Fi network. Usability was further reduced by the absence of features such as error recovery, review of past activities and documentation. In contrast, [23, p. 714] states that "not all features [...] will make sense on a mobile device" and argues for focusing on simplicity rather than on a high number of features. Generally, [9] identifies synchronization mechanisms for data stored on various devices and resource limitations of mobile devices as the most urgent technological challenges. Currently, Arena-M and the MSP are both based on Microsoft's ASP.NET and optimized for Microsoft's Pocket Internet Explorer. To allow numerous stakeholders to use these tools, they should be platform-independent and support different mobile devices (e.g. smart phones, PDAs and Tablet PCs) utilizing various mobile networks (GPRS, UMTS, WLAN, etc.) [20].

ii. Communication Problems

Reference [7] shows additional challenges that have to be faced with distributed requirements negotiations in general, such as the lack of immediate feedback from other stakeholders or poorer communication through the absence of face-to-face interaction. Poorer communication might even lead to misinterpretation of the requirements by other stakeholders, as there often is no possibility to check whether they were interpreted correctly [1]. Reference [6, p. 5] strongly advises using a structured interview format, because "open interviews are also more vulnerable to interviewer bias" and therefore "vulnerable to political manipulation by participants, as many requirements engineers know from bitter experience".

E. Real-World Example

Reference [14] has employed MT to discover requirements for a London bus stop information system. The system under development is supposed to provide bus arrival times in real time. During their study, analysts used the MSP to walk through a given scenario at the bus stop and entered requirements either from observations or from acquisition dialogues with end users. The study showed that although the MSP could still be improved, it did support analysts quite well (cf. also the study by [15, p. 1188]).

F. Best Practices
In summary, [13, p. 46–52] has recently published six lessons learned for practitioners to follow when establishing stakeholder needs. From their research the following best practices have emerged:
- "You can use mobile technologies in the workplace to discover requirements that precisely reflect the future end users' needs"
- "To achieve real benefits, we need a range of bespoke mobile requirements tools"
- "Consider the needs of different users of mobile RE tools, such as analysts and future system users"
- "Usability is essential for mobile requirements tool"
- "Carefully plan your use of mobile RE tools in advance to ensure a sound technical infrastructure or to lessen your dependence on the infrastructure in the first place"
- "Capture just enough information about a requirement to enable its complete specification at a later time"

IV. CONCLUSION

A. Summary and Discussion

Various authors suggest the use of MT in the RE process and illustrate positive effects such as productivity gains and a better quality of the resulting requirements. However, one should not over-estimate the impact of mobile tools: A lot of documents in a requirements specification are diagrams (e.g. use cases), flow charts, or ER-diagrams. Unlike simple text documents, these documents can hardly be created and viewed on mobile devices; some devices with small screens might have problems visualizing the ER-diagram of a grown-up software system. Nevertheless, MT are certainly going to become an integral part of the toolsets of most requirements analysts. Besides the general advantages of mobile computing, they offer specific advancements for RE as well. A general advantage of MT is the possibility for users to access and share data anywhere in real time: Ideas may be captured as soon as they arise, and they can instantly be transmitted and discussed with other peers without possible media disruption. Besides these common benefits for easing communication among stakeholders, new products demand new and innovative RE processes as well: As many software products operate in a mobile environment, the behavior of the product is context-dependent as well. Consequently, requirements should be explored in this very same context [22]. Hence, MT are a good option to support stakeholder interviews or to gather specific data (such as usage behavior) in the field. Due to technical limitations and human communication preferences, they are not going to substitute stakeholder meetings and face-to-face discussions [1, 13]. Ultimately, MT are not going to replace existing RE tools and techniques, but they offer new possibilities to enhance and complement RE processes.

B. Further Work

Further research should address technical issues of MT and issues regarding the integration of MT into RE processes:
Technical issues such as the integration of mobile and desktop tools, the number of features a mobile device should offer, and the ideal utilization of a mobile device's scarce resources are probably among the most urgent ones [23]. Regarding the tool support for collaborative software engineering in general, [28] demands an integrated web-based development environment, as existing tools mostly cover only a single phase of the entire software development process. Besides those technical issues, research concerning the optimal workflow when using mobile tools is necessary as well. E.g., reference [14] suggests that requirements analysts work in pairs - similar to pair programming: one analyst doing the interview, whereas the other one takes care of documenting their findings. However, an optimal solution is yet to be discovered, as [14] found that analysts were not able to cope very well with multi-tasking (observing the environment and entering requirements in a PDA at the same time). Additionally, the use of MT in real-life RE processes has to be further explored empirically. On the basis of those findings, processes and tool design can evolve. Subsequently, these newly created artifacts can in turn be further evaluated by empirical methods according to the design science research approach presented by [8].

REFERENCES
[1] A. Al-Rawas and S. Easterbrook, "Communication problems in requirements engineering: a field study," Proceedings of the First Westminster Conference on Professional Awareness in Software Engineering, 1996.
[2] J. Bhat, M. Gupta, and S. Murthy, "Overcoming requirements engineering challenges: Lessons from offshore outsourcing," IEEE Software, Sept./Oct. 2006, pp. 38-44.
[3] B. Boehm and P. Gruenbacher, "Supporting collaborative requirements negotiation: The EasyWinWin approach," Proceedings International Conference on Virtual Worlds and Simulation, 2000.
[4] B. W. Boehm, Software Engineering Economics. Prentice-Hall, Englewood Cliffs, NJ, 1981.
[5] M. Geisser, A. Heinzl, T. Hildenbrand, and F. Rothlauf, "Verteiltes, internetbasiertes Requirements-Engineering," WIRTSCHAFTSINFORMATIK 49 (2007) 3, 2007, pp. 199–207.
[6] J. A. Goguen, "Formality and informality in requirements engineering," Proceedings of the 2nd International Conference on Requirements Engineering, 1996, pp. 102–108.
[7] P. Gruenbacher and P. Braunsberger, "Tool support for distributed requirements negotiation," Cooperative Methods and Tools for Distributed Software Processes, Franco Angeli, Milano, Italy, 2003.
[8] A. R. Hevner, S. T. March, J. Park, and S. Ram, "Design science in information systems research," MIS Quarterly, vol. 28, 2004, pp. 75–105.
[9] T. Hofer, G. Leonhartsberger, and M. Pichler, "Considerations and requirements for tools supporting mobile teams," Proceedings of the 22nd International Conference on Distributed Computing Systems Workshops, 2002, pp. 389–390.
[10] H. F. Hofmann and F. Lehner, "Requirements engineering as a success factor in software projects," IEEE Software, vol. 18, no. 4, July/Aug. 2001, pp. 58–66.
[11] C. Hood and R. Wiebel, Optimieren von Requirements Management & Engineering: Mit dem Hood Capability Model. Springer, Berlin; Heidelberg [u.a.], 2006.
[12] D. Jacobs, "Requirements engineering so things don't get ugly," Companion to the proceedings of the 29th International Conference on Software Engineering, 2007, pp. 159–160.
[13] N. Maiden, N. Seyff, P. Gruenbacher, O. Otojare, and K. Mitteregger, "Determining stakeholder needs in the workplace: How mobile technologies can help," IEEE Software, vol. 24, no. 2, 2007, pp. 46–52.
[14] N. Maiden, N. Seyff, P. Gruenbacher, O. Otojare, and K. Mitteregger, "Making mobile requirements engineering tools usable and useful," Proceedings of the 14th IEEE International Requirements Engineering Conference (RE'06), 2006, pp. 26–35.
[15] K. Moe, B. Dwolatzky, and R. Olst, "Designing a usable mobile application for field data collection," Proceedings of the 7th AFRICON Conference in Africa, vol. 2, 2004, pp. 1187–1192.
[16] C. J. Neill and P. A. Laplante, "Requirements engineering: The state of the practice," IEEE Software, vol. 20, no. 6, 2003, pp. 40–45.
[17] B. Palyagar and D. Richards, "A Communication Protocol for Requirements Engineering Processes," 11th International Workshop on Requirements Engineering: Foundation for Software Quality, 2005, pp. 13–14.
[18] S. Robertson and J. Robertson, Mastering the Requirements Process. Addison-Wesley, Upper Saddle River, NJ, Munich [u.a.], 2006.
[19] W. W. Royce, "Managing the development of large software systems: concepts and techniques," Proceedings of the 9th International Conference on Software Engineering, 1987, pp. 328–338.
[20] N. Seyff, "Collaborative tools for mobile requirements acquisition," ASE '04: Proceedings of the 19th IEEE International Conference on Automated Software Engineering, 2004, pp. 426–429.
[21] N. Seyff, P. Gruenbacher, C. Hoyer, and E. Kroiher, "Enhancing GSS-based requirements negotiation with distributed and mobile tools," Proceedings of the 14th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprise, 2005, pp. 87–92.
[22] N. Seyff, P. Gruenbacher, and N. Maiden, "Take your mobile device out from behind the requirements desk," IEEE Software, vol. 23, no. 4, 2006, pp. 16–18.
[23] N. Seyff, P. Gruenbacher, N. Maiden, and A. Tosar, "Requirements engineering tools go mobile," ICSE '04:
Proceedings of the 26th International Conference on Software Engineering, 2004, pp. 713–714.
[24] I. Sommerville, Software Engineering. Addison-Wesley, Harlow; Munich [u.a.], 8th edition, 2007.
[25] I. Sommerville and P. Sawyer, Requirements Engineering: A Good Practice Guide, Wiley, 2003.
[26] Standish Group, CHAOS Reports.
[27] T. Tuunanen and M. Rossi, "Engineering a method for wide audience requirements elicitation and integrating it to software development," Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004, pp. 174–183.
[28] J. Whitehead, "Collaboration in software engineering: A roadmap," International Conference on Software Engineering, 2007, pp. 214–225.
[29] W. Zou, X. Ye, W. Peng, and Z. Chen, "A brief review on application of mobile computing in construction," Proceedings of the First International Multi-Symposiums on Computer and Computational Sciences, vol. 2, 2006, pp. 657–661.
Unsupervised Color Textured Image Segmentation Using Cluster Ensembles and MRF Model

Mofakharul Islam, John Yearwood, Peter Vamplew
Center of Informatics and Applied Optimization, School of Information Technology and Mathematical Sciences, University of Ballarat, Australia

Abstract - We propose a novel, robust unsupervised color image content understanding approach that segments a color image into its constituent parts automatically. The aim of this work is to produce precise segmentation of color images using color and texture information along with neighborhood relationships among image pixels, which provides more accuracy in segmentation. Here, unsupervised means automatic discovery of classes or clusters in images rather than generating the class or cluster descriptions from training image sets. As a whole, in this particular work, the problem we want to investigate is how to implement a robust unsupervised SVFM model based color medical image segmentation tool using Cluster Ensembles and an MRF model along with wavelet transforms for increasing the content sensitivity of the segmentation model. In addition, Cluster Ensembles have been utilized to introduce a robust technique for finding the number of components in an image automatically. The experimental results reveal that the proposed tool is able to find the accurate number of objects or components in a color image and is eventually capable of producing more accurate and faithful segmentation. A statistical model based approach has been developed to estimate the maximum a posteriori (MAP) solution to identify the different objects/components in a color image. The approach utilizes a Markov Random Field model to capture the relationships among the neighboring pixels and integrates that information into the Expectation Maximization (EM) model fitting MAP algorithm. The algorithm simultaneously calculates the model parameters and segments the pixels iteratively in an interleaved manner, and converges to a solution where the model parameters and pixel labels are stabilized within a specified criterion. Finally, we have compared our results with another well-known segmentation approach.
I. INTRODUCTION
Color image segmentation has emerged as a new area of research. Color image segmentation can solve many contemporary problems in medical imaging, mining and mineral imaging, bioinformatics, and material sciences. Naturally, color image segmentation demands well defined borders of different objects in an image. So, there is a fundamental demand for accuracy. The segmented regions or components should not be further away from the true object than one or a few pixels. So, there is a need for an improved image segmentation technique that can segment different components precisely. Image data have some particular characteristics that differentiate them from other forms of data. Image data may
have corrupted values due to the usual limitations or artifacts of imaging devices. Noisy data, data sparsity, and high dimensionality of data create difficulties in image pixel clustering. As a result, image pixel clustering becomes a harder problem than clustering other forms of data. Although there are some existing algorithms for unsupervised color image segmentation, none of them has been found to be robust in determining an accurate number of components or segments.

More noise means more uncertainty. Handling uncertainty is the most significant problem in image segmentation. Conventional probability theory was the primary mathematical model that was used to deal with this sort of uncertainty. Researchers have found that probability theory alone is not quite adequate to handle uncertainty, especially where the situation demands more precise handling. As a result, there was a need to augment conventional probability models by introducing some additional concept or knowledge to make them more powerful. Over the last few decades, researchers from the different computing communities have proposed various approaches that are quite efficient in handling uncertainty. Markov Random Field (MRF) model based approaches are among them. The only difference between MRF model based approaches and other models is that MRF model based approaches consider the neighborhood influences of pixels in addition to their pixel features (color and texture), while other models consider the pixel features only. This basic difference adds an extra degree of segmentation accuracy, especially while handling noisy image data.

Natural color images are particularly noisy due to the environment in which they were produced. Therefore, it is hard to develop a robust and faithful unsupervised technique for automatic determination of the number of objects in a color image. Although there are a few existing approaches for unsupervised color image segmentation, none of them has been found robust in all situations. Initially we tried SNOB [2], [3], a Minimum Message Length (MML) based unsupervised data clustering approach, to address this problem. The Minimum Message Length (MML) principle is an information-theoretic approach to induction, hypothesis testing, model selection, and statistical inference. Although SNOB has been found more or less effective in finding the number of clusters in an image, it fails badly when the image contains more noise and other imaging artifacts like shadow and intensity inhomogeneity, which are most common in color medical images. As a result, we introduce Cluster
Ensembles to get a consolidated clustering for finding the number of components in a color medical image.

Using a finite mixture model is a popular approach to color image segmentation; as in other statistical model-based methods, computationally intensive algorithms are a common drawback. As a result, finite mixture models have received significant attention from researchers in the last decade. Finite mixture models have been further modified by Sanjay-Gopal and Herbert into Spatially Variant Finite Mixture Models (SVFMM) by introducing a process based on Markov Random Fields (MRF), which is able to capture the spatial relationship among neighboring pixels [1]. In doing so, it can efficiently handle specific situations like recognition of different objects in an image, which is usually associated with numerous artifacts such as noise and intensity inhomogeneity. So, the SVFMM has some advantageous features, especially in the case of contextually dependent physical phenomena like images, as it incorporates the outstanding features of the MRF model without additional computational burden. In this particular work, we have improved this one-dimensional SVFM model into a multidimensional SVFM model which is able to handle multiple features simultaneously.

The remainder of this paper proceeds as follows: In Section II we present the Spatially Variant Finite Mixture Model (SVFMM), followed by our improvement of the model in Section III. In Section IV, we describe our proposed color image segmentation approach. In Section V, we present previous work in this specific field. Section VI presents our contribution as a whole to the proposed approach. Experimental results demonstrating the effectiveness and accuracy of the proposed approach are discussed in Section VII, and finally in Section VIII we present our conclusion and future work.
II. SPATIALLY VARIANT FINITE MIXTURE MODEL (SVFMM)
The SVFMM is a modification of the classical mixture model, or Gaussian mixture model, with the MRF model [4]. We assume a mixture model with an unknown number of components $K$, each one having its own vector of density parameters $\theta_j$ for each feature. So, the probability of the ith pixel belonging to the jth class label is
$$\eta_{ij} = P(j \mid x^i) \qquad (1)$$

where $x^i$ is the observation at the ith pixel. Here, the model parameters must strictly satisfy the constraints given at the end of this section. The SVFMM assumes that the density function $f(x^i \mid \phi, \Theta)$ at any observation $x^i$ is given by

$$f(x^i \mid \phi, \Theta) = \sum_{j=1}^{K} p_j^i \, \psi(x^i \mid \theta_j) \qquad (2)$$

where $\psi(x^i \mid \theta_j)$ is a Gaussian distribution with parameters $\theta_j = (\mu_j, \sigma_j)$, and $K$ is the number of components in the mixture.

Maximum a posteriori (MAP) estimation introduces a prior distribution for the parameter set $\phi$ that takes spatial information into account, based on the Gibbs function. According to the Hammersley-Clifford theorem, the Gibbs distribution takes the following form:

$$P(\phi) = \frac{1}{Z} \exp(-U(\phi)), \quad \text{where } U(\phi) = \beta \sum_i V_{N_i}(\phi) \qquad (3)$$

Here $\phi$ is a vector of features, and the function $-U(\phi)$ is an energy term. $\beta$ is called the regularization parameter, the normalizing constant $Z$ is called a partition function, and $V_{N_i}(\phi)$ denotes the clique potential of the label configuration $p^m$ within the neighborhood $N_i$ of the ith pixel, which can be calculated as

$$V_{N_i}(\phi) = \sum_{m \in N_i} g(u^{i,m}) \qquad (4)$$

where $u^{i,m}$ denotes the distance between the two label vectors $p^i$ and $p^m$. The function $g(u)$ must be gradually increasing and nonnegative. A posterior log density function can be derived from (3) as

$$P(\phi, \Theta \mid X) = \sum_{i=1}^{N} \log f(x^i \mid \phi, \Theta) + \log P(\phi) \qquad (5)$$

The Expectation-Maximization (EM) algorithm requires that the conditional expectation values $z_j^i$ of the hidden variables be computed at the Expectation step for the MAP estimation of the parameters $\{p_j^i\}$ and $\{\theta_j\}$:

$$z_j^{i(t)} = \frac{p_j^{i(t)} \, \psi(x^i \mid \theta_j^{(t)})}{\sum_{l=1}^{K} p_l^{i(t)} \, \psi(x^i \mid \theta_l^{(t)})} \qquad (6)$$

Then, maximization of the following log-likelihood corresponding to the complete data set is performed in the Maximization step:

$$Q_{MAP}(\phi, \Theta \mid \phi^{(t)}, \Theta^{(t)}) \qquad (7)$$

where $t$ is the iteration step.

For each parameter, $Q_{MAP}$ can be maximized independently, which yields the update equations for the component density parameters $\mu_j^{(t+1)}$ and $[\sigma_j^2]^{(t+1)}$. However, maximization of the function $Q_{MAP}$ with respect to the label parameters $p_j^i$ does not provide closed-form update equations. In addition, the maximization procedure must also take into account the constraints $0 \le p_j^i \le 1$ and $\sum_{j=1}^{K} p_j^i = 1$. However, this difficulty has been handled successfully by Sanjay-Gopal & Herbert [1] and subsequently by Blekas et al. [4].
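As an illustration of how equation (6) and the subsequent Maximization step translate into computation, the sketch below implements the E-step of (6) together with the standard Gaussian-mixture updates for $\mu_j$ and $\sigma_j^2$. It is a minimal, hypothetical sketch rather than the authors' implementation: it treats each pixel feature as a scalar, assumes the spatially varying label probabilities $p_j^i$ are given, and omits the MRF-regularized label update, which, as noted above, has no closed form.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density psi(x | theta_j) with theta_j = (mu_j, sigma_j)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def e_step(x, p, mu, sigma):
    """Equation (6): responsibilities z[i, j] for N pixels and K components.

    x:     (N,)   pixel features
    p:     (N, K) spatially varying label probabilities p_j^i
    mu:    (K,)   component means
    sigma: (K,)   component standard deviations
    """
    lik = gaussian_pdf(x[:, None], mu[None, :], sigma[None, :])   # (N, K)
    num = p * lik
    return num / num.sum(axis=1, keepdims=True)

def m_step(x, z, eps=1e-8):
    """Standard weighted-mean / weighted-variance updates for the component densities.

    These maximize Q_MAP with respect to theta_j; the label parameters p_j^i
    would additionally require the MRF prior and are not updated here.
    """
    weights = z.sum(axis=0) + eps                                  # (K,)
    mu = (z * x[:, None]).sum(axis=0) / weights
    var = (z * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / weights
    return mu, np.sqrt(var + eps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.concatenate([rng.normal(0.2, 0.05, 500), rng.normal(0.7, 0.05, 500)])
    K = 2
    p = np.full((x.size, K), 1.0 / K)          # uniform label priors to start with
    mu, sigma = np.array([0.1, 0.9]), np.array([0.2, 0.2])
    for _ in range(20):                        # interleaved E- and M-steps
        z = e_step(x, p, mu, sigma)
        mu, sigma = m_step(x, z)
    print(mu, sigma)
```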
III. IMPROVEMENT OF THE SVFMM MODEL
From an image segmentation point of view, approaches based on the SVFMM as proposed in other works [1], [4] have a common limitation: they can handle a single feature only. The reason is that the SVFMM is a one-dimensional model. As a result, its application is limited to gray-level images only. In our proposed approach, we have improved and modified the model to make it efficient for a larger set of features, which enables it to handle multiple feature sets. Therefore, our proposed multidimensional SVFMM is capable of handling multiple cues, including color and texture features, simultaneously.
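One straightforward way to realize this multidimensional extension, assuming for the sake of illustration (the paper does not specify the covariance structure) that the D features are modeled with a diagonal-covariance Gaussian, is to replace the scalar density $\psi(x^i \mid \theta_j)$ with a product over the feature dimensions. With this density, the E-step of equation (6) is unchanged; only the feature vectors become D-dimensional (e.g. three color plus three texture features).

```python
import numpy as np

def psi_multidim(x, mu, sigma):
    """Diagonal-covariance Gaussian psi(x | theta_j) for D-dimensional features.

    x:     (N, D) per-pixel feature vectors (e.g. L, u, v plus wavelet texture energies)
    mu:    (K, D) per-component means
    sigma: (K, D) per-component standard deviations
    Returns an (N, K) array of densities, computed as the product of the
    per-dimension univariate Gaussians (i.e. features treated as independent).
    """
    diff = x[:, None, :] - mu[None, :, :]                      # (N, K, D)
    log_density = (-0.5 * (diff / sigma[None, :, :]) ** 2
                   - np.log(sigma[None, :, :])
                   - 0.5 * np.log(2.0 * np.pi))
    return np.exp(log_density.sum(axis=2))                     # (N, K)
```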
IV. PROPOSED COLOR IMAGE SEGMENTATION APPROACH
Accuracy in segmentation has tremendous potential in different imaging applications where subtle information related to color and texture is required to analyze the image accurately. In this particular work, we have used CIE-Luv and Haar wavelet transforms as color and texture descriptors respectively. Using the combined effect of the CIE-Luv color model and Haar transforms, our proposed approach is able to segment color images precisely in a meaningful manner. As a whole, our proposed segmentation approach combines the SVFM model, the wavelet transform and the CIE-Luv color space along with a robust technique for finding the number of components in an image using cluster ensembles, to implement a robust unsupervised color image segmentation approach that produces accurate and faithful segmentation results on a wide variety of natural color images.

The most significant aspect in image segmentation is feature extraction. The segmentation quality greatly depends on the choice of feature (intensity, color, texture, and coordinate) descriptor. Choice of a suitable color space is one of the main aspects of color feature extraction. Most of the color spaces are three dimensional, where different dimensions represent the different color components. The most common color space in computer graphics is RGB. Unfortunately, this 3D color space does not correspond to equal perception of color dissimilarity. As a result, alternative color spaces are generated by transformation of the RGB color space. Color spaces like HSV, CIE-Lab and
CIE-Luv can be generated by non-linear transformation of the RGB space. We prefer CIE-Luv as it represents the three characteristics, i.e. hue, lightness and saturation, that characterize perceptually uniform color efficiently.

The texture descriptor plays a significant role in the analysis of contextually dependent physical phenomena like images, in addition to the color descriptor. This project aims to investigate and implement a more context-sensitive image segmentation technique that can produce reliable segmentation of color images on the basis of subtle color and texture variation. In order to do that, we have introduced wavelet transforms for the first time as a texture descriptor under a Markovian framework for color image segmentation. Image analysis in medical imaging demands a more sensitive texture descriptor that can provide us an efficient way to capture the subtle variation in texture. The wavelet transform has the ability to capture subtle information about the texture, while other texture descriptors are not ideal in those specific situations.

Recently, approaches using MRF models and Bayesian methods have provided answers to various contemporary problems in image segmentation. The application of a clustering method to image segmentation has the particular characteristic that spatial information should be taken into account. Here, in addition to intensity values of pixels, the pixel location must also be used to determine the cluster to which each pixel is assigned. Conceptually, in most cases it is desired that the same cluster label be assigned to spatially adjacent pixels. In implementing these ideas, the Bayesian framework provides a natural approach. Pixel intensity information is captured by the likelihood term in Bayesian networks, while a prior biasing term captures the spatial location information with the help of the Markov Random Field (MRF) model.

A. Color Feature Extraction

Color is an important dimension of human visual perception that allows discrimination and recognition of visual information. Extracting and matching color features is relatively easy. In addition to that, color features have been found to be effective in color image segmentation [5], [6], [7]. In this particular work, we prefer CIE-Luv as it represents the three characteristics, i.e. hue, lightness and saturation, that characterize perceptually uniform color efficiently.

B. Texture Feature Extraction

Texture characterizes local variations of image color or intensity. There is no formal or unique definition of texture, though texture based methods are commonly used in computer vision and graphics. Each texture analysis method defines texture according to its own need. Texture can be defined as a local statistical pattern of texture primitives in the observer's domain of interest. Texture is often described as consisting of primitives that are arranged according to placement rules, which gives rise to structural methods of texture analysis that explicitly attempt to recover the primitives and the placement rules. On the other hand,
statistical methods consider texture as a random phenomenon with identifiable local statistics. The neighborhood property is a common similarity among all the texture descriptions. So, we can define texture as an image feature which is characterized by the gray value or color pattern in a neighborhood surrounding the pixel.

We use the wavelet transform to extract texture features. Using wavelet transforms is a recent trend in image processing. Although the wavelet transform has made significant contributions in several areas of image processing such as image enhancement, image compression, and image registration, its use in image segmentation is very limited. Generally, wavelet transforms perform better than other texture descriptors like Gabor and MRSAR filters [10], [16]. Gabor filters are found efficient in capturing strongly ordered or coarser texture information, while MRSAR filters work well in capturing weakly ordered or finer texture details. Wavelet transform computation involves recursive filtering and sub-sampling, and at each level it decomposes a 2D signal into four sub-bands, which are often referred to as LL, LH, HL, and HH (L=Low, H=High) according to their frequency characteristics.

C. Finding The Number of Components

We have tried several data clustering approaches to identify the number of components automatically, but most of them have been found inappropriate for handling image data properly. A few of them were found incapable of handling data with corrupted values, as many pixels may not have true values for some attributes, while others did not perform well in handling discrete data like image data. SNOB [2], [3], an unsupervised data clustering approach, is free from these difficulties and has been found more effective in handling noisy color image data that have a diverse variety of faults and imperfections.

D. Cluster Ensembles

In general, SNOB has been found more or less effective in identifying the number of components in an image, but sometimes it produces spurious results due to its limitations in handling some imaging artifacts like noise, shadow, and intensity inhomogeneity. So, there is a pressing need to devise a robust unsupervised technique to find the number of components in an image automatically. An emerging approach like Ensembles of Clusterings can play a significant role in augmenting the technique for finding a faithful and authenticated number of components in an image. Ensembles of clusterings are basically a combination of multiple clusterings to get a consolidated clustering without accessing the original data and algorithms. Here, only symbolic cluster labels are mutually shared. Multiple clusterings can be produced either by using a single clustering algorithm and varying the different parameters, or by using different clustering algorithms on the same data set and parameters [11], [12]. We use SNOB as a starting point for getting multiple clusterings, based on which Ensembles of Clusterings produce a consolidated clustering. The procedure we have followed for determining the number of components
is as follows: produce multiple clusterings for different parameter values and select the stable solution found in the plot of the number of clusters. The convergence criterion, or the number of iterations of the MRF model, is the only parameter that we have varied to get multiple clusterings.

E. Segmentation of Image Using Our Improved SVFMM Model

After extraction of the six color and texture features, we put all the features into an MRF model to incorporate the neighborhood relationships among pixels in the context of these six features. The output of the MRF model is fed into SNOB as input to determine the number of classes in the input image on the basis of these color and texture features. Now, the six original features, along with the number of classes obtained from Ensembles of Clusterings, are the input of our segmentation model, where clustering of pixels is done on the basis of homogeneity in color and texture properties. As a result, regions are obtained where there is a discontinuity in either color or texture. After we put these inputs into the segmentation model, the K-means algorithm determines the centers of the clusters for each component of the image. Then the Expectation Maximization (EM) algorithm is used for estimating the model parameters. In fact, model parameter estimation and segmentation of pixels run simultaneously within MAP estimation under a Markovian framework. Finally, MAP classifies the pixels of the input images into different pixel classes on the basis of color and texture features. Here, the function QMAP plays a crucial role in determining the actual membership of a class. In fact, the color and texture features of a pixel directly influence the maximization of the QMAP function. That means every individual feature has some influence on the maximization process of the QMAP function. Some features have greater influence while others have minimal effect. If the influences of some particular features are greater than others in the maximization process of the QMAP function for a particular pixel, then the clustering of that pixel is greatly influenced by those particular features.
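To make the flow of Section IV concrete, the following sketch outlines the feature-extraction and model-fitting pipeline described above. It is a simplified, hypothetical illustration rather than the authors' code: it assumes scikit-image for the CIE-Luv conversion, PyWavelets for the Haar sub-bands, and scikit-learn's K-means for initialization, and it stands in for the SNOB/cluster-ensemble step with a fixed number of components K. The 65%/35% color-versus-texture weighting follows the split mentioned in Section VI, and the MRF smoothing of the label probabilities is omitted for brevity.

```python
import numpy as np
import pywt
from skimage import io, color
from sklearn.cluster import KMeans

def extract_features(rgb):
    """Per-pixel color + texture features: L, u, v plus Haar sub-band responses."""
    luv = color.rgb2luv(rgb)                                   # (H, W, 3)
    L = luv[..., 0]
    cA, (cH, cV, cD) = pywt.dwt2(L, "haar")                    # one-level Haar DWT: LL, LH, HL, HH
    tex = [np.kron(np.abs(b), np.ones((2, 2)))[: L.shape[0], : L.shape[1]]
           for b in (cH, cV, cD)]                              # upsample sub-bands back to (H, W)
    feats = np.dstack([luv, *tex]).reshape(-1, 6)
    feats = (feats - feats.mean(0)) / (feats.std(0) + 1e-8)    # normalize each feature
    feats[:, :3] *= 0.65                                       # color weight
    feats[:, 3:] *= 0.35                                       # texture weight
    return feats

def e_step_multidim(X, p, mu, sigma):
    """Responsibilities with per-pixel label priors and diagonal-covariance Gaussians."""
    diff = X[:, None, :] - mu[None, :, :]
    logpsi = (-0.5 * (diff / sigma[None]) ** 2 - np.log(sigma[None])).sum(2)
    num = p * np.exp(logpsi - logpsi.max(1, keepdims=True))
    return num / num.sum(1, keepdims=True)

def segment(rgb, K=4, iters=20):
    """Initialize with K-means, then refine means/variances with EM-style updates."""
    X = extract_features(rgb)
    km = KMeans(n_clusters=K, n_init=10).fit(X)
    mu = km.cluster_centers_
    sigma = np.ones_like(mu)
    p = np.full((X.shape[0], K), 1.0 / K)                      # label probabilities p_j^i
    for _ in range(iters):
        z = e_step_multidim(X, p, mu, sigma)
        w = z.sum(0) + 1e-8
        mu = z.T @ X / w[:, None]
        sigma = np.sqrt((z[..., None] * (X[:, None, :] - mu) ** 2).sum(0) / w[:, None] + 1e-8)
    return z.argmax(1).reshape(rgb.shape[:2])                  # per-pixel label map

# Example usage (hypothetical file name):
# labels = segment(io.imread("example.png")[..., :3], K=4)
```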
V. RELATED PREVIOUS WORK
Kato & Pong proposed a segmentation technique for color textured images based on Markov Random Fields using color and texture features [6]. The proposed approach is supervised in nature. As such, no model parameter estimation technique has been used, and the number of components was selected manually. In 2003, Kato, Pong, and Song proposed another color and texture feature based algorithm using a multi-layer MRF model for unsupervised segmentation that automatically estimates the number of components in the Gaussian mixture and the associated model parameters by introducing a mathematical term to the energy function of each layer [7]. The four-page paper does not provide enough description of how it estimates the number of components. Further, Kato & Pong proposed another similar algorithm where they used
color and texture features. They utilized the EM algorithm for estimation of the model parameters, but they set the number of components (pixel classes) manually [5]. Panjwani & Healy proposed another unsupervised segmentation approach, where Agglomerative Hierarchical Clustering has been utilized for pixel segmentation and a maximum pseudolikelihood scheme for model parameter estimation [9]. No technique has been applied for determination of the number of components. Ray & Turi [13] proposed an unsupervised color image segmentation approach using K-means where they employed inter-class and intra-class distance measures for pixel classification. This is a color feature based approach where they did not consider texture features and neighborhood relationships among pixels. Yiming, Xiangyu and Luk [14] suggested another unsupervised color image segmentation approach based on a finite mixture model, where they employed the Minimum Message Length (MML) principle for finding the number of components in images and Maximum Likelihood Estimation (MLE) for pixel classification. This is also a color feature based approach where neither texture features nor neighborhood relationships have been considered. Deng & Manjunath [15] proposed another unsupervised color image segmentation approach that is mainly focused on color image segmentation using both edge- and region-based approaches. This approach is commonly known as the JSEG approach.
VI. WHAT'S NEW IN OUR PROPOSED APPROACH
Our approach is different from previous work in a few major ways. Approaches based on the SVFMM as proposed in other works [1], [4] have been applied on gray-level images only, whereas we have developed the model into a multidimensional model to make it efficient for color and textured images as well. The algorithms proposed in other previous work [5], [6], [7] are based exclusively on the MRF model and use Gabor and MRSAR texture features in addition to the color features. There, the authors did not use any finite mixture model in their approach; rather, they directly applied the MRF model in their algorithms, which makes their algorithms computationally more expensive. Further, these existing approaches are greatly influenced by similarity in color. In natural color images, similarity in color alone does not really mean the same object. Many objects of similar color may lie in front of a background color, and these can only be characterized through variation of their textures. We have used the Haar wavelet transform for texture feature extraction, as it has the ability to capture subtle texture information with better computational efficiency. Further, Haar transforms have yet to be employed as a texture descriptor in an SVFMM based segmentation scheme that uses an MRF. Further, in order to maintain sharp boundaries, we have assigned 65% weight to color features and 35% weight to texture features in the same layer in MAP classification, which adds some extra advantages in terms of computing efficiency. In addition, we introduce a robust technique for automatic determination of the number of pixel classes in color images based on Cluster Ensembles, which is new of its kind and can
be considered a novel approach in the context of color images. We have seen few unsupervised color image segmentation approaches. Among them, Kato, Pong, and Song [7] introduced an unsupervised technique for determining the number of pixel classes. So far, the authors have applied this approach to only 3 images, of which 2 are synthetic color images. The only natural color image they used is a very simple image that has only one object on a background. Further, we have not seen any continuation work so far on this particular approach. So, we cannot consider this approach a general approach for a wide variety of color images. The approach of Yiming, Xiangyu and Luk [14] is good for small applications and simple images. Deng & Manjunath's [15] JSEG approach has so far been found to be the most effective and robust among the existing unsupervised approaches. As a result, it has gained popularity among the image segmentation community. Comparisons of experimental results reveal that our proposed approach is able to outperform the well-known JSEG approach.
VII. DISCUSSION ON EXPERIMENTAL RESULTS
In our experiment, we have sourced our images from the Berkeley Segmentation dataset [17]. We have compared our proposed approach with another existing powerful color image segmentation approach, JSEG [15], which is known as the most robust approach so far. In Fig. 1, the segmentation results with JSEG show over-segmentation in the case of the top image. In the middle image, it failed to capture the objects accurately in some regions. In the bottom image, the two horses are merged into one object.
Fig. 1. Segmentation Results with JSEG approach
In Fig. 2, the experimental results reveal that our proposed approach captures the finer details of the image in terms of color and texture and thus is capable of segmenting the images more accurately than the JSEG approach. It is clearly evident from the experiments that our proposed approach is able to segment an image into its constituent objects in a meaningful manner.
[6]
[7] [8] [9] [10]
[11] [12] [13]
[14]
[15] Fig. 2. Segmentation Results with our proposed approach VIII.
CONCLUSION
We have proposed a new approach for unsupervised segmentation technique for color image that can successfully segment color images with each and every details present in color and texture. We have applied our approach on a wide variety of natural images and found promising results but we could not present these results here due to space constraint. Future work will focus on comparison of this approach with other unsupervised approaches based on SVFMM and MRF model in terms of computing efficiency and segmentation quality. Further, effectiveness of Haar wavelet transforms could be compared with Daubechies wavelet transforms and othertexture descriptors like Gabor and MRSAR as well. REFERENCES [1] [2] [3] [4] [5]
[1] S. Sanjay-Gopal and T.J. Herbert, "Bayesian Pixel Classification Using Spatially Variant Finite Mixtures and the Generalized EM Algorithm", IEEE Trans. on Image Processing, Vol. 7(7), July 1998, pp. 207-216.
[2] C.S. Wallace and D.M. Boulton, "An Information measure for classification", Computer Journal, Vol. 11(2), pp. 185-194.
[3] C.S. Wallace and D.L. Dow, "MML clustering of multi-state, poisson, von mises circular and gaussian distribution", Statistics and Computing, Vol. 10(1), Jan. 2000, pp. 73-83.
[4] K. Blekas, A. Likas, N.P. Galatsanos, and I.E. Lagaris, "A Spatially-Constrained Mixture Model for Image Segmentation", IEEE Trans. on Neural Networks, Vol. 16(2), pp. 494-498.
[5] Z. Kato and T.C. Pong, "A Markov Random Field Image Segmentation Model for Color Textured Images", Image and Vision Computing, Vol. 24(10), pp. 1103-1114.
[6] Z. Kato and T.C. Pong, "A Markov Random Field Image Segmentation Model for Combined Color and Textured Features", in W. Skarbek (Ed.), Proceedings of ICCAIP, Vol. 2124 of LNCS, Springer, Warsaw, pp. 547-554.
[7] Z. Kato, T.C. Pong, and G. Song, "Unsupervised segmentation of color textured images using a multi-layer MRF model", Proceedings of ICIP, Vol. 1, Spain, 2003, pp. 961-964.
[8] K.V. Mardia, "Markov Models and Bayesian Methods in image analysis", Journal of Applied Statistics, Special Issue: Statistical Methods in Image Analysis, Vol. 16(2), pp. 125-130.
[9] D.K. Panjwani and G. Healy, "Markov Random Field Models for Unsupervised Segmentation of Textured Color Images", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 17(10).
[10] G.R. Shivani, P. Manika, and D. Shukhendu, "Unsupervised Segmentation of Texture Images Using a Combination of Gabor and Wavelet Features", ICVGIP 2004, Proceedings of the Fourth Indian Conference on Computer Vision, Graphics & Image Processing, Kolkata, India, December 16-18, 2004, pp. 370-375.
[11] A.L.N. Fred and A.K. Jain, "Data Clustering Using Evidence Accumulation", in Proceedings of the 16th Intl. Conf. on Pattern Recognition (ICPR'02).
[12] A. Strehl and J. Ghosh, "Cluster Ensembles - a knowledge reuse framework for combining multiple partitionings", Journal of Machine Learning Research, Vol. 2 (2002), pp. 583-617.
[13] R. Siddheswar and R.H. Turi, "Determination of Number of Clusters in k-means Clustering and Application in Color Image Segmentation", in Proceedings of the 4th Intl. Conf. on Advances in Pattern Recognition and Digital Techniques (ICAPRDT'99), Calcutta, India, 1999, pp. 137-143.
[14] Wu Yiming, Yang Xiangyu, and Chan Kap Luk, "Unsupervised Color Image Segmentation based on Gaussian Mixture Model", in Proceedings of the 2003 Joint Conf. of the 4th Intl. Conf. on Information, Communications and Signal Processing, Vol. 1, 15-18 Dec. 2003, pp. 541-544.
[15] Y. Deng and B.S. Manjunath, "Unsupervised Segmentation of Color-Texture Regions in Images and Videos", IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 23(8), pp. 800-810.
[16] Randen and J.H. Husoy, "Filtering for Texture Classification: A Comparative Study", IEEE Transactions on Pattern Analysis and Machine Intelligence, April 1999, pp. 291-310.
[17] The Berkeley Segmentation Dataset, http://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/BSDS300/html
An Efficient Storage and Retrieval Technique for Documents Using Symantec Document Segmentation (SDS) Approach

Mohammad A. ALGhalayini (1) and ELQasem ALNemah (2)
(1) Director, Computer and Information Unit, Vice Rectorate for Branches, King Saud University, Riyadh, Saudi Arabia
(2) College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Abstract - Today, many institutions and organizations are facing serious problems due to the tremendously increasing size of documents, and this is further aggravating the storage and retrieval problems due to the continuously growing space and efficiency requirements. The problem becomes more complex with time and with the increase in the size and number of documents in an organization. Therefore, there is a growing demand to address this problem, and this demand and challenge can be met by developing more efficient storage and retrieval techniques for electronic documents. Various techniques have been developed and reported in the literature by different investigators. These techniques attempt to solve the problem to some extent, but most of the existing techniques still face the efficiency problem when the number and size of documents increase rapidly. The efficiency of the existing techniques is further affected when documents in a system are reorganized and then stored again. To handle these problems, we need special and efficient storage and retrieval techniques for this type of information retrieval (IR) system. In this paper, we present an efficient storage and retrieval technique for electronic documents. The proposed technique uses the Symantec Document Segmentation (SDS) approach to make the technique more efficient than the existing techniques. The use of this approach minimizes the storage size of the documents and, as a result, makes retrieval efficient.

Introduction
Accomplishing a paperless office has been a major concern for many institutions worldwide. To approach this, we have to control and understand the flow of all types of documents in all offices in an institution. Our main goal is to archive all flowing documents (and their related information) in an imaging database so that we can organize our rapidly piling documents and retrieve their images (and their related information) faster and more easily and, hence, more effectively and efficiently.
In order to store and retrieve the images (and their related information) of the flowing documents in such an institution, not only do we need to dedicate the technology or computers to do this, but we also have to consider several issues such as the networking capabilities, the available PCs, the storage media, the number of concurrent users, the image scanner specifications, and practical and efficient methods of document image storage (in image size and type). In this research, we considered the case of King Saud University as a good model to adopt for document storage and retrieval, since the amount of flowing documents is considerably high and the need to organize, store and retrieve these flowing documents is fairly high. In addition, this KSU model could be adopted in other similar governmental and private institutions.

Analyzing KSU A4 Documents
KSU official paper has a unique style, which is as follows: each office or department name appears on the upper right side of the A4 page, the KSU logo appears in the middle of the A4 page, and the number, date, and attachments of the document appear on the upper left side of the A4 page. In addition, the lower part is dedicated to the postal mail address. Figure (1) below shows an average sample of a Full A4 letter(1) in KSU. In order to store an image of that letter, and similar document images, in a dedicated database system, or at least save these images on a local storage medium, we want to figure out the proper image type to use and the proper image size to store it with, keeping in mind that we should preserve an acceptable viewing quality at all times. This was presented by applying the (POSSDI) Process(2), which was the subject of the first paper related to this research.
1. Full A4 letter means that the page has textual contents.
2. The Process Of Optimizing the Selection Of the Scanned Document Images.
Our next step in minimizing the storage size of the document image, for faster image storage and retrieval, is to show how to minimize the size of the stored image by semantically segmenting the Full A4 document page into two main parts: the outside part, which contains the three upper segments, the side marginal segment, and the lower segment, and the inside part (the Full A4 Body), which contains the contents of the document under study. Since all we need is actually an image of the Full Body (the inner framed area in Figure (2) below), there is actually no need to store the image of the outside upper and lower parts appearing on the official KSU paper, since this information is stored as textual data in the proper database fields.
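As an illustration of this cropping idea, the following minimal sketch cuts a Full Body region out of a scanned A4 page image. The 17 cm x 22.5 cm body size follows the paper, while the use of the Pillow library, the 5 cm top offset, the horizontal centring, and the function and file names are assumptions made here for the example, not the authors' implementation.

```python
from PIL import Image  # Pillow; any imaging library with a crop operation would do

CM_PER_INCH = 2.54

def cm_to_px(cm: float, dpi: int) -> int:
    """Convert a length in centimetres to pixels at the given scanning resolution."""
    return int(round(cm / CM_PER_INCH * dpi))

def crop_full_body(scan_path: str, out_path: str, dpi: int,
                   body_w_cm: float = 17.0, body_h_cm: float = 22.5,
                   top_offset_cm: float = 5.0) -> None:
    """Cut the Full Body segment out of a scanned A4 page and save it.

    The 17 x 22.5 cm body size follows the paper; the 5 cm top offset and the
    horizontal centring are illustrative guesses about the page layout.
    """
    page = Image.open(scan_path)
    page_w, _ = page.size
    body_w, body_h = cm_to_px(body_w_cm, dpi), cm_to_px(body_h_cm, dpi)
    left = (page_w - body_w) // 2              # centre the body horizontally
    top = cm_to_px(top_offset_cm, dpi)         # skip the upper header segments
    page.crop((left, top, left + body_w, top + body_h)).save(out_path)

# Example (hypothetical file names): crop_full_body("ksu_letter_300dpi.tif", "ksu_letter_body.tif", dpi=300)
```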
Figure (2): Segments 1 through 7 of the KSU Full A4 page
Figure (1)

Semantically Segmenting The A4 Document
As we can see in Figure (2), the Full A4 sample document we are examining in KSU may be semantically segmented into 7 different sections or segments. These segments could be looked at semantically as follows:
Segment 1 represents the area of the document where the issuing date should be written, in addition to a description of the accompanying attachments (e.g., Manual, Report, Photos, Schedule, etc.). This segment usually contains information which is stored as textual data in the corresponding database table field(s). Therefore, we can really exclude this segment from our scanned image.
Segment 2 represents the area of the document where the university (or the institution) logo is displayed. This segment can therefore be excluded as well from our scanned image, since the user who is interested in reviewing the contents of the document is usually not interested in reviewing the logo of the university each time he/she retrieves the document for reviewing.
Segment 3 represents the area of the document where the ministry name is displayed, in addition to the university (or the institution) name, as well as the name of the college or the administrative department that issued the document. This segment can also be excluded from our scanned image, since the user who is interested in reviewing the contents of the document is usually not interested in reviewing the names of the ministry and the university each time he/she retrieves the document for reviewing, and since the name of the issuing college or administrative department is usually stored as textual data in the corresponding database table field(s).
Segment 4 represents the area of the document which is usually empty and is considered as the left and right margins. Usually this area is about 3 cm from the right side and 3 cm from the left side, and we can exclude at least two thirds of it, keeping only 1 cm of margin on both sides for clarity of the included text body (the Full Body).
Segment 5 represents the area of the document where the textual body is located. This area usually contains the important contents
to be reviewed; it is of such great importance that it makes various institutions consider archiving their document piles for later accessing and reviewing. Hence, we may not exclude this area, of course, and we call it the Full A4 Body or, in short, the Full Body. In our study, we assumed that this area is 17 cm in width x 22.5 cm in height only(3), and we scanned the body of all the previously scanned A4 images for storage size comparison purposes, which we will focus on in the next section [26].
Segment 6 represents a sub-area of the Full Body. Initially, it was considered for exclusion from the scanned image; however, we reviewed a large group of the flowing documents and found that, in most cases, this area contains handwritten content(4). Therefore, we really cannot exclude this segment from our scanned image, and it is preferred to be a part of the Full Body area.
Segment 7 represents the area of the document where the issuing college or administrative department postal address is displayed. This segment can therefore be excluded from our scanned image, since the user who is interested in reviewing the contents of the document is usually not interested in reviewing the postal address, which can be stored as textual data in the corresponding database table field(s).

Analyzing The Full Body Of The Scanned A4 Sample Page
We examined the storage size of the Full A4 page for all 6 different outputs (256 Gray Scale, Black and White, Millions of Colors, 256 Colors System Palette, 256 Colors Web Palette, and 256 Colors 8-bit), but we will only present the uncolored document images in this paper since the colored document images follow the same idea. Then we applied the proposed (POSSDI) Process to select the best image types to use among all 21 different image types(5) and for 10 different scanning resolutions(6). In the following sections we will analyze the Full Body segment of the Full A4 scanned page by comparing the storage sizes of the Full A4 page and the Full Body segment, and show the percentage of storage savings in size for both uncolored
output types (The 256 Gray Scale and The Black and White) as well.

1 - The (256 Gray Scale) Scanning Output Type
First, it may be important, for clarity reasons, to follow the procedure below, which we used for examining both uncolored scanning output results.
I - We recall the Full A4 page storage sizes shown in Figure (3).
II - Then we show the Full Body page segment storage sizes as well, displayed below in Figure (4).
III - We generate a figure showing the differences in storage sizes between both figures. In this case each image size in Figure (4) is subtracted from each corresponding image size in Figure (3). The resulting sizes for all 21 image types and 10 scanning resolutions are shown in Figure (5).
IV - Even though Figure (5) shows the differences in storage sizes between the Full A4 scanned pages and the Full Body scanned segments, it is not necessary in our study to compare each image storage size appearing in Figure (3) with its corresponding image storage size appearing in Figure (4). Rather, we will consider the average storage sizes for each scanning resolution and do our comparison instead. A comparison between average storage sizes for each scanning resolution for the (256 Gray Scale) scanning output is shown in Table (1).
V - At this point we are ready to calculate the percentage value (Pavg) of the ratio between both average sizes of the Full A4 scanned page and the Full Body scanned page segment as follows:

Percentage Value (Pavg) = Average size of the Full Body Scanned Page Segment / Average size of the Full A4 Scanned Page    (1)
VI - Finally, we can calculate the percentage value (Psav) of the saving we gained by scanning the Full Body page segment as follows:

Percentage Value (Psav) = 100% - Percentage Value (Pavg)

The results of both (Pavg) and (Psav) are shown in Table (1) below.

3. The normal A4 page size is 21 cm in width and 29.7 cm in height.
4. The reviewing administrator usually writes a brief explanation or a redirection for the current document, or a procedure to be taken, depending on the subject of the contents, in this area.
5. The tested image types were FlashPic *.fpx, Graphic Interchange Format *.gif, JPEG *.jpg, PaperPort Browser-Viewable *.htm, PaperPort Image *.max, PaperPort Self-Viewing *.exe, PC PaintBrush *.pcx, PCX Multi-page *.dcx, PDF Image *.pdf, Portable Network Graphics *.png, TIFF *.tif, TIFF - Class F *.tif, TIFF - Group 4 *.tif, TIFF - LZW *.tif, TIFF - Uncompressed *.tif, TIFF Multi-page *.tif, TIFF Multi-page - Class F *.tif, TIFF Multi-page - Group 4 *.tif, TIFF Multi-page - LZW *.tif, TIFF Multi-page - Uncompressed *.tif, and Windows Bitmap *.bmp.
6. The tested scanning resolutions were 75 dpi, 100 dpi, 150 dpi, 200 dpi, 250 dpi, 300 dpi, 350 dpi, 400 dpi, 450 dpi, and 500 dpi.
Figure (3) Shows Full A4 Scanned Pages Images 21 Types vs. Storage Sizes in KB for the 10 different scanning resolutions (256 Gray Shades)

Figure (4) Shows Full Body Scanned Pages Images 21 Types vs. Storage Sizes in KB for the 10 different scanning resolutions (256 Gray Shades)
Figure (5) Shows The Full A4 and The Full Body Scanned Pages Images Differences in Storage Sizes in KB for the 21 Image Types and 10 different scanning resolutions (256 Gray Shades)
Scanning Resolution:                                               75 dpi    100 dpi   150 dpi   200 dpi   250 dpi    300 dpi    350 dpi    400 dpi    450 dpi    500 dpi
Average Storage Size Of the Full A4 Scanned Document Page (KB):    174.48    288.76    560.76    905.90    1,435.43   1,862.86   2,601.90   3,347.81   4,278.10   5,219.05
Average Storage Size Of the Full Body Scanned Document Page (KB):  144.00    214.86    406.86    646.86    1,010.29   1,286.86   1,792.00   2,352.00   2,981.33   3,676.95
Average Full Body / Average Full A4 (Pavg):                        82.53%    74.41%    72.55%    71.40%    70.38%     69.08%     68.87%     70.25%     69.69%     70.45%
Percentage of Storage Space Saving (Psav):                         17.47%    25.59%    27.45%    28.60%    29.62%     30.92%     31.13%     29.75%     30.31%     29.55%

Table (1) Shows the Percentage of Storage Space Savings when the Full Body is used instead of the Full A4 Scanned Page
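The (Pavg) and (Psav) rows above follow directly from equation (1) and the two average rows of Table (1). The short sketch below simply recomputes them; it is an illustration, not part of the original work.

```python
# Average storage sizes (KB) taken from Table (1), 256 Gray Scale output
resolutions   = [75, 100, 150, 200, 250, 300, 350, 400, 450, 500]
avg_full_a4   = [174.48, 288.76, 560.76, 905.90, 1435.43,
                 1862.86, 2601.90, 3347.81, 4278.10, 5219.05]
avg_full_body = [144.00, 214.86, 406.86, 646.86, 1010.29,
                 1286.86, 1792.00, 2352.00, 2981.33, 3676.95]

for dpi, a4, body in zip(resolutions, avg_full_a4, avg_full_body):
    pavg = body / a4 * 100          # equation (1): Full Body average / Full A4 average
    psav = 100 - pavg               # storage space saved by the SDS approach
    print(f"{dpi} dpi: Pavg = {pavg:.2f}%, Psav = {psav:.2f}%")
```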
Analysis Of Table (1)
By closely observing Figure (5), we can see that some image size differences are zero; this means that those Full Body scanned page segment image sizes are equal to their corresponding Full A4 page sizes, since for certain image types the quality grows automatically when the scanned area is smaller. This is why we consider all the scanned image types for each scanning resolution and investigate, in general, the average saving in size between the Full A4 scanned page and the Full Body scanned page segment over all image types, but for each scanning resolution individually. As we can see from Table (1), the percentage of storage savings (Psav) ranges between 17.47% for the 75 dpi scanning resolution and 31.13% for the 350 dpi scanning resolution, which is a very acceptable percentage, meaning that we can save at least 17.47% and at most 31.13% of the space used by saving the Full Body page segment instead of the Full A4 scanned page for the (256 Gray Scale) output. Graph (1) below shows a graphical representation of the differences in average storage sizes for the (256 Gray Scale) scanned page for all scanning resolution outputs.
Graph (1) Shows a line graph comparison of the Full A4 and the Full Body scanned page image storage space used (average storage sizes in KB vs. scanning resolution, 256 Gray Scale)

2 - The (Black and White) Scanning Output Type
First, it may be important, for clarity reasons, to follow the same procedure which we used previously for examining the (Black and White) scanning output results.
I - We recall the Full A4 page storage sizes shown in Figure (6).
II - Then we show the Full Body page segment storage sizes as well, displayed in Figure (7).
III - We generate a figure showing the differences in storage sizes between both figures. In this case each image size in Figure (7) is subtracted from each corresponding image size in Figure (6). The resulting sizes for all 21 image types and 10 scanning resolutions are shown in Figure (8).
IV - Even though Figure (8) shows the differences in storage sizes between the Full A4 scanned pages and the Full Body scanned segments, it is not necessary in our study to compare each image storage size appearing in Figure (6) with its corresponding image storage size appearing in Figure (7). Rather, we will consider the average storage sizes for each scanning resolution and do our comparison instead. A comparison between average storage sizes for each scanning resolution for the (Black and White) scanning output is shown in Table (2).
V - At this point we are ready to calculate the percentage value (Pavg) of the ratio between both average sizes of the Full A4 scanned page and the Full Body scanned page segment as follows:

Percentage Value (Pavg) = Average size of the Full Body Scanned Page Segment / Average size of the Full A4 Scanned Page    (2)

VI - Finally, we can calculate the percentage value (Psav) of the saving we gained by scanning the Full Body page segment as follows:

Percentage Value (Psav) = 100% - Percentage Value (Pavg)

The results of both (Pavg) and (Psav) are shown in Table (2).
Figure (6) Shows Full A4 Scanned Pages Images 21 Types vs. Storage Sizes in KB for the 10 different scanning resolutions (Black and White)

Figure (7) Shows Full Body Scanned Pages Images 21 Types vs. Storage Sizes in KB for the 10 different scanning resolutions (Black and White)
Figure (8) Shows The Full A4 and The Full Body Scanned Pages Images Differences in Storage Sizes in KB for the 21 Image Types and 10 different scanning resolutions (Black and White)
Scanning Resolution:                                               75 dpi   100 dpi   150 dpi   200 dpi   250 dpi   300 dpi   350 dpi   400 dpi   450 dpi   500 dpi
Average Storage Size Of the Full A4 Scanned Document Page (KB):    48.76    62.48     105.14    163.05    222.48    302.48    382.48    476.95    580.57    695.62
Average Storage Size Of the Full Body Scanned Document Page (KB):  42.10    52.19     81.90     116.38    158.67    207.43    262.29    321.33    386.86    456.57
Average Full Body / Average Full A4 (Pavg):                        86.33%   83.54%    77.90%    71.38%    71.32%    68.58%    68.58%    67.37%    66.63%    65.64%
Percentage of Storage Space Saving (Psav):                         13.67%   16.46%    22.10%    28.62%    28.68%    31.42%    31.42%    32.63%    33.37%    34.36%

Table (2) Shows the Percentage of Storage Space Saving when the Full Body is used instead of the Full A4 Scanned Page
Analysis Of Table (2)
By closely observing Figure (8), we can see that some image size differences are zero; this means that those Full Body scanned image segment sizes are equal to their corresponding Full A4 scanned page sizes, since for certain image types the quality grows automatically when the scanned area is smaller. This is why we consider all the scanned image types for each scanning resolution and investigate, in general, the average saving in size between the Full A4 scanned page and the Full Body scanned page segment over all image types, but for each scanning resolution individually. As we can see from Table (2), the percentage of storage savings (Psav) ranges between 13.67% for the 75 dpi scanning resolution and 34.36% for the 500 dpi scanning resolution, which is a very acceptable percentage, meaning that we can save at least 13.67% and at most 34.36% of the space used by saving the Full Body page segments instead of the Full A4 scanned page for the (Black and White) scanning output. Graph (2) below shows a graphical representation of the differences in the average storage sizes for the (Black and White) scanned pages output for all considered scanning resolutions.
Graph (2) Shows a line graph comparison of the Full A4 and the Full Body scanned page image storage space used (average storage sizes in KB vs. scanning resolution, Black and White)

Conclusion and Summary
1 - The Benefit Of Adopting The (SDS) Model With The (256 Gray Scale) Scanning Output Type
It was proven in a previous experiment related to this research paper that the most optimal scanning resolutions for this output type (256 Gray Scale) were 100 dpi and 150 dpi. Therefore, according to Table (1), we may save 25.59% of storage space by using the (SDS) Model if we used 100 dpi scanning resolution, and we may save 27.45% of storage space by using the (SDS) Model if we used 150 dpi scanning resolution.
To appreciate these savings percentages, we may assume that we are saving 1000 page images in KSU on a daily basis:
Storage space required to store 1 Full A4 scanned page in (PDF) image format using 100 dpi = 128 KB(7).
Storage space required to store 1000 Full A4 pages in (PDF) image format using 100 dpi = 128 KB * 1000 = 128,000 KB ≈ 128 MB(8).
Storage space saved by adopting the (SDS) Model to store 1000 Full Body scanned page segments in (PDF) image format using 100 dpi = 128 MB * 25.59% = 32.75 MB.
In addition to the considerable storage savings when the Full Body model is used, it was observed that the higher the scanning resolution used in scanning the page images, the greater the difference in the average storage sizes between the Full A4 scanned page sizes and the Full Body scanned page segment sizes.

2 - The Benefit Of Adopting The (SDS) Model With The (Black And White) Output Scanning Type
It was proven in a previous experiment related to this research paper that the most optimal scanning resolutions for this scanning output type (Black and White) were 300 dpi and 350 dpi. Therefore, according to Table (2), we may save 31.42% of storage space by using the (SDS) Model if we used 300 dpi scanning resolution, and we may save 31.42% of storage space by using the (SDS) Model if we used 350 dpi scanning resolution.
To appreciate these savings percentages, we may assume that we are saving 1000 pages in KSU on a daily basis:
Storage space required to store 1 Full A4 page in (PDF) image format using 300 dpi = 48 KB(9).
Storage space required to store 1000 Full A4 pages in (PDF) image format using 300 dpi = 48 KB * 1000 = 48,000 KB ≈ 48 MB.
Storage space saved by adopting the (SDS) Model to store 1000 Full Body page segments in (PDF) image format using 300 dpi = 48 MB * 31.42% = 15.08 MB.
In addition to the considerable storage savings when the Full Body model is used, it was observed that the higher the scanning resolution used in scanning the page images, the greater the difference in the average storage sizes between the Full A4 scanned page sizes and the Full Body scanned page segment sizes.

7. In (256 Gray Scale).
8. 1024 KB = 1 MB.
9. In (Black and White).
References:
[1] Abhipsita Singh, Parvati Iyer and Dr. S. Sanyal, "Optical Character Recognition Systems for noisy images", available at: http://profile.iiita.ac.in/ipchandrashekar_02/publications/ocr.pdf
[2] Cavers Ian, "Graphics and Related Issues", Fri Dec 4 15:01:52 PST 1998, http://www.cs.ubc.ca/spider/cavers/MatlabGuide/node12.html
[3] Al-abdulkareem Eatedal A., "Off-line Arabic characters Recognition", supervised by Dr. Feryal Haj Hassan, KSU Computer Science Department, 2004.
[4] Dr Fiaz Hussain, Dr John Cowell, "Resolving Conflicts Between Arabic and Latin Character Recognition", De Montfort University, England, and Dubai Polytechnic University, UAE. Available at: http://www.cse.dmu.ac.uk
[5] Richard Casey, "Document Image Analysis", available at: http://cslu.cse.ogi.edu/HLTsurvey/ch2node4.html
[6] H. Bunke and P.S.P. Wang, "Handbook Of Character Recognition And Document Image Analysis", available at: http://www.worldscibooks.com/compsci/2757.html
[7] Meyerdierks, D. Ward-Thompson, "Baseline correction", available at: http://www.starlink.rl.ac.uk/star/docs/sc1.htx/node29.html
[8] Haigh Susan, "Optical Character Recognition (OCR) as a Digitization Technology", "The Imaging Process", "An Overview of Imaging and OCR", 15 Nov 1996, available at: http://www.collectionscanada.ca/9/1/p1-236-e.html
[9] Ibrahim S. I. Abuhaiba, "A Discrete Arabic Script For Better Automatic Document Understanding", Islamic University of Gaza, Palestine. Available at: http://www.kfupm.edu.sa/publications/ajse/articles/281B_05P.pdf
[10] K. Andras, "Optical Character Recognition", available at: http://www.kornai.com/MathLing/ocr.html
[11] Klassen Tim, "Features of the Arabic Language", 17 Nov 2000, available at: http://torch.cs.dal.ca/~klassen/ocr/arabic.htm
[12] Maged Mohamed Fahmy, Somaya Al Ali, "Automatic Recognition of Handwritten Arabic Characters Using Their Geometrical Features", University of Bahrain, Bahrain, and University of Qatar, Qatar. Available at: http://www.ici.ro/ici/revista/sic2001_2/art1.htm
[13] M. Simone, "Document Image Analysis And Recognition", http://www.dsi.unifi.it/~simone/DIAR/
[14] Mulliner Jason, "PC-Based Software for Optical Character Recognition", Sep 2003, available at: http://www.findarticles.com/p/articles/mi_qa3957/is_200309/ai_n9277794
[15] Siganos Dimitrios and Stergiou Christos, "Questions and Answers", available at: http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol3/cs11/test.html
[16] "Basic Images Processing": "Some general commands related to handling Matlab graphics and printing", "Simple image processing operations that you can do with Matlab", available at: http://noodle.med.yale.edu/~papad/ta/handouts/matlab_image.html
[17] "Document Image Analysis", available at: http://elib.cs.berkeley.edu/dia.html
[18] "Getting Started with MATLAB", available at: http://www.stewart.cs.sdsu.edu/cs205/module7/getting6.html
[19] "Matlab Image Processing Toolbox", available at: http://homepages.inf.ed.ac.uk/rbf/HIPR2/impmatl.htm
[20] "Matlab Resources", available at: http://www.cse.uiuc.edu/heath/scicomp/matlab.htm
[21] "Read and Display an Image", available at: http://www.mathworks.com/access/helpdesk/help/toolbox/images/getting8.html
[22] "Reading and displaying images", "Matlab tutorial", 03/25/2005, available at: http://ai.ucsd.edu/Tutorial/matlab.html#images
[23] "Recognition Stages Characters and Their Features", "Arabic Writing Examples", available at: http://oacr.tripod.com/
[24] "What's OCR?", available at: http://www.dataid.com/aboutocr.htm
[25] WIKIPEDIA Encyclopedia, "Pattern Recognition", available at: http://en.wikipedia.org/wiki/Pattern_recognition
[26] ALGhalayini M., Shah A., "Introducing The (POSSDI) Process: The Process of Optimizing the Selection of The Scanned Document Images". International Joint Conferences on Computer, Information, Systems Sciences, and Engineering (CIS2E 06), December 4-14, 2006.
A New Persian/Arabic Text Steganography Using “La” Word

Mohammad Shirali-Shahreza
Computer Science Department, Sharif University of Technology, Tehran, IRAN
[email protected]

Abstract- With the expansion of communication, there is in some cases a need for hidden communication. Steganography is one of the methods used for the hidden exchange of information: a method of hiding information in a cover medium such as an image or sound. In this paper a new method for steganography in Persian and Arabic texts is presented. Here we use the special form of the “La” word for hiding the data. This word is created by connecting the “Lam” and “Alef” characters. For hiding bit 0 we use the normal form of the word “La” (" )"ﻟـﺎobtained by inserting the Arabic extension character between the “Lam” and “Alef” characters, but for hiding bit 1 we use the special form of the word “La” (" )"ﻻwhich has a unique code in Unicode standard texts; its code is FEFB in Unicode hex notation. This method is not limited to electronic documents (E-documents) and can also be used on printed documents. This approach can be categorized under feature coding methods.
Keywords: Feature Coding, Information Security, Text Steganography, Persian/Arabic Text, Unicode Standard.
I. INTRODUCTION
In the 21st century, communication has expanded because of the development of new technologies such as computers, the Internet, and mobile phones. With the use of these technologies in different areas of life and work, the issue of information security has gained special significance. Hidden exchange of information is one of the important areas of information security, and it includes various methods such as cryptography, steganography, and coding. In steganography the information is hidden in a cover medium so that nobody notices the existence of the secret information. Steganography work has been carried out on different media such as images, video clips and sounds [1]. Text steganography is the most difficult kind of steganography because there is no redundant information in a text file as compared with a picture or a sound file [2]. The structure of text documents is identical to what we observe, while in other types of documents, such as pictures, the structure of the document is different from what we observe. Therefore, in such documents, we can hide information by introducing changes in the structure of the document without making a notable change in the concerned output. Contrary to other media such as sounds and video clips, the use of text documents has been common since very old times. This has extended until today, and still, using text is preferred
over other media, because texts occupy less space, communicate more information and cost less to print, among other advantages. As the use of text and hidden communication goes back to antiquity, steganography of information in texts has been practised since the distant past; for example, it was done by some classic Iranian poets as well. Today, computer systems have facilitated information hiding in texts, and the range of information hiding in text has also developed from hiding information in electronic texts and documents to hiding information in web pages. Most of the text steganography methods are for English texts and there are only a few text steganography methods for other languages. In this paper we introduce a new text steganography method for Persian and Arabic texts. Our method is based on the special form of the “La” word. This word is created by connecting the “Lam” and “Alef” characters. The normal form of “La” (" )"ﻟـﺎis used for hiding bit 0 and the special form of the word “La” (" )"ﻻis used for hiding bit 1. The details of our method and its experimental results are described in the third section. As we said earlier, there are only a few text steganography methods for non-English languages. In the next section we discuss two reported Arabic text steganography methods. In the final section the conclusion is drawn after investigating and studying some advantages of this method.

II. RELATED WORKS
Most text steganography methods are designed for English texts, but there are a few methods for Arabic text steganography. In this section we review two reported Arabic text steganography methods. Similar to our proposed method, both of these methods can be classified under feature coding methods. The first method, which was invented by the author, uses pointed letters for hiding data [3]. In the second method the authors use both the existence of the points in the letters and the redundant Arabic extension character for hiding data [4]. Before explaining the details of these methods, we first describe the feature coding method.
A. Feature Coding [5]
In feature coding methods, some of the features of the text are altered. For example, the end parts of some characters such as h, d, b and so on are elongated or shortened a little, thereby hiding information in the text. In this way, a large volume of information can be hidden in the text without making the reader aware of the existence of such information. By placing the characters in a fixed shape, the information is lost, and in some feature coding methods retyping the text or using an OCR program destroys the hidden information.

B. Steganography by Shifting Character Points [3]
Although both the Arabic and English languages have points in their letters, the number of pointed letters differs greatly. English has points in only two letters, the small "i" and the small "j", while Arabic has points in 14 of its 28 alphabet letters, as shown in Table 1. This large number of points makes the points in any given Arabic text significant, and they can be utilized for information security and watermarking as presented in this method. In this method we proposed to hide information in the points of the Arabic letters; to be specific, we hide the information in the points' locations within the pointed letters. First, the hidden information is looked at as binary, with the first several bits (for example, 20 bits) indicating the length of the hidden bits to be stored. Then, the cover medium text is scanned. Whenever a pointed letter is detected, its point location may be affected by the hidden information bit: if the hidden bit value is one, the point is slightly shifted up; otherwise, the concerned cover-text character point location remains unchanged. This point shifting process is shown in Figure 1 for the Persian letter 'Noon'. In order to divert the attention of readers, after hiding all the information, the points of the remaining characters are also changed randomly. Note that, as mentioned earlier, the size of the hidden bits is known and also hidden in the first 20 bits.
Table 1 - Persian Letters [3] (letters grouped as: without point, with one point, with two points, with three points)
Fig. 1. Vertical displacement of the points for the Persian letter NOON [3]
This method of point shifting has its advantages in security and capacity; it can store a large number of hidden bits within any Arabic text. However, it has a main drawback: its low robustness makes it impractical. For example, the hidden information is lost in any retyping or scanning, and the output text has a fixed frame due to the use of only one font. In fact, this information security method is tied to its particular character font, which is not standard and can be lost or changed easily.

C. Steganography Using Letter Points and Extensions [4]
In this method the authors use pointed letters with an extension to hold the secret bit 'one' and un-pointed letters with an extension to hold the secret bit 'zero'. Note that the letter extension does not have any effect on the written content. It has a standard character hexadecimal code, 0640, in the Unicode system. In fact, this Arabic extension character in electronic typing is considered a redundant character used only for arrangement and formatting purposes. The only limitation in using the extension is that not all letters can be extended with this extension character, due to their position in words and the nature of Arabic writing: extensions can only be added in locations between connected letters of Arabic text, i.e. they cannot be placed after letters at the end of words or before letters at the beginning. The proposed steganography hypothesis is that whenever a letter cannot have an extension, or is found intentionally without an extension, it is considered not to hold any secret bits. This proposed steganography method can have the option of adding extensions before or after the letters. To be consistent, however, the location of the extensions should be the same throughout the complete steganography document. Assume they add the extensions after the letters. Figure 2 shows an example of this method, in which the secret bits "110010" are hidden in an Arabic sentence. To add more security and to mislead trespassers, both options of adding extensions before and after the letters can be used within the same document but in different paragraphs or lines. For example, the even lines or paragraphs use steganography with extensions after the letters and the odd ones use extensions before, or vice versa.
Fig. 2. Steganography example adding extensions after letters [4] (watermarking bits 110010, cover-text and output text)
III. OUR SUGGESTED METHOD
By considering the special characteristics of the Persian and Arabic scripts, this paper presents a new text steganography method for Persian and Arabic texts. In this section we first explain some characteristics of these two languages, then we explain the Unicode Standard briefly, and finally we explain our suggested method in full detail.

A. The Characteristics of Persian and Arabic [6]
The Arabic alphabet has 28 letters. Persian has all the letters of Arabic and four more letters of (گ،ژ،چ،)پ. In these two languages a letter can have four different shapes, and the shape of each letter is determined by the position of that letter in a word. For example the letter « »عis written as « »ﻋــat the beginning of a word, like « »ـﻌـin the middle, as « »ـﻊat the end, and as « »عin the separate (isolated) position. In Persian and Arabic the letters are connected to each other in writing, while in English the letters are written separately. In English the letters are written in a left-to-right format, and in some languages the letters are written in a top-to-bottom format, but in Arabic and Persian the letters are written in a right-to-left format. In Arabic and Persian, points are very important: 17 of the 32 Persian letters (and 14 of the 28 Arabic letters) have one or more points. Among these 17 letters, 2 letters have 2 points, 5 letters have 3 points, and the remaining 10 letters have one single point (Table 1), while in English only the two small letters "i" and "j" have a point. When typing Persian and Arabic texts, in some cases, such as finishing a sentence in the same column, the Arabic extension character is used for lengthening a word. This character has no specific function and is used only to add to the beauty of the text. In Persian and Arabic writing, in order to add to the beauty of the text, when the “Lam” ( )لand “Alef” ( )اcharacters come after each other, they are written as ""ﻻ. We use this characteristic of the Arabic and Persian languages in our method and will explain it in this section.
B. Unicode Standard [7]
The Unicode Standard is the international character-encoding standard used to represent texts for processing by computers. This standard is compatible with the second version of ISO/IEC 10646-1:2000, which has the same characters and codes as ISO/IEC 10646. Unicode enables us to encode all the characters used in writing the languages of the world. This standard uses a 16-bit encoding which provides enough space for 65000 characters; that is to say, it is possible to specify and define 65000 characters in different forms such as numbers, letters, symbols, and a great number of characters current in different languages of the world. Unicode has determined codes for all the characters used in the main languages of the world. Moreover, because of the large code space dedicated to characters, this standard also includes most of the symbols necessary for high-quality typesetting. The languages whose writing systems can be supported by this standard are Latin (covering most of the European languages), Cyrillic (Russian and Serbian), Greek, Arabic (including Arabic, Persian, Urdu, Kurdish), Hebrew, Indian, Armenian, Assyrian, Chinese, Katakana, Hiragana (Japanese), and Hangeul (Korean). Moreover, there are a lot of mathematical and technical symbols, punctuation signs, arrows, and miscellaneous signs in this standard. In the Unicode standard the Arabic block has been developed to cover the characters of the languages which use the Arabic writing system; among these languages we can mention Persian, Urdu, Pashto, Sindhi, and Kurdish. Moreover, this standard has detailed and careful explanations of the implementation methods, including the letter-connection method and the display of right-to-left and bi-directional texts. In this way programmers do not have to refer to local guides.

C. Our method
As we said in section III.A, in Persian and Arabic writing, when the “Lam” and “Alef” characters come after each other, they are written as " "ﻻto add to the beauty of the text, but in the normal form it is written as ""ﻟـﺎ. We have used this characteristic of the Persian and Arabic languages in order to hide data in Persian and Arabic texts written in Unicode format. In electronic texts written in Unicode format, the word “La” is written in its special form as one character (")"ﻻ. The code of the “La” character in the Unicode Standard is FEFB. For writing “La” in the normal form, the Arabic extension character must be inserted between the “Lam” and “Alef” characters; the hexadecimal code of the Arabic extension character in the Unicode Standard is 0640. By this approach, the “La” word is written in the normal form (")"ﻟـﺎ.
Our method can be described as follows. Whenever we come across a "La" word in the cover text, we choose one of its two forms according to the secret bit to be hidden: to hide the bit 1 we use the special form of "La" («ﻻ»), and to hide the bit 0 we use the normal form of "La" («ﻟـﺎ»). In this way the information is hidden in the text. In addition, the length of the hidden information is itself embedded at the beginning of the text so that the correct number of bits can be extracted later. To extract the information from a text carrying hidden data (the stego text), we examine the "La" occurrences in order: the special form («ﻻ») means that a bit 1 is hidden, and the normal form («ﻟـﺎ») means that a bit 0 is hidden. By concatenating the extracted 0 and 1 bits we recover the hidden information.

We tested our method on several Persian text files. To allow a comparison, we selected the same resources used in our previous point steganography method [3]. The resources selected for computing the hiding capacity are sports pages of some Iranian newspapers; their Internet addresses and the capacity of each text are shown in Table 2. All of the articles were retrieved on 20 August 2005. As can be seen, the capacity of our method is low, but it has advantages: the hidden data are not destroyed by printing, and the method is not limited to electronic documents. We discuss some advantages of this method in the next section.
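As an illustration of this embedding and extraction procedure, the following Python sketch encodes one secret bit per occurrence of the "La" word, using the ligature code point U+FEFB named above for the special form and the sequence Lam + extension character + Alef (U+0644, U+0640, U+0627) for the normal form. It is a minimal sketch of the idea rather than the authors' implementation, and the helper names are ours.

# Minimal sketch of the "La" steganography scheme described above.
# Assumption (from the paper): every "La" in the cover text appears either
# as the ligature U+FEFB or as Lam + extension character + Alef.

LA_SPECIAL = "\uFEFB"              # ligature form of "La" -> hides bit 1
LA_NORMAL = "\u0644\u0640\u0627"   # Lam + U+0640 + Alef   -> hides bit 0


def embed(cover: str, bits: str) -> str:
    """Rewrite successive 'La' occurrences so that each one carries one bit."""
    out, i, pos = [], 0, 0
    while pos < len(cover):
        if i < len(bits) and cover.startswith(LA_SPECIAL, pos):
            out.append(LA_SPECIAL if bits[i] == "1" else LA_NORMAL)
            i, pos = i + 1, pos + len(LA_SPECIAL)
        elif i < len(bits) and cover.startswith(LA_NORMAL, pos):
            out.append(LA_SPECIAL if bits[i] == "1" else LA_NORMAL)
            i, pos = i + 1, pos + len(LA_NORMAL)
        else:
            out.append(cover[pos])
            pos += 1
    if i < len(bits):
        raise ValueError("cover text does not contain enough 'La' words")
    return "".join(out)


def extract(stego: str, n_bits: int) -> str:
    """Read one bit back from each 'La' occurrence, in order."""
    bits, pos = [], 0
    while pos < len(stego) and len(bits) < n_bits:
        if stego.startswith(LA_SPECIAL, pos):
            bits.append("1")
            pos += len(LA_SPECIAL)
        elif stego.startswith(LA_NORMAL, pos):
            bits.append("0")
            pos += len(LA_NORMAL)
        else:
            pos += 1
    return "".join(bits)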
Table 2 - Comparing the capacity of our method and the point steganography method

Newspaper        | Website address       | Text size (KB) | Our method capacity (bit) | Our method capacity ratio (bit/KB) | Point method capacity ratio (bit/KB)
Farhange Ashti   | ashtidaily.com        | 13.3 | 19 | 1.43 | 96
Hamshahri        | hamshahri.net         | 6.82 | 7  | 1.03 | 120
Iran             | iraninstitute.org     | 6.64 | 11 | 1.66 | 105
JameJam          | jamejamdaily.net      | 3.84 | 5  | 1.48 | 113
Javan            | javandaily.com        | 8.03 | 8  | 1.00 | 115
Jomhouri Eslami  | jomhourieslami.com    | 3.52 | 4  | 1.14 | 125
Keyhan           | kayhannews.ir         | 2.92 | 6  | 2.05 | 106
Khorasan         | khorasannews.com      | 5.40 | 4  | 0.74 | 116
Quds             | qudsdaily.net         | 9.98 | 3  | 0.30 | 114
Shargh           | sharghnewspaper.com   | 20.4 | 18 | 0.88 | 118
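The capacity figures in Table 2 simply count how many "La" occurrences a text offers per kilobyte. That measurement can be sketched as follows; this is an illustrative snippet under the same code-point assumptions as above, not the tool used by the authors, and the kilobyte calculation assumes 16-bit Unicode text as described in section III.B.

# Rough capacity estimate: one bit can be hidden per "La" occurrence.
def capacity_bits(text: str) -> int:
    # Count both the ligature form and the Lam + extension + Alef form.
    return text.count("\uFEFB") + text.count("\u0644\u0640\u0627")

def capacity_ratio(text: str) -> float:
    size_kb = len(text.encode("utf-16-le")) / 1024  # 16-bit Unicode, as in the paper
    return capacity_bits(text) / size_kb if size_kb else 0.0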
IV. CONCLUSION

This paper proposes a new method for hiding data in Persian and Arabic texts by using the special form of the "La" word. In this method the normal form of "La" («ﻟـﺎ») and the special form of "La" («ﻻ») are used in place of each other to hide the bits 0 and 1. The method is not limited to electronic documents (e-documents) and can also be used in ordinary printed documents: if the stego text containing the hidden data is printed, the hidden data can still be extracted by tracking the normal and special forms of the "La" word. The method does not depend on any particular file format; the text can be saved in numerous formats such as HTML pages or Microsoft Word documents, because stego Unicode text does not change when it is copied and pasted, so the hidden data remain intact. Even if the configuration of the text in formatted files such as Word documents is changed, for example by changing the font size or making the text bold or italic, the hidden data remain unchanged.

The use of Unicode for writing texts in different languages has been widely adopted, and the method can be implemented and executed on many systems and devices because most of them support the Unicode Standard. As a result, a wide range of users can use this method. Since the writing systems of Pashto (one of the official languages of Afghanistan) and Urdu (the official language of Pakistan) are similar to those of Arabic and Persian, the method can be applied to these two languages as well. Arabic is also the religious language of Muslims, and more than one billion Muslims live throughout the world. In view of this, our suggested method covers a wide range of users.

REFERENCES
[1] N.J. Hopper, Toward a Theory of Steganography, Ph.D. Dissertation, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, July 2004.
[2] W. Bender, D. Gruhl, N. Morimoto, and A. Lu, "Techniques for data hiding," IBM Systems Journal, vol. 35, issues 3&4, 1996, pp. 313-336.
[3] M.H. Shirali-Shahreza and M. Shirali-Shahreza, "A New Approach to Persian/Arabic Text Steganography," Proceedings of the 5th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2006), Honolulu, HI, USA, July 10-12, 2006, pp. 310-315.
[4] A. Gutub and M. Fattani, "A Novel Arabic Text Steganography Method Using Letter Points and Extensions," Proceedings of the WASET International Conference on Computer, Information and Systems Science and Engineering (ICCISSE), vol. 21, Vienna, Austria, May 2007, pp. 28-31.
[5] K. Rabah, "Steganography - The Art of Hiding Data," Information Technology Journal, vol. 3, issue 3, 2004, pp. 245-269.
[6] M.H. Shirali-Shahreza and M. Shirali-Shahreza, "Persian/Arabic CAPTCHA," IADIS International Journal on Computer Science and Information Systems (IJCSIS), vol. 1, no. 2, October 2006, pp. 63-75.
[7] The Unicode Standard, http://www.unicode.org
GTRSSN: Gaussian Trust and Reputation System for Sensor Networks

Mohammad Momani (1), Subhash Challa (2)
(1) Engineering Department, University of Technology Sydney, Australia
(2) NICTA, VRL, University of Melbourne, Australia
[email protected], [email protected]
Abstract- This paper introduces a new Gaussian trust and reputation system for wireless sensor networks, based on continuously sensed events, to address security issues and to deal with malicious and unreliable nodes. It represents a new approach to calculating trust between sensor nodes based on their own sensed data and the data reported by surrounding nodes. It addresses the trust issue from the point of view of continuous sensed data, which differs from all other approaches, which address the issue from a communication and binary-event point of view.

I. INTRODUCTION
Trust has been the focus of researchers for a long time. It started in the social sciences, where trust between humans was studied. The effect of trust was also analysed in economic transactions, as presented in [1, 2], and Marsh [3] was one of the first to introduce a computational model of trust. E-commerce then required a notion for judging how trustworthy an internet seller is, as in [4, 5], as did peer-to-peer networks and other internet forums in which users deal with each other in a decentralized fashion, as in [6, 7]. More recently, attention has been given to the concept of trust to increase security and reliability in ad hoc networks, as in [8, 9], and in sensor networks, as in [10, 11]. Along with the notion of trust comes that of reputation. Reputation is the opinion of one person about another, of one internet buyer about an internet seller, and, by extension, of one wireless sensor network (WSN) node about another node in the same network. Trust is derived from the reputation of an entity: based on a reputation, a level of trust is bestowed upon an entity. The reputation itself is built over time from that entity's history of behaviour, and may reflect a positive or negative assessment. The trust problem is a decision problem under uncertainty, and the only coherent way to deal with uncertainty is through probability. There are several frameworks for reasoning under uncertainty, but it is well accepted that the probabilistic paradigm is the theoretically sound framework for solving decision problems under uncertainty. Some of the trust models introduced for sensor networks employ probabilistic solutions mixed with ad-hoc approaches, but none of them produces a fully probabilistic answer to the problem. In this paper we extend our previous work presented in [12, 13] and look at applying the trust notion to WSNs providing data. Most studies of trust in WSNs have focused on the trust associated with routing and with the successful
performance of a sensor node in some predetermined task, which results in looking at binary events. The trustworthiness and reliability of the nodes of a WSN when the sensed data is continuous has not been addressed. We look at the issue of security in WSNs using the trust concept for the case of sensed data of a continuous nature. The rest of the paper is organised as follows: Section II presents the background, which covers only the very specific related work that we extend; Section III introduces the Beta reputation system; we introduce our new model in Section IV; in Section V we present some of the simulation results; and Section VI concludes the paper.

II. BACKGROUND
In this paper we derive a Bayesian probabilistic reputation system and trust model for wireless sensor networks. We argue that the problem of assessing a reputation based on observed data is a statistical problem. Some trust models make use of this observation and introduce probabilistic modelling, such as the RFSN trust model developed by Ganeriwal and Srivastava in [10]. The RFSN model presented in [10] uses a Bayesian updating scheme known as the Beta Reputation System, introduced in [14], for assessing and updating the nodes' reputations. The use of the Beta distribution is due to the binary form of the events considered. The observable transaction data of the nodes is referred to as first-hand information. A second source of information in trust modelling is the information gathered by other nodes about the node whose reputation is being assessed; this is referred to as second-hand information. It consists of information gathered by other nodes as first-hand information and converted into an assessment of that node. Due to the limitations of a WSN, second-hand information is summarized before being shared. For example, RFSN uses a probability model in the form of a reputation system to summarize the observed information, and shares the values of the parameters of the probability distributions as second-hand information. This shared information is soft data, which requires a proper way of incorporating it, together with the observed data, into the trust model. The step of combining both sources of information is handled differently by different trust models; reference [10] uses Dempster-Shafer belief theory. Although a reputation system is designed to reduce the harmful effect of an unreliable or malicious node, such a system
can be used by a malicious node to harm the network. Systems such as those in [10] and [11] are confronted with the issue of what second-hand information is allowed to be shared; for example, some prohibit negative second-hand information from being shared, in order to reduce the risk of a negative campaign by malicious nodes. We propose a fully probabilistic way to incorporate all the second-hand information into a reputation system. To resolve the issue of the validity of the information source, the information is modulated using the reputation of the source.

III. RELATED WORK
The Beta Reputation System was proposed by Josang and Ismail in [14] as a model to derive reputation ratings in the context of e-commerce. It was presented as a flexible system with foundations in the theory of statistics. Ganeriwal and Srivastava in [10] use the work of Josang and Ismail in their trust model for wireless sensor networks, and Srinivasan, Teitelbaum and Wu in [11] mention the possibility of using the Beta reputation system. The Beta reputation system is based on the Beta probability density function Beta(α, β), as shown in equation (1):

f(p | α, β) = [Γ(α + β) / (Γ(α) Γ(β))] p^{α−1} (1 − p)^{β−1}      (1)

where 0 ≤ p ≤ 1, α > 0, β > 0 and p is the probability that the event occurs, that is θ = 1. If we observe a number of outcomes in which there are r occurrences and s non-occurrences of the event, then, using a Bayesian probabilistic argument, the probability density function of p can be expressed as a Beta distribution with α = r + 1 and β = s + 1. This probabilistic mechanism is applied to model the reputation of an entity using events of completion of a task by the assessed entity. The reputation system counts the number r of successful transactions and the number s of failed transactions and applies the Beta probability model. This provides an easily updatable system, since it is easy to update both r and s in the model: each new transaction results in either r or s being augmented by 1. Reference [10] uses this probability model in its reputation system. For each node n_j, a reputation R_{ij} can be carried by a neighbouring node n_i. The reputation is embodied in the Beta model and carried by two parameters α_{ij} and β_{ij}, where α_{ij} represents the number of successful transactions node n_i had with, or observed about, n_j, and β_{ij} the number of unsuccessful transactions. The reputation of node n_j maintained by node n_i is R_{ij} = Beta(α_{ij} + 1, β_{ij} + 1), and the trust is defined as the expected value of the reputation, T_{ij} = E(R_{ij}). Second-hand information is presented to node n_i by another neighbouring node n_k: node n_i receives the reputation of node n_j held by node n_k, R_{kj}, in the form of the two parameters α_{kj} and β_{kj}. Using this new information, node n_i combines it with its current assessment R_{ij} to obtain a new reputation R_{ij}^{new}, as in equation (2):

R_{ij}^{new} = Beta(α_{ij}^{new}, β_{ij}^{new})      (2)

where node n_i uses its reputation of node n_k in the combination process. The authors of [10] follow the approach of [14] by mapping the problem into a Dempster-Shafer belief theory model [15], solving it using the concept of belief discounting, and performing a reverse mapping from belief theory to continuous probability. We find it unnecessary to use belief theory; the probabilistic theory itself provides a way to combine these two types of information.

IV. GTRSSN TRUST MODEL

Trust modelling represents the trustworthiness of each node in the opinion of another node; thus each node associates a trust value with every other node, as in [16], and based on that trust value a risk value required from the node to finish a job can be calculated. As illustrated in Fig. 1, node X might believe that node Y will fulfil 40% of the promises made, while node Z might believe that node Y will fulfil 50% of the promises made.
Fig.1: A simple trust map [16]
In other words, trust modelling is simply the mathematical representation of a node's opinion of another node in a network. In our model we calculate trust based on continuous sensed data (temperature), as opposed to all previous related works, which calculate trust based on binary events. Let {A_1, A_2, …, A_N} be the nodes of a wireless sensor network, and let the corresponding connectivity matrix Γ be as shown in equation (3):

Γ = [Γ_{i,j}] =
⎛ 1 · · · ⎞
⎜ · 1 · · ⎟
⎜ · · 1 · ⎟
⎝ · · · 1 ⎠      (3)
If node Ai is connected to node Aj then Γ i , j = Γ j ,i = 1 otherwise it is equal to 0. X is a field variable of interest which is of a continuous nature. This variable such as temperature, chemical quantity, atmospheric value, is detected and sensed by the nodes of the WSN and is reported only at discrete times t = 0, 1, 2, …, k, the random variable XAi = Xi is the sensed value by node Ai. i = 1, …, N. xi(t) is the realization of that random variable at time t. Each node Ai, i = 1, …, N has a time series {xi(t)}. These time series are most likely different, as nodes are requested to provide a reading at different times, depending on the sources of the request. It could also be that the nodes provide such readings when triggered by some events. We
assume that each time a node provides a reading, its one-hop neighbours see that report and can evaluate the reported value. For example if node Aj reports xj(t0) at some time t0, then node Al obtains a copy of that report, and has its own assessment xl(t0) of the sensed variable, say temperature. Let yi,j(t) = xj(t)-xi(t). From node Ai's perspective, Xi(t) is known, and Yi,j(t) = Xj(t) - Xi(t) represents the error that node Aj commits in reporting the sensed field value Xj(t) at time t. Yi,j(t) is a random variable modelled as a Normal (Gaussian) shown in equation (4).
Y_{i,j}(t) ∼ N(θ_{i,j}, τ^2)      (4)
τ^2 is assumed known and is the same for all nodes. If we let ȳ_{i,j} be the mean of the observed error, as observed by A_i about A_j's reporting, as in equation (5),

ȳ_{i,j} = Σ_{t=1}^{k} y_{i,j}(t) / k      (5)

then

(θ_{i,j} | ȳ_{i,j}) ∼ N(ȳ_{i,j}, τ^2 / k)      (6)
where ȳ_{i,j} = {y_{i,j}(t); for all t values at which a report is issued by A_j}. This is a well known, straightforward Bayesian updating in which a diffuse prior is used. We let μ_{i,j} = ȳ_{i,j} and σ^2_{i,j} = τ^2 / k. Recall that k is node dependent: it is the number of reports issued by node j, and differs from node to node. We define the reputation R_{i,j} as in equation (7),

R_{i,j} = N(μ_{i,j}, σ^2_{i,j})      (7)

where μ_{i,j} = ȳ_{i,j} and σ^2_{i,j} = τ^2 / k are the equivalents of α_{ij} and β_{ij} in [10]. Trust is defined differently, since we want it to remain between 0 and 1; we define the trust to be the probability shown in equations (8) and (9):

T_{i,j} = Prob{ |θ_{i,j}| < ε }      (8)

T_{i,j} = Prob{ −ε < θ_{i,j} < +ε } = Φ( (ε − ȳ_{i,j}) / (τ/√k) ) − Φ( (−ε − ȳ_{i,j}) / (τ/√k) )      (9)

The bigger the error θ_{i,j} is, meaning its mean shifts to the right or left of 0, and the more spread that error is, the lower the trust value. Each node A_i maintains a line of reputation assessments composed of T_{i,j} for each j such that Γ_{i,j} ≠ 0 (one-hop connection). T_{i,j} is updated for each time period t for which data is received from some connecting node j.

In addition to the data observed in the form of ȳ_{i,j} = {y_{i,j}(t) for all t values at which a report is issued by A_j}, node A_i uses second-hand information in the form of (μ_{l_s,j}, σ_{l_s,j}), s = 1, …, m, from the m nodes connected to A_j. This is "expert opinion", that is, soft information from external sources. Each of these m nodes has observed node A_j's reports and produced assessments of its error in the form of (μ_{l_s,j}, σ_{l_s,j}), s = 1, …, m, and consequently T_{l_s,j}, s = 1, …, m. In using expert opinion/external soft information, one needs to modulate it: node A_i uses its own assessment of the nodes A_{l_1}, …, A_{l_m}, in the form of (μ_{i,l_s}, σ_{i,l_s}), s = 1, …, m, and consequently T_{i,l_s}, s = 1, …, m. Using Bayes' theorem, the probability distribution of θ_{i,j} is obtained, which uses the observed data along with the second-hand modulated information, as shown in equation (10):

P(θ_{i,j} | ȳ_{i,j}, (μ_{l_1,j}, σ_{l_1,j}), …, (μ_{l_m,j}, σ_{l_m,j}), (μ_{i,l_1}, σ_{i,l_1}), …, (μ_{i,l_m}, σ_{i,l_m}))      (10)

Equation (10) is proportional to the product of three terms, which represent the likelihood, the prior distribution and the second-hand information. By elaborating the second-hand information we proved that it is a Normal (Gaussian) distribution with mean and variance as shown in equations (11) and (12) respectively:

μ_{i,j}^{new} = [ Σ_{s=1}^{m} (μ_{l_s,j} + μ_{i,l_s}) / ( α (1/T_{i,l_s} − 1) ) + k ȳ_{i,j} / τ^2 ] / [ Σ_{s=1}^{m} 1 / ( α (1/T_{i,l_s} − 1) ) + k / τ^2 ]      (11)

σ^2_{i,j}^{new} = 1 / [ Σ_{s=1}^{m} 1 / ( α (1/T_{i,l_s} − 1) ) + k / τ^2 ]      (12)

These values (μ_{i,j}^{new}, σ^2_{i,j}^{new}), along with (μ_{i,j}, σ^2_{i,j}), are easily updatable values that represent the continuous Gaussian version of the (α_{i,j}, β_{i,j}) and (α_{i,j}^{new}, β_{i,j}^{new}) of the binary approach in [10], as derived from the approach in [14]. The network topology and protocols follow those of [10, 11]. The solution presented is simple and easily computable, keeping in mind that it applies to networks with limited computational power. Some would object to the use of a diffuse prior, which in effect forces a null prior trust value, regardless of the ε value. A way to remedy this is to start with a N(μ_0, σ_0^2) prior distribution for all θ_{i,j}, such that the prior trust is 1/2. This choice not only answers the diffuse-prior issue, but
also allows the choice of the parameters involved. ε can be determined given μ_0 and σ_0; μ_0 is most likely to be set to 0, and therefore σ_0 and ε determine each other. With a proper prior for θ_{i,j}, as shown in equation (13),

θ_{i,j} ∼ N(μ_0, σ_0^2)      (13)

the reputation parameters μ_{i,j} and σ^2_{i,j} are presented in equations (14) and (15) respectively,

μ_{i,j} = [ (μ_0 / σ_0^2) + (k ȳ_{i,j} / τ^2) ] / [ (1/σ_0^2) + (k/τ^2) ]      (14)

σ^2_{i,j} = 1 / [ (1/σ_0^2) + (k/τ^2) ]      (15)

and the updated values are presented in equations (16) and (17):

μ_{i,j}^{new} = [ (μ_0 / σ_0^2) + Σ_{s=1}^{m} (μ_{l_s,j} + μ_{i,l_s}) / ( α (1/T_{i,l_s} − 1) ) + k ȳ_{i,j} / τ^2 ] / [ (1/σ_0^2) + Σ_{s=1}^{m} 1 / ( α (1/T_{i,l_s} − 1) ) + k / τ^2 ]      (16)

σ^2_{i,j}^{new} = 1 / [ (1/σ_0^2) + Σ_{s=1}^{m} 1 / ( α (1/T_{i,l_s} − 1) ) + k / τ^2 ]      (17)
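The update and trust computation described by equations (9) and (13)-(17) can be sketched in a few lines of code. The following Python snippet is an illustrative sketch under the notation above, not the authors' simulation code; the weighting constant alpha and all numeric values in the example are placeholders.

# Illustrative sketch of the GTRSSN update: Gaussian posterior for the
# reporting error theta_ij and trust T_ij = P(|theta_ij| < eps).
from math import erf, sqrt

def phi(x: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def direct_update(errors, tau2, mu0=0.0, sigma02=1.0):
    """Posterior (mu_ij, sigma2_ij) from k direct error observations, eqs. (14)-(15)."""
    k = len(errors)
    ybar = sum(errors) / k
    precision = 1.0 / sigma02 + k / tau2
    return (mu0 / sigma02 + k * ybar / tau2) / precision, 1.0 / precision

def combined_update(errors, second_hand, tau2, alpha=1.0, mu0=0.0, sigma02=1.0):
    """Fold in second-hand reports per eqs. (16)-(17).
    Each second-hand item is (mu_ls_j + mu_i_ls, T_i_ls) with 0 < T_i_ls < 1."""
    k = len(errors)
    ybar = sum(errors) / k
    num = mu0 / sigma02 + k * ybar / tau2
    den = 1.0 / sigma02 + k / tau2
    for mu_sum, trust_in_source in second_hand:
        w = 1.0 / (alpha * (1.0 / trust_in_source - 1.0))  # weight shrinks as trust in the source drops
        num += w * mu_sum
        den += w
    return num / den, 1.0 / den

def trust(mu, sigma2, eps):
    """T_ij = P(-eps < theta_ij < eps) for theta_ij ~ N(mu, sigma2), as in eq. (9)."""
    s = sqrt(sigma2)
    return phi((eps - mu) / s) - phi((-eps - mu) / s)

# Example: node i observed node j's reporting errors and one neighbour's opinion.
if __name__ == "__main__":
    errors = [0.1, -0.2, 0.05, 0.15]          # y_ij(t) = x_j(t) - x_i(t)
    mu, s2 = combined_update(errors, [(0.0, 0.8)], tau2=0.25)
    print("trust of i in j:", round(trust(mu, s2, eps=0.5), 3))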
V. SIMULATION RESULTS
To verify our theory we developed several simulation experiments, and in this section we present the results of two different scenarios conducted on the network shown in Figure 2. In both simulation experiments we calculate the trust between four nodes (1, 6, 7 and 13) in a sub-network of 15 nodes, as shown in Figure 2. In the first scenario we assumed that only a randomly selected region reports data at every time step; the corresponding results are shown in the first plot of Figures 3, 4 and 5. In the second scenario we assumed that the entire network reports at every time step; the corresponding results are shown in the second plot of Figures 3, 4 and 5. First, we assume that all nodes are working properly and report the sensed event with only a small reading error. The simulation results show that the trust values of node 1 for the other nodes (6, 7 and 13) are slightly different but converge to 1, as can be seen in Figure 3. The results presented in Figures 3, 4 and 5 show that the second scenario gives more precise results, as the trust is updated for all nodes at each time step.
Fig. 2: Wireless Sensor Network Diagram (locations of the 15 sensor nodes; figure omitted)

Fig. 3: All nodes are normal (trust of node 1 in nodes 13, 7 and 6 over time, with and without second-hand information; plots omitted)
In other experiments we assume that nodes 7 and 13 are faulty or malicious nodes. The results from the simulation are presented in Figure 4 and show the trust values for nodes 7 and 13 dropping to zero. Node 6 is assumed reliable, and its corresponding trust value follows a growing path that eventually reaches 1.
Fig. 4: Nodes 7 and 13 are faulty (trust of node 1 in nodes 13, 7 and 6 over time, with and without second-hand information; plots omitted)
Figure 5 shows that the trust value obtained from direct information alone reaches zero for both nodes 7 and 13. This is because node 1 is faulty and, based only on direct information, contradicts nodes 7 and 13. However, using second-hand information, the trust for these two nodes is high. This is an interesting case, as both nodes (7 and 13) assess node 1 as a faulty node. The trust value for node 6 is set to the initial value of 0.5 and will decrease to zero, as there is no second-hand information available about node 6.
Fig. 5: Node 1 is a malicious node (trust of node 1 in nodes 13, 7 and 6 over time, with and without second-hand information; plots omitted)
In the last example shown in Figure 5, we do know that node 1 is faulty, since it is a simulation exercise. The results clearly should indicate to the network that node 1 is faulty. However, it could also be the case that nodes 7 and 13 are malicious. The trust system works on the assumption that a majority of nodes in a neighbourhood are reliable. This principle helps purge the system of bad elements.
VI. CONCLUSION AND FUTURE WORK

In this paper we introduced a new Gaussian Trust and Reputation System for Sensor Networks (GTRSSN). We introduced a theoretically sound Bayesian probabilistic approach for calculating trust and reputation in WSNs, and we presented the results of simulation experiments conducted on different scenarios. In future research we will try to map the trust network model to a Bayesian network model, to address the issue of how to decide whether to delete or keep nodes in wireless sensor networks.

ACKNOWLEDGEMENT

We acknowledge funding for this research through a postgraduate scholarship from the University of Technology, Sydney and partial funding through the ARC Linkage Grant LP0561200.

REFERENCES
[1] S. Ba and P. A. Pavlou, "Evidence of the effect of trust building technology in electronic markets: price premiums and buyer behavior," MIS Quarterly, vol. 26, 2002.
[2] P. Dasgupta, "Trust as a commodity," in D. Gambetta (ed.), Trust: Making and Breaking Cooperative Relations, electronic edition, Department of Sociology, University of Oxford, 2000, pp. 49-72.
[3] S. Marsh, "Formalising Trust as a Computational Concept," Ph.D. thesis, Department of Computer Science and Mathematics, University of Stirling, 1994.
[4] D. H. McKnight and N. L. Chervany, "Conceptualizing Trust: A Typology and E-Commerce Customer Relationships Model," Proceedings of the 34th Hawaii International Conference on System Sciences, 2001.
[5] P. Resnick and R. Zeckhauser, "Trust among strangers in internet transactions: empirical analysis of eBay's reputation system," NBER Workshop on Empirical Studies of Electronic Commerce, 2000.
[6] K. Aberer and Z. Despotovic, "Managing trust in a peer-2-peer information system," Ninth International Conference on Information and Knowledge Management, 2001.
[7] L. Xiong and L. Liu, "A reputation-based trust model for peer-to-peer e-commerce communities," IEEE Conference on E-Commerce, 2003.
[8] S. Buchegger and J. L. Boudec, "Performance analysis of the CONFIDANT protocol," 3rd ACM International Symposium on Mobile Ad Hoc Networking & Computing, 2002.
[9] P. Michiardi and R. Molva, "CORE: A Collaborative Reputation Mechanism to enforce node cooperation in Mobile Ad hoc Networks," 2001.
[10] S. Ganeriwal and M. B. Srivastava, "Reputation-based Framework for High Integrity Sensor Networks," 2nd ACM Workshop on Security of Ad Hoc and Sensor Networks, Washington DC, USA, 2004.
[11] A. Srinivasan, J. Teitelbaum, and J. Wu, "DRBTS: Distributed Reputation-based Beacon Trust System," 2nd IEEE International Symposium on Dependable, Autonomic and Secure Computing (DASC'06), 2006.
[12] M. Momani, S. Challa, and K. Aboura, "Modelling Trust in Wireless Sensor Networks from the Sensor Reliability Prospective," in Innovative Algorithms and Techniques in Automation, Industrial Electronics and Telecommunications, T. Sobh, K. Elleithy, A. Mahmood, and M. Karim, Eds., Springer Netherlands, 2007.
[13] M. Momani, K. Aboura, and S. Challa, "RBATMWSN: Recursive Bayesian Approach to Trust Management in Wireless Sensor Networks," The Third International Conference on Intelligent Sensors, Sensor Networks and Information Processing, Melbourne, Australia, 2007.
[14] A. Jøsang and R. Ismail, "The Beta Reputation System," 15th Bled Electronic Commerce Conference, Bled, Slovenia, 2002.
[15] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, 1976.
[16] B. N. Shand, "Trust for resource control: Self-enforcing automatic rational contracts between computers," University of Cambridge Computer Laboratory, UCAM-CL-TR-600, 2004.
Fuzzy Round Robin CPU Scheduling (FRRCS) Algorithm

M.H. Zahedi, M. Ghazizadeh and M. Naghibzadeh
Computer Department, Faculty of Engineering, Mashad Ferdowsi University, Mashad, Iran
[email protected], [email protected], [email protected]
Abstract: Scheduling determines when and on what processor each process runs. Scheduling objectives are achieving high processor utilization, minimizing the average (or maximum) response time, maximizing system throughput, and being fair (avoiding process starvation). Many scheduling algorithms have been introduced in order to optimize processor utilization; one of these algorithms is Round Robin CPU scheduling (RRCS). We optimize RR by using the concept of a fuzzy rule-based system and introduce a new fuzzy algorithm called fuzzy rule-based round robin CPU scheduling (FRRCS). The rules are extracted from the knowledge of experts and are experimental. The advantage of FRRCS is compared with RRCS, and it is shown that the FRRCS algorithm improves the waiting and response times.

Index terms - CPU Scheduling, RRCS, FRRCS, fuzzy rule base

I. INTRODUCTION

CPU scheduling determines when and on what processor each process runs. This is important because the overall system utilization and performance, as well as the response time of processes, depend on how the processes are scheduled to run. The operating systems world went through a long period in which the most popular operating systems had simple and inefficient scheduling algorithms. Early operating systems were single threaded and ran one process at a time until the user directed them to run another process. More recent systems benefit from having sophisticated CPU scheduling algorithms. Until recently, the basic assumptions behind most scheduling algorithms were as follows: (1) there is a pool of runnable processes contending for the CPU, (2) processes are independent and compete for resources, and (3) the function of the CPU scheduler is to distribute the scarce resource of the CPU to the different processes fairly (according to some
definition of fairness) and in such a way that
optimizes some performance criteria. In general, these assumptions are starting to break down. First of all, CPUs are not really that scarce: almost every user has access to more than one, and soon users will be able to afford more. Second, many applications are starting to be structured to use multiple cooperating processes. So a view of the scheduler as mediating between competing entities may be partially obsolete. Burst time, or service time, is the time needed by a process to finish. Possible process states in a multiprogramming environment are: (1) Running - the process is consuming the CPU, (2) Ready - ready to run, but not actually running on the CPU, (3) Waiting - waiting for some resource or an event. To evaluate a scheduling algorithm there are many possible criteria: (1) CPU utilization - keep CPU utilization as high as possible, (2) Throughput - complete as many processes as possible in every unit of time, (3) Turnaround time - the total time that a process spends in the system, (4) Waiting time - the total time a process spends in the ready state. Brief introductions to some common scheduling algorithms follow [1, 2, 5, 6].

First-Come First-Service (FCFS): There is only one ready queue for the whole system and the operating system runs the process at the front of the queue. New processes entering the system must wait at the end of the queue. A process does not give up the CPU until it either terminates or needs some service, like I/O, which is provided by a peripheral processor.

Shortest-Job-First (SJF): There is only one ordered ready queue for the whole system. Processes are given CPU time based on their total CPU requirement: a process with a lower CPU requirement is positioned closer to the front of the queue, and the operating system takes the process from the front of the queue. New processes are inserted into the queue in such a way as to retain the queue ordering. A process does not give up the CPU until it either terminates or needs some service, like I/O, which is provided by a peripheral processor. This algorithm can eliminate some of the variance in waiting and turnaround time; in fact, it is optimal with respect to average waiting time. If the jobs are short, the algorithm's performance is very good; however, if a job is long, the system will hold off on running the process. Users should therefore give reasonably good estimates of their overall CPU requirement.
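To make the later comparison concrete, a plain Round Robin scheduler with a fixed quantum (described in the following subsection) can be simulated in a few lines, computing the same metrics used in the tables below. The sketch is illustrative only; the process tuples and the quantum value are placeholders, not the paper's test data.

# Minimal Round Robin simulation (illustrative sketch, not the paper's code).
# Each process: (burst, priority, arrival). Returns per-process metrics.
from collections import deque

def round_robin(processes, quantum):
    remaining = [b for b, _, _ in processes]
    arrival = [a for _, _, a in processes]
    finish = [0] * len(processes)
    ready, time, arrived = deque(), 0, [False] * len(processes)

    def admit(now):
        for i, a in enumerate(arrival):
            if not arrived[i] and a <= now:
                arrived[i] = True
                ready.append(i)

    admit(time)
    while any(r > 0 for r in remaining):
        if not ready:                      # CPU idle until the next arrival
            time = min(a for i, a in enumerate(arrival)
                       if remaining[i] > 0 and not arrived[i])
            admit(time)
            continue
        i = ready.popleft()
        run = min(quantum, remaining[i])
        time += run
        remaining[i] -= run
        admit(time)                        # processes that arrived meanwhile
        if remaining[i] > 0:
            ready.append(i)                # back to the end of the queue
        else:
            finish[i] = time

    stats = []
    for idx, (burst, priority, arr) in enumerate(processes):
        turnaround = finish[idx] - arr
        wait = turnaround - burst
        stats.append({"wait": wait, "turnaround": turnaround,
                      "normal_tt": turnaround / burst,   # Normal TT = Turnaround / Burst
                      "wait_prior": wait * priority})    # Wait-Prior = Wait * priority
    return stats

# Example usage with made-up processes and a quantum of 10 time units:
print(round_robin([(7, 5, 0), (24, 4, 3), (13, 3, 5)], quantum=10))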
Multilevel Feedback Queue Scheduling: Like multilevel scheduling, except that processes can move between queues as their priority changes. It can be used to give I/O-bound and interactive processes CPU priority over CPU-bound processes, and it can also prevent starvation by increasing the priority of processes that have been idle for a long time.

Round Robin (RR): Similar to the First-Come First-Service algorithm, but allows preemption and partial CPU assignment. A CPU quantum time, or time slice, is first agreed upon, and the algorithm allows the first process in the queue to run for at most the quantum time. Implementing Round Robin requires timer interrupts: to schedule a process, the algorithm sets the timer to go off when the quantum time expires. If the process does I/O before the timer goes off, the system performs a context switch, i.e. it saves the state of the running process, loads the state of the next process, and runs the next process. If instead the process's quantum time expires, the system also does a context switch and runs the next process. This algorithm gives good response time to short requests, but can give bad waiting time.

In RRCS we deal with some key parameters such as priority, quantum time and burst time. One especially straightforward method is to describe these parameters through fuzzy sets based on linguistic variables such as low, medium and high. In Section II a Round Robin based hybrid algorithm is presented; in Section III the new algorithm is implemented and tested and its performance is compared with the performance of the above-mentioned algorithms; Section IV summarizes the paper.

II. Round Robin based hybrid algorithm

We now introduce a new CPU scheduling mechanism that improves the performance of the Round Robin algorithm by decreasing the waiting time and normal turnaround time. The algorithm makes use of three priority queues and three parameters to determine which process is assigned to which queue. The first parameter is the burst time of the process: a short burst time gives high priority and vice versa. The second parameter is the process's user-defined priority. The last parameter is the "ratio of remaining time to burst time" of a process, which we call the RS time and use in the remainder of the paper; RS is high when the process priority is low and vice versa. The algorithm allocates in every round four quantum times to each process in the highest priority
queue (Q1), two quantum times to each process in the medium priority queue (Q2), and one quantum time to each process in the lowest priority queue (Q3). This means that the scheduler circulates among the queues after their quantum times: it first services a process from the first queue, if available, then a process from the second queue, and then a process from the third queue. This cycle repeats until there is no process left in the queues. With respect to each individual queue the scheduler works like Round Robin. If a process is not completed after its quantum time, the scheduler uses our algorithm to determine the queue into which the process must be inserted; the process is then inserted at the end of that queue. Recall that Q1 has the highest and Q3 the lowest priority. The following fuzzy triangular membership functions are used to evaluate the membership degree of a process in the queues. The fuzzy membership functions used in this algorithm are shown in Fig. 1. As shown in this figure, each of these membership functions has three fuzzy sets [3, 4]: low, medium and high, and each membership function has its own range.
Fig. 1. Membership Functions

Fuzzy If/Then rules, like those listed below, are used to make the decision on the selection of a queue for a process (a sketch of this rule evaluation is given after the list):
1. IF Burst Time is low and priority is high and RS time is medium THEN insert process in Q1
2. IF Burst Time is low and priority is medium and RS time is medium THEN insert process in Q2
3. IF Burst Time is high and priority is low and RS time is medium THEN insert process in Q3
4. IF Burst Time is medium and priority is medium and RS time is medium THEN insert process in Q2
5. IF Burst Time is medium and priority is low and RS time is low THEN insert process in Q2
6. IF Size is medium and priority is high and RS time is low THEN insert process in Q1
7. IF Burst Time is high and priority is high and RS time is high THEN insert process in Q1
8. ...
So there is a rule base with 3*9 rules, which are shown in Table 1. The fuzzy engine uses these rules to make the required decision.
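The triangular memberships of Fig. 1 and a handful of the rules above can be evaluated as in the following Python sketch. It is only an illustration of the mechanism: the break points of the membership functions and the subset of rules shown are placeholders, not the exact values of the paper's rule base.

# Illustrative fuzzy queue selection (sketch; membership break points are placeholders).

def tri(x, a, b, c):
    """Triangular membership: rises from a to b, falls from b to c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(x, lo, mid, hi):
    """Degrees of 'low', 'medium', 'high' for value x given three triangles."""
    return {"low": tri(x, *lo), "medium": tri(x, *mid), "high": tri(x, *hi)}

# Example rules in the form (burst term, priority term, RS term) -> queue.
RULES = [
    (("low", "high", "medium"), "Q1"),
    (("low", "medium", "medium"), "Q2"),
    (("high", "low", "medium"), "Q3"),
    (("medium", "medium", "medium"), "Q2"),
    (("high", "high", "high"), "Q1"),
]

def select_queue(burst, priority, rs):
    b = fuzzify(burst, (-1, 0, 60), (20, 60, 100), (60, 130, 200))
    p = fuzzify(priority, (-1, 0, 5), (2, 5, 8), (5, 10, 12))
    r = fuzzify(rs, (-0.1, 0.0, 0.5), (0.2, 0.5, 0.8), (0.5, 1.0, 1.2))
    best_queue, best_degree = "Q2", 0.0            # default if no rule fires
    for (tb, tp, tr), queue in RULES:
        degree = min(b[tb], p[tp], r[tr])          # Mamdani min for the AND of the premises
        if degree > best_degree:
            best_queue, best_degree = queue, degree
    return best_queue

print(select_queue(burst=15, priority=9, rs=0.5))  # short, high-priority job -> Q1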
The structure of our algorithm is shown in Fig. 2. The figure shows the arrival of processes and their categorization into three classes; the scheduler selects a queue and assigns the process to that queue based on its evaluation. Two metrics are used later for comparison:
Wait-Prior = Waiting Time * priority
Normal TT = Turnaround Time / Burst Time

Fig. 2. Optimized RR CPU Scheduler
Table I. Rule-Base table
III. Test and Comparison
We have tested the algorithm and compared the The first parameter (Wait-Prior) simultaneously results with FCFS, RR, and SJF scheduling considers Waiting Time and priority. A process with algorithms. The results are shown in tables. Table 2, high priority should wait less. The parameters which 4, 6 and 8 refers to the data and result of running are considered are depicted at the bottom of the them. Table 3, 5, 7 and 9 refers to the minimum, tables. average (mean) and the maximum of the results for Table 2 shows the result of running FCFS algorithm each algorithm. In these tables, we have introduced on entrance data. “Wait-Prior” and “Normal Turnaround Time” metric which are calculated by these formulas: Table II. First Come First Service algorithm Burst
Priority
Arrival
Start
Finish
Wait
Normal TT
Turnaround
wait-prior
7
5
70
70
76
0
1
7
0
24
4
140
140
163
0
1
24
0
37
8
235
235
271
0
1
37
0
13
3
260
272
284
12
1.92
25
36
3
0
305
305
307
0
1
3
0
56
6
360
360
415
0
1
56
0
25
1
455
455
479
0
1
25
0
20
1
465
480
499
15
1.75
35
15
15
2
510
510
524
0
1
15
0
17
5
570
570
586
0
1
17
0
29
7
585
587
615
2
1.07
31
14
6
0
655
655
660
0
1
6
0
14
1
675
675
688
0
1
14
0
26
0
715
715
740
0
1
26
0
48
9
740
741
788
1
1.02
49
9
8
2
755
789
796
34
5.25
42
68
28
2
775
797
824
22
1.79
50
44
59
0
790
825
883
35
1.59
94
0
33
9
840
884
916
44
2.33
77
396
96
0
925
925
1020
0
1
96
0
28
0
1000
1021
1048
21
1.75
49
0
65
4
1050
1050
1114
0
1
65
0
37
2
1050
1115
1151
65
2.76
102
130
99
8
1055
1152
1250
97
1.98
196
776
22
7
1095
1251
1272
156
8.09
178
1092
351
FUZZY ROUND ROBIN CPU SCHEDULING (FRRCS) ALGORITHM
Table III. Important results used for comparison Wait
Normal TT
Turnaround
wait-prior-avg
Min
0
1
3
0
Mean
20.16
1.77
52.76
30
Max
156
8.09
196
1092
In Table 4 you can see the execution of the same
processes by applying the RR algorithm.
Table IV. Round Robin algorithm Burst
Priority
7
5
24
4
37
8
13 3
Arrival
Start
Finish
Wait
Normal TT
Turnaround
wait-prior
70
70
76
0
1
7
0
140
140
163
0
1
24
0
235
235
281
10
1.27
47
80
3
260
265
284
12
1.92
25
36
0
305
305
307
0
1
3
0
56
6
360
360
415
0
1
56
0
25
1
455
455
499
20
1.80
45
20
20
1
465
465
494
10
1.50
30
10
15
2
510
510
524
0
1
15
0
17
5
570
570
586
0
1
17
0
29
7
585
587
615
2
1.07
31
14
6
0
655
655
660
0
1
6
0
14
1
675
675
688
0
1
14
0
26
0
715
715
740
0
1
26
0
48
9
740
741
826
39
1.81
87
351
8
2
755
761
768
6
1.75
14
12
28
2
775
779
834
32
2.14
60
64
59
0
790
809
916
68
2.15
127
0
33
9
840
845
907
35
2.06
68
315
96
0
925
925
1040
20
1.21
116
0
28
0
1000
1005
1048
21
1.75
49
0
65
4
1050
1050
1246
132
3.03
197
528
37
2
1050
1060
1176
90
3.43
127
180
99
8
1055
1070
1345
192
2.94
291
1536
22
7
1095
1110
1188
72
4.27
94
504
Table V. Important results used for comparison Min Mean Max
Wait 0 30.44 192
Normal TT 1 1.72 4.27
Turnaround 3 63.04 291
Wait-prior-avg 0 42.44 1536
352
ZAHEDI ET AL.
Table 6 shows the execution of the processes
calculated by applying SJF algorithm.
Table VI Shortest Job First algorithm Burst
Priority
Arrival
Start
Finish
Wait
Normal TT
Turnaround
wait-prior
7
5
70
70
76
0
1
7
0
24
4
140
140
163
0
1
24
0
37
8
235
235
271
0
1
37
0
13
3
260
272
284
12
1.92
25
36
3
0
305
305
307
0
1
3
0
56
6
360
360
415
0
1
56
0
25
1
455
455
479
0
1
25
0
20
1
465
480
499
15
1.75
35
15
15
2
510
510
524
0
1
15
0
17
5
570
570
586
0
1
17
0
29
7
585
587
615
2
1.07
31
14
6
0
655
655
660
0
1
6
0
14
1
675
675
688
0
1
14
0
26
0
715
715
740
0
1
26
0
48
9
740
741
788
1
1.02
49
9
8
2
755
789
796
34
5.25
42
68
28
2
775
797
824
22
1.79
50
44
59
0
790
825
883
35
1.59
94
0
33
9
840
884
916
44
2.33
77
396
96
0
925
925
1020
0
1
96
0
28
0
1000
1021
1048
21
1.75
49
0
65
4
1050
1087
1151
37
1.57
102
148
37
2
1050
1050
1086
0
1
37
0
99
8
1055
1187
1285
132
2.33
231
1056
22
7
1095
1152
1173
57
3.59
79
399
Table VII. Important results used for comparison Wait
Normal TT
Turnaround
wait-prior-avg
Min
0
1
3
0
Mean
16.48
1.56
49.08
25.41
Max
132
5.25
231
1056
Table8 shows an execution of our algorithm
as was discussed in previous.
353
FUZZY ROUND ROBIN CPU SCHEDULING (FRRCS) ALGORITHM
Table VIII. Fuzzy Round Robin Algorithm Burst
Priority
Arrival
Start
Finish
Wait
Normal TT
Turnaround
wait-prior
7
5
70
70
76
0
1
7
0
24
4
140
140
163
0
1
24
0
37
8
235
235
275
4
1.11
41
32
13
3
260
264
284
12
1.92
25
36
3
0
305
305
307
0
1
3
0
56
6
360
360
415
0
1
56
0
25
1
455
455
487
8
1.32
33
8
20
1
465
467
500
15
1.75
35
15
15
2
510
510
524
0
1
15
0
17
5
570
570
576
0
1
17
0
29
7
585
587
615
2
1.07
31
14
6
0
655
655
660
0
1
6
0
14
1
675
675
688
0
1
14
0
26
0
715
715
740
0
1
26
0
48
9
740
741
807
20
1.42
68
180
8
2
755
755
765
3
1.38
11
6
28
2
775
777
830
28
2
56
56
59
0
790
797
915
67
2.14
126
0
33
9
840
841
909
37
2.12
70
333
96
0
925
925
1028
8
1.08
104
0
28
0
1000
1006
1044
17
1.61
45
0
65
4
1050
1050
1180
65
2
130
260
37
2
1050
1052
1194
108
3.92
145
216
99
8
1055
1058
1274
121
2.22
220
968
22
7
1095
1098
1140
24
2.09
46
168
Table IX. Important results used for comparison Min Mean Max
Wait 0 21.56 121
Normal TT 1 1.53 3.92
The above results show that our algorithm has the best Normal Turnaround Time (Normal TT) and the Wait-Prior-Avg in our algorithm perform better than FCFS and RR. As tables 3, 5, 7 and 9 show our algorithm, compared with RR algorithm, has a good improvement. We can see that Mean of Turnaround Time has decreased about 14%, Mean of Normal Turnaround Time has decreased about 11% and Mean of wait-prior-avg has decreased about 37%. So the improvement is high because of the less Normal Turnaround Time and the less wait-prior-avg. This algorithm also shows some high improvements in comparison with the other algorithms. IV. Conclusion The aim of this paper is to improve RRCS algorithm. We combine MFQ and RRCS and concept of fuzzy rule based system to obtain a new algorithm called fuzzy rule based round robin CPU scheduling (FRRCS).Then by using a sample data we implement theses algorithms manually. Then we compared the results and it is shown that the suggested algorithm
Turnaround 3 54.16 220
wait-prior-avg 0 26.65 968
has better Turnaround and Normal Turnaround time than RR algorithm. This Algorithm also reduces the Waiting time. The main advantage is that the algorithm facilitates using interactive Systems. References [1] Andrew S. Tanenbaum .Modern operating systems. Prentice Hall, New Jersey, 1992. [2] A. Silberschatz and P. B. Galvin.Operating system concepts, Addison-Wesley Publishing Company, New York, 1994. [3]J.F. Baldwin,” Fuzzy logic and fuzzy reasoning “in Fuzzy Reasoning and Its Applications, E.H.Mamdani and B.R. Gaines (eds.), London: Academic Press, 1981. [4] L.A. Zadeh, Fuzzy Sets, Information and Control, 1965. [5] M. Naghibzadeh, Operating System Concepts and Techniques, Published by: iUniverse, Dec. 2005. [6] Tanenbaum, A. Modern Operating Systems (Prentice-Hall International, Inc.), 1992.
Fuzzy Expert System In Determining Hadith1 Validity M.Ghazizadeh, M.H.Zahedi, M.Kahani and B.Minaei Bidgoli
ABSTRACT-
There is a theoretical framework in the Islamic science that helps us to distinguish the valid Hadith from invalid one, named “Hadith science”. In addition to Hadith Science, “Rejal science” that concentrates upon the examination of the characters of those who narrated the Hadith .These sciences together can contribute to prove the validity of Hadith. The main objective of this paper is to determine the rate of validity of a Hadith through a fuzzy system with respect to some parameters. According to view point of expert, the data knowledge base has been designed and the essential rules have been extracted. Then the system was implemented by the usage of expert system software’s. After that the samples taken from” KAFI1 “ volume 1 were inserted into the data base to be assessed by means of documentary2 information. The results deduced from our designed expert system were compared with expert view points. The comparison shows that our system was correct in 94% cases. Index Terms—Expert Systems, Fuzzy sets, Rule base System, Inference Engine, Rejal Science and Hadith Science.
I. INTRODUCTION Fuzzy logic is a vast theory consisting fuzzy sets, fuzzy logic, fuzzy measuring and extra. As it is applied at fuzzy logic, fuzzy is implicating some different ambiguity and uncertainty cases. Especially referring to the ambiguity of expressive language and people’s thoughts, and it is different with uncertainty which is being expressed by a theory. Prof. Lotfi Zadeh officially presented an article “Fuzzy sets” at 1965 A.C. Fuzzy logic reached to an inclination point in 1974[2]. Fuzzy was applied to “controlling” by Ebrahim Mamdani for the first time. The international society of fuzzy systems as the first scientific organization was established Faculty of Engineering, Mashad Ferdowsi University Mashad, Iran. [email protected], [email protected] . Faculty of Computer Engineering, Iran University of Science & Technology, Tehran, [email protected] 1 KAFI is one of the most reliable reference books of Hadith in Islam. 2 Document includes a set of narrators that have been arranged in hierarchy.
for fuzzy logic theorists and executors in 1984[5-6]. The first and the most successful function of fuzzy system are “controlling”. Fuzzy systems are based on rules or knowledge. Computing the rules is the first step to design a fuzzy system. The next step is combination of these rules. Some of the processes of fuzzy sets are: complementary, intersection (minimum) and union (maximum).There are different methods to evaluate a rule; one of them is Mamdani inference following [2]: μ QMM ( X , Y ) = min [μ f p 1 ( x ), μ f p 2 ( y )]
μ QMp
(X
,Y
)=
[μ f (x ) × μ f ( y )] p1
p2
Generalized Modes ponnes deduction might be used as following: Imagine the fuzzy set “ A′ ” and fuzzy relation “A → B” At U*V, A fuzzy set
B ′ at V is as follow:
μ B ′ ( y ) = sup t [μ A′( X , μ A → B ( X , Y ))] Fuzzy rules bases: fuzzy rules set will be if → then When the rule “I” is presented as follow: RI: If X1 is Ai1 and X2 is Ai2 ….and Xr is Air then y is Bi Fuzzy inference engine: principles of fuzzy logic are applied for rules’ combinations. There are 2 different methods for Inference: 1) Inference based on rules’ combinations 2) Inference based on singular rules. At second method, each rule has got a fuzzy result and the final result is combination of those fuzzy results. Combination is done by union or intersection. Inference engine has got various kinds. One of them is multiplication inference engine which is computed as follow:
μ B ′ (Y ) = max {∏ r K =1 μ Aik (x ′k ).μ Bi ( y )} i = 10000 nR
Hadithology is what we are going to discuss about at 2nd chapter. We are also going to talk a bout the criterion and proofs which are confirming the credibility of a Hadith. We’ll be having a brief look at sets and components of fuzzy controlling systems at fuzzy systems which are determining the validity of a Hadith [1]. One of the attractive issues at both Shia and Sonni Hadith is paying attention to the sources. Hadith narrators have done their best to know the roots of Hadith meanwhile they are gathering them. One of the branches of Hadith science which is being called “Rejal science” is allocated to this issue and some important books have been written to introduce Hadith narrators [1, 7]. Some people decided to manage some rules and principles to distinguish a true Hadith from a wrong one and the reliable narrator from the liar one because of the wrong Hadith which had been added to the true ones. Rejal science is considering different aspects of narrator’s personality and that’s what has got an effect at acceptance or rejections of a Hadith. Rejal science has been attractive for the Moslems since the first days of Islam’s appearance and little by little it has been developed. Rejal science has been completed technically since the time Hadith has been widespread [9]. Background of Rejal science is returned to the half of the first century; About 40A.H. Abid Allah ( Abi Rafe`son ) Who was Imam Ali`s writer ( peace be upon Him ) Started writing the names of those who were long a long with Imam at Jamal , Safain and Nahrovan wars ( Ansar ) and those who weren’t(Mohajerin ) at a book . Sheikh Tosi has reminded the name of the book as ….. “Tasmiye of Martyrs long a long with Imam Ali at wars: safain, Nahrovan and Jamaln” At Shiite main and the first base of Rejal science is Implicating Imams` and prophets` praises and reproaches to ward some of their followers. There are so many different Hadith and Hadith books in which Shia’s Imams and other honest ones have proved or
denied some people because of their views or deeds. Therefore we may take a result the first bases of Rejal science at Shiite innocent Imam’s Hadith [9]. II. Hadith approved or Disapproval There are 2 disprovement
points
at
Provment
or
1- Provment and disprovement of each narrators 2- continuous or discrete chain of Hadith 2.1 Different Grades of approved of Narrators • Most reliable narrators • More reliable narrators • Best reliable narrators • reliable • ....... • Weak • Very weak 2.2 Different Grades of Disapproval of Narrators These words like lewd, liar and etc are the words which disapprove a narrator. Another parameter is the religion of the narrator. The last parameter is the number of narrations recited by the narrators. 2.3 Continuous and Discrete of a Hadith Hadith is being divided into 2 groups from the point of being continuous or discrete Motasel: there is no time interval between the narrators Moghuf (absolute Moghuf): Being narrated from Imam’s follower. Directly or indirectly and has referred him by a pronoun and source of Hadith is not clear. Morsal: Hadith has been narrated by the one who have not been during Imam’s life. There are also some other cases which are not as important as others that we are not going to discuss about them at this article. III. Designing and Implementing Processes
As it was discussed at chapter 2, there are two parameters to determine the validity of a Hadith:
356
GHAZIZADEH ET AL.
1) Hadith narrator `s honesty and Reliability 2) Continuous and discrete of a Hadith
000 001 002 003 004
Table I Term Names of “Continuity” Term Name Mosnad Moghuf Morsal Abi_omair General
Fig. 1. MBF of “Religion”
Table II Definition Points of MBF “Background” Term Name Shape/Par. Definition Points (x, y) Mohmal Linear (0, 0) (60, 0) (100, 1) (150, 0) (250, 0) Zaeif Linear (0, 1) (60, 1) (100, 0) (250, 0) Mamdoh Linear (0, 0) (100, 0) (150, 1) (195, 0) (250, 0) Moatabar Linear (0, 0) (150, 0) (195, 1) (240, 0) (250, 0) MOVASAGH Linear (0, 0) (195, 0) (240, 1) (250, 1) Table III Definition Points of MBF “Narration_Num” Term Name Low
Shape/Par.
Definition Points (x, y)
Linear
(0, 1) (18.572, 1) (60, 0) (130, 0) (0, 0) (20, 0) (60, 1) (100, 0) (130, 0) (0, 0) (60, 0) (111.426, 1) (130, 1)
Medium
Linear
High
Linear
Table IV Definition Points of MBF “Religion” Term Name CF_L
Shape/Par. Linear
CF_M
Linear
CF_H
Linear
Definition Points (x, y) (0, 1) (100, 0) (0, 0) (75, 0) (0, 0) (100, 1)
(25, 1)
(50, 0)
(25, 0) (100, 0) (50, 0)
(50, 1) (75, 1)
Table V Term Names of “Hadith_Validity” Term Name Unknown Weak Goodness Reliable Right
Fig. 2. MBF of “Background”
Fig. 3. MBF of “Narration_Num”
As we know a Hadith has a document. If a narrator in the hierarchy was disapproved the document would automatically be rejected. Experts grant the qualified degree to narrators. As it was discussed at chapter 1 for creating fuzzy rule base system, the first and important step is to establish a rule’s base, therefore we need some rules plus expert’s views to make inference. Two inference engines have done our work and get the result the output of the first engine is the Rank of each narrator and it will be an input for the second inference engine. The output of the second engine is Hadith validation rate. We divide our system in rules in two blokes .First rule blokes focus on the personal characteristics of narrator and the second rule blocks concentrate on Hadith. Table VII Variables of Group “Outputs”
Table VI Term Names of “Narrator_Charact”
#
Variable Name
1
Hadith_Validity
26
Narrator_Charact
Term Name very_low low medium_low medium medium_high high very_high
Type
Term Names unknown Weak Goodness Reliable Right very_low low medium_low medium medium_high high very_high
357
FUZZY EXPERT SYSTEM IN DETERMINING HADITH VALIDITY
Narrators have been given 5 degrees based on their background and 3 degrees based on their religion. The combination of background and religion degree gives 15 statuses. Table VII. Rule Block “RB1” Aggregation: Result Aggregation: Number of Inputs: Number of Outputs: Number of Rules:
MINMAX MAX 2 1 13
Table IX Rules of the Rule Block “RB1” IF Background MOVASAGH MOVASAGH MOVASAGH Moatabar Moatabar Moatabar Mamdoh Mamdoh Mamdoh Mohmal Mohmal Mohmal Zaeif
Religion CF_H CF_M CF_L CF_H CF_M CF_L CF_H CF_M CF_L CF_H CF_M CF_L
THEN DoS Narrator_Charact 1.00 very_high 0.70 very_high 0.50 very_high 0.90 High 1.00 medium_high 0.80 medium 0.90 medium_high 0.60 medium 0.90 medium_low 0.50 medium_low 0.30 Low 0.80 very_low 1.00 very_low
The second inference engine has 3 inputs, one input is the output of first engine and the others are number of narration and continuity chain of Hadith that is shown in table I. Table X Rule Block “RB2” Aggregation: Result Aggregation: Number of Inputs: Number of Outputs: Number of Rules:
MINMAX MAX 3 1 36
The second step is fuzzification of the variables: a triangular fuzzifier is used for the eligibility degree and a singleton fuzzifier is used for the time interval. In the third step the inference engine is applied; the inference engine processes the input data by means of the rule base to produce a suitable output. A multiplication (product) inference engine is applied to simplify the computation. Therefore the following methods are used [3]: A) inference based on individual rules combined by union; B) Mamdani product inference; C) algebraic product for fuzzy intersection; D) maximum for fuzzy union, which is computed as follows:

μ_B′(y) = max_{i=1,…,nR} { ∏_{k=1}^{r} μ_{A_k^i}(x′_k) · μ_{B^i}(y) }
Given a fuzzy set A′ on U, the above formula (the product inference engine) yields a fuzzy set B′ on V.

Table XI Rules of the Rule Block "RB2"
IF Continuity | Narration_Num | Narrator_Charact | DoS  | THEN Hadith_Validity
Mosnad    | High   | very_high   | 1.00 | Right
Mosnad    | Medium | very_high   | 0.90 | Right
Mosnad    | Low    | very_high   | 0.80 | Right
General   | -      | -           | 1.00 | Weak
Moghuf    | -      | -           | 1.00 | Unknown
Abi_omair | High   | very_high   | 0.90 | Right
Abi_omair | Medium | very_high   | 0.80 | Right
Abi_omair | Low    | very_high   | 0.70 | Right
Mosnad    | High   | high        | 1.00 | Reliable
Mosnad    | Medium | high        | 0.90 | Reliable
Mosnad    | Low    | high        | 0.80 | Reliable
Abi_omair | High   | high        | 0.90 | Reliable
Abi_omair | Medium | high        | 0.80 | Reliable
Abi_omair | Low    | high        | 0.70 | Reliable
Mosnad    | High   | medium_high | 0.60 | Reliable
Mosnad    | Medium | medium_high | 1.00 | Goodness
Mosnad    | Low    | medium_high | 0.90 | Goodness
Abi_omair | High   | medium_high | 0.50 | Reliable
Abi_omair | Medium | medium_high | 0.90 | Goodness
Abi_omair | Low    | medium_high | 0.80 | Goodness
Mosnad    | High   | medium      | 0.70 | Goodness
Mosnad    | Medium | medium      | 0.60 | Goodness
Mosnad    | Low    | medium      | 0.50 | Goodness
Abi_omair | High   | medium      | 0.60 | Goodness
Abi_omair | Medium | medium      | 0.50 | Goodness
Abi_omair | Low    | medium      | 0.40 | Goodness
Mosnad    | High   | medium_low  | 0.60 | Goodness
Mosnad    | Medium | medium_low  | 1.00 | Weak
Mosnad    | Low    | medium_low  | 1.00 | Weak
Abi_omair | High   | medium_low  | 0.50 | Goodness
Abi_omair | Medium | medium_low  | 1.00 | Weak
Abi_omair | Low    | medium_low  | 1.00 | Weak
Mosnad    | -      | low         | 0.60 | Weak
Mosnad    | -      | very_low    | 0.90 | Weak
Abi_omair | -      | low         | 0.70 | Weak
Abi_omair | -      | very_low    | 1.00 | Weak
The last step is designing the defuzzifier. Based on the logic of the problem and the criteria for choosing a defuzzifier, we use the center average defuzzifier, one of the most common and widely used methods in fuzzy systems and fuzzy control.
Fig. 4. Structure of the Fuzzy Logic System
The output is an exact, definite point y′ ∈ V, computed as follows [5]:

y′ = ( ∑_{i=1}^{nR} ȳ^(i) · w^(i) ) / ( ∑_{i=1}^{nR} w^(i) )

where ȳ^(i) is the center and w^(i) the height of the output fuzzy set of the i-th rule. The resulting y′ is the final output, in other words the validity degree of the Hadith. Figures 5-7 show the effect of the parameters on the outputs graphically.
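A minimal sketch of the two computations above — the product inference of a rule's firing strength and the center average defuzzifier — is given below, assuming hypothetical centers for the output terms; it illustrates the formulas only and is not the original implementation.

# Minimal sketch of product inference + center average defuzzification.
# fire_rule() multiplies the input memberships (algebraic product);
# center_average_defuzzify() implements y' = sum(y_i * w_i) / sum(w_i).

def fire_rule(memberships, dos=1.0):
    """Firing strength w of one rule: product of input memberships times DoS."""
    w = dos
    for m in memberships:
        w *= m
    return w

def center_average_defuzzify(rules):
    """rules: list of (center y_i of the output term, firing strength w_i)."""
    num = sum(y * w for y, w in rules)
    den = sum(w for _, w in rules)
    return num / den if den > 0 else 0.0

if __name__ == "__main__":
    # Hypothetical centers of the output terms on a 0-1 validity scale.
    centers = {"unknown": 0.1, "weak": 0.3, "goodness": 0.5,
               "reliable": 0.7, "right": 0.9}
    # Two fired rules: (firing strength from the IF-part memberships, output term)
    fired = [
        (fire_rule([0.8, 0.6], dos=1.0), "reliable"),
        (fire_rule([0.4, 0.6], dos=0.9), "goodness"),
    ]
    y = center_average_defuzzify([(centers[t], w) for w, t in fired])
    print("Hadith validity degree:", round(y, 3))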
Fig. 5. Effect of Background and Religion on Narrator validity
Fig. 6. Effect of Background and Number of narration on Narrator validity
Fig. 7. Effect of Narrator validity and Continuity on Hadith validity
Example: Suppose A, B, C, D, E, F and G are 7 narrators. Following are the qualifications of these narrators:

Table XII Narrator Characteristic
# | Narrator's Name | Background | Religion CF | Number of Narrations
1 | A | Movasagh | 56 | 20
2 | B | Zaeif    | 23 | 17
3 | C | Mohmal   | 45 | 67
4 | D | Moatabar | 48 | 54
5 | E | Mohmal   | 32 | 35
6 | F | Movasagh | 56 | 76
7 | G | Moatabar | 34 | 93
Assume that A, C, E form the chain of one Hadith and B, D, F, G form the chain of a second Hadith. If the continuity of the first Hadith is Moghuf and that of the second is Mosnad (Motasel), the system determines the validity of the Hadiths as follows: 1. The validity of the first Hadith is "Unknown". 2. The validity of the second Hadith is "Very Low". We implemented this system on the first volume of the Kaffi book, which contains more than 1900 Hadiths and 4000 narrators. A simple query on the outputs gives the following result: the system's output is correct in 94% of cases, i.e., the system errs in only 6% of cases. IV. Conclusion
Fuzzy logic has great scope and can be used for different purposes, and Islamic sciences can make use of it. We can also benefit from the concepts of its logic in Islamic sciences such as Hadith science. We concluded that we can use the experts' experience and implement it in the form of IF-THEN rules. The final output is a fuzzy set, and we are able to show the
result as a crisp value by using the defuzzifier. As mentioned before, the system accuracy is very high, and we can claim that if all the rules are entered into the system correctly and completely, the system will work without errors. Quick inference and rapid computation of the outputs are further characteristics of this system. The system is also flexible because the rules can be changed. V. Acknowledgment This work is partially supported by the Computer Research Center of Islamic Sciences (CRCIS), NOOR Co., P.O. Box 37185-3857, Qum, Iran. REFERENCES
[1] Al-Tosi, Book of Alrejal, Publishing Co. Alrazi, Qum, 1987.
[2] E. H. Mamdani, Application of fuzzy algorithms for control of simple dynamic plant, Proc. Inst. Elect. Eng., Control Sci., Vol. 121, (1974), 1585-1588.
[3] E. H. Mamdani and S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller, Int. J. Man-Mach. Stud., Vol. 7, (1975), 1-13.
[4] E. H. Mamdani, Advances in the linguistic synthesis of fuzzy controllers, Int. J. Man-Mach. Stud., Vol. 8, (1975), 669-678.
[5] L. A. Zadeh, Fuzzy Sets, Information and Control, 1965.
[6] L. A. Zadeh, The role of fuzzy logic in the management of uncertainty in expert systems, Fuzzy Sets and Systems, Vol. 11, No. 3, (1983), 199-227.
[7] M. Hasan-Ebn-Alhor Alameli, Vasael-Shia, Publishing Co. Islamic Science, Qum, 1980.
[8] Mohammad Ebn Yaghub Koleini, Book of Osole Kaffi, Publishing Co. Maktab-Islami, Tehran, 1978.
[9] S. A. Khamenei, Four Principal Books in Rejal Science, Publishing Co. Islamic Culture, Tehran, 1994.
An Investigation into the Performance of General Sorting on Graphics Processing Units Nick Pilkington and Barry Irwin Department of Computer Science Rhodes University Grahamstown, South Africa
Abstract—Sorting is a fundamental operation in computing and there is a constant need to push the boundaries of performance with different sorting algorithms. With the advent of the programmable graphics pipeline, the parallel nature of graphics processing units has been exposed, allowing programmers to take advantage of it. By transforming the way that data is represented and operated on, parallel sorting algorithms can be implemented on graphics processing units, where previously only graphics processing could be performed. This programming paradigm offers potentially large speedups for suitable algorithms.
I. INTRODUCTION
Sorting is a field of computer science that has been thoroughly researched and investigated. Canonically stated, sorting is the process by which a list of randomly ordered elements is transformed into a list ordered by some criterion. There are a number of sorting algorithms that can achieve this at different speeds. Interestingly, attempting sorting on the Graphics Processing Unit (GPU) opens up possibilities for using parallel sorting algorithms, commonly called sorting networks [6]. These are formulations of sorting algorithms that are not possible on sequential processors because of the limits of processor design. This paper presents an investigation into whether or not there is a significant performance gain, if any, in using a GPU to sort data. It asks whether general sorting is computationally viable on the GPU, what constraints there are to deal with, and whether these constraints can be overcome. This paper will first consider how general purpose computation is achieved on specialized graphics hardware. An experiment will then be presented showing how a variety of parallel sorting algorithms were tested and compared, as well as describing any problems encountered and how they were circumvented. (The authors would like to acknowledge the financial and technical support of this project of The National Research Foundation of South Africa, Telkom SA, Business Connexion, Comverse SA, Verso Technologies, Stortech, Tellabs, Amatole, Mars Technologies, Bright Ideas Projects 39 and THRIP through the Telkom Centre of Excellence at Rhodes University.)
II. THE GRAPHICS PIPELINE
The graphics pipeline is the central core of graphics processing. It can be thought of as a sequence of stages operating in parallel in a fixed order. Each stage of the
pipeline receives information from the previous stage, performs some operation on it, and then passes it on to the next stage in the pipeline. The graphics pipeline is analogous to an assembly line in a factory, where each stage builds upon the previous one [2].
III. STREAM PROGRAMMING MODEL
In contrast to sequential processing on a CPU, in the stream programming model all data is represented as a stream. A stream can be thought of as an ordered set of data of the same type. The type of data in the stream can be very simple (a stream of integers or floating-point numbers) or more complex (a stream of points or matrices). Streams can be of any length; however, efficiency is higher on longer streams. There are a number of functions that can be executed on streams: copying them, deriving substreams from them, indexing into them with a separate index stream and, finally, performing computation on them with kernels [8], [3], [11], [1]. Kernels operate on an input stream and produce a corresponding stream of output elements. The defining characteristic of kernels is that they operate on entire streams rather than on individual elements. A kernel can be thought of as the evaluation of a function on each element of an input stream; this is in many ways similar to the 'map' operation of functional programming [8], [3], [11], [1]. A kernel could perform one of several operations: expansions, where more than one element is produced from a single input; reductions, where more than one input element is combined into a single output element; or filters, where only a subset of the input elements is output. Computation on a single stream element does not depend on any of the other elements of the stream and as a result is purely a function of the input element. This restriction is very favorable, as it means that the input stream type is completely known at compile time and can be optimized as such; even more favorable is the fact that this independence implies that the order of computation of the mapping is unimportant, which ultimately means that what appears to be a serial kernel operation can actually be executed in parallel [1], [9]. Applications can be constructed by chaining together the inputs and outputs of various streams, whereby the output of
one stream becomes the input of the next one. The graphics pipeline is traditionally structured as stages each depending on the result of the immediate previous stage. This makes the graphics pipeline a good match for the stream programming model as it is analogous to the stream just described [1], [9]. Similarly vertex and fragment shader programs would take the place of kernel programs that execute on elements of the stream.
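As a small illustration of the kernel-as-map idea (plain CPU code, not shader code), the sketch below applies element-wise kernels to a stream and chains them, mirroring how kernel programs act on stream elements.

# Stream programming model in miniature: a "kernel" is a pure per-element
# function mapped over a stream; kernels can be chained so the output stream
# of one becomes the input stream of the next. Illustrative sketch only.

def run_kernel(kernel, stream):
    # The order of evaluation does not matter, which is what allows a GPU
    # to execute the map in parallel.
    return [kernel(element) for element in stream]

def scale_kernel(x):           # example kernel: scale a value
    return 2.0 * x

def clamp_kernel(x):           # example kernel: clamp to [0, 1]
    return max(0.0, min(1.0, x))

if __name__ == "__main__":
    stream = [0.1, 0.4, 0.7, 0.9]
    out = run_kernel(clamp_kernel, run_kernel(scale_kernel, stream))
    print(out)                 # [0.2, 0.8, 1.0, 1.0]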
IV. PROGRAMMABLE PARALLEL PROCESSORS
In traditional programming, the code is executed by the CPU. Most GPUs have two different types of shaders: vertex processors and fragment processors. Newer cards contain a third type of shader known as geometry processors; however, these are not ubiquitous yet. The vertex processors operate on streams of vertices. Vertices can contain varying amounts of information, but at a minimum they contain a position and colour. Other information that can be associated with them is alpha coverage and texture coordinates. The output of the vertex shader is a stream of transformed vertices. The transformation depends on what the vertex shaders actually do: a shader could just pass through all values unchanged, or it could change the colour and position of the vertex. The output vertices are used to construct triangles. The triangles are used to construct fragments, which can be thought of as precursors to pixels. Fragments contain all the information required to generate a pixel and write the corresponding values into the frame buffer. The fragment processors in the GPU run fragment programs, also known as pixel shaders, on the fragments in the input stream. The resulting output is the colour that is written to the frame buffer at the end of the graphics pipeline. Fragment and vertex shaders are the means by which the processing in the graphics pipeline can be controlled [4], [11], [7]. An in-depth discussion of general-purpose processing on graphics processing units is beyond the scope of this paper and the reader is directed to the references section for more information on this emergent field in computer science [12].
V. METHODOLOGY
In order to investigate the performance of sorting on the GPU, a number of sorting algorithms for both sequential and parallel processors were selected. The sorting algorithms were chosen to span a range of complexities and types in order to obtain a broad range of results. The algorithms selected for performance testing, along with their Big-O complexities and whether they are sequential or parallel, are detailed in Table I. Each of the algorithms was implemented and executed a number of times to gather performance data. The timing considerations are detailed more precisely in Section VI and the test configuration in Section VII. Since the investigation was general in nature, the algorithms were implemented in a canonical way and no optimizations were made for special cases or input sets. This gave unbiased statistical data that could be used for an accurate performance analysis.
VI. TIMING CONSIDERATIONS
In order to gain a better understanding of GPU sorting performance, a high-resolution timer was necessary to time fragments of code and generate the data for statistical analysis. In timing code there were a number of factors that could either be included or excluded, and consideration was required to set a uniform way in which the code was timed in order to get consistently accurate results. For this purpose, the following conventions were adhered to in timing all code: a microsecond timer was used to time the execution length of the various sorting operations; the initialization of the lists to be sorted was not timed, nor was any other computation that contributed to the overhead of the programs rather than to the algorithm in question.
VII. TESTING CONFIGURATION
In timing the implementations, each of the four different sorting implementations was run 1,000 times on data sets of sizes 32², 64², 128², 256², 512² and 1024². The average of these runtimes was then used as the time for that specific implementation and data set size. This average was computed as the standard arithmetic mean; more formally, the average time is

x̄ = (1/1000) ∑_{i=1}^{1000} x_i

where x_i are the individual test case times. All timing runs were executed on a machine with the specification detailed in Table II.

TABLE II TEST PLATFORM CONFIGURATION
CATEGORY         | DETAILS
PROCESSOR        | INTEL CORE 2 DUO (1.86 GHZ)
MEMORY           | 2048 MB DDR2 (400 MHZ)
GRAPHICS         | NVIDIA GEFORCE 7900 GT (256 MB)
GRAPHICS DRIVER  | NVIDIA DETONATOR 91.47
MAIN BOARD       | INTEL CORPORATION Q965
HARD DRIVE       | 80 GB SATA
OPERATING SYSTEM | WINDOWS XP SERVICE PACK 2
TABLE I SORTING ALGORITHMS
ALGORITHM          | COMPLEXITY | TYPE
BUBBLE SORT        | O(n²)      | SEQUENTIAL
QUICK SORT         | O(n log n) | SEQUENTIAL
TRANSITION SORT    | O(log² n)  | PARALLEL
BITONIC MERGE SORT | O(log n²)  | PARALLEL
VIII. IMPLEMENTATION This section will describe each of the four sorting algorithms detailed in table I briefly as well as describe how they were implemented on the CPU and GPU. All implementations were done in C++, with the GPU implementation done using the Cg shader language and the OpenGL graphics API.
A. CPU Implementations
1) Bubble Sort: The bubble sort is one of the simplest sorting algorithms and also one of the slowest. It operates by comparing every pair of elements and swapping them as necessary [6]. Pseudo code for the bubble sort is shown in Algorithm 1, and it was implemented in code in the same way.

Algorithm 1 Bubble Sort
def bubbleSort(A) {
  for i = 1 to length(A) do
    for j = i + 1 to length(A) do
      if (A[j] > A[i])
        swap(A[i], A[j])
      endif
    end
  end
}
2) Quick Sort: The quick sort is significantly faster than the bubble sort, and is a more frequent choice in real-life sorting applications. The quick sort employs a divide-and-conquer approach to divide a list in two and then sort each sub-list recursively. The list is divided by selecting a pivot element within the list and moving all elements that are less than the pivot into the first sub-list and all elements that are greater than the pivot into the second sub-list (equal elements can fall into either list). These two sub-lists are then sorted in the same way. Pseudo code for the quick sort is shown in Algorithm 2, and the implementation follows the pseudo code directly.
B. GPU Implementations
As discussed in Section I, parallel sorting algorithms differ from sequential sorting algorithms in that they are designed to be distributed across more than one processor. The following sections detail how the odd even transition and bitonic merge sorts operate, as well as how they were implemented on the GPU.
Algorithm 2 Quick Sort Pseudo code
def quickSort(A) {
  if length(A) <= 1
    return A
  endif
  var less, equal, greater
  select a pivot value pivot from A
  for i := 1 to length(A)
    x = A[i]
    if x < pivot then add x to less endif
    if x = pivot then add x to equal endif
    if x > pivot then add x to greater endif
  endfor
  return concatenate(quickSort(less), equal, quickSort(greater))
}

1) Odd Even Transition Sort: The odd even transition sort is based on the operation of the bubble sort described in VIII-A.1 [5]. The sort considers the element to the left of each element and compares and swaps the pair as necessary. As this is a parallel sorting algorithm there is no explicit loop; in fact the algorithm is best thought of as a network where the nodes perform some operation on the elements in question and data moves within the network until it is sorted. The parallel pseudo code for the odd even transition sort is presented in Algorithm 3.

Algorithm 3 Odd Even Transition Sort
def transitionSort() {
  repeat n times
    do in parallel
      if (element[n] > element[n+1])
        swap(element[n], element[n+1])
      endif
    end parallel
  end
}

The odd even transition sort was implemented in a single fragment shader using the render-to-texture feedback loop mechanism described in sub-section IX-A. The compare and exchange operations were performed within the fragment shader, and the shader's execution represented one pass of the algorithm. The actual data values were encoded into the red colour channel of a texture with the same dimensions as the data set size. In order to invoke the shader to execute on the data, a screen-sized quad was rendered to the screen. The resulting image in the frame buffer was then read back into the texture and re-rendered, invoking another pass of the algorithm. This process was repeated n times, resulting in the data being fully sorted. The images in figure 1 show the list of data at various stages during the execution of the algorithm. It should be noted that the data, although depicted in a two-dimensional sense, actually represents a linear sequence.
Fig. 1. Odd Even Transition Sort
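The passes of the odd even transition sort can be mimicked on the CPU; the sketch below is our own illustration of the usual odd/even phase formulation of the network, not the Cg fragment shader used in the paper.

# CPU mimic of the odd-even transposition sorting network: n passes, each
# pass compare-exchanging disjoint neighbour pairs (so one pass could run
# entirely in parallel, e.g. inside a fragment shader). Illustrative sketch.

def odd_even_transposition_sort(a):
    a = list(a)
    n = len(a)
    for p in range(n):
        start = p % 2                      # alternate even and odd pairs
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

if __name__ == "__main__":
    print(odd_even_transposition_sort([5, 1, 4, 2, 8, 0, 3]))
    # [0, 1, 2, 3, 4, 5, 8]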
2) Bitonic Merge Sort: The bitonic merge sort is another sorting network. It operates on the principle of bitonic sequences. Definition 1: A bitonic sequence is composed of two subsequences, one monotonically non-decreasing and the
other monotonically non-increasing. Bitonic sequences have two properties that are of interest in a bitonic merge sort. The first is that a bitonic sequence can be divided in half to give two sequences such that both are bitonic, and either every element in the first sequence is less than or equal to every element in the second sequence or every element in the first sequence is greater than or equal to every element in the second sequence. A sorted sequence is a bitonic sequence in which one of the comprising subsequences is empty. In order to perform this division, elements in corresponding positions in each half are compared and exchanged as necessary; this operation is sometimes called a bitonic merge [10], [6]. In order to perform a full bitonic merge sort, the initial sequence is assumed to have a length that is a power of two, which ensures that it can be continually divided in half. The first half of the sequence is sorted into ascending order while the second half is sorted into descending order; this results in a bitonic sequence. A bitonic merge is performed on this sequence to yield two bitonic sequences, each of which is sorted recursively until all the elements in the sequence are sorted. The pseudo code in Algorithm 4 presents the recursive algorithm for the bitonic merge sort.

Algorithm 4 Bitonic Merge Sort
def bitonicMergeSort(int[] A, int n) {
  perform_bitonic_merge(A, n)
  sort_bitonic(A, n/2)
  sort_bitonic(A + n/2, n/2)
}

The implementation of the bitonic merge sort was significantly more complex than the odd even transition sort. The shaders implemented the bitonic merge operation, while the sort merge function was performed implicitly by passing a uniform parameter to the shader indicating the current recursive depth. As with the odd even transition sort described in sub-section VIII-B.1, the data values of the list were encoded into the red channel of a texture and the same render-to-texture mechanism was used to perform the log2 n iterations necessary to sort the data fully. The images in figure 2 show the data being sorted at various stages during the execution of the algorithm. Once again it should be noted that the data is actually a one-dimensional sequence, not two-dimensional.
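For reference, a compact CPU sketch of the recursive bitonic merge sort described above is given below; it is our own illustration (the GPU version performs the same compare-exchange steps in a fragment shader, driven by uniform parameters) and assumes the input length is a power of two.

# Recursive bitonic merge sort (CPU sketch). Assumes len(a) is a power of two.

def bitonic_sort(a, ascending=True):
    n = len(a)
    if n <= 1:
        return list(a)
    half = n // 2
    first = bitonic_sort(a[:half], True)     # ascending half
    second = bitonic_sort(a[half:], False)   # descending half -> bitonic sequence
    return bitonic_merge(first + second, ascending)

def bitonic_merge(a, ascending):
    n = len(a)
    if n <= 1:
        return list(a)
    a = list(a)
    half = n // 2
    for i in range(half):                    # compare-exchange across the halves
        if (a[i] > a[i + half]) == ascending:
            a[i], a[i + half] = a[i + half], a[i]
    return bitonic_merge(a[:half], ascending) + bitonic_merge(a[half:], ascending)

if __name__ == "__main__":
    print(bitonic_sort([7, 3, 0, 5, 2, 6, 1, 4]))
    # [0, 1, 2, 3, 4, 5, 6, 7]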
IX. DIFFICULTIES ENCOUNTERED
There were a number of difficulties encountered in implementing sorting on the GPU. This section details various salient difficulties that presented themselves during the implementation, some of which were referenced earlier. It also provides caveats and ways in which the problems could be solved or circumvented.
A. Render to texture
Sorting is a multi-pass operation and can only be accomplished through a number of iterations. Therefore there is a need for a feedback loop where data that has been operated on is passed back to the beginning of the sorting operations and used in the next iteration. This need extends beyond sorting and applies to graphics processing in a general sense. The simplest approach to creating this feedback loop is rendering to a texture. This is achieved by rendering normally to the off-screen frame buffer, but then, instead of swapping the buffers to display the rendered result, the contents of the off-screen frame buffer are read out, copied directly back into the source texture and the buffer cleared. This means that the operation of all the shaders takes place on the texture data, and the resulting data is stored in the texture again as input for the next iteration, which can be initiated by rendering more geometry. Figure 3 illustrates this process.
B. Coordinate Wrapping
Although the texture is two-dimensional, the data it represents is still considered a linear list of elements. Therefore the element to the left of an element in the first column of the texture is the element in the last column of the previous row of the texture. Care needed to be taken to correctly wrap the texture coordinates when accessing a neighbouring element.
Fig. 3. Render to Texture Feedback Loop
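The wrapping described in sub-section B amounts to mapping a linear index to two-dimensional texel coordinates and back; the sketch below illustrates the idea (it is not the actual shader arithmetic used in the implementation).

# Mapping between a linear element index and 2D texel coordinates for a
# width x height texture, with wrap-around when stepping to the "left"
# neighbour of an element in the first column. Illustrative sketch.

def index_to_texel(i, width):
    return (i % width, i // width)           # (column, row)

def texel_to_index(col, row, width):
    return row * width + col

def left_neighbour(col, row, width, height):
    i = texel_to_index(col, row, width)
    j = (i - 1) % (width * height)           # wrap around the whole list
    return index_to_texel(j, width)

if __name__ == "__main__":
    # Left neighbour of the first element of row 3 in a 4x4 texture is the
    # last element of row 2.
    print(left_neighbour(0, 3, 4, 4))        # (3, 2)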
Fig. 2. Bitonic Merge Sort
C. Uniform Parameters
Sorting requires information to be passed into the kernel programs. In the bitonic merge sort, the step size and current recursive depth needed to be known within the execution environment of the fragment shader when performing the bitonic merge operation. These parameters are bound to the kernel program from the host C++ program and are accessible as standard parameters in the shader.
X. RESULTS
Table III presents the mean execution times of the four different sorting operations across the different data set sizes. These execution times are presented visually in figure 4. It should be noted that the times are presented on a log scale.

TABLE III MEAN EXECUTION TIMES OF SORTING ALGORITHMS (microseconds)
SIZE | BUBBLE            | QUICK      | TRANSITION | BITONIC MERGE
32   | 1,245.34          | 119.29     | 7,824.04   | 2,418.93
64   | 94,456.92         | 508.88     | 10,544.65  | 2,886.15
128  | 1,656,493.37      | 2,179.00   | 11,345.21  | 2,998.27
256  | 21,266,453.67     | 9,419.59   | 15,452.44  | 3,156.21
512  | 2,988,593,594.34  | 40,954.87  | 18,546.23  | 5,369.47
1024 | 48,651,723,523.47 | 178,791.31 | 23,681.59  | 6,022.12
TABLE IV RELATIVE SPEEDUP OF GPU SORTING ALGORITHMS TO QUICK SORT
SIZE | TRANSITION SORT | BITONIC MERGE SORT
32   | 0.02 | 0.05
64   | 0.05 | 0.18
128  | 0.18 | 0.75
256  | 0.61 | 2.98
512  | 2.21 | 7.64
1024 | 7.55 | 29.69
Fig. 4. Sorting Algorithm Execution Times
Considering sorting performance as a whole, the average execution times for the CPU quick sort and the GPU bitonic merge sort across all data set sizes are μ_QuickSort = 38,662.16 and μ_MergeSort = 3,788.34 microseconds respectively. These values can be used to compute an average relative speedup of GPU sorting over CPU sorting as μ_QuickSort / μ_MergeSort ≈ 10.23. This shows that on average the GPU performed the sorting operation 10.23 times faster than the CPU, or in other words exhibited a speedup of 1,023%. It is also interesting to investigate the performance of the two GPU sorting algorithms independently of the CPU ones. The transition sort has a complexity of O(log² n) whereas the bitonic merge sort has a complexity of O(log²(n²)). The graphs of these two complexities are shown in figure 5 and figure 6; it can be seen that the two sorting algorithms have the same asymptotic behavior.
XI. PERFORMANCE ANALYSIS
The execution times depicted in figure 4 conform accurately to the predicted complexities in Table I. There are several salient points to notice in comparing sorting on the GPU and sorting on the CPU. The CPU sorts outperform the GPU for the smaller test cases in general. The reason for this is the structure of the feedback loop: as discussed in sub-section IX-A, the feedback loop is constructed by copying the contents of the frame buffer back into the texture. This is an expensive operation and the GPU implementations pay the price on the smaller data sets. However, as the data set size increases, the parallel nature of the GPU-based algorithms offsets the cost of the render-to-texture operation and they outperform the CPU sorts. This is illustrated in Table IV, where the speedup of the transition and merge sorts is computed against the faster of the CPU sorts, the quick sort. The bubble sort is excluded from this comparison as it is infeasible for large data sets and is generally only considered in sorting because of its simplicity and not its efficiency. The actual speedups of the odd even transition sort and the bitonic merge sort are those given in Table IV.
Fig. 5. Transition Sort Asymptotic Complexity
Since the GPU is a parallel processor and the algorithms being analyzed are parallel in nature, it is important to consider the effect that the number of shader cores has on performance. It is expected that where a GPU has more shader cores the sorting algorithm distributes better and executes faster. Table V shows that this is indeed the case: with a higher number of shader cores the performance increases.
Fig. 6. Bitonic Merge Sort Asymptotic Complexity
TABLE V MEAN EXECUTION TIME OF BITONIC MERGE SORT WITH VARYING NUMBERS OF SHADER CORES
CORES | MEAN EXECUTION TIME (MICROSECONDS)
4     | 28,644.76
8     | 10,863.39
24    | 3,788.24

A. OPTIMIZATIONS
As stated in Section V, the implementations were constructed to be canonical and general in nature, with no specific optimizations made. This section details what types of optimization could be made to the GPU-based sorts to increase performance even further.
A. Feedback Mechanism
As mentioned in IX-A, the feedback loop structure introduces a large performance bottleneck, as moving data back from the frame buffer to texture memory is slow. A number of mechanisms have been developed to remove this bottleneck; one is rendering directly into a texture without the need to copy data between memory locations. This method is not standardized in OpenGL 1.3; however, ARB extensions make it possible [13].
B. Data Encoding
The implementation section detailed that the data to be sorted was encoded in the red channel of the texture, leaving the blue, green and alpha channels totally unused. If data were encoded in these channels as well, the GPU sorting algorithms could sort four times more data concurrently with little extra computation incurred.
XII. CONCLUSION
The results shown in Section X illustrate that high-performance sorting is a very viable option on GPUs. The parallel nature of GPUs provides the means to implement parallel sorting algorithms that are not possible on sequential processors. For smaller data sets there is no real benefit in sorting on the GPU; however, as the size of the data set to be sorted increases, the GPU sorting algorithms enjoy sub-linear time growth whereas the fastest CPU sorting algorithms are still O(n log n).
XIII. ACKNOWLEDGEMENTS
The authors would like to thank the Computer Science Department of Rhodes University for their support of this paper. Nick Pilkington would also like to thank his mother for her support.
REFERENCES
[1] Ian Buck. Taking the plunge into GPU computing. In GPU Gems 2, page 509, One Lake Street, Upper Saddle River, NJ, 2004. Addison Wesley.
[2] Randima Fernando and Mark Kilgard. The Cg Tutorial. Addison-Wesley Professional, 2003.
[3] Daniel Horn. Stream reduction operations for GPGPU applications. In GPU Gems 2, page 557, One Lake Street, Upper Saddle River, NJ, 2004. Addison Wesley.
[4] Emmett Kilgariff and Randima Fernando. GPU Gems 2, chapter The GeForce 6 Architecture, page 471. Addison Wesley, 2004.
[5] Peter Kipfer and Rudiger Westermann. GPU Gems 2, chapter 46, pages 733-746. Addison Wesley Professional, 2005.
[6] Donald Knuth. The Art of Computer Programming. Addison-Wesley Professional, 1998.
[7] Aaron Lefohn. GPU memory model overview. In SIGGRAPH '05: ACM SIGGRAPH 2005 Courses, page 127, New York, NY, USA, 2005. ACM Press.
[8] David Luebke, Mark Harris, Jens Krüger, Tim Purcell, Naga Govindaraju, Ian Buck, Cliff Woolley, and Aaron Lefohn. GPGPU: general purpose computation on graphics hardware. In SIGGRAPH '04: ACM SIGGRAPH 2004 Course Notes, page 33, New York, NY, USA, 2004. ACM Press.
[9] Ian Buck, Mark Harris. GPU flow-control idioms. In GPU Gems 2, page 547, One Lake Street, Upper Saddle River, NJ, 2004. Addison Wesley.
[10] Norman Matloff. Introduction to parallel sorting [unpublished]. Department of Computer Science, University of California at Davis, 2006.
[11] John Owens. Streaming architecture and technology trends. In GPU Gems 2, page 457, One Lake Street, Upper Saddle River, NJ, 2004. Addison Wesley.
[12] Nick Pilkington. An investigation into general processing on graphics processing units [unpublished]. Department of Computer Science, Rhodes University, South Africa, 2007.
[13] Chris Wynn. OpenGL render-to-texture. Technical report, NVIDIA Corporation, 2006.
An Analysis of Effort Variance in Software Maintenance Projects Nita Sarang
Mukund A Sanglikar
CMC Limited, Mumbai, India [email protected]
University Department of Computer Science, University of Mumbai, Mumbai, India [email protected]
Abstract- Quantitative project management, understanding process variations and improving overall process capability, are fundamental aspects of process improvements and are now strongly propagated by all best-practice models of process improvement. Organizations are moving to the next level of quantitative management where empirical methods are used to establish process predictability, thus enabling better project planning and management. In this paper we use empirical methods to analyze Effort Variance in software maintenance projects. The Effort Variance model established was used to identify process improvements and baseline performance. I. INTRODUCTION The development and subsequent maintenance of software is complex with a large number of factors contributing to its success. A real-world experimentation to study the dynamics of such a system is difficult and often turns out to be costly; empirical studies helped researchers and project managers in these areas. This paper focuses on the software maintenance realm. The paper discusses the difficulties in predicting the estimates associated with enhancements to software that is already in production and proposes a model to better the estimation process. Empirical techniques were applied to quantitatively manage project estimates. Project data collected over a period of 5 years from developing and maintaining large-scale applications was used to establish a process performance model for predicting Effort Variance (EV) and improving the estimation process [11].
The data for this study was provided by CMC Limited; two separate product lines within the securities domain were considered.
II. THE DATA ANALYSIS CONTEXT Release cycles for maintenance projects are determined by the business requirements and require extensive planning on the part of the software maintenance teams to ensure that committed new functionality gets out to the end-users in a timely manner. Competitive edge to organizations comes from continuously enhancing the products / applications while meeting the time-to-market considerations. The primary question under investigation is based on improving effort estimation when planning change requests to the software - what are the factors that lead to variation between planned and actual efforts? It is necessary to recognize the importance and the influence of context variables and the domain, when interpreting the results of empirical studies. Our research and associated analysis is based on field data from industrial projects in the Securities domain and also includes supporting business applications. The applications under consideration are large-scale distributed systems that have been under maintenance for over a decade; new applications and business functionality continues to be added. Along with functional enrichment, these applications have also undergone migrations for new technologies and platforms; including architecture makeover. A typical sizeable enhancement (change request or new application) goes through a complete software development life cycle before it is ready for implementation. Post warranty support (typically 3 months), the application is merged with others that are under maintenance. The drivers for such change requests are typically: • The product suite continues to grow in size as more and more functionality / applications get implemented. • The engineering technologies and practices change with time. • Technology migration of one or more application components happens over a period of time. This also includes new versions of system / layered software and databases.
• The sunset of older technologies demands redesign. • Defect fixes are a significant component of software maintenance. • Regression testing has significance.
We considered two distinct product lines [5] within the securities domain for this analysis. Both product lines have undergone maintenance, with new functionality being added to the base product, and migrations over a period of time. Also included are other change requests that continue to flow in for all applications [1]. Table 1 classifies the major functional enhancements (new applications) added to the base product over time.

TABLE 1 ENHANCEMENT HISTORY OF PRODUCT LINES
Data Source    | No. of Applications with Size < 550 UFPs | No. of Applications with Size 551 to 2000 UFPs | No. of Applications with Size > 2000 UFPs
Product Line 1 | 11 | 8 | 5
Product Line 2 | 22 | 3 | 3
Technology migrations and re-design of software for new technologies are also included as enhancements. It may be noted that there is a need to continuously get new business content (change requests) into the migrated versions. III. EMPIRICAL STUDY DESIGN CONSIDERATIONS The empirical analysis revolves around the quality improvement paradigm and was driven by a need to have better predictability of the processes implemented [3]. Various initiatives – CMMI®, ITIL, Business Excellence [18], [19] are being pursued through a sustained quality culture; data on organization and process performance is available. In the context of software maintenance, we considered the following design decisions to be significant: • The definition of the size of a maintenance task. • The norm for describing a maintenance cycle. • The prediction of efforts needed to carry out software maintenance tasks. In the software maintenance context, we do not consider the ever-growing size of the code base but rather analyze the data for the size of individual change
requests (enhancements) after appropriate segregation [2], [4], [6], [7]. The maintenance cycle starts post-warranty. That is, defect fixes in maintenance projects relate to residual or latent defects in the software; all efforts pertaining to defect fixes were categorized as rework. We selected the use of prediction models as instruments to support the estimation and analysis.
IV. DATA ANALYSIS
The data analysis revolves around establishing a performance model for Effort Variance that represents past and current process performance and can be used to predict future results of the software maintenance process; the output variable 'Y' under consideration being Effort Variance. The Goal Question Metrics (GQM) technique [9] was used to identify the top ten related input variables 'x' as the factors influencing EV in maintenance projects. Perception was used to segregate the basic system attributes from the dynamic system factors/considerations. Qualitative analysis was used to support and provide early validation to this reasoning. All qualitative considerations were adequately addressed to remove their influencing effects on the system. This was ascertained from project performance data in the project databases. Figure 1 depicts the model.
Fig. 1: Conceptual candidate model for EV — independent variables: Size, Skill Level, Process Compliance, Requirements Volatility; dependent variable: Effort Variance; moderating variables: Communication Overhead, Span of Control, Review Coverage, Rework, Schedule Slippage, Planned Effort.
Clearly, not all attributes (identified as candidate measures) are directly measurable, for example 'Skill Level' and 'Span of Control'. The challenge of measuring such attributes is to define meaningful substitute measures that can be used in practice. In order to apply statistical inference to the problem [10], hypothesis testing was used. Hypotheses were postulated around the question: What are the factors that lead to variation between planned and actual efforts?
The hypotheses were established on small data samples (fewer than 100 data points). The t-test and ANOVA test were used to test the null (H0) hypotheses on different population sets and study the behavior of the software maintenance process [8]. The statistical significance the data shows towards the hypotheses (H0) was established based on the p-value. Table 2 gives the results of the hypotheses tests.

TABLE 2 TESTS OF HYPOTHESES
Hypothesis (population groups)                                  | Test              | p-value
Process Compliance > 80, Process Compliance < 80                | Two-sample t-test | 0.964
<= 5% Rework, > 5% and <= 10% Rework, > 10% Rework              | One-way ANOVA     | 0.000
Size < 100 UFPs, Size >= 100 and <= 1000 UFPs, Size > 1000 UFPs | One-way ANOVA     | 0.000
Skill Level <= 3, Skill Level > 3                               | Two-sample t-test | 0.024
With respect to the other candidate 'x' variables identified, namely Requirements Volatility, Review Coverage and Span of Control, these did not show any correlation with Effort Variance. Following are the findings from hypotheses testing:
• Effort Variance is not affected by the level of process compliance the project demonstrates. It may be noted that no data points showed PCI < 75%; the organization had steadily moved to maturity level 4, and then level 5, of the S/W-CMM®, with CMMI® initiatives underway to drive a coherent measurement program.
• Effort Variance increases significantly with the Rework component. It may be noted that the organization had standardized on IFPUG's Enhancement Project Function Point Calculation; as a standard convention, the Unadjusted Function Point (UFP) count was used.
• Effort Variance is much smaller for sizes in the range 100 to 1000 UFPs as compared to Size < 100 UFPs or Size > 1000 UFPs. Further analysis of the correlation of Effort Variance with Size indicated a range of data values (Size < 100 UFPs) that do not show correlation; these points had to be removed from the data set used for modeling.
• Effort Variance is dependent on the overall Skill Level of the team; a Skill Level of 3 was the threshold.
The hypotheses tests for Effort Variance identified a need for multiple linear regression analysis to quantify the relationship between EV and the identified dependent measures – Rework, Size and Skill Level. Table 3 summarizes the statistically significant relationships with respect to our Effort Variance analysis.

TABLE 3 SIGNIFICANT ATTRIBUTE RELATIONSHIPS
Candidate Variables              | r value | p-value
Size (+)                         | 0.944   | 0.000
Skill Level on 1↑ – 5↓ scale (+) | 0.999   | 0.000
Review Coverage (-)              | 0.132   | 0.399
The following are the definitions for the measures used in the data analysis. Effort Variance (original) = (actual Effort – original Effort) / original Effort × 100, where Effort may be measured in person-days or person-months. Rework is the Effort in person-months expended in defect correction. Skill Level relates to the overall competency of the team and is determined as a skill score index. It is computed as an average of the skill scores of the project team members, where the skill score for an individual is a weighted average of various skill competency levels, the weights being determined by the role of the individual in the team. Skill competency across 5 areas is considered – management, technology, domain, process and tools.
V. DESIGNING THE PREDICTION MODEL AND ACCURACY MEASURES
A number of different prediction models have been previously used and analyzed for their accuracy. These include models developed using regression analysis, neural networks and pattern recognition techniques. The linear multiple regression model has been found to be fairly accurate and is commonly used for its simplicity; it is preferred over the nonlinear
multiple regression model types when not significantly less accurate. Accuracy comparisons based on the Magnitude of Relative Error, MRE = |actual effort – predicted effort| / actual effort, and on PRED give an indication of the overall fit for a set of data points and provide a meaningful analysis when determining the accuracy that the model provides. In our analysis, we used the multiple linear regression model type minimizing (actual effort – predicted effort)², along with the measures Mean Magnitude of Relative Error (MMRE), Median Magnitude of Relative Error (MdMRE) and appropriate variants of PRED. We chose PRED based on our observations of the effort data from maintenance projects. More than 50% of the tasks in maintenance projects required less than three maintenance days (effort). For such tasks, the software maintenance project manager could tolerate a difference in effort of up to 0.5 maintenance days, and not more, even though the MRE was much higher. Thus, PRED(0.25, 0.5) was used: PRED(0.25, 0.5) = % of tasks with MRE <= 0.25 OR |actual effort – predicted effort| <= 0.5 maintenance days. We chose n-fold cross-validation to calibrate the data model to all 'n' observations, thereby ensuring that the estimators of prediction error are unbiased and can closely approximate the real-world situation.
VI. THE PREDICTION MODEL AND ITS VALIDITY
Linear regression analysis was used to determine the best-fit plane (or hyperplane) that satisfies the principle of least squares [12] while modeling the relationship between the 'x's and 'Y'. A step-wise regression approach was adopted to determine an acceptable fit with as few variables as possible.
Iteration 1: Multiple regression analysis considering all statistically significant relationships:
EV = -10.5 + 82.1·Rework + 0.0752·Size - 46.6·Skill Level    (M1)
It was observed that a set of 12 data points with Size < 100 UFPs was skewing the results. A causal analysis of the data indicated a need to revise the estimation methodology for small change requests (tasks). An estimation methodology based on the number of data store updates, server process updates and transactional complexity was established and validated through a new data (change request) set. In an attempt to generalize the model for Effort Variance, the data points in the original data set with Size < 100 UFPs (12 tasks) were retrofit into model M3. With no Size measure, a
combination of factors, i.e., any model based on Skill Level, did not give an acceptable goodness of fit (the best being R-Sq = 37%). Thus, no generalizations were possible, though the predictability of EV using the new effort estimation methodology increased to 94.34% for small change requests. The established model was therefore recalibrated minus the 12 tasks.
Iteration 2: Calibration of the data model for Size:
EV = -1.74 + 2.85·Rework + 0.003895·Size + 9.94·Skill Level    (M2)
The variables Rework and Skill Level were found to be collinear; we had to eliminate one from the prediction model. Considering that Rework is a derived measure, we preferred to use the base measure Skill Level.
Iteration 3: Optimization based on collinearity checks:
EV = -1.64 + 0.003895·Size + 9.96·Skill Level    (M3)
An accuracy analysis of multiple regression models minimizing different cost functions, in a similar context [13], indicates that log-linear multiple regression models are the most accurate. Accordingly, we also considered prediction models based on log-linear multiple regression.
Iteration 4: Log-linear multiple regression including all the variables and minimizing [log(actual effort / predicted effort)]²:
log(EV) = 0.402 – 0.000577·Size·log(Rework) - 0.00039·Size·log(Skill Level) + 1.48·Rework·log(Size)    (M4)
Successive iterations were based on optimizing the variables by analyzing the goodness of fit R-Sq on the deletion and addition of sets of variables.
Iteration 5: Log-linear multiple regression optimized for a subset of variables and minimizing [log(actual effort / predicted effort)]²:
log(EV) = 0.5 + 0.00231·Size - 0.000871·Size·log(Skill Level)    (M5)
The prediction models M1 to M5 were analyzed for reasonableness of fit (R-Sq) and for accuracy [16], [17] through the MRE. A comparison of the accuracy of the successive prediction models using MMRE, MdMRE and PRED(0.25, 0.5) is given in Table 4. The prediction accuracy for model M3 is MMRE = 0.0291, MdMRE = 0.0119 and PRED(0.25, 0.5) = 100%. Thus, model M3 achieves very good predictive accuracy in terms of the MMRE, MdMRE and PRED measures.
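The accuracy measures defined in Section V can be computed directly from pairs of actual and predicted efforts. The Python sketch below is an illustration with made-up effort values, not the project data set.

# MRE-based accuracy measures for an effort prediction model, as defined in
# Section V: MMRE, MdMRE and PRED(0.25, 0.5). Illustrative data only.
from statistics import mean, median

def mre(actual, predicted):
    return abs(actual - predicted) / actual

def pred(pairs, mre_limit=0.25, abs_limit=0.5):
    """Fraction of tasks with MRE <= mre_limit OR |error| <= abs_limit days."""
    ok = [1 for a, p in pairs
          if mre(a, p) <= mre_limit or abs(a - p) <= abs_limit]
    return len(ok) / len(pairs)

if __name__ == "__main__":
    # (actual effort, predicted effort) in maintenance days -- made-up values
    pairs = [(2.0, 1.8), (3.0, 3.4), (10.0, 11.0), (1.0, 1.4), (25.0, 22.0)]
    mres = [mre(a, p) for a, p in pairs]
    print("MMRE :", round(mean(mres), 3))
    print("MdMRE:", round(median(mres), 3))
    print("PRED(0.25, 0.5):", pred(pairs))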
TABLE 4 COMPARISON OF ACCURACY BETWEEN MODELS
Model       | R-Sq  | MMRE   | MdMRE  | PRED
M1 (N = 53) | 90.1% | 0.2468 | 0.1192 | 79.24%
M2 (N = 41) | 99.9% | 0.0284 | 0.0130 | 100%
M3 (N = 41) | 99.9% | 0.0291 | 0.0119 | 100%
M4 (N = 40) | 97.0% | 0.0886 | 0.1111 | 100%
M5 (N = 40) | 95.5% | 0.1387 | 0.0998 | 100%
Model M3 was tested with the hold-out sample [14], [15] and validated using n-fold cross-validation. Additionally, real-time data was used by randomly selecting maintenance tasks from both product lines; the results were found to be consistent with the fitted regression model M3.
VII. INTERPRETING THE PREDICTION MODEL FOR EV
We refer to model M3 for the purpose of interpreting the prediction model. This selection is justified by the accuracy comparisons in Table 4. M3 is interpreted for the range of data values and the calibrations done as part of the analysis. Accordingly, M3 holds true for tasks having Size between 100 UFPs and 3877 UFPs (from the sampled data set) and for all values of Skill Level between 1 and 5. The 'Y' intercept (-1.64) has no practical interpretation, as Size and Skill Level cannot be 0. EV changes by 0.003895 for every unit change in Size, and by 9.96 for every unit change in Skill Level. Examining two specific cases of Size, we check the impact of the Skill Level on the Effort Variance. For Size 1000 UFPs and Skill Level 5 (lowest), the Effort Variance is predicted to be ~52%, while with Skill Level 1 (highest) the Effort Variance will be just ~12%. Likewise, Size 100 UFPs with Skill Level 5 gives an Effort Variance of ~48.5%, while Skill Level 1 gives an Effort Variance of ~8.7%. Thus, skill levels are very critical for the calibrated Size of tasks, and any attempt at skills development and/or knowledge management will have a positive impact on the overall project performance, namely the project's Effort Variance context.
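The interpretation above can be reproduced by evaluating model M3 directly; the short sketch below plugs the two Size values and the extreme Skill Levels into M3 using the coefficients reported in Section VI.

# Effort Variance predicted by model M3: EV = -1.64 + 0.003895*Size + 9.96*SkillLevel
# (coefficients from Section VI; valid for 100 <= Size <= 3877 UFPs, 1 <= Skill Level <= 5).

def ev_m3(size_ufp, skill_level):
    return -1.64 + 0.003895 * size_ufp + 9.96 * skill_level

if __name__ == "__main__":
    for size in (1000, 100):
        for skill in (5, 1):             # 5 = lowest skill, 1 = highest skill
            print(f"Size {size} UFPs, Skill Level {skill}: "
                  f"EV ~ {ev_m3(size, skill):.1f}%")
    # Matches the ~52%, ~12%, ~48.5% and ~8.7% figures discussed above.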
VIII. CONCLUSION AND FUTURE WORK
The software maintenance processes and practices are significantly different from those of software development. While a great amount of research has been done with a software development focus, very few research papers have been published describing the development and evaluation of formal models that predict software maintenance efforts. Any research and generalization done in this area will serve to provide valuable inputs to project managers when planning software maintenance activities. We generalized that team member Skill Level (domain area) significantly influenced the effort to complete software maintenance activities for tasks of normal size. For small tasks, a dependency on Skill Level could not be established. The prediction models established herein support an analysis of the software maintenance process. The prediction accuracy indicators MdMRE and PRED are also meaningful to software maintenance project managers and serve as in-process indicators. The Effort Variance model developed allows project managers to plan around the predictions for EV and has been used to estimate 5 different change requests across the two product lines. Variations in the predictions for the 5 change requests were used to further field-test the model as an empirical validation of the results. Data and learnings will continue to feed into the model for further refinement. Considering that the primary component of Rework in maintenance projects is defects, a prediction model for residual defects in software will be established and used to enable better predictions by combining the effort and defect models. Further, as a continuation of the work, it is proposed to add adjusting factors (tailoring) to the model so that other environmental conditions and associated system dynamics are also considered; these would typically be based on the qualitative considerations from the current study.
ACKNOWLEDGMENT
The authors thank CMC Limited management and project managers from the Securities practice for providing the data set that served as the basis for this study.
REFERENCES
[1] Briand L.C., Basili V.C., "A classification process for the effective management of changes during the maintenance process", Proc. IEEE Conf. on Software Maintenance, Florida, 1992.
[2] James Martin, Carma L. McClure, Software Maintenance: the Problems and its Solutions, Prentice Hall Professional Technical Reference, 1983.
[3] Harrison W., Cook C., "Insights on improving the maintenance process through software measurement", Proceedings International Conference on Software Maintenance, IEEE Computer Society Press: San Diego; pp. 37-45.
[4] Yunsik Ann, et al., "The software maintenance project effort estimation model based on function points", Journal of Software Maintenance Research and Practice, 15(2), March 2003.
[5] Mohagheghi P., Conradi R., "Experiences and challenges in evolution of a product line", Proc. 5th International Workshop on Product Line Development, PFE 5, 2003, Springer LNCS 3014.
[6] IFPUG, Enhancement Project Function Point Calculation, Function Point Counting Practices Manual (Release 4.1), International Function Point Users Group: Mequon WI, 1998, 8.10-8.17.
[7] Low G.C., Jeffery R.D., "Function points in the estimation and evaluation of the software process", IEEE Transactions on Software Engineering, 16, January 1990.
[8] Abdel-Hamid T., Madnick S., "Impact of schedule estimation on software project behavior", IEEE Software, 3(4), July 1986.
[9] Basili V., Weiss A., "A methodology for collecting valid software engineering data", IEEE Transactions on Software Engineering, 10(6), November 1984.
[10] Kitchenham B., et al., "Towards a framework for software measurement validation", IEEE Transactions on Software Engineering, 21(12), December 1995.
[11] Pfleeger S., et al., "Status report on software measurement", IEEE Software, 14(2), 1997.
[12] Endres A., Rombach D., A Handbook of Software Engineering: Empirical Observations, Laws and Theories, Addison Wesley, 2004.
[13] Jorgensen Magne, "Experience with the accuracy of software maintenance task effort prediction models", IEEE Transactions on Software Engineering, 21(8), pp. 674-681, August 1995.
[14] Schneidewind N., "Methodology for validating software metrics", IEEE Transactions on Software Engineering, 18(5), May 1992.
[15] Briand L., et al., "On the application of measurement theory in software engineering", Journal of Empirical Software Engineering, 1(1), 1996.
[16] Kitchenham B.A., et al., "What accuracy statistics really measure", IEE Proc. Software Eng., vol. 148, 2001.
[17] Myrtveit Ingunn, et al., "Reliability and validity in comparative studies of software prediction models", IEEE Transactions on Software Engineering, 31(5), May 2005.
[18] Stutzke D., "Measuring and Estimating Process Performance", Annual CMMI Technology Conference, November 2005.
[19] Sarang N., “Process Performance Models – Enabling Organization Prediction Capability”, European SEPG Conference, June 2006.
Nita Sarang received her MS degree in Electrical Engineering from University of Notre Dame, USA and is currently a PhD student with the University Department of Computer Science, University of Mumbai. As a member of the leadership team at CMC Limited, she leads the quality and business excellence initiatives. She is a CMMI SCAMPI Lead Appraiser and a trained Six Sigma Black Belt. She continues to pursue research in software engineering and empirical studies and is actively associated with the activities of the International Software Engineering Research Network and the Empirical Software Community. She is also a regular speaker at international research and user conferences and has several paper publications to her credit. She recently certified as a SCAMPI High Maturity Lead Appraiser. She is a member of the IEEE. Mukund Sanglikar received his MSc degree in Mathematics and PhD degree in Computer Science from the University of Mumbai. He holds the position of Professor and Head, University Department of Computer Science, University of Mumbai. He is an active member of the board of studies and on the syllabus committee for undergraduate and post graduate curriculum in Computer Science and Information Technology. His research areas include computer graphics, soft computing techniques, evolutionary game theory, mobile databases and empirical studies in software engineering.
Design of Adaptive Neural Network Frequency Controller for Performance Improvement of an Isolated Thermal Power System Ognjen Kuljaca, Jyotirmay Gadewadikar, Kwabena Agyepong Systems Research Institute, Department of Advanced Technologies, Alcorn State University 1000 ASU Drive #360, Alcorn State, MS 39096, USA [email protected], [email protected], [email protected] Key-Words: power system, neural network, adaptive control, frequency control Abstract: An adaptive neural network control scheme for a thermal power system is described. The neural network control scheme does not require off-line training. The online tuning algorithm and neural network architecture are described and a stability proof is given. The performance of the controller is illustrated via simulation for different changes in process parameters and for different disturbances. The performance of the neural network controller is compared with a conventional proportional-integral control scheme for frequency control in thermal power systems.
1 Introduction
The paper deals with a neural network (NN) frequency controller for an isolated thermal power system. Frequency control becomes more and more important as power systems enter the era of deregulation. It is becoming very hard, if not impossible, to schedule loads precisely; thus the load fluctuations in the power system are becoming more pronounced. Emerging markets for ancillary services mean that the primary controllers and turbines that are used in secondary control change constantly. These changes can cause serious problems when conventional control schemes are used, including problems with stability, unless the primary turbines and controllers are carefully selected. To avoid possible instability caused by parameter changes in the system, conventional secondary controllers are implemented using smaller integral gains than optimal performance would require. The literature on frequency and load-frequency control is extensive ([1], [2], [3], [4], [5], [6], [7], and many others). Many nonadaptive schemes are given in [1], [2], [3], [4], [5] and [6]. However, modern power systems in a deregulated environment are subject to frequent parameter changes which may
diminish the quality of control when nonadaptive controllers are used. NN load-frequency control is described in [8], [9] and [10]. The results obtained by using NN controllers are good. However, the described controllers require training. We provide here a performance analysis of an adaptive NN controller that does not require training. The neural network is capable of on-line learning. The controller described here is an improved version of the NN control scheme given for the first time in [29]. The paper is organized as follows. Section 2 gives some mathematical preliminaries. The model of the isolated thermal power system is given in Section 3. The neural network control scheme is described in Section 4. Section 5 presents the simulation results and Section 6 gives the conclusion.
2 Mathematical Preliminaries
Let R denote the real numbers, R^n the real n-vectors, and R^{m×n} the real m×n matrices. Let S be a compact, simply connected set of R^n. With the map f : S → R^m, define C^m(S) as the space of continuous functions f : S → R^m. Let ||·|| be any suitable vector norm. The supremum norm of f(x) over S is defined as

sup_{x∈S} ||f(x)||,   f : S → R^m.   (1)

Given x ∈ R^{N1}, a two-layer NN (Fig. 1) has a net output given by
y = W^T σ(V^T x),   (2)
where x = [1 x_1 ⋯ x_{N1}]^T, y = [y_1 y_2 ⋯ y_{N3}]^T, and σ(·) is the activation function. If z = [z_1 z_2 ⋯]^T, we define σ(z) = [σ(z_1) σ(z_2) ⋯]^T. Including "1" as the first term in σ(V^T x) allows one to incorporate the thresholds as the first column of W^T; any tuning of the NN weights then includes tuning of the thresholds as well.

Fig. 1: Two-layer neural network (inputs x_1, …, x_n weighted by V, hidden-layer activations σ_1, …, σ_{Nh} with thresholds θ_v, output weights W with thresholds θ_w, outputs y_1, …, y_m)

The main property of a NN that we are concerned with for control and estimation purposes is the function approximation property ([16], [20]). Let f(x) be a smooth function from R^n → R^m. Then it can be shown that, if the activation functions are suitably selected and x is restricted to a compact set S ⊂ R^n, then for some sufficiently large number of hidden-layer neurons L there exist weights and thresholds such that

f(x) = W^T σ(V^T x) + ε(x).   (3)

The value of ε(x) is called the neural network functional approximation error. In fact, for any choice of a positive number ε_N, one can find a neural network such that ||ε(x)|| ≤ ε_N for all x ∈ S. Also, it has been shown that, if the first-layer weights V are suitably fixed, then the approximation property can be satisfied by selecting only the output weights W for good approximation. For this to occur, φ(x) = σ(V^T x) must be a basis [28]. If one selects the activation functions suitably, e.g. as sigmoids, then it was shown by Igelnik and Pao [19] that σ(V^T x) is a basis if V is selected randomly.

3 Isolated Thermal Power System

The model of an isolated thermal power system is shown in Fig. 2.

Fig. 2: The model of an isolated thermal power system (the droop feedback 1/R, the control signal ΔP_r and the load disturbance ΔP_L act through the governor G_g, the turbine G_t producing ΔP_m, and the power system block G_s producing the frequency deviation Δf)

The transfer functions in the model are:

G_g = 1 / (1 + s·T_g),   (4)
G_t = 1 / (1 + s·T_t),   (5)
G_s = K_s / (1 + s·T_s),   (6)
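To make the construction above concrete, the following minimal Python sketch (not from the paper; the input size and numerical values are arbitrary) builds the basis φ(x) = σ(V^T x) from randomly chosen, fixed first-layer weights and evaluates the output y = W^T φ(x) of (2):

```python
import numpy as np

def sigmoid(z):
    # bounded activation function, as assumed in the paper
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 3, 20, 1                        # illustrative sizes (20 hidden nodes, as in Section 5)
V = rng.standard_normal((n_in + 1, n_hidden))           # first-layer weights: selected randomly, then fixed
W = 0.01 * rng.standard_normal((n_hidden + 1, n_out))   # output weights: the only weights tuned later

def phi(x):
    # basis phi(x) = sigma(V^T x); the leading "1" entries carry the thresholds
    xa = np.concatenate(([1.0], x))
    return np.concatenate(([1.0], sigmoid(V.T @ xa)))

def nn_output(x):
    # net output y = W^T phi(x), equation (2)
    return W.T @ phi(x)

print(nn_output(np.array([0.1, -0.2, 0.05])))
```

With V fixed, only W needs to be adapted, which is exactly the structure the controller in Section 4 exploits.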
where G_g, G_t and G_s represent the turbine governor, the turbine and the power system, respectively. Such models are described in more detail in [1], [2], [3], [4], [5], [6], [7], and many others. It is also shown that the system in Fig. 2 is always asymptotically stable if R is a positive number, which is always the case in real systems. The system is linear, and the need for adaptive control or for the function approximation property of the neural network is not obvious, since there are no nonlinearities in the controlled plant. However, all the parameters can and do change during operation. Thus, it is conceivable that an adaptive control scheme would perform better than nonadaptive control. The most usual way of control is to use a linear PI controller. The controller in that case takes the change of power system frequency Δf as the input and produces the control signal ΔP_r as the output. That signal is fed to the turbine governor in order to counter the changes caused by the change in the load ΔP_L. The turbine output is the mechanical power ΔP_m. However, the introduction of integral action also means that the system can become unstable.
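For illustration only, the sketch below integrates the first-order blocks (4)-(6) with the 1/R droop feedback and a PI secondary controller by forward Euler. It is not code from the paper: the power-system time constant T_s is not quoted in this excerpt and is set to an assumed value, the load step is arbitrary, and the signs follow the state-space representation the paper gives next.

```python
import numpy as np

# Tg, Tt, Ks, R and the PI gains are quoted in Section 5 of the paper; Ts is an assumed value.
Tg, Tt, Ks, Ts, R = 0.08, 0.3, 120.0, 20.0, 2.4
kp, ki = 0.08, 0.6
dt, steps = 0.01, 100_000                    # 1000 s of simulated time

yg = dPm = df = xi = 0.0                     # governor output, mech. power, freq. deviation, PI integrator
for k in range(steps):
    dPL = 0.05                               # constant load step in p.u. (purely illustrative)
    dPr = kp * df + ki * xi                  # PI secondary controller driven by the frequency deviation
    xi += df * dt
    # forward Euler integration of the loop built from blocks (4)-(6) and the 1/R droop
    yg  += dt * (-yg - df / R - dPr) / Tg
    dPm += dt * (yg - dPm) / Tt
    df  += dt * (Ks * (dPm - dPL) - df) / Ts

print(f"frequency deviation after {steps * dt:.0f} s: {df:.2e} Hz")
```

With integral action the steady-state frequency deviation is driven toward zero, at the price of the stability margin issues discussed above.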
The system shown in Fig. 2 can be represented in state space as

ẋ = [ −1/T_g      0        −1/(R·T_g) ]       [ −1/T_g      0       ]
    [  1/T_t    −1/T_t         0      ]  x +  [    0        0       ]  [ ΔP_r ]
    [    0      K_s/T_s     −1/T_s    ]       [    0     −K_s/T_s   ]  [ ΔP_L ]
  = Ax + Bu,
Δf = [0 0 1] x.   (7)

The state vector x is

x = [ y_g  ΔP_m  Δf ]^T,   (8)

where y_g is the output of the turbine governor. These states are physically available, so this representation allows for the NN control scheme design.

4 Adaptive Neural Network Control

The neural network is built as shown in Section 2. The first-layer weights and sigmoid functions are initialized randomly and then fixed to form the basis φ(x). The NN output is

y = W^T φ(x).   (9)

This architecture is an adapted form of the tracking NN controller described in [23], [24], [25], [26], [27] and numerous other papers. However, there are some differences. Namely, here the problem is control, not tracking. Also, there is no special robustifying term or PD controller in parallel with the NN. Since there are significant time constants present in the controlled plant, a derivative part would not have an effect. A proportional part is not needed to initially stabilize the system, since the uncontrolled system is always stable. A proportional gain K was still used in the scheme given and analyzed in [29], but simulation analysis showed that the proportional gain does not improve the performance of the control scheme. Finally, unlike in the papers mentioned above, we do not use the filtered-error approach.

It is assumed that the load disturbance ΔP_L is bounded so that

|ΔP_L| ≤ P_M.   (10)

This assumption is always true as long as we deal with the power system in the normal mode of operation. If the load disturbance is too big there cannot be any control action, since in that case the system simply does not have enough power generation capability available; the protection functions take over and some loads have to be disconnected. Let the control signal be given by

ΔP_r = W^T φ(x)   (11)

and the weight updates be provided by

Ẇ = F φ(x) Δf − k_w ||x|| F W,   (12)

with F any symmetric and positive definite matrix and k_w a positive design parameter. Then the system states x and the neural network weights W are uniformly ultimately bounded (UUB) and the system is stable in the Lyapunov sense as long as

||x|| > ( d_M σ(P)_max + D²/(4k_w²) ) / ( ½ σ(Q)_min ),   (13)

||W|| > D/(2k_w) + √( D²/(4k_w²) + d_M σ(P)_max / k_w ).   (14)

Thus, the Lyapunov derivative is negative as long as x and W are outside a compact set, meaning that x and W are UUB and the system is stable.

Proof:
Let us first rewrite equation (7) into a form that is more suitable for the stability proof:

ẋ = Ax + B_2 ΔP_L + B_1 W^T φ(x),   (15)

where

B_1 = [ −1/T_g  0  0 ]^T,   (16)

B_2 = [ 0  0  −K_s/T_s ]^T.   (17)

We will also redefine the disturbance as

d = B_2 ΔP_L.   (18)

Note that d is also bounded by some value d_M, because K_s/T_s always has a finite value. We can now rewrite (15) as

ẋ = Ax + d + B_1 W^T φ(x).   (19)

Let us now define the Lyapunov candidate

L = ½ x^T P x + ½ W^T F^{-1} W,   (20)

with P a diagonal and positive definite matrix (this can easily be done because the uncontrolled system is asymptotically stable). In this design W is a vector because we have only one output. The Lyapunov derivative is

L̇ = ½ (ẋ^T P x + x^T P ẋ) + W^T F^{-1} Ẇ.   (21)

Introduce (19) into (21) to obtain

L̇ = ½ x^T (A^T P + P A) x + x^T P d + x^T P B_1 W^T φ(x) + W^T F^{-1} Ẇ.   (22)

Now, the first term in (22) is nothing else than the well-known presentation of a linear system and can easily be rewritten as in (23), with Q positive definite (remember that the uncontrolled system is asymptotically stable, so a positive definite Q always exists). We obtain (23) by introducing (12) into (22):

L̇ = −½ x^T Q x + x^T P d + W^T φ(x) (x^T P B_1 + Δf) − k_w ||x|| ||W||².   (23)

Let us define D = max( ||φ(x)|| (||P B_1|| + 1) ). The activation functions are bounded, so we can make this substitution. Let us also define σ(Q)_min and σ(P)_max as the minimum and maximum singular values of the matrices Q and P, respectively. After introducing norms and some arithmetic we obtain the following inequality:

L̇ ≤ − ||x|| ( ½ ||x|| σ(Q)_min − d_M σ(P)_max − ||W|| D + k_w ||W||² ).   (24)

The Lyapunov derivative is negative as long as the term in parentheses in (24) is positive. This term is positive as long as (13) and (14) hold, meaning as long as x and W are outside a compact set. The vectors x and W are thus UUB and the system is stable.

5 Simulation Results

The simulations were performed for the following set of parameters: T_g = 0.08 s, T_t = 0.3 s, K_s = 120 pu/s, R = 2.4 Hz/pu. The neural network had 20 nodes in the hidden layer; the initial network values were initialized to small random numbers, the parameter k_w = 10^-4 and F = diag(0.05). The network used sigmoid activation functions. The response of the system with NN control is compared with the usual PI-controlled system with proportional gain k_p = 0.08 and integral gain k_i = 0.6. The load disturbance was simulated as ΔP_L = 0.1 sin(2π f_L t) p.u. The simulation results for different frequencies of disturbance are shown in Fig. 3 - Fig. 5.

Fig. 3: The responses for f_L = 0.002

It can be seen that the NN control scheme outperforms the conventional PI controller in all three cases. The frequency deviation is smaller when the NN controller is used. The response of the system for changed turbine governor and turbine parameters (T_g = 0.09 s and T_t = 0.8 s) is shown in Fig. 6 and Fig. 7. In this case the PI-controlled system becomes unstable. On the other hand, the adaptive NN control keeps the system stable.
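As an illustrative reading of this setup (not the authors' code), the sketch below applies the control law (11) and a forward-Euler discretization of the update law (12) to the state-space model (7), using the Section 5 parameter values where they are given; T_s, the random seed and the initial weights are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
# Plant parameters from Section 5; Ts is assumed, since it is not quoted in this excerpt.
Tg, Tt, Ks, Ts, R = 0.08, 0.3, 120.0, 20.0, 2.4
Nh, kw, F = 20, 1e-4, 0.05                 # 20 hidden nodes, kw = 1e-4, F = diag(0.05) as in Section 5
V = rng.standard_normal((4, Nh))           # random, fixed first-layer weights forming the basis
W = 0.001 * rng.standard_normal(Nh)        # small random initial output weights

def phi(x):
    # basis phi(x) = sigma(V^T [1; x]) with sigmoid activations
    z = V.T @ np.concatenate(([1.0], x))
    return 1.0 / (1.0 + np.exp(-z))

dt, fL = 0.01, 0.002
x = np.zeros(3)                            # x = [yg, dPm, df], equation (8)
for k in range(int(1000.0 / dt)):
    t = k * dt
    dPL = 0.1 * np.sin(2 * np.pi * fL * t) # sinusoidal load disturbance, as in Section 5
    p = phi(x)
    dPr = W @ p                            # control law (11): dPr = W^T phi(x)
    W += dt * (F * p * x[2] - kw * np.linalg.norm(x) * F * W)   # Euler step of update law (12)
    yg, dPm, df = x
    x = x + dt * np.array([(-yg - df / R - dPr) / Tg,           # state equation (7)
                           (yg - dPm) / Tt,
                           (Ks * (dPm - dPL) - df) / Ts])

print(f"||W|| = {np.linalg.norm(W):.3f}, final frequency deviation = {x[2]:.2e} Hz")
```

The weights are tuned online only; no off-line training phase is needed, which is the property the paper emphasizes.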
Fig. 4: The response for f_L = 0.005

Fig. 5: The response for f_L = 0.02

Fig. 6: The response for f_L = 0.002 and changed governor and turbine parameters – NN controlled system

Fig. 7: The response for f_L = 0.002 and changed governor and turbine parameters – NN controlled system – detail

6 Conclusion

The results of initial research in neural network control of power systems have been presented. The simulation results show that the controller performs well and adapts to changing parameters. Moreover, when NN adaptive control is used the system remains stable, unlike with conventional control. The controller does not require an off-line training phase. By defining the neural network differently, with the output-layer weights W defined as a matrix, this scheme can be adjusted to deal with multivariable systems. The performance analysis here shows that it would be worthwhile to continue the research effort toward neural network controllers for interconnected systems as well as for systems with a generation rate constraint.
References: [1] Elgerd, O.I. “Control of Electric Power Systems”, IEEE Proceedings, pp. 4-15, 1981 [2] Elgerd, O.I., Fosha, C.E., “Optimum MegawattFrequency Control of Multiarea electric Energy Systems” IEEE Transactions Power Apparatus and Systems, PAS-80, No. 4, pp. 556-562, 1970 [3] Fellachi, A., “Optimum Decentralized Load Frequency Control” IEEE Transactions on Power Systems, Vol. PWRS-2, No. 2, pp. 379384, 1987 [4] Fosha, C.E., Elgerd, O.I., “The MegawattFrequency Control Problem: A New Approach Via Optimal control Theory” IEEE Transactions Power Apparatus and Systems, PAS-80, No. 4, pp. 563-577, 1970
[5] Hamza, M.H., Agathoklis, P., Chan, W.C., “A Combined Selftuning and Integral controller for Load Frequency Control of Interconnected Power systems” Regelungstechnik 30. Jahrgang Heft 7 pp. 226-232, 1982 [6] Kundur, P., Power System Stability and Control. McGraw-Hill, Inc.,1993 [7] Liaw, C.M., “Design of a Reduced-Order Adaptive Load-Frequency Controller for an Interconnected Hydrothermal Power System” International Journal of Control, Vol. 60, No. 6, pp. 1051-1063, 1994 [8] Demiroren, A., Zeyneigly, H.I., Sengor, N.S., “The application of ANN technique to loadfrequency control for three-area power system”, 2001 IEEE Porto Power Tech Conference, Porto, Portugal [9] Bevrani, H. “A novel approach for power system load frequency controller design”, Transmission and Distribution Conference and Exhibition 2002: Asia Pacific. IEEE/PES , Volume: 1 , 6-10 Oct. 2002 [10] Birch, A.P., Sapeluk, A.T., Ozveren, C.T., “An enhanced neural network load frequency control technique”, International Conference on Control, 1994. Control '94. Volume 1, 21-24 Mar 1994 [11] R. Barron, “Universal approximation bounds for superpositions of a sigmoidal function,” IEEE Trans. Info. Theory, vol. 39, no. 3, pp. 930-945, May 1993. [12] A.G. Barto, “Reinforcement learning and adaptive critic methods,” Handbook of Intelligent Control, pp. 469-492, Van Nostrand Reinhold, New York, 1992. [13] J. Campos and F.L. Lewis, “Adaptive critic neural network for feedforward compensation,” Proc. American Control Conf., vol. 4, pp. 2813 – 2818, San Diego, 1999 [14] F.-C. Chen, and H. K. Khalil, “Adaptive control of nonlinear systems using neural networks,” Int. J. Contr., vol. 55, no. 6, pp. 12991317, 1992. [15] M. J. Corless and G. Leitmann, “Continuous state feedback guaranteeing uniform ultimate boundedness for uncertain dynamic systems,” IEEE Trans. Automat. Contr., vol. 26, no. 5, pp. 850-861, 1982. [16] G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Math. Contr. Signals, Syst., vol. 2, no. 4, pp. 303-314, 1989. [17] C. A. Desoer and S. M. Shahruz, “Stability of dithered nonlinear systems with backlash or
hysteresis,” Int. J. Contr., vol. 43, no. 4, pp. 1045-1060, 1986. [18] B. Friedland, Advanced Control System Design, Prentice-Hall, New Jersey, 1996. [19] B. Igelnik and Y. H. Pao, “Stochastic Choice of Basis Functions in Adaptive Function Approximation and the Functional-Link Net,” IEEE Trans. Neural Networks, vol. 6, no. 6, pp. 1320-1329, Nov. 1995. [20] K. Hornik, M. Stinchombe, and H. White, “Multilayer feedforward networks are universal approximators,” Neural Networks, vol. 2, pp. 359-366, 1989. [21] B.S. Kim and A.J. Calise, “Nonlinear flight control using neural networks,” AIAA J. Guidance, Control, Dynamics, vol. 20, no. 1, pp 26-33, 1997. [22] J.-H. Kim, J.-H. Park, S.-W. Lee, and E. K. P. Chong, “Fuzzy precompensation of PD controllers for systems with deadzones,” J. Int. Fuzzy Syst., vol. 1, pp. 125-133, 1993. [23] S.-W. Lee and J.-H. Kim, “Control of systems with deadzones using neural-network based learning control,” Proc. IEEE Int. Conf. Neural Networks, June 1994, pp. 2535-2538. [24] F. L. Lewis, C. T. Abdallah, and D. M. Dawson, Control of Robot Manipulators, Macmillan, New York, 1993. [25] F. L. Lewis, A. Yesildirek, and K. Liu, “Multilayer neural-net robot controller with guaranteed tracking performance,” IEEE Trans. Neural Networks, vol. 7, no. 2, pp. 1-11, Mar. 1996. [26] F. L. Lewis, K. Liu, and A. Yesilidrek, “Neural net robot controller with guaranteed tracking performance,” IEEE Trans. Neural Networks, vol. 6, no. 3, pp. 703-715, 1995. [27] F. L. Lewis, K. Liu, R. R. Selmic, and LiXin Wang, “Adaptive fuzzy logic compensation of actuator deadzones,” J. Robot. Sys., vol. 14, no. 6, pp. 501-511, 1997. [28] F.L. Lewis, S. Jagannathan, and A. Yesildirek, Neural Network Control of Robot Manipulators and Nonlinear Systems, Taylor and Francis, Lindon, 1998. [29] N. Sadegh, “A perceptron network for functional identification and control of nonlinear systems,” IEEE Trans. Neural Networks, vol. 4, no. 6, pp. 982-988, Nov. 1993. [30] Kuljaca, O., Lewis, F. Tesnjak, S., “Neural Network Frequency Control for Thermal Power Systems”, IEEE Control and Decision Conference, 2004
Comparing PMBOK and Agile Project Management Software Development Processes P. Fitsilis TEI Larissa 41335 Larissa, Greece [email protected]
Abstract- The objective of this article is to compare a generic set of project management processes as defined in Project Management Body of Knowledge (PMBOK) with a number of agile project management processes. PMBOK is developed by Project Management Institute and it is structured around five process groups (initiating, planning, execution, controlling and closure) and nine knowledge areas (integration management, scope management, time management, cost management, quality management, human resource management, communication management, risk management, procurement management). On the other hand, agile software project management is based on the following principles: embrace change, focus on customer value, deliver part of functionality incrementally, collaborate, reflect and learn continuously. The purpose of this comparison is to identify gaps, differences, discrepancies etc. The result is that, agile project management methodologies cannot be considered complete, from the traditional project management point of view, since a number of processes either are missing or not described explicitly.
I.
INTRODUCTION
Traditional software development methodologies grew out of a need to control large development projects, and from the difficulties of estimating and managing these efforts to reliably deliver results. These difficulties are inherited in the nature of software and they were identified from the early years of software system development and unfortunately most of them still remain. Most of the skepticism expressed in the legendary book of Frederic Brooks, “The mythical man-month” thirty years ago is still valid [1]. Further, today’s information technology professionals are under tremendous pressure to deliver quality IT products and services, in order to respond to an always dynamic and fast changing market. As a result, the list of large software projects that have failed is still growing. Robert Charette [2] compiled a list of the most notable fiascoes in the IT industry. Further, he states that “most IT experts agree that such failures occur far more often than they should” and that “the failures are universally unprejudiced”. Literature suggests that project organization, stakeholders’ expectation management, scope creep etc., are always important factors leading to project success, when managed properly [3, 4, 5].
Agile methodologies attempt to overcome these obstacles by changing the approach used to develop software and manage projects. Agile software development attempts to put the software being developed first. Further, agile methods acknowledge that the user requirements change, that we have to respond quickly to the users’ needs, that there is a need to produce frequent and regular, software releases, etc. The Manifesto for Agile Software Development was released in February 2001 by a group of 17 software process methodologists, who attended a summit meeting to promote a better way of developing software and then formed the Agile Alliance. The Manifesto for Agile Software Development can be found on the Agile Alliance website (http://www.agilemanifesto.org). Since then, a number of software development methods subscribed to this approach. The list varies depending on different viewpoints and interpretations, but in general the list includes Extreme Programming (XP), Scrum, Feature-Driven Development (FDD), Adaptive Software Development (ASD), Crystal Clear Methodology, etc. Most agile development methods were created within corporations by software process experts as an attempt to improve existing processes. For example, XP was created by Kent Beck during his work on the Chrysler Comprehensive Compensation System payroll project. Kent Beck refined the development method used and the result was published in his book “Extreme Programming Explained” [6]. Similarly, FDD was initially introduced by Jeff De Luca, in order to meet the specific needs of a 15 month, 50 person software development project at a large Singapore bank in 1997. FDD was influenced by ideas of Peter Coad on object modeling. The description of FDD was first introduced in the book “Java Modeling in Color with UML” by Peter Coad, Eric Lefebvre and Jeff De Luca in 1999 [7]. A more generic description of FDD decoupled from Java can be found in the book “A Practical Guide to FeatureDriven Development” [8]. On the other end more traditional project management methodologies rely heavier on processes, linear development cycles and waterfall like software development life cycles. Along with predictability, they inherited a deterministic, reductionist approach that relied on task breakdown, and was predicated on stability – stable requirements, analysis and
stable design. This rigidity was also marked by a tendency towards slavish process "compliance" as a means of project control [9]. The Project Management Body of Knowledge (PMBOK), developed by the Project Management Institute, is the best representative of this approach [10]. PMBOK formally defines a total of 44 project processes that describe activities throughout a project's life cycle. These 44 project processes are organized along two axes: five process groups and nine knowledge areas, which will be described briefly in the following section. Within PMBOK each process is described in terms of inputs (documents, plans, designs, other data, etc.), outputs (documents, products) and tools and techniques (mechanisms that are applied to inputs for producing outputs); without being too specific, it provides guidance to someone who wishes to apply the processes [11]. Similar process-oriented approaches have been introduced by other international bodies or associations. Among them, it is worth mentioning the International Project Management Association Competence Baseline (ICB), which describes the necessary competences (technical, behavioral, contextual) for project management [12], and the project management body of knowledge defined by the Association for Project Management in the UK (APM) (http://www.apm.org.uk), which again describes 40 competencies necessary for project management. These approaches study project management in a multi-faceted way, and therefore one can argue about the definition of a process or whether a competence is needed, but not about their completeness. Further, most of them are widely known and accepted by professionals and organizations. However, even if agile methods look attractive and their use is achieving promising results, there is criticism as to whether they are complete project management methodologies or merely software lifecycles extended with some management elements. Similar studies and discussion can be found in [13, 14]. In the next sections, we will briefly present PMBOK along with some of the best-known agile methods. Then we will compare them, taking as the basis for comparison the processes as defined, per knowledge area, in PMBOK. II.
PROJECT MANAGEMENT INSTITUTE BODY OF KNOWLEDGE
As mentioned before, the Project Management Body of Knowledge (PMBOK) [10] is defined in terms of process groups and knowledge areas. In this study, we will focus on the knowledge areas, since these areas offer a more precise idea of what project management is about and at the same time give the overall picture. The knowledge areas are the following:
1. Project Integration Management describes the processes and activities that integrate different aspects of project management. It consists of the following processes: a. Develop Project Charter, b. Develop Preliminary Project Scope Statement, c. Develop Project Management Plan, d. Direct and Manage Project Execution, e. Monitor and Control Project Work, f. Integrated Change Control, and g. Project Closure.
2. Project Scope Management encapsulates the processes that are responsible for controlling project scope. It consists of: a. Scope Planning, b. Scope Definition, c. Create Work Breakdown Structure (WBS), d. Scope Verification, and e. Scope Control.
3. Project Time Management describes the processes concerning the timely completion of the project. It consists of: a. Activity Definition, b. Activity Sequencing, c. Activity Resource Estimating, d. Activity Duration Estimating, e. Schedule Development, and f. Schedule Control.
4. Project Cost Management includes the processes concerning cost. The processes that are part of this knowledge area are: a. Cost Estimating, b. Cost Budgeting, and c. Cost Control.
5. Project Quality Management describes the processes involved in assuring that the project will satisfy the objectives for which it was undertaken. It consists of: a. Quality Planning, b. Perform Quality Assurance, and c. Perform Quality Control.
6. Project Human Resource Management includes all necessary processes for organizing and managing the project team. It consists of: a. Human Resource Planning, b. Acquire Project Team, c. Develop Project Team, and d. Manage Project Team.
7. Project Communications Management describes the processes concerning the communication mechanisms of a project, and relates to the timely and appropriate generation, collection, dissemination, storage and ultimate disposition of project information. It consists of the following processes: a. Communications Planning, b. Information Distribution, c. Performance Reporting, and d. Manage Stakeholders.
8. Project Risk Management describes the processes concerned with project-related risk management. It consists of: a. Risk Management Planning,
b. Risk Identification, c. Qualitative Risk Analysis, d. Quantitative Risk Analysis, e. Risk Response Planning, and f. Risk Monitoring and Control. 9. Project Procurement Management includes all processes that deal with acquiring products and services needed to complete a project. It consists of: a. Plan Purchases and Acquisitions, b. Plan Contracting, c. Request Seller Responses, d. Select Sellers, e. Contract Administration, and f. Contract Closure. III. AGILE PROJECT MANAGEMENT Agile Software Development manifesto objective was to “uncover better ways of developing software by doing it and helping others do it” ((http://www.agilemanifesto.org). The principles that it is based are the following: • Individuals and interactions over processes and tools. • Working software over comprehensive documentation. • Customer collaboration over contract negotiation. • Responding to change over following a plan. Obviously, the agile manifesto is not a process oriented approach. As, it was presented in the introduction, there are many software development methods that can be called “agile”. However, in this work we have selected to include only a few of them as the most representative. The software development methods that were included are: Extreme Programming (XP), Scrum and Feature-Driven Development (FDD). A. eXtreme Programming (XP) Extreme Programming, or XP, is an agile method that emerged from a project at Chrysler Corporation in the late 1990s. It was devised by Ward Cunningham, Kent Beck, and Ron Jeffries. The method is well documented in a number of books, articles and web sites (http://www.extremeprogramming.org/) [6, 15, 16, 17]. XP method is based on four values: • Communication, which is based on practices such as unit testing, pair programming, task estimation. • Simplicity, by always seeking the simplest solution. • Feedback, by having concrete knowledge about the current state of system • Courage, to admit flaws in the system and take immediate corrective actions Further, XP is based on a number of practices. Some of them are: • Planning. The scope of the next release should be defined as soon as possible, by combining business priorities and technical estimates. • Small releases. The system has to be up and running by having very short cycles and quick releases.
•
Metaphor. All development is driven by a simple story of how the whole system works. • Simple design. Simple solutions are the preferred. Complexity should be removed wherever possible. • Testing. Testing is a continuous activity. • Pair programming. All programmers are working in pairs. • Collective ownership. Anyone can change any code anywhere in the system at any time. • Continuous integration. The system is built many times a day, as soon as a task is finished. • On-site customer. A representative of the customer organization should be available full-time to the project team to answer questions. B. SCRUM Scrum approach for new product development was first presented by Takeuchi and Nonaka [18] after observing small high performance teams, at various companies. Similar, observations were made as well by other researchers [19, 20]. Scrum is an agile software development process where projects progress via a series of month-long iterations called sprints [21, 22, 23]. Furthermore, a series of scrum sprints, on average 6 to 9, can produce a product release. At the beginning of the project, the project requirements are captured into a list known as the product backlog. Then, at the start of each sprint a sprint planning meeting is held during which the product owner prioritizes over the product backlog and the members of the scrum team define the tasks that they can complete during the coming sprint. For each day of the sprint, there is daily stand-up meeting for discussing project issues, called the daily scrum. Daily scrums help significantly team development and team communication. Further, the members of the daily scrum can quickly decide on any issue requiring further attention. Issues are not debated in the meeting itself, since the meeting should not last more than 15 minutes. At the end of the sprint the team presents the developed functionality at a Sprint Review Meeting. Scrum processes are grouped in three phases [23], the pregame, the game and the post game. Pregame phase includes the following processes: • Planning, which includes the definition of a new release based on currently known product backlog, along with an estimate of its schedule and cost. If the system under development is new, planning includes both conceptualization and analysis. • Architecture development. Includes the architecture development and the high level design. The game consists of development sprints that produce a new product release. Postgame is the closure of the project, which includes preparation of the releases, producing the final documentation, executing the site acceptance testing and the final product release.
C. Feature Driven Development (FDD)
Feature Driven Development (FDD) is an iterative and incremental software development process [7, 8]. The FDD methodology consists of the following five steps: • Develop the Overall Model, • Build the Feature List, • Plan by Subject Area, • Design by Feature Set, and • Build by Feature. In the first step, developing the overall model, a high-level walkthrough of the scope of the system and of its environment is performed. The main purpose is to capture requirements and to develop the initial UML class diagram which describes the entities of the problem domain. The whole work is split and assigned to small teams consisting of developers and users. These teams are responsible for parts of the whole model and for presenting their models for review and approval. Finally, the proposed models are merged in order to form the domain model. The second step of FDD is to build the feature list. The development team identifies the set of features needed by decomposing the system functionality into subject areas. Each subject area is composed of business activities that comprise business activity steps (features). Features are granular functions expressed in customer terms. Usually, each feature requires up to two weeks of development. In case a feature requires more time, it is decomposed into smaller features. The third step is to produce the project development plan. Planning by subject area involves planning the project by grouping features into feature sets and subject areas. The grouping is done according to the functionality and the existing dependencies between features. Other factors to take into consideration are the load across the development team and the complexity of the features to be implemented.
As soon as the features are grouped and the features to be included in the release are agreed, we have to determine the development sequence, assign business activities to chief programmers and, in doing so, consider which of the key classes are assigned to which developers. When this planning is complete, the team agrees on a schedule of delivery for the subject area and the feature sets together with the project sponsor. The fourth step is "Design by Feature Set". The objective in this step is to produce the design of each feature set. The design model includes UML sequence diagrams, refinement of the overall UML class diagram developed in the first step, etc. Finally, the fifth step, "Build by Feature", involves packaging a smaller batch of features from the feature set decided in step 4, developing the code for those features and unit testing them. The batch of features is known as a Chief Programmer Work Package (CPWP) and should be selected such that it can be completed by a single feature team in less than two weeks.

IV. COMPARISON OF PMBOK AND AGILE METHODS

In order to compare the presented project management methods we took as a basis the PMBOK processes as they are organized per knowledge area. Two reasons contributed to this decision: • PMBOK is an exhaustive list of good practices, in the form of processes that can be tailored and customized to specific needs. • PMBOK is well known and formally documented compared with the agile methods presented in this work. The result of this comparison is presented in Table I.
TABLE I: COMPARISON OF PMBOK AND AGILE METHODS PROCESSES

Project Integration Management
• PMBOK: Develop Project Charter; Develop Preliminary Project Scope Statement; Develop Project Management Plan; Direct and Manage Project Execution; Monitor and Control Project Work; Integrated Change Control; Close Project.
• XP: Integration of software as soon as possible and as often as possible (mostly related to software code); collective code ownership; project velocity measurement.
• Scrum: Verification of management approval and funding during the planning phase; validation of development tools and infrastructure during the planning phase; strong change management procedure with product and sprint backlog; refinement of system architecture to support changes; postgame phase.
• FDD: Development of the overall system model.

Project Scope Management
• PMBOK: Scope Planning; Scope Definition; Create Work Breakdown Structure (WBS); Scope Verification; Scope Control.
• XP: User stories; release planning; small releases.
• Scrum: Perform domain analysis for building the domain model; development of a comprehensive product backlog list; development of a comprehensive sprint backlog; definition of the functionality that will be included in each release; selection of the release most appropriate for immediate development; review of progress for assigned backlog items.
• FDD: Perform domain analysis for building the domain model (step 1); build the features list and subject areas (step 2).

Project Time Management
• PMBOK: Activity Definition; Activity Sequencing; Activity Resource Estimating; Activity Duration Estimating; Schedule Development; Schedule Control.
• XP: Release planning; iteration planning.
• Scrum: Definition of the delivery date and functionality for each release; monthly iterations.
• FDD: Determine the development sequence (step 3); assign business activities to chief programmers (step 3); assign classes to developers (step 3); chief programmer work package.

Project Cost Management
• PMBOK: Cost Estimating; Cost Budgeting; Cost Control.
• XP: Not available.
• Scrum: Estimation of release cost during the planning phase.
• FDD: Not available.

Project Quality Management
• PMBOK: Quality Planning; Perform Quality Assurance; Perform Quality Control.
• XP: Emphasis on testing (unit, acceptance); based on simplicity; use of project standards.
• Scrum: Distribution, review and adjustment of the standards with which the product will conform; design review meeting; sprint planning meeting; sprint review meeting; daily scrum.
• FDD: Review meetings (all steps); code inspection and unit test (step 5).

Project Human Resource Management
• PMBOK: Human Resource Planning; Acquire Project Team; Develop Project Team; Manage Project Team.
• XP: Personnel rotation to various positions; pair programming; good working conditions (no overtime).
• Scrum: Appointment of project team(s) per release; team participation in sprint meetings; team participation in daily scrums.
• FDD: Appoint modeling team (step 1); appoint feature list team (step 2); appoint planning team (step 3); appoint feature team (step 3).

Project Communications Management
• PMBOK: Communications Planning; Information Distribution; Performance Reporting; Manage Stakeholders.
• XP: Use of system metaphor; customer always available; daily meetings; use of project standards.
• Scrum: Design review meeting; scrum meeting; sprint planning meeting; sprint review meeting; communication of standards to the project team.
• FDD: Review meetings (all steps).

Project Risk Management
• PMBOK: Risk Management Planning; Risk Identification; Qualitative Risk Analysis; Quantitative Risk Analysis; Risk Response Planning; Risk Monitoring and Control.
• XP: Create prototype to limit risk.
• Scrum: Initial assessment of risks during pregame; risk review during review meetings.
• FDD: Not available.

Project Procurement Management
• PMBOK: Plan Purchases and Acquisitions; Plan Contracting; Request Seller Responses; Select Sellers; Contract Administration; Contract Closure.
• XP: Not available.
• Scrum: Not available.
• FDD: Not available.
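One illustrative way to carry the comparison in Table I forward for further analysis is to encode it in a simple data structure. The sketch below is not part of the original study; it records only whether each knowledge area is addressed at all by each method, as a condensed reading of Table I:

```python
# Condensed, illustrative encoding of Table I: True if the agile method addresses
# the PMBOK knowledge area at least partially, False if "Not available".
coverage = {
    "Integration":    {"XP": True,  "Scrum": True,  "FDD": True},
    "Scope":          {"XP": True,  "Scrum": True,  "FDD": True},
    "Time":           {"XP": True,  "Scrum": True,  "FDD": True},
    "Cost":           {"XP": False, "Scrum": True,  "FDD": False},
    "Quality":        {"XP": True,  "Scrum": True,  "FDD": True},
    "Human Resource": {"XP": True,  "Scrum": True,  "FDD": True},
    "Communications": {"XP": True,  "Scrum": True,  "FDD": True},
    "Risk":           {"XP": True,  "Scrum": True,  "FDD": False},
    "Procurement":    {"XP": False, "Scrum": False, "FDD": False},
}

# Knowledge areas that at least one of the three agile methods leaves unaddressed
gaps = [area for area, methods in coverage.items() if not all(methods.values())]
print("Knowledge areas with gaps:", gaps)
```

Running this lists Cost, Risk and Procurement as the areas with gaps, which matches the observations summarized in the conclusion below.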
V. CONCLUSION
Table I shows that agile methods do not define all the facets needed to cover all aspects of project management in the traditional sense. This was partially expected, since traditional project management processes are fully defined, whereas agile methods are considered "empirical". However, from this study we can conclude the following. Agile methods give emphasis to the following knowledge areas: • Scope management, since emphasis is given to managing requirements. • Human resource management, since emphasis is given to team work. • Quality management: even though not formally defined, the use of standards, testing and frequent reviews is promoted. On the other hand, agile methods do not fully address the following knowledge areas: • Risk is not managed explicitly, • Cost management is not part of the agile methodologies, and • Procurement management is not addressed at all.
This implies that connecting agile methods with PMBOK will benefit the software project management community. The next step of this work is the detailed mapping between PMBOK processes and the agile methodologies.

REFERENCES
[1] F. Brooks Jr., The mythical man-month: essays on software engineering — Anniversary ed., Addison Wesley Longman, 1995.
[2] R. Charette, Why software fails, IEEE Spectrum, September 2005.
[3] J.S. Reel, Critical success factors in software projects, IEEE Software, 16(3), 19-23.
[4] T. Chow and D-B. Cao, A Survey Study of Critical Success Factors in Agile Software Projects, The Journal of Systems and Software, doi: 10.1016/j.jss.2007.08.020, in press.
[5] A. Carmichael and D. Haywood, Better Software Faster, Prentice-Hall NJ, 2002.
[6] K. Beck, Extreme Programming Explained: Embrace Change, Addison-Wesley Professional, 1999.
[7] P. Coad, E. Lefebvre, and J. De Luca, Java Modeling in Color with UML: Enterprise Components and Process, Prentice Hall, 1999.
[8] A. Carmichael and D. Haywood, Better Software Faster, Prentice-Hall NJ, 2002.
[9] S. Augustine and S. Woodcock, Agile Project Management, CC Pace, Available at www.ccpace.com.
[10] PMI Institute, A Guide to the Project Management Body of Knowledge, PMI Standard Committee, 2004.
[11] U.S. Department of Defense, Extension to: A Guide to the Project Management Body of Knowledge, First Edition, Version 1.0, 2003.
[12] International Project Management Association, IPMA Competence Baseline, Version 3.0, Van Haren Publishing, 2006.
[13] M. Sliger, Relating PMBOK Practices to Agile Practices, Available at http://www.stickyminds.com
[14] G. Alleman, PMBOK and agile article, Available at http://herdingcats.typepad.com/my_weblog/2007/08/pmbok-andagile.html
[15] K. Auer and R. Miller, Extreme Programming Applied: Playing to Win (The XP Series), Addison Wesley - New York, 2002.
[16] S. W. Ambler, Agile Modeling: Effective Practices for Extreme Programming and the Unified Process, Wiley and Son, Inc., 2002.
[17] K. Beck and M. Fowler, Planning Extreme Programming, Addison-Wesley, 2000.
[18] H. Takeuchi and I. Nonaka, The New New Product Development Game, Harvard Business Review, January-February 1986.
[19] J. Coplien, Borland Software Craftsmanship: A New Look at Process, Quality and Productivity, Proceedings of the 5th Annual Borland International Conference, June 1994, Orlando, Florida.
[20] K. Schwaber, Controlled Chaos: Living on the Edge, American Programmer, April 1996.
[21] K. Schwaber, Agile Project Management with Scrum, Microsoft Press, 2004.
[22] L. Rising and N.S. Janoff, The Scrum Software Development Process for Small Teams, IEEE Software, July-August 2000.
[23] K. Schwaber, Advanced Development Methods. SCRUM Development. Available at http://jeffsutherland.com/oopsla/schwapub.pdf
An Expert System for Diagnosing Heavy-Duty Diesel Engine Faults Peter Nabende and Tom Wanyama Faculty of Computing and Information Technology Makerere University, Kampala, Uganda Abstract – Heavy-Duty Diesel Engines (HDDEs) support critical and high cost services, thus failure of such engines can have serious economic and health impacts. It is necessary that diagnosis is done during both preventive maintenance and when the engine has failed. Because of their complexity, HDDEs require high expertise for the diagnosis of their faults; such expertise is in many cases scarce, or just unavailable. Current computerized tools for diagnosing HDDEs are tied to particular manufacturer’s products. In addition, most of them do not have the functionality that is required to assist inexperienced technicians to completely diagnose and repair HDDE faults, because most of the tools have only the capability to identify HDDE faults. These tools are not able to recommend corrective action. This paper presents an easy to use expert system for diagnosing HDDE faults that is based on the Bayesian Network Technology. Using Bayesian Networks simplified the modeling of the complex process of diagnosing HDDEs. Moreover, it enabled us to capture the uncertainty associated with engine diagnosis, and to incorporate learning capabilities in the expert system. Keywords: Diagnosis, Bayesian Belief Networks, Uncertainty I.
INTRODUCTION
Today, engines are an important component of many systems. Diesel engines are normally designed for use in heavy duty applications as in trucks, construction mobile machines, buses, and marine propulsion. Heavy-Duty Diesel Engines (HDDEs) are complex pieces of machinery, containing three times as many parts and components compared to engines used in medium-duty applications [1]. In most situations HDDEs support costly services and failure can have serious economic and health impacts. Diagnosis of diesel engines is therefore necessary and can be done both during preventive maintenance and when the engine has failed. Gelgele and Wang [2] give reasons as to why maintaining a high level of engine reliability by efficient fault diagnosis is necessary. Firstly, the downtime of the HDDE is expensive. Secondly, certain malfunctioning conditions can be a threat to safety of both human beings and the environment. When diagnosing a HDDE, a systematic search for the source or cause of a fault is done. A number of decisions are made regarding the faults and their causes, the components that are to be repaired and those that are to be rebuilt. A skilled mechanic is the human expert usually concerned with diagnosing HDDE problems. A mechanic can offer creative ideas, solve difficult engine faults, and perform routine tasks with regard to HDDE maintenance. When dealing with
engine faults the mechanic first obtains facts about the fault (case facts), and stores them in his Short-Term Memory (STM) [3]. The mechanic then reasons about the fault by combining the STM facts with Long Term Memory (LTM) knowledge (domain knowledge). The mechanic then infers new problem information and eventually arrives at a conclusion about the fault. A good approach to improving fault diagnosis efficiency is to capture and reuse know-how that exists in the heads of the key mechanics who really understand how a complex HDDE works. This know-how can be made accessible, and usable by: machine operators, less skilled mechanics, and experienced mechanics. Such an approach has been taken before in the form of Expert Systems [3]. The take up of expert systems, however, has not been great as they have simply provided rigid structures to guide diagnostic analysis [4]. There are also computerized tools that have been developed and in which engines can be diagnosed; some heavy duty vehicles have what is called ‘On Board Diagnostics (OBD)’. According to Barkai [5], OBD was developed to provide improved, information-rich visibility to complex operation and control mechanism that many service technicians treat as a black box. When a simple correlation exists between the OBD malfunction data and its root cause, OBD is a useful troubleshooting tool, but it provides little assistance in diagnosing more complex situations such as multiple fault codes or inconsistent information [5]. Classical auto-diagnostic solutions that most repair shops use are based on measurements of physical parameters such as temperature, flow and resistance [6]. In certain cases, auto-diagnosis of Electronic Control Units (ECUs) is used on many systems running on HDDEs. Although computer microchips of ECUs can store a diagnostic trouble code if certain problems occur in the engine [2], a stored trouble code does not always pinpoint the cause of the problem. It also requires thorough understanding and interpretation of what each code implies, and identification of which circuits the ECUs can monitor [2]. A variety of methods for engine fault detection and diagnosis have been sought, however, engine repair shops mostly in developing countries are still using traditional maintenance techniques with insignificant changes of automation [2]. Moreover, a few individuals are the definitive sources of knowhow for diagnosing HDDE faults which represents a bottleneck in the diagnosis process. In this paper, we present a prototype of an intelligent Heavy Duty Diesel Engine Expert System (HDDE-ES). The major aim of developing this system is to provide instant expert guidance in dealing with HDDE problems when time is limited and when
decisions have to be made in a situation where a mechanic specialized in a subsystem of a HDDE is not available. Because of the complexity of a HDDE system, Bayesian Belief Network (BBN) technology is used for modeling expert knowledge. This paper is organized as follows: the next section explores related work; in the third section we present the conceptual framework in which the expert system works; in the fourth section we present the development of the HDDE expert system; results are discussed in the fifth section; in the sixth section we describe testing and validation of the HDDE expert system; and we conclude in section seven. II.
RELATED WORK
There is a lot of work that has been done concerning diagnosis of engine faults. In this section, we do not focus on surveying most of the systems developed for diagnosis of engine faults, however, we describe some of the related work that has been done and associated limitations or challenges. Zaiyadi [7] developed an expert system prototype for car maintenance that uses rule-based inference. According to Zaiyadi [7], the expert system was developed using prolog version 4.0 programming language. “The major problem with this version of Prolog is that it cannot be run correctly on Windows XP or any other latest version of operating system because it will not be compatible, with possibility of errors” [7]. The other challenge with this expert system lies in the techniques used for modeling expert knowledge – rules. A lot of work is required in refining errors and rules before it can really be used in a real situation. Gelgele and Wang [2] developed Expert Engine Diagnosis System (EXEDS) that uses rules for searching for possible causes of systems that have been listed. Although the EXEDS prototype successfully automates the engine diagnosis environment, it is not quite a complete and reliable system [2]. According to Gelgele, no further work was done towards the completion of the system because of financial constraints. Magnus [8] presented in his Master’s thesis, work on “Fault Isolation Utilizing Bayesian Networks”. Magnus [8] mainly developed Bayesian Networks for Scania systems. Magnus’s networks can be made to understand corralling faults, but there was no implementation of the work in to a prototype expert system. III.
FRAMEWORK IN WHICH THE PROPOSED EXPERT SYSTEM WORKS
The HDDE-ES is mainly concerned with the diagnosis of heavy duty diesel engine faults and is built on BBN Technology. BBNs facilitate reasoning by incorporating knowledge, experience, and content to evaluate large, complex systems and to mimic the thinking of experts. This technology closely models human reasoning due to its ability to rationally deal with uncertainty and its ability to augment “intelligence” with “wisdom” [9]. A. Diesel Engines An Engine converts chemical energy stored in petroleum fuels into useful mechanical energy. Diesel engines are normally designed for use in heavy-duty applications such as
trucks, construction mobile machines, and buses. HDDEs are complex machinery containing more parts than engines used for light- to medium-duty applications. From a maintenance viewpoint, a HDDE is a major cost item and the most complex component for fault detection and diagnosis [2]. Not only are most failures hidden, they can involve several aspects: mechanical, electrical, thermal or any combination thereof. At the same time, any single failure symptom can be caused by several failure sources [2]. A thorough understanding of the operation of a HDDE is therefore useful in understanding the architectural, functional, and behavioral details of the system. For a HDDE to operate efficiently, the individual subsystems should work satisfactorily, and they should also work in tandem. In order to obtain the states for individual subsystems and components as well as the entire engine, a thorough grouping of the engine subsystems and components is essential. The following subsystems were considered as major components of a HDDE: starting system, cooling system, lubrication system, fuel injection system, and intake and exhaust system. Components that are mounted on the engine or engine skid include the following: drive train components, valve train and timing components, governor/control, and turbocharger/blower. Knowledge models designed to represent diagnosis of these components take on different levels of detail, ranging from the overall engine system down to subsystems, components and parts.

B. Bayesian Belief Networks

A Bayesian Belief Network (BBN) is a 'causal reasoning' tool that has in the last decade attracted an increasing number of researchers in the field of applied artificial intelligence, where uncertainty is an intrinsic characteristic of the problem domain. BBNs summarize a problem domain as a set of nodes interconnected with arcs that form a dependency-structured network. A graphical illustration of a simple BBN is shown in Fig. 1. The node from which an arc originates is the parent node, and it influences the node where the arc terminates, the child node. The relationship between the connected variables is described by conditional probabilities associated with the states of the parent variables and the child variable. The relationship between a set of parent variables and child variables is described by the states of the parents and the conditional probabilities that relate the state of the parent to the state of the child.

Fig. 1: Components of a Belief Network Relationship (parent variables connected to a child variable by directed arcs)

Bayesian and conditional probability theory provide a mathematical framework for updating one's belief in the occurrence of an outcome or event given the observation of
certain pieces of evidence, emulating the way an expert might be expected to make decisions within an uncertain environment. A formal description of the propagation of uncertainty is as follows [10]. Let Y be a vector which represents the set of nodes (y_1, y_2, y_3, …, y_i) influencing a specific node X, and let the states of the variable at node X be denoted X_k. The confidence that X assumes a specific value k is defined as

p(X_k) = Σ_{i=1}^{n} p(X_k | y_i^j) p(y_i^j),   (1)

where p(X_k) is the confidence associated with node X assuming the state k, p(X_k | y_i^j) is the confidence that node X assumes the state k when causal node i assumes the value j, and p(y_i^j) is the confidence that causal node i assumed state j. A number of commercially available software packages exist which enable Bayesian Networks to be constructed using standard desktop personal computers, for example Netica from Norsys Corp [11] and Hugin from Hugin Expert A/S [12].
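To illustrate the kind of computation behind (1), the sketch below marginalizes a child node's belief over the states of a single parent. The state names and probabilities are invented for illustration and are not taken from the HDDE models.

```python
# Belief in each state of a child node X given beliefs over its parent's states,
# i.e. p(X = k) = sum_j p(X = k | Y = j) * p(Y = j), the marginalization behind (1).
parent_belief = {"worn": 0.2, "ok": 0.8}                      # p(Y = j), hypothetical
conditional = {                                               # p(X = k | Y = j), hypothetical CPT
    "worn": {"low_pressure": 0.7, "normal_pressure": 0.3},
    "ok":   {"low_pressure": 0.1, "normal_pressure": 0.9},
}

child_belief = {}
for j, p_y in parent_belief.items():
    for k, p_x_given_y in conditional[j].items():
        child_belief[k] = child_belief.get(k, 0.0) + p_x_given_y * p_y

print(child_belief)   # {'low_pressure': 0.22, 'normal_pressure': 0.78}
```

Tools such as Netica and Hugin perform this kind of propagation over whole networks of such conditional probability tables.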
IV. EXPERT SYSTEM DEVELOPMENT
In the following subsections we show the architecture of the HDDE-ES, the theory behind the BBN technique used, and the implementation of the BBN models.

A. HDDE-ES Architecture

The HDDE-ES is concerned with the most common HDDE faults and failures that can be diagnosed off-line. Primarily, the HDDE-ES comprises a Graphical User Interface (GUI) consisting of a problem selection interface and a problem diagnosis interface. The HDDE-ES architecture takes the form shown in Fig. 2.

Fig. 2: HDDE-ES Architecture (the user interacts through the GUI's problem selection and problem diagnosis interfaces, connected via an Application Programming Interface and a model selector to a knowledge base of Bayesian Network models with reasoning specifications)

The diagnosis of a particular HDDE problem is modeled using BBN technology. The Knowledge Base thus comprises a variety of BBNs modeling the
diagnosis of various HDDE faults. These models are used during the diagnosis process. The BBN models are made available through a model selector. An Application Programming Interface (API) is used to connect the problem selection interface to the problem diagnosis interface.
B. Knowledge Acquisition and Representation
Knowledge acquisition is believed to be the critical and most difficult task in the development of an Expert System. As stated in [2], this is because "the knowledge to be represented should not only be based on correct facts and data, but should also include all possible alternatives". In spite of these inherent difficulties, the diagnostic knowledge was captured relatively easily, for the following reasons. The utilization of Bayesian Networks enabled a clear and compact way of representing the knowledge, allowing the possibility of visual guidance. The developer's past experience as a trainee in the maintenance and repair of Heavy-Duty Diesel Engines (HDDEs), and references to HDDE troubleshooting and repair manuals from companies such as Caterpillar and Komatsu, also simplified the knowledge acquisition process. The developer also worked with expert key personnel in HDDE maintenance to capture expert diagnosis information and associated links to relevant documentation.
C. Knowledge Modeling for the HDDE-ES
To build the knowledge models using Dezide Advisor [9], the process of diagnosis is reversed as compared to the normal diagnosis sequence. This requires approaching model building with the understanding that the model developer already knows the causes and solutions for the problem being modeled. Heckerman et al. [13] presented a method for performing sequential troubleshooting using BBNs. This work is based on some of their ideas, which are presented in the following. When troubleshooting under uncertainty, we need to compute the probabilities that components have failed. Bayesian Networks are used to compute these probabilities. Bayesian Networks are constructed by drawing arcs from cause to effect. Given observations of some nodes, we can use a Bayesian Network inference algorithm to compute the probability that any or all of the system components are faulty. The conditional independencies represented by the Bayesian Network make this computation practical in this case, and for most real-world problems as well. Heckerman et al. [13] described a set of assumptions under which it is possible to identify an optimal sequence of observations and repair actions in time proportional to the number of components in a device. The approach that was taken was an extension of the troubleshooting approaches described in [15] and [16]. We suppose that we want to diagnose a device which has n components represented by variables c1, ..., cn, and that each component is in exactly one of a finite set of states. In a HDDE, the components can be "fuel piping," "air cleaner," "piston," or "cylinder walls".
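Under such a single-fault view, the posterior probability that each component is the culprit given one observed symptom can be obtained by a simple Bayes update. The sketch below uses hypothetical component priors and likelihoods, not the paper's actual engine models:

```python
# Sketch of a single-fault Bayesian update: given one observed symptom, the
# posterior for each candidate component c_i is proportional to
# prior(c_i) * P(symptom | c_i is the faulty component).
# Component names and probabilities are hypothetical.

priors = {"fuel piping": 0.30, "air cleaner": 0.25,
          "piston": 0.25, "cylinder walls": 0.20}
likelihood_of_symptom = {"fuel piping": 0.80, "air cleaner": 0.50,
                         "piston": 0.20, "cylinder walls": 0.10}

unnormalized = {c: priors[c] * likelihood_of_symptom[c] for c in priors}
total = sum(unnormalized.values())
posterior = {c: v / total for c, v in unnormalized.items()}

for component, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{component}: {p:.2f}")
```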
Heckerman et al. [13] follow the single-fault assumption, which specifies that exactly one component is malfunctioning and that this component is the cause of the problem. If $p_i$ denotes the probability that component $c_i$ is abnormal given the current state of information, we must have $\sum_{i=1}^{n} p_i = 1$ under the single-fault assumption. Each component $c_i$ has a cost of observation, denoted by $C_i^o$ (measured in time and/or money), and a cost of repair $C_i^r$. The costs of observation and repair of any component do not depend on previous repair or observation actions. If we observe and possibly repair components in the order $c_1, \ldots, c_n$, then the Expected Cost of Repair, denoted $ECR(c_1, \ldots, c_n)$, is

$$
ECR(c_1, \ldots, c_n) = \left(C_1^o + p_1\left(C_1^r + C^p\right)\right)
+ (1 - p_1)\left(C_2^o + \frac{p_2}{1 - p_1}\left(C_2^r + C^p\right)\right)
+ (1 - p_1 - p_2)\left(C_3^o + \frac{p_3}{1 - p_1 - p_2}\left(C_3^r + C^p\right)\right) + \ldots
= \sum_{i=1}^{n}\left[\left(1 - \sum_{j=1}^{i-1} p_j\right) C_i^o + p_i\left(C_i^r + C^p\right)\right] \qquad (2)
$$
That is, we first observe component $c_1$, incurring cost $C_1^o$. With probability $p_1$ we find that the component is faulty and repair it (and the device), incurring cost $C_1^r + C^p$. With probability $1 - p_1$ we find that the component is functioning properly, and observe component $c_2$. With probability $\frac{p_2}{1 - p_1}$ we find that $c_2$ is faulty and repair it; and so on. Now consider a diagnosis sequence where we reverse the observation and possible repair of components $c_k$ and $c_{k+1}$. All terms in the Expected Cost of Repair of this sequence will be the same as those for the original sequence, except terms $i = k$ and $i = k+1$. Therefore, we obtain

$$ECR(c_1, \ldots, c_n) - ECR(c_1, \ldots, c_{k-1}, c_{k+1}, c_k, \ldots, c_n) = p_{k+1} C_k^o - p_k C_{k+1}^o \qquad (3)$$

Consequently, the sequence $c_1, \ldots, c_n$ has a lower (preferred) ECR than that with $c_k$ and $c_{k+1}$ reversed if and only if $p_k / C_k^o \geq p_{k+1} / C_{k+1}^o$. It follows that the optimal sequence is thus given by the following plan:
1. Compute the probabilities of component faults $p_i$ given that the system is not functioning.
2. Observe the component with the highest ratio $p_i / C_i^o$.
3. If the component is faulty, then repair it.
4. If the component was repaired, then terminate. Otherwise, go to step 2.
In the above plan, if a component is repaired in step 3, we know from the single-fault assumption that the device must be repaired, and the troubleshooting process can be stopped. This algorithm also works well if the single-fault assumption is lifted, in which case step 1 will take into account new information gained in steps 2 and 3, and step 4 becomes: 4. If the device is still malfunctioning, go to step 1. Unobservable components can be included in this procedure provided $C_i^o$ is set to the repair cost $R_i$ and $C_i^r$ is set to zero. Heckerman et al. [13] also introduced a theory for handling service calls (used when the expected cost of the optimal troubleshooting step is higher than the cost of a service call), an approximate theory for handling systems with multiple faults, and a theory for incorporating non-base observations (observations not related to components, but which potentially provide useful information). In the companion paper [14], the method was further advanced to enable configuration changes in the system to provide further useful information that can potentially lower the cost of the optimal troubleshooting sequence.
D. Implementation
Implementation of the HDDE-ES involved creating Graphical User Interfaces that introduce the Expert System and enable the user to select engine problems for diagnosis. For each of the engine problems, a Bayesian Network model was built for diagnosing that particular problem using Dezide Advisor [9]. Fig. 3 and Fig. 4 show the knowledge model for the "Poor Starting Problem" in the TSS and the graphical network, respectively. The Dezide Advisor package constitutes three different software tools: the Troubleshooter, the Authoring tool, and the Troubleshooter Model Selector (TMS) Editor. In the BBN model building step, the first task is to create a Troubleshooter Specification (TSS) file. The next task is to add causes and assign probabilities to the causes through a cause probability editor [9].
Fig. 3: Knowledge Model for Poor starting problem
Fig. 4. Graphical Network for Poor Start Problem
The probabilities are used by the Troubleshooter to enable it to determine the most efficient troubleshooting sequence [15]. Actions are added and specified through an action editor and a constraints-for-actions window [9]. To add functional questions, a question wizard is used that can quickly categorize the questions and assign them to the tasks they are intended to support. After a BBN model is built, it has to be compiled and saved as a net file by adding it to the model selector using the TMS Editor [9]. In the TMS Editor a category is created, a sub-level is then added, and a title is entered in its Label field.
V. RESULTS
We used the C-sharp programming language for developing the problem selection interface. The window for the selection of a problem is shown in Fig. 5. When a particular problem is selected, the user is presented with the troubleshooter (Fig. 6) from which the user can diagnose the engine problem.
Fig. 5. Problem Selection Interface
Fig. 6. Dezide Troubleshooter Interface for Problem Diagnosis
The Troubleshooter utilizes the advantages of Bayesian Network technology to determine the most efficient sequence of questions and steps for troubleshooting the problem. It takes into consideration various factors such as the most likely cause of the problem, the "cost" of each troubleshooting step (in terms of time, difficulty, risk, and money), and how much information can be gathered from each step [9]. The troubleshooter adjusts its various internal variables, and therefore its diagnosis, at each step, depending on the answers to the questions and the results of each troubleshooting step.
VI. VALIDATION OF THE HDDE-ES
In order to validate the Expert System, the following areas were of concern: correctness, consistency, and completeness of the Bayesian Network models developed; the ability of the control strategy to consider information in the order that corresponds to the problem-solving process; appropriateness of information about how conclusions are reached and why certain information is required; and agreement of the Expert System's output with experienced mechanics' corresponding solutions or with solutions provided in troubleshooting manuals. BBN models were used to represent the diagnosis of different types of engine problems. The BBNs were also used as major components of correctness proofs. These proofs can be worthless if the underlying knowledge about the diagnosis of HDDEs is false in the domain. Validation of the BBN models involved reviewing them and comparing them to the models a mechanic would apply when diagnosing HDDE faults leading to particular types of engine problems. Results showed that some models lacked steps that the experienced mechanics considered should have been included. These BBN models were redeveloped incorporating the missing variables. The performance of the expert system was then validated against that of experienced mechanics involved in the maintenance and repair of HDDEs. A collection of carefully selected Cases was generated. The Test Cases were solved by a number of experienced mechanics as well as by the Expert System. The term "Case" is used to describe an entire diagnosis sequence, including user responses, for a particular cause. Three experienced mechanics were each given two different Test Cases and were allowed to criticize or agree with the steps performed. The purpose of this exercise was to see if the model could solve the problem and/or isolate the selected causes as the experienced mechanics would normally do. All mechanics agreed with the way the Expert System guided a user in making diagnoses. The Expert System made its diagnoses in a way similar to how the mechanics diagnose HDDEs. However, the Bayesian Network models that we created were not sufficient to cover all faults that can be associated with a HDDE. Additional models can, however, easily be added to the Expert System's Knowledge Base.
VII. CONCLUSION
In this paper, we have presented an Expert System for diagnosing HDDE faults which utilizes Bayesian Network technology. First of all, Bayesian Network models representing different types of engine problems and their causes were built. Probabilities were then associated with the causes. The method used involved development of a system of constraints enforcing the Bayesian Network to have the correct prior probabilities as specified by the knowledge engineer. Having Bayesian Networks or models with causes alone is insufficient: there is a need for an explicit recommendation as to which components should be repaired or replaced. To provide this type of support, the Bayesian Network models with only causes were extended, with each diagnosed node modeled by two additional nodes, in order to obtain sequential diagnostics in which the expert system recommends which step to perform next and when to stop. The expert system was tested and validated against the diagnostic process of experienced mechanics involved in the maintenance and repair of HDDEs, and against the troubleshooting procedures recommended in engine repair manuals, for selected Test Cases. Results showed that the Expert System made diagnoses similar to those of the experienced mechanics for a given number of selected test cases. By using Bayesian Network technology for representation of knowledge and as the reasoning mechanism, this expert system offers advantages over expert systems developed using other techniques for knowledge representation and reasoning (KRR), for example production rules and decision trees. Most important of all are the ability to easily account for uncertainty by use of probability theory and Bayes' Theorem, and the ability to integrate learning from data with expert knowledge, which is difficult to implement using other knowledge representation techniques. The HDDE-ES can further be improved by automating the updating of the knowledge base and reasoning steps when new information is gained from experience obtained after using the expert system for a given number of cases. Research in the techniques of Bayesian Networks has also gone in various directions; interesting work can be done in the implementation of similar expert systems that involve
new techniques that are extensions to Bayesian Networks, such as those derived from Dynamic Bayesian Networks and Hybrid Bayesian Networks.
REFERENCES
[1] R. Yunpeng, H. Tianyou, Y. Ping, and L. Xin, "Approach to Diesel Engine Fault Diagnosis Based on Crankshaft Angular Acceleration Measurement and its Realization," Proceedings of the IEEE International Conference on Mechatronics & Automation, vol. 3, pp. 1451-1454, July 2005.
[2] H.L. Gelgele and K. Wang, "An Expert System for Engine Fault Diagnosis: Development and Application," Journal of Intelligent Manufacturing, vol. 9, issue 6, 1998, pp. 539-545.
[3] J. Durkin and J. Durkin, Expert Systems: Design and Development, 1st ed., Prentice Hall PTR, Upper Saddle River, NJ, USA, 1998.
[4] M. Sagheb-Tehrani, "Expert Systems Development: Some Issues of Design Process," ACM SIGSOFT Software Engineering Notes, vol. 30, New York: ACM Press, 2005, pp. 1-5.
[5] J. Barkai, "Vehicle Diagnostics – Are You Ready for the Challenge?" in ATTCE 2001, vol. 5, SAE, October 2001.
[6] T. Roussat, "Diagnostic Stations, Expert Systems and Physical Measurements," Proceedings of the Institute of Mechanical Engineers International Conference on Automotive Diagnostics, London: IMechE, 1990.
[7] M.F.B. Zaiyadi, "Expert System for Car Maintenance and Troubleshooting," unpublished.
[8] J. Magnus, Fault Isolation Utilizing Bayesian Networks, Department of Numerical Analysis and Computer Science, Stockholm, Sweden, 2004, p. 58.
[9] Dezide ApS, Dezide Advisor Users Guide, Dezide ApS, Aalborg, Denmark, 2004.
[10] D. Heckerman and M.P. Wellman, "Bayesian Networks," Communications of the ACM, vol. 38, pp. 27-30, March 1995.
[11] Norsys Software Corporation, "Netica Bayesian Network Software," 2006.
[12] Hugin Expert A/S, HUGIN, Aalborg, Denmark, 2004.
[13] D. Heckerman, J. Breese, and K. Rommelse, "Decision-Theoretic Troubleshooting," Communications of the ACM, vol. 38, pp. 49-57, 1995.
[14] J. Breese and D. Heckerman, "Topics in Decision-Theoretic Troubleshooting: Repair and Experiment," in Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence, Portland, OR, pp. 124-132, Morgan Kaufmann, August 1996.
[15] J. Kadane and H. Simon, "Optimal Strategies for a Class of Constrained Sequential Problems," Annals of Statistics, vol. 5, pp. 235-255, 1977.
[16] J. Kalagnanam and M. Henrion, "A Comparison of Decision Analysis and Expert Rules for Sequential Analysis," in P. Besnard and S. Hanks (eds.), Uncertainty in Artificial Intelligence, New York: North Holland, pp. 271-281, 1990.
Interactive Visualization of Data-Oriented XML Documents Petr Chmelar, Radim Hernych, Daniel Kubicek Brno University of Technology, FIT [email protected], { xherny00 | xkubic17}@stud.fit.vutbr.cz
Abstract - Much data is stored and exchanged in XML nowadays. However, understanding the information included in large data-oriented XML instances may not be simple, in contrast to the presentation of document-oriented XML instances. Moreover, acquiring the desired knowledge from complex XML data is hard or even impossible. Although we propose two solutions for simple presentation and further analysis of the information, the main goal of the paper is to get feedback from researchers and professionals who need to deal with large data-oriented XML, in a discussion on data visualization, modeling and understanding.
I. INTRODUCTION
In many applications, complex structured data have been used. For the purpose of storing and exchanging a wide variety of data, XML is playing an increasingly important role, and it is widely supposed by many vendors to be the universal solution. XML (Extensible Markup Language [7]) is a simple, flexible text format derived from SGML (ISO 8879), originally designed to meet the challenges of large-scale electronic publishing. It is a standardized, flexible markup language and format for creating, storing, exchanging and publishing structured data for specific domains or applications. Data specified in XML is tree-structured using properly nested (pairs of) markup tags, which represent its metadata; more in [7]. According to [2], XML instances may be divided into two groups: document-oriented and data-oriented instances. Document-oriented XML documents aim to structure and format textual information (in a standard way). The meaning is usually present without the tags (markup elements). Examples of document-oriented XML are XHTML, RSS documents or some Wiki pages. By contrast, data-oriented XML aims mainly at the domain of machine (computer) readable data, and it is usually not suitable for viewing by a user without specific software that is able to interpret the particular document or its part and present it to the user. In these documents, tags are used both to create the structure of the XML instance and to give particular semantics to the data of the document. If the semantic tags were removed, the data would usually lose its information value. Examples of data-oriented XML instances are database dumps, configuration files, serialized objects or standardized documents using XML like UML (Unified Modeling
Language), SOAP (Simple Object Access Protocol), AJAX (Asynchronous Javascript And XML) or MPEG-7 (Moving Picture Experts Group: Multimedia Content Description Interface) data standards. We believe that an appropriate visualization of the data together with its metadata gives the user better insight into both the data structure and the application itself. However, the main question of data visualization is "How to visualize?" We must admit that we do not know, because we do not have the application-dependent data; in general, we cannot. Thus, we leave the major part of the work to the user, who has the data and who has, or interactively gets, an idea of the information it should contain. So, we focus on interactive visualization and modeling of the data together with its metadata. The paper is organized as follows. After this introduction, a report on the state of the art and a theoretical analysis is presented. The third chapter briefly presents our solutions to the problem. The basic idea was to enrich UML to model not only the metadata; this is our first proposed solution, and the second is a table-oriented approach. Both techniques are demonstrated using a simple case study. Further work and the conclusions are in chapter four.
II. REPORT ON STATE OF THE ART
XML tends to model the logical data structure similarly to common modeling techniques like the Entity-Relationship model (E-R) or UML [UML]. There are many tools modeling the XML structure using DTD or XML Schema [8]; e.g. Liquid XML Studio or tools within various development environments (Oracle JDeveloper, Eclipse, Microsoft Visual Studio) can visualize the structure similarly to the class and package diagrams in UML. However, the structure holds only a minor part of the information of an XML document; the rest is present in the values of tags and their attributes. This information cannot be modeled in the manner of UML's structure diagrams. The common way of XML data visualization is to use the tree structure of an XML document [7], e.g. DOM (Document Object Model [10]) in common WWW browsers. However, this is a crude approach, because XML does not itself define how to display the document. Browsers are primarily designed to transform the XML according to an ad-hoc XSL style sheet [9] or, of course, some (X)HTML specification. The
transformation-based visualization approach has its major disadvantage: the ad-hoc property. There is no way to present information contained in a general document with unknown or previously unseen logical and semantic structure. This is common to all deterministic techniques. The substance we miss while using deterministic techniques is the deduction-induction process common to people (informally called thinking). Although we thought hard, we have not found any artificial intelligence or data mining technique that is able to solve the general presentation problem, nor have we found a serious one in the literature [5]. The only way (to publish this paper) is to keep a human in the loop. This feature is commonly called interactivity. Presuming users really want to simply understand unknown XML documents, there are a few techniques that can be used to visualize information present in hierarchical data [3]. We consider the techniques illustrated in the previous figure (nice but) inappropriate for visualizing large hierarchies, e.g. those having hundreds of nodes. This is similar to using a standard DOM tree (see Fig. 2). For space reduction, some approaches using tree rewriting have been published (e.g. [4]). However, we find that this technique does not help much in understanding large documents, because of the need to specify "boxing" rules for elements in advance in the transformation process, which may result in a graphical representation with really high entropy. Similarly, some visualization techniques have been introduced that use the third dimension of space to suppress the currently uninteresting information. For instance, there are cone
rooted trees, fisheye application on the treemap layout, or other kinds of perspective and hyperbolic geometry applied on radial trees, as surveyed in [1]. We decided not to use any of the described techniques because they are not intuitive and interactive enough, which seems to be important for people meeting new information. Moreover, these approaches are not amenable to further (visual) analysis, e.g. using charts or (the famous pie) graphs.
III. PROPOSED SOLUTIONS
The basis of data-oriented XML visualization is the data structure. Of course, XML documents are usually modeled as a tree, similarly to the DOM. But we have reformulated the question as follows: what data structures are contained in a data-oriented XML? If we omit mixed content (of text and elements), typical for document-oriented XML instances, the answer is surprisingly easy. There are simple elements, sequences (structures of ordered elements, object-like), collections of such elements, and their combinations. This information can be found in the XML Schema (XSD) or derived directly from the XML where the schema is not present (e.g. in the way we do).
TABLE I. XML Illustration [6] (8.3 KB total): excerpt of a weather.com <weather ver="2.0"> document showing the local observation time (10/22/07 6:30 PM), location (Brno, Czech Republic), temperature, sky condition (Mostly Cloudy) and <wind> data.
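One simple way to derive this structural information directly from an instance such as the Table I excerpt, when no schema is present, is to inspect repeated child tags. The sketch below is a rough illustration of the idea (with a made-up document), not the algorithm actually used by the tools described later:

```python
# Rough sketch: classify the content of each element in a data-oriented XML
# instance as a simple value, a sequence of named sub-elements, or a collection
# (repeated sibling elements with the same tag). Made-up sample document.
import xml.etree.ElementTree as ET
from collections import Counter

def classify(elem):
    children = list(elem)
    if not children:
        return "simple"
    tag_counts = Counter(child.tag for child in children)
    if all(count == 1 for count in tag_counts.values()):
        return "sequence"
    return "collection"

def walk(elem, depth=0):
    print("  " * depth + f"{elem.tag}: {classify(elem)}")
    for child in elem:
        walk(child, depth + 1)

root = ET.fromstring(
    "<forecast><loc>Brno</loc>"
    "<day><hi>10</hi><lo>5</lo></day>"
    "<day><hi>12</hi><lo>6</lo></day></forecast>")
walk(root)
```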
Structure diagrams in UML [UML] model the sequences as objects. The problem is that a collection of these aggregated elements would have to be modeled as a relation between the elements. This is not a problem in the case of meta-modeling in UML, but it is a problem for our purpose. This means we cannot simply use
(structure diagrams of) UML for modeling the data together with the metadata; however, this was our first idea.
A. Case Study
As an example, we use a week-long weather forecast from weather.com® [6], as illustrated in Table I. The simplest way to interactively present the data is to show the document model [10] in a navigable tree component. Of course, this is not satisfactory for our purpose (to intuitively analyze information in complex documents, e.g. MPEG-7 or SOAP), as you can see in Fig. 2.
B. UML-oriented approach
Our first proposed approach for modeling data together with the metadata was inspired by UML Structure diagrams [UML]. Class and package diagrams model elements as entities (classes), but our approach adds the data of their attributes and simple sub-elements in a unified way (except the icon). Moreover, it visualizes collections of elements as lists. Mixed complex elements, especially those containing both a sequence of simple elements (XSD maxOccurs is 1) and an (unbounded) number of other elements, are modeled using the relation, as you can see in Fig. 3. The proposed tool is called VisualXML1. It can interactively display the XML data model and the DOM tree as described above, including some additional features, e.g. a "hawkeye" for
Fig. 3. Modeling using VisualXML.
quick orientation in large data models. We do not presume the data has any kind of definition. The tool is also capable of visually editing such XML documents: removing, editing, adding and cloning complex elements. Although VisualXML is capable of interactively filtering the displayed elements, further analysis of the visualized information is limited to that.
C. Table-oriented approach
We found that we need a more advanced analysis of the information present in XML documents. Our advanced requirements include standard visualization techniques (charts and graphs, [5]) as well as simple integration of the data into existing applications (e.g. via a database) for a wide variety of XML data for research purposes. Consequently, we would like to analyze the data using Online Analytical Processing (OLAP) and data mining techniques to acquire knowledge from proprietary XML data, e.g. for security purposes. Thus we have decided to develop a more advanced application.
Fig. 2. Illustration of the DOM tree.
1 The release of VisualXML together with its Java source code under GNU General Public License is available at http://www.fit.vutbr.cz/~chmelarp/public/.
The idea was to use standard relational tables to display the XML structure. However, relational tables are often criticized for being flat and poorly suited to managing the hierarchical and object data present in XML. Even so, today's object-relational solutions seriously outperform pure object databases, and this inspired us. The idea was to use tables for the purpose they are best at: the presentation of large collections of records. The crucial part of applying the idea is to find collections of records in the XML. As we found out, interactive discovery of such structures in XML is surprisingly easy, because XML documents usually consist of elements and collections of them. For most purposes, it is typical to have a collection of elements that are not (unbounded) trees themselves. This means we can display the collection and unroll the sequences of aggregated sub-items. Moreover, there is a good chance that a theoretically unbounded collection of elements is bounded to some small value in practice (e.g. periods or phone numbers). Consider displaying these data using a flat table that is column-flexible, i.e. a table in which the columns are not defined strictly (e.g. am, pm or phone1, phone2, ...). This might work. An example of the proposed application, called XMLAD2, and of its modeling abilities is shown in Fig. 4. The limitation of the table-oriented principle can also be seen there: nested collections must be modeled as relations. XMLAD is able to visualize the XML data effectively in the current development stage. The aggregation table (10 day records in Fig. 4) consists of the data and the metadata part, which it is able to unroll as (new) columns. In case this is not possible (or the user may so decide), the aggregated entity is displayed as a separate table in the same way as VisualXML does. Although XMLAD still has several development limitations, we consider it the best possible solution, because all our needs were met.
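The column-flexible flattening can be sketched as follows; this is a rough illustration of the idea with a made-up forecast document, not XMLAD's actual implementation:

```python
# Rough sketch of the table-oriented idea: find a collection of repeated
# elements and unroll each record's sub-elements into columns of a flat,
# column-flexible table. The sample document is made up.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<forecast>"
    "<day date='10/22'><part t='am'><hi>10</hi></part><part t='pm'><hi>12</hi></part></day>"
    "<day date='10/23'><part t='am'><hi>8</hi></part></day>"
    "</forecast>")

rows, columns = [], []
for record in doc.findall("day"):            # each repeated element = one table row
    row = dict(record.attrib)
    for i, part in enumerate(record, start=1):
        for leaf in part.iter():
            if leaf.text and leaf.text.strip():
                row[f"{part.tag}{i}.{leaf.tag}"] = leaf.text  # e.g. "part1.hi"
    rows.append(row)
    columns += [c for c in row if c not in columns]           # columns grow as needed

print("\t".join(columns))
for row in rows:
    print("\t".join(row.get(c, "") for c in columns))
```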
IV. CONCLUSIONS
We have proposed ideas and two working solutions for XML data visualization, modeling and understanding in this paper. Although our needs and requirements were met successfully by the introduced applications, we are extremely interested in the ensuing discussion and in requests from other researchers and professionals in the many different areas where large data-oriented XML is being manipulated. Because we believe that there is an increasing need for simple visualization and analysis of large XML data, our future work may be strongly influenced by this discussion. Examples of such topics or requests may be XML-relational mapping, data exchange in multiple formats (e.g. Excel), and multiple-document visualization, processing and joining, especially where foreign keys are not explicitly defined in the interrelated XML documents. 2 The pre-release version of XMLAD together with its Java source code under GNU General Public License is available at http://www.fit.vutbr.cz/~chmelarp/public/.
Fig. 4. Modeling using XMLAD.
REFERENCES
[1] N. Buckner, "Visualization Techniques for Hierarchical Information Structures," 2002.
[2] E. R. Harrold and W. S. Means, XML in a Nutshell, Second Edition, 640 p., O'Reilly & Associates, 2002, ISBN 0596002920.
[3] D. Holten, "Hierarchical Edge Bundles: Visualization of Adjacency Relations in Hierarchical Data," IEEE Trans. on Visualization and Computer Graphics, vol. 12, no. 5, 2006.
[4] J. Jelinek and P. Slavik, "XML Visualization Using Tree Rewriting," Procs. of 20th Spring Conf. on Computer Graphics SCCG, pp. 65-72, ACM Press, 2004.
[5] T. Soukup and I. Davidson, Visual Data Mining, New York: J. Wiley, 2002, 382 p., ISBN 0-471-14999-3.
[6] Meteorologists at The Weather Channel®, 2007, <weather.com>.
[7] W3C, Extensible Markup Language (XML), 2007.
[8] W3C, XML Schema, 2007, <www.w3.org/XML/Schema>.
[9] W3C, XSL Transformations (XSLT) Version 2.0, 2007.
[10] W3C, Document Object Model (DOM), 2005.
Issues in Simulation for Valuing Long-Term Forwards Phillip G. Bradford1, Alina Olteanu2 Department of Computer Science The University of Alabama Box 870290 Tuscaloosa, AL 35487-0290
1 [email protected]; 2 [email protected]
Abstract- This paper explores valuing long-term equity forward contracts or futures where both the underlying volatility and the interest rates are modeled as stochastic random variables. For each future, the underlying is an equity or index with no dividends. A key computational question we wish to understand is the relationship between (1) stochastically modeling the interest rates using different interest rate models while holding the volatility fixed and (2) stochastically modeling the underlying volatility using a volatility model while holding the interest rates fixed. In other words, let a "pure-X model" be a futures model where X varies stochastically while all else is fixed. Given a single future to model, this paper works towards understanding when a pure-interest rate model is equivalent to a pure-volatility model. This paper is focused on simulation and modeling issues and does not offer economic interpretation.
I. INTRODUCTION AND MOTIVATION
Several types of long-term futures actively trade. For example, S&P futures are traded as much as three years out. In general, when dealing with a long-term future, the underlying's characteristics may change over time. For example, in the recent credit crunch, some mortgage products or SIVs are no longer collateralizable. The anticipation of significant changes in the characteristics of underlying issues may force traders to unwind arbitraged positions. Thus, it is important to understand the interaction of different components of futures models for the long term. Evaluating very long-term equity forwards or futures requires understanding and modeling of risk-free interest rates. Likewise, the volatility of the underlying stock is critical for valuing long-term futures. Both risk-free interest rates and the underlying's volatility are critical inputs to Black-Scholes-Merton models. Dividends or coupons are not addressed here. Our preliminary study works toward understanding the relationship of the stochastic volatility models and the stochastic interest rate models. Stochastic interest rate models are different from stochastic volatility models in that short-term interest rates tend to correlate with the Federal Reserve's effective over-night rates, whereas over the long term, interest rates are more dynamic. Likewise, volatility is subject to reversion-to-the-mean arguments. This paper is structured as follows. In Section II we give some background for this paper and we discuss Itô's general lemma. We then give a comparison of random volatility and interest rates in Section III. This is followed by preliminary simulation results in Section IV, and Section V concludes the paper.
II. BACKGROUND
Fundamentally, a stock future derives its value from the underlying stock. The future has a strike price $X$ and an expiration date $T$. We assume the future can be exercised only at time $t = T$. Without loss, it is generally assumed the underlying has a continuous price $S_t$ at time $t$ for $t : T \geq t \geq 0$. Take $t = 0$ as the current time. The expiration date $T$ is generally measured in multiples or fractions of years. For example, an expiration date in 18 months is represented by $T = 1.5$ and an expiration date in one week is represented by $T = 1/52$.
A. Annuities
Given the fixed annual interest rate $R$, let $n$ be the number of years and $m$ be the number of times the principal $P$ is compounded per year. This increases $P$'s value to the next well-known product, $P\left(1 + \frac{R}{m}\right)^{mn}$. Following the elementary and classical work, we take this product's limit as $m \to \infty$, which by elementary calculus gives $Pe^{Rn}$.
B. Discounting
The classic question is: how much is principal $P$, received in $n$ years, worth to an investor today? The well-known answer is $Pe^{-Rn}$. Note, the annual fixed interest rate is $R$.
TABLE I: A Trading Account Illustrating Arbitrage (account contents $+S$ and $-F$; initial cost $-\$S$ and $+\$F$; at $t = T$, $-\$Se^{R(T-t)}$ and $+\$F$).
C. A Brief Review of the Itô Generalized Lemma
Itô's Lemma is a classical result in the stochastic calculus with numerous applications [3]. In computational finance, Itô's Lemma is routinely applied to valuing various types of derivatives. Let the underlying price $S_t$ be a random function of $t$. In this paper a future is modeled by a function $F$ of $S_t$ and $t$, since $F$'s value is derived from $S_t$ and its time-to-expiry is determined by $t$. As is standard [3], let $S = S_t$ be modeled by the following stochastic process when $t$ is understood, $dS = a(S,t)\,dt + b(S,t)\,dz$, where $dz$ is the derivative of a Wiener process $W_t$; therefore we can write $dz = \varepsilon\sqrt{dt}$, where $\varepsilon \in N(0,1)$, the normal distribution with mean 0 and standard deviation 1. The next result is classical (see for example [3], pp. 304-305).
Lemma 1 (Itô's Lemma, The General Case): Assume $f = f(x_1, \ldots, x_k, t)$ has continuous first and second derivatives and $dx_i = a_i(S,t)\,dt + b_i(S,t)\,dz_i$, for $i : k \geq i \geq 1$, and the correlation of the Wiener processes
is $\rho_{ij} = \rho(dz_i, dz_j)$. Then
$$
df = \frac{\partial f}{\partial t}\,dt + \sum_{i=1}^{k}\frac{\partial f}{\partial x_i}\,dx_i + \frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k}\frac{\partial^2 f}{\partial x_i \partial x_j}\,b_i(S,t)\,b_j(S,t)\,\rho_{i,j}\,dt
$$
$$
= \left(\frac{\partial f}{\partial t} + \sum_{i=1}^{k}\frac{\partial f}{\partial x_i}a_i(S,t) + \frac{1}{2}\sum_{i=1}^{k}\sum_{j=1}^{k}\frac{\partial^2 f}{\partial x_i \partial x_j}\,b_i(S,t)\,b_j(S,t)\,\rho_{i,j}\right)dt + \sum_{i=1}^{k}\frac{\partial f}{\partial x_i}\,b_i(S,t)\,dz_i .
$$
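For orientation, the familiar one-dimensional case ($k = 1$, a single driving process, $\rho_{11} = 1$) to which the lemma reduces is:
$$
df = \left(\frac{\partial f}{\partial t} + a(S,t)\frac{\partial f}{\partial x} + \frac{1}{2}\,b(S,t)^2\frac{\partial^2 f}{\partial x^2}\right)dt + b(S,t)\frac{\partial f}{\partial x}\,dz .
$$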
D. Forward Contracts on Stock without Dividends
Let $S = S_0$ be the current spot price of the underlying stock and let $F = F_0$ be the current price of the forward contract. Say $r$ is the current risk-free interest rate from the current time $t$ to the time of expiry $T$. We assume the underlying has no dividends or dilutive actions during the life of the future. The next argument is classical; see for example [3]. Now, suppose that currently at time $t$ we have $F > Se^{r(T-t)}$. Then right now we can borrow $S$ for time $T - t$ at the risk-free rate $r$ to buy the underlying, and short the forward contract. This gives a guaranteed profit of $F - Se^{r(T-t)}$. Note, this is at the current spot risk-free interest rate $r$. On the other hand, suppose $F < Se^{r(T-t)}$; then go long the forward contract and short the underlying. This gives $S$ in cash for $T - t$, giving a profit of $Se^{r(T-t)} - F$. Again, this is because the current spot risk-free interest rate
from the current time $t$ to the time of expiry $T$ is $r$. Together, these last two inequalities give the next well-known fact [3]. Fact 1: Say $T$ is the time of expiry of the future $F$ and $t$ is the current time. The current risk-free interest rate from time $t$ to the time of expiry $T$ is $r$. Let $S$ be the spot price of the underlying and $F$ the current price of the future for $S$; then $F = Se^{r(T-t)}$. In fact, the price in the futures or forward markets follows Fact 1 very closely. The classic arbitrage arguments used to prove this fact depend only on the current spot interest rates. However, if we anticipate having to unwind such an arbitraged position, then the stochastic behavior of the interest rates as well as the volatility are important.
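As a quick numerical illustration of Fact 1, using hypothetical values in the same range as the Section IV experiments (a spot of 100, a 6% rate, and ten years to expiry):
$$
F = Se^{r(T-t)} = 100 \cdot e^{0.06 \times 10} = 100 \cdot e^{0.6} \approx 182.21 .
$$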
III. MAKING THE VOLATILITY AND INTEREST RATE STOCHASTIC
The next stochastic equations are used to model long-term changes in the underlying, volatility and risk-free interest rates. The third equation is an equation we use to model long-term interest rates that is chosen due to its similarity to the first two equations. We will examine other classical interest rate models. The first two equations are rather standard, see [4] or [3]. They model the stock price $S$ and the volatility $\sigma$ using geometric Brownian motion:
$$dS = \mu S\,dt + \sigma S^{\alpha}\,dz$$
$$d\sigma = \mu_s \sigma\,dt + \sigma_s \sigma^{\beta}\,dz,$$
where $\alpha \geq 0$ and $\beta \geq 0$. To model stochastic interest rates, first consider the Rendleman-Bartter model, see [3]:
$$dR = \mu_R R\,dt + \sigma_R R^{\gamma}\,dz_R,$$
where $\gamma \geq 0$, and the Vasicek model:
$$dR = a(b - R)\,dt + \sigma_R\,dz_R,$$
where $a$ and $b$ are constants. We do note that the Rendleman-Bartter model may not be the best suited interest rate model because it does not capture the mean reversion phenomenon, known to characterize long-term interest rates ([3], pp. 418). Instead, in this model, the interest rate behaves like a stock price with an increasing drift, etc. Vasicek's model, on the other hand, incorporates mean reversion. The discrete versions of these equations are:
$$\Delta S = \mu S\,\Delta t + \sigma S^{\alpha}\,\Delta z$$
$$\Delta \sigma = \mu_s \sigma\,\Delta t + \sigma_s \sigma^{\beta}\,\Delta z,$$
the Rendleman-Bartter model:
$$\Delta R = \mu_R R\,\Delta t + \sigma_R R^{\gamma}\,\Delta z_R,$$
and the Vasicek model:
$$\Delta R = a(b - R)\,\Delta t + \sigma_R\,\Delta z_R,$$
where $\alpha \geq 0$, $\beta \geq 0$ and $a$, $b$ are constants.
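A minimal Euler-style simulation of these discrete equations might look as follows. This is only an illustrative sketch, not the authors' GSL-based implementation; the step count, $S_0$, volatility, and Vasicek $a$, $b$ loosely echo the Section IV experiments, while $\mu$ and $\sigma_R$ here are assumed values:

```python
# Sketch of simulating the discrete equations above with Euler steps:
# geometric Brownian motion for S (alpha = 1) and a Vasicek short rate R.
# Parameter values are illustrative assumptions, not the paper's exact setup.
import math
import random

T, steps = 10.0, 100
dt = T / steps
mu, sigma = 0.1, 0.16           # drift and volatility of the underlying (assumed mu)
a, b, sigma_R = 0.1, 0.1, 0.02  # Vasicek parameters (sigma_R assumed)
S, R = 100.0, 0.06

path = []
for _ in range(steps):
    dz = random.gauss(0.0, 1.0) * math.sqrt(dt)
    dz_R = random.gauss(0.0, 1.0) * math.sqrt(dt)
    S += mu * S * dt + sigma * S * dz          # Delta S = mu*S*dt + sigma*S*dz
    R += a * (b - R) * dt + sigma_R * dz_R     # Delta R = a(b-R)*dt + sigma_R*dz_R
    path.append((S, R))

print(path[-1])  # terminal (S_T, R_T) of one simulated path
```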
TABLE II: APPLICATIONS OF ITÔ'S GENERAL LEMMA
$dF_R$ and $dF_{R,\sigma}$, Rendleman-Bartter:
$$SRe^{r(T-t)}\left[\left(\tfrac{1}{2}\sigma_R^2 R(T-t)^2 + \mu_R(T-t) - 1\right)dt + \sigma_R(T-t)\,dz_R\right]$$
$dF_R$ and $dF_{R,\sigma}$, Vasicek:
$$Se^{r(T-t)}\left[\left(\tfrac{1}{2}\sigma_R^2 (T-t)^2 + a(b-R)(T-t) - R\right)dt + \sigma_R(T-t)\,dz_R\right]$$
$dF_{S,R}$ and $dF_{S,R,\sigma}$, Rendleman-Bartter:
$$Se^{r(T-t)}\left[\left(\tfrac{1}{2}\sigma_R^2 R(T-t)^2 + (\mu_R - \sigma\sigma_R)R(T-t) + \mu - R\right)dt + \sigma\,dz + R\sigma_R(T-t)\,dz_R\right]$$
$dF_{S,R}$ and $dF_{S,R,\sigma}$, Vasicek:
$$Se^{r(T-t)}\left[\left(\tfrac{1}{2}\sigma_R^2 (T-t)^2 + (ab - aR + \sigma\sigma_R)(T-t) + \mu - R\right)dt + \sigma\,dz + \sigma_R(T-t)\,dz_R\right]$$
Here we assume $\alpha = 1$, $\beta = 1$ and $\gamma = 1$. That is, we are assuming both the underlying stock $S$ and the (long-term) interest rate follow geometric Brownian motion. We assume $\mu$ and $\mu_R$ are both randomly and independently chosen, as are $\sigma$ and $\sigma_R$, see [4]. To make things more precise, the term $dF$ will be denoted $dF_{R,\sigma}$ when applying Itô's general lemma to $F$ with $R$ and $\sigma$ as the stochastic random variables. Likewise, $dF$ will be denoted $dF_{S,R,\sigma}$ when applying Itô's general lemma to $F$ with $S$, $R$ and $\sigma$ as the stochastic random variables, etc. Take the quadratic equation $Ax^2 + Bx + C = 0$. This equation has a parabolic solution with a focus of $\left(-\frac{B}{2A},\; C - \frac{B^2}{4A} + \frac{1}{4A}\right)$, see for example [7]. Parabolas are well-understood structures to model. By applying the general Itô Lemma, we obtain the formulas in Table II. Let the variable $x$ be $x = T - t$ and consider the equations in Table II. All equations in Table II have the attenuating term $Se^{r(T-t)}$.
Theorem 1: Consider the functions $dF_{R,\sigma}$ and $dF_{S,R,\sigma}$ for both the Vasicek and the Rendleman-Bartter model. These functions represent attenuating convex stochastic parabolas, and the maximum values of these functions are decreasing as $t \to T$. The function of just stochastic underlying and volatility, $dF_{S,\sigma}$, is a single attenuating stochastic linear term $\sigma\,dz$.
Proof: Consider the Vasicek model for $dF_{R,\sigma}$ ($dF_R$) from Table II. The term $Se^{r(T-t)}$ attenuates the rest of the equation between square brackets as $T - t$ approaches 0. Then the equation between the square brackets is a parabolic function of $t$. This parabolic and stochastic function has focus
$$
t_0 = \frac{a(b-R)\,dt + \sigma_R\,dz_R}{\sigma_R^2\,dt}; \qquad
y(t_0) = -\left(\frac{a(b-R)\,dz_R}{\sigma_R} - \frac{dz_R^2}{2\,dt} - R + \frac{a^2(b-R)^2}{\sigma_R^2\,dt}\right).
$$
There is a similar parabolic equation for $dF_{S,R,\sigma}$ with a slightly more complex focus. Notice, $\sigma$, $\sigma_R$, $a$, $b$ are all fixed constants. The term $dt$ is the change in $t$, and $R = R_t$ is based on the last value of $R$. QED.
IV. EXPERIMENTAL RESULTS
Experiments were conducted using the GNU Scientific Library [2]. We used about 100 time steps for the graphs in the figures. The time to expiry was set to $T = 10$ years, $\mu = 1.1$, $a = 0.1$, $b = 0.1$, $S_0 = 100$, and we used 16% for the volatility and 6% for the fixed interest rate. Fig. 1 shows a simulation of random $S$ and random volatility only.
Fig. 1. Modeling only random $S$ and volatility $\sigma$, giving $dF_{S,\sigma}$ with constant interest rates; see row 2 of Table II. Note in Table II that, since $dF_{S,\sigma} = dF_S$, random $\sigma$ adds nothing to random $S$.
ISSUES IN SIMULATION FOR VALUING LONG-TERM FORWARDS TABLE III DISCUSSION OF dF GRAPHICS
t → T , dFσ → − rSdt , so the graphic converges to a negative value (see Fig. 2).
dFσ
This is a concave increasing function. When
dFS ,σ
This equation is a form of geometric Brownian motion. For example, let Moreover,
the
r
terms
μ −r
and
dFS ,σ ≤ (μ − r )Se rT dt + σSe rT dz bounded from above by:
dFR ,σ
(μ − r )Se
rT
are
both
for every t
dt + σSe M rT
. When
M
(t
t →T
(
constant.
We
and
b(S , t ) = σSe r (T −t )
can
consider
max dFS ,σ
,
.
the
inequality:
dz . Then, max dFS ,σ
(t = (a(b − R)dt + σ
is
is reached at the value mentioned above.
dz , the function in square parenthesis is a convex parabola with
)
= T + (μ R dt + σ R dz ) σ R2 Rdt ; y (t 0 ) = − 1 + μ R2 2σ R2 R dt + (1 2 R )(dz dt ) − (1 R ) μ R σ R − (1 R ) dz dt 2
0
and
be the maximum of the absolute values of
Consider both Randleman-Bartter and Vasicek models. For each fixed focus given by and
a(S , t ) = (μ − r )Se r (T −t )
positive
2
(
)
) )
dz R ) σ R2 d t ; y (t 0 ) = − a(b − R)dz R σ R − dz R2 2dt − R + a 2 (b − R) 2 σ R2 dt respectively. In fact, we are not talking about a decreasing parabola branch, because dz is not a constant. dz is a parameter and for every fixed dz there is a second degree function in t . From another perspective, when t is close to T , dFR ,σ stabilizes itself around 0
R
− SRdt , hence it becomes negative. dFS , R ,σ
So,
dFR ,σ
goes to
− SRdt .
Again, for both models, the function in square parenthesis is a convex parabola with focus
⎞ ⎛ t 0 = T + (μ R + σ R σ + σ R dz R dt ) σ R2 R ; ⎟ ⎜ ⎜ y (t ) = μ − R − μ 2 2σ 2 − σ 2 2 − μ σ σ dt + σdz − (μ σ + σ + dz 2dt )dz ⎟ 0 R R R R R R R R ⎠ ⎝
(
)
for the Rendleman-Bartter model, and
⎛ t 0 = T + (ab − aR + σ R σ + σ R dz R dt ) σ R2 ; ⎞ ⎜ ⎟ ⎜ y (t ) = μ − R − a 2 (b − R )2 2σ 2 − σ 2 2 − a(b − R )σ σ dt − dz 2 2dt − a(b − R )dz σ ⎟ R R R R R ⎠ ⎝ 0
(
)
for the Vasicek model respectively. In this case there are two random terms:
σ R (T − t )dz R when R dz
σdz
and
σ R R(T − t )dz R when R
follows the Vasicek model (see Table II). When
term does not. Another observation is that for
dFS ,R ,σ
follows the Rendleman-Bartter model, and
t →T
the term containing
dz R
σdz
and
goes to zero, whereas the
computed with the Rendleman-Bartter model, there is an extra
R
in the
dz R
term, whereas for the Vasicek model, we can think of R as being 1. This represents the reason why, for the graph with the RendlemanBartter model, the oscillations tend to attenuate, but they do not disappear, as in the case of dFS ,R ,σ with the Vasicek model. See the differences between the two graphics in Figs. 5 and 6.
Fig. 2. Modeling only random volatility $\sigma$, giving $dF_\sigma$, a concave increasing function; see row 1 of Table II. The graph converges to a negative value.
Fig. 3. Graph of $dF_{R,\sigma}$ by applying Itô's General Lemma. See Table II. $R$ follows the Rendleman-Bartter Model.
V. CONCLUSION
Simulating a future with only a stochastic interest rate, $dF_{R,\sigma}$, and the same future with only stochastic volatility and underlying, $dF_{S,\sigma}$, both have attenuating terms in $Se^{r(T-t)}$. But the stochastic interest rate simulation is a stochastic parabolic equation, whereas the volatility simulation is based on a stochastic linear equation. More investigation will give insight into the simulation effects of combining these two stochastic models.
Fig. 4. Graph of $dF_{R,\sigma}$ by applying Itô's General Lemma. See Table II. $R$ follows the Vasicek Model.
Fig. 5. Graph of $dF_{S,R,\sigma}$ by applying Itô's General Lemma. See Table II. $R$ follows the Rendleman-Bartter Model.
Fig. 6. Graph of $dF_{S,R,\sigma}$ by applying Itô's General Lemma. See Table II. $R$ follows the Vasicek Model.
REFERENCES
[1] R. Buff, Uncertain Volatility Models – Theory and Applications, Springer, 2002.
[2] GSL – GNU Scientific Library. (2007, September 1). [Online]. Available: http://www.gnu.org/software/gsl.
[3] J. C. Hull, Options, Futures, and Other Derivatives, Prentice Hall, 3rd ed., 1997.
[4] H. Johnson and D. Shanno, "Option Pricing when the Variance is Changing," Journal of Financial and Quantitative Analysis, vol. 22, pp. 143-151, 1987.
[5] I. Karatzas and S.E. Shreve, Brownian Motion and Stochastic Calculus, Springer, 2nd ed., 1991.
[6] S.W. Malone. (2002, April). Alternative Price Processes for Black-Scholes: Empirical Evidence and Theory [Online]. Available: http://www.math.duke.edu/vigre/pruv/studentwork/malone.pdf
[7] Parabola. (2007, October 20). [Online]. Available: http://en.wikipedia.org/wiki/Parabola.
A Model for Mobile Television Applications Based on Verbal Decision Analysis
Isabelle Tamanini, Thais C. Sampaio Machado, Marília Soares Mendes, Ana Lisse Carvalho, Maria Elizabeth S. Furtado, Plácido R. Pinheiro
University of Fortaleza (UNIFOR) – Graduate Course in Applied Computer Science (ACS)
Av. Washington Soares, 1321 - Bl J Sl 30 - 60.811-341 - Fortaleza – Brasil
{isabelle.tamanini, thais.sampaio, mariliamendes, ana.lisse}@gmail.com, {elizabet, placido}@unifor.br
Abstract - The emergence of Digital Television (DTV) brings the need to select an interface to be used in Digital Television for Mobile Devices. Many different candidate solutions for interactive applications are possible. Accordingly, usability specialists prototyped interfaces that were analyzed according to users' preferences through usability tests, considering criteria classified in accordance with users' preferences, and alternatives, which were ordered in a ranking modeled by verbal decision analysis. Results revealed a great influence of the evidence of the application's functions on the ease of navigation.
I. INTRODUCTION
In domains that represent a new paradigm of interactivity (such as Digital Television - DTV, smart homes and tangible interfaces), deciding on the most appropriate interaction design solution is a challenge. Researchers in the Human-Computer Interaction (HCI) field have proposed in their works the validation of the alternative design solutions with users before developing the final solution. Taking into account users' satisfaction and their preferences has also gained ground in this kind of work when designers are analyzing the appropriate solution(s). Recent research reveals that the understanding of subjective user satisfaction is an effective parameter for evaluating application interfaces [1]. In the domain of interaction design for DTV, we assume that it is necessary to consider both international aspects, to support accessibility for all, and digital contents, to support a holistic evaluation (content and user interface) of TV applications that show content through their user interfaces. Structured methods generally consider quantitative variables (such as the amount of errors, the number of executions of the "Help" function by the user, the time spent to find a new function, etc.). Research has evaluated qualitative aspects, like user satisfaction and emotion when manipulating the technology, through observations and comments obtained during usability tests [2]. Users are generally encouraged to judge the attractiveness of the application interface, and from these judgments, evaluators elicit qualitative preferences [3]. The aesthetic quality of a product influences users' preferences, but other qualitative aspects may influence their judgments more, transcending the aesthetic appearance [4]. Once subjective questions were understood, another problem considered in this research concerns the traditional way of evaluation. It is rather hard to create new design alternatives and new ways to consider these alternatives, because of the low flexibility. For example, designers evaluated two interface solutions when they applied usability tests and selected one of them to implement. During the system development, three more design solutions emerged as a result of new usability patterns.
How can designers consider these new alternatives? How do they evaluate whether a new solution is better than the old one? Traditionally, usability tests would have to be applied to all alternatives. With a multicriteria model, these decisions are efficient and only some alternatives would need to be evaluated. In this project, three interface solutions for a Mobile Digital Television application were evaluated qualitatively by applying a verbal decision analysis methodology. This strategy correctly maps the provided information on users' preferences, which helped the judgment of the design solutions. It provides a holistic evaluation of interaction situations and more information to understand and organize subjective questions. The analysis of design solutions becomes more flexible. The ranking generated by the model is a tool which makes it easier to insert new alternatives and judgments for interfaces. In order to be able to use the model, hypotheses were elaborated. These hypotheses consider important characteristics for mobile DTV interaction. From these hypotheses, criteria were established, and usability tests were applied in order to obtain information about users' preferences. The method ZAPROS III [5], which belongs to the Verbal Decision Analysis framework, was used. The method is meant for problems that have a qualitative nature and that are difficult to formalize, so-called unstructured problems [6]. The Aranaú tool [7] was developed with the aim of applying the verbal decision analysis method ZAPROS III, helping to model unstructured problems.
II. HYPOTHESES AND EVALUATION SCENARIO
The following hypotheses were the basis for elaborating the multicriteria model, which adheres to the reality of evaluations for applications of mobile Digital Television:
• Hypothesis 1: The evidence of the functions in the application facilitates the use and influences the effort spent by the user to localize himself in the application;
• Hypothesis 2: The user's experience with applications which have similar ways of navigation will influence the choice of an interface that has facility of use, accuracy and user satisfaction;
• Hypothesis 3: The locomotion of the user while manipulating the device will influence the choice of the interface which requires less precision than the other interfaces to navigate between options and screens, so that it facilitates navigation by a person in movement while manipulating the application;
• Hypothesis 4: The involvement with the content influences the user's choice, so that, if the content is interesting, it may be decisive for the user to choose the interface;
• Hypothesis 5: The emotion felt by the user when using the interface exercises a considerable influence on the choice.
Once the hypotheses were defined, usability tests with three mobile DTV prototypes were elaborated (Figs. 1, 2, and 3). The tests helped to elicit user preferences, and their results were entered as data for the model. The usability tests were applied with young users who had wide experience with palmtops, DTV and desktop computing devices. The users evaluated were 12 university students, and the duration of
Fig. 1. Prototype 1. Similar to DTV applications
the test for each user was between 20 and 30 minutes. Two different locations were used: the usability laboratory (LUQS) and a natural environment (field study). Interface designers and usability specialists were present during the test process. For each user, the test began by showing them a sample portal application for digital TV [2]. This was done so that the user would know how digital TV works. The next step was the execution of the usability tests with the mobile DTV application. The tests began as follows: before using the applications, the users were interviewed (using a questionnaire) in order to study their experience and opinions. They were also informed about how the test would be conducted. During the use of the application, the users were observed and specialists marked a checklist of questions. The user had to execute four scenarios, and after each one, he had to fill out a questionnaire about the context of the tested scenario. In addition, the users were monitored with cameras while using the application (with software that collected and stored results). The scenarios considered subjective aspects of each user. For example, if the user was a soccer fan who had a lot of experience with desktop applications, one example of a scenario could be to execute sports programs in each one of the three prototypes. Should the design solution chosen by this user be something similar to desktop applications, or would a user who executes the scenario in movement choose a solution similar to a DTV portal? These questions are directly connected to the hypotheses listed at the beginning of this section. Next we present a summary of the ZAPROS III method in order to give a better understanding of how it works.
III. VERBAL DECISION ANALYSIS
Fig. 2. Prototype 2. Similar to Palm applications, navigation with scroll
Fig. 3. Prototype 3. Similar to Desktop applications, navigation with scroll
The ZAPROS III method belongs to the Verbal Decision Analysis (VDA) framework. It combines a group of methods that are essentially based on a verbal description of decision making problems. This method is built on the acknowledgment that most decision making problems can be verbally described. Verbal Decision Analysis supports the decision making process through a verbal representation of the problem [8]. ZAPROS III was developed with the aim of ranking given multicriteria alternatives, which makes it different from other verbal decision making methods, such as ORCLASS [9] and PACOM, mainly due to its applicability. It uses the same procedures for elicitation of preferences, however with innovations related to the following aspects [5]:
1. The procedures for construction of the ordinal scale for quality variation and of the criteria scales are simpler and more transparent;
2. There is a new justification for the procedure of alternatives comparison, based on cognitive validations [10];
3. The method offers both absolute and relative classification of alternatives.
The method ZAPROS III can be applied to problems having the following characteristics [5]:
• The decision rule must be developed before the definition of alternatives;
• There is a large number of alternatives;
• Criteria evaluations for the definition of alternatives can only be established by human beings;
• The graduations of quality inherent to the criteria are verbal definitions that represent the subjective values of the decision maker.
The decision maker is the key element of multicriteria problems, and all necessary attention should be given to obtaining well-formed rules and consistent, correctly evaluated alternatives, always considering the limits and capacity of natural language. Thus, the order of preference will be adequately obtained according to the well-ordering principle established by Zorn's lemma [11]. The ZAPROS III method can be applied to problem modeling following this three-step procedure:
a) Definition of Criteria and their Values. Once the problem is defined, the criteria related to the decision making problem are elicited. Quality Variations (QV) of the criteria are established through interviews and conversations with specialists in the area and with decision makers.
b) Organization of Ordinal Scales of Preference. An ordinal scale of preference for the quality variations of two criteria is established based on pairwise comparisons. The preference between these two criteria is chosen according to the decision maker, and the obtained scales of preference are called the Joint Scale of Quality Variation (JSQV) for two criteria. When carrying out the comparisons, it is assumed that there is an "ideal" alternative based on the decision maker's preferences. From this ideal case, questions are posed to the decision maker, who answers according to his preferences in relation to the other criteria values. In this way, the scale can be elaborated either by direct answers of the decision maker or by transitive operations [12], which help diminish the quantity of necessary comparisons. Transitivity also helps to check the independence between criteria and groups of criteria, as well as to identify contradictions in the decision maker's preferences. Dependence between criteria and contradictions should be eliminated by formulating new questions to the decision maker and remodeling the criteria (possibly carrying out a new formulation in natural language and identifying other quality variations) [13]. A sketch of this transitive construction is shown at the end of this section.
c) Comparisons of Alternatives. The ranking of the alternatives is constructed by comparisons between pairs of alternatives. Considering a group of alternatives, the elaboration of a partial order for these alternatives follows a three-step algorithm: Step 1: Formal Index of Quality (FIQ); Step 2: Comparison of pairs of alternatives; Step 3: Sequential selection of non-dominated nuclei.
Through these phases (a, b and c), a problem modeled with the ZAPROS III method results in an ordering of alternatives. The ordering gives a quantitative notion of the order of preference, in an absolute way (in relation to all possible alternatives) as well as in a relative way (in relation to a restricted group of alternatives) [14]. The next section presents the modeled case study and the validated hypotheses.
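To make step (b) concrete, the following minimal Python sketch shows one way a joint scale could be assembled from pairwise answers: take the transitive closure of the elicited preferences and flag contradictions (cycles). The quality-variation labels and the answers are hypothetical, and this is an illustration of the general idea, not the authors' implementation.

```python
# Illustrative sketch only: deriving a joint ordinal scale from pairwise
# preference answers by transitive closure, and detecting contradictions.
# The quality variations (a1, a2, ...) and the answers below are hypothetical.
from itertools import product

qvs = ["a1", "a2", "b1", "b2"]                        # quality variations (example)
answers = [("a1", "a2"), ("a2", "b1"), ("b1", "b2")]  # "x is preferred to y"

prefers = set(answers)

# Transitive closure: if x > y and y > z, then x > z (fewer questions needed).
changed = True
while changed:
    changed = False
    for x, y, z in product(qvs, repeat=3):
        if (x, y) in prefers and (y, z) in prefers and (x, z) not in prefers:
            prefers.add((x, z))
            changed = True

# A contradiction in the decision maker's answers shows up as a cycle.
contradictions = [(x, y) for (x, y) in prefers if (y, x) in prefers]
if contradictions:
    print("Contradictory preferences, re-question the decision maker:", contradictions)
else:
    # Rank each QV by how many others it dominates -> joint scale (best first).
    scale = sorted(qvs, key=lambda q: -sum((q, other) in prefers for other in qvs))
    print("Joint scale (best to worst):", scale)
```

In the actual method the comparisons are made against an "ideal" alternative and over criterion scales, but the closure-and-consistency idea sketched here is the same.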
IV. COMPUTATIONAL RESULTS
The criteria used in the evaluation were established with the assistance of specialists in the usability of mobile DTV from the Usability and Quality of Software Laboratory (LUQS) of the University of Fortaleza. The specialists wanted to analyze the aspects that had the greatest influence on the choice of a given interface design. According to the hypotheses previously given, the following criteria were verbally defined:
1. Functions Evidence, which indicates whether the users are able to easily find the functions of the system. The user will prefer an interface whose use is probably easier for him.
2. User's familiarity with a determined technology, which implies that if an interface is similar to one of a technology familiar to the user, this interface is preferable to him, since he is used to it.
3. User's locomotion while manipulating the device, which indicates whether the interface allows good spatial orientation and does not demand too much of the user's attention while he manipulates it. The user may choose an interface that has this feature instead of one with a familiar appearance or with excellent content evidence.
4. Content Influence: when the user uses an interface whose content attracts him, he may prefer this interface, even though he is more interested in the content than in the interface itself.
5. User Emotion: if the user feels fine when using the interface, he will want to use the device more and more. That is, a good emotion makes the user want to use the interface.
With the criteria defined, the ZAPROS III method can be applied using the Aranaú tool, which implements Verbal Decision Analysis, as presented below. Fig. 4 shows the definition of the criterion "Content Influence". Table I shows the values of all criteria related to the aspects on which the definition of the attractiveness levels among the interfaces is based.
TABLE I
CRITERIA AND ASSOCIATED VALUES
A - Functions Evidence
  A1. No difficulty was found on identifying the system functionalities;
  A2. Some difficulty was found on identifying the system functionalities;
  A3. It was hard to identify the system functionalities.
B - User's familiarity with a determined technology
  B1. No familiarity is required with similar applications of a determined technology;
  B2. Requires little user familiarity with applications of a determined technology;
  B3. The manipulation of the prototype is fairly easy when the user is familiar with similar applications.
C - User's locomotion while manipulating the device
  C1. The user was not hindered in any way when manipulating the prototype while moving;
  C2. The user was occasionally confused when manipulating the prototype while moving;
  C3. The spatial orientation of the application was hindered when the user was moving.
D - Content Influence
  D1. There is no influence of content on choosing the interface;
  D2. The content exerted some influence on choosing the interface;
  D3. The content was decisive on choosing the interface.
E - User Emotion
  E1. He felt fine (safe, modern, comfortable, etc.) when using the interface;
  E2. He felt indifferent when using the interface;
  E3. He felt bad (uncomfortable, unsafe, frustrated) when using the interface.
Fig. 4. Example of the definition of criterion "Functions Evidence" using Aranaú Tool.
Fig. 5. Example of preferences elicitation using Aranaú Tool.
The order of preference among the criteria values was established by observing the results of the tests during their application. For example, it was observed that when the users were moving and trying to execute a task in a given prototype, they complained that it was difficult to move and manipulate the device at the same time. After the tests, the responses to the questionnaires were gathered and evaluated. Questions like "Which prototype did you prefer? And why?" indicated the order of preference among the project alternatives and also which criteria values were decisive for the choice. Fig. 5 shows an example of preference elicitation in the Aranaú Tool. The Joint Scale of Quality Variation (JSQV) for two criteria was gradually elaborated and validated with the information gathered in the tests. After this, the Joint Scale of Quality Variation for all criteria was elaborated. The obtained JSQV, from best to worst values, is: a1 a2 b1 b2 c1 e1 d1 e2 d2 b3 d3 c2 c3 a3 e3.
After the construction of the JSQV for all criteria, the comparisons of the interfaces were made. Each alternative was studied in order to define which criteria values materialized in the prototypes. The usability tests also supplied important information about how the users described the interfaces (for example, the majority of users said that access to content using prototype 3 was quite easy – the criterion value "B1" – but that it required a lot of familiarity with desktop applications – the criterion value "A3"). Finally, the established relationship was: Prototype 1 - A2 B1 C2 D1 E2 (Alternative 1); Prototype 2 - A2 B3 C1 D1 E1 (Alternative 2); and Prototype 3 - A2 B1 C1 D1 E2 (Alternative 3). Each Quality Variation (QV) of the JSQV is numbered in ascending order from 1 (one) to 9 (nine). The sum of the determining QV numbers for each alternative is the Formal Index of Quality (FIQ) [4]. Fig. 6 presents the FIQ value of each alternative and the resultant ranking. With the FIQ values, the ranking of the prototypes is organized under the assumption that the alternative with the lowest FIQ value represents the highest rank and the best alternative; the alternative with the highest FIQ value is the least preferred prototype.
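As a rough illustration of how such an FIQ ranking can be computed, the Python sketch below encodes the reported JSQV ordering and the criterion values attributed to the three prototypes. The rank assigned to each quality variation (its position in the JSQV) is a simplifying assumption and not the exact numbering used by the Aranaú tool.

```python
# Illustrative sketch: Formal Index of Quality (FIQ) from the JSQV ordering.
# Assumption: each quality variation is scored by its position in the JSQV
# (best = lowest); the paper's tool may number the variations differently.

jsqv = ["a1", "a2", "b1", "b2", "c1", "e1", "d1", "e2",
        "d2", "b3", "d3", "c2", "c3", "a3", "e3"]
rank = {qv: i + 1 for i, qv in enumerate(jsqv)}

# Criterion values observed for each prototype in the usability tests.
alternatives = {
    "Prototype 1": ["a2", "b1", "c2", "d1", "e2"],
    "Prototype 2": ["a2", "b3", "c1", "d1", "e1"],
    "Prototype 3": ["a2", "b1", "c1", "d1", "e2"],
}

fiq = {name: sum(rank[qv] for qv in values) for name, values in alternatives.items()}

# Lowest FIQ = most preferred alternative.
for name in sorted(fiq, key=fiq.get):
    print(f"{name}: FIQ = {fiq[name]}")
```

Under this simplified scoring the ordering comes out as Prototype 3, then Prototype 2, then Prototype 1, which agrees with the ranking reported in Fig. 6, although the absolute FIQ values will differ from those produced by the tool.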
Fig. 6. Alternatives Ranking calculated by Aranaú Tool.
V. DISCUSSION
The resulting rank and the preference scale between criteria values show that the evidence of the functions in the application facilitates its use and influences the choice of the interface, making it easier to use, more precise and more satisfying for the user. This influence is a determining factor for the choice of the most preferable prototype solution. The value A1 was the top value in the scale of criteria values, demonstrating Hypothesis 1. The criterion value B1 came just after A1 and A2, showing that the user's experience with similar types of navigation applications influences the choice of the interface; thus, Hypothesis 2 was confirmed. Considering Hypothesis 3, since the criterion value C1 was less preferred than A1 and B1, we observed that when the user is moving and interacting with the interface, he chooses the interface that is easiest to use (the one with which he has more affinity and which gives easier access to more interesting content). The interesting content leads the user to choose the interface with which he has the greatest affinity, and this affinity is determined by the degree of similarity with applications commonly used by the user. The criterion value B1 came just after A1 and A2, showing that affinity is a less important determinant than the content accessed, no matter how important or attractive this content is. Hypothesis 4 was confirmed because D1 is located after all the other best criteria values; however, involvement with the content is still important, so D2 and D3 are in the middle of the scale. Hypothesis 5, "The emotion felt by the user when using the interface exercises a considerable influence on the choice", was fully demonstrated by criterion E (User Emotion): with the current model, the criterion value E1 belongs to the first part of the scale, which means that it has a great influence on the user's choice. It is important to integrate characteristics of the content-access criterion with functions evidence, so that if the content is attractive, the user tends to choose the interface that makes the functionality to access that content more evident. For example, if the user likes soccer games, he will feel fine if he can select a function that highlights sports in the application.
The ranking of the alternatives showed prototype 3, similar to desktop applications, as the most preferred. Prototype 2, similar to palm applications, with a Formal Index of Quality very close to the FIQ of prototype 3, showed that their difference in attractiveness is quite modest. Prototype 1, similar to DTV applications, proved inadequate for presenting DTV content on a mobile device; its high FIQ value relative to the other prototypes shows the low attractiveness of this type of DTV application. Although ZAPROS III is intended for a large number of alternatives, this study analyzes three alternatives. The model represents a guide for developing new solutions, which will be integrated into a new ranking, and future work will show the flexibility of Verbal Decision Analysis with respect to the insertion of new alternatives.
VI. RELATED WORKS
A discussion of a user-centered development process considering real people's needs is described in [4]. The authors mention that an interaction design made for one device does not fit another, but they do not specify any strategy to solve this problem. In contrast, this study reports an experience in which three design solutions for different devices were defined and analyzed based on how users experience each solution. Our goal was not to find usability problems in the tests, but to help designers understand which criteria related to users' experience could influence their decision and to discover users' preferences. Reference [15] describes a multicriteria approach in which the
execution of its steps makes it possible to identify the order of attractiveness of a list of usability interfaces for a certain interactive DTV application task, allowing the selection of the most appropriate interface for this new communication resource. However, the interface applications were not evaluated through executable prototypes. In the present experiment, using the ZAPROS III method, a qualitative analysis of the users' preferences and their intentions of use with executable prototypes could be better appreciated. Reference [16] shows a multicriteria model with ZAPROS III applying three criteria: familiarity of the user with a determined technology, attractiveness of the task and locomotion of the user while manipulating the interface.
VII. FUTURE WORK
Given the complexity of HCI and the use of multicriteria analysis, our next goal is to develop a Collaborative Design method to assist the ZAPROS III method in the phases of Definition of Criteria and Organization of the Ordinal Scales of Preference, providing more support for redefining and discovering criteria and for clarifying the real preferences of users, designers and usability engineers. Results of usability tests are inputs for Collaborative Design, which prepares supplementary material to accomplish the purpose of the ZAPROS III method. The design collaboratorium has been developed as a reaction against the failing capabilities of classical usability methods to cope with mobile and ubiquitous technologies [17]. The ZAPROS III method yielded results, with respect to the criteria evaluated, indicating that user familiarity with similar applications is a determining factor for the ease of use of the interface. The usability interface for mobile Digital Television applications should therefore strongly consider the applications most used by the target clientele. The ZAPROS III method also proved to be flexible, so that new project alternatives can be added, allowing researchers to gain a better understanding of the needs and opinions of mobile DTV target users. The method has also helped usability specialists understand the relationship (order of preference) among the criteria frequently used for an interface project. Formally evaluating the subjective aspects involves analyzing which usability standards should be employed for this mobile DTV application. Prototypes using new usability standards are being developed at LUQS; thus, the ranking supplied by this research will be extended and there will be new contributions in future studies. Research is also being conducted on how to validate the hypotheses using quantitative metrics: which metrics are possible for each hypothesis, and could these metrics be entry points for information used to elaborate a multicriteria model such as ZAPROS III? The criteria used in hypothesis 5 are still being researched, and hybrid (qualitative/quantitative) tests are being developed using multicriteria analysis.
VIII. CONCLUSION
It is important to point out that our intention was not to compare navigation techniques (such as scrollbars, tap-and-drag, and so on) on mobile devices to identify the best one when users are performing navigation and selection tasks. Our goal was to help designers
understand how criteria related to users' experience could influence their preference for a final solution. In addition, we showed how to integrate two different areas (HCI and OR - Operational Research), describing an approach for evaluating interaction design from a subjective perspective using OR. This means that researchers interested in performing qualitative analyses of the interaction can use this proposal, which leads to more objective results.
ACKNOWLEDGMENT
The authors are thankful to Celestica of Brazil for the support they have received for this project.
REFERENCES
[1] K. Chorianopoulos and D. Spinellis, User interface evaluation of interactive TV: a media studies perspective, Univ Access Inf Soc, 5:209–21, 2006.
[2] E. Furtado, F. Carvalho, A. Schilling, D. Falcão, K. Sousa, F. Fava, Projeto de Interfaces de Usuário para a Televisão Digital Brasileira, in: SIBGRAPI 2005 – Simpósio Brasileiro de Computação Gráfica e Processamento de Imagens, Natal, 2005.
[3] N. Tractinsky, A. S. Katz, D. Ikar, What is beautiful is usable, Interacting with Computers, Volume 13, Issue 2, December 2000, Pages 127-145.
[4] A. Angeli, A. Sutcliffe, J. Hartmann, Interaction, Usability and Aesthetics: What Influences Users' Preferences?, Proceedings of the 6th ACM Conference on Designing Interactive Systems, Pages 271-280, 2006.
[5] O. Larichev, Ranking Multicriteria Alternatives: The Method ZAPROS III, European Journal of Operational Research, Vol. 131, 2001.
[6] H. Simon and A. Newell, Heuristic Problem Solving: The Next Advance in Operations Research, Oper. Res., vol. 6, pp. 4-10, 1958.
[7] I. Tamanini, P. R. Pinheiro, A. L. Carvalho, Aranaú Software: A New Tool of the Verbal Decision Analysis, Technical Report, University of Fortaleza, 2007.
[8] O. Larichev and H. Moshkovich, Verbal Decision Analysis For Unstructured Problems, Boston: Kluwer Academic Publishers, 1997.
[9] A. I. Mechitov, H. M. Moshkovich, D. L. Olson, Problems of decision rules elicitation in a classification task, Decision Support Systems, 12:115–126, 1994.
[10] O. Larichev, Cognitive validity in design of decision-aiding techniques, Journal of Multi-Criteria Decision Analysis, 1(3):127–138, 1992.
[11] P. R. Halmos, Naive Set Theory, Springer, 116 p., 1974.
[12] O. Larichev, Psychological validation of decision methods, Journal of Applied Systems Analysis, 11:37–46, 1984.
[13] J. Figueira, S. Greco, M. Ehrgott (Eds.), Multiple Criteria Decision Analysis: State of the Art Surveys, International Series in Operations Research & Management Science, Vol. 78, XXXVI, 1045 p., 2005.
[14] H. Moshkovich and O. Larichev, ZAPROS-LM – A method and system for ordering multiattribute alternatives, European Journal of Operational Research, 82:503–521, 1995.
[15] K. S. Sousa, H. Mendonça, M. E. S. Furtado, Applying a Multi-Criteria Approach for the Selection of Usability Patterns in the Development of DTV Applications, in: IHC'2006, Natal, 2006.
[16] A. L. Carvalho, M. Mendes, P. Pinheiro, E. Furtado, Analysis of the Interaction Design for Mobile TV Applications based on Multi-Criteria, International Conference on Research and Practical Issues of Enterprise Information Systems (CONFENIS 2007), October 14-17, Beijing, China, 2007.
[17] S. Bødker, The Design Collaboratorium – a Place for Usability Design, ACM Transactions on Computer-Human Interaction, Vol. 9, No. 2, Pages 152–169, June 2002.
Gene Selection for Predicting Survival Outcomes of Cancer Patients in Microarray Studies
Tan Q (1, 2), Thomassen M (1), Jochumsen KM (1), Mogensen O (1), Christensen K (2), Kruse TA (1)
(1) Odense University Hospital, Sdr. Boulevard 29, Odense C, DK-5000, Denmark; (2) Institute of Public Health, University of Southern Denmark, J.B. Winsløws Vej 9B, Odense C, DK-5000, Denmark
E-mail: [email protected]
Abstract- In this paper, we introduce a multivariate approach for selecting genes for predicting survival outcomes of cancer patients in gene expression microarray studies. Combined with survival analysis for gene filtering, the method makes full use of individuals' survival information (both censored and uncensored) in selecting informative genes for survival outcome prediction. Application of our method to published data on epithelial ovarian cancer has identified genes that discriminate unfavorable and favorable outcomes with high significance (χ² = 21.933, p = 3e−06). The method can also be generalized to categorical variables for selecting gene expression signatures for predicting tumor metastasis or tumor subtypes.
I. INTRODUCTION
One of the major applications of high-throughput microarray technology in biomedical science is the classification of tumor subtypes or the prediction of tumor prognosis using transcriptional profiling at the RNA level. Unlike conventional methods based on clinical and histologic criteria, microarrays provide more accurate information for sample classification and outcome prediction, enabling the development of more efficient and objective treatment strategies. The development of a powerful prognostic profile requires selecting informative features or markers from a large pool of candidate genes that are present on the arrays. It is well known that a major challenge in microarray analysis is the large number of variables (genes) and the small number of samples, which creates the problem of multiple testing. As a result, simply picking the significant genes to use as prognostic signatures can result in poor performance of the classifier due to the inclusion of false positive genes or significant genes with low impact on classification. Tibshirani et al. [1] proposed a "nearest shrunken centroids" method for the diagnosis of multiple cancer types using gene expression data. This simple method is widely used in cancer prognosis prediction in microarray studies. The same idea has been used for selecting genes for predicting tumor metastasis in microarray experiments using the paired case-control design [2]. In addition to the large number of genes, survival analysis
of microarray gene expression data is further complicated by issues concerning time-to-event data, such as censoring, which is a unique feature of survival data. In this case, making efficient use of the observed survival information is crucial for building a well-performing prediction model. In this paper, we devise a combinatory approach that filters genes using the Cox proportional hazards model for right-censored data, ranks genes using a multivariate method, and builds prediction models using support vector machines (SVM). We first introduce our method while explaining the rationale of the approach in identifying informative prognostic signatures for predicting survival outcomes. The method is then applied to a published dataset from a microarray study on epithelial ovarian cancer [3]. Performance of our method is assessed using the leave-one-out strategy for cross-validation, and the results from our analysis are compared with those from the original study.
II. METHODS
A. Gene filtering
Similar to differential gene expression analysis, we start with gene filtering to remove genes that are not expressed or genes not regulated by the mechanism of interest. To do that, we apply the univariate Cox regression model to assess the marginal association between the expression of each gene and survival time. Insignificant genes are filtered out using a predefined type one error rate (here we set α=0.01). There are two considerations in performing gene filtering: a. to remove redundant or uninformative genes from subsequent analysis; b. to ensure that the variance in the expression data for the remaining genes is dominated by, or mainly due to, survival events. Cox regression analysis is conducted with the free R package survival.
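As a rough illustration of this filtering step, the sketch below fits a univariate Cox model per gene and keeps those with p < 0.01. It uses Python with the lifelines package rather than the R survival package named in the paper, and the data-frame layout and column names (time, event) are assumptions.

```python
# Illustrative sketch (not the paper's R code): univariate Cox filtering.
# Assumes `expr` is a pandas DataFrame of samples x genes and `clinical`
# has 'time' (follow-up) and 'event' (1 = death, 0 = censored) columns.
import pandas as pd
from lifelines import CoxPHFitter

ALPHA = 0.01  # predefined type I error rate

def cox_filter(expr: pd.DataFrame, clinical: pd.DataFrame, alpha: float = ALPHA):
    """Return genes whose marginal Cox association with survival has p < alpha."""
    kept = {}
    for gene in expr.columns:
        df = pd.DataFrame({
            "x": expr[gene].values,
            "time": clinical["time"].values,
            "event": clinical["event"].values,
        })
        cph = CoxPHFitter()
        cph.fit(df, duration_col="time", event_col="event")
        p = cph.summary.loc["x", "p"]  # p-value of the gene's coefficient
        if p < alpha:
            kept[gene] = p
    return pd.Series(kept).sort_values()

# filtered = cox_filter(expr, clinical)   # usage, given suitable data frames
```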
B. Gene ranking
The idea of gene ranking is to provide a list of genes arranged according to their contribution to the classification (here, favorable versus unfavorable survival). This informative list of genes can be used for selecting prognostic genes for prediction. Previous studies have shown that, in order to build a good classifier, it is crucial to identify a minimal subset of genes that characterizes each of the clusters in the data [1,4]. Unfortunately, in survival analysis of microarray gene expression data, such clusters cannot be clearly defined because time-to-event is a continuous variable with right censoring. Since, after gene filtering, the expression data for the remaining genes are dominated by genes associated with the survival outcomes of the cancer subjects, we introduce a multivariate method based on singular value decomposition (SVD), namely correspondence analysis (CA) [5,6], to examine the contribution of each gene to the dimension that characterizes the survival outcomes of the cancer patients. Since all the remaining genes are strictly significant for survival, the top dimension in the transformed data space consequently represents the survival of the cancer patients. A convenient feature of correspondence analysis is that it calculates, for each gene i, its contribution to the dimension of interest k, i.e., ac_ik = p_i· g_ik² / λ_k², where p_i· is the mass for gene i, g_ik is the projection of gene i on dimension k, and λ_k is the eigenvalue for dimension k. With this information, a list of genes is obtained simply by ranking the genes according to the magnitudes of their calculated contributions. Correspondence analysis of the filtered gene expression data is carried out with the free R package multiv.
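The contribution calculation can be sketched directly from the CA definitions above. The following Python/NumPy fragment illustrates the standard correspondence-analysis decomposition applied to a non-negative expression matrix; it is not the multiv implementation used in the paper, and the choice of the first axis (k = 0) as the "survival" dimension reflects the working assumption described in the text.

```python
# Illustrative sketch: gene contributions to the leading CA dimension.
# X: non-negative matrix (genes x samples) of the filtered expression data.
import numpy as np

def ca_gene_contributions(X: np.ndarray, k: int = 0) -> np.ndarray:
    P = X / X.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row (gene) masses p_i.
    c = P.sum(axis=0)                     # column (sample) masses
    # Standardized residuals: (P - r c^T) scaled by the square roots of the masses.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, lam, Vt = np.linalg.svd(S, full_matrices=False)  # lam: singular values
    G = (U / np.sqrt(r[:, None])) * lam   # principal row coordinates g_ik
    # ac_ik = p_i. * g_ik^2 / lam_k^2; contributions to axis k sum to one.
    return r * G[:, k] ** 2 / lam[k] ** 2

# ranking = np.argsort(-ca_gene_contributions(X))   # genes by decreasing contribution
```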
C. Model building
SVM is a popular supervised machine learning algorithm widely used in microarray studies [7]. SVM builds a hyperplane that separates the training set with a maximal discriminative margin; this hyperplane is then used to classify new samples in the testing set. SVM is chosen for classification because of its effectiveness and its meaningful output probability for the outcome. Similar to Spentzos et al. [3], we first select samples with extremely short, uncensored survivals and an equal number of samples with extremely long survivals (both censored and uncensored), and assign them to the unfavorable and favorable groups, respectively. A primary prediction model is trained using these extreme samples and then applied to the remaining samples in the training set to classify them into the unfavorable and favorable groups. With the complete classification, or labeling, of the training set, a final prediction model is trained using the whole training set. For cross-validation, we adopt the popular leave-one-out (LOO) strategy. In order to optimize the prediction model, we start with the list of genes ranked according to their contributions to survival in the training set. Beginning with all the selected genes, our method recursively drops genes with small contributions and records the performance of the training model in predicting survival outcomes using the log-rank chi-squared statistic for comparing the survival distributions between the predicted unfavorable and favorable groups. The subset of top-contribution genes that gives the highest prediction accuracy is chosen as the prognostic signature (Figure 1). Analysis using SVM is realized with the free R package e1071.
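The gene-selection loop just described can be sketched as follows. This is a simplified Python illustration using scikit-learn and lifelines rather than the R packages e1071 and survival named in the paper; the extreme-sample labeling step is omitted and the group labels are assumed to be available already.

```python
# Illustrative sketch: recursively drop low-contribution genes and keep the
# subset whose LOO-predicted groups best separate the survival curves.
# Assumes: X (samples x genes, columns ordered by decreasing contribution),
# y (0/1 group labels), and NumPy arrays `time` and `event`.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from lifelines.statistics import logrank_test

def best_signature(X, y, time, event, min_genes=5):
    best = (None, -np.inf)
    n_genes = X.shape[1]
    while n_genes >= min_genes:
        Xk = X[:, :n_genes]                       # top-contribution genes only
        pred = cross_val_predict(SVC(kernel="linear"), Xk, y, cv=LeaveOneOut())
        groups = pred.astype(bool)
        if groups.any() and (~groups).any():      # need both predicted groups
            stat = logrank_test(time[groups], time[~groups],
                                event_observed_A=event[groups],
                                event_observed_B=event[~groups]).test_statistic
            if stat > best[1]:
                best = (n_genes, stat)
        n_genes -= 1                              # recursively drop the weakest gene
    return best                                   # (signature size, log-rank chi-squared)
```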
Fig. 1 Diagram of the gene selection process
III. APPLICATION
Spentzos et al. [3] reported the prognostic significance of gene expression profiling for survival in epithelial ovarian cancer in a sample of 68 patients, using the Affymetrix U95A2 array containing approximately 12,000 genes. Their study identified a 115-gene signature that predicted patients with unfavorable and favorable survival at a significance level of p=0.004. As an example, we apply our approach to the same data and compare our result with that of the original study. In our analysis, cross-validation is conducted using the popular leave-one-out strategy, and the performance of the prediction model is examined using the log-rank test for comparing the differential survival between the predicted unfavorable and favorable groups. In the training set of 67 samples, we follow the procedure described in the methods section by picking 9 subjects with the shortest uncensored survival times and 9 subjects with the longest survival times (censored or uncensored) and assigning them to the unfavorable and favorable groups. Group labels for the 49 remaining samples in the training set are predicted using the primary prediction model
trained on the 18 extreme samples. The final training step is carried out using the whole training set of 67 samples together with their corresponding labels. Using leave-one-out cross-validation and recursive dropping of genes of low contribution, we identified a 40-gene signature that discriminates the unfavorable and favorable groups with a log-rank p-value of 3e−06. In Figure 2, we show the SVM probability of an unfavorable outcome plotted against the observed survival time. In the figure, each individual is shown by numbers indicating the individual ID followed by censoring status and observed survival time. As we can see, most of the samples are separated with high probabilities, with extremely good separation for the favorable survivors at the bottom of the figure. Note also that, of the subjects predicted as favorable survivors, many have only short observation or follow-up times. As most of them are censored, they are expected to have long survival with high probability despite their short follow-up.
Fig. 2. SVM probability of unfavorable outcome plotted against observation time using leave-one-out cross-validation. Each individual is indicated by individual ID followed by censoring status (0 for censored; 1 for uncensored) followed by observation time.
In Figure 3, we display the Kaplan-Meier survival curves for the unfavorable (designated by 1) and the favorable (designated by 2) groups. The survival difference between the two groups is obviously large, with a mean survival of 30 months for group 1 and a mean survival not yet reached for group 2. Finally, it is interesting that, of the 40 genes we identified, only 3 overlap with the 115 genes reported in the original study (Table 1; overlapping genes are marked in bold).
Fig. 3. Kaplan-Meier survival curves for patients predicted with unfavorable (group 1) and favorable (group 2) outcomes. The mean survival for the unfavorable group is only 30 months while that for the favorable group has not yet been reached.
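For completeness, the kind of evaluation shown in Figures 2 and 3 can be reproduced generically with the lifelines package. The sketch below is an illustration with assumed input arrays (time, event, predicted group), not the code used to produce the paper's figures.

```python
# Illustrative sketch: Kaplan-Meier curves and log-rank test for the two
# predicted groups. `time`, `event` and `group` (1 = unfavorable, 2 = favorable)
# are assumed NumPy arrays of equal length.
import matplotlib.pyplot as plt
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

def plot_predicted_groups(time, event, group):
    ax = plt.subplot(111)
    for g, label in [(1, "unfavorable"), (2, "favorable")]:
        mask = group == g
        KaplanMeierFitter().fit(time[mask], event[mask], label=label).plot_survival_function(ax=ax)
    res = logrank_test(time[group == 1], time[group == 2],
                       event_observed_A=event[group == 1],
                       event_observed_B=event[group == 2])
    ax.set_xlabel("Months")
    ax.set_title(f"Log-rank p = {res.p_value:.2g}")
    plt.show()
```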
IV. DISCUSSIONS
We have developed a multivariate approach for selecting informative genes for survival outcome prediction in cancer patients using gene expression microarrays. Application of our method to published data on epithelial ovarian cancer has captured a subset of statistically significant genes that discriminates cancer patients with unfavorable and favorable survival outcomes (p=3e−06). By recursively dropping genes with lower contributions, our method picks out the subset of genes that best characterizes the survival outcomes. Experience with our method has shown that it is simple to implement and fast. It is worth mentioning that our gene selection method benefits from the following:
1. The use of survival analysis on gene expression data makes full use of individual survival information (both censored and uncensored) in the process of gene filtering.
2. Most importantly, gene filtering using survival analysis also helps to establish the crucial role of survival as the unique variable that dominates the variance in the expression data of the remaining genes, enabling the calculation of contributions to the major dimension (representing survival) in the transformed data space.
3. In the correspondence analysis, singular value decomposition is applied to a matrix which is a chi-
squared transformation of the raw data [5]. In this way, the results of the subsequent analysis are less affected by extreme values and by the skewed distribution of the expression data.
4. Our multivariate approach for gene selection takes advantage of the contribution calculated for each gene to the main axis representing survival. Most importantly, as mentioned above, this calculation takes into account the existence of the other genes. As a result, genes at the top of the contribution list are more relevant [8].
TABLE I
Details of 40 identified genes (overlap genes in bold)
Probe ID  Contribution  p-value  Mean
41607_at  0.24691  0.000986  145.7363
34367_at  0.062212  0.002056  901.2295
33500_i_at  0.034114  0.005998  634.4966
816_g_at  0.022797  0.004027  373.5103
719_g_at  0.021818  0.003088  1202.608
33116_f_at  0.02129  0.000824  6486.466
37519_at  0.020791  0.007883  71.33769
34363_at  0.019749  0.002671  719.5511
40188_f_at  0.017759  0.0087  41.97488
718_at  0.016734  0.008164  813.2382
740_at  0.015829  0.000219  127.4264
34091_s_at  0.015541  0.001445  5333.171
741_g_at  0.014999  0.004441  174.7505
39751_at  0.014273  0.008337  298.3884
33879_at  0.014178  0.002796  346.0643
34743_at  0.013702  0.00973  506.1249
31597_r_at  0.01277  0.004497  282.8772
33082_at  0.012131  0.002661  54.24643
31548_at  0.011556  0.004135  48.70395
41831_at  0.009942  0.002703  160.5802
34542_at  0.009696  0.003719  46.35297
792_s_at  0.009465  0.007175  43.5571
39889_at  0.008722  0.005234  38.40172
39940_at  0.008446  0.003951  78.55549
33146_at  0.00844  0.002067  986.0372
38210_at  0.007772  0.005304  95.41556
39656_at  0.007513  0.000664  18.78742
1132_s_at  0.007482  0.001124  97.67764
1256_at  0.007214  0.008779  34.08839
32808_at  0.0071  0.001822  1916.733
1896_s_at  0.006883  0.003921  66.862
34793_s_at  0.006848  0.004493  376.2522
36813_at  0.006764  0.003507  202.2838
35504_at  0.006342  0.003078  32.08176
41555_at  0.006273  0.003237  20.77446
34433_at  0.006152  0.002532  163.0879
33225_at  0.006147  0.001926  114.4902
40748_at  0.006015  0.007276  21.85239
38199_at  0.005587  0.008621  20.47364
36863_at  0.005357  0.005459  68.60697
In Table 1, we see that the rank of a gene's contribution does not necessarily reflect its rank in significance level. A highly significant gene may not give a high contribution to the classification because of its low expression level. Taking differential gene expression analysis as an example, a gene at a low expression level can be more significant than a highly expressed gene if both have the same fold change, simply because the variance of the first gene tends to be smaller. In our analysis, we tried gene ranking using significance level and mean gene expression, but the optimal performance was only achieved by gene ranking using contributions (data not shown). A good classification signature should be a minimal subset of genes that is not only differentially expressed but also contains the most relevant genes without redundancy [9,10]. As shown in the results section, our selected 40-gene signature outperforms the 115-gene signature even though it uses only about one third of the number of genes. Although only 3 genes overlap with the 115-gene signature, a calculation using the hypergeometric distribution shows that such an overlap is extremely unlikely to occur by chance (p=0.0006); a generic sketch of this overlap calculation is given below. In another example, Tan et al. [2] reanalyzed published data on breast cancer metastasis and identified 5 genes that predict metastasis with high accuracy; interestingly, all 5 genes overlap with the 32-gene profile reported in the original study [11]. Ein-Dor et al. [12] reported that the set of outcome-predictive genes is not unique, due to the existence of multiple genes that are correlated with survival, some of them with only small differences in their correlations. Overall, our comparison indicates that, although there can be different sets of predictive genes, significant overlap should be expected within the same study. In this paper, we focus on gene selection for predicting survival outcomes using multivariate correspondence analysis to identify the subset of significant genes with the highest contributions. The same idea can be generalized to other categorical variables such as tumor metastasis or clustering of tumor subtypes. Here the only difference is in the gene filtering step, where, instead of performing survival analysis, the significance of genes can be assessed using statistics for differential gene expression on the categorical data.
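The overlap significance mentioned above can be checked with a standard hypergeometric tail probability. The sketch below is generic: the background pool size is an assumption (the exact population behind the reported p=0.0006 is not restated in this section), so the printed value will depend on that choice.

```python
# Illustrative sketch: probability of observing at least `overlap` common genes
# between two signatures drawn from a common pool (hypergeometric tail).
from scipy.stats import hypergeom

pool = 12000      # assumed background: roughly the probes on the U95A2 array
sig_a = 115       # genes in the original study's signature
sig_b = 40        # genes in the present signature
overlap = 3       # observed common genes

# P(X >= overlap) where X ~ Hypergeom(M=pool, n=sig_a, N=sig_b)
p = hypergeom.sf(overlap - 1, pool, sig_a, sig_b)
print(f"P(overlap >= {overlap}) = {p:.1e}")
```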
ACKNOWLEDGMENT
We thank Dr. Dimitrios Spentzos at Beth Israel Deaconess Medical Center in Boston for help in accessing their data and for providing the list of their signature genes. This work was partially supported by the US National Institute on Ageing (NIA) research grant NIA-P01-AG08761.
REFERENCES
[1] R. Tibshirani, T. Hastie, B. Narasimhan and G. Chu. Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc Natl Acad Sci U S A, 20:6567-6572, 2002.
[2] Q. Tan, M. Thomassen and T. A. Kruse. Feature selection for predicting tumor metastases in microarray experiments using paired design. Cancer Informatics, 2:133-138, 2007.
[3] D. Spentzos, D. A. Levine, M. F. Ramoni, M. Joseph, X. Gu, J. Boyd, T. A. Libermann and S. A. Cannistra. Gene expression signature with independent prognostic significance in epithelial ovarian cancer. J. Clin. Oncol., 22:4700-4710, 2004.
[4] S. Matsui. Predicting survival outcomes using subsets of significant genes in prognostic marker studies with microarrays. BMC Bioinformatics, 7:156, 2006.
[5] K. Fellenberg, N. C. Hauser, B. Brors, A. Neutzner, J. D. Hoheisel and M. Vingron. Correspondence analysis applied to microarray data. Proc Natl Acad Sci U S A, 98:10781-10786, 2001.
[6] Q. Tan, K. Brusgaard, T. A. Kruse, E. Oakeley, B. Hemmings, H. Beck-Nielsen, L. Hansen and M. Gaster. Correspondence analysis of microarray time-course data in case-control design. Journal of Biomedical Informatics, 37:358-365, 2004.
[7] M. P. Brown, W. N. Grundy, D. Lin, N. Cristianini, C. W. Sugnet, T. S. Furey, M. Ares and D. Haussler. Knowledge-based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci U S A, 97:262-267, 2000.
[8] Q. Tan, J. Dahlgaard, B. M. Abdallah, W. Vach, M. Kassem and T. A. Kruse. A bootstrap correspondence analysis for factorial microarray experiments with replications. In I. Mandoiu and A. Zelikovsky (eds), ISBRA 2007, LNBI 4463, pp. 73-84. Springer-Verlag, Berlin Heidelberg, 2007.
[9] Y. Peng, W. Li and Y. Liu. A hybrid approach for biomarker discovery from microarray gene expression data for cancer classification. Cancer Informatics, 2:301-311, 2006.
[10] S. G. Baker and B. S. Kramer. Identifying genes that contribute most to good classification in microarrays. BMC Bioinformatics, 7:407, 2006.
[11] M. Thomassen, Q. Tan, F. Eiriksdottir, M. Bak, S. Cold and T. A. Kruse. Prediction of metastasis from low-malignant breast cancer by gene expression profiling. International Journal of Cancer, 120:1070-1075, 2007.
[12] L. Ein-Dor, I. Kela, G. Getz, D. Givol and E. Domany. Outcome signature genes in breast cancer: is there a unique set? Bioinformatics, 21:171-178, 2005.
Securing XML Web Services by using a Proxy Web Service Model
Quratul-ain Mahesar, ISRA University, P.O. Box 313, Hala Road, Hyderabad, [email protected], +92-022-2030181
Prof. Dr. Asadullah Shah, ISRA University, P.O. Box 313, Hala Road, Hyderabad, [email protected], +92-022-2030181
Abstract- XML Web Services play a fundamental role in creating distributed, integrated and interoperable solutions across the Internet. The main problem that XML Web Services face is the lack of support for security in the Simple Object Access Protocol (SOAP) specification, the vital communication protocol used by XML Web Services. Our research study aims at implementing a proxy-based lightweight approach to provide security in XML Web Services. The approach provides end-to-end message-level security not only for new Web Services but also for those already running, without disturbing them. The SOAP message in a Web Services environment travels over the HTTP protocol, which is not considered secure because messages are transmitted over the network in clear text and can be easily read by protocol sniffers. Therefore, in our project we establish a secure connection between the end user and the Proxy Web Service using Secure Sockets Layer (SSL), and the connection between the Proxy Web Service and the actual Web Service is also secured using SSL.
Keywords: Security, Web Services, XML, SSL
I. INTRODUCTION
XML Web Services are defined as pieces of software that interact with each other using Internet standards to create an application in response to requests that conform to agreed-upon formats [1]. Web Services provide a means to publish enterprise services and to integrate applications over the standard Internet infrastructure [2]. XML Web Services are not yet widely implemented. People are reluctant to deploy XML Web Services, and the biggest concern is security. According to a survey conducted by the market researcher Evans, 48% of the 400 IT executives interviewed are not confident enough to deploy Web Services
for public use due to the lack of security and authentication [3]. The lack of a security standard for XML Web Services limits them to internal connectivity [4]. All messages in the XML Web Services environment are transmitted using the SOAP protocol, which lacks security. The Simple Object Access Protocol (SOAP) is an XML-based lightweight protocol for exchanging structured and typed information in a decentralized and distributed environment [5]. To date, much of Web security is built around encryption through the Secure Sockets Layer (SSL). SSL is limited to providing point-to-point message security, and it is not enough to protect supply-chain operations and other Business-to-Business (B2B) transactions [6]. This problem has created the need for the development of a new Web Services security model, which is proposed and described in this paper. The purpose of our research study is to provide end-to-end message-level security in XML Web Services. A proxy-based model is demonstrated to test and analyze the various security aspects in the XML Web Services environment. A Proxy Web Service, implemented in ASP.NET, resides between the end-client application and the actual Web Service. Client-side and server-side digital certificates are used to provide Secure Sockets Layer (SSL) based security. Digital certificates are electronic files that are used to uniquely identify people and resources over networks such as the Internet [7].
II. XML WEB SERVICES SECURITY REQUIREMENTS
The basic security requirements for XML Web Services are summarized below [8]:
Authentication: Authentication is the process of verifying a person or system that is
accessing the service. It ensures that each entity involved in using an XML Web Service is what it actually claims to be. Authentication involves accepting credentials from the entity and validating them against an authority. Credentials may be embedded in either the header or the body of the SOAP message. Traditionally, authentication is done using a user name and password pair, but it can now also be done via biological characteristics such as a fingerprint or retina scan, a digital key, and so forth.
Authorization: Authorization determines whether the service provider has granted access to the XML Web Service to the requestor. Authorization confirms the service requestor's credentials and determines whether the service requestor is entitled to perform the operation, which can range from invoking the XML Web Service to executing a part of its functionality. In addition to authorizing what information users or applications have access to, there also needs to be authorization of which operations an application or user has the right to perform.
Integrity: Integrity is important for data transmitted over the network. It guarantees that the data or information being transferred is intact; in other words, the data received by the intended receiver should be exactly the same as what was sent. Digitally signed documents can help maintain the integrity of the information being transmitted.
Confidentiality: Sensitive information transferred over the network needs to be protected. Confidentiality means protecting the data being transferred so that it is not exposed to a third party. Encryption is one of the methods to prevent data from being seen during transmission: the data can be encrypted before sending and decrypted on receipt.
Non-repudiation: The person who has made a transaction should be accountable for it. Non-repudiation ensures that the person is liable for the action that he/she has performed electronically. With the implementation of a PKI infrastructure [9], the identity of an entity involved in a transaction can be established by a digital certificate or digital signature.
III. SOAP SECURITY
SOAP is the core technology for XML Web Services. It defines the framework for data
transmission between different systems and services. Without SOAP, XML Web Services are virtually nonexistent and unable to function. SOAP messages are composed in XML format; in other words, a SOAP message is an XML message and therefore inherits the characteristics of an XML document. In securing a SOAP message, XML security is employed, and on top of the XML security implementation there are SOAP-specific security implementations. Currently, most XML Web Services implementations are based on the HTTP binding, and the SOAP messages are transferred over the HTTP protocol. Such an implementation is simple and involves fewer parties or partners over a simple network architecture. The easy and simple way to secure the service is to employ current Web security technologies, the most popular and widely available of which is SSL. Thousands of e-commerce sites are available on the Internet, and SSL is used to secure the information transferred between users and service providers. In fact, SSL is used to protect data sent between Web browsers and Web servers. It is a transport-layer protocol that ensures privacy and confidentiality between two points. SSL-encrypted data are transferred over the HTTPS protocol instead of HTTP. In addition, SSL can be combined with digital certificates so that both parties are authenticated by a trusted authority, such as VeriSign or Thawte; thus, it provides integrity for the services as well. Through centralized trust-authority verification, an entity can be identified. As a result, most of the security issues that arise at the transport level can be dealt with by using SSL. However, SSL keeps the data private in transit only: it secures the link at the transport layer. Like a tunnel, everything transferred inside is kept from being exposed, but if the tunnel breaks the data will be exposed. Therefore, it is still not a complete security measure, even though there have been no reported incidents of SSL use resulting in a compromise of data privacy. Increasing the protection mechanism at the message level is required.
IV. ARCHITECTURE OF THE SECURITY MODEL
The high-level architecture of the security model is shown in Fig. 1. The proxy-based model introduces a lightweight framework for securing access to XML Web Services. The aim is to provide message-level security. The basic idea is to deploy a Proxy Web Service that receives the requests from the end client on behalf of the actual Web Service. The Proxy service authenticates the end client by validating the client's credentials that were sent along with the Web Service
request. A client is provided the requested service only after such an authentication. The Proxy Web Service resides between the end client application and the actual Web Service. It consists of two major components: authentication handler and proxy client. The authentication handler is used for authenticating the client and the proxy client invokes the actual Web Service. In other words, the request of the end client application is passed on by the proxy client to the actual Web Service. Similarly, the response of the actual Web Service is passed on by the proxy client to the end client application. Fig.2. shows the flow chart of the request part of the proposed security model and Fig.3. shows the flow chart of the response part of the proposed security model. The link between the end client application and proxy Web Service is secured using SSL protocol. Similarly, the link between the proxy Web Service and the actual Web Service is secured using SSL protocol. The client sends its digital certificate as a proof of identity. Server’s digital certificate is also verified and trusted by the client to ensure full security.
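To make the two components concrete, here is a minimal Python sketch of a proxy of this kind: an authentication handler that checks the credentials carried in the SOAP header against a client profile store, and a proxy client that forwards the request to the actual service over HTTPS. The framework (Flask), the element names in the SOAP header and the URLs are illustrative assumptions; the paper's implementation is in ASP.NET.

```python
# Illustrative sketch only (the paper's implementation uses ASP.NET).
# A proxy endpoint: validate credentials from the SOAP header, then forward
# the message to the actual Web Service over HTTPS. Requires Python 3.8+.
import xml.etree.ElementTree as ET
import requests
from flask import Flask, Response, request

app = Flask(__name__)
ACTUAL_SERVICE = "https://actual-service.example/BookInfo.asmx"   # assumed URL
CLIENT_PROFILES = {"alice": "secret"}                              # stand-in for the profile DB

def credentials_from_header(soap_xml: bytes):
    # Assumed header layout: <Envelope><Header><Credentials><User/><Password/>...
    root = ET.fromstring(soap_xml)
    user = root.findtext(".//{*}Credentials/{*}User")
    password = root.findtext(".//{*}Credentials/{*}Password")
    return user, password

@app.route("/proxy", methods=["POST"])
def proxy():
    user, password = credentials_from_header(request.data)
    if CLIENT_PROFILES.get(user) != password:           # authentication handler
        return Response("login failure", status=401)
    upstream = requests.post(                            # proxy client
        ACTUAL_SERVICE,
        data=request.data,
        headers={"Content-Type": "text/xml"},
        cert=("client.crt", "client.key"),               # proxy's client certificate
        verify="ca.pem",                                  # trust anchor for the actual service
    )
    return Response(upstream.content, status=upstream.status_code,
                    content_type="text/xml")

# app.run(ssl_context=("server.crt", "server.key"))      # serve the proxy itself over HTTPS
```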
Fig.1. High-Level Architecture of the Proposed Security Model.
Fig.2. Flow Chart of the Request Part of the Proposed Security Model.
Fig.3. Flow Chart of the Response Part of the Proposed Security Model.
V. ADVANTAGES OF THE SECURITY MODEL
The advantages of the proposed model are as follows:
Message-Level Security: The model is based on message-level security. The actual SOAP message is digitally signed and sent to the server. The server verifies the signature to check whether the message has been tampered with. If the message has not been tampered with and was sent by a valid user, the server executes the business logic and sends the result of the execution back to the client. In the proposed approach, the SOAP message sent by the client goes directly to the Proxy Web Service and therefore does not cross any intermediaries.
Authentication and Message Integrity: Not only is the authentication of the user performed, but the message integrity is also verified.
Integration into Existing Infrastructure: The model does not disturb the actual Web Service, which may be running on a production server. By introducing this model, all applications are enabled to participate instantly in a secured environment without requiring any changes to the application code. The proposed approach can be transparently integrated into existing processes without requiring changes to the workflow or even to the supporting business applications. Hence it forms a plug-in solution that can be removed and replaced with any other solution at any point in time.
Abstraction of Details: The actual Web Service is hidden and the process is abstracted away from the client. The client does not know that his/her request was intercepted and processed by a proxy.
Single Sign On: The credentials of the client are verified only once, by the Proxy Web Service. The client can then use any number of actual Web Services for which he/she is registered.
Integration of New Handlers: New handlers, such as Auditing and Notification handlers, can be integrated into the system very easily. The Auditing handler is used for maintaining service access information, and the Notification handler for sending e-mails to the service providers in case of any problem in accessing the service.
Lightweight Framework: The approach provides a lightweight framework.
Low Cost: The cost of building this model is low compared to implementing other security standards such as WS-Security.
Deployment: It can be deployed quickly compared to other security standards such as WS-Security.
VI. IMPLEMENTATION
An online book information Web Service application was built. It is based on ASP.NET and was developed using Visual Studio Framework 1.1. The application is based on the security model discussed
earlier. Fig. 4 shows the working model of the application. For the purpose of demonstration, this simple book information store offers book-search capability to its users. The system is quite simple and easy to use; it offers search by title, author and subject. Once a user selects the search criteria and submits the keyword to be searched, the request goes directly to the Proxy Web Service. The credentials supplied by the user along with the request are verified by the Proxy Web Service. If the authentication is unsuccessful, an error message is sent to the user. If, on the other hand, the authentication is successful, the Proxy Web Service generates a SOAP message requesting the search results from the service provider. The service provider processes the request and returns the results in a SOAP message to the Proxy Web Service. The Proxy Web Service then returns the results in a SOAP message to the service consumer, that is, the book information store Web site. The system then processes the returned results for display according to the Web site specification. In these processes, the SOAP message travels over the HTTP transport protocol. The security employed in the HTTP protocol is the Basic Access Authentication scheme [10]. It is not considered a secure method, because the user name and password are transmitted in clear text over the network and can be easily read by protocol sniffers; therefore, it exposes users to risk. To secure the information transferred over HTTP, the Secure Sockets Layer (SSL) must be used [11]. SSL is used to protect privacy and maintain data integrity in point-to-point communication. In this project, a secure connection is established between the end user and the Proxy Web Service using Secure Sockets Layer (SSL). Similarly, a secure connection is established between the Proxy Web Service and the actual Web Service using SSL. SSL client-side and server-side certificates are acquired from a well-known and trusted certification authority named Thawte (http://www.thawte.com).
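The end-client side of this setup reduces to making the HTTPS call with the server's certificate chain trusted and, where mutual authentication is wanted, presenting a client certificate. The Python fragment below only illustrates that pattern (the paper's client is an ASP.NET Web site); the file names, header element names and the proxy URL are assumptions.

```python
# Illustrative sketch: calling the Proxy Web Service over mutually
# authenticated SSL/TLS. File names and URL are placeholders.
import requests

PROXY_URL = "https://proxy.example/BookSearchProxy.asmx"   # assumed endpoint

soap_request = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header><Credentials><User>alice</User><Password>secret</Password></Credentials></soap:Header>
  <soap:Body><SearchBooks><By>title</By><Keyword>usability</Keyword></SearchBooks></soap:Body>
</soap:Envelope>"""

response = requests.post(
    PROXY_URL,
    data=soap_request.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8"},
    verify="thawte_ca.pem",            # trust the proxy's server certificate
    cert=("client.crt", "client.key"), # present the client certificate
)
print(response.status_code)
print(response.text[:200])
```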
VII. RESULTS
The implementation of the proxy-based security model provides message-level security. It not only authenticates the user but also verifies the message integrity. Authentication and non-repudiation are provided by using SSL digital certificates. The channel through which the data flows is completely secured using the SSL protocol. An unauthorized person is not able to alter the data exchanged between the systems; hence, data integrity is ensured. Data is
not disclosed to unauthorized parties. SSL provides point-to-point protection of data; hence, confidentiality of data is also achieved. The actual Web Service is not disturbed, and the Proxy Web Service acts like a plug-in service: it can easily be removed and replaced with any other solution at any point in time. The approach provides a lightweight framework, is low in cost and is quick to deploy.
VIII. CONCLUSION
In this research study we demonstrated security in XML Web Services by using a proxy-based approach. A software product was developed to demonstrate security in an XML Web Services application. We demonstrated the use of SSL client-side and server-side digital certificates and protected the channel through which SOAP messages pass by using the SSL protocol. The most important issue, providing end-to-end message-level security, was addressed in the proposed security model.
IX. FUTURE WORK
A service firewall is one area for further study. A service firewall is a network element, either logical or physical, that links an enterprise's Web Services to external partners or internal consumers in order to secure the bidirectional flow of XML messages. It can monitor, secure and control all Web Services traffic. Another potential area for further study is a signing proxy for Web Services security. This approach allows corporate firewalls to handle authentication: corporate signatures are added to all outgoing SOAP messages to enable a corporate trust relationship and the use of proxy authentication.
REFERENCES
[1] Infravio, "Web Services – Next Generation Application Architecture", 2003.
[2] Austin, D., "Web Services Architecture Requirements", 2004.
[3] Worthen, B., "Web service still not ready for prime time", CIO Magazine, 2002.
[4] Bednarz, A., "Firm bullish on Web services", Network World, 2003.
[5] Mitra, N., "SOAP Version 1.2 Part 0: Primer", 2003.
[6] Rubenking, N. J., "Securing Web Services", PC Magazine, 2002.
[7] Robinson, P., "Understanding Digital Certificates and Secure Sockets Layer", Entrust, 2001.
[8] Fhied, C., "Gap Assessment of the Top Web Service Specifications, Managing the Security of Web Services", SE 690 – Research Seminar, DePaul University, 2004.
[9] Kiran, S. et al., "PKI Basics – A Technical Perspective", PKI Forum Inc., 2002.
[10] Franks, "HTTP Authentication: Basic and Digest Access Authentication", The Internet Society, 1999.
[11] Dierks, T. & Allen, C., "The TLS Protocol, Version 1.0, RFC 2246", The Internet Society, 1999.
Fig. 4. Working Model of the Application. (The figure shows the end-client Web application connecting to the Proxy Web Service over HTTPS with its login information; the Proxy Web Service's authentication handler validates the credentials, and its proxy client exchanges SOAP service requests and responses with the actual Web Service; client and server certificates are issued by a certification authority.)
O-Chord: A Method for Locating Relational Data Sources in a P2P Environment
Raddad Al King, Abdelkader Hameurlain, Franck Morvan
Institut de Recherche en Informatique de Toulouse (IRIT), Université Paul Sabatier, 118 route de Narbonne, F-31062 Toulouse Cedex 9, France
E-mail: {alking, hameur, morvan}@irit.fr
Abstract- Due to the characteristics of Peer-to-Peer (P2P) systems, SQL query processing in these systems is more complex than in traditional distributed DBMSs. In this context, the semantic and structural heterogeneity of local schemas creates a real problem for locating data sources. Schema heterogeneity prevents peers from exchanging their data in a comprehensible way and can lead to incorrect answers to the localization query. In this paper, we propose the O-Chord method, which integrates a domain ontology into the Chord protocol [22]. This integration provides comprehensible data exchange while carrying out efficient data source localization. Our proposed method allows the Chord protocol to select the relevant data sources to answer the localization query.
I. INTRODUCTION
Despite their great success in the file-sharing domain, peer-to-peer (P2P) systems provide relatively simple functions, such as looking up a file by its name (e.g. Gnutella [10], Kazaa [17]). Recently, techniques used in traditional distributed DBMSs [21] have started to be combined with those developed in P2P systems. The goal of this combination is to make P2P systems able to apply query processing at a small granularity (e.g. the relational attribute) and to process queries expressed in a high-level language like SQL. SQL query processing proceeds in four main phases: (i) reformulation, (ii) data source localization, (iii) optimization and (iv) execution. In a P2P environment, where each peer1 is client, server, router and even data source, databases are owned by different organizations (or users). They are represented by different local schemas which may be semantically and structurally heterogeneous. For example, one relation found on many data sources could be: (i) represented by several names, (ii) called by a name having many meanings or (iii) described by different structures. Due to schema heterogeneity, a classical localization phase could generate incorrect answers. A real problem in a P2P environment is to allow comprehensible data exchange between peers while carrying out efficient data source localization. In traditional distributed DBMSs [21], this problem is resolved thanks to a global schema, which is divided into two essential parts: (i) the placement schema, which allows efficient locating of data sources, and (ii) the conceptual schema, which allows comprehensible data exchange. However, the scalability
1 In this paper, the terms "node" and "peer" are used interchangeably.
and the dynamic nature of P2P environments prevent P2P systems from having a global schema. On the other hand, P2P file-sharing systems use the file name (e.g. a song name) to look up a file. Usually, the file name is known by the majority of peers (or users) in the system. This knowledge is obtained from the media (e.g. TV) or from a social environment (e.g. a university), and it allows a comprehensible data exchange. However, in the context of this paper, the techniques developed in P2P file-sharing systems do not allow a comprehensible data exchange, generally for the following reasons: (i) data structures and their management are often transparent to users, and (ii) the solutions proposed for efficient localization of file sources must here be applied to data of a much finer granularity than a file. Many solutions [2, 8, 13, 15, 20] have been proposed to allow the nodes of a P2P system to share their data in a comprehensible way while carrying out efficient data source localization. These solutions often depend on the type of P2P system (structured, unstructured or super-peer). Refs. [2, 20] use semantics to improve the random routing of unstructured P2P systems; however, data source localization remains unreliable. Sometimes these systems are not able to locate the source of a data entity even if it exists in the P2P system. Ref. [8] presents a super-peer P2P system providing more reliable localization. It uses expertise tables maintained by the super-peers to guide the query routing; however, each super-peer could become a bottleneck for the peers for which it is responsible. In [20], the user creates keywords to represent his local schema, and his intervention is also needed to establish semantic correspondences between the keywords and the local schemas of other peers, which could lead to incorrect correspondences. In [12, 13], to locate data sources, the query must follow a chain of rewritings according to different schemas. Some other solutions (e.g. [8]) do not take into account the structural heterogeneity of local schemas. In this paper, we use a structured P2P system based on the Chord protocol [4, 22]. This type of P2P system adapts efficiently to the scalability and the dynamic nature of P2P environments. The Chord protocol uses keys to locate data sources. Traditionally, there is no semantic matching between keys and the data entities (relations, in our study) represented by these keys. Furthermore, the keys do not take into account the structure of the relations on their sources. These two limitations prevent the Chord protocol from selecting the relevant data sources storing the relations having the semantics and the structures required by the peer
initiating the query. So, in order to allow the Chord protocol to make this selection, it is necessary to extend the protocol. For all these reasons, and in order to locate relational data sources in a P2P environment, we propose the O-Chord method. This method takes advantage of the resemblance between the conceptual part of a global schema and a domain ontology [3, 6, 11, 23], and integrates the domain ontology into the Chord protocol. This integration appears at two levels: (i) the keys of the Chord protocol represent terms of the domain ontology instead of representing relations written according to the local schemas, and (ii) the terms of the domain ontology are used to describe the structure of the relations on their sources. The main features of the O-Chord method are: (i) adding semantics to the Chord protocol, which allows peers to exchange their data in a comprehensible way while taking advantage of the efficient query routing of the Chord protocol; (ii) carrying out the localization phase by rewriting the query (according to the terms of the domain ontology) only once, without the need to share knowledge about local schemas; (iii) selecting the relevant sources of the relations having the semantics and the structures required by the peer initiating the query. The rest of this paper is organized as follows. Section 2 presents the problem position. Section 3 then presents the O-Chord method, which integrates the domain ontology into the Chord protocol. Before the conclusion and future work, Section 4 discusses related work.
II. PROBLEM POSITION
In the P2P environment considered in this paper, peers want to exchange their data in a comprehensible and efficient manner. Each peer manages its own database in an autonomous way. The databases have a common interest (e.g. medical databases of researchers working on Alzheimer's disease) and they store similar data with semantic interdependences. However, these data may be represented by heterogeneous local schemas. Based on the example illustrated in Fig. 1, we can distinguish two levels of schema heterogeneity:
(i) Structural level: one relation (e.g. Doctor) could be represented in different manners according to the various local schemas. Each schema represents this relation according to the interest of its organization (or user); e.g. Doctor (Name, Salary) according to the N1 schema and Doctor (Name, Address) according to the N42 schema.
(ii) Semantic level: at this level, we can distinguish two cases:
a. Different names: one relation could be represented by several names belonging to different schemas. These names are lexically different but they indicate the same meaning and express the same intention of the users; e.g. "Doctor" according to the N8 schema and "Physician" according to the N14 schema both represent the concept of someone trained to treat people who are ill.
b. Different semantics: one relation name may exist in many schemas while representing different meanings; e.g. the word "Doctor" in the N8 schema does not have the same meaning in all other schemas. According to the N21 schema, for example, the same word holds the meaning of a person who obtained a PhD from a university (e.g. a doctor in chemistry).
N1: Doctor (Name, Salary)
N8: Doctor (Name, Paycheck); Ill (Name, Address, Doctor_Name)
N14: Physician (Name, Address); Patient (Name, Age)
N21: Doctor (Name, Earnings, Address)
N32: Consultant (Name, Salary, Telephone)
N42: Doctor (Name, Address)
N48: Patient (Name, Address, Doctor)
Fig. 1. Relational schemas in a structured P2P system (nodes N1, N8, N14, N21, N32, N38, N42, N48, N51 and N56 on the Chord ring)
Due to the schema heterogeneity, efficient localization of data sources is a real problem. The Chord protocol [4, 22] used in this paper provides efficient data source localization in terms of scalability and localization query routing. However, the Chord protocol represents each relation by a key. Traditionally, there is no semantic matching between keys and the relations represented by these keys. Furthermore, keys do not take into account the structures of the relations on their sources. These two limits of the Chord protocol prevent us from using it directly (without any extension) to locate data sources. In Fig. 1, let us suppose that N8 wants to know the names of "Doctor" having a "Paycheck" of 2000$. In the localization phase, if we use the Chord protocol, it locates the peers storing the relation "Doctor" and sends the answers to the peer initiating the query (henceforth PIQ). The answer will be {N1, N8, N21, N42}: these peers are the only ones that store a relation having "Doctor" as a name. This answer is incorrect for the following reasons:
(i) The answer contains the nodes {N1, N8, N21, N42}, which store relations having the same lexical word "Doctor" as a name. Nevertheless, the Chord protocol ignores certain sources storing the same relation "Doctor" required by the node N8 but represented by lexically different names (e.g.
Chord ignores the node N32, which stores the relation "Doctor" under the name "Consultant").
(ii) Even though the node N21 stores a relation named "Doctor", this relation is not the one required by N8; it holds the sense of a person who obtained a PhD in chemistry.
(iii) The node N42 stores a relation having "Doctor" as a name and holding the same sense required by N8. However, this relation is not useful because it does not contain the "Paycheck" attribute.
Since the protocol used in the localization phase must answer the localization query by sending correct answers to the PIQ, and in order to resolve problems (i), (ii) and (iii), we must extend the Chord protocol. So, we propose to add to the Chord protocol new information describing the semantics and the structures of the relations represented by the keys. For that purpose, we must answer the following questions: (i) What semantics should be added to the Chord protocol? (ii) How can the peers share the added semantics? (iii) How can this semantics be integrated into the Chord protocol? (iv) How is the added semantics to be used? (v) How can information describing the relation structures be added to the Chord protocol? In the rest of this paper, we present the essential elements that help to answer these questions.
III. O-CHORD METHOD
In traditional distributed DBMSs [21], the problem of locating the sources of heterogeneous data is resolved thanks to a global schema. Creating a global schema requires an agreement between the nodes on the basic concepts modelled by the data. This schema must be created before the nodes can store or exchange their data. However, the scalability and the dynamic nature of P2P environments prevent them from having a global schema. In order to find a new method replacing the global schema and answering the questions asked above, we start by dividing the global schema of a distributed DBMS into two parts:
(i) The placement schema, which describes the structure of the physical data storage and the matching rules between different local schemas. This schema is a cornerstone of the localization phase: it indicates the placement of data sources and makes the localization phase reliable.
(ii) The conceptual schema, which describes the inherent semantic structure of the data and is completely independent of physical data storage. This schema provides a comprehensible data exchange. It allows the system to choose relations having the semantics and the structures required by the PIQ.
Due to the dynamic nature of P2P systems, the placement schema must be replaced by a dynamic method. In our study, we use the Chord protocol to replace the placement schema. The Chord protocol adapts efficiently to the scalability and the dynamic nature of P2P environments. It reduces the number of
nodes to be contacted in order to locate data sources [22]. The Chord protocol represents each relation by a key; based on this key, Chord can find the site of the required relation. Furthermore, the key does not depend on the number of relations existing in the system, which allows the system to manage a large number of relations. In spite of these good characteristics, the Chord protocol cannot select the relevant sources which store relations having the structures and the semantics required by the PIQ. In fact, there is no semantic matching between keys and the relations represented by these keys. Furthermore, keys do not take into account the structures of the relations on their sources. In this section, we propose the O-Chord method, which extends the Chord protocol in order to overcome these limitations. Due to the similarity between the conceptual schema of a distributed DBMS and the domain ontology of information systems, the O-Chord method integrates a domain ontology [3, 6, 11, 23] (henceforth DO) into the Chord protocol. This integration appears at two levels: (i) the keys of the Chord protocol represent the terms of the DO instead of representing the terms of the local schemas, and (ii) the terms of the DO are also used to describe the structures of the relations on their sources. Before explaining the extension of the Chord protocol, we illustrate the resemblance between the conceptual schema and the DO.
A. Conceptual Schema vs. Domain Ontology
The conceptual schema of a DBMS is a unified and global description of all data and all conceptual relations between these data. The conceptual schema gives a precise vision of the semantics, which is expressed by using the concepts of relations, attributes and integrity constraints. It must be created at the very beginning of the creation of a database. On the other hand, Ref. [23] defines "ontology" as "an intentional description of what is known about the essence of the entities in a particular domain of interest using abstractions, also called concepts and the relationships among them." The DO organizes the knowledge of a domain in the form of a graph, gathering the domain objects into subcategories according to their essential characteristics. In information systems, a DO is mainly used as a terminology shared by the users for an explicit and coherent description of their knowledge. It restricts the interpretation of the concepts it defines to a context specified by the domain. This characteristic has the advantage of limiting the ambiguity of the terms defined in the DO. In spite of this functional separation, there are some similarities between these two concepts: (i) the Entity-Association model used in the conceptual description of the data managed by a DBMS forms a semantic schema that resembles a DO; (ii) in the context of the Semantic Web, a DO is used as a data-integration schema, the principle of which is to provide a single interface for the interrogation of heterogeneous data sources. In the context of this paper, we use a DO as a conceptual schema. This ontology mainly forms a terminology shared by the nodes for an explicit and coherent description of their
knowledge. Such an ontology allows data exchange in a semantically meaningful manner. The DO has to be created by specialists of the considered domain and must be duplicated on all peers in the system. Besides its own local schema and the DO, each peer has a wrapper whose role is to transform an SQL query written according to the local schema into a query written according to the DO. This transformation is done in the reformulation phase; it will not be addressed in this paper. Up to this point, we have answered the first two questions raised in Section 2. In the rest of this section, we answer the remaining questions.
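For illustration, the following small sketch (ours, not the authors' code) shows how such a wrapper could hold a peer's local-name-to-DO-term correspondences and use them to rewrite the vocabulary of a query; the mappings are those of node N8 from the example given later in Section III.C, and the function name is purely illustrative.

# Minimal sketch (in Python) of a per-peer wrapper: it maps local schema names to
# the terms of the shared domain ontology (DO). The mappings below are those of
# node N8 in the paper's example; to_domain_ontology is an illustrative name.
RELATION_MAP = {"Ill": "Patient", "Doctor": "Doctor"}
ATTRIBUTE_MAP = {
    "Ill": {"Name": "Name", "Address": "Address", "Doctor_Name": "Doctor_Name"},
    "Doctor": {"Name": "Name", "Paycheck": "Salary"},
}

def to_domain_ontology(query):
    """Rewrite {local_relation: [local_attributes]} into DO terms (Q -> Q')."""
    rewritten = {}
    for relation, attributes in query.items():
        rewritten[RELATION_MAP[relation]] = [ATTRIBUTE_MAP[relation][a] for a in attributes]
    return rewritten

# Q expressed against N8's local schema ...
q = {"Ill": ["Name", "Address", "Doctor_Name"], "Doctor": ["Name", "Paycheck"]}
# ... becomes Q' = {'Patient': ['Name', 'Address', 'Doctor_Name'], 'Doctor': ['Name', 'Salary']}
print(to_domain_ontology(q))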
B. Chord Protocol Extension
In order to carry out the localization phase, we propose to extend the Chord protocol so that it can, on the one hand, locate the sources of the relations associated with a concept of the DO and, on the other hand, select the relevant sources storing relations that have the structures required by the PIQ. For that objective, we suppose that all relations having the same meaning are associated with a single concept in the DO. This concept is represented, in its turn, by a key. We also make two modifications to the Chord protocol. The first one extends the DHT (Distributed Hash Table) by adding new information describing the relation structures; the second one modifies the routing algorithm to make the Chord protocol able to select the relevant data sources.
DHT extension. Let us consider that each key represents a single concept in the DO. This concept represents relations which are semantically similar and stored on several sources. Given a key, the Chord protocol becomes able to locate the sources of the relations having the semantics required by the PIQ. However, Chord remains unable to choose which relations have the structures required by the PIQ. It is therefore necessary to enrich the DHT with new information making it able to describe the structures of the relations on their sources. For that purpose, we propose to associate each key k with a structures index SI(k). This index contains the different structures of the concept (more precisely, the structures of the relations associated with the concept) represented by k. The SI(k) allows the peer responsible for k (also called the successor of k, i.e. the peer whose identifier directly succeeds k on the virtual ring of the Chord protocol [22]) to select the relevant sources to answer the localization query. These sources store the relations having the semantics and the structures required by the PIQ. In Fig. 2, the concept "Doctor" is represented by the key k54. The node N56, which is responsible for k54, stores SI(k54); this index describes the structures of all relations associated with the concept "Doctor". Each SI(k) is created at the same time as the key k and is stored on the node responsible for k. At the moment of its connection to the system, each node is responsible for injecting its keys and the structures describing the relations associated with the concepts represented by those keys. It is also responsible for deleting those injected elements from the system at the moment of its disconnection.
Fig. 2. Locating the key k54 representing "Doctor". (The figure shows the Chord ring with nodes N1, N8, N14, N21, N32, N38, N42, N48, N51 and N56. The SI table of N56 for k54 contains (N1, {Name, Salary}), (N8, {Name, Salary}), (N14, {Name, Address}), (N32, {Name, Salary, Telephone}) and (N42, {Name, Address}); other nodes, e.g. N1, hold the SI tables of other keys. The localization query LQ(k54, N8, {Name, Salary}) issued by N8 is routed towards Res(k54) = N56, which returns the answer {N1, N8, N32} to N8.)
Routing algorithm. In the Chord protocol, the localization query (noted LQ) consists of a key and the identifier of the PIQ. In the O-Chord method, we propose to add new information to the LQ, describing the names of the attributes belonging to the relation associated with the concept represented by the key. The objective of this added information is to make the node responsible for the key able to select the sources storing the relations that contain these attributes. The O-Chord query routing algorithm consists of the following steps:
(i) Localization of the node responsible for the key. This step is similar to that explained in the Chord protocol [22].
(ii) Selection of the relevant sources (RS) to answer the LQ. The node responsible for the key looks up the required key in its structures index in order to choose the nodes that store relations having the attributes referred to by the LQ.
(iii) Sending the answer to the PIQ in order to allow the PIQ to continue SQL query processing.
Fig. 3 presents the O-Chord query routing algorithm. In order to locate the relevant data sources that store the relations addressed in an SQL query (noted Q) submitted at the PIQ, the reformulation phase rewrites Q according to the DO; the rewritten query is noted Q'. After that, all keys corresponding to the relations addressed in Q' are generated, using the same hash function used to build the DHT. Then, for every key k, a localization query LQ(k, PIQ, {attj}j=1,2,...,m) is sent towards the node responsible for that key, Res(k). This node looks up the sources corresponding to the key and selects those that store the attributes {attj}j=1,2,...,m addressed in the localization query. Finally, the answer to the localization query is sent to the PIQ. In order to clarify the different steps of this algorithm, we give an example.
O-Chord query routing algorithm
Input:  Q' = {(Ri / att1, att2, ..., attm)}i=1,2,...,n, the SQL query reformulated and written according to the Domain Ontology
Output: {RS(Ri) : relevant sources storing Ri}i=1,2,...,n
Begin
  For each Ri do
  Begin
    Ki ← Generate_Key(Ri)
    If PIQ is not Res(Ki) Then          // PIQ is the Peer Initiating the Query; Res(Ki) is the peer responsible for Ki
    Begin
      N ← closest_predecessor(Ki)       // the peer whose identifier most closely precedes Ki [22]
      Send LQ(Ki, PIQ, {attj}j=1,2,...,m) to N
                                        // if N is not Res(Ki), it forwards LQ to the node whose identifier most
                                        // closely precedes Ki; this is repeated until LQ reaches Res(Ki), which
                                        // answers LQ by sending RS(Ri) to the PIQ
      Receive(RS(Ri))
    End
    Else
      Select(RS(Ri))
    End_if
    Result[i] ← RS(Ri)
  End_for
End
Fig. 3. O-Chord query routing algorithm
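To make the selection step of Fig. 3 concrete, the sketch below (our illustration in Python, not the authors' code) generates a Chord-style key by hashing a DO term and lets the node responsible for that key filter its structures index SI(k) against the attributes carried by LQ(k, PIQ, {attj}); the SI(k54) entries and the query are those of the running example, while the identifier-space size M is an assumption.

# Illustrative sketch of the two O-Chord ingredients added to Chord:
# (1) keys are obtained by hashing DO terms, and (2) the node responsible for a key
# keeps a structures index SI(k) and answers LQ(k, PIQ, {att_j}) with the sources
# whose relation contains every requested attribute.
import hashlib

M = 6  # assumed identifier-space size: 2**M positions on the ring

def generate_key(do_term):
    """Chord-style key: hash of the DO term reduced to the identifier space."""
    return int(hashlib.sha1(do_term.encode()).hexdigest(), 16) % (2 ** M)

# SI(k54) held by the node responsible for the key of the concept "Doctor"
# (structures taken from Fig. 2).
SI_K54 = {
    "N1": {"Name", "Salary"},
    "N8": {"Name", "Salary"},
    "N14": {"Name", "Address"},
    "N32": {"Name", "Salary", "Telephone"},
    "N42": {"Name", "Address"},
}

def select_relevant_sources(structures_index, requested_attributes):
    """Step (ii) of the routing algorithm: keep only the sources storing every attribute."""
    wanted = set(requested_attributes)
    return {node for node, attrs in structures_index.items() if wanted <= attrs}

k_doctor = generate_key("Doctor")   # some identifier in [0, 2**M); the k54 of Fig. 2 is illustrative
# LQ(k54, N8, {Name, Salary}) -> {'N1', 'N8', 'N32'}, as in the paper's example.
print(select_relevant_sources(SI_K54, ["Name", "Salary"]))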
C. Example
Let us consider the relational schemas of Fig. 1 and suppose that the relation "Doctor" on N21 represents a person who obtained a PhD in chemistry; this relation is associated with the concept "PhD_Doctor" in the DO. On the other nodes, the relations "Doctor" on N1, "Doctor" on N8, "Physician" on N14, "Consultant" on N32 and "Doctor" on N42 are associated with the single concept "Doctor" in the DO. The attributes "Salary" on N1, "Salary" on N32 and "Paycheck" on N8 are associated with the concept "Salary" in the DO. In the same way, the relations "Ill" on N8, "Patient" on N14 and "Patient" on N48 are represented by the concept "Patient" in the DO, and the attributes "Doctor_Name" on N8 and "Doctor" on N48 are associated with the concept "Doctor_Name" in the DO. Consider the following SQL query Q submitted on the node N8:
Q:  Select  Ill.Name, Ill.Address, Doctor.Name
    From    Ill, Doctor
    Where   Ill.Doctor_Name = Doctor.Name
    And     Doctor.Paycheck = 2000
Q is written according to the local schema of the node N8. In order to be understood by the other nodes in the system, Q must be reformulated and rewritten in terms of the DO. The rewriting of Q is done at two levels: the relation level and the attribute level. The symbol "≡" means "corresponds to, in the DO".
Relation level: R1/ Ill ≡ Patient; R2/ Doctor ≡ Doctor
Attribute level: R1/ Name ≡ Name, Address ≡ Address, Doctor_Name ≡ Doctor_Name;
R2/ Name ≡ Name, Paycheck ≡ Salary
After the rewriting process, Q is transformed into Q':
Q' = {(Patient/ Name, Address, Doctor_Name), (Doctor/ Name, Salary)}
Let us suppose that the key k54 represents the concept "Doctor". According to the Chord protocol, we notice in Fig. 2 that the node N56 is responsible for this key. In order to answer the LQ, node N56 selects, among the sources storing the relation "Doctor", those storing the attributes referenced by Q', and sends the set {N1, N8, N32} to the PIQ as the answer. In Fig. 2, we notice that the node N21 is not found in SI(k54), because the relation "Doctor" on N21 is associated with the concept "PhD_Doctor" in the DO. In a similar way, the Chord protocol locates the sources storing the relations associated with the concept "Patient", which is represented by k58; the answer is {N8, N48}.
IV. RELATED WORK
In the last few years, many research projects have developed methods whose main goal is to allow peers in a P2P system to exchange their data in a comprehensible way while carrying out efficient localization of data sources. Many projects, e.g. [2, 20], use semantics to improve the random routing of unstructured P2P systems; however, the localization phase in these systems is not reliable. Sometimes these systems cannot locate the sources of a data entity (e.g. a relation) even if it exists in the system. Other projects, such as [8], introduce semantics into super-peer P2P systems,
which is more reliable. However, each super-peer may become a bottleneck for the peers for which it is responsible. In this paper, we integrate semantics into a structured P2P system based on the Chord protocol [4, 22]. In this type of P2P system, where peers are equal, the localization phase is more reliable than in unstructured P2P systems and more effective than in super-peer P2P systems. In the Piazza project [12, 13], nodes define semantic associations between their local schemas. Piazza combines and generalizes the LAV (Local-As-View) [5, 18] and GAV (Global-As-View) [1, 19] approaches. Nevertheless, to create these associations, each node has to know the local schemas of the several nodes with which it shares its data, and to locate data sources a query has to follow a chain of rewritings. In our proposed method, nodes do not need to know the local schemas of other peers; they only need to know the domain ontology. Furthermore, to locate the data sources, the query is rewritten only once, according to the terms of the domain ontology. In Hyperion [2, 9], data exchange is carried out by means of mapping tables, which define value correspondences among different databases. However, much effort is needed to create mapping tables in a semi-automatic manner and to keep them up to date. PeerDB [20] supposes that each peer stores a set of keywords for every relation and every attribute of its own schema. The semantic correspondence between these keywords and the local schemas of other peers is made through user intervention, which could generate incorrect correspondences. As opposed to PeerDB, the O-Chord method does not need user intervention to establish the semantic correspondence. SenPeer [7, 8] allows data sharing between different data models (relational, object or XML documents). Unlike the O-Chord method, SenPeer does not take into account the structural heterogeneity between the local schemas. Bibster [14] uses a common ontology allowing peers to communicate in a comprehensible way. However, Bibster is an unstructured P2P system supporting a specific application: it supports only the exchange of bibliographic files between researchers.
V. CONCLUSION
This paper proposes a method, called O-Chord, for locating relational data sources in a P2P environment. The O-Chord method integrates a domain ontology into the Chord protocol. It allows peers in a P2P environment to exchange their data in a comprehensible way while carrying out reliable and efficient localization of data sources. While the Chord protocol provides efficient localization in terms of query routing and scalability, the domain ontology provides a comprehensible data exchange. The O-Chord method allows the Chord protocol to select the relevant data sources which store relations having the structures and the semantics required by the peer initiating the query.
Our future work will be dedicated to quantitative studies of the impact of the modifications made to the Chord protocol, in terms of localization query routing and DHT storage volume.
REFERENCES
[1] S. Adali, K. Candan, Y. Papakonstantinou, and V. Subrahmanian, "Query Caching and Optimization in Distributed Mediator Systems", Proc. SIGMOD, pp. 137-148, 1996.
[2] M. Arenas et al., "The Hyperion Project: From Data Integration to Data Coordination", SIGMOD Record 32(3), pp. 53-58, 2003.
[3] B. Chandrasekaran, J. R. Josephson and V. R. Benjamins, "What Are Ontologies, and Why Do We Need Them?", IEEE Intelligent Systems, pp. 20-26, 1999.
[4] Chord project homepage, http://pdos.csail.mit.edu/chord.
[5] O.M. Duschka and M.R. Genesereth, "Answering Recursive Queries Using Views", Proc. 16th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pp. 109-116, 1997.
[6] J.O. Everett et al., "Making ontologies work for resolving redundancies across documents", Communications of the ACM, Vol. 45, No. 2, pp. 55-60, 2002.
[7] D. Faye, G. Nachouki and P. Valduriez, "SenPeer, Un système Pair-à-Pair de médiation de données", Research Report, INRIA, pp. 24-48, ARIMA, 2006.
[8] D. Faye, G. Nachouki and P. Valduriez, "Semantic Query Routing in SenPeer, a P2P Data Management System", Network-Based Information Systems, First International Conference, NBiS 2007, Regensburg, Germany, September 2007.
[9] P. Rodriguez-Gianolli et al., "Data Sharing in the Hyperion Peer Database System", in Proceedings of the International Conference on Very Large Databases (VLDB), pp. 1291-1294, 2005.
[10] Gnutella homepage, http://www.gnutella.com.
[11] T. R. Gruber, "A Translation Approach to Portable Ontology Specifications", International Journal of Knowledge Acquisition for Knowledge-Based Systems, Vol. 5, No. 2, 1993.
[12] A. Halevy, Z. Ives, P. Mork and I. Tatarinov, "Piazza: Data Management Infrastructure for Semantic Web Applications", in Proceedings of the Twelfth International Conference on World Wide Web, Budapest, pp. 556-567, 2003.
[13] A. Halevy et al., "The Piazza Peer Data Management System", IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 7, pp. 787-798, July 2004.
[14] P. Hasse et al., "Bibster - A Semantics-Based Bibliographic Peer-to-Peer System", Proceedings of the International Semantic Web Conference (ISWC 2004), Hiroshima, Japan, November 2004.
[15] R. Huebsch et al., "Querying the Internet with PIER", 29th VLDB, Berlin, Germany, 2003.
[16] R. Huebsch et al., "The Architecture of PIER: an Internet-Scale Query Processor", CIDR, 2005.
[17] Kazaa homepage, http://www.kazaa.com.
[18] A.Y. Levy, A. Rajaraman and J. J. Ordille, "Querying Heterogeneous Information Sources Using Source Descriptions", in Proceedings of 22nd VLDB, pp. 251-262, 1996.
[19] H. Garcia-Molina et al., "The TSIMMIS Project: Integration of Heterogeneous Information Sources", Journal of Intelligent Information Systems, Vol. 8, No. 2, pp. 117-132, March 1997.
[20] W. S. Ng, B. C. Ooi, K. Tan and A. Zhou, "PeerDB: A P2P-based System for Distributed Data Sharing", in Proc. of the 19th International Conference on Data Engineering, Bangalore, India, pp. 633-644, 2003.
[21] T. Özsu and P. Valduriez, "Principles of Distributed Database Systems", 2nd Edition, Prentice Hall, 1999.
[22] I. Stoica et al., "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications", SIGCOMM'01, San Diego, California, USA, 2001.
[23] W. Sun and D. X. Liu, "Using Ontologies for Semantic Optimization of XML Databases", KDXD 2006, Singapore, April 2006.
Intuitive Interface for the Exploration of Volumetric Datasets
Rahul Sarkar1, Chrishnika de Almeida1, Noureen Syed1, Sheliza Jamal1, Jeff Orchard2
1 Department of Electrical and Computer Engineering, University of Waterloo
2 David R. Cheriton School of Computer Science, University of Waterloo
Abstract
Conventional human-computer interfaces for the exploration of volume datasets employ the mouse as an input device. Specifying an oblique orientation for a cross-sectional plane through the dataset using such interfaces requires an indirect approach involving a combination of actions that must be learned by the user. In this paper we propose a new interface model that aims to provide an intuitive means of orienting and translating a cross-sectional plane through a volume dataset. Our model uses a hand-held rectangular panel that is manipulated by the user in free space, resulting in corresponding manipulations of the cross-sectional plane through the dataset. A basic implementation of the proposed model was evaluated relative to a conventional mouse-based interface in a controlled experiment in which users were asked to find a series of targets within a specially designed volume dataset. The results of the experiment indicate that users experienced significantly less workload and better overall performance using our system.
1 INTRODUCTION AND RELATED WORK
Many applications that make use of 3D visualization involve the formation of a volume dataset through interpolation within sets of 2D images. Notable examples include clinical imaging modalities such as magnetic resonance imaging (MRI) and computed tomography (CT), which rely on software construction of a volume from 2D slice acquisitions. The analysis of constructed volumes for diagnostic and therapeutic purposes generally involves a search for features of interest, and is usually conducted by viewing and manipulating one or more cross-sectional planes through the volume. Computing systems used for these applications feature software interfaces which provide orthogonal cross-sectional planes in the standard axial, sagittal, and coronal views through the imaged volume. The translation of these planes along the standard axes is an action well suited to mouse scrolling, where an upward or downward scroll corresponds to movement
along a given axis in one direction or the other. However, clinically relevant features often exist at orientations that are oblique with respect to the standard views. Support for interaction with oblique cross-sectional planes is often provided through the mouse by mapping combinations of button clicks and mouse motions to translation and rotation actions, requiring cognitive effort on the part of the user to learn and apply these combinations. Commercial interfaces for dealing with volumetric data almost universally use the mouse as an input device, providing software tools for 3D manipulation using the mouse. An intuitive physical interface capable of providing 3D input could have significant utility in improving user experience and reducing the workload associated with exploratory activities when compared with this mouse-based interface. There has been a great deal of research on 3D user interfaces to date, resulting in the development of a wide variety of systems in research settings. Most of the recent advances in 3D interface design for exploring volumetric data have been inspired by surgical planning applications, resulting in the creation of several application-specific interfaces that each correspond to a particular anatomical or surgical domain [1, 3, 4, 5, 6]. All of these studies made use of 3D user interfaces that were rated as useful by the test users, usually surgeons. Yet despite the demonstrated effectiveness of such interfaces, 3D interfaces are still rare in both clinical settings and general computing systems. Balakrishnan [2] notes that most of us continue to communicate with our computing technology via primarily 2D interfaces, even when dealing with 3D data. There are likely several reasons behind this, including the high cost associated with many of these systems. Reitinger et al. [7] cite hardware cost as the main limitation to using their system in clinical routine, since a virtual reality (VR) setup must be installed. Most of the above-mentioned systems also make use of VR and are hence subject to similar costs due to the need for components such as rendering workstations. Due to such practical considerations, an interface that can be used with a standard workstation may be more widely useful. Furthermore, all of the mentioned systems were developed for specific application domains, with tools and props customized to these applications. The abstraction of the problem domain to the exploration of any volumetric
dataset presents the opportunity to create a general interface device that could have widespread applicability. While lacking some of the finer control points of the application-specific interfaces, such an interface could offer the same kind of generality that makes the mouse such a widely used two-dimensional input device. We present an interface model intended to be intuitive and versatile. Our model uses a hand-held rectangular panel that is manipulated by the user in free space, resulting in corresponding manipulations of the cross-sectional plane through the dataset. In the next section, we describe a basic implementation of this model using an electromagnetic tracking system, a minimal number of sensor coils, and a standard workstation, which allows us to evaluate the model with respect to a mouse-based interface. We describe a user study in which participants were asked to find a series of targets by manipulating a cross-sectional plane in a 3D image using both a mouse-based interface and the hand-held device interface. We then conclude with a description of the results and plans for future work.
2 IMPLEMENTATION
The components of the interface system include the hand-held panel, a magnetic motion tracking system, a PC and a standard display. The motion of the panel is tracked within the magnetic field generated by the motion tracking system. Two sensor coils are attached to diagonally opposite corners of the panel, and the motion tracking system measures five-degree-of-freedom (5 DOF) position and orientation information for each coil. The PC polls the motion tracking system and retrieves positional data using application programming interface (API) calls. Visualization software converts the data for each sensor coil from the coordinate system of the motion tracking system to a standard graphical coordinate system and calculates the position of a third corner point based on the position and orientation of the two sensor coils and the known dimensions of the panel. The three points are then used to prescribe a cross-section through a previously loaded volume in the visualization software. Figure 1 shows a high-level overview of the system in use. The user moves the panel in the free space bounded by the motion tracking volume produced by a field generator (not shown). The motion data from the attached sensor coils are transmitted via sensor interface units to the control unit of the motion tracking system. The control unit relays the data to the PC through an RS-232 interface. Once the system software translates these data into a visualization plane in the volume dataset, the plane appears on the display, and the user continues to adjust the motion of the device based upon the observed plane.
Figure 1. High-level overview of the system
Our implementation makes use of the Aurora tracking system from NDI (Northern Digital Inc., Waterloo, ON, Canada) for tracking the motion of the hand-held panel. This tracking system uses electromagnetic measurement to track sensor coils within the scope of a defined measurement volume. The field generator produces an electromagnetic field with a characterized measurement volume of 50 cm x 50 cm x 50 cm, projecting outward from one side of the generator. When the field generator is placed on a table roughly across from the upper torso of a seated user, the bounds of the measurement volume are comfortably within the reach of most users. System control and measurement data are accessible through the NDI Combined Polaris and Aurora Application Programmers' Interface, and communication between Aurora and the PC is done via a standard RS-232 port. The hand-held panel is composed of a firm cardboard material and measures 25 cm x 21 cm x 0.5 cm. The panel is comfortably gripped using either one or two hands, and is easily moved to arbitrary positions and orientations within the measurement volume. It can also be extended to the outer boundaries of the measurement volume with relative ease. Figure 2 shows the hand-held panel with the tracking tools attached. The panel uses two sensor tools attached to diagonally opposite corners. Each tool has a single 5 DOF sensor embedded within it, and is able to report 3D position coordinates with respect to the tracking volume, as well as orientations about any axis other than rotation about its longitudinal axis. The decision to use two sensors to measure corner points of the panel was motivated by the need to provide three points (or two lines) to define the plane of our desired cross-section, while minimizing the number of sensor tools needed. With two points directly measured at diagonally opposite corners, a third corner point along the longitudinal axis of either measured point can be calculated using the position information of that point, along with the combination of 5 DOF orientation information and the known distance between the measured point and the calculated point.
Figure 2. The Hand-held Panel (sensor coil positions indicated)
Positional coordinates reported by the tracking system are with respect to the measurement volume coordinate system, which differs from that of traditional graphical coordinate systems in visualization applications. The interface uses a software mapping to convert between coordinate systems, and the bounds of each axis are also linearly scaled to the dimensions of the 3D visualization. Orientation information from each sensor coil is retrieved from the tracking system in normalized quaternion form, which is then converted to rotation matrices using the governing equations described in [6]. The orientation information is with respect to the coordinate system of each sensor coil, which defines its longitudinal axis to be the z-axis. The coordinates of the third corner point are calculated in order to enable the definition of the cross-sectional plane using lines that extend from the third point to each of the two measured points. The distance between one sensor coil and the corresponding corner point along its longitudinal axis was previously measured and is denoted L. In performing the calculation, the position of one of the sensor coils is assumed to be the origin, and the position of the missing point is assumed to be offset from the origin by a distance of L along the negative z-axis with respect to the coordinate system of the coil. The rotation matrix R describing the orientation of the sensor coil is multiplied with the assumed position of the missing point, placing it at the correct point in space relative to the assumed sensor coil position at the origin. The translated position coordinates for the sensor coil are then added to the rotated position of the missing point, shifting the point to the correct absolute location within the measurement volume. This process is repeated to obtain a second position calculation using the data from the second sensor coil, and the final position values in each axis are averaged between the two sets, providing some error correction in the case of minor shifts in either coil. Figure 3 depicts this process using the second sensor coil position (x2, y2, z2) to calculate the missing third point (x3, y3, z3).
Figure 3. Calculation of the third point
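The geometry of Figure 3 can be written out directly. The sketch below (ours, not the system's C++/Python code) converts a coil's reported unit quaternion into a rotation matrix with the standard formula and places the missing corner a distance L along the coil's negative z-axis; in the actual system the analogous estimate from the other coil is averaged with this one. The numeric values are invented for illustration.

# Sketch of the third-corner computation depicted in Figure 3 (illustrative only).
# A coil reports its position and a unit quaternion (q0, qx, qy, qz); the missing
# corner lies a known distance L along the coil's negative z (longitudinal) axis.
import numpy as np

def quaternion_to_rotation(q0, qx, qy, qz):
    """Standard unit-quaternion to 3x3 rotation matrix conversion."""
    return np.array([
        [1 - 2 * (qy * qy + qz * qz), 2 * (qx * qy - q0 * qz),     2 * (qx * qz + q0 * qy)],
        [2 * (qx * qy + q0 * qz),     1 - 2 * (qx * qx + qz * qz), 2 * (qy * qz - q0 * qx)],
        [2 * (qx * qz - q0 * qy),     2 * (qy * qz + q0 * qx),     1 - 2 * (qx * qx + qy * qy)],
    ])

def third_corner(coil_position, coil_quaternion, L):
    """Offset the missing corner by L along the coil's negative z-axis, rotate that
    offset into tracker coordinates, then translate by the measured coil position."""
    R = quaternion_to_rotation(*coil_quaternion)
    return np.asarray(coil_position) + R @ np.array([0.0, 0.0, -L])

# Invented example: coil 2 at (x2, y2, z2), rotated roughly 10 degrees about the y-axis.
p2 = (120.0, 40.0, -35.0)
q = (0.9962, 0.0, 0.0872, 0.0)
print(third_corner(p2, q, L=21.0))   # estimate of the missing corner (x3, y3, z3)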
All serial communication between the PC and Aurora was implemented in C++ using the NDI Combined API for communication. Methods were written to initialize the system and ports, and poll the system for position and orientation information for each sensor coil. Coordinate system transformations, as well as conversion between quaternion and rotation matrix formats, were also implemented in C++. In order to support error handling in situations where either sensor coil is moved outside of the volume or the system undergoes temporary electromagnetic interference, additional pre-processing of the data was implemented to reject any samples that did not include valid coordinates for every axis and sensible orientation information. The methods were compiled under Windows XP and packaged as a dynamic-link library (DLL). Visualization software was written in Python 2.3 using VTK 4.1 (www.vtk.org) and Atamai 1.0 (www.atamai.com) classes. The Atamai classes include support for the rendering of cross-sectional planes in the standard axial, sagittal, and coronal views as well as oblique planes. These classes all support mouse interaction through click and drag translation, and the oblique plane class further supports full plane rotation through additional click and drag methods. In order to add support for our hand-held panel, we used the SWIG interface compiler (www.swig.org) to generate Python wrappers for the C++ methods that communicate with Aurora and perform data pre-processing. The conversion between coordinate systems and calculations for specifying the plane were also implemented in Python.
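As an illustration of the pre-processing step just described, the sketch below (ours, not the project's code) keeps a polled sample only when both coils report finite coordinates on every axis and an orientation quaternion of sensible (near-unit) norm; the record layout and the norm tolerance are assumptions.

# Illustrative sample-rejection check; field names and tolerance are assumptions.
import math

def is_valid_sample(sample, tolerance=0.05):
    """sample is a pair of coil readings, each with a position and a quaternion."""
    for coil in sample:
        position = coil["position"]        # (x, y, z) in tracker coordinates
        quaternion = coil["quaternion"]    # (q0, qx, qy, qz)
        if not all(math.isfinite(v) for v in position):
            return False                   # missing or out-of-volume coordinate
        norm = math.sqrt(sum(q * q for q in quaternion))
        if abs(norm - 1.0) > tolerance:
            return False                   # garbled orientation data
    return True

# Example: a well-formed reading passes, a reading with a NaN coordinate is rejected.
good = [{"position": (1.0, 2.0, 3.0), "quaternion": (1.0, 0.0, 0.0, 0.0)}] * 2
bad = [{"position": (float("nan"), 2.0, 3.0), "quaternion": (1.0, 0.0, 0.0, 0.0)}] * 2
print(is_valid_sample(good), is_valid_sample(bad))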
3 USER STUDY
We conducted a study to compare the general usability of our hand-held panel interface to that of a standard mouse-based interface. Since the intended application of our interface is to allow effective and intuitive exploration of volume datasets, we were also interested in the performance of users when finding regions of interest within a volume using each interface. To this end, we designed an experiment in which each participant was asked to find a series of clear targets within a 3D image of a specially designed volume. The completion times for each target acquisition were measured, and participants completed surveys regarding the workload in using each interface as well as their overall experience. We hypothesized that completion times for the assigned tasks would be reduced when using the panel interface, since the cross-sectional position and orientation are updated with the continuous motion of the panel, whereas the mouse-based interface often requires composite actions to achieve the same cross-sectional prescriptions. We expected this difference to be more pronounced in the case of oblique targets when compared with axial targets, since oblique prescriptions generally require combinations of rotations in addition to basic translations. The workload factors of interest for the tasks used in the experiment were mental demand, performance, physical demand, frustration and effort. For assessing participant workload in completing the tasks, a software implementation of the NASA Task Load Index (NASA-TLX) workload rating was used [7]. At the conclusion of the experiment, participants completed a survey indicating which interface they preferred in completing the tasks, and which they found more intuitive for cross-section control. The independent variables were the interface used and the type of target. We chose two target types: axial targets and oblique targets. For axial targets, the desired cross-sectional target plane was situated at a particular depth along the axis normal to the axial view in the volume image. These target planes could be acquired without any need to rotate the cutting plane. For oblique targets, the plane was situated at some angle oblique to the standard views, requiring both rotation and translation operations from the default axial orientation. Our trials involved two different axial targets and two different oblique targets. Each participant was asked to find one axial target and one oblique target using each interface, for a total of four targets. The participants were randomly assigned to one of two groups, and each group was assigned different axial and oblique targets following an independent-measures design in order to eliminate the effects of target-location learning by the participants. The order in which the targets were presented was based on a 4x4 Latin square design for each group, where one dimension represented the target and the other dimension
represented the four participants in the group, for a total of eight participants. Participants were undergraduate and graduate students from a variety of disciplines. The vast majority of participants reported their degree of computer usage as moderate to high, and their degree of experience with graphics/visualization software as minimal. For the test volume used in our experiments, we used a DICOM dataset of a specially constructed target set acquired using MRI. The target set included certain structural shapes and engravings intended to serve as clear target markers in the volume image acquired using MRI. Target markers were strategically placed at a variety of orientations. Figure 4 shows the targets that participants were asked to find during the experiment. Two of the four targets (a and b) were along planes that were roughly parallel to one of the axial views, and the other two targets (c and d) were along planes with orientations that were oblique with respect to the standard views.
Figure 4. Axial (a and b) and oblique (c and d) targets acquired by participants.
The experiment was conducted on one participant at a time. Before beginning the experiment, each participant underwent a brief orientation session during which the general nature of the study and the experimental tasks were explained. They were then instructed on the use of the visualization software using both the mouse-based interface and the panel interface. These instructions were also presented in written form, and the participant was given the opportunity to ask for clarification. Once the experiment began, a target was presented on screen and the participant used the assigned interface to acquire the target within the volume image. This was done under the supervision of a coordinator who was present to enforce consistency in environmental factors between trials. Each participant was timed in their completion of each target task, and there was a maximum timeout of five minutes for the acquisition of each target. After completing the second task for each assigned interface, the participant was asked to complete the NASA-TLX workload rating. Once all tasks were completed, the trial concluded by having the participants complete a questionnaire regarding their overall preferences between the interfaces.
4 RESULTS
Completion times for all task acquisitions using each interface were measured. Figure 5 shows the average completion times for axial targets, with error bars indicating one standard deviation. Average completion times for axial target acquisition were comparable between the two interfaces, with no statistically significant difference observed (p=0.195). Average completion times were higher using the panel interface for three of the four targets, the exception being the first axial target (target b in Figure 4). This observation can likely be explained in part by the close proximity of the first axial target to the default starting position of the cross-sectional plane.
Participant responses to the NASA-TLX rating for each interface indicated lower average values for every workload factor except physical demand when rating the panel interface. We identified statistically significant differences for mental demand (p=0.041), effort (p=0.018) and frustration (p=0.010), with lower average workload values for the panel interface, as well as a statistically significant difference for physical demand (p=0.003), with a lower average value for the mouse interface. Figure 7 shows the average workload values for each of these factors with error bars indicating one standard deviation.
Figure 5. Average completion times for axial targets using each interface
Figure 7. Average workload factor ratings for target acquisition using each interface
The average completion times for the oblique targets are summarized in Figure 6. The average completion times for each target acquisition were markedly lower using the panel interface. We were able to show a statistically significant difference in completion times for the oblique targets (p=0.0047) based on an alpha level of 0.05.
Figure 6. Average completion times for oblique targets using each interface
The higher average physical demand rating for the panel interface is statistically significant, but it is still low in absolute terms when compared to the ratings for the other factors. The lack of a significant difference in workload due to performance (p=0.439) may be the result of statistically similar completion times for the acquisition of three of the four targets. The results for performance ratings may also have been affected by the phrasing of the related survey questions: several participants expressed confusion regarding the questions relating to performance and temporal demand, and asked for clarification while completing the survey. Figure 8 shows the average overall workload rating for each interface. We observed statistically significant lower average overall workload ratings for the panel interface (p=0.021).
Figure 8. Average overall workload ratings for target acquisition using each interface
Responses to the overall experience questionnaire indicated that six of the eight participants preferred the use of the panel interface to the mouse for completion of the tasks, and all participants found the panel interface more intuitive for the control of the cross-sectional plane.
5 CONCLUSIONS
In this paper, we proposed the use of a new 3D interface model for the exploration of volumetric datasets. With the aim of designing an interface that takes advantage of proprioceptive abilities while remaining generically applicable, the model makes use of a hand-held panel that models the cross-sectional plane of interest, as opposed to props that model application-specific features. We described a basic implementation of this model, along with a user study that evaluated its usefulness against that of a mouse-based interface. From the results of the study, performance using each interface is comparable for acquiring targets in a standard axial view, but can be significantly improved for acquiring oblique targets by using the panel interface. In practice, features of interest can exist at arbitrary locations and orientations within the volume during exploratory activities, emphasizing the importance of the improved oblique exploration offered by the panel interface. Based upon the lower NASA-TLX overall workload ratings for the panel interface, as well as lower ratings across several factors including mental demand, effort, and frustration, we conclude that the use of the panel interface for completing the task involves significantly less workload than the mouse-based interface. While the physical demand of using the panel interface was found to be significantly higher than that of using the mouse-based interface, physical demand was found to be a very low contributor to the overall workload experienced with both interfaces. From the results of the overall experience questionnaire, we conclude that the panel interface is generally preferred to the mouse-based interface for feature localization tasks such as the target acquisitions in the user study, and that it provides more intuitive control of the cross-sectional plane.
Finally, given that the panel interface model can be applied to any type of volumetric data exploration, while involving implementation costs significantly less than comparable interfaces that make use of VR immersive environments, we believe that this model has the potential for widespread use in volumetric data exploration.
6. FUTURE WORK
We are currently examining alternative implementations for the interface model, including the use of optical tracking using infrared sensors and triangulation, in order to further reduce the associated costs. In addition, we are exploring the possible application of this interface model towards real-time scan plane control in MRI, where the absolute position of the panel with respect to the reference could be linked to the position of the imaging plane in the target.
ACKNOWLEDGMENT We would like to thank Ali Abharisa and Dingrong Yi from Sunnybrook Research Institute for providing the phantom volume image used for target acquisition in the user study.
REFERENCES
[1] A.F. Ayoub, D. Wray, K.F. Moos, P. Siebert, J. Jin, T.B. Niblett, C. Urquhart, and P. Mowforth. Three-dimensional modeling for modern diagnosis and planning in maxillofacial surgery. Int. J. Adult Orthod. Orthognath. Surg., 11:225-233, 1996.
[2] R. Balakrishnan. Why Aren't We Using 3D User Interfaces, and Will We Ever? IEEE 3DUI 2006, viii-viii, 2006.
[3] H.S. Byrd and P.C. Hobar. Rhinoplasty: a practical guide for surgical planning. Plast. Reconstr. Surg., 91:642-654, 1993.
[4] E.Y.S. Chao and F.H. Sim. Computer-aided preoperative planning in knee osteotomy. Iowa Orthop. J., 15:4-18, 1995.
[5] S. Haasfeld, J. Zoller, F.K. Albert, C.R. Wirtz, M. Knauth, and J. Muhling. Preoperative planning and intraoperative navigation in skull base surgery. J. Cranio-Maxillofac. Surg., 26:220-225, 1998.
[6] K. Hinckley, R. Pausch, J.H. Downs, D. Proffitt, and N.F. Kassell. The Props-Based Interface for Neurosurgical Visualization. Stud. Health Technol. Inform., 39:552-562, 1997.
[7] B. Reitinger, D. Schmalstieg, A. Bornik, and R. Beichel. Spatial Analysis Tools for Virtual Reality-based Surgical Planning. IEEE 3DUI 2006, 37-44, 2006.
New Trends in Cryptography by Quantum Concepts
S.G.K. Murthy¹, M.V. Ramana Murthy², P. Ram Kumar³
¹ Defence Research & Development Laboratory, Kanchanbagh PO, Hyderabad – 500 058, India
² Department of Mathematics and Computer Science, Osmania University, Hyderabad – 500 007, India
³ Department of Computer Science and Engineering, College of Engineering, Osmania University, Hyderabad – 500 007, India

Abstract
Communication of the key to the receiver of a cipher message is an important step in any cryptographic system. But the field of cryptanalysis is progressing in step with cryptography, and newer and faster computers with ever more memory are appearing every day, increasing the vulnerability of the key distribution process and, in turn, of the secret message. People are therefore exploring entirely different areas of technology in search of a secure key distribution system. Quantum cryptography is one such technology with promise for the future. This paper describes issues related to the integrity check of quantum cryptography based on qbits.

1. Introduction
Information security has had its own importance from the early days of communication. Cryptography plays a vital role in Internet-based applications where secrecy is a primary requirement. Because of its importance, advances in the techniques and technologies of cryptography [1] continue even today. The science of transforming information with the sole purpose of hiding [2] it from prying eyes has developed over the years. There are different policies and methodologies for encryption and decryption, all of which use one or more keys [3]. Cryptographic systems can be divided into two classes: symmetric and asymmetric [4]. A symmetric system uses a single key for both encryption and decryption, whereas an asymmetric system uses one key for encryption and a different key for decryption. Both classes have advantages and disadvantages. The algorithms for encryption and decryption in a symmetric key system are faster, and the expansion of the message on encryption
is minimal and can be controlled by the choice of policy [5]. The disadvantage is that it is more vulnerable because the single key must be transferred over a communication channel. In an asymmetric system, one of the two keys, the private key used for decryption of the cipher, can be kept secret [6] with the receiver, making the system very secure. The security [7] of an asymmetric system is based on the fact that the private key used for decryption need not be shared or communicated, and even if an intruder lays hands on the other key, the public key used for encryption, together with the encrypted message, he will not be able to break the cipher. To calculate the secret private key from public key information, one has to solve certain hard mathematical problems such as the factorization problem or the discrete logarithm problem. These problems are hard in the sense that the known algorithms for solving them take exponential time, and by a suitable choice of key length one can ensure security for any desired time span. But the disadvantages are that the computations for encryption and decryption are very slow, because very large integers are involved, and the expansion of the message on encryption is large, making it unsuitable for encrypting long messages. Algorithms based on symmetric and asymmetric key cryptographic methods are used to achieve the different security features [8] demanded by application areas. Confidentiality of the message primarily depends on the confidentiality of the key. Like the message, the key must also be communicated between the sender and the receiver, either along with the message or separately. For the key, the communication may also run in the reverse direction in some arrangements: the receiver can design the key and send it to the sender for encrypting the message. There may also be a third-party arrangement for secure key generation and
its transfer to users. Barring the case of the private key of asymmetric cryptography, all other keys are exposed to the hazards of communication channels at some stage or other, except in cases where it is feasible to transfer keys from person to person.

2. Key Distribution Problem in Cryptographic Systems
The problem of key transfer over a communication channel, sometimes referred to as the key distribution problem, needs separate discussion. To achieve confidentiality of the key during communication, the key itself should be treated like a message and encrypted using another set of keys. That, of course, leads to the problem of communicating the second set of keys, and so on. This does not raise hope of a reasonable solution to the key distribution problem by existing methods of message encryption and decryption. The key distribution problem has a different emphasis, as keys are essentially very small compared to normal messages; because of this, large overheads in computation time, memory space and procedural complexity are manageable. Presently, widely used cryptographic systems are hybrid in nature: partly symmetric and partly asymmetric. The message part is encrypted with a single key, and this key is communicated by encryption and decryption with an asymmetric system. Of the two keys in the asymmetric system, one is the public key, used for encryption of the original message key, and the other is the private key, used for decryption. The private key is never communicated, so this method of key distribution is quite secure. In any operational system, the key length is chosen so that a sufficient time margin is available before the cryptographic system can be broken by solving the factorization or discrete logarithm problem. But the situation is likely to change. The advent of the concept of quantum computers, and progress towards the realization of such a computer, promises the availability of enormous computing power. Simultaneously, progress is taking place in cryptanalysis, the science of cipher breaking. Peter Shor of Bell Laboratories has patented an algorithm for solving the factorization and discrete logarithm problems, hitherto categorized as hard, in polynomial time using quantum computers. As a result, present asymmetric cryptographic systems will not be capable of providing adequate security in the near future. People are exploring totally different areas of technology in search of a secure key distribution system. However, the same quantum theory that promises speed in quantum computers provides an alternative approach to absolute security. Quantum bits, or qbits (Qubits, as some authors prefer), are the basis of one such technology with promise for the future.

3. Quantum Physics
Light, according to quantum theory, is a propagating wave of discrete particles called photons. A photon is a quantum of the electromagnetic field characterized by energy, momentum and angular momentum (spin), but without mass. The direction of spin of a photon is the angle of polarization of the photon. There is another description of polarization: the electrical part of the electromagnetic field of the photon vibrates in a plane, and the angle of this plane is called the polarization of the photon. In normal light, even though the photons travel in one direction, the angle of vibration differs from photon to photon. A normal light source therefore generates un-polarized light, as it consists of photons each of which has a different angle of polarization. If a beam of normal light is passed through a polarization filter, by the laws of quantum physics the filter stops fifty percent of the photons and the remaining fifty percent pass through it. The polarization of all the photons that pass through the filter becomes aligned to the angle of the filter, regardless of the initial angle of polarization of each passed photon; this process is called polarization of light. A photon's polarization can be determined with a photon detector, which tells whether the photon passed through a filter of known angle (more precisely, known axis), a so-called Polaroid filter. Also feasible, and available with some limitations, is a photon gun, from which one can obtain a photon on demand. With the combined availability of a photon gun, a polarization filter and a photon detector, one can assure the generation of a photon with known polarization. For the use of photons in cryptography one should have, in addition to the capability of controlled generation of polarized photons, a technology for transmission of photons over long distances through a medium or vacuum. Optical fibre technology may be used for transport of photons over a distance; according to one report, a maximum distance of 67 kilometres has been achieved in fibre optic cable [9] without loss of a photon. Similarly, at the receiving end a polarization filter and a pair of photon detectors are
sufficient to tell whether a photon has arrived and whether it has passed the filter. A photon with any predetermined polarization angle can be used to transmit information. For example, one may use a photon with vertical polarization to represent bit 0; the corresponding orthogonal, horizontal polarization of another photon may then represent bit 1. It should be mentioned at this stage that photons, or rather quantum bits, can exist in a superposition of two states, giving a photon the capability of storing 0 and 1 at the same time. This property of quantum mechanics is exploited to generate the computing power of a quantum computer, but it plays no role in quantum cryptography: in quantum cryptography a photon in a single state of polarization is used to transmit a bit of information. In this paper, for the sake of convenience, we consider four possible polarizations of the photon in the so-called pure (un-superposed) state: vertical, horizontal, right diagonal and left diagonal. Two photons with vertical ('V') or horizontal ('H') polarization introduce the concept of orthogonal quantum states. A 'V' photon never passes through the polariser that is meant for 'H' polarization; similarly, 'H' photons never pass through the polariser meant for 'V' polarization. So 'V' and 'H' are two orthogonal quantum states of a photon, and any single-photon polarization can be viewed as a two-state quantum system. Further, 'V' and 'H' form a basis for the space of polarizations, denoted the VH basis. A photon polarized in the vertical direction has probability 0 of passing through a horizontal polarizer, probability 1 of passing through a vertical polarizer, and probability 1/2 of passing through a polarizer kept at 45 or 135 degrees, i.e. in the left or right diagonal directions. Because of this, the orthogonal polarizers in the left and right diagonal directions form another basis, conjugate to the VH basis. The two bases are conjugate because a photon prepared in one basis is completely randomized when measured in the other. The second combination may be denoted the LR basis.
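As an illustrative aside (not part of the original paper), the pass probabilities quoted above are instances of the standard single-photon rule P(pass) = cos² θ, where θ is the angle between the photon's polarization and the polarizer axis; a minimal Python sketch:

import math

def pass_probability(photon_angle_deg, polarizer_angle_deg):
    # Standard quantum rule for a single photon meeting a polarizing filter.
    theta = math.radians(photon_angle_deg - polarizer_angle_deg)
    return math.cos(theta) ** 2

print(pass_probability(90, 90))   # vertical photon, vertical polarizer   -> 1.0
print(pass_probability(90, 0))    # vertical photon, horizontal polarizer -> ~0.0
print(pass_probability(90, 45))   # vertical photon, diagonal polarizer   -> 0.5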
4. A Method of Key Distribution Using Quantum Concepts
If a photon is polarized in a given direction by Alice, and Bob measures it with a Polaroid (which is nothing but a polarizing filter) oriented in the same direction, Bob detects the photon with certainty. If Bob measures in the wrong direction, he gets a random result governed by the probabilities above. How two persons can agree on a secret key using quantum mechanics is described in the following.
If Alice wants to send any message to Bob, she must have two communication channels, called the quantum channel and the insecure channel, as shown in Fig. 1. The quantum channel is used for communicating quantum bits, and the insecure channel is used for general communication as well as for symmetrically encrypted communication.
Fig. 1. Alice and Bob connected by a quantum channel and an insecure channel
Step 1 – Alice sends a series of photons to Bob in such a way that each photon is randomly polarized in one of the four given polarizations, i.e. rectilinear (+: horizontal, vertical) or diagonal (X: left diagonal, right diagonal). Suppose Alice sends photons to Bob in the following polarizations.
Fig. 2. Alice's and Bob's polarisers connected by the quantum channel
Step 2 – Bob has a polarizer to detect the photons. He can adjust his polarizer to any one direction, so that he can measure either rectilinear or diagonal polarizations, but not both: the inherent quantum uncertainty of the photons does not allow him to measure both.
Suppose he sets his polarizer in the following directions corresponding to each photon.
X + X X + + X + X + + X
If a polarizer setting matches the corresponding photon's polarization, Bob gets the photon through the polarizer. Suppose Bob gets the following result.
Step 3 – Now Bob informs Alice, over the insecure channel, about the settings he has used.
Step 4 – Alice tells Bob which settings are correct. Alice and Bob can exchange the settings information in terms of the bases + and X, to restrict the information put on the open communication channel. Within the same base there are two orientations of the polarizer, and the probability of a mismatch between the actual polarizer angles at the two ends is fifty percent; but the correct orientation of Bob's polarizer can be inferred from the outcome of his detection.
Step 5 – Alice and Bob keep the polarizations that were correctly measured.
To translate the photon polarization measurements into bits, a pre-arranged code is used; for example, horizontal and left-diagonal polarizations may represent 1, and vertical and right-diagonal polarizations may represent 0. By translating the commonly agreed polarizations above, the following bit string is generated: 001010. On average, the probability of guessing the correct polarization is 0.5, so to generate n bits Alice requires about 2n photons for transmission.
Step 6 – Alice and Bob check the integrity of the resulting bit string by testing.
Step 7 – If the integrity check is successful, the same bit string is used as a one-time pad; otherwise the above process can be repeated.
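To make the key-agreement steps above concrete, the following is a minimal simulation sketch (our own illustration, not code from the paper; the integrity check of Step 6 is replaced here by a direct comparison of the two keys):

import secrets

# Pre-arranged code from the text: horizontal (H) and left diagonal (L) -> 1,
# vertical (V) and right diagonal (R) -> 0.
ENCODE = {("+", 1): "H", ("+", 0): "V", ("X", 1): "L", ("X", 0): "R"}
DECODE = {"H": 1, "V": 0, "L": 1, "R": 0}

def send_photons(n):
    # Step 1: Alice picks a random bit and a random basis (+ or X) per photon.
    bits = [secrets.randbelow(2) for _ in range(n)]
    bases = [secrets.choice("+X") for _ in range(n)]
    photons = [ENCODE[(b, bit)] for b, bit in zip(bases, bits)]
    return bits, bases, photons

def measure(photon, basis):
    # Step 2: a matching basis reads the bit with certainty; a mismatched
    # (conjugate) basis gives a completely random result.
    sent_basis = "+" if photon in "HV" else "X"
    return DECODE[photon] if basis == sent_basis else secrets.randbelow(2)

def bb84_round(n):
    a_bits, a_bases, photons = send_photons(n)                 # quantum channel
    b_bases = [secrets.choice("+X") for _ in range(n)]
    b_bits = [measure(p, b) for p, b in zip(photons, b_bases)]
    keep = [i for i in range(n) if a_bases[i] == b_bases[i]]   # Steps 3-5, insecure channel
    return [a_bits[i] for i in keep], [b_bits[i] for i in keep]

message_bits = [1, 0, 1, 1, 0, 0, 1, 0]
alice_key, bob_key = [], []
while len(alice_key) < len(message_bits):   # gather more bits if short, as noted in the text
    a, b = bb84_round(16)
    alice_key += a
    bob_key += b

assert alice_key == bob_key                 # Step 6 (here: direct comparison)
pad = alice_key[:len(message_bits)]         # Step 7: sifted bits used as a one-time pad
cipher = [m ^ k for m, k in zip(message_bits, pad)]
plain = [c ^ k for c, k in zip(cipher, pad)]   # Vernam decryption with the same pad
assert plain == message_bits

About half of the transmitted photons survive the sifting, matching the observation above that roughly 2n photons are needed for an n-bit key.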
At the end of the process the sender and receiver share a sequence of 0s and 1s known to both of them. This sequence may in turn be used to generate a sequence of numbers. This is known as quantum key distribution, the essence of quantum cryptography. Any intruder hijacking a few bits during communication can be detected easily. The loss of bits is of no consequence, as keys are formed from the bits available and known at both ends; if fewer bits are available than required, more bits can be communicated to make up the number. The quantum phenomena make the sequence of bits finally available at both ends random in nature. This randomness may be further enhanced, and indeed guaranteed, by the choice of the bit sequence at Alice's end and the selection of the starting point and size of the key pad. Following a 'use and discard' policy for the key pad and using the operations of the Vernam cipher will, as far as one can see, provide a perfectly secure key system. Using the above scheme, Bennett and Brassard built a working model of quantum key distribution. By exchanging a series of qbits through a fibre optic link, quantum cryptography [10] provides absolute security.

5. Conclusion
The problem of digital signature has some commonality in technical details with the key distribution problem. In a typical digital signature problem a document, mostly financial or legal in nature, is communicated where secrecy of the content is not important but authenticity is: it must be ensured that the authorized person has sent the document and that it has reached the receiver without any tampering. In such a situation the document is first hashed, or converted into an error-detecting code, and this code or hash number is then treated like the key in the key distribution problem. There is a small change in terminology: the key used for encryption of the hash number is the private or secret key in this case, and the key used for decryption is the public key. Quantum mechanics, which promises an unconditionally secure mode of key distribution, has yet to offer an equally secure method for authentication or digital signature. Progress has been reported in the realization of photon guns and photon detectors. Different versions of polarizers based on transmission, reflection, refraction and scattering of photons are already available commercially. Experiments for
the communication of photons in fibre optic cable over longer distances are under way in some centres. Recent experiments elsewhere have shown that polarized photons can be transmitted over long distances without the use of fibre optics. With all these developments, one can see that quantum cryptography is going to play a dominant role in secrecy functions in the near future. There are indications that some advanced countries have already started converting their secrets into quantum versions.

BIBLIOGRAPHY
1. Singh, S., The Code Book: The Evolution of Secrecy from Mary, Queen of Scots to Quantum Cryptography, Doubleday, 1999.
2. Stephenson, P., "Securing Remote Access", Network Computing, pp. 130-134, Feb. 1995.
3. Steinke, S., "Authentication and Cryptography", Network Magazine, pp. 51-55, Jan. 1998.
4. Landau, S., "Designing Cryptography for the New Century", Communications of the ACM, pp. 115-120, May 2000.
5. Dyson, P., The Network Press Dictionary of Networking, 2nd ed., San Francisco: Sybex.
6. Morse, S., "Network Security: How to Stay Ahead of Network Explosion", Network Computing, pp. 54-63, Feb. 1994.
7. Simonds, F., Network Security, New York: McGraw-Hill, 1996.
8. Schneier, B., Applied Cryptography, John Wiley, New York.
9. Zeilinger, A., "Fundamentals of Quantum Information", Physics World, March 1998.
10. Stallings, W., Cryptography and Network Security: Principles and Practice, 2nd ed. (Indian ed.), Prentice Hall.
On Use of Operation Semantics for Parallel iSCSI Protocol
Ranjana Singh
IT Department, Mumbai University
[email protected]
Rekha Singhal
C-DAC, Mumbai
[email protected]
Abstract: iSCSI [1, 2] is an emerging communication protocol enabling block data transport over TCP/IP networks. The iSCSI protocol is perceived as a low-cost alternative to the FC protocol for networked storage [8, 9, 10, 11]. This paper uses a storage architecture allowing parallel processing of iSCSI commands and presents two novel techniques for improving the performance of the iSCSI protocol: the first is an elimination technique that reduces the latency caused by redundant overwrites, and the second reduces the latency caused by multiple reads to the same storage sector. The performance analysis shows that the use of operation semantics has the potential to improve performance compared to existing iSCSI storage systems.
1. Introduction
An IP Storage Area Network (SAN) uses an IP network storage structure, which enables the separation of data processing and data storage. iSCSI is a communication standard defined by the IETF [1], which encapsulates SCSI commands within a TCP/IP stack. It has wide potential for acceptance because it leverages the benefits of Ethernet; however, TCP/IP introduces its own overheads, which degrade performance. Optimizing the performance of iSCSI storage systems has long been a design goal for storage designers. Storage devices such as disks or tapes involve mechanical operations for each data access; hence storage operations have been the major bottleneck in an IT system compared to CPU, RAM, and network, all of which have improved by several orders of magnitude over the past decade. Technologies to improve storage performance can generally be classified into two categories: caching and parallel processing. The research in [4, 5, 6, 7] discusses the use of reliable and effective caching technology to improve the performance of iSCSI systems. Examples of parallel processing include RAID (redundant array of independent disks) and various parallel interconnect technologies such as InfiniBand, Ultra Wide SCSI, Gigabit
FC, etc. These technologies aim at increasing the storage throughput by means of parallel data accesses (RAID), parallel connectivity (InfiniBand and Ultra Wide SCSI), and high data rate (FC). The advent of 10 Gigabit network and inexpensive network components make it economically feasible to have multiple ports at server and multiple ports at a storage system. iSCSI protocol supports multiple logical connections between the storage system and the client to enhance the performance. Therefore, one can have combination of multiple physical links and multiple logical links on each one of those to boost up the performance. The iSCSI protocol makes use of these multiple logical parallel connections to improve performance but collects and orders the commands in a buffer queue in a session before execution. The protocol makes sure that all commands are executed in the order at the storage system as they were originally generated from the initiator side. This essentially results in serial execution of the commands coming from parallel ports, thereby limiting performance gains achieved through parallelism. Executing commands out of order but committing them in the order of their generation at initiator can improve network storage performance. Yang [12] has proposed a novel methodology for processing parallel packets to improve the performance. However, we have gone one step ahead by exploiting the properties of operations to further improve the performance. This paper proposes two innovative ways of harnessing the properties of reads and writes to improve the performance of parallel command processing algorithm. Commands are processed as soon as they arrive at the target. Data dependencies among the operations are resolved without delays. The results of these parallel executions will commit in the order of their initiation. Performance of the system is further enhanced by eliminating the redundant overwrites to the storage, thereby reducing latency and also checking the latency due to multiple reads to the same storage sector.
2. System Architecture and the Parallel Algorithm
Our system consists of a number of application servers sharing target storage. Multiple logical and physical connections exist between the server and the storage, as shown in Fig. 1. The storage system should therefore be able to handle multiple logical as well as physical connections from multiple servers.
Fig. 1: An iSCSI session consisting of multiple TCP connections over several physical interconnects
Our iSCSI target follows the architecture of the parallel command processor discussed in [12]. It consists of a command processor that receives commands in parallel from the parallel network ports and processes them according to the algorithm discussed in Figure 2. It maintains a command queue that buffers received storage commands; a reservation station that records commands being executed; and a Read commit cache and a Write commit cache that store the data for read and write operations respectively. The basic strategy is to process each command as far as possible without violating its dependencies with respect to other out-of-order commands. Such an execution is recorded in the reservation station together with the dependencies of the command on other commands in the reservation station, as well as on commands that have not yet arrived at the storage. The actual commits of such executions are done in the order of the command sequence. To describe how the parallel command processing algorithm works, let us first consider the following example command sequence.
EXAMPLE: Suppose the correct command sequence generated by the initiator is:
1. read block B: RB1
2. read block A: RA2
3. write block B: WB3
4. write block C: WC4
5. read block A: RA5
6. read block B: RB6
7. read block A: RA7
8. write block C: WC8
9. write block A: WA9
10. read block C: RC10
The sequence of these commands is shown below. The iSCSI protocol defines command sequence numbers for all storage commands issued from an initiator to target storage; the number in each command mnemonic indicates this command sequence number.
1. read block B : RB1 2. read block A: RA2 3. write block B : WB3 4. write block C : WC4 5. read block A: RA5 6. read block B: RB6 7. read block A: RA7 8. write block C: WC8 9. write block A : WA9 10. read block C: RC1 The sequence of these commands is shown below: The iSCSI protocol defines command sequence numbers for all storage commands issued from an initiator to target storage. The number above each command indicates this command sequence number. RB1
RA2
WB3
WC 4
RA5
RB6
RA7
WC 8
WA9
During the network transmission of these commands, the arrival sequence to the storage has been changed as follows:
1. Write block A: WA9
2. Write block B: WB3
3. Read block C: RC10
4. Read block A: RA5
5. Read block A: RA2
6. Write block C: WC4
7. Read block A: RA7
8. Read block B: RB1
9. Read block B: RB6
10. Write block C: WC8
The arrived storage commands will be buffered in a command queue as shown below after all commands have arrived at the storage.
WA9  WB3  RC10  RA5  RA2  WC4  RA7  RB1  RB6  WC8
Execution of commands from different servers starts as soon as they arrive at the command queue. In the case of the given example, execution starts as soon as command WA9 arrives, even though its sequence number is not what is expected by the storage, which is expecting RB1. Here, we effectively allow parallel and out-of-order execution of the commands in the command queue. However, there are data dependencies among these commands. For example, there is a read-after-write (RAW) dependency between RB6 and WB3, and there is a write-after-write (WAW) data dependency between WC8 and WC4. A RAR dependency can be seen between the commands RA7 and RA5. There is a write-after-read (WAR) data dependency between RB1 and WB3. To guarantee data correctness and allow parallel out-of-order execution, we start executing the arrived commands by putting them in a data structure called the Reservation Station, as shown in Table 1. As shown in this reservation station, command #6 at reservation entry RS[8] has a RAW data dependency on command #3 at RS[1], and command #3 has a
WAR dependency on command #1 buffered in reservation station entry 7, RS[7], and so forth. As the commands are arriving out of order, some of the data dependencies shown in the table may not be available while commands are arriving at the storage. However, the command dependencies are readily available whenever a command arrives after comparing with the expected command sequence number. In this case, all the 7 commands in the top 7 rows of the table arrived earlier than command #1 and therefore such dependencies are marked in the CMD dependencies column of the reservation station. When a command is in the reservation station, it is eligible for execution. The results of such execution will be buffered in a data cache called commit cache. The commit cache has a separate read cache and a write cache. Each entry in the cache has an additional field indicating the condition for commit.
Table 1: Reservation Station
RS Entry # | CMD Seq # | Storage CMD | LBA | CMD dependency | Data Pointer to Cache | Data dependency
0 | 9  | W | A | 1,2,3,4,5,6,7,8 | CWi | WAR, RS[3]
1 | 3  | W | B | 1,2             | CWj | RAW, RS[8]; WAR, RS[7]
2 | 10 | R | C | 1,2,4,5,6,7,8   | CRi | RAW, RS[5]
3 | 5  | R | A | 1,2,4           | CRj | WAR, RS[0]; RAR, RS[4]; RAR, RS[6]
4 | 2  | R | A | 1               | CRj | RAR, RS[3]
5 | 4  | W | C | 1               | CWk | RAW, RS[2]; WAW, RS[9]
6 | 7  | R | A | 1,6             | CRj | RAR, RS[3]
7 | 1  | R | B | ---             | CRk | WAR, RS[1]
8 | 6  | R | B | ---             | CRk | RAW, RS[1]
9 | 8  | W | C | ---             | CWl | WAW, RS[5]
Table 2: Commit Cache
Read Cache
Pointer | Data | Dependency
CRi | C | 1,2,4,5,6,7,8
CRj | A | 1,2,4
CRk | B | ---
Write Cache
Pointer | Data | Dependency
CWi | A | 1,2,3,4,5,6,7,8
CWj | B | 1,2
CWk | C | 1
CWl | C | ---
When command #8 finally arrives at the storage and is placed in RS [9], we will be able to determine all data dependency conditions and fill out the last column of the reservation table. After the completion of successful execution of command #8, the command dependency column of the reservation station will be cleared and the algorithm will commit all the commands in the reservation station by transmitting data in the commit cache to the initiator for read operations and committing writes to the storage. Whether the committed data come from the read cache or write cache will depend on data dependency conditions listed in the last column of the reservation station. Cache table 2 shows the cache image after all commands in the sequence have arrived at the storage.
Data are committed as follows:
1. Block A in CWi will be written to storage for command WA9.
2. Block B in CWj will be written to storage for command WB3.
3. Block C in CWl will be sent for command RC10.
4. Block A in CRj will be sent for command RA5.
5. Block A in CRj will be sent for command RA2.
6. Block A in CRj will be sent for command RA7.
7. Block B in CRk will be sent for command RB1.
8. Block B in CWj will be sent for command RB6.
9. Block C in CWl will be written to storage for command WC8.
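As a rough illustration of the dependency bookkeeping used above (our own sketch, not the paper's implementation), the dependency between two commands on the same block can be classified from their operation types and sequence numbers:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Cmd:
    seq: int   # iSCSI command sequence number
    op: str    # "R" or "W"
    lba: str   # logical block address, e.g. "A"

def dependency(new: Cmd, old: Cmd) -> Optional[str]:
    # Classify the data dependency between two commands on the same block,
    # ordered by their initiator sequence numbers.
    if new.lba != old.lba:
        return None
    first, second = (old, new) if old.seq < new.seq else (new, old)
    return {"RR": "RAR", "RW": "WAR", "WR": "RAW", "WW": "WAW"}[first.op + second.op]

# The dependencies named in the example:
print(dependency(Cmd(6, "R", "B"), Cmd(3, "W", "B")))   # RAW: RB6 after WB3
print(dependency(Cmd(8, "W", "C"), Cmd(4, "W", "C")))   # WAW: WC8 after WC4
print(dependency(Cmd(7, "R", "A"), Cmd(5, "R", "A")))   # RAR: RA7 after RA5
print(dependency(Cmd(3, "W", "B"), Cmd(1, "R", "B")))   # WAR: WB3 after RB1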
Table 3: Reservation Station (after applying operation semantics)
RS Entry # | CMD Seq # | Storage CMD | LBA | CMD Dependency | Data Pointer to Cache | Data Dependency
0 | 9  | W | A | 1,2,3,4,5,6,7,8 | CWi | WAR, RS[3]
1 | 3  | W | B | 1,2             | CWj | RAW, RS[8]; WAR, RS[7]
2 | 10 | R | C | 1,2,4,5,6,7,8   | CRi | RAW, RS[9]
3 | 5  | R | A | 1,2,4           | CRj | WAR, RS[0]; RAR, RS[4]; RAR, RS[6]
4 | 2  | R | A | 1               | CRj | RAR, RS[3]
5 | -- | - | - | ---             | --- | --- (row eliminated)
6 | 7  | R | A | 1,6             | CRj | RAR, RS[3]
7 | 1  | R | B | ---             | CRk | WAR, RS[1]
8 | 6  | R | B | ---             | CRk | RAW, RS[1]
9 | 8  | W | C | ---             | CWl | ---
Table 4: Commit Cache
Read Cache
Pointer | Data | Dependency
CRi | C | 1,2,4,5,6,7,8
CRj | A | 1,2,4
CRk | B | ---
Write Cache
Pointer | Data | Dependency
CWi | A | 1,2,3,4,5,6,7,8
CWj | B | 1,2
CWl | C | ---
As shown in the example, data dependencies such as WAR are eliminated by renaming the destination address (separate cache entries are used for them). RAW dependencies are also resolved, and data are forwarded between the two caches. A WAW dependency is eliminated by discarding the Write with the lower command number, since the Write with the higher command number overwrites it; the remaining data dependencies are then changed accordingly. In the given example, command WC4 at reservation station entry RS[5] has a WAW dependency on command WC8 at reservation station entry RS[9], and since there is no Read in between them, the entries in the row corresponding to the lower command, i.e. command #4 at RS[5], have been eliminated, as shown in Table 3. The data dependency of command #10 at RS[2] is changed to RAW, RS[9] instead of RAW, RS[5] to maintain data consistency. Thus, the latency involved in overwriting the storage is eliminated, thereby potentially improving performance. A RAR data dependency is eliminated through a lookup of the read cache. As shown in Table 1, there are three Read operations performed on the same block A (reservation station entries RS[3], RS[4] and RS[6]). Data is
read from storage for command #5 at RS[3] and stored in the Read cache at location CRj. All subsequent Reads of block A are assigned the same cache location, and their data are read from the Read commit cache rather than from main storage. This reduces the latency of multiple Reads, since data are fetched directly from the Read cache rather than from main storage, thus improving performance. As the access latency of storage devices is quite large, this out-of-order execution of commands saves, in the given example, a considerable amount of time. By the time the data for command #8 arrives at the storage, the data for the Read and Write operations of the 9 commands that arrived ahead of command #8 are already in the commit cache or on the way to it. Latency is also reduced by the elimination of redundant overwrites to storage, further improving performance. Since research [12] has shown that around 30% of packets arrive at an iSCSI target out of order with 3 network connections, the algorithm for parallel command processing using operation semantics has potential for improving storage network performance. The general algorithm for parallel command processing is given below; the operation semantics applied to the algorithm are highlighted in Figure 2.
Figure 2. The general algorithm for parallel command processing using operation semantics
A storage command arrives at the storage, called Current_CMD;
/** Reservation/Execution stage **/
Compare Sequence_No[Current_CMD] with Sequence_No_Expected
If Not Match Then
{
  Find a free entry in RS, called RS[free];
  RS[free].S#   <- Sequence_No[Current_CMD];
  RS[free].CMD  <- CMD[Current_CMD];
  RS[free].LBA  <- LBA[Current_CMD];
  RS[free].size <- Size[Current_CMD];
  RS[free].CMD_dependency <- Sequence_No_Expected, Sequence_No_Expected+1, ..., Sequence_No[Current_CMD]-1;
  Examine all entries in RS to find any data dependency;
  If a data dependency is found with RS[i] then
  {
    RS[free].Data_Dep <- (the data dependency, RS[i]);
    RS[i].Data_Dep    <- (reverse of the data dependency, RS[free]);
  };
  If CMD[Current_CMD] == Read Then
  {
    Look up RS and the commit cache for a hit;
    If hit in the read cache, use that entry, called CR[i]
    Else
    {
      Allocate an entry in the read cache called CR[i];
      RS[free].DataPtr <- CR[i];
      CR[i].Dependency <- RS[free].CMD_dependency;
      CR[i].Data <- read data at LBA[Current_CMD] from storage;
    }
  }
  Else
  {
    Allocate an entry in the write cache called CW[i];
    RS[free].DataPtr <- CW[i];
    CW[i].Dependency <- RS[free].CMD_dependency;
    CW[i] <- Data[Current_CMD];
  };
}
Else /** the command sequence number matches the expected sequence number **/
{
  For all i, look up RS[i].CMD_dependency to find a match for Sequence_No[Current_CMD];
  If there is no match then issue Current_CMD for normal execution
  Else
  {
    Find a free entry in RS, called RS[free];
    RS[free].S#   <- Sequence_No[Current_CMD];
    RS[free].CMD  <- CMD[Current_CMD];
    RS[free].LBA  <- LBA[Current_CMD];
    RS[free].size <- Size[Current_CMD];
    Examine all entries in RS to find any data dependency;
    If a data dependency is found with RS[i] then
    {
      RS[free].Data_Dep <- (the data dependency, RS[i]);
      RS[i].Data_Dep    <- (reverse of the data dependency, RS[free]);
    };
    If CMD[Current_CMD] == Read then
    {
      Look up RS and the commit cache for a hit;
      If hit in the read cache, use that entry, called CR[i]
      Else
      {
        Allocate an entry in the read cache called CR[i];
        CR[i].Data <- read data at LBA[Current_CMD] from storage or CW;
      }
      RS[free].DataPtr <- CR[i];
      Clear Sequence_No[Current_CMD] in all RS[i], CR[i], CW[i];
    }
    Else
    {
      Allocate an entry in the write cache called CW[i];
      RS[free].DataPtr <- CW[i];
      CW[i] <- Data[Current_CMD];
      Clear Sequence_No[Current_CMD] in all RS[i], CR[i], CW[i];
    }
  };
}
/** Elimination of Writes **/
Examine all entries in the WRITE CACHE;
Loop till no double occurrences are encountered
{
  Find double occurrences of the same sector;
  If found then
  {
    Look for CR[i] and CR[j] in RS.DataPtr for all i, if there are hits;
    If CR[i].Data_Dep has WAW AND CR[j].Data_Dep has WAW then
    {
      If CR[i].Data_Dep OR CR[j].Data_Dep has RAW, RS[k] then
      {
        Look up RS[k].S#;
        If RS[k].S# > S# of both WRITEs then
        {
          Update the data dependency column AND eliminate the row of the WRITE with the lower S# value;
        }
      }
    }
    Else
    {
      Update the data dependency column AND eliminate the row of the WRITE with the lower S# value;
    }
  }
}
/** Commit Stage **/
Loop for i do
{
  If RS[i].CMD_dependency = null then
  {
    If RS[i].Data_Dep = null then
      Commit data pointed to by RS[i].DataPtr
    Else
      Commit data based on dependency and CMD sequence;
  }
}
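As a companion to the pseudocode in Figure 2, the following simplified sketch (our own illustration, not the authors' implementation) shows the two operation-semantics optimizations in isolation: dropping a Write that is overwritten by a later Write with no intervening Read of the same block, and serving repeated Reads of a block from the read commit cache:

def eliminate_overwritten_writes(queue):
    # queue: list of (seq, op, lba) in initiator (commit) order. A Write is
    # dropped if the next access to its block, later in the sequence, is
    # another Write (WAW with no Read in between), as for WC4 and WC8.
    kept = []
    for i, (seq, op, lba) in enumerate(queue):
        if op == "W":
            later_same_block = [o for _, o, l in queue[i + 1:] if l == lba]
            if later_same_block and later_same_block[0] == "W":
                continue                       # overwritten: skip the earlier Write
        kept.append((seq, op, lba))
    return kept

def serve_reads(queue, storage):
    # RAR handling: the first Read of a block fetches from storage; later
    # Reads of the same block are satisfied from the read commit cache.
    read_cache, served = {}, []
    for seq, op, lba in queue:
        if op == "R":
            if lba not in read_cache:
                read_cache[lba] = storage[lba]   # single storage access per block
            served.append((seq, read_cache[lba]))
    return served

cmds = [(1, "R", "B"), (2, "R", "A"), (3, "W", "B"), (4, "W", "C"), (5, "R", "A"),
        (6, "R", "B"), (7, "R", "A"), (8, "W", "C"), (9, "W", "A"), (10, "R", "C")]
print(eliminate_overwritten_writes(cmds))        # the row for WC4 (seq 4) is eliminated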
3. Performance Analysis
The proposed protocol has the potential to improve performance for two types of dependency: WAW and RAR. We observed that for WAW the later Write overwrites the former, so there is no need to execute the former one. Similarly, for RAR, once the first Read has read from the disk, later Reads can read directly from its cached result if there is no Write between the two Reads. We compare the performance of the proposed protocol with the one proposed in [12] and with a baseline storage system that orders arrived commands before execution. Consider the following assumptions, as in [12]:
• Let Ω be the number of unexpected packets with greater command sequence numbers that arrive before the out-of-order packet arrives.
• Let δ be the time delay from the moment the first unexpected packet arrives until the out-of-order packet arrives at the storage. This is the waiting time of an unexpected packet before it is processed.
• Using a uniform distribution to simplify the analysis, the average waiting time experienced by an unexpected packet is δ(Ω + 1)/(2Ω).
The storage service time for the baseline storage system is:
Sb = S + δ(Ω + 1)/(2Ω)
The parallel algorithm proposed in [12] has service time:
Sp = S + (δ − S)²/(2δ) if S < δ, else Sp = S,
where the post-execution waiting time is (δ − S)/2 and Prob(packet needs to wait) = (δ − S)/δ.
In the proposed algorithm, the probability that a packet has to wait depends on its dependencies. A packet waits only if it is a Write and there is no later Write to the same block, or if it is a Read and there is no earlier Read of the same block. Assuming there is no other dependency between the two Writes or the two Reads respectively,
Prob(packet needs to wait) = (δ − S)/δ × [Prob(it is a Write and no Write follows it on the same block) + Prob(it is a Read and no Read precedes it on the same block)].
Assuming there are B blocks in the storage system, the number of unexpected packets that are Writes is Ω/2 (a packet is equally likely to be a Write or a Read under the uniform distribution), and the number of unexpected packets that write to a given block is Ω/(2B); similarly, the number of unexpected packets that read a given block is Ω/(2B). Hence
Prob(the packet is a Write and no Write follows it on the same block, i.e. it is the last Write on that block) = 2B/Ω,
and similarly Prob(the packet is a Read and no Read precedes it on the same block, i.e. it is the first Read on that block) = 2B/Ω. Therefore
Prob(packet needs to wait) = (δ − S)/δ × 4B/Ω;
a smaller B and a larger Ω lower the probability that a packet has to wait, and hence improve the service time.
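A quick numeric check of these expressions (our own illustration; the values of S, δ, Ω and B are hypothetical):

def baseline_service_time(S, delta, omega):
    # Sb = S + delta*(omega + 1)/(2*omega)
    return S + delta * (omega + 1) / (2 * omega)

def parallel_service_time(S, delta):
    # Sp from [12]: S + (delta - S)^2/(2*delta) when S < delta, else S
    return S + (delta - S) ** 2 / (2 * delta) if S < delta else S

def wait_probability_semantics(S, delta, omega, B):
    # Proposed protocol: (delta - S)/delta * 4B/omega
    return max(0.0, (delta - S) / delta) * (4 * B / omega)

S, delta, omega, B = 2.0, 10.0, 32, 4
print(baseline_service_time(S, delta, omega))           # about 7.16
print(parallel_service_time(S, delta))                  # 5.2
print(wait_probability_semantics(S, delta, omega, B))   # 0.4; smaller B or larger omega lowers it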
In the proposed algorithm, the WAW dependency is eliminated by deleting the entire row from the reservation station, as opposed to redirecting the data pointer as in [12]. Thus the latency involved in overwriting the storage is eliminated, thereby potentially improving performance. In the case of multiple Reads, the reads are performed on the cache rather than the storage: data is read from the storage only once, and otherwise it is read from the cache, depending on the data dependency. Thus the latency of fetching data from the storage is reduced, again improving performance.

4. Conclusions
We have exploited the overwriting property of Writes and the re-reading of the same data by consecutive Reads to improve the performance of parallel processing of commands for networked data. The prime objective of this paper is to optimize the parallel processing of iSCSI commands using operation semantics so as to enhance the performance of iSCSI. The theoretical improvement in performance has been shown; the simulation and an experimental set-up of the system are in progress.
References:
[1] J. Satran, K. Meth, C. Sapuntzakis, M. Chadalapaka, and E. Zeidner. iSCSI draft standard. http://www.ietf.org/internet-drafts/draft-ietf-ips-iscsi-20.txt.
[2] UNH. iSCSI reference implementation. http://www.iol.unh.edu/consortiums/iscsi/.
[3] C. Boulton. iSCSI becomes official storage standard. http://www.internetnews.com/storage/article.php/1583331.
[4] X. He, Q. Yang, and M. Zhang, "Introducing SCSI-To-IP Cache for Storage Area Networks," in Proceedings of the 2002 International Conference on Parallel Processing, Vancouver, Canada, Aug. 2002, pp. 203-210.
[5] X. Ding, S. Jiang, and F. Chen, "A Buffer Cache Management Scheme Exploiting both Temporal and Spatial Locality," ACM Transactions on Storage, vol. 3, June 2007.
[6] B. S. Gill and D. S. Modha, "WOW: Wise Ordering for Writes – Combining Spatial and Temporal Locality in Non-Volatile Caches," in FAST '05: 4th USENIX Conference on File and Storage Technologies, 2005.
[7] B. Li, J. Shu, and W. Zheng, "Design and Optimization of an iSCSI System," in GCC 2004 Workshop, LNCS 3252, pp. 262-269.
[8] W. T. Ng, B. Hillyer, E. Shriver, E. Gabber, and B. Ozden, "Obtaining high performance for storage outsourcing," in Proceedings of the Conference on File and Storage Technologies (FAST), Monterey, CA, Jan. 2002, pp. 145-158.
[9] S. Aiken, D. Grunwald, A. R. Pleszkun, and J. Willeke, "A performance analysis of the iSCSI protocol," in IEEE Symposium on Mass Storage Systems, San Diego, CA, Apr. 2003, pp. 123-134.
[10] Y. Lu and D. H. C. Du, "Performance study of iSCSI-based storage subsystems," IEEE Communications Magazine, vol. 41, no. 8, Aug. 2003.
[11] P. Radkov et al., "A performance comparison of NFS and iSCSI for IP-networked storage," in Proc. of the 3rd USENIX Conference on File and Storage Technologies, CA, 2004.
[12] Q. Yang, "On Performance of Parallel iSCSI Protocol for Networked Storage Systems," in Proc. of the 20th International Conference on Advanced Information Networking and Applications (AINA), 2006.
BlueCard: Mobile Device-Based Authentication and Profile Exchange
Riddhiman Ghosh, Mohamed Dekhil
Hewlett-Packard Laboratories, Palo Alto, CA
{riddhiman.ghosh, mohamed.dekhil}@hp.com
Abstract—In this paper we present a framework for ubiquitous authentication using commodity mobile devices. The solution is intended to be a replacement for the proliferation of physical authentication artifacts that users typically have to carry today. We describe the authentication protocol and its prototypical implementation for a solution designed for the retail industry. We also propose a means of personalizing user–service interactions with embedded user-controlled profiles.
I. INTRODUCTION
Personalization of content, services and user experiences is considered to be an effective method of fostering customer loyalty and building one-to-one relationships with users; it provides benefits both to the service providers and their customers [13]. Indeed attempts at personalization can be widely seen in online services such as the delivery of music, news, and in electronic commerce. This trend is not confined to the online realm alone however. When Chris Anderson, alluding to online marketplaces in The Long Tail [1] says, “the era of one-size-fits-all is ending”, it is also what customers have come to expect in offline interactions and attempts at personalization are now also increasingly visible in brick-andmortar environments. This, combined with the emerging trend of increased self-service options, has led to the creation of a new set of customer touch-points, typically in the form of kiosks. The HP Labs Retail Store Assistant [6] is one such example targeted at the retail domain; several other examples exist whether it is for shopping, or checking out books or buying movie tickets. As a pre-requisite for many personalization techniques, or simply for authentication purposes, users have to first identify themselves to these touch-points. The inconvenience of entering a username/password combination on these service front-ends is typically avoided by giving users a physical token, such as a badge or card, with which they can authenticate themselves, following the familiar ‘you hold this token so it is you’ model. The problem presently however is that users have to carry a proliferation of these different physical tokens in their pockets for use with various services, e.g. multiple grocery store loyalty cards for different store chains, library cards for different libraries, ATM cards, etc. Typically these tokens exist in various form factors such as RFID badges, magnetic stripe cards, barcode passes, etc. and are also service-specific, as a result of which they cannot be easily used across different services or be repurposed.
In this paper we propose a solution that intends to do away with this proliferation of physical identity tokens. Our principal contributions in this work fall into the following three themes: ● Ubiquitous Identity Token: We describe an approach to replace multiple physical identity tokens with a single Bluetooth-enabled ubiquitous computing device such as a user’s cell-phone or PDA. Specifically, the BlueCard1 is a device-derived certificate encoded with authentication and profile information, which is stored on a user’s mobile device. This single token can be used with multiple services, and can be easily updated or repurposed in the future. ● Addressing the ‘Deployment Problem’: A significant problem with mobile devices is solution deployment due to the large diversity of device operating systems and runtimes—the platforms are varied, proprietary or obscure. While some phones support a Java Virtual Machine (Java ME) the VMs supported vary in flavor (e.g. PP or MIDP on CDLC or CDC) and version [10]. Still others may use Microsoft Mobile, Palm OS, Symbian OS, or the BREW (Binary Runtime Environment for Wireless) platform. This complicates deployment since one solution cannot effectively run on these different platforms without significant effort in re-designing and porting the solution to work with the specific capabilities of each platform. There are business issues that contribute to the deployment problem as well—for largely commercial (and arguably security) reasons most cell phone carriers in the United States have a “walled garden” approach when it comes to the software applications that can run on a phone and only carrierapproved software can be loaded. Thus for a custom software solution to work on most phones, one would need to negotiate with different carriers or device manufacturers. Our solution addresses the deployment problem and is designed in such a way so as to work on these diverse runtimes on common mobile devices (any Bluetooth phone with the Object Push Profile) without having to port and maintain independent codebases for each of these different platforms. ● Portable Profiles: We also propose that the BlueCard be a portable way for users to carry their profile or preference information (or a subset thereof) along with themselves on their mobile devices so as to enhance user–service interactions. II. APPLICATION SCENARIO The application scenarios that we have focused on, and which have motivated this solution, are based in the retail 1
portmanteau of the terms Bluetooth and ‘loyalty card’
industry. The HP Labs Retail Store Assistant [6] is a platform designed to enhance total customer experience in the retail environment and create a personalized shopping experience for customers. The primary delivery mechanism of these services for users is through an in-store kiosk that is integrated with store-owned information systems, external web services and user-owned devices. Our initial focus is targeted towards this retail solutions platform. Consider a user who walks into a store, clicks a button on a kiosk and uses his phone to identify himself. The kiosk recognizes him, pulls his service preferences from his phone and presents him with a personalized shopping list and product recommendations. Later if the user goes to a different chain of stores, or to the library to check out books, or to the video store to rent DVDs—all he needs to identify himself to these different services is his cell phone; he does not need to carry a bunch of ID cards or remember various combinations of usernames and passwords. We can state that the desiderata of a solution to be used in these scenarios are: a ubiquitous and repurposable authentication token that solves the mobile deployment problem, can aid in personalization and is easy to use with a familiar use model. III. THE BLUECARD APPROACH The BlueCard is a certificate which is derived from the hardware address of a mobile device and stored on the device itself. Similar to a X.509 digital certificate [7] which binds a public key to an identity, the BlueCard binds a device to an identity; the device is thus a first-class participant in the authentication transaction rather than merely a bearer of an identity certificate. The major components of the overall system are shown in Figure 1. The user-owned mobile device (P) is any Bluetooth enabled device such as a cell phone that supports the OBEX Object Push Profile (OPP) [4]. The OPP standard allows for the transfer of high-level objects between devices, and along with the headset profile, is amongst the most widely implemented of the Bluetooth specifications on cell phones. The service front-end (K), such as a kiosk as shown in Figure 1, is the touch-point to the customer and is the means of delivery of the services or content provided by the service and authentication back-end (S). A. Certificate Content and Format The BlueCard (B) at a minimum encodes the Bluetooth hardware address of the containing device, an identifier of the subject owning the device and a message authentication code for tamper-detection of the contents of the certificate. Also optionally encoded is user profile information (or a subset thereof) of the subject, which can be used by the service. We discuss the benefits of this optional information later in the paper in Section III-C. Our choice of the certificate encoding format for the BlueCard is constrained by our requirement of designing a mobile-platform independent solution as regards to the deployment problem mentioned in Section I. For instance, the storage, parsing, loading or transfer of a DER/PEM or XML
Figure 1: Overview and major components of the BlueCard system
encoded certificate from a device’s persistent store would require a platform specific implementation for the device. However this software footprint on the device can be avoided if one is able use a format for storing structured information that is natively supported by mobile devices. The vCard format (VCF) [15] is one such format that is natively supported by all OPP devices. VCF was designed as an open specification for personal data interchange and it specifies an interchange format for collecting and communicating information about people and resources (typically contact information). A VCF object is a structured collection of properties that may include not only information usually found on a business card such as names, addresses, telephone numbers but also other types of information describing the resource such as audio, graphical objects or geo-positioning data. One of the reasons VCF is widely supported on mobile devices like cell phones and PDAs is because the specification allows for a simple clear-text encoding. We have chosen to use VCF as the persistence format for the BlueCard, and its contents are packaged into the existing fields of a VCF contact object, as can be see in Figure 1. B. BlueCard Protocol The BlueCard protocol defines the following two main operations between a user’s mobile device and the service it wishes to authenticate with (via the service front-end): • One-time registration of a mobile device for use with the service; this involves the creation of a new BlueCard. • Subsequent recurrent logins, where the BlueCard stored on the device is used for authentication.
Figure 2: BlueCard Protocol — registration and certificate creation
1) Certificate Creation: A graphical representation of the registration step of the BlueCard protocol is shown in Figure 2, along with the sequence of events and messages between the different entities involved in the creation of a new BlueCard. Typically an existing account/identity of a user with the service is to be associated with the new BlueCard. The user initiates the registration process at a Bluetooth enabled front-end such as a kiosk. The front-end then starts a discovery of its Bluetooth neighborhood by broadcasting ‘inquiry’ messages in order to locate proximal devices. If not already so by default, the user needs to put his mobile device in Bluetooth ‘discoverable’ mode to ensure that it can respond to ‘inquiry’ scans. (This is only required during the initial registration process, and the user’s device need not be in discoverable mode during subsequent uses of the BlueCard.) On completion of discovery the front-end presents to the user a list of the friendly names [8] of the proximal devices that have been discovered and asks the user to choose his device from the list. Since the range of a Bluetooth receiver is typically around 10 m., devices in addition to the one the user is holding may have been discovered—the user is therefore asked to identify the device he holds. The user’s claim that he indeed holds the particular device he has selected is then verified in order to prevent the user from (inadvertently or otherwise) selecting a device belonging to someone else. This is achieved by employing a type of challenge-response test where the frontend generates and displays a random challenge code (say a string of around 5 or more digits), and the user is then prompted to enter this code on his mobile device numeric
keypad. The response that is transmitted through the mobile device is then matched with the issued challenge code for verification and thus only the user who is at the front-end and is able to operate the claimed device will be able to successfully complete this verification. The entire challengeresponse process can be realized by leveraging the built-in bonding capabilities of the Bluetooth stack of the device and the front-end, with custom implementation only on the serverside (in this case the front-end) and not the user’s device. The service front-end then interrogates its Bluetooth stack for information about the user’s mobile device, including its Bluetooth hardware address. This address (or a unique identifier derived from this address by combining it with other information) is used in the generation of the BlueCard, and is sent off to the service by the front-end. It should be noted that there are other alternatives to the front-end obtaining the hardware address of the user’s Bluetooth device. For instance [14] describes an approach of using visual tags such as special barcodes encoded with the Bluetooth address, intended to be read by a suitable visual tag reader such as a camera, to bypass the built-in Bluetooth discovery mechanisms. We chose not to adopt this approach as user devices cannot be used ‘out-of-thebox’ without affixing special tags, and would also need the use of suitable tag readers. The service creates a tamper-proof BlueCard certificate in the VCF format as mentioned in Section III-A, with the existing subject identifier of the user and the obtained device Bluetooth hardware address. The certificate is then “pushed” onto the user’s device using the Bluetooth Object Push services. 2) Certificate Integrity and Authenticity: It is important for the authentication back-end to be able to both, ensure that the BlueCard has not been tampered with, and also confirm that it is the authority that generated the BlueCard and that its contents are authentic. This can typically be achieved by using digital signatures, with the signature embedded along with the content in the BlueCard itself to allow verification of integrity and authenticity. However from an implementation perspective this approach may have a drawback. Consider the contents of a BlueCard that has been MD5 hashed (128 bit), RSA encrypted (1024-bit key) and base-64 encoded—the result will be a string 172 characters in length. This will need to be packed into an existing VCF field in the BlueCard. While the VCF specification does not prevent us from storing strings of this length or longer, we have seen through practical experience with different commodity cell phones and PDAs that many do not handle well strings of this length in a VCF field and truncate anything above a certain length, e.g. 50. We therefore instead choose to embed Keyed-Hash Message Authentication Codes (HMAC) [9] in BlueCards to provide for integrity and authenticity checks. This results in ‘signatures’ that are much smaller instead (24 characters in length) and well within the practical VCF field length limits seen on commodity mobile devices. While HMAC creation and verification rely on a shared secret, in our case the same authority that is generating the BlueCard is also the one that verifies it. For a BlueCard B, if Ball represents all fields except the HMAC Bhmac, then,
B = Concat(B_all, B_hmac), where B_hmac = HMAC(B_all)

3) BlueCard Login: When a BlueCard is pushed onto a user's device it is stored as a regular VCF contact, and the user can access it just as he would any other contact. A mobile device may have more than one BlueCard for use with different services; they can be identified based on the name of the service with which they are intended to be used (this is possible since the BlueCard stores the service name in one of the VCF name fields, such as the formatted name field FN). Figure 3 shows the steps and sequence of messages involved in a BlueCard login. The user uses his mobile device to send the stored BlueCard to the service via the front-end. While the shortcut keys and specifics of the user interface to accomplish this task vary between different mobile devices, they essentially involve selecting a proximate Bluetooth device (the kiosk or other service front-end identified by a displayed friendly name), selecting the service's BlueCard and pressing 'send'. On receiving the BlueCard the service checks it for integrity and authenticity. It then uses the information contained in the certificate, and matches it with the device information available from the Bluetooth stack. Since the certificate is bound to the device, authentication will succeed only if the registered device is used. Upon successful verification, the service uses the identity information contained in the BlueCard to complete the authentication. Compared to the one-time registration, login is a much simpler process (and a much more frequent activity) with only one step required to be performed by the user: to send the service BlueCard. Most OPP devices have built-in shortcuts to achieve this effortlessly.

Figure 3: BlueCard Protocol – login with mobile device

4) Pairing Considerations: Pairing (or bonding) in Bluetooth is a process where the two end-points of a Bluetooth link can agree to exchange data and form a trusted 'pair' by exchanging passkeys. This allows for a trust relationship to be maintained between the two devices should they wish to securely exchange data in the future. However, requiring users to pair every time they intend to login with their mobile devices is unreasonably cumbersome. This is especially true in the application scenarios envisaged for BlueCard, since it is unlikely a user will always login at a kiosk or front-end he has visited (and paired with) before. We have therefore designed the login protocol to not require paired connections, and users can interact and login with front-ends they have not encountered before without having to first pair with them. This makes the entire login experience simpler and faster. Note that a link protected with pairing passkeys provides us no advantage since a BlueCard, like an X.509 certificate, is public and does not contain information that needs to be encrypted. If the policy of a user's mobile device does not allow non-paired links by default or design, and if the user is unwilling to change this policy, he can perform pairing on-demand on encountering a service front-end for the first time.
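To make the integrity scheme of Section III-B.2 concrete, the following Python sketch (our illustration; the prototype itself is implemented in C# on .NET) computes and verifies an HMAC-MD5 tag over the concatenated non-HMAC fields B_all. The helper names and vCard field contents are hypothetical; only the 24-character base64 tag length matches the figure quoted above.

```python
import base64
import hashlib
import hmac

def bluecard_hmac(fields: str, secret: bytes) -> str:
    """Compute a base64-encoded HMAC-MD5 tag (B_hmac) over the
    concatenated non-HMAC vCard fields (B_all); the result is 24 characters."""
    digest = hmac.new(secret, fields.encode("utf-8"), hashlib.md5).digest()
    return base64.b64encode(digest).decode("ascii")

def verify_bluecard(fields: str, tag: str, secret: bytes) -> bool:
    """Recompute the tag from B_all and compare it with the embedded B_hmac."""
    return hmac.compare_digest(bluecard_hmac(fields, secret), tag)

# The same authority issues and verifies the BlueCard, so the secret never leaves it.
secret = b"service-side secret"                      # hypothetical shared secret
b_all = "FN:ExampleService\nUID:user-1234\nX-BT-ADDR:00:11:22:33:44:55"  # hypothetical fields
b_hmac = bluecard_hmac(b_all, secret)                # 24-character tag stored in the BlueCard
assert verify_bluecard(b_all, b_hmac, secret)        # succeeds only if B_all is untouched
```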
C. Profile in your Pocket We propose in this paper that the BlueCard be a portable means for users to carry their profile and preference information along with themselves on their mobile devices so as to enhance user–service interactions. Currently there are two approaches to storing user profiles [11] — they can be stored on the service-side, with users not having much control over this information, or stored entirely user-side, with the disadvantage that the information is not diverse or rich enough to be useful to a variety of different services. We believe in a combination of these two approaches, with users using the BlueCard to assert general preferences and information about themselves, which can then be combined with service-specific user profiles that are maintained at the service end. Given the storage limitations that exist on many mobile devices, the BlueCard stores profile information referentially, i.e. it contains a link to a user-authored and controlled profile document, retrieved by the service. The are multiple advantages of the profile-in-your-pocket approach: meaningful personalization can be offered by services with which the user has no established interaction history or user account; the user can adopt different personas while interacting with different services by changing the profile linked to by their BlueCard (thus allowing them assume a different role or persona for a particular service); the profiles are portable between multiple online and offline interaction touch-points. While there are several options for expressing BlueCard user profiles, we have been exploring the use of the FOAF (Friend-Of-A-Friend) format [5], which is an open standard to describe a person and his interests, with tools support to easily create and maintain this information. A related approach in using FOAF for userside personal profiles can be seen in [2]; however their focus has been on modifying the HTTP protocol to include additional headers so as to have GET requests to web servers transmit a user’s FOAF file. The BlueCard can be seen as extending this approach to a personal mobile device, thus unifying personalization of online and offline user–service interactions. We have prototyped simple content customization scenarios
Figure 4: Screen-shots of BlueCard: (a) Login using BlueCard on device. (b) Device discovery during registration. (c) Challenge-response verification during registration.
using BlueCard profiles based on interests, location and group membership information. For brevity we do not discuss those details here. IV. IMPLEMENTATION In this section we present some of the pertinent details of our prototype implementation of the BlueCard system, which has been integrated with the HP Labs Retail Store Assistant to provide for authentication and profile exchange. The 3 main entities we will discuss here are: the service front-end; the service and authentication back-end; and the mobile device. The service front-end in the current prototype is a kioskmachine running Windows XP, with the application written in the C# programming language on the .NET framework. All of the Bluetooth-related operations such as device discovery, service discovery, implementation of object push server and client capability and the bonding-based challenge and response functionality is implemented in C++ and exposed through a library. It uses the Bluetooth RFCOMM and SDP interfaces provided by the OS/Widcomm protocol stack. We have exposed the functionality of this C++ library to the front-end C# code using managed-unmanaged Interop based on the Platform Invoke Services of the .NET framework, with custom data marshallers written to marshal the Bluetooth protocol stack data structures. The front-end communicates with the service and authentication back-end via SOAP-based web services. Applications on the front-end such as a kiosk can be commonly written as both, a browser-based application, or a standalone application. To make the services of the BlueCard system easily available to web-based applications that are limited by the browser sandbox, we have written a custom protocol handler tied to a Windows OS service that handles the “bluecard://” protocol. This allows web-browser applications to place, for instance, login links containing “bluecard://”, which then invokes our Windows service that executes the BlueCard registration or login protocols, as the case may be. The service and authentication back-end is implemented as a collection of hosted SOAP web services through which it is accessible to the front-end. It is written in C# and utilizes the cryptographic support in the .NET Framework Class Library and extensions for certificate store and key management, and also for generating BlueCard HMACs using the HMAC-MD5 algorithm. The service component that utilizes the FOAFexpressed BlueCard profiles, is implemented as a web-service wrapper around a Java component using Jena—HP Labs’ Semantic Web framework. On the mobile-device side we can use any Bluetooth-enabled device, such as a cell phone or PDA, which supports the OPP
standard. We have used BlueCard with multiple devices running different mobile runtimes such as BREW, J2ME and Windows Mobile. Figure 4 shows some screenshots during login and registration using BlueCard. V. DISCUSSION In this section we will discuss the common threats we anticipate in the BlueCard system and our assumptions. The BlueCard certificate binds a particular user identity to a device. A tampering threat constitutes the modification of this identity–device tuple in the certificate. This is prevented by embedding an HMAC in the certificate and verifying it to guarantee integrity and authenticity of the certificate contents. Transferring the BlueCard intended for one device into another and using it to attempt login fails on account of the verification of hardware information of the device taking part in a run of the protocol—however this verification relies on our assumption that the operating system (or third-party) Bluetooth stack accurately reports link and device information. A compromised stack, on being interrogated about device information, may not respond faithfully. This threat maybe avoided by requiring the stack implementation to be cryptographically signed and to load only a trusted stack implementation on the service front-end (which we control). An impersonation threat exists during the BlueCard registration process, when a discovery of the Bluetooth neighborhood is done and the user is asked to select his device from the list of discovered devices—he may intentionally or otherwise select any other device from the neighborhood that does not actually belong to him. The protocol is designed to prevent this by issuing a random numeric challenge on the front-end user interface that the user is expected to enter on his device, and only if the response is accurately transmitted from the claimed device is the registration successful. That only a user who both, sees the random challenge and correctly enters the response on the claimed device, can actually hold the device, we believe is a reasonable assumption to make. Since BlueCard follows the familiar ‘you hold this token so it is you’ model another threat that exists pertains to the loss or theft of the user’s mobile device. A common way to deal with this threat is to require a knowledge-based secret such as a PIN, in combination with the device, and the same applies for BlueCard. This tradeoff between security and usability (requiring additional input) we believe is best left to the requirements of the particular application domain. For instance, in the retail domain loyalty programs, it uncommon to see knowledge-based input requested in addition to the presence of a physical token, while in other domains both are
required. The BlueCard is suitable in both types of applications. VI. RELATED WORK Physical authentication artifacts such as RFID-, barcode- or magnetic swipe-badges are currently widely used; however the disadvantage for users using these schemes is that they have to remember to carry a special per-service token to authenticate, these are not repurposable, and this has resulted in a proliferation of ID badges in users’ wallets. There exist mobile solutions for authentication that try to recreate the PC/browser experience at the device by asking users to visit special authorization web-pages on their cell phones. For instance [16] describes a system where a user receives a special URL using SMS on his phone, to which he then needs to navigate using his device WAP browser and allow or disallow a proxy to authenticate on his behalf. Most mobile browser interfaces are not suitable for complex interactions. These solutions not only suffer from a usability perspective, but also need a carrier connection for WAP / SMS access, for successful authentication. Similar SMS schemes are also used for twofactor authentication, where a special code sent through SMS has to be entered to authenticate, as in [12]. The disadvantage of SMS solutions is the loss of privacy—users have to divulge their cell phone numbers to the service they are authenticating with, which can then be used in perpetuity by the service to send unwanted spam. Also SMS and network access over the carrier are typically fee-based and hence the user is essentially paying for every authentication transaction. Our solution is privacy preserving since it does not need revelation of a user’s cell phone number, is based only on a proximity sensitive adhoc connection between a user’s device and the service, and is also not fee-based. Near-Field Communication (www.nfcforum.org) enabled phones can be used as contact-less smart cards, and thus offer a means of authentication using mobile devices. In our solution the mobile device is not just a bearer of an identity certificate, but rather we use a device-derived certificate making the mobile device a first-class participant in the authentication transaction. NFC phones apart from being expensive are far from wide market adoption and presently very few companies manufacture or sell them. What is needed is a solution using currently widely available commodity devices. Reference [3] describes a smart-phone based solution for physical access control. Their solution requires custom code to be running on the phone which complicates deployment on a wide variety of mobile devices given diverse device platforms. Moreover their solution requires multiple modes—camera, Bluetooth, SMS and MMS to work together for their distributed proving scheme for authentication. Our hardware and phone requirements are much more modest and can work on commodity Bluetooth cell phones with OPP support; we also provide service personalization support. For every login their solution requires a GPRS data connection since it requires the grantors and grantee to all be available and the authorization is done in a distributed fashion. Apart from latency and data communication costs, a scheme such as this where multiple parties are required for logins does not fit in with our application scenarios.
VII. CONCLUSION In this paper we describe BlueCard, a ubiquitous system for authentication and personalization using commodity cell phones. The authentication protocol relies on a device-derived certificate and is designed in such a way as to not require a custom software implementation on cell phones. We also propose a way for BlueCard to be used to personalize user– service interactions through user-controlled embedded profiles. We have presented a discussion of the anticipated threats and assumptions and also discussed our prototype implementation on the HP Labs Retail Store Assistant kiosk. We would like to further examine our solution from a usability perspective, conduct user studies to determine how the BlueCard userexperience can be improved, and explore other alternatives in BlueCard personalization. REFERENCES [1] Anderson, C., The long tail: why the future of business is selling less of more, Hyperion, 2006. [2] Ankolekar, A., Vrandecic, D., Personalizing web surfing with semantically enriched personal profiles, in Bouzid, M., Henze, N., editors, in Proceedings of the Semantic Web Personalization Workshop, Budva, Montenegro, 2006. [3] Bauer, L., Garriss, S., McCune, J., Reiter, M., Rouse, J., Rutenbar, P., Device-enabled authorization in the Grey system, in Proceedings of the 8th Information Security Conference (ISC’05), Singapore, 2005, pp. 431– 445. [4] Bluetooth SIG, Object Push Profile, in Specification of the Bluetooth system: profiles, 2001, pp. 339-364, available at http://www.bluetooth.com/Bluetooth/Learn/Technology/Specifications/ [5] Brickley, D., Miller, L., FOAF vocabulary specification 0.9, available at http://xmlns.com/foaf/0.1/. [6] Hewlett-Packard Press Release, HP shows off system that affords every customer a personal shopper, May 2007, available at http://www.hp.com/hpinfo/newsroom/press/2007/070529b.html . [7] International Telecommunication Union, ITU-T recommendation X.509, 2005, available at http://www.itu.int/rec/T-REC-X.509/en [8] Kindberg, T., Jones, T., Merolyn the phone: a study of Bluetooth naming practices, in Krumm, J. et al., eds., Proceedings of the 9th international conference on Ubiquitous Computing, Innsbruck, Austria, 2007, pp. 318335. [9] Krawczyk, H., Bellare, M., Canetti, R., HMAC: keyed-hashing for message authentication, IETF RFC 2104, 1997. [10] McCready, J., Integral Java: a single solution for bypassing the pitfalls of split stacks, in Java Developers Journal, 11(8), August 2006. [11] Mulligan, D., Schwartz, A., Your place or mine?: privacy concerns and solutions for server and client-side storage of personal information, in Proceedings of the 10th conference on Computers, Freedom and Privacy: Challenging the assumptions, Toronto, Canada, 2000. [12] Pullar-Strecker, T., NZ bank adds security online, in Sydney Morning Herald, Wellington, November 8, 2004. [13] Riecken, D., Personalized views of personalization, in Communications of the ACM, 43(8):27-28, August 2000. [14] Scott, D., Sharp, R., Madhavapeddy, A., Upton, E., Using visual tags to bypass Bluetooth device discovery, in ACM SIGMOBILE Mobile Computing and Communications Review, 9(1), January 2005. [15] Versit Consortium, vCard: the electronic business card, version 2.1, 1996, available at http://www.imc.org/pdi/vcard-21.txt. [16] Wu, M., Garfinkel, S., Miller, R., Secure web authentication with mobile phones, in DIMACS Workshop on Usable Privacy and Security Software, 2004.
Component Based Face Recognition System Pavan Kandepet and Roman W. Swiniarski Department of Mathematics and Computer Science San Diego State University San Diego, California 92182 Email: [email protected], [email protected]
Abstract-The goal of this research paper was to design and use a component based approach to face recognition and show that this technique gives us recognition rates of up to 92%. A novel graphical user interface was also developed as part of the research to showcase and control the process of face detection and component extraction and to display the recognition results. The paper essentially consists of two parts: face detection and face recognition. The face detection system takes a given image as the input, from which a face is located and extracted using 2D Haar Wavelets and Support Vector Machines. The face region is then used to locate and extract the individual components of the face, such as eyes, eyebrows, lips and nose, which are then sent to the face recognition system, where the individual components are recognized using Wavelets, Principal Component Analysis and Error Backpropagation Neural Networks. Pattern dimensionality reduction techniques are used to significantly reduce the dimensionality and complexity of the task.
I. INTRODUCTION
Humans have an innate ability to identify and recognize faces in distorted images, cluttered scenes and complicated scenarios with minimal effort. This is the result of millions of years of evolution, where massive numbers of neurons in the human brain operate in parallel to visualize and identify faces. Although artificial face recognition is not a new topic, the challenge of developing a perfect face recognition system still remains unsolved. Automated face recognition systems involve the use of a database, where the challenge is to train the system to identify a given test image as belonging to a known individual from the database or to reject it as unknown. The task of recognition involves object detection and object classification. Object recognition involves detecting and locating the presence of a face in a given image and then classifying the given image. Approaches to face recognition can either be holistic or component based. In the holistic approach, the entire face is treated as one unit and further classification on the entire region is carried out. In the component approach, individual components in the face are detected and then recognition of those individual components is carried out. The latter is the focus of this research paper. The motivation for a component approach is partly due to the fact that global or holistic approaches do not capture minute differences between facial components, such as differences between the left and right eyebrows, the left and right eyes, etc. When we incorporate classifiers specifically designed to recognize individual components of the face, research has shown that this leads to better performance in the overall face recognition rate. Changes in the shape and size of faces can pose a significant problem to holistic approaches; these are overcome by component approaches since the area of focus is much smaller. Also, changes in illumination, shape, etc. affect only specific components of the recognition system, thus leading to a better recognition system. There are many important steps to be carried out before the actual recognition takes place. The images must be digitally
preprocessed, patterns must be formed, features selected, classified and finally recognized. These steps are discussed in the following sections.

II. DIGITAL IMAGE PROCESSING AND PATTERN FORMATION

We start this section with an introduction to digital image processing and the particular compensation technique used in this research. Next, we introduce the concept of a pattern and the different approaches to pattern formation, followed again by the particular approach taken here.

A. Digital Image Processing: Digital image processing refers to the wide range of techniques for manipulating and modifying digital images to satisfy certain objectives. An image can be represented in a coordinate system whose origin is the upper left corner of the image. For monochrome images, each pixel takes the values of a function f(x, y), where x and y are the distances from the origin. The value of the function at any point is the intensity or brightness of the light detected at that point.

1) Digital Image Processing Techniques: In face recognition, we have as input a digital image that needs to be recognized. The success of this depends on how good the quality of the input image is. Different image processing techniques can be applied to enhance the quality of the image before we start the actual process of recognition. We will restrict ourselves to intensity enhancement techniques, specifically histogram equalization. It can be safely assumed that the input images that we use for recognition are not so distorted as to render further processing difficult if not impossible, but only have slight changes in intensity distribution. Histogram equalization is useful in adjusting images with backgrounds and foregrounds that are both bright or both dark. A key advantage of the method is that it is a fairly straightforward technique and also an invertible operation. If the histogram equalization function is known, then the original histogram can be recovered. The calculation is not computationally intensive. A disadvantage of the method is that it is indiscriminate and may increase the contrast of background noise, while decreasing the usable signal [1].
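As an aside, a minimal NumPy sketch of the histogram equalization step (our own illustration, not the code used in this work), assuming an 8-bit grayscale image stored as an integer array:

```python
import numpy as np

def equalize_histogram(img: np.ndarray) -> np.ndarray:
    """Histogram-equalize an 8-bit grayscale image by mapping each
    intensity through the normalized cumulative histogram."""
    hist, _ = np.histogram(img.flatten(), bins=256, range=(0, 256))
    cdf = hist.cumsum()
    nonzero = cdf[cdf > 0]
    cdf_min = nonzero[0] if nonzero.size else 0
    # Classic equalization mapping; entries for unused intensities are clipped.
    lut = np.clip(np.round((cdf - cdf_min) / max(cdf[-1] - cdf_min, 1) * 255), 0, 255)
    return lut.astype(np.uint8)[img]
```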
B. Pattern Dimensionality Reduction and Feature Selection: A pattern can be understood as an entity representing an abstract concept or a physical object. The object to be recognized is characterized by a description representing information about the object, commonly called the pattern. A pattern may contain several attributes, also called features, characterizing an object; the terms feature and attribute can be used interchangeably. Attributes may be of different types, take
values from different domains and have different discriminatory powers. In general, a pattern can be described as a collection or a set of n attributes x_i,

pattern = {x_1, x_2, ..., x_n}    (1)

where x_i is a single attribute taking values from a domain V_i [2]. A pattern is usually represented by a column vector, where each element x_i corresponds to a pattern attribute and takes on numerical values,

x = [x_1, x_2, ..., x_n]^T ∈ R^n    (2)
The process of pattern formation is twofold: first we need to extract features from the given data set and then create a pattern with the most discernible information elements. The goal of feature extraction is to discover, for a given data set, reduced features most suitable for a specific processing goal [2]. The task of pattern formation is normally carried out at the end. A few well known feature extraction methods are moments, Singular Value Decomposition (SVD), co-occurrence matrices, the 2D Fast Fourier Transform (2D FFT), Wavelets, two-dimensional autoregressive moving average models (2D ARMA), Fourier descriptors, fractal dimensions, PCA, and Independent Component Analysis (ICA) [2].

III. FEATURE EXTRACTION, PATTERN DIMENSIONALITY REDUCTION AND FEATURE SELECTION
The m-dimensional transformation function

y = F(x)    (3)

can be considered as a nonlinear transformation of n-dimensional original patterns x into m-dimensional transformed patterns y. The transformation function is always dependent on the domain knowledge and data statistics. The elements y_i (i = 1, 2, ..., m) of the transformed patterns y are called features, and the m-dimensional transformed patterns y are called feature vectors. These feature vectors represent data objects in the feature space. There are numerous pattern dimension reduction techniques, as mentioned in the previous section. We have used Wavelet and Principal Component Analysis (PCA) based techniques in this research. These techniques not only allow dimensionality reduction of a pattern but also provide projection of patterns into a better pattern space for classification.

A. Wavelets

Wavelets are mathematical functions that can be used to gather data into different frequency components and then study each component with a suitable resolution matching its scale. They have advantages over Fourier methods in analyzing physical situations where the signal contains discontinuities and sharp spikes. The fundamental idea behind Wavelets is to analyze data according to scale. This idea is an extension of Fourier analysis, where functions are represented by superimposing sines and cosines. In Wavelet analysis, data is processed at different scales or resolutions. To give a better idea, consider looking at a signal with a large window to notice significant or major changes, and looking with a small window to showcase fine variations. With Wavelet analysis we can observe both these variations. Wavelet analysis adopts a prototype function called the mother Wavelet. Temporal analysis is performed with a contracted, high-frequency version of the prototype Wavelet, while frequency analysis is performed with a dilated, low-frequency version of the same Wavelet. Because the original signal or function can be represented in terms of a Wavelet expansion, data operations can be performed using just the corresponding Wavelet coefficients. If we further choose the best Wavelets adapted to our data, or truncate the coefficients below a threshold, the data gets sparsely represented, which results in pattern dimensionality reduction. Continuous Wavelet transforms operate at every scale, right from the scale of the original signal up to some maximum scale depending on the resolution needed for our analysis. They also generate excess data, all of which might not be useful or, worse, not meaningful for our task. It was observed that choosing scales and positions based on powers of 2, the so-called dyadic scales and positions, makes the observations much more efficient and at the same time equally accurate [3]. This type of analysis is known as discrete analysis and can be performed by using filters.

1) Haar Wavelets: Haar Wavelets are one of the simplest orthonormal systems, generated from the Haar scaling function and the Haar mother Wavelet; the basis functions have the form ψ_{j,k}(t) = ψ(2^j t − k), k = 0, 1, ..., 2^j − 1. The basis function for the Haar Wavelet looks like a pulse shifted along the t-axis, to approximate the original signal or function. The Haar mother Wavelet is defined as,
ψ(t) = 1 for 0 ≤ t < 1/2,  ψ(t) = −1 for 1/2 ≤ t < 1,  and ψ(t) = 0 otherwise.    (4)
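For illustration, a small NumPy sketch (ours, not the authors' Matlab code) of a single-level 1-D Haar decomposition into trends (averages of adjacent samples) and fluctuations (their differences); the plain average/difference scaling is assumed here rather than the orthonormal 1/sqrt(2) normalization:

```python
import numpy as np

def haar_1d(signal: np.ndarray):
    """Single-level 1-D Haar decomposition of an even-length signal:
    trends are averages of adjacent samples, fluctuations are their differences."""
    s = signal.astype(float)
    trends = (s[0::2] + s[1::2]) / 2.0        # low-pass / approximation
    fluctuations = (s[0::2] - s[1::2]) / 2.0  # high-pass / detail
    return trends, fluctuations

# A pair whose two samples differ (here the pair straddling the step) yields a
# non-zero fluctuation, which is how edges are flagged.
t, f = haar_1d(np.array([4, 4, 4, 9, 9, 9, 9, 9]))
```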
We are interested in detecting and capturing edges in images for our face recognition system. This is due to the fact that facial components have detectable edges in them. Take the eyes, for example, which have distinctive horizontal edges. The eyebrows and lips have horizontal edges, and noses have vertical edges, although not very distinctive compared to the other components. Since the discrete Wavelet transform for matrices involves computing the averages and differences of adjacent pixel values in various combinations, we can interpret the coefficients as identifying edge points of the image. The two dimensional image data can be considered as a matrix of rows and columns as follows, where the values of f are the MN real numbers {f_{j,k}}:

f = [ f_{1,1}  f_{2,1}  ...  f_{N,1}
      f_{1,2}  f_{2,2}  ...  f_{N,2}
        ...      ...          ...
      f_{1,M}  f_{2,M}  ...  f_{N,M} ]    (5)
The two dimensional Wavelet transform of a discrete image can be performed whenever the image has an even number of rows and columns. A single level Wavelet transform of an
image f is defined using any of the one dimensional Wavelet transforms by following two essential steps: 1) Perform the single level one dimensional Wavelet transform on each row of f, resulting in a new image. 2) On the new image thus obtained, perform the one dimensional Wavelet transform on the columns. The result remains the same if the order of computation of the wavelet transforms is reversed. We can symbolize the first level transform as follows,
f → [ Approximation Coeff   Horizontal Coeff
      Vertical Coeff        Diagonal Coeff ]    (6)
In the above matrix each one of the sub matrices: Approximation, Horizontal, Vertical and the Diagonal coefficients represent one image. The sub image Approximation Coefficients is created by computing trends along the rows of f followed by computing trends along columns. It is an averaged, lower resolution version of the image f. The Horizontal Coefficients is created by computing fluctuations along the columns. The Vertical Coefficients is created in the same manner as Horizontal Coefficients except that the roles of the horizontal and the vertical are reversed. The Diagonal Coefficients is calculated using fluctuations from both the columns and the rows. Trends represent average summations while fluctuations represent average differences between successive measures. By successively applying the Wavelet transform to the approximation coefficients from each stage, we can get more sub levels and at each level the resolution decreases. This results in a decrease in the amount of information stored in the image. The idea of pattern formation with regard to images is to have as much information as possible with the least possible resolution. When applying Wavelets to images, a transformation to the second level is enough to retain meaningful edge information with little degradation in resolution. The dimension of the image is reduced to half the original size after successive decompositions. In this research, we used the second level Haar Wavelet transform for the face detection and first level decomposition for the remaining parts. The justification is due to the fact that the face dimension is quite large and concatenating all the rows at this level would lead to a very large pattern. The second level decomposition maintains enough resolution to provide us a pattern that can be effectively used in pattern recognition. Also, as mentioned previously, increasing the pattern size thinking that it might hold more information for either detection or recognition is not always guaranteed. The curse of dimensionality plays a significant role in letting us choose the optimal number of elements for a pattern. Fig. 1 shows the procedure for applying wavelet transforms for first level pattern formation. B. Principal Component Analysis (PCA) PCA is a well proven technique and is usually one of the first steps in more complex pattern dimensionality reduction techniques. Excessive dimensionality can be reduced by combining features [5]. Simple pattern dimensionality reduction techniques like linear transformations reduce the dimension of a pattern by projecting it into lower dimensions. They are simple to compute and are also analytically tractable [5]. PCA is based on the statistical characteristics of the data being analyzed which is represented by the covariance matrix,
Fig. 1. Technique for forming wavelet pattern. The rows of the final transformed pattern are concatenated to create a final single row pattern
its Eigenvalues and the corresponding eigenvectors. PCA determines the optimal linear transformation,
y = Wx    (7)
of a real valued n-dimensional pattern x into another m-dimensional (m ≤ n) transformed vector y. The m×n transformation matrix W is made optimal by exploring statistical correlations among elements of the original pattern and finding possibly reduced, compact data representations retaining maximum non-redundant, uncorrelated information of the original data [6]. This exploration of the original data is based on computing the Eigenvectors and the Eigenvalues of the covariance matrix. The elements of the m-dimensional transformed feature vectors y will be arranged in decreasing order of information content. We can discard the trailing feature elements with lower information content and hence get a reduction of the pattern dimension. If we assume that the observed input patterns have n dimensions, then we can choose only m (m ≤ n) of these intrinsic independent variables; this is the model on which PCA is based. PCA gives us a method to reduce the data representation by choosing feature vectors which are uncorrelated and are placed along the orthogonal directions of the principal components with maximal variance and lower dimensionality. When we want to perform a PCA based transformation, we first obtain the Eigenvalues and the Eigenvectors and then choose only the first m Eigenvectors, based upon some criterion, for our transformation task. Instead of the whole n-dimensional original pattern x one can form the m-dimensional (m ≤ n) feature vector y = [y_1, y_2, ..., y_m]^T, retaining only the components corresponding to the first m dominant Eigenvalues λ_1, λ_2, ..., λ_m of the original covariance matrix. The first m principal components are the most expressive features of a data set. The reduced data set will represent data in a new feature space, where feature vectors are uncorrelated and placed along the orthogonal directions of principal components with maximal variances. In order to significantly reduce the dimension of the final pattern, we use the 95% rule in this research. PCA is applied for all the components of the face that are used for recognition. Thus, in this research we use two pattern dimensionality reduction techniques, Wavelet transforms and PCA.
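A compact NumPy sketch (our own, with illustrative names) of the PCA projection together with the 95% eigenvalue-energy rule described above:

```python
import numpy as np

def pca_95(patterns: np.ndarray, energy: float = 0.95):
    """patterns: one training pattern per row. Returns (mean, W) where W keeps
    the smallest number of dominant eigenvectors whose eigenvalues cover
    `energy` of the total eigenvalue sum."""
    mean = patterns.mean(axis=0)
    cov = np.cov(patterns - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)              # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # dominant first
    m = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), energy)) + 1
    return mean, eigvecs[:, :m]                         # W is n x m

def project(x: np.ndarray, mean: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Transform an n-dimensional pattern into the m-dimensional feature vector y."""
    return (x - mean) @ W
```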
IV. CLASSIFICATION

Pattern recognition is the task of taking in raw data and making an action based on the category of the pattern. This process of pattern recognition involves the task of pattern classification, which tries to make sense of how to distinguish the patterns that the system was trained on so as to be able to identify them. Our research involves two classification tasks: the task of face and face component detection, and the ability to identify these detected components as belonging to a specific category. We take two approaches in our research: for face and component detection we use Support Vector Machines (SVM), and for the final classification task of recognizing them we use Error Backpropagation neural networks.

A. Support Vector Machines

The SVM principle is based on statistical learning theory, and its technique for training on data patterns is achieved by applying structural risk minimization (SRM), providing the best solution within that framework. This is achieved by widening or maximizing the margin between the two classes while the framework minimizes the empirical risk. This lower risk is governed by the Vapnik-Chervonenkis (VC) dimension, with the SVM learning system considering the VC confidence, which is determined by the number of training examples, for a generalized mapping function for a new input pattern. Unlike traditional classifiers that try to minimize the absolute value of an error or a squared error, the SVM performs SRM. When the VC dimension of the model is low, the expected probability of error is low as well, which means that we can obtain good generalization performance on unseen data. It must be noted that, as mentioned before, SVMs are a class of binary classifiers. We can also combine many single SVMs to solve an n-class problem if needed. For a given learning task with a finite amount of training data, the best generalization performance will be achieved if the right balance is struck between the accuracy attained on that particular training set, and the capacity of the machine to learn any training set without error [6]. For a pattern recognition task, we try to estimate the function,
f : R^N → {±1}    (8)

using the n-dimensional training data x_i and class labels y_i,

(x_1, y_1), ..., (x_l, y_l) ∈ R^N × {±1}    (9)

such that f will correctly classify new examples (x, y) which were generated from the same underlying probability distribution P(x, y) as the training data. If there are no restrictions on the class of functions that we choose our estimate f from, even a function that does extremely well on the training data need not generalize well on unseen examples. Hence learning is impossible and minimizing the training error does not imply a small expected test error. Statistical learning or the VC theory has shown that it is crucial to restrict the class of functions that the learning machine can implement to one with a capacity that is suitable for the amount of available training data [7]. The quantity R(α) is called the expected risk or just the risk. This risk is the mean error rate on the training set for a fixed and finite number of observations. If R_emp is the empirical risk,

R_emp(α) = (1/(2l)) Σ_{i=1}^{l} |y_i − f(x_i, α)|    (10)

then R_emp is fixed for a particular choice of α and for a particular training set x_i, y_i [6]. The quantity (1/2)|y_i − f(x_i, α)| is called the loss and can take values of only 0 and 1 for the case here. If we choose η such that 0 ≤ η ≤ 1, then with probability 1 − η the following bound holds,

R(α) ≤ R_emp(α) + √( ( h(log(2l/h) + 1) − log(η/4) ) / l )    (11)

where h is the non-negative VC dimension, which is also the measure of capacity. The VC dimension is a property of a set of functions f(α), where α is a generic set of parameters. A choice of α specifies a particular function. Since SVMs are binary classifiers,

f(x, α) ∈ {−1, 1}  ∀ x, α    (12)

If a given set of l points can be labeled in all possible 2^l ways, and for each labeling a member of the set f(α) can be found which correctly assigns those labels, we say that the set of points is shattered by that set of functions. The VC dimension for the set of functions is defined as the maximum number of training points that can be shattered by f(α).

1) Hyperplane Classifiers: To design learning algorithms we need to come up with a class of functions whose capacity can be computed [19]. Support vector classifiers are based on the class of hyperplanes,

(w · x) + b = 0,  w ∈ R^N, b ∈ R    (13)

corresponding to the decision functions,

f(x) = sign((w · x) + b)    (14)
The optimal hyperplane has the largest margin among the separating hyperplanes for a given data set. In this section we show how to calculate the optimal separating hyperplane for linearly separable data.
To compute the particular pair (w*, b*) that has the largest margin, we compute the margin achieved by any pair (w, b) which satisfies w·x_i + b > 0 if y_i = +1 and w·x_i + b < 0 if y_i = −1. Everything discussed up to this point with regard to SVMs becomes invalid if the input data is non linear. In this case, the decision function is not a linear function of the input data. Solving this is relatively straightforward. Since the input data appears in the form of dot products x_i · x_j, we can map the data into another space using a mapping function φ.
φ : R^n → R^m    (15)
By choosing a nonlinear mapping, the SVM can construct an optimal separating hyperplane in this high-dimensional feature space. Let x_i be the input data; then,
φ(x_i) = (φ_1(x_i), φ_2(x_i), ..., φ_m(x_i)) ∈ R^m    (16)
where n ≤ m. When this mapping is applied to all objects in the input space, we obtain in the feature space R^m,
φ(D) = {(φ(x_1), y_1), (φ(x_2), y_2), ..., (φ(x_m), y_m)}    (17)

We used SVM for the first part of the recognition system, which includes face and face component detection. Wavelet patterns are formed for a given test image and fed successively to SVM based component detectors, which give an output of +1 if the pattern belongs to the positive region or a −1 if it belongs to the negative region.

B. Error Backpropagation Neural Networks

We used neural networks because SVMs are binary classifiers and, although we can combine many single SVM classifiers to solve a multi class problem, it becomes cumbersome. Neural networks implement linear discriminants in a space where the inputs have been mapped nonlinearly. They admit fairly simple algorithms where the form of the nonlinearity can be learned from training data. Multilayer neural networks usually consist of three layers: an input layer, a hidden layer and an output layer. Each one of these layers is connected by modifiable weights representing links between layers. There is a single bias unit that is connected to each unit other than the input units. The input to this type of a system represents the components of a feature vector, and the output represents the values of the discriminant function used for our purpose of classification. Neural network systems with hidden units and output units have greater computational and expressive power than similar networks that lack hidden layers. We can have multiple hidden layers if necessary, depending on the task at hand. We have specifically used backpropagation networks for the task of face recognition.

V. FACE RECOGNITION

This section discusses the implementation details of the different components of the face recognition system. We first discuss the component detection, followed by component recognition. The reason we wanted to pursue a component approach instead of a global or holistic approach is that faces have a strong configurational appearance which can be exploited, and combinations of multiple classifiers reduce the inaccuracy of a single detector.

A. Face and Component Detection

Our implementation of the face and face component detection is completely automatic, with no user intervention whatsoever and no manual or statistical hints provided of any sort. In order to implement the detection part, we made use of Haar wavelets and Support Vector Machines. Six facial components were used: right eye, left eye, right eyebrow, left eyebrow, lips and nose. The other components, such as the chin, cheek etc., are not very good at showcasing the details necessary for the purpose of classification and hence were not used. The detection first starts with the face and then proceeds with detecting the other components. For the preprocessing stage, we used the histogram equalization technique that was
discussed previously to make appropriate changes to the contrast distribution of the image. We also made provisions to change the brightness and, if necessary, adjust more contrast settings with controls on the graphical user interface. After preprocessing, the image is ready to be analyzed for face detection. Every image used has a fixed size of 384x286 pixels. This was the fixed size of the images in the BioID [46] database that was used in this research. This particular database has a vast collection of frontal images of faces with varying illumination conditions. To add more variations in faces we also added images from the Caltech [9] database. The frontal images in the database do not cover the entire region of the picture, but are usually rather half the size of the image resolution. Other databases like the Olivetti [10] database, which is widely used in face recognition tests, have the entire image with only the face and no backgrounds. Our database has varying backgrounds to challenge the recognition and detection of faces. It was not practical to analyze the image at every scale and resolution, and hence the sizes of the detectable components were fixed at specific resolutions. The sizes of the components, chosen based on observations of the database, are as follows:
1) Face has a resolution of 160 × 150 pixels.
2) Right eye has a resolution of 50 × 20 pixels.
3) Left eye has a resolution of 50 × 20 pixels.
4) Right eyebrow has a resolution of 60 × 30 pixels.
5) Left eyebrow has a resolution of 60 × 30 pixels.
6) Nose has a resolution of 60 × 40 pixels.
7) Lips have a resolution of 70 × 30 pixels.
The procedure for training SVMs involves the presentation of two classes of patterns, namely the acceptable pattern and the rejection pattern. These are usually called the positive patterns and the negative patterns in the SVM literature. As the names indicate, positive patterns belong to one of the two regions in the SVM space where the classification result yields a +1, and the negative patterns yield a −1. Before we start to train the SVM, we need to create the training patterns, and we used Haar Wavelets for this. Also, since we are using a component approach we have individual classifiers for each component. Hence we have a total of seven support vector machine classifiers. The process of creating patterns for classification using SVM is straightforward. There is a set of training images for each component, the numerical details of which are discussed in the numerical results section. The training set for each component is divided into positive images and negative images. The positive image is obviously our object of interest, such as an eye or a nose, and the negative images include everything except the object of interest. When discrete wavelet analysis is applied to an image, the approximation and detail coefficients, depending on the level of decomposition, are one half the size of the original image. If wavelet analysis is applied to the original size of 384x286 pixels, the first level decomposition will yield decompositions at 192x143. If further decomposition is applied to this image, the next result will be at 96x71. For the face detection, the training images are at 160x150 and we use the second level decomposition to achieve practical recognition rates. This technique is applied to each image in the training set for both the negative and the positive images.
For each image, after the wavelet decomposition is carried out, the horizontal and the vertical detail coefficients are added up, the individual rows of the entire decomposed picture are concatenated into a single row, and a +1 or a −1 is added at the
end depending on the region type. For the face detection, the second level Haar decomposition is used, whereas for the rest of the components, the first level decomposition is used. The training of support vector machines is also relatively simple. For classification, we use the Radial Basis Kernel type SVM for each of the classifiers. This entire implementation was done using Matlab, which has numerous toolboxes. The patterns are presented sequentially to the classifiers until the training is complete for all the training images. Once this is complete, the trained SVM is saved. Each SVM is trained for specific dimensions depending on the object that it was trained for, and will be able to recognize only objects that are presented with the same dimensions. When a new image is presented to the system, the first step is to apply histogram equalization as a preprocessing step. After this, we use a sliding window technique for detecting components in the face. Face detection: The face detection is carried out as follows:
1) Start with a window of size 160x150 at the upper left corner of the input image.
2) Apply second level wavelet decomposition to the preprocessed image and arithmetically add the horizontal and vertical decomposition coefficients.
3) Concatenate all the rows of the decomposed image and form the pattern.
4) Input this pattern to the SVM face classifier.
5) Obtain the classification result and add it to the end of the pattern, along with the coordinates at which this image was sectioned from the test image.
6) Move the sliding window by a predefined step horizontally and vertically and continue with step 2.
7) Repeat the above steps until the lower right corner of the sliding window coincides with the lower right corner of the image.
Once the system has finished processing the entire space of the given input image, the SVM classifier for that specific component is also ready with the classification result. After the detection of the face, the right eye is located first, then the left eye, right eyebrow, left eyebrow, lips and finally the nose. Each component is detected using a dedicated SVM specifically designed for that component.

B. Component Recognition

Once the facial components are detected and extracted from the test image, we make use of Wavelets for pattern formation and pattern dimensionality reduction, PCA for further pattern dimensionality reduction, and finally backpropagation neural networks for classification. We make use of seven backpropagation classifiers, one for each of the facial components. Before we are able to start recognizing components, we first need to train the neural networks with the patterns that they need to recognize. In order to do this we first create training images of the component of interest. Our database had only raw frontal images and, since we needed component images for training, we extracted the facial components manually and divided them into a test set and a training set. We used the sliding window technique here and later separated the images required for training. To train the system, we create an empty matrix called the pattern matrix. Now, for every image in the training set (a small sketch of this procedure is given after the PCA steps below):
1) Obtain the second level Haar wavelet decomposition.
2) Arithmetically add the horizontal and vertical coefficients.
3) Form a row pattern by concatenating all the rows.
4) Attach this as a row to the pattern matrix.
Once the pattern matrix is ready, we can apply PCA:
1) Calculate the covariance matrix of the pattern matrix.
2) Calculate the Eigenvalues and the Eigenvectors.
3) Choose the number of Eigenvalues that contribute approximately 95% of the total Eigenvalue sum.
4) Project the original patterns from the pattern matrix into the principal component space using the number of components, from the previous step, that contribute 95%.
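The following NumPy sketch (ours, not the authors' Matlab implementation) outlines steps 1-4 of the pattern-matrix construction above; the labeling of the horizontal and vertical detail blocks follows one common convention and is an assumption of this sketch, and the PCA step is as sketched in Section III.B.

```python
import numpy as np

def haar_2d_level1(img: np.ndarray):
    """Single-level 2-D Haar transform (rows first, then columns) of an image
    with even dimensions. Returns quarter-size approximation, horizontal,
    vertical and diagonal blocks. A second level is obtained by applying the
    same transform again to the approximation block."""
    x = img.astype(float)
    row_t = (x[:, 0::2] + x[:, 1::2]) / 2.0   # trends along each row
    row_f = (x[:, 0::2] - x[:, 1::2]) / 2.0   # fluctuations along each row
    approx = (row_t[0::2, :] + row_t[1::2, :]) / 2.0
    horiz  = (row_t[0::2, :] - row_t[1::2, :]) / 2.0
    vert   = (row_f[0::2, :] + row_f[1::2, :]) / 2.0
    diag   = (row_f[0::2, :] - row_f[1::2, :]) / 2.0
    return approx, horiz, vert, diag

def wavelet_row_pattern(img: np.ndarray) -> np.ndarray:
    """Add the horizontal and vertical detail coefficients and concatenate the
    rows into a single row pattern (steps 2 and 3)."""
    _, horiz, vert, _ = haar_2d_level1(img)
    return (horiz + vert).flatten()

# Step 4: stacking one such row per training image gives the pattern matrix,
# which is then reduced with PCA using the 95% rule.
# pattern_matrix = np.vstack([wavelet_row_pattern(im) for im in training_images])
```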
TABLE I
OUTPUT RESULTS FROM A TYPICAL RECOGNITION PHASE

Class      LE   LEBrow   RE   REBrow   Nose   Lips   Result
Class1     15      0      0     10       0      7      123
Class2      4      0     28      3      26     16      143
Class3      0      0      1      0       4      0       22
Class4      0     22      3      4       0     10       33
Class5     99     91     96     91      84     95      655
Class6     19      9     12     28       0      0       45
Class7      0      2      0     25       0      0       50
Class8      0      0      0     17       3      0       20
Class9      0      6     11     16       0     22       55
Class10    21      0     20     29       4      1      159
For training the neural network, we created a 1-out-of-L coding pattern. The actual task of recognition involves conversion of the raw pattern into a suitable pattern which our system was designed to recognize. As soon as the detection of all the components is complete, the respective image sections are cropped from the original image and sent to the recognition system. In the recognition system the following steps are performed to recognize each component:
1) Apply second level Haar wavelet decomposition to the detected section in the original image.
2) Arithmetically add the horizontal and the vertical coefficients to create one single reduced pattern matrix.
3) Concatenate all the rows of the reduced pattern matrix to form a single reduced wavelet row pattern.
4) Depending on the component, transform this row pattern into principal component space by multiplying it with the chosen principal components that represent 95% of the Eigenvalue energy of that component. The Eigenvectors are saved separately for each component from the training stage, and the number of components chosen is also saved.
5) Transpose this into a column vector.
6) Input this column vector into the respective neural network classifier and obtain the output representing the class.
Kittler [46] analyzed several classifier combination rules and concluded that the sum rule outperforms other combination schemes based on empirical observations. Instead of explicitly setting up combination rules, it is possible to use the results of the individual classifiers by summing their outputs to obtain a final decision. Each classifier output represents how closely the presented input pattern matches the pattern with which the classifier was trained. Once this is done for all the components we can finalize the result of classification. Each classifier produces outputs for all the 10 classes that it was trained for, i.e. the output layer has 10 neurons, each representing one class. When a pattern is presented to a classifier, we have results for 10 classes. If we consider a table where the columns represent the components and the rows represent the classes, we have a total of 7 columns, one for each component plus one for the combined result, and 10 rows, one for each class. When a component, say the left eye, is presented, we get the results in all the 10 rows for that specific component (a small sketch of this voting scheme follows below).
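A small sketch (ours) of the voting-table fusion just described: each per-component classifier contributes a vector of 10 class outputs, the votes are summed per class, and the class with the largest total wins. The dictionary keys are illustrative.

```python
import numpy as np

def fuse_by_sum(component_outputs: dict) -> int:
    """component_outputs maps a component name (e.g. 'left_eye') to the
    10-element output vector of its classifier. Votes are summed per class
    (the sum rule) and the winning class index is returned."""
    table = np.vstack(list(component_outputs.values()))  # rows: components, cols: classes
    per_class_totals = table.sum(axis=0)
    return int(np.argmax(per_class_totals))
```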
Similarly, when all the remaining component patterns are presented, we have all the required results. The combined pattern is presented to all 10 of the combined classifiers, each one of them representing one row. Each of the cells in the table can be considered as a numerical value representing a vote that indicates how close the presented pattern is to the pattern with which the system was trained. The closer the pattern is to the trained pattern, the greater the output value (0 ≤ output ≤ 1). This is shown in Table I. When all the patterns are presented and the results of each of the classifiers tabulated, we use the simple addition rule wherein the columns of this table are added together and the class that receives the maximum number of votes is chosen as the winner.

VI. GRAPHICAL USER INTERFACE

We developed a GUI to provide an intuitive and user friendly interface to the face recognition system. With this, the user of the system is able to visually see the process of a machine based face recognition system. It has five components:
1) File menu
2) Image windows
3) Image controls
4) Face detection and recognition controls
5) Results window
The file menu enables the user to select files that are used as test images for the system; at this moment the system is limited to recognizing pgm files only. The image window has two sections, the original image and the adjusted image. When an image is first selected, it is loaded into the original image window. It might not be ready for recognition yet, hence a second window is provided to see the changes that can be made and visualized as compared to the original image. The original image can be transferred to the adjusted image window by using the transfer image button. The image controls section provides access to image manipulation functions. The Auto adjust button applies histogram equalization to the original image and transfers the new image to the adjusted image section. There are also controls for adjusting the brightness and the contrast settings of the image. There is a slider for the brightness functionality, and gain and cutoff sliders for adjusting the contrast of the image. The face detection and recognition controls enable the user to detect specific components of the image and then perform recognition, or just perform recognition by automatic detection of all the components as a whole. The results of the recognition are also shown in the "Recognition result window". The results window displays the numerical values of the results of performing face recognition on the selected image. This table was discussed previously.

VII. NUMERICAL EXPERIMENTS AND RESULTS

This section discusses the database that was used in this research and also the test and the training sets. We also discuss the feature extraction, reduction and feature selection methods used to create the final pattern. Finally, we discuss the details of the classifiers designed.

A. Face Dataset

The face dataset used in this research is unique in the sense that we used a combination of two different face databases, the BioID face database [9] and the Caltech face database [10]. This was done to provide more varied faces to the system for
Six classes were chosen from the BioID face dataset, including three male and three female candidates. In addition, four male candidates were chosen from the Caltech face database, for a total of 10 classes. Approximately 70% of the images were used in training and the remaining 30% in the test set. The training set had 308 images and the test set had 132 images.
1) Component Dataset: The component detection system was comprised entirely of support vector machines, one for each component of the face including the face itself; therefore we had seven support vector machines. The breakdown of the images into positive and negative sections is shown in the table below.
2) Pattern Formation and Dimensionality Reduction: We used both wavelets and PCA for dimensionality reduction. For face detection we used the second-level Haar wavelet decomposition, and for the other components we used the first-level coefficients. In each case we used the horizontal and vertical coefficients to create the patterns. For face and component detection we make use of support vector machines with a radial basis kernel having a kernel width of 0.1. We arrived at this value by performing many numerical experiments while measuring the classification efficiency.
TABLE II BREAKDOWN OF POSITIVE AND NEGATIVE IMAGES
Component        Positive Images    Negative Images
Face             613                647
Right Eye        512                72927
Right Eyebrow    528                68112
Left Eye         545                72898
Left Eyebrow     718                67922
Lips             661                65339
Nose             1097               65463
Total            4674               413399
TABLE III DIMENSIONALITY REDUCTION OF WAVELET PATTERN TO PCA

Component        Size       Wavelet Pattern    With PCA
Face             160x150    38x41 (1558)       81
Right Eye        50x20      11x26 (286)        38
Right Eyebrow    60x30      16x31 (496)        41
Left Eye         50x20      11x26 (286)        43
Left Eyebrow     60x30      16x31 (496)        43
Lips             70x30      16x36 (576)        49
Nose             60x40      21x31 (651)        59
Since the detection part makes use of only two classes (positive images and negative images), the training is performed sequentially, first with the positive images and then with the negative images. Each image is decomposed using the wavelet technique mentioned above, by adding the horizontal and vertical coefficients at the levels described for each component. The individual rows of the final pattern matrix are then concatenated to form a single pattern. Once detection is done, the next step is to carry out the recognition process. Our strategy for choosing the number of principal components was to retain as many eigenvalues as were needed to reconstruct 95% of the original pattern from the reduced pattern. PCA is the final dimensionality-reduction step before the data is sent for classification by the neural network. The input to the principal-component-based data reduction is the reduced wavelet pattern from the previous step. Since we are interested only in classification, we need to input only the positive patterns to this system.
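A minimal NumPy sketch of this selection rule is shown below: the number of principal components is chosen so that the retained eigenvalues account for 95% of the total, and the corresponding eigenvectors are saved for the recognition stage. It illustrates the strategy described above and is not the authors' code.

```python
import numpy as np

def select_principal_components(patterns, energy=0.95):
    """patterns: one reduced wavelet row pattern per row (positive samples only)."""
    centered = patterns - patterns.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                 # sort descending by eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cumulative, energy)) + 1  # smallest k reaching the 95% target
    return eigvecs[:, :k], k                          # stored per component for recognition
```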
When a new pattern arrives, we already have the eigenvectors and eigenvalues for the component of interest; we simply express the new pattern in terms of the chosen principal components. The amount of data reduction obtained by applying principal component analysis is shown in TABLE III.
3) Applying Backpropagation Neural Networks for Recognition: The backpropagation neural networks have an input layer, a hidden layer, and an output layer. We designed one neural network for each component of the face. Since the combined pattern uses images from other components for its training, we used the least common denominator for the training set; the component with the least number of images containing all the components was the lips, with 253 images. TABLE IV shows the number of images that were used in the test set along with the recognition accuracy; a minimal sketch of such a per-component network appears after the table. As the table shows, the system is capable of recognizing never-before-seen images with an overall accuracy of 91.6%. This can partly be attributed to the individual classifiers having good recognition rates. Since the classifiers are made to focus on individual components rather than one full section such as the face, the system is also able to recognize these components with good accuracy. Holistic approaches, by focusing on the entire face, miss essential characteristics that can be captured by individual classifiers.
TABLE IV BREAKDOWN OF IMAGES IN TEST SET
Class      Test Images    Correct Recognition    Recognition Accuracy (%)
Class1     15             13                     86.6
Class2     13             11                     84.61
Class3     10             9                      90
Class4     10             10                     100
Class5     13             13                     100
Class6     10             10                     100
Class7     16             10                     100
Class8     15             13                     86.6
Class9     13             11                     84.61
Class10    19             16                     84.21
Overall                                          91.6
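To complement the description of the per-component classifiers, here is a compact, self-contained sketch of a one-hidden-layer backpropagation network with 10 sigmoid output neurons (one per class). The hidden-layer size, learning rate, and initialization are illustrative choices, not values reported in the paper; a trained network of this kind is what `net.predict` stood for in the earlier recognition sketch.

```python
import numpy as np

class ComponentNet:
    """One-hidden-layer backpropagation network with 10 outputs (one per class)."""
    def __init__(self, n_inputs, n_hidden=20, n_classes=10, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n_inputs, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_hidden, n_classes))
        self.b2 = np.zeros(n_classes)
        self.lr = lr

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def predict(self, x):
        """Forward pass; returns 10 scores in [0, 1], one per class."""
        self.h = self._sigmoid(x @ self.W1 + self.b1)
        self.o = self._sigmoid(self.h @ self.W2 + self.b2)
        return self.o

    def train_step(self, x, target):
        """One backpropagation update for a single 1-out-of-L coded target."""
        o = self.predict(x)
        delta_o = (o - target) * o * (1 - o)                  # output-layer error term
        delta_h = (delta_o @ self.W2.T) * self.h * (1 - self.h)
        self.W2 -= self.lr * np.outer(self.h, delta_o)
        self.b2 -= self.lr * delta_o
        self.W1 -= self.lr * np.outer(x, delta_h)
        self.b1 -= self.lr * delta_h
```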
Fig. 2. Complete detection of all components.
VIII.
CONCLUSION
This paper presents a hybrid face recognition technique. We have used a unique approach to pattern extraction, formation and reduction. The recognition accuracy is quite high even with the limited number of training images.
IX. FUTURE WORK As a future enhancement to this system, we propose to incorporate the following: 1) Face recognition from video. 2) Color-based recognition. 3) The ability to obtain classifiers from a central database for more accurate recognition in case of misclassification. 4) Using different classification strategies, such as a rough set classifier, for identification.
REFERENCES
[1] John C. Russ, The Image Processing Handbook, CRC Press, 2002.
[2] Krzysztof J. Cios, W. Pedrycz and Roman W. Swiniarski, Data Mining Methods for Knowledge Discovery, Kluwer Academic Publishers, Boston, 1998.
[3] Matlab Wavelet Toolbox.
[4] James S. Walker, A Primer on Wavelets and Their Scientific Applications, pages 95-122, Chapman & Hall/CRC, 1999.
[5] Richard O. Duda, Peter E. Hart, David G. Stork, Pattern Classification, Wiley-Interscience, 2nd edition, October 2000.
[6] Christopher J.C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, 2, 121-167, 1998, Kluwer Academic Publishers, Boston.
[7] Marti A. Hearst, “SVM trends and controversies,” IEEE Intelligent Systems and Their Applications, Volume 13, Issue 4, Jul/Aug 1998, pages 18-28.
[8] Simon Haykin, Neural Networks: A Comprehensive Foundation, pages 178-224, Prentice Hall, 2nd edition, July 1998.
[9] BioID face database, http://www.bioid.com/downloads/facedb/index.php
[10] Caltech face database, http://www.vision.caltech.edu/htmlfiles/archive.html
[11] K. Cios, W. Pedrycz, R. Swiniarski, L. Kurman, Data Mining: A Knowledge Discovery Approach, Springer, New York, 2000.
An MDA-Based Generic Framework to Address Various Aspects of Enterprise Architecture S. Shervin Ostadzadeh
Fereidoon Shams Aliee
S. Arash Ostadzadeh
Computer Engineering Department, Faculty of Engineering, Science & Research Branch of Islamic Azad University, Tehran, Iran
Computer Engineering Department, Faculty of Electrical & Computer Eng., Shahid Beheshti University, Tehran, Iran
Computer Engineering Laboratory, Microelectronics and CE Department, Delft University of Technology, Delft, The Netherlands
Abstract - With a trend toward becoming more and more information based, enterprises constantly attempt to surpass the accomplishments of each other by improving their information activities. Building an Enterprise Architecture (EA) undoubtedly serves as a fundamental concept to accomplish this goal. EA typically encompasses an overview of the entire information system in an enterprise, including the software, hardware, and information architectures. Here, we aim to use Model Driven Architecture (MDA) to cover different aspects of Enterprise Architecture. MDA, the most recent de facto standard for software development, has been selected to address EA across multiple hierarchical levels spanning from business to IT. Despite the fact that MDA was not intended to contribute in this respect, we plan to enhance its initial scope to take advantage of the facilities provided by this innovative architecture. The presented framework helps developers to design and justify completely integrated business and IT systems, which results in an improved project success rate.
Keywords: Enterprise Architecture, Model Driven Architecture, Information Technology, Enterprise Framework.
I. INTRODUCTION
Business and Information Technology (IT) integration is essential for enterprises to achieve their goals. In recent decades, there have been considerable investments in this field. Unfortunately, many IT investments yield disappointing results. According to [1], 85% of IT departments in the U.S. fail to meet their organizations’ strategic business needs. Some projects result in failure, while others go so far over their budgets that management eventually kills them (another form of failure). Some projects that initially seemed successful proved to be unstable or inflexible over time. Enterprise Architecture (EA) addresses this issue. EA integrates business resources, such as people, machines, and facilities, with IT resources, e.g. applications, networks, and clusters, in order to produce efficient business processes. Over time, expectations of the degree of automation that could be achieved by computing continued to increase. It was no longer satisfying to have islands of automation within the enterprise. The various islands had overlapping functionalities that duplicated information and interfered with automation. As a result, it became necessary to integrate the islands across the enterprise. Applications in such an enterprise can’t be designed using traditional analysis and design methods. We need architecture to address this problem. By using
architecture in the IT industry, we have more control over IT projects and can help prevent their failure. OMG is trying to improve software development by introducing MDA. MDA is “an approach to system specification that separates the specification of system functionality from the specification of the implementation of that functionality on a specific technology platform” [2]. Although MDA is limited to software development, it is not yet clear whether or not it can be used to describe the systems found in business and IT. In this paper, we investigate the use of MDA in Enterprise Architecture across multiple hierarchical levels ranging from business to IT. Our goal is to propose a generic framework to address various aspects of EA. Although this seems to be an extension of the initial scope of MDA, it is in accordance with our belief that MDA can play a pivotal role in EA. The rest of this paper is organized as follows. In section 2, we introduce some basic concepts and principles. We describe MDA in section 3. Applying MDA to Enterprise Architecture via a generic framework is discussed in section 4. In section 5, our proposed method is compared to traditional methods and the achievements are highlighted. Finally, some concluding remarks and suggestions for future work are stated.
II. BASIC CONCEPTS
In this section, we briefly introduce some basic concepts and principles. We believe these concepts help readers to clearly understand what we mean by the ideas presented later in this work.
A. Enterprise
An enterprise consists of people, information, and technologies; performs business functions; has a defined organizational structure that is commonly distributed in multiple locations; responds to internal and external events; has a purpose for its activities; and provides specific services and products to its customers [3]. An IT-related enterprise is an enterprise in which IT plays an important role in its activities. In this paper, we refer to an IT-related enterprise simply as an enterprise.
B. Architecture
Architecture has emerged as a crucial part of the design process. Generically, architecture is the description of a set of
components and the relationships between them. In computer science, there are software architectures [4], hardware architectures, network architectures, information architectures, and enterprise architectures. Software architecture describes the layout of the software modules and the connections and relationships among them. Hardware architecture can describe how the hardware components are organized. However, both of these definitions can apply to a single computer, a single information system, or a family of information systems. Thus “architecture” can have a range of meanings, goals, and abstraction levels, depending on the speaker.
C. Enterprise Architecture
Enterprise Architecture (EA) is a comprehensive view of an enterprise. EA shows the primary components of an enterprise and depicts how these components interact with or relate to each other. EA typically encompasses an overview of the entire information system in an enterprise, including the software, hardware, and information architectures. In this sense, EA is a meta-architecture. In summary, EA contains views of an enterprise, including work, function, process, and information; it is at the highest level in the architecture pyramid. For details refer to [5].
III. MODEL DRIVEN ARCHITECTURE
Here, we briefly summarize MDA’s key models and show how they can be utilized. More details can be found in [6-8].
A. Models in the MDA
MDA separates certain key models of a system and brings a consistent structure to these models:
Computation Independent Model (CIM): A computation independent model is a view of a system from the computation independent viewpoint. A CIM does not show details of the structure of systems. A CIM is sometimes called a domain model, and a vocabulary familiar to practitioners of the domain in question is used in its specification.
Platform Independent Model (PIM): A platform independent model is a view of a system from the platform independent viewpoint. A PIM exhibits a specified degree of platform independence so as to be suitable for use with a number of different platforms of a similar type.
Platform Specific Model (PSM): A platform specific model is a view of a system from the platform specific viewpoint. A PSM combines the specifications in the PIM with the details that specify how that system uses a particular type of platform.
Platform Specific Implementation (PSI): A PSI is an implementation of a PSM. It is the execution code that is run on a machine.
B. MDA Development Process
This section describes how the MDA models relate to each other and how they are used by a development group. A short
description of the MDA development process follows. Initially, the requirements for the system are modeled with the computation independent model (CIM). The CIM describes the situation in which the system will be used. It may hide much or all of the information about the employment of automated data processing systems. Typically, such a model is independent of system implementation. The model is created by business analysts. Architects and designers subsequently create Platform Independent Models (PIMs) to illustrate the enterprise’s architecture, with no reference to any specific implementation or platform. A PIM is suitable for a particular architectural style, or even several styles. The architect will then choose a platform (or several platforms) that enables the system implementation with the desired architectural qualities. An MDA mapping provides specifications for the transformation of a PIM into a PSM for a particular platform. The platform model determines the nature of the mapping. In model instance mapping, the architect marks elements of the PIM in order to indicate the mappings used to transform that PIM into a PSM. The next step is taking the marked PIM and transforming it into a PSM. This can be done manually, with computer assistance, or automatically. A tool might transform a PIM directly into deployable code, without producing a PSM. Such a tool might also produce a PSM, for understanding and/or debugging the code. The outputs of transforming a PIM using a particular technique are a PSM and a transformation record. The transformation record includes a mapping of the PIM elements to the corresponding elements of the PSM, and shows which parts of the mapping were used for each section of the transformation. The platform specific model produced by the transformation is a model of the same system specified by the PIM; it also specifies how that system uses the chosen platform. Finally, developers or testers will use tools to generate code from the PSM. Figure 1 (the chain CIM, PIM, PSM, PSI/code, connected by the CIM-to-PIM, PIM-to-PSM, and PSM-to-code mappings) demonstrates this process.
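To make the PIM-to-PSM mapping and the transformation record more concrete, the toy Python sketch below maps class-style PIM elements onto relational-style PSM elements and records which PIM element produced which PSM element. The model representation, the mapping rule, and all names are hypothetical illustrations of the idea, not part of any OMG standard or tool API.

```python
from dataclasses import dataclass, field

@dataclass
class PimClass:                       # a platform-independent model element
    name: str
    attributes: list

@dataclass
class PsmTable:                       # a platform-specific (relational) model element
    name: str
    columns: list

@dataclass
class TransformationRecord:           # which PIM element became which PSM element
    mappings: list = field(default_factory=list)

def pim_to_psm(pim_classes, record):
    """One illustrative mapping rule: each PIM class becomes a table with an id column."""
    psm = []
    for cls in pim_classes:
        table = PsmTable(name=cls.name.lower() + "_table",
                         columns=["id"] + list(cls.attributes))
        psm.append(table)
        record.mappings.append((cls.name, table.name))  # transformation record entry
    return psm

record = TransformationRecord()
tables = pim_to_psm([PimClass("Customer", ["name", "address"])], record)
print(tables[0].columns)   # ['id', 'name', 'address']
print(record.mappings)     # [('Customer', 'customer_table')]
```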
IV. APPLYING MDA TO EA
As we mentioned earlier, the integration of business and IT forces project teams to analyze and design a hierarchy of systems. This hierarchy can contain the following levels [9]:
• Groups of companies collaborating in business systems
• People and IT systems collaborating in business processes
• Software components collaborating in IT systems
• Programming language objects collaborating in software components.
Suitable approaches for EA should have the ability to analyze and design a system in hierarchical levels. Does MDA have such ability? Can it be used in enterprise applications? Some researchers [9] argue that this is not the case, as MDA is based on UML and UML does not have enough hierarchical levels. However, it should be noted that OMG has adopted a number of technologies which together enable model driven architecture. These technologies include UML, MOF, CWM, XMI, and profiles (such as the profiles for EDOC, EJB, etc.). They are used to describe PIMs. A PIM can be refined as many times as needed, until the desired system description level is obtained. Afterwards, the infrastructure is taken into account and the PIM is transformed into a PSM. Subsequently, PSMs are also refined as many times as needed. This makes MDA more descriptive than UML alone. MDA can be used in the analysis and design of hierarchical levels, and so can be used to describe systems found in business and IT. We cite some research for clarification. Grady Booch [10] states that “Model Driven Architecture is a style of enterprise application development and integration, based on using automated tools to build system independent models and transform them into efficient implementations”. Another quote that refers to the utilization of MDA in the enterprise appears in D’Souza’s work [11], indicating that “MDA is an approach to the full lifecycle integration of enterprise systems comprised of software, hardware, humans and business practices. It provides a systematic framework to understand, design, operate, and evolve all aspects of such enterprise systems”. Moreover, our previous research [12, 13] also indicates that MDA can be used in EA modeling and in describing its various aspects. In this section we aim to show how MDA can be applied to the hierarchical levels of analysis and design needed for EA.
A. A Basic Taxonomy
As stated earlier, the common way to comprehend procedures in an enterprise is to provide views of components within that enterprise, which is called architecture. An architecture such as a Data Architecture represents only a single view of an enterprise, whereas Enterprise Architecture refers to a collection of architectures assembled to form a comprehensive view of an enterprise. Organizing such great amounts of information requires a framework. Various enterprise architecture frameworks have already been proposed; among them are the Zachman Framework, FEAF, TEAF, and C4ISR. Each framework classifies enterprise architecture aspects based on its own viewpoints. For example, the Zachman Framework refers to Scope, Business, System, Technology, Detailed Representations, and Functioning Enterprise, while FEAF addresses Technology, Applications, Data, and Business. C4ISR views enterprise architecture aspects as Operational, Systems, and Technical. This is not the case with TEAF, which adopts an Infrastructure, Organizational, Information, and Functional classification. Apparently, there is no agreement in this respect. Figure 2 depicts a basic taxonomy of the different kinds of aspects that are commonly used in enterprise architecture. This taxonomy is not deterministic, and some enterprise architecture frameworks use somewhat different classifications and/or terms. We present this taxonomy to refer to the various kinds of perspectives and their requirements. Here are the basic distinctions in the aspects.
Fig. 2. A basic taxonomy of enterprise architecture aspects (EA is divided into Business aspects, comprising Static {Strategic, Organization Chart} and Dynamic {Process, Service}, and Technology aspects, comprising Logical {Data, Application} and Physical {Infrastructure, Implementation})
• Business vs. Technical: A business aspect describes aspects of the business, irrespective of whether those aspects are to be automated or not. A technical aspect describes aspects of an IT system that automate elements of the business. Since a technical aspect only models the automated parts of a business, the scope of a technical aspect is smaller than the scope of the corresponding business aspect.
• Static vs. Dynamic: A static aspect describes the fixed part of the business. It includes strategic models and the organization chart. A dynamic aspect describes the behavioral part of the business. It contains the processes and services that are provided by the business.
• Logical vs. Physical: A logical aspect describes the logic of a system. It can include data and applications. A physical aspect describes physical artifacts and resources used during IT systems development and runtime. It contains infrastructure and implementation. Infrastructure describes the underlying services that are used for computing. Implementation describes the physical models used to execute the systems. Implementation models may include model files, source code files, executable files, archive files, and other implementation files.
B. Modeling Approach
A business model describes the business domain aspects of EA. This model can be drawn by business analysts and domain experts. The business analyst perceives the enterprise’s business processes and the information that the processes use. Generally, using UML business use case and activity diagrams is recommended for describing the business aspects [14, 15]. We can also use CRC cards, which are not UML-based, to model a business. Since MDA is based on UML, however, the CRC cards approach is not suitable. Using UML primitives is easy and can be applied by the business analysts and domain experts. However, we prefer to use MDA metamodels to describe the various aspects of the business domain. Note that business aspects can be part of the CIM models in MDA. Strategic models can be described with BMM and SBVR. The organization chart can be modeled with OSM. We can also employ BPDM to describe business processes and services. Refer to [16] for more information about how these metamodels can be applied in practice. Data models provide the data view of IT systems. They are part of the logical aspect. We divide data aspects into Knowledge models, Information models, and Raw Data models. Knowledge models describe the data aspects from the most abstract viewpoint. Knowledge models are CIMs, since they describe the data aspects of IT systems in a computation independent way. These models should be refined into a computational model in order to generate PIMs and PSMs. Information models are PIMs. They describe EA knowledge that should be automated, independently of any specific platform. Raw Data models describe data aspects based on a specific platform. Data aspects of EA can be modeled with CWM. For more information see [16, 17]. Applications models describe the applications view of the logical aspect. They show the software applications of the IT systems in an enterprise. Applications aspects can be divided into information systems and systems technologies. Information systems models describe the applications from the most abstract viewpoint. They are CIMs. Systems technologies
provide a less abstract viewpoint of the applications compared to information systems models, because they consider the technical environment. MDA is an approach that separates the specification of system functionality from the specification of the implementation of that functionality on a specific technology platform. Therefore, according to the MDA perspective, system technologies models can be divided into two levels: Platform Independent Technologies (PITs) and Platform Specific Technologies (PSTs). As already explained, the distinction between a PIT and a PST depends on the specification of a reference set of platform technologies. PITs are independent of a specific platform, while PSTs are specific to a platform. What distinguishes MDA from other approaches is the description of the applications models. We have already stressed that information systems models describe the applications from the most abstract viewpoint. PITs provide a less abstract view because they factor in some considerations of the technical environment. PITs are refinements of information systems models. PSTs provide an even less abstract viewpoint of the applications, because they consider the platform-specific environment. PSTs are refinements of PITs that specify particular software formatting technologies, programming languages, distributed component middleware, or messaging middleware. These refinements are based on the platform model. To describe applications models, we can use general-purpose UML primitives, such as class, interaction, collaboration, and use case modeling. We can also utilize WSM and PSM-specific profiles (CORBA, EJB, and .Net) for the PSTs. The physical aspect doesn’t deal with the logical aspect; therefore, it provides a very different view of the systems. A physical aspect describes physical artifacts and resources used during deployment and runtime. From an MDA perspective, physical models can drive automated deployment tools [7]. Using UML and/or other MOF-compliant languages to describe deployment makes it possible to incorporate deployment automation into the MDA effort to automate more of the deployment process. MOF is a very abstract metamodel; every model in MDA is defined in terms of its constructs. A UML deployment diagram can describe physical models. MDA tools can generate deployment diagrams from a PIM. A generator that reads a PIM and implements a mapping to a PSM could generate a skeletal deployment model, since only certain outlines of the deployment requirements are known. Figure 3 illustrates the mapping of MDA models to the basic taxonomy of enterprise architecture aspects.
C. Synchronizing Models
There are three basic engineering approaches for synchronizing models at various abstraction levels [7]:
• “Forward Engineering Only” approach: This approach permits no changes to be made in low level models. Changes can only flow from top level models down to the low level models. Since synchronization ripples in one direction only, it is not possible for a developer to make changes to low level models (here, PSMs and code).
• “Partial Round-Trip Engineering” approach: This approach allows developers to enhance generated low level models. However, enhancement must be additive. Anything generated from the higher level model cannot be overwritten or deleted. Furthermore, the developer can’t add anything that affects the higher level of abstraction. In this approach, synchronization also ripples in one direction only; however, some local changes are allowed in low level models.
• “Full Round-Trip Engineering” approach: In this approach, it is permissible to define something at the lower level of abstraction that is reflected back to the upper level. In fact, the developers can add, edit, or delete something in low level models that affects high level models. With this approach, synchronization can ripple both up and down the abstraction levels.
From a software development process perspective (such as MDA), the Full Round-Trip Engineering is the best approach,
since the developers have considerably more freedom to alter the generated artifacts. However, if we want to employ MDA in EA, there are some notes that should be considered. A crucial contribution of EA in an enterprise is the ability to enforce architecture. The Forward Engineering Only approach has the greatest capacity to enforce the architectural styles that an architecture dictates. The more freedom engineers have, the more difficult it is to ensure architectural enforcement. Round-trip engineering, whether partial or full, makes it more difficult to enforce architectural styles. Although these approaches have more capacity for engineering, they are not suitable for EA.
V. PROPOSED METHOD VS. TRADITIONAL METHODS
Let us now take a closer look at the advantages achieved by MDA based EA compared to traditional non-MDA approaches.
Fig. 3. Mapping MDA models to the basic taxonomy of enterprise architecture aspects (Knowledge and Information Systems models, together with the business aspects, correspond to the CIM level; Information models and PIT-level System Technologies correspond to the PIM level; Raw Data models and PSTs correspond to the PSM level; implementation maps to the PSI)
A. Productivity
MDA can improve productivity in two ways. First, PIM developers have less work to do, since platform-specific details need not be designed and considered; these details are already addressed in the transformation definition at the PSM and code level. Note that there is much less code to be written, because a large amount of the code is generated from the PIM. The second improvement comes from the fact that in MDA the developers can shift focus from code to the CIM and PIM, thus paying more attention to solving the business problem at hand. This results in a system that fits much better with the needs of the end users.
B. Portability
Within MDA, portability is achieved by focusing on the development of CIMs and PIMs, which are platform independent. A PIM can be automatically transformed into multiple PSMs for different platforms. Everything that is specified at the CIM or PIM level is therefore completely portable.
C. Interoperability
In MDA, multiple PSMs generated from one PIM may have relationships, which are called bridges. When PSMs are targeted at different platforms, they cannot directly talk with each other. In this case, we need to transform elements from one PSM into elements used in another PSM. This is what interoperability is all about. MDA addresses this problem by generating not only the PSMs, but the necessary bridges between them as well. If we transform one PIM into two PSMs targeted at two platforms, all the information we need to bridge the gap between these two PSMs is available. For a given element in the PIM, we know which elements correspond to it in the PSMs, so we can find how elements in the first PSM relate to elements in the second one. In this case, we have all the information we need to generate a bridge between the two PSMs.
D. Documenting
In MDA, we need to document the high level models of abstraction. The CIM and PIM models fulfill the function of the high level documentation that is needed for any software system. Such high level documentation is not new in this approach, since traditional methods also produce it. The major difference is that the CIM and PIM are not abandoned after creation. In fact, these models are part of our software. When we want to make changes to the system, the changes are made in the CIM and PIM models and are then reflected in the lower models.
VI. CONCLUDING REMARKS
Business and IT integration is a critical challenge faced by the IT industry. A key to overcoming this problem is applying Enterprise Architecture. We have investigated the employment of MDA in EA. MDA seems to be a good solution for supporting the modeling of hierarchical systems. We discussed the various models at different abstraction levels that are needed in an EA. We showed how MDA can support each abstract model. We have suggested that the Forward Engineering Only approach is the best solution for synchronizing models at various abstraction levels. Finally, we have highlighted the improvements achieved by using MDA in Enterprise Architecture. In future work, it would be worthwhile to investigate how the information in the framework can be modeled by MDA standards. This could serve as a practical guide for an architect who is willing to use MDA in EA, since it would specify which model(s) should be used for specific architectural information.
REFERENCES
[1] T. Hoffman, “Study: 85% of IT Departments Fail to Meet Biz Needs”, Computer World, October 11, 1999. [2] J. Miller and J. Mukerji, “Model Driven Architecture (MDA)”, Object Management Group, 2001. [3] M.A. Rood, “Enterprise Architecture: Definition, Content, and Utility”, IEEE Trans., 1994. [4] L. Bass, P. Clements, and R. Kazman, Software Architecture in Practice, 2nd Edition, Addison-Wesley, 2004. [5] C.M. Pereira, and P. Sousa, “Enterprise Architectures: Business and IT Alignment”, Proceedings of the 2005 ACM symposium on Applied computing, Santa fe, New Mexico, 2005. [6] MDA Guide, OMG document, available at http://www.omg.org.mda/, 2003. [7] D.S. Frankel, Model Driven Architecture: Applying MDA to Enterprise Computing, OMG Press, John Wiley & Sons, 2003. [8] S.J. Mellor, K. Scott, A. Uhl, and D. Weise, MDA Distilled: Principles of Model-Driven Architecture, Addison Wesley, 2004. [9] A. Wegmann, and O. Preiss, “MDA in Enterprise Architecture? The Living System Theory to the Rescue…”, Proceedings of the 7th IEEE international Enterprise Distributed Object Computing Conference (EDOC’03), 2003. [10] G. Booch, B. Brown, S. Iyengar, J. Rumbaugh, B. Selic, “An MDA Manifesto”, MDA Journal, May 2004. [11] D. D’Souza, “Model-Driven Architecture and Integration”, Kinetium, March 2001. [12] S.S. Ostadzadeh, F. Shams, and S.A. Ostadzadeh, “A Method for Consistent Modeling of Zachman Framework,” Advances and Innovations in Systems, Computing Sciences and Software Engineering, Springer, pp. 375-380, August 2007. (ISBN 9781-4020-6263-6) [13] S.S. Ostadzadeh, F. Shams, and S.A. Ostadzadeh, “Employing MDA in Enterprise Architecture,” Proceedings of the 12th International CSI Computer Conference (CSICC’ 2007), pp. 1646-1653, February 2007. [14] A. Fatholahi, An Investigation into Applying UML to Zachman Framework, MSc thesis, Shahid Beheshti University, Tehran, 2004. [15] C. Larman, Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and the Unified Process, Prentice Hall, 2002. [16] S.S. Ostadzadeh, An MDA-Based Unified Modeling Approach for Zachman Framework Cells, MSc thesis, Science & Research branch of Islamic Azad University, Tehran, 2006. [17] J. Poole, D. Chang, D. Tolbert, D. Mellor, Common Warehouse Metamodel: An Introduction to the Standard for Data Warehouse Integration, OMG Press, John Wiley & Sons, 2001.
Simulating VHDL in PSpice Software Saeid Moslehpour#1, Chandrasekhar Puliroju#2, Christopher L Spivey#3 Department of Electrical and Computer Engineering, University of Hartford, West Hartford, CT, USA
Abstract— This paper incorporates the use of PSpice to simplify massive complex circuits. It involves the simulation of a VHDL-based processor design in PSpice software. After reading through the properties presented here, the reader should see how easily a VHDL-coded design can be converted into a PSpice module. The approach can also be treated as an assessment tool for students, one that not only helps them interact with the circuitry in VHDL but also gets them much more involved with the practical aspects of the PSpice software.
I. INTRODUCTION
The present project serves as an application of the knowledge gained from past studies of the PSpice program. The study will show how PSpice can be used to simplify massive complex circuits. In other words, the study will involve the breakdown of a complex circuit designed by a graduate student at the University of Hartford. The circuit is a VHDL synthesis model of an 8-bit processor. The purpose of the project is to explore the designed processor model piece by piece, examine and understand the input/output pins, and show how the VHDL synthesis code can be converted to a simplified PSpice model. The project will also serve as a collection of various research materials about the pieces of the circuit. Background information is given so that the reader has a basic understanding of the parts, to support the further exploration in the project methods. This report is useful for students who would like to know more about the PSpice program and how it is useful for many educational purposes.
II. SIGNIFICANCE OF THE PROJECT
The study of VHDL and PSpice is important because technology students are using computer software to design and analyze circuits. A better understanding of computers and computer languages can be gained when exploring these programs. It is also important to show that a VHDL model can be represented in a working PSpice schematic. Each part of a processor, whether it be an arithmetic logic unit or a simple program counter, has a specific set of input and output logic that can be emulated in PSpice. If the logic is already known to the user, then the user does not have to build the part using individual logic gates. Instead, a circuit designer can create a PSpice part that gives the desired logical outputs. Most importantly, this study can help bridge a gap between those who simulate with VHDL and those who use PSpice.
III. CONVERTING VHDL TO PSPICE
This section covers how to successfully convert a VHDL synthesis model to a working PSpice part. First, the user has to extract the program to C:\Program Files. This step is important because the program will not work unless it is extracted to the specified directory. It will create a directory in C:\Program Files called VHDL2PSpice. The contents of the folder show all the necessary files. The folders created are Capture Symbols, DelphiExamples, Library, and Source. These folders are necessary for the program to save the appropriate Capture model symbols and library files. Now, the user should open the Lattice Synplify program. Once in the program, click FILE -> NEW and start a new project. The window shown should pop up on the screen. Click Project File and give the project a name. Next, a project interface screen will show up. Right click on the project and click Add Source File.
Fig. 1 New Synplify Profile [8]
The option should be in the pop-up menu that appears when clicking on the project. Once the VHDL file is added, the user should now be able to view the file in the source window by clicking on the source folder and then the file.
Fig. 4 Synplify procedure [8]
Fig. 2 VHDL code [8]
A window with line numbers shown above should be shown on the screen. However before synthesis, certain conditions need to be set up by the user in order for the synthesis to properly work. Right click on the icon in the project window that appears as a ‘blank screen’. It should be the last item in the project window. Click on Implementation options. A window should appear with several tabs and options.
Fig. 3 Synplify settings [8]
Make sure the Technology option is set to LATTICE MACH, and the part option is set to MACH111. Click the box in the Device Mapping Options window that says “Disable I/O insertion.” NEXT, go to the implementation tab and click the Write Mapped VHDL Net list check box. After this is done, the program is ready to synthesize. Click the RUN button back on the project menu. On the right window, several files will be generated.
We are interested in the gate-level VHDL file. This is the file with the extension VHM in the right-hand list. Double click on the VHM file to view its contents. Once there, the file should be saved. Click FILE -> SAVE AS and save the file in C:\Program Files\VHDL2PSpice\Source. Once the file is saved, the user can exit the Synplify program, as it is no longer needed for the conversion. Now, in the C:\Program Files\VHDL2PSpice directory, run the Vhdl2PSpice.exe utility. A window will pop up on screen with a button labeled “Convert Vhdl 2 PSpice.” Enter the name of the VHDL file that was saved into the source directory. There is no need to type the extension, just the name of the file. After OK is clicked, the user should enter the name of the top level entity (in this example the device is called COUNTER). Click OK again and almost immediately the conversion will take place. The user should see a success window appear. The program has now created output files that can be used in PSpice. Go to the LIBRARY folder in VHDL2PSpice to view the converted files. The file the user should look for will have the same name, but with the word “copy” at the start of the filename. The extension of the file is .LIB. There is also a file called LIBPOINTER.LIB, which is a library file that tells PSpice where to look for the created parts. Next the user should open the PSpice Model Editor program and open the library file created by the conversion. It should still be found in the VHDL2PSpice library directory. Click on the part name to see the structural representation of the VHDL code that was converted.
IV. APPLYING THE CONVERSION STEPS TO THE PROCESSOR MODEL The next objective will be to successfully synthesize and convert the code of each VHDL model to a PSpice library and object file. The following files will be converted in this project:
Fig. 5 Converted code [8]
The next part is very important. Click FILE and click Export to Capture Part Library. The library file is already created, however the program will not know how to display the part unless it is properly exported.
Fig. 8 Files used for conversion [9]
These files are named pul (the first three letters of the creator's name) followed by a word that represents the part, which makes them easier to follow.
Fig. 6 Exporting the part to library [8]
Export it to the VHDL2PSpice library called “Capture Symbols.” It is alright to copy the files here because the Libpointer.lib file will tell PSpice where these files are saved. When the Capture program is opened, the user should be able to place the symbol onto the schematic. The part should work with PSpice simulation as well. The Fig 8 shows the VHDL device successfully working in PSpice.
The first file we will synthesize will be the ALU file. The file is called pulalu.vhd. This file can be saved anywhere on the computer since the VHDL2PSpice program will not need this. The file that the VHDL2PSpice will use will be the synthesized version.
Fig. 9 Readying files in Synplify [9]
Fig. 7 Part shown in schematic page
By following these steps the first file is synthesized. As can be seen on the right window, the file PULALU.VHM has been created. This is the file that will be saved to C:\program files\vhdl2PSpice. Now the synthesis program is finished and
it is time to run the VHDL2PSpice program to create a library file. This window shows that we are ready to enter our filename. After entering the name of the part, a message box will appear and tell the user that the conversion was successful. The entity name must match the entity title located in the VHDL file; it cannot be a made-up label. It should be remembered that the library file generated will be named “copy” plus the filename, with a .lib extension. In this case, the file in the LIBRARY folder is named “copypulalu.lib”. Now, the file is ready to be opened with the PSpice Model Editor. This program allows the library file to be exported to the Capture library. Once in the model editor, the copypulalu.lib file should contain one part named “alu”. When exporting to the Capture Part Library, the .olb file should be exported to the Capture Symbols folder that is part of the VHDL2PSpice folder. When placing a part, the library will be located in the same Capture Symbols directory. This is the .olb file and not the .lib file.
A. CONVERTED ALU
Fig. 10 Converted ALU [9]: PSpice schematic symbol U1 with pins ACLK, MDAT7-MDAT0, OPCD2-OPCD0, ACOUT7-ACOUT0, ALOUT7-ALOUT0, and ZR.
This is the converted ALU. The device, just like the VHDL model, has all of the input/output pins. The only difference is the orientation of the 8-bit inputs/outputs. This circuit can be simplified further by placing it into a hierarchical block and sending a bus line that covers all 8 bits. This device has an ACLK pin, which means that the opcode is read and the ALU operates at the RISING edge of the clock.

PINS           WHEN ASSERTED                                                      WHEN DEASSERTED
ACLK           Clock trigger                                                      Clock trigger
MDAT 7-0       8-bit data code (1 or 0 depending on the data word)                8-bit data code (1 or 0 depending on the data word)
OPCD 2-0       3-bit opcode that determines the operation the ALU must perform    (same)
ACOUT 7-0      8-bit data coming from the accumulator to serve as an operand
ALOUT          The 8-bit result output from the ALU
ZR (zero bit)  ALU result is 00000000                                             ALU result is anything other than ZERO
Fig. 11 ALU assertion chart [9]

The remaining parts of the processor will be implemented using the same steps as the ALU, since it is known that the program works successfully. These parts and their descriptions are shown as follows:
B. CONVERTED CLOCK GENERATOR:
Fig. 12 Clock Generator (CLKGEN, U2) [9]: schematic symbol with pins RSTREQ, CLK, CLK1, CLK2, FCH, and RST.
RSTREQ: (input) Reset pin. Clears the clock on a high signal; inactive on a low signal.
CLK, CLK1, CLK2: Clock generators necessary for processing cycles.
RST: (output) Sends a high signal for other components to be reset.
FCH: (output) Used as an interface with the program counter.
C. CONVERTED PROGRAM COUNTER:
Fig. 13 Program Counter (PRGCNT, U3) [9]: schematic symbol with pins PCLK, RST, LDPC, ADIR4-ADIR0, and ADPC4-ADPC0.
PCLK: Clock signal is sent to this input pin to allow the PC to count upward.
RST: This line resets the program counter with a high signal.
LDPC: When set to a high signal, the program counter sets the PC with the value at ADPC 4:0.
ADIR 4:0: The value of the PC that is passed to the instruction register.
ADPC 4:0: The value that is returned to the PC.
D. CONVERTED INSTRUCTION REGISTER
Fig. 14 Instruction Register (INSREG, U4) [9]: schematic symbol with pins CLK, RST, LDIR, MDAT7-MDAT0, ADIR4-ADIR0, and OPCD2-OPCD0.
CLK: The clock pin is necessary because the device works with a rising clock edge.
RST: On a rising clock edge, the instruction register is reset.
LDIR: Load Instruction Register. A high signal allows the value from the PC to be input into ADIR 4:0.
OPCD: This is the three-bit operation code used for the operations of the ALU.
E. CONVERTED ACCUMULATOR
Fig. 15 Accumulator (U5) [9]: schematic symbol with pins CLK, RST, LDAC, ALOUT7-ALOUT0, and ACOUT7-ACOUT0.
CLK: The clock pin runs the accumulator, since in theory the accumulator is made up of latches.
RST: Resets the accumulator to an ‘empty state’.
LDAC: When this pin is asserted, the accumulator is loaded with the value that flows through ALOUT (input pins).
ALOUT: This 8-bit set of pins is the data passed from the ALU.
ACOUT: This 8-bit set of output pins sends a value to the ALU for operation.
F. CONVERTED DECODER
Fig. 16 Instruction Decoder (DECODER, U6) [9]: schematic symbol with pins LDIR, LDAC, MRD, MWR, LDPC, PCLK, ACLK, CLK1, CLK2, FCH, RST, and OPCD2-OPCD0.
The decoder contains pins that enable the respective control lines.
G. CONVERTED IO BUFFER
Fig. 17 Input/Output Buffer (IOBUFFER, U7) [9]: schematic symbol with pins MRD, FCH, CLK2, ALOUT7-ALOUT0, and MDAT7-MDAT0.
MRD: Memory Read; allows the data held in the buffer to be passed out of MDAT 7:0.
FCH: Allows the device to be written to with the 8-bit data word of ALOUT 7:0.
CLK2: Clock signal that runs the part (rising clock edge).
H. CONVERTED MULTIPLEXER
Fig. 18 Multiplexer (MUX, U8) [9]: schematic symbol with pins ADPC4-ADPC0, ADIR4-ADIR0, FCH, and ADMEM4-ADMEM0.
ADPC 4:0: PC address to be sent to memory and the instruction register.
I. CONVERTED MEMORY
Fig. 19 Memory (U9) [9]: schematic symbol with pins MRD, MWR, EWR, RST, MAD4-MAD0, EAD4-EAD0, EDAT7-EDAT0, and MDAT7-MDAT0.
V. CONCLUSION
The first problem to note was that, halfway through the components, the synthesis program gave “warnings” yet still synthesized the files. These warnings only occurred with the IO BUFFER, MULTIPLEXER, and MEMORY; the program stated that there were illegal statements/declarations. However, the parts of the project that were successfully synthesized proved to work with the Capture program just fine. Another problem lies in the program’s disregard of the synthesis errors. When a part is created from such a synthesis file, running the part will crash the software. This error was not fixed, as it was not a PSpice software error or an error with the VHDL2PSpice program; the problem existed in the VHDL errors. It is not known whether the errors are small or large, or whether or not the parts are complete. However, the project goal has still been met: VHDL parts can successfully be converted to PSpice and can be displayed and simulated in a schematic. Given fully functional code, a complete processor would have been assembled and simulated. It is possible to connect all of the modules just as in the VHDL top module; however, without working code there is no reason to attempt to simulate the modules. Also, the RISC design provided contained no set of pre-programmed code or language. The model seemed to be only an empty shell of what a RISC processor looks like from an outside perspective. The project can be deemed a failure in terms of building a fully functional processor, but the bridge between PSpice and VHDL has been built. The knowledge has been gained on how to convert VHDL modules to PSpice; however, the VHDL programmer must be sure that the code is fully functional. There is no software known as of yet that will convert PSpice to VHDL.
REFERENCES
[1] D. K. Every. (1999) What is RISC? Design Matters. [Online] Available: http://www.mackido.com/Hardware/WhatIsRISC.html
[2] (1999) RISC Architecture. [Online] Available: http://www.geocities.com/SiliconValley/Chip/5014/arch.html
[3] A. Allison. (2001) Brief History of RISC. [Online] Available: http://www.aallison.com/history.htm
[4] D. Carey. (2006) VHDL and Verilog. [Online] Available: http://course.wilkes.edu/Engineer1/
[5] J. O. Hamblen. (1997) A VHDL Synthesis Model of MIPS Processor for Use in Computer Architecture Laboratories. [Online] Available: http://www.ewh.ieee.org/soc/es/Nov1997/01/INDEX.HTM
[6] (2003) RCORE54 Processor. [Online] Available: http://www.ht-lab.com/freecores/risc/risc.html
[7] Cadence SPB 15.7 Release PSpice (2006) [Reference manual provided with software]
[8] S. Rajagopalan (2006) Mixed Level and Mixed Signal Simulation using PSpice A/D and VHDL. [Online] Available: http://www.cdnusers.org/Portals/0/cdnlive/na2006/PNP/PNP_413/413_paper.pdf
[9] C. Spivey, “Creating PSpice Parts with VHDL Models,” unpublished.
VLSI Implementation of Discrete Wavelet Transform using Systolic Array Architecture S. Sankar Sumanth and K.A. Narayanan Kutty Abstract- In this paper, we introduce an architectural design for efficient hardware acceleration of the Discrete Wavelet Transform. The unit designed can be used to accelerate multimedia applications such as JPEG2000. The design is based on a systolic architecture for the Discrete Wavelet Transform, which is a fast implementation of the transform. The design utilizes various techniques such as pipelining, data reusability, parallel execution, and specific features of the Xilinx (Core Generator) Spartan II to accelerate the transform. For performance analysis, simulators (ModelSim 5.8c, MATLAB 7.1) were used along with Xilinx ISE. The MATLAB simulator was used to view the images obtained at intermediate stages and check the performance at each stage. Index Terms—Discrete Wavelet Transform (DWT), Systolic Architecture (SA), FIR filter, Baugh-Wooley multiplier (BWM).
I.
INTRODUCTION
The Wavelet Transform has been successfully applied in different fields, ranging from pure mathematics to applied science. Numerous studies carried out on the Wavelet Transform have proven its advantages in image processing and data compression and have made it a basic encoding technique in recent data compression standards. A pure software implementation of the DWT, however, appears to be a performance bottleneck in real-time systems. Therefore, hardware acceleration of the DWT has become a topic of recent research. The goal of this research is to investigate the possibility of hardware acceleration of the Discrete Wavelet Transform for image compression applications and to provide a software implementation. In architectures with direct-form FIR filters, the intermediate results are stored and routed. Since only half of the outputs are needed, each of the architectures computes the higher octaves in the intermediate time periods. H. T. Kung [7] suggested that SAs, which permit multiple computations for each memory access, can speed up the execution of compute-bound problems without increasing I/O requirements. Vishwanath et al. [3] suggested RAM-based VLSI architectures with systolic routing networks. Parhi [2] suggested a VLSI architecture for implementation of the DWT using lifetime analysis and a forward-backward register allocation method, which needs less hardware than the architectures of [3, 4]. However, the architecture is not regular or scalable, and its hardware complexity increases exponentially as the number of resolution levels increases. Grzeszczak et al. [4] proposed an efficient SA for VLSI implementation of the 1-D DWT, which computes both high-pass (HP) and low-pass (LP) frequency
coefficients in the same clock cycle. However, the hardware complexity and computational complexity of the architecture of [10] are significantly greater, and the architecture requires more control circuitry. In this paper, an SA for VLSI implementation of the DWT is proposed which uses a systolic BWM. The systolic BWM is 22.5% and 74.5% faster than a conventional parallel carry save array multiplier and a parallel carry ripple array multiplier, respectively. A novel 2D-DWT architecture is proposed for one level of decomposition which explores the memory requirement. This paper mostly deals with software implementation of the DWT using simulators such as MATLAB, Xilinx, and ModelSim.
II. DISCRETE WAVELET TRANSFORM (DWT)
The DWT may be calculated recursively as a series of convolutions and decimations. The DWT represents an arbitrary square integrable function as a superposition of a family of basis functions called ‘wavelets’. The ‘mother wavelet’ corresponding to a family is translated and dilated to generate the family of wavelet basis functions. The inner products between the input image coefficients and wavelet coefficients give the DWT coefficients. The basis functions are translated and dilated versions of each other. DWT coefficients of any stage can be computed from DWT coefficients of the previous stage using a simple algorithm known as Mallat’s tree algorithm or the pyramid algorithm [9]. The decomposition algorithm is expressed by the equations.
D_L(n, j) = Σ_m D_L(m, j−1) h(m − 2n)    (1a)
D_H(n, j) = Σ_m D_L(m, j−1) g(m − 2n)    (1b)
where D_L(n, j) is the nth scaling coefficient at the jth stage, D_H(n, j) is the nth wavelet coefficient at the jth stage, and h(n), g(n) are the dilation coefficients corresponding to the scaling and wavelet functions, respectively. The coefficients act as signal filters; hence the resolution obtained depends on the size of the filter selected. We apply a 6-tap FIR filter to the input image data. These 6-tap filter coefficients are the Daubechies 6-tap scaling coefficients; these floating-point values are converted into 8-bit binary, since floating-point representation is not possible in the simulators. The wavelet coefficients can be obtained by reversing the order of the scaling coefficients and alternating their signs.
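As an illustration of this decomposition, the NumPy sketch below computes three octaves of the 1-D pyramid algorithm by convolution and decimation by two. The Daubechies 6-tap coefficients are listed as approximate floating-point values (not the 8-bit quantized values used in the hardware design), the high-pass filter is derived by the order-reversal/sign-alternation rule just described, and the boundary handling is a simplification rather than the authors' implementation.

```python
import numpy as np

# Approximate Daubechies 6-tap scaling (low-pass) coefficients h(n)
h = np.array([0.33267055, 0.80689151, 0.45987750,
              -0.13501102, -0.08544127, 0.03522629])
# Wavelet (high-pass) coefficients g(n): reverse the order and alternate the signs
g = np.array([(-1) ** n * h[len(h) - 1 - n] for n in range(len(h))])

def dwt_level(a, lo, hi):
    """One octave: filter with h(n) and g(n), then decimate by 2."""
    low = np.convolve(a, lo)[::2]    # scaling coefficients, input to the next octave
    high = np.convolve(a, hi)[::2]   # wavelet (detail) coefficients of this octave
    return low, high

def pyramid(a, levels=3):
    """Mallat's tree algorithm: recycle the low-pass output into the next octave."""
    details = []
    for _ in range(levels):
        a, d = dwt_level(a, h, g)
        details.append(d)
    return a, details                # final approximation + per-octave details

approx, details = pyramid(np.arange(64, dtype=float), levels=3)
print([len(d) for d in details])     # detail lengths roughly halve at each octave
```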
Figure 1. Three-stage DWT decomposition using the pyramid algorithm [4]: the M x N input a is filtered by g(n) (HP) and h(n) (LP) and decimated by 2 at each stage, yielding the (M*N)/2-sample outputs b and c, the (M*N)/4-sample outputs d and e, and the (M*N)/8-sample outputs f and g.
A. Decomposition Algorithm
The input data are taken from a high-resolution image of size M x N and treated as the DWT coefficients of the highest-resolution stage. The DWT coefficients of subsequent stages are computed using equation (1). The 1st level of wavelet decomposition (WD) extracts the high-frequency components, while the second and all subsequent WDs extract progressively lower-frequency components. Two filters derived from the wavelet, a HP filter G and the corresponding LP filter H, are used at each level of the tree for the forward transform. The output of the two-channel QMF bank is evaluated and decimated by a factor of two. The G filter output gives the DWT values of the 1st octave; the H outputs are then fed back into the filter bank to generate the next octave, and so on. The DWT is thus calculated recursively as a series of convolutions and decimations at each octave. In Fig. 1, the transfer functions of the six-tap HP (g(n)) and LP (h(n)) FIR filters are expressed as follows:
H(z) = g(0) + g(1)z^-1 + g(2)z^-2 + g(3)z^-3 + g(4)z^-4 + g(5)z^-5        (2a)
L(z) = h(0) + h(1)z^-1 + h(2)z^-2 + h(3)z^-3 + h(4)z^-4 + h(5)z^-5        (2b)
Let 'a' denote the N-sample input data, i.e. the DWT coefficients of the high-resolution stage, and let b, c, d, e, f and g denote the intermediate and final DWT coefficients. The following equations give the relationships among a, b, c, d, e, f and g.
1st octave HPF:
b(0) = g(0)a(0) + g(1)a(-1) + g(2)a(-2) + g(3)a(-3) + g(4)a(-4) + g(5)a(-5)    (3a)
b(2) = g(0)a(2) + g(1)a(1) + g(2)a(0) + g(3)a(-1) + g(4)a(-2) + g(5)a(-3)      (3b)
b(4) = g(0)a(4) + g(1)a(3) + g(2)a(2) + g(3)a(1) + g(4)a(0) + g(5)a(-1)        (3c)
b(6) = g(0)a(6) + g(1)a(5) + g(2)a(4) + g(3)a(3) + g(4)a(2) + g(5)a(1)         (3d)
1st octave LPF:
c(0) = h(0)a(0) + h(1)a(-1) + h(2)a(-2) + h(3)a(-3) + h(4)a(-4) + h(5)a(-5)    (3e)
c(2) = h(0)a(2) + h(1)a(1) + h(2)a(0) + h(3)a(-1) + h(4)a(-2) + h(5)a(-3)      (3f)
c(4) = h(0)a(4) + h(1)a(3) + h(2)a(2) + h(3)a(1) + h(4)a(0) + h(5)a(-1)        (3g)
c(6) = h(0)a(6) + h(1)a(5) + h(2)a(4) + h(3)a(3) + h(4)a(2) + h(5)a(1)         (3h)
2nd octave HPF and LPF:
d(0) = g(0)c(0) + g(1)c(-2) + g(2)c(-4) + g(3)c(-6) + g(4)c(-8) + g(5)c(-10)   (4a)
d(4) = g(0)c(4) + g(1)c(2) + g(2)c(0) + g(3)c(-2) + g(4)c(-4) + g(5)c(-6)      (4b)
e(0) = h(0)c(0) + h(1)c(-2) + h(2)c(-4) + h(3)c(-6) + h(4)c(-8) + h(5)c(-10)   (4c)
e(4) = h(0)c(4) + h(1)c(2) + h(2)c(0) + h(3)c(-2) + h(4)c(-4) + h(5)c(-6)      (4d)
3rd octave HPF and LPF:
f(0) = g(0)e(0) + g(1)e(-4) + g(2)e(-8) + g(3)e(-12) + g(4)e(-16) + g(5)e(-20) (5a)
g(0) = h(0)e(0) + h(1)e(-4) + h(2)e(-8) + h(3)e(-12) + h(4)e(-16) + h(5)e(-20) (5b)

B. SYSTOLIC ARCHITECTURE DESIGN
SAs are networks of processing elements (PEs) that rhythmically compute and pass data through the system. The PEs regularly pump data in and out so that a regular flow of data is maintained; this operation is analogous to the flow of blood through the heart, hence the name "systolic". The PEs in a systolic array are uniform and fully pipelined, all the communicating edges among the PEs contain delay elements, and the whole system uses only local interconnections. The Dependency Graph (DG) corresponds to a space representation in which no timing instance is assigned to any computation, i.e. everything lies in the t = 0 plane. The mapping technique transforms this space representation into a space-time representation in which each node is mapped to a certain processing element and is scheduled to a certain time instance.
1.) Systolic design methodology: The systolic design methodology maps an N-dimensional DG to a lower-dimensional SA. The basic vectors involved in systolic array design are:
i.) the projection vector d = [d1 d2]^T, ii.) the processor space vector p^T = (p1 p2), and iii.) the scheduling vector s^T = (s1 s2). The SA is designed by selecting vectors that satisfy the feasibility constraints of [8], i.e.
a.) the processor space vector and the projection vector must be orthogonal: if nodes I_A and I_B are mapped to the same processor, then p^T(I_A - I_B) = 0, i.e. p^T d = 0;        (6a)
b.) if A and B are mapped to the same processor, they cannot be executed at the same time instance: s^T I_A != s^T I_B, i.e. s^T d != 0.        (6b)
2.) Edge Mapping: Here we select the vectors as p^T = [0 1], s^T = [1 0] and d = [1 0]^T, and the edge mapping is done following the conditions above. In edge mapping we obtain the edges (p^T e) and the delays (s^T e) needed in the architecture; the table below gives the details for decomposition level 1.
Edge e        p^T e (edge)    s^T e (delay)
Wt (1,0)      0               0
i/p (1,1)     1               1
o/p (0,1)     1               0
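A small illustrative check (not part of the paper's design flow) of the vectors chosen above: the feasibility constraints (6a)/(6b), and the edge and delay values for the moving i/p and o/p edges, reduce to simple dot products.

// Sketch: checking the systolic mapping vectors of Section II-B with dot products.
// d (projection), p (processor space) and s (scheduling) are the vectors chosen in
// the text; the edges below are the moving input and output edges of the DG (the
// filter weights are held locally in the PEs, as stated in the text).
public final class EdgeMapping {
    static int dot(int[] u, int[] v) { return u[0] * v[0] + u[1] * v[1]; }

    public static void main(String[] args) {
        int[] d = {1, 0}, p = {0, 1}, s = {1, 0};
        System.out.println("p.d = " + dot(p, d) + "  (must be 0, constraint 6a)");
        System.out.println("s.d = " + dot(s, d) + "  (must be non-zero, constraint 6b)");
        int[][] edges = {{1, 1}, {0, 1}};
        String[] names = {"i/p", "o/p"};
        for (int i = 0; i < edges.length; i++)
            System.out.println(names[i] + ": edge p.e = " + dot(p, edges[i])
                               + ", delays s.e = " + dot(s, edges[i]));
    }
}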
3.) Systolic Block Diagram: The edges indicate the connections to be made between adjacent processing elements, and the delays give the delay elements associated with the corresponding edges. In the design obtained here the weights (Wt) stay in the PEs with no delays, the inputs move to the adjacent PEs through one delay element, and the outputs move to the adjacent PEs with zero delay elements. The corresponding block diagram is shown in Fig. 2.
Fig 2. Block diagram of the design (first level): PE1 to PE6 placed along the processor axis, with the i/p line passing through one delay element (D) between adjacent PEs and the o/p line connected without delays.
4.) Dependency Graph (DG): The data dependency structure for the 1st octave is obtained from the block diagram; the DG is shown in Fig. 3, where N indicates the time instance, the filter coefficients are placed on the x-axis, the input image values are fed along the diagonal axis, and the output scaling (or wavelet) coefficients, depending on the filter used, are obtained on the y-axis. In the implementation the input image values are supplied through a shift register, so the structure resembles an FIR filter and the inputs are fed serially into the architecture.
Fig 3. DG for the 1st octave for the LPF: time instances N = 0 to 15 on the vertical axis, the filter coefficients g(5)...g(0) along the processor axis, the inputs a(-5)...a(10) entering along the diagonal, and the outputs b(0), b(2), b(4), b(6) produced on the output axis.
III. DWT SYSTOLIC ARCHITECTURE PROPOSED
1.) Filter Unit: The filter unit (FU) used in this architecture (shown in Fig. 4) is a six-tap non-recursive FIR digital filter whose transfer functions for the HP and LP components are given by (2). g(0) to g(5) and h(0) to h(5) are the coefficients of the HPF and LPF, respectively. The DWT coefficients are computed by a multiply-and-accumulate method in which the partial products are computed separately and then added. The partial results of each cell are computed and then passed in a systolic manner from one cell to the adjacent cell.
Fig 4. Systolic operation of the six-tap FIR filter [4]: the samples L(n), L(n-1), ..., L(n-5) are multiplied by the coefficients in cells 0 to 5 and the partial sums are accumulated across the cells.
2.) Filter cell (FC): Computation of both the HP and LP DWT coefficients is done by the hardware shown in Fig. 5.
Fig 5. Filter cell [4]: the partial result from the previous PE enters an adder (+); a MUX selects the preloaded H or L coefficient, which is multiplied (X) by the input image value, and the accumulated result is passed to the next PE.
The partial results obtained are passed synchronously in a systolic manner from one cell to the adjacent cell. Each cell consists of one multiplier, one adder and two registers, and the HP and LP coefficients are preloaded into the cells at implementation time.
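A behavioural sketch of the array described above (weights resident in the PEs, inputs advancing through one delay register per PE, partial sums combined without delay) is given below. The tap values are placeholders, and in the DWT only every second output would be retained after decimation.

// Clock-by-clock simulation of the six-PE systolic FIR behaviour described in the text
// (NOT the Verilog/hardware design): each PE holds one tap, the input advances one PE
// per clock through a delay register, and the partial products ripple into one output.
public final class SystolicFir {
    public static void main(String[] args) {
        double[] w = {0.05, -0.1, 0.6, 0.6, -0.1, 0.05};  // preloaded placeholder taps
        double[] inputReg = new double[w.length];          // one D element per PE
        double[] x = new double[16];
        x[0] = 1.0;                                        // impulse -> output = tap sequence
        for (int t = 0; t < x.length; t++) {
            for (int k = w.length - 1; k > 0; k--)         // i/p edge: one delay per hop
                inputReg[k] = inputReg[k - 1];
            inputReg[0] = x[t];
            double y = 0.0;                                // o/p edge: zero-delay accumulation
            for (int k = 0; k < w.length; k++) y += w[k] * inputReg[k];
            System.out.printf("t=%2d  y=%+.3f%n", t, y);
        }
    }
}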
Fig 6. Proposed systolic DWT architecture: a chain of filter cells, each containing a multiplexer (M) selecting the h or g coefficient, a multiplier (X) and a carry save adder (CSA), separated by delay elements (D); an input delay unit supplies the input data and the accumulated output data are taken from the CSA chain.
IV. 2D-DWT ARCHITECTURE PROPOSED
Decomposition Algorithm: The 2D-DWT decomposition algorithm is given by
D_LL^n(k1, k2) = Σ_{m1,m2} a(m1 - 2k1) a(m2 - 2k2) D_LL^{n-1}(m1, m2)        (7a)
D_LH^n(k1, k2) = Σ_{m1,m2} a(m1 - 2k1) b(m2 - 2k2) D_LH^{n-1}(m1, m2)        (7b)
D_HL^n(k1, k2) = Σ_{m1,m2} b(m1 - 2k1) a(m2 - 2k2) D_HL^{n-1}(m1, m2)        (7c)
D_HH^n(k1, k2) = Σ_{m1,m2} b(m1 - 2k1) b(m2 - 2k2) D_HH^{n-1}(m1, m2)        (7d)
The three-level WD is shown in Fig. 8 [4]. Here the original image is considered as LL and further processing is done on it. LLm implies applying the LPF in both the x- and y-directions, where m is the resolution level. Similarly, LH implies applying the LPF in the x-direction and the HPF in the y-direction and is referred to as the horizontal-orientation sub-image. HL implies the HPF in the horizontal direction and the LPF in the vertical direction and is referred to as the vertical-orientation sub-image. HH means applying the HPF in both the x- and y-directions and is referred to as the diagonal-orientation sub-image. At every level of decomposition we obtain one LP and three HP sub-images. The process is continued on the obtained LLm image until the required number of decomposition levels is reached. The 2D-DWT architecture is mostly dependent on the 1D-DWT architecture: in the 2D-DWT the 1D-DWT module is applied to both the rows and the columns of the image.
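To illustrate this row/column application of the 1-D module, the following sketch computes one level of the separable 2-D DWT in software. The toy Haar filters and the zero-padding at the borders are assumptions made for the example; they are not the filters of the proposed architecture.

// Sketch of one level of the separable 2-D DWT: the 1-D transform is applied to every
// row (giving L and H of size M x N/2) and then to every column of each result (giving
// LL, LH, HL, HH of size M/2 x N/2).  Filters and border handling are assumptions.
public final class Dwt2D {
    static double[] filterDown(double[] x, double[] f) {       // convolve + decimate by 2
        double[] y = new double[x.length / 2];
        for (int n = 0; n < y.length; n++)
            for (int k = 0; k < f.length; k++) {
                int m = 2 * n - k;
                if (m >= 0 && m < x.length) y[n] += f[k] * x[m];
            }
        return y;
    }

    static double[][] transform(double[][] img, double[] f, int axis) {
        int rows = img.length, cols = img[0].length;
        if (axis == 0) {                                        // row-wise (x direction)
            double[][] out = new double[rows][];
            for (int r = 0; r < rows; r++) out[r] = filterDown(img[r], f);
            return out;
        }
        double[][] out = new double[rows / 2][cols];            // column-wise (y direction)
        for (int c = 0; c < cols; c++) {
            double[] col = new double[rows];
            for (int r = 0; r < rows; r++) col[r] = img[r][c];
            double[] y = filterDown(col, f);
            for (int r = 0; r < y.length; r++) out[r][c] = y[r];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] h = {0.5, 0.5}, g = {0.5, -0.5};               // toy Haar filters
        double[][] img = new double[8][8];
        for (int r = 0; r < 8; r++) for (int c = 0; c < 8; c++) img[r][c] = r + c;
        double[][] L = transform(img, h, 0), H = transform(img, g, 0);   // M x N/2
        double[][] LL = transform(L, h, 1), LH = transform(L, g, 1);     // M/2 x N/2
        double[][] HL = transform(H, h, 1), HH = transform(H, g, 1);
        System.out.println("LL size: " + LL.length + " x " + LL[0].length);
    }
}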
Fig 7. 8x8 bit Baugh Wooley Multiplier [8]
Hence, the 1D-DWT architecture is obtained based on the analysis done above. The architecture obtained is shown in Fig. 6; it consists of the filter cells, the input delay unit and a carry save adder. The multiplier used in the filter cells is the BWM [8]: we have made use of an 8 x 8 bit Baugh-Wooley two's complement multiplier. The BWM has a regular, systolic structure consisting of Full Adders (FA), Half Adders (HA) and a ripple carry adder that produces the MSB bits P7 to P14.
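As a software cross-check of the Baugh-Wooley formulation (not the FA/HA array itself), the sketch below forms the partial products, complements those involving the sign bits, adds the two correction constants, and verifies the result against ordinary two's complement multiplication for all 8-bit operands.

// Software check of the Baugh-Wooley two's complement multiplication identity: the
// sign-bit partial products are complemented and the constants 2^n and 2^(2n-1) are
// added; the result equals a*b modulo 2^(2n).  The full 2n-bit product is kept here.
public final class BaughWooleyCheck {
    static int bwMultiply(int a, int b, int n) {               // a, b: n-bit two's complement
        int mask = (1 << n) - 1;
        int ua = a & mask, ub = b & mask;
        long p = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                int bit = ((ua >> i) & 1) & ((ub >> j) & 1);
                if ((i == n - 1) ^ (j == n - 1)) bit ^= 1;     // complement the sign rows
                p += (long) bit << (i + j);
            }
        p += (1L << n) + (1L << (2 * n - 1));                  // correction constants
        return (int) (p & ((1L << (2 * n)) - 1));              // product modulo 2^(2n)
    }

    public static void main(String[] args) {
        for (int a = -128; a <= 127; a++)
            for (int b = -128; b <= 127; b++)
                if (bwMultiply(a, b, 8) != ((a * b) & 0xFFFF))
                    throw new AssertionError(a + " * " + b);
        System.out.println("Baugh-Wooley identity verified for all 8-bit operands.");
    }
}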
V. VLSI IMPLEMENTATION OF THE 2D-DWT
The architecture of the 2D-DWT is mostly dependent on the 1D-DWT architecture, as shown in Fig. 11. In order to obtain the LL image, we apply the LPF to the M x N image in the x-direction and decimate it, so that the size of the image becomes M x N/2. Then we apply the LPF to the obtained M x N/2 image in the y-direction and decimate it, so that the size becomes M/2 x N/2. The address generator is used to fetch the values from the memory in the specified direction, either row-wise or column-wise (x or y); hence the matrix transposition is avoided.
Fig 8. Wavelet decomposition of an image [4]: the M x N input image is decomposed by the 1-D DWT module into LL3, LH3, HL3 and HH3 (of size M/8 x N/8), LH2, HL2 and HH2 (of size M/4 x N/4), and the LH (horizontal), HL (vertical) and HH (diagonal) orientation sub-images at resolution level m = 1 (of size M/2 x N/2).
The elements of the image are initially read row-wise, i.e. the 1-D DWT is applied to a row and, once the row is completed, the scan returns to the next row. This type of scan requires the minimum intermediate buffer and its path resembles the letter 'z'; hence it is called the optimal Z-scan [6], shown in Fig. 10. We applied the DWT using this type of scan.
Fig 10. Optimal Z-scan [6]
Initially the 256 x 256 input image is read through Matlab and stored in a block RAM. The DWT is applied to the image values obtained from the block RAM through the address generator. After applying the DWT, the values obtained are 17 bits wide because of the 8-bit binary Daubechies coefficients, so they are truncated, or DC level shifting is performed. Prior to computation of the forward DWT on each image tile, all samples of the image tile component are DC level shifted by subtracting the same quantity 2^(P-1), where P is the component's precision. DC level shifting is performed only on samples of components that are unsigned. Level shifting does not affect variances; it simply converts an unsigned representation to a two's complement representation, or vice versa [10]. The results obtained through Verilog HDL and the process flow are shown in Fig. 12, the results obtained through Matlab with floating-point db coefficients are shown in Fig. 13, and Fig. 14 compares them with the results obtained using the binary coefficients.
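A minimal sketch of the DC level shift described above:

// DC level shift: unsigned P-bit samples are shifted by 2^(P-1) before the forward DWT,
// turning them into signed (two's complement) values; the shift is undone after the
// inverse transform.
public final class DcLevelShift {
    static int[] shift(int[] unsignedSamples, int precision) {
        int offset = 1 << (precision - 1);                 // 2^(P-1), e.g. 128 for P = 8
        int[] out = new int[unsignedSamples.length];
        for (int i = 0; i < out.length; i++) out[i] = unsignedSamples[i] - offset;
        return out;
    }

    public static void main(String[] args) {
        int[] pixels = {0, 17, 128, 200, 255};             // 8-bit unsigned samples
        System.out.println(java.util.Arrays.toString(shift(pixels, 8)));  // -128 .. 127
    }
}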
In the architecture shown above, we first apply the LPF and HPF to the M x N image, and the obtained images are each of size M x N/2, so the memory requirement is (M x N/2) + (M x N/2) = M x N only. When we then apply the LPF and HPF to the obtained L and H images, we get LL, LH, HL and HH, each of size M/2 x N/2, and the total memory required is 4 x (M/2 x N/2) = M x N. Hence the total memory requirement, and the number of memory accesses, is only 2(M x N). The output of the first-level decomposition, LL1, is applied again to the input buffer to derive a higher level of decomposition.
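A small sketch of the address-generator idea (illustrative, not the Verilog module): the same row-major block-RAM contents are visited either row-wise or column-wise, so the 1D-DWT module can process rows and then columns without a matrix transposition.

// Address generator sketch: a single linear memory holding an image in row-major order
// is traversed either row-wise (x direction) or column-wise (y direction).
public final class AddressGenerator {
    final int rows, cols;
    AddressGenerator(int rows, int cols) { this.rows = rows; this.cols = cols; }

    int address(int k, boolean rowWise) {
        return rowWise ? k                                   // natural row-major order
                       : (k % rows) * cols + (k / rows);     // walk down each column in turn
    }

    public static void main(String[] args) {
        AddressGenerator ag = new AddressGenerator(4, 4);    // toy 4 x 4 image in one RAM
        for (int k = 0; k < 16; k++) System.out.print(ag.address(k, false) + " ");
        // prints 0 4 8 12 1 5 9 13 ... : column access over the row-major memory
    }
}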
Fig 9. Implementation of the 1D-DWT module: the M x N input image is read from the input block RAM through an address generator and passed through the 1D-DWT module to give the L and H images of size M x N/2; these are read again through address generators and 1D-DWT modules to give the LL, LH, HL and HH images of size M/2 x N/2, which are written to the output block RAM.
Fig 11. 2D-DWT architecture with the memory requirement (M x N at each of the two passes) for one level of decomposition.
VII. RESULTS
Fig 12. VLSI implementation of the DWT using Matlab, Verilog and Xilinx (the repetitive process shown in detail): the 256 x 256 x 8 image is read in Matlab and a COE file is obtained; a block RAM is generated with the Xilinx Core Generator from the COE file (depth 8192, width 8); the Verilog design is compiled and simulated in Modelsim to apply the DWT and decimation, giving an intermediate output file of 256D x 128W x 17b; the file is read and truncated in Matlab and the image is transposed; a new COE file and block RAM (depth 16384, width 8) are generated, the Verilog design is simulated again, and an intermediate output file of 128D x 128W x 17b is obtained, with the MIF file generated through the Core Generator.
Fig 13. Results from Matlab with floating-point coefficients: the original image and the images after applying the HPF and the LPF and decimating, for decomposition levels M = 1, 2 and 3.
Fig 14. Comparing results with floating coefficients and binary coefficients: (a) floating point, (b) binary.
VI. CONCLUSION
A SA design for an efficient 1D-DWT has been presented. The structure of the DWT architecture is simple and scalable to higher levels, that is, more modules can be placed next to each other to achieve higher performance. A 2D-DWT architecture based on the 1D-DWT has also been presented. The systolic BWM, which processes one word per clock cycle, is used because of its advantages: regularity, reduced computation time (since the carries generated are saved and added) and a significant reduction in power dissipation compared with the parallel carry ripple and parallel carry save array multipliers. Design implementation: Verilog HDL was used to describe the behaviour of the design; subsequently, the design was simulated in Modelsim for functional correctness and synthesized using the Xilinx ISE and Leonardo Spectrum tools. Matrix transposition has been avoided because the address generator produces the values in the order required for each stage. The design is developed for the Daubechies 6-tap FIR filters, but it can easily be modified to operate with other filter types of higher degree. Loading the transformed coefficients from, or writing them back to, the external memory simultaneously while the image is being transformed is a possible method for increasing the performance; this decreases the average amount of latency and therefore leads to a performance increase.
VIII. REFERENCES
[1] S.S. Nayak, "Bit-level systolic implementation of 1-D and 2-D discrete wavelet transform," IEEE Proc.-Circuits Devices Syst., Vol. 152, No. 1, February 2005.
[2] Keshab K. Parhi and Takao Nishitani, "VLSI Architectures for Discrete Wavelet Transforms," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 1, No. 2, pp. 191-202, June 1993.
[3] Mohan Vishwanath, Robert Michael Owens and Mary Jane Irwin, "VLSI Architectures for the Discrete Wavelet Transform," IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, Vol. 42, No. 5, pp. 305-316, May 1995.
[4] Aleksander Grzeszczak, Mrinal K. Mandal, Sethuraman Panchanathan and Tet Yeap, "VLSI Implementation of Discrete Wavelet Transform," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 4, No. 4, pp. 421-433, December 1996.
[5] J. Fridman and E. Manolakos, "Distributed memory and control VLSI architectures for the 1-D discrete wavelet transform," Proc. IEEE Workshop on VLSI Signal Processing VII, La Jolla, CA, USA, pp. 388-397, 1994.
[6] Rahul Jain and Preeti Ranjan Panda, "Memory Architecture Exploration for Power-Efficient 2D-Discrete Wavelet Transform," 20th International Conference on VLSI Design (VLSID'07), IEEE Computer Society, 2007.
[7] H.T. Kung, "Why Systolic Architectures?," Computer, Vol. 15, No. 1, pp. 37-46, 1982.
[8] Keshab K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation, John Wiley, 1999.
[9] S.G. Mallat, "A theory of multiresolution signal decomposition: The wavelet representation," IEEE Trans. Pattern Anal. Mach. Intell., Vol. 11, 1989.
[10] Athanassios Skodras, Charilaos Christopoulos and Touradj Ebrahimi, "The JPEG 2000 Still Image Compression Standard," IEEE Signal Processing Magazine, September 2001.
[11] G. Lakshminarayanan, B. Venkataramani, J. Senthil Kumar, A.K. Yousuf and G. Sriram, "Design and FPGA implementation of image block encoders with 2D-DWT," Dept. of Electronics & Communication Engineering, National Institute of Technology, Tiruchirappalli, India.
Introducing MARF: A Modular Audio Recognition Framework and its Applications for Scientific and Software Engineering Research Serguei A. Mokhov SGW, EV7.139-2 Department of Computer Science and Software Engineering Concordia University Montreal, Quebec, Canada Email: [email protected]
Abstract—In this paper we introduce the Modular Audio Recognition Framework (MARF), an open-source research platform implemented in Java. MARF is used to evaluate various pattern-recognition algorithms and, beyond that, is applied in areas such as audio and text processing (NLP); it may also act as a library in applications and serve as a basis for learning and extension, as it encompasses good software engineering practices in its design and implementation. The paper summarizes the core framework's features and capabilities and where it is heading. Index Terms—MARF, signal processing, audio processing, pattern recognition, algorithms, software engineering, research
I. INTRODUCTION
A. Purpose
This paper introduces the Modular Audio Recognition Framework (MARF) and its example applications to the Java research community and other interested parties. MARF [1], [2] is an open-source project hosted on sourceforge.net with 23,500+ downloads as of this writing¹. MARF is a collection of general-purpose pattern-recognition frameworks, APIs, and concrete algorithms implemented within to illustrate the workings of the algorithms and, more importantly, to give researchers a platform to test existing and new algorithms against each other under various performance metrics. The frameworks originally evolved around audio recognition, but are not restricted to it due to their generality, as well as that of the implemented algorithms, as will be shown further. Today's MARF covers the pattern-recognition pipeline (loading, preprocessing, feature extraction, training and classification) shown in Figure 1, various algorithms, natural-language processing, and has a distributed version [3].
B. Brief History
MARF began its life in 2002 as a project for a pattern-recognition course, developed by a group of four students: Serguei Mokhov, Stephen Sinclair, Ian Clement, and Dimitrios Nicolacopoulos [1].
¹ As of this writing, circa the end of November 2007.
The project has been open-source from the start and has been maintained sporadically after the course was over, primarily by the author of this paper. The author additionally used MARF in his master's thesis [4], [5], other course projects, and work. MARF has a proof-of-concept distributed implementation which enables the core stages of sample loading, preprocessing, feature extraction, and classification to run over Java RMI [6], CORBA [7], or Web Services [8]. As of this writing there are more than 23,500 downloads to date. This paper intends to present MARF to the scientific and software engineering research communities.
C. Conceptual Design
The pattern recognition pipeline in Figure 1 illustrates the data flow between the various stages, with the inner boxes indicating the available modules and their implementations. The grayed-out boxes are either stubs or partly implemented; conversely, the white boxes indicate implemented algorithms. Generally, the process starts by loading a sample (e.g. an audio recording), preprocessing and normalizing it, then extracting the most discriminating features, and finally either training the system or running classification. The result of training is that the feature vectors, or their clusters, for a given subject are stored; the result of classification is a collection of possible subject identifiers, ordered from the most likely to the least likely.
II. ACTUAL FRAMEWORK DESIGN
The main principle of the framework design behind MARF was the ability to add any number of algorithm implementations executed as part of the pipeline, and to switch between them at any time. This enabled a plug-in-like architecture for all pipeline stages, which later propagated to the other aspects of the framework, such as NLP and its distributed version, as well as the utility modules. MARF itself is a collection of frameworks within; they generally draw lines around the boundaries of the aspects they implement. These are Storage and Sample Loading, Preprocessing, Feature Extraction, Training and Classification, Math and Algorithms, NLP, and network-related services.
Fig. 1. MARF’s Pattern Recognition Pipeline
Fig. 2. MARF Package Structure
These usually correspond to the Java packages shown in Figure 2:
• marf – the root package, where the main MARF class resides along with its Configuration and Version.
• Preprocessing – corresponds to the preprocessing sub-framework of MARF and its implementations, i.e. the second stage of the pipeline. The bulk of the preprocessing algorithms are various types of filters, silence removal, and normalization.
• FeatureExtraction – corresponds to the feature extraction sub-framework of MARF and its implementations, i.e. the third stage of the pipeline. The classical implementations here include the FFT and LPC algorithms.
• Classification – corresponds to the training and classification sub-framework of MARF and its implementations, i.e. the last stage of the pipeline. The classical implementations here include distance classifiers and artificial neural network algorithms.
• Stats and math – correspond to the stage-independent implementation of statistics gathering as well as plain math algorithms usable in any of the pipeline stages (e.g. the FFT can be used to model filters in preprocessing and its frequencies can be features in feature extraction).
• nlp – corresponds to the sub-frameworks related to NLP, such as parsing of natural language, grammar compilation, stemming, and others.
• net – corresponds to the sub-frameworks related to the Distributed MARF components (each stage can run on a separate host) and their management over SNMP.
• Storage – corresponds to the main storage management mechanisms in MARF, including sample loading (the first stage of the pipeline) and storing the training data.
• util – corresponds to the collection of general-purpose utility modules.
• gui – corresponds to the collection of general-purpose GUI modules.
All the major components are represented by Java interfaces so that other modules or applications may use the framework in a general manner. According to MARF's coding conventions [9] the interfaces are prefixed with an "I", as e.g. in IClassification. The major interfaces are shown in Figure 3. In most cases the names of the interfaces are self-explanatory, and include:
• IStorageManager – most of the modules in MARF implement this interface in order to serialize and reload the training or intermediate data in whatever format is specified in the process of evaluation. Generally, they do not implement the interface directly but rather extend its most common implementation in StorageManager.
• ISampleLoader – all audio and text loaders implement this interface to be able to load the sample data (e.g. a WAVE file or a text file) and interpret it as an array of doubles for further processing.
• IDatabase – this interface is designed to abstract any possible statistics and other data storage (e.g. speakers) from MARF; its implementation can map to plain Java object serialization, an SQL database, an XML file, or any other data storage management system.

Fig. 3. MARF's Main Interfaces

• IPreprocessing – a common interface that all preprocessing modules implement and through which other modules talk to them.
• IFilter – a special category of preprocessing modules, designated for all sorts of filters, implements this interface; it forms an independent sub-framework that lets people working on filter design operate on the filters alone.
• IFeatureExtraction – all feature extraction modules implement this interface.
• IClassification – all training and classification modules implement this interface.
• IStemming – an interface designed for the NLP task of stemming words to their roots (a kind of preprocessing in the NLP world).
• ICompiler – a common compiler interface that the grammar compiler and the MARFL (MARF's scripting language) compiler implement.
• IParser – a part of the compiler that all parsers implement, including the grammar, MARFL, and probabilistic (for natural languages, e.g. English) parsers.
IStemming, ICompiler and IParser are not discussed further in this work, as they deserve a separate publication². The actual implementation of the pipeline is in the MARF class³, which corresponds to the conceptual statechart in Figure 1.
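To make the plug-in idea concrete, here is a minimal, self-contained sketch of such a pipeline. The interfaces and method signatures below are simplified stand-ins inspired by the names mentioned in this section (loadSample(), preprocess(), extractFeatures(), train(), classify()); they are not MARF's actual API, whose types, signatures and exceptions differ.

// Simplified illustration of the plug-in pipeline idea (NOT MARF's real interfaces).
interface ISampleLoader      { double[] loadSample(String path); }
interface IPreprocessing     { double[] preprocess(double[] sample); }
interface IFeatureExtraction { double[] extractFeatures(double[] sample); }
interface IClassification    { void train(double[] features, int subjectId);
                               int  classify(double[] features); }

final class Pipeline {
    private final ISampleLoader loader;
    private final IPreprocessing prep;
    private final IFeatureExtraction fe;
    private final IClassification cls;

    Pipeline(ISampleLoader l, IPreprocessing p, IFeatureExtraction f, IClassification c) {
        loader = l; prep = p; fe = f; cls = c;     // concrete modules are plugged in here
    }
    void train(String path, int subjectId) {
        cls.train(fe.extractFeatures(prep.preprocess(loader.loadSample(path))), subjectId);
    }
    int classify(String path) {
        return cls.classify(fe.extractFeatures(prep.preprocess(loader.loadSample(path))));
    }

    public static void main(String[] args) {
        IClassification dummy = new IClassification() {      // trivial placeholder classifier
            private int id;
            public void train(double[] f, int subjectId) { id = subjectId; }
            public int classify(double[] f) { return id; }
        };
        Pipeline p = new Pipeline(path -> new double[]{1, 2, 3},   // dummy loader
                                  s -> s, s -> s, dummy);          // identity pre/feature stages
        p.train("train.wav", 7);
        System.out.println(p.classify("test.wav"));
    }
}

Because the pipeline only sees the interfaces, any concrete module (an FFT-based feature extractor, a neural network classifier, and so on) can be swapped in at run time, which is the plug-in behaviour described above.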
² Not all of these components are publicly available yet in MARF's CVS repository as of this writing; they can be made available on demand.
MARF invokes the concrete implementations of the modules for every state of the pipeline using the API exposed by the interfaces described earlier and shown in Figure 3, in the same order: sample loading (in its rudimentary form, loadSample()), preprocessing (preprocess()), feature extraction (extractFeatures()), and then either training (train()) or classification (classify()). This implementation of the pipeline is made very general, such that it is not aware of which concrete modules are being called, because it operates only on the common API presented in the interfaces; this makes the pipeline flexible for the inclusion of external and internal plug-ins.
III. APPLICATIONS AND EXPERIMENTS
There is a large number of possible applications that can be written using MARF in its present form. In fact, there are four major and many minor applications that are bundled with MARF upon a release or released independently. The four major example applications are Text-Independent Speaker Identification, Language Identification, Probabilistic Parsing, and Zipf's Law analysis. The former is the most developed application and is used in a variety of tests of MARF's features, including exhaustive testing of the implemented algorithms.
³ The finer-detail UML sequence diagram has been removed from this paper due to the page limit and copyright restrictions. One can still find it in the open-source MARF manual [2].
The ultimate goal is to show which combination of the preprocessing, feature extraction, and classification algorithm configurations yields the best results, thereby helping researchers to focus on the algorithms they need for their particular problem domain. The application collects statistics on the number of successful vs. unsuccessful guesses, as well as a "second best" measure. It also records the run-time of each configuration (assuming similar hardware and software environments for all the tests while the experiment is running). The "second best" approach is an interesting measure because, if we do not "guess" right the first time, the next guess in line is likely correct; like humans who sometimes mix up, say, an adult son and his father, or similar-age siblings, when listening on the phone, the 2nd guess is usually right, so we provide this measure of interest as well. Figure 4 shows an excerpt of the results for the first and second guesses produced after the system models were trained on 306 voice samples of 30 speakers and tested on the 30 "unseen" testing samples. Each option constitutes a parameter to the application that selects an appropriate algorithm at run time (to give the reader some idea: -norm means normalization, -endp means endpointing, -silence indicates removing silence from the samples, -low means using a low-pass FFT filter, -fft means using the FFT for feature extraction, -cos means using the cosine similarity measure, etc.).
Fig. 4. Example Testing Results of Speaker Identification
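For illustration, the two measures behind the distance classifiers and the -cos option can be sketched as follows; this is illustrative code, not MARF's own implementation.

// Cosine similarity and Euclidean distance between a test feature vector and two
// (hypothetical) speaker models; larger cosine / smaller distance means a better match.
public final class Measures {
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i]; }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }
    static double euclidean(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }
    public static void main(String[] args) {
        double[] test = {0.9, 0.1, 0.4}, speakerA = {1.0, 0.0, 0.5}, speakerB = {0.1, 0.9, 0.2};
        System.out.printf("cos: A=%.3f B=%.3f%n", cosineSimilarity(test, speakerA), cosineSimilarity(test, speakerB));
        System.out.printf("euc: A=%.3f B=%.3f%n", euclidean(test, speakerA), euclidean(test, speakerB));
    }
}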
IV. BEYOND AUDIO
The word "Audio" in the catchy framework name and abbreviation may make the reader think that the framework is constrained to audio processing, which is not exactly true: as mentioned earlier, there are NLP processing tasks as well as a network-related implementation of MARF's pipeline stages. Additionally, the algorithms, such as FFT, LPC, the neural network, and the various distance and similarity measures, are not tied to audio and can be used in any pattern-recognition-like application. Furthermore, MARF was used in research related to the implementation and testing of a framework for data security (integrity, origin authentication, and privacy), which is the subject of another publication in progress as of this writing.
A. Algorithms
Aside from the skeleton framework itself, MARF has actual implementations of several algorithms to demonstrate its abilities in the various pipeline stages and modules. These include the Fast Fourier Transform (used in FFT-based filtering as well as feature extraction [10]), Linear Predictive Coding (feature extraction), a neural network (classification), various distance classifiers (Chebyshev, a.k.a. Manhattan or city-block [11], Euclidean [12], Mahalanobis [13], Diff⁴, Hamming [14], and Minkowski [15]), the cosine similarity measure [16], [17], a Zipf's Law-based classifier [18], a general probability classifier, and CFE-based filters [19]. From the NLP, parsing, and compiler world, there are grammar-based parsers, a top-down recursive descent parser, and a statistical natural language parser (for now just English), as well as a number of math-related tools for matrix and vector processing, including complex-number matrix and vector operations, and statistical estimators used in smoothing sparse matrices (e.g. in probabilistic parsing based on the CYK algorithm [20] or in the Mahalanobis distance's covariance matrix). Thus, a Java application wishing to use already implemented and tested algorithms and compare them to other implementations in a homogeneous MARF environment can do so, or can simply use MARF as a way of teaching and learning.
B. Utility Modules
MARF has a large collection of general-purpose utility modules that came about during the development of the rest of the framework. The utilities are even released as a separate package and are used in applications that have nothing to do with pattern recognition or NLP. The marf.util.Arrays class (not pictured) is a conceptual extension (not by inheritance) of java.util.Arrays, which groups a lot of array-related functionality for copying, sorting, searching, and conversion of arrays of different element types, as well as conversion to and from the java.lang.String and java.util.Vector types. The utility modules also provide thread management, debugging and logging, option processing tools, comparators and common storage data structures for sorting, which simplify the life of applications and of the internal modules themselves considerably. This also makes the classical MARF stand-alone and independent of any other libraries⁵.
⁴ Internally developed within the project; similar in behaviour to the UNIX/Linux diff utility.
Fig. 5. MARF's Major Utility Modules
In Figure 5 are the threading and collection classes (collections that grow automatically when an element of a vector or matrix is set beyond their current upper limit).
C. Natural Language Processing (NLP)
A lot of modules and interfaces came in with the addition of the nlp package to MARF. This includes parsing, stemming, and n-gram models. The already mentioned probabilistic parsing algorithm implemented in MARF is CYK [20]; its example application is the ProbabilisticParsingApp. The general set of parsers and compilers is there to allow compilation of a grammar for a given language and then its use for parsing. The Zipf's Law [18] module, implemented in the ZipfLaw class to test the law and to use it as a classification module, along with the testing application ZipfLawApp, is another example in this category. The statistics-based n-gram models rely on the statistical estimators implemented within the Stats framework; the latter are used by the LangIdentApp to do character-based language recognition.
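As an illustration of the Zipf's Law idea (not the MARF ZipfLaw class itself), the following sketch counts word frequencies and prints rank times frequency, which should stay roughly constant for natural-language text if the law holds.

// Minimal Zipf's Law style check on a toy text: count word frequencies, sort by
// frequency, and print rank * frequency for each word.
import java.util.*;

public final class ZipfSketch {
    public static void main(String[] args) {
        String text = "the quick brown fox jumps over the lazy dog the fox the dog";
        Map<String, Integer> freq = new HashMap<>();
        for (String w : text.toLowerCase().split("\\s+"))
            freq.merge(w, 1, Integer::sum);
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(freq.entrySet());
        ranked.sort((x, y) -> y.getValue() - x.getValue());
        for (int rank = 1; rank <= ranked.size(); rank++) {
            Map.Entry<String, Integer> e = ranked.get(rank - 1);
            System.out.println("rank " + rank + ", freq " + e.getValue()
                               + ", rank*freq = " + rank * e.getValue() + "  (" + e.getKey() + ")");
        }
    }
}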
⁵ Except when a testing version is downloaded: it requires junit.jar to be present in the classpath to make it possible to compile and invoke the built-in unit tests.
D. Software Engineering
The framework is used in research and in the teaching of software engineering. A lot of care was dedicated to the design of the framework following good software engineering practices and patterns, and it goes through iterations of constant improvement and refinement. The described interface hierarchy represents the core of the framework's API implemented by its concrete modules. The modules commonly follow the Abstract Factory, Builder, Singleton, Visitor, Composite and other design patterns. MARF's documentation serves as an example of how to use various UML diagrams. MARF's utility modules and algorithm implementations are also used outside the scope of pattern recognition.
E. Distributed Computing
MARF's pipeline has been extended to allow running its stages on different computers and managing such nodes over SNMP, in order to test various properties of such pipeline-based distributed systems. MARF's components are also in use by the General Intensional Programming System [5], [4].
F. Testing
A lot of attention was devoted to the testing aspect of MARF. There are JUnit [21] test cases for some of MARF's modules, as well as numerous Test applications that test a specific algorithm or functionality in near-isolation (e.g. TestFFT, TestLPC or TestFilters). There is also an aggregate Regression application that attempts to consolidate all available test applications and JUnit tests in one package.
V. FUTURE WORK
While there is a vast TODO/wish list of desired enhancements and feature requests within the project, future work will largely focus on improving the quality of the existing code, further streamlining it, and implementing the missing essential features. Speech-related aspects of the MARF applications, namely speech recognition and generation (speech-to-text and text-to-speech), a GUI, and the MARFL intensional scripting language (to create MARF applications through a script) will be a major focus in the following versions. Given that some open-source tools with compatible licenses (BSD) are already present, some of this work can be as minimal as writing plug-in wrappers for packages like CMU Sphinx [22], which already provides a powerful speech-to-text facility.
VI. CONCLUSION
In conclusion, we would like to summarize the aspects MARF provides and how it facilitates research into pattern-recognition algorithms by providing an easy-to-extend, pluggable platform for the research and evolution of new and existing algorithms in terms of recognition and run-time performance. The framework design revolves around the typical pattern-recognition pipeline: sample loading, preprocessing, feature extraction, training and classification. This resulted in supporting thread management, storage management, mathematics, and utility modules. We believe this contribution to the open-source community helps various research projects in the field of pattern recognition and beyond.
VII. ACKNOWLEDGMENTS
Here we would like to acknowledge the contributions of the original MARF founders, namely Stephen Sinclair, Ian Clement, and Dimitrios Nicolacopoulos, as well as all subsequent contributors to the project. We would like to thank SourceForge.net and its maintainers for providing hosting and essential services for the open-source community. This research and development work was funded in part by the Faculty of Engineering and Computer Science of Concordia University, Montreal, Canada.
REFERENCES [1] S. Mokhov, I. Clement, S. Sinclair, and D. Nicolacopoulos, Modular Audio Recognition Framework. Department of Computer Science and Software Engineering, Concordia University, 2002-2003, http://marf.sf. net. [2] MARF Research & Development Group, Modular Audio Recognition Framework and Applications. SourceForge.net, 2002-2007, http://marf. sf.net. [3] S. Mokhov, On Design and Implementation of Distributed Modular Audio Recognition Framework: Requirements and Specification Design Document. Department of Computer Science and Software Engineering, Concordia University, 2006, http://marf.sf.net. [4] S. A. Mokhov, “Towards Hybrid Intensional Programming with JLucid, Objective Lucid, and General Imperative Compiler Framework in the GIPSY,” Master’s thesis, Department of Computer Science and Software Engineering, Concordia University, Oct. 2005, iSBN 0494102934. [5] T. G. Research and D. Group, The GIPSY Project. Department of Computer Science and Software Engineering, Concordia University, 2002-2007, http://newton.cs.concordia.ca/ gipsy/. ∼
[6] A. Wollrath and J. Waldo, Java RMI Tutorial. Sun Microsystems, Inc., 1995-2005, http://java.sun.com/docs/books/tutorial/rmi/index.html. [7] S. Microsystems, Java IDL. Sun Microsystems, Inc., 2004, http://java. sun.com/j2se/1.5.0/docs/guide/idl/index.html. [8] ——, The Java Web Services Tutorial (For Java Web Services Developer’s Pack, v2.0). Sun Microsystems, Inc., Feb. 2006, http: //java.sun.com/webservices/docs/2.0/tutorial/doc/index.html. [9] S. Mokhov, “MARF Coding Conventions,” 2005-2007, http://marf.sf. net/coding.html. [10] S. M. Bernsee, The DFT “a pied”: Mastering The Fourier Transform in One Day. DSPdimension.com, 1999-2005, http://www.dspdimension. com/data/html/dftapied.html. [11] H. Abdi, “Distance.” In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage, 2007, http://en. wikipedia.org/wiki/Chebyshev distance. [12] ——, “Distance.” In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage, 2007, http://en.wikipedia.org/ wiki/Euclidean distance. [13] P. Mahalanobis, “On the generalised distance in statistics.” Proceedings of the National Institute of Science of India 12 (1936) 49-55, 1936, http://en.wikipedia.org/wiki/Mahalanobis distance. [14] R. W. Hamming, “Error Detecting and Error Correcting Codes.” Bell System Technical Journal 26(2):147-160, 1950, http://en.wikipedia.org/ wiki/Hamming distance. [15] H. Abdi, “Distance.” In N.J. Salkind (Ed.): Encyclopedia of Measurement and Statistics. Thousand Oaks (CA): Sage, 2007, http://en. wikipedia.org/wiki/Distance#Distance in Euclidean space. [16] E. Garcia, “Cosine similarity and term weight tutorial,” 2006, http://www.miislita.com/information-retrieval-tutorial/ cosinesimilarity-tutorial.html. [17] A. Kishore, “Similarity measure: Cosine similarity or euclidean distance or both,” feb 2007, http://semanticvoid.com/blog/2007/02/23/ similarity-measure-cosine-similarity-or-euclidean-distance-or-both/. [18] G. K. Zipf, The Psychobiology of Language. Houghton-Mifflin, New York, NY, 1935, http://en.wikipedia.org/wiki/Zipf%27s law. [19] S. Haridas, “Generation of 2-d digital filters with variable magnitude characteristics starting from a particular type of 2-variable continued fraction expansion,” Master’s thesis, Concordia University, Montr´eal, Canada, Jul. 2006. [20] J. H. Martin, CYK Probabilistic Parsing Algorithm, http://www.cs. colorado.edu/ martin/SLP/New Pages/pg455.pdf. ∼
[21] E. Gamma and K. Beck, JUnit. Object Mentor, Inc., 2001-2004, http: //junit.org/. [22] T. S. G. at Carnegie Mellon, The CMU Sphinx Group Open Source Speech Recognition Engines. cmusphinx.org, 2007, http://cmusphinx. sourceforge.net.
TCP/IP Over Bluetooth
Umar F. Khan, University of Bradford, Bradford, UK, [email protected] Shafqat Hameed, University of Wales, Newport, UK, [email protected] Tim Macintyre, University of Wales, Newport, UK, [email protected]
Abstract- As new communication technologies emerge, they introduce new forms of short-range wireless networks. Bluetooth is one of them: it allows information exchange over a short range. Bluetooth is a low-cost, short-range and low-power radio technology which was originally developed to connect devices such as mobile phone handsets, portable computers and headsets without cables. Bluetooth was started in about 1994 by Ericsson Mobile Communications, and version 1.0 of Bluetooth came out in 1999. Bluetooth is a fast-growing technology and its applications are increasing as research goes on. Using Bluetooth we can set up a connection for traffic between a sender and a receiver; it can carry either synchronous traffic such as voice or asynchronous traffic such as traffic over the Internet Protocol. In this paper we discuss how efficiently Bluetooth can carry TCP/IP traffic and analyse how retransmissions and delays are handled when there is an error in a packet of data. In addition, we discuss the Bluetooth layer model and how it works, and compare the OSI reference model with the Bluetooth layer model. Keywords: TCP/IP, Bluetooth, OSI reference model
I- INTRODUCTION
Bluetooth is an ad-hoc wireless network concept that was presented in the mid-nineties. Bluetooth can connect mobile terminals within range of each other and make an ad-hoc connection between them. Bluetooth is designed for both synchronous and asynchronous data: an example of synchronous data is voice, and an example of asynchronous data is IP. In any network, and especially in an ad-hoc network, reliability should be a key consideration for the transmission of data for different applications. For this reason data packets in Bluetooth are protected by an ARQ scheme in the link layer, but it is also important to have congestion control. To solve this problem we use TCP (Transmission Control Protocol), which guarantees the reliable delivery of data packets. TCP was originally designed for wired networks that have a low packet error rate; the problem is that wireless networks usually have high data losses, so losses of data packets may still trigger the congestion control in TCP even though there is no congestion. In our discussion we consider the performance of TCP over Bluetooth and discuss TCP/IP over Bluetooth in detail. We shall discuss the throughput
and delays of packets for TCP/IP under different conditions. We find that Bluetooth is a powerful tool to carry TCP/IP and that we can achieve low delays and high throughput.
1.0 BLUETOOTH
Bluetooth is a short-range wireless link which was originally developed to avoid the cables between electronic or portable devices. Bluetooth is now an ad-hoc network technology which is used for both synchronous and asynchronous data. Bluetooth consists of a number of protocols which all reside on the physical and data link layers of the OSI reference model; a detailed description is given later in the paper.
1.1 Bluetooth system architecture
In the architecture we discuss how Bluetooth connects to devices and what type of topology it uses. Bluetooth operates in an environment where there is a high level of interference between networks. To make strong links Bluetooth uses a fast acknowledgement scheme and it also uses frequency hopping. Bluetooth operates in the 2.45 GHz band. Bluetooth uses point-to-point or point-to-multipoint connections; several connections can be established and linked together in an ad-hoc fashion. Bluetooth uses a master-slave approach between connected devices.
Fig. 1. Bluetooth topology: a single-slave (point-to-point) configuration and a multi-slave (point-to-multipoint) configuration, each with one master.
As can be seen from the figure above, this is how the topology of Bluetooth looks.
1.2 Bluetooth protocol stack
The Bluetooth protocol stack shown below gives an idea of how the Bluetooth protocol architecture actually looks. We now discuss the functionality of the parts of the protocol stack. In the Applications layer, Bluetooth profiles give guidelines on how applications should use the Bluetooth protocol stack. The Telephony Control Service (TCS) provides telephony services, and the Service Discovery Protocol (SDP) lets Bluetooth devices discover what services are supported by other Bluetooth devices. WAP and OBEX provide interfaces to other communication protocols. RFCOMM provides a serial interface. The Logical Link Control and Adaptation layer multiplexes data from the upper layers and converts that data into packets. The Host Controller Interface (HCI) handles communication between a separate host and a Bluetooth device. The link manager controls and configures links to other devices. The link controller controls the physical links via the radio, assembles packets and controls frequency hopping. The job of the radio is to modulate the data for transmission and reception.
(Fig. 2 layers, top to bottom: Applications; Telephony Control; Service Discovery Protocol; WAP; OBEX; RFCOMM; Logical Link Control and Adaptation; Host Controller Interface; Link Manager; Link Controller; Radio.)
Fig. 2. Bluetooth protocol stack
As discussed earlier, Bluetooth uses a master-slave approach. In Bluetooth all units are peers with identical interfaces. When two or more units share the same channel they form a piconet; in a piconet one unit acts as the master and the others become slaves. There is no condition on which one should be master and which one should be slave: any of them can be master or slave. For full-duplex transmission Bluetooth uses a time division duplex (TDD) scheme: the channel is divided into slots, and slots are mostly 0.625 ms long.
1.3 Relationship with the OSI reference model
Bluetooth does not exactly match the OSI reference model, but it is still useful to relate Bluetooth to the OSI reference model so that we can analyse the differences and similarities between them. The figure below shows both models. As can be seen, the two models are not the same, but it is useful to relate the various parts of the OSI reference model and the Bluetooth model; this comparison highlights the responsibilities of the different parts of Bluetooth. The physical layer is responsible for the electrical interface and channel coding, so it covers the radio and some part of the baseband. The link layer is responsible for transmission, framing and error control, so it overlaps the link control tasks and the control end of the baseband, which includes error checking and error correction as well. The network layer is responsible for data transfer across the network, so it consists of the higher end of the link controller, which also sets up and maintains multiple links, and covers most of the link manager tasks as well.
The transport layer is responsible for reliability and for multiplexing the data transfer across the network, up to the level required by the applications; it overlaps the high end of the link manager and the Host Controller Interface (HCI), which provide the actual data transport mechanism.
The session layer provides the data flow and management services; these are covered by RFCOMM and the logical link control. The presentation layer provides a common representation for application layer data by adding service structure to the units of data, which is the main task of RFCOMM and the Service Discovery Protocol (SDP).
The application layer is finally responsible for managing communication between host applications.
Fig. 3. OSI vs Bluetooth: the OSI layers (application, presentation, session, transport, network, link, physical) placed alongside the Bluetooth stack (applications, RFCOMM, logical link control and adaptation protocol, HCI, link manager, link controller, baseband, radio).
2.0 TRANSMISSION CONTROL PROTOCOL (TCP)
In this section we discuss TCP and its congestion control scheme. TCP is one of the core protocols of the Internet protocol suite. Using TCP, applications on networked hosts create connections with one another, over which they exchange data packets or information. TCP guarantees reliable and correctly ordered exchange of information between the transmitter and the receiver: when a connection is established between two nodes, TCP transmits the information in the same order and sequence. Before transmission the data is divided into segments of equal size, and this segmentation is done before the TCP layer. To make sure that no packet is lost, TCP gives a sequence number to each data packet; this sequence number also makes sure that the data is delivered in the correct order at the other end.
2.1 Round Trip Time Calculation and Congestion Control
The time between when a segment is transmitted and when its acknowledgement is received is called the round trip time for that segment. It is very important that the control scheme has a good estimate of this time, so that it does not create any chance of overlap or corruption of data; it is equally important for congestion control. The congestion control is a sliding-window flow control, and the flow control scheme is self-clocking: when the transmitter gets an ACK signal from the receiver, it submits the next segment for transmission. The transmitter adjusts the congestion window depending on the state. At the beginning of the transmission, when there are no acknowledgements to receive, the scheme is in the slow start mode. Once the transmission is running smoothly, the state changes to the congestion avoidance state. If a packet is lost, the state goes to fast retransmit and fast recovery, or back to the slow start mode. Each time the window is updated the scheme checks whether it is possible to transmit segments or not. This is determined by the value of a counter that compares the expected rate and the actual rate: the expected rate is the estimated transmission rate for a non-congested connection, and the actual rate is the estimated current transmission rate. Both rates are estimated every time a segment is transmitted; as all segments have equal size, this is calculated straightforwardly. There are three states which determine the status of the transmission.
a) Slow start
The purpose of slow start is to ramp up the transmission rate at the start, when no acknowledgement signals have yet come back from the receiver. It works as follows: first the congestion window is set to 1 segment; after that it is increased by doubling its value at every segment transmission.
Fast retransmit and fast recovery
This scheme is used to retransmit the data or to recover the lost data. It does the functionality the by adjusting the counter’s value which is called “sstresh”. “sstresh” is initially set ½ of the congestion
3.0 TRANSMISSION PROCEDURE
The transmission procedure for TCP/IP over Bluetooth is described using the piconet topology with a sender and a receiver; you may think of them as a PC and a server attached to each other. To understand the transmission process, consider the Bluetooth protocol stack. We now discuss how segments are transmitted and received with the help of the figure below. Initially the sender's application layer generates segments of data. These segments are passed to the TCP layer, where a TCP header is added to each segment. From the TCP layer the segment is passed to the IP layer. IP stores the segments and transmits them to the Logical Link Control and Adaptation Protocol (L2CAP) layer at regular intervals. At L2CAP the segments are divided into Bluetooth data packets and passed to the Bluetooth baseband layer, and the baseband transmits the packets. Packets may be lost during transmission because of the bit error rate (BER); the errors are modelled by a constant packet error probability (PEP). When the receiver receives a packet of data at its baseband layer, it sends the packet to the L2CAP layer of the stack. The job of the L2CAP layer is to reassemble the packets into a TCP/IP segment and send it to the network (IP) layer. The IP layer receives the segment and transfers it to the TCP layer. At this point TCP sends an acknowledgement for the segment, and the segment is passed to the application layer.
Fig. 4. Transmission of packets: on each device the data passes through the application layer, the transport layer (TCP), the network layer (IP), the L2CAP layer (data link) and the baseband (physical), and the devices communicate over the radio channel.
3.1 Data Packet Format

Table 1. Parameters and values
Parameter                   Value
Data segment size           1429 bytes
Maximum receiver window     12 segments
Buffer size on Bluetooth    15 kbytes
Delay on TCP layer          1 microsecond
Delay on IP layer           1 microsecond
Delay on L2CAP layer        1 millisecond
Delay on Baseband layer     1 millisecond
A data packet received by the Bluetooth layer consists of the following three parts: 1. the TCP header, 2. the IP header, 3. the payload. L2CAP then adds 4 bytes for the channel identification and the length of the packet.
The size of the TCP header is normally 32 bytes, the IP header is 20 bytes and the payload is 1429 bytes, so the total packet size is 32 + 20 + 1429 + 4 = 1485 bytes. There are also certain delays which need to be taken into account: for example, when a packet of data is passed to TCP it is delayed by the amount of time which allows TCP to add the TCP header; similarly the IP header is added, which takes a fraction of a second. So every time data is passed from one layer to another, it is stored in a buffer for a certain amount of time.
4.0 THE ARRIVAL PROCESS
How the application layer delivers data to the TCP layer is determined by the arrival process. There are two cases. The first is when TCP always has data to send, which means we want to check the maximum throughput, so the arrival rate of the data is high enough to keep the queue of the TCP layer full. The second case is when we assume a more bursty application process, modelled by an Interrupted Bernoulli Process (IBP). Arrivals occur according to a Bernoulli process during what is called the active state; this period is followed by an idle state during which there are no arrivals. When the process is in the active state it stays in the active state with probability 1-p, or goes to the idle state with probability p. If the process is in the idle state it stays in the idle state with probability 1-q, or goes to the active state with probability q. During the active state a slot contains a packet. The time slots of the traffic generator are aligned with the time slots of the piconet being modelled.
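A minimal sketch of the interrupted Bernoulli arrival process described above; the values of p and q are illustrative assumptions, and each active slot is taken to carry a packet.

// Two-state (active/idle) traffic generator: stay active with probability 1-p,
// leave the idle state with probability q, and emit a packet in every active slot.
import java.util.Random;

public final class IbpTrafficGenerator {
    public static void main(String[] args) {
        double p = 0.2, q = 0.4;                 // example transition probabilities
        boolean active = true;
        Random rnd = new Random(1);
        int packets = 0, slots = 1000;
        for (int slot = 0; slot < slots; slot++) {
            if (active) packets++;               // an active slot contains a packet
            active = active ? (rnd.nextDouble() >= p)   // stay active with prob 1-p
                            : (rnd.nextDouble() <  q);  // leave idle with prob q
        }
        System.out.println("offered load = " + (double) packets / slots);
    }
}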
5.0 SECURITY ARCHITECTURE
The Bluetooth security architecture is defined in terms of security modes. There are three different modes of security.
- Security mode 1 is non-secure: a device in this mode never initiates any security procedure.
- Security mode 2 provides service-level enforced security: services, using L2CAP, decide whether or not security is required.
- Security mode 3 provides link-level enforced security: a device in this mode initiates security procedures before a connection is established.
Since TCP guarantees sequential and reliable transmission, TCP services mostly operate in mode 3. The Bluetooth security architecture is shown in the figure below. The Host Controller Interface (HCI) queries the security manager to find out whether or not to apply authentication to the connection; the user interface is queried by the security manager to get the PIN and to authorise new services. Protocol layers such as RFCOMM and L2CAP query the security manager with access requests. The device database holds information on whether devices are authenticated and authorised; the service database holds information on whether authorisation, authentication and encryption are required for access to the services. It is the service that decides the level of security to be enforced. Security is enforced only when access is requested to a protocol or service that requires it. The protocol or service requests access from the security manager, which looks up the service or protocol in the service database to see what level of security must be imposed. It then looks up the connecting device in the device database to see whether it meets the requirements of the service. If necessary, the security manager enforces authentication and/or encryption, and sends the necessary PIN or authorisation queries to the user interface. Access is then granted or refused; if access is granted the service can be used, and if not it cannot.
Fig.5. Security architecture
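The access decision just described can be sketched as follows (database contents, function names and the example service are hypothetical, not Bluetooth API calls):

    # The security manager consults the service database for the required security
    # level, the device database for the connecting device, and enforces
    # authentication, authorisation and encryption before granting or refusing access.
    SERVICE_DB = {"dial-up": {"authentication": True, "authorisation": True, "encryption": True}}
    DEVICE_DB = {"AA:BB:CC:DD:EE:FF": {"authenticated": False, "authorised": True}}

    def ask_pin(device): return True                       # stand-in for the user-interface PIN query
    def ask_authorisation(device, service): return True    # stand-in for the authorisation query
    def enable_link_encryption(device): print("encryption enabled for", device)

    def request_access(service, device_addr):
        required = SERVICE_DB.get(service, {})
        device = DEVICE_DB.setdefault(device_addr, {"authenticated": False, "authorised": False})
        if required.get("authentication") and not device["authenticated"]:
            device["authenticated"] = ask_pin(device_addr)
        if required.get("authorisation") and not device["authorised"]:
            device["authorised"] = ask_authorisation(device_addr, service)
        if (required.get("authentication") and not device["authenticated"]) or \
           (required.get("authorisation") and not device["authorised"]):
            return False                                   # access refused
        if required.get("encryption"):
            enable_link_encryption(device_addr)            # set up only when the service requires it
        return True                                        # access granted

    print(request_access("dial-up", "AA:BB:CC:DD:EE:FF"))  # True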
It is possible that some services use a connection without encryption and that another service which requires encryption then begins using the same link; in that case encryption is set up for the service which requires it. Other than the link-management messages required to configure security, there is no impact on the bandwidth: the same number of bits is sent on an encrypted link as on an unencrypted link.
6.0 CONCLUSION
Carrying TCP over wireless networks causes degradation in throughput and increased delays, because the flow-control mechanism in TCP reacts to the delays introduced by retransmitting erroneous packets as if the network were congested.
We have looked into the Bluetooth protocol stack and compared the Bluetooth model with the OSI model to gain an understanding of the layered architecture. We have tried to show how Bluetooth handles TCP/IP traffic, and that the Bluetooth wireless ad-hoc network handles this traffic very well: throughput is kept at a high level, and the end-to-end delays caused at different layers by buffering and retransmission are also at an acceptable level.
Measurement-Based Admission Control for Non-Real-Time Services in Wireless Data Networks
Show-Shiow Tzeng and Hsin-Yi Lu
Department of Optoelectronics and Communication Engineering, National Kaohsiung Normal University, Kaohsiung, 802 Taiwan, R.O.C.
Abstract- Non-real-time services are an important category of network services in future wireless networks. When mobile users access non-real-time services, they usually care about two important points: one is whether they will be forced to terminate during their lifetime, and the other is whether the total time to complete their data transfer is within their time tolerance. Low forced termination probability can be achieved through the technique of bandwidth adaptation, which dynamically adjusts the number of bandwidths allocated to a mobile user during the mobile user's connection time. However, there is no connection-level metric to present the degree of the length of the total completion time. In this paper, we describe a quality-of-service metric, called stretch ratio, to present the degree of the length of the total completion time. We design a measurement based call admission control scheme that uses the stretch ratio to determine whether or not to accept a new mobile user into a cell. Extensive simulation results show that the measurement based call admission control scheme not only satisfies the quality-of-service requirement of mobile users (in terms of the stretch ratio) but also highly utilizes radio resources.
I.
INTRODUCTION
Service areas in wireless networks can be divided into cells. In a cell, a mobile user can use one or more bandwidth units (or channels) to access network services. When a mobile user moves from one cell to a neighbor cell, a procedure, called hand-off procedure, is initiated to maintain the mobile user’s communication. If there are sufficient free bandwidths in the neighbor cell such that the bandwidth requirement of the mobile user can be satisfied, the mobile user successfully hand-offs to the neighbor cell; otherwise, the mobile user is forced to terminate. The probability that a mobile user is forced to terminate during the mobile user’s lifetime is called forced termination probability. Mobile users expect wireless networks to provide networks services with low forced termination probability. Many schemes have been proposed to reduce the forced termination probability [1]-[3]. One important category among these schemes is bandwidth adaptation schemes [2]-[11]. The core idea of the bandwidth adaptation schemes is to dynamically adjust the number of bandwidths allocated to a mobile user during the mobile user’s connection time. When a mobile user arrives at a congested cell, the bandwidths allocated to those mobile users already in the cell may need to be reduced in order to satisfy the minimum bandwidth requirement of the arriving user and avoid forcing to terminate the arriving user. The bandwidth adaptation schemes can
enable wireless systems to accommodate more mobile users in a heavy load, but fewer bandwidths are allocated to a mobile user. Real-time services are sensitive to data transfer delay. Adaptive multimedia services are those real-time multimedia services that employ the technique of bandwidth adaptation [4], [6], [8]. When a mobile user suffers a situation in which fewer bandwidths are allocated to it, the data not transferred in time will be dropped, which immediately reduces the mobile user's perceived quality-of-service. In addition, the more bandwidth degradation a mobile user suffers, the lower the quality-of-service the mobile user experiences. Therefore, mobile users require a metric to measure the degree of the bandwidth degradation. To measure the degree of the reduction in the number of bandwidth units allocated to a mobile user, several measurements have been proposed in [18], [19], [20]. In [18], the degradation period ratio (DPR) is proposed to measure the percentage of the time that a mobile user suffers bandwidth degradation during the mobile user's lifetime. The degradation period ratio cannot depict the degree of the degradation in the number of bandwidths allocated to a mobile user. To describe the degree of the bandwidth degradation, the degradation degree (DD) and the degradation ratio (DR) are proposed in [19]. The degradation degree is used to describe the degree of the degradation in the mean number of channels allocated to a mobile user. The degradation ratio is used to represent the percentage of degraded mobile users in a cell. The call admission control with the DD and DR outperforms that with the DPR [19]. However, either the DPR or the DD and DR just partially describe the degree of the bandwidth degradation. An alternative measurement, called degradation area (DA), is proposed to further describe the degree of the degradation [20]. The concept of the DA is to include the degradation period and the degradation degree; that is, the effect of the DA is similar to combining the measurements in [18], [19]. The call admission control scheme with the DA improves system performance further than that with the DD and DR. All the aforementioned measurement metrics are proposed for adaptive multimedia. The aforementioned measurements [18], [19], [20] cannot be directly applied to non-real-time services with bandwidth adaptation schemes. This is because non-real-time services differ from real-time services in the following ways. First,
non-real-time data must be reliably transferred. That is, when a mobile user experiences a situation in which fewer bandwidths are allocated to transfer data, the mobile user does not drop partial data but takes a longer time to transfer the data. Second, non-real-time data is insensitive or moderately sensitive to data transfer delay. A mobile user is not concerned about a situation in which fewer bandwidths are temporarily allocated to it, as long as it can complete its data transfer within a certain time. Third, it is possible for mobile users to use future free bandwidths to compensate for previous bandwidth degradation. In summary, mobile users are not concerned about temporary degradation in the bandwidth, but really care about the total data transfer time instead. Based on the above reasons, we need a new quality-of-service metric to measure the degree of the length of the total data transfer time that mobile users experience at a connection level. In order to quantify the degree of the length of the total transfer time that mobile users experience at a connection level, one measure, called stretch ratio, is employed in this paper. The core idea of the stretch ratio originates from the stretch factor, which has been introduced in job, request or content scheduling to measure the degree of the length of the job processing time [12]-[17]. Given an amount of data that is transferred by a mobile user, the stretch ratio used in this paper for the mobile user is conceptually defined as the ratio of the total time that the mobile user experiences in transferring the data to the total time that the mobile user would spend in transferring the same data on a lightly loaded system in which the bandwidth request of the mobile user is satisfied. The stretch ratio can be used as an important metric to measure the quality-of-service of non-real-time services. A measurement based call admission control which uses the stretch ratio is proposed in this paper. The proposed call admission control scheme keeps the stretch ratio below a pre-determined threshold. Extensive simulation results show that the proposed call admission control scheme provides good quality-of-service in terms of stretch ratio while fully utilizing the limited bandwidths. The rest of this paper is organized as follows. Section II describes the traffic model used in this paper. The quality-of-service metrics for non-real-time services are then explained in Section III. Section IV describes our measurement based call admission control and bandwidth adaptation procedures. Subsequently, the simulation environment and results are described in Section V. Finally, some concluding remarks are presented in Section VI. II.
TRAFFIC MODEL
In this paper, we consider a two-dimensional service area that consists of hexagonal cells. There are B bandwidth units in a cell. A mobile user can employ a different number of bandwidth units to access non-real-time services; besides, mobile users can adjust their bandwidth requirements according to the traffic load in a cell. The number of possible bandwidth units that mobile users can choose is denoted by K.
All the possible bandwidth units that mobile users can choose are denoted by b_1, b_2, ..., b_K, where b_k < b_{k+1} for k = 1, 2, ..., K − 1. Therefore, b_1 denotes the minimum bandwidth requirement of mobile users, and b_K denotes the maximum bandwidth requirement of mobile users. For simplicity, we assume that all mobile users have the same bandwidth request in this paper. The number of bandwidth units that mobile user i requests is denoted by r_i. Since non-real-time mobile users can tolerate pauses during the period of service, the minimum bandwidth requirement of mobile users can be set to zero. III.
THE QUALITY-OF-SERVICE METRICS FOR NON-REAL-TIME SERVICES
As mentioned earlier, forced termination probability is a well-known quality-of-service metric for network services in wireless networks. Besides the forced termination probability, we need another quality-of-service metric which is used to measure the degree of the length of the total completion time that a mobile user experiences. In this section, we describe a quality-of-service metric, namely the stretch ratio, to measure the degree of the length of the total completion time. The definition of the stretch ratio is as follows. Let t_i denote the time instant at which mobile user i builds its connection and starts to transfer data. Let d_i(t) denote the amount of data that mobile user i has transferred during the time period between time t_i and time t; in other words, the total time that mobile user i spends in transferring d_i(t) data is equal to t − t_i. Given an amount of data d_i(t) that mobile user i has transferred at time t, the stretch ratio for mobile user i is defined as the ratio of the total time that mobile user i experiences in transferring the data to the total time that mobile user i would spend in transferring the data on a system in which the bandwidth request of mobile user i is satisfied. The stretch ratio for mobile user i at time t is expressed as follows:

SR_i(t) = (t − t_i) / (d_i(t) / r_i)    (1)
Given that there are n^j(t) mobile users in cell j at time t, the mean stretch ratio for cell j is equal to the average of the sum of all stretch ratios for the mobile users in cell j at time t, which is expressed as follows:

SR^j(t) = (1 / n^j(t)) × Σ_{i=1}^{n^j(t)} SR_i(t)    (2)
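A short worked sketch of equations (1) and (2) (variable names are ours, not from the paper) is:

    # Per-user stretch ratio and the mean stretch ratio of a cell.
    def stretch_ratio(t, t_i, d_i, r_i):
        # (t - t_i) is the elapsed transfer time; d_i / r_i is the time the same data
        # would take if the requested r_i bandwidth units were always granted
        return (t - t_i) / (d_i / r_i)

    def mean_stretch_ratio(users, t):
        ratios = [stretch_ratio(t, u["t_i"], u["data_sent"], u["requested_units"]) for u in users]
        return sum(ratios) / len(ratios)

    # Example: a user that started 100 s ago, sent 300 data units, and requested 6 units per second
    print(stretch_ratio(t=100, t_i=0, d_i=300, r_i=6))   # 100 / 50 = 2.0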
IV.
MEASUREMENT BASED CALL ADMISSION CONTROL AND BANDWIDTH ADAPTATION PROCEDURES
This section first describes the measurement based call admission control procedure proposed in this paper. Since mobile users can be classified into new calls and hand-off calls, the measurement based call admission control procedure includes two sub-procedures: one is a new call admission control sub-procedure which determines whether to admit a
new mobile user or not, and the other is a hand-off call admission control sub-procedure which determines whether to accept a hand-off call or not. Finally, a bandwidth adaptation procedure for allocating bandwidth units among admitted mobile users is introduced.
A. Measurement based call admission control procedure
When a new call arrives at a cell, a new call admission control sub-procedure is initiated. The new call admission control sub-procedure uses two constraints to determine whether or not to admit a new call into a cell. The first constraint is whether the stretch ratio measured in a cell is less than a pre-determined value, whose goal is to keep the stretch ratio measured in a cell under a pre-determined value. The second constraint is whether the number of mobile users in a cell is less than a certain number, N_n, whose objective is to avoid admitting a new call into a congested cell. The reason we need the second constraint is that, besides the stretch ratio, system designers also consider other system parameters, such as the capacity of system equipment and the forced termination probability. When the new call admission control sub-procedure is enabled in a cell, the sub-procedure calculates the mean value of the stretch ratio in the cell and examines the number of mobile users in the cell. If the mean value of the stretch ratio in the cell is less than the predefined value and the number of mobile users in the cell is less than the maximum number of mobile users in the cell, the new mobile user is admitted; otherwise, the new mobile user is rejected.
When a mobile user moves from one cell to an adjacent cell, a hand-off call admission control sub-procedure is initiated to maintain the mobile user's communication. Due to limited radio resources, the maximum number of mobile users that are allowed to transfer data simultaneously in a cell is also limited; in this paper, it is denoted by N_h. The hand-off call admission control sub-procedure deals with the hand-off call according to the number of mobile users in the adjacent cell, which can be classified into three cases. In the case that the number of mobile users in the adjacent cell is less than the maximum number of mobile users served simultaneously in a cell, N_h, the hand-off call continues its communication. Otherwise, the hand-off call cannot be served immediately, and the hand-off call admission sub-procedure attempts to place the hand-off call's request into a queue in the adjacent cell. If the hand-off call can tolerate pauses (that is, the minimum bandwidth requirement of the hand-off call is zero), then the hand-off request can be placed into a queue. Let Q denote the maximum number of hand-off requests accommodated in a queue. Then, in the case that (i) the number of hand-off requests in the queue is less than the maximum number of hand-off requests
that can be accommodated in the queue, Q, and (ii) the minimum bandwidth requirement of the hand-off call is equal to zero, the hand-off request of the hand-off call is placed into the queue of the adjacent cell; otherwise, the hand-off call is forced to terminate. If the hand-off request of a hand-off call is placed in a queue, the hand-off call waits for released bandwidths to be served. If there are two or more hand-off requests in the queue, the requests are served in first-in-first-out order. The measurement based call admission control procedure which processes the situation in which a mobile user arrives in cell j at time t is summarized in Table 1.
B. Bandwidth adaptation procedure
A bandwidth adaptation procedure is invoked when (i) a mobile user is admitted into a cell or (ii) a mobile user departs a cell. In the case that a mobile user is accepted into a cell, the mobile user may temporarily stay in a queue or immediately be allowed to use bandwidths to transfer data. If the mobile user stays in a queue, the bandwidth adaptation procedure does not allocate any bandwidths to the mobile user and the bandwidth adaptation procedure terminates; otherwise, the following bandwidth allocation operation among m mobile users is enabled. Given m mobile users that are allowed to immediately use bandwidths to transfer data in a cell, if the sum of the maximum bandwidth requirements of the m mobile users is less than or equal to the total bandwidth B, then the maximum bandwidth requirements of the m mobile users are satisfied; otherwise, the bandwidth allocation operation initially assigns ⌊B/m⌋ bandwidth units to each of the m mobile users. Next, we randomly select B − ⌊B/m⌋ × m users from the m mobile users, and each selected mobile user is allocated one of the remaining free bandwidth units in the cell. Next, we consider the case that a mobile user departs a cell. If the mobile user stayed in a queue before departure, the bandwidth adaptation does not perform any bandwidth allocation operation after the mobile user departs the cell; otherwise, the bandwidth allocation operation described in the last paragraph is applied to those mobile users remaining in the cell.
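A condensed sketch of the two admission checks and of the bandwidth allocation operation just described (not the authors' simulator; parameter and function names are ours) is:

    import random

    def admit_new_call(mean_stretch_ratio, num_users, sr_threshold, n_n):
        # new call: accepted only if the measured mean stretch ratio is below the
        # pre-determined value and the cell holds fewer than N_n users
        return mean_stretch_ratio < sr_threshold and num_users < n_n

    def admit_handoff(num_active, queue_len, min_req, n_h, q_max):
        if num_active < n_h:
            return "serve"
        if min_req == 0 and queue_len < q_max:     # pause-tolerant calls may wait in the queue
            return "queue"
        return "terminate"

    def allocate_bandwidth(users, B, b_max):
        if len(users) * b_max <= B:
            return {u: b_max for u in users}       # every user gets its maximum request
        base = B // len(users)
        alloc = {u: base for u in users}
        for u in random.sample(users, B - base * len(users)):
            alloc[u] += 1                          # hand out the remaining free units at random
        return alloc

    print(allocate_bandwidth([f"u{i}" for i in range(7)], B=30, b_max=6))  # five users get 4 units, two get 5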
Fig. 1. A two-dimensional simulation space with 6×6 cells
V.
SIMULATION ENVIRONMENT AND RESULTS
In this section, extensive simulations are conducted to study the performance of the proposed measurement based call admission control with the stretch ratio and that without the stretch ratio. First, we describe the simulation environment, and then simulation results are presented and discussed.
A. Simulation environment
Ideally, we need a simulation environment which is a large two-dimensional space consisting of hexagonal cells. However, it is time consuming to simulate a large two-dimensional space. Therefore, we simplify the simulation environment to a two-dimensional space which consists of 6 × 6 cells. Besides, the edges of the two-dimensional space wrap around to the opposite edges, as shown in Fig. 1. We also assume that the simulation environment is a homogeneous system. New mobile users arrive at a cell according to a Poisson process with mean rate λ_n. All cells have the same arrival rate. When a mobile user uses the maximum number of bandwidths b_K to transfer data during its entire connection period, the lifetime that the mobile user experiences is referred to as the minimum lifetime and is assumed to be exponentially distributed with mean 1/μ_n. The duration that a mobile user sojourns in a cell is exponentially distributed with mean 1/μ_h. When a mobile user attempts to hand-off, the mobile user hand-offs to each of the adjacent cells with equal probability.
B. Simulation results
In order to study the performance of the proposed measurement based call admission control scheme, we describe a comparative call admission control scheme. The comparative admission control scheme differs from the proposed measurement based call admission control scheme in that the comparative admission control does not use the measured stretch ratio but merely uses a pre-defined value N_n to determine whether or not to admit mobile users. When the measurement based call admission scheme is compared with the comparative admission control scheme, we can observe the effect of the stretch ratio upon wireless systems.
Fig. 2. Stretch ratio
The parameters used in the simulation results are described as follows. The number of bandwidth units in a cell, B, is 30. The number of possible bandwidth units that mobile users can choose, K, is equal to seven, and the values of all the possible bandwidth units that mobile users can choose, b_1, b_2, ..., b_7, are 0, 1, 2, 3, 4, 5 and 6 respectively. The value of N_n is set to 30. N_h, the maximum number of mobile users which can transfer data simultaneously in a cell, is equal to 30. The maximum number of hand-off requests accommodated in a queue, Q, is equal to 15. The mean of the minimum lifetime of a mobile user, 1/μ_n, is equal to 120 seconds. The mean duration that a mobile user stays in a cell, 1/μ_h, is 60 seconds. For each simulation run, 10^6 calls are collected to produce the results.
Figs. 2-5 show the performance of the systems with the proposed measurement based call admission control scheme and the system with the comparative scheme. The values of the stretch ratios used in the measurement based call admission control scheme are respectively 2 and 4. Fig. 2 shows the stretch ratios of the systems with the proposed measurement based call admission control scheme and the system with the comparative scheme. From the figure, we can observe the following phenomena:
1) When the system load is light, the stretch ratios in the systems with and without the measurement based call admission control are nearly equal to one. This is because all the systems almost satisfy the bandwidth requirement that mobile users request in a light load.
2) When the system load increases from light load to load 1.0, all the stretch ratios increase rapidly. The reason is explained as follows. When more mobile users are admitted into a cell, mobile users will be allocated fewer bandwidths. Fewer bandwidths result in longer service times, which further leads to more mobile users staying in a cell. The above process continues to repeat until the call admission control mechanism starts to block new mobile users.
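The simulation parameters listed above, collected in one place, together with a sketch of how the random quantities could be drawn (a simplification of the authors' setup; the names are ours):

    import random

    PARAMS = {
        "B": 30,                      # bandwidth units per cell
        "K": 7,                       # number of selectable bandwidth levels
        "bandwidth_levels": [0, 1, 2, 3, 4, 5, 6],
        "N_n": 30,                    # new-call admission cap on users per cell
        "N_h": 30,                    # max users transferring simultaneously in a cell
        "Q": 15,                      # hand-off queue capacity
        "mean_min_lifetime_s": 120.0, # 1 / mu_n
        "mean_sojourn_s": 60.0,       # 1 / mu_h
    }

    def sample_call(arrival_rate_per_s, rng):
        inter_arrival = rng.expovariate(arrival_rate_per_s)                   # Poisson new-call arrivals
        min_lifetime = rng.expovariate(1.0 / PARAMS["mean_min_lifetime_s"])   # exponential minimum lifetime
        sojourn = rng.expovariate(1.0 / PARAMS["mean_sojourn_s"])             # exponential cell residence time
        return inter_arrival, min_lifetime, sojourn

    print(sample_call(arrival_rate_per_s=0.5, rng=random.Random(1)))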
Fig. 3. Mean number of occupied bandwidths for an admitted call
Fig. 4. Bandwidth utilization
3) The system with the proposed measurement based call admission control scheme keeps the stretch ratio of the system around the pre-determined value, 2 or 4, in a heavy load. This is because the measurement based call admission control scheme timely uses the measured stretch ratio to control the number of mobile users in a cell such that the stretch ratio is kept below the pre-determined value.
4) The system with the comparative call admission control scheme cannot keep the stretch ratio below the pre-determined value, 2 or 4. However, we observe that the stretch ratio is around six when the system load is heavy. The reason is as follows. Although the system with the comparative call admission control scheme does not use the measured stretch ratio to determine whether or not to accept a new mobile user, the system uses another constraint that employs N_n to control whether or not to accept new calls. The value of N_n is set to 30; that is, when the number of mobile users in a cell is greater than or equal to 30, new mobile users are not admitted anymore. In such a heavy-load situation, each mobile user is on average allocated one bandwidth unit, and the stretch ratio in this situation is equal to six.
Fig. 3 shows the mean number of occupied bandwidths for an admitted call. From the figure, we can observe four phenomena.
1) Mobile users use approximately six bandwidth units in a light load. This is because there are sufficient free bandwidths in a light load and mobile users can use their maximum bandwidth requirement (i.e. six bandwidth units) to transfer data.
2) The system using the measured stretch ratio assigns more bandwidth units to an admitted call than that without using the measured stretch ratio. This is because the measurement based call admission control scheme uses the measured stretch ratio to restrict the number of mobile users in a cell. The restriction results in fewer mobile users being admitted into a cell and more bandwidth units being assigned to an admitted mobile user.
3) The number of bandwidths allocated to a mobile user decreases when the Erlang load approaches 1.0. The reason is as follows. When more new mobile users are admitted into a cell, the sum of the full bandwidth requirements of the admitted mobile users exceeds the total bandwidth available in the cell, which results in fewer bandwidths being allocated to each mobile user.
4) When the Erlang load is greater than 1.0, the mean number of bandwidths allocated to a mobile user remains constant. The reason is that at heavy load, the number of mobile users accommodated in a cell reaches the maximum number that the measurement based call admission control scheme or the comparative scheme can accommodate.
Fig. 4 shows the bandwidth utilization of wireless systems. From the figure, we can observe that utilization increases with increasing load when the Erlang load is below 1.0. This is because more mobile users are accepted and served. From the figure, we can also observe that the utilization is nearly one when the Erlang load is around or greater than 1.0, which means that all bandwidths in a cell are allocated to the mobile users in the cell. Fig. 5 shows the forced termination probability. From the figure, we can observe two phenomena.
Fig. 5. Forced termination probability
1) In a heavy load, the measurement based call admission control scheme produces a lower forced termination probability than the comparative scheme. The reason is as follows. The measurement based call admission control scheme differs from the comparative scheme in the first constraint used in the measurement based call admission control procedure. The constraint further limits the number of mobile users accommodated in a cell at a heavy load, which keeps the cell from entering a congested state.
2) The measurement based call admission control scheme and the comparative scheme produce a low forced termination probability even though the Erlang load is greater than 1.0 but less than 1.5. This is because the technique of bandwidth adaptation can dynamically adjust the bandwidth requirement of mobile users to reduce the forced termination probability. In addition, the queue that accommodates hand-off requests also decreases the forced termination probability.
VI. CONCLUSIONS
In this paper, we study the quality-of-service of non-real-time services in wireless data networks. We propose a quality-of-service metric, stretch ratio, for non-real-time services at a connection level in wireless networks. We devise a measurement based call admission control scheme that uses two constraints to determine whether or not to accept new calls. The first constraint is whether the stretch ratio measured in a cell is less than a pre-determined value; the second constraint is whether the number of mobile users in a cell is less than a threshold number. Extensive simulations are conducted to evaluate the performance of the proposed measurement based call admission control scheme. Simulation results show that the proposed measurement based call admission control scheme satisfies the quality-of-service requirement of mobile users, in terms of the stretch ratio. In addition, the proposed scheme also highly utilizes radio resources.
REFERENCES
[1] I. Katzela and M. Naghshineh, "Channel assignment schemes for cellular mobile telecommunication systems: a comprehensive survey," IEEE Personal Communications, vol. 3, no. 3, pp. 10-31, June 1996.
[2] D. Niyato and E. Hossain, "Call admission control for QoS provisioning in 4G wireless networks: issues and approaches," IEEE Network, vol. 19, no. 5, pp. 5-11, Sept.-Oct. 2005.
[3] M. H. Ahmed, "Call admission control in wireless networks: a comprehensive survey," IEEE Communications Surveys & Tutorials, vol. 7, no. 1, pp. 49-68, First Qtr. 2005.
[4] M. Naghshineh and M. Willebeek-LeMair, "End-to-end QoS provisioning in multimedia wireless/mobile networks using an adaptive framework," IEEE Communications Magazine, vol. 35, no. 11, pp. 72-81, Nov. 1997.
[5] V. Bharghavan, K. Lee, S. Lu, S. Ha, J. Li, and D. Dwyer, "The timely adaptive resource management architecture," IEEE Personal Communications Magazine, vol. 5, no. 4, pp. 20-31, Aug. 1998.
[6] P. M. Ruiz, J. A. Botia, and A. Gomez-Skarmeta, "Providing QoS through machine-learning-driven adaptive multimedia applications," IEEE Transactions on Systems, Man and Cybernetics, Part B, vol. 34, no. 3, pp. 1398-1411, June 2004.
[7] J.-Z. Sun, J. Tenhunen, and J. Sauvola, "CME: a middleware architecture for network-aware adaptive applications," Proc. IEEE PIMRC 2003, vol. 1, pp. 839-843, Sept. 2003.
[8] D. Wu, Y. T. Hou, and Y.-Q. Zhang, "Scalable video coding and transport over broadband wireless networks," Proceedings of the IEEE, vol. 89, no. 1, pp. 6-20, Jan. 2001.
[9] G. Bianchi and A. T. Campbell, "A programmable MAC framework for utility-based adaptive quality of service support," IEEE J. Selected Areas in Communications, vol. 18, no. 2, pp. 244-255, Feb. 2000.
[10] B. Li, L. Li, B. Li, K. M. Sivalingam, and X.-R. Cao, "Call admission control for voice/data integrated cellular networks: performance analysis and comparative study," IEEE J. Selected Areas in Communications, vol. 22, no. 4, pp. 706-718, May 2004.
[11] D. Niyato and E. Hossain, "A novel analytical framework for integrated cross-layer study of call-level and packet-level QoS in wireless mobile multimedia networks," IEEE Transactions on Mobile Computing, vol. 6, no. 3, pp. 322-355, Mar. 2007.
[12] M. Bender, S. Chakrabarti, and S. Muthukrishnan, "Flow and stretch metrics for scheduling continuous job streams," Proc. ACM Symp. Discrete Algorithms, pp. 270-279, 1998.
[13] S. Ganguly, M. Chatterjee, and R. Izmailov, "Non-real-time content scheduling algorithms for wireless data networks," IEEE Transactions on Computers, vol. 55, no. 7, pp. 893-905, July 2006.
[14] L. Becchetti, S. Diggavi, S. Leonardi, A. Marchetti, S. Muthukrishnan, and T. Nandagopal, "Parallel scheduling problems in next generation wireless networks," Proc. ACM Symp. Parallel Algorithms and Architectures, pp. 238-247, Aug. 2002.
[15] N. Joshi, S. Kadaba, S. Patel, and G. Sundaram, "Downlink scheduling in CDMA data networks," Proc. ACM Mobicom, pp. 179-190, 2000.
[16] D. Verma, Content Distribution Networks: An Engineering Approach, first ed., John Wiley & Sons, 2001.
[17] H. Zhu, H. Tang, and T. Yang, "Demand-driven service differentiation for cluster-based network servers," Proc. IEEE INFOCOM, pp. 679-688, 2001.
[18] T. Kwon, Y. Choi, C. Bisdikian, and M. Naghshineh, "Call admission control for adaptive multimedia in wireless/mobile networks," Proc. 1st ACM International Workshop on Wireless Mobile Multimedia, pp. 111-116, 1998.
[19] Y. Xiao, C. L. P. Chen, and B. Wang, "Bandwidth degradation QoS provisioning for adaptive multimedia in wireless/mobile networks," Computer Communications, vol. 25, no. 13, pp. 1153-1161, 2002.
[20] Y. Xiao, H. Li, C. L. P. Chen, B. Wang, and Y. Pan, "Proportional degradation services in wireless/mobile adaptive multimedia networks," Wirel. Commun. Mob. Comput., vol. 5, no. 2, pp. 219-243, 2005.
A Cooperation Mechanism in Agent Organization
W. Alshabi1, S. Ramaswamy2, M. Itmi1, H. Abdulrab1
LITIS Laboratory, INSA – Rouen, 76131 Mt. St. Aignan Cedex, Rouen, France. {waled.alshabi, mhamed.itmi, habib.abdulrab}@insa-rouen.fr 2 Computer Science Department, Univ. of Arkansas at Little Rock, AR 72204 USA. [email protected]
Abstract- Current research in autonomous agents and multi-agent systems (MAS) has reached a level of maturity sufficient for MAS to be applied as a technology for solving problems in an increasingly wide range of complex applications. Our aim in this paper is to define a simple, extendible and formal framework for multi-agent cooperation, over which businesses may build their business frameworks for effecting cooperative business strategies using distributed multi-agent systems. It is a simple, fair and efficient model for orchestrating effective cooperation between multiple agents.
Keywords: Cooperation, coordination, multi-agent systems (MAS), organization.
I.
INTRODUCTION
Cooperation and coordination are interconnected: cooperation is essential for all successful coordination [1, 2]. While many technologies today have worked on coordination strategies, most implementations are at best ad-hoc or poorly adaptable and scalable. Similarly, in the systems we have observed so far, cooperation is also at best ad-hoc in the way it is developed. Hence, in this paper we attempt to define a simple and formal framework for multi-agent cooperation, over which businesses may build their framework for effecting cooperative business strategies using distributed multi-agent systems. If there is no cooperation, entities (agents) will only realize business opportunities that they have a priori knowledge of, or whose clients happen to find them. However, it is difficult to rapidly grow such a business in a competitive market space, because of:
- the effort involved in finding new business opportunities;
- making sure their customers are always satisfied;
- improving their business growth opportunities.
Cooperation helps business evolution and growth. Simple, fair and efficient cooperation techniques are fundamental to building efficient coordination mechanisms. In our work, we have focused on the development of a simple, fair and efficient model for effecting cooperation between multiple agents.
- It is simple, because it is quite intuitive in its approach and is computationally tractable, so that it can be easily adapted and applied across multiple domains with different types of constraints with respect to issues such as bandwidth, etc.
- It is fair: each participating agent (if it can provide a needed service) gets a fair and equal bidding opportunity on incoming jobs.
- It is efficient, because it is pragmatic in the way that cooperation is designed and orchestrated. By decoupling overarching business and policy expectations from the solution design, we provide a very efficient mechanism not just for implementing cooperation, but also for reasoning about issues that stifle cooperation (this is due to the DAB, the Degree of Agent's Believability, which will be explained subsequently). In addition, within this framework, we also relegate the responsibility for learning to the individual agents, hence allowing agents to evolve independently of one another. Thereby individual agents are not bound by predefined learning strategies, and agents may choose to use any strategy as well as build locally driven heuristics for learning and for responding to a request for bids. Over time, since both their success factors and DAB are used in the bid evaluation process, agents representing entities that do not positively cooperate for the mutual benefit of the entire group will be evidently noticeable. The technique proposed in this paper enforces very minimalistic global control policies, while at the same time allowing maximal control over bid decisions within each of the individual agents. This is also supported by our decision to decouple the dynamic execution hierarchy (in real time) from the static business hierarchy (for organizational needs) within the proposed cooperation model. This paper is organized as follows: the second section presents a brief survey of architectures and frameworks for cooperative MAS. The third section introduces our hierarchical model, the notion of the CPS (Cooperative Problem Solving) process as well as the different steps which compose our CPSP (Cooperative Problem Solving Process). The fourth section specifies the choice mechanism that allows flexible and efficient cooperation between multiple agents. The fifth section gives an illustrative example. In the sixth section, we discuss the developed model. Finally, the seventh section concludes the paper. II.
BACKGROUND
Cooperation is a key MAS concept [3-6]. Durfee and colleagues [7] have proposed four generic goals for agent
cooperation: (i) increase the rate of task completion through parallelism; (ii) increase the number of concurrent tasks by sharing resources (information, expertise, devices, etc.); (iii) increase the chances of task completion by duplication, possibly using different modes of realization; (iv) decrease interference between tasks by avoiding negative interactions. However, cooperation in agent-based systems is at best unclear and at worst highly inconsistent [8]. Researchers such as Galliers [9-10] and Conte [11] underlined the importance of agents adopting a common goal for cooperation, which they consider an essential element of social activity. We can characterize a MAS by the type of cooperation implemented, which can range from total cooperation to total antagonism [12]. Completely cooperative agents can change their goals to meet the needs of other agents. Antagonistic agents, on the other hand, will not cooperate, and their respective goals may be blocked.
A. Architectures, Frameworks for Cooperative MAS
Bond [13] describes the existence of two types of MAS architectures: (i) Horizontal: this structure is useful in some contexts, for example a situation where a group of agents have different (non-overlapping) capabilities and hence can work towards the goal without needing any conflict resolution. Here all the agents are on the same level with equal importance, without a master/slave relationship. (ii) Vertical: in a vertical architecture, the agents are structured in some hierarchical order. Agents at the same sub-level may share the characteristics of a horizontal structure. The 'horizontally structured' MAS model has several issues: a critical one is that it quickly becomes too complex and unwieldy for practical applications, wherein agents in the MAS may share some common capabilities. Hence most current frameworks have adopted a hierarchical (vertical) MAS model by organizing the agents in some organizational structure. In [14] we compared three widely used models for agent cooperation in MAS: HOPES, HECODES [15] and MAGIC [16-17]. We also highlighted the shortcomings and limitations of these models from our viewpoint. III.
A HIERARCHICAL MODEL
We have developed a hierarchical MAS model [14, 18-19] focused on enabling effective cooperation. The choice of a hierarchical model was essentially to overcome the limitations of prior architectures and to avoid certain inconveniences that appear in classical multi-agent models. For instance, the number of links between agents and the quantity of information exchanged between them grow quickly with the number of agents involved in the interactions, and therefore become a serious handicap in such models. Additionally, other problems can also appear, such as coordination and control [20-23]. In our model, agents are autonomous entities having control of their own resources as well as being bestowed with competences which allow them to cooperate, communicate and work with other such agents. Each agent is capable of providing specific solutions and of resolving local problems autonomously.
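A minimal sketch of the organization just described (class names are ours; skill labels are taken from the example later in the paper) could be:

    # Autonomous agents characterized by their skills, grouped under a coordinator agent.
    from dataclasses import dataclass, field

    @dataclass
    class Agent:
        name: str
        skills: set = field(default_factory=set)

        def can_realize(self, required_skill: str) -> bool:
            return required_skill in self.skills

    @dataclass
    class Group:
        coordinator: Agent
        professionals: list = field(default_factory=list)

        def competent_agents(self, required_skill: str):
            return [a for a in self.professionals if a.can_realize(required_skill)]

    group = Group(coordinator=Agent("C1", {"coordination"}),
                  professionals=[Agent("A1", {"S1", "S2", "S3"}), Agent("A2", {"S1", "S4", "S5"})])
    print([a.name for a in group.competent_agents("S3")])   # ['A1']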
Agents can either provide, or need, the assistance of other agents, due to a lack of information, resources, etc., to accomplish a particular task. This need for assistance from other agents can appear when an agent either does not have the necessary skills that allow the most effective realization of a specific task, or when an agent prefers a cooperative solution. The skill of an agent is an important notion that characterizes the agents in our model. Each agent may have one or several specific skills at the same time. These skills may allow each agent to have specific roles. Agents can also be divided into several groups. Each group of agents consists of several cooperative member agents, called professional agents, and a superior member, called the coordinator agent. The following sections briefly describe the functionality of our CPS process model.
A. Hierarchical model functionality
In this section, we give an overview of the model's functionality as well as its four phases. These phases are explained in a detailed manner in a related article [18]. The CPS phases of our model include:
- Recognition: The CPS process begins when an agent, or a group of agents, identifies the need for cooperative action. This need for cooperative action may occur when an agent has to accomplish a goal for which it does not have the necessary capability, or when the agent has a preference for a cooperative solution. The identification of the need for cooperative action is the recognition phase.
- Skills' Search: In this phase, the agent that identifies the need for cooperative action solicits appropriate assistance. This solicitation is realized through a special process named the skills' search process (SSP). The SSP searches for agents which have the necessary skill(s) to realize the task. The SSP is explained in detail in [18].
- Agents' Choice: We concentrate on this particular aspect in this paper. During this phase, the agent(s) which possess the necessary skills to realize the specific task, say T, are identified as competent agents for accomplishing the needs of the customer. These agents are contacted by an initiator agent, which attempts to negotiate with the set of competent agents to choose the best agent for accomplishing T. This process is often repetitive, since the chosen agent can repeat the same process independently for a set of tasks. This phase allows us to form the initial cooperation structure [24-25], termed a collective.
- Execution: In this phase, the members of the collective realize the roles they have been negotiated to perform and provide feedback that can be used to judge the quality of service provided in accomplishing the customer's request. The initiator is then able to provide a performance evaluation of the ability of the agents in the collective to perform the job as negotiated before the actual task assignments.
IV. OUR CHOICE MECHANISM
In the prior section we mentioned the four phases that constitute the CPSP of our model. As the first two phases have
been widely treated in a related article [9], this section concentrates on the choice mechanism. The objective of this process is to allow appropriate choices among agents that have the required skills to accomplish a necessary task (also referred to as an operation). Before delving into the details of this process, let us recap and clarify the result of the search mechanism, discussed previously, which can be one of the following:
- No agent possesses the required skills. In this case the initiator agent does not have any possibility of choice, due to the non-existence of candidates to perform the required task.
- One agent possesses the required skills. In this case the initiator agent finds only one competent agent at the end of the search process. As a result, the initiator agent does not have a real choice to make, because there is no more than one available candidate for the task.
- Several agents possess the required skills. In this case the initiator agent finds, at the end of the search process, several competent agents that simultaneously have the needed skills. For that reason, the initiator agent needs a negotiation as well as a choice mechanism that allows it to choose amongst the available candidates.
The need for the choice mechanism emerges when several agents have the needed skill; these agents are the result of the skills' search process, and consequently the initiator needs to select among the competent agents. The selection concerns those who offer the optimal solutions, and for this objective our choice mechanism applies negotiation in order to obtain the optimal solutions. For example, when the initiator agent searches for a skill, all the competent agents (agents who possess the requested skill) reply to the initiator by sending their bids. These agents are considered potential candidates. Consequently, their bids are evaluated by the initiator according to several parameters. This choice of the optimal solution can be realized after evaluating the different bids given by the agents. This evaluation can be realized through a special function defined for this purpose by the user according to the application's domain. Thus, the choice of optimal solution can be made according to many parameters; for instance, it can be the agents who offer the best price or the best time to realize the needed skill. As mentioned earlier, when any agent wishes to have some task performed (by one or more agents), the Initiator agent starts by initiating a search for competent agents. Then, the initiator solicits proposals from these agents by issuing a call for proposals (cfp) act (see [FIPA00037]), which specifies the task, as well as any conditions the Initiator is placing upon the execution of the task. Participants receiving the call for proposals are viewed as potential candidates and are able to generate N responses. The agents may refuse to propose or may respond with proposals to perform the required tasks. The proposal of each agent should contain the agent's bid in addition to its Success Factor (detailed in the subsequent paragraphs). The Initiator evaluates the received proposals and then selects the agent that has the most satisfactory (optimal if
needed, dependent upon the design) proposal to perform the required task. For the realization of a specific task T, the evaluation of the different bids is realized by using the following equation:

[ CI_expected × (FV / SF_Ai) × Bid_Ai^evaluation × DAB_Ai ]_Ai    (1)

This evaluation, using (1), is repeated automatically for each bid. Each parameter in (1) has an important role during the evaluation process:
- CI_expected: We write CI to represent the Cooperation Indicator, which has a fundamental role in defining a new flexible model. The CI indicates the level of desired and expected cooperation. This provides the necessary framework to model both passive and active cooperation, depending on the user's need. The expected value of CI is given by the user as initial data, driven by appropriate business process needs, at initialization. The following figure demonstrates the notion of the cooperation indicator (it can be any of the grey triangles), ranging from increased cooperation with decreased autonomy, through equitable cooperation and autonomy, to increased autonomy with decreased cooperation.
Fig. 1. Agents' Cooperation indicator
- FV: the FV is used to represent the Fairness Value, which is an important parameter for the choice mechanism. FV can be regarded as the optimal value that each agent has to reach as its stated objective, again from an ideal perspective. FV is used to improve the different agents' success metric during a cooperation process. The FV is also given by the user as initial data at initialization.
- SF_Ai: SF signifies a Success Factor. The notion of Success Factor was mentioned briefly earlier. We pointed out that agents having the required skills for a task may respond to the initiator agent's solicitation by sending their bids. These bids are accompanied by a Success Factor for each agent. The value of SF defines the average acceptance rate of each agent's offers. For instance, an SF of 0.4 for agent A_i means that 4 of the 10 offers the agent has responded to have been successful. Thus, for each agent A_i, the Success Factor, called SF_Ai, is calculated separately. This is not the case for the first two parameters (CI and FV), which are predefined for the entire system. Consequently, each agent in our model has the following values, calculated with every transactional event: i) the total number of Requested Bids for agent A_i, called RB_Ai; ii) the total number of Successful Bids received by agent A_i, called SB_Ai. These two parameters are essential to calculate the SF for each agent. For this purpose, we divide the number of Successful Bids for agent A_i by the number of Requested Bids
for this agent. Thus, SF_Ai = SB_Ai / RB_Ai, where the value of the SF for agent A_i lies between 0 and 1, i.e. 0 ≤ SF_Ai ≤ 1. To evaluate the value of SF_Ai resulting from the previous equation, we have to take into consideration two indispensable factors. The first factor is the Fairness Value (FV), which was the subject of the preceding point. The second factor is the Tolerance Value (TV). Both factors are defined for the entire system and set up during initialisation. In the optimal case, the Success Factor for A_i is equal to the given Fairness Value, SF_Ai = FV_initial. In a less advantageous case, the Success Factor for agent A_i is not equal to the given Fairness Value; nevertheless, SF_Ai may lie within a tolerance zone provided by the Tolerance Value, TV_min ≤ SF_Ai ≤ TV_max. Therefore, all agents having an SF that lies within the tolerance zone (defined by TV) are considered agents with a good degree of success within the system. In the worst case, we find agents whose SF is not in the appropriate range. Here we distinguish two positions: agents with an inferior or a superior SF, outside the tolerance range defined by FV and TV. From a cooperation perspective, agents in an inferior position are more worthy of being shown favour than those that have had much success. Tuning of this measure can be done based upon specific applicative needs.
- Bid_Ai^evaluation: As its name indicates, this parameter represents the agent's bid evaluation. The value of this parameter results from evaluating the bid given by agent A_i. This evaluation can be realized through a special function defined for this purpose by the user according to the application's domain. This function should take into consideration the different parameters that are essential and related to the specific needs of the application's domain. These parameters may or may not have the same importance in every application.
- DAB_Ai: Degree of Agent's Believability. Once the selected agent has completed its assigned task, it begins a completion process with the Initiator. As part of the completion process, the Initiator starts an evaluation process, used to arrive at the Degree of Agent Believability (DAB) for the agent that begins the completion process. In this process each agent is evaluated directly at the end of its task execution by its Initiator. The value of DAB for each agent is saved in a matrix devised for this purpose. Access to and modification of the DAB matrix are mandatory and locked for the Initiator agent until the process is completed. Initially, the value of DAB for every agent is set to 1, that is, DAB^0_Ai = 1; the DAB value can then be calculated using the following equation:
DAB_Ai^new = (1/2) × (DAB_Ai^last + B_Ai^initial / B_Ai^delivered)    (2)

where DAB_Ai^last is the last value of DAB_Ai, B_Ai^initial is the evaluation of agent A_i's offer, and B_Ai^delivered is the evaluation of agent A_i's performance (for assigned and accepted task (job) offers). We have to emphasize that equation (1) is used to evaluate the different agents' proposals except when the Initiator has the requested skill. That authorizes the Initiator to
participate fairly in the proposition process. In this case, the evaluation of the Initiator's proposal differs from the evaluation of the other agents' proposals. To evaluate its own proposal the Initiator applies the following equation:

[ (1 − CI_expected) × (FV / SF_Ai) × Bid_Ai^evaluation × DAB_Ai ]_Initiator    (3)

The main distinction between equation (1) and equation (3) lies in the way the first parameter of the equation, which concerns the Cooperation Indicator, is calculated. In equation (1) the value of CI is equal to the expected value of CI (initial data), whereas in equation (3) we deduct the expected value of CI from 1. The objective of this operation is to favour cooperation with other agents. The purpose of having two different behaviours for evaluating the first part of the equation is to make our model flexible and to avoid the problem of rigidity that subsists in many other models. Thus, in systems tuned to offer a higher degree of cooperation, priority is given to the best candidate agent.
V.
EXAMPLE
In order to illustrate the proposed idea we present a simple example. Consider the following scenario: let there be three agents A1, A2 and A3 (the restricted number of agents is just for illustrative purposes). Each of these agents may possess one or several skills at the same time. These skills allow the agent to realize a number of tasks (sometimes called operations), and consequently each agent holds a specific role within the group. Let us further assume that agent A1 possesses skills S1, S2 and S3, agent A2 possesses skills S1, S4 and S5, and agent A3 possesses skills S1, S3 and S6 (Figure 2). Neither the nature nor the functioning of these skills is the subject of our interest in this paper.
Fig. 2. Agents' hierarchy: A1 {S1, S2, S3}, A2 {S1, S4, S5}, A3 {S1, S3, S6}
At the initialization, the user is required to provide the values of some essential parameters. These initial data given by the user are the expected Cooperation Indicator (CI), the Tolerance Value (TV) and the Fairness Value (FV). We suppose that the values given by the user at the initialization are 80% for the CI, 20% for the TV and 0.5 for the FV. The value of CI indicates that the user prefers cooperation among agents (a cooperative solution is preferred at 80%), and the FV of 0.5 means that 5 offers out of every 10 requested should be realized by each agent in this group. Owing to the TV, the FV range can be extended.
Thus, we can consider the value of SF_Ai to be good (satisfactory) if it lies between 0.4 and 0.6, i.e. 0.4 ≤ SF_Ai ≤ 0.6.
In this example, we have the following situation: agent A2 is asked to realize a task T3 that it is incapable of realizing. We say that an agent is capable of realizing a task T if it has the appropriate skills; thus the realization of T3 necessitates possession of the skill S3. In the given example, agent A2 does not possess the required skill, whereas the other two agents in this group (A1 and A3) possess the required skill S3. As a result, agent A2 (the Initiator) will solicit proposals from the other agents that have the required and appropriate skills. This is done by issuing a call-for-proposals (cfp) message to the other agents, as mentioned in section 3. The answers given by agents A1 and A3, after evaluating their capabilities as well as their availability to realize T3, are equal to 38 for A1 and 41 for A3. These values can represent the operation's cost, the operation's duration, etc., depending on the application domain. In order to choose between these two competent agents, we use equation (1), in which the values of two parameters are already given by the user at the initialization: CI = 0.8 and FV = 0.5. We still have to calculate three parameters, DAB, SF and Bid evaluation, for each agent separately. Since we are at the initialization and no previous values exist: - The DAB value is equal to 1 for both A1 and A3; otherwise the DAB value is calculated by equation (2). - If several parameters exist, such as time, cost, etc., the agents' Bid evaluation depends on the importance of these parameters, which varies according to the application domain; for instance, Bid evaluation = 3*Time + 1/Cost. In this example, the Bid evaluation is simply 1 over the bid given by each competent agent;
Bid^evaluation_Ai = 1 / (Ai's offer). Thus, the bid evaluation is 1/38 for A1 and 1/41 for A3. The agents' SF is equal to 0.1 at the initialization; otherwise SF_Ai = SB_Ai / RB_Ai.
According to these values, for agent A2 the offer of agent A1 becomes:

[CI_expected(0.8) × (FV(0.5) / SF_A1(0.1)) × Bid^evaluation_A1(1/38) × DAB_A1(1)]_A1 = 0.104

and the offer of agent A3 becomes:

[CI_expected(0.8) × (FV(0.5) / SF_A3(0.1)) × Bid^evaluation_A3(1/41) × DAB_A3(1)]_A3 = 0.096

From these two equations, we see that agent A2 will favour cooperation with A1, whose result is superior to that of A3. Therefore, in succeeding bids, the value of SF as well as the value of DAB for agent A1 will differ from the values given at the initialization.
Once the agent A1 has finished realizing the task T3 and has sent the results, the initiator (A2) re-evaluates the DAB of agent A1. The purpose of this re-evaluation is to ensure the agent's engagement with respect to its bid. If the initiator agent A2 had possessed the required skill, its bid would have been evaluated by equation (3) instead of equation (1). Thus, the first parameter would be equal to (1 − CI_expected) = 0.2; in this case, even if the initiator possesses the required skill, it is disfavoured.

VI. DISCUSSION
The mechanism described above is flexible and capable of treating both passive and active cooperation. It allows the user to determine the percentage of expected cooperation, through the CI given by the user at the initialization. In the figure below, the dashed arrow represents the expected CI given by the user, which is equal to 0.5.
Fig. 3. Choice mechanism
Our model also allows the user to specify the expected tolerance value. All agents situated within the tolerance zone (defined by TV) are considered satisfactory from the user's point of view. The value of TV given earlier by the user is equal to 20%, i.e. 10% less than 0.5 (the CI's value) and 10% more than 0.5. This TV is represented by the area between the values 0.4 and 0.6 in the previous figure. The curved line in figure 3 represents the desired performance of the agents' cooperation. Thus, we aim to maintain the maximum number of agents within the tolerance zone. All agents whose performance lies outside the tolerance zone will be steered towards this zone. One of the advantages of our mechanism is that it gives an equal opportunity to the different agents. Consequently, it favours cooperation among agents, which improves the global performance.

VII. CONCLUSIONS
In this paper, we have presented a hybrid hierarchical model for cooperative problem solving, which describes all aspects of the cooperation process, from recognition of the potential for cooperation through to execution. A measure of cooperation, the Cooperation Index (CI) has been developed and it indicates the level of desired and expected cooperation among the different agents in the MAS. This provides the
necessary framework to model both passive and active cooperation, depending on the user's needs.
A Test Taxonomy Applied to the Mechanics of Java Refactorings Steve Counsell, Stephen Swift and Rob M. Hierons School of Information Systems, Computing and Mathematics, Brunel University, Uxbridge, Middlesex. UB8 3PH. {Steve.Counsell, Stephen.Swift, Rob.Hierons}@brunel.ac.uk
Abstract. In this paper, we describe the automated production of all interactions between the mechanics of seventy-two refactorings proposed by Fowler in the form of chains. Each chain represents the paths a specific refactoring may follow due to its dependencies on n possible other refactorings. We enumerate all possible chains and then investigate three hypotheses related firstly, to the number of chains generated by specific refactorings and their chain length, secondly, the relevance and applicability of a test taxonomy proposed by van Deursen and Moonen (vD&M) and, finally as to whether certain ‘server’ refactorings exist when those chains are scrutinized. Two of the proposed hypotheses were supported by the data examined, suggesting a far deeper complexity to refactoring inter-relationships than first envisaged. We also investigated the possibility that chains eradicate bad code smells.
I. INTRODUCTION

Much research has addressed the area of refactoring in recent years [4, 6, 7, 8, 9, 10]. An open problem that remains, however, is the likely impact in terms of developer effort that having to follow a specific refactoring and its associated dependencies will require. For example, many of the seventy-two refactorings stated in Fowler's seminal text [8] require the use of at least one other refactoring in order to be completed properly. This makes the task of re-testing after each individual refactoring more involved and convoluted than it might initially appear. The fact that the source (i.e., initial) refactoring may have multiple dependencies implies a chain of refactorings, each link of which requires specific re-test effort. Moreover, each refactoring link in that chain may have its own dependencies. As an example, it is far easier to test that program semantics have been preserved as a result of renaming a method than it is to test the result of having to create two subclasses from a single class - the mechanics of the former are relatively simple. In this paper, we describe the automated generation of all interactions between the seventy-two refactorings proposed by Fowler [8] into chains. Each chain represents the paths a specific refactoring may need to follow due to its dependencies on other refactorings. We describe an analysis of these chains in the context of van Deursen & Moonen's (vD&M) testing taxonomy and of the 282 chains induced by the seventy-
two refactorings. We then test three specific hypotheses related to chain characteristics. We also identified a specific set of 'server' refactorings that appeared more often in chains than others; this result is in keeping with earlier reported empirical data [1]. There are three motivating factors for the work described in this paper. Firstly, a static analysis of Fowler's seventy-two refactorings has proved useful in the past for understanding the complexities associated with refactoring. However, the generation of the complete set of chains from those seventy-two refactorings provides the basis for a much more thorough analysis of the trends and characteristics of each refactoring (and its relationships). Secondly, the link between testing, itself the subject of much work [2, 13], and refactoring permeates throughout Fowler's text. However, surprisingly little work has investigated the empirical link between the two. During our description of the hypotheses and results we will often refer to the research by Advani et al. [1], where the results of an empirical study of the trends across multiple versions of seven Java OSS were described. In that paper, the most common of the fifteen refactorings, coined a 'Gang of Six', were shown to be those with a high in-degree and low out-degree when mapped on a dependency graph. (The dependency matrix that we use in this paper is an alternative and more convenient form of that dependency graph.) In this paper, we explore the intricacies of refactoring inter-relationships, an area acknowledged as a challenge facing the refactoring community [11]. In terms of other relevant related work, an investigation of change metrics as a basis of refactorings is provided in Demeyer et al. [4] and a survey of refactoring in Mens et al. [12]. One observation that could be made about our study is that we shouldn't need to consider the practical or the theoretical implications of refactorings and their chains, since a software tool will handle all the mechanics of a chosen refactoring for us automatically. While it is true that a refactoring tool can be, and has been, of immense use in assisting developers, there are still many of the seventy-two refactorings proposed by Fowler that have yet to be automated. Pragmatically, many of the seventy-two refactorings are exceptionally difficult to automate; equally, some refactorings are more obscure than others
and have not figured in tools as a result, yet play a large part in the mechanics and workings of more high-profile, less obscure refactorings. Orphan refactorings, i.e., those that are neither used by, nor use, other refactorings, fall into this category. A third motivation for the work in this paper is therefore the need to understand the inter-relationships between all seventy-two refactorings and not just a subset.
II. TAXONOMY AND HYPOTHESES

The algorithms to extract each 'chain' used a dependency matrix as a basis. The matrix was developed directly from the mechanics of the 72 refactorings stated by Fowler in [8]. To pre-empt a formal definition, an informal notion of a chain is that: 'a refactoring, let's say X, may use a set of many other refactorings as part of its mechanics. Each of those set elements may, in turn, use many other refactorings, etc., thus forming a tree-like structure, each path through which is a chain'. The algorithm for generating the chains is recursive. Partial chains are initially created containing each of the refactorings; the recursive function is then invoked to follow all possible paths through the dependency matrix until a 'dead end' (i.e., cul-de-sac) occurs, i.e., a situation in which there is no (refactoring) path to follow that is not already in the chain under construction.
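A minimal sketch (ours, not the authors' tool) of this recursive enumeration is shown below. Here deps is assumed to be a boolean dependency matrix in which deps[i][j] is True when refactoring i uses refactoring j in its mechanics; conventions such as whether a dependency-free refactoring counts as a chain may differ from those used in the paper.

    def enumerate_chains(deps):
        # deps: n x n boolean matrix; deps[i][j] means refactoring i uses refactoring j.
        n = len(deps)
        chains = []

        def follow(chain):
            last = chain[-1]
            nxt = [j for j in range(n) if deps[last][j] and j not in chain]
            if not nxt:                      # 'dead end': no unused refactoring to follow
                chains.append(list(chain))
                return
            for j in nxt:                    # follow every possible path
                follow(chain + [j])

        for i in range(n):                   # a partial chain starts at each refactoring
            follow([i])
        return chains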
A. vD&M test taxonomy

The analysis in this paper is supported by a refactoring test taxonomy proposed by van Deursen and Moonen (vD&M) [5] and is based on the influence that a refactoring has on the ability to use the same set of tests 'post-refactoring'. vD&M describe four separate categories into which the seventy-two refactorings of Fowler can be placed. Initially, five categories were described in their paper, including a composite refactoring category. These 'Type A' refactorings were dropped from their analysis on the basis that the four 'Big Refactorings' making up this category were 'performed as a series of smaller refactorings' and could not be analyzed in the same way as the other four categories. The four remaining categories (B-E) are:
1. Compatible (Type B refactorings): do not change the original interface.
2. Backwards Compatible (Type C refactorings): change the original interface and are backwards compatible since they extend the interface.
3. Make Backwards Compatible (Type D refactorings): change the original interface and can be made backwards compatible by adapting the old interface. For example, the 'Move Method' refactoring that moves a method from one class to another can be made backwards compatible through the addition of a 'wrapper' method to retain the old interface.
4. Incompatible (Type E refactorings): change the original interface and are not backwards compatible because they may, for example, change the types of classes that are involved, making it difficult to wrap the changes.
In theory, a developer would always prefer Type B refactorings to Type C, and Type C refactorings to Type D, etc., since increasing amounts of change are required to the test suite with each Type in ascending alphabetical order. In the analysis that follows, we use two 'groups' of refactorings. The first is a combined group comprising two of Fowler's categories: 'Dealing with Generalisation' and the 'Big Refactorings' (16 refactorings in total). Henceforward, we refer to these refactorings as 'Group One' refactorings. The second group comprises the refactorings in Fowler's 'Making Method Calls Simpler' category; we will refer to these refactorings as 'Group Two' refactorings (15 refactorings in total). The two groups were chosen for the study to provide the widest contrast between the seven groups into which the 72 refactorings specified in [8] are split (space prevents a treatment of all seven groups).
B. Three Hypotheses

Hypothesis H1: Refactorings drawn from different categories of refactoring (as defined by Fowler) differ significantly in the number of chains they generate and in their maximum chain length. Hypothesis H1 is based on the belief that the more abstract and high-level a refactoring, the more chains will be generated by it. For example, we would expect inheritance-based refactorings, where we are manipulating classes and methods in a hierarchy, to require the use of a large number of other refactorings and hence to generate large numbers of chains. Conversely, we would expect low-level, concrete refactorings to generate relatively fewer chains, since they tend to be 'used by' other refactorings rather than doing the 'using' themselves. The renaming of a method is one example of the latter, since it may be used by many refactorings but does not, by itself, use many refactorings.

Hypothesis H2: The distribution of Type B, C, D and E refactorings will be such that Type B and C refactorings will dominate Group Two refactorings and Type D and E refactorings will dominate Group One refactorings. Hypothesis H2 is based on the belief that refactorings from which relatively long chains are generated (i.e., more complex refactorings) tend to alter the original test suite to a much greater extent than less complex refactorings (generating smaller chains) because they use a relatively larger number of other refactorings
with far-reaching effects; consequently, we would expect Type D and E refactorings to dominate the chains of Group One refactorings and Type B and C refactorings to dominate Group Two refactorings. Hypothesis H3: There is a small subset of the seventy-two refactorings that appear more frequently in the chains than other refactorings and, as such, the distribution of refactorings is not evenly spread. Hypothesis H3 is based on the belief that there is a set of ‘server’ refactorings that are used by a large number of other refactorings because they provide a service integral to the mechanics of the refactoring using them. For example, it has been shown that ‘Move Method’ (MM) is a frequently applied refactoring, probably because it is used by the mechanics of no less than twelve other refactorings [3]. If the MM refactoring is used frequently, then all chains generated by the MM refactoring will also be used by the original refactoring using MM.
III. DATA ANALYSIS

C. Hypothesis H1

Fig. 1 shows the distribution of the number of chains from the seventy-two refactorings generating at least two chains. The refactoring with the largest number of chains is the 'Extract Hierarchy' refactoring with 35 generated chains. According to Fowler, the motivation for this refactoring is when a class is doing far too much work itself and should be decomposed into a set of classes linked through an inheritance structure. The refactorings with the next two highest numbers of generated chains are the 'Extract Subclass' (17 generated chains) and 'Extract Superclass' refactorings (23 generated chains). Both of these refactorings relate to manipulation of the inheritance hierarchy. In the former case, 'A class has features that are used only in some instances' [8]. A subclass is thus created for that subset of features. In the latter case, 'You have two classes with similar features' [8]; a superclass is created and the common features are moved to that superclass.
Fig. 1: Chains generated by 34 of the 72 refactorings

We remark that 38 of the 72 refactorings generated no chains, implying that they did not 'use' any other refactorings as part of their mechanics. Two other notable refactorings from Fig. 1 are refactorings 33 and 34 – 'Separate Domain from Presentation' and 'Tease Apart Inheritance', respectively. The former of these two refactorings is similar to an implementation of the Model-View-Controller pattern; the latter refactoring creates two inheritance hierarchies from one. Interestingly, these same two refactorings, together with 'Collapse Hierarchy', are all drawn from the 'Big Refactorings' category specified by Fowler [8] and form part of Group One refactorings. The other refactoring in this category is 'Convert Procedural Design to Objects' (this refactoring transforms code written in a procedural style without appropriate objects to an OO style). These were the same four refactorings discarded by vD&M because they were considered too complex to categorize in testing terms. (This explains why there was no Category A in their taxonomy.) Fowler suggests that these four refactorings require a team-wide effort and understanding to be implemented properly rather than being applied by a single developer. Both Fowler and vD&M provide a compelling argument for treating these four refactorings with caution. The feature that bonds all four refactorings together is that they either manipulate the inheritance hierarchy or make fundamental changes to the way the system is architected.

Figs. 2 and 3 show the number of generated chains for refactorings in the Group One and Group Two categories, respectively. As expected, the Group One refactorings, on average, generate far more chains than their Group Two counterparts. The maximum number of chains in Group Two is eight, shared by two refactorings. These are 'Introduce Parameter Object' (a group of parameters are formed into an object) and 'Encapsulate Downcast' (where a return value is cast from 'Object' to an Object Type). Only five of a total of sixteen Group One refactorings generate a single chain; this compares with eleven of the fifteen Group Two refactorings. Table 1 illustrates the differences between the two groups in terms of summary statistics. The median value reveals the extent of the difference between the two groups.

Fig. 2: Chains generated by Group One refactorings
Fig. 3: Chains generated by Group Two refactorings
Table 1. Summary statistics for chains

              No.   Min.   Max.   Med.   Std. Dev.
Total         282   1      7      3      1.35
Group One     16    1      7      4.5    2.60
Group Two     15    1      6      1      2.26
Two chains of length seven are generated from all 72 refactorings. These belong to the 'Decompose Conditional' (DC) and 'Extract Hierarchy' refactorings (a Group One refactoring). The DC refactoring has 7 chains in total (two more of which are of length 6, and four of which are of length 5). The motivation for the DC refactoring is when 'You have a complicated conditional (if-then-else) statement. Extract methods from the condition, then part, and else parts.' [8]. The preceding analysis suggests that there are significant differences in the number of chains generated by different refactorings. The fact that Group One refactorings are high-level refactorings and need to use relatively more refactorings than their Group Two counterparts may be a significant cause of this difference. Correlation of the two groups' values showed non-significant, positive correlations: 0.15, 0.20 and 0.22 for Pearson's (a parametric test assuming a normal distribution), Kendall's and Spearman's coefficients (both non-parametric tests), respectively. Fig. 4 shows the maximum chain length for Group One refactorings, contrasted with the maximum chain length for Group Two refactorings in Fig. 5. There is a clear difference in size between the maximum lengths of chain generated by the two groups. Nine of the sixteen Group One refactorings have maximum chain lengths of four or more, compared with only three in Group Two. The correlation of the two sets of data gave non-significant, positive values of 0.37, 0.26 and 0.24 for Pearson's, Kendall's and Spearman's coefficients, respectively; this reinforces a lack of relationship between the two groups. A Wilcoxon Signed Ranks (non-parametric) test revealed significance at the 5% level (2-tailed), suggesting that the difference between the lengths of chains in Group One and Group Two is not purely coincidental. On the basis of the evidence presented, there is sufficient support for H1.

Fig. 4: Maximum lengths of chains for Group One

Fig. 5: Maximum lengths of chains for Group Two
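As an aside, the correlation and significance tests reported above are standard and can be reproduced with SciPy; the arrays below are placeholders rather than the study's data, and the pairing of the two groups simply mirrors the comparison made in the text.

    from scipy import stats

    group_one = [7, 4, 6, 5, 3, 2, 7, 1, 4, 5, 6, 2, 3, 4, 5]  # placeholder chain lengths
    group_two = [1, 2, 1, 3, 1, 1, 6, 1, 2, 1, 1, 4, 1, 2, 1]  # placeholder chain lengths

    print(stats.pearsonr(group_one, group_two))    # parametric, assumes normality
    print(stats.kendalltau(group_one, group_two))  # non-parametric
    print(stats.spearmanr(group_one, group_two))   # non-parametric
    print(stats.wilcoxon(group_one, group_two))    # paired signed-rank test, 2-tailed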
D. Hypothesis H2

Hypothesis H2 establishes whether there are patterns in the distribution of vD&M categories in the chains generated by different refactorings. We hypothesize that high-level (i.e., Group One) refactorings will comprise relatively more Type D and Type E refactorings than lower-level (i.e., Group Two) refactorings because of the far-reaching effects they have on the structure of the system and the subsequent re-testing required. Fig. 6 shows the percentage representation of each vD&M category (B, C, D and E) generated by Group One and Group Two refactorings.

Fig. 6: vD&M categories/frequencies (two categories)
Fig. 6 shows a clear trend for the Group Two refactorings to be comprised (in terms of percentage) of more Type D and E refactorings than Group One. This is a surprising result and is in complete contrast to the stated hypothesis (H2). Over half of all refactorings used by Group One were taken from Type C of the vD&M taxonomy. Less than 20% of refactorings in Group Two were of the same Type; approximately 40% of Group Two refactorings were drawn from Type D. The difference between these two groups is further emphasized by the trend in Type E refactorings. For Group Two, just over 21% of refactorings used were drawn from this group. This contrasts with just over 9% from Group One. One explanation for the high number of Type D refactorings is the nature of the refactorings themselves; 9 of the 13 Type D refactorings were accounted for by just six refactorings, all involving the manipulation of parameters. Table 2 shows these six refactorings (n.b., there are only 11 refactorings of Type D in total); all six appear in Group Two. Equally, many of the Type E refactorings were accounted for by encapsulation-based refactorings.
Table 2 shows the five Type E refactorings that relate to encapsulation (from a total of twenty). Two of these refactorings appear in Group Two.

Table 2: Type D and Type E refactorings

Type D: Replace Parameter with Explicit Methods, Replace Parameter with Method, Introduce Parameter Object, Parameterize Method, Remove Parameter, Add Parameter.
Type E: Encapsulate Collection, Encapsulate Downcast, Encapsulate Field, Hide Delegate, Hide Method.

A question that arises from this analysis is whether there is a correlation between the length of a chain and the number of Type B, C, D and E categories. We would expect, by extending our initial hypothesis, that the longer the chain, the more Type D and E categories and the shorter the chain, the more Type B and C categories. Table 3 shows Pearson's, Kendall's and Spearman's correlation values for chain length versus the number from each vD&M category, for Group One and Two chains combined (asterisked values denote significance at the 1% level).

Table 3. Chain length versus B, C, D and E Type categories

              B       C       D      E
Pearson's     0.58*   0.60*   0.07   0.10
Kendall's     0.50*   0.53*   0.02   0.07
Spearman's    0.58*   0.62*   0.03   0.08

The striking feature of Table 3 is the significant, positive correlation between chain length and Type B and Type C refactorings. In other words, the longer the chain, the more Type B and C refactorings appear in those chains. We can tentatively conclude that a long chain does not necessarily imply a bias towards Type D and E refactorings. Statistically, by virtue of Table 3 alone, Hypothesis H2 cannot be supported.

E. Hypothesis H3

Hypothesis H3 explores the possibility that certain refactorings of the seventy-two appear in more chains than others. We would expect a refactoring that appears in many chains (apart from its own) to represent a highly reusable refactoring 'component' and, as such, to contain very few dependencies in the form of large numbers of chains. For each of the refactorings in the two groups studied (One and Two), we extracted the number of times that each of those refactorings appeared in other chains. Table 4 shows the sixteen refactorings in Group One and the fifteen in Group Two, together with the number of 'appearances' (App.) they made in the chains of other refactorings and the number of chains they themselves generated (a refactoring is omitted from the table if it appears in no other refactorings).

Table 4: Appearances of refactorings in chains

Refactoring                                  Group   App.   No. Chains
Pull Up Field                                One     10     1
Pull Up Method                               One     14     2
Pull Up Constructor Body                     One     8      8
Push Down Method                             One     9      1
Push Down Field                              One     9      1
Extract Interface                            One     1      1
Form Template Method                         One     10     10
Rename Method                                Two     14     1
Add Parameter                                Two     1      7
Remove Parameter                             Two     1      1
Separate Query from Modifier                 Two     23     1
Introduce Parameter Object                   Two     7      8
Remove Setting Method                        Two     1      1
Replace Constructor with Factory Method     Two     10     1
From Table 4, only seven of the sixteen Group One refactorings and seven of the fifteen Group Two refactorings appear in other chains. For Group One refactorings, a non-significant, positive correlation (between the number of times a refactoring appears in the set of chains and the number of chains it generates) was found: values of 0.13, 0.23 and 0.29 for Pearson's, Kendall's and Spearman's coefficients, respectively. For Group Two refactorings, a strong negative correlation was found (-0.32, -0.21, -0.28 for Pearson's, Kendall's and Spearman's, respectively). In other words, the more chains a Group Two refactoring appeared in, the smaller the chains that the refactoring generated. The opposite applies to Group One refactorings. This supports the claim that Group Two refactorings are 'used by' other refactorings more than they 'use' other refactorings. Equally, it might explain why Group One refactorings 'use' many other refactorings, but are not 'used by' many refactorings themselves. The interesting refactoring in Table 4 is 'Separate Query from Modifier', which appears in twenty-three chains. The motivation for this refactoring [8] is when a method returns a value but also changes the state of an object; in such a case, two methods are created, one for the query and one for the modification. Inspection of the chains in which this refactoring appeared revealed the reason why it was so common: it was positioned in a 'block' of recurring refactorings whose source refactoring was 'Extract Method' (EM). The EM refactoring takes a method that is doing 'too much' and splits it into two. The EM refactoring appears no less than 126 times in the set of 282 chains, i.e., in approximately 45% of all chains. This was a revealing and surprising feature of the generated chains; we thus find strong support for H3.
IV. CONCLUSIONS/FURTHER WORK In this paper, we have investigated the characteristics of refactoring chains generated by each of the seventy-two refactorings of Fowler. The chains were extracted automatically using a bespoke software tool. We adopted a hypothesis-based approach to investigate firstly, whether the number and length of chains generated by refactorings were of similar number and size. We also investigated the trends in the make-up of chains with specific recourse to a test taxonomy of van Deursen and Moonen. Finally, we investigated the possibility of certain server refactorings existing, i.e., those that are, in a sense, ‘reused’ by many other refactorings. The work in this paper contributes to knowledge and an understanding of the inter-relationships between refactorings. We feel it is essential, both for the development of software tools to automate the refactoring process and for developers undertaking refactorings on a purely manual basis, that the mechanics of chains and the inter-relationships are well-understood. The length of a chain can have serious implications for re-test and effort expended by a developer. Further work will focus on combining the theoretical approach described in this paper with an empirical set of data on which refactorings developers actually undertook. In particular, whether developers do avoid refactorings that the vD&M taxonomy suggests lead to long and unwieldy chains.
REFERENCES
[1] D. Advani, Y. Hassoun and S. Counsell. Extracting Refactoring Trends from Open-source Software and a Possible Solution to the 'Related Refactoring' Conundrum. Proceedings of ACM Symp. on Applied Computing, Dijon, France, April 2006.
[2] M. Bruntink and A. van Deursen. An empirical study into class testability. Journal of Systems and Software, vol. 79, no. 9, pages 1219-1232, 2006.
[3] S. Counsell, R. M. Hierons, R. Najjar, G. Loizou and Y. Hassoun. The Effectiveness of Refactoring Based on a Compatibility Testing Taxonomy and a Dependency Graph. Proc. of Testing: Academic and Industrial Conf. (TAIC PART), Windsor, UK, 2006, pp. 181-190. IEEE Computer Society Press.
[4] S. Demeyer, S. Ducasse and O. Nierstrasz. Finding refactorings via change metrics. ACM Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA), Minneapolis, USA, pages 166-177, 2000.
[5] A. van Deursen and L. Moonen. The Video Store Revisited - Thoughts on Refactoring and Testing. Proc. of the Third International Conference on eXtreme Programming and Flexible Processes in Software Engineering (XP 2002), Sardinia, Italy.
[6] A. van Deursen, L. Moonen, A. van den Bergh and G. Kok. Refactoring Test Code. In G. Succi et al. (eds.), XP Perspectives. Addison Wesley, 2002, pages 141-152.
[7] B. Foote and W. Opdyke. Life Cycle and Refactoring Patterns that Support Evolution and Reuse. Pattern Languages of Programs (James O. Coplien and Douglas C. Schmidt, editors), Addison Wesley, May 1995.
[8] M. Fowler. Refactoring (Improving the Design of Existing Code). Addison Wesley, 1999.
[9] R. Johnson and B. Foote. Designing Reusable Classes. Journal of OO Programming, 1(2), pages 22-35, 1988.
[10] J. Kerievsky. Refactoring to Patterns. Addison Wesley, 2004.
[11] T. Mens and A. van Deursen. Refactoring: Emerging Trends and Open Problems. Proceedings of the First International Workshop on REFactoring: Achievements, Challenges, Effects (REFACE). University of Waterloo, 2003.
[12] T. Mens and T. Tourwe. A Survey of Software Refactoring. IEEE Trans. on Software Engineering, 30(2): 126-139, 2004.
[13] S. Mouchawrab, L. C. Briand and Y. Labiche. A Measurement Framework for Object-Oriented Software Testability. Journal of Information and Software Technology, vol. 47, no. 15, pages 979-997, 2005.
Classification Techniques with Cooperative Routing for Industrial Wireless Sensor Networks Sudhir G. Akojwar
Rajendra M. Patrikar
Department of Electronics Engineering, Rajiv Gandhi College of Engineering, Research & Technology, Chandrapur, India [email protected], [email protected]
Department of Electronics and Computer Science Engineering, Visvesvaraya National Institute of Technology, Nagpur, India [email protected], [email protected]
Abstract— Industrial environments pose concerns about harsh, noisy conditions for the use of Wireless Sensor Networks (WSN). A wavelet transform is used as a preprocessor for denoising the real-world data from the sensor nodes. Wireless sensor networks are battery powered; hence every aspect of a WSN is designed under energy constraints. Communication is the largest consumer of energy in a WSN, so energy consumption during communication must be reduced to the minimum possible. This paper focuses on reducing the energy consumed by communication. A co-operative routing protocol is designed for communication in a distributed environment, where data routing takes place over multiple hops and all the nodes take part in communication. The main objective is to achieve a uniform dissipation of energy across all the nodes in the network, and to make the classifier work in an industrial environment. The paper discusses classification techniques using the ART1 and Fuzzy ART neural network models.
Keywords- WSN; Neural Network; Clustering; Classification; Ptolemy-II; VisualSense; DWT.
I. INTRODUCTION
Industrial environments pose a special concern for harsh electromagnetic noise during the deployment of WSNs for industrial applications. Advances in sensor technology, low-power electronics, and low-power radio frequency design have enabled the development of small, relatively inexpensive and low-power sensors, called microsensors. These wireless microsensor networks represent [1] a new paradigm for extracting data from the environment and enable the reliable monitoring of a variety of environments for applications that include surveillance, machine failure diagnosis, chemical/biological detection, habitat monitoring, environmental monitoring, etc. An important challenge in the design of these networks is that two key resources - communication bandwidth and energy - are significantly more limited than in a tethered network environment. These constraints require innovative design techniques to use the available bandwidth and energy efficiently [2]. Communication consumes the largest part of the energy budget; hence an attempt must be made to implement techniques to save energy on communication. The paper discusses a real-time classifier using the ART1 [3] and Fuzzy ART neural network models, which brings a considerable saving of energy. The implementation of the classifier using ART1 and Fuzzy ART
is discussed in detail in [4]. Ptolemy-II, the software infrastructure of the Ptolemy Project, is used to model the sensor networks. Co-operative routing is implemented and simulated in the Ptolemy-II VisualSense environment. The real-time classifier using the ART1 and Fuzzy ART neural network models, with DWT as a preprocessor, was developed in MATLAB; Ptolemy permits interfacing with MATLAB. The classified sensor data is then communicated further using the co-operative routing protocol. This scheme offers the advantage of improving the network bandwidth through the classification technique and of conserving energy through co-operative routing.

II. WIRELESS SENSOR NETWORKS
Wireless sensor networks are an emerging technology with a wide range of potential applications, including environment monitoring, smart spaces, medical systems, and robotic exploration. Such networks consist of a large number of distributed nodes that organize themselves into a multihop wireless network. Each sensor node has one or more sensors, an embedded processor, and a low-power radio, and is normally battery operated. Typically, these nodes coordinate and communicate to perform a common task. The sensor nodes remain largely inactive for long periods, becoming suddenly active when something is detected.

III. CLASSIFICATION TECHNIQUE
Wireless sensor networks are highly data-centric. Data communication in a WSN must be efficient and must consume optimal power. Every sensor node consists of multiple sensors embedded in the same node; thus every sensor node is a source of data. These raw data streams cannot simply be communicated onward to a neighbouring node or the base station; they need to be classified first. A group of sensor nodes forms a cluster. Each node transfers data to a cluster head, and the cluster head then aggregates the data and sends it to the base station. Hence clustering and classification techniques are important and can give a new dimension to the WSN paradigm. Basically, classification systems are either supervised or unsupervised, depending on whether they assign new inputs to one of a finite number of discrete supervised classes or unsupervised
categories, respectively. ART1 and Fuzzy ART are unsupervised neural network models that are used here for the classification of sensor data. The ART1 model is used for the classification of binary-valued data, while the Fuzzy ART model can be used for analog data, where the input data is fuzzy valued. The general structure of the Fuzzy ART neural network is shown in Fig. 1. It consists of two layers of neurons that are fully connected: a 2M-neuron input or comparison layer (F1) and an N-neuron output or competitive layer (F2). A weight value z_ji is associated with each connection, where the indices i and j denote the neurons that belong to layers F1 and F2, respectively. The set of weights Z = {z_ji : i = 1, 2, ..., 2M; j = 1, 2, ..., N} encodes the information that defines the categories learned by the network. These weights can be modified dynamically during network operation. For each neuron j of F2, the vector of adaptive weights z_j = (z_j1, z_j2, ..., z_j2M) corresponds to the subset of weights (z_j ⊂ Z) connected to neuron j. This vector z_j is named the prototype vector or template, and it represents the set of characteristics defining category j. Each prototype vector z_j is formed from the characteristics of the input patterns to which category j has previously been assigned through winner-take-all competition. The ART1 and Fuzzy ART classification models are discussed at length in [4][5].
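To make the category mechanics concrete, the sketch below implements the standard published Fuzzy ART choice, vigilance and fast-learning equations in Python. It is an illustrative reading of the algorithm, not the authors' MATLAB implementation, and the parameter values are arbitrary.

    import numpy as np

    class FuzzyART:
        def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
            self.rho, self.alpha, self.beta = rho, alpha, beta   # vigilance, choice, learning rate
            self.w = []                                          # one 2M-dim prototype per category

        def _code(self, a):
            a = np.clip(np.asarray(a, dtype=float), 0.0, 1.0)
            return np.concatenate([a, 1.0 - a])                  # complement coding at the input layer

        def classify(self, a):
            I = self._code(a)
            # Choice function T_j for every existing category, evaluated with fuzzy AND (min).
            T = [np.minimum(I, w).sum() / (self.alpha + w.sum()) for w in self.w]
            for j in np.argsort(T)[::-1]:                        # winner-take-all order
                if np.minimum(I, self.w[j]).sum() / I.sum() >= self.rho:   # vigilance test
                    # Learning: move the prototype towards I ∧ w_j (beta = 1 gives fast learning).
                    self.w[j] = self.beta * np.minimum(I, self.w[j]) + (1 - self.beta) * self.w[j]
                    return j
            self.w.append(I)                                     # no resonance: create a new category
            return len(self.w) - 1

A Fuzzy ART unit at the cluster head can then be driven with the wavelet features of each node's readings; for binary inputs the behaviour reduces to that of ART1.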
IV. WSN IN INDUSTRIAL ENVIRONMENT
In industrial environments, abundant RF noise is a significant concern. Sources may be explicit, such as 802.11 access points, wireless radios or radar/radio-navigation equipment. Other sources are the result of radiated electrical noise from machinery, such as frequency motor controllers or solid-state switchgear. Such noise sources may adversely impact the reliability and power consumption of sensor networks. Conversely, the impact of interference from the sensor network, particularly on other plant communication channels, must also be taken into account when deploying a WSN. The sensing parameters - temperature, pressure, humidity, flow, etc. - also exhibit drastic variations. Unlike traditional wired networks, the sensors of a WSN can be deployed in the bearings of motors, in oil pumps, in the heart of whirring engines, as vibration sensors on packing crates, or in many unpleasant, inaccessible or hazardous environments that are impractical for normal wired systems. The deployment issues for sensor networks are discussed at length in [6][7].
V. WAVELET PREPROCESSOR

The wavelet transform is a tool that divides up data, functions or operators into different frequency components and then studies each component with a resolution matched to its scale. The wavelet transform is therefore anticipated to provide an economical and informative mathematical representation of many objects of interest. Each classifier in the WSN can apply a wavelet preprocessor to each sensor signal, which is treated as a one-dimensional time series. Many wavelet transforms have been developed, of varying complexity. We have chosen the Haar 1D wavelet transform, which can be summarized as follows: if c_1(k) = 1/2 [c_0(k) + c_0(k − 1)] is the smoothing of the signal, then w_1(k) = c_1(k) − c_0(k) is the first wavelet coefficient. This can be generalized as:

w_i(k) = c_i(k) − c_{i−1}(k).

The first smoothing c_0(t) = x(t) is the signal itself and is the finest scale. The combination of the wavelet coefficients and the residual vector c_p, {w_1(t), w_2(t), ..., w_n(t), c_p(t)}, can be taken as a multi-resolution wavelet feature of the signal at time T and can be processed further as a representation of the signal itself. This Discrete Wavelet Transform (DWT) acts as the entry layer for the classifier, smoothing the signal for it.
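A sketch of this redundant Haar decomposition, following our reading of the recursion above, could look as follows; the boundary handling and the use of a fixed one-sample lag at every scale are simplifying assumptions.

    import numpy as np

    def haar_features(x, levels=3):
        # Implements c_i(k) = 0.5*(c_{i-1}(k) + c_{i-1}(k-1)) and w_i = c_i - c_{i-1};
        # returns [w_1, ..., w_levels, c_levels] as the multi-resolution feature.
        c = np.asarray(x, dtype=float)
        out = []
        for _ in range(levels):
            shifted = np.roll(c, 1)
            shifted[0] = c[0]              # repeat the first sample at the boundary (assumption)
            c_next = 0.5 * (c + shifted)   # smoothing at the next, coarser scale
            out.append(c_next - c)         # wavelet (detail) coefficients at this scale
            c = c_next
        out.append(c)                      # residual: the coarsest smoothing c_p
        return out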
VI. CLUSTERING ARCHITECTURE FOR WSN
For organizing the distributed data of sensor networks, the ART1 and Fuzzy ART neural network models can be used in two different clustering and data-aggregation schemes.

(1) One cluster head collecting all sensor data: The sensor data is filtered by the Discrete Wavelet Transform (DWT), as shown in Fig. 1, and then fed to the classifier for classification. In this architecture, the filtered data from the sensor nodes is sent to one of them chosen to be the cluster head (gateway node), where a Fuzzy ART (FA) neuron is implemented. This model brings the advantage that we need not fix in advance the number of clusters (categories) that the network should learn to recognize. This is the architecture implemented in this paper.

Figure 1. The sensor nodes send their sensory readings to one node, which is chosen to be the cluster head, where a Fuzzy ART neuron is implemented.
Figure 2. One cluster head collecting and classifying the data after they have been classified once at the lower level.
(2) Cluster head collecting only the clustering outputs from the other units: each sensor node has an FA implementation, with a DWT entry layer, as shown in Fig. 2, each classifying only its own sensor readings. One of these units can be chosen to be the cluster head, collecting and classifying only the classifications obtained at the other units. Since the clusters at each unit can be represented with binary values, the neural network implementation at the cluster head is ART1 with binary inputs. With this architecture a great dimensionality reduction can be achieved, depending on the number of sensor inputs in each unit. At the same time, communication savings benefit from the fact that the cluster number is a small binary number, unlike raw sensory readings, which can be several-byte-long real numbers converted from the analog inputs. If the number of sensors in each unit is s, the cluster head collects data from h units, the number of different categories in each unit can be represented by a b-byte integer, and the sensor readings are real numbers represented with p bytes, then the communication saving can be calculated as:

(s · h · p) / (h · b) = (s · p) / b.

Since communication is the biggest consumer of energy in the sensor node, this leads to bigger energy savings.
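As a quick check of this ratio (with illustrative numbers only, not values taken from the paper):

    def communication_saving(s, h, p, b):
        # (s*h*p) bytes of raw readings versus (h*b) bytes of category indices
        # per round of reporting; the h cancels, giving s*p/b.
        return (s * h * p) / (h * b)

    # e.g. s=4 sensors per unit, p=4-byte readings, b=1-byte category index:
    print(communication_saving(s=4, h=10, p=4, b=1))   # 16.0-fold saving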
VII. CO-OPERATIVE ROUTING

The existing protocols aim to enhance lifetime by using sub-optimal paths on many occasions. While this constitutes the basis for almost all approaches to enhancing lifetime, there remains a lot to be done in order to ensure a more equitable distribution of the energy consumption. Further, these existing protocols use probabilistic methods for choosing a path; we introduce a deterministic method for choosing a path. In addition, updating involves extra overhead in these previously proposed protocols.
In our approach to increasing network lifetime, we use a completely different set of parameters for choosing between optimal and sub-optimal paths. We propose the use of a local group average to make the decision about rejecting an optimal path and switching over to a sub-optimal path. The local average that we use is the average of the residual energies of all the directed nodes in a local group. We thus avoid the use of any node whose residual energy is below this average. Hence the name co-operative routing protocol: a node uses information about the other nodes in its local group. The local group averages need to be updated, and we propose a mechanism for these updates that does not spend any extra energy on communicating them. Thus, apart from the inherent advantage of automatic updating, our protocol ensures the usage of the optimal path the maximum number of times without creating any hotspots. We also solve issues such as directivity in a very simple manner. We make the following important assumptions. (1) While designing the co-operative routing protocol, we assume that the Gateway node has renewable energy resources and therefore the power to perform an unlimited number of operations. (2) The nodes in the network are stationary between two setup phases. (3) The transmission range of the Gateway node is large enough to cover the whole cluster. (4) All the nodes have an equal and fixed transmission range. (5) The initial residual energy of all the nodes in a cluster (except the gateway node) is assumed to be equal. (6) The energy required to transmit over a constant range is constant. (7) Each node has two radios: the normal data-transmission radio, which operates at a higher bit rate and whose operation consumes most of the energy spent in communication, and the MAC radio, which operates at a lower bit rate and consumes very little energy compared to the normal transmission radio.

VIII. MODELING AND SIMULATION

A. Modeling the Lifetime of the Network: The lifetime of a network is defined as the time after which a certain fraction of the network runs out of battery and therefore ceases to function properly, resulting in a failure to transmit data. In a more simplistic, deterministic and hard-bound manner, it is defined as the instant at which any of the nodes in the network dies out. Recent advances in embedded systems have made it possible to place sensor nodes in remote environments where refuelling them is not possible. One part of the node that consumes a large share of the battery power is its transceiver. Apart from this, the data-processing unit of the sensor node accounts for a big quota of the consumed power. That is why network-lifetime calculations need to be based on both the routing protocol and the data-processing units. A lot of work in the UbiSens [8] research project was aimed at inventing a routing protocol that
would minimize the transceiver consumption. The lifetime issues of WSNs are discussed in detail in [9][10]. The lifetime of the network is improved by reducing the energy consumption of communication and by uniform dissipation of energy across all the nodes.
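Uniform dissipation follows from the path-selection rule of Section VII: any directed neighbour whose residual energy falls below the local group average is rejected, and the most optimal remaining hop is taken. A minimal sketch of that rule is given below; the tuple layout and the cost metric are our assumptions, not the authors' data structures.

    def choose_next_hop(neighbors):
        # neighbors: list of (node_id, residual_energy, path_cost) for the
        # directed nodes of the local group; lower path_cost = more optimal.
        if not neighbors:
            return None
        avg = sum(e for _, e, _ in neighbors) / len(neighbors)
        eligible = [n for n in neighbors if n[1] >= avg]   # co-operative filter
        return min(eligible, key=lambda n: n[2])[0]        # most optimal eligible hop

Because at least one node's residual energy always meets or exceeds the group average, the filter can never empty the candidate set.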
B. Introduction to Ptolemy-II: Ptolemy II is the current software infrastructure of the Ptolemy Project. It is published freely in open-source form. Ptolemy II is the third generation of design software to emerge from UC Berkeley [11]. Ptolemy-II is very helpful for studying heterogeneous modeling, simulation, and design of concurrent systems. VisualSense is a modeling and simulation framework for wireless and sensor networks that builds on and leverages Ptolemy II. Modeling of wireless sensor networks requires sophisticated modeling of communication channels, sensors, ad-hoc networking protocols, localization strategies, media access control protocols, energy consumption in sensor nodes, etc. This modeling framework, as presented in [12], is designed to support a component-based construction of such models. It supports actor-oriented definition of network nodes, wireless communication channels, physical media and wired subsystems. Custom nodes can be defined by sub-classing the base classes and defining the behavior in Java or by creating composite models using any of several Ptolemy II modeling environments. Custom channels can be defined by sub-classing the Wireless Channel base class and by attaching functionality defined in Ptolemy II models.
IX. IMPLEMENTATION OF CO-OPERATIVE ROUTING
This section describes the network topology, the implementation of the setup phase and the communication phase, and finally the updating of the essential parameters.
A. Topology of the Network: The sensor network is implemented in Ptolemy-II – VisualSense. The initiators, seen at the extremities of the clusters, are the originators of data, i.e. data is generated only at the periphery by the initiator nodes, as shown in Fig. 3. This assumption is quite reasonable because it ensures that most of the nodes simply perform the task of routing the data, and thus provides a better scope for evaluating the algorithm. The nodes that are seen scattered all over the network are the actual nodes which carry the data from the initiators to the destination; we assume a uniform distribution of these sensor nodes, and they form the data paths in the network. Ptolemy-II plays an important role in the placement of the nodes: a randomizer is used to set the node locations, and the experiment is repeated for different randomization seeds.
Figure 3. Pre-initialized Sensor Network
The third type of node in the network is the gateway. It can be visualized as the central controller of the network. It is assumed to be omni-powerful, i.e. it can transmit over its entire cluster, and there are no power constraints on it. It is the controller of the network in the sense that it controls the establishment of communication between the nodes, and it is also the node to which the data is communicated. It is therefore centrally placed to provide symmetrical access. Assuming a single sink also helps in checking the reliability of the routing algorithm. The gateway performs the important task of differentiating the phases of operation of the network. Another important feature of this implementation is that the entire process is assumed to be source initiated. This is deliberate, because destination-initiated processes are sparsely spaced in time. In any case, the routing mechanism is independent of how a transmission is initiated, which makes a query-initiated implementation trivial, though possible.
B. Implementing the Setup Phase
The complete internals of the sensor node implemented in Ptolemy are described in Fig. 4. The node consists of different functional blocks implemented using different Ptolemy actors: Setupper, Averager, Arranger, Router and Updater.
Figure 4. Internals of a Sensor Node
The network is established during its setup phase. In real-life situations, the gateway initiates this stage on receiving a trigger from some external controller, but for simplicity of simulation the gateway initiates the setup at time 0. This holds for real-life systems as well, since the network lifetime starts when the network is recognized and formed. The gateway initiates the phase by setting a global variable 'Setup' to 1. The status of this variable is transmitted globally and the nodes act accordingly. The gateway also ends the phase, simply by resetting the 'Setup' flag. The user-defined actor called 'Setupper' performs the setup function in the nodes and the initiators. When the setup flag is transmitted by the gateway, each node's receiver determines the received energy using the 'getProperties' actor and stores it in a variable called 'Er'. The nodes check whether the 'Setup' flag is set to one and, if so, transmit a packet containing their ID and received energy. If the setup period is still in progress, the receiving node disassembles the packet and stores the ID in its list of neighbors. This leads to a new definition of the setup phase: setup is the phase of neighbor discovery. In this algorithm, routing is done through forwarding tables, so we are interested only in the neighbors with higher directivity. As is evident from the previous treatment, the received energy is indicative of directivity, so during setup itself we reject neighboring nodes with lower directivity. Not storing an ID that has already been saved is also of prime importance, because replication of IDs leads to faulty routing; the 'Setupper' therefore has to be quite selective. It is equally important to avoid overwriting links that already exist, and the 'Setupper' takes precautions for this as well. The pseudo-code for the 'Setupper' is as follows:
for i=1:10
  if setup==1              // checks if setup is on
    if send(i)!=send(i-1)  // ensures there is no repetition
      if send(i)==0        // prevents overwriting
        send(i)=ID;
      end;
    end;
  end;
end;
The above loop runs ten times, which is a sensible approximation for the given node density; for a more densely populated network, the number of neighbors to be recorded would be higher. Another functionality that we have embedded into the setup phase is the arrangement of the IDs of a node's neighbors. Doing this solves many problems, foremost of which is the complexity of the scan during the actual routing process. For a stationary network the setup phase occurs only once, while routing continues for as long as the network lives, so reducing the complexity of the route-time scan saves a lot of computation energy over a large time slice. The most suitable arrangement is the one in which the IDs are sorted in descending order of their directivities. The actor
called 'Arranger' performs this job. The Arranger is deliberately detached from the 'Setupper' so that it can be reused anywhere a descending sequence is needed. An additional function of the Arranger is to report the number of non-zero IDs. This count matters during routing because we do not want to transmit data to a non-existent node, thereby making certain that there are no loop-holes in the established routes. The pseudo-code for finding this count is as follows:
for i=1:10
  if send(i)!=0
    len=len+1;
  end;
end;
C. Implementing the Communication Phase
The setup phase ends when the gateway resets the 'Setup' flag to 0. This also marks the beginning of the communication phase. The communication phase implies that all the nodes have discovered their forward links, so if data arrives at any of these nodes it can easily be routed to the gateway. That is why the initiators are inhibited from generating data until the onset of the communication phase, i.e. until the end of setup. After the completion of the setup phase, the Poisson clock in the initiator is triggered; this corresponds to data generation. The generated data is a pointer to the initiator in which it was generated, which helps in verifying the correct disposal of packets. The initiator is also a simple node, but it does not forward packets; rather, it generates data packets. It therefore also detects neighboring nodes during setup and routes its data to the most suitable one during the communication period. Another actor called the 'Router', shown in Fig. 4, performs the job of routing the packets. The Router is the most important part of the design, as it takes the decisions based on the proposed algorithm. As defined in the algorithm, the Router has to find the most cost-effective link, which is the one that is most directed as well as farthest from dying out. The cost of a link is computed from the Er of the node and the difference between the residual energy of the node and the average energy of all the forward nodes. The Router therefore simply has to select the node with the maximum directivity among the nodes whose residual energy is greater than the average. The significance of arranging the nodes in descending order of directivity now becomes obvious: the Router simply scans the forwarding table and selects the first node that is found to be above the average. The pseudo-code for this operation is as follows:
for i=1:10
  if Er(i)>=avg
    RD=send(i);
    exit;
  end;
end;
Another important block of the design is the 'Averager'. It finds the average of the residual energies of all the nodes in the forwarding table. An important consideration in this process is that non-available nodes, i.e. nodes with an ID of zero, should not be
taken into account while calculating the average. The logic for this operation is implemented as:
for i=1:10
  if send(i)!=0
    add=add+RE(i);   // accumulate the residual energy of an existing neighbor
  else
    exit;
  end;
end;
avg=add/i;
The 'Averager' output, avg, is fed to the Router, enabling it to take the correct decision.
D. Updating the Network: The average energy of the forward paths in a particular local group is the controlling parameter in all the routing decisions made by the proposed algorithm. This energy has to be calculated often, and for this purpose the residual energies of the neighbors need to be known. This is done in the update phase. This functionality is implemented in the above model using an actor called 'Updater', as shown in Fig. 4. The pseudo-code for this block is as follows:
for i=1:10
  if ID==send(i)
    RE(i)=RE(i)-dec;
    exit;
  end;
end;
X. CONCLUSION
A real-time classifier with a wavelet (DWT) preprocessor is implemented using the ART1 and Fuzzy ART neural network models in MATLAB. Sensor data is classified at each node and only the class ID is communicated further. Using this classification technique, the number of bytes to be communicated falls by a factor of s.p/b, as described earlier. This provides the dual benefits of improving the bandwidth of the communication channel and reducing the energy consumption. The Discrete Wavelet Transform (DWT) is applied to the sensor signals before classifying them; it smooths the raw data and also helps to extract important features such as sudden changes at various scales. The co-operative routing protocol with the addition of the updater is a new concept designed for communication in a distributed environment. Routing takes place in multiple hops and all the nodes take part in communication, which achieves uniform dissipation of energy across all the nodes. With co-operative routing we find that the lifetime of the WSN improves by about 30% compared with diffusion routing when tested with a 50-node network. The energy consumption is further reduced by a factor of s.p/b because of the classification of the sensor data. This adds considerably to the improvement of the lifetime of the WSN.
REFERENCES [1]
A Wendi B. Heinzelman, Member IEEE, Ananth P. Chandrakasan, Senior Member, IEEE, and Hari Balakrishnan, Member, IEEE, An Application –Specific protocol Architecture for Wireless Microsensor Networks, IEEE Transaction on Wireless Communications, Vol.1 No. 4 October 2002 [2] Wendi Rabiner Heinzelman, Ananth Chandrakas and Hari Balakrishnan, “Energy-efficient communication protocols for wireless microsensor networks,”in Proceedings of the Hawaii International Conference on Systems Sciences, Jan 2000. [3] Neural networks Algorithms, applications, and programming techniques ,J.A.Freeman, David M. Skapura, Pearson Education Asia, 2001. [4] Sudhir G. Akojwar, R.M.Patrikar ,Real Time Classifier For Industrial Wireless Sensor Network Using Neural Networks with Wavelet Preprocessors, IEEE International Conference on Industrial Technology, DEC 15-17 2006, Mumbai.. (ICIT 2006), pp 512-517. [5] Sudhir G. Akojwar, R.M.Patrikar, Classification techniques for sensor data and clustering architecture for wireless sensor networks, IAENG International conference on communication systems and applications (ICCSA’07) ,Hong Kong, 21-23 march, 2007, [IMECS 2007. pp 12461251] [6] Lakshaman Krishnamurthy, Robert Adler, Phil Buonadonna, Jasmeer Chhabra et.al. Design and Deployment of Industrial Sensor networks: Experiences from a Semiconductor Plant and the North Sea, SenSys’05, November 2-4, ACM 2005. [7] Kay Soon Low, Win Nu Nu Win, Meng Joo Er,Wireless Sensor Networks for Industrial Environments,International Conference on Computational Intelligence for Modelling, Control and Automation, and International Conference on Intelligent Agents, Web Technologies and Internet Commerce (CIMCA-IAWTIC’05) [8] Saket Sakunia, Shantanu Bhalerao, Abhishek Chaudhary, Mukund Jyotishi, Mandar Dixit, Raghav Jumade, Rajendra Patrikar, “UbiSens: Achieving Low Power Wireless Sensor Nodes,” IEEE International Conference on Wireless and Optical Communications Networks, (WOCN 2007) [9] Konstantinos kalpakis, Koustuv Dasgupta and Parag Namjoshi, ,”Efficient algorithm for maximum lifetime data gathering and aggregation in wireless sensor networks,”Computer Networks Volume 42, Issue 6, 21 August 2003, pp.697-716. [10] Gracanin, D. Eltoweissy, M. Olariu, S. Wadaa, “On modelling wireless sensor networks,” A. Parallel and Distributed Processing Symposium, 2004. [11] Y. Xiong, “An Extensible Type System for Component-Based Design,” Technical memorandum UCB/ERL Mo2/13, University of California, Berkely,CA94720,May1,2002. [12] P. Baldwin, S. Kohli, E. A. Lee, X. Liu, and Y. Zhao, “Modelling of Sensor Nets in Ptolemy-II,” In Proc. of Information Processing in Sensor Networks, (IPSN), April 26-27, 2004, pp.359-368.
Biometric Approaches of 2D-3D Ear and Face: A Survey
S. M. S. Islam1, M. Bennamoun1, R. Owens2 and R. Davies1
1 School of Computer Science and Software Engineering, The University of Western Australia, 35 Stirling Hwy, Crawley, WA 6009
{shams, bennamou, rowan}@csse.uwa.edu.au 2 [email protected]
Abstract
The use of biometrics in human recognition is rapidly gaining in popularity. Considering the limitations of biometric systems with a single biometric trait, a current research trend is to combine multiple traits to improve performance. Among the biometric traits, the face is considered to be the most non-intrusive and the ear is the most promising candidate to be combined with the face. In this survey, existing approaches involving these two biometric traits are summarized and their scopes and limitations are discussed. Some challenges to be addressed are discussed and a few research directions are outlined.
1. Introduction The field of Biometric Recognition is a comparatively new research area where one or more physiological (such as face, fingerprint, palmprint, iris and DNA) or behavioral (handwriting, gait, voice etc) traits of a subject are taken into consideration for automatic recognition purposes [18, 29]. Unlike the traditional recognition means (such as passwords, ID cards) which may be stolen, denied or faked easily, biometric traits are universal to everyone, distinct to individuals, permanent over a period of time and measurable quantitatively [17]. Therefore, biometric systems can reliably be used for commercial applications as well as in many government applications such as national IDs, passports, voter IDs, driving licenses etc. and in forensic, security and law enforcement applications [22]. Biometric systems operating with a single biometric data suffer from a number of problems including noise in sensed data, intra-class variations, inter-class similarities (i.e. overlaps of features in the case of large databases), non-universality (for example, in fingerprint systems, incorrect minutiae may be extracted in the case of a poor quality of the ridges) and spoof attacks [29]. To overcome these problems of unimodal systems, multimodal approaches have been proposed. A system may be called multimodal if it collects data from different biometric
sources, or uses different types of sensors (e.g. infra-red, reflected light etc.), or uses multiple samples of data or multiple algorithms [33] to combine the data [4]. The combination or fusion can be done before classification at the data or feature level, during classification by dynamically selecting the best possible classifier, or after classification at the match score or decision level. Among the biometric traits, the face is not as accurate as DNA or the retina, but in terms of reliability and acceptability the face is considered to be the most promising due to its non-intrusiveness and feature-richness. Proposals have been made to integrate the voice, gait, palmprint, fingerprint, the ear and the iris with the face. The ear has advantages over the other alternatives due to its close location to the face: ear data can easily be collected (with the same sensor) along with the face image, and it can efficiently supplement face images when frontal views are difficult to collect or occluded. Besides, it has some other attractive characteristics for a biometric trait, such as consistency (not changing with expressions, nor with age between 8 years and 70 years), reduced spatial resolution and uniform distribution of color [15]. Many attempts have been made to use ear and face images in computer-based recognition systems. In this paper, the state-of-the-art unimodal and multimodal approaches involving these two biometric traits are summarized and their scopes and limitations are discussed. The challenges to be addressed are also discussed and research trends are outlined. The paper is organized as follows. After a description of ear and face detection approaches in the next section, existing biometric approaches with ear and face data are summarized in Section 3. The challenges involved and some avenues for improvement are outlined in Section 4. Section 5 concludes.
2. Detection Approaches
Detection of the ear and the face region from the background images captured by the sensors is the first and one of the most challenging tasks in the recognition process. In this section, some approaches proposed for this purpose are summarized.
2.1. Ear Detection
Detecting 2-D ears from arbitrary side face images is a challenging problem because ear images can vary in appearance under different viewing and illumination conditions. Most of the current research on ear biometrics assumes that the ear is already correctly detected, and there are only a few reported methods for accurately detecting the ear from side face images. One of the earliest such methods uses Canny edge maps to detect the ear contour [6]. A somewhat different approach was proposed by Hurley et al. [14] using the "force field transformation". Approaches that utilize template matching include the work of Yuizono et al. [42], where both a hierarchical and a sequential similarity detection algorithm were used to detect the ear from 2D intensity images. Chen and Bhanu [8] reported a 91.5% detection rate with a 2.52% FPR by combining template matching with average histograms. The approach was further enhanced by using a reference ear shape model based on helix and anti-helix curves and global-to-local shape registration [10]; it gave 99.3% and 87.71% detection rates when tested on a database of 902 images from 155 subjects and on the UND database of 302 subjects, respectively. Another technique based on a modified snake algorithm and an ovoid model is proposed in [1]; it requires the user to input an approximate ear contour, which is then used to estimate the ovoid model parameters for matching. Yan and Bowyer [36] proposed using a two-line landmark, with one line along the border between the ear and the face and the other from the top of the ear to the bottom, in order to detect and crop the ear. Their ear extraction pipeline, summarized in Figure 1, proceeds as follows: (1) take the profile face image (2D color and 3D range image) and detect the nose tip; (2) preprocess to drop out the shoulder and some hair areas by cropping a sector from the nose tip with a radius of 20 cm spanning +/- 30 degrees, and crop out non-ear skin regions by transforming each pixel of the 2D image into the YCbCr color space and applying color matching; (3) detect the ear pit by applying Gaussian smoothing with an 11*11-pixel window, calculating the Gaussian curvature (K) and the mean curvature (H), grouping 3D points with the same curvature label into regions, selecting regions with K>0 and H>0 as pit regions, and applying a symmetric voting method to select the real ear pit; (4) extract the ear using an active contour algorithm, selecting an initial contour ellipse centred on the ear pit with a major axis of 20 pixels and a minor axis of 30 pixels, determining appropriate parameters for growing the contour, and taking 150 iterations of growth; (5) crop the final contour as the extracted ear.
Figure 1. Block diagram of the Ear Extraction approach proposed by Yan and Bowyer [39].
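As a concrete illustration of the curvature test used in the ear-pit detection step of Figure 1, the following is a schematic Python/NumPy sketch; the smoothing parameter and function names are assumptions made for illustration and are not the authors' implementation.

    # Schematic sketch of the curvature test: smooth the range image, estimate
    # Gaussian (K) and mean (H) curvature, and keep pixels where both are
    # positive (pit-like regions). Sigma and filtering choices are illustrative.
    import numpy as np
    from scipy.ndimage import gaussian_filter, label

    def pit_candidates(z: np.ndarray) -> np.ndarray:
        """z is a 2-D range (depth) image; returns labelled K>0, H>0 regions."""
        z = gaussian_filter(z, sigma=2.0)      # noise removal (paper: 11x11 window)
        zy, zx = np.gradient(z)
        zxy, zxx = np.gradient(zx)
        zyy, _ = np.gradient(zy)
        g = 1.0 + zx**2 + zy**2
        K = (zxx * zyy - zxy**2) / g**2                            # Gaussian curvature
        H = ((1 + zx**2) * zyy - 2 * zx * zy * zxy
             + (1 + zy**2) * zxx) / (2.0 * g**1.5)                 # mean curvature
        labels, _ = label((K > 0) & (H > 0))   # group points with the same curvature sign
        return labels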
Ear contours have also been detected based on illumination changes within a chosen window [11]. This method compares the difference between the maximum and minimum intensity values of a window to a threshold computed from the mean and standard deviation of that region to decide whether the centre of the region belongs to the contour of the ear or to the background. Another approach is proposed for segmenting ear edges extracted using the Canny edge detector, into helix and anti-helix, based on the intersection of perpendicular lines drawn
from one edge to the convex side of the other [2]. The complete ear boundary is then obtained by joining the two end points of the helix curve with straight lines. Yan and Bowyer [39] developed a fully automatic ear contour extraction algorithm as shown in Figure 1. However, the system fails if the ear pit is not visible. Most recently, Islam et al. [16] proposed an ear detection approach based on the AdaBoost algorithm [30]. The approach is fully automatic, provides 100% detection while tested with 203 non-occluded images of the UND ear database and also works well with some occluded and degraded images.
2.2. Face Detection
Techniques for face detection from still images can be categorized into knowledge- or rule-based, feature-invariant (to pose, viewpoint or lighting), template-matching and appearance-based approaches using learning algorithms [40]. Among these techniques, the cascaded AdaBoost algorithm [34] proposed by Viola and Jones is gaining popularity for its near real-time face detection from 2D frontal/profile face views. This algorithm is characterized by the use of the integral image and rectangle features with a cascade of classifiers, where each successive classifier is applied depending on the rejection or acceptance result of the previous one. Around 95% detection accuracy has been achieved with a false positive rate of 1 in 14084. It has been followed by many variations and improvements, such as for occlusion invariance [12], rotation invariance [13] and speed-up [31]. Mian et al. [45] proposed a simple but fully automatic approach for face detection with 98.3% accuracy, by first detecting the nose tip from the 3D depth images and then taking a pre-defined spherical region around it. However, the system requires that the image contain only a single face, allows only 15 degrees of pose variation along the x- and y-axes, and requires that the nose tip not be occluded.
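For readers who want to experiment with a cascaded detector of this kind, the sketch below runs the stock OpenCV Viola-Jones cascade; the cascade file and parameter values are OpenCV defaults chosen purely for illustration and are not the specific detectors evaluated in the works cited above.

    # Illustration of running a cascaded AdaBoost (Viola-Jones style) face detector.
    import cv2

    def detect_faces(image_path: str):
        img = cv2.imread(image_path)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        # Early cascade stages cheaply reject most non-face windows, which is
        # what makes this approach near real-time.
        return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # Example use: for (x, y, w, h) in detect_faces("profile.jpg"): ...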
3. Recognition Approaches We have investigated thoroughly most of the existing research related to ear and face recognition. In this section, at first, unimodal approaches with the ear and the face are described and then the multimodal approaches dealing with both of them are summarized.
3.1. Ear Recognition Pun et al., 2004 [27] summarized the 2D and 3D ear recognition approaches proposed until 2004. Approaches made so far can be categorized into two groups based on whether occlusion due to earrings, hairs etc. are
considered or not. The features of most of the approaches are summarized briefly in Table 1. Most of the proposed ear recognition approaches use either the PCA (Principal Component Analysis) or the ICP (Iterative Closest Point) algorithm for matching. Choras, 2005 [11] proposed a different, automated geometrical method; testing with 240 images (20 different views) of 12 subjects, a 100% recognition rate is reported. Approaches based on genetic local search and on the force field transformation have also been proposed by Yuizono et al. [42] and Hurley et al. [14], respectively. We observed that most of the systems reporting high accuracy are not tested with significantly large and varied databases. The first ear recognition system tested with a larger database of 415 subjects was proposed by Yan and Bowyer [39]. Using a modified version of ICP, they achieved an accuracy of 95.7% with occlusion and 97.8% without occlusion (with an equal error rate (EER) of 1.2%). The system does not work well if the ear pit is not visible.
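Since most of the 3D ear matchers above rely on ICP, a generic textbook-style sketch of point-to-point ICP is given below (nearest-neighbour association followed by an SVD-based alignment). It is a simplified illustration that assumes roughly pre-aligned point clouds; it is not any cited author's implementation, and the final RMS residual is only one possible match score.

    # Minimal point-to-point ICP sketch with numpy/scipy.
    import numpy as np
    from scipy.spatial import cKDTree

    def icp(probe: np.ndarray, gallery: np.ndarray, iters: int = 30):
        """Align probe (Nx3) to gallery (Mx3); returns (R, t, rms_error)."""
        P = probe.copy()
        tree = cKDTree(gallery)
        R_total, t_total = np.eye(3), np.zeros(3)
        for _ in range(iters):
            _, idx = tree.query(P)             # closest gallery point per probe point
            Q = gallery[idx]
            mu_p, mu_q = P.mean(axis=0), Q.mean(axis=0)
            H = (P - mu_p).T @ (Q - mu_q)      # cross-covariance
            U, _, Vt = np.linalg.svd(H)
            R = Vt.T @ U.T
            if np.linalg.det(R) < 0:           # avoid reflections
                Vt[-1] *= -1
                R = Vt.T @ U.T
            t = mu_q - R @ mu_p
            P = P @ R.T + t
            R_total, t_total = R @ R_total, R @ t_total + t
        rms = np.sqrt(np.mean(np.sum((P - gallery[tree.query(P)[1]])**2, axis=1)))
        return R_total, t_total, rms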
Table 1. Summary of the existing ear recognition approaches.
Category
Occlusion considered
Occlusion not considered
Author and Year of Publication Ping Yan et al., 2007 [39] Zhang et al., 2005 [43] Chen and Bhanu, 2007 [10] Ping Yan et al., 2005[36]
Database Size Sub. Images 415 830, 2 sessions 60 180
Methodology
155
Ear model, LSP, modified ICP 3D ICP
96.4
Choras, 2005 [11] Chen and Bhanu, 2005 [9] Hurley et al., 2005 [14]
12
902, 1 Session 830, 2 sessions 240
100
30
60
Geometrical method Two-stage ICP
63
634, 2 sessions
PCA
99.2
404
Modified snakes and ICP PCA
Rec. Rate (%) 95.7 85
97.5
93.3
The symmetry of the two ears of a subject was investigated by Yan and Bowyer [36]. Around 90% accuracy is reported when matching a mirrored left ear with the stored right ear, indicating that symmetry-based ear recognition cannot be expected to be highly accurate.
3.2. Face Recognition
Among the biometric recognition systems, face recognition is the most extensively researched. After mature 2D face recognition failed to resolve several problems, researchers are now pursuing 3D face recognition. Extensive surveys on 2D and 3D face recognition techniques can be found in [44, 3]. Quite significant success has been reported for face recognition with neutral faces. The current research trend is to develop systems that are invariant to facial expressions, which is also one of the focuses of the Face Recognition Grand Challenge (FRGC) [26]. In this section, we summarize only the facial-expression-invariant face recognition systems (see Table 2). Li and Barreto, 2006 [21] tried to solve the problem of facial expression deformation by integrating expression recognition and face recognition in one system. First, an assessment of the expression of an unknown face is made and then the appropriate recognition sub-system is used for person recognition. For non-neutral images, the proposed system finds the right face by modelling the variations of the face features between the neutral face and the face with expression using a Linear Discriminant Analysis (LDA) classifier. The system is tested with 30 neutral and 30 smiling face (3D range) images from 30 subjects and an 80% recognition rate is reported for smiling faces.
Table 2. Face recognition with varying facial expressions.

Author and Year of Publication | Data Type and Database | Rec. Rate (%)
Mian et al., 2007 [25] | 2D and 3D scans | 95.3
Kakadiaris et al., 2007 [19] | 3D scans (706 gallery images) | 95.6
Bronstein et al., 2006 [5] | 180 3D scans from 30 sub. | 100
Chang et al., 2006 [7] | 1590 3D scans of 355 sub. | 87.1
Li et al., 2006 [23] | 90 images from 15 subjects | 96
Lu and Jain, 2006 [24] | 150 2.5D scans from 50 sub. | 97
Li and Barreto, 2006 [21] | 60 3D images from 30 sub. | 80
Lu and Jain, 2006 [24] proposed fitting a facial surface deformation model to the test scans to handle expressions and large pose variations. A geodesic-based re-sampling approach is applied to extract the landmarks for the model. With a subset of FRGC v2 dataset (150 test scans of 50 subjects each with one neutral, one smiling and one surprise expression), rank-1 identification accuracy of 97% was achieved. The system requires manual landmark labelling for deformation modelling. Li et al., 2006 [23] combined texture and geometry attributes of faces using PCA to form a classifier which is capable of recognizing faces with different expressions with 96% accuracy. The system was tested with 90 face images from 15 subjects with or without facial expressions. Chang et al., 2006 [7] developed a 3D face recognition system based on a combination of the match scores from matching multiple overlapping regions around the nose. It is tested with the ever largest database of 4485 3D scans of 449 subjects including 2349 and 1590 probes and 449 and 355 gallery images for neutral and non-neutral expressions respectively. The system provides 97.1% and
87.1% recognition rates, with EERs of 0.12 and 0.23, for neutral and non-neutral expressions respectively. It takes one second to match one probe shape against one gallery shape in this system. Bronstein et al., 2006 [5] proposed using an isometric model of facial expression, embedding the probe facial surface into that of the model (which does not require the same amount of information), a Generalized Multidimensional Scaling (GMDS) numerical core and a hierarchical matching strategy. Testing with 180 3D faces from 30 subjects, a 100% recognition rate is achieved with six different expressions and under severe occlusions. Kakadiaris et al., 2007 [19] achieved a 95.6% recognition rate with non-neutral expressions and 99.0% with the neutral expression by using an annotated face model, wavelet analysis, normal maps and a composite alignment algorithm to extract a biometric signature from 3D face data. The work was tested with the largest dataset so far, with 706 gallery and 4185 probe images taken from the FRGC v2 (4007 scans of 466 subjects) and UH (4891 scans) databases. Most recently, Mian et al., 2007 [25] proposed a fully automatic face recognition approach based on pose correction with nose tip detection and the Hotelling transform, a novel 3D Spherical Face Representation and the Scale-Invariant Feature Transform descriptor, a region-based matching approach and the modified ICP algorithm. In an experiment with the FRGC v2 dataset they achieved 99.74% and 98.31% verification rates at a 0.001 false acceptance rate (FAR), and identification rates of 99.02% and 95.37%, for probes with a neutral and a non-neutral expression, respectively.
3.3. Ear-Face Multimodal Approaches
In this section, multimodal approaches for human recognition involving the ear and face data are described. A summary of the approaches is shown in Table 3.

Table 3. Multimodal approaches with the ear and the face data.

Author and Year of Publication | Data Type and Database Size | Algorithms | Level of Fusion | Rec. Rate (%)
Theoharis et al., 2007 [32] | 3D images from 324 subjects | Model fitting, ICP, SA, wavelet | Data | 99.7
Woodard et al., 2006 [35] | 3D images from 85 sub. | ICP and RMS | Match score | 97
Yuan et al., 2006 [41] | 395 images from 79 sub. | FSLDA | Data | 96.2
Yan et al., 2005 [37] | 1884 2D and 3D scans from 302 subjects | ICP | Match score | 91.7
Chang et al., 2003 [20] | 197 2D images | PCA | Data | 90.9
Chang et al. [20] proposed a PCA-based multimodal approach that fuses face and ear data at the data level. With a database of 197 2D images with no occlusion they achieved a recognition rate of 90.9%. Yan et al. [37] proposed a multimodal human recognition approach with face and ear using score-level fusion. Using 2D PCA with 3D ICP they obtained a recognition rate of 91.7% in the absence of occlusion with a database of 1884 2D and 3D images from 302 subjects. With the same database they obtained 97% recognition with a multi-instance approach using two 3D gallery and two 3D probe images. Yuan et al. [41] proposed combining the face profile silhouette with the ear to form a multimodal image. Applying Full-Space LDA (FSLDA) they obtained a recognition rate of 96.2% on a database of 395 profile images from 79 subjects (the USTB database). However, they did not consider occlusion in the images. Besides, they made a simple concatenation of the ear and face profile images without considering the individual contributions of the two modalities, and their system is not guaranteed to scale. Woodard et al. [35] proposed combining the 3D face with the ear and fingers. Using the ICP algorithm, the Root Mean Square (RMS) registration error and score-level fusion, they obtained a 97% rank-one recognition rate with 85 subjects. They also tested the performance of combining the 3D finger and ear modalities and achieved an 87% rank-one recognition rate. Most recently, Theoharis et al. [32] proposed a unified 3D face and ear recognition system using wavelets. They extracted geometry images from the 3D ear and face data by fitting annotated ear and face models, representing the respective average shapes, through an ICP and Simulated Annealing (SA) based registration process. The wavelet transform is then applied to the extracted images to find the biometric signature. In a test with 324 gallery and the same number of probe images, 99.7% rank-1 recognition is reported. It is not explicitly mentioned whether the system is invariant to occlusion and facial expression.
4. Challenges In this section, the problems faced by the researchers and the challenges to be addressed in achieving a reliable performance with the ear and the face biometrics are identified and discussed.
4.1. Occlusion
Although quite acceptable recognition rates are achieved with a clear view of the ear and face data, accurate recognition when they are occluded by earrings, hair, beards, ornaments etc. is still a great challenge.
4.2. Expression Invariance
Although the ear is not affected by the change of facial expressions, the geometry of the face significantly changes. Almost perfect recognition rates with insignificant false acceptance have been reported for the face with a neutral expression. But face recognition with various facial expressions is still to be addressed to get an acceptable recognition rate suitable for real life applications.
4.3. Efficient Fusion Technique An important avenue of improving existing multimodal biometric systems is to apply an efficient data or feature level fusion. Fusion at match score or decision level is easy to perform. But fusion at these levels cannot fully exploit the discriminating capabilities of the combined biometrics. Fusion at the data or feature extraction level is believed to produce better results in terms of accuracy and robustness because richer information about the ID or class can be combined at these levels [29]. Fusion at the feature level is the most challenging [28, 29, 17] because the feature sets of various modalities may not be compatible and the relationship between the feature spaces of different biometric systems may not be known. Again, resultant feature vectors may increase in dimensionality and a significantly complex matching algorithm may be required. In addition, good features may be degraded by bad features during fusion and hence, we need to apply an efficient feature selection approach prior to fusion. These challenges should be efficiently addressed for a successful multimodal approach.
4.4. Speed The speed of a multimodal biometric recognition system is also an important factor for real time applications particularly when deployed in public places such as airports and stadiums. Unfortunately, most of the algorithms that address the issue of accuracy (such as ICP) are computationally expensive [38]. Therefore, developing an accurate as well as time-efficient algorithm is of great research interest.
4.5. Scalability
Testing with significantly larger databases and getting acceptable results is another big challenge to be addressed. Most of the proposed biometric systems are tested with databases containing data from fewer than 500 subjects. Again, although there is a benchmark database like FRGC v2 for 3D face data, there is no comparable benchmark for ear data.
4.6. Automation
Most of the approaches are manual or semi-automatic, requiring some manual intervention, but for real-time applications recognition should be fully automatic.
5. Conclusion
The ear and the face are considered to be the most promising biometric traits for non-intrusive applications. In this paper, recent advances in using these two biometric traits for human recognition are summarized. Many unimodal approaches have been reported that achieve quite high recognition rates and low error rates, but there are not many significant ear-face multimodal approaches. The most important challenges include: invariance to occlusions in ear images and to facial expressions in face images; application of data- or feature-level fusion; real-time deployment; and making the recognition system fully automatic and scalable to a large number of subjects.
Acknowledgements This research is sponsored by ARC grant DP0344338.
References [1] L. Alvarez, E. Gonzalez, and L. Mazorra. Fitting ear contour using an ovoid model. In Proc. of Int’l Carnahan Conf. on Security Technology, 2005, pages 145 – 148, Oct. 2005. [2] S. Ansari and P. Gupta. Localization of ear using outer helix curve of the ear. In Proc. of the Int’l Conf. on Computing: Theory and Applications, 2007, pages 688 – 692, March 2007. [3] K. Bowyer, K. Chang, and P. Flynn. A survey of approaches and challenges in 3D and multi-modal 3D+2D face recognition. Computer Vision and Image Understanding, 101(1):1– 15, Jan. 2006. [4] K. W. Bowyer, K. I. Chang, P. Yan, P. J. Flynn, E. Hansley, and S. Sarkar. Multi-modal biometrics: an overview. In Proc. of Second Workshop on MultiModal User Authentication, May 2006. [5] A. Bronstein, M. Bronstein, and R. Kimmel. Robust expression-invariant face recognition from partially missing data. Proc. ECCV’06, Lecture Notes in Computer Science. [6] M. Burge and W. Burger. Ear biometrics in computer vision. In Proc. of the ICPR’00, pages 822 – 826, Sept 2000. [7] K. I. Chang, K. Bowyer, and P. Flynn. Multiple Nose Regions Matching for 3D Face Recognition under Varying Facial Expression. IEEE Trans. on PAMI, 28(10):1695 – 1700, Oct. 2006. [8] H. Chen and B. Bhanu. Human ear detection from side face range images. In Proc. of ICPR’04, Vol. 3:574 – 577, Aug. 2004.
[9] H. Chen and B. Bhanu. Contour matching for 3D ear recognition. In Proc. of IEEE WACV, pages 123–128, 2005. [10] H. Chen and B. Bhanu. Human ear recognition in 3D. IEEE Trans. on PAMI, Vol. 29(4):718 – 737, Apr. 2007. [11] M. Choras. Ear biometrics based on geometrical feature extraction. Elec. Letters on CVIA, Vol. 5:84–95, 2005. [12] L. Goldmann, U. J. Monich, and T. Sikora. Components and Their Topology for Robust Face Detection in the Presence of Partial Occlusions. IEEE Trans. on IFS, 2(3):559 – 569, Sept. 2007. [13] C. Huang, H. Ai, Y. Li, and S. Lao. High-Performance Rotation Invariant Multiview Face Detection. IEEE Trans. On PAMI, 29(4):671 – 686, April 2007. [14] D. J. Hurley, M. S. Nixon, and J. N. Carter. Force field feature extraction for ear biometrics. Computer Vision and Image Understanding, 98(3):491–512, June 2005. [15] A. Iannarelli. Ear Identification. Forensic Identification Series. Paramount Publishing Company, Fremont, California, 1989. [16] S. Islam, M. Bennamoun, R. Owens, and R. Davies. Fast and Fully Automatic Ear Detection Using Cascaded AdaBoost. Proc. of IEEE WACV’08 (Accepted), Jan. 2008. [17] A. Jain, A. Ross, and S. Prabhakar. An introduction to biometric recognition. IEEE Trans. on Circuits and Systems for Video Technology, 14(1):4–20, Jan. 2004. [18] A. K. Jain, A. Ross, and S. Pankanti. Biometrics: A Tool for Information Security. IEEE Trans. on Information Forensics and Security, 1(2):125–143, June 2006. [19] I. Kakadiaris, G. Passalis, G. Toderici, M. Murtuza, Y. Lu, N. Karampatziakis, and T. Theoharis. Three-Dimensional Face Recognition in the Presence of Facial Expressions: An Annotated Deformable Model Approach. IEEE Trans. On PAMI, 29(4):640 – 649, Apr. 2007. [20] S. S. Kyong Chang, Kevin W. Bowyer and B. Victor. Comparison and combination of ear and face images for appearance-based biometrics. IEEE Trans. on PAMI, Vol. 25, No. 9:1160–1165, September 2003. [21] C. Li and A. Barreto. Evaluation of 3D Face Recognition in the presence of facial expressions: an Annotated Deformable Model approach. In Proc. of ICASSP’06, 3:14– 19, May 2006. [22] S. Z. Li and A. K. Jain. Handbook of Face Recognition. Springer, 2005. [23] X. Li, G. Mori, and H. Zhang. Expression-Invariant Face Recognition with Expression Classification. In Proc. of the 3rd Canadian Conf. on CRV, pages 77–83, June 2006. [24] X. Lu and A. K. Jain. Deformation Modeling for Robust 3D Face Matching. In Proc. of CVPR’06, pages 1377–1383, Jun. 2006. [25] A. Mian, M. Bennamoun, and R. Owens. An Efficient Multimodal 2D-3D Hybrid Approach to Automatic Face Recognition. IEEE Trans. on PAMI, 29(11):1927 – 1943, Nov. 2007. [26] P. Phillips, P. Flynn, T. Scruggs, K. Bowyer, J. Chang, K. Hoffman, J. Marques, J. Min, and W. Worek. Overview of the face recognition grand challenge. In Proc. of CVPR’05, 1:947 – 954, June 2005. [27] K. H. Pun and Y. S. Moon. Recent advances in ear biometrics. In Proc. of the Sixth IEEE Int’l Conf. on
Automatic Face and Gesture Recognition, pages 164 – 169, May 2004.
[28] A. Ross and R. Govindarajan. Feature Level Fusion Using Hand and Face Biometrics. In Proc. of SPIE Conf. on Biometric Technology for Human Identification II, pages 196–204, March 2005.
[29] A. Ross and A. K. Jain. Multimodal Biometrics: An Overview. In Proc. of the 12th European Signal Processing Conf., pages 1221–1224, Sept. 2004.
[30] R. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Mach. Learn., Vol. 37, No. 3:297–336, 1999.
[31] J. Sochman and J. Malas. Adaboost with totally corrective updates for fast face detection. In Proc. of the Sixth IEEE Int'l Conf. on Automatic Face and Gesture Recognition, 2004, Vol. 2:445–450, May 2004.
[32] T. Theoharis, G. Passalis, G. Toderici, and I. Kakadiaris. Unified 3D face and ear recognition using wavelets on geometry images. Pattern Recognition, doi: 10.1016/j.patcog.2007.06.024, 2007.
[33] O. Ushmaev and S. Novikov. Biometric Fusion: Robust approach. In Proc. of MMUA'06, May 2006.
[34] P. Viola and M. Jones. Robust Real-Time Face Detection. Int'l Journal of Computer Vision, 57(2):137–154, 2004.
[35] D. Woodard, T. Faltemier, P. Yan, P. Flynn, and K. Bowyer. A comparison of 3D biometric modalities. In Proc. of CVPR Workshop, pages 57–61, June 2006.
[36] P. Yan and K. W. Bowyer. Empirical evaluation of advanced ear biometrics. In Proc. of Conf. on Empirical Evaluation Methods in Computer Vision, June 2005.
[37] P. Yan and K. W. Bowyer. Multi-biometric 2D and 3D ear recognition. In Proc. of Audio- and Video-Based Person Authentication Conf., Rye Brook, NY, 25 (9), Sept. 2005.
[38] P. Yan and K. W. Bowyer. An automatic 3D ear recognition system. In Proc. of the Third Int'l Symposium on 3D Data Processing, Visualization and Transmission, Jun. 2006.
[39] P. Yan and K. W. Bowyer. Biometric recognition using 3D ear shape. IEEE Trans. on PAMI, Vol. 29, No. 8:1297–1308, Aug. 2007.
[40] M.-H. Yang, D. Kriegman, and N. Ahuja. Detecting Faces in Images: A Survey. IEEE Trans. on PAMI, 24(1):34–58, 2002.
[41] L. Yuan, Z. Mu, and Y. Liu. Multimodal recognition using face profile and ear. In Proc. of the 1st ISSCAA, pages 887–891, Jan. 2006.
[42] T. Yuizono, Y. Wang, K. Satoh, and S. Nakayama. Study on individual recognition for ear images by using genetic local search. In Proc. of CEC, pages 237–242, 2002.
[43] H.-J. Zhang, Z.-C. Mu, W. Qu, L.-M. Liu, and C.-Y. Zhang. A novel approach for ear recognition based on ICA and RBF network. In Proc. of Int'l Conf. on MLC '05, Vol. 7:4511–4515, Aug. 2005.
[44] W. Zhao, R. Chellappa, A. Rosenfeld, and P. Phillips. Face Recognition: A Literature Survey. ACM Computing Surveys, pages 399–458, 2003.
[45] A. Mian, M. Bennamoun, and R. Owens. Automatic 3D Face Detection, Normalization and Recognition. Proc. of the Third Int'l Sym. on 3D Data Processing, Visualization and Transmission (3DPVT), 2006.
Performance Model for a Reconfigurable Coprocessor Syed S. Rizvi1, Syed N. Hyder2, Aasia Riasat3
Computer Science & Engineering Department, University of Bridgeport1, Bridgeport, CT International Business Machines2 (IBM), Rochester, MN Department of Computer Science, Institute of Business Management3, Karachi, Pakistan [email protected] , [email protected], [email protected]
Abstract—This paper presents an analytical model for the performance of a generic reconfigurable coprocessor (RC) system. The system is characterized by a standard processor with a portion that is reconfigurable. We describe a general performance model for the speedup of a generic RC system and demonstrate how different parameters of the speedup model can affect the performance of a reconfigurable system (RS). In addition, we implement our previously developed speedup model for a system that allows preloading of the functional blocks (FB) into the reconfigurable hardware (RH). The redevelopment of the speedup model with preloading taken into consideration demonstrates some interesting results that can be used to improve the performance of RH with a coprocessor. Finally, we develop a performance model for a specific application, characterized by a main iterative loop in which a core operation is defined in a FB. Our experiments show that the minimum and maximum speedup depend mainly on the probabilities of miss and hit for the FB that resides in the RH of a coprocessor. In addition, our simulation results for the application-specific model demonstrate how the probability of dependency degrades the achievable speedup.
Index Terms — Performance model, reconfigurable coprocessors, software implementation, reconfigurable hardware.
I. INTRODUCTION
Modern applications increasingly demand high-performance and highly flexible hardware. These requirements have forced hardware architects to look for an alternative to general-purpose programmable solutions. The conventional approach is to employ a hardware coprocessor with a programmable CPU, with the coprocessor implementing the demanding tasks efficiently. Although both the general-purpose programmable CPU and the RH coprocessor address performance and power-efficiency issues, each has its own limitations that degrade its performance. For example, RH coprocessors do not give acceptable performance if a poor FB organization is employed. One way to use an RC efficiently is to treat the reconfigurable logic not as a fixed resource but as a cache for the reconfigurable FB instructions. Those instructions
that have recently been executed, or that we can otherwise predict might be needed soon, are kept in the reconfigurable logic. For instance, if an instruction is needed, it is brought into the FB by preloading it in or before the cycle in which it is required. In this way, the system uses partial run-time reconfiguration techniques to manage the reconfigurable logic. Since the reconfigurable logic is somewhat symmetric, a given instruction may be placed into the FB wherever there is space available. Reconfigurable systems have provided significant performance improvements by adapting to computations not well served by current processor architectures [5]. A programmable processor ceaselessly follows a three-phase implementation: an instruction is first fetched from memory, then decoded, and finally passed on to the execution phase, which may require several clock cycles. This process is then repeated for the next instruction, and so on. An RC, on the other hand, can be regarded as having but a single, non-iterative fetch phase; no further phases or iterations are needed, as the processor is now configured for the task at hand [9]. The primary strength of an RC or reconfigurable functional unit is the ability to customize hardware for a specific program's requirements [1]. In this paper, we investigate the benefits of using a reconfigurable architecture over a SI for FB execution. For both speedup models, we present a variety of simulation results. Our experiments demonstrate that the minimum and maximum speedup depend mainly on the probabilities of miss and hit for the FB that resides in the RH of the coprocessor. An important motivation of this paper is the characterization of a speedup model for an RC that interacts with current high-performance core processors. In general, we compare the performance of a generic RC system with the performance of conventional core processors that use a software implementation (SI). The rest of the paper is structured as follows. In Section II, we give an overview of RH coprocessor systems. Section III presents the performance models for both the generic RC and the system that supports preloading functional blocks into the RH. Section IV presents a performance analysis of the proposed model. Finally, we conclude the paper in Section V.
II. RELATED WORK
The FPGA architecture, with features such as the small width of the programmable logic blocks and the programmable interconnection network, provides great flexibility [7]. On the other hand, FPGAs have the following shortcomings: (i) the small width of the programmable logic blocks, and (ii) they are slower than a custom integrated circuit and have lower logic density. Previous work shows significant performance gains with such an approach [6], and significant speedups for certain applications have been reported. MorphoSys [2] was targeted at applications with inherent data-parallelism, high regularity and high throughput requirements; its logic cell is configured using a 32-bit context word. By placing the reconfigurable logic in a separate chip from the processor, the limited off-chip bandwidth and added delay interfere with efficient FPGA-processor communication. The resulting overhead requires that large chunks of the application code be mapped to the reconfigurable logic to achieve any performance benefit at all. An RS integrates a high-performance processor with a reconfigurable functional unit on the same chip (Garp [3], Remarc [8], PipeRench I-COP [4], etc.). In this paper, we investigate the benefits of using a reconfigurable architecture over a SI for FB execution. For both speedup models, we present a variety of simulation results that demonstrate that the minimum and maximum speedup depend mainly on the probabilities of miss and hit for the FB that resides in the RH of the coprocessor.
III. ANALYTICAL AND PERFORMANCE MODELS FOR THE GENERIC RC SYSTEM
The model variables, along with their definitions, are listed in Table I. Before presenting a performance model for the speedup of a RS, we discuss the main design methodology behind the speedup model. We assume that all data communication between the core processor and the RC is done using the shared data memory. In addition, we assume that a main controller takes care of both the main processor and the coprocessor with the RH; the controller is also responsible for configuring the RC by loading the configuration data from the external memory. Since the architecture of the coprocessor demands high bandwidth to the data memory, we assume a dedicated memory bus for data communication between the coprocessor and memory.
A. Speedup Model for RH in a Coprocessor
For ease of implementation and understanding, we divide the development of the performance model for the speedup of a RC into two parts: the best-case scenario (ideal model) and the worst-case scenario (real model).
B. Model Development for the Best-Case Scenario
We start by developing the performance model for an ideal case, where we assume that all the required FB are always present in the RH and that no dependencies exist between the instructions. These assumptions permit us to ignore some of the system parameters, such as the reconfigurable programming time and the probability of miss
TABLE I
SYSTEM PARAMETERS DEFINITION

Parameter | Definition
ET | Enhanced execution time for FB
TC | FB call time
TP | Reconfigurable programming time
PE | Portion of enhancement
TRFB | Execution time for a reconfigurable FB
TPFB | Time to preload the FB into RH
TINI | Time required to perform initialization
TCL | Cleanup time
TI | Preloading initiation time
TBC | Time required to perform basic computation
TRD | Time required to resolve dependencies
C | Constant value
NA | Ignored
Pd | Probability of dependency among N instructions
N | Given set of instructions
Ph | Probability of a hit
K | Speedup
Pm | Probability of a miss
NT | Normal execution time for FB
TN | Normal execution time between FB
TB | Normal FB execution time
etc. Based on the above assumptions, we can define the speedup as the ratio of the execution time for a certain task using the SI (core processor) to the execution time for the same task using the RH (coprocessor). This allows us to derive an expression for the portion of enhancement ($P_e$) that one may achieve using the RH. The portion of enhancement for a complete application can be defined as the ratio of the enhanced time ($E_t$) due to the RH to the normal time ($N_t$) using the SI. This can be expressed as follows:

$P_e = \dfrac{E_t}{N_t}$, where $0 < P_e < 1$   (1)

Therefore, (1) clearly indicates that the resultant speedup of the system should be the reciprocal of the portion of enhancement and must be greater than one. This leads to the following expression for the speedup of a system in which the probability of miss almost reaches zero:

$\text{Speedup} = K = \dfrac{1}{P_e} = \dfrac{N_t}{E_t}$, where $K > 1$   (2)
One common point that we can observe in both (1) and (2) is that the enhanced time should be less than the normal time the processor takes to execute a given task. For instance, consider a program that contains n instructions. If we assume that a portion of the program (for example, instruction-50 to instruction-200, served by a dedicated RH) can completely use the RH, then the improvement can be defined as a simple ratio of the normal execution time to the enhanced time. In addition, the resultant
speedup always exceeds one. This implies that the total execution time using the RH should equal the sum of the enhanced portion's time and the normal execution time (i.e., for the portion of the program which cannot use the enhanced mode). Mathematically, it can be expressed as follows:
$E_t = N_t\left[(1 - P_e) + \dfrac{P_e}{K}\right]$   (3)

Substituting $K$ from (2) into (3) yields

$E_t = N_t\left[(1 - P_e) + \dfrac{P_e}{N_t / E_t}\right]$   (4)
Fig. 1. Normal execution time and enhanced time versus speedup (K).
Expanding and rearranging gives

$E_t N_t = (E_t)^2 - E_t N_t + (N_t)^2$   (5)

Since $K = \dfrac{N_t}{E_t}$, the final equation for the speedup can be written as:

$\text{Speedup} = K = \dfrac{(N_t)^2}{(E_t)^2 - E_t N_t + (N_t)^2}$   (6)
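As a quick sanity check of (6), the following snippet evaluates it for the example discussed next (a normal time of four cycles and an enhanced time of two cycles); it is an illustration only, not part of the authors' simulation.

    # Evaluate eq. (6) for Nt = 4, Et = 2.
    def speedup_ideal(Nt: float, Et: float) -> float:
        return Nt**2 / (Et**2 - Et * Nt + Nt**2)

    print(speedup_ideal(4, 2))   # 1.333..., the ceiling quoted in the discussion below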
Fig. 1 shows a comparison of the required execution time between the RH and the SI with respect to specific values of the speedup. For instance, if we assume that the normal execution time (representing the SI) takes four cycles and the enhanced time (i.e., the RH) takes two cycles, the overall speedup that can be achieved through this performance model with the given system parameters is no more than 1.333, as shown in Fig. 1. It should be noted that (6) cannot speed the system up by more than the reciprocal of one minus the portion of enhancement (i.e. $K < \dfrac{1}{1 - P_e}$), as shown in Fig. 1.
C. Model Development for the Worst-Case Scenario
The normal execution time of a system that does not support RH should equal the sum of the normal execution time between the FB and the time it takes to execute a FB on the processor in software. Mathematically, it can be expressed as:

$N_T = T_N + T_B$   (7)

Before developing the performance model for the worst-case scenario, it is imperative to mention some fundamental assumptions established for this model. We assume that the time required to call a FB in the RH is less than the time required to execute a FB in software without a RH, which leads to the assumption $T_B > T_C$. Since reconfigurable technology provides dedicated hardware to perform a certain task, the time to call and execute a FB should be less than the time required to execute a FB in SI. This
argument is almost always true, especially when Ph reaches its peak value. On the other hand, when a miss occurs, we need to pay a penalty of TP cycles, which we assume is less than the time required to execute a FB on the processor in software (i.e., TB > TP). This implies that the overall enhanced time for a system that uses the RH as a part of the processor should equal the sum of the normal execution time, the reconfiguration programming time, and the normal block execution time. Since we assume that the probability that a FB is present in the RH is Ph, the probability that the RH needs to be programmed is 1 − Ph (i.e., Pm = 1 − Ph).
When an instruction comes into the processor for execution, we assume that the processor first checks the availability of the required FB in the RH for performing a single operation. In addition, in order to call the FB in the RH, the processor needs TC cycles. If the required FB is present in the RH, there is no miss penalty. This can only happen when the probability of hit reaches its maximum value. This implies that, as the probability of hit in the RH approaches one, the time required to program a missing FB tends to zero. This leads us to the following mathematical fact: Ph ∝ 1/TP. In other words, this can also be expressed as: when Ph → 1, then TP → 0. According to one of our assumptions, in the case of a miss in the RH, we pay a penalty of TP cycles. This implies that we should multiply the time required to call a FB in the RH by the probability of miss. Furthermore, we also assume that TRFB is the time required to execute a FB in the RH. Once the FB call time initiates (i.e., TC), the dedicated hardware starts executing the instruction and takes TRFB cycles to finish the job. TRFB is extremely small because the dedicated hardware runs much faster than the
equivalent SI. The above assumptions lead us to the following mathematical formula for computing the enhanced time of the RC system:

ET = TN + TC (1 − Ph) + TRFB + TP                                            (8)
Fig. 2 demonstrates a comparison of the required execution time for the same task between the RH and the SI with respect to a range of Ph. It can be seen in Fig. 2 that the best performance from the RH is achieved when the probability of hit reaches its peak value. On the other hand, the worst-case performance of the RH can also be seen in Fig. 2, when the probability of miss approaches its maximum value. One can observe in Fig. 2 that the RH performs well in terms of execution time when the probability that the required FB resides in the RH is greater than 50%. Once the probability of hit drops below 50%, the SI outperforms the RC system. Recall our fundamental equations (1) and (2). The portion of enhancement and the speedup of a RH system for the real model can be expressed as follows:

Pe = [ TN + TC (1 − Ph) + TP ] / ( TN + TB )                                 (9)

NT / ET = ( TN + TB ) / [ TN + TC (1 − Ph) + TP ]                            (10)

In order to derive the expression for the speedup, we apply the concept of (3) to (9) and (10). Combining them yields (11):

K = 1 / ( [ 1 − ( TN + TC (1 − Ph) + TP ) / ( TN + TB ) ] + [ ( TN + TC (1 − Ph) + TP ) / ( TN + TB ) ] ÷ [ ( TN + TB ) / ( TN + TC (1 − Ph) + TP ) ] )        (11)
After performing some simplification, we get
K = ( TN + TB )^2 / { ( TB − TC (1 − Ph) − TP ) ( TN + TB ) + ( TN + TC (1 − Ph) + TP )^2 }        (12)
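The following sketch evaluates the reconstructed form of (12) over a range of hit probabilities. The cycle counts used here are hypothetical values chosen only to make the break-even behavior around Ph = 0.5 easy to see; they are not the parameters used for the paper's own figures.

```python
# Hedged sketch of the worst-case speedup in (12) as a function of the hit
# probability Ph. Parameter values are NOT taken from the paper; they are
# hypothetical cycle counts chosen so that the break-even point described in
# the text (no speedup at Ph = 0.5) is visible in the output.

def worst_case_speedup(ph: float, tn: float, tb: float, tc: float, tp: float) -> float:
    """Speedup K per (12): (TN+TB)^2 / [(TB - TC(1-Ph) - TP)(TN+TB) + (TN + TC(1-Ph) + TP)^2]."""
    miss = 1.0 - ph
    n_t = tn + tb                    # software-only execution time, per (7)
    e_t = tn + tc * miss + tp        # RH enhanced time appearing in (9) and (10)
    return n_t ** 2 / ((tb - tc * miss - tp) * n_t + e_t ** 2)

if __name__ == "__main__":
    # Hypothetical parameters: TN = 4, TB = 3, TC = 4, TP = 1 cycles.
    for ph in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"Ph = {ph:4.2f} -> K = {worst_case_speedup(ph, 4, 3, 4, 1):.3f}")
```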
Simulation results show that the probability of hit plays a major role in (12). According to our experiment, when we set the probability of hit to 0.5, the RH does not provide any speedup over the SI. On the other hand, when the probability of hit exceeds 0.5 (i.e., 0.5 < Ph ≤ 1), the RH starts providing a better execution rate than the SI. This is because the probability that the processor executes an instruction in TN + TC cycles is higher than in TN + TB cycles (this is true since TC < TB). If we instead increase the probability of miss, we are likely to get more speedup through the SI rather than the RH. In other words, setting Ph below 50% (i.e., 0 < Ph < 0.5) results in performance degradation due to the miss penalty of TP cycles.
Fig. 2. RH and SI execution time (clock cycles) versus the probability of hit (Ph).
In addition, this also ensures that a reduction in Ph forces TP to affect the performance of the RH comprehensively. We use (12) to compute the speedup by selecting specific values of the system parameters. Note that the term Ph in (12) is inversely proportional to the term TP (i.e., a decrease in Ph causes an increase in TP that may degrade the overall performance). Fig. 3 shows how much faster a FB executes, on average, in the RH than as a SI on a core processor. In addition, Fig. 3 demonstrates the change in the performance of the RH with respect to changes in the probability of hit. This implies that, in order to achieve optimum performance from the RH, we should have a mechanism that ensures high values of the probability of hit.

D. Speedup Model for RH with Preloading

We now consider an implementation of a system that permits preloading of FBs into the RH. Before redeveloping the speedup model, we mention some important characteristics necessary to understand the concept of preloading RH systems. The implementation of preloading with the RH does not fully utilize the normal execution time between the FBs. In other words, the value of TN can be used only partially in a preloading system. Preloading into the RH is limited to partial use of the processor's normal execution time, since an instruction cannot be called until it is clear that a FB is required. When the system starts up for the first time, the preloading of the required FB into the RH cannot be completed until it is determined that a FB is really required. This leads us to the following sequential implementation of preloading: (1) a signal is generated that indicates that a FB is required, (2) an instruction executes that initiates the preloading of the required FB, and (3) the loading of the required FB into the RH starts and may complete while another call for the FB is generated. In order to develop a general speedup model for a
generic RC, we assume that we have an IC attached to the core processor and that high bandwidth is available at all times between the processor and the RH. In order to make a realistic speedup model, one should also include the startup latency in computing the execution time for the RH. In order to evaluate the performance of a coprocessor with a RH, it is imperative to consider the communication latency between the core processor and the RH as well as the configuration time for a FB within the RH. It appears that the reconfiguration time is reduced when used with the preloading technique, since instructions that have recently been executed, or those that can be predicted and preloaded into the RH, stay in the reconfigurable logic. This, therefore, not only improves the performance of the RH in terms of total execution time but also gives a speedup over the SI. The worst-case and best-case equations to compute the enhanced time for preloading RH systems are as follows:
ET = [ (1/2) TN + TC (Pm) + (1/2) TP + TI + TRFB ] + (TPFB)C                 (13)

Equation (13) represents the enhanced time for the worst-case scenario, where each system parameter may have a non-ideal value; for example, the probability of miss might reach one, or the startup latency might be greater than the reconfigurable programming time. Likewise, the enhanced time for the best case can be expressed as:

ET = [ (1/2) TN + 1 + TRFB ] + (TPFB)C                                       (14)

where TC → 1 and TP → 0 when Pm → 0, and TI → NA.

Equation (14) shows the enhanced time computation for the best-case scenario, where all system parameters may have ideal values. It should be noticed that TRFB in (13) represents the time required to execute a FB in the RH. Once the FB is loaded into the RH, there is no further instruction bandwidth (cycles) required to perform the computation. Equation (13) can also be written as:

ET = [ (1/2) TN + TC (Pm) + (1/2) TP + TI ] + (TPFB)C                        (15)

Using (7) and (15), the mathematical expression for the speedup model can be expressed as follows:

K = 1 / ( { 1 − [ (1/2) TN + TC (Pm) + (1/2) TP + TI + TRFB + (TPFB)C ] / ( TN + TB ) } + { [ (1/2) TN + TC (Pm) + (1/2) TP + TI + TRFB + (TPFB)C ] / ( TN + TB ) }^2 )        (16)

Fig. 3. Speedup (K) for the RH versus the probability of hit (Ph).

IV. PERFORMANCE ANALYSIS OF THE PROPOSED MODEL
For the simulation results, we ignore the first-time penalty (i.e., the startup latency), since the time required to execute a large number of instructions dominates this system parameter. For Fig. 4, simulation results are computed using (7) and (15) for the SI and the RH system with the preloading technique, respectively. Fig. 4 demonstrates the performance of both the RH and the SI with respect to a varying amount of Pm. In addition, the system parameters are set to 4, 3, 4, and 3 cycles for TN, TB, TC, and TP, respectively.
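To illustrate how the preloading model behaves, the sketch below evaluates the reconstructed forms of (13) and (16) while sweeping Pm. TN, TB, TC, and TP follow the values quoted for Fig. 4 (4, 3, 4, and 3 cycles), but TI, TRFB, and TPFB are not specified in this excerpt, so the values assumed here are placeholders and the resulting numbers are illustrative only.

```python
# Rough sketch of the preloading model, using the reconstructed forms of (13)
# and (16). TI, t_rfb and t_pfb are assumed values, not the paper's settings.

def preload_enhanced_time(pm, tn=4.0, tc=4.0, tp=3.0, ti=0.5, t_rfb=0.5, t_pfb=0.5):
    """Enhanced time ET per the reconstructed (13)."""
    return 0.5 * tn + tc * pm + 0.5 * tp + ti + t_rfb + t_pfb

def preload_speedup(pm, tn=4.0, tb=3.0, **kw):
    """Speedup K per the reconstructed (16): K = 1 / [(1 - ET/NT) + (ET/NT)^2]."""
    ratio = preload_enhanced_time(pm, tn=tn, **kw) / (tn + tb)   # ET / NT, with NT from (7)
    return 1.0 / ((1.0 - ratio) + ratio ** 2)

if __name__ == "__main__":
    for pm in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"Pm = {pm:4.2f}  ET = {preload_enhanced_time(pm):.2f}  K = {preload_speedup(pm):.3f}")
```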
Fig. 4. RH and SI execution time (cycles) versus the probability of miss (Pm).
It is evident in Fig. 5 that TN increases the SI time by a large magnitude (approximately 43%) compared to the corresponding increase in the RH implementation time. In addition, Fig. 5 demonstrates that the effect of TN on the SI is twice the effect of TN on the RH implementation. In other words, a 75% increase in TN results in a 43% increase in the SI time and a 21% increase in the RH implementation time (when Pm = 1). It should be noticed that the system parameters for Fig. 5 are computed using (7) and (15) and are set to 7, 3, 4, and 2 cycles for TN, TB, TC, and TP, respectively.

Fig. 5. RH and SI execution time (cycles) versus the probability of miss (Pm).
Making TB < TC might be considered a theoretical assumption; in practice, this does not happen. One important issue that we have not considered in this paper is the placement of the RH with respect to the core processor. Much research has been done in this area to devise new architectures that place the RH and the main processor as close together as possible. The reason for doing this is to reduce the access latency between the core processor and the coprocessor. In reality, the access latency for a FB that resides in the core processor and executes as SI is much lower than the access latency required to call a FB from the RH.

V. CONCLUSION

In this paper, we presented a general performance model for the speedup of a system that uses a RC. In addition, we presented a variety of simulation results that compare the performance of a RH coprocessor system with conventional general-purpose processors. Furthermore, we described the performance model of a system that allows preloading of FBs into the RH. We explored various system parameters and studied their impact on the overall system performance. Simulation results suggest that the RH coprocessor has the potential to achieve significant speedup when implemented with a good FB management system that can ensure the availability of the required FBs in the RH.

REFERENCES
[1] A. DeHon, "Architectures for General-Purpose Computing," A.I. Technical Report No. 1586, Artificial Intelligence Laboratory, MIT.
[2] H. Singh, M.-H. Lee, G. Lu, F. J. Kurdahi, and N. Bagherzadeh, "MorphoSys - An Integrated Reconfigurable System for Data Parallel Computation-Intensive Applications," University of California, Irvine, CA 92697.
[3] T. J. Callahan, J. R. Hauser, and J. Wawrzynek, "The GARP Architecture and C Compiler," IEEE Computer, 33(4):62-69, 2000.
[4] Y. Chou, P. Pillai, H. Schmit, and J. P. Shen, "PipeRench Implementation of the Instruction Path Coprocessor," Int. Symp. on Microarchitecture, pp. 147-158, 2000.
[5] S. Hauck, T. W. Fry, M. M. Hosler, and J. P. Kao, "The Chimaera Reconfigurable Functional Unit," IEEE Symposium on FPGAs for Custom Computing Machines, 1997. Available: http://citeseer.ist.psu.edu/hauck97chimaera.html
[6] V. Choudhary, A. Wel, M. Bekooij, and J. Huisken, "Reconfigurable Architecture for Multi-Standard Audio Codecs," Philips Research Labs, The Netherlands. Available: http://wwwhome.cs.utwente.nl/~smit/HWSWcodesign/paper_choudhary.pdf
[7] J. R. Hauser and J. Wawrzynek, "GARP: A MIPS processor with a reconfigurable coprocessor," in IEEE Workshop on FPGAs for Custom Computing Machines, pp. 24-33, 1997. Available: http://citeseer.ist.psu.edu/hauser97garp.html
[8] T. Miyamori and K. Olukotun, "A Quantitative Analysis of Reconfigurable Coprocessors for Multimedia Applications," IEEE Symp. on FCCM, 1998.
[9] S. C. Goldstein, H. Schmit, M. Moe, M. Budiu, S. Cadambi, R. R. Taylor, and R. Laufer, "PipeRench: A coprocessor for streaming multimedia acceleration," in Proceedings of the 26th Annual International Symposium on Computer Architecture, pp. 28-39, May 1999.
RFID: A New Software Based Solution to Avoid Interference Syed S. Rizvi1, Eslam M. Gebriel2, and Aasia Riasat3
Computer Science & Engineering Department, University Of Bridgeport1, 2, Bridgeport, CT Department of Computer Science, Institute of Business Management3, Karachi, Pakistan {srizvi1, emahmoud2}@bridgeport.edu, [email protected]
Abstract - RFID (Radio Frequency Identification) interference is one of the most important issues for RFID applications. The dense-reader mode in Class 1 Gen2 in EPC-global is one of the solutions which can reduce the interference problem. This paper presents a new software based solution that can be used for improving the performance of dense-reader mode through effectively suppressing the unwanted interference. We use two existing methods, Reva and Feig’s systems, to build a complete simulation program for determining the effectiveness of the new proposed software based solution. For the sake of our simulation results, we strictly follow the EPC-global Gen2 standard and use the Listen and Wait protocols [2]. Our simulation results demonstrate that the interference can be reduced significantly by using the proposed software based solution. Keywords - RFID, interference, dense-reader mode, listen and wait protocol
I.
INTRODUCTION
In this paper, we present a software based solution that can effectively reduce the interference between RFID applications. Our approach addresses many issues related to RFID, such as tag collision and interference. Our proposed solution is not only shown to reduce the interference but can also be used as an effective tool to analyze the interference among RFID applications that are located within close proximity of one another. RFID is an electronic barcode which can be used to track items under very harsh conditions. An RFID system consists of three components: a tag, a reader, and an antenna. In terms of tag memory, tags can be divided into read-only and read-write tags. With read-only tags, the tags send data to the reader. With read-write tags, the tags send data to the reader and the reader can also write data to the tags. In terms of power supply, tags can be divided into passive and active tags. Passive tags contain no power supply and store a very small amount of data. In addition, passive tags need a transceiver signal in order to be activated. Active tags, on the other hand, contain an internal battery and are capable of storing a large amount of data. At different frequencies, the tags have different read distances. At a frequency of 2.45 GHz, the tag can be read from 2 to 3 mm. The read distance can be
increased from 24 to 30 cm by adding a small booster antenna. At a frequency of 13.56 MHz, the tag can be read from 15 to 20 cm with the booster antenna and from 30 to 40 cm with an external antenna. In the UHF spectrum, the reading distance is up to 1 meter when used with the booster antenna and up to 5 meters when used with an external antenna. The tags need both an on-chip antenna and two external antennas to achieve the maximum read distances at the three common RFID frequencies (13.56 MHz, 868 to 956 MHz, and 2.45 GHz) [2]. RFID could also be considered a dangerous means of spying on people [1, 4]. Since the tags can be read without being swiped or scanned, anyone with an RFID tag reader can read the tags embedded in your clothes and other consumer products without your knowledge [5].

II.
RELATED WORK
Dense-reader mode (DRM) reduces instances of interference between multiple readers. The power of the signal sent from a reader is a millionfold stronger than that of the tags; if the frequency band is narrow, the weak radio signals reflected from the tags will be covered by the readers' signals. This implies that, in dense-reader mode, we need to separate the bandwidth used by the tags and limit each reader's signal to a channel. The use of separate bandwidth strictly prevents interference in RFID applications [6]. Dense-reader mode reduces the bandwidth by slowing down the signal and avoids leakage by filtering; at the same time, a decode filter is placed on the tags to improve their anti-interference ability. If a DRM reader wants to send a signal, it first listens to make sure the channel is available. If it is not, the reader switches (or hops) to another channel to avoid interfering with another reader on that channel [7]. Three widely used solutions were proposed by Hendrik van Eeden to avoid interference in RFID applications [8]. In the first proposed solution, readers hop randomly between different frequencies in a particular portion of the UHF spectrum. However, this only works effectively in North America, since the United States (US) allows UHF tags to jump only between frequencies from 902 MHz to 928
MHz. In their second solution, they use the concept of time division multiplexing (TDM) to synchronize all the readers and assign them time slots for transmission. However, the time slots become very small in large installations. When there are many readers in one location, the readers need more time to access the tags, which results in longer delays. In their third solution, they enforce each tag and reader to use a separate portion of the RF spectrum to minimize the chances of interference. In order to prevent the more powerful radio waves from the reader from overwhelming the waves reflected back by the tags, readers send out energy at one frequency whereas the tags reflect back a signal at a different frequency. This implies that a mutually compatible and standardized manner of operation is required to avoid frequency hopping. In addition, to avoid interference caused by the heavy energy signals constantly emitted by readers, the best solution is to enforce that readers do not transmit data during the tag identification cycle.

III.
INTERFERENCE AVOIDANCE VIA SOFTWARE BASED SOLUTION
The main purpose of the proposed software based solution is to improve the dense-reader mode in order to reduce instances of interference. RFID uses the EPC protocol for communication through Class 1 Gen 2 networks. It includes three modes: single-reader, multi-reader, and dense-reader. The dense-reader mode can reduce the interference problem [2]. Dense-reader mode (DRM) reduces (but does not solve) instances of interference between multiple readers. In 2006, the European Telecommunications Standards Institute (ETSI) demonstrated two synchronization techniques designed to enable many readers operating close to each other to share frequencies simultaneously. These two techniques are the Reva system and Feig's method. The Reva system implements an RFID management system to control the interference between multiple readers and tags [3]. On the other hand, in Feig's system, reader synchronization instructs the first interrogator to emit a unique pre-pulse RF signal immediately before its interrogation process [4].
A. The Proposed Software Based Solution

In the proposed technique, we first connect all the readers operating in the same area to a single centralized reader controller. Once a reader connects to the controller, it uses Feig's method to emit a unique pre-pulse RF signal to the controller. The controller stores the available channel. If two readers' channels overlap, the controller rejects the later reader, and that reader has to change its channel range until the controller accepts it. Likewise, the active readers must follow the standard. Instead of using equipment to test performance in a real environment, we provide a software based solution to reduce the interference. Our approach reduces the interference between the READERs that operate within the 10 channels of the UHF range. We use a 4-channel approach, where four channels are dedicated to the readers and six are allocated to the TAGs. Moreover, for our program, we assume that interference may occur between READERs which exist in the same or in neighboring CELLs. Therefore, each CELL can have four or more READERs. In other words, four READERs can be activated and work simultaneously. In this case, each of them works on a specific channel whereas the rest have to wait until they find a clear channel. By doing this, we can avoid the possibility that any two READERs work simultaneously over the same frequency/channel or in the same or neighboring cell(s).
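A minimal sketch of the controller-side admission rule described above is given below. It is an illustration written for this text, not the authors' implementation; the cell model, channel numbering, and method names are assumptions.

```python
# Illustrative sketch of the controller rejecting a reader whose requested
# channel is already occupied in the same or a neighbouring cell.

from typing import Dict, Tuple

class ReaderController:
    def __init__(self) -> None:
        # reader_id -> (cell, channel); channels 0-3 are the four reader channels
        self.registry: Dict[str, Tuple[Tuple[int, int], int]] = {}

    @staticmethod
    def _neighbours(cell: Tuple[int, int]) -> set:
        (x, y) = cell
        return {(x + dx, y + dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}

    def register(self, reader_id: str, cell: Tuple[int, int], channel: int) -> bool:
        """Accept the reader only if no reader in the same or a neighbouring cell
        already occupies the requested channel; otherwise the reader must retry
        with a different channel."""
        for other_cell, other_channel in self.registry.values():
            if other_channel == channel and other_cell in self._neighbours(cell):
                return False
        self.registry[reader_id] = (cell, channel)
        return True

controller = ReaderController()
print(controller.register("R1", (0, 0), 0))   # True  - first reader on channel 0
print(controller.register("R2", (0, 1), 0))   # False - neighbouring cell, same channel
print(controller.register("R2", (0, 1), 1))   # True  - switches to a free channel
```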
Fig.1. Flow Chart for the proposed software approach to resolve interference
The proposed approach for avoiding interference is shown in Fig. 1. It should be noted in Fig. 1 that the entire proposed approach is based on the concept of a closed-loop solution, in which feedback is periodically provided by the system to the requesting reader.
B. Basic Scenario

Initially, a user starts searching for a specific tag or a list of tags using the controller. The controller then sends a signal to the READERs which are located within close proximity of these tags, as shown in Fig. 2. Once a Reader receives the signal, it checks the availability of the four UHF channels, as shown in Fig. 3. Once this step is successfully passed, there are two possibilities that we address through our proposed solution. If the Reader finds that all the channels are currently busy, the Reader keeps searching the channels. Once it finds a free channel, the Reader broadcasts a READY signal to all the neighboring Readers in order to announce that this specific channel is occupied at this point in time. The Reader then waits for a specific amount of time to ensure that no other Reader wants to use the same channel. If another Reader wants to use exactly the same channel, that Reader is required to send a READY signal requesting the same channel. In this case, our Reader releases the channel gracefully and starts searching for another available channel. This cycle of searching goes on until the Reader successfully occupies a channel. One obvious advantage of this approach is that our proposed solution avoids unnecessary collisions with the other active Readers. On the other hand, if none of the active Readers is interested in using the same channel, the Reader can immediately occupy the desired channel and broadcast a
BUSY signal on the line, so that all the neighboring READERs know that this specific channel is fully occupied and currently unavailable. Once the Reader has finished using the channel (the BUSY time), the Reader broadcasts a RELEASE signal to announce that the channel has been released gracefully.

Fig. 2. Controller program with testing of the availability of Tags.

Fig. 3. Reader program with frequency and channel usage.
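Before turning to the performance analysis, the following toy sketch walks through the READY / BUSY / RELEASE handshake of the Basic Scenario. The message names mirror the prose above; the timing model and function signature are invented for illustration.

```python
# Hypothetical sketch of the channel-acquisition handshake: announce READY,
# back off on contention, hold the channel as BUSY, then RELEASE it.

import random

def acquire_channel(free_channels, contended, wait_slots=3):
    """Return the channel this reader ends up occupying, or None.

    free_channels: channels currently observed as idle.
    contended: channels for which another reader also sends READY during the
               waiting period, forcing a graceful release.
    """
    for channel in free_channels:
        print(f"broadcast READY on channel {channel}")
        waited = random.randint(1, wait_slots)            # listen-and-wait period
        if channel in contended:
            print(f"contention detected, releasing channel {channel} gracefully")
            continue                                      # keep searching
        print(f"waited {waited} slot(s); broadcast BUSY on channel {channel}")
        return channel
    return None

channel = acquire_channel(free_channels=[2, 3], contended={2})
if channel is not None:
    print(f"broadcast RELEASE on channel {channel}")
```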
IV. PERFORMANCE ANALYSIS OF THE PROPOSED SOFTWARE BASED APPROACH
The simulation program has three main parts: the Controller, the Reader, and the Environment. We use TCP connections between these three parts, in which the Environment program works as a TCP server whereas both the Controller and the Reader work as TCP clients. Therefore, the simulation can run on multiple machines. The Controller program has a power switch button which is operated by the Controller to connect to the Environment, as shown in Fig. 4. The Controller can issue a query command by manually typing a Tag ID, or by opening a file that stores a list of Tag IDs. After opening the file, the Controller determines the total operating time of the commands for performance measurement. The Reader program also has a power switch button, as shown in Fig. 3. When the Controller is on, the Reader receives a unique Reader ID and displays it in the title. At the top of the panel, when the Reader is operating, it displays the operating frequency, the header of the located tags, and the Reader mode (Station or Mobile). When the Reader is in Station Mode, the Reader receives a list of Tag IDs which are randomly chosen and located all around a particular cell. In Mobile Mode, the Reader first sets the header and then gets a list of Tag IDs which are all around the header of the cell. Since the Reader uses the four-channel approach, it is essential to display, at the bottom of the panel, the Reader ID in each channel of any operating Reader within receivable distance. The Environment program provides a graph feature to view the status of the Readers, as shown in Fig. 5. In the simulation, when a Reader is operating, all Readers around the operating Reader can receive its signal. Therefore, if any two Readers exist within the same or neighboring cells and/or operate in the same frequency range, they might interfere with each other. The size of the cells can be changed by clicking Setting in the menu bar; it can be changed from 3x3 to 10x10.

Fig. 4. Setting dialog for the Reader program.

Fig. 5. Environment program and analysis of the test run results.
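A minimal sketch of the simulation wiring described above is shown below: the Environment acts as a TCP server and a Reader (or Controller) connects as a TCP client. The port number and message format are assumptions made only for this example.

```python
# Hypothetical illustration of the Environment/Controller/Reader TCP roles.

import socket
import threading
import time

HOST, PORT = "127.0.0.1", 5000   # assumed address/port for the Environment program

def environment_server() -> None:
    """Environment: accept one client connection and acknowledge its registration."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((HOST, PORT))
        srv.listen()
        conn, _ = srv.accept()
        with conn:
            msg = conn.recv(1024)
            conn.sendall(b"ACK:" + msg)

def client(role: str) -> None:
    """Controller or Reader registering with the Environment over TCP."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((HOST, PORT))
        cli.sendall(role.encode())
        print(role, "received", cli.recv(1024).decode())

server = threading.Thread(target=environment_server, daemon=True)
server.start()
time.sleep(0.5)          # give the server a moment to start listening
client("READER")
server.join()
```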
A. Simulation Results
We set up two scenarios. In the first case, we use 5 readers in one cell that share 4 channels. In the second case, we use 9 cells with one reader in each cell. There are 27 commands for all the readers. The execution time is measured from the moment the first command is sent by the Controller to the moment the last result is received by the Controller, as shown in Fig. 6. We notice that the execution time decreases as the waiting time decreases, but only within certain limits; setting an even smaller waiting time does not yield a shorter execution time. Even though we use the Listen-and-Wait protocol to avoid collisions, the improvement depends on the choice of the random numbers. If the random numbers are close to each other, we get more collisions, as shown in Fig. 6.

Fig. 6. Analysis of the test run results: process time (ms) versus waiting time (ms).

V.
CONCLUSION
The proposed approach can avoid RFID interference efficiently. However, all readers have to follow the EPCglobal Gen2 standard and use the Listen and Wait protocol. We also need to set up one or more controllers in the working environment, and all readers have to communicate with the controller(s) before initial operation, which involves extra cost and time. Also, we need to place RFID tags by area, as in SDMA (Space Division Multiple Access). If we put the tags too close together, the readers might have more signal collisions, which will consequently decrease performance. In future work, it will be interesting to implement this method in a comparatively large workspace, such as a warehouse or factory.

REFERENCES
[1] W. F. Deal III, "RFID: A Revolution in Automatic Data Recognition," Technology Teacher, Vol. 63, Issue 7, April 2004.
[2] R. Wadham, "Radio Frequency Identification," Library Mosaics, Vol. 14, Issue 5, p. 22, Sep/Oct 2003.
[3] "RFID Big Picture," Microwave Journal, Vol. 49, Issue 7, p. 47, July 2006.
[4] L. Smith, "Warrantless wiretaps and your EZ pass," Humanist, Vol. 66, Issue 2, pp. 38-39, Mar/Apr 2006.
[5] E. Fricke, "Air wave interference a consideration," Daily Planet Staff, 12-20-00.
[6] C. Diorio, "What is "dense reader" mode?," RFID Journal. Available: http://www.rfidjournal.com/faq/19/78
[7] Alien Technology, "ALR-9800 Enterprise RFID Reader," p. 3. Available: http://www03.ibm.com/solutions/businesssolutions/sensors/doc/content/bin/sa_at_ds_9800_v3_web.pdf?g_type=pspot
[8] H. van Eeden, "Why UHF RFID Systems Won't Scale," RFID Journal. Available: http://www.rfidjournal.com/article/articleview/1056/1/82
A Software Component Architecture for Adaptive and Predictive Rate Control of Video Streaming Taner Arsan , Tuncay Saydam Kadir Has University Department of Computer Engineering Cibali 34230 Istanbul, Turkey Abstract-Quality of Service and Transmission Rate Optimization in live and on-demand video streaming is a very important issue in lossy IP networks. Infrastructure of the Internet exhibits variable bandwidths, delays, congestions and time-varying packet losses. Because of such attributes, video streaming applications should not only have good end-to-end transport layer performance, but also a robust rate control optimization mechanisms. This paper gives an overview of video streaming applications and proposal of a new software architecture that controls transport QoS and path and bandwidth estimation. Predictive Control is one of the best solutions for difficult problems in control engineering applications that can be used in Internet environment. Therefore, we provide an end-to-end software architecture between the video requesting clients, their destination servers, distant streaming servers and video broadcasters. This architecture contains an important Streaming Server Component with Application Layer QoS Manager and Transport Layer Path and Bandwidth Estimator. QoS Manager considers the necessary parameters such as network delay, packet loss, distortions, round trip time, channel errors, network discontinuity and session dropping probability to make video streaming more efficient and to provide required video quality. Transport Path and Bandwidth Estimator, on the other hand provides transmission rates to Destination Servers for optimum video streaming. The paper provides and discusses a software component model of video streaming.
Index Terms - Intelligent Systems, Video Streaming Component Architecture, Optimization Methods, Video Streaming.
I. INTRODUCTION

Streaming is very popular in today's best-effort delivery networks. Delivering audio or video content over the Internet can be achieved by two methods: progressive download and real-time streaming. If the content is short, the progressive download method is generally used. In this method, the media content is directly downloaded from a server into the storage units of a client; in real-time streaming, client software plays the media content without storing it in any storage unit. Real-time streaming can be simply described as delivering media from a server to a client over a network in real time. The client software is responsible for playing the media as it is delivered. There are two main types of delivery options for real-time streaming: live and on-demand. Simulated live is another method that is used with live streaming or on-demand streaming to add extra material such as prerecorded scenes, concerts, interviews, and lectures, but this event is not live and there is no need for any broadcasting tools. If the media content contains live events, this type of streaming is called live streaming. On the other hand, if the media content can be provided on the user's demand, it is called on-demand streaming.

A. Video Streaming Architectures

There are several common media streaming architectures available in commercial use. These architectures [1] are the Single Sender - Single Receiver Streaming System, the Single Sender - Multiple Receivers Streaming System, the Multiple Senders - Single Receiver Streaming System, and the Multiple Senders - Multiple Receivers Streaming System, as seen in Fig. 1.
Fig.1. Video Streaming Architectures: (a) Single Sender-Single Receiver (b) Single Sender-Multiple Receivers (c) Multiple Senders-Single Receiver (d) Multiple Senders-Multiple Receivers.
The first system is the most common streaming architecture, while the second system is the typical broadcast architecture. Although in the first system the bandwidth is generally sufficient for streaming, subject to the constraints of delay and packet-recovery acknowledgement, in the second system there is a need for bandwidth regulation and adaptation to network conditions. The last two systems will become popular in the near future because of their distributed system architecture. These systems should have a robust scheduling structure, because it is necessary to send the media content to the client in a certain hierarchical order, and the client should also put the received packets in the correct order to reconstruct the media content. In addition, there are a large number of audio and video streaming systems available in the market today [2].
In recent commercial applications, a new concept is in use: Relaying Streaming Media. The main idea is to listen to the incoming stream and then send that stream to corresponding destinations. The main goal is to reduce the Internet bandwidth consumptions. The architecture of such a system is given in Fig.2. In general, streaming technologies support the latest digital media standards such as AAC, H.264, MP3, MPEG-4 and 3GPP. Client software uses protocols rather than TCP/IP, because of its long and bandwidth greedy acknowledgement procedure. These protocols are Real-Time Transport Protocol (RTP) or Real-Time Streaming Protocol (RTSP) [3] etc. and these protocols run in the application layer of the Internet, and control the streaming of media content. First, the client sends a RTP or RTSP request to server and server receives the request and calls necessary software modules to satisfy the request and to start the streaming.
Fig.2. Video Streaming Architecture with Relay.
Network transport layer uses multicast or unicast transport to stream the media content. Multicast transportation is used in media broadcasting. The main idea is to share the stream among the clients as shown in Fig.3. On the other hand, in a unicast transportation schema each client has its own connection to the server as seen also in Fig.3. The more unicast transportation occurs between the server and the clients, the more bandwidth will be necessary in network. Although unicast transportation needs more bandwidth consumption, unicast transportation is reliable, and special transport support is not required.
Fig. 3. Network Transportation Methods for Delivering Streaming Media.

B. Internal Component Architecture of Video Streaming System

Our video streaming component system, corresponding to the single sender - multiple receivers architecture and given in Fig. 4, includes a broadcaster, a repository for archiving, a streaming server, and a receiver unit or client. The main idea is to capture video and audio signals from the source of the raw video and audio and put this media content into the correct form for transmission over the IP network. Raw video and audio must be compressed before transmission on an IP network. This makes data transmission effective, but it should also be noted that the transmission of the packets should be under quality of service control. Application-layer Quality of Service control tries to achieve this task. It includes congestion control and error control techniques for successful transmission of the packets on IP networks. Congestion control is used to prevent packet loss and reduce delay. Error control is used to improve the video presentation quality if packet loss occurs. Forward Error Correction (FEC), retransmission, error-resilient encoding, and error concealment are some examples of error control mechanisms. To implement these, continuous media distribution services are necessary. In the Internet, the IP protocol can try to achieve this task by means of its best-effort delivery structure. These services also include network filtering, application-level multicast, and content replication. In video streaming technology, streaming servers are one of the most important components. Streaming servers offer quality streaming services; they also contain transport protocols, an operating system, and a storage system. Although quality streaming can be successfully achieved by the streaming servers, there must be a media synchronization mechanism to multiplex the video and audio; otherwise, the video and audio cannot be synchronized by the receiver's media playing tools. Finally, some protocols are needed to regulate the transmission of the packets: a network protocol such as the Internet Protocol (IP), a transport protocol such as the User Datagram Protocol (UDP) or the Transmission Control Protocol (TCP), and a session control protocol such as the Real-Time Streaming Protocol (RTSP).

Fig. 4. Internal Component Architecture of Video Streaming System.
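As a concrete (though simplified) illustration of application-layer congestion control driven by observed packet loss, the sketch below implements a toy additive-increase / multiplicative-decrease rate adapter. It is not the predictive rate controller proposed in this paper; the thresholds and step sizes are arbitrary assumptions.

```python
# Toy AIMD rate adapter: raise the sending rate gently while loss stays low,
# back off sharply when the reported loss ratio exceeds a threshold.

def adapt_rate(rate_kbps: float, loss_ratio: float,
               increase_kbps: float = 50.0, decrease_factor: float = 0.75,
               loss_threshold: float = 0.02) -> float:
    """Return the next sending rate given the last interval's loss ratio."""
    if loss_ratio > loss_threshold:
        return rate_kbps * decrease_factor
    return rate_kbps + increase_kbps

rate = 1000.0  # kbps
for loss in (0.0, 0.01, 0.05, 0.0):          # hypothetical per-interval loss reports
    rate = adapt_rate(rate, loss)
    print(f"loss={loss:.2f} -> rate={rate:.0f} kbps")
```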
C. Bandwidth Estimation

A video streaming system has constraints such as bandwidth, congestion, and delay. The Internet also has a heterogeneous structure, and performance measurement is not normally considered a primary goal of its architecture. In our system, to obtain a better video streaming system, end-to-end transport performance should be optimized. It is possible to understand the network characteristics by using a bandwidth estimator. This capability makes packet forwarding more effective and evaluates the performance of video
streaming via the Internet. On the other hand, decentralized measurement control of the Internet has become an attractive topic for some scientists. There are two major end-to-end measurement methods available: Active Methods and Passive Methods [4].

1) Active Methods

In Active Methods, the idea is to send dummy packets from the sender to the receiver and to try to understand the network characteristics with the help of the transmitted packets. Although traditional Internet Control Message Protocol (ICMP) based tools such as ping and traceroute are available today, it is not possible to obtain satisfactory results by using these tools. In particular, ICMP-based tools give no details on whether the packet was lost or the response was lost, because packet loss rates on the forward and reverse paths differ. On the other hand, customer-oriented tools are helpful for evaluating the performance of Internet applications. Different techniques exist for active bandwidth estimation [4, 5], such as:
• Variable Packet Size Probing,
• Packet Pair / Train Dispersion Probing,
• Self-Loading Periodic Streams,
• Trains of Packet Pairs.
There are also different classifications of active bandwidth estimation tools, such as:
• Per-hop Capacity Estimation Tools,
• End-to-end Estimation Tools,
• Available Bandwidth Estimation Tools,
• TCP Throughput and Bulk Transfer Measurement Tools,
• Intrusiveness of Bandwidth Estimation Tools.
A classification of publicly available bandwidth estimation tools is given in Table I [4]. It is important to note that cprobe was the first end-to-end available bandwidth estimation tool in the literature. On the other hand, treno is the first tool to measure the Bulk Transfer Capacity of a path, and cap is the
first canonical implementation of the Bulk Transfer Capacity measurement methodology. The NIMI (National Internet Measurement Infrastructure) of Vern Paxson uses cap to estimate the Bulk Transfer Capacity of a path [4]. pathchar, pathload, Sting, and pathChirp are some of today's popular publicly available end-to-end bandwidth estimation Active Methods. cprobe [6] estimated the available bandwidth based on the dispersion of long packet trains at the receiver, but it was later shown that the dispersion of long packet trains does not measure the available bandwidth of a path; instead, it measures a different throughput metric. pathchar is a tool for estimating Internet link characteristics by measuring the round-trip time (RTT) of packets sent from a single host. It is possible to obtain per-hop capacity characteristics such as latency, bandwidth, queueing, and delays of any selected links. clink, pchar, and the tailgating technique are examples of per-hop capacity measurement methods. Sting is a TCP-based network measurement tool that measures the packet loss rate from the sender to the receiver and vice versa. Sting was developed in the context of NIMI and is similar to ICMP-based tools, but in this method TCP's error control mechanism can be used, and it is possible to determine the direction in which a packet was lost. Pathload is another active measurement tool that estimates the available bandwidth of a network path [7]. The main idea is that the one-way delays of a periodic packet stream show an increasing trend when the stream rate is larger than the available bandwidth. A sender process and a receiver process run in Pathload. Periodic packet streams are sent using the UDP (User Datagram Protocol), while a TCP connection is used as a control channel between the two end-points. This algorithm has a very good approach to estimating bandwidth. It uses the equation R = L / T, where T is the transmission period, L is the packet size, and R is the transmission rate.
TABLE I. CLASSIFICATIONS OF BANDWIDTH ESTIMATION TOOLS IN DETAIL

Tool       Author             Measurement Metric                Methodology
pathchar   Jacobson           Per-hop Capacity                  Variable Packet Size
clink      Downey             Per-hop Capacity                  Variable Packet Size
pchar      Mah                Per-hop Capacity                  Variable Packet Size
bprobe     Carter             End-to-End Capacity               Packet Pairs
nettimer   Lai                End-to-End Capacity               Packet Pairs
pathrate   Dovrolis-Prasad    End-to-End Capacity               Packet Pairs & Trains
sprobe     Saroiu             End-to-End Capacity               Packet Pairs
cprobe     Carter             End-to-End Available Bandwidth    Packet Pairs
pathload   Jain-Dovrolis      End-to-End Available Bandwidth    Self-Loading Periodic Streams
IGI        Hu                 End-to-End Available Bandwidth    Self-Loading Periodic Streams
pathChirp  Ribeiro            End-to-End Available Bandwidth    Self-Loading Packet Chirps
treno      Mathis             Bulk Transfer Capacity            Emulated TCP Throughput
cap        Allman             Bulk Transfer Capacity            Standardized TCP Throughput
sting      Savage             Achievable TCP Throughput         TCP Connection
ttcp       Muuss              Achievable TCP Throughput         TCP Connection
Iperf      NLANR              Achievable TCP Throughput         Parallel TCP Connections
Netperf    NLANR              Achievable TCP Throughput         Parallel TCP Connections
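To make the self-loading idea behind Pathload concrete, the sketch below computes a probing rate from R = L / T and applies a crude increasing-trend test to a set of one-way delay samples. The delay values are synthetic and the trend test is a simplification of what the real tool does.

```python
# Rough illustration of self-loading periodic streams: if one-way delays of a
# probing stream trend upward, the probing rate R = L / T is assumed to exceed
# the available bandwidth. Real tools measure delays from timestamped UDP probes.

def rate_kbps(packet_bytes: int, period_us: float) -> float:
    """Probing rate R = L / T, reported here in kbit/s."""
    return packet_bytes * 8.0 / period_us * 1000.0

def shows_increasing_trend(delays_ms, min_slope_ms: float = 0.05) -> bool:
    """Crude trend test: average pairwise increase above a small threshold."""
    diffs = [b - a for a, b in zip(delays_ms, delays_ms[1:])]
    return sum(diffs) / len(diffs) > min_slope_ms

probe = rate_kbps(packet_bytes=96, period_us=100.0)       # 96-byte packets every 100 us
delays = [10.0, 10.4, 10.9, 11.5, 12.2]                   # synthetic one-way delays (ms)
print(f"probing at ~{probe:.0f} kbit/s")
print("stream rate exceeds available bandwidth" if shows_increasing_trend(delays)
      else "stream rate is below available bandwidth")
```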
The minimum transmission period for back-to-back minimum-sized packets is around 15-30 μs; therefore, Tmin is selected as 100 μs. The minimum allowed packet size is 96 bytes. At the beginning, L is set to 96 bytes and, considering the target streaming rate, T is computed from the equation R = L / T. At this point, it is also noted that selecting a stream length of 100 packets rarely causes packet losses, and 100 packets also provide an adequate number of delay measurements. PathChirp is based on the concept of self-induced congestion and uses the Self-Loading Periodic Streams methodology in the form of self-loading packet chirps [8]. Pathload uses an adaptive search method, and for that reason it has a long convergence time. The two algorithms use different methodologies, and their outputs differ: PathChirp provides a single estimate of the available bandwidth, whereas Pathload provides minimum and maximum bounds on the available bandwidth. The idea of PathChirp is to use an exponentially spaced, highly efficient chirp probing train. The task is to estimate the available bandwidth over the path based on the queueing delays of chirp probe packets transmitted from the sender to the receiver and then to conduct a statistical analysis at the receiver.

2) Passive Methods

While dummy packets are sent from the sender to the receiver in active methods, no packets are injected in passive methods. The idea is to observe traffic already present in the network and then estimate the bandwidth of the network. This technique is called a non-intrusive measurement technique, and it is used for detecting shared upstream congestion and discovering bottleneck (significant queuing) router link speeds. This method considers the probability distribution function (PDF) of packet inter-arrival times in a TCP flow. The PDF shows the behavior of a spike, a spike bump, a spike train, and a train of spike bumps. These characteristic behaviors are interpreted, respectively, as a bottleneck with no substantial cross traffic, a low-bandwidth bottleneck followed by a high-bandwidth bottleneck, a traversed bottleneck shared with a substantial amount of cross traffic, and a low-bandwidth upstream bottleneck shared with a substantial amount of cross traffic. In this method, one of the most important points is the clustering problem of detecting shared bottlenecks. In the receiving part of the end-to-end system, there is an observer that watches the arrivals of packets at some link. After all these steps, minimization of the Rényi entropy, which is a generalized version of the Shannon entropy formula, is used to discriminate between bottleneck-sharing and non-sharing flows [4, 5]. The problem then becomes an optimization (briefly, a minimization) problem, for which a cost function is selected and an optimization algorithm is proposed.

II. SOFTWARE COMPONENT ARCHITECTURE OF THE PROPOSED PREDICTIVE VIDEO STREAMING SYSTEM

With these explanations, it is now possible to develop and describe a suitable system architecture for successful end-to-end video streaming. This end-to-end general system structure is given in Fig. 5. A detailed functional structure of this system is given in Fig. 6.
Fig.5. System Architecture of end-to-end Video Streaming System with Relay.
Fig.6. Internal Functional Component Structure of Video and Audio Streaming System with Relay.
In this functional structure we consider four components, namely the Broadcaster Component, the Streaming Server Component, the Destination Server Component, and the Client Component. Key functionalities within each component are clearly shown in Fig. 6. For our purposes, we are mainly interested here in the interactions between the streaming server and the destination server, such that transmission rates are optimized and QoS requirements are met. Within the streaming server, the QoS Manager, the Adaptive Path Optimizer, and the
Predictive Rate Controller are of special importance. The architecture of these subcomponents is given in Fig. 7, and a detailed internal structure of the QoS Manager is provided in Fig. 8.

Fig. 7. Architecture of Adaptive Path Optimizer, Predictive Rate Controller and QoS Manager.

Video requests generated by the clients first reach a streaming server which, working closely with the broadcaster component and/or repository, downloads the requested video and streams it up to a destination server from which a client receives its response. In Fig. 9, we propose a near-complete software information model of our component architecture, providing four key components and the objects within those components, as well as key functionalities shown by methods.
Fig.8. Internal Structure of Adaptive Path Optimizer, Predictive Rate Controller and QoS Manager.
The Clients Component sends live video or video-on-demand requests to a Streaming Server Component by remotely invoking the receive-live-video and receive-video-on-demand request methods. This component schedules such client requests and sends them to a Broadcaster Component. The streaming server component may find some of the clients' video-on-demand requests already archived in the repository; in this case, they are directly downloaded from the repository and sent to the requesting client component. Live video requests are passed to a Broadcaster Component, which then sends live RTP packets over UDP to the streaming server component.
Fig.9. Software Component Information Model of Video and Audio Streaming System with Relay.
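A hypothetical rendering of a few of the object methods suggested by the component information model is sketched below. The class and method names are illustrative assumptions, not the interfaces defined by the authors.

```python
# Illustrative, assumed rendering of a few component methods from Fig. 9.

class QoSManager:
    def admit(self, client_id: str, rate_kbps: float) -> None:
        print(f"admitting {client_id} at {rate_kbps:.0f} kbps")

class PredictiveRateController:
    def optimal_rate(self, video_id: str) -> float:
        return 900.0   # placeholder for the predictive/adaptive estimate

class StreamingServer:
    def __init__(self, qos_manager: QoSManager, rate_controller: PredictiveRateController):
        self.qos_manager = qos_manager
        self.rate_controller = rate_controller

    def receive_vod_request(self, client_id: str, video_id: str) -> None:
        # Archived content would be served from the repository; live requests
        # would be scheduled and forwarded to the broadcaster component.
        rate = self.rate_controller.optimal_rate(video_id)
        self.qos_manager.admit(client_id, rate)

server = StreamingServer(QoSManager(), PredictiveRateController())
server.receive_vod_request("client-42", "video-7")
```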
It is important to note that this component architecture is location transparent, and all of its components communicate through a component middleware bus over the Internet. The streaming server component and the objects shown inside it are the most important for the discussion in this paper. Normally, Destination Server Components lie closer to client components, while streaming server components are closer to Broadcaster Components (see Fig. 2 and Fig. 1(b)). Therefore, the QoS as well as the optimized transmission rates between a Streaming Server Component and a Destination Server Component become very important. In particular, the Transport Layer Path [9, 10] and Bandwidth Estimator works closely with the Destination Server Component in optimizing the transmission rate such that the receiving queues of the Destination Server Components are not overwhelmed. In this end-to-end system, there is a relay streaming server and several destination servers. Destination servers also provide streaming media to all their clients, which receive the media as multicast or unicast transmissions. Clients can also request the media, choosing one of the methods live video, simulated live video, or video on demand (VoD). The broadcaster provides encoded audio and video signals to a streaming server, while a repository system archives the encoded media. In this architecture, the streaming server contains some very important elements, such as an Adaptive Path Optimizer with a Bandwidth Estimator and a Predictive Rate Controller. It also ensures quality of service for the destination servers. Destination servers have two primary functions: they provide quality of service for end clients, and they act as a network agent for the streaming server. Most networking and service companies do research on streaming technologies and provide many of the audio and video streaming systems available in the market today. Some systems have limited specifications, containing only transmission algorithms and not providing optimization and bandwidth estimation. In this study, an integrated solution for an effective video streaming architecture has been proposed and developed. Here we use a bandwidth estimation algorithm called pathChirp [4, 8]. For the adaptive path optimizer we use a neural-network-regulated Adaptive Path Optimizer and a Predictive [11] Rate Controller [12]. The internal structure of this QoS management, which works in the application layer, is given in Fig. 8. We believe this is one of our main contributions, as are the novel rate controller and the rate allocation methods that we use. This contribution is expanded in Fig. 9, where we address the internal behavior of all the components of the video and audio streaming system with relay. In this component information model, we show each internal component and/or object functionality in terms of software object methods. These methods can be locally or remotely instantiated on a client-server architecture, communicating over a component middleware bus such as CORBA, J2EE, or DCOM. There should be architectural component, object, and functional consistency among Figures 4, 5, 6, 7, 8, and 9, which together form our main contribution. The key idea behind all
these architectural considerations is to overcome the lossy IP networking properties through intelligent path optimization, predictive rate control, and QoS management between the Streaming Server and Destination Server environments.

III. CONCLUSIONS

Current multimedia streaming technologies are reviewed, relevant concepts and architectures are considered, and classifications of bandwidth estimation tools are discussed. A software component architecture of a predictive streaming system is proposed, developed, and presented in detail. A novel Intelligent Path Optimizer with a Transport Layer Path and Bandwidth component within this architecture is developed and discussed. A new, functionally content-rich component information model of a video streaming system is developed, presented, and discussed. This information model is closely related to a design class diagram of a critical software design phase. This level of design detail, normally reserved for implementing software, does not commonly appear in the literature. We believe this is a significant contribution. The strength of our work lies not only in researching and presenting an intelligent predictive streaming system with relay, but also in developing and presenting it within the rich context and content of component-based software engineering technology. In other words, the networking world and the software world are brought together for a better scientific synergy.

REFERENCES

[1] QuickTime Streaming Server, Darwin Streaming Server - Administrator's Guide, Apple Computer, 2002.
[2] G. Venditto, "Instant Video," Internet World, pp. 84-101, Nov. 1996.
[3] H. Schulzrinne, A. Rao, and R. Lanphier, "Real time streaming protocol (RTSP)," Request for Comments (Proposed Standard) 2326, Internet Engineering Task Force, Apr. 1998.
[4] R. S. Prasad, M. Murray, C. Dovrolis, and K. Claffy, "Bandwidth estimation: metrics, measurement techniques and tools," IEEE Network, Vol. 17, No. 6, pp. 27-35, Nov. 2003.
[5] A. Shriram, M. Murray, Y. Hyun, N. Brownlee, A. Broido, M. Fomenkov, and K. C. Claffy, "Comparison of public end-to-end bandwidth estimation tools on high-speed links," in C. Dovrolis (ed.), PAM, Lecture Notes in Computer Science, Vol. 3431, pp. 306-320, Springer, 2005. ISBN 3-540-25520-6.
[6] R. L. Carter and M. E. Crovella, "Measuring Bottleneck Link Speed in Packet-Switched Networks," Performance Evaluation, 27-28:297-318, 1996.
[7] M. Jain and C. Dovrolis, "Pathload: A measurement tool for end-to-end available bandwidth," in Proceedings of the Passive and Active Measurements (PAM) Workshop, Mar. 2002.
[8] V. Ribeiro, R. Riedi, R. G. Baraniuk, J. Navratil, and L. Cottrell, "pathChirp: Efficient Available Bandwidth Estimation for Network Paths," Passive and Active Measurement Workshop (PAM2003), April 2003.
[9] S. Savage, A. Collins, and E. Hoffman, "The end-to-end effects of Internet path selection," in Proceedings of ACM SIGCOMM, 1999, pp. 289-299.
[10] D. Katabi and C. Blake, "Inferring Congestion Sharing and Path Characteristics from Packet Interarrival Times," MIT-LCS-TR-828, Dec. 2001.
[11] E. Ronco, T. Arsan, and P. J. Gawthrop, "Open-loop Intermittent Feedback Control: Practical Continuous-time GPC," IEE Proceedings of Control, Theory and Applications, Vol. 146, Issue 5, pp. 426-434, September 1999.
[12] Y. Saw, P. M. Grant, and J. M. Hannah, "A Comparative Study of Nonlinear Video Rate Control Techniques: Neural Networks and Fuzzy Logic," IEEE, 1998.
Routing Table Instability in Real-World Ad-Hoc Network Testbed Tirthankar Ghosh, Benjamin Pratt Computer Networking and Applications College of Science and Engineering St. Cloud State University St. Cloud, Minnesota 56301 U.S.A. Abstract – In this paper we have carried out an experimental study of the stability of routing tables in a real-world ad-hoc network. Two sets of experiments were conducted, one with a static topology, and the other with node mobility. The Ad-Hoc On-Demand Distance Vector (AODV) routing protocol was used to route packets over multiple hops. Both experiments showed that unstable wireless channels have significant effects on the stability of the routing tables.1
I.
INTRODUCTION
Research on mobile ad-hoc networks has gained momentum over the past several years. Advances in wireless technologies, together with the rising demand for ubiquitous and mobile computing, have contributed even more towards active research in this area. However, there has been a clear lack of commercial applications of ad-hoc networks, as industries carefully weigh the options of such implementations. One major reason behind this is the lack of experimental results obtained from real-life deployments of ad-hoc networks that can prove beyond doubt their performance efficiency and the effectiveness of the various protocols designed for their applications. Research in ad-hoc networks over the past several years has focused on routing, security, multicasting, and transport layer issues. The Mobile Ad-Hoc Network working group (MANET WG) was set up by the Internet Engineering Task Force with the responsibility of standardizing IP routing protocol functionalities suitable for the static and dynamic environments of ad-hoc networks. Some of the routing protocols that have been proposed and adopted by the working group are the Ad-Hoc On-Demand Distance Vector (AODV) [14], Optimized Link State Routing (OLSR) [5], Dynamic Source Routing (DSR) [8], and Dynamic MANET On-demand (DYMO) Routing [2]. We do not discuss these protocols in this paper because of space limitations; interested readers may refer to the respective RFCs for details. Although security was not a part of the initial MANET WG effort to standardize ad-hoc network operations, it started attracting considerable attention over the last decade. Several security algorithms, ranging from cryptographic applications
to secure routing to trust formulation, have been proposed and evaluated. However, there has been one fundamental problem with all of this research: all too often it is not based on reality. Simulation has evolved into the predominant method of evaluating these protocols and algorithms. Most algorithms have never been implemented in a real environment at all, which raises serious questions about their effectiveness in the real world. It is well known that wireless networks tend to be unpredictable and hard to characterize; they lack well-understood phenomena that can be modeled with close approximation to observed reality. Moreover, the mobility models used in past research are too simplistic and have little grounding in the actual behavior of mobile users. It is precisely for this reason that ad-hoc networks have not seen a welcoming and enthusiastic acceptance in wide-scale commercial and civilian applications, in spite of their enormous potential.
II. STATE-OF-THE-ART
All is not bleak, however. In recent years there has been a push towards creating ad-hoc testbeds and evaluating the performance of such networks through experimental analysis. Several routing protocols, including the Ad-Hoc On-Demand Distance Vector (AODV), Dynamic Source Routing (DSR), Optimized Link State Routing (OLSR), and Destination-Sequenced Distance Vector (DSDV) protocols, were evaluated in some of these deployments. One of the earlier implementations of AODV was discussed in [16], where the authors modified the base protocol while carrying out the implementation. Gratuitous route replies were added to the implementation (and were later added to the standard itself): an intermediate node, after sending a route reply from its cache, sends a gratuitous route reply to the destination informing it of its action. Routing loops created by rebooting nodes were also taken care of during implementation. In [4] the authors evaluated both the AODV and DSDV routing protocols. DSDV is a table-driven, or proactive, protocol, unlike AODV, which is reactive. The authors found that both protocols tend to choose unstable links due to their inherent characteristic of selecting lower hop-count routes. This tends to result in poor performance of both protocols, although DSDV was less
affected by this. The reason is that DSDV relies on prior handshaking before choosing a link, which reduces the probability of selecting an unstable link. In another outdoor experiment with the AODV routing protocol, the authors in [6] used 40 laptops equipped with 802.11b wireless cards on a rectangular athletic field of size 225 x 365 meters. Their experiment revealed that AODV had an average message delivery ratio of 50%, an average message latency of 370 milliseconds, and about 6 control packets per message. In [1] the authors analyzed the influence of unidirectional and asymmetric links on AODV performance in a 4-node ad-hoc network testbed. They also carried out a comparative analysis of the performance of AODV and OLSR, where the latter was found to outperform the former in the presence of unstable links. In [12, 13] the authors evaluated the Dynamic Source Routing (DSR) protocol using five mobile nodes installed in cars, one mobile node using Mobile IP, and two stationary nodes separated by a distance of 671 meters at the two ends of the course. The experiment revealed that about 90% of the packets used two- and three-hop routes, and the overall end-to-end loss rate was found to be 10%. In [17] the authors used unmodified code from the ns-2 simulator to implement the AODV and DSR routing protocols. The experiment consisted of four stationary nodes and two mobile nodes. The experiment with DSR revealed an average packet delivery ratio above 95% and an overall end-to-end latency of about 30 milliseconds. An implementation of a small ad-hoc network has been carried out at the University of Colorado at Boulder with 10 nodes, including some remote-controlled airborne miniature planes [7]. The authors achieved a throughput of about 250 Kb/s and a latency of about 30 milliseconds. In [10] an Ad-Hoc Protocol Evaluation (APE) testbed was created using Linux distributions that can be booted directly from a CD. Experiments were conducted with up to 37 physical nodes, with an emulated virtual mobility pattern in which each participant was supposed to move according to a set choreographed pattern. An APE testbed with four nodes was created in [11], where the authors confirmed the existence of communication gray zones in real-world implementations of mobile ad-hoc networks. A node i is in the communication gray zone of another node j when i is in the neighbor list of j, but j cannot send packets to i. The reasons for this can be attributed to the broadcast characteristics of wireless networks, where broadcast packets are normally sent at a lower bit rate and hence can reach further. Also, the presence of unidirectional links does not prevent a node from seeing its neighbor in one direction, although it cannot send data to that neighbor because acknowledgements are required to travel in the opposite direction. The authors also evaluated different strategies to overcome the effects of gray zones. The MIT Roofnet project [3] was undertaken by a group of researchers at MIT, where a mesh network has been set up with up to 29 indoor and 38 outdoor nodes. Each household in and around MIT was given a laptop computer with an antenna
set up on the roof. The authors used Srcr, a variant of DSR, as the routing protocol; the protocol aims to find routes with high throughput. However, the entire experiment was carried out in a static environment and thus does not give a realistic performance evaluation in a mobile setting. A prototype ad-hoc network testbed has been built by the National Institute of Standards and Technology (NIST) to help researchers carry out real-life testing in a laboratory environment. The project, called mLab [9], is made up of three components: one to create different network topologies, the second to capture packets, and the third to vary the transmission signal strength. The project is a significant step towards bridging the gap between simulation and real-life deployment of ad-hoc networks and, when completed, will help to create an open-source testbed that will allow developers to test, troubleshoot, and monitor the performance of their ad-hoc networking protocols and applications. In a recent report submitted to NSF by a group of researchers [15], the need for developing testbeds or prototypes of ad-hoc networks is argued very strongly. According to the report, all research in this area "should be performed on a basis of realism. Simulation should be used as a supporting method of evaluating a system, not as the only method. More attention should be paid by researchers to the realities of what is actually happening every day, rather than relying on outmoded models that were created when it was only possible to guess what might happen. Most research should result in working prototypes. Most research should make use of either live tests or modeling based directly on observed behavior of real users and systems working in the targeted environment." It goes on to recommend that funding agencies should encourage research that addresses "problems that are being actively exploited today, projects that help us to better understand the actual uses of mobility in the real world, and the actual behavior of wireless networks in real environments". Hence, it has become quite clear that we need tools that help researchers build and test their applications and algorithms in realistic environments, which would be valuable for ongoing research in this area.
III. EXPERIMENTAL SETUP
Our testbed was built with four HP Evo N600c laptops running Fedora 7 with kernel version 2.6.21-1, equipped with Orinoco Gold b/g wireless cards. The wireless card drivers were set up in ad-hoc mode using the 802.11b standard. The transmission power of each card was reduced to 2 milliwatts during each run of the experiments; this was done to scale down the area of coverage. We selected AODV-UU [18], the AODV implementation developed by Uppsala University, Sweden, as our routing protocol. Our experiments were conducted with simple ping messages. The AODV protocol was run with the traditional HELLO messages, with HELLO_INTERVAL = 2 milliseconds and ALLOWED_HELLO_LOSS = 2. We conducted two sets of experiments, one with a static topology and the other with mobility. The experiments were conducted on our university campus, over an area of about 300 x 250 meters. Routing table
stabilities were studied. Below we describe each set of experiments in detail.
A. Experiment 1: Static Topology
In this experiment, the nodes were deployed in a static topology as shown in Fig. 1. Nodes with IP addresses 192.168.1.101, 192.168.1.15, 192.168.1.2 and 192.168.1.202 will henceforth be referred to as nodes A, B, C and D respectively. The nodes were deployed such that nodes A and B were out of each other's transmission range, while nodes C and D were in the transmission range of all the nodes and also had all the other nodes in their own transmission range. The transmission power of the wireless card in each node was reduced to 2 milliwatts to reduce the area of coverage. The experiment was designed to study the stability of routing tables in a real-life ad-hoc network deployment with wireless channel instabilities. Nodes A and B started pinging each other when the experiment began. The node acting as the intermediate node between A and B was made unavailable after some time, and the handoff latency was noted when the next-hop information changed in each of nodes A and B. Eventually, the node was brought back into the network and the handoff latency was noted again.
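The stability figures reported below (the number of times a route flips between valid and invalid, the average period for which it stays valid, and the longest stretch for which it stays invalid) can be derived from periodic snapshots of each node's routing table. The following sketch illustrates one way to compute such metrics; the snapshot format, a list of timestamped (destination, validity) records, is a hypothetical example and is not the AODV-UU logging format.

```python
from collections import defaultdict

def route_stability(records):
    """Compute per-destination stability metrics from a list of
    (timestamp_seconds, destination, is_valid) routing-table snapshots.
    Returns {destination: (transitions, avg_valid_s, max_invalid_s)}."""
    by_dest = defaultdict(list)
    for t, dest, valid in sorted(records):
        by_dest[dest].append((t, valid))

    metrics = {}
    for dest, samples in by_dest.items():
        transitions = 0
        valid_spans, invalid_spans = [], []
        span_start, prev_state = samples[0]
        for t, state in samples[1:]:
            if state != prev_state:
                transitions += 1
                (valid_spans if prev_state else invalid_spans).append(t - span_start)
                span_start, prev_state = t, state
        # close the final span at the time of the last snapshot
        last_t = samples[-1][0]
        if last_t > span_start:
            (valid_spans if prev_state else invalid_spans).append(last_t - span_start)

        avg_valid = sum(valid_spans) / len(valid_spans) if valid_spans else 0.0
        max_invalid = max(invalid_spans) if invalid_spans else 0.0
        metrics[dest] = (transitions, avg_valid, max_invalid)
    return metrics

# Fabricated example: snapshots every 10 s over 22 minutes for one destination,
# with the route valid for 40 s and invalid for 20 s of every minute.
log = [(t, "192.168.1.202", t % 60 < 40) for t in range(0, 1320, 10)]
print(route_stability(log))   # {'192.168.1.202': (transitions, avg_valid_s, max_invalid_s)}
```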
Fig. 1. Experiment with static topology

The routing tables in all the nodes were found to be extremely unstable. The wireless channel between nodes B and D was particularly unstable: the route from B to D changed 66 times over the entire 22-minute span of the experiment, flipping back and forth between valid and invalid, and the average period of time the route was valid was 23 seconds. The route from B to C was also unstable, although not as badly, changing 41 times during the same 22 minutes; the average period of validity was 27 seconds. The wireless channel between nodes A and D was found to be the most unstable. The route from A to D changed state 41 times during the entire span of the experiment, the average valid period being only 14 seconds, and the route remained invalid for a maximum stretch of 175 seconds. The route from A to C was comparatively better: it changed state 24 times with an average valid period of 22 seconds, and the maximum time for which it was invalid at a stretch was 51 seconds.
When the experiment started and nodes A and B began pinging each other, they picked up a route through node C. After some time, AODV was shut down at C; it then took about 200 seconds for node B to rediscover a route through node D, whereas node A took about 102 seconds to rediscover its route to B through D. When C rejoined the network, it took about 25 seconds for B and 27 seconds for A to revert to the old route through C. This also showed that the wireless channels between B and D and between A and D were unstable.
B. Experiment 2: Mobile Node
The second experiment was designed with node mobility, with the aim of studying routing table stability in a real-life setting; the topology is shown in Fig. 2. Nodes with IP addresses 192.168.1.101, 192.168.1.15 and 192.168.1.2 were stationary nodes; as in the previous experiment, we will henceforth refer to these nodes as A, B and C respectively. Node 192.168.1.202 (node D) was the mobile node, whose path of movement is shown by the black arrow. A, B and C were deployed such that B was in the transmission range of both A and C, but A and C were not in each other's transmission range. The transmission power of each wireless card was reduced to 2 milliwatts to reduce the area of coverage. Nodes A, B and C started pinging one another at the start of the experiment. After some time, node D began moving along the path shown, pinging all the nodes as it left one node's transmission range and entered another's.

Fig. 2. Experiment with node mobility
The link between B and C was found to be unstable, resulting in unstable routes in their routing tables. The route from B to C changed 56 times back and forth between valid and invalid during the entire 23-minute span of the experiment. The average time the route remained valid was only 22 seconds, with a maximum invalid stretch of 126 seconds. The link between A and B, however, was quite stable, and the two nodes showed stable routes to each other in their respective routing tables. When node D started its journey from the vicinity of node A, it quickly found A as its next hop
neighbor. As D moved out of A's range and entered the transmission range of B, it promptly rediscovered the two-hop route to A through B. Occasionally it received HELLOs from A and had A as a one-hop neighbor in its routing table; hence its route to A changed back and forth between one hop and two hops. As D moved out of the transmission range of B and entered C's transmission range, it took a long time to regain the routes to A and B through C. This was because of the unstable wireless link between B and C. D's routing table showed the routes to A and B as extremely unstable.
IV. CONCLUSION
We conducted real-world experiments with the AODV routing protocol on a 4-node ad-hoc network testbed and studied routing table stability in the presence of unstable wireless links. Two sets of experiments were conducted, both in a parking lot within the university campus, one with a static topology and the other with node mobility. In both cases, it was found that the routing tables were unstable because of unstable wireless links with frequent loss of HELLO messages. We achieved only about a 30% packet delivery ratio in both experiments. These experiments were only the starting point in our effort to analyze the performance of ad-hoc networks in a real-world setting. We have secured a grant from our university to create and evaluate a large-scale ad-hoc network testbed. We started evaluating performance by creating a 4-node testbed and are currently in the process of scaling it up. We are also working towards creating an opportunistic ad-hoc network on our campus, with gateways connecting the ad-hoc nodes to the campus backbone. Once it is successfully deployed, we plan to secure the network by implementing secure routing protocols and trust formulation techniques.
ACKNOWLEDGMENT
We extend our sincere thanks to all students of Computer Networking and Applications who helped us in conducting the experiments.
REFERENCES
[1] E. Borgia and F. Delmastro, "Effects of Unstable Links on AODV Performance in Real Testbeds", EURASIP Journal on Wireless Communications and Networking, vol. 2007, pp. 14, January 30, 2007.
[2] I. Chakeres and C. Perkins, "Dynamic MANET On-demand (DYMO) Routing", Internet Draft, MANET Working Group, IETF, July 2007.
[3] B. A. Chambers, "The Grid Roofnet: A Rooftop Ad Hoc Wireless Network", Master's thesis, Massachusetts Institute of Technology, May 2002.
[4] K.-W. Chin, J. Judge, A. Williams, and R. Kermode, "Implementation Experience with MANET Routing Protocols", SIGCOMM Computer Communication Review (CCR), 32(5):49–59, 2002.
[5] T. Clausen and P. Jacquet, "Optimized Link State Routing Protocol (OLSR)", Internet Draft, Network Working Group, IETF, October 2003.
[6] R. S. Gray, D. Kotz, C. Newport, N. Dubrovsky, A. Fiske, J. Liu, C. Masone, S. McGrath, and Y. Yuan, "Outdoor Experimental Comparison of Four Ad Hoc Routing Algorithms", in Proceedings of the 7th ACM International Symposium on Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWiM), pp. 220–229, 2004.
[7] S. Jadhav, T. Brown, S. Doshi, D. Henkel, and R. Thekkekunnel, "Lessons Learned Constructing a Wireless Ad Hoc Network Testbed", in Proceedings of the 1st Workshop on Wireless Network Measurements (WINMee), April 2005.
[8] D. Johnson, Y. Hu, and D. Maltz, "The Dynamic Source Routing Protocol for Mobile Ad-hoc Networks for IPv4", RFC 4728, MANET Working Group, IETF, February 2007.
[9] A. Karygiannis and E. Antonakakis, "mLab: An Ad-hoc Network Testbed", Consumer Communications and Networking Conference, January 2006.
[10] H. Lundgren, D. Lundberg, J. Nielsen, E. Nordström, and C. Tschudin, "A Large-scale Testbed for Reproducible Ad Hoc Protocol Evaluations", in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), pp. 337–343, March 2002.
[11] H. Lundgren, E. Nordström, and C. Tschudin, "Coping with Communication Gray Zones in IEEE 802.11b Based Ad Hoc Networks", in Proceedings of the Fifth ACM International Workshop on Wireless Mobile Multimedia (WOWMOM), pp. 49–55, September 2002.
[12] D. Maltz, J. Broch, and D. Johnson, "Quantitative Lessons From a Full-scale Multi-hop Wireless Ad Hoc Network Testbed", in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), September 2000.
[13] D. A. Maltz, J. Broch, and D. B. Johnson, "Experiences Designing and Building a Multi-hop Wireless Ad Hoc Network Testbed", Technical Report CMU-CS-99-116, School of Computer Science, Carnegie Mellon University, 1999.
[14] C. Perkins, E. Belding-Royer, and S. Das, "Ad Hoc On-demand Distance Vector (AODV) Routing", RFC 3561, IETF, July 2003, http://www.faqs.org/rfcs/rfc3561.html.
[15] P. Reiher et al., "Research Direction in Security and Privacy for Mobile and Wireless Networks", Technical Report to the National Science Foundation, July 2006.
[16] E. M. Royer and C. E. Perkins, "An Implementation Study of the AODV Routing Protocol", in Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), September 2000.
[17] A. K. Saha, K. To, S. PalChaudhuri, S. Du, and D. B. Johnson, "Physical Implementation and Evaluation of Ad Hoc Network Protocols Using Unmodified Simulation Models", in ACM SIGCOMM Asia Workshop, April 2005.
[18] AODV-UU, AODV Implementation, Department of Information Technology, Uppsala University, Sweden, http://core.it.uu.se/core/index.php/AODV-UU.
Quality Attributes for Embedded Systems
Trudy Sherman
Arizona State University, Ira A. Fulton School of Engineering, Department of Computer Science and Engineering, Tempe, Arizona 85281
Abstract – Software quality attributes (QAs) such as reliability and modifiability have been used to define nonfunctional requirements of software systems for many years. More recently, they have been used as the basis for generating utility trees in the Software Engineering Institute's Architecture Tradeoff Analysis Model (ATAM). Software processes and models such as the ATAM are often utilized when developing embedded systems, which consist of both software and hardware. To determine whether the QAs defined for software are adequate when working with embedded systems, trade studies performed during the development of embedded system architectures were evaluated. The results show that while many of the embedded system quality attributes map directly to existing software quality attributes, some attributes, such as portability, take on a modified definition, and others, such as weight, do not normally apply to software systems. This paper presents the quality attributes for embedded systems identified as a result of the trade study evaluation.
I. INTRODUCTION
This paper presents a set of quality attributes for embedded systems based on the evaluation of eleven embedded system architecture trade studies. In general, embedded systems are systems consisting of hardware and software designed to work together for a specific purpose. These systems exist in virtually every aspect of our lives: they run our home appliances, home entertainment systems, communications devices, medical equipment, automobiles, transportation systems, and so on. With such a widespread impact on everyday life, it is important that these systems meet not only the functional requirements of "what" they should be able to do, but also the nonfunctional or "quality" requirements expected of them, such as reliability, security, and maintainability. Over the last several years, these nonfunctional requirements have come to be referred to as software quality attributes (QAs). A brief background on software QAs is provided. Lists of common software QAs are included, with definitions for some of the most common ones. For the purpose of this paper, embedded systems are defined as electronic devices that include one or more microprocessors with a customized user interface. This definition is expanded to discuss the similarities with and differences from software-only systems. The trade study research is explained, and the technique used to evaluate each trade study and record the findings is described. Finally, the results of the trade study evaluation are presented: similarities to existing software QAs and modifications to the definitions of existing QAs are discussed, and new QAs that are specific to embedded systems are described.
II. SOFTWARE QUALITY ATTRIBUTES
Software QAs have been used by software developers for many years. They are usually first identified in the Software Requirements Specification (SRS), which is written in the requirements phase of development. In addition to the functional requirements specifying "what" a software application must be able to do, many standard templates for the SRS include sections dedicated to nonfunctional requirements (quality attributes) such as reliability, security, maintainability, and usability that specify the characteristics of "how" the system will respond to various types of stimuli. For example, Section 3, Specific Requirements, in the SRS template defined by IEEE Standard 830-1998 includes an entire subsection on software system attributes; it gives reliability, availability, security, maintainability and portability as examples of software attributes that should be documented as requirements [1]. In the next stage of software development, most commonly referred to as the architecture or high-level design phase, a strategy or software framework is put into place for implementing both the functional requirements and the QAs. Architectural design decisions can have a significant impact on the ability of a software system to fulfill QAs. For example, the security of data within a system is highly dependent both on how the user interface is structured and on how the data is partitioned. If passwords and levels of access are not determined and built into the structure of the software at a high level, every function that is performed will have to include its own user validation, slowing performance and opening the door to security errors and inconsistent security implementation. If both secure and non-secure data are kept together, it is very difficult to limit access to the secure data: security checks must be performed by every function that accesses the database, again leading to reduced performance and increased opportunities for security breaches. As a final step in the architecture design phase, more and more software development organizations are evaluating their design decisions using the Software Engineering Institute's Architecture Tradeoff Analysis Model (ATAM) [2]. This is a multi-day process that involves representatives of all the major stakeholders; the goal of the ATAM is to understand the consequences of architectural decisions with respect to the quality attribute requirements of the system. It is important to note that there is not just one standard list of software QAs. Even though most QA lists are based on a standard such as IEEE Std 830, the list of QAs used by an organization is added to and deleted from depending on customer needs and the developer's focus on quality. In addition, the descriptions will also change based on the project
needs. Table I shows two lists of software QAs from two similar sources, illustrating how the list of QAs might change as well as how the definitions might vary depending on focus.
TABLE I
EXAMPLES OF SOFTWARE QUALITY ATTRIBUTE LISTS
Accuracy. List 1 [3]: not included. List 2 [4]: accuracy of the computed result.
Availability. List 1: proportion of time the system is up and running. List 2: deadlock, synchronization, and data consistency.
Conceptual integrity. List 1: underlying theme that unifies the design of the system at all levels. List 2: not included.
Functionality. List 1: ability of the system to do the work for which it was intended. List 2: not included.
Modifiability. List 1: ability to make changes to a system quickly and cost effectively. List 2: impact of an expected change.
Performance. List 1: responsiveness of the system - the time required to respond to stimuli or the number of events. List 2: how long things take.
Portability. List 1: ability of the system to run under different computing environments. List 2: not included.
Reliability. List 1: ability of the system to keep operating over time. List 2: not included.
Security. List 1: ability to resist unauthorized attempts at usage and denial of service. List 2: authentication and integrity concerns.
Subsetability. List 1: ability to support the production of a subset of the system. List 2: not included.
Usability. List 1: not included. List 2: information presented to the user, data reuse, operations such as cut-and-paste and undo.
Variability. List 1: how well the architecture can be expanded or modified to produce new architectures that differ in specific, preplanned ways. List 2: not included.
III. EMBEDDED SYSTEMS
Philip Koopman of Carnegie Mellon University [5] defines an embedded system as: "A computer purchased as part of some other piece of equipment - typically dedicated software (may be user customized) - often replaces previously electromechanical components - often no "real" keyboard - often limited display or no general-purpose display device". For these reasons embedded systems are also known as "hidden" computer systems. These systems are built into a wide variety of products such as audio/visual equipment, appliances, vehicle engines, elevators, and medical
instruments. It is common for these systems to appear to be very simple when they are actually quite complex. Because embedded systems consist of both hardware and software, they have many of the characteristics of software-only systems; every one of the QAs listed in Table I might apply to an embedded system. Even though embedded systems have many characteristics in common with software-only systems, there are a number of characteristics which set them very much apart and can pose tough challenges throughout the development process. Characteristics that are common to embedded systems include multi-tasking, real-time response to external events, and event handling without human intervention [6]. There are also a number of constraints that are commonly imposed on embedded systems, including small size and low weight, low power consumption, harsh environments (heat, vibration, corrosion, etc.), safety-critical operation, and sensitivity to cost [5]. The issues which face the embedded system developer start at the onset of requirements gathering and functional definition, just as they do for software-only systems. And, like software-only systems, in addition to very specific functional requirements the embedded system is also faced with nonfunctional requirements (NFRs). Even though they are not usually referred to as QAs, these NFRs are the QAs of embedded systems. Embedded system NFRs deal with issues such as timing, memory requirements, and cost. The architecture and implementation choices must adequately address the NFRs as well as the functional requirements, or the system will fail. A few specific examples of embedded system NFRs found in the literature are defined in Table II [6].
TABLE II
EXAMPLES OF EMBEDDED SYSTEM NONFUNCTIONAL REQUIREMENTS DEFINITIONS
Debugability: failures in an embedded system must be diagnosable (and repairable) even though the human interface is minimal or non-existent.
Memory Space: embedded systems have a limited, fixed amount of memory. The usage of memory must be monitored constantly throughout development and is usually a driving force in design and implementation choices.
Reliability: the embedded system is not typically allowed to crash. The software must function (or recover) without human intervention.
Response: the embedded system must meet all required response times even if it is in the middle of doing other tasks (such as handling data efficiently). Failure to do so may have catastrophic consequences.
Testability: the embedded system must be shown to be able to respond properly to all unexpected and simultaneous events, usually without human intervention, even though not all potential failures are known.
Throughput: the embedded system should not be a bottleneck on the machine-to-machine interfaces. Data handling must be extremely efficient.
IV. RESEARCH Eleven architectural trade studies for several different embedded systems developed by three different organizations
were evaluated. Although the systems being developed were similar in nature, there was enough diversity in their functionality to assume that they were representative of embedded systems in general. The trade studies were evaluated to see what characteristics, in terms of QAs, were being looked at when the developers were making architectural design decisions. Each trade study was given a numbered identifier and entered into an Excel spreadsheet. As each trade study was reviewed electronically, every reference to an attribute of concern was highlighted and annotated. Attributes that were mentioned but were not used as part of the trade study decision criteria were not highlighted. After all attributes in a trade study had been highlighted, the data was entered into the spreadsheet. Data was entered in two tables: the first table tracked the total counts for each attribute in each trade study, while the second table tracked only whether or not a specific QA was referred to in each trade study. Both tables provided totals for the QAs referenced. Table III summarizes the research findings. For each QA it shows the total number of times the QA was referenced and the number of trade studies that referenced it at least once. The rows marked with an asterisk are attributes that were referenced by five or more trade studies.
TABLE III
EMBEDDED SYSTEM QA RESULTS (total times referenced / number of trade studies; * = referenced by five or more trade studies)
Accuracy: 4 / 2
Availability: 1 / 1
Bandwidth: 1 / 1
Commonality/Compatibility: 13 / 7 *
Complexity (CPU, system, peripherals): 25 / 5 *
Code Size: 3 / 1
Cost: 10 / 5 *
CPU Time Used: 2 / 2
Debug-ability: 3 / 2
Durability: 1 / 1
Ease of Integration: 2 / 2
Ease of Use/Usability: 8 / 4
Expansion Capability: 6 / 1
Features/Functionality: 17 / 6 *
Maintenance/Logistical Burden: 2 / 2
Maturity: 1 / 1
Memory Usage: 7 / 3
Performance: 26 / 6 *
Physical Size: 16 / 6 *
Power Consumption: 30 / 10 *
Processor Choice: 3 / 3
Reliability: 17 / 6 *
Safety: 3 / 2
Schedule: 1 / 1
Security: 2 / 2
Speed/Timing: 7 / 3
Standards Compliance/Ease of Certification: 3 / 2
System Resource Usage: 3 / 2
Versatility/Flexibility: 2 / 2
Weight: 22 / 8 *
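The tallying procedure described above lends itself to simple automation once the highlighted attribute references have been exported from the annotated trade studies. The sketch below assumes a hypothetical list of (trade_study_id, attribute) annotation records rather than the actual spreadsheet layout used in the study, and produces both the total-reference counts and the number of trade studies in which each attribute appears.

```python
from collections import Counter, defaultdict

def tally_qa_references(annotations):
    """annotations: iterable of (trade_study_id, attribute) pairs, one per
    highlighted reference. Returns two dicts keyed by attribute: total
    reference counts and the number of distinct trade studies."""
    totals = Counter()
    studies = defaultdict(set)
    for study_id, attribute in annotations:
        totals[attribute] += 1
        studies[attribute].add(study_id)
    presence = {attr: len(ids) for attr, ids in studies.items()}
    return dict(totals), presence

# Fabricated example: three annotation records from two trade studies.
records = [(1, "Power Consumption"), (1, "Weight"), (2, "Power Consumption")]
totals, presence = tally_qa_references(records)
print(totals)    # {'Power Consumption': 2, 'Weight': 1}
print(presence)  # {'Power Consumption': 2, 'Weight': 1}
```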
V. RESULTS
Thirty embedded system QAs were referenced in the trade studies. Of these, three were easily identified as software-only QAs: code size, CPU time used, and memory usage. However, even though they can be considered software-only attributes, they still apply to embedded systems, since embedded systems consist of both hardware and software. Seven of the attributes apply only to systems that include hardware. They are durability, maintenance/logistical burden, physical size (length, width, height, and footprint), power consumption, safety, and weight. These are considered new QAs from the perspective of software-only systems. The remaining attributes are often seen as software QAs; however, their usage in the trade studies was expanded to include hardware and firmware (programmable logic) stimuli and responses. The fact that so many of the software QAs correlated so well with those in the trade studies suggests that the quality focus of embedded system architecture designs might have a good deal in common with the quality focus of software architecture designs. Table IV summarizes these findings.
Two correlations were found when evaluating the results. First, as the asterisk-marked rows in Table III show, the QAs that were referenced by the most trade studies were also the ones that were referenced the highest total number of times. This was an expected correlation. The second correlation was that the QAs that were referenced the most often (the asterisk-marked rows) were either new QAs or existing software QAs that had taken on a new or expanded definition; none of them were software-only QAs. This is a reasonable correlation, but it was not expected. It is possible that the software-only portions of the embedded systems being evaluated performed their own independent architecture trade studies. A final finding of interest was that six of the QAs were referenced by only one trade study and a total of three or fewer times: availability, bandwidth, code size, durability, maturity, and schedule. A possible explanation for why these QAs were referenced so few times is that some of them are actually a subset or specific instance of another QA. For example, availability, durability, and possibly maturity are all ways of looking at reliability; bandwidth is a specific measure of performance; and code size is a direct contributor to physical size and has an indirect effect on power consumption (the more code there is, the more memory is required, which eventually can lead to more power being consumed). An explanation as to why schedule showed up so few times is that it is not really a QA but rather a project management issue. This is also true of cost, even though it was referenced a total of 10 times in 5 trade studies.
VI. SUMMARY
Based on the findings of the trade study evaluations, it is clear that there are QAs that are specific to embedded systems only and not to software systems. However, the majority of
the QAs apply equally well to both software-only and embedded systems. In many cases the definition of the QA might have been modified, but that is already being done with software-only QAs, so it does not appear to be an issue. These findings support the assumption that embedded systems and software-only systems have many characteristics in common. As a result, it is possible to use software processes and models such as the ATAM to evaluate embedded systems and their architecture design decisions in a more rigorous way, with a focus on quality.
TABLE IV. EMBEDDED SYSTEM QA RESULTS: classification of each of the thirty embedded system QAs of Table III as an existing software QA (SW QA), a software QA with a modified definition, or a new QA (the classifications are described in Section V).
REFERENCES
[1] IEEE Software Engineering Standards Committee, IEEE Std 830-1998, IEEE Recommended Practice for Software Requirements Specifications, IEEE, New York, 1998.
[2] R. Kazman, M. Klein, and P. Clements, "ATAM: Method for Architecture Evaluation", Technical Report CMU/SEI-2000-TR-004, 2000.
[3] P. Clements, R. Kazman, and M. Klein, Evaluating Software Architectures: Methods and Case Studies, Addison-Wesley Professional, Reading, Massachusetts, 2001.
[4] P. Clements, F. Bachmann, L. Bass, D. Garlan, J. Ivers, R. Little, et al., Documenting Software Architectures: Views and Beyond, Addison-Wesley Professional, Reading, Massachusetts, 2002.
[5] P. Koopman, "Embedded Systems in the Real World", course slides, Carnegie Mellon University, January 14, 1999.
[6] D. Simon, An Embedded Software Primer, Addison-Wesley, Reading, Massachusetts, 1999.
[7] D. E. Simon, An Embedded Software Primer, Addison-Wesley, Reading, Massachusetts, 1999.
A Mesoscale Simulation of the Morphology of the PEDT/PSS Complex in the Water Dispersion and Thin Film: the Use of the MesoDyn Simulation Code
T. Kaevand, A. Öpik and Ü. Lille (e-mail: [email protected])
Department of Material Sciences, Tallinn University of Technology, Ehitajate tee 5, 19086 Tallinn, Estonia
I. INTRODUCTION Poly(3,4-ethylenedioxythiophene) (PEDT) is a prominent intrinsically conducting polymer. Its complex with polystyrene sulfonic acid (PSS) attracts general interest ([1] and citations therein). The dispersion of the PEDT/PSS complex is usually prepared by an oxidative oligomerization of ethylenedioxythiophene (EDT) in the water solution in the presence of the soluble charge balancing and stabilizing polyanion of PSS. The oxidation (doping) process controls the number of the supposed ionic groups but the positions of the latter are random and not controlled (Fig. 1). The electrophoretic and viscosity characteristics of the swollen particles present in the dispersion are similar to those of nonstoichiometric polyelectrolyte complexes [2].
Fig. 1. Simplified chemical structure of the PEDT/PSS complex. Average values of x, y and z are typically 7, 3 and 48, respectively.
The thin films of the PEDT/PSS complex are prepared under various conditions of dispersion/solution casting onto solid substrates and subsequent thermal treatment in vacuum at temperatures from 50 to 150 °C. Under these conditions the assumed polyelectrolyte complex is transformed into an ionomer; this means that the protons migrate back to the non-associated sulfonic groups. The films prepared are spatially and functionally three-dimensionally inhomogeneous and consist of conducting PEDT-rich particles, ca 10–50 nm in size, embedded in a quasi-insulating PSS matrix. The morphology in the top layer is different from that in the bulk. This is due to the phase separation resulting in the predominance of PSS in the surface region of ca 0.3–0.4 nm in thickness [3,4]. The morphology-controlled charge transport takes place along the percolating network formed by the strongly coupled conductive particles. Non-associated PSS also exists in the thin film. Its removal by washing with water (20 °C, up
to 50% of PSS) does not substantially change the conductive network, because the electrostatic associations between strong polycations and polyanions in water are irreversible ([5] and citations therein). The dried PEDT/PSS film can be redispersed homogeneously in hot water at 100 °C [6]. It thus seems that the basic features of the morphology are already formed in the water dispersion.
The aim of this research is to simulate the main morphological features of the PEDT/PSS system as a polyelectrolyte/ionomer in order to gain a better understanding of the formation of the three-dimensional morphology during the whole process of thin-film preparation. For this purpose an attempt is made to use the dynamic density functional theory (DDFT) implemented in the MesoDyn module of the Materials Studio program package (Accelrys, see also [7]). MesoDyn enables the evolution of mesoscale structures to be followed directly from the observation of three-dimensional density fields, without introducing any a priori symmetry. However, unlike typical polymeric systems, which are governed by repulsive dispersive interactions, the PEDT/PSS system is dominated by attractive electrostatic interactions. It is for this reason that classical approaches to parametrization do not work. In this work an attempt is made to use an atomistic simulation of the charged systems to calculate, within the mean-field approximation, the negative association energies captured in the Flory-Huggins interaction parameter. Below, the morphology of the PEDT/PSS complex formed is described and compared to that deduced from the experiments as outlined above. In view of the polaronic nature of the charged PEDT particles this is a highly simplified approach.
II. THE PARAMETRIZATION OF THE MESOSCALE MODEL
In the calculations, the Compass force field and the Amorphous Cell, Discover, Synthia and MesoDyn modules from the Materials Studio 4.2 program package were used.
II.A. An atomistic simulation
A full atomistic simulation enables a direct calculation of the energies of mixing (ΔE_mix) for systems in which entropy effects are not important, provided that realistic models of the mixed and de-mixed systems can be generated [8]. The calculation of the designed PEDT/PSS systems was performed at 298 K as described in [9]. The systems consisted of at least 8 oligomeric PEDT chains of 8–12 monomers each, or 1–2 PSS chains comprising 40–80 units,
or their proper mixtures (the number of PSS monomers was designated as N). The initial bulk phases were constructed using the Amorphous Cell. In all experiments, the final density of 1.45 g/cm3 was used [5]. The equilibrium density used to evaluate the force field was calculated as described in [10,11]. The 50 ps production run was performed under NPT conditions at a pressure of 10^-4 GPa, using a cutoff value of 0.95 and a spline width of 0.1 nm. For a representative doped system (see below, N = 80) the equilibrium density was found to be 1.53, i.e. 5.5% higher than the experimental value for the real system. For the ionized and non-ionized sulfonic groups the S4 and generic Sgen. atom types were used, respectively. It should be noted that the latter, being the only choice in the Compass force field, led to an unreasonably low value of the solubility parameter δ in the case of neutral PSS (designated δ_PSS; 6.6 √(cal/cm3), see below). For the neutral thiophene component the δ_PEDT values obtained using the simulation and connectivity indices methods implemented in the Synthia module (Fedors) were 9.84 and 11.3, respectively; the estimate from experiment equals 9.6 [12]. The calculated cohesive energy densities (ced values, cal/cm3) were extrapolated to the real average PSS chain comprising 380 units (M 70,000). Only statistically acceptable correlations were used (R^2 > 0.49).
In the doped systems, a doping level of 0.25, i.e. one positive charge per four PEDT repeat units, was used. From the 1:2.5 mass ratio of PEDT/PSS, one negative charge per ca 8 PSS units resulted. In the water-washed film these numbers were 1:1.4 and 4, respectively (below, the data for the washed film are shown in brackets). These numbers represent the average density of the charge. The charges were located on the hydroxylic oxygen atom of evenly distributed sulfonic groups of the PSS chain and on the sulfur atom of the selected thiophene rings (see Fig. 1). To calculate the ΔE_mix of the doped PSS and PEDT components, three schemes were tested; the final choice was the scheme shown below, which yielded the lowest ΔE_mix values. The auxiliary ions needed to guarantee the neutrality of the systems were included in the calculations:
(H3O+)_n PSS^n- + (OH-)_n PEDT^n+ → PSS^n- PEDT^n+ + n OH- + n H3O+     (1)
where n designates the number of charges. The mixing energy and the values of the Flory-Huggins interaction parameter χ were calculated using the formulas [8]:
ΔE_mix = θ_1 (ced)_1 + θ_2 (ced)_2 - (ced)_12     (2)
χ = V_ref ΔE_mix / (R T θ_1 θ_2)     (3)
where θ_i denotes the volume fraction and V_ref is the molar reference volume (see this value and the mesoscale bead types below). In view of formula (2), the calculated ΔE_mix values were extremely sensitive to the ced values of the components. Nevertheless, in this way the ΔE_mix value of the charged PSS and PEDT components was estimated as -40.2 (-68.5) cal/cm3, resulting in an interaction (χRT) between the T and Sc beads (see below) of -1090 (-1570) kJ/mol in the dry film. It should be noted that in a typical ionomer the association energy of two ion pairs is about -40 kJ/mol [13]. In both cases the distance between the interacting ions was ca 0.2 nm; in the doped PEDT/PSS system this value was estimated from the pair correlation function. Based on the charge densities shown above, the high χRT values were
roughly consistent with the association energy of about 50 (80) ionic pairs in an associated PSS chain. This was quite reasonable and consistent with the input data, i.e. full charges on the interacting ions. However, these extraordinarily high values cannot be used in the MesoDyn simulation code within the framework of the mean-field approximation. Furthermore, in the real film the charges are delocalized in both partners; in particular, the positive charge is delocalized over several thiophene cycles [14,15]. In the calculations below an adjusted χRT value of -1.4 was used for all Sc-T interactions. The interaction between the Sn and T beads was roughly estimated via the δ values of the neutral PEDT and PSS components, using the solubility parameter approach [8]:
χ = V_ref (δ_PSS - δ_PEDT)^2 / (R T)     (4)
The δ_PSS value calculated using the Synthia module (Fedors) was found to equal 25.0 √(J/cm3) (12.2 √(cal/cm3); the δ_PEDT value is given above). The calculation yielded a χRT value of 4.95 kJ/mol. For neutral polymers this value is too high; for the Sn-T and Sn-Sc interactions an adjusted value of 2.6 kJ/mol was used in the calculations below. In the water dispersion, for the S1-T1, W-S1 and T1-W interactions, χRT values of -16 (an arbitrary value allowing for possible electrostatic correlations [16]), 0 (reflecting the high solubility in water) and 0.16 (a weak repulsion) were used, respectively.
II.B. A mesoscale simulation
The mesoscopic molecules, consisting of beads of equal volume, were designed on a coarse-grained level. For real polymeric PSS chains the length (L_m) and molecular volume (V_mol) of the repeat unit and the characteristic ratio calculated using Synthia were 0.2516 nm, 0.2155 nm3 and 11.79, respectively. For the oligomeric PEDT repeat unit the V_mol value was found to be similar (0.1636 nm3); further on, the arithmetic mean of 0.1896 nm3 was used. The bead length L_bead = L_m·C∞ (further termed a, the Gaussian bond length) was found to equal 2.966 nm. This resulted in 12 repeat units per bead with a volume of 12 x 0.1896 = 2.275 nm3, which is the bead reference volume (V_ref). The number of PSS monomer units was ca 380 per chain; for PEDT the respective value was 204 units, i.e. 17 oligomeric chains of 12 repeat units each. This yielded one PSS mesomolecule as a Gaussian chain consisting of 32 beads, and 17 PEDT mesomolecules of one bead each. These beads were further termed Sc (complexed) and T, respectively (in water, S1 and T1, respectively). It should be noted that in the water all the sulfonic groups are bound in water clusters or complexed with PEDT. To represent the free PSS and the water, additional bead types Sn and W were introduced; based on the bead volume calculated above and in [17], the latter consisted of 76 water molecules. In the water dispersion and the dry film the beads interact in accordance with the adjusted χRT values. The simulation box size was ca 82.3 nm in each of the three dimensions; the grid consisted of 32 x 32 x 32 cells and the grid spacing h was equal to 2.573 nm. The value of the scaling parameter d = a/h was 1.1543. In all calculations a dimensionless time step of ca 0.5 was used, and the noise factor was set equal to 75. The morphologies were equilibrated in 750–5000 time steps at 298 K (unless indicated otherwise). The phase separation of all beads was characterized by the order parameters P(i) (the deviation from the mean bead density at homogeneity).
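For reference, the interaction-parameter estimates of Eqs. (3) and (4) above can be reproduced approximately from the quantities quoted in this section. The sketch below assumes volume fractions taken directly from the PEDT:PSS mass ratios (i.e. neglecting the density difference between the components) and the Fedors-type δ values for both components (12.2 and 11.3 √(cal/cm3)); its output therefore only approximates the reported χRT values of about -1090 (-1570) kJ/mol and 4.95 kJ/mol.

```python
# Approximate reproduction of the chi*RT estimates of Eqs. (3) and (4),
# using only quantities quoted in the text. Volume fractions are taken
# directly from the PEDT:PSS mass ratios (density differences neglected),
# so the results are only approximate.

N_AVOGADRO = 6.022e23          # 1/mol
CAL_TO_J = 4.184

V_BEAD_NM3 = 2.275                                  # bead reference volume, nm^3
V_REF = V_BEAD_NM3 * 1e-21 * N_AVOGADRO             # molar reference volume, cm^3/mol (~1370)

def chi_rt_from_mixing_energy(de_mix_cal_cm3, phi1):
    """Eq. (3): chi*RT = V_ref * dE_mix / (phi1*phi2), returned in kJ/mol."""
    phi2 = 1.0 - phi1
    return V_REF * de_mix_cal_cm3 * CAL_TO_J / (phi1 * phi2) / 1000.0

def chi_rt_from_solubility(delta1, delta2):
    """Eq. (4): chi*RT = V_ref * (delta1 - delta2)^2, deltas in sqrt(cal/cm3)."""
    return V_REF * (delta1 - delta2) ** 2 * CAL_TO_J / 1000.0

# Charged Sc-T interaction in the dry film (PEDT:PSS = 1:2.5 by mass)
print(chi_rt_from_mixing_energy(-40.2, 1.0 / 3.5))   # ~ -1.1e3 kJ/mol

# The same for the water-washed film (PEDT:PSS = 1:1.4 by mass)
print(chi_rt_from_mixing_energy(-68.5, 1.0 / 2.4))   # ~ -1.6e3 kJ/mol

# Neutral Sn-T interaction from the Fedors-type solubility parameters
print(chi_rt_from_solubility(12.2, 11.3))            # ~ 5 kJ/mol
```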
Three different topologies of the associated PSS chain were tested, representing it as a homopolymer (i.e. Sc 32), a random copolymer (Sc 1 Sn 5 Sc 2 Sn 7 Sc 5 Sn 12) and an alternating copolymer [(Sc 1 Sn 3) 8].
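The coarse-graining arithmetic described in Section II.B (Gaussian bond length, bead reference volume, number of beads per chain, grid scaling parameter and box size) can be re-derived from the repeat-unit data given above. The short sketch below is not part of the MesoDyn input itself; it simply reproduces those quoted numbers from the stated repeat-unit values.

```python
# Re-derivation of the coarse-graining quantities quoted in Section II.B,
# using only values stated in the text (an illustrative sketch).

L_m   = 0.2516    # PSS repeat-unit length, nm
C_inf = 11.79     # characteristic ratio
V_PSS  = 0.2155   # repeat-unit molecular volumes, nm^3
V_PEDT = 0.1636
V_mean = (V_PSS + V_PEDT) / 2.0            # ~0.1896 nm^3

a = L_m * C_inf                            # Gaussian bond length, ~2.966 nm
units_per_bead = 12
V_ref = units_per_bead * V_mean            # bead reference volume, ~2.275 nm^3

pss_units, pedt_units = 380, 204
pss_beads = round(pss_units / units_per_bead)     # ~32 beads per PSS chain
pedt_chains = pedt_units // units_per_bead        # ~17 one-bead PEDT chains

h = 2.573                                  # grid spacing, nm
d = a / h                                  # scaling parameter, ~1.15 (the text quotes 1.1543)
box = 32 * h                               # box edge, ~82.3 nm

print(a, V_ref, pss_beads, pedt_chains, round(d, 3), box)
```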
III. RESULTS
III.A. An associated PSS chain as a homopolymer
In the dilute dispersion (65%), swollen oval and round particles of the PSS/PEDT complex of ca 10 to 60 nm in size were formed (see Fig. 2; 1, 2; here and below the very similar density profiles of the S1 (Sc) and T1 (T) beads are presented in one box). It is noteworthy that in a much more diluted dispersion an average size of 110 nm was experimentally observed for a 1:2.5 weight ratio of PSS:PEDT [18]. The density was higher in the centre of the particle. These particles were embedded in water with a high value of the density field (0.9; the distribution curve is not shown). In the 15.7% dispersion (30 H2O molecules per sulfonic group, Fig. 2; 3, 4), shapeless clouds of this complex with a much higher density were observed. The microphase separation (Fig. 2; 6) was lower than that in the more diluted dispersion (Fig. 2; 5). These dense clouds were embedded in a finer dispersion of the complex in water with lower density fields. The density distribution curves (not shown) of the T1 beads in the dilute dispersion (maximum of the distribution below a field value of 0.05) and the more concentrated dispersion (broad distribution, maximum at 0.4) agreed qualitatively with experimental data, showing the higher swelling in the dilute dispersion [2].
The morphology of the dehydrated PSS/PEDT complex in the absence of the non-associated PSS chain was characterized by a fine-grained structure, the size of the grains ranging from a few nanometers up to ca 10 nm (Fig. 3; 1). A very slight microphase separation of the Sc beads was observed, with a P(Sc) value of 0.0024; the P(T) value remained below 0.002 (the P curves are not shown). This is a logical result for a system consisting of two attracting particles. In the system which contained, besides the associated PSS chain, also a non-associated PSS chain, macrophase separation of the latter took place (Fig. 3; 2). The P(Sn) value reached 0.17, which is several times higher than that of the Sc and T beads (Fig. 3; 3). The density field of the T beads showed two peaks, at field values of 0.45 and 0.12; in contrast, the density fields of the Sn and Sc beads had their main maximum below 0.05 (Fig. 3; 4, 5 and 6, respectively). Similar density distributions of the Sc, T and Sn beads were also observed using the models discussed below. Thus a three-dimensional, relatively dense network of PEDT particles, accompanied by a much less dense layer of Sc beads, was formed; this complex was located in the non-associated PSS matrix. This is consistent with experimental data [4]. As mentioned in the Introduction, the mass fraction of free PSS in the surface layer is about twofold that in the bulk film. This seems to be qualitatively similar to the grades of PEDT/PSS with a low mass ratio of these components. The simulations showed that in such a situation the macrophase separation was increased, and P(Sn) values of up to 0.21 were observed (data not shown). This is indicative of a positive feedback, a well-known phenomenon in polymeric systems [19].
Using this simulation model we observed that the basic features of the PEDT/PSS complex under annealing at 398 K did not change significantly; thus this significant aspect is not reflected by the model. Note that we neglected the temperature dependence of the interaction parameters. Such an approximation has often been used when working in a relatively narrow temperature range [20].
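The order-parameter values quoted in this and the following subsections can be read as volume-averaged squared deviations of a bead's density field from its homogeneous (mean) value. The sketch below implements that reading for a density field stored on the 32 x 32 x 32 grid; it reflects our interpretation of the definition given in Section II.B and is not the MesoDyn implementation itself.

```python
import numpy as np

def order_parameter(density):
    """Volume-averaged squared deviation of a bead density field from its
    mean value, i.e. one reading of the P(i) used in the text."""
    theta = np.asarray(density, dtype=float)
    theta0 = theta.mean()                   # homogeneous (mean) density
    return float(np.mean((theta - theta0) ** 2))

# Fabricated 32x32x32 field: weak composition fluctuations around 0.3.
rng = np.random.default_rng(0)
field = 0.3 + 0.05 * rng.standard_normal((32, 32, 32))
print(order_parameter(field))               # ~0.0025, i.e. only weak separation
```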
III.B. An associated PSS chain as a random copolymer
The same network was observed when the associated PSS chain was simulated as a random copolymer, but the particles were more clearly seen (Fig. 4; 1). Interestingly, P(Sn) values of up to 0.035 were indicative of microphase separation. In this case annealing at 398 K caused significant changes in the morphology, i.e. some destruction of the network (Fig. 4; 2) accompanied by changes in the density distribution curves (Fig. 4; 3-5).
III.C. An associated PSS chain as an alternating copolymer
Changing the associated PSS chain topology from a random to an alternating copolymer, we observed the same microphase separation and a highly oriented directional inhomogeneity (Fig. 5; 1, 2). Based on X-ray diffraction data, such a highly anisotropic structure has been reported for various PEDT salts (see [1]). Annealing the structure obtained at 348 K decreased this clear anisotropy, but a certain lateral preference remained, and oval grains of the complex with a maximum size of up to ca 20 nm, embedded in a neutral matrix, were observed (Fig. 5; 3). A further increase of the annealing temperature resulted in a more fine-grained morphology (Fig. 5; 4), similar to that observed for the previous model (Fig. 4; 2). These results are qualitatively consistent with experimental data showing the preferred lateral conductivity of PEDT/PSS films, accompanied by order-of-magnitude variations in conductivity in all spatial directions ([3] and citations therein). So far, short PEDT chains have been used in practice. Introducing a long (34-bead) PEDT chain instead of an oligomeric chain increased the P(Sn) and P(T) values to 0.15 and 0.19, respectively, and in both phases the anisotropic structure began to form a three-dimensional network (data not shown). Thus the method used has a certain predictive power as well.
Finally, it was quite logical that the treatment of the PEDT/PSS complex as a polyelectrolyte/ionomer carrying full charges on the interacting oxygen and sulfur atoms resulted in very high negative Flory-Huggins interaction parameter values; therefore we were forced to use adjusted parameters. In this way the DDFT was used, and the main features of the generated morphology were consistent with those deduced from the experiment.
CONCLUSION The mesoscale simulation of the PEDT/PSS associate as a polyelectrolyte/ionomeric complex in the water dispersion/dry thin film using the MesoDyn simulation code and adjusted interaction parameter values provided a useful insight into the three-dimensional morphology formation during the whole cycle of thin film preparation.
Fig. 2. Density profiles on three sides of periodic box of S1+T1 (1) and water beads (2) in 65 % and 15.7% water dispersion (3),(4), both 750 time steps; (5) and (6) show the evolution of the bead order parameters respectively. The PSS in the complex is presented as a homopolymer.
Fig. 3. Density profiles of Sc+T beads in the absence (1, 750 time steps) and presence of Sn beads (2, 5000 time steps); the evolution of the bead order parameters (3) and respective density distribution curves (4,5,6). The PSS chain in the complex is presented as a homopolymer.
Fig. 4. Density profiles of Sc+T beads (1, 5000 time steps), the same annealed at 398 K (2, 3000 time steps) and the respective density distribution curves (3, 4: 298 K; 5, 6: 398 K). The PSS chain in the complex is presented as a random copolymer.
Fig. 5. Density profiles of Sc+T (1) and Sn beads (2) at 298 K, and of Sc+T beads at 348 K (3) and 398 K (4); (1, 2, 3) 5000 time steps, (4) 3000 time steps. The PSS chain in the complex is presented as an alternating copolymer.
ACKNOWLEDGMENT The authors are grateful to the Estonian Science Foundation for financial support (Grant No 6633) and to the Accelrys Support Group for customizing computational programs.
REFERENCES
1. S. Kirchmeyer and K. Reuter, Scientific importance, properties and growing applications of poly(3,4-ethylenedioxythiophene), Journal of Materials Chemistry 15, pp. 2077-2088 (2005).
2. S. Ghosh and O. Inganas, Self-assembly of a conducting polymer nanostructure by physical crosslinking: applications to conducting blends and modified electrodes, Synthetic Metals 101, pp. 413-416 (1999).
3. M. Kemerink, S. Timpanaro, M. M. de Kok, E. A. Meulenkamp and M. J. Touwslager, Three-dimensional inhomogeneities in PEDOT:PSS films, J. Phys. Chem. B 108, pp. 18820-18825 (2004).
4. S. Timpanaro, M. Kemerink, F. J. Touwslager, M. M. De Kok and S. Schrader, Morphology and conductivity of PEDOT/PSS films studied by scanning-tunneling microscopy, Chemical Physics Letters 394, pp. 339-343 (2004).
5. D. M. DeLongchamp, B. D. Vogt, C. M. Brooks, K. Kano, J. Obrzut, C. A. Richter, O. A. Kirillov and E. K. Lin, Influence of a water rinse on the structure and properties of poly(3,4-ethylenedioxythiophene):poly(styrene sulfonate) films, Langmuir 21, pp. 11480-11483 (2005).
6. J. Ouyang, Q. Xu, C. Chu, Y. Yang, G. Li and J. Shinar, On the mechanism of conductivity enhancement in poly(3,4-ethylenedioxythiophene):poly(styrene sulfonate) film through solvent treatment, Polymer 45, pp. 8443-8450 (2004).
7. O. Evers, Mesoscopic simulation of polymer mixtures. In Molecular Simulation Methods for Predicting Polymer Properties (edited by V. Galiatsatos), Chap. 4. Wiley Interscience (2005).
8. F. H. Case and J. D. Honeycutt, Will my polymers mix? Methods for studying polymer miscibility, Trends in Polymer Science 2, pp. 259-266 (1994).
9. Th. Spyriouni and C. Vergelati, A molecular modeling study of binary blend compatibility of polyamide 6 and poly(vinyl acetate) with different degrees of hydrolysis: an atomistic and mesoscopic approach, Macromolecules 34, pp. 5306-5316 (2001).
10. A. C. Genix, A. Arbe, F. Alvarez, J. Colmenero, W. Schweika and D. Richter, Local structure of syndiotactic poly(methyl methacrylate). A combined study by neutron diffraction with polarization analysis and atomistic molecular dynamics simulation, Macromolecules 39, pp. 3947-3958 (2006).
11. M. Meunier, Diffusion coefficients of small gas molecules in amorphous cis-1,4-polybutadiene estimated by molecular dynamics simulations, Accelrys.com (2006).
12. S. Reuter and S. Kirchmeyer, EP 1 327 645 (2001).
13. A. Eisenberg and J.-S. Kim, Introduction to Ionomers, pp. 267-306. John Wiley & Sons Inc., New York/Chichester/Weinheim/Brisbane/Singapore/Toronto (1998).
14. G. Greczynski, Th. Kugler, M. Keil, W. Osikowicz, M. Fahlman and W. R. Salaneck, Photoelectron spectroscopy of thin films of PEDOT-PSS conjugated polymer blend: a mini-review and some new results, Journal of Electron Spectroscopy and Related Phenomena 121, pp. 1-17 (2001).
15. A. Zykwinska, W. Domagala, A. Czardybon, B. Pilawa and M. Lapkowski, In situ EPR spectroelectrochemical studies of paramagnetic centres in poly(3,4-ethylenedioxythiophene) (PEDOT) and poly(3,4-butylenedioxythiophene) (PBuDOT) films, Chemical Physics 292, pp. 31-45 (2003).
16. A. Naji, S. Jungblut, G. Moreira and R. Netz, Electrostatic interactions in strongly-coupled soft matter, Physica A 352, pp. 1-26 (2005).
17. J. T. Wescott, Y. Qi, L. Subramanian and T. W. Capehart, Mesoscale simulation of morphology in hydrated perfluorosulfonic acid membranes, The Journal of Chemical Physics 124, pp. 1-14 (2006).
18. R. R. Smith, A. P. Smith, J. T. Stricker, B. E. Taylor and M. F. Durstock, Layer-by-layer assembly of poly(ethylenedioxythiophene):poly(styrenesulfonate), Macromolecules 39, pp. 6071-6074 (2006).
19. I. R. Epstein, J. A. Pojman and Q. Tran-Cong-Miyata, Nonlinear dynamics and polymeric systems: an overview. In Nonlinear Dynamics in Polymeric Systems (edited by J. A. Pojman and Q. Tran-Cong-Miyata), Chap. 1. ACS, Washington D.C. (2004).
20. Y. Li, T. Hou, S. Guo, K. Wang and X. Xu, The MesoDyn simulation of pluronic water mixtures using the "equivalent" chain method, Phys. Chem. Chem. Phys. 2, pp. 2749-2753 (2000).
Developing Ontology-Based Framework Using Semantic Grid Venkata Krishna. P and Ratika Khetrapal School of Computing Sciences Vellore Institute of Technology, Vellore, India E-mail: [email protected]
Abstract A Semantic Grid is a grid in which resources, services, etc. can be searched easily and efficiently, helping to find the best possible match for a desired service or resource. In a Semantic Grid it is necessary to have an ontology-based framework that can be used to discover the most appropriate resource or service required. In this paper, an ontology-based framework is defined which can be used to discover resources and services depending on the match. The framework defines the properties, the major concepts, their descriptions and the relationships between them in order to find the best possible match for the required resource or service. A B2B (Business to Business) marketplace is also used to explain its working more clearly. Keywords – Semantic Grid, Ontology, Business to Business (B2B) Strategy.
1 Introduction Nowadays the Internet is used on a very large scale: almost everyone uses it for communication or for finding information about a person or thing, and it can be used for nearly any purpose. Prompted by this fast growth, an extension of the Internet was developed, namely Grid Computing. In Grid Computing, systems are connected to each other in order to share knowledge or information about a particular domain; a grid is always specified for a particular task and for sharing information about a particular domain [1]. Grid Computing is a way of sharing heterogeneous resources, storage space, computing power, etc. in order to solve a domain-specific problem. It is a combination of both parallel and distributed computing. Here a node does not send a task to one particular node; instead the task is spread across the whole grid and is solved by all the nodes that are interested and free to do it. Grid Computing efficiently utilizes idle CPU time for performing computational tasks [4]. It also utilizes the storage space of the systems
when needed. Thus large amounts of data can be stored very easily in a grid by distributing them across the different nodes present in the grid. Many grids already exist that share information specific to a particular domain, such as BioGrid, NeuroGrid, myGrid and DataGrid; all these grids are used for different purposes. This paper first describes the basic concepts of the Semantic Grid and Ontology and how they are related to each other, and then the proposed framework and its working in a B2B environment. Section 2 describes the basic concepts of Semantic Grids and Ontology. Section 3 gives a brief description of the languages used for building an ontology for a specific domain, such as RDF and OWL. Section 4 describes the proposed framework, its actions, the layers present in it and how it works. Finally, Section 5 concludes the paper.
2 Semantic Grids and Ontology 2.1 Semantic Grid In the Semantic Grid, Grid-related information and services are given a well-defined meaning, better enabling computers and people to work in cooperation [6]. It is an extension of the grid in which resources can be searched very efficiently depending on the needs of the task. The services, information, etc. are all defined in a proper manner so that they can be discovered easily and efficiently when needed. This type of grid contains meta-data, and these details are used to search for the best match for the service or resource desired for a task to be performed. The information present in a Semantic Grid can be updated depending on the needs: details about a service or resource can be added or deleted as required. Also, software, services and information can be reused more efficiently.
The Semantic Grid has higher interoperability compared to a normal grid. The standards adhered to in the Semantic Grid allow anyone around the world to develop their own tools to interact with Semantic Grid resources easily [8]. In the Semantic Grid the details about the resources or services are given, and the capabilities of those resources and services are also described. Content-aware decision making is supported as well. The Semantic Grid is basically concerned with the data, computation, knowledge and information related to the services and the resources. The most important concept in the Semantic Grid is Ontology.
2.2 Ontology Ontologies play a pivotal role by providing a source of shared and precisely defined terms that can be used as (meta-data) resource annotations in order to make those resources more accessible to automated agents [3]. Ontologies define concepts and also describe them; an ontology provides a generalization about a domain and is mainly a conceptualization. Ontologies usually consist of a set of concept definitions, property definitions and axioms about them [1]. The terms related to a domain are defined and described in order to make it easier to discover a particular service. Many ontologies have already been published, among them Cyc, WordNet, Gene Ontology, Protein Ontology, SBO, GOLD and CIDOC CRM [7]. The main aim of developing an ontology is to define a generalized structure for a domain which can be shared by all the users in the grid. If such a common structure is used, the chance of misinterpretation is reduced. For example, the word 'product' can refer to a manufactured substance, or to the quantity obtained by multiplying two numbers; these two meanings are totally different, and if two nodes in a grid refer to two different concepts the whole meaning changes and misunderstanding takes place. Thus, an ontology can be developed and descriptions of the concepts given in order to reduce the time needed and to increase accuracy. A further advantage of defining an ontology is that if the meaning of a concept changes in a domain, this meaning can be easily and efficiently updated in the defined ontology without completely changing all the concepts and their relations. Thus an ontology is also flexible in nature and can be modified depending upon
the needs of the domain; concepts can be added, deleted or modified if required. Ontologies are often seen as basic building blocks for Semantic Grid services, as they provide a reusable piece of knowledge about a specific domain [1]. They use the existing information about a service or resource and utilize it for upgrading the database and for adding new services if required. A B2B marketplace consists of suppliers and buyers uploading and gathering product information, exploring potential market partners and exchanging bids and offers, together with the movement of products, payments and post-transaction activities [2]. In a B2B marketplace the companies that sell products also purchase the materials they use to make their products from other companies [10]. Here the major classes can be the product, the suppliers and the buyers. The details about the products provided by a supplier are maintained, with all the properties of the products such as type, cost, manufacturer and materials used; these attributes help the consumer find the best product for its needs.
3 Languages for defining Ontology An ontology language is a formal language used to encode an ontology [7]. The most important languages used for developing ontologies are RDF (Resource Description Framework) and OWL (Web Ontology Language). In order to work with ontology languages there are several useful technologies, such as an ontology editor (to create ontologies using one of these languages), an ontology DBMS (to store and query an ontology) and an ontology warehouse (to integrate and explore a set of related ontologies) [7].
3.1 Resource Description Framework (RDF)
The Resource Description Framework defines the classes and builds a hierarchical view of them. RDF is used for expressing the meanings of concepts, encoded as sets of triples, wherein each triple represents the subject, the object and the predicate of an elementary sentence [4]. Here the subject denotes the resource whose description is given, the object gives the specific value of the property, and the predicate defines the generalized property and the relation between the subject and the object. For example, consider the sentence "The table is made of wood":
Here the subject is "the table", the predicate is "is made of" and the object is "wood". Thus, concepts can be defined along with their relations to other objects. The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata model, which has come to be used as a general method of modeling information through a variety of syntax formats [7]. The characteristics of the RDF data model help in adding a new class and its properties when needed without affecting the existing concepts [6].
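For illustration, such triples can also be written down programmatically. The sketch below uses the rdflib Python library and a hypothetical example.org namespace; neither is mentioned in the paper, they are assumptions made only to show the subject-predicate-object structure.

```python
from rdflib import Graph, Literal, Namespace

EX = Namespace("http://example.org/b2b#")   # hypothetical namespace

g = Graph()
#        subject      predicate       object
g.add((EX.Table,     EX.isMadeOf,    Literal("wood")))
g.add((EX.Table,     EX.suppliedBy,  EX.AcmeFurniture))

# every statement in the graph is one (subject, predicate, object) triple
for s, p, o in g:
    print(s, p, o)
```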
3.2 Web Ontology Language (OWL)
OWL is an XML-based markup language. The Web Ontology Language (OWL) is a knowledge representation scheme designed specifically for use on the Semantic Web; it exploits existing web standards (XML and RDF), adding the familiar ontological primitives of object- and frame-based systems and the formal rigor of a very expressive Description Logic (DL) [3]. Thus it is also used in Semantic Grids in order to represent domain knowledge. OWL is the emerging industry standard and is recommended by the W3C for the representation of ontologies [3]. OWL is a semantic markup language for publishing and sharing ontologies on the World Wide Web [6]. It can be seen as a combination of DAML+OIL and as an extension of RDF; it uses RDF in its syntax, in its instances, etc. OWL uses both URIs for naming and the description framework for the Web provided by RDF to add the following capabilities to ontologies [11]:
• Ability to be distributed across many systems
• Scalability to Web needs
• Compatibility with Web standards for accessibility and internationalization
• Openness and extensibility
There are three species of OWL [7] [11]:
• OWL Lite supports those users primarily needing a classification hierarchy and simple constraint features.
• OWL DL supports those users who want the maximum expressiveness without losing computational completeness and decidability of reasoning systems.
• OWL Full is meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees.
A small machine-readable example of such an ontology is sketched below.
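As a toy illustration for the B2B setting of Section 2.2, the snippet below declares two OWL classes and one property in Turtle and loads them with rdflib. The vocabulary (Product, Supplier, supplies) and the use of rdflib are assumptions made for illustration only; they are not taken from the paper.

```python
from rdflib import Graph, RDF
from rdflib.namespace import OWL

ttl = """
@prefix :     <http://example.org/b2b#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Product  a owl:Class .
:Supplier a owl:Class .
:supplies a owl:ObjectProperty ;
          rdfs:domain :Supplier ;
          rdfs:range  :Product .
"""

g = Graph()
g.parse(data=ttl, format="turtle")

# list every declared OWL class
for cls in g.subjects(RDF.type, OWL.Class):
    print(cls)
```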
4 System Model This framework is proposed to make the discovery of services and resources in a Semantic Grid efficient and accurate. It is based on a bottom-up approach, meaning an environment that allows ontology users to evaluate the impact factors of concepts in the ontology, with the results of the evaluation reflected in modifications of the ontology [1]. Here the services or concepts are added and evaluated by the user. Any user having the authority can add a service, but the service is checked in order to reduce redundancy; if it is not already present, it is inserted into the ontology as a class.
4.1 Major Actions in the Ontology-based Framework
Three actions can take place in this environment: registering a service, deleting a service and searching for a service. When a user enters a new entry to be added to the existing ontology, the databases present on the different nodes of the grid are evaluated to check whether the service already exists. If the service already exists, it is checked whether both services have the same description; if they do, the new description is not added, but if the descriptions differ, the new description is added as one of the descriptions of that service. If the service does not exist in the ontology, the new service is added to the hierarchy along with its description at the proper level of the ontology hierarchy. When the services in a domain change and some concepts have to be deleted, then before deleting a concept it is checked whether any other concept is related to it or whether this particular concept is used frequently by any other user; if so, the service is not deleted, otherwise it is deleted. For searching a service in a Semantic Grid, the databases present on the different nodes or systems of the grid are checked for that particular concept. If it is not present, the descriptions of the concepts are searched; if the service is not present in the descriptions either, then concepts with a similar meaning are searched and the result is sent back to the user. A simplified sketch of the register and search actions is given below.
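The sketch below is a simplified, single-process illustration of the register and search logic described above; the class and function names are hypothetical, and the synonym-based fallback is only indicated by a comment rather than implemented.

```python
from dataclasses import dataclass, field

@dataclass
class OntologyNode:
    """One grid node's local view: concept names mapped to their descriptions."""
    concepts: dict = field(default_factory=dict)   # name -> set of descriptions

def register_service(nodes, name, description):
    """Add a service unless the same name and description already exist on a node."""
    for node in nodes:
        if name in node.concepts:
            node.concepts[name].add(description)    # same concept, possibly a new description
            return
    nodes[0].concepts[name] = {description}         # brand-new concept added to the hierarchy

def search_service(nodes, query):
    """Look for the query as a concept name first, then inside the descriptions."""
    for node in nodes:
        if query in node.concepts:
            return node.concepts[query]
    for node in nodes:
        for descs in node.concepts.values():
            if any(query.lower() in d.lower() for d in descs):
                return descs
    return None   # a real system would fall back to a synonym / similarity search here
```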
4.2 Layers in the Framework
The description and working of the proposed framework are explained through the search mechanism in a Semantic Grid. The proposed framework consists of three basic layers, as shown in Fig. 1:
• User Interface Layer, which interacts with the user. A web page or an application interface can be provided; using either of these, the user enters the required service or resource, and the request is accepted through this interface.
• Semantic Layer, which takes the request accepted by the user interface layer, converts it into the form of the ontology specified for that particular domain, and sends the query to the Repository Layer.
• Repository Layer, which maintains the details, information and databases present on the different nodes of the grid concerning the concepts, their descriptions (properties) and the relations between them.
Fig. 1 Layers in the Ontology-based Framework (User Interface – Semantic Layer – Repository Layer)
4.3 Details of the Framework
The detailed diagram of the Ontology-based Framework is shown in Fig. 2.
Fig. 2 Proposed Framework (Web Page / Application Interface → Convert the Request based on the Ontology → Process Request ↔ Manage Information and Filter → Combine Results, backed by the Repository Layer databases for Concepts C1…Cℓ, Descriptions D1…Dm and Relations Rd1…Rdn)
Let there be ℓ records in the Concepts Database (C), m records in the Description Database (D) and n records in the Relations Database (Rd):
(C1, C2, …, Cℓ) ∈ C;  (D1, D2, …, Dm) ∈ D;  (Rd1, Rd2, …, Rdn) ∈ Rd
The Relations Database contains details about the relations between a Concept and a Description and also between two Concepts. Let
γ = {(Cx, Dy) | Cx has a definition or property Dy};
α = {(Cx, Cy) | (Cx ≡ Cy) ∧ (Cx ≡ ¬Cy)}
γ ∪ α ≡ (∀x)(Rd(x) ∧ E(x))
where Rd(x) means that x is an element of Rd and E(x) means that x is the element to be searched. All the above notations describe the elements in the Repository Layer. Now consider Fig. 2 and let the element entered by the user for searching be X. Then
Q = f(X)
P = Query(Q) = Query(f(X))
I = I1 ∧ I2 ∧ … = γ ∪ α
I1 = Ca ∧ … ∧ Cx ∧ Db ∧ … ∧ Dy, where Ca ≡ Cx and Db ≡ Dy
I′ = I − Ix, where Ix is some useless information
R1 = h(I1) = h(Ca ∧ … ∧ Cx ∧ Db ∧ … ∧ Dy); similarly R2, R3, etc.
R = R1 ∧ R2 ∧ … = h(I1) ∧ h(I2) ∧ …
Here Q is obtained by applying a function that converts the user input according to the ontology. Q is then sent to the Process Request phase, where a query is written according to the requirement and P is obtained. The information I is taken from the Manage Information phase and the result is searched within it. I is updated by the Filter, which removes the useless information. A hash function is then applied to the information to obtain the results R1, R2, …; these results are combined and sent back to the user in the form of a web page or an application. The working of the framework for searching a service in a Semantic Grid is as follows:
• The user sends the request through the User Interface Layer in the form of a web page or an application, and the request is accepted by this layer.
• The request is then sent to the Semantic Layer, where it is converted into the form of the described ontology.
• The converted request is processed and the query is sent to all the databases present in the Repository Layer.
• The Repository Layer checks for the service first in the Concepts database, then in the Description database and finally in the Relations database.
• The results from all the databases are sent back to the Semantic Layer, which manages these details and information about the services.
• These details are also filtered to remove any replication and to respect the specifications given by the user.
• All the results are then combined, converted into the web or application format and sent to the User Interface.
• Finally, the results are displayed to the user.
This is how the framework works; the classes and their details are stored in and maintained by the Repository Layer.
Theorem 1: The result of a search is always successful, provided the element to be searched is present in the database, and the result is semantically correct.
Proof: As already described, whenever a search is performed the data is searched in the databases present in the Repository Layer. If the element to be searched is present in any of the databases on the different nodes of the grid, two types of output are obtained: the matches belong either to γ or to α. If some relevant results are obtained, the final result R is computed and displayed to the user.
The time required can be estimated as follows:
T = Xt + Qt + Pt + It + (I′t × p) + (ht × q) + Rt + (ℓ × t1) + (m × t2) + (n × t3 × ℓ × m)
Xt is the time taken to send the request to the Semantic Layer; Qt is the time taken to convert the request according to the ontology; Pt is the time taken to obtain P; It and I′t are the times taken to evaluate I and I′ (I′ may be evaluated several times, hence the factor p); ht is the time taken to compute the hash values of the q pieces of information obtained from the Manage Information phase; and Rt is the time taken to obtain the final result by combining all the partial results. While searching for the element in the databases, each record requires some time: let this be t1 per record in the Concepts database, t2 in the Description database and t3 in the Relations database. When records are matched in the Relations database, both of the other databases are also searched in order to determine γ and α, which accounts for the ℓ × m factor. Thus a total time T is required to search for an element and obtain the final output. A numeric illustration of this estimate is given below.
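As a purely numeric illustration of the estimate for T, the helper below plugs in hypothetical timings; they are not measurements from the paper. It mainly shows that the n·t3·ℓ·m cross-term dominates for realistically sized databases.

```python
def search_time(Xt, Qt, Pt, It, Ipt, p, ht, q, Rt, l, m, n, t1, t2, t3):
    """Total search time T as given above; all arguments are hypothetical timings."""
    return (Xt + Qt + Pt + It + Ipt * p + ht * q + Rt
            + l * t1 + m * t2 + n * t3 * l * m)

# hypothetical example: 1000 concepts, 5000 descriptions, 2000 relations,
# 1 microsecond per record; the n*t3*l*m term dominates the total.
print(search_time(1e-3, 1e-3, 1e-3, 1e-3, 1e-3, 3, 1e-4, 10, 1e-3,
                  1000, 5000, 2000, 1e-6, 1e-6, 1e-6))
```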
5 Conclusion In this paper the basic concepts of the Grid, Semantic Grids, ontologies and the languages used for describing ontologies have been presented. A framework was defined which can be used to discover a service desired by the user in the Semantic Grid, its working was described, and the notations used to evaluate the output in each phase of the framework were given. It is concluded that this framework can be used as a base for
any domain-specific ontology development, and that it can maintain the ontology and find the best possible match.
References [1] Sunjae Lee, Wonchul Seo, Dongwoo Kang, Kwangsoo Kim and Jae Yeol Lee, “A framework for supporting bottom-up ontology evolution for discovery and description of Grid services”, Expert Systems with Applications, Volume 32, Issue 2, February 2007. [2] Taehee Lee, Jonghoon Chun, Junho Shim and Sanggoo Lee, “ An Ontology-Based Product Recommender System for B2B marketplace”, International Journal of Electronic Commerce, Volume 11 Issue 2, December 2006. [3] Mehul Bhatt, Wenny Rahayu, Sury Prakash Soni and Carlo Wouters, “OntoMove: A Knowledge Based Framework for Semantic Requirement Profiling and Resource Acquisition”, 18th Australian Software Engineering Conference, 2007 (ASWEC 2007), April 2007. [4] Joshy Joseph and Craig Fellenstein, “Grid Computing”, Pearson Education, 2007. [5] Houda Lamehamedi and Boleslaw K. Szymanski, “Decentralized data management framework for Data Grids”, Future Generation Computer Systems, August 2006. [6] Wei Xing, Marios D. Dikaiakos and Rizos Sakellariou, “A Core Grid Ontology for the Semantic Grid”, Sixth IEEE International Symposium on Cluster Computing and the Grid, 2006 (CCGRID 06), Volume 1, May 2006. [7] Ontology(Computer Science), www.wikipedia.com. [8] Andrew Flahive, Wenny Rahayu, David Taniar and Bernady Apduhan, “A distributed ontology framework in the semantic grid environment”, 19th International Conference on Advanced Information Networking and Applications, 2005 (AINA 2005), Volume 2, March 2005. [9] Jinguang Gu, Heping Chen, Lingxian Yang and Lin Zhang, “OBSA: Ontology-based Semantic Information Processing Architecture”, Proceedings of the 2004 IEEE/WIC/ACM International Conference on Web Intelligence WI ‘04, September 2004. [10] Gary P. Schneider, “Electronic Commerce”, 4th Annual Edition.
[11] Web Ontology Language (OWL), www.w3.org .
A Tree Based Buyer-Seller Watermarking Protocol Vinu V Das, Member, IEEE Assistant Professor, Department of Computer Science, Saintgits College of Engineering, Kottayam-686532, Kerala, India. Email: [email protected]
Abstract — To restrict the unauthorized duplication and distribution of multimedia contents, a content owner (seller) may insert a unique digital watermark into each copy of the multimedia contents before it is sold to a buyer. Some time later, the seller can trace the unlawful reseller (original buyer) by extracting and checking the embedded watermark, if an illegal replica is found in the market. However, an accusation against the seller cannot be overruled either, as he also has access to the watermarked copies. This paper proposes a new efficient watermarking protocol which is based on a tree model and a public key encryption standard. The buyer places the purchase order with a sales point, which forwards it to the certification authority via the dealer. The trusted certification authority issues the watermarked image in such a way that the buyer can check its originality. This prevents the buyer from claiming that an unauthorized copy may have originated from the seller, and it gives the user the highest privilege to ensure the quality of the purchased digital content.
1. INTRODUCTION
This decade has seen a rapid growth in the availability of numerous multimedia contents in digitized form. The rapid growth of the internet and of various open networks has led to a great need to preserve the digital rights of both the buyer and the seller, and to avoid any unlawful manipulation of digital data and unauthorized duplication and distribution of multimedia contents. Such threats to intellectual property and copyright can be effectively countered using the digital watermarking technique, which is considered to be one of the most promising solutions [1], [2].
A digital watermark is an additional signal added to digital data (normally an image, audio or video) which can be extracted or detected some time later to judge the ownership of the digital content. In general, the watermark can be visible or invisible. A visible watermark typically consists of a visible text message or a company logo indicating the ownership of the digital content. In contrast, invisibly watermarked content appears identical to the original; the only way to find out the existence of the invisible watermark is to examine the digital content with appropriate watermark detection and extraction algorithms.
Invisible watermarking schemes can be further divided into blind, semi-blind and private watermarking schemes [3], [11]. In blind watermarking, the original digital content is not required by the watermark detection and extraction algorithm. Non-blind techniques, on the other hand, require the original digital content as input for watermark detection and extraction, so only legitimate owners of the digital content can perform the detection and extraction. Non-blind watermark schemes are typically considered more robust than blind ones, because they exhibit better tolerance against various image processing techniques. Due to this inherent robustness, this paper restricts itself to invisible watermarks.
A number of watermarking protocols have been proposed to crack down on the distribution of illegal replicas [4], [8] and [10]. However, most of them ignore the customer's rights, and the others address the issue inefficiently. The lack of an appropriate mechanism to protect customer privacy during the transaction is another shortcoming of these protocols. The buyer-seller watermarking protocol in this paper is an alternative method which is more secure, flexible and convenient, and provides better privacy for the general public than previous solutions. The proposed protocol fixes the problems in Qiao and Nahrstedt's [10] owner-customer watermarking protocol and in Memon and Wong's [4] buyer-seller watermarking protocol. Moreover, it enables buyers to check the quality of the product purchased from the seller.
2. RELATED WORKS
A number of watermarking protocols have been proposed using cryptographic techniques to tackle the distribution of illegal replicas. All the traditional watermark-based fingerprinting protocols were designed to protect the seller's ownership of the digital content rather than the buyer's rights. In the owner-customer watermarking protocol [10], a customer (or buyer) forwards an encrypted version of a predefined bit-sequence to the owner (or seller). The owner then generates a unique watermark based on the encrypted data received, inserts it into a copy of the digital content, and sends the watermarked copy back to the buyer. Since only the buyer knows the secret key, he can prove the legitimate ownership of the digital content to anyone. However, when a pirated copy is found, the charge against the customer pointed to by the embedded watermark is objectionable, because the owner in Qiao and Nahrstedt's protocol still has access to the watermarked copy in its final form; this is called the customer's right problem. It is equivalent to assuming that the seller is always trustworthy, which is not true. Thus a malicious seller can easily frame the buyer involved in a particular transaction
by releasing the corresponding watermarked copy afterwards. Thus it is a breach of security in the selling system.
Memon and Wong's buyer-seller watermarking protocol [4] solved the customer's right problem. In this protocol the existence of a watermark can only be determined by a detection algorithm. One transaction in this protocol involves the seller, the buyer and a Watermark Certification Authority (WCA). In this scheme, watermark insertion is performed in the encrypted domain. Since the seller cannot access the watermarked digital content in its final form, he cannot fabricate piracy to frame a buyer. If the seller wants to verify the buyer's legitimate possession of the watermarked copy, the arbiter can ask the buyer to perform the verification via a zero-knowledge watermarking scheme [5], [12].
Even though the customer's right problem is solved by Memon and Wong's protocol, a few issues are left undiscussed. The seller can intentionally transplant a watermark initially embedded in a copy of one digital content into a copy of a completely different digital content, provided both copies are sold to the same buyer. This problem is referred to as the unbinding problem by Lei et al. [13]. The unbinding problem arises since there is no proper mechanism binding a specific watermark to a digital content. Therefore, once the seller extracts the pirated watermark, it is possible to transplant it into a higher priced digital content to produce made-up piracy. The Lei et al. [13] watermarking protocol binds a watermark to a common agreement (ARG) through the trusted third party's (TTP) signature, and the ARG uniquely binds a particular transaction to a piece of digital content. Under this scheme, the seller cannot transplant the watermark embedded in a pirated copy into a copy of higher priced digital content. In addition, the buyer can remain anonymous during the transaction by applying in advance to a Watermark Certification Authority (WCA). Even though the customer's right problem and the unbinding problem are solved, the customer carries an extra burden when purchasing any original digital product. The interaction of the customer with the WCA is not part of a conventional commercial transaction, so it is laborious for a buyer to get in touch with the WCA in order to purchase a digital content, especially in the contemporary fast world. This paper refers to this problem as the customer unbinding relation problem: the customer has an unbinding relation with the seller and an unnecessary relation with the WCA, which should be removed. It would be ideal if the buyer simply placed the order with the seller and the rest of the transaction were carried out by the system itself, so that the buyer receives an original copy of the digital content and, at the end of the transaction, is able to ensure that what he purchased is of good quality. This paper proposes a secure and flexible tree based buyer-seller watermarking protocol that addresses all these issues.
3. PROPOSED PROTOCOL
The proposed protocol addresses many issues found in its predecessors, such as the customer's right problem and the unbinding problem, and it also fixes the customer unbinding relation problem. The proposed model, based on any public key cryptosystem, has five different roles as follows.
1. D: The Dealer, who always wants to make a profit on the bulk sales of certain digital content. The dealer may be the original owner of the digital content, or an authorized reselling agent.
2. S: The seller or the sales point, from where the buyer purchases the digital content. The seller may be the original owner of the digital content, or an authorized reselling agent or dealer.
3. B: The buyer, who wants to purchase a digital content from the seller S.
4. CA: The Certification Authority, responsible for issuing public and secret keys for anyone who would like to take part in commerce via the internet. The Certification Authority also issues a digital signature certificate based on the private key, in such a way that anyone in the electronic commerce transaction can check its authenticity to avoid any future repudiation. The CA can also be viewed as a trusted watermark certification authority, responsible for generating valid watermark-embedded digital contents. The CA may appoint trusted Regional Certification Authorities (RCA) to share and reduce its workload.
5. ARB: An arbiter, who adjudicates lawsuits against the infringement of copyright and intellectual property.
This paper refrains from further discussion of security or implementation issues related to the public key cryptosystem, by assuming that a public key infrastructure has been well deployed so that each party has its own public-private key pair as well as a digital signature certificate issued by the CA. It is also assumed that every member in the transaction has watermark extraction and detection software. Before elaborating further on the proposed protocol, the notations used in this model are defined below.
DC — the original copy of the Digital Content without any watermark
WI — the generated Watermark Index information, which is to be embedded into the digital content
K — a random 24-digit number generated by B as the product serial number
P — a unique number given by the CA to identify a digital content, which is published for anyone likely to take part in the transaction
(PB, SB) — a public-secret key pair, where PB is the buyer's (B's) public key and SB is B's secret key
DSB(M) — the message M is digitally signed with B's private key
EPB(M) — the message M is encrypted using B's public key
ESB(M) — the message M is encrypted using B's private key
DPB(C) — the cipher text C is decrypted using B's public key
DSB(C) — the cipher text C is decrypted using B's private key
Figure 1 shows the proposed model of the tree based buyer-seller watermarking protocol. The first level contains the trusted certification authority (CA), its database (DB), in which the unique product serial number and its watermark information are stored for any future purpose, and the arbiter, who takes the necessary details from the DB of the CA. The second level of the tree has n dealers, the sales points are at the third level, and the buyers are at the fourth level.
A. Watermarking Protocol
Figure 2 visualizes the details of the possible transactions in the proposed watermarking protocol. The step-by-step procedure of a transaction is given below:
1. When B wants to purchase a digital content from S, B generates a large random key K and encrypts it using the public key PB (i.e. EPB(K)), which can be decrypted only with B's private key SB. B then places the purchase order message M by digitally signing it with the private key SB (i.e. DSB(M)) and sending it along with EPB(K).
2. Upon receiving the purchase order from B (i.e. EPB(K), DSB(M)), S analyses the authenticity of the order by checking the digital signature, decrypting the message M using B's public key PB. After digitally signing the message M with its private key SS (i.e. DSS(M)) and adding the unique product ID P to EPB(K) (i.e. P+EPB(K)) before encrypting it with S's public key PS (i.e. EPS(P+EPB(K))), S forwards the purchase order to D.
3. D checks the authenticity of the purchase order from S by decrypting the digital signature, DPS(DSS(M)). The purchase order is then forwarded to the CA after digitally signing the message M (i.e. DSD(M)) and encrypting it as EPD(EPS(P+EPB(K))).
4. Upon receiving the purchase order originating from B (i.e. EPD(EPS(P+EPB(K))), DSD(M)) via S and D, the CA checks the digital signature by decrypting M, DPD(DSD(M)). From the details in the message M of B, S and D, the CA can obtain the keys needed to decrypt the purchase order and recover P and K:
DSD(EPD(EPS(P+EPB(K)))) = EPS(P+EPB(K))   (1)
DSS(EPS(P+EPB(K))) = P, EPB(K)   (2)
DSB(EPB(K)) = P, K   (3)
The CA then generates WI from K and the public keys PB, PS and PD in such a way that it satisfies the following conditions (4) to (9), where Γ, ƒ and Δ are privacy-homomorphic encryption functions with respect to watermark insertion, and σ and μ are permutation functions:
Γ(K, PB, PS, PD) = WI   (4)
σ(WI, K) = PB   (5)
ƒ(SS, EPB(WI)) = WIS   (6)
Δ(WIS, SS) = PS   (7)
ƒ(SB, WI) = K   (8)
μ(PB, SB, P) = PK   (9)
These functions are designed to give the desired output from the given input parameters.
5. The CA stores P+K and WI in its database with the primary key PK for any future use (arbitration). The generated watermark WI is encrypted with the public keys, EPD(EPS(EPB(WI))), and forwarded to D. After decryption, DSD(EPD(EPS(EPB(WI)))), the EPS(EPB(WI)) watermark is forwarded to S without any authentication check at D.
6. S extracts and decrypts the received watermark, DSS(EPS(EPB(WI))), to get EPB(WI). This can be checked against conditions (6) and (7) to verify its authenticity: if the function Δ(WIS, SS) gives PS, then the received EPB(WI) is a valid watermark and it is embedded into the digital content by applying a special function δ, δ(DC, EPB(WI)), which is known only to S and is kept secret; the result is then forwarded to B along with EPB(WI). If the forwarded watermark is not original, it is rejected back to D to be resent.
7. B may decrypt DSB(EPB(WI)) to get WI and apply the function ƒ(SB, WI) to get K, which is the 24-digit number originally generated by B itself. Thus the authenticity of the purchased digital content is proved.
One of the main advantages of this protocol is that buyers can check the quality and originality of the digital content they purchase by themselves. The levels of the tree may be increased or decreased depending on the electronic commerce transaction tree. A minimal sketch of the buyer's side of step 1 is given below.
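For illustration, the buyer's side of step 1 (generating K, encrypting it under PB and signing the order M with SB) can be prototyped with the Python cryptography package. This is only a hedged sketch under RSA-OAEP/PSS assumptions; the key size, padding choices and the content of M are illustrative and are not specified by the protocol.

```python
import secrets
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# B's key pair (PB, SB) -- sizes and algorithm are assumptions
sb = rsa.generate_private_key(public_exponent=65537, key_size=2048)
pb = sb.public_key()

# K: random 24-digit product serial number
k = str(secrets.randbelow(10 ** 24)).zfill(24).encode()

# EPB(K): K encrypted with B's public key
epb_k = pb.encrypt(
    k,
    padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                 algorithm=hashes.SHA256(), label=None))

# DSB(M): the purchase order M signed with B's private key (hypothetical order text)
m = b"purchase order: buyer B, sales point S, content id P"
dsb_m = sb.sign(
    m,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256())

# (EPB(K), DSB(M)) is what B forwards to the sales point S; S checks the
# signature with B's public key (raises InvalidSignature on failure).
pb.verify(dsb_m, m,
          padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                      salt_length=padding.PSS.MAX_LENGTH),
          hashes.SHA256())
```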
B. Arbitration Protocol
When a pirated copy PC of a certain digital content DC owned by B is found in the market, the arbitration protocol proposed in this section can be used to determine the identity of the member responsible for PC. The following steps are involved in locating the responsible illegal distributor:
1. ARB generates PK from the function μ(PB, SB, P) in order to search for and find WI and K in the CA's database.
2. ARB asks B to generate K from WI and SB using ƒ(SB, WI). B is the responsible illegal distributor if he cannot do this and is not able to generate PB using equation (5).
3. If B forwards a complaint against S, S is asked to extract EPB(WI) from δ(DC, EPB(WI)). Since the function δ is known only to S, this authentication can be done very easily. For any further clarification S may be asked to generate PS from EPB(WI) using equations (6) and (7).
Since all doors are open to the buyer to check the originality of the purchased digital content, only the above steps are necessary for arbitration.
4. CONCLUSIONS
This paper proposed an efficient tree based buyer-seller digital watermarking protocol that solves the customer's right problem, the unbinding problem and the customer's unbinding relation problem. The protocol is designed for a public key cryptosystem, and the watermark insertion operations are performed by the seller rather than by the certification authority. Moreover, the buyer need not contact the certification authority; instead the buyer can simply place the purchase order and wait for the digital content to arrive from the seller. The security of this protocol rests on the security and robustness of the encryption standard used.
REFERENCES
[1] N. Memon and P. W. Wong, "Protecting Digital Media Contents," Commun. ACM, vol. 41, no. 7, pp. 35-43, July 1998.
[2] G. Voyatzis, N. Nikolaidis, and I. Pitas, "Digital Watermarking: An Overview," in Proc. 9th European Signal Processing Conf., Sept. 1998, pp. 9-12.
[3] S. Katzenbeisser, "On the Design of Copyright Protection Protocols for Multimedia Distribution using Symmetric and Public Key Watermarking," in Proc. 12th Int. Workshop on Database and Expert System Applications, Sept. 2001, pp. 815-819.
[4] N. Memon and P. W. Wong, "A Buyer-Seller Watermarking Protocol," IEEE Trans. on Image Processing, vol. 10, pp. 643-649, Apr. 2001.
[5] K. Gopalakrishnan, N. Memon, and P. L. Vora, "Protocols for Watermark Verification," IEEE Multimedia, vol. 8, pp. 66-70, Oct.-Dec. 2001.
[6] J. J. Eggers and B. Girod, "Quantization Watermarking," in Proc. of SPIE, vol. 3971: Security and Watermarking of Multimedia Contents II, San Jose, CA, USA, 2000.
[7] B.-M. Goi, R. C.-W. Phan, Y. Yang, F. Bao, R. H. Deng, and M. U. Siddiqi, "Cryptanalysis of Two Anonymous Buyer-Seller Watermarking Protocols and an Improvement for True Anonymity," in Proc. Applied Cryptography and Network Security, 2004, LNCS 3089, pp. 369-382.
[8] J. Zhang, W. Kou and K. Fan, "Secure Buyer-Seller Watermarking Protocol," IEE Proc. Information Security, vol. 153, no. 1, March 2006.
[9] R. Rivest, A. Shamir and L. Adleman, "A Method for Obtaining Digital Signatures and Public Key Cryptosystems," Commun. ACM, vol. 21, pp. 120-126, 1978.
[10] L. Qiao and K. Nahrstedt, "Watermarking Schemes and Protocols for Protecting Rightful Ownership and Customer's Right," Journal of Visual Communication and Image Representation, vol. 9, pp. 194-210, Sept. 1998.
[11] F. Hartung and M. Kutter, "Multimedia Watermarking Techniques," Proc. IEEE, vol. 87, pp. 1079-1107, July 1999.
[12] S. Craver, "Zero Knowledge Watermark Detection," in Proc. 3rd Int. Workshop on Information Hiding, vol. 1768, LNCS, Sept. 1999, pp. 101-116.
[13] C.-L. Lei, P.-L. Yu, P.-L. Tsai and M.-H. Chan, "An Efficient and Anonymous Buyer-Seller Watermarking Protocol," IEEE Trans. on Image Processing, vol. 13, no. 12, pp. 1618-1626, 2004.
A Spatiotemporal Parallel Image Processing on FPGA for Augmented Vision System W. Atabany(1) and P. Degenaar(2) (1) The Institute of Biomedical Engineering, Imperial College (2) Institute of Biomedical Engineering & Division of Neuroscience, Imperial College
[email protected] Abstract—In this paper we describe a spatiotemporal parallel algorithm to optimize the power consumption of image processing on FPGAs. We show how the implementation of our method can significantly reduce power consumption at higher processing speeds compared to traditional spatial (pipeline) parallel processing techniques. We demonstrate a real-time image processing system on an FPGA device and calculate its power consumption. The results show that when the image is partitioned into 6 sections the power consumption drops by 45% compared to previous approaches. Index Terms—FPGA, retinal prosthetic, temporal parallel image processing.
I. INTRODUCTION
Portable image processing systems are of interest to those developing retinal prosthetic and augmented vision devices [1], in addition to portable consumer systems. Key to these systems are real-time processing capability and low power consumption; the latter determines the battery size and thus the device weight and dimensions. Both retinal prosthetic devices and augmented vision devices need image processing of the visual scene. The processing algorithm common to both systems is the edge convolution filter. Such convolution masks, though powerful, can require significant processing resources. In addition, many other image processing operators, such as anisotropic diffusion, also require several operations per pixel, resulting in large volumes of data to be processed. However, it is difficult to satisfy this requirement in real time while maintaining low power. As the energy consumption per operation of CMOS transistors scales with operating frequency, simply increasing the speed of the processor is undesirable. Sequential processing in CPUs is very flexible, but not suited to the task of efficient image processing. Thus, most image processing tasks are now performed on dedicated graphics processors with highly parallel architectures. While these systems achieve the performance requirements, they are still multifunctional and thus do not have the power efficiency of dedicated parallel processing engines. To solve this problem, dedicated analog image processing arrays have been proposed [2]. These retinomorphic chips are very low power, but require large areas of costly dedicated CMOS and, with some exceptions [3], tend to have non-reconfigurable filters. Such arrays tend to have maximum sizes of 64x64 pixels, whereas augmented vision would ideally have VGA resolution or above.
One alternative is to use field programmable gate arrays (FPGAs). Continual growth in the size and functionality of FPGAs over recent years has resulted in an increasing interest in their use as implementation platforms for image processing applications, particularly real-time video processing [4], thanks to their parallel processing capability. Parallelism in image processing algorithms exists in two major forms [5]: spatial parallelism and temporal parallelism. FPGA implementations have the potential to exploit a mixture of these two forms: for example, the FPGA can be configured to partition the image and distribute the resulting sections to multiple pipelines, all of which process data concurrently. Previously, Sobel filters have been implemented on FPGAs using a single spatial pipeline [6]; real-time operation was achieved, but the power consumption was not optimized. This paper focuses on the effect of parallel image processing pipelines and image partitioning techniques on the performance and power consumption of the system. The rest of the paper is organized as follows. Section II describes the filter method used, Section III presents a detailed description of the proposed spatial (pipeline) parallelism architecture, and Section IV describes the temporal (partitioning) parallelism architecture. In Section V we show an application and the obtained results. Finally, Section VI presents some conclusions. II. FILTER DESCRIPTION Edge detection is a fundamental tool used in most image processing applications to obtain information from the frames as a precursor to feature extraction and object segmentation. This process detects the outlines of an object and the boundaries between objects and the background in the image. It is our desire to perform edge detection algorithms to enhance the contrast of important objects in the visual scene via augmented visual feedback. Additionally, an edge-detection filter can also be used to improve the appearance of blurred or anti-aliased video streams. The base convolution filter equation can be described by:
g[m, n] = Σ(k=0..N−1) Σ(l=0..N−1) f[k, l] · h[m − k, n − l]    (1)
Firstly, the image is smoothed using a Gaussian convolution filter:
G_gaussian = (1/16) × [1 2 1; 2 4 2; 1 2 1]    (2)
Laplacian convolutions can then be used to determine the edges of an image. However, for higher resolutions we find that Sobel filters give better results. The Sobel edge detection algorithm is one of the most commonly used techniques for edge detection [7]. It can be performed (approximately) by adding the absolute results of separate horizontal and vertical Sobel convolutions. In this paper, we used four types of directional Sobel edge detection filters to minimize the complexity and resource usage of the FPGA. The outputs of these four filters are added together to obtain a more enhanced edge image. The 3×3 directional Sobel filters used are:
G_d1 = [1 1 0; 1 0 −1; 0 −1 −1]    (3)
G_d2 = [−1 −1 0; −1 0 1; 0 1 1]    (4)
G_H = [1 2 1; 0 0 0; −1 −2 −1],  G_V = [1 0 −1; 2 0 −2; 1 0 −1]    (5)
The filtering operation on the image is based on a window operation: each pixel in the output image is produced by sliding an N×M window over the input image and computing an operation according to the input pixels under the window and the chosen window operator. The result is a pixel value that is assigned to the centre of the window in the output image, as shown in Fig. 1.
Fig. 1. Conceptual example of window filtering
III. SPATIAL PARALLELISM ARCHITECTURE
In spatially parallel operation the filtering is performed in a parallel form. Thus, if real-time processing of the video stream is required, N×M pixel values need to be processed each time the window is moved. However, each pixel in the image would then need to be read up to N×M times. To solve this problem an N−1 line buffer is used to store the required number of lines needed for the window operation, as shown in Fig. 2.
Fig. 2. Two line buffers connected to a 3×3 filter for buffering data coming from the original image stored in the input RAM
Two line buffers have been used for a filter with a 3×3 kernel. The data window then enters the filter in parallel form. The structure of the 3×3 filter consists of three 3-Tap (coefficient) filter modules, as shown in Fig. 3. Each 3-Tap module takes the data from the line buffer at a rate three times faster than the system clock and multiplies them by the three filter coefficients (one row per 3-Tap module) stored in a fixed-depth RAM; the output of each multiplication is accumulated, as shown in Fig. 4. The outputs from all three 3-Tap filters are then added together to form one processed pixel.
Fig. 3. Parallel 3×3 filter module (three 3-Tap filters whose outputs are summed)
Fig. 4. The structure of the 3-Tap filter (input data, counter, MAC and coefficients ROM)
After filtering all the pixels, the output data from the filter is stored into the output RAM.
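As a point of reference for the hardware description, the filtering chain of Section II (Gaussian smoothing followed by the four directional Sobel kernels of Eqs. (2)-(5)) can be modelled in a few lines of software. This is only a behavioural sketch using NumPy/SciPy, not the System Generator implementation; taking absolute values before summing the four responses and clipping to 8 bits are assumptions, as the paper only states that the four outputs are added.

```python
import numpy as np
from scipy.signal import convolve2d

# Kernels from Section II, Eqs. (2)-(5)
G_GAUSS = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0
G_D1 = np.array([[1, 1, 0], [1, 0, -1], [0, -1, -1]])
G_D2 = np.array([[-1, -1, 0], [-1, 0, 1], [0, 1, 1]])
G_H  = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]])
G_V  = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]])

def edge_enhance(image: np.ndarray) -> np.ndarray:
    """Gaussian smoothing followed by the sum of the four directional Sobel responses."""
    smoothed = convolve2d(image, G_GAUSS, mode='same', boundary='symm')
    edges = sum(np.abs(convolve2d(smoothed, k, mode='same', boundary='symm'))
                for k in (G_D1, G_D2, G_H, G_V))
    return np.clip(edges, 0, 255).astype(np.uint8)

# Example: a random 256x256 grayscale frame (the test size used in the paper)
frame = np.random.randint(0, 256, (256, 256)).astype(float)
out = edge_enhance(frame)
```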
IV. TEMPORAL PARALLELISM METHOD
To achieve real-time image processing, the system should output high frame rates; in some cases each pixel is processed several times, which means several clock cycles are needed per pixel. To be able to perform the operations at the required rate we could simply increase the operating frequency, but that would decrease the power efficiency. The overall system clock in modern high-end FPGA devices can operate at over 500 MHz, but increasing the system speed increases the capacitive power losses, and thus increases thermal dissipation and power consumption.
560
ATABANY AND DEGENAAR Timing and Reading Module N Counters for Multiplexing and Addressing
Timing Control
Control
Image Processing Module n-1 Line Buffer
Recollecting Module
nxn Filter
Counter Counter
M U X
Original Image Stored in RAM
D M U X
N Processing Units
n-1 Line Buffer
M U X
Processed Image Stored in Buffer
nxn Filter
Counter
Fig. 5. The architecture of the system partitioning technique
To solve this problem we should benefit from the parallel operation features of the FPGA. In this case the image can be partitioned into small sections, which are all processed together in parallel, in addition to the parallelised processing of the sub-units described above. Fig. 5 shows the structure of the partitioning technique. The timing and reading module consists of N counters for N sections; each counter addresses a specific section of the original image stored in the main RAM. The outputs of the counters, which are the required addresses, are fed to an N×1 multiplexer operated at a speed N times faster than the system clock. The timing control signals synchronise the multiplexer, the demultiplexer and the 3×3 filter modules in the image processing module. After the memory is accessed, its output goes to a 1×N demultiplexer that distributes the data to N processing units operating together in parallel. The recollecting module consists of another multiplexer, which also operates N times faster than the system clock; its function is to collect and store the processed data from the N processing units into the output buffer. A software sketch of this partition-process-recombine flow is given below.
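The partition-and-recollect flow can be mimicked in software; the sketch below is only an illustrative model (not the FPGA implementation) and reuses the hypothetical edge_enhance() filter from the earlier sketch. Padding each strip by one row so the 3×3 window sees its neighbours is an implementation detail assumed here rather than taken from the paper.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def process_partitioned(frame: np.ndarray, n_parts: int, filt) -> np.ndarray:
    """Split the frame into n_parts horizontal strips, filter them concurrently
    and recollect the results, mirroring the MUX -> N processing units -> MUX flow."""
    h = frame.shape[0]
    bounds = np.linspace(0, h, n_parts + 1, dtype=int)

    def work(i):
        lo, hi = bounds[i], bounds[i + 1]
        # pad each strip with one row of context on each side (halo for the 3x3 kernel)
        plo, phi = max(lo - 1, 0), min(hi + 1, h)
        out = filt(frame[plo:phi])
        return out[lo - plo: out.shape[0] - (phi - hi)]

    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        strips = list(pool.map(work, range(n_parts)))
    return np.vstack(strips)

# e.g. result = process_partitioned(frame, 6, edge_enhance)  # 6 partitions, as in the paper
```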
V. PARTITIONING VERSUS UNPARTITIONING ALGORITHM
We have compared the spatial and temporal parallelism approaches at different frame rates while measuring the power consumption. A design was built in System Generator from Xilinx [8] to read a 256×256 grayscale image stored in RAM without partitioning it, pass it through a 3×3 Gaussian filter followed by Sobel edge detection, and store the output in the output buffer. After being designed in the System Generator environment, the file was compiled into a bitstream and downloaded onto the ML401 Virtex-4 evaluation platform board with an XC4VLX25 FPGA device [9]. The file was executed several times with different clock speeds to calculate the frame processing time and the power consumption. The same design was then reimplemented twice with the temporal partitioning technique applied: the first time the image was partitioned into 3 sections and the second time into 6 sections. Fig. 6 shows the result of applying the Sobel edge detection algorithm to the image. Fig. 7 shows the relation between the frame processing time and the power consumption for the Sobel operator when the image is processed without any partitioning (using only spatial parallelism) and when it is partitioned into 3 and 6 sections. The power consumption was calculated using the XPOWER ESTIMATOR (XPE) program from Xilinx [10]. As shown in Fig. 7, the power consumption drops sharply when using the partitioning algorithm in the regions that require a high processing frame rate. For example, to achieve a frame processing time of 0.6 ms the consumed power is 771 mW without partitioning, while partitioning the image into 3 and 6 sections reduces the power consumption to 503 mW and 429 mW respectively. From Fig. 7 we can also see that the minimum power of the XC4VLX25 FPGA device is 227 mW even when no logic cells are used in the chip, so however much the frame rate is reduced to lower the power consumption, it will not fall below 227 mW [9]. Fig. 8 shows the effect of the image size on the power consumption of the system: the filtering process was performed on images of size 64×64, 128×128 and 256×256 when processing the image without partitioning and when partitioning it into 3 and 6 sections. Table I shows the percentages of the resources of the XC4VLX25 used when processing the image without partitioning and with partitioning into 3 and 6 sections.
VI. CONCLUSION The reconfigurable field programmable gate array (FPGA) is ideal for the development of fast reconfigurable processing systems, especially for image processing applications.
Fig. 6. The result of applying the smoothing filter followed by the four diagonal Sobel filters: (a) the original image and (b) the edge-detected image.
TABLE I
THE PERCENTAGES OF USED RESOURCES WHEN USING PARTITIONING AND WITHOUT USING IT
(Percentage of used resources from the XC4VLX25 FPGA device)

Used Resources      Without partitioning   3 partitions   6 partitions
Slice Flip Flops    4%                     14%            30%
4 input LUTs        2%                     7%             18%
Occupied Slices     12%                    36%            75%
FIFO16/RAMB16s      88%                    88%            88%
DSP48s              18%                    37%            37%
This paper discussed the implementation of image processing algorithms (smoothing and Sobel edge detection filters) using a spatial and temporal parallelism architecture. We showed how, by increasing the parallelism of the architecture, we can reduce the power consumption at higher speed requirements. These requirements will increase with image size, and thus the benefits of the parallelism will become more prevalent. Additionally, if the algorithm were implemented directly in CMOS, we would expect the baseline power consumption per operation to be reduced, which would further increase the benefit of this architecture. ACKNOWLEDGMENT The authors would like to thank The University of London Central Research Fund (AR/CRF/B) and the Royal Society Research Fund (PS0429_DNCCA) for supporting this research. One of the authors would also like to thank the Egyptian government for its support.
Fig. 7. The effect of the partitioning technique on the power consumption for different frame processing times (achieved by different operating frequencies) when processing the image using the Sobel operator. P1: without partitioning the image; P3 and P6: partitioned into 3 and 6 sections respectively.
Fig. 8. The effect of image size on the power consumption for processing an image without partitioning and when partitioned into 3 and 6 partitions.
REFERENCES
[1] Y. Huang, E. M. Drakakis, C. Toumazou, K. Nikolic and P. Degenaar, "An optoelectronic platform for retinal prosthesis," in IEEE BIOCAS Conference, 2006.
[2] P. Dudek and P. J. Hicks, "A General-Purpose Processor-per-Pixel Analog SIMD Vision Chip," IEEE Transactions on Circuits and Systems I: Analog and Digital Signal Processing, vol. 52, no. 1, pp. 13-20, January 2005.
[3] D. J. Banks, P. Degenaar and C. Toumazou, "Distributed current-mode image processing filters," Electronics Letters, vol. 41, pp. 1201-1202, 2005, ISSN: 0013-5194.
[4] B. Hutchings and J. Villasenor, "The Flexibility of Configurable Computing," IEEE Signal Processing Magazine, vol. 15, pp. 67-84, Sep. 1998.
[5] A. Downton and D. Crookes, "Parallel Architectures for Image Processing," Electronics & Communication Engineering Journal, vol. 10, pp. 139-151, Jun. 1998.
[6] Chi-Jeng Chang, Zen-Yi Huang, Hsin-Yen Li, Kai-Ting Hu, and Wen-Chih Tseng, "Pipelined operation of image capturing and processing," in Proc. 5th IEEE Conference on Nanotechnology, July 2005, vol. 1, pp. 275-278.
[7] K. R. Castleman, Digital Image Processing, Prentice Hall, 1995, ISBN: 0132114674.
[8] System Generator under Matlab Simulink from Xilinx. Available: http://www.xilinx.com/ise/optional_prod/system_generator.htm
[9] ML-40x Evaluation Platform User Guide. Available: http://www.xilinx.com/
[10] XPOWER ESTIMATOR (XPE) V9.1.02 program from Xilinx. Available: http://www.xilinx.com/ise/power_tools
Biometrics of Cut Tree Faces W. A. Barrett Department of Computer Engineering San Jose State University San Jose, CA [email protected]
Abstract- An issue of some interest to those in the lumber and timber industry is the rapid matching of a cut log face with its mate. For example, the U.S. Forest Service experiences a considerable loss of its valuable tree properties through poaching every year. They desire a tool that can rapidly scan a stack of cut timber faces, taken in a suspect lumber mill yard, and identify matches to a scanned photograph of stump faces of poached trees. Such a tool clearly falls into the category of a biometric identifier. We have developed such a tool and have shown that it has usefully high biometric discrimination in the matching of a stump photograph to its cut face. It has certain limitations, described in this paper, but is otherwise eminently suitable for the task for which it was created.
I.
INTRODUCTION
A biometric measure is some set of measurements made on an object that is intended to provide a unique or near-unique signature, typically a vector of numbers, of that object. A biometric measure is not designed to reproduce an exact image of the object, but rather to provide sufficient statistical variance that one such object can be distinguished from another through comparison of their signatures alone. Two such signatures may be compared in different ways. The Euclidean distance measure is a simple one, considers two biometric vectors as points in a multi-dimensional space, and reduces their distance of separation to a single number. A small distance then implies a close match. Given a large set of object measurements, with some taken of the same object at different times, one can assess the quality of the measurement vector strategy by examining two distance distributions: one of all those distances between different images of the same object, called the authentics distribution, and a second of all those distances between images of different objects, called the imposters distribution. The degree of overlap of these two distributions should be small. II. LOGFACE BIOMETRICS In this work, we consider digital color photographs of the faces of cut timber, typically taken in daylight, with the camera axis essentially at right angles to the face. The image face should be wholly within the camera view, but does not have to be centered in the image. It turns out that the biometric measure that we have developed is reasonably independent of the camera angle, so that elaborate means of ensuring a correct view angle are not necessary.
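To make the distance-based comparison concrete, here is a small illustrative sketch (not the author's implementation) that computes Euclidean distances between signature vectors and splits them into the authentics and imposters distributions; the vector values and object labels are hypothetical.

```python
import numpy as np
from itertools import combinations

def euclidean(a, b):
    """Distance between two biometric signature vectors."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))

def distance_distributions(signatures):
    """signatures: list of (object_id, vector) pairs, possibly with several
    vectors per object taken at different times.  Returns the authentics
    distances (same object) and imposters distances (different objects)."""
    authentics, imposters = [], []
    for (id_a, vec_a), (id_b, vec_b) in combinations(signatures, 2):
        d = euclidean(vec_a, vec_b)
        (authentics if id_a == id_b else imposters).append(d)
    return authentics, imposters

# Hypothetical usage: two photographs of log "A" and one of log "B".
sigs = [("A", [0.11, 0.42, 0.90]), ("A", [0.12, 0.40, 0.88]), ("B", [0.70, 0.05, 0.31])]
auth, imp = distance_distributions(sigs)
print(min(imp) > max(auth))   # True when the two distributions do not overlap
```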
Fig.1. Logface biometric system editing frame. An image is brought up in this viewing window. Operator segments logfaces through an interactive cubic spline tool.
The background in the images varies considerably, as does the number of faces photographed. The background may consist of partial side views of other logs, the sky, or other natural scenery, Fig. 1. The background of a stump is typically a combination of earth, sky, twigs, grass, and other plants. Some images are partially obscured by other log faces or other growth. Segmentation is required to separate the wanted log face image from its background. This takes the form of a closed simple cubic spline curve that just encloses the cut face in an image. Pixels outside a face segment are digitally reduced to zero intensity prior to computing the face’s biometric. The nature of the face image might be expected to be of the tree rings. Unfortunately, one also sees strong saw kerf patterns, which also create biometric variations depending on the angle of the incident light and orientation of the log. Similar kerf marks are seen on the matching faces, due to the symmetry of the saw teeth. There are also significant variations in coloration of the different parts of the faces, plus changes in coloration with age. A freshly cut pine log will have a strong white-yellow color,
but this shifts to a darker yellow and later brown over a few weeks as the log ages. Some logs will also develop a split starting near the center toward the bark -- as the log dries out, it contracts more in the outer layers than the inner ones, causing a high peripheral stress to develop in the outer layers. Using greyscale rather than color for the biometric measures helps eliminate this factor. An obvious variation among face images is in the orientation of the log face. After the tree is cut, its trunk is stripped of branches in the forest and loaded onto a carrier. The orientation relative to the tree stump is lost. The implication with respect to a biometric measure is that (1) the measure must be orientation independent, and (2) that we locate an origin in the segmented image that can be used as a rotation origin. III. PSEUDO-ZERNIKE MOMENTS We resolved issue (1) by choosing a pseudo-Zernike polynomial moment invariant [1]. Issue (2) is easily solved by first obtaining a good segmentation of the face's bark layer, then using the center of mass of the face as an origin. A pseudo-Zernike polynomial Rpq(r) is defined by two integers p and q and a radius r such that 0 <= r <= 1, p >= 0, q >= 0 and q <= p. A few of these polynomials are listed in Table 1.

p\q   0                                    1                               2                       3            4
0     1
1     3r − 2                               r
2     10r² − 12r + 3                       5r² − 4r                        r²
3     35r³ − 60r² + 30r − 4                21r³ − 30r² + 10r               7r³ − 6r²               r³
4     126r⁴ − 280r³ + 210r² − 60r + 5      84r⁴ − 168r³ + 105r² − 20r      36r⁴ − 56r³ + 21r²      9r⁴ − 8r³    r⁴

Table 1. Pseudo-Zernike polynomials Rpq(r) for a few values of p and q. 0 ≤ r ≤ 1, p ≥ 0, q ≥ 0 and q ≤ p. A general formula is given in [1].
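For reference, the entries of Table 1 follow from the standard pseudo-Zernike radial formula R_pq(r) = Σ_{s=0}^{p−q} (−1)^s (2p+1−s)! / [s!(p+q+1−s)!(p−q−s)!] r^{p−s}. The short sketch below is an illustrative implementation of that formula (not the author's code) and reproduces the table entries.

```python
from math import factorial

def pseudo_zernike_radial(p: int, q: int, r: float) -> float:
    """R_pq(r) for 0 <= q <= p, evaluated from the closed-form sum."""
    assert 0 <= q <= p
    total = 0.0
    for s in range(p - q + 1):
        coeff = ((-1) ** s) * factorial(2 * p + 1 - s) / (
            factorial(s) * factorial(p + q + 1 - s) * factorial(p - q - s))
        total += coeff * r ** (p - s)
    return total

# Spot-check against Table 1: R_20(r) = 10r^2 - 12r + 3 at r = 0.5 gives -0.5.
print(pseudo_zernike_radial(2, 0, 0.5))   # -0.5
```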
Circular moments, in which the pseudo-Zernike polynomials may be embedded, are typically a combination of a radial component, in which the radius is considered to range from 0 to 1, combined with a circular component over the whole circle. The radial component is comprised of a polynomial whose degree increases with the order. The circular component has the general structure of a Fourier series expansion, as one might expect, with 0, 1, 2, 3, etc. cycles in each full circle. Eq. (1) defines the pseudo-Zernike moment Zpq, as an integral over a circular image space 0 ≤ r ≤ 1, p ≥ 0, q ≥ 0, and q ≤ p. f (r, θ ) is the pixel intensity at (r, θ). Special numerical integration algorithms are required to compute these moment values efficiently.
$$ Z_{pq} = \frac{p+1}{\pi} \int_{0}^{2\pi}\!\!\int_{0}^{1} R_{pq}(r)\, e^{-iq\theta}\, f(r,\theta)\, r\, dr\, d\theta \qquad (1) $$
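In practice the integral in Eq. (1) is approximated by a sum over the pixels of the segmented face mapped into the unit disk. The following sketch shows one straightforward discretization; it is an assumption about implementation detail, not the author's numerical scheme.

```python
import numpy as np
from math import factorial

def pseudo_zernike_moment(image: np.ndarray, p: int, q: int) -> complex:
    """Approximate Z_pq of Eq. (1) by a sum over pixels inside the unit disk.
    The image is assumed already segmented (background pixels set to zero)."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Map pixel centres to the unit disk centred on the image centre.
    x = (2.0 * xs - (w - 1)) / w
    y = (2.0 * ys - (h - 1)) / h
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = r <= 1.0
    rr, tt, ff = r[inside], theta[inside], image[inside].astype(float)
    # Radial polynomial R_pq(rr) from the closed-form sum.
    radial = np.zeros_like(rr)
    for s in range(p - q + 1):
        c = ((-1) ** s) * factorial(2 * p + 1 - s) / (
            factorial(s) * factorial(p + q + 1 - s) * factorial(p - q - s))
        radial += c * rr ** (p - s)
    # In Cartesian form the polar element r dr d(theta) becomes the pixel area dx dy.
    area = (2.0 / w) * (2.0 / h)
    return (p + 1) / np.pi * np.sum(ff * radial * np.exp(-1j * q * tt)) * area
```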
Clearly, p is associated with the radial components, and q with the circular components, though the two are required in the polynomial coefficient Rpq(r). One might expect that the origin would cause a singularity, but the form of the polynomials prohibits that, with all but one of the radial functions vanishing at the origin. The moments of an image can be rapidly computed using pre-computed tables of coefficients, and methods developed by various authors, for example, [2]. It happens that all the orientation information appears in the first few low-order moments, of orders (0,0), (1,0), and (0,1). As the order increases, the measure of fine radial detail is improved. The pseudo-Zernike moments themselves are not invariant with respect to rotation, but certain combinations of them are, given in Eq. (2).

$$ \varphi_1 = Z_{p0}, \qquad \varphi_2 = |Z_{pq}|^2, \qquad \varphi_3 = (Z_{pq})^{*}(Z_{rs})^{m} \pm \left[(Z_{pq})^{*}(Z_{rs})^{m}\right]^{*}, \quad \text{where } m \ge 1 \text{ and } q = ms \qquad (2) $$

Regarding φ3, note that the + sign produces twice the real part and the − sign produces twice the imaginary part of the first term. Both the real and imaginary parts are invariant with respect to rotation. The real part is also invariant with respect to a reflection, while the imaginary part is not. It happens that these are not independent. Indeed, there are considerably more invariants for a given p than there are Zernike moments. Belkasim et al. [3] recommend the following invariants and constraints, which comprise an independent set whose cardinality is the same as that of the Zernike moments for a given p, Eq. (3):

$$ (PZMI)_{p0} = Z_{p0}, \qquad (PZMI)_{pq} = |Z_{pq}| \quad \text{for } 1 \le q \le p, $$
$$ (PZMI)_{p,\,p+q} = Z_{pq}^{1/q} \cdot Z_{p1}^{*} \pm \left(Z_{pq}^{1/q} \cdot Z_{p1}^{*}\right)^{*} \quad \text{for } q > 1, $$
$$ (PZMI)_{p,\,p+1} = Z_{p-1,1} \cdot Z_{p1}^{*} \pm \left(Z_{p-1,1} \cdot Z_{p1}^{*}\right)^{*} \quad \text{for } q = 1 \text{ and } p > 1 \qquad (3) $$
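As an illustration only (the exact invariant set and ordering used in the tool are not spelled out in the text), the invariants of Eq. (3) could be assembled into a feature vector roughly as follows, given a dictionary of complex moments Z[(p, q)] computed as above; the ± choice is shown for the + (real-part) branch.

```python
def pzmi_features(Z: dict, p_max: int) -> list:
    """Build rotation-invariant features per Eq. (3) from moments Z[(p, q)]."""
    feats = []
    for p in range(p_max + 1):
        feats.append(Z[(p, 0)].real)                      # (PZMI)_p0 = Z_p0
        for q in range(1, p + 1):
            feats.append(abs(Z[(p, q)]))                  # (PZMI)_pq = |Z_pq|
        for q in range(2, p + 1):                         # (PZMI)_{p,p+q}, q > 1
            t = (Z[(p, q)] ** (1.0 / q)) * Z[(p, 1)].conjugate()
            feats.append((t + t.conjugate()).real)        # + branch: twice the real part
        if p > 1:                                         # (PZMI)_{p,p+1}, q = 1
            t = Z[(p - 1, 1)] * Z[(p, 1)].conjugate()
            feats.append((t + t.conjugate()).real)
    return feats
```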
A second problem is that polynomial expansions typically require computing some high powers of complex numbers, together with additions and subtractions of these numbers. The coefficients of the polynomials increase rapidly with p, as can be seen in Table 1. Powers of size 15 are needed for good resolution, yet such high powers cause significant loss of precision with 8-byte IEEE floating-point numbers. Powers above 20 are essentially useless, and merely generate quantization noise in the measures. Belkasim recognized this problem and proposed using fractional powers instead of integral powers, a measure that significantly extends the number of useful orders by reducing quantization errors. These are shown in Eq. (3). Fractional
powers of floating point numbers require a few more calculation cycles, but they are worth it for the improved quality of the results. It should be clear that the Zernike polynomial values can be pre-computed for a given granularity in r and θ. This leaves the integration, Eq. (1), to be computed for each image segment. This, of course, becomes a summation over a set of pixels within a circular segment. The diameter of this segment is the diameter of a circle whose area is equal to the area of the segmented image. Some portions of the image segment are clipped, and others reduced to 0 (as background), but this applies equally to a stump and its matching log face. We have found that by dropping a few of the low-order terms, and keeping a set of some forty floating-point measures, we obtain good biometric discrimination in the several hundred log faces examined with our software tools. IV. SOFTWARE PLATFORM The Forest Service would like a tool that can be operated by any ranger with minimal training. At a minimum, this entails having the ranger take digital photographs of a stump and log faces, then send these to an analysis center for matching purposes.
We would prefer to know the camera distance and focal length when an image was taken, as this would also provide an estimate of the physical size of each face, but this was not done. A known face size would be a useful and powerful biometric, when combined with the other feature measurements. At the analysis center, a technician is trained to manually segment the faces, Fig. 1. Each field image is brought up on a screen. Each face can be accurately delimited with a few mouse movements and clicks, using a cubic-spline fitting tool. Once a face is delimited, its enclosing polynomial is saved in a database. The polynomial is also used to compute the face’s biometric measure, an operation that is typically completed in a fraction of a second per face. Face matching can proceed when several images have been so entered and segmented, Fig. 2. When the technician is ready to identify those log faces that match a particular stump, the stump image is selected, and then one key click locates a set of cut faces that most closely match the stump. These matching faces are ordered by closeness of match, i.e. by increasing biometric distance measure from the stump’s biometric measure. The matching faces are shown in a special window in which both the stump image and the face image can be visually compared. A unique algorithm examines both of these images, and arrives at a canonical “rotation angle” for each of them. The face image is then rotated for display by the difference between these angles, such that it will typically appear to be angularly aligned with respect to the stump. Needless to say, both images are scaled to exactly fit into their square image frame. The result of this automatic scaling and rotational alignment is to make a visual comparison of the two images very easy. After all, the biometric measurement process is not exact, and sometimes produces a false match. Any remaining false matches are easily eliminated through a visual comparison. V. LOGFACE TOOL EXPERIENCE
Fig 2. Face matching panel. Slider near the top selects one of several possible matches, the best toward the left. The bottom menus provide a means of manually verifying a match.
Although this logface tool has been designed to be as user-friendly as possible, with many safeguards against careless actions, some training is nevertheless required to install and use it. We were privileged to train one computer-astute person in its use at the San Dimas Forest Service research group. The main problem he faced was in getting the product installed, owing to certain security and virus precautions enforced by their computer support staff. Installation is otherwise automated. Our operator easily learned to use the tool for face segmentation and matching in less than an hour. Needless to say, the tool is richly documented for both the casual operator and a more sophisticated person interested in tuning its biometric parameters. That experience, together with the need to segment each face manually through the tool's windows interface, suggested that a central bureau should be used to carry out the image analysis,
using the field agents only to capture photographs and descriptions. A central bureau could also explore biometric quality and make appropriate adjustments. The bureau would also be in a better position to identify matches across forest districts than could someone in the field. That effort has so far not been mounted.
VI. BIOMETRIC QUALITY
In order to better gauge the quality of the biometrics, we incorporated two tools into our system. Both require a special "training set" of images, in which the matching faces have been manually identified. One tool uses this set to construct the imposter-authentics distribution in the form of an Excel spreadsheet data list -- see Fig. 3. The other tool produces a very detailed report in Excel form of all the biometric measures, organized by image index and match index (not shown). This report, though voluminous, permits one to experiment with different distance measures, and also to examine the separate variances of each of the measure categories.
Figure 3. Integrated imposters vs. authentics probabilities. This is estimated from a sample of several hundred manually matched "trainer" images. Most of the matching images are of the same face, photographed at different distances and viewpoints. The biometric distance 15.0 can be used as an effective discriminant between matching and non-matching faces.
We have estimated the biometric quality of matching using a reasonably large set of log face images. Unfortunately, we do not have more than a few dozen images of known matching faces and stumps, too small a sample to yield a good biometric quality estimate. In fact, those pairs show a high quality of matching -- the system has consistently located the matching face to every stump in our collection, with the matching face at the top of the match list. We have also compared log faces with other log faces, using, of course, different photographic images of the same face. This provides a larger set of matches than stumps to log faces, though this obviously can be criticized as comparing one image of an object to another image, and that should provide a close match. A tentative result using a random sampling of known image-pairs drawn from the known matches and non-matches yielded a cross-over probability of 0.04. This essentially means that if 25 random samples are compared to a given candidate, the probability is even that the correct matching sample will be chosen from the set.
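A cross-over point of the kind quoted above can be estimated from the two distance samples by sweeping a threshold until the false-accept and false-reject rates meet; the sketch below is an illustrative calculation with made-up distances, not the Excel-based tooling described in the paper.

```python
import numpy as np

def crossover_probability(authentics, imposters):
    """Find the threshold where the false-accept rate (imposters below the
    threshold) is closest to the false-reject rate (authentics above it),
    and return that threshold with the corresponding error probability."""
    auth = np.sort(np.asarray(authentics, dtype=float))
    imp = np.asarray(imposters, dtype=float)
    best_t, best_gap, eer = None, np.inf, None
    for t in np.unique(np.concatenate([auth, imp])):
        frr = np.mean(auth > t)          # matching pairs rejected
        far = np.mean(imp <= t)          # non-matching pairs accepted
        if abs(frr - far) < best_gap:
            best_gap, best_t, eer = abs(frr - far), t, (frr + far) / 2.0
    return best_t, eer

# Hypothetical distances: a threshold near 15 separates the two samples.
t, eer = crossover_probability([3.1, 7.4, 12.0, 16.2], [14.9, 22.3, 30.0, 41.5])
print(t, eer)
```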
VII. SUMMARY
A software tool designed to match a cut log face image with that of its mating face or stump, using biometric principles, has been designed and implemented. It accepts a set of digital images and provides a means of segmenting the log faces in the images. Once segmented, the faces can be compared using an orientation-invariant comparison algorithm. Matching faces or nearly-matching faces are brought up in a special window for a final manual comparison by the operator. Some unique properties of this tool include the use of orientation-invariant pseudo-Zernike polynomial moments, face segmentation using a cubic-spline fitting scheme, and a matching tool that automatically rotates a candidate image for final visual matching purposes. Automatic segmentation of the faces has not been achieved. There are several possible avenues that could be explored in this regard, but this appears to be a difficult problem due to the variegated background and the frequent resemblance of the background to the face. Once segmented, the face matching appears to be excellent, and in line with reports of similar matching experiments using pseudo-Zernike moments. The interested reader may download a version of this tool, including complete documentation, through the author's web site: http://www.engr.sjsu.edu/wbarrett.
ACKNOWLEDGMENTS
This work was supported through a contract with the U.S. Department of Agriculture, through the Forest Service Technology and Development group, San Dimas, CA. We thank Mr. Ed Messerlie of the Forest Service for initiating this work, for his close working relationship, and for his patience, as well as providing us with numerous digital photographs. We also thank Andy Horcher of the Forest Service San Dimas office for his continued support.
REFERENCES
[1] R. Mukundan and K. R. Ramakrishnan, "Moment Functions in Image Analysis," World Scientific, 1998, pp. 57-64.
[2] Chee-Way Chong, R. Mukundan, and P. Raveendran, "An Efficient Algorithm for Fast Computation of Pseudo-Zernike Moments," Intl. Conf. on Image and Vision Computing, IVCNZ01, New Zealand, Nov. 2001, pp. 237-242.
[3] S. O. Belkasim, M. Shridhar and M. Ahmadi, "Pattern Recognition with Moment Invariants: A Comparative Study and New Results," Pattern Recognition, vol. 24, no. 12, 1991, pp. 1117-1138.
A Survey of Hands-on Assignments and Projects in Undergraduate Computer Architecture Courses Xuejun Liang Department of Computer Science Jackson State University Jackson, MS 39217 USA [email protected]
Abstract - Computer Architecture and Organization is an important area of the computer science body of knowledge. How to teach and learn the subjects in this area effectively has been an active research topic. This paper presents results and analyses from a survey of hands-on assignments and projects from 35 undergraduate computer architecture and organization courses which are either required or elective for the BS degree in CS. These surveyed courses are selected from universities listed among the 50 top Engineering Ph.D. granting schools by the US News & World Report 2008 rankings, and their teaching materials are publicly accessible via their course websites.
I.
INTRODUCTION
Computer Architecture and Organization is an important area in undergraduate computer science programs. According to the Joint Task Force on Computing Curricula of the IEEE Computer Society and the Association for Computing Machinery [1], the subjects in this area include (1) Digital logic and digital systems, (2) Machine level representation of data, (3) Assembly level machine organization, (4) Memory system organization and architecture, (5) Interfacing and communication, (6) Functional organization, (7) Multiprocessing and alternative architectures, (8) Performance enhancements, and (9) Architecture for networks and distributed systems. They are often taught in either a two-course or a three-course sequence in most undergraduate computer science programs. Some of these subjects can be taught at both an introductory level and an upper level. Teaching the subjects in the computer architecture area can be difficult; students sometimes have trouble grasping the subjects under the traditional paper-and-pencil pedagogy. Some topics in the computer architecture area are difficult to understand without proper intuitions. Hands-on assignments and projects such as designing, programming, simulating, and implementing a processor architecture can provide such learning intuitions and will certainly help students learn the course subjects and engage their interest. But there are so many distinct topics in the computer architecture area, and computer architectures can be approached at several different levels. Subjects taught in computer architecture and organization courses will vary from
institution to institution and between CS and CE majors in the same institution. Therefore, there could be too many choices in selecting hands-on assignments and projects for a computer architecture and organization course. It is certainly desirable to have an overall picture of these possible hands-on assignments and projects by categorizing them and then getting their distribution over different categories. To this end, the author surveyed hands-on assignments and projects collected from 35 undergraduate computer architecture and organization courses which are either required or elective for the BS degree in CS. These surveyed courses are selected from universities listed among the 50 top Engineering Ph.D. granting schools by the US News & World Report 2008 rankings [2], and their teaching materials are publicly accessible via their course websites. There have been substantial research works on the computer architecture education. Computer processor simulators are the most useful tools in the computer architecture education and research. There are several lists of processor simulators available from the Internet, for example, the WWW Computer Architecture Page [3]. Most simulators on these lists are for the research purpose. A survey of simulators used in computer architecture and organization courses can be found in [4]. In order to make a simulator easier to use and suitable for the teaching purpose, graphical interfaces to an existing simulator were created in [5]. There are also many cache simulators [6, 7] to allow students to play with cache memory and memory hierarchy. However, the usage of these tools is often limited to a few individual institutions. This survey will provide readers an overall picture of major hands-on assignments and projects in the undergraduate computer architecture education, their categorization and distribution, as well as languages, tools, and platforms used in these assignments and projects. II. ASSIGNMENT CATEGORIES Hands-on assignments and projects used for the computer architecture and organization education can be categorized based on their contents, programming languages used, and tools used. In this survey, four main categories A, B, C and D are given as shown in Table I. Several subcategories are also listed within each category. There are totally twelve subcategories.
TABLE I
CATEGORIES AND SUBCATEGORIES

Categories                            Subcategories
A. Digital Logic Design               1. Basic Digital Logic Design; 2. Scalar Processor Design; 3. Cache Design; 4. Superscalar Processor Design
B. Assembly Language Programming      5. Basic Assembly Programming; 6. Advanced Assembly Programming
C. High-Level Language Programming    7. Basic High-Level Programming; 8. Processor Simulator; 9. Cache Simulator; 10. Advanced High-Level Programming
D. Exploiting Processor Simulators    11. Using simulators; 12. Modifying simulators
A. Digital Logic Design
Digital logic design plays an essential role in studying the internal behavior of a computer hardware component. Designing, simulating, and implementing a processor architecture using the digital logic approach is certainly a valuable hands-on experience in the computer architecture area. Under this category, the first subcategory, called basic digital logic design, includes the combinational circuits and major components, the sequential circuits and finite state machines, and the pipelined circuits. The knowledge from this subcategory is necessary for students to move forward on the remaining three subcategories. The scalar processor design subcategory contains the single-cycle processor and the pipelined processor. Note that the survey did not include a course whose assignments and projects were solely on the first subcategory.
B. Assembly Language Programming
The ISA (Instruction Set Architecture) level of computer organization is the interface between software and hardware. A good ISA makes the computer hardware easy to implement efficiently and makes software programs easy to translate into efficient code. Assembly language programming is the best way to study an ISA. The basic assembly programming subcategory includes (1) System I/O, ALU operations, and control flows, (2) Stacks, subroutines, and recursions, and (3) Programmed I/O, interrupts, and exceptions. One advanced assembly programming example is writing a simple timesharing OS kernel on a processor.
C. High-Level Language Programming
It is true that understanding computer processor and system architectures will help students write more efficient high-level language programs. But, on the other hand, developing a software program to simulate a computer processor or a computer system will also help students better understand computer processor and system architectures. Within the high-level language programming category, four subcategories are listed as shown in Table I. The basic high-level programming subcategory deals with using strings, pointers, memory allocation, etc. The processor simulator subcategory involves developing and implementing
assemblers, functional processor simulators, cycle-accurate processor simulators, and graphical interfaces of processor simulators. Cache simulators are used to simulate the computer cache memory. Two examples in the advanced high-level programming subcategory are parallel programming on clusters with MPI and the development of an interpreter to simulate the UNIX file system. Note that the survey did not include a course whose assignments and projects were solely on the basic high-level programming subcategory.
D. Exploiting Processor Simulators
Computer processor simulators are very useful tools in computer architecture education and research. A functional processor simulator allows users to run an assembly language program and to examine the contents either inside the processor or inside the memory. Graphical computer simulators, which are able to visualize the dynamic behavior of a computer, will certainly help students learn computer architecture and engage their interest. Some processor simulators allow users to study processor performance and cost and to compare processor design tradeoffs before building the processor. Two subcategories are listed under the exploiting simulator category as shown in Table I. The using simulator subcategory deals with studying the dynamic behavior, performance, and cost of a computer through using simulators. The other subcategory is about modifying an existing processor simulator to add more functions and features.
III. SURVEY RESULTS AND ANALYSES
The survey was performed for 35 undergraduate computer architecture and organization courses from universities among the 50 top Engineering Ph.D. granting schools by the US News & World Report 2008 rankings. The surveyed courses were taught either during or before the summer of 2007. Among the surveyed 35 courses, 27 courses are required and 8 courses are elective for the Bachelor of Science degree in Computer Science. Because the programming/lab assignments and projects are the focus of this survey, courses without programming/lab assignments or projects, or whose programming/lab assignments and/or projects were not publicly accessible via their course websites, were not considered in this survey. The survey content includes course descriptions, course formats, student evaluation, textbooks, languages and tools used, and programming/lab assignments and projects. A course was selected for this survey based on its course description and the availability of its programming/lab assignments and projects. A surveyed course may, or may not, be associated with a lab, or may be purely a lab. Among the 35 surveyed courses, 4 courses did not have required textbooks and 3 courses adopted two required textbooks (one for computer architecture and the other for digital logic). The famous textbook "Computer Organization & Design: The Hardware/Software Interface" by David Patterson and John Hennessey [8] was adopted by 17 required courses and 1 elective course. Another textbook, "Computer Architecture: A Quantitative Approach" by John Hennessy and David Patterson [9], was adopted by 5 elective courses.
A. Assignment and Project Distribution Table II lists the categorization of hands-on assignments and projects under the four categories and the twelve subcategories described in Section 2 for each of the 35 surveyed courses. The column # shows the course number and the column WT shows the weight percentage that assignments and projects will contribute to the final grade of students. The weight of 100% indicates that this is a pure lab course; NA means that the weight is not available. Take an example in Table II, the course # 01 is a required course and its programming/lab assignments and projects will take 35% of the final grade and belong to categories A and B or subcategories 2, 5, and 6. Take another example in Table II, the course # 03 is an elective course and its programming/lab assignments and projects will take 25% of the final grade and belong to category C or subcategory 10. Note that the number of assignments or projects in each subcategory or category is not recorded in Table II because the workload in each assignment or project may be very different. It is also possible that a big project may be involved in several subcategories or even several categories. TABLE II ASSIGNMENT AND PROJECT CATEGORIZATION (* This is an elective course)
[Table II layout: one row per surveyed course (#01-#35, elective courses marked *), the weight WT (%) that assignments and projects contribute to the final grade (NA where unavailable; 100% indicates a pure lab course), and an "x" mark under each of the subcategories 1-12 (grouped into categories A-D) covered by that course's assignments and projects.]
Based on the assignment and project categorization for each course shown in Table II, the course distributions over the 12 subcategories and over the 4 categories are calculated and shown in Fig. 1 and 2, respectively. As an example from Fig. 1, two required courses and three elective courses have assignments or projects in Subcategory 3. Similarly, from Fig. 2, three required courses and three elective courses have assignments or projects in Category D. From Fig. 1 and 2, it can be noticed that assignments and projects in assembly programming (Category B), in basic high-level programming (Subcategory 7), and in the cache simulator (Subcategory 9) are all offered in a required course. Assignments and projects in scalar processor design (Subcategory 2) and in basic assembly programming (Subcategory 5) are the most popular, while there are relatively few assignments and projects in exploiting simulators (Category D).
Fig. 1. Course Distribution over the 12 Subcategories
Fig. 2. Course Distribution over the 4 Categories
Based on the assignment and project categorization for each course shown in Table II, the numbers of courses that cover different numbers of subcategories (categories) are calculated and shown in Fig. 3 (Fig. 4). There is only one required course that has assignments and projects in 7 different subcategories as shown in Fig. 3. Aside from this course, any other course covers assignments and projects in less than 5 subcategories.
From Fig. 3 and 4, it can be seen that elective courses tend to focus on fewer subjects than required courses. Only one elective course covers assignments and projects in two categories; the remaining elective courses give their assignments and projects in only one category.
Fig. 3. Course Distribution over Subcategory Coverage
Fig. 4. Course Distribution over Category Coverage
B. Digital Logic Design In the digital logic design category, the popular design entry is using a hardware description language such as Verilog and VHDL. There are six surveyed courses that use FPGA devices in their assignments and/or projects. A brief overview of FPGA platforms, Logic and FPGA design tools, and processors selected to implement will be given in this section. There are four FPGA boards used by six surveyed courses. Three of them are commercial products. They are Xilinx’s XUP Virtex-II Pro Development System [10], Altera’s Development and Education Board [11], and XESS’s XSA board [12]. Only one non-commercial FPGA board, called Calinx [13], is from the UC Berkeley. There are a large number of Logic and FPGA design tools used by the surveyed courses. These tools include the logic synthesis and simulation and the FPGA place & route. Among these tools, many are commercial products, including ModelSim [14], Synopsys VCS [15], Xilinx ISE [16], Altera Quartus II [17], and Aldec Active-HDL [18]. Note that there is generally a free education edition for each of the above commercial tools, but it may be very slow for a large design. There are also many educational logic design and simulation tools used in the surveyed courses. JSIM is a CAD tool from MIT [19]. It consists of a simple editor for entering a circuit
description, simulators that simulate a circuit at device level, transient-analysis level, and gate level, and a waveform browser to view the results of a simulation. VIRSIM is a graphical user interface to Synopsys VCS for debugging and viewing waveforms [20]. Logisim is an educational tool for designing and simulating digital logic circuits [21]. The Chipmunk system software tools from UC Berkeley perform a wide variety of tasks: electronic circuit simulation and schematic capture, graphics editing, and curve plotting, to name a few [22]. SMOK (pronounced smOk) and CEBOLLITA are tools designed to improve the student design experience in an undergraduate machine organization course [23] at the University of Washington. Funsim/Timsim is a set of Verilog tools used in a surveyed course at Cornell University [24]; Funsim is a functional simulator and Timsim is a timing simulator. The processors selected for implementation in the surveyed courses using digital logic design are mainly subsets of well-known processors such as MIPS and Alpha. They may also be simple artificial processor architectures designed by instructors or even by students themselves. Moreover, they can be an educational processor that comes with a course or a textbook. For example, Beta is an educational RISC processor used in MIT's Computation Structures course. PAW is a simple architecture designed by Princeton University to be easy to implement in a semester course. Mic-1 is a microarchitecture described in the textbook "Structured Computer Organization," 4/e, Andrew S. Tanenbaum, Prentice-Hall, 1998. SRC (Simple RISC Computer) is used in the textbook "Computer Systems Design and Architecture," 2/e, Vincent P. Heuring and Harry F. Jordan, Prentice Hall, 2004.
C. Assembly Programming
In the assembly language programming category, the majority of assignments and projects are in the basic assembly programming subcategory. Only five projects are classified in the advanced assembly programming subcategory. They are (1) a simple timesharing OS kernel on the Beta processor at MIT, (2) an interpreter that simulates a subset of the MIPS-I ISA at Stanford, (3) the SPIMbot contest at the University of Illinois at Urbana-Champaign, (4) implementing a dynamic memory allocator to study the way structured data types and structures with bit fields are supported in MIPS at Texas A&M University, and (5) creating the SnakeOS operating system on the LC-3 at the University of Pennsylvania. Note that LC-3 is an ISA used in the textbook "Introduction to Computing Systems: From Bits and Gates to C and Beyond," 2/e, by Yale N. Patt and Sanjay J. Patel, McGraw-Hill, 2003. The processors targeted by the assembly language programming assignments include Beta, MIPS, LC-2K7 (an 8-register, 32-bit computer with 65536 words of memory, designed and used at the University of Michigan-Ann Arbor), PowerPC, IA-32, PAW, LC-3, SRC, and x86. The functional simulators for interpreting, executing, and debugging assembly programs are BSIM for Beta, SPIM [25] and GMIPC [26] for MIPS, LC-3
Simulator [27] and PennSim [28] for LC-3, and the SRC Assembler and Simulator [29].
D. High-Level Programming
In the high-level language programming category, the high-level languages used are C, C++, and Java. The processors simulated in Subcategory 8 include MIPS, LC-2K7, a student-designed ISA, PAW, and LC-3. There are eight assignments and/or projects in the advanced high-level language programming subcategory. Five of them are (1) parallel programming on clusters with MPI at Stanford, (2) an interpreter to simulate the UNIX file system at Berkeley, (3) a MIPS multicore simulator and a multiplayer network Tetris game at Cornell, (4) using shared memory (pthreads) and message passing (MPI) to compute the Nth prime number at Duke, and (5) writing a multiprocessor Quicksort program running on the MulSim [30] shared-memory multiprocessor simulator at UC Davis.
E. Exploiting Processor Simulators
It can be seen from Table II that there are only a few courses involved in this category. In Subcategory 11, one project at Berkeley is to determine cache parameters using CAMERA and to study virtual memory using CAMERA and VMSIM, where CAMERA is a simple cache simulator used in CS 61C at Berkeley and VMSIM [31] is a simulator of a computer system executing concurrent processes into which desired CPU scheduling and memory management policies can be plugged with ease. There are also three SimpleScalar [32] assignments in this subcategory covering benchmarking, branch prediction algorithms, a cache memory system, chip multiprocessors, and multithreaded processors. There are three assignments in Subcategory 12: (1) MIC-1 microcode modification, (2) modification of the sim-outorder code in SimpleScalar to explore a micro-architectural issue, and (3) extending the Mac-1 instruction set by adding an MDN instruction.
IV. CONCLUSIONS AND FUTURE WORK
This survey presents an overall picture of the major hands-on assignments and projects currently used in undergraduate computer architecture education at top universities in the USA. This work is intended to help educators select and/or create the right hands-on assignments and projects, as well as tools, for their computer architecture and organization courses based on their expected course outcomes. A major future work will be evaluating and comparing some of these hands-on assignments and projects as well as tools. Meanwhile, how to adopt these hands-on assignments and projects and tools in computer architecture courses at an underrepresented institution can be an interesting line of work.
REFERENCES
[1] The Joint Task Force on Computing Curricula of IEEE Computer Society and Association for Computing Machinery, "Computing Curricula 2001 Computer Science Final Report," 2001.
[2] U.S. News & World Report, "America's Best Graduate Schools 2008: Top Engineering Schools," available from: http://grad-schools.usnews.rankingsandreviews.com/usnews/edu/grad/rankings/eng/brief/engrank_brief.php
[3] Luke Yen, Min Xu, Milo Martin, Doug Burger, and Mark Hill, "WWW Computer Architecture Page," available from: http://pages.cs.wisc.edu/~arch/www/
[4] W. Yurcik, G. Wolffe, and M. Holliday, "A Survey of Simulators Used in Computer Organization/Architecture Courses," in the Proceedings of the 2001 Summer Computer Simulation Conference (SCSC 2001), Orlando FL, USA, July 2001.
[5] C. Weaver, E. Larson, and T. Austin, "Effective Support of Simulation in Computer Architecture Instruction," in the Proceedings of the Workshop on Computer Architecture Education (WCAE), Anchorage AK, USA, May 2002.
[6] S. Petit, N. Tomás, J. Sahuquillo, and A. Pont, "An execution-driven simulation tool for teaching cache memories in introductory computer organization courses," in the Proceedings of the Workshop on Computer Architecture Education (WCAE), pp. 18-24, Boston MA, USA, June 2006.
[7] J. Mendes, L. Coutinho, and C. Martins, "Web Memory Hierarchy Learning and Research Environment," in the Proceedings of the Workshop on Computer Architecture Education (WCAE), pp. 25-32, Boston MA, USA, June 2006.
[8] David Patterson and John Hennessey, "Computer Organization & Design: The Hardware/Software Interface," 3/e, Morgan Kaufmann, 2007.
[9] John Hennessy and David Patterson, "Computer Architecture: A Quantitative Approach," 4/e, Morgan Kaufmann, 2006.
[10] Xilinx, "Xilinx XUP Virtex II Pro Development System," available from http://www.xilinx.com/univ/xupv2p.html
[11] Altera, "Altera's Development and Education Board," available from http://www.altera.com/education/univ/materials/boards/unv-de2-board.html
[12] "XSA Board V1.1, V1.2 User Manual," XESS Corporation, 2005.
[13] "CALINX - EECS150 FPGA LAB BOARD," University of California, Berkeley, available from http://calinx.eecs.berkeley.edu/
[14] Mentor Graphics, "ModelSim," available from http://www.model.com/
[15] Synopsys, "VCS," available from http://www.synopsys.com/vcs/
[16] Xilinx, "Logic Design," available from http://www.xilinx.com/ise/logic_design_prod/index.htm
[17] Altera, "Quartus II Software," available from http://www.altera.com/products/software/products/quartus2/qts-index.html
[18] Aldec, "Active-HDL Overview," available from http://www.aldec.com/products/active-hdl/
[19] MIT, "JSIM," available from http://6004.lcs.mit.edu/
[20] Tutorial: VCS and VirSim, available from http://users.ece.utexas.edu/~dghosh/vlsi1_lab3/web/lab3set2.html
[21] Logisim, available from http://ozark.hendrix.edu/~burch/logisim/
[22] UC Berkeley, "The Chipmunk System," available from http://www.cs.berkeley.edu/~lazzaro/chipmunk/
[23] SMOK/CEBOLLITA, available from http://www.cs.washington.edu/homes/zahorjan/homepage/Tools/SMOK/index.shtml
[24] Funsim/Timsim, available from http://www.csl.cornell.edu/courses/ece314/projects/ece314p3sp07_files/verilogtools.html
[25] SPIM: A MIPS32 Simulator, available from http://pages.cs.wisc.edu/~larus/spim.html
[26] GMIPC - MIPS Simulator, available from http://www.csl.cornell.edu/courses/ece314/gmipc/gmipc.html
[27] LC-3 Simulator, available from http://highered.mcgraw-hill.com/sites/0072467509/student_view0/lc-3_simulator.html
[28] PennSim Simulator Manual, available from http://www.seas.upenn.edu/~cse240/pennsim/pennsim-manual.html
[29] SRC Assembler and Simulator, available from ftp://schof.colorado.edu/pub/CSDA/Simulators+Models/
[30] MulSim Multiprocessor Simulator, available from http://heather.cs.ucdavis.edu/~matloff/mulsim.html
[31] VMSim - Virtual Memory Management Simulator, available from http://lass.cs.umass.edu/~bhuvan/VMSim/
[32] SimpleScalar, available from http://www.simplescalar.com/
Predicting the Demand for Spectrum Allocation Through Auctions Y. B. Reddy Department of Mathematics and Computer Science Grambling State University, Grambling, LA 71245; email: [email protected]
Abstract – Spectrum, a scarce resource, generates high profits if utilized efficiently. The current static allocation leaves the spectrum underutilized with fixed income. Predicting the users' spectrum requirements and auctioning the spectrum helps to serve the customers better and at the same time increases the income. In this research we use the automated collaborative filtering model for predicting the customer requirement and then allocate the spectrum through auctions (bidding for spectrum in the open market). A genetic algorithm is used for optimization of the spectrum bidding problem, and we conclude that the spectrum will be used more efficiently while generating more revenue by bidding for spectrum in the market.
Keywords – genetic algorithm; automated collaborative filtering; cognitive radio; channels; auctions
I.
INTRODUCTION
Historical spectrum allocation regulations insist on static assignment and long-term leasing of spectrum. Over time, the static assignment has led to underutilization and extreme demand for spectrum. To eliminate such underutilization of spectrum and fulfill the customer demands for spectrum, a new approach is required. As part of the new approach, the customer needs must be predicted. The customer needs will be predicted using automated collaborative filtering (ACF) and the spectrum allocated through auction. Spectrum trading, which uses pricing-based incentives, includes the functions of selling, leasing, and predicting the user needs. Auctions are attractive both for sellers, to improve their financial returns, and for buyers, to meet their demands. To perform these auctions, we require efficient resource allocation methods and auction algorithms. The resources must be monitored continuously by a special agent at each base station to meet the demands of the customers. This agent is called a cognitive radio; it understands the radio parameters and customer demands, stores the customer history, and bids for resources to meet customer needs. Dynamic spectrum allocation (DSA) using cognitive radios and DSA by auction and bidding are immediate answers for managing spectrum efficiently. Huang et al. [1] discussed price-driven power control to minimize the interference, where all buyers use the same spectrum band. Ileri [2] used optimal channel allocation with iterative bidding to maximize the expected revenue. A hybrid model that minimizes complexity by using simple auctions with a reserve price during the peak period while applying a uniform price to all buyers during off-peak periods was proposed by Ryan et al. [3]. Hong and Wassel's [4, 5] results on dynamic channel allocation using a game theory approach for broadband fixed channel allocation suggest that a genetic algorithm [6] will be a better choice for optimum allocation of resources. The performance of genetic algorithms (GA) for resource allocation was studied by Reddy [7, 8], who concluded that genetic algorithms perform better in optimum resource allocation but take more computation. Reddy also concluded that GAs produce better results in optimum power allocation and that a GA approach is viable and better suited for such optimization problems. The proposed problem is a matter of optimum resource allocation, where the resource is the spectrum. II.
AUTOMATED COLLABORATIVE FILTERING FOR SPECTRUM ALLOCATION
Automated Collaborative Filtering (ACF) recommends a product based on word of mouth [9, 10]. In ACF, if user A's ratings of a channel (or channels) match another user B's ratings, then it is possible to predict A's rating of a new channel if B's rating for that channel is available. In other words, assume that users X, Y, and Z have a common interest in the channels C1, C2, and C3; if X and Y rate channel C4 highly, then we can recommend C4 to Z. That is, we can predict that Z will bid high for that channel, since C4 is close to Z's interests. The approximate bid of the kth bidder can be calculated by storing the bids of the current bidders on the spectrum. For example, suppose there are N bidders and b1, b2, ..., bN are their bids. Let bk be the bid of the kth bidder and let B be the sum of the N bids:

$$ B = \sum_{k=1}^{N} b_k \qquad (1) $$

The kth bidder's share of the spectrum is then

$$ \frac{b_k}{B} \qquad (2) $$
Similarly the user interest on a product (spectrum) will be calculated.
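A toy illustration of Eqs. (1)-(2), with hypothetical bid values not taken from the paper:

```python
def bidder_shares(bids):
    """Return each bidder's share b_k / B, where B is the sum of all bids (Eqs. 1-2)."""
    B = sum(bids)
    return [b / B for b in bids]

print(bidder_shares([10.0, 30.0, 60.0]))   # [0.1, 0.3, 0.6]
```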
A common way of implementing ACF systems is by using the mean squared difference formula [11], defined as follows. Let U and J be two persons interested in a product, here the spectrum. Let Uf and Jf be the ratings of U and J on a feature f of the product, and let S be the set of features of the spectrum that both U and J have rated, with f ∈ S. The difference between the two persons U and J in terms of their interests in the product is given by [11]:

$$ \delta_{U,J} = \frac{1}{|S|} \sum_{f \in S} (U_f - J_f)^2 \qquad (3) $$
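A minimal sketch of Eq. (3) over two hypothetical rating dictionaries (the feature names and values are made up for illustration):

```python
def mean_squared_difference(ratings_u: dict, ratings_j: dict) -> float:
    """delta_{U,J} of Eq. (3), computed over the features both users rated."""
    shared = set(ratings_u) & set(ratings_j)
    if not shared:
        raise ValueError("no commonly rated features")
    return sum((ratings_u[f] - ratings_j[f]) ** 2 for f in shared) / len(shared)

u = {"C1": 0.9, "C2": 0.4, "C3": 0.7}
j = {"C1": 0.8, "C2": 0.5, "C4": 0.6}
print(mean_squared_difference(u, j))   # 0.01, averaged over C1 and C2
```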
ACF recommendations are of two types, namely invasive and non-invasive, based on the user preferences [12, 13]. An invasive approach requires explicit user feedback, where the preferences can vary between 0 and 1. In a non-invasive approach, the preferences are interactive Boolean values: a rating of 0 means the user has not rated and 1 means rated. Therefore, the non-invasive case requires more data for any decision. In ACF systems all user recommendations are taken into account, even if they are entered at different times. More recommendations give greater strength to a recommendation, and the recommendations depend solely upon the data. III.
SPECTRUM BIDDING MODEL
The efficient spectrum allocation can be achieved through complete coordination of reconfigurable base stations. Cognitive radio plays a role for real time spectrum sharing and can be achieved by pooling the frequencies of different radio access technologies owned by different operators. The pool of spectrum can be accessed by any of the radio access technology (RAT) by maintaining inter and intra co-channel reusable distance constraints (without violating the set constraints). The reusable channel distance between two base stations is measured through spatial reusable distance. The eligible channel of available channels is selected by satisfying the rules of the reusable distance. The reusable channel is important for automated collaborative filtering before a user bids in auctions. The dynamic spectrum allocation through ACF and auctions (DSATAA) improves the usage of spectrum by adjusting the parameters of allocation in time and space. The ACF model helps the better selection of channel and quick selection of user interested channel. In this article a GA approach was used for resource bidding model that maximizes the spectrum utilization through DSA while increasing the revenue. The bidding prices are never uniform in any profitable business. If a customer is willing to pay a higher price for the product, then the customer will win the bid. If the difference of interest between any two customers (see equation 3) is very little (means that both customers are interested on the same spectrum) then there is a chance of higher bid by these two customers on a particular spectrum (when the spectrum is
available for bid). For better bidding, market clearing algorithms are useful; they were studied extensively by Sandholm and Suri [14]. The seller predicts a higher price on a particular spectrum, and the customers are willing to pay based on the recommendation for that product. The product recommendation in the current problem depends upon customer interest and the impact of interference, where the product is the spectrum with M channels. The best channels among the available channels have higher prices. Customer interest is predicted using the ACF model and then the bidding price is fixed. The recommendation is based upon the signals of non-associated access points, which disrupt communications. Unlike product recommendations based on ratings by two customers, the spectrum recommendation is based upon the interference constraint. If the kth user pays price pk for spectrum frequency fk, then bidding for spectrum (the auction clearing problem) is expressed as a non-linear optimization problem with minimum interference [15]:
Maximize   Σ_k f_k · p_k(f_k)        ----- (4)

subject to f_k ≤ 1. If the bidder share is b_k/B, equation (4) becomes (b_k/B) · p_k(f_k). The term p_k(f_k) is the unit price of the spectrum f_k, with f_k ∈ F_k and F_k = Σ_k f_k. The best price is obtained by maximizing

Maximize   Σ_{k ∈ bidders} (b_k/B) · p_k(f_k)        ----- (5)
The spectrum assignment follows the spectrum usage policy. Assume that there are three neighboring base stations A, B, and C; the spectrum assigned to neighboring base stations must not be the same, i.e.

F_A ∩ F_B ∩ F_C = ∅        ----- (6)

If a channel is assigned to a user (s_k^A = 1), the channel is not available to other users, where each base station holds the channel frequencies

F_k = {s_1^A, s_2^A, ..., s_M^A}        ----- (7)
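The following sketch is our own illustration (not the authors' implementation) of checking the usage policy of equation (6): an assignment is rejected if any two neighboring base stations share a channel, with each station's channels taken from the 0/1 flags of equation (7):

```python
# Illustrative sketch: verify the spectrum-usage policy of equation (6) --
# no channel may be assigned to two neighboring base stations at once.
# Each station's set contains the channel indices whose flag s = 1.

def violates_reuse_policy(assignment, neighbors):
    """assignment maps a base station to its set of assigned channel indices;
    neighbors lists the (station, station) pairs that interfere."""
    for a, b in neighbors:
        if assignment[a] & assignment[b]:   # shared channel => violation
            return True
    return False

assignment = {"A": {1, 4}, "B": {2, 5}, "C": {3}}
neighbors = [("A", "B"), ("B", "C"), ("A", "C")]
print(violates_reuse_policy(assignment, neighbors))  # False: the policy is satisfied
```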
The best bidding price can be set if the seller knows which closely interested customers are bidding for the spectrum. The closely related customers are obtained using the mean squared difference formula given in equation (3). The bidding process takes a new shape when the system creates a database of
users and user interests. Using the closely related customers stored in the database, we can increase or decrease the unit bidding cost of the specified spectrum; the case of closely interested bidding customers is dealt with separately. Maximization of the bid depends upon p_k(f_k). Equation (5) is a non-linear integer programming problem, since the interference constraints involve integer variables, i.e. s_k^A = 0 or 1. An optimal solution can be obtained using genetic algorithms.
IV. APPROACH
Static allocation and uniform pricing of the spectrum generate constant revenue and inefficient use of the spectrum. FCC reports [16] show that more than 70% of the spectrum is underutilized, which is due to static allocation: the licensed (primary) users occupy the specified channels, and during peak time the spectrum is not available to a normal (secondary) user. The ACF model helps to predict the type of spectrum a particular user requires. A dynamic spectrum allocation and auction policy helps secondary users use the spectrum efficiently and at the same time increases revenue. Efficient spectrum utilization and the auctioning policy should neither interrupt the quality of service (QoS) nor increase interference. There are many approaches to this problem. Here an intelligent agent, the cognitive radio (CR), is created at each base station to keep track of the current state of the spectrum, store the history of spectrum utilization, track the users (customers), predict their needs, and bid for spectrum. That is, the cognitive radio works at the base station on behalf of secondary users and provides extra revenue for the spectrum manager. Because the process involves quality of service, the CR must take care of interference.

Prediction through ACF: Using equation (3), the difference between the interests of any two persons is determined. If both are present in the bid, the bidding process takes a different direction and the seller obtains a higher value. There is therefore a need to maintain the ACF database with dynamic updates of user interests, which helps profits in bidding. Sorabh et al. [15] discussed piecewise linear price demand (PLPD), using a linear equation for price sensitivity; the formulation allows bidders to express their preferences privately, eliminating complex bid signaling. ACF further simplifies the bidding because it predicts the various user needs and helps in auction clearing.

Auction Clearing Algorithms (ACA): ACAs are NP-hard. If a channel frequency in a cell is allocated to a bidder, none of the neighboring cells may be allocated the same frequency, because of interference. The cells that may share a frequency therefore form a maximal independent set of the conflict graph. If a large number of bidders are involved, the problem becomes a complex integer program and requires a large amount of time. One possible way to solve such a problem is a genetic approach [6].
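As an illustration of the independent-set view of auction clearing, the sketch below greedily builds a maximal independent set of a small conflict graph; it is our own simplification, not the clearing algorithm of [15] or the genetic approach used in this paper:

```python
# Illustrative greedy sketch: cells that may share one frequency form an
# independent set in the conflict graph, so a maximal independent set gives
# one feasible reuse pattern for that frequency.

def greedy_maximal_independent_set(conflicts):
    """conflicts maps each cell to the set of cells it interferes with."""
    chosen, blocked = set(), set()
    # Prefer cells with few conflicts; break ties by name for determinism.
    for cell in sorted(conflicts, key=lambda c: (len(conflicts[c]), c)):
        if cell not in blocked:
            chosen.add(cell)
            blocked |= conflicts[cell]
    return chosen

conflicts = {"c1": {"c2", "c3"}, "c2": {"c1"}, "c3": {"c1", "c4"}, "c4": {"c3"}}
print(greedy_maximal_independent_set(conflicts))  # {'c2', 'c4'} (set order may vary)
```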
The genetic algorithm approach helps to solve the problem of allocating the spectrum to the appropriate user at the higher bid. The trials and the success rate are provided to the genetic algorithm to find the fitness of the channels and to select the appropriate channel for a future request. In other words, the CR uses the genetic algorithm to find the best-ordered channel list for the best bidding. The best bidding price is obtained by maximizing the value in equation (5). To solve equation (5) we need the following input: the number of users, the channel numbers assigned to the users, the channel ratings, and the bid values. The input values were generated randomly and the price of the spectrum was calculated from them; the bid values are selected depending on the channel rating received from the users. An example of the bid-price calculation is given in Table 1 below. Bid values are assigned from the channel ratings using the mapping: rating 0-0.3 -> bid-value 0.3; 0.31-0.5 -> 0.5; 0.51-0.7 -> 0.7; 0.71-0.99 -> 0.9.

Table 1: Spectrum bidding with n channels and k users (B = Σ b_i = 5, A = 5/8 = 0.625).

User # | Channel # | Channel rating | Bid-value b_i | (b_i/B)·A
1      | 1         | 0.5            | 0.5           | 0.5/5·0.625 = 0.0625
2      | 2         | 0.4            | 0.5           | 0.5/5·0.625 = 0.0625
1      | 4         | 0.6            | 0.7           | 0.7/5·0.625 = 0.0875
4      | 3         | 0.8            | 0.9           | 0.9/5·0.625 = 0.1125
3      | 5         | 0.2            | 0.3           | 0.3/5·0.625 = 0.0375
2      | 8         | 0.5            | 0.5           | 0.5/5·0.625 = 0.0625
1      | 7         | 0.8            | 0.9           | 0.9/5·0.625 = 0.1125
2      | 6         | 0.6            | 0.7           | 0.7/5·0.625 = 0.0875
       |           |                | B = 5         | Sum = 0.625
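The arithmetic of Table 1 can be reproduced in a few lines. The sketch below is our reconstruction of the example: ratings are mapped to bid values with the rule above, B is the sum of the bid values, A = 5/8 = 0.625 as given in the table, and each row contributes (b_i/B)·A:

```python
# Sketch reproducing the Table 1 arithmetic (our reconstruction of the example).

def rating_to_bid(r):
    if r <= 0.30: return 0.3
    if r <= 0.50: return 0.5
    if r <= 0.70: return 0.7
    return 0.9                                        # ratings 0.71 - 0.99

ratings = [0.5, 0.4, 0.6, 0.8, 0.2, 0.5, 0.8, 0.6]    # 8 channels, as in Table 1
bids = [rating_to_bid(r) for r in ratings]            # [0.5, 0.5, 0.7, 0.9, 0.3, 0.5, 0.9, 0.7]
B = sum(bids)                                         # 5.0
A = 5 / 8                                             # 0.625, as given in Table 1
shares = [b / B * A for b in bids]                    # last column of Table 1
print([round(s, 4) for s in shares], round(sum(shares), 4))  # ... 0.625
```

The printed shares match the (b_i/B)·A column of Table 1 row by row.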
We solved the spectrum auction policy using the MATLAB language. The output was then assigned as the fitness function to 'gatool' (the MATLAB genetic algorithm tool) and the convergence was observed. The simulations are discussed in Section VI.
V. WHAT IS A GENETIC ALGORITHM
Genetic algorithms (GAs) are a particular class of evolutionary algorithms used in computing to find exact or approximate solutions. They are inspired by mechanisms from evolutionary biology such as inheritance, mutation, selection, and crossover. In the computation, abstract representations of candidate solutions are called chromosomes, and a set of chromosomes forms a population. Traditionally the chromosomes are randomly generated binary strings of 0s and 1s, but other encodings are also possible. In each generation the fitness of each individual (chromosome) in the population is evaluated, and multiple individuals are stochastically selected from the current population based on their fitness. The new population
is formed using the mutation, crossover, and selection operators and the fitness of the individual chromosomes. The algorithm terminates when the maximum number of generations is reached or a satisfactory fitness level has been reached for the population. A typical genetic algorithm requires a genetic representation of the solution domain and a fitness function to evaluate it; for example, in a knapsack problem the fitness of a solution is the sum of the values of all objects in the knapsack if the representation is valid, and 0 otherwise. Once the genetic representation and the fitness function are defined, the GA initializes a population of solutions randomly and then improves it through repeated application of the mutation, crossover, inversion, and selection operators. The chromosomes are of uniform length to facilitate the crossover operation; tree-like representations are explored in genetic programming and graph-form representations in evolutionary programming. The operations are explained below.

Initialization: The initial population of chromosomes is generated randomly. The size of the population depends upon the nature of the problem.

Selection: Selection eliminates the poorer-performing individuals by promoting the better-performing ones. Depending on the problem, a certain percentage of new individuals may also be added, which helps keep the diversity of the population large and prevents premature convergence on poor solutions; the less fit individuals are normally not allowed to breed for the next generation. Popular and well-studied selection methods include roulette-wheel selection and tournament selection.

Reproduction: A new population is generated from the current population using the crossover and mutation operators: a pair of randomly selected chromosomes (parents) produces children that share many of the characteristics of their parents. The process continues until a new population of solutions of the appropriate size is generated. These steps ultimately result in a next-generation population of chromosomes that differs from the initial generation.

Termination: The most common termination condition is the number of generations; other conditions include finding a solution that satisfies minimum criteria, or a budget constraint on computation.
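For readers unfamiliar with the mechanics, the sketch below shows a minimal genetic algorithm of this kind over binary channel-assignment chromosomes. It is an illustrative toy, not the MATLAB/gatool configuration used in this paper, and its prices, bids, and penalty term are invented for the example:

```python
import random

# Minimal GA sketch: binary chromosomes encode which channels a bidder wins
# (s_k = 0 or 1); the fitness is the toy revenue sum(bid_k * price_k * s_k),
# penalized when more channels are taken than are open for bidding.

random.seed(1)
PRICES = [0.7, 0.9, 0.5, 0.8, 0.6, 0.9, 0.4, 0.7]   # illustrative unit prices
BIDS   = [0.5, 0.9, 0.3, 0.7, 0.5, 0.9, 0.3, 0.7]   # illustrative bid values
OPEN   = 4                                           # channels open for bidding

def fitness(chrom):
    revenue = sum(b * p * s for b, p, s in zip(BIDS, PRICES, chrom))
    penalty = max(0, sum(chrom) - OPEN)              # crude interference/limit constraint
    return revenue - penalty

def tournament(pop):
    return max(random.sample(pop, 3), key=fitness)   # selection

def crossover(a, b):
    cut = random.randrange(1, len(a))                # single-point crossover
    return a[:cut] + b[cut:]

def mutate(chrom, rate=0.05):
    return [1 - g if random.random() < rate else g for g in chrom]

pop = [[random.randint(0, 1) for _ in PRICES] for _ in range(30)]
for generation in range(100):                        # termination: generation count
    pop = [mutate(crossover(tournament(pop), tournament(pop))) for _ in pop]
best = max(pop, key=fitness)
print(best, round(fitness(best), 3))
```

The gatool runs reported below differ from this toy in scale and in their operators (heuristic crossover, Gaussian mutation, stochastic uniform selection), but the overall loop is the same.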
VI. RESULTS AND CONTRIBUTIONS
Equation (5), bidding for spectrum, was solved using a MATLAB program and the MATLAB gatool in parallel. The data for users entering the system for channel use, the channel numbers (without duplication), and the channel ratings by users were generated randomly through MATLAB programming. Figure 1 is drawn for 500 channels, 10 users, and 100 different iterations; for each iteration the bidding value is calculated, and the lowest bidding value is observed after 70 iterations. Next, the bidding value is input to the MATLAB gatool and executed for 600 generations, as shown in Figure 2. The crossover is set to heuristic with a population of 30, the mutation to Gaussian, and appropriate stopping criteria are used. Heuristic crossover converges quickly compared with scattered or two-point crossover (see Figure 4). Initially the crossover, mutation, and scaling function values were left at their defaults; in the second step the population size was changed to 30, the crossover function to heuristic, the mutation to Gaussian, and the selection function to stochastic uniform. The execution was run for 600 generations to obtain better bidding values (Figure 4 and Figure 5). The function was tested with 4 to 200 users and with 20 to 50 channels; the maximum number of generations was set to 600. Example graphs generated through gatool are provided in Figure 4 and Figure 5. The best-fit value is better when more users are trying to bid for spectrum (top part of Figure 4 and Figure 5). Figure 3 (4 to 20 bidders) and Figure 6 (4 to 50 bidders) were produced with the MATLAB program for 50 channels. It is observed that if the number of bidders is small, more channels are available for bidding, and if the bidding range is smaller, more users bid for spectrum. The decrease of profit depends upon the number of channels open for bidding and the bidding rate; therefore, as the number of channels open for bidding increases the profit decreases, because profit is based on the number of open channels and the bidding rate, which is natural in the market. Figure 7 shows the market value when the number of channels fixed for bidding is varied; it shows that the profit decreases as fewer channels remain available for bidding. Figure 3, Figure 6, and Figure 7 together indicate that open bidding for spectrum, i.e. bidding through dynamic allocation, generates more revenue than static allocation of spectrum. Figure 1 shows that if no one requires the spectrum (no one bids for it), there is no difference between dynamic and static allocation. Figure 4 and Figure 5 (lower part) show the convergence of equation (5) using gatool for 50 channels and 600 generations; the best fitness shows the decreasing market-generated value as the number of channels fixed for static allocation is increased from 1 to 50. Figure 2 (top right) further shows that the average distance between any two individuals of the population approaches 0 (zero) quickly, by generation 1, meaning the system converges very early: from that point the system generates the best population and the profit is optimum. Figure 1 also shows the fitness scaling of the expected number of children versus the raw scores at each generation. Overall, the number of channels and the number of bidders make the difference for profit in the auctions.
Figure 1: Bidding with 500 channels and 10 users, for 100 different iterations.

Figure 2: Convergence with crossover as heuristic search.

Figure 3: Bidding mean value for 4 to 20 users and 50 channels.

Figure 4: Users varying from 4 to 200; the graph converges after 450 generations.

Figure 5: Users varying from 4 to 20; the graph converges after 550 generations.

Figure 6: Bidding mean value for 4 to 50 users and 50 channels.

Figure 7: Market value obtained by fixing the channels, with 60 bidders (profit versus the number of fixed units of spectrum).
VII. CONCLUSIONS

The current research discusses three modules that contribute to optimum utilization of the wireless communications facility. First, the cognitive radio is the best tool to sit at the base station, keep track of channel allocation and optimum power utilization, and use the auction facility for the best bidders. Second, the automated collaborative filtering facility provides data on user-recommended channels, which leads to predicting the channel(s) that will attract a higher bid. Third, the MATLAB gatool is used to optimize the resource allocation and calculate the best fit for the allocation of channels (spectrum). Genetic algorithms for optimization have been used by many authors [7, 8], but the ACF model for predicting the demand for spectrum is proposed for the first time (in wireless communications) in this research, and the results are satisfactory. In continuation of this research we will include ACF and game theory for a better bidding process; ACF often works closely with game theory for predicting channel gain and utilization, and with the introduction of the current results many researchers may use the ACF and game theory combination for better results. In conclusion, the results obtained through the MATLAB gatool and the MATLAB program show that less static allocation generates more revenue and more efficient utilization of spectrum. Also, using the ACF approach we can predict the channels in demand and recommend them for higher bidding.

ACKNOWLEDGMENT

This research work was supported by the Air Force Research Laboratory/Clarkson Minority Leaders Program through contract No. FA8650-05-D-1912. The author wishes to express appreciation to Dr. Connie Walton-Clement, Dean, College of Arts and Sciences, Grambling State University, for her continuous support.
REFERENCES

[1] Huang, J., Berry, R., and Honig, M., "Auction mechanisms for distributed spectrum sharing", Proc. of the 42nd Allerton Conference, September 2004.
[2] Ileri, O., Samardzija, D., Sizer, T., and Mandayam, N. B., "Demand responsive pricing and competitive spectrum allocation via a spectrum server", Proc. of DySPAN, 2005.
[3] Ryan, K., Aravantinos, E., and Buddhikot, M. M., "A new pricing model for next generation spectrum access", Proc. of TAPAS, 2006.
[4] Wong, S. H., and Wassell, I., "Dynamic Channel Allocation Using a Genetic Algorithm for a TDD Broadband Fixed Wireless Access Network", IASTED International Conference on Wireless and Optical Communications, Banff, Canada, 2002, pp. 521-526.
[5] Wong, S. H., and Wassell, I., "Distributed Dynamic Channel Allocation Using Game Theory for Broadband Fixed Wireless Access", International Conference on Third Generation Wireless and Beyond, San Francisco, May 2002, pp. 304-309.
[6] Goldberg, D. E., "Genetic Algorithms in Search, Optimization, and Machine Learning", Addison-Wesley, 1989.
[7] Reddy, Y. B., "Genetic Algorithm Approach for Adaptive Subcarrier, Bit, and Power Allocation", 2007 IEEE International Conference on Networking, Sensing and Control, London, UK, April 15-17, 2007.
[8] Reddy, Y. B., "Genetic Algorithm Approach in Adaptive Resource Allocation in OFDM Systems", International Joint Conferences on Computer, Information and Systems Sciences and Engineering (CIS2E 06, sponsored by IEEE), Bridgeport, USA, December 2006.
[9] Resnick, P., and Varian, H. R., "Recommender Systems", Communications of the ACM, 40(3), 1997.
[10] Shardanand, U., and Maes, P., "Social Information Filtering: Algorithms for Automating Word of Mouth", Conference on Human Factors in Computing Systems, 1995.
[11] Cunningham, P., "Intelligent Support for E-commerce", http://www.cs.tcd.ie/Padraig.Cunningham/iccbr99-ec.pdf, 1999.
[12] Hayes, C., Cunningham, P., and Smyth, B., "A Case-based Reasoning View of Automated Collaborative Filtering", Proceedings of the 4th International Conference on Case-Based Reasoning, 2001.
[13] Sollenborn, M., and Funk, P., "Category-Based Filtering and User Stereotype Cases to Reduce the Latency Problem in Recommender Systems", 6th European Conference on Case-Based Reasoning, Springer Lecture Notes, 2002.
[14] Sandholm, T., and Suri, S., "Market Clearability", Proc. of the International Joint Conference on Artificial Intelligence (IJCAI), 2001.
[15] Gandhi, S., Buragohain, C., Cao, L., Zheng, H., and Suri, S., "A General Framework for Wireless Spectrum Auctions", 2nd IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN), 2007.
[16] FCC spectrum reports, http://wireless.fcc.gov/reports/
Component-Based Project Estimation Issues for Recursive Development

Yusuf Altunel, Istanbul Kültür University; PhD student at EMU Computer Eng., [email protected]
Mehmet R. Tolun, Çankaya University, [email protected]
Abstract - In this paper we investigate the component-specific issues that might affect project cost estimation. Component-based software development changes the style of software production: with the component-based approach, software is developed as a composition of reusable software components. Each component production process must be treated as a stand-alone software project that needs its own management. A typical pure component-based development can be considered as decomposition/integration activities successively applied at different levels, and therefore results in a recursive style of development. We analyze component-based software development estimation issues from this recursive point of view and present the results of our study.
I. INTRODUCTION
Software project estimation is performed to understand the software's baseline models and relationships and its key process characteristics, to enable better management when planning and tracking activities and validating models, and to guide process improvement so that the software can be understood, assessed, and packaged [1]. Project planning requires accurate estimates to enable the manager to take reasonable decisions on cost, resources, schedules, and project coordination. We analyzed the special requirements and identified the main characteristics of component-based project estimation. The component-based software development methodology, as a result of its modularity and reusability, makes identification of the main blocks of the software easier and facilitates estimation. From the management point of view, component-based development requires some additional tasks compared with classical projects, such as searching for, obtaining, assessing, and integrating components. Hence, activities like market analysis, quality considerations, and reusability studies get more attention. Additionally, component-based development might require extra testing and more comprehensive documentation. Finally, component-based estimation has further distinct aspects as a result of its reusability and code-outsourcing characteristics. A component-based software development project is based on decomposition and composition tasks, and once the whole system is decomposed into its main components, each component must be treated as a new software system. Therefore each component development task must be considered as a new software development project that is estimated and scheduled individually [2]. Correspondingly,
a component-based approach to estimating the costs and identifying the activities per component, sub-component, and variant must be established. As the decomposition progresses, new sub-components are identified and the process continues until the primitive components, those that cannot be decomposed further, are reached. This recursive approach to software development provides a proper model for constructing the software in an incremental and iterative manner. In this paper we first present the project features, categorized as objective and subjective features, that are used to understand a project's basic characteristics. Next, we study the classical project estimation issues with respect to component-based development projects. This is followed by a study of the component-based specific issues in project measurement. Finally, we provide our concluding remarks and further studies.

II. PROJECT FEATURES

A good estimation requires clear identification of the project by collecting information on its basic and unique characteristics. A software project can be represented by means of its objective and subjective features. The objective features provide the 'real' characteristics of the software, independent of the development team, environment, tools, etc., whereas the subjective features represent the actual style of development and fully depend on subjective characteristics such as the style of development, the technology used, organizational characteristics, and the experience of the development team. Mostly, objective and subjective features are used in project estimation as a single unit [3]. Distinguishing the two categories and estimating them separately enables us to calculate the real value of the project and our expected influence on it individually. Additionally, objective criteria help us to catalogue and compare projects of different types and from different domains on the same basis.

A. Objective Features

Objective features represent the difficulty of the problem to be solved. The main factors are the size and complexity of the project, its functional and non-functional characteristics, the domain properties, parallelism and concurrency, interfaces, and the user characteristics. The attributes of these factors are presented in Table I. Objective features can initially be produced from earlier projects and documents and are later updated as the decomposition process proceeds.
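As an illustration of how the two categories can be kept apart in practice, the following sketch (our own, not part of the paper) records the objective and subjective feature values of each component separately, so that the 'real' difficulty and the organizational influence can be estimated and compared individually:

```python
# Illustrative sketch: per-component record keeping objective and subjective
# feature values separate, with children added as decomposition proceeds.

from dataclasses import dataclass, field

@dataclass
class ComponentEstimate:
    name: str
    objective: dict = field(default_factory=dict)    # e.g. decomposition depth/width
    subjective: dict = field(default_factory=dict)   # e.g. team experience, tools
    children: list = field(default_factory=list)     # filled in during decomposition

root = ComponentEstimate(
    "OrderingSystem",
    objective={"decomposition_width": 3, "number_of_requirements": 42},
    subjective={"team_experience": "medium", "methodology": "iterative"},
)
root.children.append(ComponentEstimate("PaymentComponent"))
print(root.name, len(root.children))
```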
TABLE I
OBJECTIVE FEATURES AND FACTORS USED TO REPRESENT A SOFTWARE PROJECT'S BASIC CHARACTERISTICS

Size: Decomposition Width, Decomposition Depth, Decomposition Length, Number of Methods, Size of Methods
Complexity: Component Complexity, Component Collaborations, Algorithmic Characteristics
Requirements Characteristics: Number of Requirements, Coverage, Clarity, Concreteness, Completeness
Functional Characteristics: Number of Functions, Usage Scenarios, Structural Characteristics, Behavioral Characteristics
Non-functional Characteristics: Quality, Coverage, Clarity, Efficiency, Documentation, Reusability
Domain Properties: Terminology, Standards, Domain Specific Components, Expertise
Parallelism: Threads, Parallel Processes, Distributed Data Access, Simultaneous Data Access, Concurrency
Interfaces: Man-machine, Operating System, Networking, Direct Device Access, Data, Other Systems
Roles vs. Users: Type of Last User, Number of Roles, Function Per Role, Function to Role, Role to User
B. Subjective Features

Subjective features represent the organizational influences on the implementation of the project. Factors such as the availability of certain resources, the project specifics, the development type, the base-activity characteristics, and the project's modeling characteristics are all subjective features, since they depend on management decisions and vary from one project to another. The attributes of these factors are presented in Table II; they can be derived from organizational data, but must first be reorganized and refined. Studies of the subjective features provide valuable insight into how effectively the organization produces software, and the results can be used in risk management and process improvement activities.
TABLE II
FEATURES AND FACTORS USED TO REPRESENT THE PROJECT'S RELATIVE CHARACTERISTICS

Project Specific Features:
- Time
- Project Resources Availability: Manpower, Hardware, Software, Technology
- Criticality
- Project Specifics: Penalties, Prizes, Bidder Type, Customer, User
- Motivation: Mgmt., Leadership

Organization Specific Features:
- Organization Experience: Similar Projects, Methodology, Tools; Team: Analysis, Design, Implement., Testing
- Harmony: Inter-Team, Intra-Team

Style of Development:
- Methodology: Develop. Life Cycle, Decomp. Style, Component Acquisition Method
- Type of Developm.
- Base Activity Specific: Analysis, Design, Implement., Testing, Document., Marketing, Mgmt., Project Mgmt., Quality and Reuse Mgmt., Config. Mgmt.
- Modeling: Analysis Models, Design Models, Prototypes, Documents and Templates
- Tools
III. CLASSICAL VS. COMPONENT-BASED ESTIMATION

Classical estimation is the determination of the resources, cost, and schedule of a software project. Classical estimation techniques are functionally oriented, and a certain degree of decomposition helps. Performance, quality, and security considerations affect the overall estimation. Classical estimation is very rough, and the availability of experience and historical data enhances the correctness of the predictions [4]. Classical techniques perform the estimation based on very general characteristics of the project at the early steps of development and derive the individual activity costs accordingly; as a result, the estimates are very rough and provide underestimated results for project management activities [5]. Technology in the field of software engineering is evolving, and new tools and techniques force us to change our vision of software project estimation. Experience in software projects so far shows that the influences on component-based estimation are more varied and more detailed than those captured by the classical estimation methodologies [6]. GUI tools help developers construct software interfaces easily and automatically generate thousands of lines of code in the background. On the other hand, reusable software is becoming an important part of software projects, and cost estimation for off-the-shelf software is considerably different. A summary comparison of classical and component-based project characteristics is presented in Table III.
TABLE III
COMPARISON OF CLASSICAL AND COMPONENT-BASED PROJECT CHARACTERISTICS

Lines of Code - Classical: counts the source code length; programmers generate the code. Component-based: meaningless for off-the-shelf components; partially produced by code generators.
Project Size - Classical: function points; lines of code. Component-based: function points remain useful to understand functional characteristics; number of components and component size in length, depth, and wideness.
Project Complexity - Classical: amount of decisions in the code; too many decomposition alternatives. Component-based: complex components would be outsourced; forces the decomposition into semantically correct modularization.
Input-Process-Output - Classical: widely applied. Component-based: GUI and reporting tools help to easily construct the interfaces and reports; the components' internal structures dominate.
Quality Vision - Classical: only for critical projects. Component-based: per component.
Documentation - Classical: standard documentation on a need basis. Component-based: in addition to the classical documents, component specification and configuration documents.
Use of Tools - Classical: when needed; the job is generally done manually. Component-based: too complex to be performed manually; use of tools is encouraged.
Standardization - Classical: project specific; depends on willingness. Component-based: reusability forces standardization.
Development Environment - Classical: depends on the technology; easy to learn. Component-based: needs more time for adaptation to the methodology and the tools; extra training to teach how to use the components.
Development Life Cycle - Classical: depends on the project and management's decision; waterfall, iterative, and incremental are all possible. Component-based: recursive in nature; each component is constructed as a stand-alone project; waterfall, iterative, and incremental methods are adapted.
A. Source Code Size

Lines-of-code estimates are not feasible for sizing a component-based development project. Generally, the source code of outsourced components is not available, so the corresponding measures cannot be taken. To estimate and measure project size, other measures such as component size in length, depth, and wideness should be preferred. Jose Javier Dolado reports that estimating the system by sizing its components is advantageous and accurate [7]. As new tools and techniques are invented, the length of the source code is becoming a minor factor in software engineering effort estimation. As a result of using modern code generators, a considerable part of the source code is generated automatically or semi-automatically. Therefore, the activities of gathering information, communicating with the users, searching for
reusable software, decomposing the system into reusable units, constructing and assessing models, and maintaining, testing, and integrating the components are becoming the dominant factors.

B. Project Complexity

Generally, project complexity, project size, and the degree of structural uncertainty cause project costs to increase. Complexity is a relative measure, and its estimation is generally based on past effort. An increase in project size increases the interdependency between the modules, which in turn causes project costs to grow geometrically. Because complex projects admit many more decomposition alternatives, making the correct decisions is harder. The component-based software development methodology is expected to provide better decomposition and higher reusability, which in turn is expected to reduce the overall project costs [8] and increase predictability. Compared with the classical approaches, the alternatives for decomposing the system into components and bringing the components together dominate the complexity. For example, the cyclomatic complexity of the methods in typical component-based applications is relatively low, thanks to the modularization, but there are many more permutations that must be considered. Another factor reducing the overall complexity of a project may be the chance to obtain the complex components from component producers specialized in the field; accordingly, the need to implement complex components is expected to diminish as a mature component market appears.

C. Input-Process-Output

The classical methods of software project estimation are based on the input-process-output view, which provides only very rough information on the main characteristics of component-based projects. The input and output parts of the software system consume relatively little effort, thanks to the new GUI tools and ready-to-use user interface components. Additionally, the processing mode of the software itself does not provide sufficient information for accurate estimation. Instead of the classical factors, we should concentrate on others: the internal style of decomposition, the availability of ready-to-use components, the collaboration type of the internal components, the architectural and real-time characteristics of the software, as well as parallelism and concurrency, should determine the estimation.

D. Development Life Cycle

A critical drawback of the classical approaches is their sensitivity to the software development life cycle. Most classical approaches assume that the analysis and system specification phases have already been completed at the time of estimation. In an iterative or incremental software development project, the complete definition of the functional characteristics may be carried out piece by piece and is only ready at the end of the implementation. Component-based implementation is recursive and iterative in its very nature. Therefore, estimation techniques must be available that are based not only on the functional
specifications, but on all available knowledge about the project, such as the requirements, quality specifications, market study, etc. Additionally, at each iteration level the estimates must be adjusted according to the new and more detailed knowledge about the software.

IV. COMPONENT-BASED SPECIFIC ISSUES

Component-based software, as a result of its continuous decomposition-integration characteristics, is properly constructed in a recursive manner: the aim is to decompose each piece of software into independently produced software components by applying the same set of activities [9]. In recursive development, the project may be explored in a breadth-first or depth-first manner. In the breadth-first case, the decomposition tree is explored layer by layer, first the child components and next the children of the children, and so on. In the depth-first manner, on the other hand, one branch of the decomposition tree is explored first and the others later. In depth-first exploration, the correctness of the estimate for the branches implemented first is incrementally improved, but the other parts remain unknown at the initial steps; in breadth-first exploration, the correctness increases gradually as the development progresses.

A. Size of Decomposition

Generally, project size is accepted as a critical feature in cost estimation. For component-based projects, alternative size metrics are needed, since lines-of-code type metrics are rarely available because of the black-box character of the components. To overcome this problem, we suggest alternative metrics used to estimate the decomposition characteristics of the project. Three dimensions of decomposition are identified: the depth, width, and length of decomposition.

Depth of Decomposition: The depth of decomposition represents the number of successive iterations the decomposition process takes before it stops; it shows how long the recursive decomposition process will take in the development. This feature can be measured by studying the sub-functional characteristics of the components to be newly developed, and the number of ready-to-use sub-components and variants that are available. In the worst case, when no ready-to-use components are available, all sub-functional characteristics must be implemented by producing new components. In other cases some of the sub-functionality can be implemented by means of ready-to-use components, which reduces the depth of decomposition.

Width of Decomposition: The width of decomposition represents the number of child components that appear after the parent component is decomposed. Our studies show that certain factors influence the number of sub-components; one factor, for example, is the number of independent sub-functions (use cases) that a component is expected to implement. To reduce the number of possible sub-components, the cohesion of the parent component can be studied. Additionally, new sub-components might appear for the successive operations that take
place within the component to fulfill its basic duties (for example: connect to the database, find the records, process the records); these sub-components fulfill those tasks on behalf of the parent component. Designing generic components to implement such common behavior can help reduce the width of the decomposition, so fewer sub-components need to be implemented.

Length of Decomposition: The length of decomposition identifies the number of variants produced for each component. Component variants are special components providing only the sub-set of the functionality needed to implement a variation, so variation of the functionality results in the implementation of more variants. For example, a student component can be used for sending an email to the student and/or for grading the student. In the first case, the email address of the student and a function to send the email are enough to complete the mission; in the second case, however, more information (the courses registered, exam results, the course's grading policy, the grades defined by the institution, etc.) and more functionality are required. It might be an acceptable strategy to keep all the information and functionality in one "Student" component, but it is wiser to produce variants, to optimize the run-time computing resources and to reduce the mistakes that arise from component features that are not needed.
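The three measures can be illustrated on a small component tree; the sketch below is our own construction, with the "Student" variants taken from the example above:

```python
# Illustrative sketch: depth counts successive decomposition levels, width
# counts the children of a component, and length counts a component's variants.

class Component:
    def __init__(self, name, children=None, variants=0):
        self.name = name
        self.children = children or []
        self.variants = variants          # e.g. "Student" email and grading variants

def depth(c):
    return 1 + max((depth(child) for child in c.children), default=0)

def width(c):
    return len(c.children)

def length(c):
    return c.variants

student = Component("Student", variants=2)                      # two variants
system = Component("Registrar", children=[student, Component("Course")])
print(depth(system), width(system), length(student))            # 2 2 2
```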
B. When to Estimate

The aim is to estimate the individual and, if possible, independent characteristics of the project in order to provide more correct and detailed information. A critical question is when to perform the first estimation and at which steps of development re-estimation can be applied. The initial estimate for a component-based project is made at the beginning of the project, within a limited time and based on incomplete and defective data about the software. At the initial steps the development team studies the general characteristics of the software; the technical decisions and internal details are clarified as the development progresses. Accordingly, technical issues such as the decomposition and the components' internal characteristics (size, complexity, interfaces, etc.) are revealed at the later steps of development. Since correct estimation depends on detailed and clear technical information, which is not available at the initial steps, project estimation in the recursive approach is relatively harder.

C. Estimation Points

So far we have discussed the project features and presented the component-based specific characteristics from the viewpoint of the recursive decomposition approach. It is necessary to evaluate the estimation points using the objective and subjective features per component. In component-based projects, we would like to know the costs of certain activities such as project management, quality-specific tasks, documentation, analysis, design, implementation, testing, integration activities, and market study. In Table IV we present the relationship between the objective factors of the project and the estimation points; similarly, in Table V we present the relationship between the subjective factors and the estimation points. From these tables it is easy to see that some of the objective features (size, functional and non-functional characteristics) and some of the subjective features (the project resources and the base activities) become the dominating factors for every estimation point. Similarly, some of the estimation points (analysis, design, implementation, and testing) depend strongly on the objective features, while others (management, quality, integration) depend on the subjective features. These results are in no way surprising, but must be justified with experimental results. Additionally, further studies are required to clarify the relationship between the attributes and sub-attributes and the estimation points in order to obtain more accurate results.
TABLE IV
THE ESTIMATION POINTS AND THE RELATED OBJECTIVE FACTORS

Estimation points: Mgmt., Quality, Document., Analysis, Design, Implement., Testing, Integration, Market Study. Objective factors: Size, Functional Characteristics, Non-functional Characteristics, Parallelism, Interfaces, Roles vs. Users.
TABLE V
THE ESTIMATION POINTS AND THE RELATED SUBJECTIVE FEATURES

Estimation points: Mgmt., Quality, Document., Analysis, Design, Implement., Testing, Integration, Market Study. Subjective factors: Project Resources Available, Project Type, Motivation, Methodology, Development Life Cycle, Base Activities.
V. CONCLUSIONS

Component-based software development is attracting more and more attention in the software community, and the market has grown steadily [10]. The component-based approach is changing the style of software development by promoting reusability, quality, and outsourcing rather than developing every piece of software from scratch. It was very
interesting to analyze software project estimation from the perspective of recursive component-based software development. From this study we conclude that objective and subjective factors are needed to represent a project's main characteristics and help estimate its costs. Additionally, we found that there are significant differences between classical and component-based estimation. Software is constructed out of components and sub-components as a result of recursive decomposition/integration activities, and each component and sub-component is a stand-alone software project that needs its own project management. It is therefore harder to obtain correct results at the initial steps of development, but as the decomposition process proceeds we gain better knowledge about each child component and better estimation results appear. As an advantage, ready-to-use components reduce the overall project complexity, which in turn improves the estimation. Finally, lines-of-code based estimation is rarely suitable for pure component-based development, generally because the source code is not available. As future work, we plan to justify these results with experimental studies in the field, and we are trying to develop a new model that makes a more detailed estimation using the objective and subjective features, their attributes, and their sub-attributes.

REFERENCES
[1] "Software Engineering Program: Software Measurement Guidebook", National Aeronautics and Space Administration, Washington, DC, 1995, p. 5.
[2] "Effort Estimation in Component-Based Software Development: Identifying the Parameters", http://www.cs.utexas.edu/users/csed/doc_consortium/DC98/smith.pdf, p. 1.
[3] Randy K. Smith, "Effort Estimation in Component-Based Software Development: Identifying the Parameters", The University of Alabama, http://www.cs.utexas.edu/users/csed/doc_consortium/DC98/smith.pdf, p. 2.
[4] Roger S. Pressman, Software Engineering: A Practitioner's Approach, 5th Edition, McGraw-Hill, 2001, p. 114.
[5] Luiz A. Laranjeira, "Software Size Estimation of Object-Oriented Systems", IEEE Transactions on Software Engineering, Vol. 16, No. 5, May 1990, pp. 510-511.
[6] L. Angelis, I. Stamelos, and M. Morisio, "Building a Software Cost Estimation Model Based on Categorical Data", Proceedings of the Seventh International Software Metrics Symposium (METRICS '01), IEEE, 2001, p. 3.
[7] Jose Javier Dolado, "A Validation of the Component-Based Method for Software Size Estimation", IEEE Transactions on Software Engineering, Vol. 26, No. 10, October 2000, p. 1018.
[8] Sahra Sedigh-Ali, Arif Ghafoor, and Raymond A. Paul, "Software Engineering Metrics for COTS-Based Systems", IEEE Computer, Vol. 34, No. 5, May 2001, p. 44.
[9] Yusuf Altunel and Mehmet R. Tolun, "Component-Based Software Design: Recursive Development and Variants", Proceedings of Integrated Design and Process Technology (IDPT-2004), SDPS - Society for Design and Process Science, June 2004, p. 39.
[10] Miguel Afonso Goulão and Fernando Brito e Abreu, "From Objects to Components: A Quantitative Experiment", http://citeseer.ist.psu.edu/584513.html.
Author Index
Abdoli, F., 153 Abdulrab, H., 491 Abeysinghe, G., 33 Abiona, O. O., 85 Abiyev, Adalet N., 21 Abramov, G.V., 178 Abreu, Bruno, 119 Abuzaghleh, Omar, 5 Abuzneid, Abdelshakour, 5 Adagunodo, E. R., 85 Aderounmu, G. A., 85 Agyepong, Kwabena, 372 Akbarzadeh-T, Mohamad-R., 147 Akojwar, Sudhir G., 503 Al-Arif, T., 79 Alejandro, Valdes Marrero Manuel, 213 ALGhalayini, Mohammad A., 329 Aliee, Fereidoon Shams, 455 Almeida, Chrishnika de, 422 ALNemah, ELQasem, 329 Alshabi, W., 491 Altman, Micah, 311 Altunel, Yusuf, 577 Arsan, Taner, 526 Asadi, Mehdi, 305 Ashalatha, M.E., 256 Atabany, W., 558 Auner, Gregory, 268 Barrett, W. A., 562 Baskaran, Ravi, 251 Batista, Wesly, 119 Bavan, A. S., 33, 74 Bavan, S., 33, 74 Bayrak, Coskun, 108 Begdillo, Shapour Jodi, 305 Bennamoun, M., 509 Bertels, Koen, 97 Best, Lisa A., 273 Bidgoli, B. Minaei, 354 Bradford, Phillip G., 394 Burnett, Andrew, 45 Calinescu, Ani, 51 Carvalho, Ana Lisse, 399 Cavenaghi, Marcos Antonio, 284
Challa, Subhash, 343 Chandu, Vaddadi P., 251 Chmelar, Petr, 390 Christakou, Evangelos, 136 Christensen, K., 405 Cilku, Bekim, 196 Coleshill, Elliott, 131 Counsell, Steve, 497 Daily, Jeremy S., 172 Das, Vinu V., 242, 553 Davies, R., 509 de Castro, Ana Karoline Araújo, 39 de Souza, Gilberto George Conrado, 39 Degenaar, P., 558 Dekhil, Mohamed, 441 Derezińska, Anna, 57 Dowling, Tom, 45 Downing, Beth, 172 Elkady, Ayssam Y., 90 Elkobrosy, Galal A., 90 El-Menshawy, A.M., 79 Emelyanov, A.E., 178 Esswein, Werner, 295 Faheem, H.M., 79 Fan, Qiong-Wen, 159 Faruk, Md. Saifuddin, 1 Ferworn, Alex, 131 Filiposka, Sonja, 196 Fitsilis, P., 378 Fornazin, Marcelo, 284 Fuicu, Sebastian, 290 Furtado, Maria Elizabeth S., 399 Gadewadikar, Jyotirmay, 372 García-Arriaga, H. O., 27 Gebriel, Eslam M., 521 Geisser, Michael, 317 Georgiadis, Christos K., 125 Ghazizadeh, M., 348, 354 Ghosh, Riddhiman, 441 Ghosh, Tirthankar, 532 Grnarov, Aksenti, 196 Gümüşkaya, H., 184 58 3
Gupta, Anu, 68 Gupta, Karan, 245 Gupta, Kshitij, 68 Hameurlain, Abdelkader, 279, 416 Hannab, Sarwat N., 90 Hernych, Radim, 390 Hierons, Rob M., 497 Hildenbrand, Tobias, 317 Hong, Chun-Pyo, 242 Hossain, A.B.M. Mozzammel, 1 Huang, Feng-Long, 159, 571 Hughes, Cameron, 101 Hughes, Tracey, 101 Hyder, Syed N., 515 Irwin, Barry, 360 Islam, Mofakharul, 323 Islam, S. M. S., 509 Itmi, M., 491 Ivliev, M.N., 178 Jamal, Sheliza, 422 Jochumsen, K.M., 405 Joshi, Hemant, 108 Juhrisch, Martin, 295 Kaevand, T., 540 Kahani, M., 153, 354 Kandepet, Pavan, 447 Kang, Byung-Heon, 242 Kazemi, Ehsan Mohamad, 142 Kazemi, Farhad Mohamad, 142, 147 Ke, Shu-Yu, 159 Khan, Umar F., 479 Khetrapal, P. Ratika, 547 King, Raddad Al, 416 Krishna, Venkata, 547 Krishnamurthy, G.N., 256 Kruse, T.A., 405 Kubicek, Daniel, 390 Kude, Thomas, 317 Kuljaca, Ognjen, 372 Kulkarni, Anjali V., 245 Kumar, P. Ram, 428 Kumlander, Deniss, 114 Kurtz, Gunnar, 317 Kutty, K.A. Narayanan, 467 Lazar, Alina, 101 Lee, Dong-Ho, 242 Leela, G.H., 256 Liang, Xuejun, 566 Lille, Ü., 540
Lopes, Denivaldo, 119 Lu, Hsin-Yi, 485 McCullen, Erik, 268 Machado, Thais C. Sampaio, 399 Mahesar, Quratul-ain, 410 Manes, Gavin W., 172 Manuel, Rosas Salazar Juan, 213 Marana, Aparecido N., 284 Márcia, G. S. Gonçalves, 166 Marcu, Marius, 290 Mariana, Guzmán Ruiz, 213 Mendes, Marília Soares, 399 Merunka, Vojtěch, 300 Mesleh, Abdelwadood Moh’d., 11 Miskovski, Igor, 196 Mogensen, O., 405 Mokhov, Serguei A., 473 Molhanec, Martin, 300 Momani, Mohammad, 343 Moravejian, Reihaneh, 142 Morvan, Franck, 279, 416 Moslehpour, Saeid, 461 Murthy, M.V. Ramana, 428 Murthy, S.G.K., 428 Nabende, Peter, 384 Naghibzadeh, M., 348 Naik, Ratna, 268 Naz, Shaid Ali, 5 Netto, Danilo B.S., 284 Nguyen, Anne, 63 Nourizadeh, Saeed, 305 Nural, M. V., 184 Ogwu, F. J., 85 Olteanu, Alina, 394 Oluwatope, A. O., 85 Öpik, A., 540 Orchard, Jeff, 422 Ostadzadeh, S. Arash, 97 Ostadzadeh, S. Shervin, 455 Owens, R., 509 Patrikar, Rajendra M., 503 Pham, Hanh H., 190 Pilkington, Nick, 360 Pimenidis, Elias, 125 Pinheiro, Plácido Rogério, 39, 399 Pourebrahimi, Behnaz, 97 Pourreza, H.R., 17, 142 Pourreza, Hamid Reza, 17, 142 Pratt, Benjamin, 532 Puliroju, Chandrasekhar, 461
Rajan, A.V.S., 33 Ramaswamy, V., 256 Ramaswamy, Srini, 108, 491 Reddy, Y. B., 571 Regalado-Méndez, A., 27 Rejeb, Jalel, 207 Riasat, Aasia, 515, 521 Rimai, Lajos, 268 Rizvi, Syed S., 515, 521 Samad, Mahmoud El, 279 Samadi, Saeed, 147 Sandhu, Kamaljeet, 218, 224, 230, 236 Sanglikar, Mukund A., 366, 371 Sarang, Nita, 366, 371 Sarkar, Rahul, 422 Saydam, Tuncay, 526 Shah, Asadullah, 410 Shao, Xiang, 63 Sherman, Trudy, 536 Shirali-Shahreza, Mohammad, 339 Silva, Neander, 136 Simon, K. Y., 268 Singh, Karandeep, 251 Singh, Ranjana, 433 Singhal, Rekha, 433 Singla, Ravinder Kumar, 68 Singleton, Nathan, 172 Spivey, Christopher L., 461 Srinivasan, Mahalakshmi, 207 Stacey, Deborah, 131
Strand, Lennart, 265 Sumanth, S. Sankar, 467 Swift, Stephen, 497 Swiniarski, Roman W., 447 Syed, Noureen, 422 Taghia, Jalal, 202 Taghia, Jalil, 202 Taha, Z., 79 Tamanini, Isabelle, 399 Tan, Q., 405, 408 Tello-Delgado, D., 27 Thomassen M., 405 Tolun, Mehmet R., 577 Trajanov, Dimitar, 196 Tudor, Dacian, 290 Tzeng, Show-Shiow, 485 Uddin, Moeen, 5 Vamplew, Peter, 323 Wanyama, Tom, 384 Yavari, Abulfazl, 17 Yearwood, John, 323 Zahedi, M.H., 348, 354 Zarei, Bager, 305 Zhang, Chuanlei, 108 Zhang, Linfeng, 268
Subject Index
Adaptive control, 373, 376 Ad Hoc On-Demand Distance Vector (AODV), 532 Ad hoc networks, 85, 196–200, 479, 506, 532–535 Admission control, 485–490 Agile, 53, 57, 378–383 Aho-Corasick algorithm, 207, 208, 212 Algorithms, 2, 5, 6, 22, 24, 48, 74–76, 208, 214, 243, 246–248, 256, 281, 282, 285, 286, 348, 349, 361, 419, 468, 476, 560, 573 Arabic Language Text Classification System, 11 Architectural design, 136–139, 536, 538 Arena-M, 317, 319, 320 Attacks ontology, 153 Auctions, 571, 572 Audio processing, 476 Audio signal, 5, 202, 205, 206, 527 Audio source separation, 202, 206 AuInSys, 190–193 Authentication, 305, 307, 410–414, 441 Automated collaborative filtering, 571–572 Automated single camera object tracking system, 245 Automobile event data recorder, 172–177 Autonomic computing, 51–55 Autonomous classification, 190–195 Baugh-Wooley multiplier (BWM), 467, 470, 472 Bayesian belief networks, 385, 386 Biological and supply chain networks, 51, 54 Biometric identifier, 562 Biometric information, 284–289 Biometrics of Cut tree faces, 562–565 Biometric systems, 509, 513, 562 Bisimulation, 33, 36–38 Bluetooth, 184–186, 292, 442–446, 479–484 Bug history data, 108 Bug prediction, 108–111 Business to Business (B2B) Strategy, 547, 548 Business process integration, 125, 127, 129 Buyer-Seller watermarking protocol tree model, 553–556 Calculus of communicating systems (CCS), 33 Cancer patient, 405–407 Capacitance voltage (CV), 270 Cartesian parallel manipulator (CPM), 90–96 CAT (Computer Aided Transcription), 101, 102
Causal analysis and resolution (CAR), 166, 167, 170, 171 Cellular automata (CA), 242 Channel spacing, 1, 2 Chromium (VI) reduction, 28 Cipher message, 428 Cipher text, 256, 284, 555 Civil Procedure, 175 Classification, 11–15, 144, 190–195, 213, 474 Classification of grain, 213 Cliques, 74–78 Cluster ensembles, 323, 325–327 Clustering, 118, 153, 190–195, 202–204, 261, 323–327, 408, 503–505, 529 CMMI, 166, 167, 171, 367, 368, 371 Cognitive radio, 571–573, 576 Coherent data acquisition, 25, 26 Color textured image segmentation, 323–328 Color and typefaces, 265–267 Communication protocol, 290, 318, 410, 433, 480 Computer architecture, 566–568, 570 Computer attacks, 153 Computer simulation, 177 Computer vision, 147, 213–217, 325 Concatenative synthesis, 261–264 Conductance-voltage (GV), 268, 269, 271 Congestion avoidance, 85–89, 481 Context-aware services, 184–189 Control gain, 28 Convolutive mixture, 202–206 Cooperation, 147, 148, 151, 491–496 CPU Scheduling, 348, 349, 353, 570 CPU time consumption, 207, 212 Crash Reconstruction, 172, 173 Cross-sectional plane, 422–424, 426, 427 Cryptographic key, 284, 287, 289 Cryptography, 48, 64, 65, 129, 242, 256, 284, 285, 287, 308, 339, 428–432 Curvelet transform, 142–144 Data-oriented XML, 390–393 Decomposition/integration activities, 577, 581 Decryption, 253, 256–259, 284, 285, 309, 428, 429, 431, 556 Defect management activities, 68 Defect tracking systems, 68–70, 72, 73, 108, 112 Dense-reader mode, 521, 522 58 7
DES, 308, 309 DFS, 305, 306, 308, 396–398 Diagnoses, 389 Digital forensics, 172, 175–177 Digital sampler, 21–26 Digital Television – DTV, 39, 399, 403 Discrete wavelet transforms (DWT), 467, 468–472, 503–505, 508 Distortion power, 21 Distributed hash tables (DHT), 279 Distributed intrusion detection system, 153–158 DWT, 467, 468–472, 503–505, 508 2D-3D Ear and Face, 509 Effort variance, 366–371 e-Learning, 114–118 Electrochemical reactor, 27–29, 31 Electronic Services Acceptance Model (E-SAM), 225, 230, 232 Embedded systems, 291, 505, 536–539 Empirical mode decomposition (EMD), 202, 203, 206 Encryption, 28, 65, 66, 129, 251, 252, 256, 257, 259, 260, 284, 285, 287, 307–309, 411, 428–430, 431, 483, 553 Enterprise architecture, 295, 455–460 Enterprise framework, 455 Enterprise models, 295–299 e-Services, 218–235 e-Services acceptance model, 218–223 Ethernet, 85, 178, 179, 182, 433 Evaluation, 11–14, 18, 21–23, 26, 39, 40–45, 47, 48, 125–130, 136–140, 150, 152, 167, 169, 170, 184, 188, 216, 220, 222, 234, 256, 264, 281, 290–294 Evidence Production, 172 Exchanged in XML, 390 Expectation Maximization (EM), 323, 324, 326 Expert systems, 119–124, 354, 384, 389 Exploration of volumetric datasets, 422–427 Extrapolation accuracy, 273–278 Face detection, 447, 449, 451–453, 509, 510 Face recognition, 115, 118, 447–454, 511, 512 Fault recovery, 305 Fault tolerance, 189, 200, 305, 306 Feature coding, 339, 340 Feature selection, 11–16, 513 Feistel network, 256 Filtering, 5, 11, 13, 21, 57, 79, 150, 202, 203, 405–408, 476, 527, 559, 560, 571, 576 Fingerprint method, 311–316 FIR filter, 21, 204, 416, 467–469, 472 Four wave mixing (FWM), 1 Foveated images, 17–20
FPGA, 516, 558–561, 569 Frame differencing and dynamic template matching, 245–250 Free/Open Source Software Projects, 68–73 Frequency control, 372 Frequency insensitive measurement, 21, 24 FRRCS, 348–353 Function, 102, 179, 203, 208, 209, 256–260, 314, 368 Fuzzy rule base, 348, 353, 356 Fuzzy sets, 349, 354 Gaussian trust, 343–347 Gene selection, 405–409 Genetic algorithm, 571, 573, 576 Graphics processing units, 360–365 Grid computing, 196–201 Grid environment, 86, 97, 100, 196, 197, 200, 279–283 GSM, 184, 186, 290, 292, 294 Hand-held rectangular panel, 422, 423 Hands-on assignments, 566–570 2D Haar Wavelets, 448, 451 History-based pricing mechanism, 97–100 HTML, 101, 102, 104–106, 342, 390 Image Processing, 1, 31, 148, 152, 447, 558–561 Independent component analysis (ICA), 202, 204, 448 Inference engine, 120, 354, 356, 357 Information retrieval (IR), 11, 329 Information security, 339, 340, 428 Information technology, 63, 226, 230, 378, 384, 455 Injection attacks, 207–212 Innovation, 63, 65, 68, 119–124, 218, 220, 224, 230, 236, 238, 400 Institute’s Architecture Tradeoff Analysis Model (ATAM), 536 Intelligent systems, 221, 526 Interference, 1, 182, 424, 455, 479, 492, 504, 521–525 Intrinsic mode functions (IMFs), 202, 206 Intrusion detection system, 79, 153–158, 207 IP communication, 5, 294 Iscsi, 433–440 Java refactorings, 497–502 Key distribution, 308, 428–431 Labeled Transition System (LTS), 33, 34 Lagrangian multiplier, 90, 96 Language model, 159, 160, 162–164, 297
Large scale query optimization, 279–283 Legal transcript analysis, 101–107 Linear feedback shift register (LFSR), 242 Linear and nonlinear trends, 273–278 Line coverage and optimization of test suites, 57 Listen and wait protocol, 521, 524 Load balancing, 305 Locating relational data sources, 416–421 LPL (License-Plate Line), 147 LPR (License-Plate Rectangle), 147 Macro-averaging F1 measure, 11 Macro-averaging precision, 11, 15 Macro-averaging recall, 14, 15 Maple, 45–49 MARF, 473–478 Market-based Grids, 97–100 Matching of a stump photograph, 562 Maximum a posteriori (MAP), 323, 324 Medical information system, 119–124 MesoDyn simulation code, 540–546 Microarray studies, 405–409 Micropayment System, 63–67 MIS hydrogen sensor, 268–272 Mobile computing, 119–124 Mobile device, 40, 44, 119, 121, 122, 124, 184, 186, 188, 290, 291, 294, 319, 320, 399, 403, 441–446 Mobile digital TV, 39, 40, 42 Mobile technologies, 317–322 Mobile television applications, 399–404 Model driven architecture, 302, 455, 457 Modeling, 29, 39, 79, 85–89, 90–96, 98, 138, 139, 154, 242, 262, 263, 295–299 Modeling and Simulation, 85–89 MRF Model, 323–328 Multi agent-based systems, 79 Multi-agent systems (MAS), 491 Multi-agent Vision-based System, 147–152 Multicriteria, 39–44 Network control systems, 178–183 Neural networks, 21, 368, 450, 451, 452, 454, 503 Noise subtraction, 5–10 Non-real-time services, 485–490 Object normalization, 300–304 Object-oriented database design, 300–304 Ontology, 153–158 Optimization methods, 526 Organization, 11, 54, 68, 119, 125, 127, 148, 166–171, 190, 218, 220, 222, 242, 279, 295–298, 317, 329, 354, 366–368, 380, 401, 403, 416, 417, 491, 536, 566, 577, 578 OSI reference model, 479, 480
P2P environment, 280, 416, 417, 418
P-array, 256
Pattern recognition, 13, 116, 449, 450, 473, 476, 477, 478
PCV (Plate-Candidates Verification), 147
PD control, 90–95, 374
PEDT/PSS complex, 540–546
Peer to peer (P2P), 279
Performance model, 126, 366, 367, 515–520
Persian/Arabic Text, 339–342
Phase shift, 6, 21–23, 26
Plaintext, 256
Podcast, 265–267
Power efficiency, 290–294
Power system, 21, 176, 372–377
Predicting survival outcomes, 405–409
Prevention of Software Piracy, 251–255
Principal component analysis (PCA), 202, 204
Problems analyses, 166
Process performance, 167, 366, 367
Profile exchange, 441–446
Programs code coverage, 57–62
Project cost estimation, 577
Project Management Body of Knowledge (PMBOK), 378, 379
Project management processes, 378
Protecting Medical Images, 284–289
PSpice, 461–466
Ptolemy-II, 503, 506
Public key encryption, 553
Pure-interest rate model, 394
Pure-volatility model, 394
Quality control, 213–217
Quantum cryptography, 428, 430, 431
Reactive power measurement, 21, 26
Real-time constraints, 184
Real world images, 17, 19
Reconfigurable coprocessors, 515
Reconfigurable hardware, 515
Recursive development, 577–581
Registration mechanism, 251
Rejal Science and Hadith Science, 354
Reputation system, 343–347
Requirements engineering, 317–322
Research, 11, 13, 21, 27, 39, 44, 54, 55, 68, 69, 72, 73, 79, 85, 86, 97, 108, 109, 114, 116, 119, 120, 138, 147, 153, 154, 162, 178–183
Resource allocation, 97–100
Resource discovery and selection, 279–283
Retinal prosthetic, 558
RFID, 446, 521–525
Routing Table Instability, 532–535
Routing protocol, 196, 197, 199, 503, 505, 508, 532, 533
RRCS, 348–353
Rule base System, 348, 354, 356
S-box, 256, 257
Scientific data verification, 311–316
Secret-key, 256–308
Secure distribution, 251–255
Security, 48, 50, 63, 64, 65
Selection of functional tests, 57
Self-formation of collections, 190–195
Semantic grid, 547–552
Sensing mechanism, 268–272
Sensor networks, 290, 343–347
Serial manipulator, 90, 96
Service deployment, 196
Service-oriented architecture, 295–299
Service-oriented context-awareness, 184–189
Signal processing, 21, 22, 203
Simplify massive complex circuits, 461
Six Sigma’s DMAIC methodology, 166
SOA, 119–124
Soft Biometrical Students Identification, 114–118
Software bug estimation, 108–113
Software development projects, 166–171
Software engineering, 45, 68, 108, 297, 298, 473–478
Software implementation, 256, 425, 467, 515
Software maintenance projects, 366
Software quality attributes (QAs), 536
Sorting, 242, 360–365
Speech recognition systems, 261
Speech synthesiser, 261–264
SQL query processing, 416
SSL, 291, 305, 307, 308, 310, 410–412, 414
Stability, 28, 88, 112, 178, 182, 183, 372, 378, 534, 535
State transition models, 295, 297
Storage and retrieval technique, 329–338
Structured operational semantics (SOS), 33
Support vector machines, 11–16
Symantec document segmentation (SDS), 329–338
Symbolic mathematics package, 45
Synchronous Calculus of Communicating Systems (SCCS), 33
System development, 236–241
Systolic architecture (SA), 467
TCP/IP, 479–484
Technology acceptance model (TAM), 218, 224, 231, 236
Telemedicine, 119–124
Temporal parallel image processing, 558–561
Termination probability, 485, 486, 490
Test oracle, 45, 46, 49
Test taxonomy, 497–502
Text steganography, 339–342
Thin film, 270, 271, 540–546
Threshold, 13, 14, 19, 88, 99, 142–146
Traffic monitoring, 131, 142
Transactions, 45, 63, 64, 66, 126, 127
Transferable lessons, 51–56
Transmission systems, 1
Uncertainty, 53, 221, 323, 343, 384–386
Underlying trend, 273–278
Undirected graph, 74–78
Unequal-spaced channel-allocation, 1–4
UNF, 311–315
Unicode standard, 339, 341
Usability, 39–41, 44, 314, 320, 399–401, 403, 446, 536
User motivation, 224, 225, 227, 228, 231, 233, 234
User–service interactions, 441, 444
Valuing long-term forwards, 394–398
Vector machines, 11–16
Vehicle detection, 142, 145, 147, 148, 152
Vehicle recognition, 142–146
Verbal decision analysis, 399–404
VHDL, 461–466
Video streaming, 526–531
Vision systems, 131, 245
Visual attention system, 17
Visual sense, 503, 506
VLSI application, 242–244
Voting scheme, 159, 160, 162
Walsh function, 21–23
Water dispersion, 540–546
Wavelength-division multiplexing (WDM), 1
WebCoin, 63, 65, 66
Web services, 119, 125, 126, 129, 184, 187, 188, 189, 295, 296, 410–415
Wireless communication applications, 290–294
Wireless data networks, 485–490
WLAN, 184–189, 290–294, 320
Worm detection, 79
WSN, 343, 503, 504, 506, 508
XML, 101, 102, 106, 125, 184, 189, 301, 390–393, 410–415, 442, 474, 549