Lecture Notes in Computer Science
Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, University of Dortmund, Germany
Madhu Sudan, Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos, New York University, NY, USA
Doug Tygar, University of California, Berkeley, CA, USA
Moshe Y. Vardi, Rice University, Houston, TX, USA
Gerhard Weikum, Max-Planck Institute of Computer Science, Saarbruecken, Germany
3033
Springer Berlin Heidelberg New York Hong Kong London Milan Paris Tokyo
Minglu Li Xian-He Sun Qianni Deng Jun Ni (Eds.)
Grid and Cooperative Computing Second International Workshop, GCC 2003 Shanghai, China, December 7-10, 2003 Revised Papers, Part II
Springer
eBook ISBN: 3-540-24680-0
Print ISBN: 3-540-21993-5
©2005 Springer Science + Business Media, Inc. Print ©2004 Springer-Verlag Berlin Heidelberg. All rights reserved. No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher. Created in the United States of America.
Visit Springer's eBookstore at: http://ebooks.springerlink.com
and the Springer Global Website Online at: http://www.springeronline.com
Preface
Grid and cooperative computing has emerged as a new frontier of information technology. It aims to share and coordinate distributed and heterogeneous network resources for better performance and functionality that cannot otherwise be achieved. This volume contains the papers presented at the 2nd International Workshop on Grid and Cooperative Computing, GCC 2003, which was held in Shanghai, P.R. China, during December 7–10, 2003. GCC is designed to serve as a forum to present current and future work as well as to exchange research ideas among researchers, developers, practitioners, and users in Grid computing, Web services and cooperative computing, including theory and applications.

For this workshop, we received over 550 paper submissions from 22 countries and regions. All the papers were peer-reviewed in depth and qualitatively graded on their relevance, originality, significance, presentation, and the overall appropriateness of their acceptance. Any concerns raised were discussed by the program committee. The organizing committee selected 176 papers for conference presentation (full papers) and 173 submissions for poster presentation (short papers). The papers included herein represent the forefront of research from China, the USA, the UK, Canada, Switzerland, Japan, Australia, India, Korea, Singapore, Brazil, Norway, Greece, Iran, Turkey, Oman, Pakistan and other countries. More than 600 attendees participated in the technical sessions and the exhibition of the workshop.

The success of GCC 2003 was made possible by the collective efforts of many people and organizations. We would like to express our special thanks to the Ministry of Education of P.R. China and the municipal government of Shanghai. We also thank IBM, Intel, Platform, HP, Dawning and Lenovo for their generous support. Without the extensive support from many communities, we would not have been able to hold this successful workshop. Moreover, our thanks go to Springer-Verlag for its assistance in putting the proceedings together.

We would like to take this opportunity to thank all the authors, many of whom traveled great distances to participate in this workshop and make their valuable contributions. We would also like to express our gratitude to the program committee members and all the other reviewers for the time and work they put into the thorough review of the large number of papers submitted. Last, but not least, our thanks also go to all the workshop staff for the great job they did in making the local arrangements and organizing an attractive social program.

December 2003
Minglu Li, Xian-He Sun, Qianni Deng, Jun Ni
Conference Committees
Honorary Chair
Qinping Zhao (MOE, China)

Steering Committee
Guojie Li (CCF, China)
Weiping Shen (Shanghai Jiao Tong University, China)
Huanye Sheng (Shanghai Jiao Tong University, China)
Zhiwei Xu (IEEE Beijing Section, China)
Liang-Jie Zhang (IEEE Computer Society, USA)
Xiaodong Zhang (NSF, USA)

General Co-chairs
Minglu Li (Shanghai Jiao Tong University, China)
Xian-He Sun (Illinois Institute of Technology, USA)

Program Co-chairs
Qianni Deng (Shanghai Jiao Tong University, China)
Jun Ni (University of Iowa, USA)

Panel Chair
Hai Jin (Huazhong University of Science and Technology, China)
Program Committee Members
Yaodong Bi (University of Scranton, USA)
Wentong Cai (Nanyang Technological University, Singapore)
Jian Cao (Shanghai Jiao Tong University, China)
Jiannong Cao (Hong Kong Polytechnic University, China)
Guo-Liang Chen (University of Science and Technology of China, China)
Jian Chen (South Australia University, Australia)
Xuebin Chi (Computer Network Information Center, CAS, China)
Qianni Deng (Shanghai Jiao Tong University, China)
Xiaoshe Dong (Xi’an Jiao Tong University, China)
Joseph Fong (City University of Hong Kong)
Yuxi Fu (Shanghai Jiao Tong University, China)
Guangrong Gao (University of Delaware, Newark, USA)
Yadong Gui (Shanghai Supercomputing Center, China)
Minyi Guo (University of Aizu, Japan)
Jun Han (Swinburne University of Technology, Australia)
Yanbo Han (Institute of Computing Technology, CAS, China)
Jinpeng Huai (Beihang University, China)
Weijia Jia (City University of Hong Kong)
ChangJun Jiang (Tongji University, China)
Hai Jin (Huazhong University of Science and Technology, China)
Francis Lau (University of Hong Kong)
Keqin Li (State University of New York, USA)
Minglu Li (Shanghai Jiao Tong University, China)
Qing Li (City University of Hong Kong)
Xiaoming Li (Peking University, China)
Xinda Lu (Shanghai Jiao Tong University, China)
Junzhou Luo (Southeast University, China)
Fanyuan Ma (Shanghai Jiao Tong University, China)
Dan Meng (Institute of Computing Technology, CAS, China)
Xiangxu Meng (Shandong University, China)
Jun Ni (University of Iowa, USA)
Lionel M. Ni (Hong Kong University of Science & Technology)
Yi Pan (Georgia State University, USA)
Depei Qian (Xi’an Jiao Tong University, China)
Yuzhong Qu (Southeast University, China)
Hong Shen (Advanced Institute of Science & Technology, Japan)
Xian-He Sun (Illinois Institute of Technology, USA)
Huaglory Tianfield (Glasgow Caledonian University, UK)
Weiqin Tong (Shanghai University, China)
Cho-Li Wang (University of Hong Kong)
Frank Wang (London Metropolitan University, UK)
Jie Wang (Stanford University, USA)
Shaowen Wang (University of Iowa, USA)
Xingwei Wang (Northeastern University, China)
Jie Wu (Florida Atlantic University, USA)
Zhaohui Wu (Zhejiang University, China)
Nong Xiao (National University of Defense Technology, China)
Xianghui Xie (Jiangnan Institute of Computing Technology, China)
Chengzhong Xu (Wayne State University, USA)
Zhiwei Xu (Institute of Computing Technology, CAS, China)
Guangwen Yang (Tsinghua University, China)
Laurence Tianruo Yang (St. Francis Xavier University, Canada)
Qiang Yang (Hong Kong University of Science & Technology)
Jinyuan You (Shanghai Jiao Tong University, China)
Haibiao Zeng (Sun Yat-Sen University, China)
Ling Zhang (South China University of Technology, China)
Xiaodong Zhang (NSF, USA and College of William and Mary, USA)
Wu Zhang (Shanghai University, China)
Weimin Zheng (Tsinghua University, China)
Aoying Zhou (Fudan University, China)
Wanlei Zhou (Deakin University, Australia)
Jianping Zhu (University of Akron, USA)
Hai Zhuge (Institute of Computing Technology, CAS, China)
Organization Committee
Xinda Lu (Chair) (Shanghai Jiao Tong University, China)
Jian Cao (Shanghai Jiao Tong University, China)
Ruonan Rao (Shanghai Jiao Tong University, China)
Meiju Chen (Shanghai Jiao Tong University, China)
An Yang (Shanghai Jiao Tong University, China)
Zhihua Su (Shanghai Jiao Tong University, China)
Feilong Tang (Shanghai Jiao Tong University, China)
Jiadi Yu (Shanghai Jiao Tong University, China)
Panel: Will Globus dominate Grid computing as Windows dominated in PCs? If not, what will the next Grid toolkits look like?

Panel Chair: Hai Jin, Huazhong University of Science and Technology, China ([email protected])

Panelists:
Wolfgang Gentzsch, Sun Microsystems, Inc., USA ([email protected])
Satoshi Matsuoka, Tokyo Institute of Technology, Japan ([email protected])
Carl Kesselman, University of Southern California, USA ([email protected])
Andrew A. Chien, University of California at San Diego, USA ([email protected])
Xian-He Sun, Illinois Institute of Technology, USA ([email protected])
Richard Wirt, Intel Corporation, USA ([email protected])
Zhiwei Xu, Institute of Computing Technology, CAS, China ([email protected])
Francis Lau, University of Hong Kong ([email protected])
Huaglory Tianfield, Glasgow Caledonian University, UK ([email protected])
Table of Contents, Part II
Session 6: Advanced Resource Management, Scheduling, and Monitoring
Synthetic Implementations of Performance Data Collection in Massively Parallel Systems Chu J. Jong, Arthur B. Maccabe
1
GMA+ – A GMA-Based Monitoring and Management Infrastructure for Grid Chuan He, Zhihui Du, San-li Li
10
A Parallel Branch–and–Bound Algorithm for Computing Optimal Task Graph Schedules Udo Hönig, Wolfram Schiffmann
18
Selection and Advanced Reservation of Backup Resources for High Availability Service in Computational Grid Chunjiang Li, Nong Xiao, Xuejun Yang
26
An Online Scheduling Algorithm for Grid Computing Systems Hak Du Kim, Jin Suk Kim
34
A Dynamic Job Scheduling Algorithm for Computational Grid Jian Zhang, Xinda Lu
40
An Integrated Management and Scheduling Scheme for Computational Grid Ran Zheng, Hai Jin
Multisite Task Scheduling on Distributed Computing Grid Weizhe Zhang, Hongli Zhang, Hui He, Mingzeng Hu
Adaptive Job Scheduling for a Service Grid Using a Genetic Algorithm Yang Gao, Hongqiang Rang, Frank Tong, Zongwei Luo, Joshua Huang
48
57
65
Resource Scheduling Algorithms for Grid Computing and Its Modeling and Analysis Using Petri Net Yaojun Han, Changjun Jiang, You Fu, Xuemei Luo
73
Architecture of Grid Resource Allocation Management Based on QoS Xiaozhi Wang, Junzhou Luo
81
An Improved Ganglia-Like Clusters Monitoring System Wenguo Wei, Shoubin Dong, Ling Zhang, Zhengyou Liang
Effective OpenMP Extensions for Irregular Applications on Cluster Environments Minyi Guo, Jiannong Cao, Weng-Long Chang, Li Li, Chengfei Liu
89
97
A Scheduling Approach with Respect to Overlap of Computing and Data Transferring in Grid Computing Changqin Huang, Yao Zheng, Deren Chen
105
A Deadline and Budget Constrained Cost-Time Optimization Algorithm for Scheduling Dependent Tasks in Grid Computing Haolin Feng, Guanghua Song, Yao Zheng, Jun Xia
113
A Load Balancing Algorithm for Web Based Server Grids Shui Yu, John Casey, Wanlei Zhou
Flexible Intermediate Library for MPI-2 Support on an SCore Cluster System Yuichi Tsujita
Resource Management and Scheduling in Manufacturing Grid Lilan Liu, Tao Yu, Zhanbei Shi, Minglun Fang
A New Task Scheduling Algorithm in Distributed Computing Environments Jian-Jun Han, Qing-Hua Li
121
129 137
141
GridFerret: Grid Monitoring System Based on Mobile Agent Juan Fang, Shu-Jie Zhang, Rui-Hua Di, He Huang
145
Grid-Based Resource Management of Naval Weapon Systems Bin Zeng, Tao Hu, ZiTang Li
149
A Static Task Scheduling Algorithm in Grid Computing Dan Ma, Wei Zhang
153
A New Agent-Based Distributed Model of Grid Service Advertisement and Discovery Dan Ma, Wei Zhang, Hong-jun Zhang
157
IMCAG: Infrastructure for Managing and Controlling Agent Grid Jun Hu, Ji Gao
161
A Resource Allocation Method in the Neural Computation Platform Zhuo Lai, Jiangang Yang, Hongwei Shan
166
An Efficient Clustering Method for Retrieval of Large Image Databases Yu-Xiang Xie, Xi-Dao Luan, Ling-Da Wu, Song-Yang Lao, Lun-Guo Xie
170
Research on Adaptable Replication Protocol Dong Zhao, Ya-wei Li, Ming-Tian Zhou
174
Co-operative Monitor Web Page Based on MD5 Guohun Zhu, YuQing Miao
179
Collaboration-Based Architecture of Flexible Software Configuration Management System Ying Ding, Weishi Zhang, Lei Xu
The Research of Mobile Agent Security Xiaobin Li, Aijuan Zhang, Jinfei Sun, Zhaolin Yin
183 187
Research of Information Resources Integration and Shared in Digital Basin Xiaofeng Zhou, Zhijian Wang, Ping Ai
191
Scheduling Model in Global Real-Time High Performance Computing with Network Calculus Yafei Hou, Shi Yong Zhang, YiPing Zhong
195
CPU Schedule in Programmable Routers: Virtual Service Queuing with Feedback Algorithm Tieying Zhu
199
Research on Information Platform of Virtual Enterprise Based on Web Services Technology Chao Young, Jiajin Le
203
A Reliable Grid Messaging Service Based on JMS Ruonan Rao, Xu Cai, Ping Hao, Jinyuan You
A Feedback and Investigation Based Resources Discovery and Management Model on Computational Grid Peng Ji, Junzhou Luo
207
211
Moment Based Transfer Function Design for Volume Rendering Huawei Hou, Jizhou Sun, Jiawan Zhang
215
Grid Monitoring and Data Visualization Yi Chi, Shoubao Yang, Zheng Feng
219
An Economy Driven Resource Management Architecture Based on Mobile Agent Peng Wan, Wei-Yong Zhang, Tian Chen
223
Decentralized Computational Market Model for Grid Resource Management Qianfei Fu, Shoubao Yang, Maosheng Li, Junmao Zhun
A Formal Data Model and Algebra for Resource Sharing in Grid Qiujian Sheng, Zhongzhi Shi
An Efficient Load Balance Algorithm in Cluster-Based Peer-to-Peer System Ming-Hong Shi, Yong-Jun Luo, Ying-Cai Bai
227 231
236
Resource Information Management of Spatial Information Grid Deke Guo, Honghui Chen, Xueshan Luo
240
An Overview of CORBA-Based Load Balancing Jian Shu, Linlan Liu, Shaowen Song
244
Intelligence Balancing for Communication Data Management in Grid Computing Jong Sik Lee
250
On Mapping and Scheduling Tasks with Synchronization on Clusters of Machines Bassel R. Arafeh
254
An Efficient Load Balancing Algorithm on Distributed Networks Okbin Lee, Sangho Lee, Ilyong Chung
259
Session 7: Network Communication and Information Retrieval
Optimal Methods for Object Placement in En-Route Web Caching for Tree Networks and Autonomous Systems Keqiu Li, Hong Shen
263
A Framework of Tool Integration for Internet-Based E-commerce Jianming Yong, Yun Yang
271
Scalable Filtering of Well-Structured XML Message Stream Weixiong Rao, Yingjian Chen, Xinquan Zhang, Fanyuan Ma
279
Break a New Ground on Programming in Web Client Side Jianjun Zhang, Mingquan Zhou
287
An Adaptive Mixing Audio Gateway in Heterogeneous Networks for ADMIRE System Tao Huang, Xiangning Yu
Kernel Content-Aware QoS for Web Clusters Zeng-Kai Du, Jiu-bin Ju
294 303
A Collaborative Multimedia Authoring System Mee Young Sung, Do Hyung Lee
311
Research of Satisfying Atomic and Anonymous Electronic Commerce Protocol Jie Tang, Juan-Zi Li, Ke-Hong Wang, Yue-Ru Cai
319
Network Self-Organizing Information Exploitation Model Based on GCA Yujun Liu, Dianxun Shuai, Weili Han
327
Admire – A Prototype of Large Scale E-collaboration Platform Tian Jin, Jian Lu, XiangZhi Sheng
335
A Most Popular Approach of Predictive Prefetching on a WAN to Efficiently Improve WWW Response Times Christos Bouras, Agisilaos Konidaris, Dionysios Kostoulas
344
Applications of Server Performance Control with Simple Network Management Protocol Yijiao Yu, Qin Liu, Liansheng Tan
352
Appcast – A Low Stress and High Stretch Overlay Protocol V. Radha, Ved P Gulati, Arun K Pujari
360
Communication Networks: States of the Arts Xiaolu Zuo
372
DHCS: A Case of Knowledge Share in Cooperative Computing Environment Shui Yu, Le Yun Pan, Futai Zou, Fan Yuan Ma
380
Improving the Performance of Equalization in Communication Systems Wanlei Zhou, Hua Ye, Lin Ye
388
Moving Communicational Supervisor Control System Based on Component Technology Song Yu, Yan-Rong Jie
396
A Procedure Search Mechanism in OGSA-Based GridRPC Systems Yue-zhuo Zhang, Yong-zhong Huang, Xin Chen
400
An Improved Network Broadcasting Method Based on Gnutella Network Zupeng Li, Xiubin Zhao, Daoyin Huang, Jianhua Huang
404
Some Conclusions on Cayley Digraphs and Their Applications to Interconnection Networks Wenjun Xiao, Behrooz Parhami
408
Multifractal Characteristic Quantities of Network Traffic Models Donglin Liu, Dianxun Shuai
413
Network Behavior Analysis Based on a Computer Network Model Weili Han, Dianxun Shuai, Yujun Liu
418
Cutting Down Routing Overhead in Mobile Ad Hoc Networks Jidong Zhong, Shangteng Huang
422
Improving Topology-Aware Routing Efficiency in Chord Dongfeng Chen, Shoubao Yang
426
Two Extensions to NetSolve System Jianhua Chen, Wu Zhang, Weimin Shao
430
A Route-Based Composition Language for Service Cooperation Jianguo Xing
434
To Manage Grid Using Dynamically Constructed Network Management Concept: An Early Thought Zhongzhi Luan, Depei Qian, Weiguo Wu, Tao Liu
438
Design of VDSL Networks for the High Speed Internet Services Hyun Yoe, Jaejin Lee
442
The Closest Vector Problem on Some Lattices Haibin Kan, Hong Shen, Hong Zhu
446
Proposing a New Architecture for Adaptive Active Network Control and Management System Mahdi Jalili-Kharaajoo, Alireza Dehestani, Hassan Motallebpour
A Path Based Internet Cache Design for GRID Application Hyuk Soo Jang, Kyong Hoon Min, Won Seok Jou, Yeonseung Ryu, Chung Ki Lee, Seok Won Hong
On the Application of Computational Intelligence Methods on Active Networking Technology Mahdi Jalili-Kharaajoo
450 455
459
Session 8: Grid QoS
Grid Computing for the Masses: An Overview Kaizar Amin, Gregor von Laszewski, Armin R. Mikler
A Multiple-Neighborhoods-Based Simulated Annealing Algorithm for Timetable Problem He Yan, Song-Nian Yu
464
474
Lattice Framework to Implement OGSA: Its Constructs and Composition Scenario Hui Liu, Minglu Li, Jiadi Yu, Lei Cao, Ying Li, Wei Jin, Qi Qian
482
Moving Grid Systems into the IPv6 Era Sheng Jiang, Piers O’Hanlon, Peter Kirstein
490
MG-QoS: QoS-Based Resource Discovery in Manufacturing Grid Zhanbei Shi, Tao Yu, Lilan Liu
500
An Extension of Grid Service: Grid Mobile Service Wei Zhang, Jun Zhang, Dan Ma, Benli Wang, Yun Tao Chen
507
Supplying Instantaneous Video-on-Demand Services Based on Grid Computing Xiao-jian He, Xin-huai Tang, Jinyuan You
513
A Grid Service Lifecycle Management Scheme Jie Qiu, Haiyan Yu, Shuoying Chen, Li Cha, Wei Li, Zhiwei Xu
521
An OGSA-Based Quality of Service Framework Rashid Al-Ali, Kaizar Amin, Gregor von Laszewski, Omer Rana, David Walker
529
A Service Management Scheme for Grid Systems Wei Li, Zhiwei Xu, Li Cha, Haiyan Yu, Jie Qiu, Yanzhe Zhang
541
A QoS Model for Grid Computing Based on DiffServ Protocol Wandan Zeng, Guiran Chang, Xingwei Wang, Shoubin Wang, Guangjie Han, Xubo Zhou
549
Design and Implementation of a Single Sign-On Library Supporting SAML (Security Assertion Markup Language) for Grid and Web Services Security Dongkyoo Shin, Jongil Jeong, Dongil Shin
Performance Improvement of Information Service Using Priority Driven Method Minji Lee, Wonil Kim, Jai-Hoon Kim
557
565
HH-MDS: A QoS-Aware Domain Divided Information Service Deqing Zou, Hai Jin, Xingchang Dong, Weizhong Qiang, Xuanhua Shi
573
Grid Service Semigroup and Its Workflow Model Yu Tang, Haifang Zhou, Kaitao He, Luo Chen, Ning Jing
581
A Design of Distributed Simulation Based on GT3 Core Tong Zhang, Chuanfu Zhang, Yunsheng Liu, Yabing Zha
590
A Policy-Based Service-Oriented Grid Architecture Xiangli Qu, Xuejun Yang, Chunmei Gui, Weiwei Fan
597
Adaptable QOS Management in OSGi-Based Cooperative Gateway Middleware Wei Liu, Zhang-long Chen, Shi-liang Tu, Wei Du
604
Design of an Artificial-Neural-Network-Based Extended Metacomputing Directory Service Haopeng Chen, Baowen Zhang
608
Session 9: Algorithm, Economic Model, Theoretical Model of the Grid
Gridmarket: A Practical, Efficient Market Balancing Resource for Grid and P2P Computing Ming Chen, Guangwen Yang, Xuezheng Liu
612
A Distributed Approach for Resource Pricing in Grid Environments Chuliang Weng, Xinda Lu, Qianni Deng
620
Application Modelling Based on Typed Resources Cheng Fu, Jinyuan You
628
A General Merging Algorithm Based on Object Marking Jinlei Jiang, Meilin Shi
636
Charging and Accounting for Grid Computing System Zhengyou Liang, Ling Zhang, Shoubin Dong, Wenguo Wei
644
Integrating New Cost Model into HMA-Based Grid Resource Scheduling Jun-yan Zhang, Fan Min, Guo-wei Yang
652
CoAuto: A Formal Model for Cooperative Processes Jinlei Jiang, Meilin Shi
660
A Resource Model for Large-Scale Non-hierarchy Grid System Qianni Deng, Xinda Lu, Li Chen, Minglu Li
669
A Virtual Organization Based Mobile Agent Computation Model Yong Liu, Cong-fu Xu, Zhaohui Wu, Wei-dong Chen, Yun-he Pan
677
Modeling Distributed Algorithm Using B Shengrong Zou
683
Multiple Viewpoints Based Ontology Integration Kai Zhang, Yunfa Hu, Yu Wang
690
Automated Detection of Design Patterns Zhixiang Zhang, Qing-Hua Li
694
Research on the Financial Information Grid Jiyue Wen, Guiran Chang
698
RCACM: Role-Based Context-Awareness Coordination Model for Mobile Agent Applications Xin-huai Tang, Yaying Zhang, Jinyuan You
702
A Model for Locating Services in Grid Environment Erfan Shang, Zhihui Du, Mei Chen
706
A Grid Service Based Model of Virtual Experiment Liping Shen, Yonggang Fu, Ruimin Shen, Minglu Li
710
Accounting in the Environment of Grid Society Jiulong Shan, Huaping Chen, GuoLiang Chen, Haitao Tian, Xin Chen
715
A Heuristic Algorithm for Minimum Connected Dominating Set with Maximal Weight in Ad Hoc Networks Xinfang Yan, Yugeng Sun, Yanlin Wang
Slice-Based Information Flow Graph Wan-Kyoo Choi, Il-Yong Chung
719 723
Session 10: Semantic Grid and Knowledge Grid
Semantic Rule Service Model: Enabling Intelligence on Grid Architecture Qi Gao, HuaJun Chen, Zhaohui Wu, WeiMing Lin
CSCW in Design on the Semantic Web Dazhou Kang, Baowen Xu, Jianjiang Lu, Yingzhou Zhang
SIMON: A Multi-strategy Classification Approach Resolving Ontology Heterogeneity – The P2P Meets the Semantic Web Le Yun Pan, Liang Zhang, Fanyuan Ma
727 736
744
SkyEyes: A Semantic Browser for the KB-Grid Yuxin Mao, Zhaohui Wu, HuaJun Chen
752
Toward the Composition of Semantic Web Services Jinghai Rao, Xiaomeng Su
760
A Viewpoint of Semantic Description Framework for Service Yuzhong Qu
768
A Novel Approach to Semantics-Based Exception Handling for Service Grid Applications Donglai Li, Yanbo Han, Haitao Hu, Jun Fang, Xue Wang
778
A Semantic-Based Web Service Integration Approach and Tool Hai Zhuge, Jie Liu, Lianhong Ding, Xue Chen
787
A Computing Model for Semantic Link Network Hai Zhuge, Yunchuan Sun, Jie Liu, Xiang Li
795
A Semantic Web Enabled Mediator for Web Service Invocation Lejun Zhu, Peng Ding, Huanye Sheng
803
A Data Mining Algorithm Based on Grid Xue-bai Zang, Xiong-fei Li, Kun Zhao, Xin Guan
807
Prototype a Knowledge Discovery Infrastructure by Implementing Relational Grid Monitoring Architecture (R-GMA) on European Data Grid (EDG) Frank Wang, Na Helian, Yike Guo, Steve Thompson, John Gordon
811
Session 11: Data Remote Access, Storage, and Sharing
The Consistency Mechanism of Meta-data Management in Distributed Storage System Zhaofu Wang, Wensong Zhang, Kun Deng
815
Link-Contention-Aware Genetic Scheduling Using Task Duplication in Grid Environments Wensheng Yao, Xiao Xie, Jinyuan You
822
An Adaptive Meta-scheduler for Data-Intensive Applications Xuanhua Shi, Hai Jin, Weizhong Qiang, Deqing Zou
830
Dynamic Data Grid Replication Strategy Based on Internet Hierarchy Sang-Min Park, Jai-Hoon Kim, Young-Bae Ko, Won-Sik Yoon
838
Preserving Data Consistency in Grid Databases with Multiple Transactions Sushant Goel, Hema Sharda, David Taniar
847
Dart: A Framework for Grid-Based Database Resource Access and Discovery Chang Huang, Zhaohui Wu, Guozhou Zheng, Xiaojun Wu
855
An Optimal Task Scheduling for Cluster Systems Using Task Duplication Xiao Xie, Wensheng Yao, Jinyuan You
863
Towards an Interactive Architecture for Web-Based Databases Changgui Chen, Wanlei Zhou
871
Network Storage Management in Data Grid Environment Shaofeng Yang, Zeyad Ali, Houssain Kettani, Vinti Verma, Qutaibah Malluhi
879
Study on Data Access Technology in Information Grid YouQun Shi, ChunGang Yan, Feng Yue, Changjun Jiang
887
GridTP Services for Grid Transaction Processing Zhengwei Qi, Jinyuan You, Ying Jin, Feilong Tang
891
FTPGrid: A New Paradigm for Distributed FTP System Liutong Xu, Bo Ai
895
Using Data Cube for Mining of Hybrid-Dimensional Association Rules Zhi-jie Li, Fei-xue Huang, Dong-qing Zhou, Peng Zhang
Knowledge Sharing by Grid Technology Bangyong Liang, Juan-Zi Li, Ke-Hong Wang
A Security Access Control Mechanism for a Multi-layer Heterogeneous Storage Structure Shiguang Ju, Héctor J. Hernández, Lan Zhang
Investigating the Role of Handheld Devices in the Accomplishment of Grid-Enabled Analysis Environment Ashiq Anjum, Arshad Ali, Tahir Azim, Ahsan Ikram, Julian J. Bunn, Harvey B. Newman, Conrad Steenberg, Michael Thomas
899 903
907
913
Session 12: Computer-Supported Cooperative Work and Cooperative Middleware
A TMO-Based Object Group Model to Structuring Replicated Real-Time Objects for Distributed Real-Time Applications Chang-Sun Shin, Su-Chong Joo, Young-Sik Jeong
Fuzzy Synthesis Evaluation Improved Task Distribution in WfMS Xiao-Guang Zhang, Jian Cao, Shensheng Zhang
918 927
A Simulation Study of Job Workflow Execution Models over the Grid Yuhong Feng, Wentong Cai, Jiannong Cao
935
An Approach to Distributed Collaboration Problem with Conflictive Tasks Jingping Bi, Qi Wu, Zhongcheng Li
944
Temporal Problems in Service-Based Workflows Zhen Yu, Zhaohui Wu, ShuiGuang Deng, Qi Gao
954
iCell: Integration Unit in Enterprise Cooperative Environment Ruey-Shyang Wu, Shyan-Ming Yuan, Anderson Liang, Daphne Chyan
962
The Availability Semantics of Predicate Data Flow Diagram Xiaolei Gao, Huaikou Miao, Shaoying Liu, Ling Liu
970
Virtual Workflow Management System in Grid Environment ShuiGuang Deng, Zhaohui Wu, Qi Gao, Zhen Yu
978
Research of Online Expandability of Service Grid Yuan Wang, Zhiwei Xu, Yuzhong Sun
986
Modelling Cooperative Multi-agent Systems Lijun Shan, Hong Zhu
994
GHIRS: Integration of Hotel Management Systems by Web Services Yang Xiang, Wanlei Zhou, Morshed Chowdhury
1002
Cooperative Ants Approach for a 2D Navigational Map of 3D Virtual Scene Jiangchun Wang, Shensheng Zhang
1010
Workflow Interoperability – Enabling Online Approval in E-government Hua Xin, Fu-ren Xue
1018
A Multicast Routing Algorithm for CSCW Xiong-fei Li, Dandan Huan, Yuanfang Dong, Xin Zhou
1022
A Multi-agent System Based on ECA Rule Xiaojun Zhou, Jian Cao, Shensheng Zhang
1026
A Hybrid Algorithm of n-OPT and GA to Solve Dynamic TSP Zhao Liu, Lishan Kang
1030
The Application Research of Role-Based Access Control Model in Workflow Management System Baoyi Wang, Shaomin Zhang, Xiaodong Xia
1034
Research and Design of Remote Education System Based on CSCW Chunzhi Wang, Miao Shao, Jing Xia, Huachao Chen
1038
Data and Interaction Oriented Workflow Execution Wan-Chun Dou, Juan Sun, Da-Gang Yang, Shi-Jie Cai
1042
XCS System: A New Architecture for Web-Based Applications Yijian Wu, Wenyun Zhao
1046
A PKI-Based Scalable Security Infrastructure for Scalable Grid Lican Huang, Zhaohui Wu
1051
A Layered Grid User Expression Model in Grid User Management Limin Liu, Zhiwei Xu, Wei Li
1055
A QoS-Based Multicast Algorithm for CSCW in IP/DWDM Optical Internet Xingwei Wang, Hui Cheng, Jia Li, Min Huang, Ludi Zheng
1059
An Evolutionary Constraint Satisfaction Solution for over the Cell Channel Routing Ahmet Ünveren, Adnan Acan
1063
Author Index
1067
Table of Contents, Part I
Vega Grid: A Computer Systems Approach to Grid Research Zhiwei Xu
1
Problems of and Mechanisms for Instantiating Virtual Organizations Carl Kesselman
2
Grid Computing: The Next Stage of the Internet Irving Wladawsky-Berger
3
Making Grid Computing Real for High Performance and Enterprise Computing Richard Wirt
4
Grid Computing for Enterprise and Beyond Songnian Zhou
5
Semantic Grid: Scientific Issues, Methodology, and Practice in China Hai Zhuge
6
Grid Computing, Vision, Strategy, and Technology Wolfgang Gentzsch
7
Towards a Petascale Research Grid Infrastructure Satoshi Matsuoka
8
The Microgrid: Enabling Scientific Study of Dynamic Grid Behavior Andrew A. Chien
9
On-Demand Business Collaboration Enablement with Services Computing Liang- Jie Zhang
10
Session 1: Grid Application
Multidisciplinary Design Optimization of Aero-craft Shapes by Using Grid Based High Performance Computational Framework Hong Liu, Xi-li Sun, Qianni Deng, Xinda Lu
A Research on the Framework of Grid Manufacturing Li Chen, Hong Deng, Qianni Deng, Zhenyu Wu
Large-Scale Biological Sequence Assembly and Alignment by Using Computing Grid Wei Shi, Wanlei Zhou
11 19
26
Implementation of Grid-Enabled Medical Simulation Applications Using Workflow Techniques Junwei Cao, Jochen Fingberg, Guntram Berti, Jens Georg Schmidt
A New Overlay Network Based on CAN and Chord Wenyuan Cai, Shuigeng Zhou, Linhao Xu, Weining Qian, Aoying Zhou
An Engineering Computation Oriented Visual Grid Framework Guiyi Wei, Yao Zheng, Jifa Zhang, Guanghua Song
34 42 51
Interaction Compatibility: An Essential Ingredient for Service Composition Jun Han
59
A Distributed Media Service System Based on Globus Data-Management Technologies Xiang Yu, Shoubao Yang, Yu Hong
67
Load Balancing between Heterogeneous Computing Clusters Siu-Cheung Chau, Ada Wai-Chee Fu
75
“Gridifying” Aerodynamic Design Problem Using GridRPC Quoc-Thuan Ho, Yew-Soon Ong, Wentong Cai
83
A WEB-GIS Based Urgent Medical Rescue CSCW System for SARS Disease Prevention Xiaolin Lu
MASON: A Model for Adapting Service-Oriented Grid Applications Gang Li, Jianwu Wang, Jing Wang, Yanbo Han, Zhuofeng Zhao, Roland M. Wagner, Haitao Hu
Coordinating Business Transaction for Grid Service Feilong Tang, Minglu Li, Jian Cao, Qianni Deng
Conceptual Framework for Recommendation System Based on Distributed User Ratings Hyun-Jun Kim, Jason J. Jung, Geun-Sik Jo
91 99
108
115
Grid Service-Based Parallel Finite Element Analysis Guiyi Wei, Yao Zheng, Jifa Zhang
123
The Design and Implementation of the GridLab Information Service Giovanni Aloisio, Massimo Cafaro, Italo Epicoco, Daniele Lezzi, Maria Mirto, Silvia Mocavero
131
Comparison Shopping Systems Based on Semantic Web – A Case Study of Purchasing Cameras Ho-Kyoung Lee, Young-Hoon Yu, Supratip Ghose, Geun-Sik Jo
139
A New Navigation Method for Web Users Jie Yang, Guoqing Wu, Luis Zhu
147
Application Availability Measurement in Computational Grid Chunjiang Li, Nong Xiao, Xuejun Yang
151
Research and Application of Distributed Fusion System Based on Grid Computing Yu Su, Hai Zhao, Wei-ji Su, Gang Wang, Xiao-dan Zhang
An Efficient and Self-Configurable Publish-Subscribe System Tao Xue, Boqin Feng
155 159
164 168
172
Maintaining Packet Order for the Parallel Switch Yuguo Dong, Binqiang Wang, Yunfei Guo, Jiangxing Wu
176
Grid-Based Process Simulation Technique and Support System Hui Gao, Li Zhang
180
Some Grid Automata for Grid Computing Hao Shen, Yongqiang Sun
184
The Cooperation of Virtual Enterprise Supported by the Open Agent System Zhaolin Yin, Aijuan Zhang, Xiaobin Li, Jinfei Sun The Granularity Analysis of MPI Parallel Programs Wei-guang Qiao, Guosun Zeng NGG: A Service-Oriented Application Grid Architecture for National Geological Survey Yu Tang, Kaitao He, Zhen Xiang, Yongbo Zhang, Ning Jing
188 192
196
Integration of the Distributed Simulation into the OGSA Model Chuanfu Zhang, Yunsheng Liu, Tong Zhang, Yabing Zha
200
An Extendable Grid Simulation Environment Based on GridSim Efeng Lu, Zhihong Xu, Jizhou Sun
205
The Architecture of Traffic Information Grid Zhaohui Zhang, Qing Zhi, Guosun Zeng, Changjun Jiang
209
Construction Scheme of Meteorological Application Grid (MAG) Xuesheng Yang, Weiming Zhang, Dehui Chen
213
OGSA Based E-learning System: An Approach to Build Next Generation of Online Education Hui Wang, Xueli Yu, Li Wang, Xu Liu
Multimedia Delivery Grid: A Novel Multimedia Delivery Scheme ZhiHui Lv, Jian Yang, ShiYong Zhang, YiPing Zhong
217 221
225 229
233
Grid-Based Biological Computation Service Environment Jing Zhu, Guangwen Yang, Weimin Zheng, Tao Zhu, Meiming Shen, Li’an Qiao, Xiangjun Liu
237
CIMES: A Collaborative Image Editing System for Pattern Design Xianghua Xu, Jiajun Bu, Chun Chen, Yong Li
242
Campus Grid and Its Application Zhiqun Deng, Guanzhong Dai
247
The Realization Methods of PC Cluster Experimental Platform in Linux Jiang-ling Zhang, Shi-jue Zheng, Yang Qing
251
Coarse-Grained Distributed Parallel Programming Interface for Grid Computing Yongwei Wu, Qing Wang, Guangwen Yang, Weiming Zheng
255
User Guided Parallel Programming Platform Yong Liu, Xinda Lu, Qianni Deng
A High-Performance Intelligent Integrated Data Services System in Data Grid Bin Huang, Xiaoning Peng, Nong Xiao, Bo Liu
259
262
Architecting CORBA-Based Distributed Applications Min Cao, Jiannong Cao, Geng-Feng Wu, Yan-Yan Wang
266
Design of NGIS: The Next Generation Internet Server for Future E-society Chong-Won Park, Myung-Joon Kim, Jin-Won Park
269
Video-on-Demand System Using Multicast and Web-Caching Techniques SeokHoon Kang
273
Session 2: Peer to Peer Computing
PeerBus: A Middleware Framework towards Interoperability among P2P Data Sharing Systems Linhao Xu, Shuigeng Zhou, Keping Zhao, Weining Qian, Aoying Zhou
277
Ptops Index Server for Advanced Search Performance of P2P System with a Simple Discovery Server Boon-Hee Kim, Young-Chan Kim
285
Improvement of Routing Structure in P2P Overlay Networks Jinfeng Hu, Yinghui Wu, Ming Li, Weimin Zheng
292
Overlay Topology Matching in P2P Systems Yunhao Liu, Xiao Li, Lionel M. Ni, Yunhuai Liu
300
Effect of Links on DHT Routing Algorithms Futai Zou, Liang Zhang, Yin Li, Fanyuan Ma
308
A Peer-to-Peer Approach to Task Scheduling in Computation Grid Jiannong Cao, Oscar M.K. Kwong, Xianbing Wang, Wentong Cai
316
Efficient Search in Gnutella-Like “Small-World” Peer-to-Peer Systems Dongsheng Li, Xicheng Lu, Yijie Wang, Nong Xiao
324
Dominating-Set-Based Searching in Peer-to-Peer Networks Chunlin Yang, Jie Wu
332
GFS-Btree: A Scalable Peer-to-Peer Overlay Network for Lookup Service Qinghu Li, Jianmin Wang, Jiaguang Sun
340
An Approach to Content-Based Approximate Query Processing in Peer-to-Peer Data Systems Chaokun Wang, Jianzhong Li, Shengfei Shi
348
A Hint-Based Locating and Routing Mechanism in Peer-to-Peer File Sharing Systems Hairong Jin, Shanping Li, Tianchi Ma, Liang Qian
356
Content Location Using Interest-Based Subnet in Peer-to-Peer System Guangtao Xue, Jinyuan You, Xiaojian He
363
Trust and Cooperation in Peer-to-Peer Systems Junjie Jiang, Haihuan Bai, Weinong Wang
371
A Scalable Peer-to-Peer Lookup Model Haitao Chen, Chuanfu Xu, Zunguo Huang, Huaping Hu, Zhenghu Gong
379
Characterizing Peer-to-Peer Traffic across Internet Yunfei Zhang, Lianhong Lei, Changjia Chen
388
Improving the Objects Set Availability in the P2P Environment by Multiple Groups Kang Chen, Shuming Shi, Guangwen Yang, Meiming Shen, Weimin Zheng
396
PBiz: An E-business Model Based on Peer-to-Peer Network Shudong Chen, Zengde Wu, Wei Zhang, Fanyuan Ma
404
P2P Overlay Networks of Constant Degree Guihai Chen, Chengzhong Xu, Haiying Shen, Daoxu Chen
412
An Efficient Contents Discovery Mechanism in Pure P2P Environments In-suk Kim, Yong-hyeog Kang, Young Ik Eom
Distributed Computation for Diffusion Problem in a P2P-Enhanced Computing System Jun Ni, Lili Huang, Tao He, Yongxiang Zhang, Shaowen Wang, Boyd M. Knosp, Chinglong Lin
Applications of Peer to Peer Technology in CERNET Chang-ji Wang, Jian-Ping Wu
420
428
436
PSMI: A JXTA 2.0-Based Infrastructure for P2P Service Management Using Web Service Registries Feng Yang, Shouyi Zhan, Fuxiang Shen
440
CIPS-P2P: A Stable Coordinates-Based Integrated-Paid-Service Peer-to-Peer Infrastructure Yunfei Zhang, Shaolong Li, Changjia Chen, Shu Zhang
446
A Multicast Routing Algorithm for P2P Networks Tingyao Jiang, Aling Zhong
Leveraging Duplicates to Improve File Availability of P2P Storage Systems Min Qu, Yafei Dai, Mingzhong Xiao
452
456
Distributing the Keys into P2P Network Shijie Zhou, Zhiguang Qin, Jinde Liu
460
SemanticPeer: An Ontology-Based P2P Lookup Service Jing Tian, Yafei Dai, Xiaoming Li
464
Authentication and Access Control in P2P Network Yuqing Zhang, Dehua Zhang
468
Methodology Discussion of Grid Peer-Peer Computing Weifen Qu, Qingchun Meng, Chengbing Wei
471
PipeSeeU: A Scalable Peer-to-Peer Multipoint Video Conference System Bo Xie, Yin Liu, Ruimin Shen, Wenyin Liu, Changjun Jiang
475
Session 3: Grid Architectures
Vega Grid: A Computer Systems Approach to Grid Research Zhiwei Xu, Wei Li
RB-GACA: A RBAC Based Grid Access Control Architecture Weizhong Qiang, Hai Jin, Xuanhua Shi, Deqing Zou, Hao Zhang
480
487
GriDE: A Grid-Enabled Development Environment Simon See, Jie Song, Liang Peng, Appie Stoelwinder, Hoon Kang
495
Neo Information Grid Toolkit: Infrastructure of Shanghai Information Grid Xinhua Lin, Qianni Deng, Xinda Lu
503
On-Demand Services Composition and Infrastructure Management Jun Peng, Jie Wang
511
GridDaen: A Data Grid Engine Nong Xiao, Dongsheng Li, Wei Fu, Bin Huang, Xicheng Lu
519
Research on Security Architecture and Protocols of Grid Computing System Xiangming Fang, Shoubao Yang, Leitao Guo, Lei Zhang
529
A Multi-agent System Architecture for End-User Level Grid Monitoring Using Geographic Information Systems (MAGGIS): Architecture and Implementation Shaowen Wang, Anand Padmanabhan, Yan Liu, Ransom Briggs, Jun Ni, Tao He, Boyd M. Knosp, Yasar Onel
536
An Architecture of Game Grid Based on Resource Router Yu Wang, Enhua Tan, Wei Li, Zhiwei Xu
544
Scalable Resource Management and Load Assignment for Grid and Peer-to-Peer Services Xuezheng Liu, Ming Chen, Guangwen Yang, Dingxing Wang
552
Research on the Application of Multi-agent Technology to Spatial Information Grid Yan Ren, Cheng Fang, Honghui Chen, Xueshan Luo
560
An Optimal Method of Diffusion Algorithm for Computational Grid Rong Chen, Yadong Gui, Ji Gao
568
A Reconfigurable High Availability Infrastructure in Cluster for Grid Wen Gao, Xinyu Liu, Lei Wang, Takashi Nanya
576
An Adaptive Information Grid Architecture for Recommendation System M. Lan, W. Zhou
584
Research on Construction of EAI-Oriented Web Service Architecture Xin Peng, Wenyun Zhao, En Ye
592
GridBR: The Challenge of Grid Computing S.R.R. Costa, L.G. Neves, F. Ayres, C.E. Mendonça, R.S.N. de Bragança, F. Gandour, L.V. Ferreira, M.C.A. Costa, N.F.F. Ebecken
601
Autonomous Distributed Service System: Basic Concepts and Evaluation H. Farooq Ahmad, Kashif Iqbal, Hiroki Suguri, Arshad Ali
ShanghaiGrid in Action: The First Stage Projects towards Digital City and City Grid Minglu Li, Hui Liu, Changjun Jiang, Weiqin Tong, Aoying Zhou, Yadong Gui, Hao Zhu, Shui Jiang, Ruonan Rao, Jian Cao, Qianni Deng, Qi Qian, Wei Jin
608
616
Spatial Information Grid – An Agent Framework Yingwei Luo, Xiaolin Wang, Zhuoqun Xu
624
Agent-Based Framework for Grid Computing Zhihuan Zhang, Shuqing Wang
629
A Hierarchical Grid Architecture Based on Computation/Application Metadata Wan-Chun Dou, Juan Sun, Da-Gang Yang, Shi-Jie Cai
633
A Transparent-to-Outside Resource Management Framework for Computational Grid Ye Zhu, Junzhou Luo
637
A Service-Based Hierarchical Architecture for Parallel Computing in Grid Environment Weiqin Tong, Jingbo Ding, Jianquan Tang, Bo Wang, Lizhi Cai
641
A Grid Computing Framework for Large Scale Molecular Dynamics Simulations WenRui Wang, GuoLiang Chen, HuaPing Chen, Shoubao Yang
645
Principle and Framework of Information Grid Evaluation Hui Li, Xiaolin Li, Zhiwei Xu, Ning Yang
649
Manufacturing Grid: Needs, Concept, and Architecture Yushun Fan, Dazhe Zhao, Liqin Zhang, Shuangxi Huang, Bo Liu
653
Developing a Framework to Implement Security in Web Services Fawaz Amin Alvi, Shakeel A. Khoja, Zohra Jabeen
657
Session 4: Grid Middleware and Toolkits
Computing Pool: A Simplified and Practical Computational Grid Model Peng Liu, Yao Shi, San-li Li
661
Formalizing Service Publication and Discovery in Grid Computing Systems Chuliang Weng, Xinda Lu, Qianni Deng
669
An Improved Solution to I/O Support Problems in Wide Area Grid Computing Environments Bin Wang, Ping Chen, Zhuoqun Xu
677
Agora: Grid Community in Vega Grid Hao Wang, Zhiwei Xu, Yili Gong, Wei Li
685
Sophisticated Interaction – A Paradigm on the Grid Xingwu Liu, Zhiwei Xu
692
A Composite-Event-Based Message-Oriented Middleware Pingpeng Yuan, Hai Jin
700
An Integration Architecture for Grid Resources Minglu Li, Feilong Tang, Jian Cao
708
Component-Based Middleware Platform for Grid Computing Jianmin Zhu, Rong Chen, Guangnan Ni, Yuan Liu
716
Grid Gateway: Message-Passing between Separated Cluster Interconnects Wei Cui, Jie Ma, Zhigang Huo
724
A Model for User Management in Grid Computing Environments Bo Chen, Xuebin Chi, Hong Wu GSPD: A Middleware That Supports Publication and Discovery of Grid Services Feilong Tang, Minglu Li, Jian Cao, Qianni Deng, Jiadi Yu, Zhengwei Qi Partially Evaluating Grid Services by DJmix Hongyan Mao, Linpeng Huang, Yongqiang Sun Integrated Binding Service Model for Supporting Both Naming/Trading and Location Services in Inter/Intra-net Environments Chang-Won Jeong, Su-Chong Joo, Sung-Kook Han
732
738
746
754
Personal Grid Running at the Edge of Internet Bingchen Li, Wei Li, Zhiwei Xu
762
Grid Workflow Based on Performance Evaluation Shao-hua Zhang, Yu-jin Wu, Ning Gu, Wei Wang
770
Research on User Programming Environment in Grid Ge He, Donghua Liu, Zhiwei Xu, Lin Li, Shengliang Xu
778
The Delivery and Accounting Middleware in the ShanghaiGrid Ruonan Rao, Baiyan Li, Minglu Li, Jinyuan You
Applying Agent into Web Testing and Evolution Baowen Xu, Lei Xu, Jixiang Jiang
786 794
Experiences on Computational Program Reuse with Service Mechanism Ping Chen, Bin Wang, Guoshi Xu, Zhuoqun Xu
799
Research and Implementation of the Real-Time Middleware in Open System Jian Peng, Jinde Liu, Tao Yang
803
An Object-Oriented Petri Nets Based Integrated Development Environment for Grid-Based Applications Hongyi Shi, Aihua Ren
809
Some Views on Building Computational Grids Infrastructure Bo Dai, Guiran Chang, Wandan Zeng, Jiyue Wen, Qiang Guo
813
Research on Computing Grid Software Architecture Changyun Li, Gansheng Li, Yin Li
817
Research on Integrating Service in Grid Portal Zheng Feng, Shoubao Yang, Shanjiu Long, Dongfeng Chen, Leitao Guo
821
GSRP: An Application-Level Protocol for Grid Environments Zhiqiang Hou, Donghua Liu, Zhiwei Xu, Wei Li
825
Towards a Mobile Service Mechanism in a Grid Environment Weiqin Tong, Jianquan Tang, Liang Jin, Bo Wang, Yuwei Zong
829
Mobile Middleware Based on Distributed Object Song Chen, Shan Wang, Ming-Tian Zhou
833
Session 5: Web Security and Web Services
On the Malicious Participants Problem in Computational Grid Wenguang Chen, Weimin Zheng, Guangwen Yang
839
Certificate Validation Scheme of Open Grid Service Usage XKMS Namje Park, Kiyoung Moon, Sungwon Sohn, Cheehang Park
849
Distributed IDS Tracing Back to Attacking Sources Wu Liu, Hai-Xin Duan, Jian-Ping Wu, Ping Ren, Li-Hua Lu
859
The Study on Mobile Phone-Oriented Application Integration Technology of Web Services Luqun Li, Minglu Li, Xianguo Cui
867
Group Rekeying Algorithm Using Pseudo-random Functions and Modular Reduction Josep Pegueroles, Wang Bin, Miguel Soriano, Francisco Rico-Novella
875
Semantics and Formalizations of Mission-Aware Behavior Trust Model for Grids Minglu Li, Hui Liu, Lei Cao, Jiadi Yu, Ying Li, Qi Qian, Wei Jin
883
Study on a Secure Access Model for the Grid Catalogue Bing Xie, Xiao-Lin Gui, Qing-Jiang Wang
891
Modeling Trust Management System for Grids Baiyan Li, Wensheng Yao, Jinyuan You
899
Avoid Powerful Tampering by Malicious Host Fangyong Hou, Zhiying Wang, Zhen Liu, Yun Liu
907
Secure Grid-Based Mobile Agent Platform by Instance-Oriented Delegation Tianchi Ma, Shanping Li
916
Authenticated Key Exchange Protocol Secure against Offline Dictionary Attack and Server Compromise Seung Bae Park, Moon Seol Kang, Sang Jun Lee
924
StarOTS: An Efficient Distributed Transaction Recovery Mechanism in the CORBA Component Runtime Environment Yi Ren, Jianbo Guan, Yan Jia, Weihong Han, Quanyuan Wu
932
Web Services Testing, the Methodology, and the Implementation of the Automation-Testing Tool Ying Li, Minglu Li, Jiadi Yu
940
Composing Web Services Based on Agent and Workflow Jian Cao, Minglu Li, Shensheng Zhang, Qianni Deng
948
Structured Object-Z Software Specification Language Xiaolei Gao, Huaikou Miao, Yihai Chen
956
Ontology-Based Intelligent Sensing Action in Golog for Web Service Composition Zheng Dong, Cong Qi, Xiao-fei Xu
964
The Design of an Efficient Kerberos Authentication Mechanism Associated with Directory Systems Cheolhyun Kim, Yeijin Lee, Ilyong Chung
972
A Multi-agent Based Architecture for Network Attack Resistant System Jian Li, Guo-yin Zhang, Guo-chang Gu
980
Design and Implementation of Data Mapping Engine Based on Multi-XML Documents Yu Wang, Liping Yu, Feng Jin, Yunfa Hu
984
Research on the Methods of Search and Elimination in Covert Channels Chang-da Wang, Shiguang Ju, Dianchun Guo, Zhen Yang, Wen-yi Zheng Design and Performance of Firewall System Based on Embedded Computing Yuan-ni Guo, Ren-fa Li OGSA Security Authentication Services Hongxia Xie, Fanrong Meng Detecting Identification of a Remote Web Server via Its Behavioral Characteristics Ke-xin Yang, Jiu-bin Ju
988
992 996
1000
Access Control Architecture for Web Services Shijin Yuan, Yunfa Hu
Formalizing Web Service and Modeling Web Service-Based System Based on Object Oriented Petri Net Xiaofeng Tao, Changjun Jiang
1004
1008
Report about Middleware Beibei Fan, Shisheng Zhu, Peijun Lin
1012
Grid Security Gateway on RADIUS and Packet Filter Jing Cao, BingLiang Lou
1017
A Security Policy Implementation Model in Computational GRID Feng Li, Junzhou Luo
1021
An Approach of Building LinuxCluster-Based Grid Services Yu Ce, Xiao Jian, Sun Jizhou
1026
Dynamic E-commerce Security Based on the Web Services Gongxuan Zhang, Guowei Zuo
1030
Standardization of Page Service Using XSLT Based on Grid System Wanjun Zhang, Yi Zeng, Wei Dong, Guoqing Li, Dingsheng Liu
1034
Secure Super-distribution Protocol for Digital Rights Management in Unauthentic Network Environment Zhaofeng Ma, Boqin Feng
1039
X-NIndex: A High Performance Stable and Large XML Document Query Approach and Experience in TOP500 List Data Shaomei Wu, Xuan Li, Zhihui Du
1043
The Analysis of Authorization Mechanisms in the Grid Shiguang Ju, Zhen Yang, Chang-da Wang, Dianchun Guo
1047
Constructing Secure Web Service Based on XML Shaomin Zhang, Baoyi Wang, Lihua Zhou
1051
ECC Based Intrusion Tolerance for Web Security Xianfeng Zhang, Feng Zhang, Zhiguang Qin, Jinde Liu
1055
Design for Reliable Service Aggregation in an Architectural Environment Xiaoli Zhi, Weiqin Tong
The Anatomy of Web Services Hongbing Wang, Yuzhong Qu, Junyuan Xie
1059 1063
Automated Vulnerability Management through Web Services H. T. Tian, L.S. Huang, J.L. Shan, G.L. Chen
1067
Optimizing Java Based Web Services by Partial Evaluation Lin Lin, Linpeng Huang, Yongqiang Sun
1071
An XML Based General Configuration Language: XGCL Huaifeng Qin, Xingshe Zhou
1075
Modification on Kerberos Authentication Protocol in Grid Computing Environment Rong Chen, Yadong Gui, Ji Gao
1079
A Distributed Honeypot System for Grid Security Geng Yang, Chunming Rong, Yunping Dai
1083
Web Security Using Distributed Role Hierarchy Gunhee Lee, Hongjin Yeh, Wonil Kim, Dong-Kyoo Kim
1087
User Authentication Protocol Based on Human Memorable Password and Using ECC Seung Bae Park, Moon Seol Kang, Sang Jun Lee
New Authentication Systems Seung Bae Park, Moon Seol Kang, Sang Jun Lee
1091 1095
Web Proxy Caching Mechanism to Evenly Distribute Transmission Channel in VOD System Backhyun Kim, Iksoo Kim, SeokHoon Kang
1099
Author Index
1103
Synthetic Implementations of Performance Data Collection in Massively Parallel Systems

Chu J. Jong¹ and Arthur B. Maccabe²

¹ School of Information Technology, Illinois State University, IL, USA
[email protected]
² Department of Computer Science, University of New Mexico, NM, USA
[email protected]
Abstract. Most performance tools that run on Massively Parallel (MP) systems do not scale up as the number of nodes increases. We studied the scalability problem of MP system performance tools and proposed a solution: replacing the two-level data collection structure with a hierarchical one. To demonstrate that a hierarchical data collection structure solves the scalability problem, we synthesized an implementation model of performance data collection in MP systems. This paper presents our synthetic implementation results.

Keywords: Data collection, response time, split node, performance knee.
1 Introduction
Complex applications, such as genetic analysis, material simulation, and climate modeling [8], require high performance and large computational resources to generate results. MP systems, built with thousands of processors and huge amounts of memory, meet these applications' needs.¹ How effectively the computational power of an MP system is used depends mainly on the user's knowledge, and sometimes on the compilers or system performance tools. In the past years, developers have made significant efforts to develop parallel performance tools. Recent work showed improvement in many areas, such as reducing overheads [3], improving accuracy [10], minimizing perturbation [9], and increasing convenience [5], but not in the tools' scalability.

¹ ASCI project: the Red had 9,298 Pentium Xeon processors; the Blue-Pacific was composed of 1,464 RS/6000 nodes with 4 CPUs; the Blue-Mountain had 256 MIPS R10000 processors; and the BlueGene/L will have 130,000 advanced microprocessors to achieve 367 teraflops.

We use response time, the time between issuing a user command and receiving a reply, to measure the performance of a tool. The results of performance tools are based on performance data collection and processing. Two main factors contribute to the response time: effective processor utilization and the balanced point [6]. In MP systems, when the workload incurred by workers exceeds the server's balanced point, the server starts thrashing, which causes an exponential drop in processor utilization and an exponential increase in response time. Since most performance tools use a two-level structure (one server and its workers) to collect data, the workload of the server is proportional to the number of workers; when the number of workers reaches the server's limit, thrashing occurs. A multi-level hierarchical data collection structure (one server, workers, and two or more worker-servers) lets the worker-servers share the workload, preventing server thrashing and thus significantly increasing the total number of workers. In theory, the maximum number of workers that a single server can handle without thrashing in a two-level data collection structure is bounded by the ratios of network throughput, memory size, and processing power between the server and the workers; a multi-level hierarchical structure raises this bound, since each worker-server can itself sustain a similar number of workers. Although some systems provide large ratios [4] to accommodate more workers, the ratios are bounded by the system capacities.
2 Previous Work – Response Time Simulation
We built a time-driven event simulator to simulate the response time behavior of both data collection structures. From our simulation results, we concluded that a hierarchical data collection structure eliminates the response time knee (exponential increase) that two-level data collection structures cannot avoid [7].
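As a rough illustration of what "time-driven" means here, the following is a minimal sketch of a time-stepped simulation loop in C; the event set, counts, and timings are invented for the example and are not the paper's actual simulator:

/* Minimal time-driven (time-stepped) simulator sketch: advance a
 * simulated clock in fixed ticks and fire every event whose
 * timestamp has been reached.  All parameters are illustrative. */
#include <stdio.h>

#define TICK_MS 1        /* simulated clock granularity */
#define END_MS  10000    /* simulation horizon          */

typedef struct { long ready_at_ms; int done; } transfer_t;

int main(void) {
    transfer_t xfers[56] = {{0, 0}};
    long clock_ms, pending = 56;
    for (int i = 0; i < 56; i++)          /* stagger arrival times */
        xfers[i].ready_at_ms = 10 + i * 3;

    for (clock_ms = 0; clock_ms <= END_MS && pending > 0; clock_ms += TICK_MS)
        for (int i = 0; i < 56; i++)
            if (!xfers[i].done && xfers[i].ready_at_ms <= clock_ms) {
                xfers[i].done = 1;        /* e.g., a worker's data arrives */
                pending--;
            }
    printf("all transfers complete by t = %ld ms\n", clock_ms);
    return 0;
}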
3 Response Time Implementation
We used a small cluster of 8 Pentium III 500 MHz PCs to construct our implementation model. We wrote MPI implementation programs in C, running on Linux with customized kernel modules. We used a node splitting mechanism to produce a larger virtual node cluster: a virtualization mechanism that splits a physical node into a number of virtual nodes. The basic principle is to divide the resources of a physical node into equivalent slices and group one slice from each divided resource to form a virtual node (processor). Theoretically the split factor can be chosen arbitrarily; in reality, the more virtual processors, the higher the system overheads and the heavier the channel workload. We applied the split to seven PCs; the resulting virtual processors are workers or worker-servers, and the un-split PC is the server. By splitting the seven PCs, we built a virtual processor cluster equivalent to a one-order-of-magnitude scaled-down cluster of 56 single-CPU 500-MHz workers and a one-GHz quad-CPU server connected by a one-gigabit network.
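For instance, assuming a split factor of 8, which is consistent with the 8:1 server-to-worker computation ratio given in the next section, the arithmetic works out as:

\[ 7 \text{ PCs} \times 8 \text{ virtual nodes per PC} = 56 \text{ virtual processors}, \qquad \frac{\text{server CPU}}{\text{worker CPU}} = \frac{500\ \mathrm{MHz}}{500/8\ \mathrm{MHz}} = 8:1. \]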
3.1 System Modules

Our customized kernel modules allow us to have better control over the system resources. Our pseudo application, pseudo data generating/collection, and pseudo data processing allow us to emulate a system without side effects. The modules are described below.

System Structure. All PCs are connected by 100-megabit Ethernet. The server processor is the un-split PC; the worker and worker-server processors are virtual processors. Both workers and worker-servers are evenly distributed among the seven PCs; no two worker-servers reside in the same PC, and no two children of the same parent reside in the same PC either.

System Control Units. Memory Management Unit (MMU): locks a block of memory for processors; it uses paging to handle memory page faults, and all memory requests have to go through it. Processor Management Unit (PMU): controls the CPU cycles consumed by processors; it suspends a process when that process reaches its CPU cycle limit. Perturbation Control Unit (PCU): controls the application perturbation rate.

Implementation Software. Processors: one server processor and many worker and/or worker-server processors, with an 8:1 computation ratio between the server and a worker (or worker-server) processor and an 8:4:1 memory ratio among server, worker-servers, and workers.

Server. Runs on the server processor. Its major functions are to interact with the user, set the system topology, monitor process behaviors, and fulfill the user's requests. The server collects the response time and calculates global statistics, such as data gathering time, data processing time, data processing rate, and channel time.

Worker. Runs on a worker processor. A worker calculates local statistics, such as response time, data collection time, data processing time, data processing rate, and channel time.

Worker-Server. Runs on a worker-server processor. A worker-server calculates non-local statistics, such as worker-server response time, data gathering time, data processing time, data processing rate, and channel time.

Application Process. Runs on all processors; it emulates the computation and I/O behavior of different applications.

Data Processing Process. Invoked by the data generating/collection process after data gathering completes. It processes data by reading and writing every byte at least once. It provides a hook to allow different data access methods to be used, such as sequential, offset, indexed, linked list, or scattered among pages or segments. The sizes of data accesses range from a single byte to several hundred bytes.

Data Generating/Collection Process. Invoked by the performance tool process. It performs three different tasks: data generating, data gathering, and data forwarding. It generates performance data and puts them in the data buffer with other data gathered from its children. After data processing is completed, it sends the result to its parent.
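The "read and write every byte at least once" step, together with the access-method hook, might look like the following sketch; the hook interface and function names are assumptions for illustration, not the paper's actual code:

/* Sketch of a data-processing step that touches every byte of the
 * gathered buffer.  The `next_fn` hook selects the access order; a
 * sequential hook is shown, and it guarantees full coverage.  Other
 * orders (offset, indexed, linked list) would plug in here with
 * their own coverage logic. */
#include <stddef.h>

typedef size_t (*next_fn)(size_t i, size_t len);   /* access-order hook */

static size_t sequential(size_t i, size_t len) { (void)len; return i + 1; }

/* Read and write every byte once, in the order chosen by `next`. */
static void process(unsigned char *buf, size_t len, next_fn next) {
    size_t touched = 0, i = 0;
    while (touched < len) {
        buf[i] = (unsigned char)(buf[i] + 1);      /* read, then write */
        touched++;
        i = next(i, len) % len;
    }
}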
3.2 The Implementations
The two-level structure has two kinds of processes: server and worker; the server can have up to 56 child workers. The hierarchical structure has three kinds of processes: server, worker-server, and worker; the server has seven child worker-servers, and each worker-server can have up to seven child workers. Communications are restricted to processes having a direct parent-child relation. The logical actions of each, sketched in code below, are:

Processor with no parent - server. Get a user request, parse the request, send the data collection command, allocate data processing buffers, wait for all data and then process it, and present the results to the user.

Processor with no children - worker. Wait for the data collection command, allocate data buffers, store data in the buffer, and send the data to its parent.

Processor with both parent and children - worker-server. Wait for the data collection command, allocate a data buffer, send the data collection command to its children, allocate data processing buffers, store data in the buffer, wait for all data and then process it, and send the results to its parent.
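To make the parent-child message flow concrete, here is a minimal sketch of the hierarchical collection logic in Python with mpi4py, standing in for the authors' C/MPI implementation; the rank layout, message tags, and payloads are illustrative assumptions, not the paper's code.

```python
# Run with e.g.: mpiexec -n 15 python collect.py  (assumed layout:
# rank 0 = server, ranks 1..7 = worker-servers, the rest = workers)
from mpi4py import MPI

COLLECT, DATA = 1, 2                   # message tags (assumed)
comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
n_ws = 7                               # worker-servers

def parent_of(r):                      # workers attach to worker-servers round-robin
    return 1 + (r - n_ws - 1) % n_ws

if rank == 0:                                       # server: no parent
    for ws in range(1, n_ws + 1):
        comm.send(None, dest=ws, tag=COLLECT)       # data collection command
    results = [comm.recv(source=ws, tag=DATA) for ws in range(1, n_ws + 1)]
    print("gathered", sum(len(r) for r in results), "bytes")
elif rank <= n_ws:                                  # worker-server: parent and children
    comm.recv(source=0, tag=COLLECT)
    children = [r for r in range(n_ws + 1, size) if parent_of(r) == rank]
    for c in children:
        comm.send(None, dest=c, tag=COLLECT)
    buf = bytearray(b"local-data")                  # its own generated data
    for c in children:
        buf += comm.recv(source=c, tag=DATA)
    buf = bytes(b ^ 0 for b in buf)                 # touch every byte at least once
    comm.send(buf, dest=0, tag=DATA)
else:                                               # worker: no children
    comm.recv(source=parent_of(rank), tag=COLLECT)
    comm.send(b"worker-data", dest=parent_of(rank), tag=DATA)
```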
4 Implementation Results
The implementations start with one server and seven workers (worker-servers) and increment by one worker per physical node at a time (to eliminate the impact of unbalanced communication delay) until the total number of workers reaches 56.
Fig. 1. Server Response Time of Two-Level Structure
Fig. 2. Server Data Processing Rate of Two-Level Structure
4.1 Server of Two-Level Structure
Figure 1 shows the response time of the two-level data collection structure. The horizontal scale is the number of workers and the vertical scale is the response time in milliseconds. Labels indicate the data size in kilobytes collected by each worker; the total amount of data gathered by the server equals the data size multiplied by the number of workers. Figure 2 shows the data processing rates. The 256K rate drops from 2183 bytes/ms at 48 workers to 355 bytes/ms at 50 workers, and the 384K rate drops from 2187 bytes/ms at 32 workers to 406 bytes/ms at 33 workers. These rate drops indicate thrashing.
Fig. 3. Average Worker Server Response Time
4.2 Worker-Server of Hierarchical Structure
Figure 3 shows the average worker-server response times, the sums of the worker-server data gathering times and the worker-server data processing times, for the hierarchical data collection structure. The worker-server data gathering time is the longest interval between a worker-server issuing a data collection command and receiving all data packets. The average worker-server data processing times are shown in Table 1. Except for one value, 0.76 ms/KByte at the 8K data size, most of them are in the range between 0.30 ms/KByte and 0.42 ms/KByte. If either the gathering time or the processing time starts to increase non-linearly, the response time becomes non-linear. Table 1 indicates that none of the worker-servers were thrashing; the larger values were caused by delays from the Linux process management. Thrashing actually occurred in the virtual node subsystem, due to TCP/IP stack and buffer contention.
4.3 Server of Hierarchical Structure
Figure 4 shows the normalized response times of the hierarchical data collection structure, where normalization means filtering outliers caused by the thrashing virtual node subsystem and replacing them with values generated from algorithms based on the data sizes. The graphs correspond to data sizes of 4, 8, 16, 32, 64, 128, 256, 384, and 512 kilobytes. The overall data collected by all processors equals the data size multiplied by the total number of processors. Figure 5 shows the corresponding data processing rates. All rates are in the range between 2700 bytes/ms and 3300 bytes/ms. They show no evidence of the server being saturated, and thrashing did not occur.
Fig. 4. Normalized Server Response Time
Fig. 5. Server Data Processing Rate
5 Discussion
In Figure 1, the 256K and 384K graphs start to increase non-linearly when the total data collected by the server reaches 12 megabytes. The same graphs in Figure 2 show a non-linear decrease at a total data size of 12 megabytes. From the theory of thrashing [2] [1], when all main memory is in use, swapping and waiting occur. Twelve megabytes of main memory was the upper limit we set for the server.
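A quick back-of-the-envelope check, using only the figures cited above, confirms where the knee falls:

```python
# total data at the server = per-worker data size x number of workers
MB = 1024  # kilobytes per megabyte
for size_kb, workers in [(256, 48), (384, 32)]:
    total_mb = size_kb * workers / MB
    print(f"{size_kb}K x {workers} workers = {total_mb:.0f} MB")  # both hit 12 MB
```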
In Figure 3, some response times start to increase non-linearly beyond 50 processors, regardless of the data size. However, from Table 1 we know that all data processing times (except the 8K case) are in the range between 0.30 ms/KByte and 0.42 ms/KByte. That means that none of the worker-servers reached their thrashing point; the large response time delays must be caused by their data gathering times. Although node splitting creates enough virtual nodes, it causes the sub-node system to thrash once the increasing workload reaches the system's balance point. The only way to prevent sub-node thrashing is not to increase the split factor but to add physical nodes (PCs) to the system. The normalized hierarchical structure graphs in Figure 4 show that the response times increase linearly in the total number of processors. The overall data size does not affect the response time even after it reaches the memory limit. Figure 5 shows flat server data processing rates across all data sizes. That means that the server never reaches its balance point and thrashing has been totally eliminated. We also noticed that larger data sizes have slightly higher data processing rates: the constant per-packet overhead favors larger data collection sizes.
6 Conclusion and Future Work
The results of our implementations show that the hierarchical data collection structure eliminates the response time knee that a two-level data collection structure cannot avoid. Scalability is improved by orders of magnitude at the cost of a minor network delay penalty per level. Both structures collect the same overall amount of data; however, the amounts of data gathered by the two servers are not the same. In fact, the server in the hierarchical data collection structure receives quality (pre-processed) data rather than sheer quantity. We are currently porting our implementation model to a larger cluster. We are also working on instrumentation and performance data presentation. We plan to enhance our virtualization mechanisms to provide a better development environment. Our long-term goal is to produce a system that helps developers develop better MP system performance tools.
References

1. W. C. L. A. Alderson and B. Randell. Thrashing in a multiprogrammed paging system. In Operating Systems Techniques, Hoare and Perrott (eds.), pages 152–167, 1972.
2. P. J. Denning. The working set model for program behavior. Communications of the ACM, (5):323–333, May 1968.
3. G. Eisenhauer, B. Schroeder, K. Schwan, V. Martin, and J. Vetter. DataExchange: High Performance Communication in Distributed Laboratories. Ninth International Conference on Parallel and Distributed Computing and Systems, October 1997.
4. D. A. Reed et al. Scalable Performance Analysis: The Pablo Performance Analysis Environment. IEEE Scalable Parallel Libraries Conference, IEEE Service Center, Piscataway, N.J., 1993.
5. W. Gu, G. Eisenhauer, and K. Schwan. Falcon: On-line Monitoring and Steering of Parallel Programs. Concurrency: Practice and Experience, 1995.
6. P. B. Hansen. Operating System Principles. Prentice-Hall, Inc., Englewood Cliffs, New Jersey, 1973.
7. C. J. Jong and A. B. Maccabe. A simulator for performance tools in massively parallel systems. In Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, June 2003.
8. T. Kindler, K. Schwan, D. Silva, M. Trauner, and F. Alyea. A Parallel Spectral Model for Atmospheric Transport Processes. Concurrency: Practice and Experience, 8:639–666, November 1996.
9. C. Liao, M. Martonosi, and D. W. Clark. Performance Monitoring in a Myrinet-Connected Shrimp Cluster. In Symposium on Parallel and Distributed Tools, pages 21–29, New York, N.Y., August 1998. The Association for Computing Machinery.
10. R. L. Ribler, J. S. Vetter, H. Simitci, and D. A. Reed. Autopilot: Adaptive Control of Distributed Applications. Proceedings of the 7th IEEE Symposium on High-Performance Distributed Computing, Chicago, IL, July 1998.
GMA+ – A GMA-Based Monitoring and Management Infrastructure for Grid Chuan He, Zhihui Du, and Sanli Li Grid Research Group, Department of Computer Science and Technology, Tsinghua University, Beijing, China
[email protected]
Abstract. In this paper, we investigate many monitoring and management tools in Grid and other distributed systems. After examining the advantages and disadvantages of GMA (Grid Monitoring Architecture), we propose a new monitoring and management schema for the Grid, GMA+. GMA+ adds a closed-loop feedback structure to GMA, provides interfaces that match Grid Service standards for all its components, and defines metadata for monitoring events. It is a novel infrastructure for Grid monitoring and management with high modularity, usability and scalability.
1 Introduction
Nowadays, neither single PCs nor even HPCs can satisfy the explosive demand for high performance computing. Various heterogeneous resources, distributed in physical location, need to be connected by high-speed networks to solve some huge problems. Grid [1] is the extension of traditional distributed computing technology: it weaves together resources in LANs or WANs and constructs dynamic VOs (Virtual Organizations) to implement secure and coordinated sharing of resources among persons, organizations and resources. The target of Grid monitoring and management is to monitor resources in the Grid for fault detection, performance analysis, performance tuning, load balancing and scheduling. Compared with traditional distributed systems, the Grid is more complicated in architecture, larger in scale and more widely distributed in physical location. So it is all the more urgent to construct a monitoring and management system with high performance, high scalability and high stability to do automatic management in a Grid environment.
2 Related Works
There have already been many monitoring tools for traditional distributed systems. However, these existing tools cannot completely meet the needs of Grid monitoring and management.
NetLogger [2], Paradyn [3], AIMS [4], Gloperf [5] and SPI [6] collect data from distributed systems and then analyze the data using special tools. These tools generate monitoring data in their own unique formats, which constrains their cooperation with other systems. Globus HBM [7] periodically sends a "Heart Beat" to a centralized collection system in order to do fault detection in a distributed environment. The centralized design of HBM makes it difficult to extend; moreover, HBM relies on local information, so it cannot forecast or manage from a global point of view. NWS [8] and DSRT [9] only monitor some specific resources (network bandwidth, CPU) and are not adaptable to new resources. RMON [10] and Autopilot [11] implement intelligent and dynamic control by using techniques like reactive circuits. JAMM [12] is another similar system, but it can only be used in Java programs. The shortcoming of these systems is that they are bound to given programming languages, so they cannot cooperate with other systems. SNMP-based tools are not suitable for Grid monitoring and management, which runs on WANs and requires high performance. In addition, these toolkits do not provide application-level monitoring.
3 GMA
GMA [13] is an architecture proposed and supported by the GGF. It aims to establish standards for Grid monitoring and to make existing systems interoperate.
3.1 Architecture
GMA adopts a producer-consumer model for Grid monitoring. All monitoring information is represented as time-stamped events for storage and transfer.
Fig. 1. Grid Monitoring Architecture
Fig. 1 shows the architecture of GMA, which consists of four key components:

1. Sensors. A sensor is any program that can generate time-stamped monitoring events; for example, there are sensors that monitor CPU usage, memory usage, and network usage.
2. Producers. Producers are the sources of performance events. Producers send events to consumers through their interfaces, and one producer can provide several independent interfaces for sending different events.
3. Consumers. Applications that receive and consume monitoring events. Consumers can run on the same computer as the producer, but can also run on remote computers.
4. Directory Service. The directory service in the Grid acts like a registry: producers and consumers publish their location information in it. Once a producer and a consumer have found each other through the directory service, the monitoring procedure runs only between the producer and the consumer and involves the directory service no further.

All monitoring information in GMA consists of events. Another important contribution of GMA is that it abstracts three event passing patterns between producers and consumers: the publishing/subscribing pattern, the query/response pattern and the notification pattern. The protocol between producers and consumers is not defined in GMA; it can be implemented in many ways, for example using SOAP/HTTP, LDAP or XML/BXXP. Moreover, one protocol can be used for control and a distinct one for transmission under different situations.
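As a rough illustration of these roles, the following sketch mimics the query/response pattern with an in-process directory. GMA specifies roles and event-passing patterns, not a concrete API, so every name here is an assumption.

```python
import time

directory = {}                        # stands in for the GMA directory service

class Producer:
    def __init__(self, name, sensor):
        self.sensor = sensor
        directory[name] = self        # publish location information

    def query(self, event_type):      # query/response: one-shot pull
        return {"type": event_type, "timestamp": time.time(),
                "value": self.sensor()}

class Consumer:
    def lookup(self, name):           # find the producer via the directory;
        return directory[name]        # afterwards, talk to it directly

cpu_producer = Producer("host1/cpu", sensor=lambda: 0.42)
consumer = Consumer()
event = consumer.lookup("host1/cpu").query("cpu.usage")
print(event)
```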
3.2 Current Status of GMA and Its Limitations
R-GMA [14] makes use of a relational database to offer the directory service instead of LDAP, but it is not suitable for huge data volumes, especially bursts of information. GridMon [15] is the first prototype of Grid monitoring in China. Although it partially applies GMA ideas, it is not entirely based on the GMA model: in fact, GridMon uses LDAP as the directory service and only realizes the query/response pattern. It defines a scalable format for monitoring events, yet it is too simple to serve a complex system like the Grid. GMA presents a highly scalable and flexible architecture for Grid monitoring. However, there are still some limitations in GMA that should be improved in the following ways:

1. In GMA, the directory service should be universal enough to support LDAP, relational databases and P2P distributed directory storage rather than being restricted to LDAP.
2. There is no definition of the metadata model of monitoring information in the GMA system. Using XML Schema is a good way to define it and make the system more suitable for monitoring various kinds of new resources.
3. Producers and consumers in the GMA system need to provide standard external interfaces. Moreover, considering that the Grid is a wide-area system, those interfaces should be encapsulated as Web Services or the Grid Services proposed in OGSA [16].
4. GMA only offers monitoring functions but does not integrate monitoring with management. In a practical system, GMA ought to incorporate a structure like closed-loop feedback in order to combine monitoring and management.
4 GMA+
4.1 Architecture
To address the shortcomings of GMA mentioned above, we propose an improved Grid monitoring system based on GMA: GMA+. The architecture of GMA+ is shown in Fig. 2. We classify all events in GMA+ into two types: M-Events for monitoring events and C-Events for controlling events. We also add two new components, the controller and the actuator.
Fig. 2. The architecture of GMA+
The controller consists of a consumer, a producer and its controlling logic. The controller can analyze various events, generate new M-Events and C-Events, and finally send the corresponding C-Events to the actuator. The actuator receives the events and adjusts the status of Grid resources dynamically. Typical functions of the controller are validating monitoring events, classifying events, decision making, etc. In GMA+, many controllers can be connected as a chain, so that sensor -> producer -> controller ... -> controller -> actuator forms a local closed-loop structure. Fig. 3 shows three different types of controller: controller1 filters M-Events; controller2 analyzes the received C-Events and M-Events and generates C-Events for controlling; controller3 is a more complicated one for decision making.
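A minimal sketch of such a controller chain might look as follows; the event fields and thresholds are hypothetical, chosen only to make the filter -> analyze -> actuate flow concrete.

```python
def filter_controller(events):                    # controller1: filters M-Events
    return [e for e in events if e["severity"] >= 2]

def analyze_controller(events):                   # controller2: M-Events -> C-Events
    c_events = []
    for e in events:
        if e["type"] == "cpu.load" and e["value"] > 0.9:
            c_events.append({"kind": "C", "action": "alarm", "host": e["host"]})
    return c_events

def actuator(c_events):                           # adjusts resource status / reports
    for c in c_events:
        print(f"actuator: {c['action']} on {c['host']}")

m_events = [{"type": "cpu.load", "value": 0.95, "severity": 3, "host": "n1"},
            {"type": "cpu.load", "value": 0.20, "severity": 1, "host": "n2"}]
actuator(analyze_controller(filter_controller(m_events)))
```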
Fig. 3. Different Controllers in GMA+

4.2 Components in GMA+
The interfaces of all GMA+ components follow the Grid Service Specification in OGSA. Grid Services are more suitable for use in a Grid environment than Web Services, because Web Services are mostly persistent, while services in the Grid are mainly volatile. Another important reason is that Grid Services have soft lifetime management to release useless resources automatically, a problem Web Services cannot solve easily. In a word, adopting Grid Services in Grid monitoring and management has many benefits, such as standardization, ease of distribution, ease of cooperation, and independence of platforms and programming languages.
4.2.1 Sensors
We implement four different types of sensors in GMA+: host sensors, network sensors, process sensors and application sensors. In a real Grid, there may be more kinds of sensors than we have defined. For extensibility, we have defined the metadata model for all events in GMA+, encapsulated sensors with Grid Service interfaces and defined a programming model for developing new sensors. All of this guarantees that new sensors can be added to GMA+ easily.

4.2.2 Consumers
GMA+ offers many kinds of querying and analyzing tools as consumers. They mainly comprise:

1. Storage archives. Although monitoring events in the Grid are mainly useful for a short period of time, a storage archive is still needed to keep those events for later use. We implement a distributed storage system to archive the monitoring events in GMA+ using LDAP and an RDBMS.
2. Real-time monitoring tools, used to monitor real-time data.
3. Analyzing tools, used to analyze the data kept in the storage archives.
4. Query and analysis portal. GMA+ also provides a web portal for query and analysis. In GMA+, we define several types of users with different privileges for monitoring and management.
4.2.3 Controller and Actuator
In GMA+, we implement just two simple controllers, C-Controller and A-Controller, to demonstrate the usage of controllers and actuators. The C-Controller is used to classify monitoring events. The A-Controller analyzes the monitoring events of computers (the computing resources in the Grid); if the CPU load or temperature exceeds its threshold, the A-Controller notifies the corresponding actuator to report an alarm. Fig. 4 shows the collaboration sequence of sensors, controllers and actuators.
Fig. 4. Sequence diagram of sensors, controllers and actuators
4.3 Directory Service
There are several traditional directory systems, such as LDAP, DNS, DEC's GNS [17], Intentional Naming Systems [18] and Active Names [19]. Nevertheless, they meet similar problems when used directly in the Grid:

1. These directory services cannot store a great amount of data; as a result they cannot keep up with the growth of the Grid.
2. These services are not suitable for Grid monitoring data, which needs to be updated frequently; they are only fit for frequently asked queries.
3. These services are not able to support queries involving multiple objects, for example the "join" operation in SQL.

Recently, more methods have been put forward to solve the directory service problem, such as using an RDBMS [20], but there are as yet no mature solutions. As all components in GMA+ are Grid Services, the directory service in GMA+ is used to index Grid Services. We use WSIL to implement the directory. WSIL has a lightweight, distributed model: the XML documents that describe Web Services can be stored at any location, and independent documents can link to each other through URLs. Every component in GMA+ (producer, consumer, controller, sensor, and actuator) contains three parts: Grid Service interfaces, the service entity and a service description in WSIL. The distributed WSIL documents in the network constitute a logical directory service. Fig. 5 shows the architecture of the directory service in GMA+.
Fig. 5. Directory Service using WSIL
4.4 Metadata and Protocol
Monitoring and management events are used throughout GMA+, so it is important to define the metadata of events with high scalability. We use XML Schema to define the metadata for the various events. The metadata of events does not rely on the transfer protocol. In GMA+, we use SOAP as the transfer protocol. SOAP is a lightweight protocol based on HTTP and XML; it supports RPC over HTTP and has many benefits, such as high scalability and being easy to understand and deploy.
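For illustration, a time-stamped monitoring event serialized to XML for SOAP transport might look like the following sketch; the element and attribute names are assumptions, not the actual GMA+ schema.

```python
import time
import xml.etree.ElementTree as ET

def make_event(event_type, source, value):
    # build a hypothetical M-Event document: type and class as attributes,
    # timestamp/source/value as child elements
    ev = ET.Element("event", {"type": event_type, "class": "M-Event"})
    ET.SubElement(ev, "timestamp").text = str(time.time())
    ET.SubElement(ev, "source").text = source
    ET.SubElement(ev, "value").text = str(value)
    return ET.tostring(ev, encoding="unicode")

print(make_event("cpu.load", "node1.grid.example.org", 0.87))
```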
5 Next Step Work
In the future, we plan to add more adapters and sensors to GMA+ in order to monitor and manage more Grid resources. We also plan to implement more complicated controllers and actuators with more powerful algorithms for intelligent control and management in the Grid.
References

1. I. Foster and C. Kesselman, "The Grid: Blueprint for a New Computing Infrastructure", Morgan Kaufmann Publishers, San Francisco, CA, 1999.
2. Brian Tierney, William Johnston, Brian Crowley, Gary Hoo, Chris Brooks, and Dan Gunter, "The NetLogger Methodology for High Performance Distributed Systems Performance Analysis", Proc. of the IEEE High Performance Distributed Computing Conference (HPDC-7), July 1998.
3. Barton P. Miller, Jonathan M. Cargille, R. Bruce Irvin, Krishna Kunchithapadam, Mark D. Callaghan, Jeffrey K. Hollingsworth, Karen L. Karavanic, and Tia Newhall, "The Paradyn Parallel Performance Measurement Tool", IEEE Computer, 28(11), November 1995, pp. 37–46.
4. Jerry C. Yan, "Performance Tuning with AIMS—An Automated Instrumentation and Monitoring System for Multicomputers", Proc. of the Twenty-Seventh Hawaii Int. Conf. on System Sciences, Hawaii, January 1994.
5. Craig A. Lee, Rich Wolski, Ian Foster, Carl Kesselman, and James Stepanek, "A Network Performance Tool for Grid Environments", Proc. of SC'99, Portland, Oregon, Nov. 13–19, 1999.
6. Devesh Bhatt, Rakesh Jha, Todd Steeves, Rashmi Bhatt, and David Wills, "SPI: An Instrumentation Development Environment for Parallel/Distributed Systems", Proc. of the Int. Parallel Processing Symposium, April 1995.
7. http://www.globus.org/hbm/heartbeat_spec.html
8. R. Wolski et al., "The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing", Journal of Future Generation Systems, 1998.
9. H. Chu and K. Nahrstedt, "CPU Service Classes for Multimedia Applications", Proc. of IEEE Multimedia Computing and Applications, Florence, Italy, June 1999.
10. Clifford W. Mercer and Ragunathan Rajkumar, "Interactive Interface and RT-Mach Support for Monitoring and Controlling Resource Management", Proceedings of the Real-Time Technology and Applications Symposium, Chicago, Illinois, May 15–17, 1995, pp. 134–139.
11. J. S. Vetter and D. A. Reed, "Real-time Monitoring, Adaptive Control and Interactive Steering of Computational Grids", The International Journal of High Performance Computing Applications, 2000.
12. Chris Brooks, Brian Tierney, and William Johnston, "Java Agents for Distributed System Management", LBNL Technical Report, Dec. 1997.
13. B. Tierney, R. Aydt, D. Gunter et al., "A Grid Monitoring Architecture (2002)", GGF Performance Working Group, http://www-didc.lbl.gov/GGF-PERF/GMA-WG/papers/GWD-GP-16-1.pdf
14. Fisher, S., "Relational Grid Monitoring Architecture Package", http://hepunx.rl.ac.uk/grid/wp3/releases.html
15. Li Cha, Zhiwei Xu, Guozhang Lin, "A Grid Monitoring System Using LDAP", Department of Computer Science and Technology, Beijing Institute of Technology; Journal of Computer Research and Development, August 2002.
16. I. Foster, C. Kesselman, J. Nick, S. Tuecke, "The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration", Open Grid Service Infrastructure WG, Global Grid Forum, June 22, 2002.
17. Lampson, B. W., "Designing a Global Name Service", in 4th ACM Symposium on Principles of Distributed Computing, August 1986.
18. Adjie-Winoto, W., Schwartz, E., Balakrishnan, H., and Lilley, J., "The Design and Implementation of an Intentional Naming System", in Proceedings of the 17th ACM Symposium on Operating System Principles, December 1999.
19. Vahdat, A., Dahlin, M., Anderson, T., and Aggarwal, A., "Active Names: Flexible Location and Transport of Wide-Area Resources", in USENIX Symposium on Internet Technology and Systems, Oct. 1999.
20. E. F. Codd, "A Relational Model of Data for Large Shared Data Banks", CACM, 13(6), June 1970.
A Parallel Branch–and–Bound Algorithm for Computing Optimal Task Graph Schedules Udo Hönig and Wolfram Schiffmann FernUniversität Hagen, Lehrgebiet Technische Informatik I, 58084 Hagen, Germany {Udo.Hoenig|Wolfram.Schiffmann}@FernUni-Hagen.de http://www.informatik.ti1.fernuni-hagen.de/
Abstract. In order to harness the power of parallel computing we must first find appropriate algorithms that consist of a collection of (sub)tasks and then schedule these tasks onto processing elements that communicate data with each other by means of a network. In this paper, we consider task graphs that take into account both computation and communication costs. For a homogeneous computing system with a fixed number of processing elements we compute all the schedules with minimum schedule length. Our main contribution consists of parallelizing an informed search algorithm for calculating optimal schedules based on a Branch–and–Bound approach. While most recently proposed heuristics use task duplication, our parallel algorithm finds all optimal solutions under the assumption that each task is only assigned to one processing element. Compared to exhaustive search algorithms this parallel informed search can compute optimal schedules for more complex task graphs. In the paper, the influence of parameters on the efficiency of the parallel implementation is discussed and optimal schedule lengths for 1700 randomly generated task graphs are compared to the solutions of a widely used heuristic.
1 Introduction
A task graph is a directed acyclic graph (DAG) that describes the dependencies between the parts of a parallel program [8]. In order to execute it on a cluster or grid computer, its tasks must be assigned to the available processing elements. Most often, the objective of solving this task graph scheduling problem is to minimize the overall computing time. The time that a task needs to compute an output using the results from preceding tasks corresponds to the working load of the processing element to which that task is assigned; it is denoted by the node weight of the task node. The cost of communication between two tasks is specified as an edge weight; if both tasks are assigned to the same processor, the communication cost is zero. Task graph scheduling comprises two subproblems: one is to assign the tasks to the processors, the other is the optimal sequencing of the tasks. In this paper, we suppose a homogeneous computing environment, e.g. a cluster computer. But even if we assume identical processing elements, the
problem of determining an optimal schedule has been proven to be NP-complete, apart from some restricted cases [8]. Thus, most researchers use heuristic approaches to solve the problem for reasonable task graph sizes. Three categories can be distinguished: list-based, clustering-based and duplication-based heuristics. List-based heuristics assign priority levels to the tasks and map the highest-priority task to the best fitting processing element [10]. Clustering-based heuristics group heavily communicating tasks and assign them to the same processing element in order to reduce the overall communication overhead [1]. Duplication-based heuristics also decrease the amount of communication, while simultaneously increasing the amount of (redundant) computation; duplication has been combined with both list-based [2] and cluster-based approaches [9]. In order to evaluate the quality of all these heuristics in a unified manner it would be desirable to compare their resulting schedule lengths to the optimal values. The parallel Branch–and–Bound algorithm proposed in this paper can be used to create a benchmark suite for this purpose. Of course, this is only feasible for task graphs of moderate size (e.g. fewer than 30 tasks). Usually, there are multiple optimal schedules that form the solution set of the task graph scheduling problem; all of them are characterized by the same minimal schedule length. To compute these optimal schedules we have to search over all possible assignments and sequences of the tasks. The simplest algorithm to compute the set of optimal schedules enumerates all possible solutions and stores only the best ones. But even for a small number of tasks the number of solutions is enormous. If we want to obtain the set of optimal schedules in an acceptable period of time and with maintainable memory requirements, we have to devise a more skillful algorithm. The basic idea for shortening an exhaustive search is to perform a structured or informed search that reduces the state space. Thus, the number of schedules that have to be investigated is much smaller than the number of possible processor assignments multiplied by the number of possible sequences. In this way, an informed search can manage more complex task graphs than exhaustive search strategies. Informed search is often based on an A* algorithm ([3], [5]). In this paper, we present a Branch–and–Bound approach and its implementation on a parallel virtual machine [4]. The paper is organized as follows. In the next section, the concepts of the Branch–and–Bound algorithm are explained. The third section is concerned with the parallelization of that algorithm: it describes how the workload is partitioned and how load balancing is achieved. In the fourth section, we present results and discuss the influence of various parameters on the efficiency of the parallel implementation.
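Before turning to the algorithm, a small concrete illustration of the task graph model may help (this is not code from the paper). The sketch below represents node and edge weights and computes each task's static b-level, i.e. the longest node-weight path from the task to an exit task, which the search in Section 2 uses in its pruning bound; the example graph is invented.

```python
from functools import lru_cache

weights = {"a": 2, "b": 3, "c": 1, "d": 4}                # node weights (computation)
succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}  # DAG edges
comm = {("a", "b"): 1, ("a", "c"): 2, ("b", "d"): 1, ("c", "d"): 3}  # edge weights

@lru_cache(maxsize=None)
def static_blevel(task):
    # node weights only; communication costs are ignored for the static b-level
    return weights[task] + max((static_blevel(s) for s in succ[task]), default=0)

print({t: static_blevel(t) for t in weights})  # {'a': 9, 'b': 7, 'c': 5, 'd': 4}
```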
2 Branch–and–Bound Algorithm
If we want to shorten the time to compute the set of optimal solutions, we are not allowed to consider every valid schedule (assignment of the tasks to the processing elements plus determination of the tasks’ starting times). Instead,
we have to divide the space of possible schedules into subspaces that contain partially similar solutions. These solutions have in common that a certain number of tasks is already scheduled in the same way. The corresponding subspace contains all the schedules that descend from the partial schedule but differ in the scheduling of the remaining tasks. Each partial schedule can be represented by a node in a decision tree. Starting from the root, which represents the empty schedule, all possible schedules can be constructed. At any point of this construction process, we can identify a set of tasks that are ready to run and a set of idle processing elements to which those tasks can be assigned. Each conceivable combination produces a new node in the decision tree (Branch). Given an estimate T_est for the total schedule length, we can exclude most of the nodes that are created in the Branch part of the algorithm. This estimate can be initialized by any heuristic; here, we used the heuristic of Kasahara and Narita [6]. After the creation of a new partial schedule (node of the decision tree), we can estimate a lower bound T_lb on its runtime by means of its current schedule length and the static b-level values of the remaining (yet unscheduled) tasks: the lower bound is the sum of the partial schedule length and the maximum of the static b-level values. If T_lb is greater than T_est, we can exclude the newly created node from further investigation. This deletion of a node (Bound) avoids the evaluation of all the schedules that depend on the corresponding partial schedule (a subspace of the search space) and thereby accelerates the computation of the solution set for the task graph problem. As long as a node's T_lb is lower than or equal to the current T_est, we continue to expand this node in a depth-first manner. When all the tasks of the graph are scheduled, a leaf of the decision tree is reached, yielding a complete schedule of length T_s. If T_s equals T_est, we add the corresponding schedule to the set of best schedules; if T_s is smaller than T_est, we clear the set of best schedules, store the new (complete) schedule in it and set T_est to T_s. Then we continue with the next partial schedule. This pruning scheme is further enhanced by a selection heuristic that controls the order in which new nodes are created. By means of this priority-controlled breadth-first search we lower the pruning threshold for the decision tree as early as possible. Like the Bound phase, this procedure further reduces the total number of evaluations. Proceeding as described above, all possible schedules are checked, and at the end of the search procedure the current set of best schedules represents the optimal schedules for the task graph problem. A compact sketch of this search is given below.
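The following self-contained sketch compresses the sequential search. It ignores communication delays, seeds T_est with the trivial serial schedule instead of the Kasahara–Narita heuristic, and uses a conservative variant of the pruning bound; it illustrates the technique, and is not the authors' implementation.

```python
def bnb_schedule(tasks, weights, succ, n_procs):
    preds = {t: [p for p in tasks if t in succ[p]] for t in tasks}
    bl = {}

    def blevel(t):  # static b-level: longest node-weight path to an exit task
        if t not in bl:
            bl[t] = weights[t] + max((blevel(s) for s in succ[t]), default=0)
        return bl[t]

    t_est = sum(weights.values())   # initial estimate: trivial serial schedule
    best = []                       # set of best schedules found so far

    def search(done, finish, free, length):
        nonlocal t_est, best
        if len(done) == len(tasks):              # leaf: complete schedule
            if length < t_est:
                t_est, best = length, [finish]
            elif length == t_est:
                best.append(finish)
            return
        rest = [t for t in tasks if t not in done]
        # conservative variant of the paper's bound (partial length + max b-level)
        lb = max(length, min(free) + max(blevel(t) for t in rest))
        if lb > t_est:                           # Bound: discard the whole subtree
            return
        ready = [t for t in rest if all(p in done for p in preds[t])]
        for t in ready:                          # Branch: every (task, processor)
            for p in range(n_procs):
                start = max(free[p],
                            max((finish[q] for q in preds[t]), default=0))
                nf, nfree = dict(finish), list(free)
                nf[t] = nfree[p] = start + weights[t]
                search(done | {t}, nf, nfree, max(length, nf[t]))

    search(frozenset(), {}, [0] * n_procs, 0)
    return t_est, best

w = {"a": 2, "b": 3, "c": 1, "d": 4}
s = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bnb_schedule(list(w), w, s, n_procs=2)[0])   # optimal length: 9
```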
3 The Parallel Algorithm
The parallelisation of the sequential Branch–and–Bound algorithm requires a further subdivision of the search space into disjoint subspaces, which can be assigned to the processing units.
As already described in Section 2, every inner node of the decision tree represents a partial schedule and every leaf node corresponds to a complete schedule. The branching rule used by the algorithm guarantees that the sons of a node represent different partial schedules. Since the schedules are generated along the timeline, a later reunification of the subtrees rooted at these sons is impossible. Therefore two subtrees of the decision tree always represent disjoint subspaces of the search space if neither of their roots is an ancestor of the other. Another consequence is that every part of the search space can be unambiguously identified by its root node. In order to achieve a balanced assignment of the computation to the available processing units, the algorithm generates a workpool containing a certain number of subtree roots. This workpool is managed by a master process, which controls the distribution of the tasks to the slave processes. The workpool is created by means of a breadth-first search that terminates when a user-defined number of elements has been collected in the workpool. The nodes are numbered by the ordinal numbers of the nodes' permutations; these ordinal numbers allow an unambiguous identification of the nodes. By means of the root's ordinal number, it is possible to rebuild the corresponding subtree. For that reason, the only information that needs to be stored in the workpool is the ordinal numbers of the subtrees' roots, which helps to keep the required memory small. The parallel algorithm can be partitioned into three phases, called Initialisation, Computation and Finalisation. During the Initialisation phase, the master launches the slave processes and splits the whole task into a number of smaller subtasks; the number of subtasks depends on the size of the workpool, which is specified by the user. The Computation phase begins as soon as the master has assigned a subtask to every slave¹. Then the master waits until it receives a message from one of the slaves indicating either that the slave has completely analysed a given subtree or that it has found an improved schedule. In the latter case, the master only stores the broadcast schedule length and the process id of the sending slave. In the former case, it sends a new subtask to the slave if the workpool is not empty; otherwise the slave stays idle. As soon as all subtasks are processed and all slaves are idle, the Finalisation phase takes place. The master informs all slaves about the end of the computation process. This step is necessary because messages are sent asynchronously and it is not guaranteed that every message indicating a new best solution has already been sent and received. Every slave receiving the finalisation message compares the global best solution to its own recent results and possibly deletes its own suboptimal results. If a slave recognizes that its own current solution is better than the global best solution, it sends an appropriate broadcast to the master and to all other slaves. Then the slave sends the master an acknowledgement to indicate that it finished the adjustment successfully. The master waits until it receives an acknowledgement
¹ It is required that the workpool contains more tasks than slaves.
from every slave. Then it requests the complete schedule from the last slave that reported to have found the best solution so far. Additionally it requests some bookkeeping information of all slaves. The slaves terminate after sending their replies. Finally, the master creates the output-file and terminates as well.
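The master/slave workpool logic can be sketched as follows, using threads and a shared queue in place of PVM message passing (the paper's implementation uses PVM); explore() stands for a slave's subtree search and is a placeholder, and subtree roots are represented by plain ordinal numbers.

```python
import queue, threading

def run_workpool(subtree_roots, n_slaves, explore):
    workpool = queue.Queue()
    for ordinal in subtree_roots:        # only ordinal numbers are stored
        workpool.put(ordinal)
    best = {"length": float("inf"), "schedule": None}
    lock = threading.Lock()

    def slave():
        while True:
            try:
                root = workpool.get_nowait()
            except queue.Empty:
                return                    # no work left: slave goes idle
            length, sched = explore(root, best["length"])
            with lock:                    # report an improved schedule
                if length < best["length"]:
                    best.update(length=length, schedule=sched)

    threads = [threading.Thread(target=slave) for _ in range(n_slaves)]
    for t in threads: t.start()
    for t in threads: t.join()            # finalisation: all slaves done
    return best
```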
4 Results
To achieve an efficient informed search algorithm, there are some constraints that should be analysed before starting the computation of larger problems. We found that some of the most important factors influencing the search speed are independent of the given task graph: they are properties of the algorithm itself, such as the size of the workpool and the number of processing elements involved in the search process. Additionally, we demonstrate the suitability of our approach for the evaluation of scheduling heuristics. For this purpose, we analyse the heuristic of Kasahara and Narita [6] using a test bench of approximately 1700 task graphs.
4.1 Size of the Workpool
After its creation, the workpool covers the complete search space, subdivided into a user-defined number of subspaces. The size of a subspace is determined by the number of schedules it contains: if the number of subspaces is increased, the size of every subspace is reduced. In this way the workpool's size determines the granularity of the search space and the number of schedules one slave has to analyse. Figure 1 shows how the runtime for different task graph problems depends on the size of the workpool. On the left side, we see how an increase of the workpool size can reduce the runtime on a parallel computing system with 30 processing elements; a workpool size between approximately 300 and 1200 elements (partial schedules) is clearly useful in this case. In contrast, the right side of Figure 1 shows that for light-weight scheduling problems the situation worsens when using a parallel implementation: the minimal runtime is reached with the sequential implementation and no subtasks at all. The relative slowdown increases with the workpool's size and can clearly exceed 100%. Since it is difficult or even impossible to estimate the computational complexity of a schedule, the workpool size has to be chosen carefully in order to minimize the overall runtime.
4.2 Number of Processing Elements
Usually, the maximum speedup of a parallel program equals the number of available processing elements. In order to evaluate the scaling of the parallel implementation we used three task graphs that had sequential runtimes of approximately one minute each. The workpool size was set to 6000.
Fig. 1. Influence of the workpool's size on the runtime. Sequential runtime in the left diagram: 50–250 s; in the right diagram: 0.01–1.2 s
Figure 2 shows the relation between the speedup factor and the number of processing elements used. For smaller numbers of processing elements, the speedup factor increases almost linearly (for one example even a slight superlinear speedup can be recognized). If the number of processing elements grows beyond 8, the speedup deviates from linear and begins to approach a saturation limit of approximately 16.
Fig. 2. Influence of the number of processing elements on the speedup-factor
4.3 Analysing Scheduling Heuristics
The computation of optimal schedules is a rather time-consuming process which is only possible for small to medium-size task graphs. Although most of the
proposed scheduling heuristics aim at large task graphs, this subsection shows that the efficiency of those heuristics can also be analysed by considering smaller graphs, for which the optimal schedule lengths can be computed by the proposed algorithm. A test with approximately 1700 computed optimal schedules was carried out to evaluate the heuristic's efficiency. The graphs were generated randomly under multiple settings of the DAGs' properties, e.g. the connection density between the nodes. Using such a widely spread variation of task graph properties, we can be sure that the results are independent of the chosen task graph set. In this way, our approach enables scientists to evaluate and compare their heuristics' results more objectively. To demonstrate this new opportunity, we use the well-known heuristic of Kasahara and Narita, described in [6].
Table 1 shows the deviation of this heuristic's results from the optimal schedule lengths. The heuristic finds a solution with the optimal schedule length for 57.67% of the investigated task graphs. For the other task graphs, the observed deviation from the optimal schedule length is rather low (< 10%) in 83.41% of the cases; only 2.86% of all solutions deviate by more than 25%. The performance of heuristics is usually evaluated by comparing their solutions with those of another (well known) heuristic. We argue that it would be more meaningful to use the deviations from the optimal solutions introduced above. For this purpose we will soon release a benchmark suite that provides the optimal solutions for 36,000 randomly created task graph problems covering a wide range of different graph properties.
5 Conclusion
In this paper we presented a parallel implementation of a Branch–and–Bound algorithm for computing optimal task graph schedules. By means of parallelization the optimization process is accelerated, and thus a huge number of test cases can be investigated within a reasonable period of time. The runtime needed for the computation of an optimal schedule is highly dependent on the workpool's size and the number of processing elements available for computation. In order to reduce the runtime, the size of the workpool has to be chosen carefully. A nearly linear speedup can be achieved, provided that an appropriate workpool size is used.
By means of the parallel Branch–and–Bound algorithm, the optimal schedules for a benchmark suite comprising 1700 task graphs were computed. This allows a more objective evaluation of scheduling heuristics than comparisons between heuristics. We evaluated the solutions of the heuristic of Kasahara and Narita [6] by comparing the corresponding schedule lengths with the optimal schedule lengths of all 1700 test cases. The authors' future work includes the release of a test bench which will provide a collection of 36,000 task graph problems together with their optimal schedule lengths. This benchmark suite will enable researchers to compare the performance of their heuristics with the actually best solutions. Acknowledgement. The authors would like to thank Mrs. Sigrid Preuss, who contributed some of the presented results from her diploma thesis.
References
1. Aguilar, J., Gelenbe, E.: Task Assignment and Transaction Clustering Heuristics for Distributed Systems, Information Sciences, Vol. 97, No. 1 & 2, pp. 199–219, 1997
2. Bansal, S., Kumar, P., Singh, K.: An improved duplication strategy for scheduling precedence constrained graphs in multiprocessor systems, IEEE Transactions on Parallel and Distributed Systems, Vol. 14, No. 6, June 2003
3. Dogan, A., Özgüner, F.: Optimal and suboptimal reliable scheduling of precedence-constrained tasks in heterogeneous distributed computing, International Workshop on Parallel Processing, p. 429, Toronto, August 21–24, 2000
4. Geist, A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, R., Sunderam, V.: PVM 3 Users Guide and Reference Manual, Oak Ridge National Laboratory, Tennessee, 1993
5. Kafil, M., Ahmad, I.: Optimal task assignment in heterogeneous distributed computing systems, IEEE Concurrency: Parallel, Distributed and Mobile Computing, pp. 42–51, July 1998
6. Kasahara, H., Narita, S.: Practical Multiprocessor Scheduling Algorithms for Efficient Parallel Processing. IEEE Transactions on Computers, Vol. C-33, No. 11, pp. 1023–1029, Nov. 1984
7. Kohler, W.H., Steiglitz, K.: Enumerative and Iterative Computational Approaches. In: Coffman, E.G. (ed.): Computer and Job-Shop Scheduling Theory. John Wiley & Sons, New York, 1976
8. Kwok, Y.-K., Ahmad, I.: Static scheduling algorithms for allocating directed task graphs to multiprocessors. ACM Computing Surveys, Vol. 31, No. 4, 1999, pp. 406–471
9. Park, C.-I., Choe, T.Y.: An optimal scheduling algorithm based on task duplication, IEEE Transactions on Computers, Vol. 51, No. 4, April 2002
10. Radulescu, A., van Gemund, A.J.C.: Low-Cost Task Scheduling for Distributed-Memory Machines, IEEE Transactions on Parallel and Distributed Systems, Vol. 13, No. 6, June 2002
Selection and Advanced Reservation of Backup Resources for High Availability Service in Computational Grid* Chunjiang Li, Nong Xiao, and Xuejun Yang School of Computer, National University of Defense Technology, Changsha, 410073 China, +86 731 4575984
[email protected]
Abstract. Resource redundancy is a primary way to improve availability for applications in a computational grid. How to select backup resources for an application during the resource allocation phase, and how to make advanced reservations for backup resources, are challenging issues for a grid environment that provides a high availability service. In this paper, we propose a backup resource selection algorithm based on resource clustering, and then devise several policies for advanced reservation of backup resources. With this algorithm and these policies, the backup resource management module in grid middleware can provide the resource backup service more efficiently and cost-effectively. The algorithm and policies can be implemented on the GARA system, making the QoS architecture more powerful and practical for the computational grid.
1 Introduction
Grid computing [1] is a kind of distributed supercomputing in which geographically distributed computational and data resources are coordinated for solving large-scale problems. The resources in the grid environment are wide-area distributed, heterogeneous in nature, and owned by different individuals or organizations, which makes the grid a more variable and unreliable computing environment. The most common failures include machine faults, in which hosts go down, and network faults, where links go down. When some resources fail, the applications using such resources have to stall, waiting for the recovery of the failed resources or migration to other resources. Ways to reduce the stall time of applications are critical for grid middleware to provide a high availability service. Resource backup is such a method: in the resource allocation phase, redundant resources are allocated as backup resources for the application; when some resources fail, the application's tasks running on the failed resources can migrate to the backup resources,

* This work is supported by the National Science Foundation of China under Grant No.60203016 and No.69933030, and the National High Technology Development 863 Program of China under Grant No.2002AA131010.
without asking the global resource manager to reallocate resources, which is time-consuming in the grid. An application in a computational grid usually uses a large amount of resources, and the resources usually differ from each other in type, hardware/software architecture, and performance. So selecting backup resources for such an application is more complex than ever. The backup resources allocated to the application may not be used at all while the application is running. Making advanced reservations for all of these backup resources during this phase makes it likely that resources will be wasted, which makes the application less cost-effective. So it is necessary to design multiple policies for advanced reservation of backup resources. In this paper, we first propose a backup resource selection algorithm based on resource clustering. This algorithm simplifies the selection process by dividing the resources used by the application into subsets; the resources in each subset share some similarity and can share backup resources. We then devise several policies for advanced reservation of backup resources. These policies can be implemented with the GARA system [2]. This paper is organized as follows. The resource backup process is described in Section 2. In Section 3, the selection algorithm for backup resources based on resource clustering is introduced. In Section 4, the policies for advanced reservation of backup resources are presented, and the architecture of the policy engine for resource backup is briefly discussed. Conclusions are drawn in Section 5.
2 Resource Backup
Resource backup [3] in a single computer system, like a cluster or a server, is often done by hardware component redundancy. In a computational grid, however, as the resources belong to different administrative domains, the grid middleware cannot rearrange hardware components for redundancy. The only way to back up resources is to allocate redundant resources to the applications that need high availability. Here, we call the resources allocated to the application, on which the application is running, primary resources, and the redundant resources backup resources. The resource allocation process in the grid includes two phases [4]: resource discovery and reservation. For example, when a user submits an application with a resource requirement described in RSL [5] to the computational grid, the global scheduler first analyzes the resource requirement, performs resource discovery using the grid information system, and computes a primary resource list; it then tries to obtain reservations for these resources. If the application needs a high availability service, the scheduler can allocate redundant resources in these phases, keeping advanced reservations of some resources as backup. Then, during execution, when the resources on which some of the application's tasks are running fail, those tasks can migrate to the backup resources and continue to run without a reallocation process. This reduces the stall time of the application, increasing its availability. Usually, an application in the grid does not explicitly declare which resources should be allocated as backup resources when it is submitted; the backup resource management module should select backup resources according to the resource requirement and
availability requirement of the application. So a backup resource selection algorithm is absolutely necessary. Furthermore, the backup resources may not be used during the running process; in order to avoid wasting resources, flexible advanced reservation policies are also necessary. In this paper, we first present a backup resource selection algorithm based on resource clustering, and then design some policies for the advanced reservation of backup resources. They are very valuable for providing high availability services in a computational grid.
3 Selection Algorithm

3.1 Definitions
Before resource clustering can proceed, we give the following definitions of the relations between resources.

Definition 1 (Substitution Relation). If the task running on resource r_i can also run on resource r_j, and r_j can satisfy the performance requirement of the task, then we call r_i substitutable by r_j, denoted Sub(r_i, r_j).

Definition 2 (Similarity Relation). If two resources r_i and r_j can substitute each other, i.e. Sub(r_i, r_j) and Sub(r_j, r_i), then we call r_i and r_j similar resources, denoted Sim(r_i, r_j).

Definition 3 (Complete Resource Set). A resource set R is a complete resource set if Sim(r_i, r_j) holds for all r_i, r_j in R.

Definition 4 (Similarity Relation of Resource Sets). For complete resource sets R_1 and R_2, if for all r_i in R_1 and r_j in R_2 we have Sub(r_i, r_j) and Sub(r_j, r_i), then we call R_1 and R_2 similar resource sets, denoted Sim(R_1, R_2).
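As a sketch, Definitions 1 and 2 can be expressed as predicates over resource descriptions; the attributes used here (OS compatibility, performance) are illustrative assumptions about what substitutability would check.

```python
def sub(ri, rj, min_perf):
    """Sub(ri, rj): the task running on ri can also run on rj, and rj still
    meets the task's performance requirement (Definition 1).
    task_compatible_os models, per resource, which OSes its task accepts."""
    return (rj["os"] in ri["task_compatible_os"]
            and rj["perf"] >= min_perf)

def sim(ri, rj, min_perf):
    """Sim(ri, rj): ri and rj can substitute each other (Definition 2)."""
    return sub(ri, rj, min_perf) and sub(rj, ri, min_perf)

linux_ws = {"os": "linux", "perf": 1.0, "task_compatible_os": {"linux", "windows"}}
win_ws = {"os": "windows", "perf": 1.0, "task_compatible_os": {"linux", "windows"}}
print(sim(linux_ws, win_ws, min_perf=0.8))   # True: the Java-task example below
```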
3.2 Resource Clustering
It is obvious that the resources in a complete resource set can share backup resources. The primary resources allocated to an application can be clustered into several complete resource sets based on the similarity relation; we can then choose backup resources for each set. The clustering includes two steps: inner-domain clustering and inter-domain clustering. First we examine the determination of the similarity relation between two resources.

Determination of the Similarity Relation. The similarity relation can be determined at two levels: the physical level and the logical level. At the physical level, homogeneous resources allocated to an application are obviously similar. At the logical level, for two resources r_i and r_j allocated to an application, if the tasks running on r_i can also run on r_j and vice versa, and the application's performance can still be satisfied, then r_i and r_j are similar resources. For example, consider a grid application GA whose tasks are programmed in Java, and two workstations that can run these tasks, one a Linux workstation and the other a Windows workstation. If the tasks running on these two resources can be exchanged without degradation of performance, then the two workstations are similar computing resources for the application, although they are heterogeneous computing resources.
Inner-Domain Clustering. It is common that quite a few resources in the primary set belong to one administrative domain, for example a cluster or a mainframe. Usually, the resources in a domain are homogeneous; we suppose this is true for all administrative domains. The inner-domain clustering is then easy to perform. The primary resource set can be denoted as R = R_1 ∪ R_2 ∪ ... ∪ R_m, where the resources come from m domains and the resources in each domain form a complete resource set; R_i is the resource set from the i-th domain. If the resources coming from a domain are heterogeneous, we can divide the domain into sub-domains and keep the resources in each sub-domain homogeneous.

Inter-Domain Clustering. In order to cluster the primary resources of the application into complete resource sets, application-level clustering must be performed. Inter-domain clustering is based on the similarity between domains, which can be determined by the following rule: for r_i in R_1 and r_j in R_2, if Sim(r_i, r_j) then Sim(R_1, R_2). We use a matrix S to describe the similarity relation between domains: S_ij = 1 if Sim(R_i, R_j), otherwise S_ij = 0. The algorithm for inter-domain clustering is given in Fig. 1.
Fig. 1. Inter Domain Resources Clustering
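Since Fig. 1 itself is not reproduced here, the following is a plausible sketch of inter-domain clustering over the similarity matrix S: domains are grouped into clusters of mutually similar complete resource sets, assuming Sim behaves as an equivalence between such sets.

```python
def cluster_domains(S):
    """Group domain indices into clusters using S[i][j] = 1 iff Sim(R_i, R_j)."""
    n = len(S)
    unassigned, clusters = set(range(n)), []
    while unassigned:
        i = unassigned.pop()
        group = {i} | {j for j in unassigned if S[i][j]}
        unassigned -= group
        clusters.append(sorted(group))
    return clusters

S = [[1, 1, 0],
     [1, 1, 0],
     [0, 0, 1]]
print(cluster_domains(S))   # [[0, 1], [2]]: domains 0 and 1 can share backups
```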
3.3 Backup Resource Selection for a Resource Set
Availability Measurement. For selecting backup resources, it is necessary to devise a measurement method for the availability of a resource set. In this paper, for simplicity, we use an availability measurement model based on a probabilistic model [6]. Suppose the primary resource set of an application is R = {r_1, r_2, ..., r_n}; only when all the resources in it are available can the application run smoothly, otherwise the application stalls. Let A(R) be the availability of this resource set, and let the failure rate of r_i be f_i; then

A(R) = (1 − f_1)(1 − f_2) ··· (1 − f_n).
Suppose the availability demand of this application is α; if A(R) < α, then we need to allocate backup resources for it. Suppose we have selected backup resources for this resource set, and the expanded resource set is R'; the availability of R' can be defined as

A(R') = P{at least n resources in R' are available}.

If we add one backup resource b to R, denoted R' = R ∪ {b}, then

A(R') = ∏_{i=1}^{n} (1 − f_i) + (1 − f_b) Σ_{j=1}^{n} f_j ∏_{i≠j} (1 − f_i),

since the application survives either when all n primary resources are available, or when exactly one primary resource fails and the backup is available.
If we add more than one backup resource, the computation of A(R') becomes very difficult, so for simplicity we use an approximate method: we suppose the failure rate of each resource in R equals the highest one, denoted f_max. Then the availability of the resource set R is A(R) = (1 − f_max)^n. In selecting backup resources for this resource set, the failure rate of a backup resource should not exceed f_max; suppose we have chosen m backup resources, then

A(R') = Σ_{k=n}^{n+m} C(n+m, k) (1 − f_max)^k f_max^{n+m−k}.
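As a worked illustration of this approximation, the following sketch computes A(R') for n primary resources and m backups under the stated assumption that every failure rate is bounded by f_max; the function name is ours, not the paper's.

```python
# Worked illustration of the approximation: with n primary resources,
# m backups, and every failure rate bounded by f_max, the application
# survives if at least n of the n+m resources are available.
from math import comb

def availability(n, m, f_max):
    """P{at least n of n+m i.i.d. resources are up}, failure rate f_max."""
    return sum(comb(n + m, k) * (1 - f_max) ** k * f_max ** (n + m - k)
               for k in range(n, n + m + 1))

print(availability(4, 0, 0.05))   # ~0.8145 with no backups
print(availability(4, 2, 0.05))   # ~0.9978 with two backups
```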
Backup Resource Selection Algorithm. Based on the above discussion, we propose the algorithm for selecting backup resources for a complete resource set R, shown in Fig. 2.
Fig. 2. Backup Resources Selection Algorithm for a Complete Resource Set
In this algorithm, each iteration chooses only one backup resource for the complete resource set; this determines the minimum number of backup resources for it. In practice, we can choose more than one resource at each iteration to improve the efficiency of the algorithm.
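A minimal sketch of this greedy loop, reusing the availability() helper from the earlier sketch, might look as follows; the candidate-list format and names are assumptions made for illustration.

```python
# Minimal sketch of the greedy selection loop behind Fig. 2.

def select_backups(n, f_max, alpha, candidates):
    """Pick a minimal list of backups raising availability to alpha.

    candidates: iterable of (resource_id, failure_rate) pairs.
    """
    usable = sorted((c for c in candidates if c[1] <= f_max),
                    key=lambda c: c[1])          # most reliable first
    chosen = []
    while availability(n, len(chosen), f_max) < alpha:
        if not usable:
            raise RuntimeError("availability demand cannot be met")
        chosen.append(usable.pop(0))             # one backup per iteration
    return chosen

print(select_backups(4, 0.05, 0.99,
                     [("r5", 0.02), ("r6", 0.04), ("r7", 0.08)]))
# [('r5', 0.02), ('r6', 0.04)]  -- availability rises above 0.99
```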
3.4 Backup Resource Selection Algorithm Based on Resource Clustering
Now we can describe the whole backup resource selection algorithm based on resource clustering, as shown in Fig. 3:
Fig. 3. Backup Resource Selection Algorithm based on Resource Clustering
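Tying the two phases of Fig. 3 together, a sketch of the whole procedure under the same assumptions as the previous sketches could be the following; the data shapes are illustrative.

```python
# Sketch of Fig. 3: cluster the primary resources into complete
# resource sets, then select backups per set. Reuses cluster_domains(),
# select_backups() and availability() from the earlier sketches.

def select_all_backups(domains, M, alpha, candidate_pool):
    """domains[i]: list of (resource_id, failure_rate) primaries of domain i."""
    plan = []
    pool = list(candidate_pool)
    for members in cluster_domains(M):
        primaries = [r for d in members for r in domains[d]]
        f_max = max(f for _, f in primaries)     # highest primary failure rate
        chosen = select_backups(len(primaries), f_max, alpha, pool)
        for r in chosen:
            pool.remove(r)                       # a backup serves one set here
        plan.append((members, chosen))
    return plan
```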
4 Policies for Advanced Reservation of Backup Resources
The allocation of backup resources is fulfilled by advanced reservation, in which a reservation is requested before the resource is needed. The GARA system, based on the well-known grid middleware Globus Toolkit [7,8], can process advanced reservations for multiple kinds of resources such as CPU slots, network bandwidth and storage capacity. But the GARA system lacks flexible policies for advanced reservation. In order to reduce the waste of resources, we designed several policies for the advanced reservation of backup resources; these policies can be implemented on the GARA system, providing a flexible resource backup mechanism. In the following, we introduce the idea of each policy briefly. Here, we use BRL(GA) to denote the list of backup resources for application GA, the resources in which are selected by the selection algorithm proposed above.
4.1 Policies
Totally Advanced Reserved (TAR). When allocating backup resources, make advanced reservations for all of the resources in BRL(GA); that is,
only when all the backup resources have been reserved in advance can the application begin to run.

Partially Advanced Reserved (PAR). In the advanced reservation stage, make advanced reservations only for some of the resources in BRL(GA), which form the backup resource set. During the running process, if the reserved resources are not enough for failure recovery, then make advanced reservations for other backup resources in BRL(GA).

Compensate for Risk (CR). To reduce the number of backup resources in the whole grid environment, different applications can share their backup resources. In this policy, the backup resources that have been reserved in advance are assembled into a resource pool, like an insurance mechanism in society. For this policy, however, the time management of the advanced reservations is critical, because backup resources for different applications very often have different reservation periods.

Delayed Advanced Reservation (DAR). In this policy, before the application starts, none of the resources in BRL(GA) is reserved in advance. Only when a resource failure occurs does the resource reservation agent begin to reserve the backup resources in the backup resource list of the application. The number of resources reserved can be determined according to the failure condition and the availability demand of the application.
Fig. 4. The Architecture of Policy Engine
No-Backup (NB). For completeness, we also regard the method that prepares no backup resources for the application as a backup policy; that is, never make advanced reservations of resources for backup. When failure
occurs, the job management module in the grid reallocates resources for the failed tasks.
4.2 Policy Engine
It is obvious that each policy has benefits and drawbacks, and the backup resource management module in a computational grid should never use only one policy to serve all applications; undoubtedly, a policy engine should be implemented for the resource backup service. Fig. 4 depicts the architecture of the policy engine for the resource backup service of a computational grid. There are three critical modules in the policy engine: MLBR is the module that manages the lists of backup resources for applications, MARR is the module that manages the advanced reserved resources in the whole environment, and TMARR is the time management module for advanced reserved resources. All these modules can call the GARA API to control the advanced reservation of resources. The policy engine can thus provide a flexible resource backup service for the applications.
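To make the division of labor concrete, the following sketch shows how a policy engine might dispatch the reservation step per policy; reserve() stands in for a wrapper around the GARA reservation call, and this interface is an assumption made for illustration, not GARA's actual API.

```python
# Sketch of a per-policy reservation dispatch; reserve() is an assumed
# wrapper around the GARA reservation call, not GARA's actual API.

def reserve_backups(policy, brl, reserve, fraction=0.5):
    """Apply one of the Sect. 4.1 policies to backup resource list brl."""
    if policy == "TAR":                  # reserve everything up front
        return [reserve(r) for r in brl]
    if policy == "PAR":                  # reserve only a leading fraction
        k = int(len(brl) * fraction)
        return [reserve(r) for r in brl[:k]]
    if policy == "CR":                   # draw from a shared insurance pool
        return ["pool:" + str(r) for r in brl]
    if policy in ("DAR", "NB"):          # nothing reserved before start
        return []
    raise ValueError("unknown policy: " + policy)
```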
5 Conclusion
Backup resource management for providing a high-availability service in a computational grid faces two critical issues: backup resource discovery and advanced reservation. In this paper we proposed a backup resource selection algorithm and devised several policies for advanced reservation. This algorithm and these policies can provide a flexible resource backup service for the applications in a computational grid.
References

1. Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers (1999)
2. Simoni, A.: End-to-End Quality of Service for High-End Applications. PhD thesis, University of Chicago (2001)
3. Hwang, K., Xu, Z.: Scalable Parallel Computing. McGraw-Hill Companies Inc. (1998)
4. Czajkowski, K., Foster, I., Karonis, N., Kesselman, C., Martin, S., Smith, W., Tuecke, S.: A resource management architecture for metacomputing systems. In: Proceedings of the IPPS/SPDP '98 Workshop on Job Scheduling Strategies for Parallel Processing (1998)
5. The Globus Resource Specification Language RSL v1.0. http://www.globus.org/gram/rsl_spec1.htm
6. Sathaye, A., Ramani, S., Trivedi, K.: Availability models in practice. In: Proceedings of the Int. Workshop on Fault-Tolerant Control and Computing (FTCC-1) (2000)
7. Foster, I., Kesselman, C.: Globus: A metacomputing infrastructure toolkit. International Journal of Supercomputing Applications 11 (1997) 115–128
8. The Globus Project. http://www.globus.org
An Online Scheduling Algorithm for Grid Computing Systems

Hak Du Kim 1 and Jin Suk Kim 2*

1 Electronics and Telecommunications Research Institute, Taejon, Korea
2 School of Computer Science, University of Seoul, Seoul, Korea
* Corresponding author: Jin Suk Kim
[email protected]
Abstract. Since the problem of scheduling independent jobs on heterogeneous computational resources is known to be NP-complete [4], an approximation or heuristic algorithm is highly desirable. The Grid is an example of a heterogeneous parallel computer system, and many researchers have proposed heuristic scheduling algorithms for it [1], [8], [9], [10]. In this paper, we propose a new on-line heuristic scheduling algorithm and show by extensive simulation that it performs better than previous scheduling algorithms.
1 Introduction

A Grid computing system is a system which has various machines to execute a set of tasks. We need high-performance Grid computing systems in the fields of natural science and engineering for large-scale simulation. In this paper, we propose a scheduling algorithm which assigns tasks to machines in a heterogeneous Grid computing system. The scheduling algorithm determines the execution order of the tasks which will be assigned to machines. Since the problem of allocating independent jobs to heterogeneous computational resources is known to be NP-complete [4], an approximation or heuristic algorithm is highly desirable. In the scheduling algorithms, we consider that the tasks arrive at the system randomly. We assume that the scheduling algorithms are nonpreemptive, i.e., tasks must run to completion once they start, and that the tasks have no deadlines. All the tasks are independent, i.e., there is no synchronization or communication among the tasks. In the on-line mode, a task is assigned to a machine as soon as it arrives at the scheduling system.
2 Related Works

The scheduling problem has already been investigated by several researchers [1]. In MET (Minimum Execution Time), the scheduling algorithm is developed to minimize execution time, i.e., the algorithm assigns a task to the machine which has the least
amount of execution time. Since this algorithm does not consider the ready times of the machines, it causes load imbalance among the machines. The algorithm calculates only the minimum among the m machine execution times and then assigns the task to the selected machine; the time to find the machine with the minimum execution time is O(m). MCT (Minimum Completion Time) assigns a task to the machine which has the minimum completion time, i.e., the algorithm gets the completion time on each machine by adding the begin time and the execution time, and calculates the minimum among the m machine completion times. The algorithm assigns the task to the selected machine; therefore, the time to find the machine with the minimum completion time is O(m). KPB (K-Percent Best) first finds the (k/100)·m best machines based on the execution time of the task, calculates the minimum among the completion times of the selected machines, and then assigns the task to the machine with that minimum. The time to get the subset of machines is O(m log m) because the m execution times must be sorted, and the time to determine the machine with the minimum completion time is O(m); the overall time complexity of KPB is O(m log m).
3 A Scheduling Algorithm

The execution time e_ij denotes the amount of time taken to execute task t_i on machine m_j [1]. The completion time c_ij denotes the time at which machine m_j completes task t_i. Let the begin time of t_i on m_j be b_ij; from the above definitions we can see that c_ij = b_ij + e_ij. T = {t_1, t_2, …, t_n} is defined as the set of tasks, n is the number of tasks, and m is the number of machines. In this paper, we use the makespan as the performance metric for scheduling algorithms; the makespan is the maximum completion time when all tasks are scheduled. Figure 1 shows the proposed scheduling algorithm MECT (Minimum Execution Completion Time). The inputs of the scheduling algorithm are a task t_i and the set of execution times taken to execute t_i on the machines. In step I, MECT finds the maximum begin time among the begin times of all machines. In step II, it finds a subset of machines M' such that c_ij does not exceed this maximum begin time for each machine m_j in M'. In step III, if M' is not empty, MECT selects the machine in M' which has the minimum execution time for task t_i; otherwise, MECT selects the machine in M which has the minimum completion time. Finally, MECT returns the index k of the selected machine. Here we compute the time complexity of MECT: the time taken to get the maximum begin time is O(m), the time to find the subset M' is also O(m), and in step III we can get k in O(m). Therefore, the overall time complexity of MECT is O(m).
Fig. 1. A Scheduling Algorithm MECT
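A compact sketch of MECT as described above is given below; begin[j] is machine m_j's ready time and exec_time[j] the task's execution time on m_j. The names are illustrative, and Fig. 1 remains the authoritative statement of the algorithm.

```python
# Compact sketch of the MECT steps I-III described in the text.

def mect(begin, exec_time):
    m = len(begin)
    completion = [begin[j] + exec_time[j] for j in range(m)]
    beta = max(begin)                                    # step I
    m1 = [j for j in range(m) if completion[j] <= beta]  # step II
    if m1:                                               # step III
        return min(m1, key=lambda j: exec_time[j])       # min execution time
    return min(range(m), key=lambda j: completion[j])    # fall back to MCT

# Machine 2 can finish (at 3+4=7) before the busiest machine even
# starts (9), so it is chosen by execution time rather than by MCT.
print(mect(begin=[0, 9, 3], exec_time=[5, 2, 4]))   # -> 2
```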
4 Simulation Results

In this section, we describe a simulation program written with SimJava, which is used for discrete-event simulations [6]. In this simulation, we assume that the execution time of each task on each machine is known prior to execution; this assumption is commonly used when studying scheduling algorithms for heterogeneous computing systems [11]. We use a task-machine matrix which holds the execution times; Figure 2 shows an example. For instance, the 3rd row represents the execution times of the third task on each machine. To simulate the scheduling algorithms in various situations, many studies use the task-machine matrix consistency model [11]. We say a task-machine matrix is consistent if, whenever a machine executes some task faster than another machine, it executes all tasks faster than that machine. We say a task-machine matrix is inconsistent when one machine executes some tasks faster than another machine but other tasks slower. A task-machine matrix is said to be semi-consistent if some columns are consistent and other columns are inconsistent. Figure 2 represents an inconsistent task-machine matrix with 10 tasks and 5 machines; in this matrix, one machine executes a given task faster than another machine yet executes some other task slower. In this simulation, we made task-machine matrices with 1,000 tasks and 20 machines and set the arrival rate of tasks to 100. Figure 3 shows the average makespans of the scheduling algorithms in the inconsistent model; the machine heterogeneity is varied from 10 to 120 and the task heterogeneity is 3000. We ran 50 tests for each case. It can be noted that MECT outperforms the previous scheduling algorithms.
Fig. 2. A 10×5 Task-Machine Matrix
Fig. 3. The makespans for scheduling algorithms in the inconsistent model
Fig. 4. The makespans for scheduling algorithms in semi-consistent model
Fig. 5. The makespans for scheduling algorithms in consistent model
Figures 4 and 5 show the simulation results in the semi-consistent model and the consistent model, respectively. In these figures, we can see that MECT competes with MCT; note that the performance of MET is lower than that of the other three algorithms. In the last simulation, the machine heterogeneity is 20 and the task heterogeneity is varied from 500 to 3000. Figure 6 compares the scheduling algorithms based on makespan; we can see that MECT outperforms the previous scheduling algorithms when the task heterogeneity is high.
Fig. 6. The makespans for scheduling algorithms in consistent model
5 Conclusion In this paper, we propose a new scheduling algorithm MECT for heterogeneous Grid computing systems. The proposed scheduling algorithm is a kind of on-line scheduling algorithm. We show that MECT has better performance than the traditional scheduling algorithms especially when the heterogeneity is high.
References

[1] M. Maheswaran, S. Ali, H. J. Siegel, D. Hensgen, and R. F. Freund, "Dynamic Matching and Scheduling of a Class of Independent Tasks onto Heterogeneous Computing Systems," Proc. of the 8th Heterogeneous Computing Workshop, pp. 30-44, April 1999.
[2] I. Foster, C. Kesselman, and S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," Journal of High-Performance Computing Applications, vol. 15, no. 3, pp. 200-222, 2001.
[3] T. D. Braun, H. J. Siegel, and N. Beck, "A Comparison of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto Heterogeneous Distributed Computing Systems," Journal of Parallel and Distributed Computing, vol. 61, pp. 810-837, 2001.
[4] O. H. Ibarra and C. E. Kim, "Heuristic Algorithms for Scheduling Independent Tasks on Nonidentical Processors," Journal of the ACM, vol. 24, no. 2, pp. 280-289, April 1977.
[5] M. Pinedo, Scheduling: Theory, Algorithms, and Systems, Prentice Hall, NJ, 1995.
[6] F. Howell and R. McNab, "SimJava: A Discrete Event Simulation Package for Java with Applications in Computer Systems Modelling," Proc. of the 1st International Conference on Web-based Modelling and Simulation, January 1998.
[7] A. A. Khokhar, V. K. Prasanna, M. E. Shaaban, and C. L. Wang, "Heterogeneous Computing: Challenges and Opportunities," IEEE Computer, vol. 26, pp. 18-27, June 1993.
[8] R. Buyya, J. Giddy, and D. Abramson, "An Evaluation of Economy-based Resource Trading and Scheduling on Computational Power Grids for Parameter Sweep Applications," Proc. of the 2nd International Workshop on Active Middleware Services, August 2000.
[9] H. Barada, S. M. Sait, and N. Baig, "Task Matching and Scheduling in Heterogeneous Systems using Simulated Evolution," Proc. of the 15th Parallel and Distributed Processing Symposium, pp. 875-882, 2001.
[10] B. Hamidzadeh, L. Y. Kit, and D. J. Lilja, "Dynamic Task Scheduling using Online Optimization," Journal of Parallel and Distributed Systems, vol. 11, pp. 1151-1163, 2000.
[11] T. D. Braun, H. J. Siegel, N. Beck, L. L. Boloni, M. Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, B. Yao, D. Hensgen, and R. F. Freund, "A Comparison Study of Mapping Heuristics for a Class of Meta-tasks on Heterogeneous Computing Systems," Proc. of the 8th IEEE Heterogeneous Computing Workshop, pp. 15-29, 1999.
A Dynamic Job Scheduling Algorithm for Computational Grid*

Jian Zhang and Xinda Lu

Department of Computer Science and Eng., Shanghai Jiaotong Univ., Shanghai 200030, China
{zhangjian, lu-xd}@cs.sjtu.edu.cn
Abstract. In this paper, a dynamic job-scheduling algorithm is proposed for a computational grid of autonomous nodes. This algorithm tries to use information about the practical system to allocate the jobs more evenly. In this algorithm, the communication time between the nodes and the scheduler is overlapped with the computation time of the nodes, so the communication overhead can be small. The principle of scheduling a job is based on the desirability of each node: the scheduler will not allocate a new job to a node that is already fully utilized, which increases the execution efficiency of the system. An implementation framework for the algorithm is also introduced.
1 Introduction

The Grid concept has recently emerged as a vision for future network-based computing. A computational grid is a large-scale, heterogeneous collection of autonomous systems, geographically distributed and interconnected by low-latency and high-bandwidth networks. Networks of workstations (NOWs) represent a particular form of grid. Like an electrical power grid, the Grid aims to provide a steady, reliable source of computing power. The most difficult problems in a computational grid are the management and control of resources, dependability, and security. Researchers have proposed many techniques to allocate jobs dynamically in a parallel system to improve its performance, but most of these algorithms ignore practical issues such as the speed of the nodes, the processing ability of each node, and the different sizes of the jobs; such schedulers might not perform satisfactorily in practice. In this paper, a dynamic job-scheduling algorithm is proposed in which the scheduler tries to allocate a job according to its knowledge of the nodes. This algorithm is very general, adapts to many situations to improve job fairness, and is also a valid method to prevent saturation of the nodes. This paper is organized as follows. In the second section, an analysis of the general job scheduling policies is presented. In the third section, the job-scheduling algorithm is introduced and explained in detail. The mobile agent based implementation framework is given in the fourth section, followed by a summary of this paper in the fifth section. *
This work was supported by the National Science Foundation of China (No. 60173031)
2 Analysis of the General Job Scheduling Policies

There are three general job-scheduling policies [1]:

Static scheduling: the job is assigned to the nodes at compile time and will never be reassigned.

Dynamic scheduling: the system makes all scheduling decisions at run time, using a central work queue where all idle nodes go to find work to execute.

Affinity scheduling: the scheduler creates one local work queue for each node. Each node is statically assigned some work, as if static scheduling were used; if load imbalance actually occurs, idle nodes search the work queues of other nodes to find work to do.

In the static scheduling policy, because the scheduler need not communicate with a node before assigning a job, and the job will never be reassigned, the synchronization and communication overhead is very small. The drawback is that it may lead to underutilization of the nodes. In the dynamic scheduling policy, the initiative is taken by the node: when a node is idle, it sends a message to the scheduler to ask for a job, and the job at the head of the job queue is assigned to it. In this policy the load imbalance is reduced to a minimum; however, it may also increase the communication overhead, because before an idle node can get a job it has to spend time communicating with the scheduler. The affinity scheduling policy attempts to strike a balance between the static and dynamic schedulers. On one hand, it reduces load imbalance: there will be no idle node while a job is waiting to execute. On the other hand, the communication overhead between the nodes and the scheduler is small, since each node has its own job queue. But in this policy migration may take place among nodes, and the overhead between nodes increases; in some conditions the overhead related to migration is higher than the overhead related to load imbalance, so the performance of affinity scheduling may be worse than that of static scheduling. If there were no migration in the affinity scheduling policy, the performance of the system would improve notably. With the utilization of some other practical information, we can achieve this. The size of the jobs is useful practical information, but it is hard to obtain in advance. We can still get other information from the system: we can estimate the speed of each node, we can know the maximum number of jobs that can be executed concurrently on each node, and we can know how many jobs are running on each node right now. Such information is also useful, and the algorithm proposed in this paper is based on its utilization.
3 The Dynamic Job Scheduling Algorithm

3.1 The Architecture of the Algorithm

Intuitively, the best load-balancing status occurs when all nodes are at the point of full utilization and each node's workload is proportional to its capacity. We try to increase the throughput of the system, and we may not allocate more jobs to a fully utilized node, because this would cause imbalance without improving the overall throughput [2]. We
should know whether we can allocate a new job to a particular node or not. An important characteristic of the algorithm is that it estimates the desirability of executing a new job on each node; we use a scheduling function to reflect this desirability. When a node is saturated, the scheduler will not allocate new jobs to it. The architecture of this algorithm is shown in Fig. 1.
Fig. 1. Architecture of the algorithm
The characteristic of this architecture is that there is no job queue for each node. Additionally, there is a message queue for the scheduler: all messages sent to the scheduler are appended to this queue, and the scheduler extracts messages from it for processing. Take Fig. 1 as an example, and assume a node is fully utilized when three jobs are running on it; then the scheduler will not dispatch the next job to that node until one of its running jobs is completed. Because short jobs finish sooner, a node holding short jobs will more likely become eligible first; therefore, the next long job has a higher chance of being allocated to such a node. In this way we can reduce the chances of overloading the nodes and tend to distribute long jobs among the nodes more evenly.
3.2 The Algorithm In order to explain this algorithm in detail, here some notions are explained at first. P The set of all nodes A node in the set P, The initial speed of a node The bounding factor of a node The run queue length of a node The effective speed of a node The scheduling function Central job queue Message queue
In a computational grid, all the nodes constitute the node set P, and p_i denotes a node in this set. B_i is a bounding factor; it limits the number of schedulable jobs on that node and is generally set such that when B_i jobs are scheduled to p_i, the node is fully utilized. The value of the bounding factor affects the response time of the system: if we set a high B_i for a node p_i, the number of jobs that can be executed concurrently on this node will be high, and the response time will be long. So in a system where response time is important, the bounding factor should be set to a low value. L_i is the run queue length of node p_i, reflecting the number of active jobs being executed on this node; moreover, L_i should not be larger than B_i.
Each node has its own speed. When a node is idle, its speed is its initial speed, denoted S_i; this is the highest speed of the node. When some jobs are being executed on the node, its speed decreases, and the current speed is its effective speed, denoted ES_i. The effective speed of a node is inversely proportional to the number of jobs on the node: the more jobs there are, the lower the effective speed is, and when the node is idle its effective speed equals its initial speed. One formula with these properties, used here to calculate the effective speed of a node, is

ES_i = S_i / (L_i + 1).
SF_i is the scheduling function reflecting the desirability of executing a new job on node p_i: the larger SF_i is, the more desirable sending a job to p_i is. If SF_i equals zero, the number of jobs on this node has reached B_i and the node is already fully utilized. The scheduler tries to find the node with the highest SF_i among the nodes whose SF_i is larger than zero. If the SF_i of all the nodes are equal to zero, all the nodes in the system are saturated; the scheduler will not dispatch additional jobs to overload an already saturated system, but waits for some running jobs to complete. A scheduling function with the required behaviour can be expressed as

SF_i = ES_i · (B_i − L_i) / B_i.

From this formula, we see that the larger L_i is, the smaller SF_i is: when L_i is less than B_i, SF_i is larger than zero; when L_i equals B_i, SF_i is zero; and if SF_i is less than zero, the node is overloaded with local jobs.
There are three kinds of messages that the scheduler receives:

Job_done: sent from a node to the scheduler when a job on that node is completed.
New_job: sent to the scheduler when a new job is coming; the scheduler appends the job to the end of the central job queue.
Current_load: the current run queue length of a node.

The algorithm is described below:
In the initial phase, the algorithm obtains the bounding factor B_i and the initial speed S_i of each node p_i, and sets L_i to zero for each node. It then calculates the effective speed ES_i and the value of the scheduling function SF_i for each node. The algorithm processes incoming messages in its main loop (a sketch of this loop is given at the end of this section). If the message is "New_job", it appends the new job to the end of the central job queue CJQ. If the message is "Job_done", the L_i of that node is decremented by 1, and ES_i and SF_i are recalculated. In the next step, the algorithm finds the node p_k with the highest SF_k in the set P. If this highest SF_k is zero, it does not extract a job from the CJQ but processes the next message from the MQ. If the highest SF_k is larger than zero, it extracts a job from the CJQ and executes it on p_k; the L_k of that node is incremented by 1, and ES_k and SF_k are again recalculated. Compared with the dynamic scheduling policy, in our algorithm a node does not need to spend dedicated time communicating with the scheduler: the time for communication and the time for computation are overlapped. When a node completes a job, it sends a "Job_done" message to the scheduler and continues to execute the other jobs assigned to it rather than waiting for a new job. This reduces the overhead between the node and the scheduler and alleviates the drawback of the dynamic scheduling policy. Compared with the affinity scheduling policy, when the scheduler wants to allocate a job, the algorithm does not select a node blindly: it selects among the nodes whose scheduling functions are larger than zero, which are calculated from the processing ability and the effective speed of each node. Therefore, it can evenly distribute the workload among the nodes, and no migration is needed. Another important characteristic of this algorithm is that it will not allocate more jobs to a node whose desirability is zero; that is, it will not allocate more jobs to a node that is already fully utilized. Most other scheduling algorithms submit jobs to the selected node without checking whether the node has become
saturated after selection and before sending out the job. Such algorithms aim to increase the throughput of the system but, on the contrary, decrease its execution efficiency.
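A minimal sketch of the scheduler's main loop described above, using the reconstructed quantities ES_i and SF_i, is the following; the node fields (S, B, L) and the message shapes are illustrative assumptions.

```python
# Minimal sketch of the scheduler main loop; ES_i = S_i/(L_i+1) and
# SF_i = ES_i*(B_i-L_i)/B_i as reconstructed in the text.

def sf(node):
    es = node.S / (node.L + 1)               # effective speed
    return es * (node.B - node.L) / node.B   # scheduling function

def main_loop(nodes, cjq, mq):
    while True:
        msg = mq.get()                       # blocks on the message queue
        if msg.kind == "New_job":
            cjq.append(msg.job)
        elif msg.kind == "Job_done":
            msg.node.L -= 1
        elif msg.kind == "Current_load":
            msg.node.L = msg.length          # refresh from the monitor
        while cjq:
            best = max(nodes, key=sf)
            if sf(best) <= 0:                # all nodes saturated: wait
                break
            best.dispatch(cjq.pop(0))
            best.L += 1
```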
4 Mobile Agent Based Implementation of the Algorithm

4.1 Mobile Agent

Mobile agent is an emerging paradigm that is now gaining momentum in several fields of application [3]. A mobile agent corresponds to a small program that is able to migrate to some remote machine, where it executes some function or collects some relevant data, and then migrates to other machines in order to accomplish another task. The basic idea of this paradigm is to distribute the processing through the network: that is, to send the code to the data instead of bringing the data to the code. The types of applications most appropriate for mobile agent technology include at least one of the following features: data collection, searching and filtering, distributed monitoring, information dissemination, negotiation, and parallel processing [4]. When an application is computationally intensive or requires access to distributed sources of data, the parallel execution of mobile agents on different machines of the network is an effective solution. At the same time, the dynamic and asynchronous execution of mobile agents fits very well in changing environments where it is necessary to exploit some notion of adaptive parallelism.
4.2 Implementation Framework

The implementation of the algorithm is based on Aglet (IBM), probably the most famous mobile agent system. It models the mobile agent to closely follow the applet model of Java, with the following characteristics: object passing, autonomous execution, local interaction, asynchronous and disconnected operation, parallel execution, etc. Aglets are Java objects that can move from one host to another; their fundamental operations include creation, cloning, dispatching, retraction, deactivation, activation, disposal, and messaging [6]. It is possible for them to halt execution, dispatch to a remote host, and restart executing again by presenting their credentials and obtaining access to local services and data. Aglet provides a uniform paradigm for distributed object computing [7], and using Aglet can ease the development of distributed computing systems. We have been doing research on mobile agent based parallel computing, and on that basis an implementation framework for the algorithm is proposed. It is composed of two parts, Console and Monitor, which are both aglets with the ability to communicate through messaging. The Console is responsible for initialization, decision making and job dispatching; the Monitor is responsible for load monitoring. The Console resides on the central node, waiting for new jobs and for messages from the monitor on each node. When a new job comes, it finds an appropriate node to execute it according to the algorithm. Meanwhile it handles messages from the
monitors, updating the run queue lengths and scheduling functions. The Monitors check the run queue length of each node and send this information to the Console whenever these values change.
4.3 Experimental Results

The experiments were conducted on a network consisting of a Sun server and five Sun workstations. The tasks are generated on the server and then scheduled to the other machines. Two conditions were considered in the tests: without and with background loads; the background loads are long-running computing agents. The tasks are generated with intervals of 4 and 6 seconds respectively, until 100 tasks have been generated. To compare with the dynamic scheduling algorithm (DS), the round-robin (RR) scheduling algorithm was used in the same tests. The experimental results are shown in Table 1.
From Table 1, it is observed that the performance of DS is better than that of RR; the average speedups for the two conditions are 1.24 and 1.27, respectively. As the tasks are generated and scheduled, the run queue length of some machines may become greater than the bounding factor, making their scheduling function less than zero. The DS algorithm does not assign new tasks to these nodes until the scheduling function is positive again, while the RR algorithm keeps scheduling new tasks to them and finally gets a bad result.
5 Conclusion

In this paper, we introduce a dynamic job scheduling algorithm. The basic principle of this algorithm is to use information about each node in a computational grid to allocate the jobs among them. A scheduling function is used to determine the desirability of each node to accept a new job; if a node is already fully utilized, no new job will be allocated to it. This algorithm tries to improve the throughput, response time, and fairness of the system. Moreover, a mobile agent based implementation framework is proposed.
References

1. Markatos, E. P.: How architecture evolution influences the scheduling discipline used in shared-memory multinodes. In: Joubert, G. R. (ed.): Proceedings of Parco 93. Amsterdam: Elsevier (1993) 524-528
2. Hui, C. C., Chanson, S. T.: Improved strategies for dynamic load balancing. IEEE Concurrency, 3 (1999) 58-67
3. Pham, V., Karmouch, A.: Mobile Software Agents: An Overview. IEEE Communications Magazine, 7 (1998) 26-37
4. Venners, B.: Solve Real Problems with Aglets, a Type of Mobile Agent. JavaWorld Magazine, 5 (1997)
5. Perdikeas, M. K., Chatzipapadopoulos, F. G., Venieris, I. S.: An Evaluation Study of Mobile Agent Technology: Standardization, Implementation and Evolution. IEEE International Conference on Multimedia Computing and Systems, 2 (1999) 287-291
6. Lange, Danny B., Oshima, Mitsuru: Mobile Agents with Java: The Aglet API. World Wide Web Journal, 3 (1998) 111-121
7. Lange, Danny B., Oshima, Mitsuru: Programming and Deploying Mobile Agents with Java. Addison-Wesley, MA (1998)
An Integrated Management and Scheduling Scheme for Computational Grid*

Ran Zheng and Hai Jin

Cluster and Grid Computing Lab, Huazhong University of Science and Technology, Wuhan, 430074, China
{zhraner, hjin}@hust.edu.cn

* This paper is supported by the National Science Foundation under grants 60125208 and 60273076.
Abstract. Computational grids have become attractive and promising platforms for solving large-scale high-performance applications of multi-institutional interest. However, the management of resources and computational tasks is a critical and complex undertaking, as the resources are geographically distributed, heterogeneous in nature, owned by different individuals or organizations with their own policies and access rules, and subject to dynamically varying loads and availability. In this paper, we propose an integrated management and scheduling scheme for the computational grid. It solves some pivotal and important questions such as resource heterogeneity and dynamically changing information. It affords transparent support for high-level software and grid applications, enhancing the performance, expansibility and usability of the computers, and providing an integrated environment and information service. The scheme is universal for computational grids and makes every grid resource work efficiently.
1 Introduction

Computational grids [1][2] are becoming more attractive and promising platforms for solving large-scale computing-intensive problems. In this environment, various geographically distributed resources are logically coupled together and presented as a single integrated resource. Resource management and scheduling is the key technology of a grid: how to manage the resources efficiently is the pivotal issue, which decides whether the grid is usable or not. At the same time, it is a complex undertaking, as the resources are geographically distributed, heterogeneous in nature, owned by different individuals or organizations, and subject to dynamically varying loads and availability. Some existing resource management technologies of parallel and distributed systems cannot fit well the characteristics of computational grids mentioned above. This paper presents an integrated resource management and scheduling scheme for the computational grid. In section 2, we analyze three resource management models and point out that the hierarchical structure is suitable for the grid. We put forward an integrated
management and scheduling scheme in section 3, where the prototype is also explained. In section 4, task dispatching and selection algorithms for this architecture are introduced. Section 5 focuses on the performance evaluation of the scheme. Finally, we draw conclusions and outline future work in section 6.
2 Anatomy of Resource Management Architectures

Primarily, there are three different scheduling models:

Centralized management model. This can be used for managing single or multiple resources, and it suits cluster (or batch queuing) systems such as Condor [3], [4], LSF [5], and Codine [6] well. It has many advantages: a simple structure, convenient maintenance, and guaranteed consistency and integrity. However, it is hard to apply in a distributed system because of the scheduling bottleneck, and therefore it is not suitable for a large grid.

Decentralized management model. In this model resources are partitioned into different virtual domains, each with its own domain scheduler. The model appears highly scalable, but remote status is not available, so optimal scheduling is questionable. What is more, the traffic is heavy and the data are located in a decentralized way, which works against data consistency and scheduling across multiple domains.

Hierarchical management model. This model is a hybrid (centralized and decentralized) model, which not only avoids the shortcomings of the two models above, but also settles some challenging problems: site autonomy, heterogeneous environments and policy extensibility. It has been adopted in Globus [7][8], Legion [9][10], Ninf [11], and NetSolve [12][13].
Fig. 1. Hierarchical Architecture Model
Our grid resource management architecture follows this hierarchical model, as shown in Figure 1. It is constructed, like a tree, from a super-scheduler, several local schedulers and the published resources: the super-scheduler is the root, the local schedulers are the non-leaf nodes, and all leaves are resources, which are divided into several virtual domains.
3 Integrated Management and Scheduling Scheme

3.1 Integrated Resource Management and Scheduling Scheme

Resource management is highly important in a grid; it is similar to, but more complex than, that of a distributed system. It should not only support multiple scheduling disciplines, but also suit the complex surroundings and provide the necessary QoS. The integrated management and scheduling scheme is shown in Figure 2. The key components are the Grid Resource Scheduler, the Grid Information Server and the Grid Nodes.
Fig. 2. Integrated Resource Management and Scheduling Structure
The Grid Resource Scheduler is a decision-making unit which accepts user requests, adopts optimal scheduling algorithms and handles seamless management issues. Furthermore, the scheduler must be able to deal with exceptional cases; for example, after the failure of one resource, the scheduler can reschedule tasks to other idle resources. Grid nodes include devices and software. The node managers handle all issues from the upper scheduler and harmonize the actions of the devices and active processes. The dispatcher determines the inner scheduling based on upper-level information or the resource statuses collected by the monitor. Examples of local managers include cluster systems such as MOSIX and queuing systems such as Condor, Codine and LSF. The Grid Information Server acts as a database describing items of interest to the resource management system, such as resources, jobs, and schedulers.
3.2 Grid Scheduler Infrastructure

The grid scheduler acts as a mediator between users (applications) and grid resources and is responsible for grid management. The structure of the grid scheduler is shown in Figure 3; there are two levels: a decomposing level and a scheduling level.
The job receiver module accepts user requests and returns results. The task decomposing module decomposes a job into several parallel, mutually exclusive or synchronous atomic tasks; the decomposition rules are saved in a database, which is the foundation of rule-based inference. The resource-task matching module performs the matching between atomic tasks and resources, identifying the exchangeable and compensatory abilities of the resources. The scheduling module searches information from the information server and saves or modifies some context. The task scheduling module analyses scheduling strategies under different principles, the rule-based inference module arranges resources for the atomic tasks, and the scheduler optimizing module can optimize the scheduling on-line. The job receiver module takes charge of scheduling generation, the creation of atomic tasks, and the maintenance of job status. The resource matching module decomposes jobs, searches the grid information server, and generates allocation schemes. The scheduling inference module analyses the generated reasonable schemes and determines the best one, and the scheduling module allocates tasks to the selected resources according to the result.
Fig. 3. Grid Scheduling Structure
Re-scheduling happens upon the interruption of an outer or inner abnormity. An outer abnormity is caused by the arrival of urgent jobs or by cancellation; an inner abnormity is caused by factors such as resource failure. When an interruption happens, the job scheduler sends a signal to the task scheduling module, which then re-dispatches and reschedules.
4 Grid Scheduling Algorithm

4.1 Dynamic Scheduling Mechanism

There are two common dynamic methods, called event-based and time-based. In the time-based method the scheduling interval is periodic: after a fixed time a new period begins, in which all finished tasks are eliminated and new tasks are inserted for rescheduling.
The event-based method is different: it inspects the rescheduling status constantly and decides whether to issue a rescheduling event. The structure of dynamic scheduling is shown in Figure 4. Each job has its own priority for scheduling. A newly submitted job immediately triggers the scheduling of the selected local scheduler if its urgency is high enough to exceed the threshold value; otherwise it is scheduled with other tasks when the scheduling slot comes. The failure of a resource or the cancellation of tasks also leads to rescheduling.
Fig. 4. The Structure for Dynamic Scheduling
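A small sketch of this trigger logic follows; the urgency threshold and the event shapes are assumptions made for illustration.

```python
# Small sketch of the Fig. 4 trigger logic: urgent arrivals fire an
# immediate rescheduling event, everything else waits for the slot.

URGENCY_THRESHOLD = 0.8   # assumed value, for illustration only

def on_event(event, pending, reschedule):
    if event.kind == "arrival":
        if event.job.urgency > URGENCY_THRESHOLD:
            reschedule([event.job])      # event-based: schedule at once
        else:
            pending.append(event.job)    # time-based: wait for the slot
    elif event.kind in ("failure", "cancel"):
        reschedule(pending)              # both abnormities force a pass

def on_timer(pending, reschedule):
    reschedule(pending)                  # the periodic scheduling slot
    pending.clear()
```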
4.2 Integrated Scheduling Algorithms

All tasks in the grid can be classified into a real-time or a best-effort class. All tasks are dispatched space-based across the grid and selected time-based within a virtual domain. The basic rule is that real-time tasks are processed earlier than best-effort tasks.

4.2.1 Grid Task Dispatch Algorithm of the Grid Scheduler

We propose a mixed scheduling algorithm, called LMLB, with the goals of the least number of missed real-time deadlines and load balance of the grid resources. The algorithm is invoked at every arrival of a real-time or best-effort task.

For every real-time task, resource-task matching is done first. If no resource satisfies the request, add 1 to the task's scheduling counter, whose initial value is 0; if the counter exceeds the scheduling threshold, the task is regarded as unschedulable, otherwise the task is put into the renewal window to be rescheduled next time, and the operation is then repeated. If some resources are suitable for the request, select one from the matching set with the goal of the least number of missed deadlines. Compute the remaining time as

T_remain = T_deadline − T_schedule,   (1)

where T_deadline is the deadline specification and T_schedule is the scheduling time. Estimate the processing time of the task on each available resource r_i as

T_process(r_i) = T_rtt + D / BW + p · T_wait(r_i) + C / S_i,   (2)

where T_rtt denotes the round-trip time of the network between the resource and the user, BW is the bandwidth, D is the transmitted data, T_wait(r_i) is the waiting time on r_i, p is a parameter of waiting probability, C is the logical computational "cost" (in some arbitrary units) of the task, and S_i is the resource performance (in units per second). Select a suitable resource r_k whose T_process(r_k) does not exceed T_remain and is nearest to it, subject to

Load(r_k) = W_k / S_k ≤ θ,   (3)

where W_k denotes the real-time summation of the logical computation "costs" of all tasks on resource r_k, S_k denotes the logical computation performance of the resource, and θ is the boundary that decides whether a resource is overloaded or not. However, if there is no resource satisfying Eq. 3, which shows that there is no suitable resource for the task at this moment, add 1 to the task's scheduling counter for rescheduling.

For every best-effort task, the initial operation is similar to that for a real-time task. Select one resource from the matching set with the goal of load balance of the grid resources: estimate the current load of each available resource r_i as

Load(r_i) = W_i / S_i,   (4)

where W_i is the real-time summation of the logical computation "costs" of all tasks on resource r_i and S_i denotes the logical computation performance of resource r_i. Select the suitable resource

r_k = arg min_{r_i} Load(r_i),   (5)

i.e., the resource whose current load is the smallest.

4.2.2 Task Selection Algorithm in a Virtual Organization

A mixed algorithm is proposed, in which best-effort tasks are handled after real-time tasks, so that most deadlines can be satisfied and the resources can be utilized efficiently. The grid scheduler should ensure the maximal benefit of the users, namely the minimal number of missed tasks; therefore, real-time tasks have priority over best-effort tasks in scheduling, but best-effort tasks should also be processed as soon as possible. The real-time task selection algorithm should consider the QoS of all tasks, so the Weighted Priority Schedule Algorithm (WPSA) is adopted for real-time scheduling. In WPSA, the priorities of the tasks in a virtual organization vary dynamically with their importance, and tasks with different priorities get different QoS. The Highest Responsibility First Algorithm (HRFA) is used for best-effort tasks. The response rate of a task is the ratio of its response time to its executing time, defined as

Response Rate = Response Time / Executing Time.   (6)
Here the response time is the sum of the waiting time since the task joined the system and its estimated executing time. Therefore, Eq. 6 can be written as

Response Rate = (Waiting Time + Executing Time) / Executing Time.   (7)
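To make the two algorithms concrete, the following sketch implements the LMLB dispatch under the reconstructed Eqs. 1-5 together with the HRFA ordering of Eq. 7; the resource and task fields (rtt, bw, wait, perf, load_cost, exec_estimate, etc.) and the parameter defaults are illustrative assumptions, not values from the paper.

```python
# Sketch of LMLB dispatch (Eqs. 1-5) and HRFA ordering (Eq. 7).
# All field names are illustrative assumptions.

def process_time(r, data, cost, p=1.0):
    # Eq. 2: network round trip + transfer + expected waiting + compute
    return r.rtt + data / r.bw + p * r.wait + cost / r.perf

def load(r):
    return r.load_cost / r.perf            # Eq. 4

def dispatch(task, matched, theta, now):
    if task.realtime:
        t_remain = task.deadline - now     # Eq. 1
        ok = [r for r in matched
              if process_time(r, task.data, task.cost) <= t_remain
              and load(r) <= theta]        # Eq. 3
        if not ok:
            task.counter += 1              # no fit now: reschedule later
            return None
        # nearest to the remaining time, i.e. least slack wasted
        return max(ok, key=lambda r: process_time(r, task.data, task.cost))
    return min(matched, key=load)          # Eq. 5: least-loaded resource

def response_rate(task, now):
    waiting = now - task.arrival
    return (waiting + task.exec_estimate) / task.exec_estimate   # Eq. 7

def next_best_effort(tasks, now):
    # HRFA: short tasks score high at once, long waiters catch up over time
    return max(tasks, key=lambda t: response_rate(t, now))
```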
5 Performance Evaluation of the Scheduling Scheme

In this section, we evaluate the grid scheduling scheme from the viewpoint of the QoS of grid applications. The process of matching, scheduling, and executing is performed so that some metric of the aggregate QoS delivered to the requestor is maximized [14]. The LMLB algorithm is based on the greedy scheduling algorithm described in [15] as MCT (Minimum Completion Time). MCT is an aggressive approach that does not consider the possible deadlines; it can lead to an increasing load on a resource, which may even become overloaded. LMLB can overcome this shortcoming efficiently: the scheduler uses load measurements, searches all resources and finds an optimal one. Figure 5 shows a comparison of the failure rate between the MCT and LMLB algorithms, where the failure rate is defined as the percentage of requests that missed their deadlines; results are shown for different thresholds of workload conditions. Although LMLB decreases the overloading of grid resources, it is inevitable that their loads are not in balance. In order to utilize idle resources sufficiently, the resources with the least load are assigned to best-effort tasks, which have no deadline limitation. This not only achieves load balance, but also ensures earlier finishing times.
Fig. 5. Failure Rate for Greedy and the LMLB Algorithm
Fig. 6. Waiting Time of Tasks with Different Priorities
WPSA is adopted for real-time scheduling. Different deadlines or degrees of importance are distinguished with different weights; WPSA ensures a higher priority for more urgent tasks, which can then be processed earlier. Suppose the tasks are divided
into two classes with higher and lower priorities. Figure 6 shows the comparison among WT1, WT2 and WT, which denote the waiting times of the higher-priority tasks, the lower-priority tasks and all tasks, respectively. The conclusion is that tasks with higher priorities get better QoS. HRFA is used for best-effort tasks; we compare it with the First Come First Served (FCFS) and Shortest Executing Time First (SETF) algorithms. FCFS considers only the waiting time and ignores the executing time, while SETF is just the opposite, emphasizing the executing time. HRFA is a compromise between them, which not only cares for short tasks but also does not let long tasks wait too long. From Eq. 7, we know that tasks with shorter executing times get higher response rates, i.e., HRFA gives special treatment to short tasks. But if the waiting time of a task is so long that its response rate grows with the extension of its waiting, the task will eventually have the highest response rate and be scheduled immediately.
6 Conclusions and Future Work

In this paper, we proposed an integrated resource management and scheduling scheme for the computational grid. Integrated algorithms for high QoS of grid applications are used, and the corresponding evaluations are investigated. The results suggest that future applications on the grid will compete for resources. Based on this architecture, many factors and aspects still need to be studied carefully, such as fault tolerance, high utilization of grid resources, and the trade-off between failure rate and cost.
References

1. I. Foster and C. Kesselman (eds.), The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann Publishers, 1998.
2. M. Baker, R. Buyya, and D. Laforenza, "The Grid: International Efforts in Global Computing", Proc. of the International Conference on Advances in Infrastructure for Electronic Business, Science, and Education on the Internet, Rome, Italy, 2000.
3. M. Litzkow, M. Livny, and M. W. Mutka, "Condor - A Hunter of Idle Workstations", Proc. of the 8th International Conference on Distributed Computing Systems, June 1988.
4. J. Basney and M. Livny, "Deploying a High Throughput Computing Cluster", High Performance Cluster Computing, Vol. 1, Chapter 5, 1999.
5. Q. Zhao and J. Suzuki, "Efficient quantization of LSF by utilizing dynamic interpolation", Proc. of the 1997 IEEE International Symposium on Circuits and Systems, Hong Kong, 1997.
6. F. Ferstl, Job and resource management systems: Architectures and Systems, Vol. 1, pp. 499-518, 1999.
7. I. Foster and C. Kesselman, "Globus: A Metacomputing Infrastructure Toolkit", International Journal of Supercomputer Applications, Vol. 11, No. 2, pp. 115-128, 1997.
8. K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith, and S. Tuecke, "A Resource Management Architecture for Metacomputing Systems", Proc. of the 4th Workshop on Job Scheduling Strategies for Parallel Processing, 1998.
9. A. Grimshaw and W. Wulf, "The Legion Vision of a Worldwide Virtual Computer", Communications of the ACM, Vol. 40, No. 1, 1997.
10. S. Chapin, J. Karpovich, and A. Grimshaw, "The Legion Resource Management System", Proc. of the 5th Workshop on Job Scheduling Strategies for Parallel Processing, 1999.
11. H. Nakada, M. Sato, and S. Sekiguchi, "Design and Implementations of Ninf: towards a Global Computing Infrastructure", Future Generation Computing Systems, Metacomputing Special Issue, 1999.
12. H. Casanova and J. Dongarra, "NetSolve: A Network Server for Solving Computational Science Problems", International Journal of Supercomputing Applications and High Performance Computing, Vol. 11, No. 3, 1997.
13. H. Casanova, M. Kim, J. Plank, and J. Dongarra, "Adaptive Scheduling for Task Farming with Grid Middleware", International Journal of Supercomputer Applications and High-Performance Computing, 1999.
14. M. Maheswaran, "Quality of Service Driven Resource Management Algorithms for Network Computing", Proc. of the International Conference on Parallel and Distributed Processing Technologies and Applications, 1999.
15. M. Maheswaran, S. Ali, H. Siegel, D. Hensgen, and R. Freund, "Dynamic Mapping of a Class of Independent Tasks onto Heterogeneous Computing Systems", Journal of Parallel and Distributed Computing, Vol. 59, pp. 107-131, 1999.
Multisite Task Scheduling on Distributed Computing Grid

Weizhe Zhang 1, Hongli Zhang 1, Hui He 2, and Mingzeng Hu 1

1 School of Computer Science and Technology, Harbin Institute of Technology, P.R. China
{zwz, zhl, mzh}@pact518.hit.edu.cn
http://pact518.hit.edu.cn/index.html
2 Network Information Center, Harbin Institute of Technology, P.R. China
[email protected]
Abstract. Multisite task scheduling plays an increasingly important role in grid computing as wide-area networks become faster and faster. Through the development of a three-level architecture for the distributed computing grid model and a grid scheduling model, a scalable environment for multisite task scheduling is put forward. Then, a multisite Distributed Scheduling Server is designed and its prototype implemented, and a heuristic strategy, the Clustering-based Grid Resource Selection (CGRS) algorithm, is described. Experiments indicate that the scheduler and the algorithm are effective.
1 Introduction

Grid computing refers to the coordinated and secured sharing of computing resources across different administrative domains, aiming to solve large-scale challenging problems in fields such as fluid dynamics, weather modeling, nuclear simulation and molecular modeling. Currently, computational grids can be classified into distributed computing grids and high-throughput computing grids [1]. A distributed supercomputing grid executes an application in parallel on multiple machines to reduce the completion time of a job, while a high-throughput grid increases the completion rate of a stream of jobs. Task scheduling is necessary and important for achieving shorter running times and higher throughput. Traditionally, task scheduling is defined as the assignment of start and end times to a set of tasks on certain resources, subject to certain constraints. However, a computing grid involves so many resources over multiple administrative domains that the resources must be selected carefully in order to provide the best QoS. Thus, the traditional scheduling model based on static resources cannot satisfy the large-scale, dynamic resource requirements of grid computing. In this paper, a new scheduling model oriented to the distributed computing grid is put forward, in which the resource selection phase plays an important role. Normally, resource selection algorithms can be classified into single-site and multisite resource selection algorithms. Currently, most scheduler systems adopt a
single-site resource selection algorithm, such as the Matchmaker/ClassAd system of the University of Wisconsin-Madison [2], the Nimrod/G scheduler of Monash University [3], the Silver grid scheduler of the Supercluster organization [4], and the metascheduler of the Poznan Supercomputing and Networking Center [5]. Only the GrADS project [6] matches sets of resources to applications instead of just a single resource. The scarcity of multisite resource selection algorithms results from many users fearing a significant adverse effect on computation time due to the limitations of network bandwidth and latency over wide-area networks. As WANs become faster and faster, however, the overhead caused by communication may decrease over time; in fact, [7] has shown that multisite execution can significantly improve running time in terms of a smaller average response time, with about 25% communication overhead. In this paper, we propose our enhanced multisite resource selection algorithm, the CGRS algorithm, based on the distributed computing grid model and the grid scheduling model. The CGRS algorithm integrates a new density-based internet clustering algorithm into the decoupled scheduling approach of GrADS and decreases its time complexity. The rest of this paper is organized as follows. Our scheduling model is discussed in the next section. In Section 3, we present the design of our scheduler. The resource selection algorithm and the experiments are presented and discussed in Sections 4 and 5, respectively. The paper ends with a brief conclusion.
2 Models
2.1 Distributed Computing Grid Model
The Distributed Computing Grid Model (DCGM) adopts a three-level architecture, as shown in Fig. 1: the top level consists of Grid Information Servers (GIS) and Grid Meta Scheduling Servers (GMSS); the second level has several domains, each containing a Grid Distributed Scheduling Server (GDSS); all kinds of Grid Computing Resources (GCR) and Grid User Groups (GUG) form the third level. Grid Information Servers (GIS) are an essential part of any grid software infrastructure, providing fundamental mechanisms for discovery and monitoring [8]. Each domain is controlled by at least one GIS, which dynamically collects information about the resources registered to it and spreads this information to other GISs. A GIS receives grid information requests sent by GMSSs, GDSSs and GUGs and returns the satisfying resource aggregate to the requester. Grid Meta Scheduling Servers (GMSS) focus on harmonizing the scheduling of the different GDSSs. Their goal is to avoid the mistake of a GDSS assuming the absence of the others when two or more applications are submitted simultaneously. Every GMSS accepts meta-scheduling requests from the GDSSs and cooperates with other GMSSs to increase system throughput. Although much work remains to be done on GMSS policies, at the current stage we focus only on GDSS scheduling policies. Grid Distributed Scheduling Servers (GDSS) are the key component in the architecture; they administer the efficient use of registered resources and map grand-challenge applications onto the selected resource aggregate. When a Grid User Group (GUG) submits a job to a GDSS, the GDSS contacts the GIS to gather information about the grid. The GDSS then uses its decision module for scheduling and dispatches a meta-scheduling request to the GMSSs. Grid Computing Resources (GCR) are non-dedicated workstations or personal computers, which may be homogeneous or heterogeneous. Every GCR registered to a GIS acts as a target of the task mapping performed by a GDSS or GMSS. A Grid User Group (GUG) has challenging problems such as fluid dynamics, weather modeling and nuclear simulation. A GUG interacts with the grid environment through the Grid Portal along with the GDSS.
Fig. 1. Distributed computing grid model
2.2 Scheduling Model
A distributed computing grid mainly focuses on specific grand applications that take hours, days, weeks or months, while a high-throughput grid targets streams of tasks. Thus the scheduling purpose of our DCGM is not to maximize system utilization but to reduce the turnaround time. The GDSSs make best-effort static scheduling decisions using predictable performance models of specific applications and submit the job to the selected resources. The scheduling model is formally defined as a seven-tuple (R, A, T, P, S, M, F), whose components are as follows:
1. R is a finite and nonempty set of the non-dedicated and heterogeneous GCRs.
2. A represents an application requirement model, a finite and nonempty set of application information; its network and host components are finite sets of GUG minimal application requirements on the network and the hosts respectively, which provide the basic QoS guarantee.
3. T is a finite and nonempty set of arbitrarily divisible grand-challenge tasks.
4. P is a finite and nonempty set of performance models determined by the types of tasks in T.
5. S denotes the set of start times of tasks.
6. M is a nonempty set of mapping strategies.
7. F consists of two functions: one filters out the resources that do not meet the GUG minimal job requirements, reducing the resource set for the GDSS; the other determines the best-fit resource (or set of best resources) on which to submit a job, which is the core process of the GDSS.
3 The GDSS Design
Combining the scheduling model of the DCGM with the general architecture presented in [8], we begin with the framework of a single GDSS node. The framework, shown in Figure 2, gives a broad overview of the work required to build a generic unit.
Fig. 2. The framework of the Grid Distributed Scheduling Server
There are three main phases when scheduling on the GDSSs. Phase One is resource filtering, which involves the distributed computing grid portal, the XML parser, the job priority queue and the resource filter. To proceed with resource filtering, users must specify a task description and a minimal set of job requirements through the Web portal, which creates an XML document parsed by our DOM parser. The job priority queue is then responsible for determining the priority of the job. Subsequently, the resource filter removes unsuitable resources using information from the GIS. At the end of Phase One, a list of potential resources has been generated. Phase Two involves mapping tasks and selecting the best resource set. The predictable-information collector gathers detailed information from the GIS. Our system adopts an information provider based on Globus and NWS to support dynamic information collection. The job scheduling decision module is the key component of the GDSS,
which determines the best-fit resource (or resource set) as a meta request. The efficiency of the job scheduling decision module is directly determined by the best-fit resource selection algorithm. Our resource selection algorithm, based on grid clustering, is explained in the next section. Subsequently, the meta request is sent to the GMSSs, and the cooperating resource set is fed back after the GMSSs negotiate a compromise over contention when different GUGs request the same GCRs at the same time. At the end of Phase Two, a set of cooperating resources has been generated. In Phase Three the job is executed; this phase includes a file dispatcher and a result retriever. We adopt GridFTP and GRAM services based on Globus to implement remote job submission and remote compilation. Finally, the result is retrieved and displayed on the Web portal using the Virtual Reality Modeling Language (VRML).
4 A Resource Selection Algorithm
The resource selection algorithm is at the core of the job scheduling decision module and must be able to integrate multisite computation power. Our Clustering-based Grid Resource Selection (CGRS) algorithm clusters the set of available resources, generates candidate schedules for all the subsets within each cluster, and evaluates the candidate schedules to select a final schedule. Pseudo-code for our multisite search procedure is given in Figure 3. The first method called by the procedure is GCRClustering(). This method clusters the available GCRs into disjoint subsets such that the network delays within each subset are lower than the network delays between subsets. Clustering the available GCRs is so important as the basis of the CGRS algorithm that we design a sophisticated clustering algorithm based on a data mining method [9], which is expounded in the following section. The other core method is MapAndPredict(). This method adopts the performance model and mapping strategy of a specific application to predict its execution time. Because the predicted execution time directly determines the correctness of the best schedule, the performance model and mapping strategy play an important role.
Fig. 3. Clustering-based Grid Resource Selection (CGRS) algorithm
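The pseudo-code of Figure 3 did not survive extraction, so the following Python sketch illustrates the search structure just described under stated assumptions: cluster_gcrs and predict are hypothetical stand-ins for GCRClustering() and MapAndPredict(), and the max_subset_size bound is our simplification of "all the subsets in each cluster", which would otherwise be exponential.

```python
from itertools import combinations

def cgrs_search(available_gcrs, cluster_gcrs, predict, max_subset_size=4):
    """Sketch of the CGRS multisite search: cluster the resources, then
    evaluate candidate schedules built from subsets of each cluster."""
    best_schedule, best_time = None, float("inf")
    # Step 1: group GCRs so intra-cluster delays are lower than inter-cluster ones.
    for cluster in cluster_gcrs(available_gcrs):
        # Step 2: generate candidate schedules from subsets of this cluster.
        for size in range(1, min(len(cluster), max_subset_size) + 1):
            for subset in combinations(cluster, size):
                # Step 3: map the application onto the subset and predict
                # its execution time with the performance model.
                predicted = predict(subset)
                if predicted < best_time:
                    best_schedule, best_time = subset, predicted
    return best_schedule, best_time
```

Restricting candidates to subsets drawn from a single cluster is what prunes the high-latency combinations a DNS-style grouping would retain.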
Our CGRS methodology is similar to the GrADS approach in that it decouples the performance model and adopts a multisite selection algorithm to improve application performance. However, the CGRS algorithm outperforms the schedule search procedure of GrADS in two respects. First, the CGRS algorithm introduces a more sophisticated clustering algorithm than the method based on the Internet Domain Name Service adopted by GrADS. It is well known that clustering based on DNS is unreasonable and imprecise. The CGRS algorithm adopts a new clustering method based on data mining [9], providing firm theoretical grounds for resource selection. Second, the CGRS algorithm is an O(n) algorithm, where n is the number of available GCRs, if the GCRClustering method is decoupled and implemented by a separate module that runs periodically. The time complexity of the GrADS search procedure is higher, depending on both n and s, where n and s represent the number of available GCRs and the number of clusters, respectively.
4.1 Density-Based GCR Clustering (DGC) Algorithm
The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering. The purpose of GCR clustering, which serves as a pivotal preprocessing step for the CGRS algorithm, is to identify clusters with low intra-cluster network latencies so as to enable coarse-grain parallelism. We propose a density-based GCR clustering technique instead of the traditional DNS-based clustering method, which leads to overly static clusters. The basic ideas of the DGC algorithm involve a number of new definitions. The neighborhood within a radius ε of a given edge is called the ε-neighborhood of the edge, where the radius represents network latency or bandwidth. If the ε-neighborhood of an edge contains at least a minimum number, MinPts, of edges, then the edge is called a core edge. Given a set of edges D, we say that an edge p is directly density-reachable from an edge q if p is within the ε-neighborhood of q. The DGC algorithm proceeds as follows:
1. All the directly density-reachable edges of every edge in the AvailableGCRPool are found and stored in an adjacency list.
2. The DGC algorithm then determines the core edges among the low-latency (or high-bandwidth) edges.
3. Finally, it iteratively collects directly density-reachable edges from these core edges, which may involve merging a few density-reachable clusters. The process terminates when no new edge can be added to any cluster.
The time complexity of the DGC algorithm is determined by the number of edges e.
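As a concrete illustration, here is a minimal Python sketch of the DGC idea: a DBSCAN-style pass over network edges. The neighborhood test (edges sharing an endpoint whose latency is within ε) and all names are our assumptions for illustration; the paper's exact adjacency definition was not fully recoverable.

```python
def dgc_cluster(edges, latency, eps, min_pts):
    """DBSCAN-style clustering of network edges (a sketch, not the authors'
    exact procedure). `edges` is a list of (u, v) pairs and latency[(u, v)]
    gives the measured delay of each edge."""
    # Assumed adjacency: two edges are neighbors if they share a node
    # and the candidate edge's latency is within the radius eps.
    def neighbors(e):
        u, v = e
        return [f for f in edges
                if f != e and ({u, v} & set(f)) and latency[f] <= eps]

    adjacency = {e: neighbors(e) for e in edges}            # step 1
    # Step 2: core edges are themselves low-latency and sufficiently connected.
    core = {e for e in edges
            if latency[e] <= eps and len(adjacency[e]) >= min_pts}

    clusters, assigned = [], set()
    for seed in core:                                       # step 3
        if seed in assigned:
            continue
        cluster, frontier = set(), [seed]
        while frontier:                                     # expand density-reachable edges
            e = frontier.pop()
            if e in cluster:
                continue
            cluster.add(e)
            assigned.add(e)
            if e in core:                                   # only core edges expand further
                frontier.extend(adjacency[e])
        clusters.append(cluster)
    return clusters
```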
5 Experiments
The efficiency of the decoupled multisite scheduling methodology itself has been demonstrated in [6]. In this section, we present validation results for the Density-based GCR Clustering (DGC) algorithm developed in Section 4.1. We have implemented the DGC algorithm on the tree, graph and AS-level network topologies shown in Figure 4. In each topology, according to a user-defined parameter D, the edges of the graph are classified into two categories: low-latency and high-latency. The low-latency edges are the edges between two black nodes, while the others are high-latency edges. In the experiment, we initialize MinPts and ε to 1, which means that a core edge is itself a low-latency edge and directly connects to at least one other low-latency edge. In the tree network topology, clustering with the DGC algorithm yields three clusters, {(1,2)(2,3)(2,9)(3,4)(4,5)}, {(11,12)(11,13)(11,15)(13,14)} and {(18,19)(19,20)(19,21)}, and the resource selection combinations of the DGC algorithm are 9 fewer than those of the DNS-based method. In the graph network topology, only one cluster {(3,6)(3,7)(5,8)(6,8)(7,12)} is acquired, and 7 resource selection combinations are eliminated. In the AS-level network topology, the low-latency edges are {(1,2)(2,3)(3,4)(4,1)(5,6)(6,7)(7,8)(8,5)(9,10)(9,11)(9,12)} and the result of clustering is three clusters: {(1,2)(2,3)(3,4)(4,1)}, {(5,6)(6,7)(7,8)(8,5)} and {(9,10)(9,11)(9,12)}.
Fig. 4. The tree, graph network and AS-level network topology
The above results indicate that our clustering strategy is correct and effective for clustering grid computing resources. It avoids the negative impact of high-latency edges and reduces both the size of the resource aggregate and the number of resource combinations the CGRS algorithm must evaluate.
6 Conclusion
In this paper, a three-level architecture for the Distributed Computing Grid Model (DCGM) is brought forward and acts as the infrastructure of a multisite scheduling environment. The design of its key component, the Grid Distributed Scheduling Server (GDSS), is discussed in detail.
We also focus on the multisite resource selection algorithm of the GDSS. A heuristic strategy, the Clustering-based Grid Resource Selection (CGRS) algorithm, is described. Within the CGRS algorithm, we introduce the Density-based GCR Clustering (DGC) algorithm to cluster the resources in the distributed computing grid and combine it with the decoupled scheduling approach. The next step in this research is to quantify precisely the benefit of the CGRS algorithm with actual Internet applications in a real distributed computing grid environment. Moreover, we will incorporate more parallel applications, such as loosely synchronous and embarrassingly parallel applications, into our scheduling scheme.
References
1. I. Foster and C. Kesselman, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, San Francisco, CA, 1999.
2. R. Raman, M. Livny and M. Solomon, Matchmaking: Distributed Resource Management for High Throughput Computing, Proc. 7th IEEE Int. Symposium on High Performance Distributed Computing, July 28-31, 1998, pp. 140-146.
3. R. Buyya, D. Abramson and J. Giddy, Nimrod/G: An Architecture for a Resource Management and Scheduling System in a Global Computational Grid, Proc. 4th Int. Conference on High Performance Computing in the Asia-Pacific Region, May 2000, Vol. 1, pp. 283-289.
4. Silver Design Overview, http://supercluster.org/projects/silver/designoverview.html
5. K. Kurowski, J. Nabrzyski and J. Pulacki, User Preference Driven Multiobjective Resource Management in Grid Environments, Proceedings of CCGrid 2001, May 2001.
6. H. Dail, F. Berman and H. Casanova, A Decoupled Scheduling Approach for the GrADS Program Development Environment, Journal of Parallel and Distributed Computing (JPDC), 2003.
7. C. Ernemann, V. Hamscher, U. Schwiegelshohn, R. Yahyapour and A. Streit, On Advantages of Grid Computing for Parallel Job Scheduling, 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID'02), May 21-24, 2002, Berlin, Germany.
8. J. Schopf, A General Architecture for Scheduling on the Grid, submitted to the special issue of JPDC on Grid Computing, 2002.
9. J. Han and M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, 2001.
Adaptive Job Scheduling for a Service Grid Using a Genetic Algorithm
Yang Gao1, Hongqiang Rong2, Frank Tong2, Zongwei Luo2, and Joshua Huang2
1
National Laboratory for Novel Software Technology, Nanjing University, Nanjing 210093, China
[email protected]
2
E-Business Technology Institute, The University of Hong Kong, Hong Kong, China {hrong,ftong,zwluo,jhuang}@eti.hku.hk
Abstract. This paper presents a new approach to scheduling jobs on a service Grid using a genetic algorithm (GA). A fitness function is defined to minimize the average execution time of scheduling N jobs to machines on the Grid. Two models are proposed to predict the execution time of a single job or multiple jobs on each machine with varied system load. The single service type model is used to schedule jobs of one single service to a machine while the multiple service types model schedules jobs of multiple services to a machine. The predicted execution times from these models are used as input to the genetic algorithm to schedule N jobs to M machines on the Grid. Experiments on a small Grid of four machines have shown a significant reduction of the average execution time by the new job scheduling approach.
1 Introduction
One of the challenges for a service Grid is to efficiently process users' requests to Grid services in large numbers. This is essentially a problem of optimally allocating the Grid resources to complete the requested services within a given time slot through effective job scheduling. Given N jobs submitted at some time to a service Grid that has M machines to execute these jobs in parallel, the optimal job scheduling strategy is to minimize the execution time of these jobs subject to a given cost. Job scheduling problems on a service Grid can be divided into three levels. System-level scheduling deals with the problem of assigning a single job to one of the M machines on the Grid that can finish the job in the shortest time [1][6][4]. Application-level scheduling deals with the problem of scheduling N various jobs
The work was conducted when the author was visiting the E-Business Technology Institute of The University of Hong Kong, under the support of the IBM China Scholar Visitorship Program, Natural Science Foundation of P.R.China (No.60103012) and the National Grand Fundamental Research 973 Program of China (No.2002CB312002)
that are submitted to the Grid in the same time slot to the M machines of the Grid [3]. Grid-level scheduling deals with the problem that N jobs are submitted to the Grid in the same time slot but the Grid lacks the resources to complete these jobs within a given time slot. In such a situation, the Grid-level scheduling system needs to find other Grids to execute some of these jobs. Unlike job scheduling in parallel computing and cluster computing, which usually uses a static load model or a performance model estimated from experience data [8], Grid job scheduling uses a dynamic model to predict job execution time, due to the heterogeneity of computing resources and the dynamics of machine load [7]. The traditional approach to job scheduling is to first model the available computing resources, then determine the system load and finally estimate the jobs' execution times. Direct application of this approach to Grid job scheduling often results in poor performance due to the special characteristics of the Grid environment [2]. In addition, job scheduling in parallel computing and cluster computing emphasizes the performance and load balance of the whole system, while in Grid computing, different scheduling policies and algorithms are required to deal with different kinds of tasks [5]. In this paper, we present an approach that uses a genetic algorithm (GA) to minimize the average execution time of scheduling N jobs on heterogeneous machines on a service Grid. To solve this job scheduling problem, we first develop a model to predict the execution time of a single service (job) on different machines with varied system load. Then, we extend the single-service model to the multiple service types model, which deals with situations where different types of jobs arrive at each machine in sequence. With the multiple service types model, we define an objective function to evaluate the optimal scheduling of N services of different types to M machines in different load situations, and we use a genetic algorithm to find the solution by minimizing the objective function. We conducted simulated experiments on a small-scale Grid composed of four machines with different capacities and operating systems. The experimental results show a significant reduction of the average execution time by our approach in comparison with the random allocation and average allocation methods.
2 Adaptive Models for Predicting Job Execution Times
2.1 Single Service Type Model
We first develop a model to predict the execution time of a single service or job on each machine of the Grid. The major factors that affect the execution time include the machine's capacity, the complexity of the algorithm implementing the service, and the size of the data involved. The machine's capacity changes dynamically with the system load. Because of the load dynamics and many other unknowns, it is impossible to predict the precise execution time of a job on a machine; we can only predict a plausible execution time from historical experience and use this predicted execution time to schedule the execution of the job. In the service Grid,
the computational performance of each service is typically tested on each machine before the service is published for use. Therefore, the computational performance and the ability to process different sizes of data for each service can be known or learned incrementally from the historical data of using the service. The difficult part is dealing with the dynamic load of the system, which affects the execution performance of the service to be submitted. To simplify this problem, we first assume that only one service is submitted to the Grid in a time slot. Our objective is to optimize the performance of the whole system no matter whether the system has a light or heavy load. In order to increase the system's throughput, the scheduling system must maximize the number of jobs completed within a time slot. To solve this problem, we use the following model to predict the execution time of a single job on a machine.
p_i(k+1) = (1 − α) · p_i(k) + α · a_i(k)   (1)

Here, p_i(k+1) is the predicted execution time of a new job on machine m_i; k is the number of times that jobs of the same service have been executed on m_i; a_i(k) is the actual execution time of the k-th such job on the same machine; and α (0 < α < 1) is the learning rate. The historical values of p_i and a_i are stored on the machine, and α can be obtained by experiments. If there are j jobs generated from the same service already running on machine m_i, the actual execution time of a new job will be affected by the system load. In this case we use the following model to predict the execution time of the job:

p_i^j(k+1) = (1 − α) · p_i^j(k) + α · a_i^j(k)   (2)

Here, p_i^j(k+1) is the predicted execution time of a job on machine m_i that still has j jobs of the same service running; p_i^j(k) and a_i^j(k) on the right side are the last historical values of the expected execution time and the actual execution time, respectively, and α is the learning rate. Because of the dynamics of the system load, it is difficult to obtain a_i^j directly. However, the ratios of the predicted times at successive load levels can be estimated. Using these ratios, the execution time of the job on machine m_i can be predicted by

p_i^j = w_i^j · p_i^{j−1},   (3)

where

w_i^j = p_i^j / p_i^{j−1}  (j = 1, 2, ...),   (4), (5)

and the ratios w_i^j approach constants if the Grid system is stable. Figure 1 explains the process of calculating the predicted execution time of the j-th job on machine m_i. When the first job is submitted to the Grid, its predicted
Fig. 1. Predicting the execution time of the j-th job on a machine when there are j−1 jobs still running
execution time on machine m_i is calculated by (1). The second job arrives while the first job is still running on m_i; the predicted execution time of the second job on m_i is calculated by (3). Similarly, we can calculate the predicted execution time of the j-th job on machine m_i when the previously submitted jobs are still running on it.
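To make the single-service model concrete, the following Python sketch implements the update and ratio logic as reconstructed above. The class name, attribute names, and the exact update rule are our assumptions for illustration; the original equations were garbled in extraction, so read this as one plausible realization rather than the authors' code.

```python
class ExecTimePredictor:
    """Sketch of the single-service execution-time model: an incrementally
    learned base prediction (eq. 1) scaled by load-level ratios (eqs. 3-4)."""

    def __init__(self, initial_estimate, alpha=0.3):
        self.alpha = alpha             # learning rate, tuned by experiment
        self.base = initial_estimate   # predicted time with no concurrent jobs
        self.ratios = {}               # ratios[j]: slowdown with j jobs running

    def predict(self, running_jobs):
        """Predicted execution time when `running_jobs` of the same service
        are already running on this machine (eq. 3 applied j times)."""
        estimate = self.base
        for j in range(1, running_jobs + 1):
            estimate *= self.ratios.get(j, 1.0)
        return estimate

    def update(self, running_jobs, actual_time):
        """Fold one observed execution time back into the model (eq. 1)."""
        if running_jobs == 0:
            self.base += self.alpha * (actual_time - self.base)
        else:
            # Adjust the ratio for this load level toward the observed slowdown.
            parent = self.predict(running_jobs - 1)
            observed_ratio = actual_time / parent if parent > 0 else 1.0
            old = self.ratios.get(running_jobs, 1.0)
            self.ratios[running_jobs] = old + self.alpha * (observed_ratio - old)
```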
2.2 Multiple Service Types Model
To predict the execution time of jobs of multiple service types, we use the following model:

p_i^{j,u} = w_i^{(v,u),j} · p_i^{j−1,v},   (6)

where

w_i^{(v,u),j} = p_i^{j,u} / p_i^{j−1,v},   (7), (8)

and the superscripts u and v indicate types of service. The difference between (6) and equation (3) is that two subsequent jobs can belong to different services, as shown in (7) and (8). Figure 2 shows how the predicted execution time of the j-th job is calculated. Because the predicted execution times of all preceding jobs are known, the scheduling path is known for a particular sequence of jobs, e.g., the light line in Figure 2. However, unlike in the single service type model, the prediction depends on the scheduling path; in other words, different scheduling paths result in different execution times, since the services in each path are different. Therefore, in the implementation, we do not save the individual predictions on machine m_i but the ratios instead. We can observe from Figure 2 that if there are S services of different types on machine m_i and the maximal number of possible concurrent jobs is N, then the total number of stored connection-line weights is bounded by the number of service-type pairs at each of the N levels. It is feasible to store these weights in the scheduling system, since the number of services on a machine is limited. To implement this scheduling algorithm, the Grid initializes all parameters based on experiments. When a new job arrives, the Grid predicts all execution times on each machine, sorts them, and selects
a machine with the minimal predicted execution time to execute the job. When the job ends, the scheduling system records the real execution time and adjusts the prediction values. When another new job arrives, the scheduling system schedules it based on these new prediction values. The computational complexity of this system-level scheduling algorithm is O(M), where M is the number of machines.
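A minimal sketch of this system-level loop, reusing the hypothetical ExecTimePredictor sketched above (one per machine and service type); the dictionary layout and function name are illustrative assumptions:

```python
def schedule_job(service, machines, predictors, running):
    """Pick the machine with the minimal predicted execution time.
    predictors[(machine, service)] holds an ExecTimePredictor and
    running[machine] counts jobs currently executing there."""
    best = min(machines,
               key=lambda m: predictors[(m, service)].predict(running[m]))
    running[best] += 1
    # Dispatch the job to `best`; when it completes, call
    # predictors[(best, service)].update(...) and decrement running[best].
    return best
```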
3 A GA Approach to Application-Level Scheduling
Application-level scheduling deals with the problem of scheduling N jobs of different service types to the M machines on the Grid. To solve this scheduling problem, we can use the execution time prediction models to calculate the predicted execution times of these jobs on each machine and find an optimal allocation of the jobs to the M machines by minimizing

F = (1/N) · Σ_{i=1..M} n_i · T_i,   (9)

subject to

Σ_{i=1..M} n_i = N,  0 ≤ n_i ≤ L_i,

where T_i is the average execution time of the n_i jobs assigned to machine m_i, N is the total number of jobs submitted to the Grid in a given time slot, and L_i is the maximal number of jobs allowed on machine m_i, which depends on the current private load of m_i. T_i can be obtained from experiments or from historical results of running different numbers of jobs on each machine; Figure 3 shows three examples. Essentially, each curve is the accumulative form of (3), and interpolation and extrapolation can be used to obtain a particular value. L_i is also determined by experiments. (9) can be optimized using a genetic algorithm. Table 1 shows the pseudo code of the algorithm. First, the Grid gatekeeper sends requests to all machines and inquires whether or not they can process the new jobs. Each machine checks its load according to its current performance and returns the information to the gatekeeper. A machine that agrees to receive new jobs returns its current load status, such as its current number of jobs and L_i. The Grid scheduling system produces an initial population of candidate job assignments, evaluates each individual's fitness, chooses the individuals with higher fitness to make copies, performs crossover and mutation operations, obtains the new population, and evaluates each individual's fitness in the new population. Finally,
Fig. 2. Predicting the execution time of a job when jobs of different service types are running on the machine.
Fig. 3. The performance curves of these machines running different numbers of jobs.
the Grid obtains a nearly optimal job assignment strategy and assigns the jobs to the machines. At last, each machine receives its new jobs, which are then scheduled by its OS. In this algorithm, the population at each step is indexed by the iteration number.
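The pseudo code of Table 1 is not reproduced in this copy, so the following Python sketch shows one plausible GA loop matching the description above. The fitness follows our reconstruction of objective (9); the population size, crossover and mutation rates, and the avg_time/max_jobs data layout are illustrative assumptions:

```python
import random

def ga_schedule(num_jobs, machines, avg_time, max_jobs,
                pop_size=50, generations=50, p_cross=0.8, p_mut=0.05):
    """Evolve an assignment jobs -> machines minimizing average execution time.
    avg_time[m][n] is the average per-job time on machine m with n jobs
    (from the performance curves of Figure 3); max_jobs[m] caps machine m."""
    def fitness(assign):
        counts = {m: assign.count(m) for m in machines}
        if any(counts[m] > max_jobs[m] for m in machines):
            return float("inf")                  # infeasible individual
        total = sum(counts[m] * avg_time[m][counts[m]]
                    for m in machines if counts[m])
        return total / num_jobs                  # objective (9)

    pop = [[random.choice(machines) for _ in range(num_jobs)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[:pop_size // 2]          # copy the fitter half
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            if random.random() < p_cross:
                cut = random.randrange(1, num_jobs)
                child = a[:cut] + b[cut:]        # one-point crossover
            else:
                child = list(a)
            child = [random.choice(machines) if random.random() < p_mut else g
                     for g in child]             # mutation
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)
```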
4 Experimental Results and Analysis
We conducted two experiments to test our models on a small-scale Grid consisting of 4 machines. In these experiments, only one service, computing the Discrete Fourier Transform (DFT), was used. The scheduling time slot was set to one second. In the first experiment, a request of the DFT service to transform 100 points was sent to the Grid in each time slot; 1000 requests for the same service were sent consecutively. The Grid system-level scheduling model was used to schedule these 1000 jobs. Figure 4 shows the results of scheduling the jobs randomly and by using the multiple service types model. One can clearly see
the reduction of the average execution time achieved by the prediction model. For example, within 1000 one-second time slots, the Grid finished more than 800 jobs scheduled by the prediction model, while it completed only about 400 jobs under random scheduling. Figure 5 shows the distribution of the difference between the predicted execution time and the actual execution time of the 1000 jobs. Although a systematic error of -3.8 s is present, the model is able to predict the execution time within a small variance (4.95 s). Even for interactive analysis, this prediction error is acceptable in many applications.
Fig. 4. Comparison of random scheduling and model scheduling.
Fig. 5. Distribution of the difference between the predicted execution time and the actual execution time.
Fig. 6. Comparison of GA scheduling with Random scheduling and average scheduling.
Fig. 7. Performance of the GA scheduling algorithm.
In the second experiment, we conducted 20 tests. In each test, we changed the number of jobs submitted to the Grid in each time slot. The number of jobs increased from 20 to 100. Figure 6 shows the average execution time of jobs scheduled by the GA approach, the average scheduling and the random
scheduling. We can see that the GA approach performed best and the random scheduling performed worst. The performance of the average scheduling is close to that of the GA approach when the number of jobs is small. When the number of jobs becomes large, the GA approach shows a clear advantage. For example, the average execution time under the average scheduling was 20 seconds longer than that under the GA approach when 80 jobs were scheduled. Figure 7 shows the predicted execution time against the iterations of the genetic algorithm. Our experiments showed that a near-optimal schedule can be found within 50 iterations.
5 Conclusions
In this paper, we have presented two prediction models that are used to predict the execution time of a job on a given machine with varied system load. Based on the prediction models, we have developed a genetic algorithm approach to scheduling N jobs of different services to M machines on a Grid. Our experiment results have shown that the GA approach can reduce the average execution time of N jobs run on the Grid in comparison with some naive scheduling methods. Our experiments also demonstrated that the prediction models can predict the execution time accurately.
References
1. The Globus grid project. http://www.globus.org.
2. Miguel L. Bote-Lorenzo, Yannis A. Dimitriadis, and Eduardo Gomez-Sanchez. Grid characteristics and uses: a grid definition. In Proceedings of the First European Across Grids Conference, February 2003.
3. Henri Casanova, MyungHo Kim, James S. Plank, and Jack J. Dongarra. Adaptive scheduling for task farming with grid middleware. The International Journal of High Performance Computing Applications, 13(3):231-240, Fall 1999.
4. Steve J. Chapin, D. Katramatos, J. Karpovich, and A. Grimshaw. Resource management in Legion. Future Generation Computer Systems, 15(5-6):583-594, 1999.
5. Klaus Krauter, Rajkumar Buyya, and Muthucumaru Maheswaran. A taxonomy and survey of grid resource management systems for distributed computing. Software: Practice and Experience, 32(2):135-164, 2002.
6. Rajesh Raman, Miron Livny, and Marvin Solomon. Matchmaking: distributed resource management for high throughput computing. In Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, Chicago, IL, July 1998.
7. D. Thain, T. Tannenbaum, and Miron Livny. Condor and the grid. In Fran Berman, Geoffrey C. Fox, and Anthony J. G. Hey, editors, Grid Computing: Making the Global Infrastructure a Reality, chapter 11, pages 299-335. Wiley, West Sussex, England, 2003.
8. Y. Zhang, H. Franke, J. E. Moreira, and A. Sivasubramaniam. An integrated approach to parallel scheduling using gang-scheduling, backfilling, and migration. In D. G. Feitelson and L. Rudolph, editors, JSSPP 2001, Lecture Notes in Computer Science, pages 133-158. Springer-Verlag, Berlin Heidelberg.
Resource Scheduling Algorithms for Grid Computing and Its Modeling and Analysis Using Petri Net* Yaojun Han1,2,3, Changjun Jiang1 ,You Fu1,2, and Xuemei Luo2,3 1
Department of Computer Science & Engineering, Tongji University, Shanghai, 200092, China 2 Department of Computer Science, Shandong University of Science & Technology, Qingdao, 266510, China 3 Lab. Computer Science, ISCAS, Beijing, 100080, China
[email protected]
Abstract. A resource scheduling algorithm called XMin-min is proposed in this paper. In the XMin-min algorithm, we consider not only the expected execution time of tasks but also the expected communication time when calculating the expected completion time. The execution cost of tasks and the budget of the application are selected as QoS measures, and an algorithm, XMin-min with QoS, is also proposed. An extended high-level timed Petri net (EHLTPN) model suited to resource scheduling in grid computing is presented. In the EHLTPN, the firing times assigned to transitions are functions of the tokens of the input places. We construct a simple model for resource scheduling in grid computing using the EHLTPN. A definition of the Reachable Scheduling Graph (RSG) of an EHLTPN is given to analyze the timing properties of resource scheduling. The two algorithms can also be used to mitigate the "state explosion" problem while constructing the RSG of an EHLTPN.
1 Introduction
In a grid computing environment, the scheduling problem becomes complex, as resources are geographically distributed, heterogeneous in nature, and owned by different individuals or organizations [1]. It is well known that choosing the best pairs of tasks and resources is an NP-complete problem. Some simple heuristics for dynamic scheduling of a class of independent tasks onto a heterogeneous computing system have been presented [2]. The Min-min heuristic is now becoming the benchmark for such task/host scheduling problems. However, the Min-min algorithm is unable to balance the load well, since it usually maps small tasks
* This work is supported partially by projects of the National Preeminent Youth Science Foundation (No. 60125205), the National 863 Plan (2002AA4Z3430, 2002AA1Z2102A), the Foundation for University Key Teachers of the Ministry of Education, the Shanghai Science & Technology Research Plan (02DJ14064, 03JC14071), and an open project of the Laboratory of Computer Science, ISCAS (SYSKF0304).
first and does not deal with Quality of Service (QoS). A novel QoS-guided task-scheduling algorithm for grid computing was introduced in [3]. However, few algorithms consider the communication time of tasks while scheduling. An extension of the Min-min algorithm, called XMin-min, is proposed in this paper. In the XMin-min algorithm, we consider not only the expected execution time of tasks but also their expected communication time when calculating the expected completion time. The execution cost of tasks and the budget of the application are selected as QoS measures. We give another scheduling algorithm, XMin-min with QoS, by embedding the QoS information into the XMin-min algorithm to improve the efficiency and the utilization of a grid system. In a grid computing environment, resource scheduling calls for powerful graphical and analytical tools such as Petri nets. Petri nets have gained more and more applications because of their ability to model asynchronous events, parallelism, connection, and synchronization [4]. In order to describe real processes well, many extensions of Petri nets, such as colored Petri nets [5] and timed Petri nets [6], have been proposed. Some Petri net models for scheduling were given in [7,8], but these models and their analysis techniques are not suitable for resource scheduling in a grid computing environment. Up to now, there have been few Petri net models for grid computing. We gave an extended colored timed Petri net (ECTPN) model for describing and analyzing resource scheduling in a grid computing environment in [9]. We modify the ECTPN and give an extended high-level timed Petri net (EHLTPN) model in order to better suit resource scheduling in a grid computing environment. In the EHLTPN, the firing times assigned to transitions are functions of the tokens of the input places. A definition of the reachable scheduling graph (RSG), used to analyze the timing properties of resource scheduling in a grid computing environment, is given in this paper. Meanwhile, the XMin-min and XMin-min with QoS algorithms can also be used to mitigate the "state explosion" problem while constructing the RSG of an EHLTPN. The rest of this paper is organized as follows. Two algorithms are proposed in Section 2. An EHLTPN model for resource scheduling in a grid computing environment is constructed in Section 3. The definition of the RSG is given in Section 4. Section 5 gives an example and experimental results. Section 6 concludes the paper.
2 Algorithms for Resource Scheduling
We assume that there are m computing resources accessible to the user via m distinct network links, and n independent tasks to be mapped onto the m heterogeneous machines. The expected execution time e_ij of task t_i on machine m_j is defined as the amount of time taken by m_j to execute t_i, given that m_j has no load when t_i is assigned. The expected transmit time tr_ij is defined as the amount of time taken to transmit task t_i to machine m_j from the user site. Let r_j denote the expected time at which machine m_j will become ready to execute a task after finishing the execution of all tasks previously assigned to it. The expected completion time c_ij of task t_i on machine m_j is defined as the wall-clock time at which m_j completes t_i. In our paper, c_ij = e_ij + tr_ij + r_j.
Let cost_ij denote the expected execution cost of task t_i on machine m_j, b_i denote the budget of task t_i, and B denote the total budget of all tasks.
Algorithm 1. XMin-min algorithm
(1) For all tasks t_i
(2)   For all machines m_j
(3)     compute c_ij = e_ij + tr_ij + r_j
(4) Do until all tasks are mapped
(5)   For each task t_i, find the earliest completion time and the machine that obtains it
(6)   Find the task t_k with the minimum earliest completion time within budget B
(7)   Assign t_k to the machine m_l that gives the earliest completion time within budget B
(8)   Delete task t_k; update r_l; update c_il for all i
(9) Enddo
Algorithm 2. XMin-min algorithm with QoS
(1) For all tasks t_i
(2)   For all machines m_j
(3)     compute c_ij = e_ij + tr_ij + r_j
(4) Do until all tasks are mapped
(5)   For each task t_i, find the earliest completion time within its budget b_i and the machine that obtains it
(6)   Find the task t_k with the minimum earliest completion time within budget B
(7)   Assign t_k to the machine m_l that gives the earliest completion time within budget B
(8)   Delete task t_k; update r_l; update c_il for all i
(9) Enddo
Obviously, the above algorithms have the same complexity as the Min-min algorithm.
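To make the two algorithms concrete, here is a Python sketch of XMin-min with QoS using the notation above. The matrix names and the way budgets prune candidates reflect our reading of the reconstructed pseudocode, so treat it as an illustration rather than the authors' reference implementation (set per-task budgets to infinity to recover plain XMin-min):

```python
def xmin_min(exec_time, transmit, cost, budgets, total_budget):
    """exec_time[i][j], transmit[i][j], cost[i][j]: task i on machine j.
    budgets[i]: per-task budget (math.inf disables the per-task QoS check)."""
    n, m = len(exec_time), len(exec_time[0])
    ready = [0.0] * m                  # r_j: when machine j becomes free
    unmapped, schedule, spent = set(range(n)), [], 0.0
    while unmapped:
        best = None                    # (completion time, task, machine)
        for i in unmapped:
            for j in range(m):
                c = ready[j] + exec_time[i][j] + transmit[i][j]
                # QoS checks: per-task budget and remaining total budget.
                if cost[i][j] <= budgets[i] and spent + cost[i][j] <= total_budget:
                    if best is None or c < best[0]:
                        best = (c, i, j)
        if best is None:
            break                      # no feasible assignment within budget
        c, i, j = best
        ready[j] = c                   # machine j is busy until this task completes
        spent += cost[i][j]
        unmapped.remove(i)
        schedule.append((i, j, c))
    return schedule
```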
3 Petri Net Model for Resource Scheduling
The basic concepts and properties of Petri nets have been introduced in [4,5,6]; we do not review them here.
Definition 1. An extended high-level timed Petri net (EHLTPN) is an eight-tuple (P, T, F, C, I⁻, I⁺, M₀, D), where P is a finite nonempty set of places; T is a finite nonempty set of transitions; F is a finite set of directed arcs from P to T and from T to P; C is a function assigning colors, whose range is the power set of the color set; I⁻ and I⁺ are the negative and positive incidence functions of P×T; M₀ is an initial marking consistent with C; and D is a set of firing durations, where each firing duration is a function of the tokens of the input places.
Definition 2. An EHLTPN model for resource scheduling in a grid computing environment, RSPN, is an eight-tuple of the same form, where the places include a place representing all unmapped tasks, a place representing the task selected for scheduling by some algorithm,
places holding the data of tasks used for rescheduling, and places representing the tasks mapped to each machine; the transitions include a transition that selects a task to schedule from the unmapped tasks, transitions representing the execution of a task on each machine, and transitions used to modify the data of all unmapped tasks; F is the finite set of arcs; the color set contains one color per task, where u is the number of unmapped tasks; M is the current marking, in which the selected tasks are chosen from all unmapped tasks according to some algorithm; and the firing duration of each execution transition is the running time of the corresponding task on the corresponding machine. The graphical representation of the RSPN is shown in Figure 1.
Fig. 1. RSPN model
4 Reachable Scheduling Graph
Definition 3. Let RSPN be an EHLTPN model for resource scheduling. The reachable scheduling graph (RSG) of the RSPN is defined as a directed graph RSG(RSPN) = (V, E1, E2) with labeled directed edges and labeled nodes, divided into m vertical sections corresponding to the m machines and into a number of levels.
Proposition 1. RSG(RSPN) = (V, E1, E2) is constructed by the following algorithm.
(1) Let V = {M₀}. For j = 1 to m do lv[j] = 0.
(2) Place M₀ in level 0 and tag it "new".
(3) If there exists no "new" node in V, then the algorithm ends; otherwise go to (4).
(4) Select a "new" marking M and do the following:
(4.1) While there exist enabled transitions at M, do the following for each enabled transition t at M:
(4.11) Obtain the marking M' that results from firing t at M by calculating the arc functions.
(4.12) If M' is not yet in V:
(4.121) add M' to V;
(4.122) if t is the execution transition of machine m_j, then lv[j] = lv[j] + 1, M' is placed in level lv[j] and section j and tagged "new";
(4.123) if t is the selection transition for a task destined for machine m_j, then M' is placed in level lv[j] and section j and tagged "new";
(4.124) otherwise, M' is placed in level lv[j] and section j and tagged "new", where j is the index of the machine concerned.
(4.13) Add the corresponding labeled edge to E1 or E2.
(4.14) Tag M' with its accumulated completion time.
(4.2) If there exists no transition t such that M[t>, then tag M a "dead node".
(4.3) Remove "new" from M and go to (3).
The correctness of the algorithm can easily be proven from the definitions of the RSPN and the RSG and the firing rule of Petri nets. In order to reduce the scale of the RSG, we can use the above two algorithms to generate successor nodes while calculating the arc functions.
Proposition 2. Let RSG(RSPN) = (V, E1, E2) be the reachable scheduling graph of an RSPN. The sequence of nodes from the beginning of level 1 along the E2 edges in section j represents the scheduling sequence of tasks on machine m_j.
Proposition 3. Let RSG(RSPN) = (V, E1, E2) be the reachable scheduling graph of an RSPN, and for each section j let τ_j be the tag attached to the node in the last level of section j. Then the makespan for the complete schedule is equal to max_j τ_j.
Example: Suppose that there are 4 independent tasks and two machines at time t. Table 1 gives the expected execution time, transmit time, execution cost and budget data. The total budget required to run all tasks is 60.
The graphical representation of the RSPN for this example is similar to Figure 1. We construct the RSGs (shown in Figures 2, 3 and 4) by calculating the arc functions of the RSPN according to Proposition 1, using the Min-min, XMin-min and XMin-min with QoS algorithms, respectively.
Fig. 2. The RSG constructed using Min-min
Fig. 3. The RSG constructed using XMin-min
Fig. 4. The RSG constructed using XMin-min with QoS.
From Figures 2, 3 and 4, we know that the Min-min algorithm gives a schedule sequence with a makespan of 18; the XMin-min algorithm gives schedule sequences on the two machines with a makespan of 19, of which the execution time is 11; and the XMin-min with QoS
algorithm gives schedule sequences on the two machines with a makespan of 16, of which the execution time is 11. This shows that the XMin-min with QoS algorithm outperforms XMin-min without QoS, and the XMin-min algorithm outperforms the Min-min algorithm, in this example.
5 Experimental Results and Discussion
In our experimental testbed, the system consists of a cluster named DAWNING 3000, including 4 nodes, and 12 PCs. We designed a program running on this system to evaluate the newly proposed scheduling algorithms. The expected execution time, expected transmit time and expected execution cost of tasks on machines were produced randomly by the program. The budget of each task is set to 85% of the maximum of its costs. The experimental evaluation of the algorithms is performed for n = {50, 100, 150, 200} tasks. We take the average makespan and cost over 100 runs. Table 2 and Figure 5 show the comparison of makespans and costs.
Fig. 5. (a) Makespans for the three algorithms; (b) costs for the three algorithms
In order to compare the three algorithms, we subtract the transmit time from the makespan for XMin-min and XMin-min with QoS in Figure 5. The difference in makespan among the three algorithms is small in Figure 5(a). However, from Figure 5(b), we can see that the difference in cost among the three algorithms is large: the cost of XMin-min with QoS is the lowest, and the larger the number of tasks, the larger the difference in cost.
6 Conclusions
In this paper, two resource scheduling algorithms dealing with execution time and transmit time were presented. The example and the experimental results show that the XMin-min with QoS algorithm outperforms XMin-min without QoS. As a powerful graphical and analytical tool, an EHLTPN model was presented, and a simple model for resource scheduling in a grid computing environment was constructed using the EHLTPN. The reachable scheduling graph was defined and used to analyze the timing properties and the sequencing of resource scheduling efficiently and intuitively. Because the transmit time of a task varies over time, in future work we will consider dynamic transmit times when calculating the completion time in every iteration of the XMin-min and XMin-min with QoS algorithms.
References
1. Buyya, R., Giddy, J., Abramson, D.: An Evaluation of Economy-based Resource Trading and Scheduling on Computational Power Grids for Parameter Sweep Applications. Proceedings of the Int. Workshop on Active Middleware Services (AMS 2000), Kluwer Academic Press, USA (2000)
2. Maheswaran, M., Ali, S., et al.: Dynamic Mapping of a Class of Independent Tasks onto Heterogeneous Computing Systems. IEEE Heterogeneous Computing Workshop (HCW'99), San Juan, Puerto Rico (1999)
3. He, X.-S., Sun, X.-H., von Laszewski, G.: A QoS Guided Scheduling Algorithm for Grid Computing. Proc. of the Int. Workshop on Grid and Cooperative Computing (GCC 2002), Sanya, China (2002) 745-758
4. Murata, T.: Petri Nets: Properties, Analysis and Applications. Proceedings of the IEEE, Vol. 77, 4 (1989) 541-580
5. Jensen, K.: Coloured Petri Nets: Basic Concepts, Analysis Methods and Practical Use, Vol. 1, Basic Concepts. Monographs in Theoretical Computer Science. Berlin, Heidelberg, New York: Springer-Verlag, 2nd corrected printing (1997)
6. Zuberek, W.M.: Timed Petri Nets: Definitions, Properties and Applications. Microelectronics and Reliability, Vol. 31, 4 (1991) 627-644
7. Prashant Reddy, J., Kumanan, S., Krishnaiah Chetty, O.V.: Application of Petri Nets and a Genetic Algorithm to Multi-Mode Multi-Resource Constrained Project Scheduling. AMT 17 (2001) 305-314
8. Huang, B., Zhang, B.: A New Scheduling Model Based on Extended Petri Net - TREM Net. Proc. of the Int. Conference on Robotics and Automation (ICRA 94), Vol. 1, San Diego, CA, USA (1994) 495-500
9. Han, Y.-J., Jiang, C.-J.: Extended Colored Timed Petri Net-Based Resource Scheduling in Grid Computing. Proc. of the Int. Workshop on Grid and Cooperative Computing (GCC 2002), Sanya, China (2002) 345-353
Architecture of Grid Resource Allocation Management Based on QoS* Xiaozhi Wang and Junzhou Luo Department of Computer Science and Engineering, Southeast University Nanjing, 210096, P. R. China
[email protected],
[email protected]
Abstract. Quality of service (QoS) and resource management are key technologies in the grid. By analyzing the characteristics of Grid QoS, this paper sets up a layered structure for Grid QoS. Based on an analysis of the content of grid resource allocation management (GRAM) based on QoS, this paper puts forward an architecture for GRAM based on QoS. By mapping, converting and negotiating QoS parameters, it can embed the user's QoS requirements in the process of resource allocation management and connect Grid QoS with GRAM well. All this provides a reasonable reference model for QoS and resource allocation management in the grid.
1 Introduction
The overall goal of the grid is to provide users with the ability to harness the power of large numbers of heterogeneous resources: computational resources, storage resources, devices, useful information, etc., which are distributed over a wide area and belong to different organizations. In the Open Grid Services Architecture (OGSA [1]), all resources are organized in a rational way and form virtual organizations, which are dynamic and extensible. The virtual organization makes it possible to map many logical resource instances onto the same physical resource. Resource management can be conducted in a virtual organization on top of the underlying resource fabric, so resource management encounters new challenges. On the one hand, in OGSA, grid resources are transparent to grid users in the form of logical resources, but they are physically distributed and have their own management strategies. How to allocate and schedule these resources in a virtual organization, and how to enhance their utilization, are important problems for resource management in the grid. On the other hand, different grid services impose different QoS requirements on the resources participating in the service, and, taking the service cost into account, different users can have different QoS needs. In OGSA, the QoS characteristics of a physical resource cannot
This work is supported by National 973 Fundamental Research Program of China (G1998030402) and National Natural Science Foundation of China (90204009)
represent those of a logical resource. How to convert the user's QoS request into the particular QoS parameters used in the grid therefore becomes an urgent problem. Moreover, resource allocation in the grid environment is closely related to Grid QoS; when carrying out resource management, one should consider the two jointly. Based on the above issues, and through an analysis of the characteristics of Grid QoS, we propose and set up a layered structure for Grid QoS. On the basis of analyzing the content of GRAM based on QoS, we further propose an architecture for grid resource allocation management based on QoS (GRAM-QoS). Through mappings between the different layers of Grid QoS, it converts the user's QoS demands into particular QoS parameters of resources and embeds this mapping and conversion into the resource selection process. Considering the characteristics of Grid QoS comprehensively, GRAM-QoS provides a reasonable reference model for resource allocation management based on QoS. The rest of the paper is organized as follows. Section 2 reviews related work in GRA and Grid QoS. In Section 3, we analyze the characteristics of Grid QoS and set up the layered structure of Grid QoS. In Section 4, we propose the content of GRAM based on QoS and introduce the architecture and working flow of GRAM-QoS. Finally, we conclude and point to future directions in Section 5.
2 Related Work and Limitations
In [2], the authors point out that resource allocation in the grid has two phases, external and internal, which highlights the characteristics of resource allocation in the grid environment. In [3], the authors introduce an economics-based method in which the service price of resources is determined by market supply and demand, putting forward a new method of resource allocation in terms of service price. But these works do not consider the user's QoS requests comprehensively. In [4][5][6][7], the authors mainly research the architecture, reservation strategies and scheduling technology based on resource co-reservation in grid computing systems. Other scholars have studied grid resource allocation technology from different angles, e.g. [8][9][10]; these works improve the performance of GRA in different respects. Although the above works can offer QoS guarantees to some extent, they do not analyze the characteristics of Grid QoS comprehensively. So there is as yet no systematic solution based on Grid QoS for grid resource allocation.
3 The Characteristics and Layered Structure of Grid QoS
3.1 The Characteristics of Grid QoS
Quality of Service (QoS) is a comprehensive index used for measuring the satisfaction of a service. It describes certain characteristics of a service and is expressed as a group of parameters in a language understandable to users. In
[11], the author divides the QoS of a multimedia network system into network QoS and devices QoS, but this division cannot cover the QoS issues of the grid environment. In the grid, QoS issues include not only network QoS and devices QoS but also resource QoS. Resources in the grid environment comprise not only hard resources but also soft resources in the form of information, data, software, etc. They must respond to the resource demands of grid jobs as well as those of local jobs. Generally, the priority of a local job is higher than that of a grid job, so the status of a resource, such as its load factor and the arrival rate of local jobs, has an important effect on the quality of the resource as seen by grid users. On the other hand, the performance of the device on which the resource resides also has an important effect on the QoS delivered to grid users. In order to evaluate the quality of a grid resource accurately, we separate devices QoS from resources QoS. The devices QoS parameters express the QoS of specific devices, such as response time and throughput capacity, while the resources QoS parameters express the performance of resources in the grid environment, such as load ability and interrupt rate. The devices QoS parameters and resources QoS parameters are totally different. In addition, since all components in OGSA are virtual objects, grid users are only interested in logical resources in the virtual organization; the characteristics of physical resources should be transparent to grid users. Based on the above analysis, the QoS characteristics of a physical resource cannot represent those of a logical resource, so in the grid we should also differentiate logical resources from physical resources. The Grid QoS system should provide strategies and functions that convert and map the QoS parameters of logical resources to those of physical resources.
3.2 The Layered Structure of Grid QoS
QoS has different descriptions for different objects. For example, the QoS demands put forward by an end user may be only simple descriptions such as bad, general, better, best. At the grid service layer, the QoS demands concern the logical resources and the system, for instance: resources are excellent, system response time is 180 ms, system transmission speed is 2 Mb/s, etc. The final QoS parameters are a group of particular numerical values. Extending [11] further, Grid QoS can be divided into four layers, as Fig. 1 shows. The top is the user layer, representing the QoS demands brought forth by users when they apply for a grid service. The second is the grid service layer, which describes the QoS demands of the grid service, for instance the response time of the service, the transmission rate of the system, and the quality of the resources that take part in the service. The third is the layer of system and logical resource QoS, which satisfies the QoS parameters of the grid service layer: system QoS mainly covers network QoS demands and devices QoS demands, while logical resource QoS mainly covers the QoS demands on resources in virtual organizations. The bottom is the layer of network, devices and physical resource QoS. Network QoS describes the performance of the network, such as load ability, latency, delay and bandwidth. Devices QoS describes the demands on devices, such as the response time of devices, throughput, etc. Physical resource QoS describes the demands on the physical resources that actually take part in the service, such as their load ability, usable time, and the interruption rate of grid jobs. Through the above analysis, we can divide Grid QoS at the bottom into three parts: network, devices and physical resource. The QoS parameters of each layer are converted consistently from top to bottom or from bottom to top by the specific grid system.
Fig. 1. Layered structure of Grid QoS
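As an illustration of the top-down parameter conversion between layers, here is a small Python sketch mapping a coarse user-level QoS grade to hypothetical service-layer and system-layer parameters. All threshold values and field names are invented for the example; the paper prescribes the layering, not these numbers.

```python
# Hypothetical layer-by-layer QoS mapping tables; values are illustrative only.
USER_TO_SERVICE = {
    "best":    {"response_ms": 100,  "rate_mbps": 4.0, "resource_grade": "excellent"},
    "better":  {"response_ms": 180,  "rate_mbps": 2.0, "resource_grade": "good"},
    "general": {"response_ms": 500,  "rate_mbps": 1.0, "resource_grade": "ordinary"},
    "bad":     {"response_ms": 2000, "rate_mbps": 0.5, "resource_grade": "any"},
}

def map_user_qos(user_grade):
    """Convert a user-layer QoS grade into service-layer parameters, then
    split them into system-level (network/device) and resource-level demands."""
    service = USER_TO_SERVICE[user_grade]
    system_qos = {
        "network": {"bandwidth_mbps": service["rate_mbps"],
                    "max_delay_ms": service["response_ms"] // 2},
        "device":  {"max_response_ms": service["response_ms"],
                    "min_throughput_mbps": service["rate_mbps"]},
    }
    resource_qos = {"grade": service["resource_grade"]}
    return service, system_qos, resource_qos

service, system_qos, resource_qos = map_user_qos("better")
```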
4 GRAM-QoS: GRAM Based on QoS
4.1 The Content of GRAM Based on QoS
The activity of GRAM based on QoS spans the service application, service execution and service close stages. At the service application stage, the resource allocation management system defines the user's resource demands according to the specific grid service and converts the user's QoS demands into particular Grid QoS parameters; it then takes these QoS parameters as constraints when searching the grid for available resources that satisfy the requirements. During the search, the system may negotiate with the user and reach a final result: unable to supply, able to supply, or able to supply with reduced QoS demands. If the negotiation finally succeeds, meaning the service provider can provide resources meeting the final QoS demands, the system carries out admission control according to a specific strategy; it then reserves the resources and sends the grid job to the resource waiting pool, where it waits to be scheduled and executed. If the negotiation fails, the system informs the user and terminates the application. At the service execution stage, the resource allocation management system monitors the resources in the reserved state and updates their relevant QoS information. If the information of a reserved resource can no longer satisfy the user's demands, the system should start a new QoS negotiation or choose a replaceable resource in order to ensure that the user's QoS demands are still met. Resources in the reserved state are scheduled according to the strategies offered by the provider. Since, in the general situation, local jobs have higher priority than grid jobs, the arrival of a local job leads to the interruption of a grid job. In order to ensure that a grid job is completed before its deadline, it is necessary to dynamically adjust the priority
of the grid job: the closer the deadline, the higher the priority. Adopting this scheme also enhances resource utilization. At the close stage, the system releases the resources and renews their statistical information. Since the use of grid resources incurs charges, the system should also record information such as the cost of the resources used and the user's account information.
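A minimal sketch of one possible deadline-driven priority rule; the paper states only that priority should rise as the deadline nears, so the constants and the slack-based formula below are our own assumptions.

```c
#include <time.h>

/* Hypothetical priority model: a grid job's priority rises as its deadline
 * approaches, so that near the deadline it can preempt local jobs. */
#define GRID_BASE_PRIORITY  10
#define GRID_MAX_PRIORITY  120   /* may exceed local job priority near the deadline */

int grid_job_priority(time_t now, time_t deadline, double expected_runtime_s) {
    double slack = difftime(deadline, now) - expected_runtime_s;
    if (slack <= 0.0)
        return GRID_MAX_PRIORITY;   /* must run immediately to meet the deadline */
    /* urgency in [0,1]: near 0 when slack is large, near 1 when slack vanishes */
    double urgency = expected_runtime_s / (expected_runtime_s + slack);
    return GRID_BASE_PRIORITY +
           (int)(urgency * (GRID_MAX_PRIORITY - GRID_BASE_PRIORITY));
}
```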
4.2 Logical Structure of GRAM Based on QoS (GRAM-QoS)
According to the above analysis of the content of GRAM based on QoS, together with the layered structure of Grid QoS, we propose the logical structure of GRAM based on QoS (GRAM-QoS), as Fig. 2 shows. The main modules are explained as follows.
1. Grid Services Market. On one hand, the Grid Services Market lets grid users inquire about grid services; on the other hand, it lets service providers register and publish grid services. When registering and publishing, providers should supply identity proofs and relevant descriptions of the service, such as its resource demands and QoS demands, with particular QoS parameters at the different layers.
2. Grid Middleware Services. This module is mainly responsible for sign-on, security control, managing user information, and accounting for the resources used.
3. Grid Resource Broker. The Resource Information Service Center module is the information center for available resources in the grid environment; it provides information about the quality and QoS parameters of logical resources, supplied by the Resource Information Provider Service module at each grid resource node. The QoS Mapping & Converting module implements the mapping and conversion from the user's QoS demands to particular QoS parameters at the different layers. The QoS Negotiation module in the Grid Resource Broker judges whether system QoS and logical resource QoS can satisfy the user's demands, while the QoS Negotiation module in the Grid Resource Node judges whether physical resource QoS, network QoS, and device QoS can. When the available resources cannot satisfy the user's demands, the two QoS Negotiation modules interact with the relevant modules and ask whether the user can reduce the QoS demands. The Resource Monitor module is responsible for monitoring the reserved resources; if their QoS parameters can no longer satisfy the user's demands, it contacts the QoS Negotiation module to start a new negotiation or to choose substitutable resources. The Resource Information Provider Service module supplies the information this module needs. The Error Process module handles the errors raised by the QoS Negotiation module for resources that cannot satisfy the user's QoS demands; it finishes the execution of the grid service and notifies the user.
Fig. 2. Logical structure of GRAM based on QoS
4. Grid Resource Node. The Resource Information Provider Service module is located in the Grid Resource Node and monitors the QoS information of the physical resources in the grid. It obtains the newest resource information through the QoS Control module and provides the Resource Information Service Center module and the Resource Monitor module with the renewed information. If the result of the QoS negotiation is that resources satisfying the user's demands can be provided, the QoS Admission Control module completes tasks such as resource co-allocation, conflict detection, deadlock detection, and load balancing, and finally performs the confirmation work requested by the user. The Trade Server module is responsible for determining the usage price and recording information such as the total cost of the resources used and the user's account information. The Resource Reservation module sets the resource reservation flag and sends the grid job to the Waiting-job Pool to await scheduling; the Waiting-job Pool, in turn, is responsible for dynamically adjusting the priorities of grid jobs. The Scheduler takes charge of scheduling the jobs in the Waiting-job Pool according to a particular strategy: in general, the priority of a local job is higher than that of a grid job, but a grid job is permitted a higher priority when it is very close to its deadline. The QoS Control module takes charge of controlling all dynamic QoS parameters, adjusting parameters such as bandwidth and buffer size according to the result of the QoS negotiation; it also responds to inquiries from the Resource Information Provider Service module and renews its state information.
4.3 Working Flow
Leaving aside service registration and publication, and starting with the grid service application, the working flow of GRAM-QoS is as follows:
1) Through the Grid Middleware Services module, the grid user signs on to the grid system;
2) The grid user inquires about and applies for a grid service through the Grid Services Market;
3) The system confirms the user's resource demands according to the specific grid service;
4) The user puts forward QoS demands;
5) The QoS Mapping & Converting module maps the user's QoS demands to particular QoS parameters at the different layers;
6) The system chooses the logical resources needed by the grid service in the Resource Information Service Center. Through QoS negotiation, mapping, and conversion, the system ensures that the selected logical resources can satisfy the user's QoS demands; if no satisfying resources are found, the Error Process module informs the user and terminates the application;
7) According to the result of step 6), the system inquires about the information of the physical resources, following the mapping between logical and physical resources;
8) Through the QoS Negotiation module in the Grid Resource Node, the system judges whether physical resource QoS, network QoS, and device QoS can satisfy the user's demands. If the available resources cannot fully satisfy the demands, the two QoS Negotiation modules interact with the relevant modules and ask whether the user can reduce the QoS demands, and the working flow goes back to step 6);
9) The QoS Admission Control module performs the confirmation work requested by the grid job;
10) The Trade Server module determines the usage price of the resources and records the information about the resources used;
11) The Resource Reservation module sets the resource reservation flag and records the QoS demands;
12) The grid job enters the Waiting-job Pool, which takes charge of dynamically adjusting the priorities of the grid jobs it holds;
13) The Resource Monitor module monitors the state of the reserved resources;
14) The Scheduler schedules local jobs and grid jobs in the Waiting-job Pool according to a particular strategy;
15) The QoS Control module adjusts QoS parameters according to the result of the QoS negotiation, reacts to requests from the Resource Information Provider Service module, and renews its information about resource state.
5 Conclusions and Future Work
Grid QoS and resource management are key technologies in the grid, and their relationship is even closer in OGSA. Through analyzing the QoS characteristics of the grid, we propose a layered structure of grid QoS, which provides a reasonable basis for mapping and converting QoS parameters in the grid. On the basis of
analyzing the content of GRAM based on QoS, we put forward the architecture of GRAM based on QoS (GRAM-QoS), which provides a reasonable reference model for QoS-aware resource allocation management in the grid. In future work, we plan to design a simulation platform based on Grid QoS to evaluate the usability of GRAM-QoS and enhance its performance. We also aim to implement GRAM-QoS on the Globus Toolkit. Another interesting direction is the technology for dynamically adjusting the priority of grid jobs; for this we may consult related solutions from CORBA [12]. Furthermore, we plan to formalize an XML-based description language for Grid QoS parameters.
References
1. I. Foster, C. Kesselman, J. M. Nick, S. Tuecke: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. http://www.gridforum.org/ogsi-wg/drafts/ogsa_draft2.9_2002-06-22.pdf
2. Hongtu Chen, M. Maheswaran: Distributed Dynamic Scheduling of Composite Tasks on Grid Computing Systems. Proc. of the International Parallel and Distributed Processing Symposium (IPDPS 2002), pp. 88-97
3. R. Buyya, D. Abramson, J. Giddy: A Case for Economy Grid Architecture for Service Oriented Grid Computing. Proc. of the 15th International Parallel and Distributed Processing Symposium, Apr. 2001, pp. 776-790
4. I. Foster, C. Kesselman, C. Lee, R. Lindell, K. Nahrstedt, A. Roy: A Distributed Resource Management Architecture that Supports Advance Reservations and Co-Allocation. Intl. Workshop on Quality of Service, 1999
5. K. Czajkowski, I. Foster, C. Kesselman: Resource Co-Allocation in Computational Grids. Proc. of the 8th IEEE International Symposium on High Performance Distributed Computing (HPDC-8), 1999, pp. 219-228
6. W. Smith, I. Foster, V. Taylor: Scheduling with Advanced Reservations. Proc. of the 14th International Parallel and Distributed Processing Symposium (IPDPS 2000), May 2000, pp. 127-132
7. Lizhe Wang, Wentong Cai, Bu-Sung Lee, Simon See, Wei Jie: Resource Co-Allocation for Parallel Tasks in Computational Grids. Proc. of the International Workshop on Challenges of Large Applications in Distributed Environments, June 2003, pp. 88-95
8. O. F. Rana, M. Winikoff, L. Padgham, J. Harland: Applying Conflict Management Strategies in BDI Agents for Resource Management in Computational Grids. IEEE Computer Society Press, 2002, pp. 205-214
9. Jonghun Park: A Scalable Protocol for Deadlock and Livelock Free Co-Allocation of Resources in Internet Computing. Proc. of the 2003 Symposium on Applications and the Internet, Jan. 2003, pp. 66-73
10. D. Grosu, A. T. Chronopoulos: Algorithmic Mechanism Design for Load Balancing in Distributed Systems. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, accepted for publication, 2003, pp. 1-8
11. K. Nahrstedt, R. Steinmetz: Resource Management in Networked Multimedia Systems. IEEE Computer, 28(5), May 1995, pp. 52-63
12. Y. Wang, F. Brasileiro, E. Anceaume, F. Greve, M. Hurfin: Avoiding Priority Inversion on the Processing of Requests by Active Replicated Servers. Proc. Int'l Conf. on Dependable Systems and Networks (DSN 2001), 2001, pp. 97-106
An Improved Ganglia-Like Clusters Monitoring System*
Wenguo Wei1,2, Shoubin Dong1, Ling Zhang1, and Zhengyou Liang1
1 Guangdong Key Laboratory of Computer Network, South China University of Technology, Guangzhou, 510641, P.R. China. {wgwei, sbdong, ling, zhyliang}@scut.edu.cn
2 Department of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, 510665, P.R. China
Abstract. Ganglia [1] is a scalable distributed monitoring system for high performance computing systems such as clusters and Grids. We propose an improved Ganglia-like cluster monitoring system that is more reliable under failures of federation nodes and their associated links; lets some monitoring data be accessed by permission; adds control functions such as restarting or shutting down misbehaving processes; sends email or pager messages to the cluster administrator when an important event occurs; and optionally selects only some data to send to the federation node, based on user policy, in order to speed up WAN access. We have implemented a prototype system.
1 Introduction
Currently there has been an enormous shift in high performance computing from systems composed of small numbers of computationally massive devices to systems composed of large numbers of commodity components. This architectural shift from the few to the many is causing designers of high performance systems to revisit numerous design issues such as scale, reliability, heterogeneity, manageability, and system evolution over time. With clusters now the de facto building block for high performance systems, scale and reliability have become key issues, as many independently failing and unreliable components need to be continuously accounted for and managed over time. Heterogeneity, previously a non-issue when running a single vector supercomputer or an MPP, must now be designed for from the beginning, since systems that grow over time are unlikely to scale with the same hardware and software base. Manageability also becomes of paramount importance, since clusters today commonly consist of hundreds or even thousands of nodes. Finally, as systems evolve to accommodate growth, system configurations inevitably need to adapt. In summary, high performance systems today have sharply diverged from the monolithic machines of the past and now face the same set of challenges as large-scale distributed systems.
* This research was supported by Guangdong Key Laboratory of Computer Network under grant 2002B60113.
This paper presents the design and implementation of the improved Ganglia-like distributed monitoring system. It is organized as follows. In Section 2, we describe the key challenges in building a distributed monitoring system and why we chose Ganglia as our starting point; we then analyze the architecture of Ganglia and point out its main issues. In Section 3, we describe our current design and implementation of the Ganglia-like system, which improves on Ganglia in several respects. In Section 4, we present a theoretical performance analysis of our system, and in Section 5 we conclude the paper.
2 The Presupposition and Foundation of Our Work

2.1 Monitoring System Design Challenges
The key design challenges for distributed monitoring systems include:
Scalability: The system should scale gracefully with the number of nodes in the system. Clusters commonly consist of hundreds of nodes. Grid computing efforts, such as TeraGrid [2], will eventually push these numbers out even further.
Robustness: The system should be robust to node and network failures of various types. As systems scale in the number of nodes, failures become inevitable. The system should localize such failures so that it continues to operate and delivers useful service in their presence.
Extensibility: The system should be extensible in the types of data that are monitored. It is impossible to know a priori everything that might ever need to be monitored. The system should allow new data to be collected and monitored.
Manageability: The system should incur management overheads that scale slowly with the number of nodes. For example, managing the system should not require a linear increase in system administrator time as the number of nodes increases. Manual configuration should also be avoided as much as possible.
Portability: The system should be portable to a variety of operating systems and CPU architectures. Despite the recent trend towards Linux on x86, there is still wide variation in the hardware and software used for HPC. Systems such as Globus [3] further facilitate the use of such heterogeneous systems.
2.2 Why Ganglia? There are a number of research efforts centered on monitoring of clusters, but only a handful that have a focus on scale. Supermon [4] is a hierarchical cluster monitoring system that uses a statically configured hierarchy of point-to-point connections to gather and aggregate cluster data collected by custom kernel modules running on each cluster node. CARD [5] is a hierarchical cluster monitoring system that uses a statically configured hierarchy of relational databases to gather, aggregate, and index cluster data. Compared to these systems, Ganglia differs in four key respects. First, Ganglia uses a hybrid approach to monitoring which inherits the desirable properties
of listen/announce protocols including automatic discovery of cluster membership, no manual configuration, at the same time still permitting federation in a hierarchical manner. Second, Ganglia makes extensive use of widely-used, self-contained technologies such as XML and XDR which facilitate reuse and have rich sets of tools that build on these technologies. Third, Ganglia makes use of simple design principles and sound engineering to achieve high levels of robustness, ease of management, and portability. Finally, Ganglia has demonstrated operation at scale.
2.3 Ganglia Architecture Ganglia is based on a hierarchical design targeted at federations of clusters (Figure 1). It relies on a multicast-based listen/announce protocol [6,7] to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. Within each cluster, Ganglia uses heartbeat messages on a well-known multicast address as the basis for a membership protocol. Membership is maintained by using the reception of a heartbeat as a sign that a node is available and the non-reception of a heartbeat over a small multiple of a periodic announcement interval as a sign that a node is unavailable.
Fig. 1. Ganglia architecture
Each node (daemon “gmond”) monitors its local resources and sends multicast packets containing monitoring data on a well-known multicast address whenever significant updates occur. All nodes in same cluster always have an approximate view of the entire cluster’s state and this state is easily reconstructed after a crash. Ganglia (daemon “gmetad”) federates multiple clusters together using a tree of point-to-point connections. Each leaf node specifies a node in a specific cluster being federated, while nodes higher up in the tree specify aggregation points. Each leaf node logically
represents a distinct cluster while each non-leaf node logically represents a set of clusters. (You can specify multiple cluster nodes for each leaf to handle failures.) Aggregation at each point in the tree is done by polling child nodes at periodic intervals. Monitoring data from both leaf nodes and aggregation points is then exported.
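A minimal C sketch of the heartbeat-based membership rule described above; the constants and the struct are our own illustration, not Ganglia's actual code.

```c
#include <time.h>

/* Listen/announce-style membership bookkeeping: a node is available while
 * heartbeats keep arriving on the well-known multicast address, and is
 * declared unavailable once no heartbeat has been heard for a small
 * multiple of the announcement interval. Values are illustrative. */
#define ANNOUNCE_INTERVAL_S 20
#define MISS_FACTOR          4   /* the "small multiple" of the interval */

typedef struct {
    char   host[64];
    time_t last_heartbeat;       /* updated on every received heartbeat */
} Member;

void on_heartbeat(Member *m, time_t now) {
    m->last_heartbeat = now;     /* reception => the node is available */
}

int is_available(const Member *m, time_t now) {
    return difftime(now, m->last_heartbeat)
           <= MISS_FACTOR * ANNOUNCE_INTERVAL_S;
}
```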
2.4 Ganglia's Main Issues
There are several issues in Ganglia (2.x version):
If any aggregation node or its associated links fail, then none of the data of its leaf nodes (a set of clusters) can be collected; i.e., Ganglia provides no redundancy for non-leaf nodes and their associated links.
Ganglia is a distributed monitoring system that may span multiple clusters. If those clusters belong to different organizations, some cluster owners may not want everyone to see all the information Ganglia reports, so a mechanism is needed to access monitoring data by permission.
There is little or no control mechanism. Sometimes people do not only want to monitor passively, but also want to restart or kill certain processes, or want Ganglia to send email or pager messages to the cluster administrator when an important event occurs.
Ganglia aggregates all the data that gmond reports to the client, even when there are many clusters, each composed of many nodes, and the network is slow. To speed up network access, only part of the monitoring data (e.g., dynamic information) should be sent when the network is busy; users should, however, still be able to see all the data if they want.
Infrastructure limitation. Ganglia has a flat namespace, i.e., it assumes that all measurements on hosts can be represented by a simple key/value pair. This may hold for some metrics such as the number of CPUs, but fails miserably when you want something like the %CPU user for process 1289.
3 Improvements and Implementation

3.1 Improved Reliability of the Whole System
As systems scale in the number of nodes, failures become inevitable. If any aggregation node or its associated links fail, then none of the data of its leaf nodes (a set of clusters) can be collected; i.e., there is no redundancy for non-leaf nodes and their links. We therefore assume that any two aggregation nodes in the monitored system have at least two different paths between them (possibly indirect), i.e., the structure of the monitored system is not a tree but contains rings, and we propose a mechanism to handle the failure of aggregation nodes. Data collection in gmetad is done by periodically polling a collection of child data sources, which are specified in a configuration file. Each data source is identified by a unique tag and has multiple IP address/TCP port pairs associated with it, each of which is equally capable of providing data for the given data source. Configuration files are used to specify the structure of the federation tree for simplicity, and
since computational Grids, while consisting of many nodes, typically consist of only a small number of distinct sites.
Fig. 2. Example of monitored system logic connection relationship
Configuration File's Structure. Consider a monitored system whose logical connections are as in figure 2. Our aggregation node's configuration file extends Ganglia's, as follows (figure 3): there are two types of data source, gmond (the state of one cluster) and gmetad (the state of a set of clusters). For a gmond data source, the spokesman is any node of the cluster, identified by IP address and TCP port pairs; generally multiple nodes can be specified for redundancy. The parent field names the upper-level or same-level aggregation node that collects its monitoring data, and there are two types of parent node: the primary parent is the main aggregation node, and the secondary parent is the backup node, which begins to work when the primary is unavailable. For a gmetad data source, there is only the parent field, which is likewise used for data collection. An example configuration file for the lowest aggregation node is shown in figure 3.
Fig. 3. Example of lowest aggregation node’s configuration file
Procedure of Monitoring Data Collection. If all aggregation nodes work normally, data is collected just as in Ganglia, by polling child nodes at periodic intervals.
When an aggregation node fails (for example, Agg1 in figure 2), its upper aggregation node (the primary parent node, Agg11) triggers a message to the failed node's lower nodes (Data Source1 and Data Source2) telling them to send their monitoring data to their secondary parent node (Agg2).
Reliability Analysis. This solution has better reliability than Ganglia. If the failed node is a cluster spokesman, the engine (listening thread) can collect data from any other spokesman (at least two spokesmen exist, and more can be specified); if the failed node is an aggregation node that is the parent of a data source, the data source's monitoring data can be collected by its other parent node. According to graph connectivity theory [8], the connectivity of the monitored system's topological graph is at least 2, because any two aggregation nodes have at least two different paths between them. Only when both parent nodes fail simultaneously may some data remain uncollected, because the graph becomes disconnected. Hence our solution tolerates any number of spokesman failures (as long as at least one spokesman is running) together with the failure of one parent node, so its reliability is significantly improved; the greater the connectivity of the graph, the higher the reliability.
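A minimal C sketch of the polling-with-fallback idea; the structures and the stubbed try_collect helper are our own illustration, not Ganglia's actual code.

```c
#include <stdio.h>

/* A data source lists several redundant spokesmen, any of which can
 * provide the cluster's state. */
typedef struct {
    const char *addrs[4];   /* IP address/TCP port pairs of spokesmen */
    int         n_addrs;
} DataSource;

static int try_collect(const char *addr) {
    printf("polling %s\n", addr);   /* stub: a real poll would use TCP */
    return -1;                      /* pretend this poll failed */
}

/* Poll the spokesmen in order; collection succeeds as long as at least
 * one of them answers. */
int collect(const DataSource *src) {
    for (int i = 0; i < src->n_addrs; i++)
        if (try_collect(src->addrs[i]) == 0)
            return 0;
    /* All spokesmen of this source failed; in the improved scheme the
     * upper aggregation node now redirects the source's children to the
     * secondary parent. */
    return -1;
}
```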
3.2 Accessing Monitoring Data by Permission
Ganglia is a distributed monitoring system that may span multiple clusters. If those clusters belong to different organizations, some cluster owners may not want everyone to see all the information Ganglia reports (for example, the running processes), so we propose a mechanism to access monitoring data by permission. We add a one-byte field, "security", to some metrics by default; anyone whose permission is greater than this field can see the metric's data, so the granularity of permission is the metric. This can further be extended to more levels of security control, and a cluster administrator can customize it by placing more metrics under security control or removing some from it.
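A sketch of this check in C; only the one-byte security field and the greater-than comparison come from the text, the struct layout is our own illustration.

```c
/* Metric-granularity access control. */
typedef struct {
    const char   *name;
    double        value;
    unsigned char security;   /* 0 = public, larger = more restricted */
} Metric;

/* A requester sees the metric only if their permission level is greater
 * than the metric's security field. */
int can_view(const Metric *m, unsigned char user_permission) {
    return user_permission > m->security;
}
```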
3.3 More Control Functions and Aggressive Behaviors
Ganglia has little or no control mechanism. Sometimes people do not only want to monitor passively, but also want to clear dead processes or restart misbehaving ones (of course, only someone with permission may do so), or expect Ganglia to send email or pager messages to the cluster administrator when an important event occurs. We implement process control through root authorization. The cluster administrator is notified when important events occur, such as: a disk is over 95% full, the CPU load average is unacceptably high, some important processes have died, a special IP address cannot be reached, or some service is down. We modify the Ganglia engine to implement this function.
3.4 Aggregating Monitoring Data by User Policy
Ganglia aggregates all the data that gmond (the Ganglia monitoring daemon) reports to the client, even when the number and size of the clusters are large and the network is slow. To speed up network access, it is useful to send only part of the monitoring data, for example only dynamic information when the network is busy. We implement a policy to send data selectively through four approaches. The first selects basic static data, such as the number of CPUs and the operating system (name, version, architecture); the second selects dynamic data, such as %CPU (user, nice, system, idle), load (1, 5, and 15-minute averages), memory (free, shared), processes (running, total), and free swap; the third selects by granularity of data, i.e., we limit how deeply the XML data is recursively displayed, with three granularities: cluster, node, and metric; the fourth is a customized combination of the above three. Users who have permission can, however, still choose to see all the data.
3.5 Hierarchical Namespace for Monitoring Data
Ganglia has a flat namespace, i.e., it assumes that all measurements on hosts can be represented by a simple key/value pair. This may hold for some metrics such as the number of CPUs, but fails miserably when you want something like the %CPU user for process 1289, which needs at least three fields to represent it. We implement a hierarchical namespace of arbitrary depth (limited only by the maximum stack size); this hierarchical namespace can cope with varied data structures.
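A sketch of such a metric tree in C, so that a measurement like the %CPU user of process 1289 can live at a path such as proc/1289/cpu_user; the structure is our own illustration, not the implemented one.

```c
#include <string.h>
#include <stddef.h>

/* A metric tree instead of a flat key/value table. */
typedef struct MetricNode {
    char               name[32];
    double             value;           /* meaningful only at leaves */
    struct MetricNode *child, *sibling; /* first child / next sibling */
} MetricNode;

static MetricNode *find_child(MetricNode *n, const char *name) {
    for (MetricNode *c = n ? n->child : NULL; c; c = c->sibling)
        if (strcmp(c->name, name) == 0)
            return c;
    return NULL;
}

/* Resolve a path like "proc/1289/cpu_user" one component at a time. */
MetricNode *lookup(MetricNode *root, const char *path) {
    char buf[128];
    strncpy(buf, path, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    MetricNode *n = root;
    for (char *tok = strtok(buf, "/"); n && tok; tok = strtok(NULL, "/"))
        n = find_child(n, tok);
    return n;
}
```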
4 Performance Analysis
We improve the data storage structure by adding a field to some metrics to handle permissions and by changing the flat namespace to a hierarchical one; we also modify the Ganglia engine (for example, the listening thread) to handle aggregation-node failures, to send notifications when important events occur, and so on. These improvements add only a little operational overhead: theoretical analysis shows that we sacrifice a little performance to gain considerably more reliability and flexibility. A quantitative comparison of Ganglia and our system, based on real-world deployments on distributed systems, is under way.
5 Summary
Our system improves Ganglia in five respects: enhanced overall reliability under aggregation-node failures; permission-based access to monitoring data; added control functions; more flexible, policy-based collection of monitoring data; and a hierarchical namespace for data storage. The system has good performance and reliability for managing middle- and large-scale multiple-cluster environments. Further enhancements and optimizations of this
model are currently under investigation. Ganglia can already monitor PlanetLab [9], which currently consists of 102 nodes distributed across 42 sites spanning three continents: North America, Europe, and Australia. We believe our system will do even better on CERNET or ChinaGrid in the future.
References
1. M. L. Massie, B. N. Chun, D. E. Culler: The Ganglia Distributed Monitoring System: Design, Implementation, and Experience. Submitted for publication, February 2003
2. The TeraGrid Project: TeraGrid project web page, http://www.teragrid.org, 2001
3. I. Foster, C. Kesselman: Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications, 11(2): 115-128, 1997
4. M. Sottile, R. Minnich: Supermon: A High Speed Cluster Monitoring System. Proceedings of Cluster 2002, September 2002
5. E. Anderson, D. Patterson: Extensible, Scalable Monitoring for Clusters of Computers. Proceedings of the 11th Systems Administration Conference, October 1997
6. E. Amir, S. McCanne, R. H. Katz: An Active Service Framework and Its Application to Realtime Multimedia Transcoding. Proceedings of the ACM SIGCOMM '98 Conference on Communications Architectures and Protocols, pp. 178-189, 1998
7. B. N. Chun, D. E. Culler: Rexec: A Decentralized, Secure Remote Execution Environment for Clusters. Proceedings of the 4th Workshop on Communication, Architecture and Applications for Network-Based Parallel Computing, January 2000
8. F. Harary: Graph Theory. Addison-Wesley, Reading, Mass., 1969
9. L. Peterson, D. Culler, T. Anderson, T. Roscoe: A Blueprint for Introducing Disruptive Technology into the Internet. Proceedings of the 1st Workshop on Hot Topics in Networks (HotNets-I), October 2002
Effective OpenMP Extensions for Irregular Applications on Cluster Environments
Minyi Guo1, Jiannong Cao2, Weng-Long Chang3, Li Li1, and Chengfei Liu4
1 Department of Computer Software, The University of Aizu, Aizu-Wakamatsu City, Fukushima 965-8580, Japan. minyi@u-aizu.ac.jp
2 Department of Computing, The Hong Kong Polytechnic University, Kowloon, Hong Kong
3 Department of Information Management, Southern Taiwan University of Technology, Tainan County, Taiwan
4 School of Computer and Information Science, University of South Australia, Mawson Lakes, South Australia 5095, Australia
Abstract. Sparse and unstructured computations are widely used in scientific and engineering applications; the problems inherent in such computations are called irregular problems. In this paper, we propose some extensions to OpenMP directives, aiming at efficient parallel execution of irregular OpenMP codes. These extensions cover scheduling for irregular loops, an inspector/executor scheme for parallelizing irregular reductions, and the elimination of ordered loops. We also introduce implementation strategies for these extensions.
1 Introduction
Many codes in scientific and engineering computing involve sparse and unstructured problems in which array accesses are made through a level of indirection or through nonlinear array subscript expressions. This means that the data arrays are indexed either through the values in other arrays, called indirection arrays (or index arrays), or through non-affine subscripts. The use of indirect/nonlinear indexing causes the data access patterns, i.e. the indices of the data arrays being accessed, to be highly irregular; such a problem is called an irregular problem. Exploiting parallelism for irregular problems is very difficult because of their irregular data access patterns. A typical example is shown in Fig. 1. In the loop, elements are moved across the columns of a 2D array based on the information provided in the indirection arrays prev_elem and next_elem; the elements of array cell are shuffled and stored in array new_cell. If this loop is split across OpenMP threads, with different threads taking care of different iterations, prev_elem and next_elem may have the same values in different threads at the same time. This may result in a potential problem when updating the value of new_cell. There are some simple solutions to this problem, which include making all the updates atomic, or having each thread compute temporary results which are
then combined across threads. However, for the extremely common situation of sparse array access, neither of these approaches is efficient.
Fig. 1. A typical irregular loop
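The loop itself appears only as a figure; a hypothetical C rendition of a loop of this shape, with assumed array shapes, might be:

```c
/* Hypothetical stand-in for the loop in Fig. 1: elements of cell are moved
 * across columns through the indirection arrays prev_elem and next_elem
 * and stored in new_cell. When next_elem holds duplicate values, iterations
 * run by different threads update the same element of new_cell, which is
 * exactly the conflict discussed above. */
void shuffle(int n, int m, double cell[n][m], double new_cell[n][m],
             const int prev_elem[m], const int next_elem[m])
{
    #pragma omp parallel for
    for (int j = 0; j < m; j++)
        for (int i = 0; i < n; i++)
            new_cell[i][next_elem[j]] = cell[i][prev_elem[j]];
}
```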
2 Requirements of Irregular Loop Scheduling
When parallelizing a loop in OpenMP, we may use the schedule clause to select different scheduling policies, which affect how loop iterations are mapped onto threads. Four scheduling policies are available in OpenMP: static, dynamic, guided, and runtime scheduling. To achieve load balance for irregular loops, it is better to select dynamic or guided scheduling. In the dynamic and guided schemes, chunks are parceled out following the owner computes rule, which specifies that, in a single-statement loop, each iteration is executed by the processor that owns the left-hand-side array reference of the assignment for that iteration. However, even when irregular loops parceled out under dynamic scheduling achieve load balance, the total execution time may not improve, because the communication overhead among threads may be considerable, especially in cluster environments or software distributed shared memory (DSM) systems. Consider the following irregular loop.
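The loop appears in the original as a code figure; a hypothetical loop of the shape the following discussion assumes, with statements S1, S2, and S3 writing through different indirection arrays ix, iy, and iz, might be:

```c
/* Hypothetical example: each statement writes through a different
 * indirection array, so under the owner computes rule the three
 * assignments of one iteration would be executed by three different
 * threads, forcing communication and synchronization between them. */
void body(int n, double *x, double *y, double *z,
          const int *ix, const int *iy, const int *iz)
{
    for (int i = 0; i < n; i++) {
        x[ix[i]] = y[iy[i]] + 1.0;        /* S1: owner of x[ix[i]] */
        y[iy[i]] = z[iz[i]] * 2.0;        /* S2: owner of y[iy[i]] */
        z[iz[i]] = x[ix[i]] - y[iy[i]];   /* S3: owner of z[iz[i]] */
    }
}
```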
Without loss of generality, assume that for an iteration the elements written by S1, S2, and S3 are distributed onto three different threads; the executions of S1, S2, and S3 for that iteration would then be partitioned to those three threads, respectively. The accompanying table (omitted here) shows the owner executing each assignment and the required communications and synchronizations for the example loop.
We can conclude that the owner computes rule is often not best suited for irregular codes. Another situation is shown in Fig. 1: the index arrays may have the same values on different threads. When such a loop is parallelized in a cluster environment, there are often inter-thread dependencies in which two different iterations of the loop modify the same data. This prevents the loop from being processed in parallel and thus serializes its execution. OpenMP does provide two mechanisms that help in parallelizing such loops, the atomic and critical directives. These mechanisms, however, are excessively expensive in codes where not all the accesses need to be performed in mutual exclusion, as well as in two other situations: irregular reductions, and loops in which only some iterations need to be executed in sequential order. We therefore propose a new OpenMP directive, irregular, for loops that have irregular data access patterns. The directive provides an efficient alternative for load balancing, iteration partitioning, and irregular data access.
3 OpenMP Directive Extensions for Irregular Loops
This section introduces our extensions to OpenMP, the irregular directive. The irregular directive can be applied to the parallel do directive in one of the following situations:
When the parallel region is recognized as an irregular loop: in this case the compiler invokes a runtime library that partitions the irregular loop according to a special computes rule.
When an ordered clause is recognized in a parallel region whose loop is irregular: in this case the compiler treats the loop as partially ordered; that is, some iterations are executed sequentially while others may be executed in parallel.
When a reduction clause is recognized in a parallel region whose loop is irregular: in this case the compiler invokes inspector/executor routines to perform the irregular reduction in parallel.
The irregular directive in the extended OpenMP version may take one of the following patterns: irregular(irarray1, ..., irarrayN), which designates that the compiler will encounter an irregular loop, where irarray1, ..., irarrayN are the possible indirection arrays; or irregular(expr1, ..., exprN), which designates a special irregular reduction or an irregular ordered loop, where expr1, ..., exprN are expressions such as loop index variables.
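A hedged sketch of how the first pattern might be used; the paper presents the directives for Fortran's parallel do, rendered here in the C parallel for analogue, and the clause itself is the proposed extension, so the code requires the extended compiler described in Section 4.

```c
/* Naming the indirection arrays in the clause lets the runtime partition
 * iterations by the least-communication rule instead of the owner
 * computes rule. */
void update(int n, double *x, const double *y,
            const int *ix, const int *iy)
{
    #pragma omp parallel for irregular(ix, iy)
    for (int i = 0; i < n; i++)
        x[ix[i]] = x[ix[i]] + y[iy[i]];
}
```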
3.1 Irregular Loop Scheduling
Irregular loops are frequently found in the core of scientific and engineering applications. The following loop, a simplified version extracted from the ZEUS-2D code, is a more complicated irregular loop:
For this loop (given in the original as a code figure), the compiler considers communication overhead when partitioning its iterations to threads. Although the number of elements to be communicated is 6, the same as before, the number of communication steps is reduced (three times fewer). This improvement matters when the outer sequential time-step loop is long, and it illustrates that the owner computes rule is not always the optimal scheme for guiding the partitioning of loop iterations.
3.2 Partially Ordered Loops
A special case of the example in Fig. 1 occurs when a shared update needs to be performed not only in a mutually exclusive manner but also in an ordered way. Using the irregular clause in this case tells the compiler that those iterations which may update the same data on different threads must be executed in an ordered way; the other iterations are unaffected. An example of code using the clause in this manner is shown below.
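The example itself is not reproduced here; a hedged stand-in, again in the C analogue of the paper's Fortran form and requiring the proposed extended compiler, might be:

```c
/* With irregular(idx), only iterations whose idx values collide are forced
 * to execute in their original order; all other iterations run freely in
 * parallel. */
void ordered_update(int n, double *a, const double *b, const int *idx)
{
    #pragma omp parallel for ordered irregular(idx)
    for (int i = 0; i < n; i++)
        a[idx[i]] = a[idx[i]] + b[i];
}
```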
3.3 Irregular Reduction
Some scientific applications need to perform reduction operations that are not directly parallelizable, where the update index of an element is not the induction variable of the loop but a function of it, or another array. The following code shows an example of such a case.
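The code is given in the original as a figure; a hedged reconstruction in C that matches the description which follows (the array y updated through the index arrays idx1 and idx2) might be:

```c
/* The reduction target y is indexed through idx1 and idx2 rather than
 * through the induction variable i, so the updates cannot be proven
 * independent at compile time. */
void irregular_reduction(int n, double *y, const double *w,
                         const int *idx1, const int *idx2)
{
    for (int i = 0; i < n; i++) {
        y[idx1[i]] = y[idx1[i]] + w[i];
        y[idx2[i]] = y[idx2[i]] - w[i];
    }
}
```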
The example shows that the computation is irregular because accesses to array y are determined by the index arrays idx1 and idx2, preventing the compiler from analyzing the accesses exactly at compile time. The only way to parallelize the code using standard OpenMP is to protect the update of the y array with either atomic or critical directives. These solutions, however, are excessively expensive in codes where not all the accesses need to be performed in mutual exclusion. Including the irregular clause in a parallel do directive that carries a reduction clause tells the compiler that the reduction has an irregular data access pattern but that parts of it can be executed in parallel, enabling the compiler to generate code for this situation. The implementation of this clause is introduced in the next section.
4 Implementation
Our extended irregular directives for OpenMP can be implemented by adding several library routines to the compiler. The implementation strategies and algorithms are outlined in this section.
4.1 Implementation of Irregular Scheduling
We adopt a loop-iteration partitioning strategy for irregular codes that follows the least communication computes rule [4]. Unlike the owner computes rule, the whole loop body of the loop to be parallelized is processed. Suppose that all arrays, including data arrays and index arrays, are initially distributed BLOCK-wise. The communication pattern of a partitioned loop iteration on a processor can be represented as a directed graph G = (V, E), called the communication pattern graph (CPG). The partitioning algorithm (given in the original as a figure) assigns the iterations of a loop to threads as follows: for a thread t, let Send(t) and Recv(t) denote the sets of threads that have to send data to, respectively receive data from, t before, respectively after, the iteration is executed; the numbers of processors in Send(t) and Recv(t) are defined as the degrees of the two sets, and an iteration is assigned to the thread for which these degrees, i.e. the communication it incurs, are least.
4.2 Implementation of Partially Ordered Loops
To parallelize partially ordered loops, the key technique is to detect data dependence. However, testing most data dependencies of irregular codes at runtime is very expensive. We proposed a symbolic analysis method [3], similar to the Range Test [2], which detects irregular data dependences as far as possible at compile time. In our symbolic analysis, symbolic solutions of a set of symbolic expressions are obtained under certain restrictions; we introduced symbolic analysis algorithms to obtain the solutions in terms of a set of equalities and inequalities.
Fig. 2. Performance of the X2INTZC program on the 8-processor SUN cluster with three different scheduling strategies of OpenMP.
Fig. 3. Performance of the IRRCFD program on the 8-processor SUN cluster with the optimization of irregular reduction and partially ordered loops of OpenMP.
4.3 Implementation of Irregular Reduction
We use a gather/scatter approach to implement irregular reduction in our compiler. Gather/scatter can generate explicit messages between threads on distributed memory systems. Reductions with regular accesses can be converted directly to collective communication. Irregular reductions are parallelized by generating an inspector that identifies the nonlocal data needed by each processor. The inspector also generates a communication schedule and performs address translation, modifying the indices of nonlocal data to use local buffers. Inspectors are expensive, but their cost can be amortized over many time steps. On each time step, an executor gathers nonlocal data using the communication schedule, performs the computation using local buffers, and scatters nonlocal results back to the appropriate processors.
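A minimal sketch of the inspector half in C, assuming a BLOCK distribution of the reduction array; all structure and helper names are our own illustration. The executor would then gather remote elements with the schedule, reduce into local buffers, and scatter the results back on every time step, amortizing the inspector's cost.

```c
#include <stdlib.h>

/* One nonlocal access: which iteration touches it, which processor owns
 * it, and the element's local index on the owner. */
typedef struct { int iter; int owner; int remote_index; } NonlocalRef;
typedef struct { NonlocalRef *refs; int n_refs; } Schedule;

/* Inspector: classify each access through idx as local or nonlocal and
 * record the nonlocal ones in a reusable communication schedule. */
Schedule inspect(int n, const int *idx, int block_size, int my_rank)
{
    Schedule s = { malloc(n * sizeof(NonlocalRef)), 0 };
    for (int i = 0; i < n; i++) {
        int owner = idx[i] / block_size;      /* BLOCK distribution rule */
        if (owner != my_rank)                 /* nonlocal: schedule it   */
            s.refs[s.n_refs++] =
                (NonlocalRef){ i, owner, idx[i] % block_size };
    }
    return s;
}
```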
5 Experiments, Simulations, and Performance Results
We are constructing the library routines as part of the OpenMP implementation. We evaluated our extensions on two platforms: an SGI Origin2000 with 16 nodes, and a SUN workstation cluster with 8 × 400 MHz CPUs connected by 100 Mbps Ethernet. Since irregular scheduling and reduction are not part of the current OpenMP implementation, we implemented them by hand, translating this part of the OpenMP program into pthread routines on the SGI Origin2000 and MPI routines on the cluster. The OpenMP compilers used are the MIPSpro Fortran 77 compiler for the SGI Origin2000 and the SUN Forte Developer 6 Update 2 compiler for the SUN cluster, both of which provide OpenMP support. Due to the limited space of this paper, we only show the results on the cluster. For our irregular scheduling study we selected X2INTZC, an irregular kernel of the fluid dynamics code ZEUS-2D, which includes loops similar in appearance to the ZEUS-2D loop in Section 3.1. Another application, IRRCFD, is used to evaluate the irregular reduction and partially ordered loop optimizations. Figure 2 shows the 8-CPU speedup on the cluster for the three scheduling strategies. We observed that irregular scheduling improves performance
since it reduces communication cost. This effect is more significant when the number of time steps is large. Figure 3 presents the optimized results of IRRCFD on the cluster: in both cases, performance improves after the optimization of irregular reductions and partially ordered loops. In comparison, the speedup on the cluster is lower than that on the SGI Origin2000, because the inspector/executor scheme costs more there.
6 Conclusion
The performance of irregular scientific computing codes in current OpenMP implementations has not been well investigated. In its current version, OpenMP can only serialize such loops through the atomic and ordered directives. In this paper, we proposed new directives to improve the decomposition of irregular loops, to parallelize irregular reductions, and to reduce the use of atomic and ordered loops. These directives are especially useful for distributed memory multicomputers such as cluster platforms, though their implementation in OpenMP compilers may cost extra computation at runtime. Our experiments and simulations validate these proposals. The proposed method would enable straightforward and efficient automatic parallelization of a wide range of scientific applications.
References
1. R. Asenjo, E. Gutierrez, Y. Lin, D. Padua, B. Pottenger, E. L. Zapata: On the Automatic Parallelization of Sparse and Irregular Fortran Codes. Technical Report 1512, University of Illinois at Urbana-Champaign, CSRD, December 1996
2. W. Blume and R. Eigenmann: Nonlinear and Symbolic Data Dependence Testing. IEEE Transactions on Parallel and Distributed Systems, Vol. 9, No. 12, pp. 1180-1194, Dec. 1998
3. M. Guo, Y. Pan, and C. Liu: Symbolic Communication Set Generation for Irregular Parallel Applications. The Journal of Supercomputing, Vol. 25, No. 3, pp. 197-214, 2003
4. M. Guo: Efficient Loop Partitioning for Parallel Codes of Irregular Scientific Computations. IEICE Transactions on Information and Systems, Vol. E86-D, No. 9, pp. 442-451, 2003
5. E. Gutierrez, R. Asenjo, O. Plata, and E. L. Zapata: Automatic Parallelization of Irregular Applications. Parallel Computing, 26(2000), pp. 1709-1738, 2000
6. Y. Hu, A. Cox, and W. Zwaenepoel: Improving Fine-Grained Irregular Shared-Memory Benchmarks by Data Reordering. In Proceedings of SC'00, Dallas, TX, November 2000
7. D. S. Nikolopoulos, T. S. Papatheodorou, C. D. Polychronopoulos, J. Labarta, and E. Ayguade: Is Data Distribution Necessary in OpenMP? In Proceedings of SC 2000, 2000
8. R. Ponnusamy, J. Saltz, A. Choudhary, S. Hwang, and G. Fox: Runtime Support and Compilation Methods for User-Specified Data Distributions. IEEE Transactions on Parallel and Distributed Systems, 6(8), pp. 815-831, 1995
A Scheduling Approach with Respect to Overlap of Computing and Data Transferring in Grid Computing
Changqin Huang1,2, Yao Zheng1,2, and Deren Chen1
1 College of Computer Science, Zhejiang University, Hangzhou, 310027, P. R. China
2 Center for Engineering and Scientific Computation, Zhejiang University, Hangzhou, 310027, P. R. China
Abstract. In this paper, we present a two-level distributed scheduling model and propose a scheduling approach with respect to the overlap of computing and data transferring. On the basis of network status, node load, and the relation between task execution and task data access, data transferring and computing can occur concurrently in three cases: a) a task executes on one part of its dataset while another part is being replicated; b) the dataset of a scheduled task is replicated to a node at which another task is running; c) data exchange happens while dependent subtasks are running at different nodes. Theoretical analysis and experimental results demonstrate that the scheduling approach improves execution performance and resource utilization.
1 Introduction
A computational grid is an emerging computing infrastructure that enables effective access to distributed and heterogeneous computing resources in order to serve the needs of a Virtual Organization (VO) [1]. The performance that can be delivered varies dynamically with resource competition, network status, task type, and so on; resource management and scheduling is therefore a key and hard issue. In data management, replication from primary repositories to other locations at an apt moment can be an important optimization step [2,3]. In the present paper, we focus on scheduling approaches suitable for large-scale data-intensive applications, or applications of both data-intensive and computing-intensive nature, which are widespread in engineering and scientific computation. In the present work, we adopt a distributed scheduling model with two levels of schedulers. The scheduler schedules task execution on the basis of a variety of metrics and constraints, and meanwhile tries to reduce task elapsed time and improve performance by overlapping computing with data transferring. This paper is organized as follows: Section 2 reviews related work in the arena of grid scheduling. In Section 3, details of our approach and the proposed scheduling model are described. An algorithm and its analysis are presented in Section 4. Case studies with experimental results are included in Section 5, and conclusions in Section 6.
2 Related Work
For the development and deployment of applications on computational grids, there are a number of approaches to scheduling. Vadhiyar et al. [4] present a metascheduler with a 2D chart and metascheduler types. Berman et al. [5] adopt performance evaluation techniques [6] and utilize the NWS [7] resource monitoring service for application-level scheduling. Abraham et al. [8] use a parametric engine and heuristic algorithms. Zomaya et al. [9] apply a genetic algorithm. Beaumont et al. [10] aim at independent and equal-sized tasks. Dogan et al. [11] consider the problem of scheduling independent tasks with multiple QoS requirements. These schedulers [4,5,8-11] deal with independent tasks or ignore issues of efficient replication. An adaptive scheduling algorithm for parameter sweep applications is used by Casanova et al. [12], who take data storage into account; the essential difference between their work and ours is that our heuristic actively replicates datasets. Thain et al. [13] describe a system that links jobs and data by binding execution and storage sites into I/O communities, but do not address policy issues. Ranganathan et al. [14] focus on data-intensive applications, where data movement is either tightly bound to job scheduling decisions or decoupled from them. None of these consider the case in which task computing and data transferring proceed in parallel on a node.
3 Scheduling Strategy and Scheduling Model
To provide the context for this scheduling strategy and system model, we first describe the scheduling scenario in detail. Each site (LAN) comprises a number of nodes (such as PCs, clusters, and supercomputers), and each node has a limited amount of storage. A set of data is initially put onto the node where the user's task is submitted, or mapped onto the site's nodes according to a certain distribution. The target computational grid consists of heterogeneous nodes connected by LANs and/or WANs, and is hierarchical: node, LAN, and WAN. Scheduling within a single node is ignored here. Scheduling is divided into two levels: the Global Scheduler (GS), corresponding to a WAN, and the Local Scheduler (LS), corresponding to a LAN. Tasks are first submitted to a site/node, where the associated LS is activated to schedule them; when this scheduler fails to produce a schedule, the requests are passed to the associated GS, which determines to which site(s) in its domain the tasks are sent. Finally, the corresponding LS gives a complete schedule by the local scheduling algorithm, kills the requests at the other schedulers, and lets the tasks be executed and the results be returned. As far as algorithms are concerned, the "best" schedule considers information such as CPU speed, network status between hosts, and task properties. This information is retrieved from resource information providers, such as the Network Weather Service (NWS) and the Metacomputing Directory Service (MDS). Our approach requires an application-specific performance model and a scheduler. The scheduler schedules
tasks and makes decisions about transferring data. The goal of the scheduler is to produce a schedule that minimizes the makespan and maximizes the utilization rate. Each scheduler has two components and two queues, as shown in Fig. 1; their functionalities and relations are described as follows.
Fig. 1. Scheduling model and interaction in a scheduler or among schedulers
Task Scheduling Component (TSC): The TSC makes scheduling decisions on the basis of information about resources and tasks, and passes messages about data transferring to the Data Transferring Component (DTC) when overlap of computing and data transferring occurs (details in Section 4). While there are tasks in the Arrived Task Queue (ATQ), the TSC stays activated: it schedules the tasks in the associated ATQ, puts the scheduled tasks into the associated Scheduled Task Queue (STQ), and directs the tasks to be executed on the selected resources. If, within a limited period of time, the TSC is not able to schedule a certain task, it delivers the task request to the associated GS's ATQ so that the GS can schedule the task in a similar way; if that also fails, it returns "failure".
Data Transferring Component (DTC): The DTC keeps track of the popularity of each locally available dataset. It works in two ways: a) only when the DTC receives the relevant messages from the TSC does it decide how to replicate the datasets needed by tasks in the STQ or by tasks being executed, under the condition that the CPU is busy but the connected network is idle; b) it decides how to exchange the data needed by dependent subtasks being executed. Finally, it directs the nodes to transfer the datasets, so that computing and data transferring are performed concurrently.
Arrived Task Queue (ATQ) and Scheduled Task Queue (STQ): An ATQ stores all tasks delivered to its scheduler; a task is put into the ATQ when its request arrives and taken out when it has been scheduled. The STQ stores
those tasks scheduled by the local TSC; a task is taken out when it comes into execution.
4 Scheduling Algorithm and Theoretical Analysis

4.1 Assumptions
Based on the above scheduling model, and to limit network traffic, we confine a task's execution to one LAN. To simplify the scheduling, we make the following assumptions: a) each task/subtask (subtasks exist when a task is divided into parallel subtasks) is assigned to a specific node on which it can meet its deadline; b) the time it spends can be predicted by known techniques (e.g., PACE [15]); c) before execution, each task/subtask can obtain information about the relation between its computation and its dataset (e.g., whether the computation may proceed on a part of the dataset); d) tasks/subtasks can be pre-scheduled on the basis of task status and grid information. The core algorithm by which the scheduler schedules task execution is not fixed and can be selected by users (e.g., the FCFS algorithm or a GA algorithm).
4.2 Scheduling Algorithm
Both the GS and the LS adopt the approach described below (given in the original as a pseudo-code figure), except that their core algorithms may differ. A distributed task starts to run only once it has received its dataset or the required subset of it.
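A minimal C sketch of two of the decisions behind the overlap strategy, under the assumptions above; the types, thresholds, and names are our own illustration, not the paper's pseudo-code.

```c
/* Simplified node and task descriptors. Loads are normalized to [0,1]. */
typedef struct { double cpu_load; double net_load; } NodeStatus;
typedef struct { double data_mb; int divisible; int n_blocks; } Task;

/* The DTC starts replicating data for a scheduled task while another task
 * computes only when the CPU is busy but the node's network link is idle. */
int should_prefetch(const NodeStatus *ns) {
    return ns->cpu_load > 0.8 && ns->net_load < 0.2;
}

/* A divisible task may start as soon as its first block has arrived; an
 * indivisible one must wait for the whole dataset. */
double startup_delay_s(const Task *t, double net_mb_per_s) {
    double mb = t->divisible ? t->data_mb / t->n_blocks : t->data_mb;
    return mb / net_mb_per_s;
}
```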
4.3 Theoretical Analysis
The metrics used in our analysis are the makespan and the average resource utilization rate; we analyze only the efficiency brought by our data transferring strategy. The generic algorithm against which we compare works without our approach: a scheduled task/subtask must receive its whole dataset by replication before it starts execution. To simplify the analysis, concurrent computing and data exchange between dependent subtasks being executed is not considered here.
First consider the case in which the dataset of one task/subtask is divisible. Let p denote this task or subtask and m the size of its data, and let the dataset be divided into n equal blocks. Let x denote the fractional decrease in CPU performance while data is transferred on the network concurrently, v_n the speed of transferring data on the network, v_c the speed at which the CPU processes data, T_1 the makespan under the generic scheduling algorithm, and T_2 the makespan under our algorithm. The generic algorithm transfers the whole dataset and then computes, so
  T_1 = m/v_n + m/v_c.
Under our algorithm, computing and data transferring are performed concurrently for every block except the first. If v_n <= (1-x)·v_c, the transfer is the bottleneck, the computation of all but the last block is hidden behind it, and
  T_2 = m/v_n + m/(n·v_c);
reversely, the computation is the bottleneck and
  T_2 = m/(n·v_n) + m/((1-x)·v_c).
Obviously, under the first condition T_2 < T_1. Under the second condition, when a multi-level storage system exists (keeping x small) and m is very large, T_2 < T_1 holds as well. In total, the makespan under our algorithm decreases considerably in general.
Let U_1 denote the average resource (CPU) utilization rate under the generic scheduling algorithm and U_2 that under our algorithm. Over a given period of time,
the CPU performs essentially the same amount of useful work in both cases, so U_2/U_1 is approximately T_1/T_2 > 1; because the concurrency penalty x inflates the CPU busy time only slightly, if x is small and m is large, U_2 increases considerably. If the dataset of one task/subtask cannot be divided, the overlap of computing and data transferring can still take place between the task/subtask being executed and a scheduled task/subtask in the STQ. A similar analysis yields a similar conclusion: provided x is small and m is large, the makespan is reduced and the average resource utilization rate is increased considerably.
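As a worked illustration of the first case, with invented numbers that are not taken from the paper: take m = 1000 MB, v_n = 10 MB/s, v_c = 20 MB/s, n = 10 blocks, and x = 0.1.

```latex
% Generic algorithm: transfer everything, then compute.
\[ T_1 = \frac{m}{v_n} + \frac{m}{v_c} = 100 + 50 = 150\ \mathrm{s}. \]
% Here v_n = 10 <= (1 - x) v_c = 18, so the transfer is the bottleneck:
\[ T_2 = \frac{m}{v_n} + \frac{m}{n \, v_c} = 100 + 5 = 105\ \mathrm{s}, \]
% a makespan reduction of 30%, with U_2 / U_1 \approx T_1 / T_2 \approx 1.43.
```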
5 Experiments
We have developed an engineering-computation-oriented visual grid prototype system, named VGRID, in which tasks are auto-scheduled in a visual fashion and the core task-scheduling algorithm can be selected. In this environment, three pairs of experiments were designed using the above scheduling approach. The tasks consist of iterations of two application examples: Monte Carlo integration in high dimensions, involving a small dataset transfer, and video conversion, involving the replication and compression of a large dataset. All nodes are PCs with Intel Pentium 4 processors at 2.0 GHz, 512 MB of memory, 100 Mbps Ethernet, and 80 GB/7200 rpm hard disks. The experiments, in which two approaches are used, are as follows: Approach A adopts the FCFS algorithm, whereas Approach B adopts the FCFS algorithm combined with our scheduling approach.
Case 1: One task (video conversion) on one node.
Case 2: Four tasks (Monte Carlo simulation, video conversion, Monte Carlo simulation, and video conversion, in sequence) on one node.
Case 3: The same four tasks in sequence on three nodes.
Experimental results are illustrated in Figs. 2 and 3. As these figures show, different task types, scheduled task sequences, and grid resources produce different performance scenarios. In all experiments, the average resource utilization rate increases by over 15% with our algorithm. In Case 1, however, the new makespan decreases very little while the associated average resource utilization rate increases by 18%. This means that overlapping computing with data transferring
A Scheduling Approach with Respect to Overlap of Computing
111
not bring benefits, but adds a little workload. Therefore our algorithm isn’t very fit for the tasks of this type as performance decrease percentage x is large to these tasks, it occurs under the conditions of competing grid resources.
Fig. 2. Variation of the makespan
Fig. 3. Variation of the average resource utilization rate
6 Conclusions

A scheduling model and an associated algorithm were proposed in the present work. This approach strives to reduce task execution time and improve performance by overlapping computing with data transfer. We have theoretically analyzed this algorithm and instantiated it with three tests based on an FCFS core algorithm in VGRID under different conditions. Our results show, firstly, that the improvement in system performance is evident; secondly, that the relation between task execution and its dataset, as well as the size of the data, has a significant impact on system performance. Though these results are promising, in interpreting their significance we must bear in mind that they are based on simplified grid scenarios. The case in which dependent subtasks move data for exchange has not yet been studied in detail.

Acknowledgements. The authors wish to thank the National Natural Science Foundation of China for the National Science Fund for Distinguished Young Scholars under grant number 60225009. We would like to thank the Center for Engineering and Scientific Computation, Zhejiang University, for the computational resources with which the research project has been carried out.
A Deadline and Budget Constrained Cost-Time Optimization Algorithm for Scheduling Dependent Tasks in Grid Computing

Haolin Feng 1,3, Guanghua Song 2,3, Yao Zheng 2,3, and Jun Xia 2,3

1 Chu Kechen Honors College, Zhejiang University, Hangzhou, 310027, P.R. China
2 College of Computer Science, Zhejiang University, Hangzhou, 310027, P.R. China
3 Center for Engineering and Scientific Computation, Zhejiang University, Hangzhou, 310027, P.R. China
Abstract. The computational grid has a promising future in large-scale computing, because it enables the sharing of widely distributed computing resources. Good management systems with excellent scheduling algorithms are in great demand to take full advantage of it. Many scheduling algorithms in grid computing are for independent tasks. However, communications are very common in scientific computing programs. In this paper, we propose an easily implemented algorithm to schedule tasks with certain communications. Our algorithm is suitable for a large proportion of scientific computing programs and is based on Binary Integer Programming. It is able to meet the users' quality of service (QoS) requirements and to minimize a combination of the cost and the time consumed by the users' programs. We give an example of scheduling a typical scientific computing task to show the power of our algorithm. In our experiment, the grid resource consists of an SGI Onyx 3900 supercomputer, four SGI Octane workstations, four Intel P4-2.0GHz PCs, and four Intel P4-1.8GHz PCs.
1 Introduction

Computational grids [1] are becoming more and more popular in large-scale computing, because they enable the sharing of computing resources that are distributed all over the world. Those computing resources are widely distributed and owned by many different organizations; thus, good systems for resource management are essential to take full advantage of grids. The published literature provides various management systems with different policies and principles [2, 3, 4]. The most important part of a good management system is an excellent algorithm for scheduling tasks. It is the management system that does the jobs of resource discovery [8], selecting machines, and scheduling the tasks for the users. Nowadays, numerous scheduling algorithms are available [5, 6, 7]. Most of them assume the tasks to be independent. Under this assumption, the existing algorithms can still handle many scientific and engineering computing problems. However, the majority of problems in scientific and engineering computing require communications among tasks, e.g., computation in the areas of solid mechanics and fluid dynamics. Without considering the communications among tasks, algorithms have obvious limitations and cannot take full advantage of powerful computational grids. However, it is very challenging to schedule general dependent tasks, and so far there is no satisfying solution, due to causes such as the heterogeneous architectures of different machines in the same grid and the limited bandwidth of network transmission. Thus, as a first step, we add some constraints on communications in order to achieve an improved scheduling algorithm. In this paper, we present a model that can schedule both independent tasks and dependent tasks with special communication patterns. In this model, we can reach a balance between the cost and the job completion time for different clients. That is, we provide an optimal solution for an objective function that consists of the cost and the job completion time. Moreover, both the deadline and the budget set by the clients will be met by the algorithm. We select an array of machines for a batch of tasks, either independent or dependent under a constraint condition, and we show that our solution can be achieved by solving a classical problem -- Binary Integer Programming [11].
2 Problem Modeling

Suppose a user has submitted a program consisting of N dependent tasks (denoted as $J_1, J_2, \dots, J_N$), and in the grid there are M machines (denoted as $P_1, P_2, \dots, P_M$) available at the moment (M >= N). The machines differ in architecture, computing power, price of CPU time, and distance from the management system, which results in different speeds of data transfer. Our goal is to assign each task to a specific machine (processor) in order to minimize the total "cost" -- the money and the time for completing all these N tasks. Note that the tasks sometimes communicate with each other, which makes our problem even more complicated.
2.1 Assumptions

1) For each single task, once it is assigned to a specific processor in the grid, the time to be spent on it can be known. Many techniques are available to achieve this [9, 10].
2) Tasks are dependent and communicate with each other. A communication happens whenever a certain percentage of the work of every task has been completed, for all the tasks. For example, a communication happens when 10% of the work of every task has been done; then communication happens again when another 6% of the work of every task has been done, and so on. We need not know the percentage values before execution, but we should know how many communications will happen, as well as their scale.
Remark. Although this is a constraint, most problems in scientific computing satisfy the assumption. For example, computations in computational structural analysis, in computational fluid dynamics, and in DNA sequencing all satisfy it. Also, a program with independent tasks can be considered as a program whose tasks communicate after 100% of the work has been completed. Therefore, our algorithm works with both independent and dependent tasks.
3) According to most accounting systems, the charge for a unit of time is proportional to the resources used by the user at that moment. Consequently, for a specific processor, if its resource usage is very close to zero over a period of time, the cost it charges is very close to zero. Thus, it is reasonable to assume that while waiting for communication we incur no monetary consumption.
2.2 Algorithm Description

2.2.1 Definition of the Objective Function
Now we have to define a proper function to measure the "cost". Defining the function as a weighted sum of the money and the job completion time is a good idea [5]. People value money and time differently; thus, it is necessary to give our clients the right to specify the weights for the money and the time, respectively. For example, if a certain user considers one unit of time as valuable as one hundred units of money, then we set $w_m = 1$ and $w_t = 100$. Here $w_m$ and $w_t$ stand for the weights of money and time, respectively. The users can also set the deadline (denoted as $D$) and the maximal cost that can be afforded (denoted as $B$). That is why we call it "a deadline and budget constrained cost-time optimization algorithm". Based on this logic, we define the objective function in this way:

$$Z = w_m \sum_{i=1}^{N} \sum_{j=1}^{M} c_j\, t_{ij}\, x_{ij} + w_t\, T \qquad (1)$$

Here, $t_{ij}$ is the time spent on task $J_i$ given that $J_i$ is assigned to processor $P_j$; for each pair of i and j, as we have assumed, $t_{ij}$ is known. $c_j$ is the cost of processor $P_j$ per unit of CPU time. If we really assign task $J_i$ to $P_j$, then the value of $x_{ij}$ will be 1, otherwise 0. $T$ stands for the duration from the beginning of the first task to the end of the last task.

2.2.2 Constraint Conditions of the Problem
Here q stands for the number of communications that happen during the process.
The meaning of $x_{ij}$ is the same as in Equation (1). Inequalities (3) and (4) express "deadline and budget constrained". Equation (5) means that each single task must be assigned to one and only one processor. Inequality (6) means that each single processor may process at most one of the tasks; this is because communications cannot happen until every task has finished the same percentage of its work, as assumed in Subsection 2.1. Equation (7) gives the duration from the end of the (k-1)th communication to the end of the kth communication, where $p_k$ is the percentage of work completed between these two communications. For each k, the weighted time $t_{ij} x_{ij}$ stands for the time of the kth phase for task $J_i$ assigned to machine $P_j$ (the value of $x_{ij}$ determines whether or not $J_i$ is assigned to $P_j$). Equation (7) means that, once another $p_k$ (percentage of work) of every task has been finished, the kth communication will happen. The reason we use the first "max" is that the communication will not happen until all the tasks are ready for it. $ct_k$ stands for the time spent on the kth communication. Here we use "max" for the following reason: the quality of the network varies from place to place, and it is the slowest link that determines the time for communication. Equation (8) gives the job completion time of the N tasks. We use the term $(1 - \sum_{k=1}^{q} p_k)$ because, after the last communication, that fraction of each task remains.
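Collecting the constraints described above into one place, a plausible reconstruction in the notation of Section 2.2.1 (the exact original typesetting is assumed, not preserved):

$$T \le D \qquad (3)$$
$$\sum_{i=1}^{N}\sum_{j=1}^{M} c_j\, t_{ij}\, x_{ij} \le B \qquad (4)$$
$$\sum_{j=1}^{M} x_{ij} = 1, \quad i = 1,\dots,N \qquad (5)$$
$$\sum_{i=1}^{N} x_{ij} \le 1, \quad j = 1,\dots,M \qquad (6)$$
$$T_k = \max_{1\le i\le N}\Big\{ p_k \sum_{j=1}^{M} t_{ij}\, x_{ij} \Big\} + ct_k \qquad (7)$$
$$T = \sum_{k=1}^{q} T_k + \Big(1-\sum_{k=1}^{q} p_k\Big) \max_{1\le i\le N}\Big\{ \sum_{j=1}^{M} t_{ij}\, x_{ij} \Big\} \qquad (8)$$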
3 Model Modifications and Simplification

Combining Equations (7) and (8), we obtain a single expression for T, Equation (9), which we use from now on instead of Equations (7) and (8). A large proportion of scientific computing problems spend most of their time on computing, and the time for communications is relatively negligible. (That does not mean we can ignore communications, because usually much time is spent waiting before communications.) When the time for transferring data is very small, we can simply ignore the differences among the networks and replace the network-dependent communication term by a negligible constant. Then Equation (9) is replaced by Equation (10). Now, we claim that we can replace Equation (10) with an inequality, (11). We state that the replacement does not change the solution; the proof is omitted here due to space limitations. In this special case our model can be presented in the following way:
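Under the notation above, a hedged sketch of this simplification chain (the exact constants of the original are assumed):

$$T = \sum_{k=1}^{q}\Big[\max_{i}\big\{p_k \textstyle\sum_j t_{ij} x_{ij}\big\} + ct_k\Big] + \big(1-\textstyle\sum_k p_k\big)\max_{i}\big\{\textstyle\sum_j t_{ij}x_{ij}\big\} \qquad (9)$$

Dropping the communication times $ct_k$ and using $\sum_k p_k + (1-\sum_k p_k) = 1$:

$$T = \max_{1\le i\le N}\Big\{\sum_{j=1}^{M} t_{ij}\, x_{ij}\Big\} \quad (10) \qquad\Longrightarrow\qquad T \ge \sum_{j=1}^{M} t_{ij}\, x_{ij}, \quad i=1,\dots,N \quad (11)$$

Since T appears in the minimized objective Z, the linear inequalities (11) force T to equal the maximum at the optimum, which is why the replacement does not change the solution.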
Except that the $x_{ij}$ must be binary integers, all the constraint conditions in our model are linear. Thus the model is straightforward and reduces to a classical Binary Integer Programming problem, for which many methods are available. In reality, the user may have other requirements for some tasks, for example reliability. In such cases, not all the processors in a grid are suitable for the tasks. If a certain machine is not suitable for a certain task, we set the corresponding $t_{ij}$ to be greater than the deadline $D$; in this way, our algorithm avoids assigning that task to that machine. Therefore, our algorithm can schedule tasks with QoS requirements.
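As an illustration of the reduced model, a minimal sketch in Python follows. It enumerates assignments directly rather than calling a real BIP solver, which is feasible because N is small here; all names and the toy numbers are illustrative stand-ins for the symbols above, not code from the paper.

```python
from itertools import permutations

def schedule(times, costs, w_m, w_t, deadline, budget):
    """Brute-force the simplified model: times[i][j] = t_ij, costs[j] = c_j.

    Each task goes to exactly one processor and each processor takes at
    most one task (constraints (5) and (6)); with communication times
    ignored, the completion time T is the largest per-task time, which is
    what inequality (11) enforces at the optimum.
    """
    n_tasks, n_procs = len(times), len(costs)
    best = None
    # Every injective assignment of tasks to processors.
    for procs in permutations(range(n_procs), n_tasks):
        t = [times[i][procs[i]] for i in range(n_tasks)]
        T = max(t)                                    # completion time
        money = sum(costs[procs[i]] * t[i] for i in range(n_tasks))
        if T > deadline or money > budget:            # constraints (3), (4)
            continue
        Z = w_m * money + w_t * T                     # objective (1)
        if best is None or Z < best[0]:
            best = (Z, procs, money, T)
    return best

# Toy data: 2 tasks, 3 processors (hypothetical numbers).
times = [[10, 25, 30], [12, 28, 33]]
costs = [8, 2, 1]
print(schedule(times, costs, w_m=1, w_t=100, deadline=40, budget=400))
```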
4 Experiments and Evaluation

In this section, we use a numerical program of Computational Fluid Dynamics (CFD) as an example to test our scheduling algorithm. This CFD program simulates the vortex streets downstream of the nozzle in a plane jet. The whole computational domain is divided into two types of sections: the physical domain, and one PML buffer zone at each end of the physical domain. Four processors have been used to get the final results in their respective subsections of the same height (Fig. 1).
Fig. 1. The steady vortex streets downstream near the jet nozzle
Because it is typical of scientific computation, it is a suitable example for showing the value of our scheduling algorithm. The program consists of four tasks, which are to be scheduled by our algorithm. Several different types of machines are available for these tasks: an SGI Onyx 3900 supercomputer with 64 processors, four SGI Octane workstations, and eight personal computers (four of which are better than the rest). According to the costs of these machines, we have assumed the price for each processor (Table 1). We will discuss two different cases. In the first case the workload of each task is nearly the same, while in the second the workloads are different. We will see how this difference affects the outcome of the schedule. As we have four tasks, we need no more than four processors of the supercomputer. We denote these processors as SC1, SC2, SC3, and SC4. Similarly, WS1, WS2, WS3, and WS4 stand for the processors of the four SGI workstations; P4-2-1, P4-2-2, P4-2-3, and P4-2-4 stand for the four P4 2.0 GHz processors; and P4-1.8-1, P4-1.8-2, P4-1.8-3, and P4-1.8-4 stand for the four P4 1.8 GHz processors.
To estimate the time for computing and communications, we can run the program and record the real time. In this experiment, the recorded data show that although the communications are frequent, the data transferred during them are of very small size. Thus, compared with the time for computing and waiting, the total time for transferring data is very small. As claimed in Section 3, we can simplify our model in this case.
Case 1: The workload of each task is nearly the same, so for each type of processor we only have to list the estimated CPU time needed to complete one task (Table 2). Let the deadline $D$ be 10,000 (units of time) and the budget $B$ be 400,000 (units of money). We set different values of the ratio $w_m : w_t$ so that we can see how the weights of time and money affect the scheduling (Table 3).
Remark: The data in Table 3 are given by our algorithm. Because the situation is the same for each of the four tasks, for each scheme we list the CPU time for only one task; the monetary consumption and the value of Z, however, are for all four tasks in each scheme. When actually running the program on the supercomputer, the time for waiting and communications is too short to be measured accurately, so the walltime above should be 1984 minutes. However, the difference is small enough to be ignored.
Analysis: When the ratio $w_m/w_t$ is less than 0.0655, all the tasks will be processed by the supercomputer, because it saves a lot of time, which the user values highly. If the ratio is between 0.0655 and 0.3122, all the tasks will be processed by the four P4 2.0 GHz processors. When the ratio is larger than 0.3122, the four P4 1.8 GHz processors are preferable, because of their low charge. The result shows that the workstations are not used in this case. However, if the user has other requirements, for example a requirement for reliability, then both the workstations and the supercomputer may become preferable. The details of adding quality-of-service requirements have been discussed in Section 3. (Fig. 2).
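One way to see where such thresholds come from (a sketch, not taken from the paper): with all four tasks going to a single machine type, two candidate types with per-task times $T_A < T_B$ and per-task monetary costs $C_A > C_B$ yield equal objective values exactly when

$$w_m\, C_A + w_t\, T_A = w_m\, C_B + w_t\, T_B \quad\Longleftrightarrow\quad \frac{w_m}{w_t} = \frac{T_B - T_A}{C_A - C_B},$$

so each threshold in Table 3 is the ratio at which the faster, more expensive machines stop being worth their price.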
Fig. 2. Changes of the ratio $w_m : w_t$ result in changes of the scheme
Case 2: The workload of each task is different. We give a brief discussion of this case. Suppose there are two tasks with larger workloads, each of which is twice as large as each of the smaller ones. Let $w_m : w_t$ be 0.01:0.99; then, according to our algorithm, we should assign the larger tasks to the supercomputer and leave the smaller tasks to the P4 2.0 GHz processors. This helps to minimize the waiting time before communications. But when the value of $w_m : w_t$ changes, the scheme will change greatly in order to meet the requirements of different clients and to minimize the "cost" they defined. Of course, the budget of the program should also be taken into consideration, so as to meet our goal: a "budget and time" constrained algorithm.
5 Conclusions

We have presented an algorithm to schedule programs with dependent tasks. Unlike other algorithms, this one takes the communications among tasks into consideration. Although we impose a constraint on the communications, the algorithm is suitable for scheduling a large proportion of scientific computing programs. Moreover, we reduce the problem to a classical programming problem -- Binary Integer Programming, which can be solved by existing methods. Our algorithm can meet the users' quality-of-service requirements such as deadline, budget, security, and reliability. By scheduling a typical scientific computing application, our experiment shows how the algorithm meets the requirements of different users and how the communications affect the scheme, and thus demonstrates the validity of the algorithm.

Acknowledgements. The authors wish to thank the National Natural Science Foundation of China for the National Science Fund for Distinguished Young Scholars under grant number 60225009. We would like to thank the Center for Engineering and Scientific Computation, Zhejiang University, for its computational resources, with which the numerical experiments have been carried out.
References

1. I. Foster and C. Kesselman (eds.): The Grid: Blueprint for a Future Computing Infrastructure. Morgan Kaufmann Publishers, USA, 1999
2. GRAM: Grid Resource Allocation & Management. Argonne National Laboratory and USC Information Sciences Institute
3. C. Youn: Resource Management and Scheduling in Grid (Concepts and Trends), 2002
4. R. Buyya, D. Abramson, J. Giddy, and H. Stockinger: Economic Models for Resource Management and Scheduling in Grid Computing. Special Issue on Grid Computing Environments, Journal of Concurrency and Computation: Practice and Experience (CCPE), 14(13-15), 2002
5. A. Dogan and F. Özgüner: Scheduling Independent Tasks with QoS Requirements in Grid Computing with Time-Varying Resource Prices. Proc. of Grid Computing - GRID 2002, 58-69, 2002
6. A. K. Amoura, E. Bampis, C. Kenyon, and Y. Manoussakis: Scheduling Independent Multiprocessor Tasks. Algorithmica, 32: 247-261, 2002
7. R. Buyya, M. Murshed, and D. Abramson: A Deadline and Budget Constrained Cost-Time Optimization Algorithm for Scheduling Task Farming Applications on Global Grids (www.cs.mu.oz.au/~raj/, current September 14, 2003)
8. J. Yu, S. Venugopal, and R. Buyya: A Market-Oriented Grid Directory Service for Publication and Discovery of Grid Service Providers and their Services (www.cs.mu.oz.au/~raj/, current September 14, 2003)
9. M. A. Iverson, F. Özgüner, and L. C. Potter: Statistical Prediction of Task Execution Times through Analytic Benchmarking for Scheduling in a Heterogeneous Environment. IEEE Trans. Computers, 48(12): 1374-1379, 1999
10. B. Reistad and D. K. Gifford: Static Dependent Costs for Estimating Execution Time. Proc. of the 1994 ACM Conference on LISP and Functional Programming, 65-78, 1994
11. F. S. Hillier and G. J. Lieberman: Introduction to Operations Research, 7th ed. McGraw-Hill Higher Education, 2001
A Load Balancing Algorithm for Web Based Server Grids

Shui Yu, John Casey, and Wanlei Zhou

School of Information Technology, Deakin University, 221 Burwood HWY, Burwood, VIC 3125, Australia
{syu, jacasey, wanlei}@deakin.edu.au
Abstract. Load balance is a critical issue in distributed systems such as server grids. In this paper, we propose a Balanced Load Queue (BLQ) model, which combines queuing theory and hydro-dynamic theory, to model load balance in server grids. Based on the BLQ model, we claim that if the system is in the state of global fairness, then the performance of the whole system is the best. We propose a load balancing algorithm based on the model: the algorithm tries its best to keep the system in the global fairness state using job deviation. We present three strategies for job deviation: best node, best neighbour, and random selection. A number of experiments are conducted to compare the three strategies, and the results show that the best neighbour strategy is the best among the proposed strategies. Furthermore, the proposed algorithm with the best neighbour strategy is better than the traditional round robin algorithm in terms of processing delay, and the proposed algorithm needs very limited system information and is robust.
1 Introduction

Server grids are an important and efficient architecture for Internet-based applications. Server grids based on a distributed architecture can improve performance, which is a critical issue for Internet-based applications. Nowadays, server grids in the Internet environment are very popular, for example distributed web-based databases, clustered web servers, mirrored servers, anycast servers, peer-to-peer computers, and so on. One issue for Internet-based server grids is load balance among the distributed servers. Most existing load balance algorithms [2], [3], [10] assume a static environment, but in the situation of Internet-based server grids the environment is no longer static, because of unstable Internet traffic, congestion, user requests, and so on. Graph theory is one method of analyzing the load balance issue [3], and statistics is a useful tool for the load balance problem as well [1], [2], [10]. A hydro-dynamic approach was applied in [5], [6] to model network traffic. The main advantage of this method is its power in describing dynamic load balancing activities. However, the hydro system describes a continuous world, while computer network systems belong to a discrete environment; therefore certain transformation methods have to be employed. On the other hand, because of its discrete nature, queuing theory has been used to model computer networks for decades. However, modeling dynamic load balancing activities using queuing theory is difficult. In this paper, we try to combine the discrete nature of queuing theory and the power of the hydro-dynamic approach in describing dynamic activities to model the load balance activities of Internet-based server grids. The rest of this paper is organized as follows. Section 2 refers to related work. In Section 3 we propose the Balanced Load Queue model. A novel algorithm based on the model is proposed in Section 4. The performance evaluation is discussed in Section 5. Finally, Section 6 summarizes the paper and presents future work.
2 Related Work

The supermarket model was presented in [9] to describe load balancing for a group of servers: customers arrive as a Poisson stream of rate $\lambda n$ ($\lambda < 1$) at a collection of n servers. Each customer chooses some constant d servers independently and uniformly at random from the n servers, and waits for service at the one with the fewest customers. The service time for a customer is exponentially distributed with mean 1, and the service protocol is first-in first-out. Furthermore, the paper pointed out that the supermarket model is difficult to analyze because of dependencies: knowing the length of one queue affects the distribution of the other queues. The author therefore first developed a limiting, deterministic model representing the behavior as $n \to \infty$, and then translated the results from that model to results for large but finite values of n. The balls-and-bins model is used for load balancing research in [2], [10]. The problem is described as follows: suppose that n balls are thrown into n bins, with each ball choosing a bin independently and uniformly at random; then the largest number of balls in any bin is approximately log n / log log n with high probability. An approach to online load balancing based on the balls-and-bins model was proposed in [1], which showed that if each user samples the load of two resources and sends his request to the least loaded one, the total overhead is small, and the load on the n resources varies by only an O(log log n) factor. A hydro-dynamic approach to the dynamic load balancing issue on a network of heterogeneous computers was introduced in [5], [6]. The authors modeled a computer as a cylinder, whose diameter represents the computing capability of the computer, while the liquid in the cylinder denotes the work load on the computer. Their conclusion is that when the system achieves global fairness, namely when the heights of all the cylinders are the same, the system is load balanced, and at the same time the potential energy of the system is minimized. Anycast is a new type of network service which always tries to find the "best" server in an anycast group [7], [11], [13]. The anycast mechanism provides an automatic load balance capability within an anycast group. Our previous research [14] provides a practical and efficient method for anycast routing, which integrates network delay and server performance as the criterion for finding the "best" server.
3 Balanced Load Queue (BLQ) Model

The hydro-dynamic approach is very effective for analyzing the dynamic load balancing activities of distributed systems. However, the liquid system used in the hydro-dynamic approach is continuous, while the situation in our computer systems is discrete. On the other hand, queuing theory is a powerful tool for modeling computer systems, but it is less effective in modeling dynamic systems. In this section we combine the two distinct theories to model Internet-based server grids.
Fig. 1. Server Grid using the Queuing Model
Fig. 2. A queue and the related concepts
Here we model each server in a server grid as a queue, and the queues are connected by networks. Figure 1 shows an example of a server grid with four servers and five connections. In Figure 1 the queues $Q_1$, $Q_2$, $Q_3$, and $Q_4$ represent servers, and a network connects them together to form a server grid. For each queue, the width denotes its computing capability: the wider, the more powerful. In order to simplify the explanation, we introduce some concepts here, which will be used in the rest of this paper. In Figure 2, the parameter $\mu_m$ indicates the moving speed of the requests in queue m; in fact, $\mu_m$ is the service rate of the related computer. $r_i$ is a request in the queue, and $s_i$ is the service time for $r_i$ in the queue. Based on the definitions in Figure 2, we find that during the processing of request i, at any time point t, the service received so far is less than $s_i$; when the processing finishes, it equals $s_i$.
n is the number of the servis the number denoting a
server, and is the sequence number for requests in a queue) in each queue are equivalent, then we call the system is in a state of global fairness. The definition can be expressed in the follow equation.
If a server grid is in the state of global fairness, then the current requests in all the queues will be finished at the same time, and further, that each server is equal for a new incoming request. Assertion 1. If the work load of a server grid with n servers is balanced, then in a given period [0, T] (T is sufficiently big), the system must be in the state of global fairness, namely, the equation (1) is correct.
Proof: There are three cases, as listed below; any other situation is a combination of them.
Case 1. There are no requests in the queues and no new requests arrive at any queue i (where $\lambda_i$ denotes the arrival rate of requests for queue i) during the period [0, T]. It is obvious that the equations hold.
Case 2. $\lambda_i \ge \mu_i$, i = 1, 2, ..., n, during the period [0, T]. This means all the arrival rates are at least as large as the respective service rates; that is, all the servers are busy for the whole period. The assertion holds.
Case 3. Without loss of generality, suppose there is no request in one queue and there are one or more requests in the other queue(s) at a given time point. For the sake of load balance, if a new request arrives, it will be dispatched to the empty queue by the overloaded queue(s); this situation may happen from time to time. Therefore, if T is sufficiently large, the assertion holds.
The assertion is correct in all three cases; therefore it is correct for any combination of them, and as a result it is correct in any situation.
Assertion 2. When a server grid is in the state of global fairness, the performance of the whole system is the best.
Proof: Assume there are n servers with service rates $\mu_1, \mu_2, \dots, \mu_n$. If the system is not idle, then in the state of global fairness the total service rate is $\mu = \sum_{i=1}^{n} \mu_i$. If the system is not in the state of global fairness, then after a period of time T there will be at least one computer with no jobs to do, and the total service rate becomes $\mu' = \mu - \sum_{i \in I} \mu_i$, where I is the set of servers that have no jobs to do. Obviously $\mu' < \mu$; therefore Assertion 2 is correct.
Assertion 3. In a server grid with n servers, if the work load of the n servers is balanced, then during a given period [0, T] (T sufficiently large) the ratios of arrival rate to service rate are the same for all servers.
Proof: If the system is in the state of balance, then Equation (1) holds. Over a long period, the total service time accumulated in queue i is approximately $\lambda_i T / \mu_i$. Ignoring the switching time of processes, we obtain, from a long-term view, the following result:

$$\frac{\lambda_1}{\mu_1} = \frac{\lambda_2}{\mu_2} = \cdots = \frac{\lambda_n}{\mu_n} = k \qquad (2)$$

where k is a constant representing the common ratio, introduced for convenience. This assertion implies that the relationship between the arrival rate and the service rate is fixed when the load of the system is balanced. Furthermore, the parameter k reflects the average waiting time of the users when the whole system is fully loaded: the bigger k is, the longer the average waiting time in that scenario.
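A small worked example (the numbers are illustrative, not from the paper): if a server grid runs with k = 0.8, then by Assertion 4 below each request spends on average

$$W_i = \frac{k}{(1-k)\,\lambda_i} = \frac{4}{\lambda_i}$$

units of time in the system, whereas k = 0.5 would give only $1/\lambda_i$; a larger k therefore translates directly into longer average waits.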
Assertion 4. If the work load of the n servers is balanced, then during a given period [0, T] (T sufficiently large) the relationship between the mean time a request spends in the system and the arrival rate is reciprocal.
Proof: Assume that n = 2. Based on the equations of queuing theory, the mean time $W_i$ that a request spends in system i can be written in terms of $\lambda_i$ and $\mu_i$ as

$$W_i = \frac{1}{\mu_i - \lambda_i}.$$

From Equations (1) and (2) we have $\mu_i = \lambda_i / k$, and therefore

$$W_i = \frac{k}{(1-k)\,\lambda_i}, \qquad \text{so} \qquad W_1 \lambda_1 = W_2 \lambda_2.$$

When $n > 2$ the proof is the same, so in general

$$W_1 \lambda_1 = W_2 \lambda_2 = \cdots = W_n \lambda_n = \frac{k}{1-k}.$$

This assertion characterizes the relationship between the arrival rate and the mean time a request spends in the system when the load of the system is balanced.
4 A Load Balancing Algorithm Based on the BLQ Model

The balanced load queue model is good for describing load balance issues in server grids, but using it directly is expensive, because we would need to know the states of all the queues. Based on Assertion 3, we found that if the system is balanced, then the ratio of the arrival rate to the service rate for a given server is fixed. As we know, the service rate of a server is a constant value, so for a given value k, if $\lambda_i/\mu_i \le k$, the server is approaching the state of balance; on the other hand, if $\lambda_i/\mu_i > k$, the server is overloaded, and the incoming requests should be dispatched to other servers in order to bring the system back to the balanced state. The main advantage of this idea is that we just need to set a reasonably small k when the system is initiated, and then each server can judge whether it is necessary to deviate an incoming request without any information about the network or the other servers. We assume that the whole performance of the system satisfies the users, which means the k in Equation (2) is fixed; we then get a boundary $k\mu_i$ on the arrival rate for each server, respectively. When a new request arrives at server i, the server calculates its own ratio $\lambda_i/\mu_i$; if the ratio does not exceed k, it does nothing; otherwise, it deviates the incoming request to one of the other peer servers. How to decide the destination that processes the deviated requests is an interesting issue; we design three strategies for deviation:
1) Random Selection Strategy. Choose one server randomly from the other servers of the server grid;
2) Best Node Strategy. Choose the best one from all the servers of the grid using a global probing.
3) Best Neighbour Strategy. Choose the better one of the current server's two nearest neighbours (neighbouring servers).
The details of the algorithm are sketched below.
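A minimal sketch of the deviation logic in Python, under the assumptions stated in the text: each server knows only its own arrival rate $\lambda_i$, its service rate $\mu_i$, and the preset threshold k. All names and structure here are illustrative, not the paper's original listing.

```python
import random

class Server:
    """One queue in the server grid; it knows only its own state."""

    def __init__(self, name, mu, k):
        self.name = name
        self.mu = mu            # service rate of this server (constant)
        self.k = k              # preset threshold k from Equation (2)
        self.peers = []         # the other servers in the grid
        self.arrivals = 0       # requests seen in the current window
        self.window = 1.0       # window length used to estimate lambda_i

    def ratio(self):
        lam = self.arrivals / self.window   # estimated arrival rate
        return lam / self.mu

    def on_request(self, job, strategy):
        self.arrivals += 1
        if self.ratio() <= self.k:          # approaching balance: serve it
            return self.name, job
        return self.deviate(job, strategy)  # overloaded: push to a peer

    def deviate(self, job, strategy):
        if strategy == "random":            # 1) random selection
            target = random.choice(self.peers)
        elif strategy == "best_node":       # 2) global probing of all peers
            target = min(self.peers, key=lambda s: s.ratio())
        else:                               # 3) better of the two servers
            a, b = self.peers[0], self.peers[1]   # assumed to be neighbours
            target = min((a, b), key=lambda s: s.ratio())
        # The target may deviate again: this is the "deviation loop" the
        # paper warns about for the random and best-neighbour strategies.
        return target.on_request(job, strategy)

grid = [Server(f"s{i}", mu=10.0, k=0.8) for i in range(4)]
for s in grid:
    s.peers = [p for p in grid if p is not s]
print(grid[0].on_request("job-1", "best_neighbour"))
```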
We must point out that for the best neighbour strategy and the random selection strategy there is a potential danger of a deviation loop. The probability of a deviation loop is high when the number of servers is small.
5 Performance Analysis

We have conducted experiments on the Internet in order to demonstrate our proposed algorithm and compare the performance of the three strategies for job deviation. Moreover, we use a centrally controlled algorithm with a round-robin strategy [4], [8], [12] as a benchmark to evaluate our algorithm. The scenario for our algorithm is that requests are generated everywhere in the Internet and target one of the servers of the server grid at random. We know an estimated processing time for each job on a given server. Because of the delay caused by deviation, the actual processing time exceeds the estimate; we name the difference the Processing Delay. We use more than ten servers, distributed across two campuses, to act as the server grid. In the rest of this section, we present and compare several factors that have an impact on the performance of the whole system. Figure 3 shows that when the number of nodes (servers) in a server grid increases, the processing delay of the best neighbour strategy remains almost constant and less than that of the other two proposed strategies. In general, only the best neighbour strategy of the proposed algorithm is better than the central controller algorithm. The reason is that the best node strategy is expensive, while the random selection strategy has no quality control. If the arrival rates are stable, then the number of requests can reflect the general performance in terms of time. From Figure 4, we observe that the average processing delays of the three strategies and of the central controller algorithm each stay close to a constant value. In terms of general performance, best neighbour is better than best node, and much better than random selection. Both of the strategies with quality control are better than the central controller algorithm in terms of processing delay.
Fig. 3. No. of Nodes vs Processing Delay
Fig. 4. No. of Requests vs Processing Delay

Fig. 5. Network Delay vs Processing Delay
Fig. 6. Arrival Rate vs Processing Delay
Figure 5 compares the impact of network delay on the processing delay. It shows that the best neighbour strategy is the best among the three proposed strategies and the benchmark algorithm. The arrival rate is a parameter that reflects the concentration of Internet traffic. The relationship between processing delay and arrival rate is shown in Figure 6. Based on the result, we conclude that the performance of the best neighbour strategy is the best of the four strategies.
6 Summary and Future Work

In this paper, we proposed the balanced load queue model, which combines the advantages of queuing theory and the hydro-dynamic approach, to model Internet-based server grids. We proposed a load balancing algorithm based on this model, which tries its best to keep the system in the global fairness state using job deviation strategies. We presented three strategies for job deviation: best node, best neighbour, and random selection. We predefine a threshold in the algorithm for each server (queue) in the server grid, which depends on a reasonable delay for users. If one queue's jobs exceed the predefined threshold, then a job deviation strategy is employed.
Our experiments show that the best neighbour strategy is the best among the three strategies and the centrally controlled strategy in several respects: number of servers (nodes), number of requests, network delay, and arrival rate. The proposed algorithm can work with very limited system information; moreover, it can work independently of network traffic, link breaches, and so on. Some further research needs to be done; for example, the dynamic adjustment of the threshold for a server grid is an important issue for whole-system performance. Furthermore, the deviation loop is a critical and interesting topic for further research.
References

1. Yossi Azar, Andrei Z. Broder, Anna R. Karlin, and Eli Upfal, "Balanced Allocations," SIAM J. Comput., Vol. 29, No. 1, pp. 180-200, 1999.
2. Eleni Drinea, Alan Frieze, and Michael Mitzenmacher, "Balls and Bins Models with Feedback," Proceedings of the Thirteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 308-315, 2002.
3. Bharat S. Joshi, Seyed Hosseini, and K. Vairavan, "On a Load Balancing Algorithm Based on Edge Coloring," Proceedings of the Southeastern Symposium on System Theory, pp. 174-178, 1997.
4. Jing Liu, Hung Chun Kit, Mounir Hamdi, and Chi Ying Tsui, "Stable Round-Robin Scheduling Algorithms for High-Performance Input Queued Switches," Proceedings of the Symposium on High Performance Interconnects Hot Interconnects, 2002.
5. Chi-Chung Hui and Samuel T. Chanson, "A Hydro-dynamic Approach to Heterogeneous Dynamic Load Balance in a Network of Computers," Proceedings of the 1996 International Conference on Parallel Processing, pp. III-140-147, 1996.
6. Chi-Chung Hui and Samuel T. Chanson, "Efficient Load Balancing in Interconnected LANs Using Group Communication," Proceedings of the International Conference on Distributed Computing Systems, pp. 141-148, 1997.
7. W. Jia, W. Zhou, and J. Kaiser, "Efficient Algorithm for Mobile Multicast Using Anycast Group," IEE Proc.-Commun., Vol. 148, No. 1, February 2001.
8. Tamas Marostis, Sandor Molnar, and Janos Sztrik, "CAC Algorithm Based on Advantage Round Robin Method for QoS Networks," Proceedings of the Sixth IEEE Symposium on Computers and Communications, 2001.
9. M. Mitzenmacher, "Load Balancing and Dependent Jump Markov Processes," Proceedings of the Annual Symposium on Foundations of Computer Science, pp. 213-222, 1996.
10. Michael Mitzenmacher, "The Power of Two Choices in Randomized Load Balancing," Ph.D. thesis, 1997.
11. C. Partridge, T. Mendez, and W. Milliken, "Host Anycasting Service," RFC 1546, November 1993.
12. Jie Wang and Yonatan Levy, "Managing Performance Using Weighted Round-Robin," Proceedings of the Fifth IEEE Symposium on Computers & Communications, 2000.
13. Dong Xuan, Weijia Jia, Wei Zhao, and Hongwen Zhu, "A Routing Protocol for Anycast Messages," IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 6, June 2000.
14. Shui Yu, Wanlei Zhou, Fuchun Huang, and Mingjun Lan, "An Efficient Algorithm for Application-Layer Anycasting," The Fourth International Conference on Distributed Communities on the Web (DCW 2002), Sydney, April 2002.
Flexible Intermediate Library for MPI-2 Support on an SCore Cluster System

Yuichi Tsujita

Department of Electronic Engineering and Computer Science, Faculty of Engineering, Kinki University, 1 Umenobe, Takaya, Higashi-Hiroshima, Hiroshima 739-2116, Japan
[email protected]
Abstract. A flexible intermediate library named Stampi, which supports MPI-2 in a heterogeneous computing environment, has been implemented on an SCore cluster system. With the help of the flexible communication mechanism of this library, users can execute MPI functions without awareness of the underlying communication mechanism. In Stampi's message transfer, a vendor-supplied MPI library and TCP sockets are used selectively among MPI processes. Introducing its own router-process mechanism hides complex network configurations in inter-machine data transfer. In addition, the MPI-2 extensions, dynamic process creation and MPI-I/O, are also available. We have evaluated the primitive functions of Stampi; sufficient performance has been achieved, and the effectiveness of our flexible implementation has been confirmed.
1 Introduction
The low cost and scalability of a PC cluster have made it the most popular platform today. But there is a difficulty: users need to pay attention to each PC node, because each node is operated independently, and users need to handle the heterogeneity in the PC cluster. To provide a seamless computing environment, the SCore cluster system (SCore system) [1] was developed. As MPI [2,3] is the de facto standard in parallel computation, almost all computer vendors have implemented their own MPI libraries. The built-in MPI library of the SCore system, MPICH-SCore [4], is a version of the MPICH library [5]. Although this library is available inside a PC cluster (intra-machine MPI communications), dynamic process creation and MPI communications among different platforms (inter-machine MPI communications) have not been supported. To realize such mechanisms, Stampi [6] has been implemented on an SCore system [7]. Recent applications in parallel computation handle huge amounts of data. Almost all data-intensive applications tend to access noncontiguous rather than contiguous data. MPI-I/O was proposed in the MPI-2 standard [3] as a parallel-I/O interface to support such I/O patterns. But MPI-I/O operations among computers have not been supported in any vendor-supplied MPI library. To realize this mechanism, we have developed a flexible MPI-I/O library, named Stampi-I/O [8], as a part of the Stampi library. Users can call MPI-I/O functions in both local and remote I/O operations using a vendor-supplied MPI-I/O library. When such a library is not available, UNIX I/O functions are used instead (the pseudo MPI-I/O method): an MPI-I/O function call is translated into a combination of UNIX I/O operations and data manipulations inside the Stampi library. The primitive MPI functions of the Stampi library have been evaluated on interconnected Linux machines: a Linux cluster with an SCore system (SCore cluster) and a Linux workstation. The network connection between them was established on a Gigabit Ethernet based LAN. In this paper, the outline, architecture, and preliminary results of Stampi on an SCore system are described.
2 Implementation of Stampi on an SCore System
Stampi has been developed to hide complex network configuration and heterogeneity, enabling flexible MPI communication among interconnected supercomputers. The features of Stampi are summarized as follows:
1. flexible communication mechanism among computers,
2. dynamic process creation and remote I/O operation mechanisms based on the MPI-2 standard,
3. flexible mechanism in both local and remote I/O operations, and
4. support of the external32 data format among multiple platforms.
To exploit the high availability and flexibility of an SCore system in a heterogeneous computing environment, Stampi has been implemented on the system. The rest of this section describes the details of Stampi on the SCore system.
2.1 Architecture of Stampi on an SCore System
An architectural view of Stampi in a heterogeneous computing environment including an SCore cluster is depicted in Fig. 1. When user processes execute MPI functions, the Stampi library is called first. High-performance intra-machine MPI communication is available through a well-tuned underlying communication library named PM2 [9], via the MPICH-SCore library. Although ROMIO [10] is available in MPICH-SCore, Stampi uses the pseudo MPI-I/O method in local MPI-I/O operations, because ROMIO of MPICH-1.2.0 in SCore 5.0.1 does not support error handling. In inter-machine MPI communications, the communication path is switched to TCP socket connections inside the Stampi library. When computation nodes cannot communicate with the outside, a router process is invoked on the server node to relay messages from/to user processes on the computation nodes. The spawn functions based on the MPI-2 standard have been implemented in Stampi with the help of a remote shell command (rsh, ssh, etc.) to use computational resources effectively.
Fig. 1. Architecture of Stampi on a heterogeneous computing environment.
Remote MPI-I/O operations are carried out with the help of an MPI-I/O process invoked on a remote computer. The I/O interfaces for users are likewise based on the MPI-I/O APIs. I/O requests from user processes are translated into message data, which is transferred to the MPI-I/O process via a communication path switched to TCP socket connections. The MPI-I/O process performs parallel-I/O operations according to the I/O requests. When a computer does not have its own MPI-I/O library, the pseudo MPI-I/O library in Stampi is used. In the pseudo MPI-I/O method, each MPI-I/O function is translated into a combination of UNIX I/O functions such as write() and data manipulations.
2.2 Execution Mechanism of Stampi on an SCore System
Stampi supports both interactive and batch modes. Here, the execution method for creating child user processes and MPI-I/O processes from an SCore cluster on a remote computer with a batch system is explained using Fig. 2. Firstly, an SCore start-up process (scout) and a router process are initiated by a Stampi start-up command (starter). Then the scout process initiates user processes. When those user processes call MPI_Comm_spawn() or MPI_File_open(), a router process kicks off a starter process on a remote computer with the help of a remote shell command, and it generates a script file which is submitted to a batch queue system according to the queue class specified in an info object. Secondly, the starter written in the script file kicks off user processes or MPI-I/O processes in the case of MPI_Comm_spawn() or MPI_File_open(), respectively. Besides, a router process is invoked on an IP-reachable node if required. Finally, inter-machine MPI communication is available via a communication path established between the two computers. Remote I/O operations are carried out by the MPI-I/O processes. When the user processes on the SCore cluster call MPI_File_close(), the MPI-I/O processes are terminated. Next, the mechanism of remote I/O operations is explained. As an example, the mechanism of MPI_File_write_at_all() in remote I/O operations is illustrated
Fig. 2. Execution mechanism of dynamic process creation and remote I/O operation from an SCore cluster to a remote computer.
in Fig. 3. When user processes call this function, several parameters are packed into a user buffer using MPI_Pack(). Then the buffer is transferred to the MPI-I/O process using MPI_Send() and MPI_Recv() of the Stampi library. Inside these functions, Stampi-supplied underlying communication functions such as JMPI_Isend(), JMPI_Irecv(), and JMPI_Wait() are called for non-blocking TCP socket communications. After the message data is transferred, the I/O operation is carried out by the MPI-I/O process, and return values are sent back to the user processes. The other MPI-I/O functions of Stampi use a similar mechanism.
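For readers unfamiliar with the interface being relayed here, a minimal sketch of the standard MPI-2 collective write that Stampi mirrors, written with the mpi4py bindings rather than the C API used in the paper (the file name and sizes are illustrative):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank contributes one block; blocks are laid out in rank order.
block = bytes([rank % 256]) * (1 << 20)          # 1 MiB per process

fh = MPI.File.Open(comm, "testfile.dat",
                   MPI.MODE_CREATE | MPI.MODE_WRONLY)
offset = rank * len(block)
fh.Write_at_all(offset, block)                    # collective, explicit offset
fh.Close()
```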
3 Performance Measurement
Fig. 3. Mechanism of MPI_File_write_at_all() in remote I/O operations. MPI functions in rectangles are MPI interfaces of Stampi. Internally, Stampi-supplied functions such as JMPI_Isend() are called.

The performance of MPI communications and MPI-I/O operations was measured on interconnected Linux machines: an SCore cluster and a Linux workstation. Their specifications are summarized in Table 1. The Linux kernel used in the computation nodes of the SCore cluster is one modified for an SCore system from the original Linux kernel. The SCore cluster consisted of one server node and eight computation nodes. Network connections among the computation nodes were established with Gigabit Ethernet (1 Gbps, full duplex mode) through a Gigabit Ethernet switch (Extreme Alpine 3804), while the server node was connected to the switch with 100 Mbps bandwidth. The network connection between the computation nodes and the Linux workstation was made with 1 Gbps bandwidth via the switch and two Gigabit Ethernet switches (NetGear GS524Ts) on a Gigabit Ethernet based LAN. Remote I/O operations of Stampi were carried out with inter-machine MPI communications and local I/O operations on a disk attached to the Linux workstation via an Ultra160 SCSI connection. In this test, a router process was not used, because each computation node could communicate with the outside directly. Data size is denoted as the whole message data size to be transferred. Message data was split evenly among the user processes and transferred to other user processes or to an MPI-I/O process. The transfer rate of inter-machine MPI communications was calculated as (message data size)/(RTT/2), where RTT is the round-trip time of ping-pong communication between user processes. In addition, we define the latency as RTT/2 for 0-Byte message data. In remote I/O operations, latency was measured as the operation time for 0-Byte message data.
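A minimal sketch of the ping-pong measurement described above, again using mpi4py rather than the exact harness from the paper (the message size and repetition count are illustrative):

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = 1 << 20                       # 1 MiB payload (illustrative)
buf = np.zeros(size, dtype='b')
reps = 100

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1); comm.Recv(buf, source=1)
    elif rank == 1:
        comm.Recv(buf, source=0); comm.Send(buf, dest=0)
rtt = (MPI.Wtime() - t0) / reps

if rank == 0:
    # Transfer rate = (message data size) / (RTT / 2), as in the text.
    print(f"RTT = {rtt:.6f} s, rate = {size / (rtt / 2) / 1e6:.1f} MB/s")
```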
3.1 Performance of Inter-machine MPI Communications
The performance of inter-machine MPI communications between a computation node of the SCore cluster and the Linux workstation was measured using ping-pong data transfer with MPI_Send() and MPI_Recv(). Besides, the TCP_NODELAY flag for TCP sockets was activated in the Stampi start-up command to gain higher performance. The performance results are summarized in Table 2. We achieved up to 28% (35.0/125 × 100, for 256 MByte message data) of the theoretical bandwidth. The performance of inter-machine data transfer using raw TCP sockets was also measured, and similar performance was observed. Thus, there was no significant performance degradation in the inter-machine MPI communication mechanism compared with the case of raw TCP sockets.
3.2 Performance of Remote I/O Operations
The performance of remote I/O operations from the SCore cluster to the Linux workstation was measured using the collective MPI-I/O functions MPI_File_write_at_all() and MPI_File_read_at_all(), with the TCP_NODELAY flag set in the Stampi start-up command. An MPI-I/O process invoked on the Linux workstation used the pseudo MPI-I/O method. The performance values are summarized in Table 3. For both functions, the performance values in the cases of a single user process and of multiple user processes are almost the same. It is considered that inter-machine data transfer between the SCore cluster and the Linux workstation is the bottleneck in remote I/O operations. To examine these values, the performance of local I/O operations was measured using Stampi on the Linux workstation. The results are summarized in Table 4. Up to 68.3% (~ 109.3/160 × 100) and 93.9% (~ 150.3/160 × 100) of the theoretical Ultra160 SCSI bandwidth were achieved for write and read operations, respectively. Using these values, the performance of remote I/O operations was estimated roughly. In the case of MPI_File_write_at_all(), the total operation time is estimated as the sum of the operation times for the transfer of parameters and bulk data, local I/O on the remote computer, and the transfer of return values. In this estimation, the operation time to transfer the parameters is assumed to equal the latency (57 μs, from Table 2), because the parameters are only a few Bytes long. The operation times to transfer and to write a 1 MByte data block were 29.5 ms (~ (1 MB)/(33.9 MB/s)) and 9.84 ms (~ (1 MB)/(101.6 MB/s)), respectively. As
the length of the return values is also a few Bytes, the time for this operation is likewise taken to be 57 μs, for the same reason as for the parameters. Thus the total time was estimated as 39.5 ms (~ 57 μs + 29.5 ms + 9.84 ms + 57 μs) in the single-user-process case, while the measured value was 42.7 ms (~ (1 MByte)/(23.4 MB/s)). It is noted that there remain small, unconsidered processing times for data manipulation, context switches inside the user and MPI-I/O processes, and so on.
4 Summary
In this paper, the outline, architecture, and preliminary performance results of Stampi on an SCore system have been reported. Stampi on an SCore system realizes intra-machine and inter-machine MPI communications with a high-performance MPICH-SCore library and TCP sockets, respectively. Dynamic process creation based on the MPI-2 standard is also supported among computers. In addition, Stampi supports both local and remote MPI-I/O operations using a vendor-supplied MPI-I/O library; if such a library is not available, a pseudo MPI-I/O library using UNIX I/O functions is used. In remote I/O operations, Stampi achieved sufficient performance considering the performance values of inter-machine MPI communications and local I/O operations. The bottleneck in remote I/O operations is considered to be the mechanism of inter-machine MPI communications. Although this bottleneck exists, transfer rates were almost the same for up to four user processes.
Acknowledgments. The author would like to thank Prof. Genki Yagawa, University of Tokyo and director of Center for Promotion of Computational Science and Engineering (CCSE), Japan Atomic Energy Research Institute (JAERI), for his continuous encouragement. The author would like to thank the staff at CCSE, JAERI, especially Toshio Hirayama, Norihiro Nakajima, Kenji Higuchi, and Nobuhiro Yamagishi for providing a Stampi library and giving useful information. This research was partially supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), Grant-in-Aid for Young Scientists (B), 15700079, 2003.
References
1. PC Cluster Consortium: http://www.pccluster.org/
2. Message Passing Interface Forum: MPI: A Message-Passing Interface Standard, June 1995.
3. Message Passing Interface Forum: MPI-2: Extensions to the Message-Passing Interface Standard, July 1997.
4. M. Matsuda, T. Kudoh, and Y. Ishikawa: Evaluation of MPI Implementations on Grid-connected Clusters Using an Emulated WAN Environment. In: Proceedings of the IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2003), 2003, pp. 10–17.
5. W. Gropp, E. Lusk, N. Doss, and A. Skjellum: A High-Performance, Portable Implementation of the MPI Message-Passing Interface Standard. Parallel Computing, 22(6), 1996, pp. 789–828.
6. T. Imamura, Y. Tsujita, H. Koide, and H. Takemiya: Architecture of Stampi: MPI Library on a Cluster of Parallel Computers. In: Recent Advances in Parallel Virtual Machine and Message Passing Interface, LNCS 1908, Springer, 2000, pp. 200–207.
7. Y. Tsujita, T. Imamura, N. Yamagishi, and H. Takemiya: MPI-2 Support for Heterogeneous Computing Environment Using an SCore Cluster System. In: Parallel and Distributed Processing and Applications, LNCS 2745, Springer, 2003, pp. 139–144.
8. Y. Tsujita, T. Imamura, N. Yamagishi, and H. Takemiya: Stampi-I/O: A Flexible Distributed Parallel-I/O Library for Heterogeneous Computing Environment. In: Recent Advances in Parallel Virtual Machine and Message Passing Interface, LNCS 2474, Springer, 2002, pp. 288–295.
9. T. Takahashi, S. Sumimoto, A. Hori, H. Harada, and Y. Ishikawa: PM2: High Performance Communication Middleware for Heterogeneous Network Environments. In: SC2000: High Performance Networking and Computing Conference, IEEE, November 2000.
10. R. Thakur, W. Gropp, and E. Lusk: On Implementing MPI-IO Portably and with High Performance. In: Proceedings of the Workshop on I/O in Parallel and Distributed Systems, May 1999, pp. 23–32.
Resource Management and Scheduling in Manufacturing Grid

Lilan Liu1, Tao Yu1, Zhanbei Shi2, and Minglun Fang1

1 CIMS & Robot Center, Shanghai University, Shanghai 200072, China
2 Computer Science, Shanghai University, Shanghai 200072, China
[email protected]
Abstract. To address the resource management and scheduling problem in the Manufacturing Grid (MG), an application of Grid technology, we develop a resource management and scheduling system built on the interaction of the Manufacturing Grid Information Service (MGIS) and the Manufacturing Grid Resource Scheduler (MGRS). The former, MGIS, provides fundamental mechanisms for remote resource encapsulation, registration, and monitoring, while the latter, MGRS, performs the scheduling roles of Global Process Planning (GPP) analysis, resource discovery, resource selection, and resource mapping.
1 Introduction
Manufacturing resources, ranging from software, such as Computer Aided Design (CAD), Computer Aided Process Planning (CAPP), and Computer Aided Manufacturing (CAM), to various kinds of machine tools, such as Computerized Numerical Control (CNC) machines and Rapid Prototype Manufacturing (RPM) equipment, are quite distinct from computing or data resources. This particularity increases the complexity of resource management and scheduling in the Manufacturing Grid (MG), which we proposed in our previous research [1, 2, 3]. A Manufacturing Grid Information Service (MGIS) and a Manufacturing Grid Resource Scheduler (MGRS) are therefore proposed in this article to construct the resource management and scheduling system in MG. MGIS provides functions for remote resource encapsulation, registration, and monitoring, while MGRS performs the scheduling roles of Global Process Planning (GPP) analysis, resource discovery, resource selection, and resource mapping.
2 Resource Management and Scheduling in MG
2.1 Resource Management and Scheduling System
Owing to the characteristics of manufacturing resources, we designed a resource management and scheduling system comprising MGIS and MGRS, as shown in Fig. 1. With this system, the Manufacturing Grid enables large-scale sharing of resources and
collaborative working among formal or informal enterprises and/or institutions: what are called Virtual Organizations (VOs) or Virtual Enterprises (VEs).
Fig. 1. Resource Management and Scheduling System
Fig. 2. Manufacturing Grid Information Service
In the following sections, we discuss MGIS and MGRS in detail.
2.2 Manufacturing Grid Information Service
In this research, we constructed the MGIS of the MG system with the help of the Index Service provided by GT3 [4, 5], as shown in Fig. 2. The functions of its two important components are as follows:
Resource Templates. In manufacturing, resources differ greatly from one another in their nature (physical characteristics such as location and functionality), the demands placed on them (time, quality, cost, or service), and the ways in which they are employed (e.g., discovery, brokering, monitoring, diagnosis, adaptation). Nevertheless, in each case we see a similar structure: resources belonging to the same category are similar in characteristics and demands. We therefore design templates for the different kinds of resources, which describe the attributes, demands, and interfaces of each kind; the set of resource templates can grow as the MG system expands.
Index Service. Providing collective-level indexing and searching functions, the Index Service is used to obtain information from multiple resources and acts as an organization-wide information server for a multi-site collaboration.
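The paper does not fix a concrete template format, so the following is only an illustrative sketch of how a per-category resource template and its instantiation might look; all field names are hypothetical.

    # Hypothetical sketch of a manufacturing resource template; the paper
    # does not specify a format, so all field names are illustrative.
    CNC_TEMPLATE = {
        "category": "CNC",          # resources of one category share a template
        "attributes": ["location", "functionality", "axis_count"],
        "demands": ["time", "quality", "cost", "service"],     # TQCS
        "interfaces": ["discover", "broker", "monitor"],
    }

    def instantiate(template, **values):
        """Fill a template with the concrete attributes of one resource."""
        missing = [a for a in template["attributes"] if a not in values]
        if missing:
            raise ValueError("missing attributes: %s" % missing)
        return {"category": template["category"], **values}

    lathe = instantiate(CNC_TEMPLATE, location="Shanghai",
                        functionality="milling", axis_count=5)

New categories are then supported simply by adding a template, which matches the remark that the set of templates can grow with the MG system.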
2.3 Manufacturing Grid Resource Scheduler
The use of the Manufacturing Grid is distinguished from data Grid or computing Grid applications in two ways:
Tasks. The tasks submitted in MG are not formulas or data but products.
Requirements. The consumer's requirements usually include user satisfaction, product quality and service, time-to-market, and cost, normally abbreviated TQCS (Time, Quality, Cost, and Service) [6, 7].
So we developed a TQCS-based Manufacturing Grid Resource Scheduler (MGRS) to perform the scheduling functions in MG, as shown in Fig. 3.
Fig. 3. Manufacturing Grid Resource Scheduler (MGRS)
The functions of the four main components of MGRS are as follows:
GPP Analyzing. Based on information from the GPP knowledge database, Global Process Planning (GPP) analyzes and decomposes a submitted task into a number of serial or parallel basic manufacturing subtasks.
Resource Discovery. The goal of this step is to identify, by interacting with MGIS, a list of authorized resources that are available to the consumer. Where possible, a preliminary filtering of resources is also performed in this step.
Resource Selection. Once the list of possible resources is known, MGRS selects those resources that meet the basic constraints imposed by the user.
Resource Mapping. In this stage, an optimal solution is chosen to map the subtasks onto resources. Choosing the best pairing of tasks and resources is a multi-objective decision-making problem in manufacturing, and the optimization criterion is typically some combination of Time, Quality, Cost, and Service (TQCS).
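The paper does not give a concrete TQCS criterion, so the following sketch uses a simple weighted-sum score as an assumed stand-in for the multi-objective decision; the names, weights, and scoring rule are all illustrative.

    # Illustrative TQCS-weighted mapping; the weighted-sum score below is an
    # assumption, not the authors' actual criterion. In practice each
    # criterion would first be normalized to a common scale.
    def tqcs_score(resource, weights):
        # Lower time/cost and higher quality/service are better, so quality
        # and service enter this minimization score with a negative sign.
        return (weights["T"] * resource["time"]
                + weights["C"] * resource["cost"]
                - weights["Q"] * resource["quality"]
                - weights["S"] * resource["service"])

    def map_subtask(candidates, weights):
        """Pick the candidate resource with the best (lowest) TQCS score."""
        return min(candidates, key=lambda r: tqcs_score(r, weights))

    weights = {"T": 0.4, "Q": 0.3, "C": 0.2, "S": 0.1}
    candidates = [
        {"name": "cnc-1", "time": 5.0, "quality": 0.9, "cost": 1.2, "service": 0.8},
        {"name": "cnc-2", "time": 3.0, "quality": 0.7, "cost": 1.5, "service": 0.9},
    ]
    best = map_subtask(candidates, weights)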
3 Conclusions
To solve the resource management and scheduling problem, and to handle dynamic changes in the availability of manufacturing resources and in user requirements in the Manufacturing Grid (MG), we developed a resource management and scheduling system built on the interaction of the Manufacturing Grid Information Service (MGIS) and the Manufacturing Grid Resource Scheduler (MGRS). The former, MGIS, provides fundamental mechanisms for remote resource encapsulation, registration, discovery, and monitoring, while the latter, MGRS, performs the scheduling roles of Global Process Planning (GPP) analysis, resource discovery, resource selection, and resource mapping.
References
1. Liu Lilan, Yu Tao, Shi Zhanbei, et al.: Self-organization Manufacturing Grid and Its Task Scheduling Algorithm. Computer Integrated Manufacturing Systems (2003)
2. Liu Lilan, Yu Tao, Shi Zhanbei, et al.: Research on Rapid Manufacturing Grid and Its Service Nodes. Machine Design and Research (2003)
3. Shi Zhanbei, Yu Tao, Liu Lilan: Service Registry and Discovery in Rapid Manufacturing Grid. Computer Applications (2003)
4. GT3 Index Service User's Guide. http://www.globus.org/ogsa/releases/final/docs/infosvcs/indexsvc_ug.html
5. Thomas Sandholm, Jarek Gawor: Globus Toolkit 3 Core – A Grid Service Container Framework. http://www-unix.globus.org/core/
6. Kavitha Ranganathan, Ian Foster: Computation and Data Scheduling for Large-Scale Distributed Computing. http://www.globus.org/research/papers.html
7. S. H. Wu, J. Y. H. Fuh, A. Y. C. Nee: Concurrent Process Planning and Scheduling in Distributed Virtual Manufacturing. IIE Transactions (2002)
A New Task Scheduling Algorithm in Distributed Computing Environments

Jian-Jun Han and Qing-Hua Li

Department of Computer Science and Technology, Huazhong University of Science & Technology, Wuhan 430074, China
han_j_j@163.com
Abstract. Distributed computing environments are well suited to meet the computational demands of diverse groups of tasks. At present, the most popular model for characterizing task precedence is the DAG (directed acyclic graph). In [2], a more realistic and universal model called the TTIG (Temporal Task Interaction Graph) and its corresponding algorithm, MATE, are presented. This paper extends the TTIG model and proposes a new static scheduling algorithm called GBHA (group-based hybrid algorithm) that eliminates cycles when traversing the TTIG, so that global information can be captured. Simulation results show that our algorithm significantly outperforms MATE in homogeneous systems.
1 Introduction
Efficient scheduling of application tasks is a key issue for achieving high performance in parallel and distributed systems. Since the general DAG scheduling problem is NP-complete [1], many research efforts have been devoted to this field [3–5]. Among the proposed approaches, list scheduling has been shown to offer a good cost–performance trade-off, and static scheduling outperforms dynamic scheduling in most cases when the precedence, computation, and communication volumes of tasks are known a priori. However, most of these algorithms are based on the DAG model, in which each task communicates with other tasks only at its beginning and end, so the model is not well suited to iterative parallel programs that repeatedly alternate computation and communication phases with other tasks. Hence, [2] proposes a new model, called TTIG, that characterizes dependencies between tasks and removes the drawback of the DAG described above, together with a corresponding scheduling algorithm called MATE. In this work, we extend the TTIG and propose a new algorithm called GBHA that ranks the nodes upward based on groups and maps the nodes onto processors according to their earliest completion time or the earliest start time of each processor.
2 Modified TTIG
Unlike the TTIG, the modified TTIG is derived from the TFG directly, without taking the time of concurrency into account. First, some new concepts are described as follows:
Definition 1. Normal Node (NN). An NN is the same as a DAG node, which communicates only at its beginning or at its end.
Definition 2. Composite Node (CN). A CN is derived from the TFG; it exchanges information with other nodes repeatedly within nested loops in the program code.
Definition 3. Component of Composite Node (COCN). A COCN is a component of one CN. Each component of a CN must be assigned to the same processor. In Figure 1, T1 is composed of four components.
Definition 4. Direct Precedence Set of NN (DPSNN): … where … and …
Definition 5. Direct Successor Set of NN (DSSNN): … where … or … and …
Definition 6. Direct Precedence Set of CN (DPSCN): … where … or … and …
Definition 7. Direct Successor Set of CN (DSSCN): … where … or … and …
Definition 8. Group (GP). The group is the core of our algorithm; its main goal is to prevent cycles. To prevent cycles from forming in the ranking procedure, the nodes on cycle paths are organized into groups. [6] gives more details.
GP(m): the set of nodes in group m.
QG: each group has a two-dimensional array called the queue of the group, each row of which represents a path in the graph. QG(m)(n) refers to the nth queue of group m.
NQ(m): the number of queues in group m.
DPSG(m): the set of nodes from which communication data are transferred directly to GP(m), where …
DSSG(m): the set of nodes to which communication data are transferred directly from GP(m), where …
Next, the construction procedure of a group is described as follows. Initially, g = 0 and QG(m) = null for each m.
Step 1. If … where … and …, then …; remedy … accordingly.
Step 2. For each … where …, if there exist paths from … to … (a group is treated as one node when traversing the modified TTIG graph in depth-first search; when a node that belongs to a group n is traversed, all nodes in this group are traversed, and the next nodes to be traversed are in DSSG(n)), then find all paths from … to …. For each node in these paths, if it belongs to some GP(n) where …, then GP(m) = GP(m) ∪ …, and remedy DPSG(m), DSSG(m), QG(m), and NQ(m) accordingly. Otherwise, if … where … and …, then merge group n into group m, remedy the parameters of group m accordingly, and delete group n. It is important to note that each path yields one queue in a group.
Step 3. For each … where … and … is a CN, if there exist cycle paths from … to …, then find all paths from … to …, set g = g + 1, and append … to the new group. Following the strategy above, append the nodes on these paths to the group and remedy its parameters accordingly. Note that each path yields one queue in a group.
Theorem 1. Suppose each group m is treated as a unit in the graph, and let the nodes in DPSG(m) and DSSG(m) be this unit's predecessors and successors, respectively. Then it is impossible to encounter a cycle when traversing the graph.
Proof. Assume that a cycle can be found when traversing the graph. Then there must exist a CN on this cycle path; otherwise we contradict the fact that no cycle can be found in a DAG. Hence, without loss of generality, let node i be a composite node on this cycle path. If node i belongs to some group n, this contradicts the assumption that each group is treated as a unit. If node i does not belong to any group, two cases arise: 1) if all nodes on this path other than node i are NNs, then all nodes on the path are merged into a group according to Step 3; 2) if there exists another composite node j besides node i, then all nodes on the path are merged into a group according to Step 1 or Step 3 if j does not belong to any group, and according to Step 2 if node j is a group node. Both cases contradict the assumption that each group m is treated as a unit.
3 GBHA Algorithm and Simulation Experiment
The procedure of GBHA1, used in homogeneous systems, is presented as follows:
Step 1. Search for and construct the groups in the modified TTIG.
Step 2. Sort the queues of each group in non-increasing order of their length.
Step 3. Rank the nodes in the graph.
Step 4. Enqueue the start node into the sorted list.
Step 5. While there is an unscheduled task in the sorted list, select the first task … in the sorted list:
1) if … and …, assign it to the processor on which task i can start execution earliest (only two processors are considered here, as mentioned above);
2) if … and …, assign it to the processor on which the first component of task i can start execution earliest;
3) if …, map all unscheduled nodes in group g to processors in non-increasing order of the length of the queue to which Ti belongs; if …, the method is the same as in 1), and if …, the method is the same as in 2).
After Ti is scheduled, its ready successors are added to the FIFO list or the sorted list, and Ti is dequeued from the sorted list. Repeat Step 5.
Since groups eliminate the cycles in the TTIG, many mature heuristics can be used in GBHA. Because GBHA captures the global behavior of the TTIG, it significantly outperforms MATE; a sketch of this scheduling loop is given below. The details of the algorithm and the simulation results are given in [6].
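Because the precise formulas could not be recovered here, the following is only a minimal sketch of the overall loop those steps describe: repeatedly take the highest-ranked ready task and place it on the processor giving the earliest completion time. The data layout (rank, computation, and communication tables) is assumed, and the group handling of composite nodes is simplified away.

    # Minimal sketch of the GBHA-style scheduling loop; rank values and the
    # group handling of composite nodes are simplified away, and the data
    # layout is assumed. 'ready' must initially hold only entry tasks.
    def schedule(ready, ranks, procs, comp_time, comm_delay, preds, succs):
        """procs: dict proc -> time at which it becomes free."""
        placement, finish = {}, {}
        order = sorted(ready, key=lambda t: -ranks[t])      # highest rank first
        while order:
            task = order.pop(0)
            best_proc, best_eft = None, float("inf")
            for p, free_at in procs.items():
                # The task may start once the processor is free and all
                # predecessor data has arrived (no delay on the same proc).
                data_ready = max(
                    (finish[q] + (0 if placement[q] == p else comm_delay[(q, task)])
                     for q in preds.get(task, [])), default=0.0)
                eft = max(free_at, data_ready) + comp_time[(task, p)]
                if eft < best_eft:
                    best_proc, best_eft = p, eft
            placement[task], finish[task] = best_proc, best_eft
            procs[best_proc] = best_eft
            for s in succs.get(task, []):                   # release ready successors
                if s not in order and all(q in finish for q in preds.get(s, [])):
                    order.append(s)
            order.sort(key=lambda t: -ranks[t])
        return placement, finish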
4 Conclusion
In this paper, we extend the TTIG model further and propose a new group-based method called GBHA, which significantly outperforms MATE and is comparable to efficient DAG-based multiprocessor scheduling algorithms, but with significantly lower time complexity.
References
1. M. R. Garey and D. S. Johnson: Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Co., 1979.
2. C. Roig, A. Ripoll, M. Senar, F. Guirado, and E. Luque: A New Model for Static Mapping of Parallel Applications with Task and Data Parallelism. In: Proceedings of the IEEE International Parallel and Distributed Processing Symposium, 2002, pp. 78–85.
3. M. Tan, H. J. Siegel, J. K. Antonio, and Y. A. Li: Minimizing the Application Execution Time Through Scheduling of Subtasks and Communication Traffic in a Heterogeneous Computing System. IEEE Transactions on Parallel and Distributed Systems, 8(8), Aug. 1997, pp. 857–871.
4. H. Topcuoglu, S. Hariri, and M.-Y. Wu: Task Scheduling Algorithms for Heterogeneous Processors. In: Proc. Heterogeneous Computing Workshop, 1999.
5. A. Radulescu and A. J. C. van Gemund: Low-Cost Task Scheduling for Distributed-Memory Machines. IEEE Transactions on Parallel and Distributed Systems, 13(6), 2002, pp. 648–658.
6. A New Static Task Scheduling Algorithm in Homogeneous Computing Environments. Mini-Micro Systems, to appear.
GridFerret: Grid Monitoring System Based on Mobile Agent

Juan Fang, Shu-Jie Zhang, Rui-Hua Di, and He Huang

College of Computer Science, Beijing University of Technology, Beijing 100022, China
[email protected]
Abstract. GridFerret is a grid resource discovery and monitoring system based on mobile agents; it applies mobile agent technology to the grid environment. GridFerret provides the resources existing in the grid environment, the status information of grid computing nodes, and optimized information about the current grid environment. The introduction of mobile agent technology effectively reduces network traffic during the grid resource discovery and monitoring process.
1 Introduction
The concept of the grid includes three characteristics: it coordinates resources that are not subject to centralized control, uses standard, open, general-purpose protocols and interfaces, and delivers nontrivial qualities of service [1]. This paper introduces the content of and approaches to grid resource monitoring, and a new grid resource discovery and monitoring model is constructed based on mobile agent technology.
2 Grid Monitoring and Related Technology
2.1 MDS Architecture of Globus Toolkit 2.4
In the context of the Globus Toolkit, information services have the following requirements:
- a basis for configuration and adaptation in heterogeneous, dynamic environments;
- access to static and dynamic information regarding system components;
- scalable, efficient access to dynamic data;
- uniform, flexible access to information;
- decentralized maintenance.
MDS can aggregate information from multiple systems at a physical site as well as from multiple sites within a project.
2.2 Aglet System Model
We introduce mobile agents into the design of the GridFerret system, exploiting the characteristics of mobile agents to effectively reduce the network traffic of the grid resource discovery and monitoring process. GridFerret is developed on the Java-based Aglet platform, which provides a simple and general mobile agent programming model together with a dynamic and efficient communication mechanism.
3 GridFerret System
3.1 The Design of the GridFerret Directory Information Tree
The GridFerret system adopts a distributed directory structure to describe the structured characteristics of the grid. It uses OpenLDAP, and each server customizes its schema according to the RFC criteria. The global LDAP server is responsible for storing grid resource names and location information; it does not store concrete information on resource usage status. The local LDAP server of each node stores the concrete local information: file system, storage, network, processor, job, OS, etc. The GridFerret system defines gridferret.schema according to the RFC criteria. For instance, the definition of GF-Group-Name is as follows:

ATTRIBUTETYPE ( 1.3.18.0.2.4.712
    NAME 'GF-Group-Name'
    DESC 'Ferret Group Name'
    EQUALITY caseIgnoreMatch
    ORDERING caseIgnoreOrderingMatch
    SUBSTR caseIgnoreSubstringsMatch
    SYNTAX 1.3.6.1.4.1.1466.115.121.1.15 )

OBJECTCLASS ( 1.3.18.0.2.6.158
    SUP organization
    MUST GF-Group-Name
    MAY ( GF-validfrom $ GF-validto $ GF-keepto ) )

The GF-validfrom, GF-validto, and GF-keepto attributes define the validity period of a node's updates; the other attributes are defined similarly.
3.2 GridFerret Architecture
The primary agents of the GridFerret system and their functions are as follows:
SensorAgent: monitors the status information of local resources.
RegisterAgent: VO members use it to register and unregister; if the static information of a node changes, it can be updated through RegisterAgent.
LDAPAgent: performs the addition, deletion, update, and query operations on the LDAP directory server through the API defined by Java JNDI.
UpdateAgent: calls SensorAgent to update the status of nodes and then collects all real-time data and status information. It also acts as a static agent that stays in the agent runtime environment and runs updates in turn.
QueryAgent: searches for the corresponding resource information in the grid computing environment according to the query criteria.
A grid node can join a virtual organization through RegisterAgent.
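For illustration, the kind of lookup LDAPAgent and QueryAgent perform against the global OpenLDAP server can be sketched as follows. The paper's implementation uses Java JNDI; this equivalent is written with Python's ldap3 library instead, and the host name, base DN, and filter are hypothetical.

    # Illustrative LDAP lookup equivalent to what LDAPAgent/QueryAgent do;
    # the paper uses Java JNDI, shown here with Python's ldap3 instead.
    # Host name, base DN, and filter are hypothetical.
    from ldap3 import Server, Connection, ALL

    server = Server("ldap://global-ldap.example.org", get_info=ALL)
    conn = Connection(server, auto_bind=True)   # anonymous bind for reading

    conn.search(
        search_base="o=gridferret",
        search_filter="(objectClass=organization)",
        attributes=["GF-Group-Name", "GF-validfrom", "GF-validto"],
    )
    for entry in conn.entries:
        print(entry.entry_dn, entry["GF-Group-Name"])
    conn.unbind()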
Fig. 1. Registration and unregistration of a grid node
4 Implementation of GridFerret
At present the system has been implemented on a LAN. We use five servers in the experiment: a Dell 2400 server as the global LDAP server; three nodes consisting of two Dell 2400s and one IBM 5000; and an IBM 4400 used as a backup for the global LDAP server to keep the system working in case of a crash. The GridFerret system and the LDAP servers run on Linux, the client side runs on Windows, and the LDAP directory server is OpenLDAP. Each node runs the Aglet runtime environment, so registration, unregistration, and updating of system status information can all be performed.
5 Conclusion
The grid monitoring system based on mobile agents takes the characteristics of the grid computing environment into account and provides corresponding mechanisms and strategies. Most mobile agent platforms are based on Java, so they have very good extensibility. In addition, compared with distributed applications based on the RPC style, a mobile agent does not need a long, stable network connection while it moves, which greatly alleviates the network load. When monitoring grid resources distributed over a wide area, mobile agents avoid transferring a great deal of data over the network, improving the system's efficiency and reliability.
References
1. Ian Foster: What is the Grid? A Three Point Checklist. http://www.gridtoday.com/02/0722/100136.html, 2002
2. http://www.globus.org/ogsa/TechResources/MDS.html
Grid-Based Resource Management of Naval Weapon Systems

Bin Zeng1,2, Tao Hu2, and ZiTang Li1

1 School of Computer Science, Huazhong University of Science and Technology, Wuhan 430074, China
[email protected]
2 University of Naval Engineering, Wuhan 430030, China
Abstract. The continuous transformation of the Chinese navy into an integrated and network-centric capability requires a cooperative and distributed weapon resource management system. As one of the steps toward a naval information grid, our objective is to develop generalized principles of grid computing that can be applied to this specific domain. To this end, we adopt OGSA (Open Grid Service Architecture) technology to rebuild the legacy weapon manager. This paper proposes a generic coordination WRM (Weapon Resource Management) architecture based on the integration of grid capabilities. The architecture can effectively reshape stovepiped, self-contained systems into a service community equipped with open interfaces, enabling the command to make fast, high-quality weapon allocation decisions across widely distributed platforms.
1 Introduction
In the past the Navy has acquired numerous weapon systems, each of which can alone be considered a complex system. The current reality, however, is that these systems cannot be viewed as operating in isolation. Grid technology is an excellent choice for building an open system and a system of systems. In the idealized vision, a network-centric weapon resource management system is a "publish and subscribe", "plug and play" network, in which any application can be "plugged" into the network anywhere, at any time, to help achieve warfighting objectives. To bring the advantages of the Grid to widely distributed naval weapon systems, the navy has started down the road of developing new systems and integrating legacy systems on a Grid architecture.
2 Technical Foundation of the WRM Service-Based Architecture
The WRM service-based architecture uses OGSA technology as its foundation. Specifically, WRM uses the OGSA discovery, look-up, and lifetime services [1].
Figure 1 provides a conceptual view of the WRM architecture and the external entities that have a direct impact on WRM operations. Weapon resources are shown at the bottom of the diagram. These resources are the hardware, software, data, and communications media that support the weapon operations mission.
Fig. 1. WRM Architecture

Typically, weapon resources are in constant operation, and the various properties that reflect their status are in a constant state of flux. The key element of a cooperative management system is to monitor these internal property transitions. This is accomplished through the event source layer, which reflects the fact that WRM employs the grid-standard event reporting structure. The WRM manager receives events from the weapon management wrapper, which functions as the Web-Based Enterprise Management (WBEM) CIM Object Manager (CIMOM), and from the event producers. WRM managers are both producers and consumers of resource status information. An application interested in a type of resource information can find the service providers that meet its information needs through the directory service. Once they are discovered, the application and the managers enter into service contracts through a negotiation process. When a contract is concluded, the client sends specific processing instructions to the service provider through the command services to the right of the event sources. This discovery and contract process decouples the managers from their clients.

Manager Services comprise the first true WRM layer. This layer consists of the various managers implemented within the WRM environment. These managers take the form of Grid services made available to various clients through the look-up and discovery process. The Manager Services layer also contains a small administrative user interface to support the configuration, deployment, and troubleshooting of individual managers.

Directly above the Manager Services layer is the Client Application layer, which contains the clients of the Manager services. These clients use the Globus directory service [2] to find appropriate managers. The clients enter into service contracts with the managers and manage their performance through a renewal process. Clients also employ "business logic" drawn from the knowledge base to transform manager events into useful consumer information.
Business logic managers share the characteristics of both client applications and manager services. Aggregate managers employ MDS (the Meta Directory Service) to discover and enter into service contracts with other managers using the same methods as other clients. These aggregates transform manager events into process- or organization-specific status through "business logic" and then make this information available in the same manner as other services. The top layer reflects the visualizations provided to the various WRM user groups and the interfaces to external information presentation and dissemination systems such as the Naval Distributed Command and Control Environment.

The knowledge base provides a central repository for information relating specific weapon resources to naval operations processes and organizational elements. It also encodes the business logic used to transform data into useful information. It is WRM's knowledge base that separates WRM from commercially available monitoring and management systems. The WRM knowledge base is a grid service providing knowledge capabilities to WRM consumers. It provides a mapping service that links command and control tasks to the specific resources that support those tasks; this mapping allows WRM users to configure WRM clients to discover appropriate WRM manager services using only knowledge of specific tasks. The knowledge base extends this mapping capability by mapping specific resource problems to their operational impact. It also provides clients with indicators for these identified problems and supplies, on demand, the specific service contract to be used to determine relevant resource status.
3 Grid Services in Weapon Resource Management
WRM is a Grid service community whose individual managers are Grid services. Globus's ability to support WRM rests on four key concepts [3]:
Discovery/Registry: discovery is the process used to find services on the network and determine how to use them. Registry services can be used by manager services to join the service directory. WRM uses the discovery/registry methods provided by Globus to help service consumers find appropriate service providers.
Fault Monitoring: the Globus HBM (HeartBeat Monitor) service provides simple mechanisms for monitoring the health and status of managers. Fault recovery mechanisms, such as the automatic restart of crashed daemon processes, will be implemented later for WRM's reliability.
Events: remote events are the paradigm Globus uses to allow services to notify each other of changes in their state. Because a manager is itself a service, it can use remote events to notify interested parties when the set of services available to a community has changed.
Security: the single sign-on mechanisms for all Grid resources provided by GSI will be shaped in accordance with military standards, including different security solutions, mechanisms, and policies (such as one-time passwords).
We now illustrate WRM's service delivery process. At initialization, WRM managers announce their availability and register with the Grid Information Service. The managers employ a well-defined set of information (service data) to advertise their availability to support user requirements. When a client queries the Grid Information Service, it receives a set of references to the managers that can potentially satisfy its information requirements. The client application can then use these references to query the managers and determine the best set of information sources. The client then uses the selected reference(s) to communicate directly with the manager(s). As a result of the negotiation process, the client and the manager service(s) enter into a service performance contract. The contract specifies the client's performance model, including the set of resources, the capabilities of those resources, problem parameters, and the duration of the contract. The duration is managed in the form of a lease, and the client is responsible for renewing the lease in order to maintain the service. The manager implements a performance monitor to decide whether the contract has been violated.
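The lease-based contract cycle just described can be summarized in a small sketch; all class and method names below are assumptions for illustration, not WRM's actual API.

    # Sketch of the lease-based service contract described above; class and
    # method names are assumptions, not WRM's actual API.
    import time

    class ServiceContract:
        def __init__(self, resources, duration_s):
            self.resources = resources
            self.expires_at = time.time() + duration_s

        def renew(self, duration_s):
            # The client is responsible for renewing before expiry.
            self.expires_at = time.time() + duration_s

        def valid(self):
            return time.time() < self.expires_at

    class Manager:
        def negotiate(self, performance_model, duration_s=60.0):
            # A real manager would check resource capabilities and problem
            # parameters from the client's performance model before agreeing.
            return ServiceContract(performance_model["resources"], duration_s)

    # A client finds managers via the Grid Information Service (not shown),
    # negotiates a contract, and keeps the lease alive while it needs events.
    manager = Manager()
    contract = manager.negotiate({"resources": ["sensor-1", "launcher-3"]})
    while contract.valid():
        # ... consume resource status events from the manager ...
        contract.renew(60.0)
        break  # illustration only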
4 Conclusion and Future Works
WRM represents one of the first steps toward developing a service-based architecture for command and control. It also provides a framework for developing future capabilities in support of Network-Centric Warfare. The paper presents architecture descriptions that are useful for directing co-evolution and for understanding and controlling the collection of naval weapon systems. With the completion of the core information infrastructure, the near-term focus of the research and subsequent development will be on three main areas: expansion of the WRM manager, automated information discovery, and Quality of Service issues.
References
1. Foster, I., Geisler, J., Nickless, W., Smith, W., Tuecke, S.: Software Infrastructure for the I-WAY High Performance Distributed Computing Experiment. In: 5th IEEE Symp. on High Performance Distributed Computing (1997) 562–571
2. Foster, I., Kesselman, C.: Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications 11 (1997) 115–128
3. Krill, R., Jerry, A.: Some Fundamental Principles for Engineering New Capabilities into a Battle Force System of Systems. Panel Presentation at 11th INCOSE (2001)
4. Tierney, B., Aydt, R., Gunter, D., Smith, W., Swany, M., Wolski, R.: A Grid Monitoring Architecture. The Global Grid Forum (2002). http://forge.gridforum.org/projects/ggfeditor/document/GFD-I.7/en/1/GFD-I.7.pdf
A Static Task Scheduling Algorithm in Grid Computing

Dan Ma1 and Wei Zhang2

1 School of Computer Science, Huazhong University of Science and Technology, Wuhan 430074, China
[email protected]
2 Wuhan Ordnance N.C.O. Academy of PLA, Wuhan 430075, China
Abstract. Task scheduling in heterogeneous computing environments such as grid computing is a critical and challenging problem. Based on traditional list scheduling, we present a static task scheduling algorithm, LBP (Level and Branch Priority), adapted to the heterogeneous hosts of grid computing. The main contribution of LBP is a new method for determining task priority in the task ready list. Compared with influential algorithms for heterogeneous computing environments such as HEFT and CPOP, LBP achieves better scheduling performance at the same time complexity.
1 Introduction
Task scheduling remains one of the most challenging problems, in urgent need of solutions, both in grid computing and in traditional distributed and parallel computing. In homogeneous environments, researchers have explored many heuristic list-based task scheduling algorithms. These algorithms fall into two types. One is the BNP (Bounded Number of Processors) type, which assumes that all processors are fully connected and that the number of processors is limited; the algorithms ISH [1], MCP [2], and ETF [2] belong to this type. The other is the APN (Arbitrary Processor Network) type, which assumes that the processor network is arbitrarily connected and that the number of processors is also limited; because the network is not fully connected, communication contention must be considered. The algorithms MH [3] and DLS [2] belong to this type. Both types work in homogeneous environments. In a heterogeneous system (such as a grid system), the task scheduling problem is more complex because more factors, such as different processor capacities, the matching of different language codes, and the overhead of communication contention, are involved. So far, task scheduling algorithms for heterogeneous conditions are not often seen in the literature; the influential ones are HEFT [4] and CPOP [4], presented by H. Topcuoglu et al. This paper presents a static task scheduling algorithm, LBP (Level and Branch Priority), for grid environments. The LBP algorithm obtains better perform-
ance (i.e., shorter scheduling length) than HEFT and CPOP under the same conditions, without increasing the time or space complexity.
2 The Basic Definition and Model
The most common task scheduling model is the Directed Acyclic Graph (DAG); a parallel program can be well expressed as a DAG, in which the parallel parts of the application program are partitioned into tasks, with communication data existing between some of them. Generally, task scheduling in a grid environment can be seen as two logically independent stages. The first stage is task mapping or task assignment, in which each task is assigned to a certain host; expressed simply, a task t_i (1 <= i <= n) is mapped to a host h_j (1 <= j <= k), where n denotes the number of tasks and k the number of hosts. The second stage is task scheduling or task ordering, in which all the tasks already assigned to their respective hosts are given start times; ST(t_i) denotes the start time of task t_i. A DAG is denoted by a graph G = (V, E), where V denotes the n tasks with their weight values and E denotes the communication edges with their weight values. Because the hosts are heterogeneous, the computation workload of a task varies across hosts, so w(i, j) denotes the computation time of task t_i executing on host h_j, and c(i, k) denotes the message size between task t_i and task t_k.
Definition 1. A task node without any parent node is called an entrance node, and a task node without any child node is called an exit node. If there is more than one exit node, a single exit node is designated, and every other exit node is treated as a common node connected to the designated exit node by a void edge carrying zero communication data.
Definition 2. The idle hosts adopted for scheduling in the grid system are fully interconnected, and their number is limited. All idle hosts can carry out task computation and message passing concurrently. The message size between tasks scheduled on the same host is zero.
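The model above can be stated as a small data structure; the layout below is illustrative and follows the notation just introduced: w[(i, j)] is the execution time of task i on host j, and edges[(i, k)] is the message size c(i, k) between tasks i and k.

    # The DAG scheduling model as a small data structure; names are
    # illustrative and follow the notation introduced above.
    from dataclasses import dataclass, field

    @dataclass
    class TaskGraph:
        tasks: list                  # task ids t_1 .. t_n
        hosts: list                  # host ids h_1 .. h_k
        edges: dict                  # (i, k) -> message size c(i, k)
        w: dict                      # (i, j) -> execution time of t_i on h_j
        preds: dict = field(default_factory=dict)

        def __post_init__(self):
            for (i, k) in self.edges:
                self.preds.setdefault(k, []).append(i)

        def entrance_nodes(self):
            """Tasks with no parent node (Definition 1)."""
            return [t for t in self.tasks if not self.preds.get(t)]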
3 The Proposed Algorithm and Analysis of Its Performance
Most static heuristic task scheduling algorithms are based on the classic list scheduling idea, whose basic content can be divided into two independent steps.
Step 1. All tasks in the task graph are sorted according to some priority order to form a task ready list.
Step 2. The head node is taken from the task ready list one by one and scheduled onto a certain processor according to a specific strategy.
The HEFT algorithm is a typical static list scheduling algorithm. Analyzing the basic idea of list scheduling, we consider the key factor in Step 1 to be how the priority of a task node is determined. Generally, the two common attri-
butes that determine task priority are T-LEVEL (the T-LEVEL of a task is the length of the longest path from the entrance node to that task node) and B-LEVEL (the B-LEVEL of a task is the length of the longest path from the task node to the exit node). The T-LEVEL of a task relates to its earliest possible start time, while the B-LEVEL relates to the critical path of the task graph. The HEFT algorithm uses the B-LEVEL as the priority attribute; differently from the homogeneous case, the task execution time used when computing the B-LEVEL is the mean execution time over all the different hosts. As for Step 2, many algorithms, including HEFT, adopt a greedy strategy; HEFT further permits inserting the current task into the time gap between two already scheduled tasks, and this insertion undoubtedly increases the overhead of the algorithm.
The LBP algorithm mainly improves the selection of the task priority attribute in Step 1. In a homogeneous environment, the T-LEVEL and B-LEVEL values are the most important task priority attributes; in particular, the B-LEVEL emphasizes that tasks on the critical path should be scheduled as soon as possible. In a heterogeneous environment, however, task execution times differ across hosts, and using only the mean execution time to compute the B-LEVEL is unwise, because the mean B-LEVEL cannot truly reveal the relationship between a task and the critical path. In view of this heterogeneity, we present a new way of computing task priority, called Level-Branch Priority, as follows.
First, compute the Level value of each task. There are two methods: from the entrance to the exit (not described here owing to space limits), and from the exit to the entrance. In the latter, the Level value of every task node is computed in order from the exit to the entrance; the Level value of a task t_i is the sum of the edge weights on the longest path from the exit node to t_i, taken over all paths from the exit to t_i.
Then compute the Branch value of each task: Branch(t_i) is the sum of the weights of all out-edges of task t_i, whose number is the out-degree of t_i.
Finally, determine the priority of each task from its Level and Branch values. The priority of t_i is higher than that of t_j whenever Level(t_i) > Level(t_j), regardless of their Branch values. If Level(t_i) = Level(t_j), compare Branch(t_i) and Branch(t_j): if Branch(t_i) > Branch(t_j), then the priority of t_i is higher; otherwise it is lower.
The whole LBP algorithm is described below informally:
Input the DAG and determine the priority of every task from its Level and Branch values.
Put the tasks into the task ready list in decreasing order of priority.
While the task ready list is not empty do
  Take the head task from the task ready list to begin scheduling.
  For each host in the idle host set do
    Compute the earliest finish time of the task when it is scheduled on that host, without considering insertion of the current task into the
    time gap between any two already scheduled tasks.
  Endfor
  Schedule the task on the host where it can finish earliest.
Endwhile
Output the task scheduling Gantt chart.
Time complexity: the time complexity of the HEFT and CPOP algorithms is O(e*q), where e is the number of edges in the DAG and q is the number of idle hosts. The LBP algorithm adopts the same greedy strategy for selecting the idle host as HEFT and CPOP; the difference is that LBP schedules the current task only after the last scheduled task on the idle host, whereas HEFT considers the insertion operation. The time complexity of LBP is therefore no greater than that of HEFT, i.e., it is also O(e*q).
Scheduling performance: simulation experiments were carried out with small-scale stochastic DAGs composed of from ten to a hundred task nodes, with the CCR (Communication to Computation Ratio) varying from 0.1 to 10. Two important indices were examined: the mean run time and the mean speed-up. The simulation results reveal that the mean run time of LBP is a little less than that of HEFT and CPOP when the number of task nodes is large, while the mean speed-up of LBP is higher than that of HEFT and CPOP when the number of task nodes is small. As the number of task nodes grows, the mean speed-up of LBP tends to become uniform with HEFT and CPOP. A sketch of the priority computation follows.
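Under the stated definitions (Level = length of the longest weighted path from a task to the exit node; Branch = total weight of the task's out-edges; ties in Level broken by Branch), the priority computation admits the following minimal sketch; the edge-dict layout is an assumption.

    # Level-Branch priority as described above: higher Level wins, and ties
    # are broken by higher Branch. The edge-dict layout is an assumption.
    from functools import lru_cache

    def lbp_priorities(tasks, edges):
        """edges: dict (u, v) -> edge weight; returns tasks in priority order."""
        out = {}
        for (u, v), wgt in edges.items():
            out.setdefault(u, []).append((v, wgt))

        @lru_cache(maxsize=None)
        def level(t):
            # The exit node has Level 0; otherwise take the longest weighted
            # path from t down to the exit.
            return max((wgt + level(v) for v, wgt in out.get(t, [])), default=0.0)

        def branch(t):
            return sum(wgt for _, wgt in out.get(t, []))

        return sorted(tasks, key=lambda t: (level(t), branch(t)), reverse=True)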
4 Conclusion
Static task scheduling algorithms aimed at heterogeneous environments are not often seen; HEFT and CPOP are two influential ones. Building on HEFT and CPOP, this paper presents a new task priority determination and task scheduling algorithm called LBP. Compared with HEFT and CPOP, the LBP algorithm can obtain better scheduling performance without increasing the time or space complexity.
References
1. H. El-Rewini, T. G. Lewis, H. H. Ali: Task Scheduling in Parallel and Distributed Systems. Prentice Hall, Englewood Cliffs, New Jersey, 1994.
2. Rajkumar Buyya: High Performance Cluster Computing: Architectures and Systems (Volume 1), pp. 402–406.
3. H. El-Rewini, T. G. Lewis: Scheduling Parallel Programs onto Arbitrary Target Machines. Journal of Parallel and Distributed Computing, 9(2), 138–153, June 1990.
4. H. Topcuoglu, S. Hariri, M.-Y. Wu: Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing. IEEE Transactions on Parallel and Distributed Systems, 13(3), March 2002.
A New Agent-Based Distributed Model of Grid Service Advertisement and Discovery

Dan Ma, Wei Zhang, and Hong-jun Zhang

School of Computer Science, Huazhong University of Science and Technology, Wuhan 430074, China
[email protected]
Abstract. Grid computing is becoming a research focus in distributed and parallel systems. The idea that the grid service is the kernel of the whole grid architecture is accepted by researchers. Grid services are characterized by high scalability and dynamism. By analyzing resource management in heterogeneous environments and the agent-based hierarchical model, this paper presents an agent-based distributed model of grid service advertisement and discovery. It not only satisfies the scalability and dynamism of grid services but also reduces the system overhead compared with centralized management of service advertisement.
1 Introduction
Geographically wide-area networks and many large-scale distributed high-end resources managed by various organizations or individuals compose a new cooperative computing mode. This new heterogeneous and distributed cooperative computing mode is called grid computing, or the grid. In the grid computing architecture, the resources provided to grid users are abstracted as grid services. According to the important standard proposal, the Open Grid Service Architecture (OGSA) [1] presented by the GLOBAL GRID FORUM, the concept of a grid service is far-ranging: all kinds of computational resources, storage resources, interconnection networks, application programs, databases, etc., are grid services. The advent of the grid service concept helps eliminate the differences among the various heterogeneous resources in a grid system. However, grid service management is not an easy task, because grid services in a real grid system are highly dynamic and scalable, which makes it hard to find the grid services that satisfy the performance needs of application users. Providing a valid grid service advertisement and discovery mechanism is therefore necessary; furthermore, this mechanism itself should be simple and consume as little system overhead as possible. The software agent is a powerful high-level tool for modeling complex software systems and is thus well suited to implementing the advertisement and discovery of grid services. So far, typical distributed and parallel systems implement different resource management models with different characteristics.
Condor [2]: the user agent negotiates with the resource agent through a matcher. The user agent asks the matcher for resources, and the resource agent provides resource information to the matcher, which is responsible for matching resource providers with resource requestors. Obviously, the matcher can become a system bottleneck, which causes additional trouble when the system is frequently extended.
Globus [3]: in Globus, the Metacomputing Directory Service (MDS) is adopted to manage the static and dynamic information of all resources. This mode can satisfy the scalability and dynamism requirements of system resources; nevertheless, all LDAP servers must be notified whenever the state of any resource changes or a new resource is added to the system, which increases the system and network overhead.
Agent-based hierarchical model [4]: in this model, the agents responsible for service advertisement and discovery are organized as a hierarchy. When a new service is added to an agent, that agent needs to distribute the new service message to its next up-level and down-level agents. When a node in the hierarchy asks for a certain service, its agent queries the next up-level agent, and so on up to the highest level. In a highly dynamic setting, especially when services need to be distributed frequently, the overhead increases greatly, and the nodes located at higher levels may become bottlenecks.
2 A New Agent-Based Distributed Model
The basic idea of the new agent-based distributed model is that a grid service agent exists at every grid node in the grid system. Unlike the hierarchical model, the agents do not form superior/subordinate relationships: every agent holds the same position in the grid system, i.e., they are all equal. Each agent maintains only its own service information, along with a remote service address list recording the addresses of its neighbor nodes. When an agent needs to look for a service that does not exist at the local node, it interacts only with neighbor nodes, using some service discovery algorithm. This organizational structure resembles the P2P mode, so the model is well suited to highly distributed grid systems. The software agent is a natural high-level tool for modeling such a grid system: all the agents that manage grid services form a multi-agent system, and each agent provides a coordination platform for service requesters and service suppliers; indeed, each agent is both a service supplier and a service requester. The grid service agent works as the kernel component of service advertisement (registration) and discovery management. It is composed of a series of functional modules; besides general modules such as communication modules, the basic modules responsible for service registration and discovery are: the service register and discovery interface, the local service register module, the remote service address list module, and the optimizing strategy module for service discovery. The elementary functions of these modules are described below (the sketch map of the agent-based model is omitted).
Service register and discovery interface: the service register and discovery interface is the I/O of the whole agent structure. It receives local or remote service requests and sets
up a service register/discovery instance. Generally, it first uses a standard service description language such as the Web Service Description Language (WSDL) to describe the requested or registered service, then passes the service request or registration parameters to the local service register module. If the requested service is found in the local register module, the interface returns the service address to the local or remote node; otherwise it returns a failure signal.
Local service register module: the local service register module is itself a grid service. It mainly takes charge of registering local services. When the interface passes local service registration parameters, this module registers the service. When a service request parameter from a local or remote node is passed, the module calls the service discovery method encapsulated in the register service to handle the request. If the requested service is already registered, the local address is returned to the interface; if no matching service is found, the request is passed to the service fast discovery cache.
Remote service address list module: the remote service address list module mainly maintains an address list of all neighbor nodes. It first receives the service request from the service fast discovery cache, then selects an appropriate algorithm from the service discovery algorithm set and a neighbor node address from the address list, and finally passes these parameters to the interface, which starts looking for the requested service at the remote node by building a service discovery instance.
Service discovery optimizing strategy module: the service discovery optimizing strategy module is composed of optimizing strategy components such as the service discovery algorithm set and the service fast discovery cache. The algorithm set collects several service search algorithms, such as Depth First Search (DFS) and Width First Search (WFS). The service fast discovery cache keeps some recently accessed remote service addresses and some frequently accessed basic service addresses. Whenever the service register module cannot meet a service request, the request is transferred to the cache, which tries to match it against the services whose addresses it records; if matching succeeds, the cached service address is returned, otherwise the request is passed on to the remote service address list module.
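The lookup order implied by these modules (local registry first, then the fast discovery cache, then the neighbor agents) can be sketched as follows; the class names and the hop-limited depth-first forwarding are assumptions for illustration.

    # Lookup order implied by the modules above: local registry, then the
    # fast discovery cache, then neighbor agents. Names and the hop-limited
    # depth-first forwarding are assumptions.
    class ServiceAgent:
        def __init__(self, address):
            self.address = address
            self.registry = {}       # service name -> local handle
            self.cache = {}          # service name -> remote address
            self.neighbors = []      # remote service address list

        def register(self, name, handle):
            self.registry[name] = handle     # services register locally only

        def discover(self, name, ttl=3, seen=None):
            seen = set() if seen is None else seen
            seen.add(self.address)
            if name in self.registry:        # 1) local service register module
                return self.address
            if name in self.cache:           # 2) service fast discovery cache
                return self.cache[name]
            if ttl == 0:
                return None
            for nb in self.neighbors:        # 3) ask neighbor agents (DFS)
                if nb.address in seen:
                    continue
                addr = nb.discover(name, ttl - 1, seen)
                if addr is not None:
                    self.cache[name] = addr  # remember for next time
                    return addr
            return None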
3 Service Advertisement (Register) and Discovery Mechanism
Service register: every grid service is registered only at its local node. This mode prevents a register server from becoming a system bottleneck, a problem that traditional centralized service registration modes such as Condor or Globus often produce. At the same time, the data of each grid service are kept only at the local node and therefore do not occupy much storage space; since local services need not be registered at any remote node, network bandwidth is saved. The grid service registration procedure works as follows: when a local resource wants to join the grid system, it presents a registration request to the local agent. The service register and discovery interface in the local agent first describes the received registration information in the standard service description language, then creates a service register instance and calls the register
method with the relevant data of the service register instance to register the service in the local service register module.
Service discovery: the service discovery mechanism has two cases, local and remote service discovery. The local service discovery procedure works as follows: the local node presents a service request; the interface describes the request in the standard description language and creates a service discovery instance; the discovery method of the instance is called with the relevant data to query the local service register module. If matching succeeds, the local address is returned; otherwise the service fast discovery cache is consulted for a recorded service. If matching succeeds there, the remote service address is returned; otherwise the remote service discovery procedure is started.
The remote service discovery procedure works as follows: a local service request activates a service discovery instance in the interface, and no relevant service is found at the local node or in the cache; an appropriate search algorithm is selected from the algorithm set and a starting address from the remote address list; the remote service discovery method of the service discovery instance is called to query remote nodes one by one. If matching succeeds within the lifetime defined in the service discovery instance, the remote node address is returned; otherwise a failure signal is returned and service discovery stops.
4 Conclusion
The kernel of resource management in a grid system based on grid services is how to advertise and discover grid services. By analyzing existing resource management mechanisms, we present a new agent-based distributed model of service registration and discovery. This model differs from the usual centralized management mode and better suits highly distributed grids; compared with centralized management, it reduces system overhead and saves network resources.
References
1. Foster, I., Kesselman, C., Nick, J. M., Tuecke, S.: Grid Services for Distributed System Integration. Computer, 35(6), June 2002, pp. 37–46.
2. R. Raman, M. Livny, M. Solomon: Matchmaking: Distributed Resource Management for High Throughput Computing. In: Proceedings of the IEEE International Symposium on High Performance Distributed Computing, Chicago, Illinois, July 1998.
3. K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith, S. Tuecke: A Resource Management Architecture for Metacomputing Systems. In: Proceedings of the IPPS/SPDP'98 Workshop on Job Scheduling Strategies for Parallel Processing, 1998.
4. Junwei Cao, Darren J. Kerbyson, Graham R. Nudd: Use of Agent-Based Service Discovery for Resource Management in Metacomputing Environment. In: Proceedings of the 7th International Euro-Par Conference, Manchester, UK, LNCS 2150, Springer, 882–886, August 2001.
IMCAG: Infrastructure for Managing and Controlling Agent Grid

Jun Hu and Ji Gao

School of Computing, Zhejiang University, Hangzhou 310027, Zhejiang, China
[email protected]
Abstract. This paper presents an Infrastructure for Managing and Controlling the Agent Grid (IMCAG). The goal of IMCAG is to realize distributed service integration within the Internet. Taking service provision as the central task, IMCAG creates a transparent integration layer that makes every kind of service easy to use in heterogeneous, open and dynamic network environments. The paper expounds IMCAG from three basic aspects: the framework of Agent grid information communication, the core control mechanism of the Agent grid, and the individualized adjustment of Agent services, and discusses the application of IMCAG through a test. The paper concludes that IMCAG will become more mature and complete along with the development of related techniques and applications in the grid. Keywords: Agent grid, Web services, Agent federation
1 Introduction

How to exploit the various computing resources of the Internet, and how to form virtual organizations on the Internet that realize cooperative work, have become research hotspots. This research focuses on two aspects. 1) Taking Web services as the basic elements of the next-generation Web to realize distributed service integration and cooperative work; however, the controlling granularity of Web services is too small, and they cannot support the systematic construction of knowledge-level policies to adjust and control their behavior. 2) Work represented by ABC (Agent-Based Computing) and the CoABS [1] (Control of Agent Based Systems) program, which introduces the Agent grid as a means to control Web services and develops the DAML-S [2] language to describe the Web services that Agents can provide. However, most research projects are limited to a particular aspect or a partial problem of managing and controlling the Agent grid, and lack a complete theoretical and methodological system to guide the systematic development of the infrastructure. For this reason, this paper presents IMCAG, an Infrastructure for Managing and Controlling the Agent Grid. By applying the Agent grid as an upper-level construction over Web services, built with Agent and MA (Multi-Agent) techniques, it becomes easy to individually establish the cooperative work system and the behavior-restricting policies of Agents on the knowledge level, and to validly control the quantity and performance of Agent services. Our study focuses on
three fields: the semantics of the information context, the core managing and controlling mechanism of the Agent grid, and the transparent managing and controlling of Agent services. Together these make up a serial study of the theory and methodology of the Agent grid in the Web environment.
2 IMCAG Architecture

Figure 1 shows the Agent grid architecture. The whole Agent grid is composed of various nested AFs (Agent federations) and ACs (Agent cooperative groups) [3][4]. An AF consists of one MA (manager Agent), some member Agents and some acquaintance Agents. The MA, together with acquaintance Agents or other AFs, can form an AC by negotiation in order to complete a certain service.
Fig. 1. Agent grid architecture
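The nesting of federations and cooperative groups described above can be pictured with a minimal Python sketch; the class names and fields are our illustration of the AF/AC structure, not an actual IMCAG API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Agent:
    name: str

@dataclass
class AgentFederation:
    """An AF: one manager Agent plus member and acquaintance Agents."""
    manager: Agent                      # the MA
    members: List[Agent] = field(default_factory=list)
    acquaintances: List[Agent] = field(default_factory=list)
    sub_federations: List["AgentFederation"] = field(default_factory=list)  # AFs nest

@dataclass
class AgentCooperativeGroup:
    """An AC: formed by negotiation between an MA and acquaintance Agents
    or other AFs, in order to complete one particular service."""
    service: str
    initiator: Agent                    # the MA that starts the negotiation
    partners: List[object] = field(default_factory=list)  # Agents or AFs

# Tiny usage example: an MA recruits an acquaintance for one service.
ma = Agent("MA-1")
af = AgentFederation(manager=ma, members=[Agent("m1"), Agent("m2")],
                     acquaintances=[Agent("a1")])
ac = AgentCooperativeGroup(service="arrange-meeting", initiator=ma,
                           partners=[af.acquaintances[0]])
```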
From the angle of managing and controlling Agent social behavior, the IMCAG system is described by a three-element set: IMCAG = (CE, MM, SI), with MM = (FM, AS, NR).

CE -- the semantics of the information context. Through an ontology-based modeling and expression mechanism, it gives the information exchanged among Agents a clear semantic meaning; it is the foundation of managing and controlling the Agent grid.

MM -- the core mechanism of managing and controlling the Agent grid, which comprises three parts. FM, the Agent federation management, manages the whole cooperative process of the Agents. AS, the Agent assistant system, establishes an assistant system for Agent sociality so that an Agent can conveniently obtain the services it needs at any time and anywhere. NR, the Agents' rational negotiation, integrates the expression of negotiation content and reasoning into the negotiation process; based on Agent social knowledge, negotiation protocols, and a negotiation reasoning and decision-making model, an Agent can rationally and flexibly drive the negotiation process forward to obtain higher negotiation intelligence.

SI -- the transparent managing and controlling of Agents. By applying an interface Agent as intelligent middleware for man-machine interaction, it offers the customer convenient means to adjust and control Agent services.
The rest of this paper expounds the logical system and the related essential elements of IMCAG through the three levels of structure mentioned above.
3 The Semantics of the Information Context

IMCAG describes the semantics of the information context with a five-element set: CE = (OKRL, OML, Mapping, ICMT, OAFM), with OKRL = (WSL, CDL, CPL). OKRL, the Ontology-Based Knowledge Representation Language, is used inside Agents; OML, the Ontology-Based Markup Language, is used as the communication language among Agents; Mapping is the mapping mechanism between OKRL and OML; ICMT is the set of modeling tools; OAFM is the mechanism for automatically forming ontologies.
Fig. 2. IMCAG modeling frame
The modeling frame is shown in Figure 2. OKRL represents the knowledge an Agent needs when it launches social activity based on Web services, from three aspects: the description of Web services required and offered (WSL); the ontology of the application domain (CDL); and the definition of policies restricting Agent behavior (CPL). OML is designed as a restricted form of XML and retains the descriptive ability of OKRL.
4 The Core Mechanism of Managing and Controlling the Agent Grid

Agent federation management. Adopting activity-sharing-oriented joint intention as its main line [5], the AF manages the whole process of Agent cooperation. A joint intention resolves an activity into sub-activities and dispatches these sub-activities to the corresponding Agents. The MA of an AF controls the joint intention by means of a Recipe and centrally manages and schedules the activity sharing.

The Agent assistant system is described by a four-element set: AS = (QWS, MSPC, MACM, MAS). QWS -- querying Web services; MSPC -- the mid-service public center; MACM -- the middle-Agent cooperation mechanism; MAS -- the middle Agents, where each middle Agent MA = (WSAR, CMM, MSM). WSAR is the Web services
advertisement warehouse; CMM is the compatibility matching mechanism between QWS and WSAR; MSM is the mid-service mechanism. The Agent assistant system offers assistance to Agents on two levels. The first level is the MSPC, which manages the middle Agents and suggests middle-service providers; the second level is the MAS, which suggests Web service providers. This two-level assistant service is realized by the MACM.

Agent rational negotiation is described by a five-element set: NR = (NP, NE, RNC, MMA, IE). NP -- the negotiation protocols accepted by both parties; NE -- the negotiation engine, which drives the negotiation process according to the negotiation protocols; RNC -- the representation of the negotiation content, which adopts the description format defined by CDL; MMA -- the mental model of the Agent, used for describing the Agent's social beliefs, domain knowledge, negotiation state information, and reasoning knowledge; IE -- the inference engine, divided into three levels (evaluation, strategy and tactics), which decides the Agent's negotiation behavior and content. The mental state model enables Agents to rationally decide which negotiation behavior and content to adopt; the ontology-based description of negotiation content gives the negotiating Agents common semantics for the negotiation content; and the negotiation protocols establish the communication rules that negotiating Agents must obey.
5 The Transparent Managing and Controlling of Agent Services

IMCAG expresses the user's adjustment intent as a restricting policy, established by the user, defined with the concept definitions of CDL, and obeyed by the Agent federation when it offers an Agent service; this is named the customer policy. The customer can control Agent behavior indirectly by establishing customer policies. The adjustment of Agent services is described by a five-element set: SI = (IA, TS, PS, IR, PT). IA -- the interaction Agent; TS -- the set of computing tasks started by the customer; PS -- the set of policies established by the customer; IR -- the Agent service controlling mechanism; PT -- the mechanism for tracking the Agent service offering process. The customer starts the desired Agent service through the IA, specifies the restricting policy to be obeyed by the Agent federation when it offers the service, and tracks the service offering process; the IA in turn starts tasks that need customer cooperation, asks for instructions on difficult problems, and sends important messages to the customer.
6 Conclusion

We have validated IMCAG with a test instance: arranging a small conference. The test shows that IMCAG establishes an infrastructure for the Agent social grid and provides a solution for cooperative work among various virtual organizations on the Internet. IMCAG treats the theories and methodology of managing and controlling the Agent grid completely and integrally, and it can become an integration mechanism for various Agent social grid techniques.
References

1. Schmorrow, D.: Control of Agent-Based Systems (CoABS). http://www.darpa.mil/ipto/programs/coabs/index.htm
2. DAML Services Coalition: DAML-S: Web Service Description for the Semantic Web. In The First International Semantic Web Conference (ISWC), June 2002
3. Gao, J., Lin, D.: ASOJI: An Agents Based Controlling Integration Method. Pattern Recognition & Artificial Intelligence, 2000, 13(2): 151-158
4. Gao, J., Wang, J.: ABFSC: An Agents-Based Framework for Software Composition. Journal of Computers, 1999, 21(10): 1050-1058
5. Gao, J., Lin, D.: Agent Cooperation Based Control Integration by Activity-Sharing and Joint Intention. JCST, 2002, 17(3): 331-340
A Resource Allocation Method in the Neural Computation Platform

Zhuo Lai, Jiangang Yang, and Hongwei Shan

Department of Computer Science, Zhejiang University, Hangzhou, China 310027
[email protected]
Abstract. A resource management framework was designed for a neural computation platform based on Grid technology. The Metacomputing Directory Service (MDS) in the Globus toolkit was employed to locate resources in the Grid. A semi-structured data model was adopted to encapsulate the schema, data and queries of tasks and resources. The position in the hierarchy and the waiting time of tasks were taken into account to sort the tasks beforehand, and the tasks then choose compatible computing nodes in that order.
1 Introduction

Neural networks have inspired many scientists to propose them as a solution for various problems. NCP (Neural Computation Platform) was developed to relieve the burden of implementing all kinds of neural network models from scratch. Because the training of a particular neural network involves a huge amount of data, we used the idea of Grid computing [1] to construct a distributed system. The purpose of this paper is to present the autonomous resource allocation method used in NCP. Figure 1 shows an overview of the resource management system in the NCP. The Metacomputing Directory Service (MDS) [3,4] provides information services in the Globus project [2]; the platform status can be listed through MDS queries. The GIIS provides a method to combine various GRIS services into a consistent Grid resource system image, which facilitates queries from Grid applications. GRIS and GIIS are both MDS-related components in the Globus toolkit. There is a service cache in the GIIS. Resources can register themselves either through a GRIS or by accessing the GIIS directly, and platform users can also request resources from the GIIS. If the cache has expired, the GIIS acquires the latest information from the GRIS. The Resource Information Collecting procedure (in Figure 1) acquires the latest resource information from MDS periodically and stores it in the Resource Information database. The Task Status Collecting procedure (in Figure 1) likewise stores task status in the Task Status database. Two queues are created: a map-ready queue and a map-urgent queue. A task activation program checks the tasks, and once the needed data have been acquired, a task is placed into the map-ready queue; resource allocation for these waiting tasks then proceeds. When matching starts, the weight of a task in a waiting queue is calculated first, and resource allocation refers to it. If no proper computation node is
found, the task stays in the queue but its weight is incremented. The task is moved into the map-urgent queue if its weight exceeds a limit.
Fig. 1. Resource management system in the neural computation platform
2 Sorting and Mapping Algorithms

2.1 Data Model

A semi-structured data model called Classified Advertisement (Classad) [5] is adopted by NCP. Classad is flexible and extensible, and it encapsulates resource queries into the data model. Figure 2 shows the formalized representation of Classad in our platform [5]. A Classad may contain the following items: attributes, a constraint, and a rank. Constraints exist between computing nodes and tasks.
Fig. 2. Formalized description of Classad

Tasks and nodes can each set limits on the other. Rank embodies the task's definition of QoS, so that users can define different QoS rules under different circumstances. The most outstanding characteristic of Classad is that it allows computing nodes to define
their own policies. A computing node will reject any task that conflicts with its policies.
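The matchmaking idea can be illustrated with a minimal Python sketch of two Classad-like records; the attribute names Memory and Datasize follow the example in Sect. 2.3, while the dictionary encoding and lambda-based constraints are simplifying assumptions of ours, not the real Classad syntax.

```python
# Hypothetical, simplified Classad-like records: attributes plus a
# constraint and a rank expression referring to self/other attributes.
task_ad = {
    "attributes": {"Datasize": 512, "Owner": "bbb"},
    "constraint": lambda self, other: other["Memory"] >= self["Datasize"],
    "rank": lambda self, other: other["Memory"],       # task's QoS preference
}
node_ad = {
    "attributes": {"Memory": 1024, "Arch": "sparc"},
    "constraint": lambda self, other: other["Owner"] != "banned",  # node policy
    "rank": lambda self, other: 0,
}

def compatible(a, b):
    """Two Classads match iff both constraints evaluate to true."""
    return (a["constraint"](a["attributes"], b["attributes"])
            and b["constraint"](b["attributes"], a["attributes"]))

print(compatible(task_ad, node_ad))   # True: node memory fits, owner allowed
```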
2.2 Pre-sorting Tasks

The NCP is a distributed system on which several sub-tasks of a particular task may run at different computing nodes. These sub-tasks are data-dependent, and a Directed Acyclic Graph (DAG) [6] can depict their relations. The sub-tasks are executed in the order depicted in the DAG. A subsequent sub-task gets its chance to execute only when the earlier sub-tasks have finished, so earlier sub-tasks should get more chances to choose proper computing nodes. In NCP, a quantity DAGP is defined which represents the priority of each sub-task in the DAG. An exit node set (ens) is created which contains the lowest-level nodes of the DAG. For each node si not at the lowest level, an immediate successor set iss(si) is created which contains the nodes immediately below si that are data-dependent on it. For each si, DAGP is calculated as: DAGP(si) = 0 if si is in ens, and DAGP(si) = max over sj in iss(si) of DAGP(sj) + 1 otherwise.
In fact, the DAGP value of a sub-task is the distance from its position in the DAG to the exit. A Waiting Time Priority (WTP) is also defined in our neural computation platform. If a task fails in a match, it stays in the queue and waits for its next chance, and its WTP value is shifted left by one bit, i.e., WTP(si, t) = WTP(si, 0) * 2^t.
Here t represents the number of matches that have been attempted for task si. The growth of WTP removes the possibility that a task can never acquire proper resources because of its low priority. The other advantage is that WTP is a complement to Classad: although Classad endows computing nodes with the power to reject tasks, that power may prevent a task from finding a proper node forever. So in NCP, when WTP exceeds a predefined threshold, the task is placed on the node it chooses.
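A minimal Python sketch of the two priority components is given below, under the reconstruction stated above (DAGP as the distance to the exit set, WTP doubling after each failed match); the function names and the example DAG are ours.

```python
from functools import lru_cache

# dag maps each sub-task to its immediate successor set iss(si);
# exit nodes (the set ens) have no successors.
dag = {"s1": ["s2", "s3"], "s2": ["s4"], "s3": ["s4"], "s4": []}

@lru_cache(maxsize=None)
def dagp(si):
    """Distance from si to the exit of the DAG (0 for exit nodes)."""
    successors = dag[si]
    if not successors:          # si is in the exit node set ens
        return 0
    return 1 + max(dagp(sj) for sj in successors)

def wtp(initial, failed_matches):
    """Waiting Time Priority: shifted left one bit per failed match."""
    return initial << failed_matches

def priority(si, initial_wtp, failed_matches):
    # Total priority is the sum of the two components (Sect. 2.3).
    return dagp(si) + wtp(initial_wtp, failed_matches)

print(dagp("s1"), priority("s1", 1, 3))   # prints 2 and 2 + 8 = 10
```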
2.3 Mapping Algorithm

As a task and a computing node each own a Classad, a first match is done on the compatibility of the two Classads; a task can only run on a node with which it has no Classad conflicts. In a Classad, self.attribute denotes an attribute of the ad's owner, while other.attribute denotes an attribute of the other party. For example, in a task's Classad, other.Memory refers to the Memory attribute of the computing node with which the compatibility
test will be done, and self.Datasize refers to the Datasize attribute of the task itself. Two Classads are compatible if and only if both constraints are true. The priority value of task si is calculated by adding its DAGP and WTP values; tasks with higher priority choose nodes earlier.
Suppose p tasks are waiting for resources and q nodes are available. The task Ti (i <= p) with the highest priority does compatibility matches against the q nodes; the rank values of the compatible nodes are then calculated using the Rank attribute of Ti, and Ti runs on the node with the highest rank value.
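Putting the pieces together, the mapping step might look like the following sketch, which reuses the hypothetical compatible helper and Classad encoding from the earlier snippet; the task/node dictionary layout is our assumption.

```python
def map_tasks(waiting_tasks, nodes):
    """Assign each waiting task, in priority order, to its best-ranked node."""
    # Sort tasks by priority (DAGP + WTP), highest first.
    for task in sorted(waiting_tasks, key=lambda t: t["priority"], reverse=True):
        # Keep only nodes whose Classad is compatible with the task's.
        candidates = [n for n in nodes if compatible(task["ad"], n["ad"])]
        if not candidates:
            task["failed_matches"] += 1     # WTP doubles on the next round
            continue
        # Among compatible nodes, pick the one the task's Rank values highest.
        best = max(candidates,
                   key=lambda n: task["ad"]["rank"](task["ad"]["attributes"],
                                                    n["ad"]["attributes"]))
        yield task["name"], best["name"]
```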
3 Conclusions

We have introduced a new resource allocation method for our neural computation platform. Compared with traditional resource management systems, this method gives more flexibility in expressing task requirements and in resource utilization. If each task sets its Rank to 1/(execution time), the algorithm presented here becomes very similar to traditional ones. Precisely because every task and computing node has its own policies, user-defined QoS can be satisfied; the QoS here covers more than the pure speed that traditional methods consider most.
References

1. Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 1999, 159-180
2. The Globus Project. http://www.globus.org
3. Czajkowski, K., Fitzgerald, S., Foster, I., Kesselman, C.: Grid Information Services for Distributed Resource Sharing. In 10th IEEE International Symposium on High Performance Distributed Computing, (2001), IEEE Press, 181-184
4. Fitzgerald, S., Foster, I., Kesselman, C., von Laszewski, G., Smith, W., Tuecke, S.: A Directory Service for Configuring High-Performance Distributed Computations. In Proc. IEEE Symp. on High Performance Distributed Computing, 1997, 365-375
5. Raman, R., Livny, M., Solomon, M.: Matchmaking: Distributed Resource Management for High Throughput Computing. In IEEE International Symposium on High Performance Distributed Computing, (1998), IEEE Press, 140-147
6. Maheswaran, M., Siegel, H.J.: A Dynamic Matching and Scheduling Algorithm for Heterogeneous Computing Systems. In Proceedings of the IEEE Heterogeneous Computing Workshop (HCW'98), IEEE Computer Society Press, 57-69
An Efficient Clustering Method for Retrieval of Large Image Databases

Yu-Xiang Xie¹, Xi-Dao Luan¹, Ling-Da Wu¹, Song-Yang Lao¹, and Lun-Guo Xie²

¹ Multimedia R&D Center, National University of Defense Technology, Changsha, 410073, China
[email protected]
² School of Computer Science, National University of Defense Technology, Changsha, 410073, China
Abstract. This paper proposes a clustering method called CMA, which supports content-based retrieval of large image databases. CMA combines the advantages of the k-means and self-adaptive algorithms. It is simple and works without any user interaction. The algorithm has two main stages: in the first stage, it classifies the images in a database into several clusters and automatically obtains the necessary parameters for the next stage, the k-means iteration. We tested the CMA algorithm on a large database of more than ten thousand images, and the experiments show the effectiveness of the method.
1 Introduction

Content-based retrieval of large image databases is still a challenge because of the large quantity, abundance and complexity of the media content, and our inexact understanding of it. Many schemes [2][3] for large image retrieval have been proposed; some of them classify the images in the database according to similarity measurements. One effective way of classifying large image databases automatically is clustering. The goal of clustering is to group samples into different clusters such that the samples in each cluster are more similar to each other than to those in other clusters. After clustering, similar images are grouped together, so the search area is confined and the target image can be found more quickly and more accurately. There are mainly two kinds of clustering algorithms: partitioning methods and hierarchical methods. Partitioning methods segment a dataset into K parts by optimizing an evaluation function; their output is K clusters that do not intersect. Typical algorithms of this kind include k-means [1] and ISODATA. Hierarchical methods are composed of different layers of partition clusters, with the partitions of successive layers nested; their output is a layered classification tree. Typical algorithms of this kind include BIRCH, CURE, and the self-adaptive algorithm [4]. This paper presents an algorithm based on k-means and self-adaptive clustering. We call it CMA, meaning the combination of k-means and self-adaptive.
2 CMA Algorithm

In this section, we introduce our improved clustering algorithm, CMA. On the one hand, we take advantage of k-means' efficiency on large datasets; on the other hand, we improve on the self-adaptive algorithm to solve the problem of setting parameters, so that the parameters can be obtained dynamically. There are two main stages in the CMA algorithm. In the first stage, we compute an initial classification of the large image database to obtain the necessary parameters for the next stage, the k-means iteration. The algorithm runs automatically without user interaction.
2.1 Initial Classification

The object of this stage is to classify the large image database into several clusters and to decide the number of clusters and the initial centroids, so that these empirical parameters can be provided to the subsequent k-means iteration. Firstly, we regard each image in the database as a cluster, and take two positive numbers r and d, where d is the average distance between all images in the database. Secondly, we regard each image as a circle center, draw a circle of radius r around it, and count the number of images that fall inside this circle; we call this number the cluster's sample density. Thirdly, we arrange these sample densities in descending order and make the sample with the largest density the first centroid. If the distance between the sample with the second largest density and the first centroid is larger than d, we assign it as the second centroid; otherwise we examine the sample with the next largest density. If the distances between this sample and all existing centroids are larger than d, we assign it as a new centroid, and so on. In the end we obtain a group of centroids, which become the initial centroids of the clusters. This process includes several steps: creating the distance matrix, computing and ordering each cluster's sample density, and finally finding the centroids. The process is iterative, and it terminates when no new centroid can be found.
2.2 K-means Iteration

After the initial classification, we have the number of clusters in the image database and the initial centroids; the next stage is the k-means iteration. The keystone of this stage is to create R clusters that do not intersect. The detailed steps are:

Step 1: There are N images in the database. Take the number of clusters R and the initial centroids from the initial classification.
Step 2: For each image, compute the distance Dis to each cluster centroid, where Dis is derived from Sim, the similarity between the image and the cluster center. If the distance to the jth centroid is the smallest, assign the image to the jth cluster.
Step 3: For each cluster i, compute D(i), i = 1, 2, ..., R, where D(i) is the sum of the distances between all images in the ith cluster and that cluster's centroid, and the number of images in cluster i is denoted n(i).
Step 4: Compute the total distance D as the sum of all D(i).
Step 5: Compute each cluster's geometric centroid (i = 1, 2, ..., R) from the color features of its images.
Step 6: Compute the distances Dis' (i = 1, 2, ..., N, j = 1, 2, ..., R) between each image and the geometric centroids, by the same method as Step 2.
Step 7: Compute each cluster's D'(i) by the same method as Step 3.
Step 8: Compute the total distance D' by the same method as Step 4.
Step 9: If D' differs from D by more than a given tolerance, take the geometric centroids as the new cluster centroids and go back to Step 2; otherwise go to the next step.
Step 10: Arrange the images in each cluster, create the linear table, and terminate.
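A compact Python sketch of the two stages follows; r, d and the convergence test are the reconstructed parameters discussed above, and the use of plain Euclidean distance over feature vectors is our simplifying assumption.

```python
import numpy as np

def initial_centroids(feats, r, d):
    """Stage 1: density-based seeding. feats is an (N, dim) feature array."""
    dist = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
    density = (dist < r).sum(axis=1)             # samples within radius r
    centroids = []
    for i in np.argsort(-density):               # descending density
        if all(dist[i, j] > d for j in centroids):
            centroids.append(i)                  # far enough from all centroids
    return feats[centroids]

def cma(feats, r, d=None, tol=1e-3, max_iter=100):
    """Stage 2: k-means iteration seeded by stage 1."""
    if d is None:                                # d defaults to the average distance
        d = np.mean(np.linalg.norm(feats[:, None] - feats[None, :], axis=-1))
    centers = initial_centroids(feats, r, d)
    prev_total = np.inf
    for _ in range(max_iter):
        labels = np.argmin(np.linalg.norm(feats[:, None] - centers[None, :],
                                          axis=-1), axis=1)
        total = sum(np.linalg.norm(feats[labels == k] - centers[k], axis=1).sum()
                    for k in range(len(centers)))
        if abs(prev_total - total) < tol:        # Step 9 convergence test
            break
        centers = np.array([feats[labels == k].mean(axis=0)
                            if np.any(labels == k) else centers[k]
                            for k in range(len(centers))])
        prev_total = total
    return labels, centers
```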
3 Experiments

We tested the method on a large image database containing 10,093 images. The test compares the precision and retrieval time of k-means and CMA. We define precision as the number of retrieved relevant images divided by the number of retrieved images. Similarity is a threshold for selecting which images should be regarded as relevant, and retrieval time measures the retrieval speed of the different algorithms; it is calculated as the sum of the time for retrieving relevant images and the time for arranging them. The test results are shown in Fig. 1 and Fig. 2, from which we can see that the proposed CMA algorithm is superior to k-means in both precision and retrieval time.
Fig. 1. Precision of k-means and CMA
Fig. 2. Retrieval time of k-means and CMA
We tested the two algorithms on the large image database repeatedly, and found that both precision and retrieval time depend on the similarity threshold. In Fig. 1, precision rises with increasing similarity, while the trend in Fig. 2 is the opposite. This can be explained as follows: as the similarity threshold rises, the number of retrieved relevant images decreases, and so does the number of retrieved images, but the retrieved images decrease faster than the retrieved relevant images. By the definition of precision given above, precision therefore rises with the similarity threshold. In Fig. 2, as the similarity threshold rises, the number of retrieved relevant images decreases, and so does the time for arranging them; since retrieval time is defined as the sum of the time for retrieving relevant images and the time for arranging them, retrieval time decreases as the similarity rises. From Fig. 1 and Fig. 2 we can conclude that the lower the similarity threshold, the more obvious CMA's advantage in efficiency; when the similarity is above 60%, the efficiency of k-means and CMA is about the same.
4 Conclusions

This paper proposes CMA, a clustering method for content-based retrieval of large image databases. The method is based mainly on the k-means and self-adaptive clustering algorithms. It is dynamic and does not depend on manually set empirical parameters; it obtains the necessary parameters automatically from the statistical characteristics of the whole image database. As to efficiency, the algorithm gives good results in both precision and image retrieval time. Some problems remain to be improved. For one thing, a faster arrangement algorithm is required to order all the retrieved images by similarity. Furthermore, our algorithm has been tested only on the color features of images; we need to extend it to multiple features in the future. On the whole, given CMA's effectiveness and simplicity, we believe it can be used widely in various fields, such as image data mining and video content analysis. Much work remains to be done in the future.
References

1. Scheunders, P.: A Genetic c-Means Clustering Algorithm Applied to Color Image Quantization. Pattern Recognition, 30(6), 1997, 859-866
2. Yu, D., Zhang, A.: ACQ: An Automatic Clustering and Querying Approach for Large Image Databases. Proc. of ACM Multimedia'99, Orlando, FL, USA, Oct. 1999, 95-98
3. Hua, K.A., Vu, K., Oh, J.-H.: SamMatch: A Flexible and Efficient Sampling-Based Image Retrieval Technique for Large Image Databases. Proc. of ACM Multimedia'99, Orlando, FL, USA, Oct. 1999, 225-234
4. Xiong, H., Hu, X.: A Self-Adjusting Shot-Clustering Technique without Experiential Parameters. Journal of Image and Graphics, Vol. 6(A), No. 3, 2001, 243-249 (in Chinese)
Research on Adaptable Replication Protocol

Dong Zhao¹,², Ya-wei Li¹,², and Ming-tian Zhou¹

¹ College of Computer Science and Engineering, University of Electronic Science and Technology of China, 610054 Chengdu, China
[email protected], [email protected]
² BEA Systems, Chengdu Representative Office, 610017 Chengdu, China
{dzhao, yli}@bea.com
Abstract. Traditional replication protocols lack adaptability and give little consideration to performance after replication is introduced. We propose a new replication protocol that adopts a logical token-ring architecture with dynamically increasing/decreasing replicas. Under the precondition of meeting availability requirements, it not only adapts to various distributed systems but also guarantees high performance. A comparison between this protocol and other typical replication protocols shows that it is feasible.
1 Introduction

In distributed systems, traditional replication protocols such as active replication (the state-machine approach) [1], primary backup [2], ROWAA [3] (Read Once/Write All Available), Majority [4] and Grid [5] undermine system performance by introducing many restrictions, so each can be adapted only to certain types of distributed applications. To resolve this problem, we designed ROWC (Read Once/Write Circularly), a new replication protocol with high performance that can support different types of distributed applications.
2 Design of ROWC Protocol

The following notations are used: s is an instance of a replicated service in the system; m is a message sent from a client of the system, with an id consisting of a total order sequence number tos and a ring sequence number rs; s.receive(m) denotes that service replica s has received a message m from a client; s.deliver(m) denotes that service replica s has delivered a message m to higher-level applications; c.send(m) denotes that a client c sends a message m to the replication group; the symbol -> defines the precedes relation, a global partial order on all events in the system ("occurring before"). The configuration C is the collection of service replicas alive during the time slot T, viz. C = {s | s is running in T}.
Definition 1 (Agreed Order): The agreed order service is defined as a total order relationship between different requests. That is, for any two messages m and m' in the system, the deliveries are ordered: either s.deliver(m) -> s.deliver(m') or s.deliver(m') -> s.deliver(m). The precondition is that the replicas belong to the configuration C during T and have received both m and m'.
2.1 System Model

The system model of the ROWC protocol adopts a hierarchical architecture, as shown in Figure 1. Every service can have multiple replicated instances, deployed on different servers and forming a replication group.
Fig. 1. System Model of ROWC
Based on our previous research [7], ROWC introduces the SA (Service Adapter) to separate clients from the replicated service, so that clients can access the service transparently; this meets the adaptability requirement. The CAP (Client-Adapter Protocol) between a client and an SA is independent of the ARP (Adapter-Replica Protocol) between the SA and the back-end replicated service, so ROWC can support multiple communication protocols simply by implementing the corresponding SAs. Each SA has a CRQ (Client Request Queue) and an SRQ (Service Request Queue) to support concurrent access from multiple clients; the SA is responsible for scheduling the replicas to process the requests and for returning the replies to the clients. The replicas form a logical token ring. At any time, only one member of the replication group can hold the token and send messages to the ring, while the other members can only read messages. The communication protocol between the replication members is called TRP (Token Ring Protocol). In this protocol, the token holder globally sorts the messages it has received but not yet delivered, so that it can provide the global order needed for the agreed order service. To support sorting, each replica owns an ARQ (Adapter Request
Queue) to buffer messages from the SAs and an RSRQ (Replicated Service Reply Queue) to buffer the reply messages to the SAs.
Fig. 2. Processing write-request in agreed order
Fig. 3. Processing read-request in agreed order
2.2 Processing Requests in Agreed Order

Figure 2 shows the processing of a write-request under the agreed order service in the case of three replicated services. A client C sends a request message m to the service through a certain CAP; the SA enqueues m in its CRQ and assigns an SA sequence number to the message. The SA then multicasts m to the replication group through the ARP protocol. Replicas without the token simply put m into their ARQs. The token holder, say s1, receives the request and assigns it a global sequence number. The difference between the global sequence number and the SA sequence number is that the global sequence number is unique among all messages from all SAs, while the latter is unique only within the SA that generated it. Then s1 appends the SA sequence number, the global sequence number and the ID of m to a token message and multicasts it to the whole replication group, indicating that the token is passed to the next replica s2. According to the reply policy in ROWC, only s1, which generated the global sequence number, sends the reply to the SA; this policy avoids the acknowledgement flooding that often occurs in primary backup and active replication protocols. If s1 has already delivered all the messages whose global sequence numbers are less than the newly generated one, it can deliver m and unicast the result to the SA. Finally, the SA sends the reply to the client, and the processing of this request is complete. When s2 and s3 receive the multicast token, they obtain the global sequence number and SA sequence number and find the buffered message m in their own ARQs. If they have already delivered all the messages ordered before m, they can deliver m; otherwise they attach the global sequence number from the token to m and keep it in the ARQ buffer. Figure 3 illustrates that replicas can execute multiple consecutive read-requests in parallel by sorting all read-requests. The difference from processing write-requests is that each read-request is executed only once, on the replica instance assigned to it.
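The token-passing global ordering can be sketched as a single-process Python simulation; the replica names, queue layout and driver loop are our illustrative assumptions, not the actual TRP implementation.

```python
from collections import deque

class Replica:
    def __init__(self, name):
        self.name = name
        self.arq = {}             # SA seq -> message, buffered until ordered
        self.next_to_deliver = 0  # next global sequence number expected
        self.pending = {}         # global seq -> message, ordered but early

    def on_request(self, sa_seq, msg):
        self.arq[sa_seq] = msg    # every replica buffers the multicast request

    def on_token(self, global_seq, sa_seq):
        # The token multicast carries (global seq, SA seq); use it to order
        # the buffered message, then deliver everything that is now in order.
        self.pending[global_seq] = self.arq.pop(sa_seq)
        while self.next_to_deliver in self.pending:
            msg = self.pending.pop(self.next_to_deliver)
            print(f"{self.name} delivers #{self.next_to_deliver}: {msg}")
            self.next_to_deliver += 1

ring = deque([Replica("s1"), Replica("s2"), Replica("s3")])
global_seq = 0
for sa_seq, msg in enumerate(["write A", "write B"]):
    for r in ring:
        r.on_request(sa_seq, msg)       # SA multicasts the request to all
    # ring[0] is the token holder: it assigns the global sequence number,
    # then the token message is multicast to the whole group.
    for r in ring:
        r.on_token(global_seq, sa_seq)
    global_seq += 1
    ring.rotate(-1)                     # token passes to the next replica
```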
3 Performance Analysis and Simulation

Suppose there are n hosts, each of which holds a replica of each type of service. Let w be the ratio of write-requests to all requests, let rq be the number of replication instances processing read-requests and wq the number processing write-requests, and assume every machine can process L requests per second.
Fig. 4. Response time (w=0.2)
Fig. 5. Response time (w=0.8)
In terms of availability, ROWC has the same quality as active replication, primary backup and ROWAA: with single-replica availability a, the service availability is 1 - (1 - a)^n. If the instance number n is the same, ROWAA has better availability than Majority [4] and Grid [5]. The average execution time for a request in each replica can be derived from w, rq, wq and L; to ensure impartiality, we let rq and wq take the values 1, 2, 3, 4 and 5, and calculate the average execution time as shown in Table 1. Considering adaptability to distributed applications, ROWC balances the cost of the different protocols under different write/read-request densities. When the request stream is read-centric, ROWC employs a policy similar to ROWAA to service requests, selecting only one
replica instance to service the read-requests, so that system performance is guaranteed. When the request stream is write-centric, ROWC reconstructs the token ring and dynamically reduces the number x of replica instances that service the write-requests, in order to reduce the reply delay; the cost is that the availability of the replicated service is lower than that of ROWAA. We implemented the ROWC protocol on the prototype from our previous work [7] and compared it with other approaches. The simulation environment consists of 4 Sun Ultra-5 workstations running Solaris 8, connected by a 100 Mb Ethernet network. There is one Java object replica on each workstation, i.e., the replica number is 4. The ARP and TRP protocols are based on multicast UDP. Figures 4 and 5 present the average request response time when w is 0.2 and 0.8, respectively. Both figures demonstrate that the ROWC protocol has a relatively shorter response time than the other replication protocols, especially under high load.
4 Conclusion

The token-ring structure in the TRP protocol is similar to that of Totem [6]. However, ROWC classifies requests according to read/write frequency and employs different policies to achieve better performance in different types of applications. The performance results show that, because the token ring dynamically scales the number of replication instances to match read/write-request streams of different densities, ROWC achieves better performance than other replication protocols.
References

1. Schneider, F.B.: Replication Management Using the State-Machine Approach. In Sape Mullender (ed.): Distributed Systems, 2nd Edition, ACM Press Books, (1993) 169-197
2. Budhiraja, N., Marzullo, K., Schneider, F.B., Toueg, S.: The Primary-Backup Approach. In Sape Mullender (ed.): Distributed Systems, 2nd Edition, ACM Press Books, (1993) 199-216
3. Bernstein, P.A., Hadzilacos, V., Goodman, N.: Concurrency Control and Recovery in Database Systems. Addison Wesley, Reading, MA, (1987) 265-294
4. Thomas, R.H.: A Majority Consensus Approach to Concurrency Control for Multiple Copy Databases. ACM Transactions on Database Systems, 4(9), June (1979) 180-209
5. Cheung, S.Y., Ahamad, M., Ammar, M.H.: The Grid Protocol: A High Performance Scheme for Maintaining Replicated Data. IEEE Transactions on Knowledge and Data Engineering, Vol. 4, No. 6, Dec. (1992) 582-592
6. Amir, Y., Moser, L.E.: Fast Message Ordering and Membership Using a Logical Token-Passing Ring. In Proceedings of the International Conference on Distributed Computing Systems, May (1993) 551-560
7. Zhao, D., Yao, S., Zhou, M.: Research and Design of a Middleware for Supporting Wide-Area Distributed Applications. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium: IPDPS 2002 Workshops, April (2002) 217
Co-operative Monitor Web Page Based on MD5

Guohun Zhu and YuQing Miao

Guilin Institute of Electronic Technology, 541004 Guilin, P.R. China
[email protected] http://www.gliet.edu.cn/person/zhugh/
Abstract. This paper presents a new approach to detecting defacement of web pages and restoring them from backup in real time. The mechanism is based on integrity verification within a host and co-operative validation among a group of web sites. For every page and data item uploaded by a legitimate user, a digest file is generated and a copy is placed in a backup directory. The monitor process not only audits the collected web pages but also periodically exchanges the web page audit index with adjoining nodes. When a hacker defaces a web page, the monitor process alarms the web master, recovers the page, and sends a message to the other web services. Finally, the performance is presented by comparison with some utility tools.
1 Introduction

Security is a serious issue for every IT team today. Events such as the recent MSBlaster worm, the DoS attack against Microsoft.com, and web page defacements have raised awareness of security [1][2]. Two factors weigh heavily in making web masters want to detect web page defacement as soon as possible. One is the image of the web site, especially for government sites, which are often attacked over political conflicts. The other is that worm viruses sometimes spread from the web site: for instance, Code Red II or Nimda could infect web site pages and then infect client hosts through the browser when a page was clicked [3]. Conventionally, several approaches have been developed to monitor systems or networks, such as host monitor tools, network monitor tools [4], and IDSs [5][6]. Unfortunately, if an intruder attacks remotely, host monitor tools such as ttymon detect nothing [4]; and when a hacker defaces a web page from a legitimate IP address, a network monitor such as SNIF filters out nothing either [4]. As for today's IDSs, anecdotal reports say that a growing number of customers are doing just one thing with their IDS: switching it off [6]. This paper proposes a new co-operative monitor model that works across a group of web sites. Every node verifies integrity with the MD5 algorithm and actively exchanges messages. If a page is defaced, the monitor on the node restores the web page and also sends alarm messages to the other nodes. If the monitor on a node is stopped by a hacker, the other nodes warn the web master as quickly as possible.¹
¹ This work is supported by the GuangXi Natural Science Foundation under grant no. 0310006.
2 MD5 for Web Page Monitoring

Consider that the web master and the web page designer are usually not the same role: the designer commonly uploads and maintains his work through client tools, while the web master is generally in charge of running the web site. The integrity of the web site therefore rests on two premises. First, the supervisor or database manager cannot know a legitimate user's password, but can modify or delete it, just as in a bank MIS no bank clerk can know a customer's password but can reset it when the customer forgets it. Second, the web page designers should maintain their archives under access control. If a clandestine user controls the web site by masquerading, he cannot change a user's uploaded pages through the legitimate user account, but he can modify them by low-level means. The MD5 integrity monitor process handles a legitimate user in the following steps; the workflow is shown in Figure 1. First, the client user must register and obtain a legitimate account and password. Second, the web service allocates a normal directory and a backup directory, while the database saves the registration information. Then, whenever the user uploads a web page or other files, the tool calculates each file's hash value, generates a digest index file saved to the database, and also copies the uploaded file into the backup directory. After that, a background process periodically recalculates the digest index of the user's files and compares it with the values stored in the database.
Fig. 1. The Monitor Model Based on MD5
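A minimal Python sketch of the digest-and-verify cycle, using the standard hashlib library, is given below; the directory layout and function names are illustrative assumptions.

```python
import hashlib
from pathlib import Path

def md5_of(path):
    """MD5 digest of one file, read in chunks to handle large uploads."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def build_index(user_dir):
    """Build the digest index recorded at upload time."""
    return {str(p): md5_of(p) for p in Path(user_dir).rglob("*") if p.is_file()}

def verify(stored_index, backup_dir):
    """Periodic background check: restore and report any tampered file."""
    defaced = []
    for path, digest in stored_index.items():
        current = md5_of(path) if Path(path).exists() else None
        if current != digest:
            defaced.append(path)
            # Recover the page from the backup directory kept at upload time.
            backup = Path(backup_dir) / Path(path).name
            if backup.exists():
                Path(path).write_bytes(backup.read_bytes())
    return defaced   # a non-empty list triggers alarms to master and peers
```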
3 A Co-operative Network Monitor

The MD5 monitor tool on a single host has a weakness: if a hacker seizes supervisor privilege and kills the monitor process, the MD5 monitor on that machine can take no action. Just as with anti-virus software or a firewall installed on a single PC, some viruses or insider attacks can first stop the process and then destroy the system. How can this be avoided? The answer is a grid-of-alarms model in which every web site joins in cooperation. Consider two web servers building a co-operative monitor network. First, web servers A and B set up a trusted tunnel. Then the monitor process on host A scans not only the local web pages but also verifies the web pages on machine B.
Meanwhile, the two machines exchange their digest index file structures. If a web page on server B is modified by a legitimate user, the monitor on machine B sends a digest message to server A, and A audits B's web pages again. If the MD5 indexes are equal, A's monitor regards the modification on server B as legitimate; otherwise it is regarded as illegitimate, and a warning email is sent to the web masters of both A and B. The co-operative monitor opens a new angle of attack: a malicious user could penetrate through the communication tunnel. To build a trusted authentication mechanism that can defend against such penetration, digital signatures are applied to the data exchanged between the web servers, which exchange data with minimal encapsulation, as implemented in mobile IP technology [7].
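The peer verification round might be sketched as follows; the message format and the recompute_digest helper are hypothetical simplifications (the actual exchange adds digital signatures over the trusted tunnel).

```python
def audit_peer(reported_index, recompute_digest):
    """Run on server A after server B reports legitimate modifications.

    reported_index   : {path: md5} digest index B's monitor sent to A
    recompute_digest : callable that fetches the page from B and computes
                       its MD5 (a hypothetical helper for the re-audit)
    """
    illegitimate = []
    for path, registered_md5 in reported_index.items():
        if recompute_digest(path) != registered_md5:
            # The page on B differs from what was registered at upload time,
            # so the change did not come through the legitimate upload tool.
            illegitimate.append(path)
    return illegitimate     # non-empty -> warn both web masters by email
```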
Fig. 2. The logging file of virus
Fig. 3. Contrast test result of several processes
According to the Anderson report [8], malicious users can be classified into three classes: masqueraders, misfeasors and clandestine users. Now assume that malicious users of all three classes want to deface the web pages of one web site, and the web server has four tools: SNIF, ttymon, an IDS, and a tool based on this MD5 model. Consider the following anomalous behavior: the hacker browses and changes web pages, especially the home page. If the hacker is a masquerader, SNIF rejects the access, ttymon merely regards him as a legitimate user, the IDS also rejects the access, and the MD5 monitor tool raises an alarm if any page is changed. If a misfeasor performs such actions, only the MD5 monitor tool and the IDS can detect him; if a clandestine
user performs the defacement and deletes the audit collection data, only the MD5 monitor tool can give an alarm, even though the log file is lost. So this MD5 monitor tool can audit all classes of malicious users.
4 The Test Result of an Example

The cooperation model was built up between two web sites: one is a government web site at the Guilin Statistical Bureau, whose OS is Windows 2000 and whose HTTP server is IIS; the other is a campus web site whose OS is Linux and whose web server is iPlanet. The performance of the model was tested from functionality to resource cost. The test steps are as follows. First, a campus user registers a new account "bbb" to access the statistical bureau site, then accesses the web site and uploads some files. On the statistical bureau web site, files in the user's root directory are modified from a terminal, mimicking a malicious user. The monitor tool records all these actions in the virus log file shown in Figure 2. The resource cost was compared with WinWord 2000, IPClient and Notepad; the operations measured include program start, execution and close. From Figure 3 it can be concluded that the monitor tool's resource consumption is very small.
5 Conclusion

In this paper, a novel co-operative web site protection model has been proposed. Every web site installs a monitor tool that, at low cost, not only monitors the web pages and data on the local host but also monitors the web pages of the co-operating web sites.
References

1. Will Knight: ZDNet UK News - Hackers Attack Government Web Site. http://news.zdnet.co.uk/internet/0,39020369,2084966,00.htm
2. Alleged Israeli Hackers Deface UAE News Website. http://www.xatrix.org/article402.html, by platon, June 19, 2001
3. Bank Confirms Crackers Break into Website. http://www.landfield.com/isn/mail-archive/2001/Jun/0115.html
4. Alves-Foss, J.: An Overview of SNIF: A Tool for Surveying Network Information Flow. In Proc. Internet Society 1995 Symposium on Network and Distributed System Security, IEEE Computer Society Press, (1995) 94-101
5. Bace, R.G.: Intrusion Detection. Macmillan Technical Publishing, Indianapolis, 2000
6. Braue, D.: Intrusion Detection: Caught in Its Own Web? http://www.zdnet.com.au/itmanager/technology/story/0,2000029587,20278214,00.htm
7. Zhu, G., Lu, J., Li, J.: Implementation Tunnel in Mobile IP with Java. In Proceedings of the Cross-Strait Information Technology Workshop, Southeast University, Nanjing, China, (2002) 350-356
8. Anderson, J.P.: Computer Security Threat Monitoring and Surveillance. James P. Anderson Co., Fort Washington, PA (1980)
Collaboration-Based Architecture of Flexible Software Configuration Management System*

Ying Ding, Weishi Zhang, and Lei Xu

Department of Computer Science and Technology, Dalian Maritime University, Dalian, 116026, P.R. China
{abbyying, teesiv, xulei}@dlmu.edu.cn
Abstract. Software configuration management (SCM) products have evolved over the years and have become large and powerful, but they are not flexible enough to allow the user to pick the kind of control they need. In our work, we focus on two features of software configuration management systems: flexibility and collaboration. We present a new flexible software configuration management system called FSCM, which is designed to address the limits of current systems. In this paper, the main features of FSCM are discussed and a collaboration-based architecture of FSCM is presented.
1 Introduction

SCM is one of the most important activities for assuring software quality in software development. To support SCM, a great variety of tools have been developed over the years. Existing SCM tools each provide their own level of control, either loose or tight, over the users and the evolution of the products, and few are flexible enough to allow the user to pick the kind of control [1]. We discuss three aspects of the flexibility of SCM: company-level, project-level, and individual-level. First, most configuration management tools provide only specific functionalities that cannot grow with a software company. Second, most configuration management tools are not agile and cannot adapt appropriately to different projects. Third, most configuration management tools cannot impose different levels of CM control on different users. A project may span multiple teams located at geographically dispersed sites connected by a wide area network such as an organizational intranet or the Internet [4]. Many development teams, especially distributed teams, require process support to adequately coordinate their complex, distributed work practices. SCM systems can increase their control of the development process; they facilitate communication among project teams by providing users with access to all project assets through a central repository supported by workspace and process management. In [1] and [2], it
* Supported by the National Natural Science Foundation of China under Grant No. 69973009 and the National High Technology Development 863 Program under Grant No. 2003AA113020.
is suggested that process management and team collaboration should be incorporated into CM systems. From the above discussion it is clear that a software configuration management system is needed which can impose different levels of control over users and product evolution and can support process management and teamwork. In this paper, we present the architecture of a flexible software configuration management system, called FSCM, which establishes and maintains the integrity of software artifacts throughout the software lifecycle.
2 Flexible CM Control Policy

FSCM enhances adaptability by allowing users to choose the policies for access control, version control, and change control. FSCM provides two modes of access control: role-based and file-based. The role-based mode controls users' access to configuration items according to their roles; the file-based mode controls access by defining users' permissions on folders and files. The two modes can also be used in combination. FSCM offers two version control modes: one is the simple checkout-edit-checkin model; the other is more elaborate and provides three checkout modes, namely read-only, exclusive-write and share-write. Only when a developer has a file checked out in exclusive-write or share-write mode can he make changes to it. FSCM also defines a process to control the changes made to configuration items: a user submits a change request, and the people identified as responsible for accepting change requests and allocating the work decide on the requests according to the evaluation criteria. Once a change has been completed, affected users are notified by electronic mail or instant messaging that a new version is available.
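The policy choices described above can be pictured as a small configuration structure; the following Python sketch is our illustration, not FSCM's actual configuration format.

```python
from enum import Enum

class AccessMode(Enum):
    ROLE_BASED = "role"       # permissions derived from the user's role
    FILE_BASED = "file"       # permissions attached to folders and files
    MIXED = "mixed"           # both modes used in combination

class CheckoutMode(Enum):
    READ_ONLY = "read-only"
    EXCLUSIVE_WRITE = "exclusive-write"
    SHARE_WRITE = "share-write"

# One project picks its own control policy when it is set up.
project_policy = {
    "access_control": AccessMode.MIXED,
    "version_control": "checkout-edit-checkin",   # or the three-mode model
    "change_control": {
        "requires_approval": True,                 # change requests reviewed
        "notify_via": ["email", "instant-messaging"],
    },
}

def may_modify(checkout_mode):
    """Only exclusive-write or share-write checkouts permit changes."""
    return checkout_mode in (CheckoutMode.EXCLUSIVE_WRITE,
                             CheckoutMode.SHARE_WRITE)
```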
3 Collaboration Support

FSCM enhances collaboration among people and processes by providing process management and workspace management: process management supports collaboration between people and processes, while workspace management supports collaboration between people. FSCM provides processes, defined as a related set of activities plus a CM control policy, and monitors their practitioners to ensure that the defined processes are followed. FSCM also automates the processes to a certain degree; for example, when a member joins the project team, FSCM automatically triggers the operation of assigning roles and tasks to the new member. A workspace provides software engineers with their own work area for creating and maintaining a family of products while still communicating and coordinating effectively. Workspace management helps users to track the progress of the project, to obtain the exact versions of source files, to update project data instantaneously, and to be informed of changes that have been made.
Fig. 1. FSCM Architecture
4 Architecture of FSCM

The architecture of FSCM is shown in Fig. 1. It comprises three layers: the configuration library layer, the basic management layer and the high-level management layer. The configuration library layer is responsible for storing the configuration item data and the related configuration information. The basic management layer provides the configuration management policies, process management, and a collaborative environment for software development. The high-level management layer supports high-level management functions, including version control, change control, and status accounting.
4.1 Basic Management

Basic management is divided into three parts: control policy, process management, and workspace. Control policy management helps users choose the mode of implementing access control, version control, and change control; it ensures flexible configuration management across the software development life cycle. Process management provides process models for software development and allows users to choose the development process their project requires. Users can also define their own process models; the definition of a process model includes the development phases, the basic roles, their related permissions, and the CM control policy. When using a defined process model, users can still adapt it to their needs. Workspace management helps team members get the exact versions required to complete a specific task and create new versions of the configuration items they have changed. The workspace also provides two functions: task management and notification. Task management helps users track the progress of projects; notification keeps project team members aware of changes in the software state. FSCM provides two ways to support team awareness: email and IM (instant messaging).
4.2 High-Level Management

FSCM provides the following high-level configuration management functions: version control, change control and status accounting. Version control tracks changes to every file and directory, maintaining complete annotated version histories of every artifact. Change control provides a powerful platform for controlling changes to all artifacts that evolve during development; FSCM defines a formal approval process for requesting and approving changes, and the change control process can be customized based on team needs, development phases and change types. Status accounting records and reports the status of software problems and change requests: standard reports offer the project information in common use, and users can also customize report forms to obtain the information most useful to them.
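As an illustration of the formal approval process, the life of a change request might be modeled by the small state machine below; the state names and transitions are our assumptions about a typical workflow, not FSCM's exact process.

```python
# Hypothetical change-request workflow: submitted requests are evaluated,
# approved ones are implemented, and affected users are notified at the end.
TRANSITIONS = {
    "submitted": {"approve": "approved", "reject": "rejected"},
    "approved":  {"complete": "completed"},
    "completed": {"notify": "closed"},     # email/IM to affected users
}

def advance(state, action):
    allowed = TRANSITIONS.get(state, {})
    if action not in allowed:
        raise ValueError(f"action '{action}' not allowed in state '{state}'")
    return allowed[action]

state = "submitted"
for action in ("approve", "complete", "notify"):
    state = advance(state, action)
print(state)   # closed
```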
5 Conclusions

In this paper, we have presented the architecture of FSCM, which provides a set of features, including flexible control policies, process management and workspace management, and which enhances flexible configuration management and collaboration in software development. The collaboration-based architecture of FSCM is designed to support project teams, especially geographically distributed ones, in developing products effectively.
References

1. Dart, S.A.: Concepts in Configuration Management Systems. Proceedings of the 3rd International Workshop on Software Configuration Management. ACM Press, New York, NY, USA (1991) 1-18
2. Chu-Carroll, M.C., Wright, J.: Supporting Distributed Collaboration through Multidimensional Software Configuration Management. In Westfechtel, B., van der Hoek, A. (eds.): SCM 2001/2003, Lecture Notes in Computer Science, Vol. 2649. Springer-Verlag, Berlin Heidelberg New York (2003) 40-53
3. Estublier, J., Le, A.-T., Villalobos, J.: Using Federations for Flexible SCM Systems. In Westfechtel, B., van der Hoek, A. (eds.): SCM 2001/2003, Lecture Notes in Computer Science, Vol. 2649. Springer-Verlag, Berlin Heidelberg New York (2003) 163-176
4. Ben-Shaul, I.Z., Kaiser, G.E.: Federating Process-Centered Environments: The Oz Experience. Automated Software Engineering, Vol. 5. Kluwer Academic Publishers, Netherlands (1998) 97-132
5. Oivo, M., Komi-Sirvio, S.: Managing the Improvement of SCM Process. PROFES 2002, Lecture Notes in Computer Science, Vol. 2559. Springer-Verlag, Berlin Heidelberg New York (2002) 35-48
6. Estublier, J., Garcia, S., Vega, G.: Defining and Supporting Concurrent Engineering Policies in SCM. In Westfechtel, B., van der Hoek, A. (eds.): SCM 2001/2003, Lecture Notes in Computer Science, Vol. 2649. Springer-Verlag, Berlin Heidelberg New York (2003) 1-15
The Research of Mobile Agent Security
Xiaobin Li, Aijuan Zhang, Jinfei Sun, and Zhaolin Yin
Department of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221008
[email protected]
Abstract. Mobile agent security plays an important role in the application of mobile agents. This paper analyzes the security threats to mobile agents. Given the character of mobile agents, threats that alter an agent's intention are the most dangerous, so protecting agents from modification by malicious outside users has become a very important issue. To address this problem, several security strategies for protecting mobile agents are presented. Among these methods, the obfuscation technique may be an efficient way to prevent malicious users from analyzing an agent's code and altering its intention.
1 Introduction
Mobile agent technology offers great advantages for many application areas [1], but security problems have prevented it from being widely used. This security issue has become a research focus of academia and industry. Many well-known agent systems, such as IBM's Aglets [2], General Magic's Telescript, IKV++'s Grasshopper, and Nanjing University's Mogent, pay much attention to agent system security. The Grasshopper platform authenticates all roles in the system by a public-key encryption mechanism, and communication between agents may be set up over SSL. All of the methods above protect an agent system from attacks by malicious agents, but a series of problems remains. This paper first analyzes the security threats and attacks during the agent life cycle. Traditional security methods still play a certain role in agent protection, but mobile agents have their own particularities, and targeted protection steps should be proposed. Given the character of mobile agents, an obfuscation technique is proposed for agent code security when agents are transferred from one platform to another or run on a remote host.1
1 The research has been partially supported by the Fund of the Key Laboratory for Novel Software Technology at Nanjing University.
2 Security Threats in Mobile Agent Systems
Distributed computers are connected by mobile agent platforms to construct a computation infrastructure, and many distributed applications may run simultaneously on these platforms. The computers may belong to different organizations and serve different purposes, so all kinds of security threats and attacks are possible in this environment. According to the life cycle of a mobile agent, its running process and environment can be divided into two phases, a transferring phase and a running phase, and each phase carries its own hidden security troubles. Security threats in the transferring phase: an agent must be transferred to a destination platform to run once it is created, and during this process it may be attacked by malicious users or programs. Security threats in the running phase: a mobile agent is decoded and restored to a complete entity when it arrives at a destination host, and in this course it may encounter all kinds of attacks from malicious users and platforms, such as masquerading, denial of service, eavesdropping, and alteration [3].
3 Settlement Strategies for Mobile Agent Security
In essence, an agent is a program that acts on behalf of a user, so traditional security methods such as data encryption and digital signatures can also be applied to agent security, for example to protect an agent from message leakage or to verify the identities of sender and receiver. However, an agent's particularity comes from its mobility and its execution on remote hosts, and these special security problems must be solved as well. Code obfuscation applies hiding transformations to a program so that its execution is unchanged from the virtual machine's point of view, while illegal users who wish to understand the code are puzzled. In detail, suppose a series of obfuscation transformation functions T = {T[1], T[2], ..., T[n]} is given. A program P has many elements, such as objects, classes, methods and variable declarations, which may be expressed as {S[1], S[2], ..., S[k]}. Applying R[j] = T[i](S[j]) transforms the original program P into a new program Q = {R[1], R[2], ..., R[k]}. This transformation should obey the following conditions. Q has the same function as P; in other words, the obfuscated code retains the original semantics. Q should be more obfuscated than P; it must be much more difficult to understand the programmer's intentions by analyzing the syntax of Q than by analyzing P directly. Execution efficiency should be preserved; in other words, the obfuscation should add little time and space overhead. At present there are many obfuscation tools, most of which transform programs by lexical transformation. Typically, all the names of classes, methods,
packages and objects are replaced by other identifiers. Some tools already support renaming all the programs in a project with certain algorithms. Although this process may puzzle an attacker, a more careful observer or tester could still guess the true purpose of the program. To improve the obfuscation technique, control transformation and data transformation were proposed by Christian Collberg and Clark Thomborson [4].
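To illustrate this lexical transformation concretely, the sketch below shows a small Java class before and after renaming; the class and all identifiers are invented for illustration and are not taken from any particular tool.

class AccountManager {              // before renaming: intent is visible
    double computeInterest(double balance, double rate) {
        return balance * rate;
    }
}

// The same class after lexical obfuscation: behavior is identical,
// but the identifiers no longer reveal any intent.
class a {
    double a(double a, double b) {
        return a * b;
    }
}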
Fig. 1. Control transformation by predicate insertion
Fig. 2. A data transformation example
3.1 Control Transformation
Control transformation is essentially a method that hides the meaning of predicates: once a predicate p is transformed, its outcome is hard to guess. For example, if a predicate p always evaluates to false (true), another predicate P1 (P2) can be defined for it; if p has an uncertain value, a predicate P3 can be defined. Once a hidden predicate is given, the original program can be thrown into confusion by control transformation. As shown in Figure 1(a), supposing there is a program block [A;B], a predicate P2 is inserted between A and B.
From the attacker's point of view, block B appears to run only when P2 evaluates to true. In Figure 1(b), block B is split into two different versions, B and B1 (B1 may be a lexical transformation of B), and a predicate P3 is inserted after block A. When P3 is true, B is executed; when P3 is false, B1 is executed. This transformation leaves attackers puzzled by the program.
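Below is a minimal Java sketch of both insertions; the predicates and block bodies are assumptions made for illustration (P2 is opaquely true because x*x and x always share parity, and P3's outcome depends on a runtime value), not code from the paper.

class OpaquePredicateDemo {
    static int x = (int) (System.nanoTime() % 97); // runtime value, 0..96

    // P2: always true (x*x and x have the same parity), but not obviously so.
    static boolean p2() { return (x * x) % 2 == x % 2; }

    // P3: genuinely uncertain at analysis time.
    static boolean p3() { return x % 3 == 0; }

    static void a()  { System.out.println("block A"); }
    static void b()  { System.out.println("result: " + (x + 1)); }                 // B
    static void b1() { int t = x; t = t + 1; System.out.println("result: " + t); } // B1, a clone of B

    public static void main(String[] args) {
        // Obfuscated form of the block [A; B] from Fig. 1(a):
        a();
        if (p2()) b();              // B looks conditional but always runs
        // Obfuscated form of [A; B] from Fig. 1(b):
        a();
        if (p3()) b(); else b1();   // either branch computes the same thing
    }
}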
3.2 Data Transformation
Not only the code but also the data of a program can be obfuscated. The basic idea of data transformation is to split one variable into two or more variables and transform the program accordingly, so that the semantics are preserved while readability is destroyed. For instance, a boolean variable V may be split into two integers, p and q; the substitution is shown in Figure 2. Expressions 2 and 3 assign different values to p and q, but both encode the boolean value false. Expression 4 has a more complex form, which makes the code even harder to read.
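Since the concrete encoding of Fig. 2 is not preserved in the text, the sketch below assumes one common encoding, V = ((p + q) mod 2 == 1), to show how distinct (p, q) pairs all represent false:

class BooleanSplitDemo {
    static int p, q;                       // together they encode one boolean V

    // Decoding function: V is true iff p + q is odd.
    static boolean value() { return (p + q) % 2 == 1; }

    public static void main(String[] args) {
        p = 0; q = 0;                      // one encoding of false
        System.out.println(value());
        p = 1; q = 1;                      // different values, still false
        System.out.println(value());
        int r = 5;
        p = r * r; q = r;                  // a more complex form: r*r + r is always even
        System.out.println(value());       // false again
    }
}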
4 Summary
Mobile agents bring the idea of remote computation and expand the notion of distributed computing. Compared with the traditional client/server structure, this technique has many advantages, but security problems leave a big gap between theory and practice. This paper analyzed the security threats across the agent life cycle and introduced a comprehensive security scheme for mobile agents. Of course, mobile agents will be confronted with new security problems as they are applied in more areas; this is the direction to which future work should be devoted.
References
[1] Yin Zhaolin: The Developing of Electronic Business Based on FIPA Agent. IceCe 2001
[2] Joseph Tardo: Mobile Agent Security and Telescript. http://citeseer.nj.nec.com/tardo96mobile.html
[3] Wayne Jansen, Tom Karygiannis: Mobile Agent Security. National Institute of Standards and Technology, Special Publication 800-19, August 1999
[4] Christian S. Collberg: Watermarking, Tamper-Proofing and Obfuscation - Tools for Software Protection. IEEE Transactions on Software Engineering, Vol. 28, No. 8, August 2002
Research of Information Resources Integration and Shared in Digital Basin*
Xiaofeng Zhou, Zhijian Wang, and Ping Ai
School of Computer & Information Engineering, Hohai University, Nanjing 210098
z_xiaofeng@sina.com
Abstract. The digital basin is a complicated heterogeneous distributed system containing many information resources, including data resources, technology resources, system resources, application resources, etc. The primary problem to be solved in the digital basin is how to integrate these resources, turn them into an organic whole, and provide maximum sharing. Constructing the information resources integration framework with OGSA not only solves the problem of integrating heterogeneous distributed information resources but also ensures simple and efficient sharing of all kinds of information resources. Application system development based on OGSA is also highly efficient.
1 The Definition and Meaning of the Digital Basin
The concept of the Digital Basin derives from the Digital Earth: it is the aggregation of information about a basin on the Digital Earth. More precisely, the Digital Basin is an organic whole that digitizes all the information of the basin and its interrelations and structures it in the form of spatial information. It thereby reflects the integrated, true situation of the whole basin from every side and fills various needs for transferring information. The Digital Basin is a system platform that fuses all kinds of digital information within the basin on the base of basin spatial information; it is a uniform, digitized reappearance of the true basin and the phenomena related to it. The Digital Basin consists of databases containing various information and sub-systems for data gathering, processing, exchanging and managing; it can compare and analyze data from different periods according to different needs and reveal the rules of their variation. The primary purpose of the Digital Basin is to solve existing problems such as the difficulty of data sharing, the low efficiency of software programming, the low degree of software standardization, the difficulty of ensuring software quality and the difficulty of system integration.
* This work was supported by the 863 and 973 Programs of China under grants No. 2001AA113170 and No. 2002CB312002, and by the Science & Technology Innovation Foundation of Hohai University under grant No. 2002407243.
2 The Important Problem to Be Solved in the Digital Basin
The important problem to be solved in the digital basin is the integration and sharing of all kinds of information resources: data resources, technology resources, system resources, application resources, etc. A mass of information resources is distributed across different positions, and the resources differ from each other. All of them will be used by many users, who have different purposes and are themselves distributed across different positions. Integrating and sharing these information resources is therefore very difficult, and it is a key problem of the digital basin.
3 The Method of Information Resources Integration and Sharing in the Digital Basin
There are many technologies for integrating and sharing information resources. Among them, OGSA is the newest and most effective: it can fully satisfy the needs of the digital basin and solve the problems of multiple sources, heterogeneity, sharing, individuation, etc.
3.1 The OGSA
The Open Grid Services Architecture (OGSA) is an evolution towards a Grid system architecture based on the Web services concepts and technologies offered by Globus. In OGSA, the Grid is defined as the aggregate of Grid services, and a Grid service is a kind of Web service. Building on concepts and technologies from the Grid and Web services communities, this architecture defines a uniform exposed service semantics (the Grid service); defines standard mechanisms for creating, naming, and discovering transient Grid service instances; provides location transparency and multiple protocol bindings for service instances; and supports integration with underlying native platform facilities. OGSA also defines, in terms of Web Services Description Language (WSDL) interfaces and associated conventions, the mechanisms required for creating and composing sophisticated distributed systems, including lifetime management, change management, and notification. Service bindings can support reliable invocation, authentication, authorization, and delegation, if required. OGSA accords with the standard framework of Web services but extends the Web service concept with the Grid service to support transient services, because the Grid contains many transient services while Web services only solve the problem of discovering and invoking persistent services. ServiceType is an extension element of WSDL defined by OGSA and is used to describe Grid services. For its excellent interoperability and integrity, OGSA is especially suitable for constructing the information resource integration framework of the Digital Basin, which is distributed, heterogeneous and complicated.
3.2 The Integration Framework of the Digital Basin Based on OGSA
Fig. 1. The integration framework
In the framework, the storage facility stores the various information needed by the Digital Basin, including all kinds of databases, the knowledge database, etc. The framework manager is responsible for coordinating the parts of the framework, mainly by providing services such as directory service, information publishing, load balancing, configuration, certificate authority, message transfer, performance inspection, accounting and priority, etc. The data service provides the integration of heterogeneous distributed data resources and data sharing as a whole; it primarily includes data services, data exchange services, spatial data services, etc. The main body of this framework is the application support service, which implements the functional logic of the digital basin. These services can run on different platforms and be implemented in different languages; to do so, the modules must first be described by extended WSDL and then registered and deployed in the registration center using UDDI. The application layer is the aggregate of the multifarious applications integrated from the data services and the application support services and used for basin administration.
3.3 The Application Development Based on the Integration Framework
Application system development based on the integration framework is very easy. The method divides the development process into four steps: decomposing, realizing, registering and integrating. The detailed flow is shown in Fig. 2. When developing an operational application system, you should first decompose the application
Fig. 2. Application development flow
system into a series of minimal independent function modules based on the actual needs of the operational application system and the criteria of software design. Each function module must be self-contained: it realizes a material function of the operational application system and has no direct relation with other function modules. The second step is realizing these function modules, primarily with component technology; of course, they can also be realized as middleware, objects, subprograms, etc. For simplicity, all these technologies are jointly called components here. The components must be able to run independently and correctly, but their running platforms and implementation technologies may differ from each other. The third step is registering these components in the registry: first describe them using the extended WSDL protocol, then register them using the UDDI protocol, after which they become usable Grid services. In the last step, you integrate a series of Grid services into the application system according to the user's actual needs, using the WFDL protocol etc. based on the actual operation flow (in logic). Of course, only the last step is needed when the Grid services already exist, having been developed for earlier operational application systems. This is the key to how OGSA realizes information resource integration and sharing.
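As a small illustration of the second step, the Java sketch below shows what one self-contained function module might look like; the service name, method and placeholder logic are assumptions for illustration, since the paper defines no concrete modules.

interface FloodForecastService {
    // Forecast the water level (m) at a station for the given lead time in hours.
    double forecastWaterLevel(String stationId, int leadTimeHours);
}

// One possible realization; per the paper it could equally be middleware,
// an object or a subprogram.
class SimpleFloodForecastService implements FloodForecastService {
    public double forecastWaterLevel(String stationId, int leadTimeHours) {
        // Placeholder logic: a real component would query the basin databases
        // behind the framework's data services.
        return 42.0 + 0.1 * leadTimeHours;
    }
}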
4 The Application Example
We are now developing the Digital Yellow River application service platform system for the Yellow River Conservancy Commission based on this framework. It is the core of the Digital Yellow River project, the important infrastructure supporting application development and running, and the platform for information resource sharing. The system adopts the J2EE framework, and we use BEA WebLogic to realize the function components. Users can easily share all kinds of information resources in this system through IE, without needing to know how these resources are realized or where they are located.
5 Conclusion
Although Grid services technology is not yet mature, its core design has basically been finalized, and many products from well-known companies now support it. Using Grid services to construct the framework of the digital basin has proved highly effective in theory and in practice.
Scheduling Model in Global Real-Time High Performance Computing with Network Calculus
Yafei Hou, Shiyong Zhang, and Yiping Zhong
Department of Computing and Information Technology, Fudan University, 220 Handan Road, Shanghai 200433, P.R. China
Abstract. We first derive an abstract model of the global scheduling pattern from the global scheduling pattern itself. Under the framework of this abstract pattern, we model each scheduler as a rate-latency server with respect to the QoS parameters; although this may not be the optimal scheduling algorithm for any particular scheduler, it suits the global scale. With some definitions and theorems of network calculus, end-to-end delay bounds are then easy to obtain in uncertain surroundings.
1 Introduction
Grid computing is an emerging paradigm for next-generation distributed computing. In such an environment, it is necessary to consider the quality of service (QoS) requirements of different clients to ensure that the resources are used in the most beneficial manner. There are many ways of doing so. The Resource Broker (RB) proposed in [1] integrates with the ARS presented in [2], but the occurrence of re-negotiation adds considerable overhead to the system. [2] implements an Advance Reservations Server (ARS) that works in conjunction with the Dynamic Soft Real Time (DSRT) system [3] to reserve CPU resources in advance; however, in practice most applications have negotiable QoS requirements, so this leads to a higher number of rejected reservation requests. [4] proposes and evaluates several algorithms for supporting advance reservations in supercomputer scheduling systems, but it allocates the "time slots" exclusively, the applications are assumed to operate on a "best effort" basis, and the reservation requests are assumed to have a different priority than the applications. None of the methods presented so far can deal with global real-time high performance computing, and at the same time there are many different scheduling algorithms: some of them can improve a single scheduler's performance, but on the global scale they are perhaps not the best choices. In this paper, we first derive an abstract model of the global scheduling pattern from the global scheduling pattern itself. Under the framework of the abstract pattern, we model each scheduler as a rate-latency server with respect to the QoS parameters; end-to-end delay bounds can then easily be obtained in uncertain surroundings using some definitions and theorems of network calculus. Section 2 introduces the scheduling model in global real-time high performance computing, Section 3 presents the scheduling models for the global scheduling pattern using network calculus, and Section 4 concludes the paper.
2 Scheduling Model in Real-Time High Performance Computing
The scheduling model described in [5] suggests a pattern for global real-time scheduling, whose execution model is depicted in Fig. 1. From the process in Fig. 1, we can condense the global scheduling pattern into the abstract model of the global scheduling pattern depicted in Fig. 2. The client proxy can be viewed as a scheduler with a link: many clients send their QoS requirements to it, and it serves these requirements at rate R1. The rate of the scheduler itself is R2, and the server proxy can likewise be seen as a scheduler with rate R3. We can therefore use this abstract model to study the QoS scheduling performance of real-time high performance computing.
Fig. 1. Global Scheduling Pattern
3 Scheduling Models for the Global Pattern Using Network Calculus
The basic tasks of a scheduler can be found in [5]. The QoS parameters specified by a server can include execution time, accuracy, security, etc.; of all parameters, the end-to-end delay is the most important. From the abstract model of the global scheduling pattern in Fig. 2, we can see that packets traverse many scheduler models, numbered 1 to 16 in Fig. 1, and the main uncertainty in the end-to-end delay is produced by these many schedulers. We therefore adopt scheduling models which can guarantee end-to-end delays. They come from network calculus [6]; here we use its definitions and some theorems.
Definition 1. Consider a node that serves a flow, with packets numbered in order of arrival. Let a(n) and d(n) be the arrival and departure times of packet n, and l(n) its length. We say that the node is a guaranteed rate (GR) node for this flow, with rate R and delay e, if it guarantees that d(n) <= f(n) + e, where f(n) is defined by Equation (1):
f(0) = 0, f(n) = max{a(n), f(n-1)} + l(n)/R (1)
The variables f(n) ("Guaranteed Rate Clocks") can be interpreted as the departure times from a FIFO constant-rate server with rate R; the parameter e expresses how much the node deviates from it. Note however that a GR node need not be FIFO. A GR node is also called a "rate-latency server", and its service curve is β(t) = R · max{t - e, 0}.
Fig. 2. Abstract Model of the Global Scheduling Pattern
Many scheduling algorithms accord with the GR node model, and rate-latency nodes have many important properties, one of which is the latency listed in Table 1. When a server adopts one of the scheduling algorithms in Table 1 [7], its delay bound can be taken as the corresponding latency. The meanings of the parameters in Table 1 can be found in [7].
Another important property of rate-latency servers is that they support the concatenation of nodes; the result comes from the following theorem [6]:
Theorem 1. Assume a flow traverses systems S1 and S2 in sequence, and that S1 offers a service curve β1 and S2 a service curve β2 to the flow. Then the concatenation of the two systems offers the service curve β1 ⊗ β2 (their min-plus convolution) to the flow.
It follows that the concatenation of m GR nodes (that are FIFO per flow) with rates R1, ..., Rm and latencies e1, ..., em is GR with rate R = min{R1, ..., Rm} and latency e = e1 + ... + em + sum_{i=1..m-1} l_max/Ri, where l_max is the maximum packet size for the flow; the l_max/Ri terms are due to packetizers [8]. From this, a bound on the end-to-end delay through such a concatenation can be obtained: using Theorem 1, delay bounds are easy to get for a concatenation of nodes that adopt the GR model. In this way, the end-to-end delay bound can be estimated at the client, which also decreases the protocol complexity of the global scheduling pattern while guaranteeing QoS.
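As a worked illustration, the Java sketch below computes such a bound for a chain of GR schedulers; the numbers are invented, and the final b/R term assumes the standard network-calculus delay bound for a flow constrained by a leaky-bucket arrival curve b + r·t with r <= R, which goes beyond what the text states.

class GrDelayBound {
    // Rates R[i] (bits/s) and latencies e[i] (s) of the schedulers in sequence,
    // lMax the maximum packet size (bits), b the burst of the arrival curve.
    static double delayBound(double[] R, double[] e, double lMax, double b) {
        double minR = Double.MAX_VALUE, latency = 0.0;
        for (int i = 0; i < R.length; i++) {
            minR = Math.min(minR, R[i]);
            latency += e[i];
            if (i < R.length - 1) latency += lMax / R[i]; // packetizer terms
        }
        return latency + b / minR; // bound for the equivalent rate-latency server
    }

    public static void main(String[] args) {
        double[] R = {1e6, 2e6, 1.5e6};     // three GR schedulers
        double[] e = {0.002, 0.001, 0.003};
        System.out.println(delayBound(R, e, 12000, 50000) + " s");
    }
}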
4 Concluding Remarks
In this paper, we first derived an abstract model of the global scheduling pattern from the global scheduling pattern itself (Section 2). Under the framework of the abstract pattern, we modeled each scheduler as a rate-latency server with respect to the QoS parameters (Section 3); although this may not be the optimal scheduling algorithm for any particular scheduler, it suits the global scale, and end-to-end delay bounds are easy to obtain in uncertain surroundings using some definitions and theorems of network calculus.
References
1. K. Kim and K. Nahrstedt, "A Resource Broker Model with Integrated Reservation Scheme," IEEE International Conference on Multimedia and Expo 2000 (ICME '00), Aug. 2000.
2. G. Garimella, "Advance CPU Reservations with the Dynamic Soft Real-Time Scheduler," Master's Thesis, University of Illinois at Urbana-Champaign, 1999.
3. H. Chu and K. Nahrstedt, "A Soft Real Time Scheduling Server in UNIX Operating System," European Workshop on Interactive Distributed Multimedia Systems and Telecommunication Services (IDMS '97), Sep. 1997.
4. W. Smith, I. Foster, and V. Taylor, "Scheduling with Advanced Reservations," International Parallel and Distributed Processing Symposium (IPDPS '00), May 2000.
5. Victor Fay Wolfe, Lisa DiPippo, Russ Johnston, Trudy Morgan, et al., "Patterns in Global Dynamic Middleware Scheduling and Binding," http://www.cs.wustl.edu/~mk1/RealTimePatterns/OOPSLA2001/submissions/VicFayWolfe.pdf
6. Le Boudec, J.-Y., Thiran, P.: Network Calculus. Springer-Verlag, Berlin (2002).
7. D. Stiliadis and A. Varma, "Latency-rate servers: A general model for analysis of traffic scheduling algorithms," IEEE/ACM Transactions on Networking, vol. 6, pp. 611-625, Oct. 1998.
8. Le Boudec, "Some properties of variable length packet shapers," Proceedings of ACM SIGMETRICS 2001, Boston, June 2001.
CPU Schedule in Programmable Routers: Virtual Service Queuing with Feedback Algorithm
Tieying Zhu
Jilin University, China; Northeast Normal University, China
[email protected]
Abstract. Programmable routers extend the traditional store-and-forward paradigm to a store-process-and-forward paradigm, so how to schedule the CPU among competing per-flow processing becomes a key problem. This paper generalizes the processing model of programmable routers and presents a CPU scheduling algorithm called Virtual Service Queuing with Feedback (VSQF). It queues packets by each flow's amount of virtual service, based on an estimate of a packet's execution time made before processing, and updates the amount of virtual service with the actual execution time via feedback after processing. Simulation shows VSQF has good fairness.
1 Introduction
The basic function of a router is store-and-forward, so its resource scheduling traditionally concerns only bandwidth allocation. Programmable routers, whether single PC-based routers [1, 2, 3] as shown in Figure 1 [4], or Processing Engine (PE) based routers [5, 6] as shown in Figure 2 [7], which are controlled by a central control processor and linked by a switch fabric, all introduce processing complexity into the data path. In such routers, not only the bandwidth but also the processing resources must be shared among the competing per-flow queues. In this paper, Section 2 discusses related work, Section 3 describes the CPU scheduling algorithm VSQF and shows its fairness through simulation results, and conclusions are drawn in Section 4.
2 Related Works
Most research on programmable routers focuses on router structure, such as [1, 2, 3, 5, 6]. For resource allocation and scheduling, Packet Fair Queuing (PFQ) has been widely studied in the context of bandwidth scheduling, for example WFQ [8] [9] and SFQ [10]. [11] puts forward a non-time-tag algorithm using the concept of the amount of virtual service and proves that it has less complexity but equivalent performance. WFQ and the algorithm in [11] cannot be used to schedule the CPU directly, for they need precise knowledge of the execution time to update the virtual time or the amount of virtual service. EFQ [7] queues packets with an estimated finish-time tag based on the packet length, given the strong correlation between packet size and execution time. [12] uses the PS algorithm to allocate the CPU among different flows.
Fig. 1. The model of software-based routers includes the classifier, several forwarders and the scheduler.
Fig. 2. The model of PE-based routers includes the processor scheduler, several processing engines and the link scheduler.
3 Virtual Service Queuing with Feedback Algorithm
Our work aims to provide a simpler way than time-tag based algorithms by applying the concept of the amount of virtual service [11], originally targeted at bandwidth scheduling, to the scheduling of the CPU resource. VSQF accumulates the amount of virtual service received by each flow according to its reserved CPU rate over a period of time, selects the head packet of the per-flow queue that would have the smallest amount of virtual service according to the estimated processing time, and updates the amount with the actual execution time after processing. The estimated processing time is based on the actual execution time of the last head packet, given that processing time does not vary significantly between packets of a classified per-flow queue.
The detailed description is as follows:
1. Flow i is selected only when condition (1) is met: among all flows in S, flow i would have the smallest amount of virtual service after the estimated processing time of its head packet is added. Here S is the set of flows in the system, and the amount of virtual service of each flow is initially 0.
2. After flow i is processed, the amount of virtual service of each flow is updated with the actual processing time.
3. If two or more flows meet (1), the first such flow is selected.
4. If flow i meets (1) but has no packet to process, its amount of virtual service is increased by the maximum of the current actual processing times.
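A hedged Java sketch of this loop is given below; because the exact form of condition (1) did not survive in the text, the sketch assumes virtual service is charged as processing time divided by the flow's reserved CPU rate, so names and formulas are illustrative rather than the authors' code.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class Flow {
    final double reservedRate;                       // reserved CPU share, e.g. 0.5
    double virtualService = 0.0;                     // accumulated virtual service
    double estimate = 1e-3;                          // estimate = last actual time (s)
    final Deque<Runnable> packets = new ArrayDeque<>();

    Flow(double reservedRate) { this.reservedRate = reservedRate; }
}

class VsqfScheduler {
    private final List<Flow> flows;

    VsqfScheduler(List<Flow> flows) { this.flows = flows; }

    void scheduleOnce() {
        // Step 1 (and step 3: ties go to the first flow): pick the flow with the
        // smallest projected virtual service after its estimated processing time.
        Flow selected = null;
        double best = Double.MAX_VALUE;
        for (Flow f : flows) {
            double projected = f.virtualService + f.estimate / f.reservedRate;
            if (projected < best) { best = projected; selected = f; }
        }
        if (selected == null) return;

        if (selected.packets.isEmpty()) {
            // Step 4: charge an idle winner the maximum current processing time.
            double max = 0;
            for (Flow f : flows) max = Math.max(max, f.estimate);
            selected.virtualService += max / selected.reservedRate;
            return;
        }

        long start = System.nanoTime();
        selected.packets.poll().run();               // process the head packet
        double actual = (System.nanoTime() - start) / 1e9;

        // Step 2 (feedback): charge the actual execution time and reuse it as
        // the estimate for the flow's next head packet.
        selected.virtualService += actual / selected.reservedRate;
        selected.estimate = actual;
    }
}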
In the simulation, six flows are processed by the same application, with flow 1 reserving 50% of the processing resource and each of the other flows reserving 10%. Each packet is created at a random interval. Figure 3 shows the response time of packets in the six flows over a period of time: all the flows receive almost exactly their relative CPU reservation rates, so the algorithm is fair with respect to the relative reserved rates.
Fig. 3. The packet response time of the six queues at different scheduling times increases steadily with the packet number. All the flows receive almost exactly their relative CPU reservation rates.
4 Conclusion
This paper generalizes the processing model of programmable routers and presents Virtual Service Queuing with Feedback (VSQF). It accumulates the amount of virtual service received by each flow according to its reserved CPU rate over a period of time
and queues packets by the amount of virtual service based on the estimated execution time of each packet. The algorithm is fair. Further experiments should be done to handle bursty packets.
References
1. Eddie Kohler, Robert Morris, Benjie Chen, John Jannotti, and M. Frans Kaashoek. The Click Modular Router. ACM Transactions on Computer Systems, 18(3), pp. 263-297, Aug. 2000.
2. L. Peterson, S. Karlin, and K. Li. OS Support for General-Purpose Routers. Proceedings of the 7th Workshop on Hot Topics in Operating Systems, Mar. 1999.
3. D. Decasper, Z. Dittia, G. Parulkar, and B. Plattner. Router Plugins: A Software Architecture for Next Generation Routers. IEEE/ACM Transactions on Networking, 8(1), pp. 2-15, Feb. 2000.
4. Yitzchak Gottlieb and Larry Peterson. A Comparative Study of Extensible Routers. Proceedings of OpenArch '02, Jun. 2002.
5. Scott C. Karlin. Embedded Computational Elements in Extensible Routers. PhD thesis, Department of Computer Science, Princeton University, Jan. 2003.
6. David Taylor, Jyoti Parwatikar, Ed Spitznagel, Jon Turner, Ken Wong. Design of a High Performance Dynamically Extensible Router. DARPA Active Networks Conference and Exposition, May 2002.
7. Pappu, P., Wolf, T. Scheduling Processing Resources in Programmable Routers. Proceedings of IEEE INFOCOM 2002, Jun. 2002.
8. A. Demers, S. Keshav, S. Shenker. Analysis and simulation of a fair queueing algorithm. Internetworking: Research and Experience, Vol. 1, pp. 3-26, 1990.
9. Jon Bennett, Hui Zhang. WF2Q: Worst-case Fair Weighted Fair Queueing. IEEE INFOCOM '96, Jun. 1996.
10. Pawan Goyal, Harrick M. Vin and Haichen Cheng. Start-time Fair Queuing: A Scheduling Algorithm for Integrated Services Packet Switching Networks. Proceedings of ACM SIGCOMM, Aug. 1996, pp. 157-168.
11. Feng Suili, Ye Wu, Sankar Ravi, Ke Feng. A Novel Packet Scheduling Algorithm without Timestamp. Journal of China Institute of Communications, Vol. 7, 2002, pp. 27-32.
12. Xiaohu Qie, Andy Bavier, Larry Peterson, and Scott Karlin. Scheduling Computations on a Software-Based Router. Proceedings of SIGMETRICS '01, Jun. 2001.
Research on Information Platform of Virtual Enterprise Based on Web Services Technology
Chao Young and Jiajin Le
Department of Computer, Donghua University, 200051, Shanghai, China
{davidyoung, lejiajin}@mail.dhu.edu.cn
Abstract. This paper points out that, at the present technology level, the composition of Web Services between enterprises is inefficient and costly. A solution is proposed for composing the Web Services of a virtual enterprise; the key of the solution is to unify the data schema and services pattern within the related industry domain.
1 Introduction
The virtual enterprise (VE) is considered a new operating model for the new century, one suited to an inconstant market. Under this model, enterprises do not design, produce and sell products entirely by themselves, but look for the best partners all over the world to set up alliances, producing goods that make full use of their own advantages at the least cost and the fastest speed. Such alliances are dynamic: they come into being at the beginning of a project and disintegrate at its end. Due to the character of business processes in a VE, the information system platform of a VE has its own peculiarities: a) distributed, b) open, c) loosely coupled. Web Services comprise a set of key technologies and standards suitable for B2B e-commerce, and they are the ideal candidate for integrating enterprise applications and setting up an open, loosely coupled information platform for a VE.
2 Inefficient and Costly Web Services Composition
It should be pointed out that several issues still need to be addressed before the full potential platform of a VE can be realized over Web Services architectures. Web Services are interesting and differ from other distributed computing technologies because they are based on SOAP messages, which are encoded in XML and transported over HTTP (among other common protocols). SOAP is independent of the application's program logic and syntax; in other words, SOAP defines only the architecture of the information structure without including the information content, while the data types and structure are described in the WSDL document. Firstly, if all enterprises develop and publish their Web Services only according to their own business logic, they will end up with programming interfaces different from those of other enterprises even if they conduct the same business
(play the same role in the supply chain). Thus, after finding an appropriate service (denoted SA) in a UDDI registry (either a public registry or a private one hosted by a trusted third party), the customer of the Web Service must study and analyze the WSDL description issued by its publisher, understand the program interface and data structure, and then program an appropriate module to invoke the service and handle the response. When choosing another Web Service (denoted SB) from another partner, programmers have to develop a new module, because the two Web Services (SA and SB) differ in data structure and interface pattern even though they realize the same business logic. All of this hampers the adoption of Web Services and the flexibility of VE applications. Some new tools can generate invocation code from a WSDL document, but their ability is still limited because they cannot understand the interface and parameters exactly without human interaction. Secondly, many messages are exchanged between the partners of a VE. These messages, such as prices, orders, specifications, drafts and so on, are complex, with specific data structures that differ among enterprises. Partners therefore have to provide many Web Services to exchange these messages in XML format via SOAP. But as long as every enterprise uses a different data structure to describe its data, that is, adopts a different XSD (XML Schema Definition), for example for its orders, customers have to code specific modules according to each specific XSD to handle the business messages exactly. These differing data structures are thus another bottleneck for the composition of Web Services. If the composition of Web Services only involves the applications within one enterprise or only covers a few stable partners, the problems mentioned above may not look so crucial. But for the dynamic supply chain of a VE, loose coupling is necessary, because this kind of supply chain dissolves or merges frequently. If every change of the supply chain gives rise to the modification or update of code, the cost of change is too high, and the information platform of the VE is inefficient. To compose Web Services quickly and cost-effectively for effective B2B collaboration, a solution based on a standard common data schema and services pattern is proposed in Section 3.
3 Rapid Web Services Composition
Web Services composition is gaining considerable momentum as a paradigm for effective B2B collaboration. The main goal of our work is to study how to facilitate large-scale integration of the Web Services that realize the business process.
3.1 Standardizing Common Data Schema and Services Pattern by tModel within an Industry Domain
As mentioned in Section 2, heterogeneous business data schemas and Web Services interfaces make the composition of Web Services between partners
become time-consuming and costly; consequently, B2B collaboration turns inflexible and inappropriate for the dynamic environment of a VE. To compose Web Services quickly and cost-effectively for effective B2B collaboration, we find it important to set up a set of common standards or criteria for Web Services interfaces and business data schemas (perhaps drawn up by the industry association of the domain). Only if common interfaces are adopted widely in the supply chain of a VE can Web Services be invoked conveniently and efficiently; what's more, only if the data structure is standardized can business data be exchanged across enterprises seamlessly. It is impossible to get common interfaces and data structures across all industry domains, so standards should be built within a certain industry domain. All partners of a VE, especially neighboring partners in a supply chain, usually belong to a big or small industry domain. Therefore, unifying Web Services interfaces and data structures is feasible and is a fundamental job for the VE B2B e-commerce platform of a certain industry domain; this is an important idea we provide in our work. According to the UDDI specifications, tModel entities are references, actually URLs, that can be used to access information about a specification. Also called "fingerprints", this information is metadata about a specification, including its name, publishing organization, and URL pointers to the actual specifications themselves. So we can standardize Web Services interfaces and data structures, save them at a public web site, and then use tModels to reference the public technical standards. All enterprises that develop their Web Services according to them will be technologically compatible. Because compatible enterprises adopt common Web Services interfaces and business data structures, their applications can be integrated easily, dynamically and seamlessly. When a change happens in the supply chain, an enterprise does not have to update or change its programs to integrate with new trading partners. Thus the cost of integration is not as expensive as before, and the efficiency of data exchange is improved.
3.2 The Design of Common Data Schema and Services Pattern
Web Services and related functional modules can be classified and designed into three kinds of components. Each kind consists of many small components of different granularity, which ultimately form XML documents referenced by standard tModels; these tModel groups actually become standard technical specifications. 1) Web Services interface specifications (WSIS). These specifications are designed in accordance with the industry's characteristics and unify the interface specification composing the business processes and logic of VE partners. The functions of every participant in the VE supply chain are embodied in sub-modules. For instance, the supply chain of a textile VE may include textile, dyeing and garment enterprises, so the specification package of the textile domain may comprise textile specifications, dyeing specifications and garment specifications. All these specifications connect with each other to construct the VE supply chain. An enterprise implements only the specifications matching its business, and these become its Web Services interface.
2) Web Services calling specifications (WSCS). These specifications are designed in accordance with the Web Services interface specifications and trading regulations, and they are likewise decomposed into sub-modules for each enterprise of the supply chain. Through them an enterprise can call the Web Services of its partners. The Web Services calling specifications and the Web Services interface specifications together cover all the business processes of a certain industry domain; through these two component specifications, industry enterprises can in fact unify their Web Services interfaces. 3) Business data structure specifications (BDSS). These specifications comprise many data structure definitions and data composing and decomposing programs. The data structure definitions are XML schema documents, and the composing and decomposing programs provide common interfaces to connect with back-end data sources. The following is the procedure an enterprise takes to develop and publish its Web Services. The first step is to implement the specifications related to its own business: for example, a textile enterprise should implement the interface components in the light of the WSIS to establish its Web Services, implement dyeing and garment calling modules in the light of the WSCS for invoking the Web Services of dyeing or garment enterprises, and implement business data composing and decomposing components in the light of the BDSS for data exchange between enterprises. The second step is to integrate the Web Services with old applications and data sources via the standard interfaces of the components. The third step is to publish the enterprise's base information and its product or service information in the registry center.
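As a small illustration of what one WSIS sub-module could look like, the following Java sketch defines a common order interface for the textile role; all names and method signatures are invented assumptions, since the paper prescribes no concrete interfaces.

// Hypothetical common interface for the "textile" role in the domain's WSIS.
// Payloads are XML strings that must conform to the domain's shared XSD (BDSS).
public interface TextileOrderService {
    // Place an order; returns an order identifier.
    String placeOrder(String orderXml);

    // Query the status of a previously placed order.
    String queryOrderStatus(String orderId);
}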
4 Conclusion
In this paper, we analyzed how to realize B2B e-commerce in a VE based on Web Services technology, focusing mainly on how to realize rapid Web Services composition. We propose unifying Web Services interfaces and data structures within an industry domain, by which Web Services become compatible.
A Reliable Grid Messaging Service Based on JMS*
Ruonan Rao, Xu Cai, Ping Hao, and Jinyuan You
Department of Computer Science, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
{Rao-ruonan,caixu,you-jy}@cs.sjtu.edu.cn
Abstract. A reliable messaging service is a key service in a service-oriented distributed computing architecture. In the Open Grid Services Architecture (OGSA), asynchronous communication is needed between services, but the reference implementation of OGSA (Globus Toolkit 3.0, GT3) does not provide a reliable messaging service. In this paper, we integrate GT3 with a JMS server and provide a reliable grid messaging service based on JMS.
1 Introduction
Reliable messaging is an indispensable technology in distributed computing, and there are many message service specifications, such as CORBA's CosNotification and the Java Message Service (JMS). The Open Grid Services Architecture (OGSA) is a service-oriented architecture for distributed computing that integrates Grid and Web services technologies [1]; it aims to access all kinds of network resources through the Web Services access mechanism. Reliable message delivery is needed in a service-oriented architecture just as in traditional distributed computing platforms, so a reliable messaging service is a necessary basic service in the OGSA environment [2]. But there is no reliable messaging service in GT3, which is regarded as the reference implementation of OGSA. In this paper, we integrate a JMS server with GT3 and provide a reliable grid messaging service (RGMS). Using RGMS, Grid Services can communicate asynchronously and reliably.
2 Notification/Subscription in OGSI
The notification/subscription mechanism defined in OGSI includes the following elements: a notification source, which is the sender of notification messages; a notification sink, which is the receiver of notification messages; a subscription expression, which is an XML element that describes what messages should be sent from the
* This paper is supported by the Shanghai Science and Technology Development Foundation project (No. 03DZ15027).
notification source to the notification sink; a notification message, which is an XML element sent from a notification source to a notification sink; and a subscription request, which is used to establish what is to be delivered and where, containing a subscription expression and the locator of the notification sink to which notification messages are to be sent. A sink can subscribe to any message exposed by a source, and when service data changes, the notification source invokes the sink's callback method and delivers the new service data. As we can see, this notification/subscription mechanism cannot satisfy the requirements of asynchronous communication between Grid Services, for two reasons. First, all subscribable messages are service data of the notification source; if the target service data changes frequently, the notification sink's callback method is called frequently, and if one of these invocations fails, the message is lost, so the reliability of messaging is not guaranteed. Second, when a notification source needs to publish a message, the message must be declared as service data, which is unreasonable: the messages a notification source publishes are generated by some procedure, so they should not always have to be service data. We therefore introduce a Grid messaging service as a complement to the notification/subscription mechanism of OGSI to perfect messaging between Grid Services.
3 JMS
The Java Message Service (JMS) [4] defines the Java API of message-oriented middleware and supports both a publish/subscribe model and a point-to-point model; the destination is either a topic or a queue. There are many implementations of the JMS specification. HongSmartWeb [5], the J2EE application server we have developed, also implements the JMS specification, and the implementation of RGMS is based on HongSmartWeb.
4 Framework of RGMS
4.1 Service PortTypes
In the framework of RGMS, four service PortTypes are defined as follows:
4.1.1 TopicServiceFactory PortType
Description: the factory of the TopicService service. It extends the Factory PortType of OGSI; no additional service data or operations are defined.
4.1.2 TopicService PortType
Description: extends the GridService PortType of OGSI. A client can publish and subscribe through this service. The service locator of each sink is kept as service data. The
operations defined include publish, subscribe and unsubscribe. The publish operation is used by a notification source Grid Service to publish an XML message to the TopicService; its input parameters include the message body and an expiry time. The subscribe operation is used by a notification sink Grid Service to subscribe to the topic's messages; its input parameters include the MessageListener Grid Service locator and an expiry time. The unsubscribe operation is used by a notification sink Grid Service to cancel its subscription; its input parameter is just the MessageListener Grid Service locator.
4.1.3 MessageListenerFactory PortType
Description: the factory of the MessageListener service. It extends the Factory PortType of OGSI; no additional service data or operations are defined.
4.1.4 MessageListener PortType
Description: extends the GridService PortType of OGSI. A service implementing this PortType is the message listener of a TopicService; the notification sink uses the service instance locator as the parameter of the TopicService's subscribe operation. No additional service data are defined, and only one operation, onMessage, is defined. It is the callback operation of the MessageListener: when a new message is published to the TopicService, the onMessage operation of the associated sinks is invoked to deliver the new message.
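To make the two central PortTypes concrete, here is a hedged Java rendering of their operations; the method signatures and parameter types are assumptions for illustration, since OGSI PortTypes are actually defined in GWSDL rather than Java.

// Java view of the TopicService PortType (conceptually extends GridService).
interface TopicService {
    // Publish an XML message to the topic, with an expiry time in milliseconds.
    void publish(String xmlMessage, long expiredTime);

    // Subscribe a MessageListener service, identified by its locator.
    void subscribe(String messageListenerLocator, long expiredTime);

    // Cancel the subscription of the given MessageListener service.
    void unsubscribe(String messageListenerLocator);
}

// Java view of the MessageListener PortType (conceptually extends GridService).
interface MessageListener {
    // Callback invoked when a new message is published to the topic.
    void onMessage(String xmlMessage);
}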
4.2 The Mechanism of Communication
We now discuss the implementation of these PortTypes and the messaging mechanism based on JMS. A Grid service that implements the TopicService PortType is logically associated with a JMS topic: a JMS message topic is created by the class JMSProxy when a TopicServiceImpl instance is created, and the mapping from the TopicService instance to the JNDI name of the associated topic is registered in the JMSProxy.
Fig. 1. The framework of RGMS
4.2.1 Publish Mechanism
As demonstrated in Figure 1, the client publishes a message by invoking the publish operation of a TopicService instance. The service instance first converts the XML-based message into a JMS text message, then calls the publish method of JMSProxy, which sends the text message to the associated JMS topic. The JMS server behind the topic receives the new JMS message and decides where it should be sent.
4.2.2 Subscribe and Delivery Mechanism
To subscribe, as illustrated in Figure 1, the client first creates a MessageListener service instance and then uses it as the parameter to call the subscribe method of the TopicService. TopicServiceImpl first creates a messageListener instance to be registered with the JMS message topic, then uses it as a parameter to call the subscribe method of JMSProxy, which registers a subscriber on the corresponding JMS topic. When new data is published to the JMS topic, the onMessage method of the associated message listener is invoked; this method in turn invokes the onMessage operation of the MessageListener Grid Service and sends the XML message to that Grid Service.
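A minimal sketch of the publish path in Section 4.2.1, written against the standard JMS 1.1 API, is shown below; the JNDI names are hypothetical and the paper's JMSProxy bookkeeping is omitted.

import javax.jms.*;
import javax.naming.InitialContext;

class JmsPublishSketch {
    void publish(String xmlMessage) throws Exception {
        InitialContext ctx = new InitialContext();
        TopicConnectionFactory factory =
            (TopicConnectionFactory) ctx.lookup("TopicConnectionFactory");
        Topic topic = (Topic) ctx.lookup("rgms/SomeTopic"); // hypothetical JNDI name
        TopicConnection conn = factory.createTopicConnection();
        try {
            TopicSession session =
                conn.createTopicSession(false, Session.AUTO_ACKNOWLEDGE);
            TopicPublisher publisher = session.createPublisher(topic);
            // The XML message is carried as a JMS TextMessage, as in Section 4.2.1.
            TextMessage msg = session.createTextMessage(xmlMessage);
            publisher.publish(msg);
        } finally {
            conn.close();
        }
    }
}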
5 Conclusion
In this paper, the OGSA/OGSI communication mechanism has been discussed, and it has been shown that a reliable asynchronous communication mechanism between services in OGSA is needed, while GT3 does not provide a reliable messaging mechanism between services. To improve on the asynchronous communication mechanism of OGSI, RGMS has been proposed. RGMS is implemented on top of a JMS server and GT3, and it works efficiently as a complement to OGSI and GT3. Using RGMS, Grid Services can communicate reliably and asynchronously.
References
1. Foster, I., Kesselman, C., Nick, J., Tuecke, S.: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration (2002)
2. Fox, G.: Messaging Systems: Parallel Computing the Internet and the Grid. EuroPVM/MPI 2003 Invited Talk (2003)
3. Tuecke, S., Czajkowski, K., Foster, I., Frey, J., Graham, S., Kesselman, C.: Open Grid Services Infrastructure Version 1.0. Global Grid Forum (2003)
4. Mark, H., Rich, B., Rachul, S.: Java Message Service Specification Version 1.1. Sun Microsystems (2002)
5. Haopeng, C., Baowen, Z., Ruonan, R.: HongSmart White Papers. Shanghai Hongrui Information Science & Technology Co. Ltd. (2003)
A Feedback and Investigation Based Resources Discovery and Management Model on Computational Grid*
Peng Ji and Junzhou Luo
Department of Computer Science and Engineering, Southeast University, Nanjing 210096, China
[email protected],
[email protected]
Abstract. The resource discovery and management model is one of the key technologies in Computational Grid research. To properly integrate intelligent global adjustment with swift response to dynamic changes in resource information, this paper presents a Feedback and Investigation Based Resources Discovery and Management Model (F&IBRD&MM) with three specific channels, the Normal Channel, the Feedback Channel and the Investigation Channel, and two innovative strategies, the "Feedback" strategy and the "Investigation" strategy.
1 Introduction
"A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities." [1] A Computational Grid connects various computational resources on the network to construct a virtual high-performance computer that offers high-performance computing services. Accordingly, how to discover available resources quickly, and how to reflect the dynamic changes of these resources in time, are always the most important points of emphasis for researchers.
2 Current Research
Since the goal of the Computational Grid is to utilize all available free computational resources to overcome the difficulties brought by complicated tasks with enormous computing workloads, the first thing that must be done is to discover proper computational resources as quickly as possible, and then to manage them. The structure of a resource discovery and management model depends not only on the number of tasks and resources, but also on the type of domains in which the resources are located. From the
This work is supported by National 973 Fundamental Research Program of China (G1998030402) and National Natural Science Foundation of China (90204009)
viewpoint of resource organization, discovery and management mode, research on Grid resource management models is mainly classified into the Centralized Model, the Distributed Model, the Layered Model and the Multi-Agent Based Model [2]. Although a great deal of research has been done on Computational Grids, problems still exist; the most obvious is the lack of a proper resource discovery and management model that provides both intelligent global adjustment ability and the ability to reflect dynamic changes in resource information.
3 The Feedback and Investigation Based Resources Discovery and Management Model
3.1 Model Skeleton
To properly integrate intelligent global adjustment with swift response to dynamic changes in resource information, a Feedback and Investigation Based Resources Discovery and Management Model (F&IBRD&MM) is presented.
Fig. 1. Skeleton of the Feedback and Investigation Based Resources Discovery and Management Model
Compared with the Layered Model and the Multi-Agent Based Model, the F&IBRD&MM provides, in addition to the Normal Managerial Channel, two other channels: the Feedback Channel and the Investigation Channel. All three channels are provided to reflect dynamic changes of computational resources more swiftly and accurately. Meanwhile, the Root Managerial Node can pass its commands directly to a certain Virtual Organization, or even to a single computational resource, with the help of
the Investigation Channel. This will obviously improve overall computing efficiency, an ability that the Layered Model, with its level-by-level data transfer, and the Multi-Agent Based Model, with its numerous searching mobile agents, do not have.
3.2 Core Modules
Managerial Nodes (MN): Managerial Nodes are the core of the F&IBRD&MM, and they can be divided into two groups: the Root MN and the MNs within the Virtual Organizations. One and only one Root MN should be in place to carry on global management at all times. Of course, there may be some redundant MNs acting as candidates for the Root MN. These candidates should be disabled while the Root MN runs normally and should be transparent to all computational resources. Only when the Root MN crashes must a specific election strategy be applied immediately to elect a new Root MN from among these candidates. This election therefore requires that the information and data kept on the candidates be the same as those kept on the running Root MN. The creation of VOs depends on multicast group creation technologies, and so does their maintenance. Each VO has its own MN to discover and manage the computational resources under its control. The backup strategy adopted by the Root MN can also be carried out within a local VO. When a computational resource changes its state, the MN in its VO should first collect all the changed information as quickly as possible, and then pass it upward to its father MN, and even to the Root MN, via Normal Channels. On receiving it, the father MN or the Root MN should multicast the dynamically changed information to all other MNs under its control.
Feedback Stubs (FS) and Channels: The concept of the Feedback Stub is an innovation as well as a major support of the F&IBRD&MM. In addition to the Normal Channel, the model provides two other specific channels: the Feedback Channel and the Investigation Channel. Both specific channels are used at specific times.
Feedback Channel: allows computational resources to report their state information directly to Managerial Nodes located in the corresponding VOs or even in upper-level VOs. However, to limit the congestion brought by these feedback operations, all Feedback Channels should be used on a schedule.
Investigation Channel: allows father Managerial Nodes (including the Root MN) to pass their commands directly to certain VOs or computational resources whose local MNs may not know what is happening. This direct passing benefits not only information collection but also resource balancing and global adjustment, since the corresponding MNs receive update notifications. Taking into account the direction in which investigation operations run, the times at which Investigation Channels are used may conflict with each other. To solve this problem, MNs with lower priority (further from the Root MN on the Managerial Tree) should obey the orders
issued by MNs with higher priority (closer to the Root MN on the Managerial Tree). The Feedback Stub and the father MN are the two end points of a Feedback Channel. There are one or more FSs in each VO, allowing basic MNs, or even the computational resources themselves, to report information directly to upper-level MNs when their local MNs do not know what is happening.
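As an illustration of the priority rule just described, the following sketch (our own, not from the paper; all names are invented) models Managerial-Tree priority as distance from the Root MN and resolves conflicting Investigation Channel commands in favor of the MN closest to the root.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ManagerialNode:
    name: str
    parent: Optional["ManagerialNode"] = None

    def depth(self) -> int:
        # The Root MN has depth 0; a smaller depth means a higher priority.
        d, node = 0, self
        while node.parent is not None:
            d, node = d + 1, node.parent
        return d

def winning_command(cmds: List[Tuple[ManagerialNode, str]]) -> Tuple[ManagerialNode, str]:
    # Conflicting commands arriving over Investigation Channels at the same
    # time: keep the one issued by the MN closest to the Root MN.
    return min(cmds, key=lambda c: c[0].depth())

root = ManagerialNode("RootMN")
vo_mn = ManagerialNode("VO1-MN", parent=root)
mn, cmd = winning_command([(vo_mn, "collect-info"), (root, "rebalance")])
print(mn.name, cmd)  # RootMN rebalance
```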
4 Task Scheduling Supports
The F&IBRD&MM provides numerous supports to many key technologies of Grid computing, especially task scheduling, and these supports are transparent to the model's users. Through level-by-level data transfer and numerous searching mobile agents, the key information collection modes of the Layered Model and the Multi-Agent Based Model respectively, the F&IBRD&MM provides basic computational resource information for Grid scheduling. Unfortunately, level-by-level data transfer has the weakness that its propagation speed may be so slow that much valuable computing time is wasted, while in the mobile-agent searching mode information can be lost as agents move from one stub to another. Based on these factors, in addition to these two modes, the F&IBRD&MM also provides the "Feedback" and "Investigation" strategies built on the two specific channels. The "Feedback" strategy not only makes computational resources report their information more voluntarily, but also speeds up the whole information collection process. While a task is being computed, the scheduler may need to adjust its scheduling strategy locally for more efficient computing, and at such times it may intentionally pay more attention to certain computational resources' state information. Since level-by-level data transfer in the Layered Model cannot satisfy the required information propagation speed, and mobile-agent searching in the Multi-Agent Based Model may waste time on agent movement, the F&IBRD&MM needs a new strategy: the "Investigation" strategy. Schedulers and base VOs, or even computational resources, can communicate and transfer information or commands such as "Task Migration" directly via the Investigation Channels, raising the whole computing efficiency.
References
1. Foster, I.: What is the Grid? A Three Point Checklist. GRIDtoday, Vol. 1. Argonne National Laboratory and University of Chicago (2002)
2. Wang Yong, Xiao Nong, Wang Yijie, Lu Xicheng: An Extensible Hierarchical Resource Management Model for Metacomputing Systems. Journal of Computer Research and Development 1 (2003)
Moment Based Transfer Function Design for Volume Rendering Huawei Hou, Jizhou Sun, and Jiawan Zhang IBM New Technology Center, Tianjin University, Tianjin 300072, P.R. China,
[email protected]
Abstract. With the deepening of research in Computer Supported Collaborative Work (CSCW), the effects of CSCW in supporting cooperative work can be seen in many aspects. But in the design of an ERP system, which involves a broad range of specialities, an assembly of changing members, and frequently changing system goals, the concept of CSCW has still not attained enough recognition. As a result, the software in ERP systems does not yet support group work sufficiently. The development platform presented here focuses on supporting cooperation within a project team or between project teams, and it exploits technologies such as middleware, components, and intelligent agents, which guarantee short development cycles, agility, good adaptability, and good scalability in the development of ERP system applications.
1 Introduction
CSCW (Computer Supported Collaborative Work) is a distributed computing environment that utilizes techniques such as computers, networks and communication, multimedia, and human-computer interfaces [1], and its theory has been extended to many fields. Generally speaking, the development of ERP (Enterprise Resources Plan) systems is a process involving many participants, a large information throughput, and a strong demand for collaboration. At the same time, the following problems exist in ERP software companies at large:
1) The working manner of developers is relatively simple.
2) The orientation of their products is traditional and unreasonable: products from separate companies are applied, and importance is attached only to resource configuration within one company.
3) Developers lack a good, commonly used platform supporting afterdevelopment. In contrast, a middleware platform is a good choice, since it has good reusability and good scalability and can be afterdeveloped by users according to their own requirements.
To address the problems above, this paper puts forward a solution: an intelligent, middleware- and CSCW-framework-based platform supporting ERP development. The platform has the following characteristics:
1) It supports group work.
2) Its framework can be extended dynamically. The framework of this platform is a conjoint architecture of client/server and peer-to-peer, rather than being restricted to a traditional C/S or B/S architecture.
3) It adopts multiple agents to provide intelligence. Necessary interactions are executed among various intelligent agents, which gives the platform more learning ability and intelligence and keeps the user's workload small.
4) It introduces middleware and component technologies to ease users' afterdevelopment work.
In a word, the intelligent middleware- and CSCW-framework-based platform for ERP development put forward in this paper extends the range of CSCW applications, as well as the research on privilege control, shared object maintenance, and the design of multi-agent systems. In the next section, the architecture design of this platform is introduced. In the third section, an instance of the platform's support for ERP applications is introduced. Conclusions are summarized in the fourth section.
2 The Architecture Design of This Platform
To make the platform support afterdevelopment, this paper puts forward a conjoint architecture of middleware and components containing sufficient policies and function interfaces. As a whole, the platform is separated into four layers from bottom to top, namely the core layer, the application layer, the user interface adapter layer, and the user interface rendering layer. The most important part of the platform is the core layer, which is actually a middleware platform that encapsulates all tasks' execution processes and related data. Above it, the application layer contains function components that support collaborative work and ERP applications. Upon those two layers lie the user interface adapter layer and the user interface rendering layer, which concern the manner of human-computer interaction. The user interface adapter layer contains, in order, an application awareness level, an application adapter level, a behavior awareness level, and a mode adapter level, which resolve the problem of interface individuation, i.e., the mapping between the types of user interface visualization and the modes of user behavior. The platform studies the characteristics of a user's working style through awareness of the user's interactions, and then builds an evolving user model that represents that working style. The user interface rendering layer, the top layer, consists of a user interface describing level and a user interface painting level, which are in charge of maintaining the tree structure of user models in memory and of presenting the user's model according to the developer's needs. Additionally, to address the speed of user interface painting and the dynamic transformation among user working models, the platform introduces a shared memory stack. The user interface presenting process is depicted in Fig. 1 below.
Fig. 1. The user interaction rendering course
The rest of this section briefly introduces the various types of collaborative working policies and mechanisms. For example, the network communication level supports communication manners such as plain TCP, UDP, and multicast within a group. It maintains event-sending queues and event-receiving queues to cache outgoing and incoming events, which reduces the unnecessary resending caused by event loss under network congestion. In addition, this paper offers a forecast-based resorting algorithm for the caching queues. The algorithm can be described as follows:
1) At the beginning, the queue sorts events in FIFO manner.
2) If events keep arriving frequently while the queue is full, the events in the queue are analyzed using their inner information, such as the sender, the receiver, the shared object involved, and the privileges.
3) The queue then resorts the events according to their privilege levels and their degree of correlation with the same shared object.
This algorithm aims to ensure the fastest sending speed and the shortest queue length. The policies discussed are optimized to support collaborative group work involving shared objects. The design of this platform enables team members to cooperate quickly and correctly.
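A minimal sketch of the resorting algorithm above, under assumed data structures (the field names are ours, not the paper's): the queue stays FIFO until it fills, after which events are reordered by privilege level and by correlation with the same shared object.

```python
from dataclasses import dataclass

@dataclass
class Event:
    sender: str
    receiver: str
    shared_object: str   # the shared object the event touches
    privilege: int       # a higher value means a higher privilege level
    seq: int             # arrival order, for the FIFO baseline

def resort(queue: list[Event], capacity: int) -> list[Event]:
    if len(queue) < capacity:
        # 1) Queue not full: keep plain FIFO order.
        return sorted(queue, key=lambda e: e.seq)
    # 2)-3) Queue full: group events on the same shared object together
    # and put higher-privilege events first, breaking ties by arrival.
    return sorted(queue, key=lambda e: (e.shared_object, -e.privilege, e.seq))
```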
3 Support for the Development of ERP Systems
The design considers the reality that ERP systems have extended to an enterprise's outer world, and that an enterprise cannot survive and fare well without other enterprises in a corporate market environment. The core layer contains a data isolation level and an ERP transaction level; the former solves the heterogeneity problem of the various types of ERP application data, while the latter solves the problem of how to process the various ERP transactions. In the data isolation level, this paper offers database linking and management technologies based on an interface pool mode to implement a general database interface. Facing the problem that different databases have different linking and operation methods, the system maintains a thread pool that runs the various database interfaces yet appears as a single general database interface, and it automatically selects the suitable interface when users interact with a given database. In addition,
this type of data-flow processing helps reduce the platform's workload and promotes the platform's efficiency by handing the voluminous but logically simple work in ERP applications to the dedicated part. Besides, the platform provides an XML-based business modeling language, BEML, and a corresponding language parser, BELParser, which let users assess a system's feasibility intuitively before implementing their ERP systems. Obviously, the platform attends to supporting both collaboration and ERP functions, offering strong tools to the development work of an ERP project team and freeing it from the constraints of time and space.
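The interface-pool idea described above might be sketched as follows; the class and driver names are hypothetical, and real database drivers would replace the placeholders.

```python
from typing import Protocol

class DBInterface(Protocol):
    def query(self, sql: str) -> list: ...

class MySQLInterface:
    def query(self, sql: str) -> list:
        return [f"mysql:{sql}"]      # placeholder for a real driver call

class OracleInterface:
    def query(self, sql: str) -> list:
        return [f"oracle:{sql}"]     # placeholder for a real driver call

class GeneralDBInterface:
    """Looks like one general interface to users; dispatches to the pool."""
    def __init__(self) -> None:
        self._pool: dict[str, DBInterface] = {
            "mysql": MySQLInterface(),
            "oracle": OracleInterface(),
        }
    def query(self, db_kind: str, sql: str) -> list:
        # Select the suitable concrete interface automatically.
        return self._pool[db_kind].query(sql)

gdb = GeneralDBInterface()
print(gdb.query("mysql", "SELECT 1"))
```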
4 Conclusion
The architecture and inner interfaces of this platform are all specified very clearly. The actual orientation of the market is towards trade globalization and integration, so enterprises have to adopt modern cooperation rules and ERP systems that also accord with government policies. Since the high-end market is close to saturation under the dual pressure of large-scale overseas corporations and domestic companies, the middle- and low-end markets are the new economic growth points. The platform is well suited to middle-sized and small corporations, owing to its good scalability and development trend, and therefore it has promising prospects and market potential.
References
1. Shi Meilin, Xiang Yong, Yang Guangxin: Theory and Applications of CSCW. Electronic Industrial Press, Beijing 21 (2000) 2–10
2. Li Renhou, Zheng Qinghua, Bao Jiayuan: Conception, Structure, and Applications of CSCW. Computer Engineering and Applications 22 (1999) 28–32
3. Krebs, A., Ionescu, M., Dorohonceanu, B., Marsic, I.: The DISCIPLE System for Collaboration over the Heterogeneous Web. In: Proceedings of the 36th Hawaiian International Conference on System Sciences (HICSS-36), Waikoloa, Big Island, Hawaii (2003)
4. Correa, C.D., Marsic, I.: A Flexible Architecture to Support Awareness in Heterogeneous Collaborative Environments. In: Proceedings of the Fourth International Symposium on Collaborative Technologies and Systems (CTS 2003), Orlando, FL (2003) 69–77
5. Ionescu, M., Marsic, I.: Latecomer and Crash Recovery Support in Fault Tolerant Groupware. IEEE Distributed Systems Online, Vol. 2, No. 7 (2001) 1–14
6. Seay, A.F., Jerome, W.J., Lee, K.S.: Supporting Group Activity in MMORPGs with CSCW Techniques (2001)
7. Zhang Yunyong: Mobile Agent and Its Applications. Tsinghua University Press (2002)
8. Bingi, P., Sharma, M.K., Golda, J.K.: Critical Issues Affecting an ERP Implementation. Information Systems Management (1999) 7–14
Grid Monitoring and Data Visualization*
Yi Chi1, Shoubao Yang2, and Zheng Feng2
1 Shenyang Institute of Computation, Chinese Academy of Sciences
2 Department of Computer Science and Technology, University of Science and Technology of China, Hefei 230026, Anhui, China
{ychi, fzheng}@mail.ustc.edu.cn, [email protected]
Abstract. Grid technology opens the way to building collaborative environments that enable distributed multi-organizational teams to jointly use computing resources. Automatic resource/service discovery should therefore be launched to cope with the dynamicity of grid elements. Hardware and software failures can be found and solved in time through monitoring, and analyzing the gathered data can help locate performance bottlenecks. Grid monitoring stores both static and dynamic data. This article concentrates on the grid monitoring and data visualization scheme; improvements to grid applications and services are then proposed. Keywords: Monitoring, USTCGrid, Grid, Visualization, Grid QoS
1 Introduction
Grid [1] technology opens the way to building collaborative environments that enable distributed multi-organizational teams, in scientific research as well as in industrial product design, to jointly use simulation software. At the same time, online collaborative visualization and interactive steering of such applications on parallel computing systems has been a research issue for several years. This article first introduces current research on grid monitoring and data visualization, then concentrates on the grid monitoring and data visualization strategy of USTCGrid, and finally describes what we plan to do to improve our applications and services.
2 USTCGrid Monitoring
Before making inquiries, users are authorized by the Portal, the interface through which users access the various grid resources. They can then get information data for a selected time period: each year, month, day, or hour. System administrators or privileged users are also able to get the information of all the nodes and analyze it over the selected time period.
* This paper is supported by the National Natural Science Foundation of China under Grant No. 60273041 and the National '863' High-Tech Program of China under Grant No. 2002AA104560.
2.1 USTCGrid Portal
The computing facility for USTC is a computational grid, in which supercomputers are planned to be integrated with other non-Linux systems, including non-QoS P2P nodes, to form a massive, distributed computing grid. For this purpose, a USTCGrid Portal has been implemented. It is a basic portal based on Globus Toolkit 3 [6], especially OGSA [6]. Our goal is to design a basic grid monitoring and visualization prototype with high scalability. We cooperated with the Department of Biology on three typical scientific computing applications: "protein fold simulation", "structure-based molecule design", and "universe evolution N-body simulation".
2.2 Structure
A conceptual model of grid resources is used as the base schema of the GIS (Grid Information Service) for discovery and monitoring purposes. With the paradigms stated in Section 2.1, the grid monitoring system has the structure shown in Fig. 1.
Fig. 1. Structure of USTCGrid Monitoring
Several services are provided: data collecting, storing, inquiry, and, where possible, analysis. Users register and log in to the system through the Grid Portal and are given the proper rights. They can then select the nodes, the type of information or data they would like to query, and the time period. Buffers and caches are used to speed up this process. Grid Services are accessed via URL.
2.3 Monitoring Objects and Events
Our data visualization displays the layered node distribution, as shown in Fig. 2, and presents the visualized resource information. Users can arrange portlets to set the views, sort orders, and menu structure.
Fig. 2. USTCGrid Monitoring
Monitored data include static data and dynamic data. Monitored events can be dealt with by the system administrator or by the adaptive system itself, which adjusts to achieve higher performance. The monitored data include [2]:
(1) Monitored objects
a) System-level metrics: host hardware resource cost status, e.g., CPU load, host load, memory usage, disk usage (per partition), number of users
b) Network objects: communication status across domains/hosts, communication delay with other hosts, data transfer bandwidth, routing status
c) Applications: running status of a process or process set. These are often application-specific, e.g., Web server, database server, monitoring server, and other event servers
(2) Monitored events
a) Alarm: a cautionary value before collapse
b) Breakdown: no output whatever the inputs are, caused by a monitored event halting, inaccessible network objects, or object status beyond limits, e.g., out of memory; overdue data, caused by network breakdown or collapse of the monitored host system
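As one possible way to gather the system-level metrics listed above on a single node, here is a sketch using the third-party psutil library (our choice of tool, not the paper's):

```python
import psutil

def system_metrics() -> dict:
    """Collect the host-level metrics named above for one node."""
    disks = {}
    for part in psutil.disk_partitions():
        try:
            disks[part.mountpoint] = psutil.disk_usage(part.mountpoint).percent
        except PermissionError:
            pass  # e.g., unmounted removable media
    return {
        "cpu_load_percent": psutil.cpu_percent(interval=1),
        "memory_used_percent": psutil.virtual_memory().percent,
        "disk_used_percent": disks,   # per partition, as listed above
        "num_users": len(psutil.users()),
    }

print(system_metrics())
```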
3 Conclusion and Future Work
The grid monitoring and data visualization system will be incorporated into the USTCGrid Portal, which is based on OGSA and Jetspeed, so our monitoring system will be implemented on both Windows and Linux. We will build the database for storing snapshot and historical monitoring data, and implement automatic resource discovery using the MDS infrastructure. Because the grid environment is more heterogeneous and distributed than clusters, we will develop a web interface to display various "grid-views" [7], including a layered logical view and graphical views. The goal of the graphical views is to give users a graphical representation of the status of a set of entities. Each map and symbol in a graphical view carries localization information and a background image, and each map, link, or symbol of a graphical view is drawn in an HTML page from this information, with the same status determination rules. With the use of a database we can compare the information from the GIIS against the past history of resource availability (an object can be new, disappeared, or re-available), so we will use Microsoft SQL Server on Windows and MySQL on Linux.
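The comparison against past history mentioned above can be sketched as set differences; this toy function (with invented names) labels each object new, re-available, or disappeared.

```python
def classify(giis_now: set[str], db_known: set[str],
             db_alive: set[str]) -> dict[str, str]:
    """Diff the GIIS's current resource list against the DB's history.

    db_known:  every resource the database has ever recorded.
    db_alive:  the resources the database last saw as available.
    """
    status = {}
    for r in giis_now - db_known:
        status[r] = "new"
    for r in giis_now & (db_known - db_alive):
        status[r] = "re-available"
    for r in db_alive - giis_now:
        status[r] = "disappeared"
    return status

print(classify(giis_now={"a", "c"}, db_known={"a", "b", "c"},
               db_alive={"a", "b"}))
# {'c': 're-available', 'b': 'disappeared'}
```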
References
1. Foster, I., Kesselman, C., et al.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications (2001)
2. Li Zha, Zhiwei Xu, Guozhang Lin, Yushu Liu, Donghua Liu, Wei Li: Grid Monitoring System Based on LDAP (2002)
3. Bonnassieux, F., Harakaly, R., et al.: MapCenter: An Open Grid Status Visualization Tool
4. Condor, http://www.cs.wisc.edu/condor/publications.html
5. Raman, R., Livny, M., Solomon, M.: Resource Management through Multilateral Matchmaking. In: Proc. of the 9th IEEE Symposium on High Performance Distributed Computing (HPDC9), Pittsburgh (August 2000) 290–291
6. http://www.globus.org
7. http://www.nagios.org/
8. http://datatag.web.cern.ch/datatag/
9. http://www.ivdgl.org
An Economy Driven Resource Management Architecture Based on Mobile Agent Peng Wan, Wei-Yong Zhang, and Tian Chen Hefei University of Technology, Anhui, China, 230009 {roc_wan, zhang_zwy, chentianne}@hotmail.com
Abstract. The management of resources in grid environments is complex, as they are heterogeneous and geographically distributed. This paper proposes effective resource management using the mobile agent method among distributed resources. We discuss a Mobile-Agent-Based Grid Resource Management Model (MGRMM), in which resources are deployed like commodities in an exchange and agent technology is applied to resource management to resolve a series of problems in grid environments.
1 Introduction
The resources in the Grid are heterogeneous and geographically distributed [1], which makes the management of resources and application scheduling in grid environments a complex task. A mobile agent can travel among grid resource nodes and find the resources that grid computation needs with more agility. So, if we apply the mobile agent methodology to grid resource management (GRM), we can make resource management easier.
2 The Merits of Using Mobile-Agent Technology in GRM
A mobile agent [2] is a self-contained software element responsible for executing a programmatic process and capable of autonomously migrating in a network. The merits of using mobile-agent technology in GRM lie mainly in three aspects. First, it reduces the communication among resource nodes: the essence of a mobile agent is taking the computation to the data and performing processing directly at the data end, returning only the result. Second, it enhances the ability to execute grid jobs in parallel: mobile agents do not need uniform scheduling, the agents created by a user can travel among different nodes asynchronously, and a user can create multiple agents and execute them on one or several nodes. Third, it adapts well to the dynamics of grid resources: it can perceive changes in the environment and react to them quickly and on its own initiative. In a word, using mobile-agent technology in a GRM architecture can overcome many of its disadvantages and put it to good use.
3 The Architecture of MGRMM
There are three commonly used GRM models [3]: the Hierarchical model, the Abstract Owner model, and the Market model. The MGRMM is based on the Market model.
Fig. 1. The Architecture of the Local Resource Deployment
3.1 The Architecture of a Local Node
Every resource node in the MGRMM is managed by the local resource deployment. A local resource node involves a resource consumer agent depository (RCAD), a resource provider agent depository (RPAD), a local resource manage agent (LRMA), a local database (LDB), a local database manage agent (LDBMA), and an account agent (AA) (see Fig. 1). In what follows, we introduce them one by one. In grid environments, every user is both a resource consumer (RC) and a resource provider (RP), so the MGRMM presents the RCAD and the RPAD to delegate them. An agent depository is a dynamic agent store composed of agents. The RPAD delegates the RP. It is composed of a resource advertise agent (RAA), a provider's negotiate agent (PNA), a job execute agent (JEA), a QoS prove agent (QPA), a charging agent (CA), and a database agent (DBA). The work of the RPAD involves sending resource advertisements to the RCADs of other nodes and to the grid information server (GIS), making bargains with the RCADs of other nodes (see Fig. 2), receiving jobs sent by other nodes' RCADs, collecting the QoS register, providing the records of resource usage, claiming compensation when the RC breaches faith, etc. The RCAD delegates the RC. It is composed of a resource discover agent (RDA), a consumer's negotiate agent (CNA), a job manage agent (JMA), a QoS inspect agent
(QIA), a payment agent (PA), and a database agent (DBA). The work of the RCAD involves searching the grid for resource information, making bargains with other nodes' RPADs, managing and segmenting jobs, sending jobs to the RPADs of other nodes, inspecting the QoS register, handling payment, claiming compensation when the RP breaches faith, etc.
Fig. 2. The Architecture of the MGRMM
The LDB holds various historical records: records of advertisements, information about frequent trading partners, information about historical trades, information obtained from the GIS, information about the node's account in the grid bank server (GBS), and various intermediate information produced during grid resource trades. The manager of the LDB is the LDBMA. The LRMA is in charge of managing the local resources. It monitors and records the process of each service; it registers the scale of a resource, its performance, and the time it has been used, then measures the resource usage and sends these records to the RPAD. Each user has a grid account managed by the AA, and with the help of the AA the trading of grid resources can be supported. The grid bank (GB) [4] is a secure grid-wide accounting and (micro)payment handling system. It maintains the users' (consumers' and providers') accounts and resource usage records in its database, and it supports protocols that enable its interaction with the RP and the RC. The function of the GIS is to provide a platform on which grid users can publish and query grid resource information. The GIS can collect, count, and manage the information of the various agents, and it also provides advice to grid users.
3.2 The Process of Resource Trading
The process of resource advertisement is carried out by the RPAD. First, the RPAD sends the RAA, which carries information about the resource it wants to sell, to the GIS. The RAA searches the GIS for information about RCs who are interested in such a resource and takes this information back to the RPAD. Second, the RP queries the historical trade records in its LDB to find RCs who may be interested in the resource the RP wants to sell. After these two steps the RP knows whom it wants to trade with, so the RPAD sends the RAA to these users to advertise its resources and actively seek trade opportunities. The process of resource discovery is carried out by the RCAD. First, at the initial stage of resource discovery, the RC makes clear what resources its application needs; the RCAD then sends the RDA to the GIS, where the RDA asks who has these resources and takes the answer back to the RCAD. Second, the RC queries the historical trade records in its LDB to find RPs who may have the resources the RC wants to buy. After these two steps the RC knows whom it wants to trade with, and finally the RCAD sends the RDA to these users to seek the chance of a trade. Trade negotiation involves negotiating the price, signing the trade contract, etc. Based on the information obtained during resource discovery, the RC decides with whom it wants to trade. The RC may not be satisfied with some part of the service provided by the RP, such as the price, the time of service, or the performance of the resource, so the RC sends a CNA to the RP to bargain with a PNA over these issues. After the trade negotiation, the RC dispatches the jobs to the various RPs based on the application and each RP's ability. When a JEA gets a job, it passes the job to the LRMA, which deploys the resources to execute it; after the job is finished, the LRMA returns the result to the JEA, which sends it to the JMA. After the JMA has collected the results from all of the RPs taking part in the job, it integrates them into a final result. This completes one resource trade.
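The bargaining between the CNA and the PNA is not specified in the paper; one plausible sketch (all parameters invented) is a simple alternating-concession protocol in which the consumer's bid rises toward a budget cap while the provider's ask falls toward a reserve price, until they cross or the negotiation fails.

```python
def negotiate(budget: float, reserve: float,
              bid: float, ask: float, rounds: int = 10):
    """Toy CNA/PNA price negotiation; returns the agreed price or None."""
    for _ in range(rounds):
        if bid >= ask:                    # prices cross: meet in the middle
            return round((bid + ask) / 2, 2)
        bid = min(budget, bid * 1.10)     # CNA concedes upward
        ask = max(reserve, ask * 0.95)    # PNA concedes downward
    return None                           # no contract signed

price = negotiate(budget=100.0, reserve=60.0, bid=50.0, ask=120.0)
print(price)
```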
References
1. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications 15(3) (2001)
2. de Assis Silva, F.M., de Araújo Macêdo, R.J.: Reliability Requirements in Mobile Agent Systems. In: Proceedings of the Second Workshop on Tests and Fault Tolerance (II WTF 2000), Curitiba, Brazil (July 2000)
3. Buyya, R., Chapin, S., DiNucci, D.: Architectural Models for Resource Management in the Grid, http://hpclab.cs.tsinghua.edu.cn/research/doc/gridmodels.pdf
4. Barmouta, A., Buyya, R.: GridBank: A Grid Accounting Services Architecture (GASA) for Distributed Systems Sharing and Integration. In: International Parallel and Distributed Processing Symposium (IPDPS'03), Nice, France (April 2003)
Decentralized Computational Market Model for Grid Resource Management
Qianfei Fu1, Shoubao Yang2, Maosheng Li1, and Junmao Zhun1
1 Department of Computer Science, USTC, Hefei 230026, China
{fqianfei, lmsheng, zhjm}@mail.ustc.edu.cn
2 Department of Computer Science, USTC, Hefei 230026, China
[email protected]
Abstract. The lack of resource ownership and control results in the embarrassment of the traditional centrally controlled grid resource scheduler. A decentralized computational market model (DCMM) is proposed in this paper. In this model, the supply and demand of computational resources is balanced by the market mechanism without central control. Compared with previously developed computational market models and methods, the no-center architecture is an innovation and fits well with the grid environment.
1 Introduction
"Grid" [1] computing has emerged as an important new field. The numerous similarities between economic systems and distributed computer systems suggest that models and methods previously developed within the field of mathematical economics can serve as blueprints for engineering similar mechanisms in distributed computer systems [4]. The POPCORN project [3] provides a market-based mechanism for trade in CPU time, motivating processors to provide their CPU cycles for other people's computations. Nimrod-G [2] is a computational-economy-based global grid resource management and scheduling system that supports deadline- and budget-constrained algorithms for scheduling parameter sweep applications on distributed resources. A Decentralized Computational Market Model is proposed in this paper; compared with previously developed computational market models and methods, the no-center model is more flexible and fits well with the grid environment, as addressed in detail in Section 3.
2 Decentralized Computational Market Model (DCMM)
The computational market is based on the grid infrastructure. Figure 1 shows the DCMM and its relationship with the layered grid architecture. Starting from the resource layer, four hierarchical layers build up the Decentralized Computational Market Model.
Fig. 1. Computational market model and its relationship with the layered Grid architecture
Figure 1 shows that the grid resource provider is parallel with the resource layer in the Globus [5] architecture. The resource layer in the Globus architecture defines protocols for the secure negotiation, initiation, monitoring, control, accounting, and payment of sharing operations on individual resources [1]. That is to say, the resources provided in the computational model are manageable resources with grid security guaranteed. The grid middleware offers core components that help in designing the computational market system; many related services are already offered by Globus [5] components. The new middleware for the computational model specifically includes the following components alongside the grid components (some of them similar to [2]): the Grid Market Directory (GMD), pricing policies, GridBank, and trust and QoS mechanisms.
Fig. 2. The Structure of Market Portal
The Market Portal is the marketplace for computational resource transactions. As well as registering with the MDS, special supply and demand information can be published
on the Price BBS. Flexible trade mechanisms such as commodity markets, auctions, and securities and futures markets are provided in the Portal (these will be covered in future work). Resource consumers and agencies can freely choose among the different kinds of markets. A grid user can negotiate with an agency (consumer1 and agency1 in Figure 2) directly after obtaining the resource information from the Price BBS. Consumers and agencies are situated on the top layer of the model. Both are application processes acting on behalf of the resource consumer and the resource provider respectively. One agency can act for many resources at one time, which gives the agency the capacity to operate complex bundled resources. With the interfaces supported by the Market Portal, they can communicate more conveniently and reach a common understanding effectively.
3 Analysis of DCMM
Compared with a traditional centrally controlled scheduler in the grid environment, the Market Portal makes a great difference. It only provides the necessary transaction information and a platform. When, where, and with whom to do a transaction is determined by the participants (resource consumers and agencies) themselves, and these decisions are based on self-interest: maximizing utility. The supply and demand of computational resources is thus balanced by the market mechanism.
Fig. 3. Decentralized & Central-controlled computational market
Figure 3 compares the decentralized and the centrally controlled computational market. In Figure 3(b), the Grid Resource Broker (GRB) [2] acts as a grid resource scheduler, trading with the grid resource providers on behalf of the grid users and allotting resources to them. In Figure 3(a), there is no concept of a grid resource scheduler (or broker), and the supply and demand of computational resources is balanced by the
market mechanism without central control. The bargainer himself, instead of a grid scheduler, makes the trade decision.
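A toy sketch (invented, not from the paper) of how supply and demand could balance via the Price BBS without a central scheduler: consumers post bids, agencies post asks, and any pair whose prices cross trades directly; the portal only provides the information platform.

```python
import heapq

bids: list[tuple] = []   # max-heap of (-price, consumer)
asks: list[tuple] = []   # min-heap of (price, agency)

def post_bid(price: float, who: str) -> None:
    heapq.heappush(bids, (-price, who))

def post_ask(price: float, who: str) -> None:
    heapq.heappush(asks, (price, who))

def match() -> list[tuple]:
    """Pair the highest bid with the lowest ask while they cross."""
    trades = []
    while bids and asks and -bids[0][0] >= asks[0][0]:
        (nb, consumer), (pa, agency) = heapq.heappop(bids), heapq.heappop(asks)
        trades.append((consumer, agency, round((-nb + pa) / 2, 2)))
    return trades

post_bid(12.0, "consumer1"); post_ask(10.0, "agency1")
print(match())   # [('consumer1', 'agency1', 11.0)]
```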
4 Conclusion
The numerous similarities between economic systems and distributed computer systems suggest that methods previously developed within the field of mathematical economics can serve as blueprints for engineering similar mechanisms in distributed computer systems. A decentralized computational market model has been proposed in this paper; compared with previously developed models and methods, the decentralized model is an innovation and fits well with the grid environment.
References
1. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid. Intl J. Supercomputer Applications (2001)
2. http://www.buyya.com/ecogrid
3. http://www.cs.huji.ac.il/~popcorn
4. Kurose, J.F., Simha, R.: A Microeconomic Approach to Optimal Resource Allocation in Distributed Computer Systems. IEEE Transactions on Computers 38(5) (1989) 705–717
5. http://www.globus.org
6. Schopf, J.M.: A General Architecture for Scheduling on the Grid. Submitted to JPDC, Special Issue on Grid Computing (2002)
A Formal Data Model and Algebra for Resource Sharing in Grid* Qiujian Sheng and Zhongzhi Shi Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, P.O. Box 2704-28, Beijing 100080, China & Graduate School of the Chinese Academy of Sciences, Beijing, China
[email protected]
Abstract. The management of various complex resources is an important issue that must be handled well in order to fulfill the potential of the Grid. In this paper, we propose a formal data model and algebra for resource sharing in the Grid. In the model, domain ontology knowledge is described within a knowledge schema, and domain background knowledge is captured in integrity constraint rules. A query algebra is defined to manipulate the modeled resource base.
1 Introduction
It is envisioned that the success of using the Grid for distributed system integration will rely on how well the heterogeneous resources of the Grid, with their increasing scale and complexity, are managed [1]. Knowledge-based data management techniques will play a fundamental role in the Grid. For example, grid services that manipulate large data sets, especially those for data-intensive science, will need rich optimization techniques; as automatic service discovery becomes reality, we will see "queries" being made against huge volumes of intricate grid-service descriptions, creating new challenges in indexing and query optimization. In the long run, a solid foundation, akin to relational theory, is desired. However, in order to share complex grid resources effectively, domain resources/data should be semantically structured and interrelated. In this respect, neither the relational model [2] nor conventional object-oriented models are effective [3], owing in some ways to their simplicity. Recently a few models and query languages have been reported [4]; however, despite the work done so far, the research is far from mature, and an abstract and general foundation is still missing. In this paper, we aim to develop a solid foundation for scalable and manageable sharing of complex resources in the Grid.
* Supported by the National High-Tech Program (Grant No. 2001AA113121) of China, the National Natural Science Foundation of China (90104021), and the Beijing Natural Science Foundation (4011003).
2 A Knowledge-Based Data Model
The model consists of a knowledge schema, a resource instance base, and integrity constraints. The knowledge schema is used to capture the agreed domain knowledge, i.e., concepts and properties. It plays a role similar to that of a database schema in guiding access to the data in the Grid.
2.1 Model Definition
Definition 1. A knowledge schema is a tuple $KS = \langle C, D, P, P_t, \preceq_C, \preceq_P \rangle$, where C is a set of abstract concepts, e.g., Person, Publication, etc.; D is a set of concrete domain concepts, e.g., Integer, String, and similar predefined types; P is a set of properties of concepts in the domain; and $P_t \subseteq P$ is a set of transitive properties. The concept inclusion relationship $\preceq_C$ is a strict partial order: if $c_1 \preceq_C c_2$, then $c_2$ includes $c_1$, i.e., $c_1$ is a sub-concept of $c_2$. Likewise, the property inclusion relationship $\preceq_P$ is a strict partial order: if $p_1 \preceq_P p_2$, then $p_2$ includes $p_1$, i.e., $p_1$ is a sub-property of $p_2$.
The integrity constraints of the knowledge schema capture the domain background knowledge that is not captured within the concepts and properties of the knowledge schema. They can be used for semantic query optimization during query processing.
Definition 2. The integrity constraints of a knowledge schema include constraints on properties and other domain background knowledge. There are two kinds of property constraints: constraints on the concept types of the domain and range concepts of a relational property, and cardinality constraints. A property constraint has the form $\langle P, C_d, C_r, Min, Max \rangle$, where $C_d$ and $C_r$ are concept-type constraints on the domain and range concepts of the relational property P respectively, and Min and Max are cardinality constraints. Another kind of domain constraint, like "Only those researchers whose title level is higher than 8 can take the position of Director in a department", is captured in our model by integrity rules with the generic form $X \Rightarrow Y$, where X and Y are expressions constructed from concepts and properties; the example above can be described as such a rule.
Definition 3. A resource instance base is a tuple $RIB = \langle I, L, inst, val \rangle$, where I is a set of resource instances, each being an instance of some concept in C; L is a set of literal values with types Integer, String, etc.; $inst$ relates a concept to a set of instances; and $val$ assigns to each property-instance pair the set of instances related through the given property. Ontology knowledge (i.e., concepts and properties) is an abstraction of the domain resources, while the specific domain resources are an interpretation of the conceptualized domain knowledge. In order to lay a solid formal foundation, we have studied the denotational semantics of the model and its validation conditions; due to space limits, this is omitted here.
3 Algebra
Seven algebraic operators, namely Project, Closure, Select, Join, Union, Intersect, and Minus, are defined to manipulate the modeled resource base. The algebra makes it possible to specify a high-level declarative query language with clear semantics, and it is also the basis for query optimization. Here, we mainly present the first four operators.
3.1 Projection
The projection operator, written $Project(c, P)$, takes a collection c and produces a collection of complex resources according to P, which specifies the projection behavior.
Example 3.1. To query the author, publishing press, and publication date of all publications, project the collection of publications onto those three properties. Notably, if P is not specified, the resulting collection is just a copy of the input collection c.
3.2 Closure
The closure operator, written $Closure(c, p, n)$, produces a collection by recursively traversing the specified property p over the resource elements in c; the specified p must therefore be a transitive property. The recursion depth can be controlled by the iteration bound n: the transitive closure iteration ends when there are no more paths to follow or the specified iteration limit is reached.
Example 3.2. To query all prerequisite courses for Grid research, compute the closure of the course collection over the prerequisite property.
3.3 Selection
The selection operator, written $Select(c, F)$, evaluates the condition expression F over the input collection c and returns those elements for which F evaluates to True. F is a combination of logical operators, such as $\wedge$, $\vee$, and $\neg$, and arithmetical comparison operators, such as $=$, $<$, and $>$, applied to property values; it evaluates to True or False.
Example 3.3. To query the authors of publications at least one of whose authors is a Professor.
Example 3.4. To ask for the author, publishing house, and publication date of all publications every author of which is a PhD student; this is clearly a composition of two algebraic expressions.
3.4 Join
The Join operator, written $Join(c_1, c_2, p)$, returns a collection including the resource elements in $c_1$ and $c_2$ whose property values match p, where p is a join predicate.
Example 3.5. To select the PhD students and Master's students who have a common supervisor.
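An illustrative Python sketch (our own, under assumed data structures) of two of the operators above: Select as predicate filtering, and Closure as a bounded transitive traversal of one property. A resource is modeled as a dict from property names to lists of values.

```python
def select(c: list[dict], F) -> list[dict]:
    """Keep the resources in c for which condition F evaluates True."""
    return [r for r in c if F(r)]

def closure(c: list[dict], p: str, limit: int = 10) -> list[dict]:
    """Recursively follow transitive property p from the elements of c,
    stopping when no new resources appear or the iteration bound hits."""
    result, frontier = list(c), list(c)
    for _ in range(limit):
        nxt = [t for r in frontier for t in r.get(p, [])
               if t not in result]
        if not nxt:
            break
        result.extend(nxt)
        frontier = nxt
    return result

# e.g. all prerequisite courses for a Grid course (cf. Example 3.2):
algebra = {"name": ["Linear Algebra"], "prerequisite": []}
networks = {"name": ["Networks"], "prerequisite": [algebra]}
grid = {"name": ["Grid Computing"], "prerequisite": [networks]}
print([r["name"][0] for r in closure([grid], "prerequisite")])
# ['Grid Computing', 'Networks', 'Linear Algebra']
```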
4 Conclusion and Future Work
In this paper, we presented a knowledge-based data model and algebra to advance the state of the art of resource management in the Grid environment. To our knowledge, it is the first effort towards an algebra for complex resource sharing in the Grid domain. Our work is currently preliminary, and much remains to be done. In order to test the capacity of the proposed model and its algebra, we are implementing the proposed model and query language with Semantic Web techniques. Moreover, we plan to extend the algebra to support inference and analysis on schemas and instances, based on Description Logic [5]. Finally, algebraic equivalence laws and heuristics will be studied for query optimization.
References
1. Semantic Grid, http://www.semanticgrid.org/
2. Codd, E.F.: A Relational Model of Data for Large Shared Data Banks. Comm. of the ACM 13(6) (June 1970) 377–387
3. Halper, M., et al.: Frameworks for Incorporating Semantic Relationships into Object-Oriented Database Systems. Concurrency Computat.: Pract. Exper. (2003), to appear
4. Zhuge, H.: A Knowledge Grid Model and Platform for Global Knowledge Sharing. Expert Systems with Applications 22 (2002) 313–320
5. Baader, F., et al.: The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press (January 2003), ISBN 0521781760
An Efficient Load Balance Algorithm in Cluster-Based Peer-to-Peer System
Ming-Hong Shi, Yong-Jun Luo, and Ying-Cai Bai
Computer Science Department, Shanghai JiaoTong University, Shanghai 200030, China
{shi-mh, luo-yj, bai-yc}@cs.sjtu.edu.cn
Abstract. Load balancing is a very important problem in peer-to-peer systems. We propose an inter-cluster load balancing algorithm for cluster-based systems, based on the semantic group division of documents. The algorithm can regulate the popularity granularity of the semantic groups according to the load imbalance situation, and it alleviates the latent hot-topic problem in advance by setting an appropriate lower limit for the fairness index. The simulation conducted confirms that the algorithm achieves high load-balance fairness.
1 Introduction
Peer-to-peer (P2P) systems have become a popular form of content distribution architecture; systems such as [1] and [2] are among them. This paper addresses the cluster-based P2P architecture proposed in [3] and advances the Semantic Group Dividing based Inter-cluster Load Balancing algorithm (SGDILB). SGDILB regulates the popularity granularity according to the load imbalance. A simulation is conducted to check the validity of the algorithm.
2 Load Balancing Algorithm
The fairness index introduced in [4] is used in the algorithm. Assume the system consists of a node set N, the shared document set is D, and the documents' semantic group set is G; p(Z) stands for the popularity of a set Z. Formula (1) gives the functional relationship among these sets.
With the nodes' different power and content contributions considered, the normalized cluster popularity of a cluster is given by formula (2). The goal is to keep the fairness of the cluster popularities above a threshold, say 0.95. Given both inter-cluster and intra-cluster balance, the whole system can obtain high efficiency. In web systems, the distribution of web object accesses follows Zipf's law [5]; at the same time, there is a hot-interest problem in P2P systems similar to that in [6]. Literature [6] adopts a cache scheme to cope with the hot-spot problem. In the SGDILB algorithm, lower and upper limits are introduced. When the fairness falls below the lower limit, too many nodes may be accessing documents of the same or similar interest, and a hot-interest problem arises; the algorithm checks whether such a problem exists in step 3. In that situation, the algorithm semantically divides the document group with the highest normalized popularity within the cluster of the highest popularity, so as to decrease the granularity used in load balancing. The division simultaneously lessens the latent hot-interest problem of the nodes in the hot cluster, because more nodes will join in serving accesses to the hot topic. Timely transfer of one of the semantic child document groups to nodes in another cluster prevents the load imbalance from worsening. Two flag bits are set. The SGDILB algorithm is detailed as follows:
1. Calculate the current fairness.
2. If the fairness is below the lower limit, semantically divide the document group with the highest normalized popularity in the most popular cluster into child groups.
3. One child group is assigned to another cluster; their GROUP_DIV flags are set to 10, and the assigned child group's MOVES field is increased by 1. The parent group's entry is kept only at the source cluster, with GROUP_DIV set to 00, signifying that the parent group is no longer held here. The source cluster records the new location of the child groups. Both the source and the destination propagate the entries of the child groups to their neighbors and to other clusters through their managing nodes. When a request for the parent group arrives from a node that does not know about the change in the group's semantic division, the source cluster searches in the child group it holds and also forwards the request to the cluster holding the other child group. The node that matches the request piggybacks the entry of the child group in the answer message. After receiving the message, the requesting node deletes the entry of the parent group, adds the entry of the child group, and sends an update message to its neighbors.
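Reference [4] is the report that defines Jain's fairness index, so step 1 above amounts to computing that index over the normalized cluster popularities; a short sketch, with invented example values, testing it against the 0.95 threshold mentioned earlier:

```python
def jain_fairness(xs: list[float]) -> float:
    """Jain's index: (sum x)^2 / (n * sum x^2); 1.0 means a perfectly even load."""
    n = len(xs)
    return sum(xs) ** 2 / (n * sum(x * x for x in xs))

popularity = [0.11, 0.09, 0.10, 0.12, 0.08]   # invented example values
f = jain_fairness(popularity)
print(f, f >= 0.95)   # ~0.98 True
```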
3 Performance Experiment
The experiment uses 0.75 as the Zipf parameter of the document group popularity, 22500 nodes, 225000 documents, 100 clusters, and 600 document groups; node power ranges between 1 and 6 units, and each node holds between 1 and 20 documents. After the SGDILB algorithm is run, the normalized cluster popularity and fairness are as shown in Fig. 1. The fairness is clearly above 95%.
Fig. 1. Normalized Cluster Popularity Distribution
4 Conclusion
This paper proposes a semantic-group-dividing-based inter-cluster load balancing algorithm. The algorithm regulates the popularity granularity according to the imbalance level by setting lower and upper thresholds, and it lessens the latent hot-interest problem. The experiment shows that the algorithm can achieve high fairness in load balancing.
References
1. Stoica, I., Morris, R., Karger, D., et al.: Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. In: Proceedings of ACM SIGCOMM'01, San Diego (September 2001)
2. The Gnutella Protocol Specification v0.4, http://gnutella.wego.com
3. Triantafillou, P., Xiruhaki, C., Koubarakis, M., Ntarmos, N.: Towards High Performance Peer-to-Peer Content and Resource Sharing Systems. In: Proceedings of the 2003 CIDR Conference (2003)
4. Jain, R., Chiu, D.-M., Hawe, W.R.: A Quantitative Measure of Fairness and Discrimination of Resource Allocation in Shared Computer Systems. DEC-TR-301 (1984)
5. Breslau, L., Cao, P., Fan, L., Phillips, G., Shenker, S.: Web Caching and Zipf-like Distributions: Evidence and Implications. In: IEEE INFOCOM (1999) 126–134
6. Stading, T., Maniatis, P., Baker, M.: Peer-to-Peer Caching Schemes to Address Flash Crowds. In: Proceedings of the 1st International Peer-to-Peer Systems Workshop (IPTPS 2002)
Resource Information Management of Spatial Information Grid* Deke Guo, Honghui Chen, and Xueshan Luo Department of Management Science and Engineering, National University of Defence Technology, Changsha, 410073, P.R. China
[email protected]
Abstract. The Spatial Information Grid (SIG) is an infrastructure and framework that enables us to congregate and share large-scale, heterogeneous, distributed spatial resources across dynamic "virtual organizations" and to organize and manage spatial resources systematically. In order to enable users and software agents to locate, select, employ, and integrate spatial resources semi-automatically in SIG, we propose the idea of resource information management and describe it from the aspects of framework, resource information description, and resource information registry. Furthermore, we propose a design and implementation solution for the resource information registry.
1 Introduction
Spatial data is data that can be associated with a location on Earth. It is also the dominant form of data in terms of volume and has been widely used in many fields of social and economic activity, ranging from mine exploitation to mobile applications. China has accumulated large-scale, heterogeneous spatial resources, including established fundamental spatial databases and those under construction, spatial data processing and application software, spatial facilities and instruments, etc. But these spatial resources have long been distributed over the departments concerned, which has caused many obstacles when other users want to share and integrate spatial resources across departments and regions dynamically. This difficult problem can be characterized as the grid problem, defined as "flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources" [1]. We propose the Spatial Information Grid (SIG) as an effective solution to this grid problem based on the latest grid theory and technology; the solution involves the adoption of a service-oriented model and attention to metadata. In the following, we first introduce the philosophy of SIG in Section 2. In Section 3, we discuss the resource information management solution from the aspects of framework, resource information description, and resource information registry in detail. Section 4 gives the conclusion and our future work.
* This work was supported by the National High Technology Research and Development Program of China under grant 2002AA131010.
2 Spatial Information Grid
SIG is defined as an infrastructure and framework that enables us to share distributed, large-scale, heterogeneous spatial resources cooperatively across dynamic virtual organizations, and to organize and deal with them systematically. It aims to let all kinds of users acquire spatial information about all kinds of spatial resources, especially the metadata about those resources, and it possesses the capability of service-on-demand. The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources. With the increasing spread of sharing architectures and technologies for spatial resources, especially Web services and OGSA (Open Grid Services Architecture) [2], all kinds of wide-area distributed, large-scale, heterogeneous Web-accessible spatial data, programs, devices, and sensors can be encapsulated as diverse spatial Web services and grid services. In order to enable users and agents to locate, select, employ, and integrate spatial services across discrete information islands effectively and efficiently, SIG should provide mechanisms to describe, publish, manage, and discover the metadata information of shared spatial services, namely resource information management.
3 Framework of Resource Information Management
To make use of a spatial web service, a user needs an interpretable, standard description of it and of the means by which it is accessed. An important goal for spatial resource information management is to establish a framework within which these descriptions are made and shared. The framework provides not only technical support but also a unified starting point, the resource information registry, for resource information description, publication, discovery, and employment. The framework is open and extensible. The fundamental framework, designed on top of the web service protocol stack, supports only human users in discovering, interpreting, selecting, and invoking spatial resources freely; it is difficult for software agents to do so. The extensible framework focuses on semantic markup for spatial resources, so that both human users and software agents are able to discover, interpret, select, and invoke spatial resources. Figure 1 shows the extensible framework, which is discussed in detail in the following sections.
3.1 Resource Information Description
There are two major languages from which we can select to describe a spatial web service: WSDL and DAML-S [3]. WSDL provides a communication-level description of the messages and protocols used by a web service, while DAML-S provides an application-level description above WSDL; WSDL itself does not support semantic description of Web services. The fundamental framework adopts only WSDL, whereas the extensible framework employs not only WSDL but also DAML-S.
DAML-S describes the semantic markup of Web services from the aspects of the Profile, the Process Model, and the Grounding. The Profile and the Process Model are considered abstract specifications, in the sense that they do not specify the details of particular message formats, protocols, and network addresses by which a Web service is instantiated. The role of the Grounding is to provide these more concrete details.
Fig. 1. The Extensible Framework of Resource Information
3.2 Resource Information Registry
As mentioned above, the fundamental and extensible frameworks require a unified registry that supports the publication, organization, and discovery of the description information about spatial web services in both WSDL and DAML-S formats. There are many different registry methods and standards, for example: 1) metadata registries defined by ISO standard 11179; 2) registries for software components and development; 3) Universal Description, Discovery and Integration (UDDI) registries; 4) Electronic Business XML (ebXML) registries; 5) SQL database catalogs. Although UDDI [4] is the most suitable for spatial web services among these methods and standards, it does not support semantic descriptions of services. Although agents can search the UDDI registry and retrieve service descriptions, a human needs to be involved in the loop to make sense of the descriptions and to program the access interface. In order to satisfy the needs of the extensible framework, we combine the UDDI specification and the DAML-S specification. The academic community has already paid considerable attention to this problem and produced many results, such as [5].
3.2.1 Resource Information Publication and Discovery
The resource information registry aims to facilitate the publication and discovery of potential business partners, services, and their groundings. This may or may not be
done automatically. When this discovery occurs, programmers affiliated with the business partners program their own systems to interact with the services discovered. DAML-S enables more flexible discovery by allowing searches on almost any attribute of the Service Profile. The resource information registry must offer Web-service-compliant Inquiry, Publication, Security, and Ownership API sets; please refer to [4] for more details. The Subscription and Value Set API sets are optional for the resource information registry. However, these API sets are based only on keywords and do not contain domain semantic information apart from some taxonomies. Although it is possible to publish and discover the DAML-S profile of a web service via a tModel, this approach is still immature and impractical.
3.2.2 Implementation of the Resource Information Registry
We combined the B/S (browser/server) and RPC models in the design and implementation of the resource information registry. Within the B/S framework, we employ JSP as the presentation technology, Java and JavaBeans as the realization technologies of the application logic, and JDBC with an RDBMS as the database technology. Furthermore, the Simple Object Access Protocol (SOAP) is selected to realize the RPC model. Users send request SOAP messages, encapsulated by any of various SOAP toolkits, to the target servlet, which parses the SOAP message and activates the related Beans to execute the application logic. The users then receive, parse, and use the response SOAP message encapsulated by the servlet.
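To make the described request/response cycle concrete, the following minimal sketch imitates it in Python rather than in the JSP/servlet stack above; the operation name findService, the XML element names, and the in-memory registry are illustrative assumptions, not part of the UDDI API or of the system itself.

```python
# A minimal sketch of the SOAP request/response cycle, under the assumptions
# stated above. Only the SOAP 1.1 envelope namespace is real.
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
REGISTRY = ["spatial-image-capture", "spatial-map-overlay"]  # stand-in data

def build_inquiry(keyword):
    """Client side: wrap a registry inquiry in a SOAP envelope."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    find = ET.SubElement(body, "findService")        # hypothetical operation
    ET.SubElement(find, "keyword").text = keyword
    return ET.tostring(env, encoding="unicode")

def handle_request(soap_message):
    """Server side: parse the message and run the application logic,
    playing the role of the servlet that activates the related Beans."""
    body = ET.fromstring(soap_message).find(f"{{{SOAP_NS}}}Body")
    keyword = body.find("findService").findtext("keyword")
    matches = [s for s in REGISTRY if keyword in s]  # stand-in for JDBC query
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    resp = ET.SubElement(ET.SubElement(env, f"{{{SOAP_NS}}}Body"),
                         "findServiceResponse")
    for s in matches:
        ET.SubElement(resp, "service").text = s
    return ET.tostring(env, encoding="unicode")

print(handle_request(build_inquiry("spatial")))
```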
4 Conclusions
In this paper, we have presented the philosophy of SIG and analyzed the resource information management problem of SIG from the aspects of framework, resource information description, and resource information registry. Furthermore, we have proposed a design solution for the resource information registry. In future work, we will pay more attention to importing domain semantics into the resource information registry.
References
1. Ian Foster, Carl Kesselman, Steven Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. Intl. J. Supercomputer Applications, 2001
2. Ian Foster, Carl Kesselman, Jeffrey M. Nick, et al.: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Draft paper, 2002
3. Ankolenkar, A., Burstein, M., Hobbs, J.R.: DAML-S: Web Service Description for the Semantic Web. The First International Semantic Web Conference (ISWC), June 2002
4. Barbara McKee, Dave Ehnebuske: UDDI 2.0 API Specification, http://www.uddi.org/pubs/ProgrammersAPI-V2.00-Open-20010608.pdf, June 2001
5. Paolucci, M., Kawamura, T., Payne, T.R., Sycara, K.: Importing the Semantic Web in UDDI. Web Services, E-Business and Semantic Web Workshop, 2002
An Overview of CORBA-Based Load Balancing*
Jian Shu1, Linlan Liu1, and Shaowen Song2
1 Department of Computer Science, Nanchang Institute of Aero-Technology, Nanchang, Jiangxi, P.R. China 330034
[email protected]
2 Department of Physics and Computing, Wilfrid Laurier University, Waterloo, ON, Canada N2L 3C5
[email protected]
Abstract. The Common Object Request Broker Architecture (CORBA)-compliant load balancing mechanism has caught the attention of many researchers in recent years, since load balancing is an issue that must be addressed in the CORBA architecture. In this paper, we provide an overview of CORBA-based load balancing mechanisms. We discuss their requirements, policies, and strategies, as well as their pros and cons. We address these issues from the aspects of design, load monitoring, load balancing algorithms, and load migration.
1 Introduction
The rapid growth of data management and processing in distributed computing environments (DCEs), including Internet-based distributed computing, has resulted in a tremendous number of computing tasks in such systems. In many circumstances, the loads on the nodes are not well distributed, which leads to the overloading of some processors while other nodes may well be idle. This situation results in low system performance and throughput. In order to improve the overall performance and increase the system throughput, a popular and cost-effective approach is to employ some kind of load-balancing mechanism in distributed systems. Load balancing algorithms can reallocate the workload based on system state information so that each node in the system gets an approximately equal amount of work. In general, load balancing can be implemented in three configurations [1]: 1) network-based load balancing, implemented via DNS (Domain Name Service), a proxy server, or an address-switching gateway; 2) operating-system-based load balancing, in which workload is redistributed through clustering, load sharing, or process migration mechanisms; 3) middleware-based load balancing, an object-oriented approach in which the computing for load balancing can be performed at any node in the system; CORBA is one of the major software architectures of this kind. The load balancing discussed in this paper is middleware-based.
* The paper is supported by Jiangxi Nature Science Funding and Jiangxi Testing and Control Funding.
This paper is organized into six sections. Following this introduction, the requirements of CORBA-based load balancing are discussed in Section 2. Load monitoring, load balancing algorithms, and load migration are addressed in Sections 3, 4, and 5, respectively. Finally, we present some concluding remarks in Section 6.
2 The Requirements of CORBA-Based Load Balancing
CORBA was proposed by the OMG in 1989. It offers a complete object-oriented, interoperable, platform-independent environment. Load balancing implemented in CORBA should satisfy the following requirements [2]:
Object-oriented: Because CORBA is completely object-oriented, a load balancing mechanism based on CORBA should comply with object-oriented conventions.
Low latency: Latencies occur in many situations, such as in the responses for collecting information on objects, clients, and servers, as well as in communication among objects. A satisfactory load balancing mechanism should minimize these latencies in order to preserve the performance of the system.
Transparent: A CORBA load balancing service should remain transparent; that is, it has to interfere as little as possible with the execution of client and server applications, and it should work with existing clients and servers without requiring changes.
Heterogeneous: CORBA supports heterogeneous computing architectures; therefore, load balancing in CORBA has to be able to deal with heterogeneous loads and hosts. Also, different types of applications may have different definitions of workload, so CORBA load balancing should be able to evaluate different loads without placing any restrictions on the clients. This is also a requirement of transparency.
Scalable: Load balancing in CORBA should maximize system scalability by fully utilizing the resources of group servers whose resources have not been used efficiently. The information should be accessible from any host. For the sake of scalability, CORBA load balancing should depend neither on extensions to GIOP/IIOP nor on extensions to an ORB.
Fair: The balancing algorithm should be as fair as possible to all objects.
Adaptive: The load-balancing service should allow new replicas to be added to or removed from the system. These changes also change the whole system, and CORBA load balancing should adapt to them automatically.
Request independent: There are different request patterns from the clients. Some requests arrive at deterministic or stochastic rates with the duration of execution known; others are dynamic, with the duration unknown before execution. A good load balancing service should support these various client request patterns, with no or minimal limitations on the clients.
3 Load Monitoring
Workload monitoring collects load information from a distributed system. The data collected are used to evaluate the load distribution and determine whether the workloads are out of balance. As mentioned in Section 1, the load balancing service discussed in this paper is middleware-based. Usually, there are two ways to monitor middleware [3]: 1) interception, in which the monitor intercepts the middleware data and analyzes the contents; this approach can increase transparency; and 2) integration, in which hooks are added into the middleware itself to inform the monitoring runtime of certain events; this approach impacts the transparency of load balancing. In ORB-based load balancing, load monitoring can be achieved either by integrating the function into the ORB itself or by adding hooks to it. For example, in [4], the load balancing service is implemented by replacing the default locator with one that has load balancing functionality. An ORB-based monitor is also presented in [5], in which hooks are added into the ORB to intercept invoke calls. The advantage of the ORB-based approach is that the communication overhead is reduced; the disadvantage is that the load distribution depends on a specific ORB, so transparency is impacted. In the service-based scenario, load monitoring is usually fulfilled by the naming service or the event service. If it is naming-service-based, each servant object registers with the naming service. Every request has to obtain an object reference via the naming service; that is, it asks the naming service for a reference to the service. Therefore, monitoring can be done by supervising the requests. A CORBA naming-service-based monitor is presented in [6], while [7] and IONA's Orbix are based on an extended naming service. When event-service-based monitoring is used, the CORBA event service defines a framework for decoupled and asynchronous message passing between distributed objects [8]. In this case, an event channel is established that is responsible for the communication between consumer objects (clients) and supplier objects (servers); the load is monitored by supervising the event channel. An example of using an extended CORBA event service to achieve load monitoring is presented in [9]. The information collected during load monitoring is used for load evaluation. Usually, a threshold policy is adopted to determine the status of the load distribution, with the thresholds expressed in units of load [10]. When the workload exceeds its threshold, a load-balancing algorithm is triggered in order to re-balance the workload among the stations, as sketched below.
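As an illustration of this threshold policy, the sketch below (in Python, with invented node names and threshold values) takes the loads gathered by the monitor and reports the nodes whose load exceeds the threshold, which is the event that triggers re-balancing.

```python
# Threshold-based load evaluation: loads are expressed in abstract load units;
# node names and numbers are illustrative assumptions.
def exceeded(loads, thresholds):
    """Return the nodes whose load is above their threshold."""
    return [n for n, load in loads.items() if load > thresholds[n]]

loads = {"A": 12, "B": 3, "C": 9}           # collected by the monitor
thresholds = {"A": 10, "B": 10, "C": 10}    # per-node thresholds
overloaded = exceeded(loads, thresholds)
if overloaded:                              # triggers the balancing algorithm
    print("rebalance:", overloaded)
```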
4 Load Balancing Algorithms
When designing load-balancing algorithms, many factors, such as the load distribution area and the control mode, need to be considered. In this section we discuss the policies and strategies [10] that should be taken into account in load balancing algorithms.
Global algorithm or local algorithm. In local algorithms, object loads are redistributed among hosts in the neighborhood when the workload related to a certain object exceeds its threshold. In global algorithms, object loads are redistributed over the entire system rather than only within the neighborhood; the objective in this case is to balance the load of the whole system after computing the loads on the entire system.
Centralized algorithm or decentralized algorithm. In centralized algorithms, there is a central controller that is responsible for collecting system status information and performing the load balancing; the implementation presented in [1] falls into this category. In decentralized algorithms, there is no central load balancer, and the load balancing mechanism is distributed; the implementation presented in [9] falls into this category.
Adaptive algorithm or non-adaptive algorithm. In adaptive algorithms, load-balancing policies vary with changes of the system state. When an imbalance occurs, the load balancer reacts to the load change, for instance by requesting an overloaded replica to redirect clients back to it and then redirecting those clients to a less loaded replica. In non-adaptive algorithms, load balancing policies remain unchanged when the system state changes.
Selection policy. In order to balance the load, the load balancer has to direct a suitable request to a proper object. The simplest way is to select the newly generated request that causes the host to become overloaded, because the overhead of transferring this kind of request is relatively low. Another approach, such as the one mentioned in [1], redirects a running request to a proper replica.
Profitability policy. Let $\phi(t)$ denote the imbalance factor at time $t$, and let $L_b$ denote the balancing overhead. $\phi(t)$ is a function of the difference between the maximum object load before and after load balancing. Therefore, when $\phi(t)$ is greater than $L_b$, we consider load balancing to be profitable.
Location policy. This policy determines which object or replica is suitable to take over a client request when an imbalance happens. A popular method of finding a suitable object is polling: with local polling, only objects on neighboring hosts are candidates; with global polling, all objects in the system are candidates. Another way is to broadcast a query to see whether any object or replica is available for load sharing.
Load balancing algorithms are usually studied with graph coloring methods. Generally speaking, there are two kinds of load balancing algorithms: nearest-neighbor and direct approaches [10]. The nearest-neighbor approach is based on successive approximation, exchanging load among neighboring hosts to achieve a system-wide load redistribution; the challenge of this method is to design a fast iterative scheme. The direct approach exchanges load after it identifies senders and receivers; the challenge of this method is to match potential senders and receivers quickly. Some common load balancing algorithms, such as diffusion, dimension exchange, and gradient methods, are addressed in detail in [10]; a small sketch of diffusion follows.
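For concreteness, here is a minimal sketch of the diffusion algorithm named above, a nearest-neighbor method in which every host repeatedly moves a fixed fraction of the load difference to or from each neighbor; the diffusion parameter, the ring topology, and the initial loads are illustrative assumptions.

```python
# Diffusion load balancing: iterative local averaging with the neighbors.
def diffuse(loads, neighbors, alpha=0.25, rounds=50):
    """Run `rounds` diffusion sweeps and return the resulting load map."""
    for _ in range(rounds):
        new = dict(loads)
        for i in loads:
            for j in neighbors[i]:
                new[i] += alpha * (loads[j] - loads[i])
        loads = new
    return loads

# Four hosts on a ring; all load initially sits on host 0.
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(diffuse({0: 40.0, 1: 0.0, 2: 0.0, 3: 0.0}, neighbors))  # ~10.0 each
```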
5 Load Migration
Load migration transfers extra load from an overloaded node to a proper destination. There are two issues that need to be dealt with in this procedure: initiation and granularity.
Initiation: Usually, there are two methods of initiating the migration [10]: sender-initiated and receiver-initiated. In sender-initiated migration, the overloaded party is in charge of searching for a potential object or server replica. There are two widely used approaches for the sender to select its destination: one selects randomly and the other sequentially. Fig. 1 shows a sender-initiated migration scenario. In the
case of receiver-initiated migration, the underloaded party is responsible for selecting overloaded nodes and accepting their extra load. Once the load on a receiver drops below a certain amount, it sends requests to other objects to see whether there is any extra load to receive. In some cases a tradeoff approach, symmetrically-initiated migration, is employed, which is a combination of the sender-initiated and receiver-initiated methods.
Fig. 1. Sender-initiated migration
Granularity: The granularity of migration falls into two categories: strong migration and weak migration. Strong migration means that both the entity and its state are transferred to the destination together; weak migration means that only the executable part and some related data are transferred. When designing a migration mechanism, two important factors should be taken into account: atomicity and consistency. Atomicity means that the migrated request can be restored if the transfer fails. Consistency means that the execution of the migrated request on the destination should remain unchanged compared with execution on the original server. A sketch of the sender-initiated case follows.
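The following sketch illustrates the sender-initiated case with the two destination-selection approaches mentioned above; the candidate list, the threshold, and all names are illustrative assumptions.

```python
# Sender-initiated migration: the overloaded party polls candidate receivers.
import random

def pick_receiver(candidates, threshold, sequential=True):
    """Return a destination for the extra request, or None if nobody fits."""
    order = candidates if sequential else random.sample(candidates, len(candidates))
    for node, load in order:
        if load < threshold:       # receiver still has spare capacity
            return node
    return None

candidates = [("h1", 9), ("h2", 4), ("h3", 7)]
print(pick_receiver(candidates, threshold=8))   # "h2"
```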
6 Concluding Remarks
CORBA is a prominent software architecture. It has been used in many applications and has shown great benefits. In this paper, we provided an overview of CORBA-compliant load balancing mechanisms, including their general requirements, load monitoring and evaluation, load balancing algorithms, and load migration.
Although much research has accumulated in the literature, CORBA-based load balancing is still at the development stage.
References
1. Ossama Othman, Carlos O'Ryan, and Douglas C. Schmidt: An Efficient Adaptive Load Balancing Service for CORBA, http://www.cs.wustl.edu/~schmidt/PDF/load_balancing1.pdf
2. Ossama Othman, Carlos O'Ryan, and Douglas C. Schmidt: The Design of an Adaptive CORBA Load Balancing Service, IEEE Distributed Systems Online, vol. 2 (2001)
3. Erik Putrycz, Guy Bernard: Client Side Reconfiguration on Software Components for Load Balancing, In Proc. International Workshop on Distributed Dynamic Multiservice Architecture, in conjunction with IEEE International Conference on Distributed Computing Systems (ICDCS'2001), Phoenix, Arizona (2001) 16-19
4. Gebauer, C.: Load Balancer LB: a CORBA Component for Load Balancing, Diploma thesis, University of Frankfurt (1997)
5. C. Grant, P. Merle, and J. Geib: Goodewatch: Supervision of CORBA Applications. In ECOOP'99 Workshop, Lisbon, Portugal (1999) 14-18
6. E. Damiani: An Intelligent Load Distribution System for CORBA-compliant Distributed Environments, 1999 IEEE International Fuzzy Systems Conference Proceedings, Seoul, Korea (1999) 22-25
7. T. Barth, G. Flender, B. Freisleben, F. Thilo: Load Distribution in a CORBA Environment, Proceedings of the International Symposium on Distributed Objects and Applications (DOA'99), Edinburgh, Scotland (1999) 158-166
8. Object Management Group: CORBAservices: Common Object Services Specification, 1997
9. Kei Shiu Ho, Hong Va Leong: An Extended CORBA Event Service with Support for Load Balancing and Fault-Tolerance, International Symposium on Distributed Objects and Applications, Antwerp, Belgium (2000) 49
10. Jie Wu: Distributed System Design, CRC Press (1999)
Intelligence Balancing for Communication Data Management in Grid Computing
Jong Sik Lee
School of Computer Science and Engineering, Inha University, Incheon 402-751, South Korea
[email protected]
Abstract. This paper describes the design and development of intelligence balancing for communication data management in grid computing. Intelligence balancing makes it possible to execute a complex large-scale grid computing system and to share dispersed data assets collaboratively. After a brief review of communication data management, we discuss the design issues of intelligence balancing. We analyze system performance and scalability, comparing non-intelligence and intelligence-based configurations. The empirical results on a heterogeneous-OS distributed system indicate that intelligence balancing is effective and scalable thanks to its use of communication data reduction.
1 Introduction
A popular trend, executing complex and large-scale distributed systems with reasonable computation and communication resources, has focused attention on grid computing; indeed, distributed computing can be seen as a subset of grid computing. Grid computing approaches are being applied to a growing variety of systems, including process control and manufacturing, military command and control, transportation management, and so on. In order to provide a reliable answer in reasonable time with limited communication and computation resources, a methodology for reducing interactive data transmission is required in a grid computing environment. In this paper, we propose intelligence balancing for communication data management to promote the effective reduction of data communication in a grid computing environment. This paper is organized as follows: Section 2 reviews communication data management [1], [2], [3]. Section 3 discusses intelligence balancing for communication data management, with satellite cluster management as an application, and analyzes performance effectiveness and system scalability. Section 4 illustrates a testbed for experiments and evaluates system performance. Section 5 concludes.
2 Communication Data Management
A large-scale grid computing system requires real-time linkage among multiple systems, and thus has to carry out complex large-scale executions and to share dispersed data assets and computing resources collaboratively. The methodology that supports the reduction of interactive messages among grid computing components is called "communication data management." The goal of such a data traffic reduction scheme is to perform large-scale grid computing with reasonable communication and computation resources. To perform a data traffic reduction scheme reliably, flexibility and efficiency are required. Flexibility does not denote anything specific to a particular problem domain or technology, but rather being general in nature. Efficiency requires that grid computing scale from very small to very large along many dimensions, including the number of grid component objects, the complexity of interactions, the fidelity of representations, and the computational and network resources.
3 Intelligence Balancing and Performance Analysis
To improve the performance of a complex large-scale grid computing system, we propose intelligence balancing for communication data management, which reduces the required transmission data by providing intelligence to each transmission-related component. The proposed scheme distributes intelligence to each grid component from a centralized intelligence-related component. Intelligence balancing reduces the system execution cost by reducing the communication cost among components through separated small intelligences. Intelligence here means that each component operates on its own decisions in addition to its basic operations. The scheme increases system modeling flexibility, and it improves system performance and scalability for a large-scale grid computing system through communication data reduction and computation synchronization. Note that a component with intelligence needs more computation resources; however, the reduction in required transmission data improves total system performance despite the growth in local computation. In this paper, we apply intelligence balancing to a satellite cluster management system [4], [5]. We introduce the ground system operation as a case study to discuss the non-intelligence and intelligence approaches and to evaluate system performance. The intelligence approach separates the ground functions and distributes a set of functions to the spacecraft; performing such a set of functions requires intelligence on the spacecraft. Here, we classify the degree of intelligence as low or high. In the case of low intelligence, the ground station divides the region to be observed into four parts, makes four different command strings, and sends them to a cluster manager. The cluster manager parses the command strings and forwards them to the proper spacecraft; this parsing and forwarding characterizes the low intelligence of the cluster manager. In the case of high intelligence, the ground station does not divide the region to be observed and sends the total region to the cluster manager. The cluster manager must then include the
intelligence for dividing the region to be observed. This division intelligence must understand the relevant technology, including region division, image capturing, image visualization, image data transmission, and so on. To analyze the system performance of intelligence balancing, we measure the amount of satellite transmission data required between the ground station and the spacecraft; transmission data among the spacecraft inside the cluster is ignored. As Table 1 shows, the intelligence approach significantly reduces the number of bits passed. Basically, overhead bits (H) are needed for satellite communication whenever the ground station sends a command. The non-intelligence approach incurs a large number of overhead bits, since it makes the ground station send messages to the spacecraft individually. High intelligence significantly reduces the transmitted data bits, since it transmits the location information of one big region irrespective of the number of spacecraft (N) in the cluster; it requires the same low number of transmission data bits regardless of N, as the sketch below illustrates.
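Reading the prose as the simplest possible message model, the following sketch contrasts the transmission bits of the non-intelligence and high-intelligence approaches; the formulas and the values of the overhead H and the region-description size R are our assumptions, not figures taken from Table 1.

```python
# Back-of-the-envelope comparison of transmitted bits, under the assumptions
# stated above: H overhead bits per command plus R bits of region description.
def non_intelligence_bits(N, H, R):
    """The ground station addresses each of the N spacecraft individually."""
    return N * (H + R)

def high_intelligence_bits(N, H, R):
    """One command carries the whole region; the cluster manager divides it."""
    return H + R

for N in (4, 8, 16):
    print(N, non_intelligence_bits(N, H=64, R=256),
             high_intelligence_bits(N, H=64, R=256))
```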
4 Experiment and Result
A cluster of four spacecraft flies on pre-scheduled orbits. One of the spacecraft acts as the cluster manager, which communicates with the ground station; it aggregates the data collected from the other spacecraft and sends them back to the ground station. For the platform setting, we developed an HLA [6]-compliant heterogeneous distributed system encompassing various operating systems, including SGI Unix, Linux, Sun Unix, and Windows. A total of five federates are allocated to five machines, respectively, connected via a 10Base-T Ethernet network. As Table 2 shows, the intelligence approach clearly reduces the transmitted data bits; the high intelligence approach reduces them the most.
The high intelligence approach allows an execution that requires a small number of transmission data bits regardless of the number of satellites. The system execution time reflects both communication and computation performance. The non-intelligence approach requires a large amount of communication data; however, it does not need local computation for intelligence. The intelligence approach reduces the amount of communication data but spends operations on intelligence, so its system execution time is determined by both the data communication time and the intelligence operation time. The observed reduction indicates that the time saved by transmission data reduction is greater than the time spent on intelligence operations.
5 Conclusion
This paper presented the design and development of the intelligence balancing scheme for communication data management in a grid computing system. The proposed scheme focuses on balancing intelligence across the grid components and on the various degrees of intelligence. The scheme also implies the conversion from a conventional passive component, which processes given commands, to an intelligent component, which acts on its own decisions. These intelligence balancing and intelligent component concepts allow various complex executions for a variety of grid computing systems and improve system performance through data communication reduction and computation load balancing. The empirical results showed a favorable reduction of communication data and overall execution time and proved the usefulness of intelligence balancing for communication data management in a grid computing system.
References
1. Boukerche, A., Roy, A.: A Dynamic Grid-Based Multicast Algorithm for Data Distribution Management. IEEE Distributed Simulation and Real Time Application, 2000
2. Gary Tan et al.: A Hybrid Approach to Data Distribution Management. 4th IEEE Distributed Simulation and Real Time Application, August 2000
3. Katherine L. Morse et al.: Multicast Grouping for Dynamic Data Distribution Management. Proceedings of SCSC99, 1999
4. Zetocha, P.: Intelligent Agent Architecture for Onboard Executive Satellite Control, Intelligent Automation and Control. TSI Press Series on Intelligent Automation and Soft Computing, vol. 9, Albuquerque, N.M., pp. 27–32, 2000
5. Surka, D.M., Brito, M.C., Harvey, C.G.: Development of the Real-Time Object-Agent Flight Software Architecture for Distributed Satellite Systems. IEEE Aerospace Conf., IEEE Press, Piscataway, N.J., 2001
6. High Level Architecture Run-Time Infrastructure Programmer's Guide 1.3 Version 3, DMSO, 1998
On Mapping and Scheduling Tasks with Synchronization on Clusters of Machines
Bassel R. Arafeh
Department of Computer Science, Sultan Qaboos University, Muscat, Oman
[email protected]
Abstract. In this work, a two-step approach is adopted for scheduling tasks with synchronous communication. To that end, an efficient algorithm, called GLB-Synch, is introduced for mapping clusters and ordering tasks on processors in one integrated step. The algorithm uses the information obtained during the clustering step to select a cluster to be mapped to the least loaded processor. A performance study of the GLB-Synch algorithm has been conducted by simulation. We show by analysis and experimentation that the GLB-Synch algorithm retains the same low complexity cost as the first step, clustering.
1 Introduction
In the era of low-cost commodity hardware, cluster and Grid computing is emerging as an alternative for cost-effective massively parallel processing [2]. However, the communication cost over the available networking facilities is still very high compared to the computation cost. Besides, the available networks lack the reliability found in typical multiprocessor interconnection networks. To overcome this deficiency, synchronous message-passing communication may need to be enforced for many parallel applications and middleware. In general, synchronous communication adds overhead to the already high cost of communication and may introduce deadlocks. Scheduling tasks efficiently on distributed-memory architectures is still a challenging problem. A multi-step scheduling approach has been proposed by many researchers, such as Sarkar [6] and Liou and Palis [4], in order to reduce the complexity of the scheduling problem by using low-cost heuristics. In this work, we adopt a two-step approach for scheduling synchronous parallel programs on distributed-memory architectures, called Guided Load Balancing with Synchronization (GLB-Synch). In the first step, the tasks of the program are clustered to reduce the communication cost and to avoid deadlocks, assuming an unbounded number of processors. In the second step, clusters are mapped and their tasks are ordered for execution on the available number of processors. The rest of the paper is organized as follows. The next section introduces the background. Section 3 presents the GLB-Synch algorithm, while Section 4 discusses the performance study. Finally, Section 5 is the conclusion.
2 Background and Related Work
In this work, a parallel program is modeled as a weighted directed acyclic graph (DAG) $G = (V, E, W, C)$, where $V$ is the set of task nodes, $E$ is the set of communication edges, $W$ is the set of task computation weights, and $C$ is the set of edge communication costs. The length of a path is defined as the sum of all computation weights of nodes and all communication costs of edges along the path. The critical path of a DAG is the path from an entry node to an exit node that has the maximum length. The program computation-to-communication ratio (PCCR) of a parallel program is defined as its average computation weight divided by its average communication cost. The execution behavior of the program DAG follows the macro-dataflow model; the execution of each task consists of three phases: receive, compute, and send. In synchronous communication, the sender is blocked until an acknowledgement is received from the receiver; this waiting time is called the blocking delay. A direct deadlock situation between two clusters occurs due to a cyclic dependency relation between them; in general, a deadlock may arise from a chain of dependency relations among a subset of tasks. In this work, we consider direct deadlock situations only. Based on the task execution phases, the following time parameters characterize the scheduling of a task node $v_i$ in a scheduled DAG with synchronous communication:
s_receive(v_i): start time for receiving messages by task node $v_i$;
e_receive(v_i): end time for receiving messages by task node $v_i$;
s_compute(v_i): start time for computation by task node $v_i$;
e_compute(v_i): end time for computation by task node $v_i$;
s_send(v_i): start time for sending messages by task node $v_i$;
e_send(v_i): end time for sending messages by task node $v_i$.
The issue of scheduling on distributed-memory parallel architectures with synchronous communication has not been given enough attention in the literature; however, the work of Kadamuddi and Tsai [3] and Arafeh [1] addresses it, assuming a multi-step scheduling approach. This work uses the clustering algorithm NLC-SynchCom of Arafeh [1] for the task-clustering step. The complexity cost of the NLC-SynchCom algorithm is a low-order polynomial in $v$ and $e$, where $v$ is the number of nodes and $e$ is the number of edges in a DAG [1].
3 Cluster-Mapping and Scheduling with Synchronous Communication
The GLB-Synch algorithm is an extension of Radulescu's GLB algorithm [5] for mapping clusters to processors. However, the GLB-Synch algorithm performs both cluster-mapping and task-ordering in a single step in the context of synchronous communication. The algorithm uses the information obtained during the clustering step, based on the NLC-SynchCom algorithm, for mapping clusters to processors. The NLC-SynchCom algorithm schedules the tasks on virtual processors; we refer to the result as the UNC schedule. The GLB-Synch algorithm uses the cluster start time, st(C), to represent the priority of a cluster C for mapping. Let rt(p) denote the ready time of processor p on a partial schedule. It is initialized to zero, and it is defined as
the end time of the last task scheduled on that processor. Accordingly, a task $v_i$ is scheduled for execution at rt(p) if its current start time for the computation phase, s_compute(v_i), is less than or equal to rt(p); otherwise, the task is scheduled to start its execution at its designated time. The time complexity of the GLB-Synch algorithm is a low-order polynomial in p and c, where p is the number of processors and c is the number of clusters. The GLB-Synch algorithm is described next, and a condensed sketch of its mapping loop follows the listing.
GLB-Synch Algorithm
Input: 1. A clustered DAG. 2. The table of time parameters (i.e., the UNC schedule). 3. The number of processors.
Output: 1. A mapping of the clusters to the processors. 2. The tasks' schedule on the processors.
Algorithm Steps:
1. Compute the start time, st(C), and the workload, W(C), for each cluster C.
2. Sort the clusters in increasing order of st(C), breaking ties by choosing the cluster with the highest workload.
3. For each cluster C: map C to a processor p with the least workload; zero the inter-cluster communication cost between C and any clusters currently mapped to p; update the workload of processor p: W(p) = W(p) + W(C).
4. Update the UNC schedule according to the mapping step.
5. For each processor p: perform task-ordering for all tasks mapped to p based on the s_compute time, breaking ties by choosing the task with the highest blocking delay; for each task mapped to processor p, set its start time according to the processor ready time rt(p), as described above.
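The following condensed sketch covers only the cluster-mapping loop (steps 2 and 3 of the listing): clusters are sorted by increasing start time with ties broken toward the higher workload, and each cluster goes to the currently least-loaded processor. The zeroing of inter-cluster communication costs and the task-ordering of step 5 are omitted, and the data layout is an illustrative assumption.

```python
# GLB-Synch mapping loop, simplified as described above.
import heapq

def glb_synch_map(clusters, num_procs):
    """clusters: list of (cluster_id, st, w). Returns {cluster_id: processor}."""
    ordered = sorted(clusters, key=lambda c: (c[1], -c[2]))   # step 2
    heap = [(0.0, p) for p in range(num_procs)]               # (W(p), p)
    heapq.heapify(heap)
    mapping = {}
    for cid, st, w in ordered:                                # step 3
        load, p = heapq.heappop(heap)        # least-loaded processor
        mapping[cid] = p
        heapq.heappush(heap, (load + w, p))  # W(p) = W(p) + W(C)
    return mapping

clusters = [("C0", 0.0, 7.0), ("C1", 0.0, 5.0), ("C2", 2.0, 4.0)]
print(glb_synch_map(clusters, num_procs=2))  # e.g. {'C0': 0, 'C1': 1, 'C2': 1}
```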
Fig. 1. Average execution time of the clustering step with respect to the DAG size
Fig. 2. Average execution time of the GLB-Synch algorithm with respect to the DAG size
4 Performance Study
A performance study has been conducted based on the NLC-SynchCom algorithm for the clustering step and the GLB-Synch algorithm for the cluster-mapping and task-ordering step. The objectives of the performance study are to assess the cost of the multi-step scheduling approach in the context of synchronous communication, to compare the costs of the two algorithms used, and to assess the speedup achieved versus the number of processors. The performance study used synthesized DAGs for simulation and experimentation; a random graph generator was implemented to generate weighted DAGs with various characteristics controlled by several factors.
Fig. 3. Average speedup versus the number of processors
Figures 1 and 2 show the average execution time for the clustering step and for the mapping and ordering step, respectively, versus the number of nodes. Three cases are considered for each scheduling step, based on the shape factor of the DAG. From the plots, it can be deduced that the cost of the mapping and ordering step is, in general, much less than the cost of clustering; it does not exceed 25% of the cost of clustering for DAG sizes greater than 100 nodes. Figure 3 shows the average speedup results versus the number of processors for three different DAG sizes. It is clear that the speedup achieved using the GLB-Synch algorithm does not scale linearly with the increase in the number of processors.
5 Conclusion
This work has introduced a low-cost algorithm, called GLB-Synch, for mapping and ordering in the context of synchronous communication. The simulation results show that a multi-step schedule using the NLC-SynchCom algorithm for clustering and the GLB-Synch algorithm for mapping and ordering retains the same low complexity cost for both steps. However, the performance study has shown limited speedup gains over different DAG shape factors. Therefore, further improvements in clustering and mapping techniques are needed to achieve high performance.
References
1. Arafeh, B.: Non-linear Clustering Algorithm for Scheduling Parallel Programs with Synchronous Communication on NOWs. International Journal of Computers and Their Applications, Vol. 10, no. 2 (2003) 103-114
2. Foster, I., Kesselman, C.: Globus: A Metacomputing Infrastructure Toolkit. The International Journal of Supercomputing Applications and High Performance Computing, Vol. 11, no. 2 (1997) 115-128
3. Kadamuddi, D., Tsai, J.: Clustering Algorithm for Parallelizing Software Systems in Multiprocessors. IEEE Transactions on Software Engineering, Vol. 26, no. 4 (2000) 340-361
4. Liou, J.-C., Palis, M.A.: A Comparison of General Approaches to Multiprocessor Scheduling. Proceedings of the International Parallel Processing Symposium (1997) 152-156
5. Radulescu, A.: Compile-Time Scheduling for Distributed-Memory Systems. Ph.D. thesis, Faculty of Information Technology and Systems, Delft University of Technology, Delft, The Netherlands (2001)
6. Sarkar, V.: Partitioning and Scheduling Programs for Execution on Multiprocessors. MIT Press, Cambridge, MA (1989)
An Efficient Load Balancing Algorithm on Distributed Networks
Okbin Lee1, Sangho Lee1, and Ilyong Chung2
1 Dept. of Computer Science, Chung-buk University, Cheongju, Korea
[email protected], [email protected]
2 Dept. of Computer Science, Chosun University, Kwangju, Korea
[email protected]
Abstract. In order to maintain load balancing in a distributed system, we should obtain workload information from all the nodes on the network. Doing this directly requires $O(v^2)$ communication complexity, where $v$ is the number of nodes. In this paper, we present a new synchronous dynamic distributed load balancing algorithm based on a $(v, k+1, 1)$-design, where $v = k^2 + k + 1$. Our algorithm needs only $O(v\sqrt{v})$ communication complexity, and each node receives workload information from all the nodes without redundancy. Load balancing is maintained so that every node bears the same amount of traffic for transferring workload information.
1 Introduction
In a distributed system it is desirable that the workload be balanced between processors, so that processor utilization can be increased and response time reduced. A load balancing scheme [1]-[2] determines whether a task should be executed locally or by a remote processor. This decision can be made in a centralized or distributed manner; in a distributed system, the distributed manner is recommended. In order to make this decision, each node must be informed about the workload of the other nodes, and this information should be recent, because outdated information may cause an inconsistent view of the system state. Disseminating load information may therefore incur a high link cost or significant communication traffic overhead. For example, the ARPANET [3] routing algorithm is a distributed adaptive algorithm using estimated delay as the performance criterion and a version of the backward-search algorithm [4]. In this algorithm, each node maintains a delay vector and a successor node vector; periodically, each node exchanges its delay vector with all of its neighbors and, on the basis of all incoming delay vectors, updates both of its vectors. In order to collect global information in a completely connected graph, a one-round message interchange is required, and the traffic overhead for exchanging workload information is $O(v^2)$, where $v$ is the number of nodes. The CWA (Cube Walking Algorithm) [5] is employed for load balancing on hypercube networks; it requires an $O(\log v)$-round message interchange, and its communication complexity is
$O(v \log v)$ [6]-[7]. SBN (Symmetric Broadcast Networks) provides communication patterns among nodes in a topology-independent manner; it also needs an $O(\log v)$-round message interchange, and its communication complexity is $O(v \log v)$ [8]-[9]. In this paper we design an algorithm for exchanging workload information among distributed nodes. In this algorithm, each node receives information from $k$ nodes and, at the same time, sends information to $k$ other nodes periodically, where $v = k^2 + k + 1$. Thus each node receives workload information for all $v$ nodes with a two-round message interchange. Our algorithm needs only $O(v\sqrt{v})$ communication complexity, and each node bears an equal share of the traffic overhead.
2 About the (v, k+1, 1)-Design
In order to exchange workload information, we use a $(v, k+1, 1)$-design. Let $V = \{0, 1, \dots, v-1\}$ be a set of $v$ elements, and let $B = \{B_0, B_1, \dots, B_{b-1}\}$ be a set of $b$ blocks, where each $B_i$ is a subset of $V$. A finite incidence structure $\sigma = (V, B)$ is a balanced incomplete block design (BIBD) [11], called a $(b, v, r, k, \lambda)$-design, if it satisfies the following conditions:
1. $B$ is a collection of $k$-subsets of $V$, and these subsets are called the blocks.
2. Each element of $V$ appears in exactly $r$ of the $b$ blocks.
3. Every two elements of $V$ appear simultaneously in exactly $\lambda$ of the $b$ blocks.
4. For a $(b, v, r, k, \lambda)$-design, if $b = v$ and $r = k$, then it is a symmetric balanced incomplete block design (SBIBD) [10], called a $(v, k, \lambda)$-design.
There are some relations among the parameters, such as $bk = vr$ and $\lambda(v-1) = r(k-1)$. We have suggested an algorithm for generating a $(v, k+1, 1)$-design [11]. For example, a $(7, 3, 1)$-design can be constructed as in the sketch below.
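The sketch below builds one standard $(7, 3, 1)$-design, the cyclic construction from the perfect difference set $\{0, 1, 3\}$ modulo 7, and verifies the BIBD conditions; the paper's own generation algorithm [11] is not reproduced here.

```python
# Construct and verify a (7, 3, 1)-design (the Fano plane).
from itertools import combinations
from collections import Counter

v, k, lam = 7, 3, 1
blocks = [[(d + i) % v for d in (0, 1, 3)] for i in range(v)]  # B_0 .. B_6

# Each element appears in exactly r blocks; here b = v and r = k (an SBIBD).
occurrences = Counter(x for b in blocks for x in b)
assert all(occurrences[x] == k for x in range(v))

# Every pair of elements appears together in exactly lam = 1 block.
pairs = Counter(frozenset(p) for b in blocks for p in combinations(b, 2))
assert all(pairs[frozenset(p)] == lam for p in combinations(range(v), 2))

print(blocks)   # [[0, 1, 3], [1, 2, 4], [2, 3, 5], ...]
```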
3 Design of an Efficient Load Balancing Algorithm on the (v, k+1, 1)-Design
In this chapter, an efficient load balancing algorithm is constructed on the $(v, k+1, 1)$-design. The notation in our algorithm is as follows: $x_j$ denotes the $j$-th element of a set $X$; $X_i$ denotes the $i$-th set of a family of sets $X$; and $X^i$ denotes the $i$-th family of a collection of families of sets $X$. Let $X$ be a $(v, k+1, 1)$-design for $v = k^2 + k + 1$, and let every node $N_i$ of the network be mapped to the element $i$ of $V$.
Definition 1. Each node $N_i$ defines the index sets $D_i$ and $E_i$, which determine, respectively, the destinations of the messages it sends in steps 1 and 2 of the algorithm below.
Algorithm for the Construction of an Efficient Load Balancing Scheme
1. Every node $N_i$ sends its own workload information to the nodes indexed by $D_i$ and renews its table of workload information.
2. Every node $N_i$ sends the set of workload information collected in step 1 to the nodes indexed by $E_i$ and renews its table of workload information.
3. Repeat from the first step.
A numeric illustration of the resulting message counts is given below.
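The numeric sketch below illustrates the counting behind the algorithm for $v = k^2 + k + 1$: $k$ sources are heard from in step 1 and $k^2$ in step 2, for a total of $v - 1$, while only $2vk$ messages are exchanged per cycle, i.e., $O(v\sqrt{v})$ since $k$ is about $\sqrt{v}$. It checks sizes only; the concrete destination sets $D_i$ and $E_i$ depend on the design and are not modeled here.

```python
# Message and coverage counts for the two-round exchange, sizes only.
def counts(k):
    v = k * k + k + 1
    learned = k + k * k        # step 1 sources plus step 2 sources = v - 1
    messages = 2 * v * k       # each node sends to k destinations per round
    naive = v * (v - 1)        # everyone tells everyone directly
    return v, learned, messages, naive

for k in (2, 3, 5, 10):
    v, learned, msgs, naive = counts(k)
    print(f"v={v:4d}  learned={learned:4d} (=v-1)  msgs={msgs:6d}  naive={naive:8d}")
```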
Lemma 1. In step 1 of the algorithm, each node receives information from $k$ nodes.
Proof. In step 1, node $N_j$ sends information about itself to the nodes indexed by $D_j$, so node $N_i$ receives information from node $N_j$ whenever $i \in D_j$. By Definition 1, this condition holds for exactly $k$ distinct nodes $N_j$. Therefore node $N_i$ receives information from $k$ nodes.
Lemma 2. In step 2 of the algorithm, each node receives information about $k^2$ further nodes.
Proof. In step 2, node $N_j$ forwards the set of $k$ items of workload information collected in step 1 to the nodes indexed by $E_j$, so node $N_i$ receives $k$ forwarded sets, each of size $k$. On a $(v, k+1, 1)$-design, every two elements appear simultaneously in exactly one block, so any two of these forwarded sets are disjoint, and they are also disjoint from the set of nodes heard from in step 1. Hence each node receives information about $k \cdot k = k^2$ further nodes.
Theorem 1. According to our algorithm, each node obtains global workload information by a two-round message interchange.
Proof. In step 1, node $N_i$ receives $k$ items of information, and in step 2 it receives $k^2$ items, and the two sets of sources are disjoint. Since $v = k^2 + k + 1$, we have $k + k^2 = v - 1$. Therefore node $N_i$ receives the workload information of all $v - 1$ other nodes.
4 Conclusion
In order for the system to increase utilization and reduce response time, the workload should be balanced. In this paper, we presented an efficient load balancing algorithm based on a $(v, k+1, 1)$-design. In our algorithm, each node obtains global workload information with a two-round message interchange, and the traffic overhead for obtaining this global information is $O(v\sqrt{v})$, where $v$ is the number of nodes. Moreover, the traffic overhead is shared equally among the nodes.
References
1. M. Willebeek-LeMair and A. P. Reeves, Strategies for Dynamic Load-Balancing on Highly Parallel Computers, IEEE TPDS, vol. 4, no. 9, 1993.
2. B. A. Shirazi, Scheduling and Load Balancing in Parallel and Distributed Systems, IEEE Computer Society Press, 1995.
3. M. Padlipsky, A Perspective on the ARPANET Reference Model, Proc. of INFOCOM, IEEE, 1983.
4. L. Ford, D. Fulkerson, Flows in Networks, Princeton University Press, 1962.
5. M. Wu, On Runtime Parallel Scheduling for Processor Load Balancing, IEEE TPDS, vol. 8, no. 2, 1997.
6. K. Nam, J. Seo, Synchronous Load Balancing in Hypercube Multicomputers with Faulty Nodes, Journal of Parallel and Distributed Computing, vol. 58, 1999.
7. H. Rim, J. Jang, S. Kim, Method for Maximal Utilization of Idle Links for Fast Load Balancing, Journal of KISS, vol. 28, no. 12, 2001.
8. S. Das, D. Harvey, and R. Biswas, Adaptive Load-Balancing Algorithms Using Symmetric Broadcast Networks, Journal of Parallel and Distributed Computing, vol. 62, no. 6, 2002.
9. S. Das, D. Harvey, and R. Biswas, Parallel Processing of Adaptive Meshes with Load Balancing, IEEE TPDS, vol. 12, no. 12, 2001.
10. C. L. Liu, Block Designs, in Introduction to Combinatorial Mathematics, McGraw-Hill, 1968.
11. O. Lee, I. Chung, S. Lee, The Design of a Special Incidence Structure Satisfying the Conditions of a (v, k+1, 1)-Configuration, JAMC, vol. 12, no. 1-2, 2003.
Optimal Methods for Object Placement in En-Route Web Caching for Tree Networks and Autonomous Systems*
Keqiu Li and Hong Shen
Graduate School of Information Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Tatsunokuchi, Ishikawa 923-1292, Japan
Abstract. This paper addresses the problem of computing the locations of copies of an object to be placed among the en-route caches such that the overall cost gain is maximized for tree networks. This problem is formulated as an optimization problem, and both unconstrained and constrained cases are considered; the constrained case includes constraints on the cost gain per node and on the number of copies to be placed. Low-cost dynamic programming-based algorithms that provide optimal solutions for these cases are derived. Furthermore, based on our mathematical model, a solution to coordinated en-route web caching (CERWC) for autonomous systems (ASes) is also presented. The implementation results show that our methods outperform existing algorithms for both CERWC on a linear topology and object placement at individual nodes.
Key words: Web caching, dynamic programming, object placement, tree network, autonomous system.
1 Introduction
With the explosive growth in popularity of the World Wide Web, prompt web content delivery is becoming increasingly important. Web caching is an important technology for improving web performance, since caching objects close to users can save network bandwidth, alleviate server load, and reduce Internet access latency. An overview of web caching can be found in [2,6]. To obtain the full benefits of web caching, different architectures have been employed, such as hierarchical and distributed caching [4]. En-route caching is a caching architecture developed recently [5] in which caches are placed on the access path from the user to the server. Each en-route cache intercepts any request that passes through its associated node, and either satisfies the request by sending the requested object to the client or forwards the request upstream along the path to the server until it can be satisfied.
* This work is supported by the Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Scientific Research under its General Research Scheme (B), Grant No. 14380139.
Cooperative caching, in which
caches cooperate in serving each other's requests and making storage decisions, is a powerful paradigm for improving cache effectiveness. In this paper, we address the problem of CERWC for tree networks and ASes: computing the locations of copies of an object to be placed among the en-route caches such that the overall cost gain is maximized. Our contributions are summarized as follows: (1) We propose a novel mathematical model for CERWC for tree networks; in our model, we incorporate both object placement and replacement policies to compute the object caching locations in a coordinated way. (2) We present low-cost dynamic programming-based algorithms to solve the problem for the different cases and theoretically show the algorithms to be either optimal or convergent to the optimal solutions. (3) We extend our solution to CERWC for ASes. (4) We perform extensive simulation experiments to evaluate our model with several performance metrics. The results show that our methods outperform existing algorithms that address either CERWC for linear topology or object placement at individual nodes only. The rest of the paper is organized as follows. Section 2 formulates our mathematical model. Section 3 and Section 4 focus on unconstrained and constrained CERWC for tree networks, respectively. Section 5 concentrates on CERWC for ASes. Section 6 describes the simulation model and discusses the experimental results. Section 7 summarizes our work and concludes the paper.
2 Mathematical Model
We model the network as a tree $T = (V, E)$, where $V$ is the set of nodes, each of which is associated with an en-route cache, and $E$ is the set of network links. Figure 1 shows an example of such a tree topology. In this paper, we use $T_v$ to denote the tree whose root is $v$.
Fig. 1. Web Caching for Tree Networks
Let $A \subseteq V$ be a subset of nodes, at each of which a copy of an object is cached. For every node $v$, $D(v)$ denotes the set of all descendants of $v$, and $C(v)$ denotes the set of all children of $v$. Let $f_v$ denote the access frequency of object $O$ at node $v$, defined as the number of requests to access object $O$ that arrive at $v$ during a certain period of time (including the requests from node $v$ itself and from others); in this paper, we assume that $f_v$ is a constant. For any two nodes $u$ and $w$, $E(u, w)$ denotes the set of all edges on the path between $u$ and $w$, and $P(u, w)$ denotes the set of all nodes on that path, including $u$ and $w$. For object $O$, we associate every edge $e$ with a nonnegative cost $c(e)$, whose maximal value is a constant; the cost over multiple edges is defined as the sum of the costs of the individual edges.
As mentioned above, it is necessary and important to find methods for optimally distributing copies of an object among the en-route caches, since the size of each cache is limited; accordingly, when a new object is stored in a cache, one or more objects may need to be removed from the cache to make room for it. Storing an object at a node enables all the requests previously passing through it to be satisfied there, so its access cost is decreased; this decrease is defined in this paper as the cost saving. Similarly, removing the copy of an object from a node increases its access cost, and this increase is defined as the cost loss. In this paper, we consider cost saving and cost loss in a coordinated way. Let $m(v)$ be the miss penalty of object $O$ with respect to node $v$, given by $m(v) = \sum_{e \in E(v, v^*)} c(e)$, where $v^*$ is the nearest higher-level node of $v$ that stores a copy of object $O$ (see Figure 1). The cost saving for node $v$, denoted by $s(v)$, can then be defined as $s(v) = (f_v - \hat{f}_v)\, m(v)$, where $\hat{f}_v$ is the total access frequency of object $O$ that could still be served by the existing caches downstream of node $v$ if the copy of object $O$ stored at node $v$ were removed. Let $l(v)$ be the cost loss for storing a copy of object $O$ at node $v$. Thus, the cost gain for a single node $v$, denoted by $g(v)$, can be defined as $g(v) = s(v) - l(v)$. Based on the cost gain for a single node, we can present the mathematical model for CERWC as the following optimization problem:
$$\max_{A \in \Lambda} \; G(A) = \sum_{v \in A} g(v) \qquad (1)$$
where $\Lambda \subseteq 2^V$ is called the constraint space. A small computational sketch of these definitions follows; in the subsequent sections, we consider different cases of $\Lambda$.
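To make these definitions concrete, the small sketch below computes the miss penalty and a per-node cost gain on a toy tree; the data layout, the frequency actually absorbed by a new copy, and the loss value are illustrative assumptions.

```python
# Miss penalty m(v) and cost gain g(v) on a toy tree; root 0 is the server.
def miss_penalty(v, parent, edge_cost, cached):
    """Summed edge cost from v up to the nearest higher-level cached node."""
    cost, u = 0.0, v
    while parent[u] is not None:
        cost += edge_cost[u]          # cost of the edge (u, parent[u])
        u = parent[u]
        if u in cached:
            break
    return cost

parent = {0: None, 1: 0, 2: 1}
edge_cost = {1: 5.0, 2: 3.0}          # cost of the edge to the parent
print(miss_penalty(2, parent, edge_cost, cached=set()))  # 8.0 (served at root)
print(miss_penalty(2, parent, edge_cost, cached={1}))    # 3.0 (served at node 1)

absorbed_freq, loss = 40, 25.0        # assumed values
g = absorbed_freq * miss_penalty(2, parent, edge_cost, cached={1}) - loss
print(g)                              # cost gain of caching at node 2: 95.0
```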
3 UCERWC for Tree Networks
In this section, we focus on solving the UCERWC (unconstrained CERWC) problem for the tree network topology. Based on Equation (1), the UCERWC problem for tree $T_v$ can be defined as follows:
$$\max_{A \subseteq V(T_v)} \; G(A) = \sum_{u \in A} g(u) \qquad (2)$$
where $g(u)$, $f_u$, and $m(u)$ are the same as defined in Section 2. Let $T_u$ be the subtree of $T_v$ whose node set is $\{u\} \cup D(u)$, for $u \in V(T_v)$; similarly, we can define the UCERWC problem for tree $T_u$. Before presenting an algorithm for solving Equation (2), we give the following theorems, whose detailed proofs can be found in [3].
Theorem 1. For tree $T_v$, if no copy is stored at the root $v$, then the maximum overall cost gain for $T_v$ is the sum of the maximum overall cost gains of the subtrees $T_u$, $u \in C(v)$.
Theorem 2. For tree $T_v$, if storing a copy at the root $v$ does not increase the maximum overall cost gain, then we do not store a copy at $v$; if it does, then we store a copy at $v$ and solve the subtree problems with $v$ as the nearest caching node above them.
By Theorem 1, we can see that CERWC for tree $T$ can be decomposed into CERWC for its subtrees. By Theorem 2, we can see that for tree $T_v$, if the condition of the first case holds, then we do not store a copy of the object at node $v$ and further consider the subtrees $T_u$ for $u \in C(v)$; otherwise, we store a copy at node $v$ and further consider the subtrees. Therefore, based on Theorem 1 and Theorem 2, we can present a dynamic programming-based algorithm for UCERWC for tree networks.
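Since the theorem statements are abridged in this version, the following Java sketch reconstructs only the general shape of such a bottom-up dynamic program; the recurrence shown, the node fields, and all names are our illustrative assumptions rather than the authors' algorithm. It assumes that freq counts only the requests that actually reach a node (i.e., requests not absorbed by caches below it), so that the saving of placing a copy is the frequency times the path cost to the nearest copy above.

```java
import java.util.*;

/** Hypothetical node model; field names are illustrative only. */
class CacheNode {
    double freq;       // requests for O that reach this node from its subtree
    double edgeCost;   // cost of the link from this node up to its parent
    double lossCost;   // l_v(O): cost loss for making room for O at this node
    List<CacheNode> children = new ArrayList<>();
}

public class UcerwcSketch {
    /**
     * Maximum total cost gain in the subtree rooted at v, given that the
     * nearest copy of the object above v lies at aggregate path cost dist.
     * The choice at v splits the problem into independent child subproblems,
     * mirroring the decomposition suggested by Theorems 1 and 2.
     */
    static double best(CacheNode v, double dist) {
        // Option 1: no copy at v; each child sees the same ancestor copy,
        // one edge farther away.
        double skip = 0.0;
        for (CacheNode u : v.children)
            skip += best(u, dist + u.edgeCost);

        // Option 2: place a copy at v; requests reaching v are served here,
        // and every child now sees a copy directly above it.
        double place = v.freq * dist - v.lossCost;
        for (CacheNode u : v.children)
            place += best(u, u.edgeCost);

        return Math.max(skip, place);
    }

    public static void main(String[] args) {
        CacheNode root = new CacheNode();
        root.freq = 10; root.edgeCost = 1; root.lossCost = 2;
        System.out.println(best(root, root.edgeCost));  // 8.0 for this toy input
    }
}
```

Memoizing best(v, dist) over the O(depth) distinct distance values per node yields a low-cost dynamic program; the placement itself is recovered by recording which option wins at each node.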
4 CCERWC for Tree Networks
In this section, we concentrate on solving the CCERWC (constrained CERWC) problem for the case in which the network topology is a tree, determining the locations for placing copies of an object among the en-route caches such that the overall cost gain is maximized under different constraints, including a non-negative cost gain per node, placing exactly $k$ copies, and placing at most $k$ copies of an object among the en-route caches.
4.1 Non-negative Cost Gain Per Node
Based on Equation (1), the CCERWC problem of non-negative cost gain per node for tree $T$ can be defined as follows:

$\max_{A \in \Phi_\lambda} \sum_{v \in A} g_v(O)$   (3)

where $\Phi_\lambda = \{A \subseteq V : g_v(O) \geq \lambda \ \text{for all} \ v \in A\}$ and $\lambda \geq 0$ is a given parameter ($\lambda = 0$ corresponds to requiring a non-negative cost gain per node).
Before developing the dynamic programming-based algorithm for solving Equation (3), we propose the following theorem.

Theorem 3. If $A^*$ is the optimal solution to Equation (2) with respect to tree $T$, then $A^*$ is also the optimal solution to the following equation: … (4)

By Theorem 3, we can obtain the optimal solution to Equation (4) by solving Equation (2). It is easy to see that Equation (3) is equivalent to the following equation: … (5)

By Theorem 3, we can get an optimal solution to Equation (5) by solving the following equation: … (6)

Therefore, we can obtain an optimal solution to Equation (3) by solving Equation (6). Based on the algorithm proposed in Section 3, we present a dynamic programming-based algorithm for CCERWC with a non-negative cost gain per node.
4.2 Placing Exactly k Copies of an Object
Similarly, the CCERWC problem of placing exactly $k$ copies of an object among the en-route caches for tree $T$ can be defined as follows:

$\max_{A \subseteq V,\ |A| = k} \sum_{v \in A} g_v(O)$   (7)
Suppose that $A^*$ is the optimal solution to Equation (2) with respect to tree $T$. We can easily see that it is not necessary to place more than $|A^*|$ copies of an object among the en-route caches; otherwise, there must be at least one node whose cost gain is negative. Therefore, $k$ should be no greater than $|A^*|$. So we first compute $A^*$ by Algorithm 2 by setting $\lambda = 0$, and the optimal locations are all the nodes in $A^*$ when $k = |A^*|$. It can be easily proved that if $k > |A^*|$, then there is no feasible solution to Equation (3); therefore, the parameter $\lambda$ in Equation (3) should satisfy $\lambda \geq 0$. It is obvious that the number of copies of an object to be placed in the network is relevant to the parameter $\lambda$. Hence, the proper selection of $\lambda$ determines the number of caching locations. The crucial observation is that the number of caching locations is a monotonically decreasing function of $\lambda$; that is, as $\lambda$ increases, the number of caching locations decreases monotonically. Therefore, we can determine the optimal locations for placing exactly $k$ copies of an object among the en-route caches by tuning the parameter $\lambda$.
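Because the number of caching locations decreases monotonically in the parameter, a simple bisection suffices to hit exactly $k$ copies. The sketch below is ours, not the paper's; placementSize() is a hypothetical stand-in for running the Section 4.1 algorithm with threshold lambda, and its stub body merely simulates the monotonically decreasing count the text describes.

```java
public class ExactKTuning {
    // Hypothetical stand-in: number of copies placed by the Section 4.1
    // algorithm under threshold lambda. The stub simulates the monotone
    // behaviour described in the text.
    static int placementSize(double lambda) {
        return (int) Math.max(0, 10 - lambda);
    }

    /** Bisection: smallest lambda whose placement has at most k copies. */
    static double findLambdaForK(int k, double lambdaMax) {
        double lo = 0.0, hi = lambdaMax;
        while (hi - lo > 1e-9) {
            double mid = (lo + hi) / 2;
            if (placementSize(mid) > k) lo = mid; // too many copies: raise threshold
            else hi = mid;
        }
        return hi;
    }

    public static void main(String[] args) {
        double lambda = findLambdaForK(4, 100.0);
        System.out.println(lambda + " -> " + placementSize(lambda) + " copies");
    }
}
```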
4.3 Placing at Most k Copies of an Object
Based on Equation (1), the CCERWC problem of placing at most $k$ copies of an object among the en-route caches for tree $T$ can be defined as follows:

$\max_{A \subseteq V,\ |A| \leq k} \sum_{v \in A} g_v(O)$   (8)
Suppose that $A^*$ is the optimal solution to Equation (3) obtained by setting $\lambda = 0$; then we have the following theorem on the relationship between Equation (8) and Equation (2).

Theorem 4. If $k \geq |A^*|$, then Equation (8) is equivalent to Equation (2).

From Theorem 4, we can see that the problem described in Equation (2) can be viewed as a special case of the problem discussed in this subsection by setting $k \geq |A^*|$.
5 CERWC for Autonomous Systems
ASes play an important role in routing objects in the Internet. In this paper, we assume that the network topology of each AS is a tree. We denote the whole network by $W$ and assume that there are $m$ ASes in the network, each of which is represented by a tree $T_i$ ($1 \leq i \leq m$). Based on Equation (1), the problem of CERWC for ASes is defined as follows:

$\max_{A \subseteq V_W,\ |A| = k} \sum_{v \in A} g_v(O)$   (9)

where $V_W$ denotes the node set of the whole network $W$. From Equation (9), we can see that this problem degenerates to the problem addressed in Subsection 4.2 when $m = 1$, i.e., determining the optimal locations for placing $k$ copies of an object in a tree network. In this section, for convenience, we also use $G(W, k)$ to denote the overall maximum cost gain for placing $k$ copies of an object in $W$, i.e., $G(W, k) = \sum_{v \in A^*} g_v(O)$, where $A^*$ is the optimal solution to Equation (9).
Fig. 2. Experiment for Average Response Ratio
Fig. 3. Experiment for Object Hit Ratio
Now we apply the following idea to solve this problem. We first divide $W$ into two parts, $T_1$ and $W_1 = \{T_2, \ldots, T_m\}$; thus, we consider the problem of placing $t$ copies of an object in the first part and $k - t$ copies in the second part, where $0 \leq t \leq k$. Then, we divide $W_1$ into two parts, $T_2$ and $W_2 = \{T_3, \ldots, T_m\}$, and split the remaining copies between them in the same manner. We repeat this process until there is only one tree left. Regarding the recursive process, we have the following theorem.

Theorem 5. $G(W_{i-1}, j) = \max_{0 \leq t \leq j} \{\, G(T_i, t) + G(W_i, j - t) \,\}$,

where $W_0 = W$ and $W_{m-1} = \{T_m\}$. Based on Theorem 5, we can present an algorithm for CERWC for ASes.
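A minimal sketch of this recursion, assuming the per-tree optima are already available: gain[i][j] stands for the best cost gain of placing j copies in tree T_{i+1} (as computed by the Subsection 4.2 method), and the loop realizes the split described above. The table-based formulation and all names are ours.

```java
import java.util.Arrays;

public class AsCombiner {
    /**
     * Best total gain of placing exactly k copies across m trees.
     * gain[i][j] = best gain of j copies in tree i; each row has length k+1.
     */
    static double bestOverAses(double[][] gain, int k) {
        // combined[j] = best gain using j copies over the trees seen so far
        double[] combined = gain[0].clone();
        for (int i = 1; i < gain.length; i++) {
            double[] next = new double[k + 1];
            Arrays.fill(next, Double.NEGATIVE_INFINITY);
            for (int j = 0; j <= k; j++)       // copies used by trees 0..i
                for (int t = 0; t <= j; t++)   // copies given to tree i
                    next[j] = Math.max(next[j], combined[j - t] + gain[i][t]);
            combined = next;
        }
        return combined[k];
    }

    public static void main(String[] args) {
        double[][] gain = { {0, 5, 7}, {0, 4, 6} }; // two trees, up to k = 2
        System.out.println(bestOverAses(gain, 2));   // best split of 2 copies: 9.0
    }
}
```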
6 Simulation and Results
We have performed extensive simulation experiments to compare the results of our model with those of existing models. The network topology is randomly generated by the Tier program [1]. We have conducted experiments on many topologies with different parameters and found that the performance of our model is insensitive to topology changes. Here, we list only the experimental results for one topology because of space limitations. In our experiments, we compare the performance results of the different models across a wide range of cache sizes, from 0.04 percent to 12 percent. Due to space limitations, we list only two performance results: the average response ratio and the object hit ratio (Figures 2 and 3). In our experiments, we denote the results for the LRU model [7] by LRU, the results for the model proposed in [5] by LT, and the results for the model proposed in Section 3 by TT.
7 Conclusion
The performance of en-route web caching depends mainly on the locations of caches and on the management of cache contents. In this paper, we studied the coordinated en-route web caching problem on tree networks and ASes for both the unconstrained and constrained cases. We presented a mathematical model that integrates both the cost loss and the cost saving of storing an object at a node. We also proposed low-cost dynamic programming-based algorithms for the different cases. The simulation results show that our methods significantly outperform the existing algorithms, which consider either coordinated en-route web caching for a linear topology or object placement (replacement) at individual nodes only.
References
1. K. L. Calvert, M. B. Doar, and E. W. Zegura. Modelling Internet Topology. IEEE Comm. Magazine, Vol. 35, No. 6, pp. 160-163, 1997.
2. B. D. Davison. A Web Caching Primer. IEEE Internet Computing, Vol. 5, No. 4, pp. 38-45, 2001.
3. K. Li and H. Shen. An Optimal Method for Coordinated En-Route Web Object Caching. Proc. of the 5th Int'l Symp. on High Performance Computing (ISHPC-V), pp. 268-375, Tokyo, Japan, 2003.
4. P. Rodriguez, C. Spanner, and E. W. Biersack. Analysis of Web Caching Architectures: Hierarchical and Distributed Caching. IEEE/ACM Transactions on Networking, Vol. 9, No. 4, pp. 404-418, 2001.
5. X. Tang and S. T. Chanson. Coordinated En-Route Web Caching. IEEE Transactions on Computers, Vol. 51, No. 6, pp. 595-607, 2002.
6. J. Wang. A Survey of Web Caching Schemes for the Internet. ACM SIGCOMM Computer Communication Review, Vol. 29, No. 5, pp. 36-46, 1999.
7. S. Williams, M. Abrams, C. R. Standbridge, G. Abdulla, and E. A. Fox. Removal Policies in Network Caches for World Wide Web Documents. Proc. ACM SIGCOMM'96, pp. 293-305, 1996.
A Framework of Tool Integration for Internet-Based E-commerce

Jianming Yong¹,² and Yun Yang²

¹ Department of Information Systems, Faculty of Business, University of Southern Queensland, Toowoomba, QLD 4350, Australia
[email protected]
² CICEC - Centre for Internet Computing and E-Commerce, School of Information Technology, Swinburne University of Technology, PO Box 218, Hawthorn 3122, Australia
[email protected]
Abstract. Tool integration has been an important research area for many years, ever since software engineering became a key player in the information industry. With the Internet establishing ubiquitous connectivity, more and more tools become reachable through the Internet. The Internet has massively changed the environment of software engineering. This paper extensively discusses relevant technologies of tool integration for Internet-based e-commerce. A new framework of tool integration for Internet-based e-commerce is proposed, designed, and demonstrated. An XML-based message server can effectively serve tool integration for Internet-based e-commerce within a business/integration domain.
1 Introduction

Tool integration has been researched for many years along with the development of software engineering. Traditionally, tool integration was implemented by an integrated environment, like the Pact environment [1], Gravity [2], Shamash [3], PCE, the Field environment [4], the Forest environment, the EAD environment, integrated project-support environments, etc. But after the Internet became widely accepted as the main platform for e-commerce, traditional tool integration, which mainly focuses on the environment, has had to be adjusted to meet the needs of Internet-based e-commerce. Thus tool integration will need to facilitate Internet-based e-commerce. In recent years, more research on tool integration has been placed on Web-related environments and applications. Actually, the Internet has pushed software engineering into a more open-standard and distributed environment. That brings a big challenge to software tools, which contribute to the main development of software engineering. Especially in recent years, the World Wide Web has become an effective platform for e-commerce. Currently, e-commerce relies heavily on Web-based applications, which have the following requirements [5]:
- The necessity of handling both structured data and non-structured data
- The support of exploratory access through navigational interfaces
- A high level of graphical quality
- The customisation and possibly dynamic adaptation of content structure, navigational primitives, and presentation styles
- The support of proactive behaviour
- Security, scalability, and availability
- Interoperability with legacy systems and data
- Ease of evolution and maintenance

Based on these requirements, different kinds of tools will be needed to facilitate Internet-based e-commerce. Different tools, which may serve the same or a similar function, may be created by different vendors. Even the same vendor may have different tools to serve different purposes in Web-based applications, especially e-commerce. It is very important to find a way to effectively integrate the tools that are used to create applications or tools over the Internet. Thus tool integration for Internet-based e-commerce becomes a very challenging task for researchers. Although tool integration cannot completely overcome all the problems, and inconsistency among software tools will exist forever, this paper tries to discuss some important issues related to tool integration for Internet-based e-commerce. Through this, a mechanism can be established to better serve the development of Internet-based e-commerce from the perspective of tool integration. This paper is organised as follows. Section 2 discusses the application architecture of Internet-based e-commerce. Section 3 introduces some tools for Internet-based e-commerce. Section 4 reviews some methods of tool integration. Section 5 proposes a tool integration framework for Internet-based e-commerce. Sections 6 and 7 detail the design and demonstration of tool integration for Internet-based e-commerce. Section 8 concludes this paper.
2 Application Architecture of Internet-Based E-commerce

In order to better understand how the Internet facilitates e-commerce transactions, we use a traditional approach, the network application architecture, to split the functions, which usually are created by tools, between clients (customers) and servers (sellers/service providers). Thus the following architecture can be used to describe how the functions of e-commerce can be achieved by client-server e-commerce systems.

2.1 Client-Server E-commerce Systems

Today most organisations are using client-server architectures to build their applications and services. This architecture overcomes the shortcomings of the pure client-based or server-based architectures in isolation and can flexibly balance the processing between the client and the server based on the properties of the applications. Thus this architecture can satisfy the requirements of Internet-based e-commerce. The functions and tools are distributed among clients and servers based on business requirements. This architecture can be flexibly tailored to thin clients or fat clients according to the different requirements of different e-commerce systems. For example, if the clients are only the customers of the system, minimal functions and tools will be required on the client side. But if the clients are the developers of e-commerce systems, more functions and tools may be required on the client side, so that the clients have enough capacity to build the required e-commerce functions, which will be provided to customers over the Internet. Although this architecture can serve Internet-based e-commerce systems better than the other two architectures, it is not easy for tool integration, because different vendors may have quite different client systems and server systems, which are supported by a broad range of tools. In order to facilitate tool integration, the traditional two-tier client-server architecture needs to be extended to a multi-tier client-server architecture.
3 Tools for Internet-Based E-commerce

Because Internet-based e-commerce has massively involved Web usage, tools related to Web development have played a big role in Internet-based e-commerce. As Fraternali [5] pointed out, existing Web tools can be grouped into six categories:
- Visual editors and site managers;
- Web-enabled hypermedia authoring tools;
- Web-DBPL integrators;
- Web form editors, report writers, and database publishing wizards;
- Multi-paradigm tools;
- Model-driven application generators.

In addition to the previous six categories of Web application tools, Internet-based e-commerce definitely needs all sorts of security tools to support its business implementation and transactions. Well-known security tools include network monitoring tools, authentication/password tools, service-filtering tools, tools to scan hosts for known vulnerabilities, integrity-checking tools, encryption tools, etc. Only with the help of these security tools can Internet-based e-commerce be broadly conducted by confident producers and customers.
4 Traditional Approaches for Tool Integration

There are several traditional approaches for tool integration: tool-to-framework integration [6], tool-to-tool integration [7], and tool integration based on a message server [8]. Tool-to-framework integration adopts an approach in which tools are encapsulated within a single software engineering framework, with all communication taking place by way of the framework. This approach is very feasible within one organisation/intranet. But because of the diversity of the Internet, which connects countless heterogeneous systems related to different software engineering frameworks, this approach becomes impossible for Internet-based e-commerce. Tool-to-tool integration is based on the belief that tools may also be integrated directly, with some communication bypassing the underlying framework. It works very well for certain scenarios. For example, if one tool has a fixed connection with another tool, we can directly use some techniques to integrate the two tools. But in Internet-based e-commerce systems, most connections are not fixed, and many tools are dispersed over the Internet. It is almost impossible to integrate those countless tools together on a one-by-one basis. Thus this approach cannot satisfy the requirements of tool integration for Internet-based e-commerce. Tool integration based on a message server was initially proposed by Reiss [4]. Its prototype is called Field. The distinguishing feature of Field is a centralised message server, called Msg, which routes messages between tools. Each tool in the environment registers with Msg a set of message patterns that indicate the kinds of messages it should receive. Tools send messages to Msg to announce changes that other tools might be interested in. Msg then selectively broadcasts those messages to the tools whose patterns match those messages. Figure 1 shows how Field works.
Fig. 1. Message server working with tools
For example, a user makes a change in Tool 1. Tool 1 tells the message server what has changed. Then the message server broadcasts the change to all associated tools. Suppose Tool 3 is interested in this change because the change will affect Tool 3 later on. Thus Tool 3 adjusts its functions to respond to Tool 1's change. But this broadcast-based tool integration does not solve the control issue. For example, after Tool 1 notifies the server of the change, the message server will broadcast this change to all other members, and then no one knows what will happen to the related tools. Thus a control mechanism is needed to determine what should happen. In order to solve this issue, the Field system needs to be extended with policy support for selective broadcast (Figure 2). The policy repository has the rules that determine how and when tools are invoked in the software development process. It still uses a centralised message server, but the server maintains a set of patterns that indicate messages of interest for each tool. An action is taken only when a message that has been successfully matched against a pattern also satisfies a set of condition-action pairs. These pairs define the interaction policy for that tool by controlling which messages are routed between tools.
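To make the mechanism concrete, here is a minimal Java sketch of Msg-style selective broadcast, assuming pattern matching can be reduced to a substring test; the Field system's actual pattern language and API are richer, and all names here are illustrative.

```java
import java.util.*;

/** A tool that can receive forwarded messages. */
interface Tool { void receive(String message); }

/** Field/Msg-style server: tools register patterns; announcements are
 *  forwarded only to tools whose patterns match. */
class MessageServer {
    private final Map<Tool, List<String>> patterns = new HashMap<>();

    void register(Tool tool, String pattern) {
        patterns.computeIfAbsent(tool, t -> new ArrayList<>()).add(pattern);
    }

    void announce(Tool sender, String message) {
        for (Map.Entry<Tool, List<String>> e : patterns.entrySet()) {
            if (e.getKey() == sender) continue;          // don't echo back
            for (String p : e.getValue())
                if (message.contains(p)) { e.getKey().receive(message); break; }
        }
    }
}

public class FieldDemo {
    public static void main(String[] args) {
        MessageServer server = new MessageServer();
        Tool editor = msg -> System.out.println("editor got: " + msg);
        Tool debugger = msg -> System.out.println("debugger got: " + msg);
        server.register(debugger, "FILE_CHANGED");
        server.announce(editor, "FILE_CHANGED /src/Main.java");
    }
}
```

The policy extension of Figure 2 would replace the plain substring test with condition-action pairs consulted before forwarding.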
Fig. 2. Policy support for selective broadcast
5 New Integration Framework for Internet-Based E-commerce

Because the Internet is broadly distributed, and because heterogeneous business domains, entities, and processes are all connected together by the Internet, it is impossible to use one standard/approach/theory to deal with tool integration for Internet-based e-commerce. Traditional tool integration mechanisms, which mainly focus on creating a relatively unified environment, such as PSEs (programming support environments) and SEEs (software engineering environments) [9], cannot satisfy this demand, which has arisen in business environments because of Internet use. In order to adopt and adapt to the ubiquitous connection provided by the Internet, a new methodology has been proposed to deal with tool integration for Internet-based e-commerce. Based on the current popular architecture of Internet-based e-commerce, which is discussed in the architecture section (Section 2), we propose a ubiquitous tool integration framework for Internet-based e-commerce: distributed message connection through a middleware server with XML. Because XML has been accepted as a de facto standard for message description and exchange that is understandable to both humans and computers, a significant convenience is brought into message-based tool integration for Internet-based e-commerce. The new framework is illustrated in Figure 3. If one tool wants to communicate with any other tools, it has to send an XML message to the message server. The message server reads this message against its message repository (MR). Then, after the message matches its MR, the action rule repository (ARR) is searched to use relevant rules to guide which actions should be taken by the relevant tools. The details are in the next section.
6 Design of XML-Message-Based Tool Integration

Because the XML MR needs to be fed XML messages, all relevant tools are required to have the ability to output XML messages for the message server to accept. Most current tools have XML message output functions. But some "legacy" tools cannot provide XML message output. They need translation agents to facilitate this connection. Because research on agents is beyond the scope of this paper, we will not detail how translation agents work between "legacy" tools and the message server. Instead, MR and ARR are the key to tool integration for Internet-based e-commerce. Thus the details of MR and ARR are discussed in the following subsections.
Fig. 3. Framework of XML-message-based tool integration
6.1 Message Repository (MR)

MR has a full list of all the tools that belong to one integration domain. In most cases, one tool usually only needs cooperation from other tools within one integration domain. The MR is organised as a tree structure, which makes it easy for an incoming message to match its desired pattern. The document head includes information about the XML version, encoding, namespace of the integration domain, etc. All XML message blocks have XML schemas for validating incoming messages from requesting tools. If a message is for Tool 1, the message will be sent to Tool 1's XML message block for further processing, which is relevant to ARR. If one tool needs cooperation beyond its integration domain, the message server will need to negotiate with other message servers to facilitate this cooperation. Then the message will be sent to the outside-domain XML message block for further processing.
6.2 Action Rule Repository (ARR)

After incoming messages are matched in MR, MR will request ARR to retrieve the related action rules so that the requested tools can take actions to achieve the purpose of integration. ARR holds all the designed rules.
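As a hedged sketch of how MR and ARR might cooperate, the following Java fragment models a message as a (target tool, payload) pair, matches it against the tools registered in the domain, and fires condition-action rules; schema validation, XML parsing, and inter-server negotiation are reduced to stubs, and every name here is ours, not the paper's.

```java
import java.util.*;
import java.util.function.Predicate;

/** ARR: condition-action pairs per tool, fired when a message matches. */
class ActionRuleRepository {
    private final Map<String, List<Map.Entry<Predicate<String>, Runnable>>> rules
            = new HashMap<>();

    void addRule(String tool, Predicate<String> cond, Runnable action) {
        rules.computeIfAbsent(tool, t -> new ArrayList<>())
             .add(Map.entry(cond, action));
    }

    /** Returns true if at least one rule fired for this tool and payload. */
    boolean fire(String tool, String payload) {
        boolean any = false;
        for (Map.Entry<Predicate<String>, Runnable> pair
                : rules.getOrDefault(tool, List.of()))
            if (pair.getKey().test(payload)) { pair.getValue().run(); any = true; }
        return any;
    }
}

/** MR: routes a message to the matching tool's block or outside the domain. */
class MessageRepository {
    private final Set<String> domainTools = new HashSet<>();
    private final ActionRuleRepository arr = new ActionRuleRepository();

    void route(String targetTool, String payload) {
        if (!domainTools.contains(targetTool)) {
            forwardToOutsideDomain(targetTool, payload); // negotiate with peers
        } else if (!arr.fire(targetTool, payload)) {
            handleAbnormal(targetTool, payload);         // no matching rule
        }
    }
    void forwardToOutsideDomain(String t, String p) { /* other message servers */ }
    void handleAbnormal(String t, String p) { /* error reporting */ }
}
```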
7 Demonstration

In order to clearly demonstrate how MR and ARR work together, a finite state machine (FSM) is used to model their transactions. Suppose there are a number of tools in the integration domain, modelled as a set T. T sends out messages for MR's FSM to process. After MR's FSM accepts incoming XML messages, ARR's FSM is activated. This process is shown in Figure 4.
Fig. 4. FSM in message server
S0 is the initial state of the system, in which no tools need cooperation from others. After a user starts a tool to request cooperation from others, state S0 transits into state T. When the FSM is at state T and an XML message is sent out, state T transits into state A, which is located in MR. When the FSM is at state A and matching XML messages are found, state A transits into state M, which is located in ARR. When the FSM is at state A and no XML messages are matched in MR, state A transits into state D, which is located in MR for abnormal processing. When the FSM is at state M and satisfied conditions are identified, state M transits into state T, which notifies other tools to take action to cooperate with the initial tool. If those woken-up tools need further cooperation, a new cycle will happen. When the FSM is at state M and no satisfied conditions are found, state M transits into state F, which is in charge of abnormal processing. When the FSM is at state F and the error processing has finished, state F transits into state D. When the FSM is at state D and error reports have been returned to the initial tools, state D transits into state T for another new cycle.
When the FSM is at state T and no new events happen, state T transits into state S0, and the system is in the idle status.
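The FSM just described can be transcribed almost mechanically; the Java sketch below encodes exactly the states and transitions enumerated above, with event names of our own choosing.

```java
/** States and transitions of Figure 4, as enumerated in the text. */
enum State { S0, T, A, M, F, D }
enum Event { TOOL_STARTED, XML_SENT, MATCHED, NO_MATCH,
             CONDITIONS_MET, NO_CONDITIONS, ERROR_PROCESSED,
             ERROR_REPORTED, IDLE }

public class MessageServerFsm {
    State state = State.S0;

    void on(Event e) {
        switch (state) {
            case S0: if (e == Event.TOOL_STARTED)       state = State.T; break;
            case T:  if (e == Event.XML_SENT)           state = State.A;
                     else if (e == Event.IDLE)          state = State.S0; break;
            case A:  if (e == Event.MATCHED)            state = State.M;   // in ARR
                     else if (e == Event.NO_MATCH)      state = State.D;   // abnormal
                     break;
            case M:  if (e == Event.CONDITIONS_MET)     state = State.T;   // notify tools
                     else if (e == Event.NO_CONDITIONS) state = State.F;   // abnormal
                     break;
            case F:  if (e == Event.ERROR_PROCESSED)    state = State.D; break;
            case D:  if (e == Event.ERROR_REPORTED)     state = State.T; break;
        }
    }
}
```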
8 Conclusions

This paper extensively reviews tool integration from the perspective of software engineering. Then tool integration for Internet-based e-commerce is broadly discussed, and many useful Web tools are categorised. Finally, a new framework of tool integration for Internet-based e-commerce has been proposed. Furthermore, the design has been illustrated, with a finite state machine used to demonstrate the transactions between tools and the message server as a proof of concept, which is the key to this integration solution. Through the new framework of tool integration for Internet-based e-commerce, more and more Internet-based tools can be effectively integrated with each other and better serve business functions.
References
1. Thomas, I. Tool Integration in the Pact Environment. Proc. of the 11th International Conference on Software Engineering, Pittsburgh, Pennsylvania, USA, ACM, 1989, pages 13-22.
2. Rangarajan, M., et al. Gravity: An Object-Oriented Framework for Hardware/Software Tool Integration. Proc. of the 30th Annual Simulation Symposium (SS'97), Atlanta, GA, IEEE, 1997, pages 24-30.
3. Camacho, D., et al. Shamash: An AI Tool for Modelling and Optimizing Business Processes. Proc. of the 13th IEEE International Conference on Tools with Artificial Intelligence (ICTAI'01), Dallas, Texas, USA, IEEE, 2001, pages 306-313.
4. Reiss, S.P. Connecting Tools Using Message Passing in the Field Program Development Environment. IEEE Software, 1990, 7(4): 57-66.
5. Fraternali, P. Tools and Approaches for Developing Data-Intensive Web Applications: A Survey. ACM Computing Surveys (CSUR), 1999, 31(3): 227-263.
6. Gautier, B., et al. Tool Integration: Experiences and Directions. Proc. of the 17th International Conference on Software Engineering, Seattle, Washington, USA, ACM, 1995, pages 315-324.
7. Bredenfeld, A. and Camposano, R. Tool Integration and Construction Using Generated Graph-Based Design Representations. Proc. of the 32nd ACM/IEEE Design Automation Conference, San Francisco, California, USA, ACM Press, 1995, pages 94-99.
8. Garlan, D. and Ilias, E. Low-Cost, Adaptable Tool Integration Policies for Integrated Environments. Proc. of the Fourth ACM SIGSOFT Symposium on Software Development Environments, Irvine, California, USA, ACM Press, 1990, pages 1-10.
9. Ossher, H., Harrison, W., and Tarr, P. Software Engineering Tools and Environments: A Roadmap. Proc. of the Conference on The Future of Software Engineering, Limerick, Ireland, ACM Press, 2000, pages 261-277.
Scalable Filtering of Well-Structured XML Message Stream*

Weixiong Rao¹, Yingjian Chen², Xinquan Zhang², and Fanyuan Ma¹

¹ Computer Science and Engineering Department, Shanghai Jiaotong University, China 200030
{rweixiong, fyma}@sjtu.edu.cn
² Shanghai General Motors Data Center, Shanghai, China 201206
{yingjian_chen, xinquan_zhang}@shanghaigm.com
Abstract. We propose a novel XML filtering system, termed MTrie. MTrie supports effective and scalable filtering of XML message streams based on XPath expressions. The features that MTrie provides for publish/subscribe-based MOM systems are: (1) MTrie can support a large number of XPath queries by merging these queries into a single trie-like data structure. (2) Based on the DTD, MTrie converts the merged XPath queries into the MTrie index, which makes XML filtering more effective and faster. (3) MTrie can support XML message filtering over heterogeneous DTD files by making an MTrie index for every DTD file. Our experimental results verify that MTrie outperforms earlier work and scales well with both message size and the number of XPath queries.
1 Introduction

EAI (Enterprise Application Integration) has become a promising technology for enterprise-level applications, especially through message-oriented middleware (MOM). MOM is based on the publish/subscribe communication paradigm, which makes loosely coupled communication among applications possible. The main task of MOM is to route messages from publishers to subscribers, including message filtering, routing, and dissemination. What makes MOM difficult is that enterprise-level applications must scale to a large number of subscribers while also supporting expressive message filtering. In large-scale MOM systems, expressiveness and scalability must be carefully balanced because they are interdependent. On the one hand, poor filtering maps every publisher to every subscriber, which leads to M*N mappings, where M is the number of publishers and N is the number of subscribers; on the other hand, a large number of subscribers requires more complex delivery strategies [1]. Content-based XML message routing allows subscribers to evaluate the whole content of messages and provides more powerful and flexible message filtering and routing than the traditional channel- or subject-based mechanisms. The challenge in
Research described in this paper is supported by The Science & Technology Committee of Shanghai Municipality Key Technologies R&D Project Grant 03dz15027 and by The Science & Technology Committee of Shanghai Municipality Key Project Grant 02DJ14045
content-based XML message routing is how to publish the incoming XML message streams to a large number of subscribers with XPath queries, which provide rich expressiveness over both path-based structure expressions and content-based filters. Some pioneering works use SAX's event-based parsing together with Finite State Machines (FSMs) [2], a Nondeterministic Finite Automaton (NFA) [3], or a lazy Deterministic Finite Automaton (DFA) [4] for fast XML filtering. For a large number of XPath queries, the cost of building the FSM, NFA, or DFA is rather expensive; in particular, the ancestor-descendant operator "//" can result in an exponential-size DFA. We observe that the XML message streams in enterprise-level applications, unlike the document-oriented XML data used for document exchange, are mostly data-oriented XML data, well structured without recursive paths, and have a DTD or XML schema to validate the XML messages. Different from the previous works that build an FSM, NFA, or lazy DFA, we present a novel approach to filter well-structured XML messages. The key to our approach is to combine all XPath queries into a single trie index, termed MTrie, based on the XML message's DTD or schema. By merging the common path elements in the XPath queries, MTrie filters each common path element just once when using a SAX parser to process the XML message stream, without consuming large amounts of main memory as a DOM parser would. Different from YFilter [3], XPush [4], or XTrie [5], we utilize the XML message's DTD or schema to convert XPath paths to actual paths, a step called path pruning. To address the challenge that this conversion could result in a linear number of tags for the parent-child operator '/' and an exponential number of tags for the ancestor-descendant operator '//', we build an MTrie index for every DTD file, and the size of an MTrie index is bounded by that of its DTD graph. Our work on scalable filtering of well-structured XML message streams relies on past work on XML filtering systems. To efficiently support a large number of XPath queries over XML message streams, several pioneering systems have been developed. In XFilter [2], a Finite State Machine (FSM) for each path query is used to match the XML message. YFilter [3] combines all XPath queries into a single NFA and can support highly efficient, shared processing of large numbers of XPath queries. XPush [4] builds a lazy DFA for the given XPath queries, sharing both path navigation and predicate evaluation among them. XTrie [5] indexes substrings of path expressions that contain only parent-child operators and shares the processing of only these common substrings among the queries. Compared with these works, which build automatons or a shared trie at an expensive cost for a large number of queries, our MTrie simply combines the common paths of the XPath queries into a single MTrie index. The remainder of the paper is organized as follows. Our MTrie architecture and its filtering algorithm are presented in Section 2; the experimental evaluation is shown in Section 3; and we conclude in Section 4.
2 MTrie Architecture and Matching Algorithm

In this section, we describe the MTrie architecture, which is depicted in Figure 1. The major components of MTrie include: (1) the event-based SAX parser to parse the XML stream; (2) the DTD parser and XPath parser to build the MTrie index and predicate table; and (3) the filtering engine for the XML stream using the MTrie index and predicate table. The running scenario of our MTrie EAI prototype is as follows. DTD files are pre-registered by the application integration developer to validate the incoming XML messages. For each DTD, we build a related DTD graph. Subscriber applications subscribe to the XML data with XPath queries. Based on each DTD graph, all XPath queries from subscribers are merged into an MTrie index. Publisher applications publish XML data messages that are validated by the pre-registered DTD files. If the XML data satisfies a subscriber's XPath query, the MTrie engine sends the XML data to the subscriber application.
2.1 DTD and XPath Parser

In enterprise-level applications, we observe that the XML message streams, unlike the document-oriented XML data used for document exchange, are mostly data-oriented XML data, well structured without recursive paths, and have a DTD or XML schema to validate the XML messages. Here we use the DTD as our example. When the DTD files are registered in the MTrie engine, the DTD parser parses each DTD file and builds the DTD graph. Since it only needs to express the simple parent-child relationships and element levels, the DTD parser only parses the DTD element declarations and ignores the DTD attribute, DTD entity, and other declarations. Besides the parent-child relationships, the DTD graph also has a levelID to show each element's level, where the levelID of the root element is 1 and grows by 1 as elements go down the DTD graph. In order to further simplify the DTD, we define three transformations on DTD elements:
Fig. 1. Architecture of MTrie
(1) Separating: …; (2) Abbreviating: …; (3) Merging: …. Thus a definition in a DTD can be simplified accordingly. With such a simplified DTD, we can easily build the DTD graph. Figure 2(a) and Figure 2(b) show an example DTD and its DTD graph. When an XML input stream is published by publishers, the SAX parser begins to parse the XML and generates five types of events: startDocument(), endDocument(), startElement(), endElement(), and characters(). The event model has a constant parsing cost of O(1) per event, compared with a DOM parser. By using these events, we can drive the matching against our MTrie index. In the startElement() event, we check the element's attribute names and attribute values; the element text can be checked in the characters() event. So we can evaluate the filters that reference attribute names, attribute values, and element content.
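A skeleton of the SAX-driven side, for orientation: the five events map onto the filtering engine's actions as described, with the MTrie-specific bodies left as comments, since the engine's actual interfaces are not given in the paper.

```java
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch only: the five SAX events driving the MTrie filtering engine. */
class FilterHandler extends DefaultHandler {
    @Override public void startDocument() { /* reset matching to the MTrie root */ }
    @Override public void endDocument()   { /* collect satisfied XPath queries */ }
    @Override public void startElement(String uri, String localName, String qName,
                                       Attributes atts) {
        // descend one level in the MTrie index; check attribute predicates here
    }
    @Override public void endElement(String uri, String localName, String qName) {
        // ascend one level back toward the MTrie root
    }
    @Override public void characters(char[] ch, int start, int length) {
        // evaluate element-text predicates for the current MTrie node
    }
}
```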
Fig. 2. DTD fragment and DTD graph
2.2 MTrie Index Building

In the MTrie model, we use XPath as our query language, which provides a flexible way to express both structure and content filtering. The path structure query can be a simple linear expression or a tree-pattern expression (for example, E1[E2/@A1=V1]/E3/@A2=V2). The path can be an absolute path like E1/E2 using the parent-child operator '/' or a relative path like E2//E3 using the ancestor-descendant operator '//'. XPath also allows the use of a wildcard operator '*' to match any element name. Besides the path structure query, XPath allows one or more attribute value or element text filters. In MTrie, we combine all the XPath queries into a single trie index. A node in MTrie represents a DTD element, and an edge between two nodes in MTrie shows a parent-child relationship in the DTD graph. Every XPath path follows the DTD graph from the root element, and all common elements in the XPath paths are merged into one node in MTrie. For an 'E1/E2' XPath path, E2 is searched for among E1's child elements in the DTD graph and put into MTrie if found; otherwise the XML message cannot satisfy the XPath path. For an 'E1/*/E2' XPath path, any element M that is a child element of E1 is searched to find a child element named E2; then the elements E1, M, and E2 are inserted into MTrie. If there are n elements that are all children of E1 and parents of E2, all n elements are added to MTrie, and n paths of the form E1/Mi/E2, where i = 1…n, are built in MTrie. For 'E1//E2', the DTD graph is traversed from E1 to find whether E1 has a descendant named E2, and all paths from E1 to E2 are added to MTrie. In Figure 3(a), the XPath path P1: /PLAY//PERSONA has 2 paths in the DTD graph of Figure 2(b), and both paths are added to the MTrie index, where the two common elements PLAY and PERSONAE are represented only once. In an MTrie index node, an auxiliary path is needed to record the whole path from the XPath's start element node to its end element node. Every path node in an auxiliary path includes the XPath id and its level at this node; level 1 shows the start level, and level -1 shows the end. All the auxiliary paths of an XPath query show all its paths in MTrie. In Figure 3(b), two actual paths of "P1: /PLAY//PERSONA" are shown in MTrie. The node PERSONAE includes all the auxiliary paths that go through it: P1(2), P2(-1), P3(2), P4(2), P5(2), and P6(2). Different from a simple linear XPath query, a tree-pattern query, like /PLAY/PERSONAE[/TITLE]/PGROUP/PERSONA, is much more complex. We can decompose the tree-pattern query into multiple linear queries. An XML message that meets all of the decomposed queries is a result for the tree-pattern query.
Fig. 3. XPath queries and MTrie Index
To evaluate the content filters of XPath queries, a predicate table is built. Each path node of the MTrie index has a pointer to the element's content filter, which can be on the element text or on an attribute value, or an expression involving both element text and attribute values. The element text and attribute value filters can be handled easily in the SAX startElement() and characters() events. A complex content filtering expression has to be parsed into multiple element-text or attribute-value filters, which may decrease the scalability of the MTrie engine. In this paper, we focus on the scalability of the structure filtering of our MTrie engine. Once the MTrie index and auxiliary path table are built, the MTrie filtering engine can receive the events triggered by the SAX parser and perform the matching of XML element paths and content filters against our MTrie index.
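The index construction can be summarized in a short sketch. Here each XPath query is assumed to have already been expanded against the DTD graph into concrete element paths (resolving '//' and '*' as described above); the class and field names are illustrative, not the authors' code, and the per-node auxiliary map simplifies the auxiliary-path bookkeeping described in the text.

```java
import java.util.*;

/** One node of the MTrie index: a DTD element plus auxiliary entries. */
class TrieNode {
    final String element;
    final Map<String, TrieNode> children = new HashMap<>();
    // auxiliary entries: queryId -> level at this node (-1 marks the query's end)
    final Map<Integer, Integer> auxiliary = new HashMap<>();
    TrieNode(String element) { this.element = element; }
}

class MTrieIndex {
    final TrieNode root = new TrieNode("/");

    /** Insert one concrete element path (already expanded via the DTD graph). */
    void insert(List<String> path, int queryId) {
        TrieNode cur = root;
        for (int level = 0; level < path.size(); level++) {
            cur = cur.children.computeIfAbsent(path.get(level), TrieNode::new);
            boolean last = (level == path.size() - 1);
            cur.auxiliary.put(queryId, last ? -1 : level + 1);
        }
    }
}
```

Because common prefixes share trie nodes, inserting all expanded paths of all queries yields an index no larger than the DTD graph itself, which is the size bound claimed above.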
2.3 Matching Algorithm

The filtering algorithm is the key part of our MTrie filtering engine. Compared with earlier XML matching systems, the MTrie matching algorithm is rather simple and efficient. It accepts 2 inputs, the SAX-parsed XML data event stream and the MTrie index, and returns the XPath queries that are satisfied by the XML data stream. The basic idea of the MTrie filtering algorithm is as follows. As SAX parses the XML stream, the XML data is traversed in pre-order style, and the MTrie index is checked to find whether the XML data satisfies the XPath queries or not. For every incoming XML stream, the SAX startDocument() event fires, and the matching process begins from the root node of the MTrie index. For each startElement() event, the XML stream's level grows by 1, and the MTrie filtering algorithm goes down the MTrie index, searching the child nodes of the current node for a child whose element name matches the element name in the SAX startElement() event of the XML message stream. If there exists such a child node at which an XPath path node level equals -1, the corresponding XPath query becomes a candidate result; once the XPath query's content filters are also met, the query is added to the returned results. When the SAX endElement() event is triggered, the XML stream's level decreases by 1, and the current node in the MTrie index goes back up to its parent node. Because the MTrie index is a subgraph of the DTD graph, all nodes in the MTrie index can be traversed via the SAX events of the XML message stream. If the current node of the MTrie index has arrived at a leaf node while the XML data stream has not yet reached its leaf element, the MTrie index stays at this node until the XML data stream returns to the current node via an endElement() event, and then continues the next step of the matching procedure. When the SAX parser finishes traversing the whole XML data stream with the endDocument() event, the MTrie matching procedure ends by returning the satisfied XPath queries and sending the XML stream to those subscribers whose XPath queries are among the returned queries.
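Building on the TrieNode sketch above, the matching pass can be written as a small state holder driven by the SAX events; content predicates are omitted here, and the overshoot counter is our rendering of the "stay at the leaf until the stream returns" behaviour described in the text.

```java
import java.util.*;

/** Sketch of the matching pass: a stack of MTrie nodes follows the SAX events. */
class MTrieMatcher {
    private final Deque<TrieNode> stack = new ArrayDeque<>();
    private final Set<Integer> matched = new HashSet<>();
    private int overshoot = 0;   // stream levels below the deepest index node

    MTrieMatcher(TrieNode root) { stack.push(root); }

    void startElement(String name) {
        TrieNode child = (overshoot == 0) ? stack.peek().children.get(name) : null;
        if (child == null) { overshoot++; return; }  // index stays put
        stack.push(child);
        for (Map.Entry<Integer, Integer> e : child.auxiliary.entrySet())
            if (e.getValue() == -1) matched.add(e.getKey()); // candidate result
    }

    void endElement() {
        if (overshoot > 0) overshoot--;              // stream returns to the index
        else if (stack.size() > 1) stack.pop();      // ascend toward the root
    }

    Set<Integer> result() { return matched; }        // read at endDocument()
}
```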
3 Performance Evaluation

We implemented an EAI prototype platform based on the MTrie engine in Java 1.4. To express MTrie's performance, we use the filter time as our performance metric. The filter time is defined as the time between an XML message stream's startDocument() event and its endDocument() event. All experiments were conducted on a 1.5 GHz Intel Pentium 4 machine with 2048 MB of main memory running the Windows 2000 Server platform. We ran our experiments on two groups of data sets: one is the real data set from NASA (http://xml.gsfc.nasa.gov), and the other is data generated from nine DTD files from the NIAGARA experimental data page (http://www.cs.wisc.edu/niagara/data.html) using IBM's XML generator tool. The ADC dataset.DTD from NASA contains 140 elements and 8 levels, and we use the real XML files from http://xml.gsfc.nasa.gov/pub/adc/xml_archives/. To generate the XML data for the nine DTD files, IBM's XML generator creates random XML instances based on our provided constraints. For every DTD file, we use the method in [3] to generate 250 XML documents of different sizes: small, medium, and large, with an average of 20, 100, and 1000 pairs of tags, respectively. We generate the synthetic XPath queries using an XPath generator similar to that of [4]. The modified generator can generate XPath queries based on our input parameters, including the number of queries, the maximum depth of a query, the wildcard operator '*' probability for each location step, the "//" probability for each location step, and the number of value-based predicates in a query, where the values are chosen from our pre-defined data. To simplify our MTrie implementation, all of the generated XPath queries are linear queries. To compare MTrie's performance, we also implemented YFilter and XPush in Java 1.4. To address the question of how the XML message size and the number of XPath queries influence the performance of the MTrie engine, we designed an experiment to test the filter time of the three filtering algorithms, MTrie, YFilter, and XPush, on the NASA data under different message sizes and numbers of XPath queries. In Figure 4(a), where the three algorithms run with 50,000 XPath queries under different XML message sizes, the MTrie engine has the lowest filter time among the three filtering algorithms, because MTrie need not probe the DFA of XPush or the NFA of YFilter and directly matches against the MTrie index. In Figure 4(b), we find that MTrie reaches a stable filter time of 1800 ms after a linear increase up to 300,000 queries. This can be explained by the fact that the merged MTrie index of 300,000 XPath queries already covers the whole NASA DTD graph, whereas the MTrie index of fewer than 300,000 XPath queries is just a subgraph of the DTD graph. The filter times of YFilter and XPush continue to increase linearly even beyond 300,000 queries.
Fig. 4. 50,000 NASA XPath queries and 1M NASA XML message
Fig. 5. 50,000 XPath queries and 1M XML message size
To understand MTrie's performance under multiple DTD files, we designed an experiment to run the MTrie engine with a separate MTrie index for every DTD and with a single merged MTrie index for all DTD files. In the latter solution, the multiple indexes are merged into a single larger MTrie index in the same way as multiple XPath paths are merged into a single MTrie index. Similar to the first experiment testing MTrie engine scalability over both XML message size and XPath queries, XML messages generated from NIAGARA's 9 DTD files are used as the input stream for the two solutions. In Figure 5(a), a total of 50,000 XPath queries are performed under the 9 DTD files; the filter time of the 9 separate MTrie indexes grows linearly as the XML size becomes large, while the single merged MTrie index shows a larger growth. This is due to the fact that actors.DTD and movies.DTD both have a common root element called W4F_DOC, so with multiple DTDs the matching examines far fewer child elements of W4F_DOC than with a single merged index; when matching element names in a SAX event, 1 merged MTrie index performs about twice as many matches against W4F_DOC's child elements as 9 separate MTrie indexes do. Figure 5(b) directly shows that 9 MTrie indexes perform better than a single merged MTrie index. According to Figures 5(a) and 5(b), the merged MTrie index performs worse than separate MTrie indexes if multiple DTD files share elements from the root.
4 Conclusion

In this paper, we have proposed a novel XML filtering system, termed MTrie. MTrie supports effective and scalable filtering of XML messages based on XPath expressions. The features that MTrie provides for publish/subscribe-based MOM systems are: (1) MTrie can support a large number of XPath queries by merging these queries into a single trie-like data structure. (2) Based on the DTD, MTrie converts the merged XPath queries into the MTrie index, which makes XML filtering more effective and faster. (3) MTrie can support XML message filtering over heterogeneous DTD files by making an MTrie index for every DTD file. Our experimental results show that MTrie outperforms earlier work and scales well with both message size and the number of XPath queries.
References
1. A. Carzaniga, D. R. Rosenblum, and A. L. Wolf. Challenges for Distributed Event Services: Scalability vs. Expressiveness. In Engineering Distributed Objects '99, May 1999.
2. M. Altinel and M. J. Franklin. Efficient Filtering of XML Documents for Selective Dissemination of Information. In Proceedings of the VLDB Conference, 2000.
3. Y. Diao, P. Fischer, M. Franklin, and R. To. YFilter: Efficient and Scalable Filtering of XML Documents. In Proceedings of ICDE, 2002.
4. A. Gupta and D. Suciu. Stream Processing of XPath Queries with Predicates. In Proceedings of the ACM SIGMOD Conference on Management of Data, 2003.
5. C. Chan, P. Felber, M. Garofalakis, and R. Rastogi. Efficient Filtering of XML Documents with XPath Expressions. In Proceedings of ICDE, 2002.
Break a New Ground on Programming in Web Client Side

Jianjun Zhang and Mingquan Zhou

Department of Computer, Northwest University, Xi'an, China
[email protected]
Abstract. This paper suggests adding a new ability to the browser that allows programs running on the web client side to request remote objects. In this way, the browser will be extended from "an operator's agent" to "an operator + program's agent", enabling web programs on the client side to organize and process objects from the network. It can then support many new applications, such as virtual reality, and the browser will become a new program-running platform.
1 Introduction

During the past 10 years of development of the web, the techniques at both ends (server side and client side) have been greatly improved. However, we also find that there is a great limitation on the capacity of the browser, which functions as the web agent on the client side. If the functionality of the browser is extended and some limitations are discarded, web programming technology on the client side will be greatly improved as well. On the server side, the server originally could only respond to simple requests and send an existing document to the client; now it can invoke and execute CGI, SSI, ISAPI, ASP, JSP, and PHP programs, even when those programs are on other servers. It can also interact dynamically with other servers via XML and SOAP under a multi-server architecture. Such powerful abilities can largely satisfy the requirements of the various applications of web services. On the client side, the browser originally could only request a document with affiliated images and then display it on the screen. Now it can execute scripts, Java applets, and ActiveX controls, all of which are embedded in the web page and run on the client side. This achieves interactivity between the browser and the operator. However, when we measure the current capacity of the browser against Berners-Lee's words [1], in which he describes the web "with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help", we find with some disappointment that the ability of the current browser cannot reach the goal of web services: it cannot support even some quite simple Internet applications, and there is a great gap between the reality and our imaginations of "the browser is the universal computer operation platform in the Internet Age". What are the reasons for this situation? After careful investigation, we find that there are several reasons for it. Among them, the most important one is that the browser cannot sufficiently support the ability of a web program to request and receive a data object from the Internet. If we can extend the function of the browser and let it support web programs requesting remote data objects from the Internet while keeping security, we can greatly improve the ability of the browser and break new ground in the technique of programming on the web client side.
2 The Limitation of the Current Client Agent

We can see the functional limitations of the current browser from an example.
2.1 The Requirement to Request Outer Objects Dynamically

Let us consider a virtual reality application with interactive communication on the Internet. In the scene there are lots of objects that are time-correlated with each other and cooperate in action. Now, according to the operator's online operating commands, new objects should be added. Some of the objects are on several remote servers. You need to request them dynamically, then process them and add them into the current scene, in which they may appear and harmonize with the other objects. There are three features of this requirement. First, the requests for outer objects are submitted at random, and you cannot decide which outer objects you really need when you create the web page. Second, you cannot download all the objects that appear in the document while the web page is loaded, because there are too many optional objects on several servers, and it is also impossible to distinguish the useful objects from the useless ones before the operator issues the command and the program runs. Third, you cannot re-download all objects just for the new objects, because the objects to be downloaded are time-correlated with the objects that already exist; if you reload all the objects, the original correlation may disappear. Can the current browser support a situation like the example above? Let us discuss whether the browser's functionality supports such running programs, and its limitations.
2.2 The Present Situation of Programs Running on the Client Side

First, we may survey the present situation of programs running on the client side. The programs running in the browser can basically be classified into 3 types according to their running mode: scripts, Java applets, and ActiveX controls. Java applets and ActiveX controls are both regarded as component programs but are treated separately because of their different running modes.
The Java applet was the first kind of program to run on the client side. After the browser downloads and then scans the HTML document, the reference to the Java applet embedded in the document is found. The browser downloads its byte codes and then makes up the executable code together with the Java library. The Java applet runs under the support of the Java Virtual Machine and the Java running environment, which are embedded in the browser. In consideration of security, a Java applet can neither directly request remote objects on other servers nor read and write files on the client's computer. This limitation greatly decreases the ability of Java applets while making the system safer. Without the limitation, the client's computer system could be damaged if some malicious programmer created a web page with a virus and spread this virus to the computers that visit this web page. A Java applet runs as a plug-in component and is a self-closed system that can hardly interact with other objects in a document or with the browser. It cannot request other objects or data from the Internet dynamically. ActiveX controls are code that can run on the client and have access authorization to the client computer. The main reason for using ActiveX controls is the reuse of software components, by which modules with special functions are independently programmed following certain rules. The entire software is then composed of a series of components named controls, as integrating existing controls into new software is timesaving. Unlike the Java applet, ActiveX is a plug-in component technique widely used in both web programming and traditional programming. Its capacity is not limited: the controls in a web page can be used to interact with the operator, visit the user's computer system, and request remote objects. However, it is quite difficult to use ActiveX to manipulate the objects in the document because of the self-closedness of ActiveX, so it is not well suited to web programming. Meanwhile, software with such strong functions makes us worry about its security. By using the digital signature, Microsoft provides a solution for the security problems; however, it is not enough to totally avoid the latent security problems. Script programs include JavaScript, VBScript, etc. After the browser downloads and then scans the HTML document, the script embedded in the document, or a link that points to a script, is found. The browser downloads the script and then interprets and executes it under the support of the script engine embedded in the browser. By using script we can operate almost all the objects in the document and in the browser; it can also dynamically invoke ActiveX controls and Java applets. Script programming is the most useful and convenient technique in web programming, yet its functionality is greatly limited for security reasons. We can neither use script to access files on the client's computer nor dynamically request a new outer object. All the programming data and objects in the program must already be loaded.
2.3 The Current Way of Dealing with Requests for Remote Objects

Now let us survey how the browser deals with requests for remote objects. First, the requests for remote objects come from 3 different sources: URLs designated in tags of the web page, Java applets, or ActiveX controls.
The browser serves the requests from the URLs in the tags of the web page. While the browser loads the web page, it scans the document and processes every tag. When the browser finds a URL, it immediately requests the object and automatically processes it according to the definition of the object in the tag. All the outer objects defined by URLs in the web page are loaded when the web page is loaded, no matter whether the objects will subsequently be used or not. As for script, there is no way to request, receive, and process a remote object. As for Java applets, an applet can bypass the browser to access the current server by programming and request a remote object. However, an object acquired in this way can hardly cooperate with the other objects in a web page. As for ActiveX controls, they are plug-in components that are independent of the browser. ActiveX can access remote objects by its own programming. However, this way of accessing remote objects is quite complex, and an object obtained in this way can hardly cooperate with the other objects in a web page. As a result, the current browser can hardly accomplish the function of dynamically requesting and using outer objects (except that MS IE has a backdoor that uses a non-standard HTML tag to open an internal frame).
2.4 The Functional Limitations of the Browser
From the analysis above, we can identify the functional limitations of the browser. Static web page techniques cannot directly fulfill the requirement in the example above; current script techniques cannot fulfill it either, because script does not support dynamically requesting or loading an outer object; with the Java Applet technique, the requested object cannot cooperate with non-applet objects because of the Applet's independence; with ActiveX, although a control can request a remote object and write it on the client computer, the newly loaded object cannot cooperate with the objects already in use, again because of the independence of the ActiveX technique; and combining the three techniques is also impractical, because sharing object data among plug-in components, scripts, and Java Applets is too difficult. These facts show the limitations of current browser functionality. Although the current browser is called a "client agent", it is really only an operator's agent: it can only update and display a web page according to the operator's request. It is not an agent for running programs and cannot support many of the requirements a program has at run time. In fact, browsers were never designed to let script, Java Applets, or ActiveX request outer data objects. This was acceptable in an age when the main function of the Web was to publish pages, but the browser's abilities are insufficient now that the Web is a platform for distributed computing.
3 The Architecture of the Extended Client Agent
Obviously, we should extend the function of the client agent to support web programs requesting outer objects while they run. In other words, we extend the browser from an "operator's agent" into an "operator + program's agent". But how can the client agent be extended to support such requests, and what should its architecture be? To reach a workable solution, four questions must be answered.
3.1 Should the Web Program Request Remote Objects Directly, or Indirectly via the Client Agent?
The logical answer is to let the client agent act as middleware that requests remote objects on behalf of the web program, rather than letting the web program request them directly. In this way we can efficiently exploit the browser's existing abilities and reduce the processing complexity required of the programming language. Moreover, we can add security-control rules in the client agent so as to contain the security problems that accompany such requests.
3.2 What Kind of Interface Should Be Used between the Web Program and the Client Agent?
The interface could be socket-based or URL-based. Since the function is provided on the Web platform, and the Web platform is URL-based [2], the logical answer is that the interface should be URL-based.
3.3 How Does the Client Agent Distinguish a Dynamic Request for a New Object from a Request for a New Web Page, and Ensure That the Returned Data Are Sent to the Requestor Rather than Opened as a New Web Page?
This question arises from the current situation, in which every HTTP request from the operator results in updating the web page. The proposed dynamic object request, by contrast, comes from the web program and requires only transferring data to some object, not updating the current page. The client agent must therefore distinguish the two kinds of requests and process the returned data differently. In fact, once we allow a web program to issue a request for a data object, there is no reason not to also allow it to issue a request to update the web page. The browser should therefore mark the type of each request with some tag when it delivers the request, so that it can recognize the corresponding response and process it appropriately. As a client agent, the browser usually communicates with the server via HTTP [3]: the request is an HTTP Request and the return is an HTTP Response. The data type is a
MIME type such as text/html, image/jpeg, or text/xml. The server neither cares whether an HTTP request coming from the client agent is for a new web page or for a data object, nor attaches any identifier to the returned object. The browser therefore has to record the nature of the current request by itself and forward the data to the corresponding web program after the data return. The most convenient way to record this is to add a new option item to the HTTP message header. The server must then return the same header item, and the browser forwards the data object to the web program according to it.
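As a concrete illustration of this idea, the exchange below sketches what such a tagged request and response might look like; the header name X-Request-Kind is purely hypothetical, since the paper does not specify the option item:

```
GET /stocks/quote?symbol=XYZ HTTP/1.1
Host: data.example.com
X-Request-Kind: data-object

HTTP/1.1 200 OK
Content-Type: text/xml
X-Request-Kind: data-object
```

On seeing the echoed header item, the browser would hand the response body to the requesting web program instead of replacing the current page.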
3.4 How Are the Data Received by the Browser Mapped to the Data Object in the Web Program?
The solution to this problem is more complex, so we do not describe it in detail in this paper; we will discuss it in other papers. The basic idea is to bind the object file to the data object, thereby matching the data file that the browser obtains from the server with the data object used by the web program. This is possible because all the contents of a document are organized in an object-oriented way, with a corresponding definition for each object. The detailed form and rules of the binding remain to be worked out. Meanwhile, to make the data types match, many checks must be performed by the client agent and the language engine. The points above are the primary issues in enabling a web program to request remote objects. Beyond these, security is also an important issue.
4 Security Considerations
Security considerations, along with some historical causes, are the main reason that current browsers do not let web programs request outer data directly. People usually assume that giving the web program in a page more power threatens the security of the computer: the program might download viruses and run them, or steal files from the computer and send them out. For this reason, the abilities of client-side web programs are kept quite limited: they can only manipulate objects in the document, have no input, output, read, or write capability, and cannot communicate with the outside. After adding the abilities we proposed above, will the security of the computer running the browser be compromised? This paper holds that, if the following security rules are adopted, no new security problems arise:
1. Forbid the client-side web program from accessing files on the host computer. Then, even if a hacking program or virus is loaded onto the computer, it can only be
a temporary file used by the browser rather than a permanent disk file. It is easy for the browser to check every dynamic request for an outer object.
2. Carry out a check when the requested data are bound to objects in the document; only qualified bindings are allowed.
3. Require that all data in a file obtained by a dynamic web-program request be bound to some object in the web page. As long as the use of objects in the web page is safe, no security problem arises.
In fact, what we propose in order to extend the browser is only that data objects may be loaded dynamically and selectively, compared with the original process of loading a whole web page. Under the security measures above, the original security properties remain and no new security problems are introduced.
5 Conclusion
On the web server side, as we know, the server is allowed to invoke other programs and data to extend its function. By allowing the client-side web program to request remote data objects, client-side programs attain an ability similar to that of server-side programs: just as the server was extended with the CGI interface and the ASP environment, the browser can be extended to support web programs requesting remote objects. Such an extension does not damage security as long as certain security rules are obeyed. Consequently, we can break new ground in client-side programming that was previously out of reach. For instance, data can be sent to the client side as XML via a SOAP interface, and a script program can then flexibly produce and display images according to some algorithm and the current demand. We can also receive and organize data from several servers to form a complete, effective view, obtaining and organizing documents according to the logical demands of the running program. Clearly, this is a vision all Web users await.
References
1. Tim Berners-Lee: Semantic Web Road Map. http://www.w3.org/DesignIssues/Semantic, September 1998
2. T. Berners-Lee, R. Fielding, L. Masinter: Uniform Resource Identifiers (URI): Generic Syntax. RFC 2396
3. Hypertext Transfer Protocol -- HTTP/1.1. RFC 2616
An Adaptive Mixing Audio Gateway in Heterogeneous Networks for ADMIRE System
Tao Huang and Xiangning Yu
National Laboratory of Software Development Environment, School of Computer Science and Engineering, BeiHang University, Beijing 100083, P.R. China
Abstract. Unlike many common media gateways that simply exchange audio packets between senders and receivers, the AGW introduced here can work in heterogeneous network environments and supports an adaptive, receiver-based audio transmission strategy. Before relaying audio packets from source to destination, it decodes and analyzes the source packets, performing Automatic Gain Control (AGC) and Voice Activity Detection (VAD) to enhance audio quality. It then mixes audio data individually according to each receiver's requirements, inserts additional useful padding, and sends the packets out in different codecs. This method has proved to help our system serve large numbers of clients easily. Keywords: Audio gateway, NAT, Mixing strategy, Padding
1 Introduction
With the great expansion of the Internet, much conventional office work is now carried out on intranets or the Internet. Interactive activities such as meetings, interviews, and training courses are important aspects of these commercial affairs. Special software and capable networks are preconditions for carrying out these activities across distant places. There are currently three major video conference systems: H.323 [1], Access Grid [2] (AG), and SIP [3]. They were developed by different communities and have their own sets of protocols and products [4]. We have designed a new video conference system, ADMIRE [5], based on the AG system; it works well on multicast networks and can communicate freely with AG system nodes. As the AG system did, we also developed a reflector [6] to solve communication problems between multicast and unicast nodes. After a period of testing and service, we found that it worked well only with few participants: good interactivity could be obtained only when the number of participants was below 30. Although we made improvements to the reflector, little amelioration was achieved. This led us to a new approach: application-level media gateways [7]. This paper first gives a brief overview of our ADMIRE system and the media gateway architecture, then presents the detailed design and solutions of the audio gateway. After discussing and analyzing some experimental results, a conclusion is drawn at the end.
2 System Overview

2.1 ADMIRE
The ADMIRE system is a real-time collaboration platform that supports audio, video, chat, whiteboard, PowerPoint, web page sharing, desktop sharing, and application sharing. It provides a complete conference management toolkit as well as conference archiving tools [5]. Besides these multimedia tools, there are three major components that enable heterogeneous clients to join the same real-time sessions: the media gateways, the Real gateway, and the H.323 gateway.
Fig. 1. Admire gateway cluster
The media gateways (AGW and VGW) facilitate exchanging audio and video data between different networks. After some basic measurements, clients can obtain the best media services from the media gateways. The Real gateway makes it possible for large numbers of clients to listen to the real-time streamed media rather than interacting with them. The H.323 gateway helps clients already equipped with H.323 systems such as POLYCOM to join our ADMIRE system easily without changing their existing equipment.
2.2 Media Gateways
We have divided the media gateways into three parts: MGC (Media Gateway Control), VGW (Video Gateway), and AGW (Audio Gateway). VGW and AGW are designed to handle video and audio data according to their characteristics: frame extraction [8] and working-rate modification let VGW change its working state dynamically, while AGW uses mixing and transcoding to adapt to different networks. MGC first collects information on network status, then starts a pair of AGW and VGW processes for each session and sets their initial parameters, such as the video sending rate and the audio sending codec. Because network conditions change frequently, an adaptive approach for
adjusting our media services in time is necessary. Video streams always consume much more bandwidth than audio, and their transmission rate depends on the real-time content, while the audio rate is essentially fixed by its encoding codec. Changing the video sending rate has a great effect on bandwidth saving; sometimes it decreases from 1 MB/s to 100 KB/s. So we choose to modify the VGW sending rate to adjust the state of the media gateways. It is AGW's duty to send feedback to VGW and tell it when to adjust.
3 Problem and Solution
In a multimedia session, video and audio signals have different priorities. End users can put up with talking without video, but it is insufferable for them to watch each other in silence. Because of the higher priority of audio communication, AGW should be more stable and more adaptive to heterogeneous networks than VGW. To achieve that goal, we must solve three major audio problems: audio packet exchange, NAT firewall penetration, and presenter identification.
3.1 Audio Packet Exchange
In the AG system and our former ADMIRE system, RTP and RTCP audio packets are transmitted over the multicast network. All end users equally receive the others' audio data and send their own packets out. This heavily occupies their bandwidth and increases CPU load, and nodes not located on multicast networks cannot join these multimedia sessions at all. Thus these two systems can only be deployed on well-provisioned networks such as Internet2 [9] in the USA and NSFCNET [10] in China. In order to admit more users to our system, we must build a "bridge" between multicast and unicast networks. Here we have two design patterns to choose from: the reflector and the mixer.

Reflector: A reflector redirects media packets from multicast networks to unicast networks without any processing. Each unicast node receives all the multicast data through the tunnel established between the reflector and itself. Because the media packets need no handling, an effective reflector is quite easy to implement. But there is a fatal drawback: high bandwidth consumption. The total bandwidth through the central reflector grows with the number of participants; in practice, a session is limited to about 30 users for the reflector to work well. We tried to enhance its performance, but little amelioration was achieved.

Mixer: There are two main mixing strategies to choose from. One is distributed mixing, which is flexible to deploy but makes it challenging to distribute and maintain the computing load among different machines [4]; the other is central mixing, which is not computationally intensive but is less scalable. Considering complexity and efficiency, AGW was designed to supply a mixing service only for unicast users. It transparently reflects all unicast data to the multicast networks in the same session, and all members of a multicast group are treated as a single endpoint by AGW. For unicast users, AGW mixes all the
Fig. 2. Hybrid Model of AGW
other sources and resends the mixed stream to them. This kind of architecture is called the Hybrid Model [4][11]. Figure 2 shows the idea clearly: A and B are two unicast clients outside the multicast session, and each receives only one mixed stream containing all the other clients' voices. The bandwidth therefore drops from one stream per sender to a single mixed stream for each of the m unicast clients; in most cases m is much smaller than the total participant number n, and its cost can be neglected. In our former plan, we designed AGW to send only one pair of RTP and RTCP streams to each end user. This meant that nodes connected to AGW could find only one mixed user in their user lists; they could not learn the meeting information, such as the membership of the meeting. If people took turns speaking during a session, unicast users would take quite some time to realize it. To solve this problem, we devised a partial mixing strategy: RTP mixing with RTCP reflecting. That is, AGW does nothing with RTCP packets from the two kinds of networks; it just replicates them and sends them out. According to the RTP protocol, RTCP packets are sent at a low rate and the transmitted information is collected gradually, so they do not introduce much additional bandwidth.
3.2 Firewall Penetration (NAT)
Because of heterogeneous network conditions, AGW should also supply different connection modes besides special output streams. By default, AGW first obtains the IP addresses of clients from the Control Server and then starts transporting the audio streams. This works well only if the clients have global IP addresses, but a great many ADMIRE clients run inside intranets and cannot receive UDP packets from outside networks: incoming connections, including audio streams from AGW, are blocked by NAT firewalls. We designed a NAT-penetrating module to solve this problem; it contains a set of APIs and a daemon program called NSPD. NSPD has two running modes: server mode and client mode (Figure 3). When running in client mode, it keeps sending a notification packet to the server at a certain rate, with its local IP address (the internal IP address) and a virtual port in the packet payload. This packet then passes through the NAT
Fig. 3. NSPD workflow
firewall, which rewrites its header with a NAT address and port pair. Meanwhile, another NSPD program running in server mode collects such packets and records them in its address table; after this, the server knows the client's internal IP address as well as its NAT IP address and port. When the server is about to send packets back to an internal host, it first checks whether a NAT IP address is associated with that internal IP address in its address table. If so, the server sends the packets to that NAT address; sending fails if the internal IP address is not recorded in the NSPD server's address table. Without an initial notification, the NSPD server can never learn how to send media streams back, and the virtual port is a crucial factor in this mechanism because it is the only way to identify internal clients. With an NSPD module inserted on each side, the current stable version of AGW is able to serve ADMIRE clients inside intranets.
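The server-side bookkeeping this describes is essentially an address table keyed by the (internal IP, virtual port) pair. The sketch below is a minimal illustration of that logic, assuming our own names and a flat-array table; it is not the actual NSPD implementation:

```c
/* Sketch of the NSPD server-side address table (illustrative only). */
#include <stdint.h>

#define MAX_ENTRIES 256

struct nat_entry {
    uint32_t internal_ip;   /* client's own address, carried in the payload   */
    uint16_t virtual_port;  /* the only reliable key identifying the client   */
    uint32_t nat_ip;        /* source address observed on the wire (from NAT) */
    uint16_t nat_port;      /* source port observed on the wire               */
};

static struct nat_entry table[MAX_ENTRIES];
static int n_entries;

/* Called for every notification packet: the payload gives the internal
 * address, while the IP/UDP header rewritten by the NAT firewall gives
 * the externally visible mapping. */
void record_notification(uint32_t internal_ip, uint16_t virtual_port,
                         uint32_t observed_ip, uint16_t observed_port)
{
    for (int i = 0; i < n_entries; i++) {
        if (table[i].internal_ip == internal_ip &&
            table[i].virtual_port == virtual_port) {
            table[i].nat_ip = observed_ip;       /* refresh existing mapping */
            table[i].nat_port = observed_port;
            return;
        }
    }
    if (n_entries < MAX_ENTRIES)
        table[n_entries++] = (struct nat_entry){internal_ip, virtual_port,
                                                observed_ip, observed_port};
}

/* Before sending media to an internal host, look up its NAT address.
 * Returns 0 if no notification has been seen yet (sending would fail). */
int lookup_nat(uint32_t internal_ip, uint16_t virtual_port,
               uint32_t *nat_ip, uint16_t *nat_port)
{
    for (int i = 0; i < n_entries; i++) {
        if (table[i].internal_ip == internal_ip &&
            table[i].virtual_port == virtual_port) {
            *nat_ip = table[i].nat_ip;
            *nat_port = table[i].nat_port;
            return 1;
        }
    }
    return 0;
}
```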
3.3 Presenter Identification
Because clients now receive only one audio stream through AGW, they cannot recognize the speaker in the mixed voice; it sounds as if all presenters were speaking in one place. This greatly impairs users' understanding of the meeting focus, and sometimes serious communication errors may occur. Adding padding to the mixed RTP packets solves this problem. Because we reflect all RTCP data to each unicast user, clients can obtain the user list by analyzing those packets; what we still need to tell our clients are the speakers' SSRCs and their volumes. Considering compatibility and implementation time, we chose to modify the RTP packet headers and add useful padding after the RTP payload. The additional padding certainly leads to more bandwidth consumption, but we consider it well worthwhile: it prevents unicast users from losing the meeting's focus, and they can now easily identify the presenter by watching the volume jumping on the user list. Because the number
of concurrent speakers is always less than 3, we can ignore the burden of this padding most of the time.
4 Implementation and Experiment

4.1 AGW Work Flow
Like our audio tool CAT [5], AGW adopts a pipeline-based architecture to enhance execution efficiency. All data processing is encapsulated into filters, so it is easy to add or remove a functional filter in the working flow.
Fig. 4. RTP and RTCP working Pipeline
Figure 4 illustrates our RTP handling channel; the RTCP channel works in a similar way. All unicast RTCP packets are resent to the multicast address directly at the very beginning of the RTCPRecv filter. Besides its own packets, each unicast node receives all the others' RTCP data reflected by AGW. The only difference is that a RTCPProduce filter runs simultaneously with the RTCPReflect filter; it also distributes RTCP data for the mixed streams to the multicast and unicast nodes.
4.2 Mixing Strategies
Unlike video data, audio packets contain a series of digital samples recording sound energy levels. If different sources' data captured at the same time are added together, a simple mix is achieved. AGW uses this method to mix audio streams; the concern is the efficiency and resource consumption of the mixing strategy.

Client-Based Mixing Strategy. In the initial version of AGW, we create a mixing buffer for each session client. Audio data received from different nodes are added into these buffers
corresponding to their timestamps. At a fixed period, a submitting thread is activated and all ready mixed data are passed on to the next filter. Figure 5 shows the detailed process.
Fig. 5. Client based mixing
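The buffer-insertion step above boils down to per-sample addition. The sketch below shows this core operation, assuming 16-bit linear PCM after decoding and saturating the sum to avoid wrap-around; it illustrates the described method, not the actual AGW code:

```c
/* Saturating sample addition -- the "simple mixing" described above,
 * assuming 16-bit linear PCM after decoding (an assumption on our part). */
#include <stdint.h>

void mix_into_buffer(int16_t *mix, const int16_t *src, int nsamples)
{
    for (int i = 0; i < nsamples; i++) {
        int32_t sum = (int32_t)mix[i] + src[i];  /* widen to avoid overflow */
        if (sum > INT16_MAX) sum = INT16_MAX;    /* clip instead of wrapping */
        if (sum < INT16_MIN) sum = INT16_MIN;
        mix[i] = (int16_t)sum;
    }
}
```

In the client-based strategy, one such mix buffer exists per connected client, which is why the cost grows with the number of participants.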
We tested this AGW with different numbers of connected clients; the resulting CPU load and resource consumption are listed in Figure 5. The test used a scene with two simultaneous speakers, with outgoing streams encoded in the G.723.1 codec, on a machine with a P4 1.7 GHz CPU and 512 MB of memory. Clearly this AGW works well with few connected users, but because these cost factors are linear in the number of connected users, the computational overhead grows rapidly as the participant count rises, and the machine is soon overwhelmed. After a series of optimizations we gained little improvement: the CPU load was still higher than 20%, which cannot be tolerated because VGW consumes much more CPU at the same time. We needed to improve the architecture of AGW.

Speaker-Based Mixing Strategy. In our experiments we found that there are always fewer than five people speaking simultaneously in a meeting, so it is possible for all the other, listening nodes to share the same mixing result. As shown in Figure 6, we only need to create buffers for those who are currently speaking. Suppose the number of simultaneous speakers is M (M < 5) and the total number of meeting participants is N (N > 30); then we only need to create M receiving buffers in the receiving filter and M + 1 mixing processes in the mixing filter: each speaker receives a mix that excludes his own voice, and all listeners share one mix of all M speakers. Because these are now unrelated to the total number of users, resources are greatly saved; the bigger N is, the more resources are saved compared with the previous strategy. The table on the right side of Figure 6 shows the results of the same experiments on the new AGW. The improvement is obvious: the CPU load is below 15%, and memory usage is much smaller than in the first AGW.
Fig. 6. Speaker based mixing
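A minimal sketch of this strategy follows, reusing the mix_into_buffer routine from the previous sketch. The decomposition into one full mix plus M "minus-one" mixes is our reading of the M + 1 figure, and the frame size and names are assumptions:

```c
/* Speaker-based mixing sketch: M + 1 mixes, independent of N. */
#include <stdint.h>
#include <string.h>

#define FRAME_SAMPLES 160  /* e.g., 20 ms at 8 kHz (an assumption)           */
#define MAX_SPEAKERS  5    /* the paper observes < 5 concurrent speakers     */

void mix_into_buffer(int16_t *mix, const int16_t *src, int nsamples);

/* speakers[0..m-1] hold the current frame of each active speaker.
 * out_all receives the full mix shared by every listening node;
 * out_minus[i] receives the mix excluding speaker i's own voice. */
void speaker_based_mix(const int16_t speakers[][FRAME_SAMPLES], int m,
                       int16_t out_all[FRAME_SAMPLES],
                       int16_t out_minus[][FRAME_SAMPLES])
{
    memset(out_all, 0, FRAME_SAMPLES * sizeof(int16_t));
    for (int i = 0; i < m; i++)
        mix_into_buffer(out_all, speakers[i], FRAME_SAMPLES);

    for (int i = 0; i < m; i++) {            /* m "minus-one" mixes ...       */
        memset(out_minus[i], 0, FRAME_SAMPLES * sizeof(int16_t));
        for (int j = 0; j < m; j++)
            if (j != i)                      /* ... each excluding one speaker */
                mix_into_buffer(out_minus[i], speakers[j], FRAME_SAMPLES);
    }
    /* Total: m + 1 mixes, regardless of the number of participants N. */
}
```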
4.3 Padding Format
In Section 3.3 we mentioned the additional padding data placed in the mixed packet for presenter identification. Figure 7 shows the detailed design of this special part.
Fig. 7. Padding structure
Following the RTP protocol, the last octet of the padding contains a count of how many padding octets should be ignored. The structure in the figure records the SSRC, encoding codec, and volume of each speaking node, as supplied by AGW. When AGW detects speaking nodes, it inserts several of these structures after the RTP payload of the output packets. As noted before, there are usually only a few speakers in one multimedia session, so transmitting this added padding does not introduce much additional bandwidth consumption. CAT recognizes these packets by checking the P bit in their RTP headers; it then picks out the padding and refreshes the volume bars on the user list according to the information it contains. Because RATv4 simply discards the padding, only the mixed stream's volume is shown there while the others stay mute; CAT users therefore grasp the meeting focus more easily than RAT users.
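One possible encoding of this record is sketched below; the field widths are our assumptions, since the paper specifies only the contents (SSRC, codec, volume) and the standard RTP padding-count octet:

```c
/* Illustrative per-speaker padding record (field sizes assumed). */
#include <stdint.h>
#include <string.h>

struct speaker_padding {
    uint32_t ssrc;    /* RTP source identifier of a speaking node  */
    uint8_t  codec;   /* encoding codec of that node's stream      */
    uint8_t  volume;  /* current volume level, for the volume bars */
};

/* Per RTP, when the P bit is set the last octet of the packet counts the
 * padding octets to ignore (including itself). A receiver such as CAT
 * could walk the records like this: */
int parse_padding(const uint8_t *pkt, int len,
                  struct speaker_padding *out, int max_out)
{
    int pad_len = pkt[len - 1];              /* RTP padding count           */
    int nrec = (pad_len - 1) / 6;            /* 6 bytes per assumed record  */
    const uint8_t *p = pkt + len - pad_len;  /* start of the padding region */
    if (nrec > max_out) nrec = max_out;
    for (int i = 0; i < nrec; i++, p += 6) {
        memcpy(&out[i].ssrc, p, 4);          /* byte-order handling omitted */
        out[i].codec  = p[4];
        out[i].volume = p[5];
    }
    return nrec;
}
```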
5 Conclusion and Further Work
In this paper an adaptive audio gateway was introduced. This AGW greatly extends the scope of the ADMIRE system and supplies dynamic service selection according to network conditions. With the help of the NSPD module, users behind NAT can easily communicate with people on the Internet, and with the help of VAD and similar acoustic techniques it serves clients more precisely and effectively. Its functions fulfill our requirements and have proved efficient in experiments. In the next period, we plan to split the mixing task of one meeting into several parts and distribute them to different AGWs based on their load factors. Our goal is to serve tens of meetings with hundreds of clients simultaneously.
References
1. International Telecommunication Union: Packet Based Multimedia Communication Systems. Recommendation H.323, Geneva, Switzerland, Feb. 1998
2. The Access Grid project, http://www.accessgrid.org
3. J. Rosenberg et al.: SIP: Session Initiation Protocol. RFC 3261, Internet Engineering Task Force, June 2002, http://www.ietf.org/rfc/rfc3261.txt
4. Ahmet Uyar, Wenjun Wu, Hasan Bulut, Geoffrey Fox: An Integrated Videoconferencing System for Heterogeneous Multimedia Collaboration. 7th IASTED International Conference on Internet and Multimedia Systems and Applications (IMSA 2003), August 13-15, 2003, Honolulu, Hawaii, USA
5. The ADMIRE project, http://www.nlsde.buaa.edu.cn/admire/en
6. Jin Tian, Chen QingJi, Lu Jian: Multimedia Multicast Gateway Infrastructure. Proceedings of the 6th World Multiconference on Systemics, Cybernetics and Informatics (SCI 2002), Orlando, Florida, USA, July 2002
7. E. Amir, S. McCanne, Z. Hui: An Application Level Video Gateway. ACM Multimedia Conference and Exhibition, San Francisco, CA, November 1995, pp. 255-266
8. Yueting Zhuang, Yong Rui, Thomas S. Huang, Sharad Mehrotra: Adaptive Key Frame Extraction Using Unsupervised Clustering. Proceedings of the IEEE International Conference on Image Processing, pp. 866-870, October 1998, Chicago, IL
9. The Internet2 project, http://www.internet2.edu
10. The NSFCNET project, http://www.nsfcnet.net
11. Milena Radenkovic, Chris Greenhalgh, Steve Benford: A Scaleable and Adaptive Audio Service to Support Large Scale Collaborative Work and Entertainment. International Conference on Advances in Infrastructure for e-Business, e-Education, e-Science, and e-Medicine on the Internet, L'Aquila, Jan 21-27, 2002
Kernel Content-Aware QoS for Web Clusters
Zeng-Kai Du and Jiu-bin Ju
College of Computer Science and Technology, Jilin University, Changchun 130012, China
[email protected]
Abstract. While content-aware QoS is increasingly desired for cluster-based Web systems, the high processing overhead it causes can easily make the Web switch a system bottleneck. In this paper, we present a more scalable architecture in which content-aware request distribution and service differentiation are performed on the back-end server nodes. Based on this scalable architecture, a kernel content-aware QoS mechanism is proposed for Web clusters. Simulation results demonstrate that this mechanism can provide guaranteed performance for preferred clients even when the server is subjected to a client request rate several times greater than the server's maximum processing rate.
1 Introduction
With the explosive growth of the WWW, the Internet is becoming a mature, business-oriented medium, and the need for QoS-enhanced Web architectures is much stronger. In recent years, numerous QoS mechanisms have been proposed for Web systems, at either kernel or application level. Typically, kernel mechanisms classify client requests based on client IP address and TCP port [1], [2], [3]; user-space mechanisms, on the other hand, can take the content or type of the requested service into account in request classification and further scheduling [4], [5], [6]. In cluster-based Web systems, a front-end Web switch is often employed as the server's point of contact on the Internet and distributes incoming requests to a number of back-end nodes. As the Web switch has centralized control over the system status, QoS mechanisms are often deployed on this switch. Unfortunately, when these QoS mechanisms are implemented content-aware, the corresponding request distribution can easily make the Web switch a system bottleneck. To achieve more scalable performance, we present a scalable architecture in which all hosts participate in request dispatching. On the basis of this architecture, a kernel content-aware QoS mechanism is proposed for Web clusters. By examining the application-layer information in the HTTP header, this mechanism can provide content-aware service differentiation, while its implementation introduces only a little overhead over the basic CODA architecture.
This work was supported by the National Natural Science Foundation of China under Grant No. 60073040.
The rest of the paper is organized as follows. Section 2 investigates popular Web QoS mechanisms and presents the CODA architecture. In Section 3, a kernel content-aware QoS mechanism is proposed for the CODA architecture. Section 4 presents a detailed simulation model for CODA and the parameters of the workload model. We discuss our simulation results in Section 5, and Section 6 presents conclusions.
2 Background
In recent years, a significant amount of QoS research has focused on the network infrastructure. However, network QoS alone is not sufficient to support end-to-end QoS: to avoid the situation where high-priority traffic reaching a server is dropped at the server side, the system hosting the Web site should also be enhanced with mechanisms for delivering end-to-end QoS to certain classes of users and services.

In traditional Web servers, client requests are served in a first-come-first-served manner. To support differentiated services, numerous QoS mechanisms have been proposed at application or kernel level to accommodate priority-based scheduling schemes. At the kernel level, priority-based scheduling is often placed at the system resource that forms the bottleneck in the request path [1], [2], [3], so that incoming connections can be reordered by priority. As kernel mechanisms can discard incoming connections at an early stage of protocol-stack processing, they are quite efficient at protecting Web systems from overload. At the application level, Web servers (e.g., Apache) are typically modified in the request-accepting procedure to accommodate new priority-based scheduling disciplines [4], [5], [6]. Owing to their easy access to application-layer information, user-space mechanisms are often content-aware; however, by the time a request must eventually be discarded, it has already consumed a lot of system resources, so user-space mechanisms are always expensive.

In cluster-based Web systems, QoS mechanisms are often deployed on the front-end Web switch. To provide differentiated services, the main task is assigning the server resources to the different service classes. In commercial content-aware Web switches, server nodes are statically partitioned into multiple subsets [7], [8], and client requests of different classes are assigned to these subsets for service differentiation. Dynamic server-partition algorithms have also been proposed in the recent literature [9], [10], [11]. Unfortunately, when content-aware QoS mechanisms are implemented on these Web switches, the corresponding request distribution can easily make them a system bottleneck. To overcome this drawback, we have proposed a completely distributed architecture (CODA for short), as shown in Fig. 1. In this architecture we employ both layer-4 and layer-7 Web switching techniques. The typical CODA scenario is as follows: (1) a client sends a connect request to the front-end (layer-4) Web switch, (2) the Web switch selects a server node (called the initial server node) with some simple dispatching algorithm, (3) the client connects to
the initial server node and sends it a request, (4) the initial server node obtains the content of the request, selects a destination server node on the basis of that content, and hands the connection off to the destination server node [12], (5) the server at the destination server node sends replies directly to the client.
Fig. 1. Completely Distributed Architecture
From this scenario we can see that there is no performance bottleneck in CODA: increasing the size of the cluster should result in a proportional improvement in performance. At the same time, all the server nodes in CODA are identical, so the cluster cannot be wholly disabled by the failure of a single node, as is possible under centralized approaches. However, to deploy this architecture, a decentralized dispatching algorithm must be implemented on the back-end server nodes; various issues in designing such algorithms are addressed in [13].
3 Kernel Content-Aware QoS Mechanism for CODA
In the CODA scenario presented in Section 2, the client request is already parsed in the kernel to enable content-aware request distribution. Building on this, we have designed an adaptive architecture for enhancing CODA with content-aware quality of service, as shown in Fig. 2. When a connection is established between a client and the server, an HTTP request from the client is sent in one or more TCP packets. The Request Parser is responsible for parsing the HTTP request from these packets; the Time Stamper then puts a start timestamp on it. Based on the content of the client request, the Content-Aware Dispatcher performs request distribution. If the request is to be served locally, the Request Classifier assigns it a specific priority according to the application-layer information in the HTTP header, such as the URL name or type and other application-specific information available in cookies; the request is then placed into a class-based accept queue. By using a separate queue for each service class, this class-based accept queue can support multiple service classes. If the request is to be served by another server node, the Connection Handoffer on both server nodes moves the established connection to the
Fig. 2. Adaptive QoS architecture for CODA
destination node. The client request is then classified by the Request Classifier and placed into the corresponding accept queue. Whenever the Web server is ready to receive client requests, the Priority-Based Scheduler selects the next request to be served; many scheduling policies can be used here, and in Section 5 we use a work-conserving weighted fair queuing accept queue (WFQAQ) scheduler. When the response to the request is sent to the client, the Time Stamper puts an end timestamp on the request. From the start and end timestamps, the QoS Verifier calculates the delay of the request on the server and compares it with a given SLA; if the SLA is violated, it invokes the Resource Adaptor to adjust the server resources assigned to the different service classes. From this scenario we can see that our QoS mechanism is content-aware and implemented at the kernel level: by examining the application-layer information in the HTTP header, it can provide content-aware service differentiation while introducing only a little overhead over the basic CODA architecture. As the mechanism is implemented wholly in the kernel, the context switch to user space is avoided, so we expect it to work more efficiently than user-space mechanisms.
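The paper does not give the WFQAQ internals; the sketch below is one plausible stride-style approximation (a named substitute, not the authors' scheduler) in which each backlogged class is served at a rate proportional to its weight, and all names are ours:

```c
/* Stride-style approximation of a work-conserving weighted fair accept
 * queue: each class advances a "pass" value by K/weight when served, and
 * the backlogged class with the smallest pass goes next. */
#define K 10000

struct class_queue {
    int weight;   /* share assigned to this service class        */
    int pass;     /* virtual time this class has consumed so far */
    int backlog;  /* connections currently waiting in the queue  */
};

/* Returns the index of the class to serve next, or -1 if all queues are
 * empty. Work-conserving: an empty class never blocks the others. */
int pick_next_class(struct class_queue *q, int nclasses)
{
    int best = -1;
    for (int i = 0; i < nclasses; i++)
        if (q[i].backlog > 0 && (best < 0 || q[i].pass < q[best].pass))
            best = i;
    if (best >= 0) {
        q[best].pass += K / q[best].weight; /* heavier weight => smaller step */
        q[best].backlog--;
    }
    return best;
}
```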
4 Simulation Model
In this section, we describe a detailed simulation model of our completely distributed architecture. With this model, we will show how the kernel QoS mechanism provides guaranteed performance for preferred clients even during high workloads.
4.1 System Model
Our Web cluster consists of multiple back-end server nodes and a dedicated layer-4 front-end switch. The Web switch and server nodes are interconnected by
a local fast Ethernet with 100 Mbps bandwidth. The Web cluster is connected to the Internet through one or more large-bandwidth links that do not share the Web switch's connection to the Internet. Since the focus of this paper is server QoS, we did not model the details of the external network connecting the clients to the Web cluster. Given the efficiency of the layer-4 routing mechanism, we model the front-end switch as a CSIM process [14] that distributes client requests without delay. Each server in the cluster, on the other hand, is modeled as a separate component with its own CPU, main memory, locally attached disk, and network interface. In the simulation we use the system parameters adopted in [12]: connection establishment and teardown cost 0.145 ms of CPU time each, while transmission from the memory cache takes 0.04 ms per 512 bytes. If disk access is required, reading a file from disk has a latency of 28 ms (two seeks plus rotational latency), and the disk transfer time is 0.410 ms per 4 KBytes (approximately a 10 MBytes/sec peak transfer rate). For files larger than 44 KBytes, an additional 14 ms (seek plus rotational latency) is charged for every 44 KBytes of file length in excess of 44 KBytes. The Web server software is modeled as an Apache-like server in which an HTTP daemon waits for client connection requests. Each client is a CSIM process that, after activation, attempts to set up a connection to the Web cluster. The front-end layer-4 switch selects a server node (the initial server node) with some simple dispatching algorithm. When the initial server node obtains the content of the request, a destination server node is selected based on that content. If the request is served locally, the handoff overhead is ignored, owing to the simplicity of local TCP handoff; if it is served by another server node, an extra handoff overhead of 0.3 ms is charged on the initial server node. When the Web server picks a request out of the priority-based accept queue, it dedicates a new HTTP process to that connection.
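These parameters translate directly into a per-request service-time estimate. The helper below is our reconstruction of how such charges might be accumulated, not the authors' CSIM code; charging the extra-seek term per full 44 KB chunk is our interpretation of "for every 44 KBytes in excess":

```c
/* Estimated CPU + disk service time (ms) for one static request, using
 * the simulation parameters quoted above. */
#include <math.h>

double service_time_ms(long file_bytes, int cached)
{
    double t = 0.145 + 0.145;               /* connection setup + teardown  */
    t += 0.04 * (file_bytes / 512.0);       /* transmission from memory     */
    if (!cached) {
        t += 28.0;                          /* 2 seeks + rotational latency */
        t += 0.410 * (file_bytes / 4096.0); /* disk transfer per 4 KB       */
        if (file_bytes > 44 * 1024L)        /* extra seek per further 44 KB */
            t += 14.0 * ceil((file_bytes - 44 * 1024L) / (44.0 * 1024));
    }
    return t;
}
```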
4.2 Workload Model
Special attention has been devoted to the workload model, which incorporates the most recent results on the characteristics of real Web loads. The high variability and self-similar nature of Web access loads is modeled through heavy-tailed distributions such as the Pareto, lognormal, and Weibull distributions; random variables drawn from these distributions can assume extremely large values with non-negligible probability. In our experiments, client arrivals are modeled by a lognormal distribution whose standard deviation is set to 25% of its mean parameter. Two types of client requests, static and dynamic, are considered; their respective percentages in the workload mix are obtained from the Standard Performance Evaluation Corporation. The static workload consists of files in four classes, whose frequency distribution is shown in Table 1. A dynamic request incurs all the overheads of a static request plus the computation needed to generate the dynamic objects. In [15], dynamic requests are divided into three classes: light, middle-intensive, and
intensive requests. Their service times are modeled by exponential distributions with means of 16, 46, and 150 ms, respectively; according to logfile traces from real e-commerce sites, their percentages are set to 10%, 85%, and 5%. To balance the different types of workload among the server nodes, we use a distributed CAP algorithm in our simulations.
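For reproducibility, lognormal inter-arrival times can be drawn with a Box-Muller transform; we read "standard deviation set as 25% of its mean parameter" as applying to the underlying normal (sigma = 0.25 mu), which is an interpretation on our part since the paper does not specify the scale:

```c
/* Draw one lognormal inter-arrival time (illustrative sketch). */
#include <math.h>
#include <stdlib.h>

double lognormal_arrival(double mu)
{
    const double PI = 3.14159265358979323846;
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0); /* avoid log(0) */
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    double z  = sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2); /* Box-Muller   */
    return exp(mu + 0.25 * mu * z);  /* lognormal with sigma = 0.25 * mu   */
}
```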
5 Performance Analysis
As described in Section 3, our QoS mechanism can support multiple service classes with different SLAs. Without loss of generality, we consider two classes of requests, denoted high and low, in our simulation experiments. Our main goal is to show how the performance of high-class requests can be guaranteed in different architectures. We consider two of them:
1. CODA without a QoS mechanism, in which high-class and low-class requests are treated identically.
2. CODA enhanced with our kernel content-aware QoS mechanism, using a WFQAQ scheduler to select the next request to be served.
An SLA in terms of performance is typically stated as the K-th percentile of the page delay that must stay below Y seconds. Although clients may accept 7-8 seconds of response time (including the address lookup phase and network transmission delay), typical measures require that 90% or 95% of the requests experience a delay at the server of less than 2-4 seconds. In our simulations we use the 95th percentile of the page delay of high-class requests as the main metric for comparing the two architectures.

Figure 3 shows the 95th percentile of the page delay of high-priority requests for the two architectures. As the request arrival rate increases, the page delay of high-priority requests is bound to increase. In CODA without a QoS mechanism, all client requests are treated identically: when the rate of incoming requests is below the capacity of the Web cluster, most requests are processed without large delays, but once the arrival rate reaches 500 R/S (requests per second), bare CODA can no longer fulfill the specified SLA. We therefore take 500 R/S as the maximum workload this cluster can handle. In CODA enhanced with our kernel content-aware QoS mechanism, the WFQAQ scheduler assigns a weight to each class, and the rate of requests accepted from a class is proportional to its weight. Thus the weight
Fig. 3. 95-percentile of page delay of high-priority requests
setting of a class lets us control its delay in the accept queue. When the Web cluster is overloaded, high-priority requests get preferential treatment while low-priority requests receive degraded or no service. As a result, the given SLA can still be guaranteed at about 2000 R/S in our QoS-enhanced CODA. In our simulations the percentages of the high and low classes are set to 20% and 80%, so at a request arrival rate of 2000 R/S the arrival rate of high-priority requests is about 400 R/S; at this point, about 80% (400/500) of the cluster's resources are used to process high-priority requests. Meanwhile, most low-priority requests are held in the accept queue and eventually discarded, and the remaining 20 percent of cluster capacity is spent on interacting with these eventually discarded requests: connection establishment, request parsing, and connection teardown. When the rate of incoming requests rises above 2000 R/S, even more cluster capacity is lost on such requests. In the extreme case, the overload condition leads to server livelock, where the server is busy handling TCP connections in the kernel without doing any useful work. At that point, more efficient QoS mechanisms (e.g., TCP SYN policing) are needed to protect the Web system from overload; in our architecture, such mechanisms can be employed on the front-end switch as a complement.
6 Conclusions
In this paper we presented a more scalable architecture in which content-aware request distribution and service differentiation are performed on the back-end server nodes, and, based on it, designed a kernel content-aware QoS mechanism for Web clusters. Unlike existing content-blind kernel mechanisms, this mechanism provides content-aware service differentiation by examining the application-layer information in the HTTP header. In addition,
it imposes only a little overhead on the basic CODA architecture. We demonstrated that this kernel content-aware QoS mechanism can provide guaranteed performance for preferred clients even when the server is subjected to a client request rate several times greater than the server's maximum processing rate.
References
1. Thiemo Voigt, Renu Tewari, Douglas Freimuth, Ashish Mehra: Kernel Mechanisms for Service Differentiation in Overloaded Web Servers. 2001 USENIX Annual Technical Conference, Boston, MA, USA, June 2001
2. T. Abdelzaher, K. G. Shin, N. Bhatti: Performance Guarantees for Web Server End-Systems: A Control-Theoretical Approach. IEEE Transactions on Parallel and Distributed Systems, 13(1), January 2002
3. P. Pradhan, R. Tewari, S. Sahu, A. Chandra, P. Shenoy: An Observation-based Approach Towards Self-Managing Web Servers. In Proceedings of the Tenth International Workshop on Quality of Service (IWQoS 2002), May 2002
4. N. Bhatti, R. Friedrich: Web Server Support for Tiered Services. IEEE Network, 13(5), September 1999
5. R. Pandey, J. F. Olsson, R. Barnes: Supporting Quality of Service in HTTP Servers. In Proc. ACM Symp. on Principles of Distributed Computing, Puerto Vallarta, Mexico, June 1998
6. N. Vasiliou, H. L. Lutfiyya: Providing a Differentiated Quality of Service in a World Wide Web Server. In Proc. Performance and Architecture of Web Servers Workshop, Santa Clara, CA, June 2000
7. Resonate Products - Central Dispatch, http://www.resonate.com
8. F5 Networks, http://www.f5labs.com/
9. V. Cardellini, E. Casalicchio, M. Colajanni, M. Mambelli: Web Switch Support for Differentiated Services. ACM Performance Evaluation Review, 29, 2001
10. H. Zhu, H. Tang, T. Yang: Demand-driven Service Differentiation in Cluster-based Network Servers. In Proc. of IEEE Infocom 2001, Anchorage, Alaska, Apr. 2001
11. Valeria Cardellini, Emiliano Casalicchio, Michele Colajanni, Marco Mambelli: Enhancing a Web-server Cluster with Quality of Service Mechanisms. In Proceedings of the IEEE International Performance, Computing, and Communications Conference, Phoenix, AZ, USA, April 2002
12. Vivek S. Pai, Mohit Aron, Gaurav Banga, Michael Svendsen, Peter Druschel, Willy Zwaenepoel, Erich Nahum: Locality-Aware Request Distribution in Cluster-based Network Servers. In Proceedings of the 8th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VIII), San Jose, CA, October 1998
13. Du Zeng-Kai, Ju Jiu-Bin: Distributed Content-aware Request Distribution in Cluster-based Web Servers. In Proceedings of the Fourth International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'03), August 27-29, 2003, Chengdu, China
14. Mesquite Software, Inc.: CSIM18 User Guide, http://www.mesquite.com
15. V. Cardellini, E. Casalicchio, M. Colajanni: A Performance Study of Distributed Architectures for the Quality of Web Services. In Proc. of Hawaii Int'l Conf. on System Sciences (HICSS-34), Maui, Hawaii, pp. 3551-3560, Jan. 2001, IEEE Computer Society
A Collaborative Multimedia Authoring System
Mee Young Sung and Do Hyung Lee
Department of Computer Science & Engineering, University of Incheon, 177 Dowhadong, Namgu, 402-749 Incheon, South Korea
{mysung, oldkill}@incheon.ac.kr
Abstract. Existing authoring tools usually provide an authoring environment in which spatial information and temporal information are edited independently in two different interfaces, which can inconvenience users. We created a collaborative authoring system for multimedia presentations that overcomes this inconvenience. The 3D spatio-temporal editor of our system allows users in different places to author a multimedia presentation together, simultaneously, in a single unified spatio-temporal space. Conceptually, every temporal relationship can be described by one of seven relations ('before', 'meets', 'overlaps', 'during', 'starts', 'finishes', and 'equals'); this conceptual representation provides an efficient means of designing an overview of a multimedia presentation. Our authoring system internally represents the edited presentation as a TRN (Temporal Relation Network), which is composed of media objects and a set of temporal relationships among them and corresponds exactly to the conceptual temporal structure of the presentation. The TRN editor presents the internal TRN graphically and gives users an intuitive way to represent the conceptual flow of a presentation. The internal TRN and the graphical TRN can be generated automatically from the 3D graphical representation specified by the author. The results of our experiments show that our system is more advantageous than traditional multimedia authoring systems in terms of authoring time and ease of interaction.
1 Introduction
A multimedia authoring system must provide an environment in which the temporal and spatial relationships among objects can be edited simultaneously [1, 2, 3, 4]. An interactive multimedia authoring system also needs to support user interaction. Some media (such as video, sound, and animation) require users to specify temporal characteristics, and other media (such as video, animation, images, and text) require users to specify the spatial relationships between objects. The key to authoring a presentation lies in composing the spatial and temporal relationships between objects. Existing authoring tools usually provide an authoring environment in which spatial information and temporal information are edited independently in two different 2D GUIs (Graphical User Interfaces), which can inconvenience users. Concerning the temporal editing interface, the traditional scaled timeline
approach allows users to view and control the structure of the content directly; however, the representation is fixed and the operations are manual. The goal of this study is to develop an easy and efficient multimedia authoring environment in which users can create a multimedia presentation in a simple and intuitive manner. Toward this goal, we created a 3D spatio-temporal space that integrates the 2D spatial editing environment and the 1D temporal editing environment into a single, unified editing environment. In addition, we let users edit temporal relationships between media objects at the conceptual level: for example, presenting object A before B, presenting object A during B, and so on. In this paper, we propose a collaborative authoring system that achieves efficiency by providing various editing facilities. Figure 1 illustrates the overall structure of our system. We briefly present the concept of the TRN in the following section. In Section 3, our main editors, the 3D spatio-temporal editor and the TRN (Temporal Relation Network) editor, are described. We discuss the collaboration scheme in Section 4, then examine some experiments on our system and a comparison of it in Section 5. The last section provides conclusions.
Fig. 1. System Overview
2 TRN (Temporal Relation Network)
Our system's internal representation of a presentation is based on Allen's temporal intervals [5]. Conceptually, every temporal relationship can be described by one of seven relations ('before', 'meets', 'overlaps', 'during', 'starts', 'finishes', and 'equals'). We proposed using a TRN (Temporal Relation Network) to represent these temporal relationships in multimedia presentations [6]. The TRN representations of the seven temporal relations are summarized in Figure 2, and they correspond exactly to the internal representation of each temporal relationship. A TRN is a directed and weighted graph. Note that all five parallel relations (overlaps, during, starts, finishes, and equals) can be generalized
as the ‘equal’ relation by adding some dummy delay objects (represented as small black squares) as shown in Figure 2. This conceptual representation provides an efficient means for designing multimedia presentations by sketching the presentation overview at the conceptual level.
Fig. 2. Representation of Temporal Relations
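To make the data structure concrete, the sketch below shows one way such a network could be encoded; the field names and the successor-list encoding are our own assumptions, not the system's actual implementation:

```c
/* Illustrative encoding of a TRN: a directed, weighted graph whose nodes
 * are media objects or dummy delay objects. */
#include <stddef.h>

enum temporal_relation {            /* Allen's seven basic relations */
    REL_BEFORE, REL_MEETS, REL_OVERLAPS, REL_DURING,
    REL_STARTS, REL_FINISHES, REL_EQUALS
};

enum node_kind {
    NODE_MEDIA,                     /* an audio/video/image/text object    */
    NODE_DELAY                      /* dummy delay ("small black square")  */
};

struct trn_node {
    enum node_kind kind;
    const char *name;               /* media object id; NULL for delays    */
    double duration;                /* presentation time; the edge weight  */
    struct trn_node **successors;   /* nodes whose start follows this end  */
    size_t n_successors;
};
```

With dummy delay nodes inserted, the five parallel relations all reduce to synchronized starts and ends, so traversing the graph in topological order can yield the playback schedule directly.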
3 Editors
Figure 3 (a) illustrates a perspective view of a multimedia presentation in our system. This presentation consists of five media objects: an audio clip, a video clip, two images, and a text object. Authors can create media objects, place objects at the desired positions, and lengthen or shorten the temporal extent of objects by dragging and dropping. Authors can change the perspective from which the objects are viewed in 3D space using the arrow keys, or switch quickly to the default views by selecting a corresponding icon. Details of our 3D spatio-temporal editor are described in [6]. Our authoring system is based on SMIL (Synchronized Multimedia Integration Language); that is, our system generates SMIL code. A structural view of the presentation's SMIL tags corresponds exactly to the DOM (Document Object Model) structure of the presentation. A view of a SMIL
object's attributes is presented on the left of the screenshots in Figure 3 and Figure 4. The bottom right panel of Figure 3 (a) illustrates an example graphical representation of the TRN that is created as a user authors the presentation. The internal TRN and the graphical TRN can be generated automatically from the 3D graphical representation specified by the author; the algorithm for this automatic conversion is discussed in [7]. After authoring is finished, a DOM structure associated with the presentation can be generated from the graphical representation, and our system generates SMIL code through the interaction between the graphical representation and the DOM structure. Figure 3 (b) presents the spatial projection view of the example in Figure 3 (a), with the traditional timeline view of the example at the bottom. Figure 4 (a) shows the temporal projection view of the example in Figure 3 as well as its textual view; Figure 4 (b) shows a screenshot of playback using our built-in player. The author can switch between the TRN view, the timeline view, and the textual view using the tabs at the bottom of the panel.
Fig. 3. An Example of 3D Representation of a Multimedia Presentation: (a) Perspective View, (b) Spatial View
Fig. 4. An Example of 3D Representation of a Multimedia Presentation: (a) Temporal View, (b) Screen Shot of Playback
4 Collaboration Structure
Our authoring system allows a group of users working at different machines to work on the same multimedia presentation and to communicate in real time. In any collaborative computing environment, multiple users or processes can access a shared object concurrently; in this situation an inconsistency of the shared data might occur, so concurrency control is required. We implemented some ideas for efficient concurrency control in our system, mainly based on user awareness, multiple versions, and access permissions on shared objects. Details of our concurrency control mechanism are described in reference [7]. The collaboration manager of our system takes charge of communicating all events generated by users. Each authoring system at a different place can simultaneously be a server and a client of a collaboration group; a server registers itself as the first client of its collaboration group. Any client can connect to the server using TCP (Transmission Control Protocol) and generate packets corresponding to the content created as users edit the presentation; it also receives packets from the server, analyses them, and invokes the appropriate events or modules. Once a client connects to a server, the server updates the group list and initializes the new client by sending it the group of objects authored up to that time. From then on, the server multicasts any messages passed to it, and the clients process and visualize the received messages. This mechanism is a variation of a client-server mechanism that provides better network performance and better portability of the system.
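As an illustration of this scheme, the following minimal Python sketch shows the relay pattern: the server keeps the client list, initializes a newcomer with the history of authored objects, and multicasts each received edit to the rest of the group. The port and the line-based message format are our assumptions, not the system's actual protocol.

# A sketch of the collaboration relay: init new clients with history,
# then multicast every received edit to the whole group.
import socket, threading

HOST, PORT = "0.0.0.0", 9000
clients, history, lock = [], [], threading.Lock()

def handle(conn):
    with lock:
        clients.append(conn)
        for msg in history:          # initialize the new client
            conn.sendall(msg)
    for line in conn.makefile("rb"):
        with lock:
            history.append(line)     # remember the edit event
            for c in clients:        # multicast to the group
                if c is not conn:
                    c.sendall(line)

srv = socket.socket()
srv.bind((HOST, PORT)); srv.listen()
while True:
    conn, _ = srv.accept()
    threading.Thread(target=handle, args=(conn,), daemon=True).start()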
Fig. 5. Structure of Group Communication
5 System Analysis
To validate our system, we performed experiments confirming its usability, and we compared the functionality of our system with that of existing SMIL editors.
5.1 Usability Analysis
We undertook an experiment comparing our system with an existing commercial system, the TagFree SMIL Editor. The TagFree SMIL Editor provides two authoring interfaces: a 2D temporal authoring interface (a traditional scaled timeline) and a 2D spatial authoring interface. Our system, in contrast, provides a single 3D spatio-temporal authoring interface, as shown in Figures 3 and 4. The experiment was conducted with 10 university students proficient with computers. First, we explained the usage of the two editors and let the students try both. We then asked them to author multimedia content comprising several media objects. Five students used the 3D interface (our system) first and the 2D interface (TagFree SMIL 2000 Editor) second; the other five used the 2D interface first and the 3D interface second. We gave the students 6 presentation scenarios whose number of objects increases by 2 (1, 3, 5, 7, 9, and 11) and let them author those presentations. The experiment was performed twice with the same students, swapping the order in which the 2D and 3D interfaces were used. Table 1 presents the average editing times, and Table 2 illustrates the analysis of the results.
As shown in Table 2, we found that the 3D authoring interface allows users to author faster than the two 2D authoring interfaces. We also found that the difference between the average authoring times for 2D and 3D increases as the complexity of the media objects increases. There are three reasons why 2D authoring takes more time. The first source of latency is the time-consuming adjustment of media objects on the scaled timeline to represent 'parallel' or 'sequential' temporal relationships. The second stems from switching the authoring environment between the temporal interface and the spatial interface, or vice versa. The third is the separate editing of layout regions in the spatial interface
followed by the manual linking of the layout objects to the objects in the temporal interface. It is inconvenient that the temporal and spatial characteristics of an object cannot be recognized at a glance in a 2D authoring environment. In comparison, a 3D authoring interface allows users to recognize the spatio-temporal characteristics at a glance. Moreover, linking temporal objects to spatial objects is unnecessary in the 3D authoring environment. The students who participated in this experiment concluded that our 3D authoring interface is an intuitive and faster authoring tool, and in general our system was favored over the others. However, the students highlighted one inconvenience of our system: the difficulty of editing overlapped objects in 3D, since grasping the sense of distance and pointing at the appropriate 3D position with the mouse in the perspective view is not always easy.
5.2 Comparison of SMIL Editors
Table 3 summarizes the comparison of commercial SMIL editors and our authoring system. The commercial products are the SMIL Composer (Sausage Software), GRiNS (Oratrix), the SMIL Editor ver. 1.0 (Rikei, Japan), and the TagFree 2000 SMIL Editor (Dasan Technology, Republic of Korea). As shown in Table 3, our system provides several advantages: diverse editing interfaces; authoring in a 3D spatio-temporal environment; logical and conceptual editing using the Temporal Relation Network editor; real-time feedback on every edit; and the intuitiveness and efficiency necessary for
authoring multimedia presentations; as well as collaborative authoring functionality.
6 Conclusions
We developed a collaborative multimedia authoring system composed of several editors: a 3D spatio-temporal editor, a TRN (Temporal Relation Network) editor, a timeline editor, a tag editor, an attribute editor, and a text editor. The 3D spatio-temporal editor allows users in different places to author multimedia presentations in a unified spatio-temporal space, freely traversing the spatial and temporal domains without changing the authoring context. Our authoring system automatically converts the authored multimedia presentation to a Temporal Relation Network (TRN) as its internal representation; a TRN corresponds exactly to the conceptual temporal structure of the presentation, and the internal TRN is visualized in the TRN editor. We performed experiments to validate the usability of our authoring system. They led us to conclude that the 3D authoring interface allows users to author faster than the two 2D authoring interfaces, and that the difference between the average authoring times for 2D and 3D increases as the complexity of the media objects increases. Acknowledgement. This work was supported by the Korea Science and Engineering Foundation (KOSEF) through the Multimedia Research Center at the University of Incheon.
References
1. M.Y. Kim, J. Song, "Multimedia Documents with Elastic Time," Proceedings of ACM Multimedia '95, San Francisco, California, USA, November 5-9, 1995, pp. 143-154.
2. J. Song, G. Ramalingam, R. Miller, B.K. Yi, "Interactive Authoring of Multimedia Documents in a Constraint-Based Authoring System," Multimedia Systems, Vol. 7, pp. 424-437, Springer-Verlag, 1999.
3. M. Vazirgiannis, I. Kostalas, T. Sellis, "Specifying and Authoring Multimedia Scenarios," IEEE Multimedia, Vol. 6, No. 3, pp. 24-37, July-September 1999.
4. T.D.C. Little, A. Ghafoor, "Spatio-Temporal Composition of Distributed Multimedia Objects for Value-Added Networks," IEEE Computer, Vol. 24, No. 10, pp. 42-50, October 1991.
5. J.F. Allen, "Maintaining Knowledge about Temporal Intervals," Communications of the ACM, Vol. 26, No. 11, pp. 832-843, November 1983.
6. M.Y. Sung, S.J. Rho, J.H. Jang, "A SMIL-based Multimedia Presentation Authoring System and Some Remarks on Future Extension of SMIL," Proceedings of Packet Video 2002, Pittsburgh, Pennsylvania, USA, April 24-26, 2002. http://www.pv02.org
7. M.Y. Sung, D.H. Lee, S.J. Rho, S.Y. Rhee, "Authoring Together in a 3D Spatio-Temporal Space," Proceedings of the ACM Multimedia 2002 Workshop on Immersive Telepresence (ITP), Juan-les-Pins, France, December 6, 2002.
Research of Satisfying Atomic and Anonymous Electronic Commerce Protocol
Jie Tang, Juan-Zi Li, Ke-Hong Wang, and Yue-Ru Cai
Knowledge Engineering Group, Department of Computer Science and Technology, Tsinghua University, P.R. China, 100084
[email protected],
[email protected]
Abstract. Atomicity and anonymity are two important attributes of electronic commerce, especially in payment systems. Atomicity guarantees fairness and consistency for each participant; however, traditional atomicity is biased toward either the SELLER or the CUSTOMER. Anonymity is also problematic: both non-anonymous and fully anonymous schemes lead to unsatisfactory outcomes. It is therefore important to design a system in which both satisfying atomicity and revocable anonymity are enabled. In this paper, based on AFAP (Atomic and Fair Anonymous Protocol), we propose an approach that realizes satisfying atomicity. In this method, not only the SELLER's but also the CUSTOMER's satisfying atomicity is supported. At the same time, based on Brands' fair signature model, the method supports both anonymity and owner tracing or money tracing.
1 Introduction
In recent years, e-commerce (electronic commerce) has been one of the most important and exciting areas of research and development in information technology. Many e-commerce protocols corresponding to traditional credit card, cheque and cash payment have been put forward, such as SET [1], NETBILL [2] and DigiCash [3]. However, e-commerce applications have not grown as expected, for two main reasons. (1) Lack of fair transactions. A fair transaction is a transaction satisfying atomicity, which means that both sides agree on the goods and money before the transaction and are satisfied with the goods (or money) after it; otherwise the transaction terminates and both sides are restored to their original state. Unfortunately, most existing e-transaction protocols do not provide atomicity, and none of them supports satisfying atomicity. (2) Lack of privacy protection. Most current systems cannot protect users' privacy: all clients' activities on the web are logged involuntarily, which creates the potential for misuse of clients' private information. The alternative is an anonymous system [3], in which a client can execute payments anonymously, but new problems then emerge, e.g. cheating and money laundering. Both categories therefore have defects. We proposed an electronic transaction protocol realizing both atomicity and fair anonymity in [4]. In this paper, we extend this research and propose the Satisfying Atomicity and Fair Anonymous Protocol, which realizes both satisfying atomicity and fair anonymity. This paper is structured as follows. Section 2 introduces related work.
Section 3 reviews AFAP. Section 4 presents a new electronic transaction protocol (SAFAP). Section 5 analyzes the satisfying atomicity and fair anonymity of the protocol. Finally, we conclude the paper with a discussion.
2 Related Work
2.1 Satisfying Atomicity
Tygar argued that atomicity should be guaranteed in e-transactions [5,6], dividing it into three levels: money atomicity, goods atomicity, and certified delivery. Definitions of these three levels can be found in [4] and [13]. Jean Camp presented a protocol realizing atomicity and anonymity [7], but fair anonymity is unavailable in it, and goods are limited to electronic ones. Based on Tygar's atomicity classification, we propose an extended classification, satisfying atomicity, namely strongly fair transactions. In this view, transactions fall into two classes: satisfied and unsatisfied. Satisfied transaction: all participants agree with the transaction before its submission and are satisfied with the outcome after submission. Obviously, satisfied transactions are a subset of all transactions; moreover, a satisfied transaction must be one whose submission succeeds. A transaction fulfilling goods atomicity need not be a satisfied one, for example because of the seller's dissatisfaction with the payment or the customer's dissatisfaction with the goods. Unsatisfied transaction: a submitted transaction that is not a satisfied one. The submission of a satisfied transaction is called a satisfied submission. In an e-transaction, the customer usually cannot validate the goods before she receives them, and the seller cannot validate the payment before it actually takes place. Therefore, when payment and goods supply are both assured to be correct, the transaction is called a satisfying transaction.
2.2 Fair Anonymity
David Chaum put forward the concept of anonymity in electronic commerce and realized privacy in his DigiCash [3]. But absolute anonymity enables untraceable e-crimes such as corruption and unlawful purchases. In 1992, von Solms and Naccache [8] discovered a serious attack on Chaum's payment system. Chaum and Brands therefore proposed protocols to implement controllable anonymity [9,10]. Afterward, Stadler brought forward fair anonymity [11], and further methods were proposed in later years [12]. With fair anonymity, a participant's anonymity can be revoked by an authorized organization when illegal transactions are found. Unfortunately, in most existing systems, certificates are used to authenticate merchants and customers and the credit card is employed as the payment method; consequently, the transaction details are revealed to the bank.
3 Atomic and Fair Anonymous Electronic Transaction Protocol
We proposed a protocol in which atomicity and fair anonymity are both satisfied, which we call AFAP (Atomic and Fair Anonymous Protocol). The system model is a 5-tuple AFAPM := (TTP, BANK, POSTER, SELLER, CUSTOMER), where SELLER is the electronic merchant, CUSTOMER is an online buyer, TTP is a trusted third party, BANK is the bank network, and POSTER is a delivery system. AFAP comprises five sub-protocols: TTP Setup, BANK Initialization, Registration, Payment, and Trace.
3.1 TTP Setup and BANK Initialization Sub-protocol
Based on the PKI trust tree, the TTP can be authenticated level by level, similarly to CA authentication. The TTP selects the system parameters g and q, chooses a private key at random, computes the corresponding public key, and finally publishes g, q and the public key in its signature certificate. The BANK Initialization sub-protocol comprises two main steps: obtaining the certificate and installing the local base database. BANK selects a private key at random and computes its public key; the TTP then generates a certificate for BANK. Finally, BANK builds up its account database and deposit audit database.
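Since the original formulas are not reproduced here, the following sketch assumes a standard discrete-logarithm setting in which a random private key x yields a public key y = g^x mod p; this matches the Brands-style schemes cited later, but the toy parameters and the exact construction are our assumptions.

# A sketch of key setup under a discrete-log assumption (toy parameters,
# NOT secure ones): private key x, public key y = g^x mod p.
import secrets

p = 2**127 - 1            # a toy prime modulus
q = p - 1                 # range used for the secret exponents
g = 3                     # assumed generator

def keygen():
    x = secrets.randbelow(q - 1) + 1   # private key
    y = pow(g, x, p)                   # published in the signature certificate
    return x, y

ttp_priv, ttp_pub = keygen()     # TTP publishes g, q and ttp_pub
bank_priv, bank_pub = keygen()   # TTP then certifies bank_pub for BANK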
3.2 CUSTOMER Register Sub-protocol
CUSTOMER first registers at the TTP to get a private signature certificate, and then applies for an account number at BANK. Obtaining the private certificate is similar to the process by which BANK applies for its certificate. To apply for the account number, CUSTOMER selects a secret value at random and computes the account identifier I; at the same time, CUSTOMER selects a random binary string m and computes a knowledge signature on it. BANK verifies the signature and saves I as CUSTOMER's account number, where I is globally unique in AFAP. BANK then computes z from I and sends z back to CUSTOMER.
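Again assuming a Brands-style construction (the paper's own formulas were not reproduced), registration can be sketched as follows: the account number I is derived from a random customer secret, and BANK answers with a value z computed from I under its private key.

# A Brands-style registration sketch (our assumption, toy parameters).
import secrets

p = 2**127 - 1; q = p - 1; g, g2 = 3, 5           # as in the setup sketch
bank_priv = secrets.randbelow(q - 1) + 1          # BANK's private key x

u = secrets.randbelow(q - 1) + 1                  # CUSTOMER's secret
I = pow(g, u, p)                                  # account number
# BANK verifies the knowledge signature on I (omitted), saves I, returns z:
z = pow(I * g2 % p, bank_priv, p)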
3.3 Payment Sub-protocol
The payment sub-protocol is the most important sub-protocol. Details can be found in [4] and [13]. The process of the payment sub-protocol in AFAP is illustrated in Fig. 1.
Fig. 1. The process of the payment sub-protocol in AFAP
3.4 Trace Sub-protocol
When an illegal transaction occurs, such as double payment, payment forgery or another electronic crime, the trace sub-protocol is activated. It includes three aspects. (1) Owner trace: revoking the owner's anonymity and discovering the account behind a known illegal payment. BANK queries the payment token and sends it to the TTP, which executes the corresponding computation to trace the owner of the illegal payment. (2) Payment trace: discovering the payment token from a known drawn token. BANK sends the drawn token to the TTP, which computes the payment token. (3) Drawn trace: the inverse of the payment trace, discovering the drawn token from a known payment token. BANK searches for the payment token and sends it to the TTP, which computes the drawn token.
4 Satisfying Atomic and Fair Anonymous E-transaction Protocol
AFAP implements atomicity, but it does not realize satisfying atomicity: SELLER agrees with the payment, but CUSTOMER might receive wrong goods, i.e., goods that do not correspond to the description in Trans_detail. In this case CUSTOMER has no guarantee; she can only request that the TTP audit the transaction, withdraw the anonymity, open the Trans_detail and validate it with POSTER. To implement satisfying atomicity for CUSTOMER, it is necessary to assure her that the goods she will receive are satisfying, namely equal to Trans_detail. This section presents a new protocol, SAFAP (Satisfying Atomic and Fair Anonymous Electronic Transaction Protocol), which achieves this with a small loss of anonymity. In this method, POSTER takes responsibility for CUSTOMER's satisfying atomicity: on receiving the goods, POSTER validates them, generates a new description of the goods and sends it to CUSTOMER; CUSTOMER confirms it and sends the confirmation message to the TTP, which then commits the whole transaction. The flow and relations of the participants in the payment sub-protocol are shown in Fig. 2.
Fig. 2. The flow and relation of entities in the model of satisfying payment
The process of the payment sub-protocol in SAFAP is illustrated in Fig. 3. In step 3, CUSTOMER computes the drawn token Token_w and the payment token Token_p; with these tokens CUSTOMER can execute the payment. She then sends the blinded Token_p to BANK. In step 4, BANK executes a blind signature protocol with CUSTOMER: at the end of the protocol, BANK sends CUSTOMER a signature on the blinded Token_p and, at the same time, subtracts the corresponding value from CUSTOMER's account. To maintain consistency, BANK keeps a database of signature-value pairs; in this way, the signature is regarded as a coin worth that value in later transactions. In step 5, CUSTOMER selects the Pickup_fashion and sets the Pickup_Response, which is kept secret between CUSTOMER and POSTER. CUSTOMER then chooses the goods and quantity to generate Trans_detail, and finally sends the signature and Trans_detail to SELLER.
In step 6, SELLER logs the TID and starts a local transaction clock. SELLER also validates the signature with CUSTOMER, who has to prove that the payment token Token_p is the one signed in the signature. Afterward, SELLER sends the signature and its value to BANK.
Fig. 3. The process of the payment sub-protocol in SAFAP
In step 7, BANK validates the blind signature. On failure, BANK submits a RollbackReq to the TTP to roll back the whole transaction. If the validation passes, BANK queries the signature-value database, compares the received signature-value pair with the record in the database, and then checks for dual payment with the signature. Once all these steps pass, BANK credits the payment signature to SELLER's account and generates a Trans_guarantee for SELLER. In step 8, having received the Trans_guarantee from BANK, SELLER starts the dispatch process: she transfers the Pickup_fashion to POSTER and notifies POSTER to prepare for delivering the goods.
In step 9, POSTER picks up the goods. Afterward, POSTER checks whether the Pickup_fashion is permitted; if so, POSTER inspects the goods and generates a Goods_Description. Finally, POSTER generates a Pickup_guarantee and sends it to CUSTOMER. In step 10, CUSTOMER compares the Goods_Description with what she selected; if the validation passes, she sends the Pickup_guarantee to the TTP. In step 11, given that the Pickup_guarantee is received before the Expiration_Date, the TTP checks whether a Rollback request exists; if so, the TTP sends the Rollback command to all participants, otherwise it sends Trans_Commit. If, on the other hand, the TTP does not receive the Pickup_guarantee before the Expiration_Date, it sends Rollback to all participants to roll back the transaction. Finally, on receiving Trans_Commit, BANK begins the actual transfer, i.e., transfers the value corresponding to the signature from CUSTOMER's account to SELLER's. SELLER dispatches the goods to POSTER, and POSTER delivers them according to the Pickup_fashion. To pick up the goods, CUSTOMER is required to provide the correct Response, which prevents cheating.
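The TTP's decision in step 11 can be sketched as a small piece of logic; the message names follow the paper, while the data structure and clock handling are our own illustration.

# A sketch of TTP's commit/rollback decision in step 11.
import time

def ttp_decide(tx, now=None):
    now = time.time() if now is None else now
    if tx.get("rollback_requested"):               # e.g. from BANK in step 7
        return "Rollback"                          # sent to all participants
    if tx.get("pickup_guarantee") and now <= tx["expiration"]:
        return "Trans_Commit"                      # BANK starts the real transfer
    if now > tx["expiration"]:
        return "Rollback"                          # guarantee arrived too late
    return "Wait"                                  # still inside the window

tx = {"expiration": time.time() + 3600,
      "pickup_guarantee": True, "rollback_requested": False}
print(ttp_decide(tx))   # -> Trans_Commit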
5 Analysis of Satisfying Atomicity and Fair Anonymity
In this section, we validate atomicity and anonymity informally. At the beginning of the transaction, CUSTOMER applies for the TID and Expiration_Date, obtains the drawn token and the blind signature on the payment token from BANK, and signs Trans_detail. SELLER sends the dispatch message to POSTER only after receiving the Trans_guarantee from BANK; we can therefore conclude that SELLER and CUSTOMER are both satisfied with Trans_detail. Meanwhile, CUSTOMER selects the Pickup_fashion, and POSTER evaluates it and provides the Pickup_guarantee, so they are both satisfied with the Pickup_fashion. After receiving the payment token, SELLER validates it with BANK and receives the Trans_guarantee, which means that BANK and SELLER are satisfied with the Trans_guarantee. Additionally, if for human reasons or network failure the TTP does not receive the Pickup_guarantee by the Expiration_Date, it sends the Rollback command to roll back the whole transaction: SELLER sends a cancel message to POSTER, and BANK deletes the payment token saved in SELLER's account and returns it to CUSTOMER. In general, CUSTOMER initiates the purchase request, SELLER is guaranteed by BANK to receive the correct transfer, and CUSTOMER is guaranteed by POSTER to receive the correct goods; logs in the local databases are available for audit. In sum, money atomicity, goods atomicity and certified delivery are all enabled in this protocol. Another way to enable satisfying atomicity is to embed Trans_detail in the message to POSTER in step 8: POSTER then inspects the goods against Trans_detail, and only if the goods are consistent with it does POSTER submit to the TTP, which then commits the whole transaction. Anonymity in this protocol covers three aspects: CUSTOMER to SELLER, CUSTOMER to POSTER and CUSTOMER to BANK; details of the verification can be found in [13]. Moreover, when an illegal transaction occurs, BANK submits the
payment token to the TTP, which executes the computations for owner trace, payment trace and drawn trace. Accordingly, fair anonymity is available in this protocol.
6 Conclusion
Online businesses are becoming prosperous, and one of the most important requirements for prospering e-commerce is to provide fair, secure, privacy-preserving transactions. In existing applications, either atomicity is not provided or fair anonymity is unavailable. In this paper, we analyzed satisfying atomicity and fair anonymity and proposed a new protocol that enables both. For further work, we will improve two aspects: (1) electronic payment based on group blind signatures; (2) more efficient atomicity without loss of anonymity. Improving atomicity always costs some anonymity, so how to balance them is a practical problem.
References
1. Larry Loeb: Secure Electronic Transactions: Introduction and Technical Reference. Artech House, Inc., 1998
2. B. Cox, J.D. Tygar, M. Sirbu: NetBill security and transaction protocol. Proceedings of the USENIX Workshop on Electronic Commerce (1995) 77-88
3. D. Chaum, A. Fiat, M. Naor: Untraceable electronic cash. Advances in Cryptology: Crypto'88 Proceedings, Springer-Verlag (1990) 200-212
4. Tang Jie, Li Juan-Zi, Wang Ke-Hong, Cai Yue-Ru: Research of Atomic and Anonymous Electronic Commerce Protocol. 9th RSFDGrC (International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing), Springer-Verlag, LNCS/LNAI 2639 (2003) 711-714
5. J.D. Tygar: Atomicity versus Anonymity: Distributed Transactions for Electronic Commerce. Proceedings of the 24th VLDB Conference, New York, USA (1998) 1-12
6. J.D. Tygar: Atomicity in Electronic Commerce. Proceedings of the Fifteenth Annual ACM Symposium on Principles of Distributed Computing (1996) 8-26
7. L. Camp, M. Harkavy, J.D. Tygar, B. Yee: Anonymous Atomic Transactions. Proceedings of the 2nd USENIX Workshop on Electronic Commerce (1996) 123-133
8. S. von Solms, D. Naccache: On blind signatures and perfect crimes. Computers and Security 11(6) (October 1992) 581-583
9. D. Chaum, J.H. Evertse, J. van de Graaf, R. Peralta: Demonstrating Possession of a Discrete Logarithm without Revealing It. Advances in Cryptology: CRYPTO'86 Proceedings, Springer-Verlag (1987) 200-212
10. S. Brands: Untraceable Off-line Cash in Wallets with Observers. Advances in Cryptology: Proceedings of CRYPTO'93, LNCS 773, Springer-Verlag (1993) 302-318
11. M. Stadler, J.M. Piveteau, J. Camenisch: Fair blind signatures. In L.C. Guillou, J.J. Quisquater (eds.): Advances in Cryptology: EUROCRYPT'95, LNCS 921, Springer-Verlag (1995) 209-219
12. T. Sander, A. Ta-Shma: Auditable, Anonymous Electronic Cash. LNCS 1666, Springer-Verlag (1999) 555-571
13. Tang Jie: The Research of Atomic and Fair Anonymous Electronic Transaction Protocol. YanShan University (2002) 35-59
Network Self-Organizing Information Exploitation Model Based on GCA*
Yujun Liu, Dianxun Shuai, and Weili Han
East China University of Science and Technology, Shanghai 200237, P.R. China
[email protected],
[email protected]
Abstract. At present, the exploitation mode for network information has some serious drawbacks in massive, stochastic, parallel and distributed networks, such as lack of after-effects, mutual independence, and point-to-point operation. This paper is therefore devoted to the novel self-organizing exploitation mode for network information proposed in [1]: a new generalized cellular automaton (GCA) is used to discover the self-organizing information configuration, and performance analysis is made under a complex environment including metabolism, intermittent congestion and stochastic faults. Theoretical and experimental results show its advantages.
1 Introduction
With the rapid development of the Internet, network construction and environments are becoming more and more complicated, and many methods have been put forward to satisfy the performance demands on networks. Grid computing realizes wide-area integration and cooperative computing, and users can transparently access data through the grid gates of the network, where information resources are distributed over the grid nodes [2]. He Yong [3] applied programming methods to the selection and distribution of information resources, the bottleneck problem of LANs. Yongxin Feng [4] provided a network reorganization algorithm for network management. In the research on network dependability and network behaviors, various network behaviors have been observed ([5]-[7]). Claudio and Michela presented a methodology to model the behavior of TCP flows in [5], which stems from a Markovian model of a single TCP source and eventually considers the superposition and interactions of several sources using standard queueing analysis techniques. Mathis et al. [6] focus on the stochastic behavior of the congestion avoidance mechanism, deriving an expression for the throughput. The Internet exhibits more and more complicated behaviors, such as large-scale non-linear dynamics, social interactions and metabolic phenomena among massive numbers of network entities. As a matter of fact, both the information exploitation
* Supported by the National Key Foundational R&D Project (973) under Grant No. G1999032707, the National Natural Science Foundation of China under Grants No. 60135010 and No. 60073008, and the State Key Laboratory Foundation of Intelligence Technology and System, Tsinghua University.
and the network behaviors deeply affect each other. Up to now, however, the study of network behavior has been almost restricted to such issues as traffic modelling, congestion avoidance, transmission delay, network throughput and packet-loss rate, and the resolutions are all static, closed, serial, non-self-organizing and based on simple environments. Attention has mainly been concentrated on how to alleviate the various negative influences of network behavior. The novel exploitation mode for network information instead focuses on making use of positive influences of network behavior on information exploitation. This paper is devoted to such a mode, which is based on self-organizing network behavior and makes the network information behavior related, multi-to-multi and after-effected, dynamically adjusting the distribution of network information. To this purpose, we present a new concept of information exploitation events in networks, and a new generalized cellular automaton (GCA) approach to the self-organizing information exploitation mode.
2 The Self-Organizing Information Exploitation Model under Complex Environment
2.1 Conceptual Model
An information exploitation event is regarded as a proliferating process of some information content. The information proliferation can lead to a self-organizing configuration of information contents by which the networks can respond much more quickly and reliably to users' requests. This configuration can change dynamically as new exploitation events occur, and can easily be combined with other complicated mechanisms involved in performance phase transitions, social interactions, metabolic evolution, intermittent congestion, stochastic node faults and so on. In an ideal network environment there is no congestion, no node crashes and no metabolism of network information; real networks, however, are complex and full of these phenomena. For the convenience of theoretical study and performance research on real networks, we introduce both the ideal network and the complex network in this paper. Network congestion has two forms, intermittent and persistent; this paper is confined to intermittent congestion. Considering the lifetime of network nodes, random faults should also be included, with the fault-free time obeying a negative exponential distribution. Network information has the same character, which we call the metabolism phenomenon. The complicated network environment discussed in this paper thus contains three main aspects: the metabolism of network entities, the intermittent congestion of the network, and the random faults of the network. The metabolism phenomenon reflects the adaptation of network entities, the survival of the fittest. The random fault model considers the life cycle of a network entity, in which
the fault-free time obeys a negative exponential distribution. In the self-organizing network information exploitation mode, we model the metabolism of network entities, the intermittent congestion and the random faults by an adaptation factor, a congestion factor and a life factor, which represent the past, present and future states respectively. The architecture of the model, shown in Fig. 1, consists of three parts besides the network users: the source selection strategies, the information proliferating strategies, and the self-organizing configuration of network information. It manages the massive, random, concurrent and distributed information exploitation behaviors and their after-effects to form a self-organizing configuration of the information distribution in the networks.
Fig. 1. The architecture of the network information exploitation mode. 1. The two-point chain line shows that, in the traditional network information exploitation mode, the network provides the user with the needed information, completely specified by the information type and source address. 2. The dashed frame shows the mechanism and framework of the network self-organizing information exploitation: the network self-organizes the configuration of network information according to the information contents, decides the information source addresses, provides the users with the needed information, and dynamically updates the self-organizing information configuration through the information proliferation strategies.
2.2 Mathematical Model
In the mathematical model we consider the complex network environment, which includes intermittent congestion, stochastic node faults and metabolism of information contents. Its formalization is described in the following.
Definition 1. The exploitation event for network information at a given time is a subset of the quadruple set defined over the following components, namely
where the four components denote, respectively, the network information contents; a set of information source addresses; a set of objective addresses for the information exploitation; and a set of network performance factors needed for the information self-organizing. The last component is itself a subset of a triple set consisting of a set of adaptation factors of network entity metabolism, a set of delay factors of intermittent congestion, and a set of life factors of random faults. At a given time, the state at a location can be represented by a state function defined over the adaptation factor, the congestion factor and the life factor.
Definition 2. A denotation states that an information content resides at an information address; from these denotations we obtain an address set and an information set.
Definition 3. Source Selecting Process (SSP). For a network information exploitation event, the source selecting process can be viewed as the following mathematical programming problem: taking the network entity performance factors into account, it minimizes the cost function while the source addresses are selected probabilistically,
where the quantities involved are the number of source addresses in the set, the distance between a source address and the information destination address, the number of data packets transported, and the threshold of the state function.
Definition 4. Information Proliferating Process (IIP). The IIP considers the event and the network state, and then recombines the distribution of the information content, that is, it expands and modifies the source set. The address to which an information content is proliferated must be nearer to the source address. The IIP adopts different strategies, according to the set of proliferating addresses, to satisfy the corresponding constraints.
2.3 Strategies of SSP and IIP
For the model of network self-organizing information exploitation, three strategies each are proposed for the SSP and the IIP. For source selection (SSS), we present three methods: the closest source selecting strategy (C), the maximal processing capability source
selecting strategy (MPC), and the first fitted source selecting strategy (FF). According to Definition 2 and strategy C, the node that is nearest and whose state function takes a larger value has a higher probability of being selected.
Under strategy MPC, the node that has the maximal processing capability and a larger state function value is the one most easily selected.
Here the additional quantity involved is the information service capability of the source address. Strategy FF, analogously, selects the first source whose state function exceeds the threshold.
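Because the original selection formulas were lost, the sketch below implements the three strategies from their prose descriptions; the scoring weights are our own plausible choices, not the paper's exact definitions.

# A sketch of the three source selecting strategies.
import random

# Each source: (address, distance to destination, capability, state value)
sources = [("s1", 4, 10, 0.9), ("s2", 2, 6, 0.7), ("s3", 7, 20, 0.8)]
THETA = 0.5   # threshold of the state function

def select_closest(srcs):                 # strategy C
    ok = [s for s in srcs if s[3] >= THETA]
    weights = [s[3] / s[1] for s in ok]   # nearer and fitter -> likelier
    return random.choices(ok, weights=weights, k=1)[0][0]

def select_mpc(srcs):                     # strategy MPC
    ok = [s for s in srcs if s[3] >= THETA]
    return max(ok, key=lambda s: s[2] * s[3])[0]

def select_ff(srcs):                      # strategy FF: first source that fits
    for s in srcs:
        if s[3] >= THETA:
            return s[0]

print(select_closest(sources), select_mpc(sources), select_ff(sources))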
Three information proliferating strategies (IPS) are also brought forward: the direct proliferating strategy (D), the random proliferating strategy (R) and the compromise strategy (CS). Strategy D proliferates the information directly to the visiting node. Strategy R proliferates it randomly within the proliferation address set. Strategy CS proliferates the information to the node located midway between the source and the destination.
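A corresponding sketch of the three proliferating strategies is given below; the node coordinates and candidate set stand in for the proliferation address set and are our own illustration.

# A sketch of the three information proliferating strategies.
import random

def proliferate_direct(source, destination, candidates):
    return destination                     # strategy D: copy to the visitor

def proliferate_random(source, destination, candidates):
    return random.choice(candidates)       # strategy R: anywhere in the set

def proliferate_compromise(source, destination, candidates):
    mid = ((source[0] + destination[0]) / 2, (source[1] + destination[1]) / 2)
    # strategy CS: the candidate nearest the source/destination midpoint
    return min(candidates, key=lambda c: (c[0]-mid[0])**2 + (c[1]-mid[1])**2)

src, dst = (0, 0), (8, 6)
cands = [(2, 1), (4, 3), (7, 5)]
print(proliferate_compromise(src, dst, cands))   # -> (4, 3)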
2.4 Network Complex Environment
Concerning the network environment, we discuss three aspects: the metabolism of network information, the intermittent congestion of network traffic, and the random faults of network entities. The metabolism of network information obeys the principle of growing with use and decaying with disuse to adjust the distribution of the information content. A node's adaptability for a piece of network information can be measured by the adaptation factor, which is directly proportional to the frequency with which the information is used; when the adaptation factor falls below its threshold, the information content is deleted. It reflects the past of the network environment. The intermittent congestion of network traffic is characterized by burstiness, intermittence and the ability to recover. The congestion factor describes the current state of the network environment and can be measured by many parameters, such as the packet-loss rate, the fractal number, the delay time and the traffic flux. When the congestion factor exceeds its threshold, the network enters the congestion stage and a control strategy must be carried out, keeping the proliferation addresses far away from the congested area.
Fig. 2. The evolution demo of the network self-organizing information exploitation
Fig. 3. Experimental results of the network self-organizing information exploitation mode based on the GCA (SE is the traditional mode, the others are not; S1: the closest source selecting strategy; S2: the maximal load source selecting strategy; S3: the first fitted source selecting strategy; R1-R3 represent request numbers of 1000, 800 and 600 respectively; number of cells in the GCA = 2500; number of information contents = 50; number of information sources specified by users = 5)
Network entities often crash suddenly while running. We denote the fault-free lifetime as a random variable whose density function obeys the negative exponential distribution f(t) = λe^(−λt), where λ is the attenuation intensity and t is the time variable. Note that the traditional information exploitation mode uses a single path, in contrast to the multiple paths of the network self-organizing mode. The dependability of the information exploitation can be described by the lifetime factor. For a single path, the fault rate and the average lifetime are λ and 1/λ respectively. When there exist n paths between the source and destination addresses, the lifetime of the multiple paths is the maximal lifetime over the n paths, whose distribution function can be given as
F_n(T) = (1 − e^(−λT))^n, where T is the maximal lifetime length. The fault rate and the average lifetime of the multiple paths are then, respectively, λ_n(t) = nλe^(−λt)(1 − e^(−λt))^(n−1) / [1 − (1 − e^(−λt))^n] and E[T_n] = (1/λ) Σ_{k=1..n} 1/k.
Clearly, both the fault rate and the average lifetime of the multiple paths are superior to those of the single path.
Fig. 4. Average lifetime and fault rate of multiple paths
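The reconstructed multi-path formulas can be checked numerically with a few lines of code: the mean of the maximum of n independent exponential lifetimes is (1/λ)·Σ(1/k), so the average lifetime grows and the effective fault rate shrinks as paths are added.

# Numeric check of the multi-path lifetime formulas.
lam = 0.01                      # attenuation intensity (faults per hour)

def mean_lifetime(n, lam):
    return sum(1.0 / k for k in range(1, n + 1)) / lam

for n in (1, 2, 4, 8):
    m = mean_lifetime(n, lam)
    print(f"paths={n}: average lifetime={m:7.1f} h, fault rate={1/m:.5f} /h")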
3 Simulations and Conclusions
In this section, we first give a process demonstration of the network self-organizing information exploitation, in which some metabolizing cells can be seen
and at last the convergent configuration is achieved. We can see from Fig. 2 that, after some time, the information has been proliferated over the whole network by self-organization, and the information service mode has changed: users can get information from the nearest source that contains the information they need. The total transmission distances practically incurred by the networks for information exploitation are greatly reduced. We also compared the self-organizing information exploitation mode with the traditional one under a complex environment through experiments. Efficiency results were obtained for the average visiting distance, the balance variance representing the degree of balance between idle and busy nodes, stochastic node faults, and intermittent congestion, as shown in Fig. 3. The experiments show that the novel model has many advantages in efficiency, robustness, suitability, reliability and adaptability under different environments. The average lifetime and fault rate of the multiple paths can be seen in Fig. 4: the average lifetime increases and the fault rate decreases as the number of paths grows. The conclusions drawn from the above analysis and simulation are as follows: in comparison with the present mode of network information exploitation, the self-organizing distributed mode not only greatly reduces the average distance and the number/rate of information losses when faulty nodes occur and balances the idle and busy server nodes, but also improves the real-time performance, adaptability, reliability and security of network information exploitation. In future research, we will focus on simulation and on the software and hardware implementation.
References
1. Shuai Dianxun, Liu Yan: A Novel Self-Organizing Approach to Network Information Exploitation Based on Generalized Cellular Automata. Chinese J. of Science and Technology 26 (2003) 895-907
2. D. Laforenza: Grid programming: some indications where we are headed. Parallel Computing 28 (2002) 1733-1752
3. He Yong: The Modeling and Algorithms of Information Organization and Allocation Problem on Internet Communications. Chinese J. Computers 24 (2001) 597-601
4. Feng Yongxin, Wang Guangxing: Research and implementation of a network reconfiguration algorithm in the network management. J. Computer Research and Development 38 (2001) 1194-1198
5. C. Claudio, M. Michela: A new approach to model the stationary behavior of TCP connections. IEEE Infocom (2000) 367-375
6. M. Mathis, J. Semke, J. Mahdavi: The macroscopic behavior of the TCP congestion control algorithm. Computer Commun. Rev. 27 (1997) 67-82
7. B. A. Huberman, R. M. Lukose: Social Dilemmas and Internet Congestion. Science 277 (1997) 535-537
Admire – A Prototype of Large Scale E-collaboration Platform*
Tian Jin, Jian Lu, and XiangZhi Sheng
State Key Lab. of Software Development Environment, BeiHang University, Beijing, China
{jintian, lujian, xzsheng}@nlsde.buaa.edu.cn
Abstract. In this paper, we discuss the techniques for building up a large-scale prototype platform for collaboration on IP-based networks and summarize the structure of that platform. To address the large-scale problem, we make a performance analysis of the key modules of the system. Based on the results, we provide a basic prototype platform containing multimedia communication and basic data collaboration functions. We also discuss the method of combining other data collaborations and video conferences into this platform. Finally, we list some deployment details of the prototype system.
Keywords: Admire, Collaboration platform, Videoconference
1 Introduction
Scientific collaboration has become more and more important in current scientific research, and traditional hardware videoconferencing can only provide basic multimedia communication between a few people. In this paper, we introduce how to build up a platform for large-scale multimedia and data collaboration over the Internet. The first several sections introduce the problems and solutions in building up such a platform and adapt the solutions in our architecture design. The latter part of the paper makes some performance analysis and shows the current deployment status of Admire in China. Finally, we draw some conclusions and explain what the Admire system [14] will do in the next stage.
2 Related Work
The two standards groups have different architectures for multimedia communication and control protocols. The IETF architecture for multimedia communications includes the Real-Time Transport Protocol [11] and the Real-Time Transport Control Protocol, plus several upper-level protocols (the Session Initiation Protocol [5], the Session Description Protocol [4] and the Real Time Streaming Protocol). This architecture
* This work has been supported by the 973 project of MOST - Massive Information System (G1999032711).
focuses on a peer-to-peer structure for all users, which has much better scalability and adaptability than a client/server structure. On the other hand, the ITU-T multimedia protocol suite for packet networks, H.323, has more data/control interleaving, leading to complex, centralized implementations. Most H.323 equipment is implemented in hardware, so it is easy to use but expensive. The MBone tools [13] were the first to use multicast and RTP technology to build up large-scale video conference systems; based on them, the Access Grid group built a large-scale A/V communication platform over Internet2. The original purpose of AG [17] was to build a platform for accessing Grid computation for the U.S. DOE; anyone connected to Internet2 via multicast can communicate with people at any of the more than 150 universities and organizations on the AG list. By contrast, H.323 video conference systems are designed to be used on VPN networks rather than the Internet. Technology for the inter-cooperation of the H.323 protocols and the SIP/SDP/SAP protocols can be found in the MECCANO project [9], the VRVS project and the GlobalMMCS project [15]. We are trying to build a platform that provides basic A/V services for communication, basic data services for collaboration, a unified network platform for the heterogeneous network problem, and a plug-in method for appending more data collaboration services.
3 Problems and Solutions
3.1 Design
When building up a large-scale collaboration platform, many users transmit audio and video data to the center via many different kinds of networks, so a centralized Multipoint Control Unit (MCU) cannot process all the audio and video data in time. Nor can a centralized MCU let users join the multimedia collaboration according to their network bandwidth. The platform should be an open system and compatible with existing audio and video applications. For example, it should be compatible with H.323 video conference systems, meaning that current H.323 systems can join a multimedia conference through the platform. The system should also integrate with streaming media systems, letting users join the platform via streaming technology (such as VOD). For data collaboration, a basic collaboration environment should be provided, and new users should be able to add different ad-hoc applications to fit their specific collaboration needs. In summary, the Admire system needs to solve the following problems to set up large-scale collaboration:
Network Problems
Distributed A/V Gateway
Fig. 1. Structure of Admire System
Appending Data Collaborations Live Stream Rebroadcast H.323 Inter-cooperation
3.2 Network Problems
Heterogeneous Network
Multicast technology is introduced to support large-scale A/V communication. In the MBone and AG tools, A/V data is distributed via the multicast protocol, but multicast is not supported by standard ISPs. Our solution combines tunnels and access-point gateways. The topology is organized as follows: a Server is located on the MBone multicast network and forwards multicast data as unicast to proxies or clients; the forwarding is driven by the user's RTP and SSRC selections. A Proxy is located on a multicast-capable network (such as a LAN); multimedia data arrives from the Server and is forwarded onto the local multicast network, again according to the users' selections. A Client uses an out-of-band signal (the Common Message Bus) to inform the Server/Proxy which multicast data it needs, and a Proxy informs the Server in the same way. This topology fits network conditions in which the inter-link bandwidth is much lower than the intra-link bandwidth, and the hierarchical structure overcomes the shortcomings of a flat multicast distribution structure at large scale.
Firewall
Firewalls always put multimedia communication at a disadvantage: in both H.323 and MBone systems, multimedia streams cannot pass through a firewall.
Fig. 2. Illustration of multicast multimedia gateway
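A minimal sketch of the Server role illustrated in Fig. 2 is given below: it joins a multicast group and forwards each received packet over unicast to subscribed clients. The group address, port and subscription bookkeeping are our assumptions for illustration.

# A sketch of a multicast-to-unicast forwarding server.
import socket, struct

GROUP, PORT = "224.2.127.254", 9875
subscribers = [("10.0.0.5", 40000)]      # unicast clients behind slow links

recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
recv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
recv.bind(("", PORT))
mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
recv.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
while True:
    packet, _ = recv.recvfrom(65535)     # one RTP packet from the MBone side
    for addr in subscribers:             # forward per the users' selections
        send.sendto(packet, addr)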
Fig. 3. NAT and firewall problem
To solve this problem, a common UDP network layer is introduced to transparently traverse NAT and packet-filtering firewalls. All multimedia data is virtually mapped onto one socket port, and the data then flows through the new socket port without any modification to the current applications; an application sends its data to the destination via the virtual address and port. We call this technique the NSP (Network Service Provider) layer.
3.3 Distributed A/V Gateway
Due to the scale of the collaboration, a single MCU or media processing unit would be a single point of failure, so the system builds a distributed media gateway cluster. For video, a user connects to a media gateway and uploads his video stream to it, while video streams from other users may be pushed to him from other gateways. The gateways self-organize into a mesh and transmit video streams along shortest paths. For audio, the gateways receive and decode the audio packets and mix them according to the different requirements of each receiver. Gateway selection is based on the average load of the media gateways. Moreover, because the gateways know the content of the video and audio streams, they can automatically adjust the bitrate of a video stream according to the available network bandwidth, and can
introduce AGC (Automatic Gain Control) and silence suppression techniques to improve the audio experience in the distributed media gateway.
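Gateway selection itself can be as simple as picking the least-loaded gateway; the toy sketch below illustrates this, although the paper only states that selection is based on average load.

# A toy sketch of load-based gateway selection.
gateways = {"gw-bj": 0.62, "gw-sh": 0.35, "gw-gz": 0.80}   # load fractions

def pick_gateway(loads):
    return min(loads, key=loads.get)   # connect to the least-loaded gateway

print(pick_gateway(gateways))          # -> gw-sh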
3.4 Data Collaborations
Data collaboration is the killer application of e-collaboration systems. We define common data collaboration as text chat, whiteboard and screen sharing, which fit the basic needs of a standard collaboration system. Other data collaborations depend on the ad-hoc needs of different users: a documentation team might need Office collaboration, a systems engineer might need AutoCAD collaboration, and a programmer might need programming collaboration. A one-size-fits-all collaboration solution for different kinds of users is therefore impossible; it is necessary to provide common collaboration applications together with an SDK for developing ad-hoc collaboration applications.
Common Message Bus
The Common Message Bus (MBus) is a simple, lightweight, message-oriented group communication middleware through which any component of the Admire system can communicate with any other component(s).
Fig. 4. MBus structure
MBus provides a basic message-oriented transmission scheme for data collaboration: p2p reliable message transmission, p-to-n reliable message transmission, and p2p RPC message transmission. A collaboration application can also obtain basic session and user information by using MBus to communicate with the local Admire UI application and components. Whether a new collaboration application uses MBus or not is not a central concern of the Admire system; MBus is merely one additional option for users developing their collaboration tools, and many other MOMs (such as JMS) can be chosen for data collaboration development.
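For illustration, a simplified MBus-style group send might look as follows; the multicast address, port and message syntax are our assumptions, and the real MBus additionally defines its own header format and reliability handshake.

# A simplified sketch of sending a p-to-n message over UDP multicast.
import socket

MBUS_GROUP, MBUS_PORT = "239.255.255.239", 47000

def mbus_send(command, args):
    msg = f"{command} {' '.join(args)}".encode()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    sock.sendto(msg, (MBUS_GROUP, MBUS_PORT))   # unreliable in this sketch

# e.g. tell local components which video sources the user selected:
mbus_send("rtp.select", ["ssrc=0x1234", "ssrc=0x5678"])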
Independent Data Services
Both at the server and at the client, it is easy for a user to add a new collaboration into the Admire system. The data collaboration service is independent of the multimedia service and session management. At the server side, one can directly add a new collaboration server application and save its basic collaboration information in the database; when a user starts the corresponding collaboration client, the Admire client automatically connects it to the designated server via the database. At the client side, any new data collaboration can be added through the plug-in method by changing the plug-in configuration of the Admire client software. The independence of the data services makes it possible for users to add new collaborations corresponding to their needs, and lets the Admire system provide better multimedia and message-oriented middleware services to them. By bundling new ad-hoc collaborations with the basic multimedia services, Admire can build up a large-scale interactive and collaborative platform over the Internet according to users' needs.
Fig. 5. Live Stream Rebroadcast Structure
3.5 Live Stream Rebroadcast
Streaming media provides a way for non-interactive users to join Admire's A/V sessions. The multimedia data is received and transcoded at a common "Access Gateway", which forwards it to the corresponding streaming media producer and server. The streaming media server can then redistribute the stream, taking load-balancing issues into consideration.
3.6 H.323 Inter-cooperation
Since H.323 and other videoconference systems support only A/V communication, inter-cooperation between these systems and Admire is based on the multimedia applications.
There are three kinds of entities in the inter-cooperation framework. The first is the community of collaboration clients, using various A/V technologies such as H.323, SIP and Access Grid; all clients are connected into the system through the WebService Gateway, which turns them into web service entities. The second is the Media Server, a web service entity for the RTP communication channels between the clients. The third is the Session Server, providing the basic services for an A/V session, such as constructing collaboration groups, maintaining membership, advertising collaboration resources and binding communication channels; the session servers can be regarded as the core collaboration middleware. The XGSP [15] protocol provides a common session description and information model for all kinds of videoconference systems.
Fig. 6. Architecture of client and server of Admire collaboration platform
3.7 Summary
Taking all the preceding problems into consideration, the architectures of the Admire collaboration client and server are shown in Fig. 6. The integrated UI combines the multimedia, basic and ad-hoc collaborations; the NSP layer provides firewall support for multimedia data; and the MBus layer provides basic message-oriented middleware for collaborative applications. Together these meet the basic needs of a large-scale collaboration platform, and the server structure lets different kinds of clients access a unified multimedia collaboration server without modification.
4 Performance Analysis
The scalability of the system rests on the distributed media gateway. To improve scalability, gateways can run either separately or as a gateway cluster: if the average gateway load is low, a single gateway is cheaper; if it is high, a clustered gateway provides load balancing.
According to our analysis, each media gateway can process about 50 Mbps of A/V data. Assume each multimedia stream is 384 kbps, roughly 0.5 Mbps per stream once packet overhead is included. In full-interactive mode, each of the n sites uploads one stream and receives the streams of the others, so the gateway carries about n × n streams and its maximum load capacity satisfies n² × 0.5 Mbps ≤ 50 Mbps.
When 10 uniWe can deduce that and cast satellite sites full-interactive with each other by this gateway, total bandwidth in the gateway is 50Mbps. That table also means that 20 sites can make up a fullinteractive multimedia conference by 4 gateways’ cluster. If each site only receives 4 remote videos and 1 local video, then the gateway’s maximum load capacity is:
and The table means that 65 We can deduce that sites can make up a 4-video multimedia conference by 4 gateway’s cluster. That will fit mostly large scale interactive meetings.
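To make the capacity arithmetic concrete, the sketch below recomputes these figures; the stream-counting model (each of N full-interactive sites sends one stream up and receives a copy of every other site's stream) is our illustrative assumption, while the 50 Mbps capacity and 384 kbps stream rate come from the measurements quoted above.

GATEWAY_CAPACITY_BPS = 50_000_000   # about 50 Mbps of A/V traffic per gateway
STREAM_RATE_BPS = 384_000           # 384 kbps per multimedia stream

def full_interactive_load(n_sites):
    # N incoming streams plus N*(N-1) outgoing copies through one gateway
    streams = n_sites + n_sites * (n_sites - 1)
    return streams * STREAM_RATE_BPS

def max_full_interactive_sites():
    n = 1
    while full_interactive_load(n + 1) <= GATEWAY_CAPACITY_BPS:
        n += 1
    return n

print(max_full_interactive_sites())  # 11 under this counting model, close to the 10 sites quoted above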
5 Conclusion In the Ministry of Education, more than 150 universities and the education departments of all Chinese provinces have used this platform for routine meetings inside the ministry. At peak, about 100 sites have joined the platform simultaneously. In the Ministry of Science and Technology, the science and technology departments of all Chinese provinces have begun to use this platform for distant, large-scale scientific
collaboration. Many other scientific institutes and administration departments have begun to use this platform for common scientific discussion and project evaluation.
Acknowledgement. We thank Professor Li Wei for choosing this research direction and for guidance on the research method. We also thank Chen QingJi, Huang Tao, Wang Lei, Zhu DuoZhi, Zhou Ning, Meng XiangZheng, Shan BaoSong, Sun LiLi, Yu Min and Yu XiangNing for their support in implementing the Admire prototype system.
References
1. Deering, S., Estrin, D., Farinacci, D., Jacobson, V., Helmy, A., Meyer, D., Wei, L.: Protocol Independent Multicast Version 2 Dense Mode Specification. Internet Draft, draft-ietf-pimv2-dm-03.txt, June 1999
2. Deering, S.: Host Extensions for IP Multicasting. RFC 1112, August 1989
3. Deering, S.: Multicast Routing in a Datagram Internetwork. PhD thesis, Stanford University, 1991
4. Handley, M., Jacobson, V.: SDP: Session Description Protocol. RFC 2327, April 1998
5. Handley, M., Schulzrinne, H., Schooler, E., Rosenberg, J.: SIP: Session Initiation Protocol. RFC 2543, March 1999
6. Johanson, M.: RTP Translator REFLEX. Swedish Institute for System Development, January 2000
7. Kutscher, D., Ott, J.: The Message Bus. White Paper, January 2000. http://www.mbus.org/
8. Live Networks, Inc.: Livegate, 2000. http://www.livegate.com/livegate/
9. MECCANO: Telematics for Research Project 4007, 1998-2000. http://www-mice.cs.ucl.ac.uk/multimedia/projects/meccano/
10. University of Oslo: Reflector Session Protocol (RSP) Specification, 2000. http://www.ifi.uio.no/~meccano/reflector/rsp.html
11. Schulzrinne, H., Casner, S., Frederick, R., Jacobson, V.: RTP: A Transport Protocol for Real-Time Applications. RFC 1889, January 1996
12. Schulzrinne, H.: RTP Translator RTPTRANS, August 1999. http://www.cs.columbia.edu/~hgs/rtp/
13. Computer Science Department, University College London: MBone Conferencing Applications, 2000. http://www-mice.cs.ucl.ac.uk/multimedia/software/
14. Admire System, BeiHang University. http://www.nlsde.buaa.edu.cn/projects/admire
15. Fox, G., Wu, W.: A Web Services Framework for Collaboration and Audio/Videoconferencing. PDPTA'02
16. Jin, T., Chen, Q., Lu, J.: Multimedia Multicast Gateway Infrastructure. SCI 2002
17. Access Grid. http://www.accessgrid.org
18. Baldonado, M., Chang, C.-C.K., Gravano, L., Paepcke, A.: The Stanford Digital Library Metadata Architecture. Int. J. Digit. Libr. 1 (1997) 108-121
A Most Popular Approach of Predictive Prefetching on a WAN to Efficiently Improve WWW Response Times

Christos Bouras 1,2, Agisilaos Konidaris 1,2, and Dionysios Kostoulas 1

1 Computer Engineering and Informatics Department, University of Patras, GR-26500, Patras, Greece
[email protected]
2 Computer Technology Institute-CTI, Riga Feraiou 61, GR-26221, Patras, Greece
{bouras, konidari}@cti.gr
Abstract. This paper studies Predictive Prefetching on a Wide Area Network with two levels of caching. The WAN that we refer to is the GRNET academic network in Greece. We rely on log files collected at the network's Transparent cache (primary caching point), located at GRNET's edge connection to the Internet. Our prefetching model bases its predictions on popularity ranking of past requests. We present an "n-next most popular" approach used for prefetching on GRNET's architecture and provide preliminary results of our experimental study, quantifying the benefits of prefetching on the WAN.
1 Introduction Web Prefetching has been proposed mainly as a complementary procedure to caching, due to limitations in the performance of caching [1]. The works in [2] and [3] present useful overviews of caching and prefetching. The benefits of prefetching have been explored in various Internet configurations, including client/server [4], client/proxy/server [1], [5], [6] and client/mediator/server [7], [8] systems. In this paper we present a study of how prefetching can be performed on a Wide Area Network with three levels in its caching hierarchy, including a Transparent cache on the edge of the WAN to the Internet and local Proxy servers on the edge of the backbone. A prediction algorithm is at the heart of any prefetching system. The Prediction by Partial Match (PPM) algorithm, which originates in the data compression community, has been explored in depth; in [4], [7] PPM is used to create branches from historical URLs. Moreover, data mining techniques and algorithms with Markov models have been proposed for predictions [9], [10], [11]. In [12] Padmanabhan and Mogul propose a method in which the server makes predictions while individual clients initiate prefetching. Finally, [6] presents a popularity-based Top-10 approach to prefetching, which combines the servers' active knowledge of their most popular documents (their Top-10) with client access profiles. Our work is based on an "n-next most popular" approach that uses access log data to predict future requests, based on the most popular requests following a specified request.
This scheme uses a popularity-based algorithm quite similar to the one proposed by Markatos and Chronaki in [6]. However, we use only the most popular documents among those found to have been requested after a given document, limiting predictions to pages that appear to be highly related to the currently displayed page. Furthermore, a page dependency threshold is used, in a similar way to [12], to keep the amount of prefetched documents low in case of bandwidth limitations. Finally, the computational and storage complexity of our algorithm is much lower than that of the more complex Markov, PPM or data mining techniques, serving our primary goal of applying prefetching: reducing response times on the WWW. In this paper we look at the case of several interconnected LANs joined by a broadband backbone that provides access to the Internet through a main access point. This is the case of the Greek Research Network, GRNET [13].
2 The n-Next Most Popular Approach To predict a future request made by a client, we first need to build the access profile for this client. The Transparent cache log data is used for that reason. Analysis of log data focuses on the frequency of visits and the sequence of requests. These specify the preferences of users and imply their future behavior. Log data processing includes popularity ranking of requested pages, estimation of the frequency of content change, and page dependency examination. All these procedures are carried out for every client separately and result in the construction of different popularity lists for each client. These popularity lists are then used by a prediction algorithm to compute which pages are most likely to be requested next, and by an additional decision algorithm that decides whether prefetching will be applied or not, and how many pages are going to be prefetched, based on bandwidth limitations determined by available resources. If an overall approach is followed, both the prediction algorithm and the decision algorithm use general popularity lists extracted by adding up popularity data from all the separate client-based popularity lists.
Popularity ranking: The basic goal of log data analysis is finding the most frequently requested pages after each page. We look for pages that were accessed within n accesses after a specified page. The parameter n is called the lookahead window size. Any page requested within n accesses after a specified page was requested is considered to be an n-next page of it. In order for a page to be counted as an n-next page of an examined page, it also needs to have been requested within the same user session as the examined page. For every page in the log data we find the frequency of visits, within the lookahead window, of all other pages found to be an n-next page of it. The value of the lookahead window size needs to be large enough to extend the applicability of the prefetching algorithm, and small enough to avoid abusing the available system and network resources. Since our study of Transparent cache log data reveals that pages accessed more than 5 accesses after a specified page are not highly related to this page, we choose the lookahead window size to be equal to 5.
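As a concrete illustration of the ranking procedure (not the authors' implementation), the following sketch builds client-based n-next popularity lists from ordered log records of the form (client, session, page), with the lookahead window n = 5 chosen above; the record format is our assumption.

from collections import defaultdict, Counter

LOOKAHEAD = 5  # lookahead window size n

def build_popularity_lists(log):
    # log: list of (client, session, page) tuples in request order
    # returns {client: {page: Counter of pages seen as n-next of it}}
    lists = defaultdict(lambda: defaultdict(Counter))
    per_client = defaultdict(list)
    for client, session, page in log:
        per_client[client].append((session, page))
    for client, requests in per_client.items():
        for i, (session, page) in enumerate(requests):
            for later_session, later_page in requests[i + 1 : i + 1 + LOOKAHEAD]:
                if later_session == session:  # n-next pages must fall in the same user session
                    lists[client][page][later_page] += 1
    return lists

# Overall (general-population) lists are obtained by summing the per-client counters.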
Initially, the process of n-next popularity ranking is carried out for every client separately. For each web page that has been requested by the client, the pages that were requested within n accesses after it are stored. The instances of any of these pages as n-next of the examined page are counted, and the pages are ranked according to their popularity as n-next of the examined page. Thus, an n-next popularity list is created for every page requested by the client (client-based n-next popularity ranking). Putting the results from all clients together, we build a general n-next popularity list for every page logged, which maps the web behavior of the general population for that page (overall n-next popularity ranking).
Frequency of change: As in the case of n-next popularity ranking, the process of finding a page's frequency of change as n-next of a specified web page is carried out both for every client separately and overall. In the first case, frequency of change is estimated only for those times the page was requested by the specific client within n accesses after the examined page. In the second case, all occurrences of the page as n-next of the examined page are taken into account. Frequency-of-change values are kept for every page in the corresponding field of the appropriate n-next popularity list of the specified page.
Page dependency: The accuracy of prediction of the next request to be made by the client is affected by the extent of the relation between pages. If a page that is a candidate for prefetching is highly dependent on the currently displayed page, then its prediction as the client's next request has a high probability of being correct. Dependency is defined as the ratio of the number of requests for a page as n-next of a specified page (stored in an n-next popularity list) to the total number of requests for the specified page (stored in a page popularity list).
Prediction algorithm: To predict a future request of a client, we use a simple algorithm that is based on n-next popularity data. Suppose a client is currently displaying a page. To predict its next request we rely on the client's request history for the currently displayed page. The pages that have the best chance of being requested after the currently visited page are those that were most frequently visited as n-next of it in the past. Thus, the pages at the top of the n-next popularity list of the currently displayed page are the candidates for prefetching. The number of pages predicted as candidates for prefetching is specified by the parameter m, called the prefetching window size. The prediction algorithm suggests the m first web pages of the n-next popularity list of the currently displayed page as the most probable pages to be accessed next. The prefetching window size is a parameter of the prefetching scheme. A large value of m results in many pages being prefetched, which increases the number of successful prefetching actions; however, more bandwidth is then required to perform prefetching, resulting in a considerable increase of network traffic.
Decision algorithm: The "n-next most popular" prefetching model proposed in this paper uses a decision process, which determines whether or not prefetching will be applied and how many pages will be prefetched. Prefetching is decided for any page suggested by the prediction algorithm. We characterize this decision policy as an aggressive prefetching policy.
However, when available bandwidth is limited we need to restrict our model and perform prefetching only for those
predicted pages that appear to have a high chance of being actually requested. Those are pages that are highly dependent on the currently displayed page, or pages whose content does not seem to change frequently. This decision policy is called the strict prefetching policy. In this case, the decision algorithm checks the following parameters for any page proposed by the prediction algorithm: its size, its dependency on the current page and its rate of content change, and decides, with regard to bandwidth limitations, whether prefetching that page would be advantageous or not. If the size of the page is larger than the average size of all visited pages, then a possible unsuccessful prefetching action for this page would affect network traffic dramatically. So prefetching is decided for such a page only if its estimated dependency and frequency-of-change values satisfy the thresholds used to ensure that prefetching the page is very likely to be successful.
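The prediction and decision steps can then be sketched as follows; the threshold values and the stats fields are illustrative assumptions, as the paper does not state them numerically.

def predict(popularity_list, m):
    # popularity_list: Counter of n-next pages for the currently displayed page
    # returns the top-m candidates (the prefetching window)
    return [page for page, _ in popularity_list.most_common(m)]

def decide(page, stats, avg_size, strict, dep_threshold=0.3, change_threshold=0.1):
    # stats[page] carries the size, dependency and frequency-of-change values
    # kept in the popularity lists; the threshold values are illustrative only
    if not strict:
        return True  # aggressive policy: prefetch every predicted page
    if stats[page]["size"] > avg_size:
        # large page: prefetch only if it is highly dependent on the current
        # page and its content rarely changes
        return (stats[page]["dependency"] >= dep_threshold
                and stats[page]["change_rate"] <= change_threshold)
    return stats[page]["dependency"] >= dep_threshold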
3 Results In order to evaluate the performance benefits of our prefetching scheme we use trace-driven simulation. Access logs from the GRNET Transparent cache are used to drive the simulations. The results presented in this paper are based on logs of web page requests recorded over a 7-day period. In all experiments, 80% of the log data is used for training (training data) and 20% for testing (testing data) to evaluate predictions. Furthermore, all traces are preprocessed. The performance metrics used in our experimental study are:
Prefetching Hit Ratio: the ratio of prefetched pages that the users requested (useful prefetched pages) to all prefetched pages. It represents the accuracy of predictions.
Usefulness of Predictions: the ratio of prefetched pages that the users requested (useful prefetched pages) to all requested pages. It represents the coverage (recall) of predictions.
Prefetch Effectiveness: the ratio of requests that are serviced from prefetched documents to the total number of requests for which prefetching is performed. This value differs from Usefulness of Predictions in that only requests for which prefetching is applied are taken into account.
Network Traffic Increase: the increase in network traffic due to unsuccessful prefetching. It represents the bandwidth overhead that prefetching adds to the network traffic of the non-prefetching case.
Average Rank: the average rank of prefetch hits (in cases of successful prefetching action) in the set of predicted pages (or in the n-next popularity list of the active request).
The experimental scenarios for the evaluation of the "n-next most popular" prefetching scheme's performance, as these derive from the alternative values of the parameters mentioned above, are: 1. Aggressive policy, m = 5, client-based and overall prediction 2. Aggressive policy, m = 3, client-based and overall prediction 3. Strict policy, m = 5, client-based and overall prediction
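These metrics reduce to simple ratios over counters collected during the simulation; a minimal sketch (the counter names are ours):

def metrics(useful, prefetched, requested, covered, wasted_bytes, base_bytes):
    # useful: prefetched pages the users later requested
    # prefetched: all prefetched pages; requested: all requested pages
    # covered: requests for which prefetching was performed
    # wasted_bytes: bytes fetched but never used; base_bytes: traffic without prefetching
    return {
        "prefetching_hit_ratio": useful / prefetched,   # accuracy
        "usefulness": useful / requested,               # coverage (recall)
        "prefetch_effectiveness": useful / covered,
        "network_traffic_increase": wasted_bytes / base_bytes,
    }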
For every request examined, the actual request that was made just after it (by the client) is checked from the simulation log data. This request is compared to the page suggested for prefetching by the proposed "n-next most popular" algorithm. If the actual request was the one predicted, the request is counted as a prefetch hit; otherwise a prefetch miss is logged and the traffic overhead for the unsuccessful prediction is estimated. In the first case, the rank of the successfully predicted page is also found in the n-next popularity list in order to calculate the average rank over all useful prefetched pages. Table 1 shows that when the n-next popularity data is obtained from the general population, instead of client log data, prefetch effectiveness is a bit higher (54%). However, this requires an 18% traffic increase. This is expected, since in the case of overall prediction there is greater availability of n-next popularity data, so prefetching is performed for more requests. As a result, the cost in bandwidth is greater, but prefetch effectiveness is higher. If the traffic increase is limited to 8%, then prefetch effectiveness is found equal to 50%. It is clear that for the same traffic increase the performance results of client-based prediction are better, since client data predicts the future web behavior of the user connected to this client more accurately than data extracted from all clients does. It is also clear that a small value of the prefetching window size provides better prediction accuracy. Actually, the fewer documents a client is allowed to prefetch, the higher its prefetching hit ratio will be, as only highly probable objects are going to be prefetched. When running the simulation with a smaller prefetching window size (m = 3) and client-based prediction, we experience a significant increase of the hit ratio (58%). In addition, less bandwidth is required. However, usefulness of predictions is lower (25%), as fewer prefetching actions are performed. Table 1 shows results taken for all three cases of client-based prediction.
A comparison of the performance results for the different prefetching policies in the case of client-based prefetching is also depicted in Figure 1. Figure 2 (a, b) compares the performance results of client-based and overall prefetching applied at the Transparent cache. As we saw earlier, the network traffic overhead is much higher in the case of prefetching based on the general population than in the client-based scenario.
Figure 1 shows that the recall of the algorithm is greater when a larger prefetching window size is used or a more aggressive prefetching policy is carried out, as in both cases more prefetching actions are performed and therefore more useful pages are prefetched. The use of a smaller prefetching window size appears to limit the coverage of the prefetching method more than the use of a stricter policy, but it yields a significant increase in the accuracy of predictions (58% for the aggressive policy with m=3 compared to 51% for the strict policy with m=5).
Fig. 1. Comparison of client-based policies for different performance metrics.
Fig. 2. Comparison of client-based and overall prediction scenarios for all policies (graphs should be read in pairs of bar charts)
As we mentioned earlier in this paper, a basic motivation for applying a prefetching scheme is to reduce the delay that an end user (client) experiences when requesting a Web resource. The computational cost of the “n-next most popular” algorithm presented in this paper is very low, since a simple algorithm, with no special operational or storage requirements, is used for the construction of the “n-next most popular” prediction model. This results in an efficient improvement of response times
experienced by users, as no significant time is needed to perform predictions about which pages to prefetch. The application of prefetching at a higher level, the level of a Wide Area access point, offers the opportunity to use an additional amount of bandwidth. Therefore, more predictions can be made, resulting in an increase in the number of useful predictions. The impact of this increase on the accuracy of predictions and on the network traffic increase is not so significant in the case of a WAN, due to greater bandwidth availability. The "n-next most popular" approach appears to have high usefulness of predictions, taking advantage of the available bandwidth when performing prefetching on a WAN. For the case of the "most popular" aggressive policy with prefetching window size equal to 5, for example, the usefulness of predictions is equal to 27.5%. It is found that if a more complex prefetching algorithm (a PPM algorithm with parameter values proportional to those of the "n-next most popular" algorithm mentioned above) were used on the same Wide Area architecture, the usefulness of predictions would be no more than 24%. Furthermore, the difference in traffic increase between the two methods is insignificant for a WAN: the "n-next most popular" algorithm adds only 2% more traffic than the PPM algorithm does. This shows the advantage of applying the "n-next most popular" prefetching algorithm in the Wide Area, since it profits a lot from prefetching by making effective use of the extra bandwidth that is available on a WAN. All the results studied above clearly show that the application of prefetching in the Wide Area can be quite beneficial. Even with the use of a simple prediction algorithm, such as the "n-next most popular" algorithm proposed in this paper, the accuracy of predictions can reach 58% (case of the aggressive, client-based policy with prefetching window size equal to 3) with a network traffic increase of 4%, insignificant for a WAN. In fact, the performance results are even better if we take into account that many of the prefetched requests will be used by more than a single end user, as prefetching in many cases is performed for ICP requests made by Proxy servers, which in turn serve many individual clients connected to them.
4 Future Work and Conclusions In this work we did not study the prefetching of dynamically constructed resources such as search engine results or parameterized pages. The study of whether dynamic content may be included in prefetching is very "attractive"; it must address the Web resource frequency-of-change problem in order to provide adequate results. Another important open issue is the creation of an algorithm that would prioritize ICP requests to the Transparent cache over direct TCP requests. This idea is based on the observation that ICP requests originate from Proxy servers while TCP requests originate from single users. Prioritizing ICP requests in prefetching intuitively means that the resulting prefetched resource could potentially be useful to more than one client, since it would reside on a proxy server. We also intend to study the usefulness of prefetched objects in the case of ICP requests.
Prefetching can be highly beneficial to the reduction of User Perceived Latency. In this paper we argue that prefetching can be more efficient if it is applied at the edge network connection of a WAN. This approach can be more efficient than client-initiated prefetching because of the more efficient use of available bandwidth, and because prefetching at this point may be useful to many clients. In this work we have shown, employing an "n-next most popular" approach, that prefetching can be potentially beneficial to the GRNET WAN. Of course, many further issues must be explored before deploying prefetching on the edge of GRNET. Preliminary results provide a clear indication that response times in GRNET would be significantly improved if a simple "most popular" prefetching policy were performed at the Transparent cache.
References
1. Kroeger, T.M., Long, D.D.E., Mogul, J.C.: Exploring the Bounds of Web Latency Reduction from Caching and Prefetching: Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS), Monterey, CA (1997) 13-22
2. Wang, J.: A survey of web caching schemes for the Internet: ACM Computer Communication Review, 29(5), (1999) 36-46
3. Wang, Z., Crowcroft, J.: Prefetching in World Wide Web: Proceedings of IEEE Global Internet 96, London, (1996) 28-32
4. Palpanas, T., Mendelzon, A.: Web Prefetching Using Partial Match Prediction: Proceedings of the Web Caching Workshop, San Diego, CA, USA, (1999)
5. Chen, X., Zhang, X.: Coordinated data prefetching by utilizing reference information at both proxy and Web servers: ACM SIGMETRICS Performance Evaluation Review, Volume 29, Issue 2, (2001) 32-38
6. Markatos, E.P., Chronaki, C.E.: A Top-10 Approach to Prefetching the Web: Proceedings of INET '98 (The Internet Summit), Geneva, Switzerland, (1998)
7. Fan, L., Cao, P., Jacobson, Q.: Web Prefetching Between Low-Bandwidth Clients and Proxies: Potential and Performance: Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '99), Atlanta, GA, (1999) 178-187
8. Loon, T.S., Bharghavan, V.: Alleviating the latency and bandwidth problems in WWW browsing: Proceedings of the 1997 Usenix Symposium on Internet Technologies and Systems (USITS-97), Monterey, California, USA, (1997) 219-230
9. Nanopoulos, A., Katsaros, D., Manolopoulos, Y.: Effective Prediction of Web-user Accesses: A Data Mining Approach: Proceedings of the Workshop WEBKDD, San Francisco, CA, (2001)
10. Bestavros, A.: Using speculation to reduce server load and service time on the WWW: Proceedings of the 4th ACM International Conference on Information and Knowledge Management, Baltimore, Maryland, (1995) 403-410
11. Zukerman, I., Albrecht, D.W., Nicholson, A.E.: Predicting Users' Requests on the WWW: Proceedings of the 7th International Conference on User Modeling, Banff, Canada, (1999) 275-284
12. Padmanabhan, V., Mogul, J.: Using Predictive Prefetching to Improve World Wide Web Latency: Computer Communication Rev., 26(3), (1996) 22-36
13. GRNET, Web Site: http://www.grnet.gr/
Applications of Server Performance Control with Simple Network Management Protocol

Yijiao Yu, Qin Liu, and Liansheng Tan*

Department of Computer Science, Central China Normal University, Wuhan 430079, PR China
{yjyu, liuqin, l.tan}@ccnu.edu.cn
Abstract. Automated performance control is necessary when an application server is overloaded due to over-utilization of critical resources. Using the Simple Network Management Protocol (SNMP) to provide feedback from the controlled network device, a centralized server control model is applied in this paper. A method to choose the minimum sampling period in a network control system is then presented, and thresholds are derived from Round Trip Time (RTT) testing results. An admission control system on a genetic algorithm server is further illustrated step by step, in which two controllers, a proportional controller and an intelligent controller, are introduced. The stability of the network control loop is subsequently discussed in detail. We finally use a real computer control setup to test the control effects. The experimental results show the efficiency of the control approach for automatically controlled network systems.
1 Introduction With the development of Internet applications, the number of servers has increased sharply in recent years. For every server, accepting a request means consuming its resources, including CPU slots, memory, bandwidth and so on. When one or several kinds of resource are scarce, the server is regarded as overloaded. Research on application server performance control has been carried out, especially on WWW servers [1], FTP servers [2] and Video-On-Demand (VOD) servers [3]. An admission control system is an effective way to guarantee the performance of a server. Reducing the response time, improving throughput, or other special objectives can be regarded as the performance control goals of a server in general. In those admission control systems, the control models and the ways of obtaining feedback vary widely. The controller software process is embedded in the Apache server on Linux in [1]. In [2], the log files of a Lotus Notes server are analyzed to obtain feedback, and the admission controller is located on another computer. The Simple Network Management Protocol (SNMP) is suggested for automated network control in [4]. Obtaining abundant feedback in real time and transmitting it in a reliable way are
* Corresponding author. E-mail:
[email protected]. Address: Department of Computer Science, Central China Normal University, Wuhan 430079, PR China.
important to the implementation of network control software. In our view, the most popular and mature network protocols should have high priority in network control system design. In [1], the stability of the network control system is not discussed, but this requirement is very important for a control system in engineering. If the stability of an application server control system cannot be guaranteed, the workload of the server will keep jumping between overload and idleness; obviously, these are not the expected working states. In this paper, we focus on how to obtain feedback from network devices with a general network management protocol, and on the stability analysis of a network control system with both a traditional control method and intelligent control.
2 With SNMP to Get Performance Status Information In a closed-loop control system, the controller should get feedback from the controlled application servers or software processes, and adjust control parameters to make the servers work in the expected states. As discussed in Section 1, the ways to get feedback from controlled hosts are abundant, such as log files, dedicated bytes in packets, and the SNMP Management Information Base (MIB). Choosing the most appropriate method is the first problem of network control system design. Network management has been studied for nearly twenty years and the mechanisms of collecting status information with MIBs are mature. So far, SNMP MIBs have been accepted by most industries and users in the world. Up to the present, the number of MIB variables is more than 200,000, and it still keeps increasing. The managed objects cover hosts, routers, switches and large software systems. RFC 1213 is a general network management information base for all network devices; it covers the system information of a device, its interfaces, SNMP itself and so on. For application services, such as VOD or computing, CPU slots, memory or buffer resources are the bottleneck. With RFC 1514, a controller can get feedback about the host resources system group, storage group, device group, running software group, running software performance group and installed software group, which clarify the status of the server. What is more, the encoding of MIBs, Abstract Syntax Notation One (ASN.1), is so flexible that anyone and any company can extend the common MIB to satisfy their own needs. Microsoft is one of the most successful OS providers, and it has developed many private MIBs for its own products; with these private MIBs, it is possible to get more detailed information about a server with SNMP. Another reason to select SNMP is the facility it offers for implementing experiments. Measurement of quality of service (QoS) is needed in a network performance control system. SNMP agents exist in almost every network device, and nothing else is needed to get the values of MIB variables from the agents. It seems that the controller could be located anywhere on the Internet, even on the server itself, but it is limited by transmission delay and loss ratio. Experiments can be carried out quickly in real network environments and the results are convincing if SNMP is used. Furthermore, there are agents in other network devices, including switches, routers and hubs, and the control method can be easily transferred to another kind of
performance control system design on other network devices. An example of a centralized network control system is shown in Sections 4 and 5.
3 Minimum Sampling Period The minimum sampling period is a key value of a network control system because it determines both the robustness and the real-time guarantees. When the sampling period is too small, the feedback information will include redundant data; when it is too large, the control system loses its real-time features. Obtaining the minimum sampling period through network measurement is therefore helpful to network control system design. In our view, the minimum sampling period in a network control system must be larger than two particular kinds of time: the sampling period of the SNMP agent itself, and the Round Trip Time (RTT) between the controller and the controlled device. The first is easy to obtain from SNMP, while the second should be measured on the Internet. In our experiment, connections between hosts are classified into six types, namely those within the same hub (marked as 1), in a local area network (marked as 2), in a nationwide wide area network (marked as 3) and in an international area network. Due to the difference in distance between international connections, this type is divided into two sub-types: connections between China and the UK (marked as 4) and connections between China and the USA (marked as 5). In addition, the transmission delay and packet loss are measured for connections in which the agent and the controller run on the same machine; this virtual software connection is marked as 6. The measurement tool we use is the ping command of the Internet Control Message Protocol (ICMP). The ping command is executed on Linux because the measurement results are more accurate than those on Windows. Destination hosts are selected at random to make the measurement results representative: some of the destination hosts are websites of universities, and some are business or government sites. Network utilization and congestion status differ over a day or a week, which has been taken into account; the measurements were carried out at different times on different days of a week, including workdays and the weekend.
For each connection, the source host sends two hundred ICMP messages to the destination host and waits for the replies. The number of replied messages and the RTT of every session are recorded. After testing, the records are used for statistical operations, from which the packet loss ratio, average RTT, minimum RTT and maximum
RTT can be calculated accurately. Because there are about one hundred instances of each type of connection, the statistical results are accumulated again and the average RTT and maximum RTT values are obtained in the end. Finally, the measurement and statistics results of the six types of connections are listed in Table 1. From Table 1, we can see that communication in a LAN is highly reliable and possesses good real-time features. But when it comes to a WAN, the loss ratio is larger than zero and the delay cannot be ignored. The maximum RTT of every type of connection can be regarded as the RTT value in the worst case of the network. It is well known that SNMP is an application protocol, so the RTT between SNMP processes is certainly larger than the RTT at the ICMP layer. Hence we conclude that the minimum sampling period in a network control system should be larger than the maximum RTT; otherwise the nth sampling action may not have finished when the (n+1)th sampling is executed. According to the analysis of the measurement results, the minimum sampling period is suggested to be no less than the values listed in the rightmost column of Table 1. If the waiting time exceeds the sampling interval, the controller considers the feedback lost and samples again. On this occasion, some questions of ambiguity arise; for example, if the (n+1)th sampling command is executed and a feedback arrives very soon, whether the feedback is the nth or the (n+1)th result must be considered.
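This rule can be written directly as a small helper; the 20% safety margin is our illustrative choice, not a value from the measurements.

def min_sampling_period(agent_period_s, measured_rtts_s, margin=1.2):
    # the controller period must dominate both the SNMP agent's own
    # sampling interval and the worst-case RTT of the connection type
    worst_rtt = max(measured_rtts_s)
    return max(agent_period_s, worst_rtt * margin)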
4 An Admission Control Example The system environment of the admission control system is shown in Fig. 1. The controlled object is a genetic algorithm computing server with dual CPUs. Windows 2000 runs on the server, and the SNMP agent is a component of the operating system. The server computes NP-hard problems for users on campus. The server tries its best to compute, but the network manager wants its CPU utilization ratio to stay between 60% and 70% over the long run to avoid damaging the CPUs. Experiments show that every computing request occupies CPU resources for about ten seconds.
Fig. 1. Admission control system on Genetic Algorithm Server
In terms of these requirements, the MIB variable "hrProcessorLoad" defined in RFC 1514 is selected as the feedback. The controller runs on another PC in our lab and users send computing requests from the Internet. The agent samples the "hrProcessorLoad" variable once a minute, and the sampling period of the admission control system is also one minute, larger than the minimum sampling period suggested in Section 3. The controller decides how many requests can be concurrently computed on the server and sends this value to the application server with a private control protocol.
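As a hedged sketch of the feedback path, the snippet below polls hrProcessorLoad with the pysnmp library; the host address, community string and processor index are placeholders, and the paper's own controller used a private protocol whose code is not shown.

from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

def read_cpu_load(host, community="public", cpu_index=1):
    # hrProcessorLoad: average CPU load (percent) over the last minute,
    # from the Host Resources MIB (RFC 1514); the MIB module must be
    # available to pysnmp for name resolution
    err_indication, err_status, _, var_binds = next(getCmd(
        SnmpEngine(),
        CommunityData(community, mpModel=0),               # SNMPv1
        UdpTransportTarget((host, 161), timeout=2, retries=1),
        ContextData(),
        ObjectType(ObjectIdentity("HOST-RESOURCES-MIB",
                                  "hrProcessorLoad", cpu_index))))
    if err_indication or err_status:
        raise RuntimeError(str(err_indication or err_status))
    return int(var_binds[0][1])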
Fig. 2. Input requests
Fig. 3. Server status without controller
To simplify the control system, we measure the pattern of input requests and describe it in Fig. 2. The period is two minutes: in the first half period there are ten requests per minute, and in the second half there are five. The average input rate is 7.5 requests per minute and exceeds the service's computing capacity. Fig. 3 shows the undesirable server behavior without admission control: the server stays so busy for so long that the CPU can easily be destroyed by heat. Moreover, the server is always fully occupied, making it difficult to respond to high-priority computing requests in real time.
5 Design of Controllers and Testing
Notations in the admission control system are listed below:
m(n): the maximum number of parallel threads in the nth interval.
f(n): the feedback information of the server in the nth interval.
(reference value): the expected CPU utilization ratio of the genetic server.
r(n): the number of requests in the nth interval.
5.1 Proportional Controller Proportional control is a classical method. With proportional control, the transfer functions of the controlled plant should be described first. In Fig. 1, there are two input variables of the genetic server: m(n), which is the output of the controller, and r(n), which can be regarded as noise. The CPU utilization rate is the only output variable, so this is a multiple-input, single-output controlled object. As shown in Fig. 3, the server works in an unexpected state after two minutes without the controller; this hints that m(n) is most likely more important than r(n) for the CPU utilization rate. To simplify the design, the transfer function of the genetic server can be described in terms of m(n) and f(n) only. Unfortunately, we do not have direct measurements relating f(n) and m(n), because the relationship is not explicit to compute. When a computing request comes, the genetic server decides whether to accept it or not. If it is accepted, the server creates a thread and the number of parallel computing threads increases by one. However, it is not easy to tell how many requests are accepted and finished; for example, a request accepted in the nth interval may be processed in the (n+1)th interval. Even if m(n) is only one, the trouble still exists.
Therefore, the only way is to derive the relationship between f(n) and m(n) from history records of m(n) and f(n). A linear server model is obtained with a least-squares fit. The control system can then be regarded as in Fig. 1 and described formally by equations (1) and (2): a linear difference equation for the server and a proportional control law for the controller, with fitted parameters a = 0.5 and b = 0.2 in equation (2). Taking z-transforms of (1) and (2), substituting one into the other and eliminating the intermediate variable yields the closed-loop transfer function, whose characteristic polynomial (8) is the characteristic equation of the closed-loop system given by (1) and (2). If all roots of the characteristic equation lie inside the unit disc, the system is stable. The characteristic polynomial has roots at z = 0 together with roots that depend on the controller gain k, and two conditions on a, b and k must be satisfied for stability; k is thus the variable that decides the stability of the admission control system. In the experiment, k was set to -6 and to -3, and the control effects are shown in Fig. 4 respectively: the result for k = -6 is unacceptable, while that for k = -3 is satisfactory. Substituting a, b and k into the characteristic polynomial explains this. When k = -3, all roots lie inside the unit disc and the control system is stable in the sense of control theory; when k = -6, a root falls outside the unit disc and the control system is not stable. Hence traditional control-theoretic analysis is very important to network control system design.
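To illustrate the stability analysis numerically, the sketch below assumes one concrete closed-loop form consistent with a = 0.5, b = 0.2 and the reported outcomes (k = -3 stable, k = -6 unstable): a server model f(n) = a*f(n-1) + b*m(n-2), i.e., one extra interval of actuation delay, under the proportional law m(n) = k*(f(n) - f_ref). The model form is our assumption, not taken from the original equations.

import numpy as np

a, b = 0.5, 0.2  # fitted server parameters from the text

def closed_loop_roots(k):
    # characteristic polynomial of the assumed loop: z^2 - a*z - b*k = 0
    return np.roots([1.0, -a, -b * k])

for k in (-3, -6):
    roots = closed_loop_roots(k)
    stable = all(abs(r) < 1 for r in roots)
    print(f"k = {k}: |roots| = {[round(abs(r), 3) for r in roots]}, stable = {stable}")
# k = -3: |roots| = sqrt(0.6) < 1 (stable); k = -6: |roots| = sqrt(1.2) > 1 (unstable)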
Fig. 4. Control effect of the P controller
Fig. 5. Control effect of the intelligent controller
5.2 Intelligent Controller During the proportional controller design, the transfer functions had to be described first. In this experiment, we obtained them from a statistical fit, which is usually not an easy job. What is more, networks are so large and complex that it is a great burden to realize network control with traditional control methods, because either the controllers or the controlled objects must be described precisely with equations. Intelligent control methods, such as artificial neural networks and fuzzy control, have been developed over the last twenty years and utilized in network control systems [5], [6]. The especially attractive feature of an intelligent control system is that it is not necessary to form transfer functions of the plants: experience can be used directly and the plants are treated as black boxes. To avoid data-fitting operations and to try an intelligent control method in a network control system, a production-rule controller is implemented; the rules are given below.
The control effects are illustrated in Fig. 5. We have tried some other rules about this controller and unfortunately they failed.
The above rules are an unsuccessful example, whose control effect is also illustrated in Fig. 5. Comparing the two sets of rules, it is easy to see that the former is flexible and adaptive while the latter is static. In particular, there are too many experience values in the second set of rules, and these are hard to obtain. The different experimental results are therefore reasonable. From this intelligent controller design, we conclude that an intelligent controller can be very effective and drive the system to the desired point quickly if appropriate experience is available; otherwise, the design is a daunting task.
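As a purely hypothetical illustration of a production-rule controller in the same spirit (the paper's actual rule sets are given in its figures and are not reproduced here), one adaptive rule set might look like this; the band limits follow the 60-70% requirement, while the step sizes are our assumption.

def rule_controller(m_prev, cpu_load, low=60, high=70):
    # hypothetical production rules, not the paper's actual rule sets:
    # nudge the thread cap to keep CPU utilization inside the [low, high] band
    if cpu_load > high:
        return max(1, m_prev - 1)   # overloaded: admit fewer parallel requests
    if cpu_load < low:
        return m_prev + 1           # under-utilized: admit more
    return m_prev                   # inside the band: hold steady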
6 Conclusions The issue of how to use SNMP to collect server performance data and realize network control is discussed in this paper. The admission control experiments show that the approach is efficient and easy to realize in engineering. With this network control model, a manager can take actions to guarantee QoS in real time. To this end, a proportional controller and an intelligent controller are designed and tested in the experiment, and both achieve good control effects. Either traditional or modern control methods can thus be applied to the network control model. With a traditional control method, transfer functions must be built first, and they are necessary for analyzing the stability of the control system. In a network control system, the transfer function between the input and output variables is not explicit, so the least-squares fit method is used in the proportional controller design and is verified to be helpful to control system design; of course, this job is not easy in engineering. Although an intelligent controller can achieve efficient effects in network control design, its stability analysis is not mature. How to derive the transfer functions of servers and analyze the stability of intelligent control systems should be studied further. Acknowledgements. This research has been partially supported by the National Natural Science Foundation of China under Grant No. 60174043, by the Key Project of the Natural Science Foundation of Hubei Province in China under Grant No. 2002AB025, and by the Natural Science Foundation of Central China Normal University under Grant No. 500502.
References
1. Voigt, T., Gunningberg, P.: Handling Multiple Bottlenecks in Web Servers Using Adaptive Inbound Controls. Seventh International Workshop on Protocols for High-Speed Networks, Berlin, Germany, April 2002
2. Hellerstein, J., Parekh, S.: An Introduction to Control Theory and Its Application to Computer Science. SIGMETRICS 2001/Performance 2001
3. Mundur, P., Simon, R., Sood, A.: Integrated Admission Control in Hierarchical Video-on-Demand Systems. In: Proceedings of the IEEE International Conference on Multimedia Computing and Systems (ICMCS '99), Florence, Italy, June 7-11, 1999, pp. 220-225
4. Yu, Y., Liu, Q., Tan, L., et al.: Automated Network Management with SNMP and Control Theory. Proceedings of DCABES 2002, Wuxi, PR China, 2002, pp. 260-264
5. Aweya, J., Montuno, D.Y., Zhang, Q., Orozco-Barbosa, L.: Multi-step Neural Predictive Techniques for Congestion Control - Part 1: Prediction and Control Models. International Journal of Parallel and Distributed Systems and Networks, Vol. 3, No. 1, 2000, pp. 1-8
6. Hu, R.Q., Petr, D.W.: A Predictive Self-Tuning Fuzzy-Logic Feedback Rate Controller. IEEE/ACM Transactions on Networking, Vol. 8, No. 6, December 2000
Appcast – A Low Stress and High Stretch Overlay Protocol

V. Radha 1, Ved P Gulati 1, and Arun K Pujari 2

1 Institute for Development and Research in Banking Technology, Hyderabad, India
{vradha,vpgulati}@idrbt.ac.in
2 University of Hyderabad, Hyderabad, India
[email protected]
Abstract. With IP Multicast not gaining wide acceptance, researchers have turned to alternative multicast mechanisms such as application level multicast. Application level multicast protocols arrange the participating hosts into an overlay topology, maintain it, and distribute data packets over that topology. In this paper, we propose a new application level multicast protocol, describe a few well-known protocols, simulate the protocols and compare the results. Keywords: Networks; Network Protocols; Network Algorithms; Broadcast; Multicast; Distributed Computing
1 Introduction
At present, multicast applications are implemented as multiple point-to-point applications. An application logically requiring multicast must send individually addressed packets to each recipient. This has two drawbacks: 1. the source should know the addresses of all recipients, and 2. transmitting multiple copies of the same packet results in inefficient usage of the sender's resources and network bandwidth. Unicast is completely impractical due to its redundant use of link bandwidth when thousands of receivers have to receive the same data. The benefits of multicast in terms of bandwidth efficiency are quite often outweighed by the control complexity associated with group setup and maintenance. The goal of a multicast or broadcast mechanism is to eliminate redundant packet replication in a network when a group of computers participates in a communication.
Fig. 1. a: Unicast; b: IP Multicast; c: Application Level Multicast
From the figures, it is clear that in Unicast the links nearer to the source experience redundant packet movement: on links S-R1 and R1-R2, the same packets move three times. In IP Multicast, no link experiences redundant packet movement. In application level multicast, the links nearer to the end-hosts experience redundant packet movement: on links R2-A1 and R3-A2, packets move twice. Though application level multicast cannot perform the way IP multicast can, it still achieves better results than unicast. The contributions of this paper include: a new application level multicast protocol, Appcast; a simulation of the protocols; and a comparison of the protocols' performance. Section 2 details the new protocol, Appcast; Section 3 describes some well-known protocols; Section 4 presents the simulation and performance results.
2 Application Level Multicast Protocols
Application level multicast protocols follow two steps to achieve multicast capability: (1) arrange the receivers into an overlay network of unicast connections, and (2) construct efficient data distribution trees over this overlay network to distribute data packets. At the heart of application level multicast protocols is the overlay topology they create. The topology is created as members/hosts join the multicast group. Application level multicast topology-building algorithms define a definite relationship among the participating members and thereby create topologies like tree, mesh, hierarchy etc. The relationship can be parent-child, host-neighbors, cluster member - cluster leader, etc. Application level multicast protocols differ in how they arrange the hosts into the overlay, manage it and distribute the data over it. Protocols like Bayeux[11] and Scribe[14] are motivated by peer-to-peer networks and arrange the hosts into the overlay by logically assigning a unique number to each host, irrespective of their proximity relations in the real underlying network topology. Similarly, CAN[17] and DTProtocol[10] arrange the hosts by assigning each host a place in a geometric space. The performance of these protocols is heavily taxed by their lack of awareness of the underlying topology.
2.1 Appcast Overlay Topology Appcast creates an overlay topology that is common to all multicast groups. The overlay is a tree topology built with a centralized algorithm; Appcast depends on topology information to create the tree. One of the nodes, preferably one owned by the network service provider, acts as the root node of the multicast tree. Unlike other application layer multicast protocols, in which every node also acts as a routing node, we clearly demarcate the functionalities of the nodes. In our scheme, we have two kinds of nodes: 1. end-hosts and 2. proxies. Proxies are the ones that can actually route the application data, whereas the end-hosts are the ones
which can be either source or destination. In other words, in the Appcast tree, proxies can have children, but end-hosts cannot. In all the proposed topology creation algorithms, we consider proxy nodes only. End-host information need not propagate throughout the topology and is local to each proxy.
2.1.1 Overlay Topology Creation Any new proxy that has to join the overlay contacts the root first. The root determines to which proxy this new member should get hooked, based on the distance metric (number of hops). The root selects a proxy p as the parent of the new member m if the following conditions are met:
1. D(R,m) > D(p,m): the distance between the root and the new member is greater than the distance between the proxy and the new member.
2. Among all proxies that satisfy condition 1 (there can be more than one), choose the one nearest to the new member.
3. Path(Parent(p),m) contains Path(p,m): the path between the proxy and the member is a prefix of the path between the proxy's parent and the member. The path need not be the shortest path.
In Fig. 2.a, Fig. 2.b and Fig. 2.c, S is the source, R1, R2, R3, ... are routers, and P1, P2, ... are proxies joining the overlay in that order. A node joining can cause the following cases:
1. Simple direct join: the new proxy Pn gets a parent P that is very near to it, and introducing this proxy into the overlay does not affect the rest of the tree structure. Figure 2a depicts this scenario: the algorithm selects S as the parent of P1, and when P2 also selects S as its parent, the overlay relation of S and P1 is not affected.
2. The new proxy selects a parent P such that it lies on the path between P's parent and P. In this case, in Fig. 2b, the algorithm first selects P1 as the parent of P3 as per condition 1, but to meet condition 3 it has to reorder the parent-child relationship, i.e., P3 becomes the parent of P1 and a child of R.
3. The new proxy becomes parent to some of the children proxies of P. In Fig. 2c, P4 selects S as its parent, and since P4 is the child of S and lies on the path between S and P3, P4 takes over as parent of P3.
Fig. 2.a
Fig. 2.b
Fig. 2.c
Fig. 2.a1
Fig. 2.b1
Fig. 2.c1
2.1.2 Appcast Topology Creation Algorithms Any new proxy joining the group sends a join message to the root. The root invokes the function "FindNearestProxy", which returns the proxy that is closest to the node. The root then calls "FindRelations" to fix the relationship between the new joining proxy and the others in the overlay tree. The well-known Dijkstra's algorithm finds the shortest paths from a source to any/all destinations (vertices) in a graph. Dijkstra's algorithm keeps two sets of vertices: 1. the set of vertices that must be part of the path from source to destination, and 2. the remaining vertices that can be part of the path. The algorithm terminates once the required destination joins the first set, in case it has to find the path between a source and a destination, or once the second set becomes empty, in case it has to find paths between the source and all destinations. To find the nearest proxy, we keep one more set of vertices: the set of all proxies. We change the algorithm such that it terminates once it reaches any vertex that belongs to this set. The complexity of Dijkstra's algorithm is based on two operations: 1. find the minimum, with complexity O(N), and 2. change the label, with complexity O(m). In addition, in our algorithm we have to check whether the selected node with minimum index belongs to the set of proxies, which has a complexity of O(k). These three steps are performed N times, so the total complexity is O(N(N + m + k)). There are many ways in which Dijkstra's algorithm has been implemented using data structures like binary heaps to reduce the complexity and achieve better performance; these can be applied to this modified Dijkstra's algorithm as well.
2.1.3 Appcast Optimization In Appcast, a proxy joining the multicast group selects the very first proxy that it comes across while finding the path from itself to the root. This approach definitely ensures that the path length from the joining proxy to the parent proxy is smaller than the path length from the joining proxy to the root. However, if we take into consideration the actual path length from the root to this new proxy (along the proxies), the path length is higher. The performance results clearly showed that Appcast uses very few overall links/hops; at the same time, it also showed the maximum application-level path lengths and maximum stretch. To keep the stretch and stress at an optimum level, the Appcast_opt algorithm is proposed, in which a joining proxy can specify how many children (stress) it can accept and how much stretch (delay) it can bear.
2.1.4 Control and Data Paths The root keeps information about all proxies. Every proxy keeps information about its parent and its children. Also, every proxy keeps track (heart beat) of its children and
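A sketch of the modified search (the function name follows the text's FindNearestProxy; the graph representation is our assumption):

import heapq

def find_nearest_proxy(graph, source, proxies):
    # Dijkstra from `source` that terminates as soon as a settled vertex
    # belongs to the proxy set; graph: {vertex: [(neighbor, weight), ...]}
    dist = {source: 0}
    settled = set()
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in settled:
            continue
        settled.add(v)
        if v in proxies and v != source:
            return v, d           # early termination: nearest proxy found
        for w, weight in graph.get(v, []):
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return None, float("inf")    # no proxy reachable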
parent. If any proxy goes down, its children immediately contact the root and try to hook to the parent of the downed proxy. It is the root that tells the children about their new parent, keeping all constraints satisfied. Whenever a new proxy joins or leaves, a few other proxies will also be informed by the root to change their relationships so that the constraints remain satisfied. In Appcast, data can flow bottom-up and top-down across the Appcast topology tree. To avoid loops, each node checks from where it received the data and forwards it to selected children and the parent accordingly. Whenever a proxy receives a packet, it checks from whom it was received. If it was received from its parent, it forwards the packet to all its children. If it was received from one of its children (i.e., not the parent), it forwards the packet to its own parent and to all its children except the child from whom it was received, as in Algorithm 5 (MulticastForward) given in Table 1.
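The forwarding rule reduces to a few lines; the sketch below assumes each proxy object knows its parent and children, and send() stands in for the actual transport call.

def send(target, packet):
    # placeholder transport call
    print("forward", packet, "->", target)

def multicast_forward(node, packet, received_from):
    # packets from the parent go to all children; packets from a child go
    # to the parent and to every other child, which prevents loops
    if received_from is node.parent:
        targets = list(node.children)
    else:
        targets = [c for c in node.children if c is not received_from]
        if node.parent is not None:
            targets.append(node.parent)
    for t in targets:
        send(t, packet)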
3 Other Application Layer Multicast Protocols
The general purpose of creating a topology is 1. to distribute the data packets and 2. to send control information to manage the topology. Some protocols use the same topology for both purposes, while others use separate topologies like tree and mesh. ESM [2], YOID [18], Scattercast [3] and Overcast [4] create mesh and tree topologies, with the mesh for control purposes and the tree for distribution purposes. HMTP [12] and TAG create a tree for both control and data distribution purposes. NICE arranges the hosts into a hierarchy of clusters. All these protocols take proximity metrics like RTT (round trip time), shortest path, maximum common path overlap, etc. into consideration while creating the topology. We consider HMTP, TAG and NICE for comparison with Appcast and hence describe them in this section.
3.1 Host Multicast – HMTP
HMTP [12] creates a group-specific tree topology as the multicast overlay topology. In HMTP, each multicast group requires a Host Multicast Rendezvous Point that acts as a contact point for new members to join the group. HMTP clusters nearby members together. Members choose a parent close to them by using the following procedure (sketched in code after this list):
1. The new member sets the root as the potential parent (PP) and contacts the PP.
2. Query the PP to discover all its children and measure the member's nearness to the PP and the PP's children.
3. Find the nearest member among the PP and the PP's children, except those marked as invalid. If all of them are marked as invalid, pop the top element from the stack, set it as the PP and return to step 2.
4. If the nearest member is not the current PP, push the current PP onto the stack, set the nearest member as the new PP and return to step 2.
5. Otherwise send a join request to the PP. If the PP accepts it as a child, it becomes a child of the PP and a unicast path is established; if rejected, mark the PP as invalid and return to step 3 (the PP may not accept it as a child for many reasons, e.g., an out-degree limit).
HMTP also proposes member-leave, link-failure and improvement algorithms. In HMTP, every member keeps track of every other member that falls on the path between the member and the root, so the average control overhead for HMTP is O(max degree), i.e., the maximum number of children a node has.
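A hedged sketch of that parent-search loop (assumed helper names; the real HMTP protocol exchanges messages rather than calling methods directly):

```python
def hmtp_join(new_member, root, distance):
    """Sketch of the HMTP parent-search procedure described above.

    distance(a, b) -- measured network distance (e.g., RTT) between members
    Each member exposes .children (list) and .accepts(m) (admission test).
    Returns the parent that accepted new_member.
    """
    stack = []
    pp = root                        # step 1: root is the first potential parent
    invalid = set()
    while True:
        candidates = [m for m in [pp] + pp.children if m not in invalid]  # step 2
        if not candidates:           # step 3: all invalid, backtrack
            pp = stack.pop()         # (assumes the root itself never runs out)
            continue
        nearest = min(candidates, key=lambda m: distance(new_member, m))
        if nearest is not pp:        # step 4: descend toward the nearest member
            stack.append(pp)
            pp = nearest
            continue
        if pp.accepts(new_member):   # step 5: try to join
            pp.children.append(new_member)
            return pp
        invalid.add(pp)              # rejected (e.g., out-degree limit reached)
```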
3.2 NICE
NICE [13,15,16] claims a relatively small control overhead. Its motivation actually comes from key distribution in secure group communication. NICE arranges the set of end hosts into a hierarchy, and the hierarchy implicitly defines the data path. Each member maintains soft-state information about other hierarchically near members and has only limited knowledge about the remaining members. In NICE, all members belong to layer L0. Members are grouped into clusters with size between K and 3K-1, where K is a constant. For each cluster, one of the cluster members acts as a leader and enters the next higher layer. A member is part of layer Li only if it is a leader in every layer L0, ..., Li-1. A cluster leader has the minimum maximum distance to all members of its cluster. A host belongs to only a single cluster at any layer, and if a host is not present in layer Li, it cannot be present in any layer Lj where j > i. For a group of size N and cluster size K, there can be at most O(log_K N) layers. Each member maintains information about every other member of its own cluster in all of its layers. NICE constructs an overlay tree, based on the underlying network topology, before it clusters the group members and arranges them into a hierarchy. Next, it uses a clustering protocol to group the members into clusters of size K to 3K-1 by traversing the overlay tree bottom-up. This clustering is basically intended to reduce the depth of the tree and to keep the control overhead constant. As the cluster size increases, unicast within a cluster may increase. NICE does not give the joining member the flexibility to choose its leader. Since NICE keeps the cluster size constant, the control overhead in NICE is constant. Similarly, NICE can deliver data to the members in at most O(log N) application hops.
3.3 TAG – Topology Aware Group Communication
TAG uses information about path overlap among group members to construct the overlay tree. In TAG, each new member of the multicast group determines the path from the root to itself and finds its parent and children by partially traversing the overlay tree. TAG proposed a complete path matching algorithm, in which a new node selects as its parent the node that shares the maximum common path with it. Each TAG node maintains a Family Table with information about its parent and children. The path-matching algorithm traverses the overlay tree from the root down through the children, matching the path from the root to the new node against the path from the root to each TAG node. It considers three mutually exclusive cases. Let N be a new member wishing to join and C be the node being examined:
1. There exists a child A of C whose path is a prefix of the path of N, where the path length of N is greater than that of A, which is in turn greater than that of C. In this case N chooses node A and traverses the sub-tree rooted at A.
2. There exist children of C that have the path of N as a prefix of their own paths. In this case, N becomes a child of C, with those nodes as its children.
3. If no child of C satisfies case 1 or 2, N becomes a child of C.
As an optimization, TAG proposed a partial path matching algorithm in which, instead of matching the complete path of a new member, only a predefined number of elements in the path are matched. This helps reduce the depth of the tree.
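A hedged sketch of the complete path matching traversal described above (names such as tag_join are illustrative, and the path representation is assumed to be a list of router identifiers):

```python
def is_prefix(p, q):
    """True if path p is a proper prefix of path q (paths are router lists)."""
    return len(p) < len(q) and q[:len(p)] == p

def tag_join(root, new_path, paths):
    """Sketch of TAG's complete path matching for a joining node N.

    new_path -- router-level path from the root to the joining node
    paths[m] -- router-level path from the root to existing member m
    Returns (parent, adopted_children) for the new node.
    """
    c = root
    while True:
        # Case 1: descend into a child whose path is a prefix of N's path.
        deeper = [a for a in c.children if is_prefix(paths[a], new_path)]
        if deeper:
            c = max(deeper, key=lambda a: len(paths[a]))  # longest match first
            continue
        # Case 2: children of C whose paths extend N's path become N's children.
        adopted = [a for a in c.children if is_prefix(new_path, paths[a])]
        # Cases 2 and 3: either way, N becomes a child of C.
        return c, adopted
```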
4 Comparative Study
The evaluation criteria for multicast protocols have been defined in terms of stretch, stress and control overhead. Stretch is defined per member as the ratio of the path length from the source to the member along the overlay to the length of the direct unicast path. Stress is defined per link or node as the number of identical packets sent by the protocol over that link or node. Control overhead is defined as the extra computing required to maintain the topology. Native multicast achieves unit stress and unit stretch. Though application-level multicast protocols cannot achieve this, they try to balance the two: reducing stress balances the load at nodes but may increase stretch, while reducing stretch will increase stress, so the two are inversely related. The protocols (CAN, Bayeux, DTProtocol, etc.) that have no knowledge of the underlying topology suffer poor performance and can help only in sharing and distributing the load of the source across the members. The mesh-based protocols like ESM and Yoid suffer from control overhead and are not suitable for large groups. The tree and hierarchical topologies like HMTP, TAG and NICE are able to contain the control overhead while performing well. The following table shows the intuitive comparison metrics.
4.1 Simulation and Results
For comparison purposes we considered only TAG, NICE and HMTP, as our future work will be based on tree topologies. Figures 3b, 3c and 3d show the overlay topologies these protocols create for the network shown in Figure 3a, with routers R1, R2, R3, ..., R10 and nodes S, A1, A2, ..., A5, and with the order of joins A3, A4, A5, A1 and A2. Since TAG chooses its parent based on the longest path match over the shortest path from node to root, A3 selects A2 as its parent, though A1 is nearer to it. The order of joins matters a lot for the performance of HMTP. Since A1 joined last, it simply took the node nearest to it, i.e., A3, as its parent, without checking whether it lies on the way between S and A3. NICE groups nearby members into clusters and arranges these clusters into a hierarchy.
We used Boston University's network topology generator BRITE to simulate our experiments. BRITE generates different kinds of network topologies based on the following models: flat router-level models (Router Waxman, Router Barabási-Albert), flat AS-level models (AS Waxman, AS Barabási-Albert) and hierarchical models (Transit-Stub, Tiers). First, we generated 100 nodes in the AS model and assigned 20 hosts to these nodes. In this experiment, HMTP showed higher stress and lower stretch. TAG showed even higher stress and little or no stretch. NICE, with the cluster size fixed to 3, showed results similar to HMTP. Similar experiments were conducted on a network topology with 1000 nodes and with varying group memberships of hosts. Figures 4a, 4b, 4c and 4d show the results.
Fig. 3a. Example network topology
Fig. 3b. TAG
Fig. 3c. NICE
Fig. 3d. HMTP
Fig. 3e. Appcast
Fig. 4a.–Fig. 4d. Simulation results
HMTP used fewer hops overall, and TAG used almost as many hops as unicast. This is because a TAG node selects a parent that has the maximum overlapping shortest path with it; in other words, TAG does not look into alternative paths. This makes almost all nodes select the source itself as their parent, and very few nodes get a node other than the source as their parent. For the same reason, TAG shows application-level hops almost identical to unicast. NICE, while showing fewer overall hops than TAG, showed higher application-level hops than TAG and HMTP. This is because, within clusters, NICE uses normal unicast among the cluster members and cluster leaders. As the group size increases, application-level hops increase tremendously for NICE. Appcast is the protocol that used the fewest hops overall. However, it is also the one that used the most application-level hops, because it does not use any mechanism to control the tree depth. For this reason, an optimized version of the Appcast protocol has been proposed, in which each joining host can specify the stretch parameter, the ratio between unicast hops and application-level hops. As far as stretch is concerned, TAG showed the least stretch and Appcast the highest stretch. NICE showed the least stress and TAG the highest stress.
5 Conclusions and Future Work
The proposed application-level multicast protocols basically differ in the creation of the overlay topology and the distribution of data over it. While studying the existing protocols, it was found that mesh-based systems are complex to maintain, whereas tree-based systems give good performance and less control overhead. In both tree-based systems, i.e., TAG and HMTP, a new joining node traverses the tree from the root down through the children. While TAG relies on the shortest path, HMTP relies on the shortest distance. These features may sometimes lead to overlapping links. We proposed a new method that allows the joining node to select a parent that lies on its way to the source. On top of the proposed topology-building algorithm, we plan to use SOAP as the application-level transport mechanism and implement certain applications.
References
[1] Deering, S. and Cheriton, D.: Multicast Routing in Datagram Internetworks and Extended LANs. ACM Transactions on Computer Systems 8, 2 (May 1990)
[2] Hua Chu, Y., Rao, S., and Zhang, H.: A Case for End System Multicast. In: Proceedings of ACM Sigmetrics '00 (Santa Clara, CA, June 2000)
[3] Chawathe, Y.: Scattercast: An Architecture for Internet Broadcast Distribution as an Infrastructure Service. PhD thesis, University of California, Berkeley, Dec. 2000
[4] Jannotti, J., Gifford, D.K., and Johnson, K.L.: Overcast: Reliable Multicasting with an Overlay Network. In: Proceedings of the 4th Symposium on Operating System Design and Implementation (OSDI) (San Diego, CA, Oct. 2000), USENIX
[5] Prabhakar Raghavan: Beyond Web Search Services. IEEE Internet Computing, Mar.–Apr. 2001
[6] Peter N. Yianilos and Sumeet Sobti: The Evolving Field of Distributed Storage. IEEE Internet Computing, Sept.–Oct. 2001
[7] D. Cheriton and S. Deering: "Host Groups: A Multicast Extension for Datagram Internetworks", Data Commun. Symp., Sept. 1985, pp. 172–179
[8] A. Shaikh, M. Goyal, A. Greenberg, R. Rajan, and K.K. Ramakrishnan: An OSPF Topology Server: Design and Evaluation, 2001. http://www.cis.ohio-state.edu/mukul/research.html
[9] D. Pendarakis et al.: "ALMI: An Application Level Multicast Infrastructure," 3rd USENIX Symp. Internet Tech. and Sys., Mar. 2001
[10] J. Liebeherr, M. Nahas, and W. Si: "Application-Layer Multicast with Delaunay Triangulations," IEEE GLOBECOM '01; also tech. rep. CS-2001-26, Nov. 2001
[11] S. Zhuang et al.: "Bayeux: An Architecture for Scalable and Fault-Tolerant Wide-Area Data Dissemination," 11th Int'l. Wksp. Net. and Op. Sys. Support for Digital Audio and Video, June 2001
[12] B. Zhang, S. Jamin, and L. Zhang: "Host Multicast: A Framework for Delivering Multicast to End Users," IEEE INFOCOM '02, New York, NY, June 2002
[13] S. Banerjee, B. Bhattacharjee, and C. Kommareddy: "Scalable Application Layer Multicast," ACM SIGCOMM '02, Pittsburgh, PA, Aug. 2002
[14] M. Castro et al.: "Scribe: A Large-Scale and Decentralized Application-Level Multicast Infrastructure," IEEE JSAC, 2002
[15] S. Banerjee and B. Bhattacharjee: Analysis of the NICE Application Layer Multicast Protocol. Technical report, UMIACS-TR 2002-60 and CS-TR 4380, Department of Computer Science, University of Maryland, College Park, June 2002
[16] S. Banerjee and B. Bhattacharjee: Scalable Secure Group Communication over IP Multicast. In: Proceedings of the International Conference on Network Protocols, Nov. 2001
[17] S. Ratnasamy, M. Handley, R. Karp, and S. Shenker: Application-Level Multicast Using Content-Addressable Networks. In: Proceedings of the 3rd International Workshop on Networked Group Communication, Nov. 2001
[18] P. Francis: Yoid: Extending the Multicast Internet Architecture, 1999. White paper, http://www.aciri.org/yoid/
[19] Minseok Kwon and Sonia Fahmy: Topology-Aware Group Communication. In: NOSSDAV '02, May 12–14, 2002, Miami, Florida, USA
Communication Networks: States of the Arts
Xiaolu Zuo
Splendidsky Networkings FreeResearch, United Kingdom
[email protected]
Abstract. This paper presents the state of the art of communication networks. First, fundamentals of communication systems are presented, particularly those of data/computer communications and ISDN networks. Then, the latest developments in communication networks are highlighted, including active/programmable networks, networking for ubiquitous computing, ad hoc networking, and autonomic computing for network infrastructures. Finally, outlooks are summarized.
1 Introduction
New paradigms of networking have been constantly emerging, e.g., intelligent networks [1][2], next generation networks [3][4], active networks [5][6], etc. They require that network management enable integrated management of highly complex heterogeneous network infrastructures (wired, wireless, ad hoc, edge, GRID) and realize ubiquitous user connectivity, i.e., anywhere, any time, any device and any service content.
2 Network Fundamentals
Communication is the exchange of information between individuals or machines over a distance. A basic model of communication is depicted in Fig. 1.
Fig. 1. Generic model of communication systems
In terms of information, there are voice communication, image communication, data communication, and multimedia (voice, text, characters, symbols, images, graphics, data, etc.) communication. In terms of signals, there are analogue communication and digital communication. In terms of services, there are telegram, telephone, telefax, data communication, broadcasting, TV, navigation, tele-sensing, telemetering, remote control, video conferencing, etc. In terms of transmission media, there are tethered communication (twisted pairs, shielded twisted pairs, coaxial cables, fiber optic cables) and wireless communication (air (microwave, mobile, pager), space (satellite), and sometimes water). In terms of bandwidth, there are narrow-band communication and broadband communication. The bandwidth of a telecommunication system determines the type and amount of information that can be transmitted.
2.1 Data Communication Networks Data communication networks are usually classified by their size and complexity, e.g., Local Area Networks (LAN), Metropolitan Area Networks (MAN), Wide Area Networks (WAN). Compound topology is widely used, as depicted in Fig. 2.
Fig. 2. Network segments are interconnected via the telecommunication network
The Internet is a huge collection of computer networks at local, national and international levels, a combination of LANs, telecommunications trunks, switching facilities and public dial-up facilities. It runs across telephone lines, cables, fibre optics and satellites, offers low-cost telecommunication, and its signals are sent in real time.
2.2 Wireless Communication Networks
Wireless telecommunication facilitates access to information anywhere and anytime. Examples of wireless technologies are two-way radios, mobile telephones, cellular telephones, and satellites. Fig. 3 illustrates their mobility and data rates. Wireless LANs (WLANs) provide something wired ones cannot: mobility. This mobility, and the attendant flexibility it provides to the computer user, is what makes the wireless computing environment so attractive, e.g., on campuses and in libraries. WLANs use access points to receive and transmit radio signals to and from the user's computer or other device; the user's device has a special card that contains a small radio transmitter and receiver. The access point is hard-wired to the LAN and via that to the Internet [7].
Fig. 3. Network types of wireless communication
The emerging satellite technology gives a new perspective for universal access to the broadband infrastructure, potentially alleviating the prohibitive cost of serving every user by terrestrial digital networks. Terrestrial network infrastructure plus satellite communications could together form the global information infrastructure [8], as illustrated in Fig. 4.
2.3 Integrated Services Digital Network (ISDN)
ISDN is a high-speed, high-capacity, and high-quality multimedia communication network. ISDN is designed to provide greater numbers of digital services to telephone customers, such as digital audio, interactive information services, fax, e-mail, and digital video, as depicted in Fig. 5.
Fig. 4. Global information infrastructure
Fig. 5. Integrated Services Digital Network (ISDN)
There are two types of ISDN. The original version of ISDN is now called Narrowband ISDN (N-ISDN), which employs baseband transmission. Another version, called Broadband ISDN (B-ISDN), uses broadband transmission, which supports higher transmission rates. Asynchronous transfer mode (ATM) can handle data transmission in both connection-oriented and packet schemes. B-ISDN is an ATM-based multi-service digital network, which can not only support high transmission rates but also allow different applications or multimedia streams to be transmitted simultaneously in an integrated manner. The main characteristics of B-ISDN include the capability to provide many types of services (so far offered on different networks) and a high multimedia content of the services.
3 Some Latest Developments in Communication Networks
3.1 Active/Programmable Networks
Active/programmable networks allow their users to add customized programs into the nodes of the network. For instance, packets could be replaced with program fragments that are executed at each network router/switch they traverse [9]. Active/programmable architectures, as depicted in Fig. 6, permit a massive increase in the sophistication of the computation that is performed within the network. They will enable new applications, especially those based on application-specific multicast, information fusion, and other services that leverage network-based computation and storage. Furthermore, they will accelerate the pace of innovation by decoupling network services from the underlying hardware and allowing new services to be loaded into the infrastructure on demand. An active/programmable network enables users to customize live networks by providing programs/points with data [5][6]. It will be fully self-customizable, will facilitate operator system integration by replacing and dynamically upgrading proprietary router software, and will facilitate end-to-end service creation. As a result, users can directly program switches and other devices within the network so as to meet their requirements. At the same time, the network should continue to be robust and resilient to technology faults, human error and malicious attacks. Next generation networks will be active/programmable as a minimum.
Fig. 6. Architecture for active/programmable networks
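As a toy illustration of the capsule idea mentioned above (purely a hedged sketch; the class and method names are invented for illustration and are not taken from any active-network platform), a packet can carry a small program that every router executes before forwarding:

```python
class ActivePacket:
    """A packet that carries both a payload and a tiny per-hop program."""
    def __init__(self, payload, program):
        self.payload = payload
        self.program = program        # callable: (packet, router) -> next hop

def route(packet, ingress_router):
    """Execute the packet's program at every traversed router."""
    node = ingress_router
    while node is not None:
        node = packet.program(packet, node)   # in-network computation step

# Example per-hop program: count hops while following default routes.
def count_hops(packet, router):
    packet.payload["hops"] = packet.payload.get("hops", 0) + 1
    return router.default_next_hop            # assumed attribute; None at egress
```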
3.2 Networks for Ubiquitous Computing
Networks have to support applications in ubiquitous computing environments, such as home appliances, building access control, hand-held mobile access, personal working environments, car–office–home connectivity, etc. With ubiquitous computing, people can work with full access to communication, data, and computing from any location at any time. Two scenarios are depicted in Figs. 7 and 8.
Fig. 7. Built-in and external access networking systems for home
Fig. 8. On-board and external access networking systems for car
In ubiquitous computing environments, computers will be embedded in our natural movements and interactions with our environments — both physical and social. Ubiquitous computing will help organize and mediate social interactions wherever and whenever these situations might occur. The idea of such an environment emerged more than a decade ago in Weiser’s seminal article and its evolution has recently been accelerated by improved wireless telecommunication capabilities, open networks, continued increases in computing power, improved battery technology, and the emergence of flexible software architectures [10] [11] [12] [13].
3.3 Ad Hoc Wireless Networking
Ad hoc wireless networking supports rapid and temporary, yet reliable, network formation and configuration for collaborative computing and collaborative work, meetings, etc. An ad hoc wireless network is created on demand in order to enable communication between mobile hosts equipped with wireless devices. Before the creation of the ad hoc wireless network, each mobile host has no information about other hosts or links. The network has no centralised manager, and its topology changes dynamically with the movement of mobile hosts.
3.4 Autonomic Computing for Network Infrastructures Autonomic computing systems are self-managing systems which can perform management activities based on situations they observe or sense in the IT environment. Such computing systems have the ability to manage themselves and dynamically adapt to changes in accordance with business policies and objectives [14] [15]. In an autonomic environment, system components -- from hardware such as desktop computers and mainframes to software such as operating systems and business applications -- are self-configuring, self-healing, self-optimizing and self-protecting.
Self-configuring means adapting automatically to dynamically changing environments. Self-healing means discovering, diagnosing and reacting to disruptions. Self-optimizing means monitoring and tuning resources automatically. Self-protecting means anticipating, detecting, identifying and protecting against attacks from anywhere. Network infrastructures will become increasingly huge in scale, heterogeneous in composition, active in physical components, and programmable in service provision. All of these make manual management of the IT infrastructure impossible; it must be able to manage itself.
4 Outlooks
The 21st century may become the "Broadband Age" or, even better, the "Service Convergence Age". Today, broadband sources such as fiber optics, satellite and cable modems provide very high speed access to information and media of all types via the Internet, creating an "always-on" environment. The result is a widespread convergence of entertainment, telephony and computerized information: data, voice and video, delivered to a rapidly evolving array of Internet appliances, Personal Digital Assistants, wireless devices and desktop computers. Ubiquitous access to information, anywhere and anytime, will characterize whole new kinds of information systems in the 21st century. These are being enabled by rapidly emerging wireless communications systems. The needed expertise encompasses, e.g., network management, integration of wireless and wireline networks, system support for mobility, computing system architectures for wireless nodes/base stations/servers, user interfaces appropriate for small handheld portable devices, and new applications that can exploit mobility and location information. In the future, the host may see the network as a message-passing system, or as memory. At the same time, the network may use classic packets, wavelength division, or space division switching. Future network protocols will need to provide a secure connection, independent of the underlying networks, for applications to use.
References
1. R. Brennan, B. Jennings, C. McArdle, and T. Curran: Evolutionary trends in intelligent networks. IEEE Communications Magazine (2000) 86–93
2. M. Finkelstein, J. Garrahan, D. Shrader, and G. Weber: The future of the intelligent networks. IEEE Communications Magazine (2000) 86–93
3. A. R. Modarressi and S. Mohan: Control and management in next-generation networks: challenges and opportunities. IEEE Communications Magazine (2000) 94–102
4. A. Leon-Garcia and L. G. Mason: Virtual network resource management for next-generation networks. IEEE Communications Magazine (2003) 102–109
5. K. L. Calvert, S. Bhattacharjee, E. Zegura, and J. Sterbenz: Directions in active networks. IEEE Communications Magazine (1998) 72–78
6. A. T. Campbell, H. G. De Meer, M. E. Kounavis, K. Miki, J. B. Vicente, and D. Villela: A survey of programmable networks. ACM SIGCOMM Computer Communication Review, 29 (1999) 7–23
7. K. Asatani and Y. Maeda: Access network architectural issues for future telecommunication networks. IEEE Communications Magazine (1998) 110–114
8. C-K. Toh and V. O. K. Li: Satellite ATM network architectures: an overview. IEEE Network (1998) 61–71
9. D. L. Tennenhouse and D. J. Wetherall: Towards an active network architecture. http://www.tns.lcs.mit.edu (1997)
10. K. Lyytinen and Y. Yoo: Issues and challenges in ubiquitous computing. Communications of the ACM, 45 (2002) 63–65
11. G. E. Burnett and J. M. Porter: Ubiquitous computing within cars: designing controls for non-visual use. International Journal of Human-Computer Studies, 55 (2001) 521–531
12. G. B. Davis: Anytime/anyplace computing and the future of knowledge work. Communications of the ACM, 45 (2002) 67–73
13. W. Drew Jr: Wireless networks: new meaning to ubiquitous computing. The Journal of Academic Librarianship, 29 (2003) 102–106
14. H. Tianfield: Multi-agent based autonomic architecture for network management. Proceedings of the IEEE International Conference on Industrial Informatics (INDIN '03), Banff, Alberta, Canada, 21–24 August 2003
15. A. G. Ganek and T. A. Corbi: The dawning of the autonomic computing era. IBM Systems Journal, 42 (2003) 5–18
DHCS: A Case of Knowledge Share in Cooperative Computing Environment
Shui Yu, LeYun Pan, FuTai Zou, and Fan Yuan Ma
Department of Computer Science and Engineering, Shanghai Jiaotong University, Shanghai 200030, China
{merlin,fyma}@sjtu.edu.cn
Abstract. Large-scale hypertext categorization has become one of the key techniques in web-based information acquisition. How to implement efficient hypertext categorization is still an ongoing research issue. This paper introduces the Distributed Hypertext Categorization System (DHCS), in which the Directed Acyclic Graph Support Vector Machines (DAGSVM) method for learning multi-class hypertext classifiers is incorporated into a cooperative computing environment. Knowledge share among the local learning machines is achieved by utilizing both the special features of the DAG learning architecture and the advantages of support vector machines. The key problems encountered in the design and implementation of DHCS are also described, with solutions to these problems.
1 Introduction
Over the years, computer scientists have primarily studied the knowledge discovery process as a single-user activity. For example, the research on automatic text categorization (ATC) has provided us with sophisticated techniques for supporting the information filtering process, but mostly in the context of a single, isolated user's interaction with an information base. Recently, a number of case studies have examined the cooperative nature of information search activities. The case study reported in [1][2] provides insight into the forms of cooperation that can take place during a search process. However, most researchers have studied in depth the kinds of collaboration that can occur in either the physical or the digital library [2][3][4]. With the rapid change of the World Wide Web, cooperative web mining plays a crucial role in web information acquisition. As a typical application of web information retrieval, hypertext categorization suffers from the large scale of the unlabeled web page base. Since building text classifiers by hand is difficult and time consuming, it is desirable to learn classifiers from examples. Apparently it is necessary to extend state-of-the-art machine learning techniques to the cooperative learning environment so as to solve the problem of distributed web information retrieval. Another motive for this extension is that the local learning machines always look forward to knowledge share within one community.
The main aim of this paper is to discuss some important issues of how to implement efficient hypertext categorization in the distributed and cooperative learning environment. The rest of the paper is organized as follows. Section 2 introduces the state-of-the-art machine learning technique and hypertext categorization. Section 3 explains the knowledge share and cooperative learning in DHCS. Section 4 discusses some implementation issues. Section 5 describes some experimental results. And in section 6, we present the conclusions and give some ideas of the future work.
2 Machine Learning and Hypertext Categorization
2.1 Support Vector Machines
Kernel-based learning methods (KMs) are a state-of-the-art class of learning algorithms, whose best-known example is Support Vector Machines (SVMs). The SVM method was introduced into automated text categorization (ATC) by Joachims [5][6] and has subsequently been used extensively by many other researchers in the information retrieval community. It has been shown to yield good generalization performance in both text classification problems and hypertext categorization tasks. So far, SVM is the best choice for constructing hypertext categorization systems [5][7]. The original primal optimization problem describes the principle of SVMs:

    min_{w,b,ξ}  (1/2)||w||² + C Σᵢ ξᵢ
    subject to  yᵢ(w·φ(xᵢ) + b) ≥ 1 − ξᵢ,  ξᵢ ≥ 0,  i = 1, ..., N    (1)
Instead of solving the above optimization problem directly, one can derive the following dual program:

    max_α  Σᵢ αᵢ − (1/2) Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ K(xᵢ, xⱼ)
    subject to  Σᵢ αᵢ yᵢ = 0,  0 ≤ αᵢ ≤ C    (4)

where K(xᵢ, xⱼ) = φ(xᵢ)·φ(xⱼ) is the kernel function.
It is obvious that the SVM learning algorithm needs to solve a numerical quadratic programming problem. By adopting decomposition techniques such as SMO [8][9], one can solve the SVM problem iteratively, and the computation time can scale as ~N^1.7 (N is the total number of training samples) in the best case. However, it is still complicated when dealing with large-scale multi-class categorization problems.
2.2 Multi-class SVM and DDAG Learning Architecture
SVM was originally designed for binary classification. How to effectively extend it to multi-class classification is still an ongoing research issue. Several methods have been proposed, in which typically several binary SVMs are combined to construct the multi-class SVM. There are three main methods: one-against-one, one-against-all and DAGSVM. It has been pointed out in [10] that DAGSVM is very suitable for practical use. Previous experiments have shown that DAGSVM yields accuracy and memory usage comparable to the other two algorithms, but yields substantial improvements in both training and evaluation time [10][11].
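A hedged sketch of how a Decision DAG of one-against-one classifiers reaches a decision (the paper does not list this procedure explicitly; binary_svm is an assumed pairwise decision function):

```python
def ddag_classify(x, classes, binary_svm):
    """Evaluate a Decision DAG of pairwise SVMs on sample x.

    classes    -- list of the k candidate class labels
    binary_svm -- binary_svm(a, b, x) returns a or b, the winner of the
                  one-against-one classifier trained on classes a and b
    Uses k-1 pairwise evaluations, eliminating one class per step.
    """
    remaining = list(classes)
    while len(remaining) > 1:
        a, b = remaining[0], remaining[-1]
        winner = binary_svm(a, b, x)
        # Eliminate the losing class, exactly as a DDAG node does.
        if winner == a:
            remaining.pop()       # b can no longer win
        else:
            remaining.pop(0)      # a can no longer win
    return remaining[0]
```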
2.3 Challenges of Large-Scale Hypertext Categorization Tasks
So far, all the issues we have discussed are under the assumption that the whole computation process is executed on one computer. Nevertheless, does DAGSVM work well when handling real large-scale hypertext categorization tasks? How can we handle thousands of HTML files, even millions of them? How can we handle too many categories? How can we update the decision rules efficiently? Unfortunately, most practical categorization systems are isolated and unable to deal with these problems. Driven by the idea of implementing hypertext categorization in the cooperative computing environment, we propose the distributed system DHCS as an interesting and practical case of knowledge share in the cooperative computing context.
2.4 Distributed Hypertext Categorization System
We find that the binary SVM nodes in DAGSVM are very similar to the real nodes in computer networks. Thus we may divide the training workload of the whole DAGSVM into several separate groups (each group contains one or more binary SVM nodes). Fig. 1 describes the basic idea:
Fig. 1. Allocate the DAGSVM nodes to physical computer nodes
In the distributed hypertext categorization system, there are several key problems to be solved, including: How do we divide the DAGSVM nodes and allocate them to computer nodes? How do the computer nodes communicate with each other? How do we share categorization knowledge among the computer nodes? We will discuss these problems in the later sections.
3 Knowledge Share and Cooperative Learning
3.1 Information and Knowledge in DHCS
According to the structure and the learning algorithm of DAGSVM, we can explain the concepts of information and knowledge in DHCS. First, the divided DAGSVM node groups need their training samples. When one computer node has labeled some samples that do not belong to its categories, it should send them to the other computers. We may notice that the exchange of labeled samples among the computer nodes is very similar to the regular information exchange in traditional peer-to-peer computer networks. But it is more meaningful that we can implement knowledge share in DHCS. Since each computer node has learned some "knowledge" after it finishes the training of its own DAGSVM nodes, other computer nodes can share its learning results. Once all computer nodes get enough "knowledge" from the others in the cooperative environment, they can each assemble the whole DAGSVM. That is difficult in other cooperative learning systems but is comparatively easy in DHCS because of the special structure and features of DAGSVM. We will discuss this in detail in the following sections.
3.2 Knowledge Share in DHCS Before we go any further, it is necessary to take a look at the decision rules in SVM. After we solve the optimization problems in (1) and (4), we get the optimal then we have:
According to the Karush-Kuhn-Tucker (KKT) condition, we have:
In (4),
is a support vector if the corresponding
And the decision rule is:
Combining (10) with (7) and (8), we see that only the support vectors can affect the decision function. Researchers have found that the proportion of support vectors in the training set can be very small (usually 2%–5% in text categorization tasks [8][12]) via proper choice of the SVM hyper-parameters. In fact, other computer nodes can restore the categorization rules from the support vectors and their coefficients alone. Thus, a computer node can transfer its categorization knowledge to other computer nodes in the form of support vectors and the corresponding coefficients. Fig. 2 shows the knowledge share in the context of DAGSVM:
Fig. 2. Knowledge share in DHCS
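A minimal sketch of this exchange (an assumed structure; the paper does not give code): a node exports only the support vectors and coefficients of a trained binary node, and a peer rebuilds the decision function (10) from them.

```python
import math

def export_knowledge(alphas, labels, samples, b, eps=1e-8):
    """Keep only the support vectors (alpha > 0) and their coefficients."""
    packed = [(a * y, x) for a, y, x in zip(alphas, labels, samples) if a > eps]
    return {"coef_sv": packed, "bias": b}

def rbf_kernel(u, v, gamma=0.1):
    return math.exp(-gamma * sum((ui - vi) ** 2 for ui, vi in zip(u, v)))

def decide(knowledge, x, kernel=rbf_kernel):
    """Rebuild f(x) = sgn(sum_i alpha_i y_i K(x_i, x) + b) on the peer node."""
    s = sum(c * kernel(sv, x) for c, sv in knowledge["coef_sv"])
    return 1 if s + knowledge["bias"] >= 0 else -1
```

With sparse vectors, only the non-zero terms of each support vector need to be transmitted, which is why the communication cost stays low.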
In VSM-style hypertext categorization, sparse matrix techniques can be applied so that the communication cost is no longer an important issue. For example, Joachims [12] has pointed out that an average document in the WebKB collection is 277 words long and contains 130 distinct terms, while the whole collection leads to 38,359 features. Apparently, combining VSM and sparse matrix techniques can benefit DHCS greatly. Thus the local knowledge of the categorization can easily be transferred to other computer nodes so that all nodes obtain the global knowledge of the hypertext categorization.
4 Some Implementation Issues of DHCS
4.1 Allocation of the Computation Load to Computer Nodes
DHCS is implemented in a LAN environment, so we can ignore the cost of communication. All computers in DHCS communicate with each other via simple broadcast; we have found that this simple strategy works very well in DHCS. The next step is mapping DAGSVM nodes to the computer nodes. We focus on classifying web pages in CERNET (China Education & Research Network). We define twelve top categories, which requires constructing 12(12−1)/2 = 66 binary DAGSVM nodes. In our DHCS, the four computers are assigned 15, 17, 17 and 17 DAGSVM nodes respectively. (Obviously, DHCS can easily be extended to the P2P computing environment.)
4.2 Information Exchange in DHCS
To implement dynamic and incremental learning in DHCS, the four computers need to exchange information periodically. Here we refer to information, not yet knowledge, because the computers broadcast the labeled samples periodically. Users can label hypertext files independently while they surf the Internet. In our demonstrative DHCS, the system runs on four personal computers, and the users belong to one research group; that is, they have the same research background and trust each other's information and knowledge.
4.3 SVM Training Algorithm
One key issue of SVM is the training algorithm. Iterative training algorithms are very suitable for solving the SVM optimization problem. We implemented the modified SMO algorithm [9] in DHCS, which has proved efficient. Meanwhile, the optimal hyper-parameters can be found by minimizing the generalization error. Based on leave-one-out cross-validation estimation, one can derive performance estimators for finding the best hyper-parameters. In DHCS, we developed an efficient algorithm for tuning the hyper-parameters of DAGSVM [13].
Fig. 3. Experimental results of DHCS
5 Experimental Results
To evaluate the performance of DHCS, we ran the system on four daily-use computers in our laboratory. Considering the vast, dynamic web, we stop a computer node once it reaches a set point (we name it the "satisfying point"), that is, once the accuracy is over 50%. (In fact, with enough time and enough effort, higher accuracy is definitely reachable.) Every user's judgment of the navigated web pages serves as expert validation of the other computers' decision rules. Fig. 3 shows how our DHCS ran over 12 days. Although this is a simple experiment, we can still see that DHCS is stable and effective.
6 Conclusions and Future Work
Hypertext categorization plays an increasingly important role in web information acquisition systems. It provides fundamental application interfaces for web information retrieval and web mining, and also benefits other research fields such as e-mail filtering and web users' relevance feedback. To avoid expensive manual labeling, a cooperative learning method is a must for distributed web page categorization systems. In this paper, we have introduced a distributed hypertext classification system, which implements DAGSVM in the cooperative learning environment. With little communication cost, knowledge share is achieved at the same time. Experimental results have shown that the proposed DHCS works well in a laboratory LAN. Nevertheless, there are still some problems to be explored. For example, how about running DHCS in P2P networks? In a heterogeneous P2P environment, is it still acceptable to ignore the communication cost in DHCS? How does DHCS handle hierarchical categorization tasks? We will explore these aspects in future work.
Acknowledgements. This work was supported by the Science & Technology Committee of Shanghai Municipality Key Research Project Grant 02DJ14045 and Key Technologies R&D Project Grant 03DZ15027.
References
1. O'Day, V., Jeffries, R.: Orienteering in an Information Landscape: How Information Seekers Get From Here to There. In: Proc. INTERCHI 93, 1993, pp. 438–445.
2. Twidale, M.B., Nichols, D.M., Smith, G., Trevor, J.: Supporting Collaborative Learning During Information Searching. In: Proceedings of CSCL 95, 1995, pp. 367–374.
3. Hertzum, M., Pejtersen, A.M.: The Information-Seeking Practices of Engineers: Searching for Documents as well as for People. In: Information Processing & Management, 36, 2000, 761–778.
4. Fidel, R., Bruce, H., Pejtersen, A.M., Dumais, S., Grudin, J., Poltrock, S.: Collaborative Information Retrieval. In: L. Höglund, ed. The New Review of Information Behavior Research: Studies of Information Seeking in Context. London & Los Angeles: Graham Taylor.
5. Joachims, T.: Text Categorization with Support Vector Machines: Learning with Many Relevant Features. In: Proceedings of ECML-98. Berlin: Springer, 1998, 137–142.
6. Joachims, T.: Transductive Inference for Text Classification Using Support Vector Machines. In: Proceedings of ICML-99. US: Morgan Kaufmann Publishers, 1999, 200–209.
7. Dumais, S.T., Platt, J., Heckerman, D., Sahami, M.: Inductive Learning Algorithms and Representations for Text Categorization. In: Proceedings of CIKM 98, 1998, pp. 148–155.
8. Platt, J.: Fast Training of Support Vector Machines Using Sequential Minimal Optimization. In: Advances in Kernel Methods – Support Vector Learning. Cambridge, MA: MIT Press, 1998, 185–208.
9. Keerthi, S.S., Shevade, S.K., Bhattacharyya, C., Murthy, K.R.K.: Improvements to Platt's SMO Algorithm for SVM Classifier Design. In: Neural Computation, 2001, Vol. 13, pp. 637–649.
10. Chih-Wei Hsu, Chih-Jen Lin: A Comparison of Methods for Multiclass Support Vector Machines. In: IEEE Transactions on Neural Networks, 2002, Vol. 13, No. 2, 415–425.
11. Platt, J.C., Cristianini, N., Shawe-Taylor, J.: Large Margin DAGs for Multiclass Classification. In: NIPS 2000. Cambridge, MA: MIT Press, 2000, 547–553.
12. Thorsten Joachims: Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms. Norwell, MA, USA: Kluwer Academic Publishers, 2002.
13. Shui Yu, Liang Zhang, Fanyuan Ma: Design and Implementation of a Large-scale Multi-class Text Classifier. Submitted to Journal of Harbin Institute of Technology, 2003.
Improving the Performance of Equalization in Communication Systems
Wanlei Zhou¹, Hua Ye¹, and Lin Ye²
¹ School of Information Technology, Deakin University, 221 Burwood HWY, Burwood, VIC 3125, Australia
{wanlei, hye}@deakin.edu.au
² School of Adults Education, Harbin Institute of Technology, Harbin City, P.R. China
[email protected]
Abstract. In this paper, research on exploring the potential of several popular equalization techniques while overcoming their disadvantages has been conducted. First, an extensive literature survey on equalization was carried out, with the focus placed on several popular linear equalization algorithms such as the conventional least-mean-square (LMS) algorithm, the recursive least-squares (RLS) algorithm, the filtered-X LMS algorithm, and their development. For analysing the performance of the filtered-X LMS algorithm, a heuristic method based on linear time-invariant operator theory is provided to analyse the robust performance of the filtered-X structure. It indicates that the extra filter can enhance the stability margin of the corresponding non-filtered-X structure. To overcome the slow convergence problem while keeping the simplicity of the LMS-based algorithms, an optimal initialization is proposed.
1 Introduction
The least-mean-square (LMS) based adaptive algorithms have been successfully applied in many communication equalization practices. The importance of the LMS algorithm is largely due to two unique attributes [1]:
– Simplicity of implementation
– Model-independent and therefore robust performance
The main limitation of the LMS algorithm is its relatively slow rate of convergence. Two principal factors affect the convergence behaviour of the LMS algorithm: the step-size parameter and the eigenvalues of the correlation matrix R of the tap-input vector. The recursive least-squares (RLS) algorithm is derived as a natural extension of the method of least squares. The derivation is based on a lemma in matrix algebra known as the matrix inversion lemma [2][3].
The fundamental difference between the RLS algorithm and the LMS algorithm can be stated as follows: the step-size parameter μ in the LMS algorithm is replaced in the RLS algorithm by Φ⁻¹(n), that is, the inverse of the correlation matrix of the input vector U(n). This modification has a profound impact on the convergence behaviour of the RLS algorithm in a stationary environment, as summarized here [4]–[10]:
1. The rate of convergence of the RLS algorithm is typically an order of magnitude faster than that of the LMS algorithm.
2. The rate of convergence of the RLS algorithm is invariant to the eigenvalue spread (i.e., condition number) of the ensemble-averaged correlation matrix R of the input vector U(n).
3. The excess mean-squared error of the RLS algorithm converges to zero as the number of iterations, n, approaches infinity.
The computational load of the conventional RLS algorithm is prohibitive in real-time applications. The RLS algorithm is characterized by a fast rate of convergence that is relatively insensitive to the eigenvalue spread of the underlying correlation matrix of the input data, and a negligible misadjustment, although its computational complexity is increased [11]–[14].
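To make the two update rules concrete, here is a hedged NumPy sketch of one adaptation step of each algorithm (standard textbook forms, not code from the paper; lam is the RLS forgetting factor):

```python
import numpy as np

def lms_step(w, u, d, mu):
    """One LMS update: w <- w + mu * u * e, with error e = d - w^T u."""
    e = d - w @ u
    return w + mu * u * e, e

def rls_step(w, P, u, d, lam=0.99):
    """One RLS update; P estimates the inverse input correlation matrix,
    playing the role the fixed step size mu plays in LMS."""
    k = (P @ u) / (lam + u @ P @ u)     # gain vector
    e = d - w @ u                        # a priori error
    w = w + k * e
    P = (P - np.outer(k, u @ P)) / lam   # matrix-inversion-lemma update
    return w, P, e
```

The extra matrix work in rls_step is exactly the computational price the text refers to: O(M²) per step for M taps, versus O(M) for LMS.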
2 Experiment and Results
We present the experimental results of three adaptive equalization algorithms: the least-mean-square (LMS) algorithm, the discrete cosine transform least-mean-square (DCT-LMS) algorithm, and the recursive least-squares (RLS) algorithm. From the experiments, we observed that the convergence rate of LMS is slow; the convergence rate of RLS is much faster, while its computational price is expensive; and on these two criteria the performance of DCT-LMS lies between the previous two algorithms, but is still not good enough. Therefore we will propose an algorithm in a coming paper to solve these problems.
It is well known that high data rate transmission through dispersive communication channels is limited by inter-symbol interference (ISI). Equalization is an effective way to reduce the effects of ISI by cancelling the channel distortion. However, the dynamic, random and time-varying characteristics of communication channels make this task very challenging. The high speed of data transmission demands a low computational burden. Hence, simplicity and robust performance play a crucial role in equalizer design. Due to their good robust performance and computational simplicity, least-mean-square (LMS) based algorithms have received wide attention and been adopted in most applications [2], but one major disadvantage of the LMS algorithm is its very slow convergence rate, especially in the high-condition-number case. To solve this problem, a variety of improved algorithms have been proposed in the literature. Although their actual implementations and properties may differ, the underlying principle remains the same: trying to orthogonalize as much as possible the input autocorrelation matrix and to follow a steepest-descent path on the transformed error function.
Therefore, we extend the least-squares algorithm to a recursive algorithm for the design of an adaptive transversal filter. An important feature of the RLS algorithm is that it utilizes information contained in the input data extending back to the instant of time when the algorithm is initiated. The resulting rate of convergence is therefore typically an order of magnitude faster than that of the simple LMS algorithm. This improvement in performance, however, is achieved at the expense of a large increase in computational complexity. The RLS algorithm implements recursively an exact least-squares solution [10]. At each time step, RLS estimates the autocorrelation matrix of the inputs and the cross-correlation between inputs and desired outputs based on all past data, and updates the weight vector using the so-called matrix inversion lemma.
The DFT/LMS and DCT/LMS algorithms are composed of three simple stages [5]. First, the tap-delayed inputs are preprocessed by a discrete Fourier or cosine transform. The transformed signals are then normalized by the square root of their power. The resulting equal-power signals are input to an adaptive linear combiner whose weights are adjusted using the LMS algorithm. With these two algorithms, the orthogonalizing step is data-independent; only the power normalization step is data-dependent. Because of the simplicity of their components, these algorithms retain the robustness and low computational cost of LMS while improving its convergence speed.
Although the structure of the filtered-X LMS adaptive equalization scheme is a little different from that of the basic LMS adaptive equalization scheme, the control adjustment process is the same: adjusting the FIR model of the equalizer to minimize the least mean square error. Therefore, the optimal solution is actually the limit of the best solution for the filtered-X LMS adaptive equalization algorithm. As in the case of the basic LMS adaptive equalization scheme, the filtered-X LMS adaptive equalization scheme cannot, in general, achieve this limit. However, its optimal solution is still expected to be close to that point if the adaptive step size is small. Therefore, the optimal initialization method proposed still applies here. Why not simply use the optimal model matching filter as the final equalizer? This is because of the presence of model uncertainty and other unexpected disturbances. The optimal solution obtained in offline computation may not be optimal when the filter is implemented in the real-world system. For the optimal initialization, a poorly identified system model may give rise to a low-quality model matching solution. However, due to the robustness of the filtered-X LMS adaptive equalization scheme, this solution may still be well within the convergence region. Through extensive simulations and experiments, it is observed that the method proposed here can also cope with a wide eigenvalue spread of the input without having to use the Discrete Cosine Transform (DCT) that was conventionally required. This is an advantage in a real-time operation environment where the computation burden is a critical factor.
The focus of this work is on improving the equalization performance of the popular LMS and RLS adaptive algorithms while minimizing the increase in the related computational complexity. Since these algorithms are very popular in real-world applications, the attempt is significant. Channel equalization is an effective signal processing technique that compensates for channel-induced signal impairment and the resultant inter-symbol interference (ISI) in communication systems. Although many sophisticated techniques have been proposed for equalization, most successful real-world applications are still dominated by techniques related to several popular algorithms, such as the adaptive LMS algorithm, the filtered-X LMS algorithm and the RLS algorithm. For high-speed commercial communication systems, simplicity, robustness and a fast convergence rate are the critical criteria for the design of a good equalizer. The adaptive LMS algorithm, the filtered-X LMS algorithm, and the RLS algorithm each meet some of these criteria. Unfortunately, none of them alone satisfies all of them. Therefore, research on exploring the potential of these techniques while overcoming their disadvantages is important and necessary, which is exactly what has been conducted in this paper.
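A hedged sketch of the three DCT/LMS stages just described (a textbook form, not the paper's implementation; scipy.fft.dct provides the fixed transform, and the power estimates use a simple exponential average):

```python
import numpy as np
from scipy.fft import dct

def dct_lms_step(w, p, taps, d, mu=0.01, beta=0.9, eps=1e-8):
    """One DCT-LMS step: transform, power-normalize, then LMS-combine.

    w    -- weights of the adaptive linear combiner (in the DCT domain)
    p    -- running power estimate of each transformed signal
    taps -- current tap-delayed input vector
    d    -- desired output sample
    """
    z = dct(taps, norm="ortho")            # stage 1: fixed orthogonalizing DCT
    p = beta * p + (1 - beta) * z**2       # track per-bin power
    zn = z / np.sqrt(p + eps)              # stage 2: power normalization
    e = d - w @ zn                         # stage 3: ordinary LMS update
    w = w + mu * zn * e
    return w, p, e
```

Because the DCT and the power normalization whiten the combiner's input, the effective eigenvalue spread seen by the LMS stage shrinks, which is where the convergence speed-up comes from.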
3 A Fast Start-Up Technique
Though the LMS algorithm does not actually converge to the least-mean-square solution that the optimal model matching solution achieves, the two are very close if the adaptive step size is small enough. Interestingly, not much effort is needed to find the filter, as the filtered-X LMS algorithm still converges so long as the estimate of the channel P(z) has less than 90° of phase shift error, even under unlimited amplitude distortion. The robust performance analysis of the LMS algorithm conducted by Hassibi et al. reveals that the sum of the squared errors is always upper bounded by the combined effects of the initial weight uncertainty and the noise. This evidence strongly supports that the optimal initialisation presented in this paper can confine the error to a low level right from the beginning and hence improve the convergence rate dramatically.
A major benefit of this approach is that it makes the adaptive process a virtual fine-tuning process if a reasonable initialization is obtained, which avoids experiencing a possibly long adaptation process in transit to the fine-tuning period. The advantage is most clearly illustrated by a high eigenvalue spread case. Extensive simulation experiments have shown that, in many cases, the adaptive process starts from an acceptable performance and does not need any remedy like the Discrete Cosine Transform (DCT) or Discrete Fourier Transform (DFT), even in cases with a very high input signal eigenvalue spread, where the conventional LMS algorithm may fail and traditionally a remedy like the DCT or DFT technique is required.
The conventional filtered-X LMS is modified and introduced for the purpose of equalization. A generic integration of the filtered-X structure, the LMS algorithm, the RLS algorithm and optimal initialization is conducted to meet all the paramount criteria of simplicity, robustness and fast convergence for the equalization of high-speed, distorted communication channels. Finally, the various techniques proposed in this paper are tested using a popular communication channel example, under both slightly non-stationary and severely non-stationary conditions. Comparisons are made with other conventional methods.
Significant performance improvement has been observed through Monte Carlo simulation. The effectiveness of the methods proposed in this paper has been verified.
Fig. 1. Learning curves of the various adaptive algorithms experiencing an abrupt increase of the channel impulse response by 35%
This experiment has verified the well-known fact that the conventional adaptive LMS algorithm can track slightly non-stationary environments, such as slowly varying parameters. Now, a more severely non-stationary situation is tested by abruptly increasing the channel impulse response coefficients by 35% of their nominal values. Fig. 1 shows the simulation result. The conventional adaptive LMS algorithm begins to diverge, while the filtered-X LMS algorithm, with or without optimal initialization, still maintains a good robust performance. The conventional RLS algorithm still has an acceptable performance, which matches the observation that when the time variation of the channel is not small, the RLS algorithm will have a tracking advantage over the LMS algorithm. The filtered-X RLS algorithm has a better robust performance. The robust performance enhancement from the introduction of the filtered-X structure is obvious and significant.
From a computational point of view, optimal initialization needs an additional effort to solve an optimal model matching or filtering problem. Since this procedure is a non-iterative solution and can be done offline, it does not increase the computational burden in online operation. The only extra online computational burden comes from the extra filter that is involved in every adaptive step. However, that structure adds only one simple algebraic convolution, which poses no serious problem in computation at all.
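A hedged sketch of the filtered-X LMS update discussed here (a textbook structure, not the paper's code; p_hat is an FIR estimate of the channel P(z), and all names are illustrative). It makes visible that the only extra online cost is the single convolution producing the filtered reference sample:

```python
import numpy as np

def fxlms_step(w, x_buf, xf_buf, p_hat, e, mu):
    """One filtered-X LMS step.

    w      -- adaptive equalizer weights
    x_buf  -- recent reference samples, newest first (len >= len(p_hat))
    xf_buf -- buffer of filtered-reference samples, newest first (len == len(w))
    e      -- current error sample measured after the channel
    The reference is first filtered through the channel estimate p_hat,
    and the gradient uses that filtered signal instead of the raw input.
    """
    xf = p_hat @ x_buf[: len(p_hat)]           # the one extra convolution tap
    xf_buf = np.concatenate(([xf], xf_buf[:-1]))
    w = w + mu * e * xf_buf                    # LMS update on filtered reference
    return w, xf_buf
```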
4 Conclusions
(1). The practical importance of the LMS algorithm stems largely from its simplicity of implementation and its robust performance; its main limitation is a relatively slow rate of convergence. The RLS algorithm is characterized by a fast rate of convergence that is relatively insensitive to the eigenvalue spread of the underlying correlation matrix of the input data, and by a negligible misadjustment; its main drawback is its computational complexity.
(2). The conventional filtered-X LMS is modified and introduced for the purpose of equalization. The well-known filtered-X LMS algorithm has found very successful applications in the field of active noise and vibration control. It inherits the elegant simplicity of the conventional LMS algorithm and is very robust. As an approach to analyzing the performance of the filtered-X LMS algorithm, a heuristic method based on linear time-invariant operator theory has been provided to analyze the robust performance of the filtered-X structure. It indicates that the extra filter can enhance the stability margin of the corresponding non-filtered-X structure. In this thesis, a generic integration of the filtered-X structure, the LMS algorithm, the RLS algorithm, and optimal initialization has been conducted to meet the paramount criteria of simplicity, robustness, and fast convergence for equalization of high-speed communication channels.
(3). To overcome the slow convergence problem while keeping the simplicity of LMS-based algorithms, an optimal initialization is proposed. Though the LMS algorithm does not converge exactly to the least-mean-square solution that the optimal model matching solution achieves, the two are very close if the adaptive step size is small enough. Interestingly, not much effort is needed to find the filter, as the filtered-X LMS algorithm still converges so long as the estimate of the channel P(z) has less than 90° of phase shift, even with unlimited amplitude distortion [21]. The robust performance analysis of the LMS algorithm conducted by Hassibi et al. reveals that the sum of the squared errors is always upper bounded by the combined effects of the initial weight uncertainty and the noise. This evidence strongly supports that the optimal initialization presented in this thesis can confine the error to a low level right from the beginning and hence improve the convergence rate dramatically. A major benefit of this approach is that, given a reasonable initialization, it makes the adaptive process a virtual fine-tuning process, avoiding a possibly long adaptation period in transit to the fine-tuning stage. The advantage is illustrated most clearly by a high eigenvalue spread case. It is well known that the conventional LMS algorithm converges very slowly, or even fails to converge regardless of how small the adaptive step size is chosen, when the input signal eigenvalue spread is high; the optimal model matching solution is independent of this eigenvalue spread and hence avoids this trouble. Moreover, this idea can be combined with other speed-up techniques, such as the Discrete Cosine Transform (DCT) and Discrete Fourier Transform (DFT), as well as with various adaptive algorithms.
Extensive simulation experiments have shown that, in many cases, the adaptive process starts from an acceptable performance, even in cases with a very high input signal eigenvalue spread. Another advantage of the approach proposed here is that it generally does not require detailed knowledge of the external signal, which is a great advantage in practice. Since many powerful tools exist for solving the filtering problem, including explicit solutions, the method proposed in this thesis is very promising.
(4). A popular communication channel example is used to test the proposed techniques, under both slightly non-stationary and severely non-stationary conditions. The level of channel distortion is deliberately raised to a level much higher than in any published result, with a system condition number as high as nearly 390. Furthermore, it is assumed that each tap weight of the channel undergoes an independent stationary stochastic process, with each parameter fluctuating around its nominal value according to a uniform probability distribution over a specified interval, in addition to a white noise disturbance of variance 0.001 at the channel output. A Monte Carlo simulation experiment of 1000 independent trials is conducted to obtain an ensemble-averaged learning curve. All adaptive algorithms show a good robust performance against the time-varying, random impulse response coefficient fluctuations specified above. The filtered-X LMS with the optimal initialization is shown to have the fastest convergence rate and the best performance.
(5). A more severe non-stationary situation was tested by abruptly increasing the channel impulse response coefficients by 35% of their nominal values. The conventional adaptive LMS algorithm begins to diverge, while the filtered-X LMS algorithm with or without optimal initialization still maintains a good robust performance. The conventional RLS algorithm has an acceptable performance, which matches the observation that when the time variation of the channel is not small, the RLS algorithm will have a tracking advantage over the LMS algorithm [1]. The filtered-X RLS algorithm has an even better robust performance. The performance improvement obtained by using the proposed techniques is significant, and hence the effectiveness of the new method has been verified.
The contributions of this paper are as follows: first, we compared the LMS algorithm with the DCT-LMS and RLS algorithms for adaptive equalization; second, we investigated how to speed up the convergence rate of LMS-based algorithms while keeping the increase in online computational burden as low as possible; third, we overcame the slow convergence problem while keeping the simplicity of the LMS-based algorithm; and finally, optimal initialization was applied to adaptive equalization for communication systems. There still exist many open problems. For instance, the analysis of the stability margin of the filtered-X LMS was conducted in a heuristic manner: can we extend it to a general case such as the discrete-time MIMO case? What about the filtered-X RLS algorithm? Can we apply these ideas to other adaptive equalization techniques, such as decision-feedback equalization? What happens if a different optimal initialization criterion is used? Another very active area of equalization is wireless communication, where the phenomenon of fast multipath fading (Rayleigh fading) is very challenging. As indicated in the
simulation, rapid and not so small channel variations can cause the conventional LMS algorithm to diverge. It will be interesting and challenging, therefore, to apply the new techniques presented here to those areas in the future.
References
1. R.D. Gitlin, J.F. Hayes, and S.B. Weinstein, Data Communication Principles, Plenum Press, New York, 1992.
2. E.A. Lee and D.G. Messerschmitt, Digital Communication, 2nd Edition, Kluwer Academic Publishers, 1994.
3. S. Haykin, Adaptive Filter Theory, 3rd Edition, Prentice Hall Information and System Sciences Series, 1996.
4. D.S. Bayard, “LTI representation of adaptive systems with tap delay-line regressors under sinusoidal excitation,” pp. 1647-1651, and “Necessary and sufficient conditions for LTI representations of adaptive systems with sinusoidal regressors,” pp. 1642-1646, Proceedings of the American Control Conference, Albuquerque, New Mexico, June 1997.
5. S.L. Gay, “A fast converging, low complexity adaptive filtering algorithm,” Proceedings of the 1993 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 1993, pp. 4-7.
6. S. Elliott and P. Nelson, “Active noise control,” IEEE Signal Processing Magazine, Oct. 1993.
7. M. Rupp and A.H. Sayed, “Robust FXLMS algorithms with improved convergence performance,” IEEE Trans. on Speech and Audio Processing, vol. 6, no. 1, Jan. 1998, pp. 78-85.
8. E.A. Wan, “Adjoint LMS: an efficient alternative to the filtered-X LMS and multiple error LMS algorithms,” IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-96), 1996, vol. 3, pp. 1842-1845.
9. M. Rupp, “Saving complexity of modified filtered-X LMS and delayed update LMS algorithms,” IEEE Trans. on Circuits and Systems-II: Analog and Digital Signal Processing, vol. 44, no. 1, Jan. 1997, pp. 57-60.
10. S.L. Gay, “A fast converging, low complexity adaptive filtering algorithm,” Proceedings of the 1993 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, USA, 1993, pp. 4-7.
11. J.M. Cioffi and T. Kailath, “Fast recursive least-squares transversal filters for adaptive filtering,” IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-32, April 1984, pp. 304-338.
12. D.T.M. Slock, “Reconciling fast RLS lattice and QR algorithms,” 1990 International Conference on Acoustics, Speech and Signal Processing, vol. 3, New York, USA, 1990, pp. 1591-1594.
13. M. Bouchard and S. Quednau, “Multichannel RLS algorithms and fast-transversal-filter algorithms for active noise control and sound reproduction systems,” IEEE Trans. on Speech and Audio Processing, vol. 8, no. 5, Sep. 2000.
14. M. Bouchard and S. Quednau, “Multichannel recursive-least-squares algorithms and fast-transversal-filter algorithms for active noise control and sound reproduction systems,” IEEE Transactions on Speech and Audio Processing, vol. 8, no. 5, September 2000.
Moving Communicational Supervisor Control System Based on Component Technology
Song Yu and Yan-Rong Jie
School of Computer, North China Electric Power University, Baoding 071003, China
[email protected]
Abstract. Based on the XYZ/E language, the moving communicational supervisor control system (MCSCS) built on component technology is introduced in this paper. The authors present the system architecture and briefly give its XYZ/E description, discuss separately the cases in which the supervision center acts as a server and as both client and server, briefly give the XYZ/E description of the data transmission program, and present the implementation of the system configuration environment. Keywords: Component; supervisor control system; XYZ/E language; client; server
1 Introduction
A component is a program body that works alone or in cooperation with other components. Once defined, it is independent of any concrete implementation language [1]. The existence of components relies on architectural techniques to a certain extent [2]: only within a suitable architecture can software be abstracted, isolated, and ultimately turned into components. Components are the minimum units of software reuse; ideally, the whole system is composed of several components connected to each other through interface definitions. CBD (Component-Based Development) looks on software architecture as an assembly blueprint [3] and on reusable software components as prefabricated building blocks; it supports systematic software reuse and is one of the effective ways to enhance software productivity and quality, reduce the side effects of developer turnover, and shorten product delivery times. As the software industry and software engineering techniques develop, software reuse [4] receives more and more attention. During the development course, the first thing to do is to define the specification (static semantics) of the component according to the requirements. Then one selects the right architecture style, creates the subcomponents and connectors, and writes out the specification of every subcomponent. After constructing the whole component structure, we can create the corresponding procedure in XYZ/E (dynamic semantics) and decompose the abstract component at the next level, until all static semantics have been converted into dynamic semantics and executable programs. CBD techniques have become a noticeable research issue. This paper describes, in XYZ/E, the component dynamic semantics of the MCSCS development course; this method has evidently enhanced software productivity and reliability.
2 XYZ System and Temporal Logic Language XYZ/E
The XYZ system [5] is a software engineering tool system founded on the temporal logic language XYZ/E; it combines temporal logic with software engineering organically, and its goal is to enhance software productivity and reliability. XYZ/E is a wide-spectrum language in which each sublanguage represents a different program mode or programming paradigm. There are three forms of control structure in XYZ/E: basic XYZ/E, which represents status transformations directly; structured XYZ/E; and production-rule XYZ/E. There are two families of temporal logic operators in XYZ/E: the future temporal logic operators $O, $U, and $W, and the past temporal logic operators, including $S and $B. The basic commands in XYZ/E are called conditional elements (CE).
The basic component of an XYZ/E program is called a unit; a unit is declared together with a WHERE clause. The structured CEs (statements) include the conditional statement, loop statement, case statement, wait statement, continue statement, select statement, and parallel statement. All these features of XYZ/E indicate that it benefits abstract description, stepwise refinement, and procedure synchronization [6,7]; it can express the real world flexibly and capture the implementation of procedures, so it is very natural and meaningful to apply XYZ/E to formalizing architecture.
3 System Design
MCSCS has a 3-layer structure: the CSC (Central Supervision Center) part, the LSC (Local Supervision Center) part, and the SU (Supervision Unit) part. CSC and LSC are composed of software, while SU is mainly hardware. The CSC mainly runs background software, which includes parameter setting and data processing, a configuration module, and basic station module management, and it also includes image terminals. The LSC includes the data transmission program, the protocol transformation service program, the historical data processor, etc. MCSCS can be viewed entirely as a client/server (C/S) system. Viewed from the TCP/IP processing procedure, the LSC acts as a server, while terminals and the supervision units below them act as clients; the CSC echoes client requests and supervises their actions, and they exchange communication requests and transfer data through the TCP/IP protocol. Viewed from the system's logical structure, the LSC acts as a server and the supervision terminals (STs) as clients: an ST poses a request and the LSC acts on the concrete request. Between the SU and the LSC, the SU is the server and the LSC the client: the LSC issues requests or command parameters to the SU, and the SU responds.
In the case where the LSC is a server, the role LscServer represents the server. There are two client roles: TerminalClient represents the terminal client role, and SuClient represents the lower-level client role. The C/S mode between LscServer and TerminalClient is specified in XYZ/E.
TerminalClient communicates with LscServer according to the TCP/IP protocol through the network. In the case where the LSC is both client and server, the LSC plays a dual role, represented by LscDouble. Terminal clients are represented by TerminalClient, and the server role of the SU is represented by SuServer. The C/S mode between LscDouble and TerminalClient is likewise specified in XYZ/E.
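Since the XYZ/E role specifications themselves are not reproduced here, the following Python sketch illustrates the same client/server exchange in conventional terms: the LSC in its server role accepting TerminalClient connections over TCP/IP and answering message-based requests. The port number, the framing, and the process() supervision logic are purely illustrative assumptions.

import socket
import threading

def process(req: bytes) -> bytes:
    # hypothetical supervision logic: acknowledge and echo the request
    return b"ACK:" + req

def handle_terminal(conn):
    # serve one TerminalClient until it disconnects
    with conn:
        while True:
            req = conn.recv(1024)          # message-mode request
            if not req:
                break
            conn.sendall(process(req))     # reply to the terminal

def lsc_server(host="0.0.0.0", port=9000):
    # LscServer role: accept TerminalClient connections over TCP/IP
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((host, port))
        srv.listen()
        while True:
            conn, _ = srv.accept()
            threading.Thread(target=handle_terminal, args=(conn,), daemon=True).start()

In the LscDouble case, the same process would additionally open client connections toward SuServer, forwarding command parameters downward and supervision data upward.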
TerminalClient communicates with LscDouble according to the TCP/IP protocol through the network by means of messages. The data transmission program is situated in the foreground LSC server; its upper side is connected with the service program, and its lower side is connected with the serial interfaces, achieving two-way communication. Because data are transmitted through the network in message mode, it is hard to avoid errors arising in the transmission process from outside interference. MCSCS demands that data be reliable and real-time, so command parameters received from terminals and data sent from the lower level to the upper level must be correct; therefore a data transmission program is needed that fulfills the processes of data analysis, checking, and repacking. The data transmission program acts as a filter, similar to an instance of the pipe-filter style [8]. The filter objects are two messages corresponding to the upgoing and down-going data (upgoing data are sent from the SU to the LSC; down-going data are sent from the LSC to the SU). The input data of the filter are taken from a buffer; after the data are read from the buffer, they are divided into packets, verified, and repacked, and finally the data are put into a queue awaiting transmission, as sketched below. The configuration environment is an auxiliary tool attached to the supervisor control software, with which users can fulfill special functions without professional programming. Because the system requires displaying the configuration environment in figure mode, some configuration tools for editing graphical interfaces are used. After the configuration environment is built, we find suitable components to insert into the architecture based on the definite requirements. If the components exactly satisfy the requirements, we only program the linking code; if the components have some distance to the requirements, we modify the components appropriately to make them satisfy the requirements; and we program corresponding code for the requirements, satisfying the interface demands of the functional requirements and the architecture.
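Returning to the data transmission filter just described, here is a minimal Python sketch of the divide/verify/repack pipeline. The fixed packet length, the XOR checksum, and the drop-on-error policy are assumptions made for illustration; the real MCSCS protocol details are not specified in the paper.

import queue

def checksum_ok(pkt: bytes) -> bool:
    # assumed format: last byte is the XOR of all preceding payload bytes
    x = 0
    for b in pkt[:-1]:
        x ^= b
    return x == pkt[-1]

def filter_stage(buffer: bytes, pkt_len: int, out_q: "queue.Queue[bytes]") -> None:
    # pipe-filter stage: divide the raw buffer into packets, verify each,
    # and repack valid packets onto the queue awaiting transmission
    for i in range(0, len(buffer) - pkt_len + 1, pkt_len):
        pkt = buffer[i:i + pkt_len]
        if checksum_ok(pkt):
            out_q.put(pkt)
        # corrupted packets are dropped here; a real system might instead
        # request retransmission of the upgoing or down-going message

The same stage serves both directions, since upgoing and down-going messages pass through the identical divide-verify-repack sequence.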
4 Conclusion
This paper applies the component/architecture-based development idea to a real supervisor control system. In the course of development, we combine black-box and white-box reuse to build the system: if a component can be used directly, we only develop the interface program; if it cannot, we make suitable modifications to fulfill white-box reuse. The XYZ/E program of MCSCS has been transformed into a C++ program through the corresponding transformation tools and has run correctly.
References
1. Pat Hall: Educational Case Study: What is the model of an ideal component? Must it be an object? Third International Workshop on Component-Based Software Engineering: Reflection on Practice, 2000, 59-62
2. Ralph E. Johnson: Components, Frameworks, Patterns. ACM Software Engineering Notes, 1997, 22(3) 10-18
3. Mei Hong: Software Component Composition Based on ADL and Middleware. Science in China (Series F), 2001, 44(2) 136-151
4. Premkumar T. Devanbu: Next Generation Software Reuse. IEEE Transactions on Software Engineering, 2000, 26(5) 423-424
5. Tang Zhi Song: Temporal Logic Programming and Software Engineering. Beijing: Scientific Publishing House, 2002(5) 40-66
6. Tang Zhi Song: Object, Meaning and Application of the XYZ System. Journal of Software, 1999, 10(4) 337-341
7. Zhang Guang Quan: Software Architecture: Concepts, Styles and Its Descriptive Language. Journal of Chongqing Teacher-Training Institute, 2000, 17(3) 1-5
8. Mary Shaw: Software Architecture: Perspectives on an Emerging Discipline. Prentice Hall (1996)
A Procedure Search Mechanism in OGSA-Based GridRPC Systems
Yue-zhuo Zhang, Yong-zhong Huang, and Xin Chen
Department of Computer Science & Technology, Information Engineering University of PLA, Zhengzhou, Henan 450002, China
[email protected]
Abstract. This paper presents a way of searching for remote procedures in OGSA-based GridRPC systems. The GGF recommends a grid-enabled remote procedure call mechanism (GridRPC) to provide a low barrier to acceptance of the grid, by providing a well-known and established programming model that allows the full use of grid resources while hiding the tremendous amount of infrastructure necessary to make grids work. In this paper, by defining a kind of Grid service called the Procedure Search Service in OGSA, we present a procedure search mechanism for discovering remote procedures in grid computing implementations based on OGSA and GridRPC.
1 Introduction
Although Grid computing is regarded as a viable next-generation computing infrastructure, the widespread adoption of the grid is still hindered by several factors [1]. One of these factors is that, for an application programmer, it is very difficult to program directly on Globus I/O. In order to provide a low barrier to acceptance for grid use, the GGF will produce a recommendation for a grid-enabled remote procedure call mechanism (GridRPC) [1]. This proposed GGF recommendation will primarily consist of an Application Programming Interface (API), and an associated programming model, that will enable simple, RPC-based use of grid computing resources. GridRPC will provide a well-known and established programming model that allows the full use of grid resources while hiding the tremendous amount of infrastructure necessary to make grids work. A draft programming model and API already exist [1]. The current GridRPC model and API presented by the GGF are a first step towards a general GridRPC capability; there are certainly a number of outstanding issues regarding widespread deployment and use, one of which is simply discovery. Currently a remote procedure is discovered by explicitly asking a well-known server for a well-known function through a name-string lookup. Establishing this function-to-server mapping is all that the user cares about, and hence the GridRPC model does not define how discovery is done. This paper discusses how to find remote procedures for an application running in a grid computing environment that is based on OGSA [2] and uses the GridRPC mechanism.
2 Related Works
Currently, GridRPC can be implemented with NetSolve or Ninf-G [1]. The current GridRPC model has four prototype implementations, none of which is built on OGSA. As we know, OGSA is called the next-generation grid architecture, and it will be widely used in building grid computing environments. In grid computing systems based on OGSA and GridRPC, the remote procedures can be seen as a kind of Grid service, and they can be registered and searched in the way that OGSA provides. In the following part of this paper, we define the remote procedure as a kind of Grid service called the Remote Procedure Service; the discovery of a remote procedure then becomes the discovery of such a Grid service instance by querying the registries. We also define a kind of Grid service called the Procedure Search Service, used specially for searching for Remote Procedure Services. A client can find the procedures it wants simply by sending its requirements to the Procedure Search Service; the Procedure Search Service then returns a number of procedures, each of which accords with the client's requirement. The client can select a valid procedure from these without querying the registry service multiple times if a procedure it found turns out not to be valid.
3 Remote Procedures Search Mechanism
3.1 Remote Procedure Service Definition
We first discuss how to define remote procedures as a kind of Grid service, which we call the Remote Procedure Service. The Remote Procedure Service consists of Grid service data and Grid service interfaces. The Grid service data part includes all the information required to specify a remote procedure, such as the name, the parameters, and the function of the procedure. The Grid service interfaces include at least three portTypes [2]: GridService, Registration, and Factory; other portTypes can also be included in the definition of the service if required.
3.2 Procedure Search Service Definition
We define the Procedure Search Service to search for remote procedures in OGSA-based GridRPC systems. A group of remote procedure GSHs is returned to an application that wants to find a procedure by sending a query to a Procedure Search Service. The Procedure Search Service's interface includes five core portTypes of OGSA (GridService, Factory, Registry, NotificationSource, and NotificationSink) and a user-defined portType, Compare.
The Procedure Search Service implements a user-defined portType called Compare. This portType helps the service compare the WS-Inspection document [3] returned by a registry with the specification document from a client stating what kind of procedure it wants to find; if the former matches the latter, the Procedure Search Service returns the GSH to the client as an answer.
3.3 Procedure Search Course in Details
There are three roles in the discovery course: an application, which we call a client; a Procedure Search Service; and the registries. The main idea of the discovery mechanism is that a client first sends its requirements for the remote procedure to a Procedure Search Service; the service then subscribes to information about remote procedures registered in a registry, and if the requirement accords with the information returned by the registry, the procedure carrying that information is the answer. The discovery course can be described in the following steps: 1. A client sends its procedure requirement to a Procedure Search Service. 2. The Procedure Search Service subscribes, as a notification sink, to the information registered in a local registry; the local registry sends the information registered in it to the Procedure Search Service periodically. 3. The Procedure Search Service compares the information returned by the registry with the requirement from the client; if the former matches the latter, it registers the corresponding Grid service's GSH in itself. The information returned by the local registry may refer to another registry; the Procedure Search Service will also register the GSH of such a registry.
Fig. 1. Sequence diagram of procedure search course
By registering these other registries, the Procedure Search Service can search in them if it cannot find valid information in the local registry; in this case we call the local registry the first registry and the others the second, the third, etc. 4. Initialize the registries registered in the Procedure Search Service as a set; take the second registry out and search it following steps 2 and 3, then the third, the fourth, etc. After each search, some new registries may be added to the set; search each registry in the set iteratively, deleting a registry from the set once it has been searched. If the set is empty, stop searching. 5. The Procedure Search Service can search the registries in depth-first or breadth-first order. 6. The search does not stop until a sufficient number of procedures have been found, or until the search time reaches a predefined threshold. The Procedure Search Service then unregisters all the registries that were registered in it; what remains registered in the Procedure Search Service are the GSHs of the Remote Procedure Services that meet the client's need. 7. The Procedure Search Service returns the GSHs registered in it to the client. If it cannot find a remote procedure that meets the client's need, it returns a fault message and asks whether the client wants to search again; if so, the steps above are repeated. Figure 1 illustrates the procedure search course, and a sketch of the iterative search follows.
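As a concrete illustration of steps 2 to 6, the following Python sketch walks the chain of registries breadth-first, collecting matching GSHs. The accessors registry.entries() and registry.sub_registries(), and the requirement.matches() predicate standing in for the Compare portType, are hypothetical stand-ins for the OGSA notification and registration machinery, not real API calls.

from collections import deque

def search_procedures(requirement, local_registry, matches_needed=5):
    # breadth-first search over registries; the local registry is searched
    # first, and registries it refers to are visited in turn
    found = []
    seen = {local_registry}
    frontier = deque([local_registry])
    while frontier and len(found) < matches_needed:
        reg = frontier.popleft()
        for info, gsh in reg.entries():            # published service data
            if requirement.matches(info):          # the Compare portType's job
                found.append(gsh)                  # register the matching GSH
        for sub in reg.sub_registries():           # registries referring to registries
            if sub not in seen:
                seen.add(sub)
                frontier.append(sub)
    return found                                   # GSHs handed back to the client

Replacing the deque with a stack would give the depth-first variant mentioned in step 5, and a wall-clock check against a threshold would implement the time-bounded stop of step 6.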
4 Discussion and Conclusions
We have discussed a procedure discovery mechanism in OGSA-based GridRPC systems. We define remote procedures as a kind of Grid service called the Remote Procedure Service, and design a Procedure Search Service as an agent for discovering remote procedures instead of discovering them by directly querying the registry service. The discovery mechanism presented above can obtain a group of procedures that accord with the client's requirement, and the client can find a valid procedure among them without querying the registry service multiple times.
References
1. Hidemoto Nakada, Satoshi Matsuoka, Keith Seymour, Jack Dongarra: GridRPC: A Remote Procedure Call API for Grid Computing (2002). http://graal.ens-lyon.fr/GridRPC/pub/APM_GridRPC_0702.pdf
2. S. Tuecke, K. Czajkowski, I. Foster, J. Frey, S. Graham, C. Kesselman, T. Maguire, T. Sandholm, P. Vanderbilt, D. Snelling: Open Grid Services Infrastructure (OGSI) Version 1.0. Global Grid Forum Draft Recommendation (6/27/2003). http://www.globus.org/research/papers/Final_OGSI_Specification_V1.0.pdf
An Improved Network Broadcasting Method Based on Gnutella Network*
Zupeng Li 1,2, Xiubin Zhao 2,3, Daoyin Huang 1, and Jianhua Huang 1
1 National Digital Switching System Engineering & Technological R&D Center
[email protected]
2 Telecommunication Engineering Institute, Airforce Engineering University
3 Northwestern Polytechnical University
No. 783, P.O. Box 1001, Zhengzhou 450002, P.R. China; Tel: 86-371-3532770; Fax: 86-371-3941700
Abstract. Peer-to-peer networking is a hot buzzword that has been sweeping through the computing industry over the past year or so. Gnutella, as one of the first operational pure P2P systems, is considered an important case study for P2P networking. By analyzing Gnutella network topology data, we can discover both the small diameter and the clustering properties characteristic of “small-world” networks. Based on this, the paper analyzes the performance of the current broadcasting algorithm and proposes an improved broadcasting method for Gnutella. By avoiding unnecessary message forwarding in the network, the new algorithm remarkably reduces the network communication load.
1 Introduction
Peer-to-peer (P2P) network technology is an emerging technology in the network research domain [1,2]. Within this domain, Gnutella [3] is considered to be the first completely decentralized peer-to-peer protocol ever created. The first client was written largely as an experiment by developers at Nullsoft, a subsidiary of AOL. Upon launch, Gnutella was swiftly labeled an “unauthorized freelance project” by AOL and removed from the Nullsoft website. The Open Source community soon continued its development, and there now exist numerous clients implementing the protocol. In this paper, the “small-world” property of the Gnutella network is first discussed. In the following section, we present an analysis of the problems of the Gnutella network. Armed with the “small-world” property of the underlying network topology, an improved network broadcasting method, the Intelligent Network Broadcasting method (INB), is proposed to avoid unnecessary message forwarding in the Gnutella network. Finally, the conclusion is given in the last part.
* This research is supported by the National High Technology Development 863 Program of China under Grant No. 2001-AA-11-1-141.
2 “Small-World” Properties
The term “small-world” originated with a famous social experiment conducted by Stanley Milgram in the late 1960s. By analyzing collected Gnutella network topology data, we can discover both the small diameter and the clustering properties characteristic of “small-world” networks.
The values for the Gnutella topology graphs are benchmarked against two widely used “small-world” models, the Watts-Strogatz model [4] and the Barabási-Albert model [5], as well as against the random graph and the 2-D torus, in Tables 1 and 2. As can be seen, all of the Gnutella topology snapshots demonstrate the “small-world” phenomenon: the characteristic path length is comparable to that of a random graph, while the clustering coefficient is an order of magnitude higher. These results clearly indicate strong “small-world” properties of the Gnutella network topology.
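The two quantities compared in those benchmarks can be computed directly; the following sketch uses the networkx library on a Watts-Strogatz graph as a stand-in for a crawled Gnutella snapshot, an assumption made purely for illustration.

import networkx as nx

def small_world_stats(G):
    # characteristic path length and clustering coefficient, the two
    # measures benchmarked against the small-world models in the text
    L = nx.average_shortest_path_length(G)
    C = nx.average_clustering(G)
    return L, C

# A topology is "small-world" when L stays close to that of a same-size
# random graph while C is an order of magnitude higher:
g = nx.connected_watts_strogatz_graph(n=1000, k=6, p=0.1, seed=1)
print(small_world_stats(g))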
3 Intelligent Network Broadcasting Method
3.1 Problems of Gnutella Network
Gnutella, as one of the first operational pure P2P systems, is considered an important case study for P2P networking. It consists of many peers, all of which are similar in functionality; there are no specialized directory servers, so peers must use the network of which they are part to locate other peers. As outlined in [6], the main problem with Gnutella is its use of broadcasts for searching (and discovering) the network. As the network grows in size, not only does the rate of messaging increase, but the traffic potentially generated by each message increases too.
3.2 Description of INB Algorithm
According to the “small-world” property, the system in use tends to form highly connected clusters of nearby machines. The important fact is that if nodes A and B are connected, along with B and C, then A is more likely to be connected to C than if the network were purely randomly connected; with more clustering, this effect is more apparent. We can exploit this to remove much of the redundancy in broadcasting. Firstly, the algorithm assumes that each node knows to whom each of its immediate neighbors is connected. This knowledge can easily be passed to neighboring nodes, either with regular refreshes or with notifications of changes. Armed with this information, some unnecessary message forwarding can be avoided, as shown in (a) of Fig. 1.
Fig. 1. Example I of intelligent network broadcasting
In this small example, node A receives a broadcast message and forwards it to B and C. With Gnutella, B and C would then forward the message to each other (and both promptly ignore the repeat), resulting in two unnecessary messages. However, with this scheme B knows that C is connected to A and so will have already received the broadcast; hence B does not forward to C, and similarly C does not forward to B. We can do slightly better still if we make the algorithm a little more complex, as we can see in (b) of Fig. 1. Implemented Gnutella-style (assuming nodes do not forward the message back to where it was received from), this would again result in two wasted messages. In this scheme, C uses the knowledge that A is connected to B and B is connected to D to infer that D will already have received the broadcast originating from A. Of course, B could infer the same, given the knowledge that C is connected to both A and D, in which case B would not forward the message to D either. To prevent this, some ordering of the nodes is needed, so that in these situations the nodes know which one must forward the message. Using the standard alphabetical ordering above, B must forward the message, since B precedes C in the ordering.
A node that has already received a message with the same UID m simply ignores it. The algorithm for the routing at each node is outlined below:
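Since the original pseudocode box is not reproduced here, the following Python sketch expresses the forwarding rule exactly as described above, for a node relaying a message it received from a sender; the data structures and the send() transport primitive are illustrative assumptions.

def inb_forward(node_id, msg_uid, sender, neighbors_of, seen, send):
    # neighbors_of[x] is the set of x's neighbors; each node is assumed to
    # know this map for itself and for its immediate neighbors, and node
    # identifiers are totally ordered (alphabetical in the example above).
    if msg_uid in seen:
        return                           # duplicate: ignore, as in Gnutella
    seen.add(msg_uid)
    for t in neighbors_of[node_id] - {sender, node_id}:
        if t in neighbors_of[sender]:
            continue                     # t hears the broadcast from the sender itself
        # every node adjacent to both the sender and t has the message and
        # could forward it to t; only the lowest-ordered one actually does
        candidates = neighbors_of[sender] & neighbors_of[t]
        if node_id == min(candidates):
            send(t, msg_uid, node_id)    # hypothetical transport call

In example (b), B and C are both candidates for forwarding to D; min() picks B, so exactly one copy of the message reaches D.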
4 Conclusions
In this paper, the “small-world” property of the Gnutella network is first discussed. According to the “small-world” property, the P2P system in use tends to form highly connected ‘clusters’ of nearby machines. Armed with the “small-world” property of the underlying network topology, we present an analysis of the problems of the Gnutella network and propose an improved network broadcasting method, the Intelligent Network Broadcasting method (INB). By avoiding unnecessary message forwarding in the network, the flow of network communication is remarkably reduced.
References
1. Geoffrey Fox: Peer-to-Peer Networks. Web Computing, Vol. 3, No. 3, pp. 75-77, May/June 2001.
2. Manoj Parameswaran, Anjana Susarla: P2P Networking: An Information-Sharing Alternative. IEEE Computing Practices, Vol. 34, No. 7, pp. 31-38, July 2001.
3. Kan, G.: Gnutella. In: Peer-to-Peer: Harnessing the Benefits of a Disruptive Technology, O'Reilly and Associates, Inc., Sebastopol, California, 2001.
4. Watts, D.J. and Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature, 393:440-442, June 1998.
5. Barabási, A. and Albert, R.: Emergence of scaling in random networks. Science, 286:509-512, October 15, 1999.
6. Eytan Adar and Bernardo A. Huberman: Free Riding on Gnutella. First Monday, Volume 5, Number 10, October 2000.
Some Conclusions on Cayley Digraphs and Their Applications to Interconnection Networks*
Wenjun Xiao 1,2 and Behrooz Parhami 3
1 Dept. of Computer Science, South China University of Technology, Guangzhou 510641, P.R. China
2 Dept. of Math., Xiamen University, Xiamen, Fujian 361005, P.R. China
[email protected]
3 Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106-9560, USA
[email protected]
1
Introduction
It is known that Cayley (di)graphs and coset graphs are excellent models for interconnection networks [1], [2], [5]. Many well-known interconnection networks are Cayley (di)graphs or coset graphs. For example, hypercube, butterfly, and cube-connected cycles networks are Cayley graphs, while de Bruijn and shuffleexchange networks are coset graphs [5]. As suggested by Heydemann [5], general theorems are lacking for Cayley digraphs and more group theory has to be exploited to find properties of Cayley digraphs. In this paper, we consider the relationships between Cayley (di)graphs and their subgraphs and coset graphs with respect to subgroups and obtain some general results on homomorphism between them. We provide several applications of these results to well-known interconnection networks. Before proceeding further, we introduce some definitions and notations related to (di)graphs, Cayley (di)graphs in particular, and interconnection networks. For more definitions and basic results on graphs and groups we refer the reader to [3], for instance, and on interconnection networks to [6], [7]. Unless noted otherwise, all graphs in this paper are directed graphs. A digraph is defined by a set V of vertices and a set E of arcs or directed edges. The set E is a subset of elements of V × V. If the subset E is symmetric, that is, implies we identify two opposite *
This work was supported by the Natural Science Foundation of China and Fujian Province.
M. Li et al. (Eds.): GCC 2003, LNCS 3033, pp. 408–412, 2004. © Springer-Verlag Berlin Heidelberg 2004
Some Conclusions on Cayley Digraphs and Their Applications
409
arcs and by the undirected edge We then obtain a graph. Let G be a (possible infinite) group and S a subset of G. The subset S is said to be a generating set for G, and the elements of S are called generators of G, if every element of G can be expressed as a finite product of their powers. We also say that G is generated by S. The Cayley digraph of the group G and the subset S, denoted by Cay(G, S), has vertices that are elements of G and arcs that are ordered pairs for If S is a generating set of G then we will say that Cay(G, S) is the Cayley digraph of G generated by S. If (1 is the identity element of G) and then Cay(G, S) is a simple graph. Assume that and are two digraphs. The mapping of to is a homomorphism from to if for any we have In particular, if is a bijection such that both and the inverse of are homomorphisms then it is called an isomorphism of to Let G be a (possible infinite) group and S a subset of G. Assume that K is a subgroup of G (denoted as Let G/K denote the set of the right cosets of K in G. The (right) coset graph of G with respect to the subgroup K and subset S, denoted by Cos(G, K, S), is the digraph with the vertex set G/K such that there exists an arc if and only if there exists and The following basic result is easily verified. Theorem 1. The mapping to Cos(G,K,S) for
2
is a homomorphism from Cay(G,S)
Main Results
The tensor product of two digraphs Γ1 and Γ2, denoted Γ1 ⊗ Γ2, is the digraph whose vertex set is V(Γ1) × V(Γ2) and whose arcs join (u1, u2) to (v1, v2) exactly when (u1, v1) is an arc of Γ1 and (u2, v2) is an arc of Γ2. Let us assume that the group G satisfies G = NK, where N is a normal subgroup of G and N ∩ K = {1}; that is, G is the semidirect product of N by K. Let S1 and S2 be the corresponding subsets of N and K, where S is a generating set of the group G. Then any element g of G can be uniquely expressed as g = ak with a ∈ N and k ∈ K. Define the mapping φ as the correspondence of g = ak to (a, k). Then it is easily verified that φ is a bijection, and we have the following result. Theorem 2. The mapping φ is a homomorphism of the digraph Cay(G, S) to the digraph Cay(N, S1) ⊗ Cay(K, S2).
Proof. Omitted. As applications of Theorem 2, we consider the following two examples. Example 1. Relating the butterfly network to the de Bruijn network. Choose N and K so that G = NK is the semidirect product of N by K. With the appropriate generating set, Cay(K, S2) is a directed cycle of order n, and we thus have a homomorphism from the butterfly network to the corresponding de Bruijn construction; in fact, it is easily shown to be an isomorphism.
Example 2. Relating the cube-connected cycles network to the shuffle-exchange network. Assume N and K as in Example 1. In this case Cay(K, S2) is a directed cycle of order n with a loop at every vertex, and in this way we obtain a homomorphism of the cube-connected cycles network to the corresponding shuffle-exchange construction. We now consider the broadcasting problem for interconnection networks. Broadcasting, a communication operation whereby a message is sent from one processor to all others, is a basic building block in the synthesis of parallel algorithms. The time to send a message from a processor to a neighboring one depends on the communication model assumed, with linear-time and constant-time models being the two main choices. We assume the constant-time model, wherein communication between adjacent processors needs one time unit. Besides communication delay, other assumptions relating to the communication mode are needed. We assume that messages are sent in store-and-forward mode, where a processor cannot use the contents of a message, or send it on to another processor, until the message has been received in its entirety. Given a connected graph G (representing an interconnection network) and a message originator v, the broadcast time b(v) of the vertex v is the minimum time required to complete broadcasting from vertex v under the model M. The broadcast time b(G) of G under M is defined as the maximum broadcast time of any vertex in G. For more details, we refer the reader to [4]. Now let G be a finite group and K a subgroup of G. Assume that Cay(G, S) and Cos(G, K, S) are defined for some generating set S of G. For a communication model M, let bM(K) be the minimum time required to complete broadcasting to the vertices of K from the identity element 1 (which is the message originator). One of our main results is Theorem 3 (proof omitted), which bounds the broadcast time of Cay(G, S) in terms of the broadcast time of Cos(G, K, S) and bM(K). As applications of Theorem 3, we revisit Examples 1 and 2. Example 1′. Consider the butterfly network and the de Bruijn network. By Example 1 and Theorem 3, an upper bound on the broadcast time of the butterfly network follows from any known upper bound for the de Bruijn network together with bM(K), which is easily derived; under the unit-time store-and-forward communication model, for instance, such bounds are readily computed. Similarly, any known lower bound for the de Bruijn network leads to a corresponding lower bound for the butterfly network. Example 2′. Consider the cube-connected cycles network and the shuffle-exchange network. By Example 2, the observations made in Example 1′ apply here as well. The methods can be extended to other communication problems and to undirected graphs.
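To make the broadcasting model concrete, the following sketch computes two classical lower bounds on the broadcast time of a vertex under the constant-time, single-port, store-and-forward model: the eccentricity of the originator, and ceil(log2 n), since the set of informed vertices can at most double in each time unit. This illustrates the model only; it is not an implementation of Theorem 3.

import math
from collections import deque

def broadcast_time_lower_bound(adj, v):
    # adj maps each vertex to an iterable of its out-neighbors; the digraph
    # is assumed to be reachable from v
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    ecc = max(dist.values())                     # eccentricity of v
    return max(ecc, math.ceil(math.log2(len(adj))))

# directed cycle on 8 vertices: broadcasting cannot beat its eccentricity
cycle = {i: [(i + 1) % 8] for i in range(8)}
print(broadcast_time_lower_bound(cycle, 0))     # prints 7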
We assume that Γ is a finite digraph (possibly with loops) having the vertex set V(Γ) and the arc set E(Γ). Let G be the automorphism group and A the adjacency matrix of the digraph Γ. Suppose that E1, ..., Et are the orbits of the group G acting on E(Γ), and for each i let Γi be the digraph with the vertex set V(Γ) and the arc set Ei, with adjacency matrix Ai. With ∅ denoting the empty set, it is easily verified that: (1) E(Γ) = E1 ∪ ... ∪ Et; (2) Ei ∩ Ej = ∅ for i ≠ j; and (3) A = A1 + ... + At. We denote the above as a factorization of Γ. Now let Γ be a Cayley digraph Cay(H, S) for a finite group H and its generating set S. Then H may be regarded as the left regular automorphism group of Γ, and each digraph Γi is a Cayley digraph per [4]; in fact, it is easily proven that each Γi is again a Cayley digraph of H on a corresponding subset of S. Thus, we obtain the following factorization theorem and associated examples. Theorem 4 states this factorization of a Cayley digraph into Cayley subdigraphs; Example 1″ applies it to the butterfly network, and Example 2″ applies it to the cube-connected cycles network.
3 Conclusion
In this paper, we have supplied general theorems on homomorphism and broadcasting between Cayley digraphs and their coset graphs, and a factorization theorem on subgraphs of Cayley digraphs. We have also shown the applications of these results to some well-known interconnection networks: the butterfly network, the de Bruijn network, the cube-connected cycles network, and the shuffle-exchange network. Many other useful directed and undirected networks can be similarly formulated and studied. Because of the generality of these theorems, we believe that they will have further applications to interconnection networks, providing an interesting area for further research. In particular, the design of scalable interconnection networks for parallel processing, offering the desirable properties of simple routing algorithms, balanced communication traffic, and resilience to node and link failures, can benefit from our results.
References
[1] Akers, S.B., Krishnamurthy, B.: A Group Theoretic Model for Symmetric Interconnection Networks. IEEE Trans. Computers, 38 (1989) 555-566
[2] Annexstein, F., Baumslag, M., Rosenberg, A.L.: Group Action Graphs and Parallel Architectures. SIAM J. Computing, 19 (1990) 544-569
[3] Biggs, N.: Algebraic Graph Theory. Cambridge University Press (1993)
[4] Fraigniaud, P., Lazard, E.: Methods and Problems of Communication in Usual Networks. Discrete Applied Mathematics, 53 (1994) 79-133
[5] Heydemann, M.: Cayley Graphs and Interconnection Networks. In: Graph Symmetry: Algebraic Methods and Applications. (1997) 167-224
[6] Leighton, F.T.: Introduction to Parallel Algorithms and Architectures: Arrays, Trees, Hypercubes. Morgan Kaufmann (1992)
[7] Parhami, B.: Introduction to Parallel Processing: Algorithms and Architectures. Plenum (1999)
Multifractal Characteristic Quantities of Network Traffic Models*
Donglin Liu and Dianxun Shuai
Department of Computer Science, East China University of Science and Technology, Shanghai 200237, P.R. China
{ldliu,shds}@ecust.edu.cn
Abstract. This paper presents research on network flow behavior that adopts some new theories and tools. Firstly, the attractors of the network flow time sequence are reconstructed. Secondly, we find and classify four kinds of network flow with burst character in a LAN and study their multifractal spectra in the reconstructed phase space. These effective micro-parameters of the network traffic can be exploited for controlling and modeling network behavior and for recognizing the characteristics of the different burst traffic models.
1 Introduction
In the last few years, much research has not only shown convincingly the presence of long range dependence (LRD) as a very prominent property of today's LAN/WAN traffic, but has also pointed out rich multifractal characteristics at small time scales ([1],[2]). The empirically observed fractal, self-similar nature of traffic exposes the deficiency of classical traffic models such as Poisson or Markov processes. In order to understand and reconstruct the internal dynamics at various levels of the network traffic system, we must consider at least these problems: What kind of information is embedded in the internal dynamics of network flow? How can we characterize such internal dynamics? How can we extract such internal dynamics from observable network flow signals? In this paper, concentrating on these problems, we investigate a characterization method for the network internal dynamics. The paper is devoted to depicting the macroscopically chaotic nature of networks through some nonlinear characteristic quantities. To this end, we reconstruct an isometrically isomorphic phase space of the network dynamical system using embedding theory, and then study the multifractal features of four types of network traffic models to detect and identify the local singularities, which accounts for a better understanding of the network traffic.
Supported by the National Key Foundational R&D Project(973) under Grant No. G1999032707, the National Natural Science Foundation of China under Grant No. 60135010 and No. 60073008, and the State Key Laboratory Foundation of Intelligence Technology and System, Tsinghua University.
The results are very instructive for more effectively exploring the essential features of massive information systems.
2 Chaos Dynamics of Network Traffic
Here we look upon network flow as a single-variable time sequence and suppose that the attractors of the dynamical network system can be acquired from the sequence. We therefore apply Takens' embedding theory [3] and the Grassberger-Procaccia phase space reconstruction method [4] to construct the orbit of the one-dimensional time sequence in a high-dimensional phase space. Multifractals are new to the network area, although they have been broadly employed in many diverse fields. Multifractals provide a structural modeling approach for WAN traffic [5] and capture the observed scaling phenomena at large as well as small timescales in an effective, compact, and parsimonious manner. In order to gather micro-parameters that represent network behavior from seemingly random network flow, we employ the multifractal spectrum defined below.
Definition 1. The q-order correlation function is defined as
C_q(r) = (1/N) Σ_{i=1..N} [ (1/N) Σ_{j=1..N} Θ(r − |X_i − X_j|) ]^{q−1},
where N denotes the number of phase points X_i in the phase space, r is a given constant, |X_i − X_j| is the distance between every two phase points, and Θ is the Heaviside function. Thus the correlation dimension of order q is
D_q = lim_{r→0} [1/(q−1)] · ln C_q(r) / ln r.
In fact, the multifractal spectrum D_q almost embodies all the fractal dimensions of fractal theory: for example, D_0 is the fractal capacity dimension, D_1 is the Rényi information dimension, and D_2 is the incidence (correlation) dimension. Furthermore, the famous Hausdorff dimension D_H can be bounded via the inequality D_q ≤ D_H ≤ D_0 for all q ≥ 2. We give two definitions, as important as D_q itself, to explore and describe the internal dynamic characterization of the network traffic, as follows.
Definition 2. Let D_q be the multifractal spectrum of the network traffic; the fractal spectrum width ΔD is defined as the difference between the largest and the smallest values of D_q over the range of q considered.
Definition 3. Let D_q be the multifractal spectrum of the network traffic, and denote by δD the change in D_q between two designated values of q; δD is called the sensitive dimension difference.
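The following NumPy sketch shows how these quantities can be estimated from a measured traffic series: a Takens delay embedding followed by a correlation-sum estimate of D_q in the spirit of Definition 1. The embedding parameters, the radii, and the brute-force all-pairs distance computation are illustrative choices, not the authors' implementation.

import numpy as np

def delay_embed(x, m, tau):
    # Takens delay embedding of a scalar series into an m-dimensional orbit
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(m)])

def generalized_dimension(X, q, radii):
    # slope of ln C_q(r) / (q - 1) against ln r, for q != 1
    N = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    logC = []
    for r in radii:
        p = (d < r).sum(axis=1) / N       # local densities (self-pairs included)
        logC.append(np.log((p ** (q - 1)).mean()) / (q - 1))
    slope, _ = np.polyfit(np.log(radii), logC, 1)
    return slope

x = np.random.default_rng(0).random(2000)     # stand-in for a packets/s sample
X = delay_embed(x, m=5, tau=2)
Dq = {q: generalized_dimension(X, q, np.logspace(-1.5, -0.3, 8)) for q in (0, 2, 5)}
spectrum_width = max(Dq.values()) - min(Dq.values())   # Definition 2, coarsely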
3 The Multifractal Characters of Network Traffic Models
The experimental environment where we performed our monitoring and analysis is an Ethernet made up of over 10 hosts. We chose this Ethernet because it is a typical small-sized LAN whose traffic is moderate and concentrated in relatively fixed working hours. We recorded the packets/s and octets/s values every day and stored them in a sample pool according to their types; samples of packets/s at different hours or of different types were then chosen as experimental objects. It has been proved that two distinct signal data sets may have the same fractal dimension [6]. In other words, it is not enough to characterize the internal dynamic structure of traffic bursts with macroscopic nonlinear scalar quantities such as the fractal dimension, the Kolmogorov entropy, and the largest Lyapunov exponent.
Fig. 1. (a) The four kinds of network traffic models. (b) The multifractal spectrum curves of the four kinds of network traffic models
So, in this section, we study the characteristics of the network traffic using multifractal theory. First we sort the network flow into four kinds of models, approximately according to their different burst characters: the smooth flow model (A), the continuous burst flow model (B), the intermittent burst flow model (C), and the single burst flow model with highest peak value (D). Fig. 1(a) shows examples of these four traffic models in a real network. We then calculate the multifractal spectra of these traffic models using the improved GP algorithm. As shown in Fig. 1(b), their incidence dimensions are quite similar, but the multifractal spectrum curves of the four traffic models are substantially different. So it is
proved that the multifractal spectrum curves are more effective characteristics for describing the network traffic dynamics than those scalar quantities, especially for burst traffic. In addition, as the iteration value q increases, the multifractal spectrum D_q of model A gradually decreases, resembling the sigmoid curve of a fractal function, but the D_q of the others changes abruptly at some iteration value q. In other words, the curve of the smooth traffic model exhibits better stability than those of the three burst traffic models. Hence it is verified that the network traffic dynamics change substantially when burst traffic arrives. In order to further analyze the characteristics of the different burst traffics, we compute several multifractal parameters of the four kinds of traffic, such as the fractal spectrum width, the sensitive dimension difference, and the capacity dimension; the fractal spectrum width and the sensitive dimension difference are calculated through Definition 2 and Definition 3, respectively. As shown in Table 1, these quantities for the smooth traffic model are all less than those of the other traffic models, and their values increase abruptly when burst traffic arrives. Moreover, the more outstanding the traffic burst is, the larger these multifractal parameters become. Thus in model D these multifractal parameter values are the largest, because it contains the highest peak value, whereas the parameter values of model B are close to those of model A, because the burst flow in model B is persistent and evenly spread. So we can draw similar conclusions: these multifractal parameters are valid characteristic quantities for distinguishing the dynamic characterization of the different burst traffic models.
4 Conclusion
In this paper, we analyze network flow time sequences collected from real traffic by applying chaos and fractal theory. By studying the multifractal spectrum curves, we obtained micro-quantities of the four kinds of network traffic models. The results verify that these multifractal parameters are effective characteristic quantities for depicting the network internal dynamics, especially for burst network traffic. Theoretically, these parameters can reflect the macroscopic nature of network flow behavior from different aspects.
References
1. Feldmann A., Gilbert A.C. et al.: Data networks as cascades: Investigating the multifractal nature of Internet WAN traffic [C]. Vancouver, Canada: Proc. of the ACM Sigcomm'98 (1998) 25-38
2. Gilbert A.C. et al.: Scaling analysis of random cascades, with applications to network traffic [J]. IEEE Trans. Inform. Theory. 3 (1999) 971-991
3. Takens F.: Detecting strange attractors in turbulence [J]. Lecture Notes in Math, 1981, (898): 366-381
4. Grassberger P., Procaccia I.: Dimensions and Entropies of Strange Attractors from a Fluctuating Dynamics Approach [J]. Physica. 1984, 13D: 34
5. V. Paxson, S. Floyd: Wide area traffic: The failure of Poisson modeling. IEEE/ACM Trans. Networking. 3 (1995) 226-244
6. Arduini F., Fioravanti S., Giusto D.D.: A multifractal-based approach to natural scene analysis [C]. NY, USA: International Conference on Acoustics, Speech and Signal Processing, IEEE. 4 (1991) 2681-2684
Network Behavior Analysis Based on a Computer Network Model*
Weili Han, Dianxun Shuai, and Yujun Liu
East China University of Science and Technology, 200237, Shanghai, P.R. China
[email protected], [email protected], [email protected]
Abstract. This paper applies a new traffic model, iterated function systems (IFS), to network traffic modelling in order to explore computer network behaviour and analyse network performance. The IFS model can generate various self-similar data flows according to a prescribed character. In the light of the experiments, we have discovered interesting phenomena: it is found that increasing the routers' processing capacity has quite a different influence on the power spectrum than enlarging the routers' buffer size. Furthermore, phase transition and a 1/f^n-type power law are found under the mixed traffic model.
1 IFS for Fractal Traffic Modeling of Computer Networks
In recent years the fractal nature of Internet traffic has been observed [1,2], and many self-similar models for network traffic (such as on/off models, FARIMA models, etc.) have been presented [3]. Here we introduce a new approach, IFS, to produce fractal network traffic.
Definition 1. The two-dimensional affine transform is defined as
w(x, y) = (a·x + b·y + e, c·x + d·y + f).
When is called a strictly contractive affine transform with coefficient vector Definition 2. Iterated function systems (IFS) consists of a set of strictly contractive affine transforms over a complete measurement space (X, d), where d is the distance function in set X. The IFS is also represented as with the coefficient vector of being As for a given networks’ traffic data, we can always construct such an IFS that its attractor will be a good approximation to a given traffic-time curve. In our simulation experiments based on the IFS, the iterations start out from a specific rectangle graph in the height h that can be used to control the magnitude of the generated traffic. To
* Supported by the 973 Project of China under Grant No. G1999032707, and the National Natural Science Foundation of China under Grants No. 60135010 and No. 60073008.
ensure the fidelity of the generated traffic with respect to the real traffic, the transform coefficients are subject to contractivity constraints and to the standard interpolation constraints, under which each w_i maps the endpoints of the whole traffic curve onto the endpoints of its own section of the curve.
It has been validated that an IFS can reconstruct network traffic at much lower cost [4]. Consequently, the IFS model is quite suitable for simulating fractal network data flows. In this paper we are concerned with the burst characteristics of network traffic. Here we adjust the coefficient vectors of the affine transforms to generate the following four traffic types: relatively stationary, continuous burst, intermittent burst, and single-peak burst.
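As a concrete illustration of the generation procedure, the sketch below runs the random-iteration (chaos game) algorithm on a contractive two-map IFS of fractal-interpolation form and bins the attractor points into a traffic-versus-time series. The coefficient values are invented for illustration only; the paper's four tuned coefficient vectors are not reproduced in this text.

import random

# Two-map IFS of fractal-interpolation form:
#   w_i(x, y) = (a_i*x + e_i,  c_i*x + d_i*y + f_i),   |d_i| < 1
MAPS = [
    # (a,   e,    c,     d,    f)
    (0.5, 0.0,  0.30, 0.40, 0.10),
    (0.5, 0.5, -0.25, 0.45, 0.55),
]

def generate_traffic(n_slots=256, n_points=200_000, h=100.0, seed=1):
    # Chaos-game rendering of the IFS attractor, binned into a
    # traffic-per-time-slot series whose magnitude is controlled by h.
    rng = random.Random(seed)
    x, y = 0.5, 0.5
    sums = [0.0] * n_slots
    counts = [0] * n_slots
    for _ in range(n_points):
        a, e, c, d, f = rng.choice(MAPS)
        x, y = a * x + e, c * x + d * y + f
        slot = min(int(x * n_slots), n_slots - 1)
        sums[slot] += y
        counts[slot] += 1
    return [h * s / c if c else 0.0 for s, c in zip(sums, counts)]

trace = generate_traffic()
print(min(trace), max(trace))

Different burst shapes are obtained by tuning the d_i (vertical contraction) and f_i (offset) entries; the stronger the vertical contraction contrast between the maps, the burstier the generated curve.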
Fig. 1. Packet loss rate under different processing capacity (1,4,8,16) and different traffic types
Fig. 2. Power spectrum of average queue length in output buffer
Fig. 3. Average delay of packets versus packet creation rate p
Fig. 4. Power spectrum of average queue length under mixed traffic model with different p
2 Simulation Experiments and Conclusions

The connective topology used in our simulation experiments consists of 10 routers and is constructed according to the statistical rule of real network connections. Each router has two buffers: an input buffer and an output buffer. At each time step, each router performs three functions: generating packets, transmitting packets and accepting packets. In what follows, UDP is adopted as the transport layer protocol. In this section we are first concerned with the differences in network behaviour under the four traffic types mentioned above. For comparison, we adjust the parameter h so that the mean traffic of every traffic type is approximately equal. As shown in Fig. 1, as the router's processing capacity increases, the packet loss rate decreases, but the degree of decrease depends greatly upon the traffic type in the network. For the single-peak burst traffic, increasing the router's processing capacity is much less effective, in terms of packet loss rate, than for the other traffic types. For the power spectra of the average queue length, as illustrated in Fig. 2, a 1/f^n-type power law is found. When the router's processing capacity is quite small, the power spectra of the queue length in the output buffer obey the 1/f^n-type power law for all four traffic types, which implies that the network is in a highly congested state. As the router's processing capacity increases, the spectrum exponent n falls, which means that the network reaches a harmonious state with improved performance. In particular, only the stationary traffic causes n to
change greatly from 3 to 1, which implies that the network shifts from a congested phase to a non-congested phase, whereas for the other traffic types n changes only slightly as the router's processing capacity increases. Hence the stationary traffic is the most sensitive to the router's processing capacity. Moreover, we also investigated the influence of the buffer size on network behaviour. Simulation experiments show that enlarging the buffer size can lower the packet loss rate, but has little influence on the power spectra of the queue length. Since the macroscopic situation of a complex system is described much better by its power spectrum, it is reasonable to conclude that the macroscopic performance of the network cannot be improved very much by simply increasing the router's buffer size. In the above experiments, all the routers adopt the same traffic model within one experiment. To simulate the variety of network applications more realistically, the following experiments apply a mixed traffic model, i.e., each node randomly generates packets under one of the traffic models, so that different routers exhibit different burst features at the same time. In order to simulate the randomness of user behaviour, we introduce the packet creation rate p: at each time step, a user sends out packets with probability p. As can be seen from Fig. 3, a phase transition occurs at the point p = 0.2. As for the power spectrum of the average buffer length, shown in Fig. 4, a 1/f^n-type power law is found. The spectrum exponent n increases with the growth of p, but it saturates at a value of about 3 when p is beyond a certain value. In conclusion, the simulation results show that the router's processing capacity has more influence than the buffer size on improving the macroscopic performance of the network. Comparisons of the behaviour under the four traffic types show that stationary traffic is the most sensitive to the router's processing capacity. A phase transition and a 1/f^n-type power law exist in the network under the mixed traffic model, and the spectrum exponent of the average queue length increases to a saturation value with the growth of the users' packet creation rate.
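The experiment can be summarised in a few lines of code. The sketch below is our stripped-down reading of the router model, not the authors' simulator: the topology is reduced to a uniformly random next hop, a packet leaves the network with a fixed delivery probability, and all parameter values are our own.

import random

def simulate(p=0.2, capacity=4, buf_size=64, n_routers=10,
             steps=2000, deliver=0.3, seed=2):
    # Each step: every router creates a packet with probability p and
    # serves up to `capacity` packets; a served packet is delivered with
    # probability `deliver`, otherwise it is forwarded to a random
    # neighbour; buffer overflow means packet loss.
    rng = random.Random(seed)
    queues = [0] * n_routers
    created = lost = 0
    avg_queue = []
    for _ in range(steps):
        for i in range(n_routers):
            if rng.random() < p:          # packet creation rate p
                created += 1
                if queues[i] < buf_size:
                    queues[i] += 1
                else:
                    lost += 1
        for i in range(n_routers):
            nserve = min(capacity, queues[i])
            queues[i] -= nserve
            for _ in range(nserve):
                if rng.random() < deliver:
                    continue              # packet reached its destination
                j = rng.randrange(n_routers)
                if queues[j] < buf_size:
                    queues[j] += 1
                else:
                    lost += 1             # dropped at a full neighbour
        avg_queue.append(sum(queues) / n_routers)
    return avg_queue, lost / max(created, 1)

for p in (0.05, 0.2, 0.5):
    _, loss = simulate(p=p)
    print(f"p={p}: loss rate {loss:.3f}")

The returned avg_queue series is the quantity whose power spectrum the paper analyses; applying an FFT to it and fitting the log-log slope gives an estimate of the spectrum exponent n.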
References
1. Leland, W.E., et al.: On the self-similar nature of Ethernet traffic (extended version). IEEE/ACM Trans. Networking 2 (1994) 1–15
2. Crovella, M.E., Bestavros, A.: Self-similarity in World Wide Web traffic: evidence and possible causes. In: Proc. of SIGMETRICS, Philadelphia, PA (1996) 160–169
3. Willinger, W., Taqqu, M.S., Erramilli, A.: A bibliographical guide to self-similar traffic and performance modeling for modern high-speed networks. In: Stochastic Networks: Theory and Applications, Oxford University Press (1996)
4. Wu, X.J., Chai, Z.C., Shuai, D.X.: Network Flow Reconstruction and Network Behavior Research. In: Proc. Int. Conf. on System Simulation and Scientific Computing, Shanghai, China (2002) 729–733
Cutting Down Routing Overhead in Mobile Ad Hoc Networks Jidong Zhong and Shangteng Huang Department of Computer Science and Technology Shanghai Jiaotong University, 200030 Shanghai, China {zhongjidong, huang-st}@cs.sjtu.edu.cn
Abstract. An ad hoc network is a P2P network formed by a collection of wireless mobile hosts and requiring no fixed infrastructure. Many on-demand routing algorithms proposed for such networks incur considerable routing overhead during the route discovery process. This paper presents Band Zone Route Discovery, a new route discovery scheme that cuts down the routing overhead. The basic idea of the proposed scheme is to use cached routes to limit the range of route discovery. Simulation comparisons show that the new scheme can reduce the number of routing packets needed in the route discovery process.
1 Introduction

An ad hoc network is a P2P wireless network without any fixed infrastructure or centralized administration. Since nodes in the network may move at will, great effort has been devoted to designing new routing protocols [1,2,3] for this kind of network. When a mobile node attempts to send data to a destination for which it does not have a route, it initiates route discovery to search for the destination node by broadcasting a route request (RREQ) packet to its neighbors. Each node receiving an RREQ rebroadcasts it, unless it is itself the requested destination or it knows how to reach the destination. Such a node generates a route reply (RREP) to the received request, and the route reply retraces the way back to the original source. Upon receipt of the route reply, the source node sends out the waiting data packets. Many other on-demand routing protocols follow the same flooding model of route discovery as AODV [2] and DSR [1]. However, the flooding mechanism generates too many RREQ overhead packets. Therefore, much effort has been devoted to cutting down the routing overhead in ad hoc networks. Yih-Chun Hu and David B. Johnson [4] at Carnegie Mellon University evaluated the performance improvements of several caching strategies for their DSR protocol. H. Lim and C. Kim [5] proposed two new flooding schemes, self-pruning and dominant pruning, to reduce unnecessary packet rebroadcasts. This paper presents a novel way of route discovery using cached history routes. The rest of this paper is organized as follows. Section 2 details our scheme to cut down the routing overhead. In Section 3, the implementation of our new scheme is provided to show how it works with existing ad hoc routing protocols. In Section 4, we present
simulation results and their interpretation. Section 5 concludes the paper with an analysis of our new approach.
2 Band Zone Route Discovery (BZRD)

An ad hoc network can be modeled as a graph G, where mobile hosts are represented by vertices and links by edges. The vertex set V(G) is the set of all vertices of G. The distance between u and w is the length of the shortest path between them and is denoted d(u, w). For further discussion, we first introduce some new terms. The expired route to the destination cached by the source node is called a trunk route, denoted TR. The vertices on the trunk route are called trunk nodes. A TR band zone between u and w is the set of vertices {v ∈ V(G) | d(v, V(TR)) ≤ TW}, where V(TR) is the set of vertices on the trunk route TR between u and w, d(v, V(TR)) is the minimum distance from v to a vertex of V(TR), and TW is a constant defined as the trunk width of the band zone. If all trunk nodes are included in a TR band zone of width TW, it is very likely that a route to the destination can be found in such a zone. When a source node has a trunk route to the destination, it starts route discovery by broadcasting a route request with TW set to the trunk width. Upon receipt of the route request, a node decides whether it is itself the required destination or has an active route to the destination. If an active route to the destination is found, it simply returns a route reply to the source node; otherwise, it first decreases TTL (the Time To Live in the IP header of the route request) and TW by 1; if the decreased TTL and TW are both nonzero, the received route request is then forwarded (with TW updated to the trunk width if the node itself has an expired route to the destination); if instead TTL or TW equals zero, the received route request is silently discarded. Two very important parameters, TW and TTL, play a critical role in our scheme and must be carefully chosen.
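The per-node decision just described can be summarised in a few lines. The data structures below (a dict-based node, a RouteRequest record) are illustrative only, not the protocol's actual packet format.

from dataclasses import dataclass

@dataclass
class RouteRequest:
    target: str
    ttl: int   # how far the request may propagate
    tw: int    # how wide it may spread around the trunk route

def handle_rreq(node, rreq: RouteRequest) -> str:
    # `node` is a dict with illustrative keys: 'id', 'active' (set of
    # destinations with active routes) and 'expired' (dest -> trunk
    # width of the cached expired route).
    if rreq.target == node["id"] or rreq.target in node["active"]:
        return "reply"                 # generate an RREP back to the source
    rreq.ttl -= 1
    rreq.tw -= 1
    if rreq.ttl == 0 or rreq.tw == 0:
        return "discard"               # silently dropped: outside the band zone
    if rreq.target in node["expired"]:
        rreq.tw = node["expired"][rreq.target]   # refresh TW from own trunk route
    return "forward"

node = {"id": "B", "active": set(), "expired": {"D": 2}}
print(handle_rreq(node, RouteRequest(target="D", ttl=5, tw=1)))  # discard
print(handle_rreq(node, RouteRequest(target="D", ttl=5, tw=3)))  # forward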
3 Incorporating BZRD into AODV

Each route entry in the routing table is associated with a routing flag that indicates the state of the entry. In our implementation, the routing flag can be in one of the following states: RTF_UP, RTF_DOWN, RTF_EXPIRED, and RTF_IN_REPAIR. RTF_UP indicates that the route entry is active and valid for use, whereas route entries with the RTF_DOWN flag are invalid routes. Routes under local repair are marked with RTF_IN_REPAIR. RTF_EXPIRED is a state between RTF_UP and RTF_DOWN, and indicates that the route entry is inactive. An active route entry is associated with a lifetime ACTIVE_ROUTE_TIMEOUT. If an active entry expires, its routing flag is set to RTF_EXPIRED with a lifetime of the current time plus MAX_CACHE_TIMEOUT. When a route entry with the RTF_EXPIRED flag expires, it is marked RTF_DOWN. MAX_CACHE_TIMEOUT is a parameter that defines how long an expired route entry stays in the RTF_EXPIRED state. The route discovery process works in the same way as in AODV. The only difference is that if there is an inactive route entry with the RTF_EXPIRED flag for the destination, the source node starts a band zone route discovery with TW set to the
trunk width. Any node receiving the route request decreases both TTL and TW by 1. Local repair for broken routes is also implemented in our scheme.
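The route-entry lifetime described above amounts to a small state machine. The following sketch is our reconstruction; only the constant names are taken from the text, and the timeout values match the simulation setup in Section 4.

ACTIVE_ROUTE_TIMEOUT = 10.0    # seconds, as in the simulation setup
MAX_CACHE_TIMEOUT = 10.0       # how long an expired entry remains usable

class RouteEntry:
    def __init__(self, dest, now):
        self.dest = dest
        self.flag = "RTF_UP"
        self.expiry = now + ACTIVE_ROUTE_TIMEOUT

    def refresh(self, now):
        # Called whenever the route is used while still active.
        if self.flag == "RTF_UP":
            self.expiry = now + ACTIVE_ROUTE_TIMEOUT

    def tick(self, now):
        # Advance the RTF_UP -> RTF_EXPIRED -> RTF_DOWN state machine.
        if now >= self.expiry:
            if self.flag == "RTF_UP":
                self.flag = "RTF_EXPIRED"      # usable only as a trunk route
                self.expiry = now + MAX_CACHE_TIMEOUT
            elif self.flag == "RTF_EXPIRED":
                self.flag = "RTF_DOWN"         # invalid
        return self.flag

e = RouteEntry("D", now=0.0)
print(e.tick(5.0), e.tick(12.0), e.tick(25.0))   # RTF_UP RTF_EXPIRED RTF_DOWN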
Fig. 1. Drop percentage of routing overhead
Fig. 2. Comparison of average route discovery delay (TW=2)
4 Performance Comparison

We evaluate our new scheme against AODV in ns-2 [6]. Our simulation is based on a network of 50 mobile wireless nodes moving over a rectangular (1500 m × 300 m) flat space for 900 seconds of simulated time. The pause time defines how long each node remains stationary before it moves. We ran our simulation with 10 different pause times: 0, 100, 200, 300, 400, 500, 600, 700, 800 and 900 seconds. In our simulation, the maximum speed is 20 m/s. ACTIVE_ROUTE_TIMEOUT is set to 10 s, and the trunk width is set to 2 and 3 hops, with MAX_CACHE_TIMEOUT accordingly set to 10 s and 15 s. We used 40 constant bit rate (CBR) sources to originate
data packets between the source and the destination. In order to evaluate how route discovery and local repair can benefit from BZRD, we experimented with sending intervals of 5 s and 16 s, respectively. The following metrics are used to evaluate the performance of BZRD against AODV:
1. Routing overhead drop: 100% – (the total number of routing packets for BZRD / the total number of routing packets for AODV).
2. Packet delivery ratio: the rate of successfully delivered data packets.
3. Average route length: the average length of all established routes.
4. Average route discovery delay: the average time taken by the route discovery process to discover and establish routes.
Fig. 1 shows that for the sending interval of 16 s, BZRD can cut the routing overhead by about 20%–30%. For a sending interval of 5 s, we still see a 0–20% improvement over AODV, since BZRD helps limit the range of route discovery during local repair. BZRD achieves almost the same packet delivery ratio and average route length as AODV; owing to space limitations, these results are not shown here. In Fig. 2, at higher rates of mobility, BZRD behaves poorly because it is more likely to fail to find the route via the cached expired route. However, as mobility in the network drops, the average route discovery delay of BZRD converges to that of AODV, since BZRD performs better at lower rates of mobility.
5 Conclusions

This paper has presented a new way of route discovery using cached expired-route information. The new scheme utilizes history routes to help contain the flood of route requests. TTL specifies how far a route request may propagate, whereas TW limits how wide the request may spread. Simulation comparisons against AODV show that our scheme can cut down the routing overhead without degrading the overall performance.
References
1. David B. Johnson and David A. Maltz: Dynamic Source Routing in Ad Hoc Wireless Networks. In: Mobile Computing. Kluwer Academic Publishers, Boston (1996) 153–181
2. Charles Perkins, Elizabeth M. Belding-Royer and Samir R. Das: Ad Hoc On-Demand Distance Vector (AODV) Routing. Internet Draft, draft-ietf-manet-aodv-11.txt, June 2002
3. Zygmunt J. Haas and Mark R. Pearlman: The Performance of Query Control Schemes for the Zone Routing Protocol. IEEE/ACM Transactions on Networking 9(4) (2001) 427–438
4. Yih-Chun Hu and David B. Johnson: Caching Strategies in On-Demand Routing Protocols for Wireless Ad Hoc Networks. In: Proc. of ACM MobiCom 2000, Boston, MA, August 2000
5. H. Lim and C. Kim: Flooding in wireless ad hoc networks. Computer Communications 24 (2001) 353–363
6. K. Fall and K. Varadhan (eds.): ns Manual. http://www.isi.edu/nsnam/ns/ns-documentation
Improving Topology-Aware Routing Efficiency in Chord Dongfeng Chen and Shoubao Yang Computer Science Department, USTC, Hefei 230026, China
[email protected],
[email protected]
Abstract. Because they take little account of the actual network topology, existing peer-to-peer (P2P) overlay networks suffer from high latency and low efficiency. In TaChord, we study topology-aware routing approaches in P2P overlays and propose an improved design based on Chord. We evaluate TaChord and other algorithms by physical hops, interdomain-adjusted latency, and aggregate bandwidth used per message. The experimental results show that TaChord drastically improves routing performance. Finally, we show that the impact of cache management strategies in the TaChord overlay cannot be neglected.
1 Introduction

Peer-to-peer (P2P) Internet applications have recently been popularized through file-sharing applications such as Gnutella and Freenet. These systems have many interesting technical aspects such as decentralized control, self-organization and adaptation. However, systems such as Gnutella may have scaling problems. Meanwhile, several research groups have developed a new generation of scalable P2P systems that support distributed hash table functionality; among them are Tapestry, Pastry, Chord [1], and CAN. In these systems, files are associated with a key, and each node in the system is responsible for storing a certain range of keys. Chord in its original design does not consider network proximity at all. As a result, messages may travel arbitrarily long distances in the Internet in each routing hop. It also assumes that nodes in the system are uniform in resources such as network bandwidth and storage, so its routing takes neither the actual network topology nor the differences between node resources into consideration. In topology-aware Chord, a secondary overlay, layered on Chord, maintains the Chord characteristics. Topology-aware routing algorithms do not ignore the latencies of individual hops, and are not prone to producing high-latency paths [2]. As shown in [3], a super-peer network has the potential to combine the efficiency of a centralized search with the autonomy, load balancing and robustness to attacks provided by distributed search. Accordingly, we also adopt super peers in topology-aware Chord. These super peers maintain the node ID lists of their autonomous system (AS), as well as a routing cache. In this paper, we propose a P2P system based on Chord and demonstrate its potential performance benefits by simulation. The rest of this paper is organized as follows. Section 2 describes the design of the topology-aware system with super peers, and Section 3 presents preliminary simulation results. We conclude in Section 4.
2 Topology-Aware Chord Routing

In the topology-aware overlay, the routing idea is to choose routing table entries that refer to the topologically nearest among all candidate nodes. At the same time, we use super peers to improve routing performance on the overlay. These super peers have high bandwidth and fast access to the wide-area network. A TaChord system therefore provides a shortcut routing algorithm that looks up the super peers' caches. When it receives a routing message, a node n first checks its finger table, routing table and local node ID list. If the node where the key is stored is not found, the message is sent to the node's super peers. If the super peers' caches show that no other node has looked up the key before, node n determines from the above data sets the next node closest to the message destination. Figure 1 gives the pseudocode of the TaChord algorithm.
Fig. 1. Simplified TaChord routing algorithm
n is the node that wants to find the successor node of an identifier key. The algorithm first checks whether the current node is the node responsible for the key. To reduce message traffic in the inter-domain network, super peers keep a cache storing the remote nodes that they, or any node in the same domain, have communicated with. If the destination is found in the cache, it is returned to the sender. If not, n picks the node closest to the destination, and the process continues in a similar fashion at the next node until the query is satisfied. To populate the caches of the super peers, every node in a domain sends the query result to its super peers as soon as a query finishes, and the super peers add the result to their caches. Various strategies can be used to manage these cache entries.
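A minimal sketch of one routing step follows. The dict fields are our invention, and identifier wrap-around and the forwarding to super peers are omitted for brevity; the pseudocode in Figure 1 is the authoritative version.

def tachord_lookup(node, key):
    # `node` is a dict with illustrative fields:
    #   'range'       -- (lo, hi) identifier range this node is responsible for
    #   'local_ids'   -- node IDs known inside the same autonomous system
    #   'super_cache' -- super-peer cache: key -> responsible node
    #   'fingers'     -- Chord finger table entries
    # Returns ('done', n) when resolved, or ('next', n) with the closest
    # known predecessor of the key.
    lo, hi = node["range"]
    if lo <= key < hi:
        return "done", node["id"]                  # current node stores the key
    if key in node["super_cache"]:
        return "done", node["super_cache"][key]    # shortcut via super-peer cache
    candidates = node["fingers"] + node["local_ids"]
    preceding = [n for n in candidates if n <= key]
    nxt = max(preceding) if preceding else max(candidates)
    return "next", nxt

node = {"id": 40, "range": (40, 52), "local_ids": [44, 48],
        "super_cache": {90: 88}, "fingers": [52, 60, 76]}
print(tachord_lookup(node, 45))   # ('done', 40)
print(tachord_lookup(node, 90))   # ('done', 88) -- cache hit, sent directly
print(tachord_lookup(node, 70))   # ('next', 60)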
3 Evaluation and Results

In this section, we present some analysis and simulation results showing the performance improvement made possible by TaChord. Here we use a two-level hierarchical
topology generated by BRITE [4]. We constructed Chord networks of size 4096 and set the AS size to 100.
3.1 Hop and Latency

We measured the hop and latency performance of three algorithms: (1) original Chord, (2) topology-aware Chord without the super-peer cache, and (3) TaChord. The TaChord parameters are set to a super-peer set size of k = 3 and the FIFO cache removal strategy. We measured the hop and latency PDF (probability density function) of the different algorithms. Figure 2 shows the distribution of hops per message. Both topology-aware Chord without the super-peer cache and TaChord improve upon original Chord point-to-point routing, because topology-aware Chord incorporates topology information into routing. In TaChord, if the destination is found in the super-peer cache, the node sends the message directly to the destination.
Fig. 2. Distribution of Latency
Fig. 3. Aggregate bandwidth used per message
Fig. 4. Performance of k-Redundancy
Fig. 5. A 4096 node network with 100 domains
Finally, we measure the aggregate bandwidth taken per message delivery, in units of (average sizeof(MSG) × hops). As shown in Figure 3, when intradomain bandwidth usage is not taken into account, TaChord dramatically reduces the interdomain bandwidth used per message delivery. That is because TaChord, which uses cache and topology information, forwards messages directly to the destination and reduces message forwarding across interdomain links.
3.2 Super Peer Redundancy and Cache Removal Policy

A super peer may become a single point of failure for its domain: when the super peer fails or leaves unexpectedly, the local node list may be destroyed and the cache in the super peer becomes unavailable. Here we consider only the cases k = 1, k = 3, and k = 5; for these experiments we constructed Chord networks of size 1024 and set the AS size to 50. As shown in Figure 4, super-peer redundancy pays off: latency improves greatly as k increases. This is expected, since a k-redundant super peer has much greater availability and reliability, and holds k times as many cache entries as a single super peer. However, super-peer redundancy also has costs. When a node joins or leaves, each super peer must receive or update metadata; moreover, when a node sends queries to each super peer in round-robin fashion, intradomain network traffic increases. We also compare the performance of some simple cache removal policies, namely FIFO (First-In-First-Out), LRU (Least Recently Used), and LFU (Least Frequently Used). To simplify the analysis, we consider the case where the keys are limited to one part of the one-dimensional circular key space. Figure 5 shows that LRU and LFU yield a slight improvement in hops and latency in this simplified simulation.
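For reference, the three removal policies compared here can be expressed in one small cache class. This is a generic sketch of the policies, not the simulator's code; the class and field names are ours.

from collections import OrderedDict, Counter

class SuperPeerCache:
    # Fixed-size super-peer routing cache with a pluggable removal policy.
    def __init__(self, capacity, policy="FIFO"):
        self.capacity, self.policy = capacity, policy
        self.data = OrderedDict()         # key -> responsible node
        self.freq = Counter()             # hit counts, used by LFU

    def get(self, key):
        if key not in self.data:
            return None
        self.freq[key] += 1
        if self.policy == "LRU":
            self.data.move_to_end(key)    # most recently used goes last
        return self.data[key]

    def put(self, key, dest):
        if key not in self.data and len(self.data) >= self.capacity:
            if self.policy == "LFU":
                victim = min(self.data, key=lambda k: self.freq[k])
            else:                         # FIFO and LRU both evict the head
                victim = next(iter(self.data))
            del self.data[victim]
            self.freq.pop(victim, None)
        self.data[key] = dest
        self.freq[key] += 1

cache = SuperPeerCache(2, policy="LRU")
cache.put(1, "n1"); cache.put(2, "n2")
cache.get(1)                              # key 1 becomes most recently used
cache.put(3, "n3")                        # evicts key 2, the LRU victim
print(sorted(cache.data))                 # [1, 3]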
4 Conclusion

We have presented improved routing in Chord, which significantly reduces the routing overhead and bandwidth consumption in an interdomain overlay. Simulations confirm that TaChord yields good performance at low overhead. Super-peer redundancy is beneficial, but it also increases the overhead of topology-aware overlay construction and maintenance. Cache management strategies can also affect the performance improvement in TaChord. The investigation of overlay maintenance, super-peer set size, and cache removal policies is ongoing.
References
1. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan: Chord: A scalable peer-to-peer lookup service for Internet applications. ACM SIGCOMM, 2001
2. Ben Y. Zhao, Yitao Duan, et al.: Brocade: landmark routing on overlay networks. Electronic Proceedings of IPTPS '02, 7–8 March 2002, Cambridge, MA, USA
3. Beverly Yang and Hector Garcia-Molina: Designing a super-peer network. International Conference on Data Engineering, IEEE Computer Society, 2003, Bangalore, India
4. BRITE, a network topology generator. http://www.cs.bu.edu/brite/
5. Z. Wu, L. Ma, K. Wang: Efficient Topology-aware Routing in Peer-to-Peer Network. GCC 2002, pp. 172–185
Two Extensions to NetSolve System* Jianhua Chen, Wu Zhang, and Weimin Shao School of Computer Engineering and Science, Shanghai University, 200072, Shanghai, China
[email protected] [email protected]
Abstract. This paper deals with the NetSolve system deployed on the ZQ2000 cluster. The working mechanism of NetSolve is described. Server Proxy, a new component running on the pre-server, is introduced as a bridge between clients and servers. Moreover, a Zero-Change scheme for source code is also presented.
1 Introduction

The NetSolve system, designed by the University of Tennessee and Oak Ridge National Laboratory, is a representative implementation of the Grid. NetSolve is a software system founded on the concept of remote procedure call (RPC) that allows easy access to hardware and software computational resources distributed in both geography and ownership. This article probes into the implementation of NetSolve and describes two extensions to it. Section 2 gives a general overview of the NetSolve system. Section 3 describes a new NetSolve component, the Server Proxy. Section 4 presents the Zero-Change-in-source-code scheme. Section 5 concludes with future research and NetSolve design directions.
Fig. 1. The NetSolve System
* This work is supported by the key project (No. 205269) and the development fund (No. 205155) of the Shanghai Municipal Education Commission.
2 The NetSolve System

The NetSolve system is comprised of a set of loosely connected machines. By loosely connected, we mean that these machines are on the same local, wide or global area network, and may be administrated by different institutions and organizations. Figure 1 shows the global conceptual picture of the NetSolve system. In this figure, we can see the three major components of the system: the NetSolve client, the NetSolve agent and the NetSolve servers. As the system has matured, several promising extensions of NetSolve have emerged: an interface to the Condor system, an interface to the ScaLAPACK parallel library, a bridge to the Ninf system, and an integration of NetSolve and Image Vision [1].
3 Server Proxy

The Server Proxy is a new NetSolve component designed by the NetSolve research group at Shanghai University. We found that the NetSolve system has a drawback: a NetSolve server needs a real (public) IP address, but clusters, MPPs and other supercomputers cannot expose all their interior nodes to the outer network. In the common supercomputer structure, one or several pre-servers possess real IP addresses while the remaining nodes have private (local network) IP addresses. The pre-servers have dual IP addresses, like a gateway. If we run a Server Proxy on the pre-server or on the gateway, it can set up a bridge between the client and the interior NetSolve servers.
3.1 The Working Mechanism of the NetSolve System

Figure 2 illustrates the whole process. First, the client sends out a request netsolve('dgesv', A, b). The client proxy asks the agent whether there are any servers that provide the service "dgesv" (the servers' information has been registered with the agent beforehand). The agent returns an array of qualified servers to the client proxy, with the best server always in the first position of the array. The client proxy then tries to set up a TCP socket connection with these servers, and returns the first reachable server to the client. The client then connects to that server and sends the request data. Thus, if the server has no real IP address, a client proxy in the outer network will always fail to connect to it. This is the key point of the drawback mentioned above. The server proxy prevents this from happening. The part in the dotted line of the following figure shows the new mechanism when the server proxy is added.
3.2 Feasibility and Implementation of the Server Proxy

The server proxy must run on the pre-server or gateway of the supercomputer. It can ensure that the connection between client and server succeeds. One problem is that all the data must pass through the pre-server, which may cause a bottleneck. Starting several server proxies on different pre-servers can solve this problem if each server proxy
manages a part of the interior servers. Moreover, in order to improve efficiency, we can set up a connection pool between the server proxy and the interior servers.
Fig. 2. NetSolve Request
With the addition of the server proxy, servers should register the server proxy information with the agent. The server proxy must support multithreading. It first listens on a port, accepts a connection request and creates a new socket; it then receives the target IP address from the new socket, and finally connects to that IP address with another socket. After that, the server proxy binds the two sockets into a socket couple. The server proxy just transmits data between the sockets of this socket couple, and does nothing else. The agent manages the server proxy and has the right to kill it. The server proxy registers with the agent at startup. If the server proxy dies, the agent deletes the server proxy information of the relevant servers.
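The socket-couple mechanism just described fits in a short sketch. How the target address is conveyed on the wire is not specified in this paper, so the one-line "host:port" header below is our assumption, as are all names; agent registration is omitted.

import socket
import threading

def pump(src, dst):
    # Blindly relay one direction of a socket couple.
    try:
        while True:
            chunk = src.recv(4096)
            if not chunk:
                break
            dst.sendall(chunk)
    finally:
        src.close(); dst.close()

def serve(listen_port=9000):
    # Accept a client, read the interior server address (assumed format:
    # one "host:port\n" line), connect to it, then relay both directions.
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", listen_port)); srv.listen()
    while True:
        client, _ = srv.accept()
        header = b""
        while not header.endswith(b"\n"):
            piece = client.recv(1)
            if not piece:
                break
            header += piece
        host, port = header.decode().strip().split(":")
        inner = socket.create_connection((host, int(port)))
        # the socket couple: one pump thread per direction
        threading.Thread(target=pump, args=(client, inner), daemon=True).start()
        threading.Thread(target=pump, args=(inner, client), daemon=True).start()

if __name__ == "__main__":
    serve()   # blocks; run on the pre-server/gateway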
4 The Zero-Change-in-Source-Code Scheme

Domain scientists want to use computational resources on the Internet, but they are deterred by the complicated network knowledge required. The NetSolve system alleviates this burden: a few changes in a user's source code make network computational resources available. The Zero-Change scheme turns these few changes into no change at all. We want to build an automatic component that can transform a sequential program into a program that can run on the NetSolve system. This is accomplished in two steps. Step one is matching: find all the functions or subroutines that match a service name in the NetSolve system, and translate the corresponding statements into statements that can run on NetSolve. Thanks to the international naming standards for computational libraries, functions in a user's program can be exchanged with the corresponding NetSolve services.
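A toy illustration of the matching step follows. The service table and the textual rewrite are invented for illustration; the real component would match against the agent's registered services and work on a parse tree rather than regular expressions.

import re

# Routines assumed, for illustration, to be registered NetSolve services.
KNOWN_SERVICES = {"dgesv", "dgemm", "dposv"}

def rewrite(source: str) -> str:
    # Step one ("matching"): any call  name(args)  whose name matches a
    # NetSolve service is rewritten to  netsolve('name', args).
    def repl(m):
        name, args = m.group(1), m.group(2)
        if name in KNOWN_SERVICES:
            return f"netsolve('{name}', {args})"
        return m.group(0)
    return re.sub(r"\b(\w+)\s*\(([^()]*)\)", repl, source)

print(rewrite("info = dgesv(n, nrhs, A, lda, ipiv, b, ldb)"))
# -> info = netsolve('dgesv', n, nrhs, A, lda, ipiv, b, ldb)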
Step two is optimizing. Nonblocking calls, request sequences and task farming can improve computational performance, as discussed in [2] [3] [4]. Here we use those techniques to optimize the code produced in the first step. We analyze this code and transform it into a syntax tree. For independent NetSolve calls, nonblocking calls should replace blocking calls as much as possible. If there is repetition in the input or output parameters of NetSolve calls, a request sequence should be used. When a series of requests to the same NetSolve service emerges, task farming is used to simplify the situation. The automatic component produced by this scheme brings several advantages. Users can take advantage of computational resources on the web without changing anything in their source code, which entirely alleviates the burden on domain scientists. If users upload their programs directly to the pre-server, the automatic component on the pre-server will rewrite them into NetSolve format, then optimize, compile and finally solve them within the cluster or LAN, which improves performance greatly. The difficulty of this scheme lies in the optimization algorithm; at present we have implemented only the first step.
5 Conclusions and Future Work

There are many ways in which NetSolve should be further extended. The present NetSolve system uses Kerberos [5] as a security measure, and the introduction of the server proxy raises a serious security problem. The present NetSolve system solves problems at the level of library functions; a finer-granularity solution that accomplishes such functions with several servers should be developed.
References
[1] H. Casanova and J. Dongarra: Applying NetSolve's network enabled server. IEEE Comput. Sci. & Eng., 5:3 (1998) 57–66
[2] D. C. Arnold, D. Bachmann and J. Dongarra: Request Sequencing: Optimizing Communication for the Grid. In: Euro-Par 2000 – Parallel Processing, August 2000
[3] Dorian C. Arnold and Jack Dongarra: The NetSolve Environment: Progressing Towards the Seamless Grid. http://icl.cs.utk.edu/netsolve
[4] H. Casanova, M. Kim, J. S. Plank, and J. Dongarra: Adaptive Scheduling for Task Farming with Grid Middleware. The International Journal of Supercomputer Applications and High Performance Computing, 1999, to appear
[5] B. C. Neuman and T. Ts'o: Kerberos: An Authentication Service for Computer Networks. IEEE Communications, 32(9):33–38, September 1994
A Route-Based Composition Language for Service Cooperation Jianguo Xing Department of Computer Science & Engineering, Hangzhou University of Commerce No. 149 Jiaogong Rd., Hangzhou, P.R. China 310035
[email protected]
Abstract. The service-oriented programming model provides an effective way to integrate different applications across the distributed, heterogeneous, dynamic environments that are common in e-business. Web services, with WSDL, SOAP and related protocols, are such an industrial effort to address the interoperation of different systems. Service composition combines services distributed among different organizations into new, complex, value-added services while preserving the autonomy of the old ones. In this paper, we identify some common patterns and scenarios of service composition and then propose a general framework for dynamic service publication, discovery and lifecycle management, which is the basis of service composition. By extending and enhancing the Web service definition, a route-based composition language, xSCL, is proposed to solve problems that impede service reuse.
1 Introduction

With the popularity of the Internet, the web is becoming an important way for enterprises and all kinds of customers to share information. Currently, such information mostly exists as static or dynamic HTML files on web servers, and users retrieve and browse these files with browsers such as Internet Explorer or Netscape. Such a browser-server structure only cares about how to display an HTTP file on screen; it does not care what information the file contains, so a human must interpret the information and take the next action according to its contents. Because there is no common interface for interpreting the information, this structure makes it hard to implement e-business automation, which often needs to combine several services and pieces of information from different organizations. For example, a customer wants to buy a book at the lowest price. First, he uses a search engine to find a list of companies that carry the book. Then he retrieves book information from each company. By comparing the prices and other advantages each company offers, he chooses one and orders the book. Such a book-buying process involves several service providers and services. The buyer only cares about buying the book while spending little money; he does not care how and where to find it. He wants a service that can do this in a "single" step. In e-business there are similar requirements to combine several low-level services into a high-level transaction. Services can be combined at different levels of granularity, and new services can be synthesized out of existing ones. This is where web services come in.
With XML and related protocols, web services provide end users an interface that is independent of any particular platform or language. Service composition combines several services deployed on different web sites to accomplish a single job. Although the need for and opportunity of service composition in the e-marketplace is obvious, no thorough research has been conducted in this area. Some issues such as composability, compatibility, transactions and substitutability are still open and require new protocols and effective tools to support the development and delivery of composite services. Web service composition is currently an active area of research, with many languages being proposed by academic and industrial research groups, such as IBM's WSFL, Microsoft's XLANG and BPEL4WS [1]. DAML-S is an alternative that is not based on XML. The outline of this paper is as follows. Section 2 analyses typical patterns of service composition in e-business. Section 3 introduces our general framework for service composition. Section 4 discusses how to extend the web service protocol to support service composition and describes xSCL. Finally, Section 5 concludes the paper and presents future directions.
2 Patterns of Service Composition in E-business

Service composition can be considered along the following dimensions: data, process, security, and protocol. Here we talk only about process composition. With process-oriented service composition in particular, the following aspects deserve attention:
1. Order: the order in which the services execute in time.
2. Dependence: the data and functional dependencies among the services.
3. Flow control: the execution order under different triggering conditions.
4. Alternative service execution: whether there is an alternative service in the composition that can be invoked.
Here we refer only to some basic patterns, namely sequence, parallel, join and split:
1. Sequence: services executing in sequence. The sequence pattern is used to model consecutive steps in an e-business process.
2. Parallel: services executing in parallel.
3. Join: when two or more services have finished or timed out within a given time, a joining map F determines the next action. F may be "AND", "OR", or even a very complicated relationship between the outputs of services A and B, such as "MAX(parameter X of A and B)".
4. Split: when a service finishes, split into several services using a function F. For example, when service A fails, service S1 executes, otherwise S2. A simpler split example: when service A finishes, execute two parallel services S1 and S2.
Other complex patterns occurring in e-business, such as m-join and n-split, conditional/unconditional loops and transactions (commit and rollback), can be described with the above basic patterns [2].
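The four basic patterns can be made executable in a few lines. The sketch below uses coroutines; the service bodies, delays and the joining/split conditions are placeholders, not a proposed runtime.

import asyncio

async def svc(name, delay, value):
    # Stand-in for a remote service invocation.
    await asyncio.sleep(delay)
    return value

async def main():
    # Sequence: consecutive steps of an e-business process.
    a = await svc("check_stock", 0.01, 5)
    b = await svc("get_price", 0.01, a * 2)

    # Parallel + Join: run two quotes concurrently, then combine their
    # outputs with a joining map F (here F = max, as in "MAX(parameter X)").
    q1, q2 = await asyncio.gather(svc("quote_A", 0.02, 30),
                                  svc("quote_B", 0.01, 40))
    best = max(q1, q2)

    # Split: choose the next service according to a condition F.
    if best > 35:
        out = await svc("order", 0.01, "ordered")
    else:
        out = await svc("renegotiate", 0.01, "retry")
    print(b, best, out)

asyncio.run(main())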
3 Framework for Service Composition

In Section 2 we gave several patterns for modeling e-business processes. In this section, we discuss the architectural issues in implementing service composition. Generally, service composition includes two phases: first, translate the user's requirement into a composition script; second, execute the planned script to fulfill the user's requirement. In the real world, these two phases may be interwoven. For example, if the script execution result is correct but too slow to accept, the planner can refine the script to speed it up.

Service planner: in our service composition framework, a service planner provides a detailed service composition plan that can satisfy the user's request. Its final outcome is the script that describes the route, the service dependencies and other information such as security, alternative routes, transactions and QoS requirements. It can be written manually by the user or generated on the fly according to rules. For most complex cases, an intelligent planner is better than a human. An ideal planner should provide an abstract representation for services that have similar semantics but differ in structure or QoS. Location transparency, a uniform invocation interface, explicit QoS support and so on are also requirements for a planner, and they greatly simplify the work of service composition. Another problem a planner must consider is service dynamism: a service that is live during the planning phase may be down when the planned script is running, so the planner must provide a try-catch-finally mechanism.

Service scheduler: in our service composition framework, a service provider has three layers to deal with a service request: a communication layer, a schedule layer and an implementation layer. The communication layer receives and sends data with SOAP, HTTP and other protocols. The implementation layer has the code that actually handles the service request. Between the communication and implementation layers we introduce a schedule layer, which has several jobs. First, it receives the composition script and parses the route and data dependency information it contains, forwarding the script to other URLs if necessary. It then establishes a schedule table describing when and how to execute the script. When the time arrives, or when all the needed data have been collected, the scheduler invokes the program residing in the implementation layer and passes the input parameters to it. It also gathers the result and returns it to the service requestor or forwards it to another service.
4 Extension of WSDL and the xSCL Specification

WSDL defines four transmission primitives that an endpoint can support: (1) one-way (the endpoint receives a message); (2) request-response (the endpoint receives a message and sends a correlated message); (3) solicit-response (the endpoint sends a message and receives a correlated message); (4) notification (the endpoint sends a message). But WSDL does not explicitly define the input message sender and the output message receiver; it implicitly assumes that the service requester provides the input parameters and receives the return parameters. This greatly limits the usage of web services, especially for service composition. In some cases, a web service needs data from several different locations and must return results to different places.
With this extension, we introduce the xSCL definitions. In an xSCL script, the root element is 'action', which has several attributes. For example, the attribute 'WSDL' is the location of the service's WSDL file; the script scheduler can get the WSDL description from this attribute. Another attribute, TransactionID, is a unique tag to notify the scheduler that messages from different places belong to one script instance. Each time a script is executed, a new TransactionID should be allocated. The root element also includes some QoS attributes to instruct the script scheduler. An action is equivalent to a service, whether it is a simple or a composed service; generally, it is accompanied by a WSDL description. An action block contains 0 to n parallel blocks and sequence blocks, and these blocks can in turn include action blocks. An action block can explicitly designate where to get its input variables and where to return its results. An action has several optional elements and attributes. The attribute 'WSDL' gives the location of the web service definition of the action. The attribute 'Trigger' is the condition under which the scheduler invokes the action; by default, the action is not executed until all its input parameters arrive, but in some cases only partial inputs are needed. The attribute 'Timeout' is the time the scheduler waits for input parameters. The elements 'input' and 'output' describe the parameters, which can be mapped to the interface of the web service the action belongs to. The attribute 'as' of input/output links the collected data to web service parameters, the attributes 'to' and 'from' indicate the destination and source of the input/output parameters, and the attribute 'function' indicates how to combine or split them. For more information about xSCL and further examples, contact the author.
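Since the full xSCL grammar is not reproduced in this text, the following is a hypothetical script assembled only from the elements and attributes named above, together with a few lines that read it back; the concrete element ordering and attribute values are our guesses, not the official syntax.

import xml.etree.ElementTree as ET

SCRIPT = """
<action name="buyBook" WSDL="http://example.org/buy.wsdl"
        TransactionID="tx-0001" Trigger="all-inputs" Timeout="30">
  <parallel>
    <action name="quoteA" WSDL="http://shopA.example.org/quote.wsdl">
      <input  name="isbn"  from="requester"/>
      <output name="price" to="buyBook" as="priceA"/>
    </action>
    <action name="quoteB" WSDL="http://shopB.example.org/quote.wsdl">
      <input  name="isbn"  from="requester"/>
      <output name="price" to="buyBook" as="priceB" function="min"/>
    </action>
  </parallel>
</action>
"""

# A scheduler-side reading of the script: transaction tag, QoS attribute
# and the WSDL location of every action, simple or composed.
root = ET.fromstring(SCRIPT)
print(root.get("TransactionID"), root.get("Timeout"))
for act in root.iter("action"):
    print(act.get("name"), "->", act.get("WSDL"))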
5 Conclusions and Future Work

In this paper, we gave several typical composition patterns that occur in e-commerce, extended the current web service definition, and proposed a framework for service composition. A new service composition language, xSCL, is proposed as an extension of WSDL. However, several issues remain to be addressed. First, the service composer or planner is not an easy problem if we consider the dynamics of services; in our current work, we avoid this by using a statically written script. Second, the xSCL language deliberately does not define QoS options, especially time requirements. Considering that QoS requirements can be supported at a higher level, such as the script execution level, we leave QoS as an optional implementation.
References
1. Curbera, F., Goland, Y., Klein, J., Leymann, F., Roller, D., Thatte, S., and Weerawarana, S.: Business Process Execution Language for Web Services, 2002
2. Van der Aalst, W.M.P., et al.: Workflow Patterns. Distributed and Parallel Databases, 14(3), pages 5–51, 2003
To Manage Grid Using Dynamically Constructed Network Management Concept: An Early Thought* Zhongzhi Luan1, Depei Qian2, Weiguo Wu1, and Tao Liu1 1
Neocomputer Institute, Dept. of Computer Science, Xi’an Jiaotong University, Xi’an, 710049, China
[email protected] 2
National Lab on Software Development Environment, Beihang University, Beijing, 100083, China
[email protected]
Abstract. The management of the grid differs from that of individual systems and the current Internet. We propose a novel network management model, Dynamically Constructed Network Management (DCNM), which may offer useful guidance for grid management. Network services and applications are treated as "soft equipment" managed by the network management system (NMS), so that all the resources in the network can be managed under this model. As functions of the managed objects (i.e., network services, applications, etc.), management functions are constructed dynamically along with the development and deployment of the various services and applications in the network. This makes management itself an adaptive component of the network. We also present our views on research directions for grid management using the DCNM concept at the end of this article.
1 Introduction

The goals of the grid are resource sharing and coordinated work. This brings new demands and problems to management. The management of the grid differs from that of individual systems and the current Internet: it should offer uniform management of the physical resources in the grid and of the applications and services running on it. This management should not only emphasize the efficient utilization of resources, but also consider user cost models so that users can obtain economical and convenient services. At the same time, it is important to establish management criteria for the grid. We need institutional guarantees and incentive mechanisms to keep the grid working stably, reliably and efficiently over a long period. However, the current state of research on grid management is such that there is not even a clear concept corresponding to that of network management. Most studies concern
This work was supported by the National Natural Science Foundation of China (NSFC) under grant No. 90104022, and the National Key Basic Research Program of China (973 Project) under grant No. G1999032710.
technologies and methods for grid resource management [1], and some studies target the application demands of specific grid systems [2]. Some drafts have been set down by the GGF [3]. In this article, we introduce a novel network management model, which we believe can offer useful guidance for the management of the grid.
2 The Dynamically Constructed Network Management

2.1 The DCNM Concept

As functions of the managed objects (i.e., network services and applications, etc.), management functions are constructed dynamically along with the development and deployment of the various services and applications in the network. The network services and applications are treated as "soft equipment" managed by the NMS. When a new application or service is built, the NMS detects that a new soft device has been attached to the network and, at the same time, constructs the corresponding management functions and interfaces dynamically. Dynamically constructed network management has three marked characteristics:
- Applications are treated as "soft equipment" and brought into the NMS, which gives network devices and network applications a unified overall management.
- As functions of the managed objects, management functions are constructed dynamically along with the development and deployment of services and applications in the network, which gives the NMS dynamic scalability.
- Active network technology is used to realize the dynamic distribution, deployment, storage and execution of management code, which makes the NMS itself dynamic.
These characteristics make management itself an adaptive component of the network.
2.2 The DCNM Model

2.2.1 DCNM System Model

A DCNMS (dynamically constructed network management system) is formed of a six-element set DCNMS = {MO, MF, MU, MI, f, g}, in which:
MO, the set of managed objects, is denoted MO = {mo_1, mo_2, ..., mo_n}. Managed objects include network resources such as network services, applications and protocols, topology, hosts and other hardware.
MF, the set of dynamically constructed management functions, is denoted MF = {mf_1, mf_2, ..., mf_m}, where each mf_i is a management function constructed dynamically by the network management system.
MU, the set of management system users, is denoted MU = {mu_1, mu_2, ..., mu_k}. We use mu(mf) to denote user mu executing management function mf.
MI, the set of dynamically constructed management interfaces, is denoted MI = {mi_1, mi_2, ..., mi_l}, where each mi_i is a management interface constructed dynamically by the network management system.
f is a functional relation between management interfaces, management users and management functions, namely f: MU × MF → MI. It indicates that the management interfaces change dynamically along with the variation of management users and management functions and with the execution of functions by users. That is to say, for any mu ∈ MU and mf ∈ MF, there exists mi ∈ MI such that mi = f(mu, mf).
g is a functional relation between management functions and managed objects, namely g: MO → MF. It indicates that the management functions change dynamically along with the change of the managed objects. That is to say, for any mo ∈ MO, there exists mf ∈ MF such that mf = g(mo).
2.2.2 DCNM System Deployment Model

The deployment model of the dynamically constructed network management system described above is formed of a four-element set {AN, EE, AM, h}, in which:
AN, the set of active nodes, is denoted AN = {an_1, an_2, ..., an_p}.
AM, the set of management modules that can be deployed dynamically, is denoted AM = {am_1, am_2, ..., am_q}.
EE, the set of management module execution environments, is denoted EE = {ee_1, ee_2, ..., ee_r}.
h is an operational relation inside the active network environment, namely a relation between AN, AM and EE. It indicates that for any management module that can be deployed dynamically, there exists an execution environment on any possible active node. That is to say, for any am ∈ AM and an ∈ AN, there exists ee ∈ EE on an that can host and execute am.
2.2.3 Model Significance

From the description of the DCNM model, two points follow. On the one hand, through the operation f, management interfaces become functions of management users and management functions: management interfaces are constructed dynamically along with the variation of the functions executed by users. This embodies the personalization and dynamism of management interfaces. On the other hand, through the operation g, management functions become functions of managed objects: management functions are constructed dynamically along with the variation of the managed objects, including devices, applications and protocols. This embodies the association between management functions and managed objects and the scalability of management functions. These characteristics are realized on top of the deployment model: through the operation h, the management functions are guaranteed to be deployed and executed dynamically with the support of active network technology.
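A toy sketch of the relation g may make the dynamic construction concrete: registering a new "soft device" constructs its management functions, and an interface is derived from them for a given user. All names are illustrative; the sketch is our reading of the model, not an implementation of the DCNMS.

class DCNMS:
    # MO, MF, MI with g: MO -> MF (functions built from objects) and
    # f: MU x MF -> MI (interfaces built from users and functions).
    def __init__(self):
        self.MO, self.MF, self.MI = {}, {}, {}

    def register(self, mo_name, operations):
        # A new soft device (service/application) joins the network:
        # its management functions are constructed dynamically -- g(mo).
        self.MO[mo_name] = operations
        self.MF[mo_name] = {op: (lambda o=op, m=mo_name: f"{o}({m})")
                            for op in operations}

    def interface(self, user, mo_name):
        # f: the interface a given user sees for a managed object.
        mi = [f"{user} may invoke {fn}" for fn in self.MF[mo_name]]
        self.MI[(user, mo_name)] = mi
        return mi

nms = DCNMS()
nms.register("video-service", ["start", "stop", "query-load"])
print(nms.interface("operator", "video-service"))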
3 What Should We Do on the Management of the Grid?

We need to define a clear concept and intension of grid management corresponding to network management. Possible research directions are: the model and architecture of grid management; automatic discovery and location mechanisms for managed objects; association mechanisms between management functions and managed objects; dynamic deployment mechanisms for management functions; dynamic extension mechanisms for management functions; dynamic construction mechanisms for management interfaces; and security mechanisms for the management system. The performance of the grid and of grid management is an essential issue that always receives attention, so we also need to study methods for the analysis and evaluation of the grid and its management; the quality of both can be improved through such analysis and evaluation. We should develop grid management software based on studies of the concepts, models, architectures and related technologies of grid management, and develop universal grid application test suites based on the studies of grid analysis and evaluation methods. Based on the studies mentioned above, management criteria and standards should be established.
4 Conclusions

The management of the grid differs from that of individual systems and the current Internet. We have introduced a novel network management model, Dynamically Constructed Network Management (DCNM). Network services and applications are treated as "soft equipment" managed by the NMS, so that all the resources in the network can be managed. As functions of the managed objects, management functions are constructed dynamically along with the development and deployment of the various services and applications in the network. This makes management itself an adaptive component of the network. We can use the DCNM concept for grid management, but much work remains to be done.
References
[1] K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith, S. Tuecke: A Resource Management Architecture for Metacomputing Systems. Proc. IPPS/SPDP '98 Workshop on Job Scheduling Strategies for Parallel Processing, 1998
[2] http://www.doesciencegrid.org/management/
[3] http://www.ggf.org/ogsa-wg
[4] Xu Bin, Qian Depei et al.: "The Research on Active Network Management Architecture". Journal of Computer Research and Development, Vol. 39, No. 4, Apr. 2002
[5] D. Tennenhouse, Jonathan M. Smith, W. David Sincoskie et al.: "A Survey of Active Network Research". IEEE Communications Magazine, 1997, 35(1): 80–86
Design of VDSL Networks for the High Speed Internet Services Hyun Yoe1 and Jaejin Lee2 1
Professor, Dept. of Computer & Communication, Sunchon National University, Sunchon-si, Chonnam, 540-742, Korea
[email protected] 2
Korea Telecom
[email protected]
Abstract. VDSL has recently been attracting attention as an efficient xDSL technology, and it can be used for next-generation Grid computing. Because VDSL is a last-mile technology, we design VDSL networks that supply multimedia services by combining VDSL appropriately with core and access networks. The core network is designed as either an IP network or an ATM network and simulated with COMNET. The simulation results (bit rate, delay) show that the IP network is superior to the ATM network.
1 Introduction

In this paper, we study the design of VDSL network architectures that supply multimedia services (voice, data and video) by combining the existing infrastructure, from the ATM or IP core network down to the last-mile network (VDSL). This result will be used for the network design of next-generation xDSL services, which are an important component of next-generation grid computing.
2 Design of VDSL Based Network
In Korea, the VDSL service network architecture is based on that of the earlier ADSL deployments; building on this, we design and propose a VDSL network architecture. The proposed scenarios are as follows:
- ATM end to end
- ATM in the Access Network and IP in the Core Network
- IP in the Access Network and ATM in the Core Network
- IP end to end
In our study we focused on the first and the last cases, as these seem to be the architectures deployed in Korea.
Fig. 1. 4 types of VDSL network architecture
2.1 VDSL Network Architecture Based on All IP Networks
As the next-generation network architecture is expected to be an all-IP network, we first propose an all-IP-based VDSL network. We show the characteristics and operation of all-IP-based VDSL networks below.
Fig. 2. All IP based VDSL network
IP End to End Operation Procedures
Video (VOD), voice, and data streams are transported from their respective servers at node [1], where application servers provide the various services. POTS (voice service) is converted to digital form, so that an all-digital loop can be used, and is carried over fiber-optic cable and VDSL through the core network. At the WDM-PON at node [2], the ONU provides multiplexed WDM signals on several wavelengths and assigns wavelengths to individual subscribers. VDSL service over the existing copper lines is provided at node [3]. For QoS support, IntServ, DiffServ, and RSVP are used: DiffServ is used on the connections between domains, and the domains communicate by means of SLAs, while within a domain IntServ and RSVP provide QoS.
ATM Based VDSL Network
The operation of the ATM-based VDSL network is the same as that of the all-IP-based network except for QoS. In the ATM-based network, the CBR and VBR service classes can be used for QoS. The architecture of the ATM-based VDSL network is shown in Fig. 3.
Fig. 3. ATM based VDSL network
Fig. 4. Voice services transmission rate
Fig. 5. Data services transmission rate
3 Simulation
We simulated the all-IP and ATM-based VDSL networks. The simulation covers voice, data, and video traffic. The results in Fig. 4 show that the IP-based network needs less time to transmit a given amount of voice traffic: for example, it needs only 120 s for 200 kbytes, whereas the ATM-based network needs 140 s. For data traffic the IP-based network likewise shows better performance than the ATM-based network: Fig. 5 shows that for a 10-Mbit data transmission the IP-based network needs about 100 s, while the ATM-based network needs about 140 s.
4 Conclusion
In this study we found that the IP-based network shows better performance than the ATM-based network. However, this result rests on the assumption of a multicast-type service (for 100 nodes), so it is not an absolute result. For a real VDSL service deployment, a case study is needed for each specific environment.
References
1. Padmanand Warrier, Balaji Kumar, "xDSL Architecture", McGraw-Hill, pp. 217-247, 2000.
2. White paper, "Hybrid Access Reconfigurable Multi-wavelength Optical Networks for IP-based Communication Systems", Lucent Technologies Nederland B.V., Mar. 2000.
3. FSAN VDSL Working Group, "QoS and Admission Control for Traffic Streams in a Full Services Network", KPN Research, Feb. 2001.
4. Position paper, "VDSL System Architecture: ATM vs. IP", Bell Canada Technology & Network Development, Feb. 2000.
5. FSAN VDSL Working Group, "FS-VDSL Full Services Architecture", Next Level Communications, Mar. 2001.
The Closest Vector Problem on Some Lattices*

Haibin Kan 1,2, Hong Shen 2, and Hong Zhu 1

1 Department of Computer Science & Engineering, Fudan University, Shanghai, P.R. China
2 Graduate School of Information Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Tatsunokuchi, Ishikawa 923-1292, Japan
Abstract. The closest vector problem for general lattices is NP-hard. However, we can efficiently find the closest lattice points for some special lattices, such as root lattices and some of their duals. In this paper, we discuss the closest vector problem on more general lattices than root lattices.
Keywords: Lattice, dual lattice, closest vector, root lattice.
1 Preliminaries
Let $Z$, $Q$ and $R$ denote the sets of integers, rationals and reals, respectively; $R^n$ denotes the $n$-dimensional real vector space, with similar notation for $Z^n$ and $Q^n$. For any real $x$, let $\lfloor x \rceil$ denote the integer closest to $x$. For linearly independent vectors $b_1, \ldots, b_n$ in $R^m$, the set $L = \{\sum_{i=1}^{n} a_i b_i : a_i \in Z\}$ is called a lattice, denoted by $L$ or $L(b_1, \ldots, b_n)$; $n$ is the dimension of $L$ and $b_1, \ldots, b_n$ is a basis of $L$. Let $B = (b_1, \ldots, b_n)$ be the $m \times n$ matrix whose columns are the basis vectors; we call $B$ a basis matrix of $L$. The dual lattice $L^{*}$ consists of all vectors in the subspace of $R^m$ spanned by $b_1, \ldots, b_n$ whose inner product with every lattice vector is an integer. If the matrix $B$ is a basis matrix of $L$, then $B(B^{T}B)^{-1}$ is a basis matrix of $L^{*}$. For a vector $x = (x_1, \ldots, x_n)$, the norm (or length) is $\|x\| = (\sum_i x_i^2)^{1/2}$. The shortest vector problem (SVP) is: Given a lattice $L$, find a nonzero lattice vector in $L$ with minimal length. The closest vector problem (CVP) is: Given a lattice $L$ and a target vector $t$, find a lattice vector in $L$ closest to $t$. SVP and CVP are very hard and are the central problems of lattice theory. For $n \geq 1$, let

$A_n = \{x \in Z^{n+1} : \sum_{i=1}^{n+1} x_i = 0\}$

and

$D_n = \{x \in Z^{n} : \sum_{i=1}^{n} x_i \equiv 0 \pmod 2\}.$

* This work is supported by NSF of China (No. 60003007) and by a Japan Society for the Promotion of Science (JSPS) Research Grant (No. 14380139).
Clearly, $A_n$ and $D_n$ are lattices; they are called root lattices. In [3] and [4], the closest vector problem for $A_n$, $D_n$ and their duals was solved by a very simple and efficient method. In this paper, we discuss the closest vector problem on more general lattices than root lattices, and generalize some results in [3].
2 The Closest Vector Problem for More General Lattices
In this section, we discuss the closest vector problem for the lattices $D_n^{(m)}$ and $A_n^{(m)}$. For any positive integer $m$, define

$D_n^{(m)} = \{x \in Z^{n} : \sum_{i=1}^{n} x_i \equiv 0 \pmod m\}$

and

$A_n^{(m)} = \{x \in Z^{n+1} : \sum_{i=1}^{n+1} x_i \equiv 0 \pmod m\}.$
Clearly $D_n^{(m)}$ and $A_n^{(m)}$ are both integer lattices. Furthermore, if $m = 2$ then $D_n^{(2)} = D_n$, and if $m = 1$ then $D_n^{(1)} = Z^n$. Now, we describe an algorithm to find the closest lattice point in $D_n^{(m)}$. Here, for any real $x$ we use the notation $\lfloor x \rceil$ for the integer closest to $x$, and define the rounding error $r_i = x_i - \lfloor x_i \rceil$.

Algorithm 1: Input a vector $x = (x_1, \ldots, x_n)$ and a positive integer $m$ (w.l.o.g. we can assume $m \le n$).
Step 1. Let $f(x) = (\lfloor x_1 \rceil, \ldots, \lfloor x_n \rceil)$. Arrange the rounding errors and get the sorted sequence $r_{i_1} \le r_{i_2} \le \cdots \le r_{i_n}$.
Step 2. Compute $k$, where $k \equiv \sum_{i=1}^{n} \lfloor x_i \rceil \pmod m$ and $0 \le k < m$.
Step 3. If $k = 0$, return $f(x)$.
Step 4. If $k \neq 0$, let $w^{+}$ be obtained from $f(x)$ by adding 1 to the $m-k$ coordinates with the largest rounding errors; on the other hand, let $w^{-}$ be obtained from $f(x)$ by subtracting 1 from the $k$ coordinates with the smallest rounding errors. Compute the distances $\|x - w^{+}\|$ and $\|x - w^{-}\|$; if $\|x - w^{+}\| \le \|x - w^{-}\|$, return $w^{+}$, else return $w^{-}$.

The above algorithm also works for $A_n^{(m)}$ if we view the subscript operations in Step 4 as all being modulo $n+1$. Because we make the two smallest changes to the rounded integer vector that bring the resulting vectors into the lattice, the above algorithm solves CVP for $D_n^{(m)}$. Thus, we have the following theorem.

Theorem 1. Algorithm 1 solves the closest vector problem for $D_n^{(m)}$.

When $m = 2$, Algorithm 1 is exactly the algorithm for $D_n$ in [3].
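To make Algorithm 1 concrete, the following Python sketch implements it under the reading adopted above, namely $D_n^{(m)} = \{x \in Z^n : \sum_i x_i \equiv 0 \pmod m\}$ with $m \le n$; the function name, data layout, and tie-breaking are our own assumptions, and for $m = 2$ the procedure reduces to the Conway-Sloane decoder for $D_n$ [3].

```python
def closest_point_dnm(x, m):
    """Closest point to x in D_n^(m) = {v in Z^n : sum(v) % m == 0}.

    Sketch of Algorithm 1 under the D_n^(m) reading, assuming m <= n;
    for m == 2 this reduces to the Conway-Sloane decoder for D_n [3].
    """
    n = len(x)
    f = [round(xi) for xi in x]              # Step 1: round coordinatewise
    r = [xi - fi for xi, fi in zip(x, f)]    # rounding errors in [-1/2, 1/2]
    k = sum(f) % m                           # Step 2: congruence defect
    if k == 0:                               # Step 3: already in the lattice
        return f

    # Step 4: the two minimal repairs of the defect.
    order = sorted(range(n), key=lambda i: r[i])  # errors in ascending order
    w_plus = f[:]                            # +1 on the m-k largest errors
    for i in order[-(m - k):]:
        w_plus[i] += 1
    w_minus = f[:]                           # -1 on the k smallest errors
    for i in order[:k]:
        w_minus[i] -= 1

    def dist2(w):
        return sum((xi - wi) ** 2 for xi, wi in zip(x, w))

    return w_plus if dist2(w_plus) <= dist2(w_minus) else w_minus

# Worked check (our own data): rounding (0.6, 1.2, -0.4) gives (1, 1, 0)
# with sum 2, so k = 2 for m = 3; subtracting 1 from the two worst-rounded
# coordinates turns out to be the cheaper repair.
assert closest_point_dnm([0.6, 1.2, -0.4], 3) == [0, 1, -1]
```

The dominant cost in this sketch is the sort, so it runs in $O(n \log n)$ time, in line with the efficiency claim of Theorem 1.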
Example 1. Let ... and ... . Clearly, ... . Then ... . Hence ... is the closest lattice point to the target vector in $D_n^{(m)}$.

The following lemma is clear and we omit its proof.

Lemma 2. $D_n^{(m)}$ and its dual $(D_n^{(m)})^{*}$ have basis matrices $B$ (1) and $B(B^{T}B)^{-1}$ (2), respectively.
The following lemma is used to determine all coset elements of the quotient group $(D_n^{(m)})^{*}/Z^{n}$ in order to solve CVP for $(D_n^{(m)})^{*}$.

Lemma 3. There exists a basis ... of ... such that ... . Furthermore, the basis can be easily computed, as shown in the following proof.
Proof. Clearly, ... . The key task is to determine all elements of the quotient group. Since ... is a sublattice of ..., there exists an integer matrix $A$ such that ..., where $B$ and ... are the basis matrices given in (1) and (2), respectively. Since ..., we get ... . So ... .
Now, we want to find a unimodular matrix $T$ such that $AT$ is an upper triangular matrix. It is not difficult to obtain the following upper triangular matrix $AT$ by direct computation:
So ... (we can also write the expression of $T$ first). Therefore, ... . Since $T$ is a unimodular matrix, ... is a basis of ... .
Let ... . By the upper triangular structure, we get ... . So we have the following corollary.

Corollary 4. The closest vector problem for $(D_n^{(m)})^{*}$ can be efficiently solved.
Now we briefly discuss the CVP for the lattice ..., where ... . For any vector ..., it is easy to compute its projection onto the plane ... . So, to find the closest lattice point, we can always assume that the target vector satisfies ... . Let ... . If ..., then ... is exactly the closest point to the target vector. But if not, we should make the slightest changes on ... so that the resulting vector, say ..., is in the lattice. This is equivalent to finding an integer vector ... such that ... and the norm ... is minimal.
References
1. L. Babai. On Lovász' lattice reduction and the nearest lattice point problem. Combinatorica, 6 (1986), 1-13.
2. P. van Emde Boas. Another NP-complete problem and the complexity of computing short vectors in a lattice. Report 81-04, University of Amsterdam.
3. J. H. Conway and N. J. A. Sloane. Fast quantizing and decoding algorithms for lattice quantizers and codes. IEEE Transactions on Information Theory, 28 (1982), 227-232.
4. J. H. Conway and N. J. A. Sloane. Sphere Packings, Lattices, and Groups, 3rd ed. New York: Springer-Verlag, 1998.
5. A. K. Lenstra, H. W. Lenstra, Jr. and L. Lovász. Factoring polynomials with rational coefficients. Math. Ann., 261 (1982), 513-534.
6. D. Micciancio and S. Goldwasser. Complexity of Lattice Problems: A Cryptographic Perspective. Kluwer Academic Publishers, 2002.
7. C. P. Schnorr. A hierarchy of polynomial time lattice basis reduction algorithms. Theoret. Comput. Sci., 53 (1987), 201-224.
Proposing a New Architecture for Adaptive Active Network Control and Management System

Mahdi Jalili-Kharaajoo, Alireza Dehestani, and Hassan Motallebpour

Iran Telecommunication Research Center (ITRC)
[email protected], {dehstani, motalleb}@itrc.ac.ir
Abstract. In this paper, the general architecture of adaptive control and management in active networks is presented. The proposed Adaptive Active Network Control and Management System (AANCMS) merges technology from network management and distributed simulation to provide a unified paradigm for assessing, controlling and designing active networks.
1 Introduction
Active Networking (AN) is an emerging field which leverages the decreasing cost of processing and memory to add intelligence to network nodes (routers and switches) in order to provide enhanced services within the network [1,2]. The discipline of active networking can be divided into two sub-fields: Strong and Moderate AN. In Strong AN, users inject program-carrying capsules into the network to be executed in the switches and routers. In Moderate AN, the network provider provisions code into the routers to be executed as needed. This code can provide new network-based services, such as active caching and congestion control, serve as a mechanism for rapidly deploying new protocol versions, and provide a mechanism to monitor, control, and manage networks [3]. The most significant trends in network architecture design are being driven by the emerging needs for global mobility, virtual networking, and active network technology. The key property common to all these efforts is adaptability: adaptability to redeploy network assets, to rewrite communication rules, and to make dynamic insertion of new network services 1 a natural element in network operations. In this paper, an Adaptive Active Network Control and Management System (AANCMS) is proposed. Our architecture is designed to actively control, monitor, and manage both conventional and active networks, and to be incrementally deployable in existing networks. The AANCMS is focused on an active monitoring and control infrastructure that can be used to manage networks. The rest of the paper is organized as follows: in Section 2 we explain the architecture of an active network node. Section 3 presents the basic structure of AANCMS. Some comments on distributed simulation

1
In this paper, the term network service refers to a resource made available through the network that provides a well-defined interface for its utilization.
are made in Section 4. Finally, the paper is concluded in Section 5.
2 Active Network Node Architecture
Active networking technology signals the departure from the traditional store-and-forward model of network operation to a store-compute-and-forward model. In traditional packet-switched networks, such as the Internet, packets consist of a header and data. The header contains information such as source and destination addresses that is used to forward the packet to the next element that is closer to the destination. The packet format is standardized, and processing is limited to looking up the destination address in the routing tables and copying the packet to the appropriate network port. In active networks, packets consist not only of header and data but also of code. This code is executed on the active network element upon packet arrival. Code can be as simple as an instruction to re-send the packet to the next network element toward its destination, or it may perform some computation and return the result to the originating node. Additionally, it is possible for these packets to install code whose lifetime exceeds the time needed for the active packet itself to be processed. Software modules that are installed in this fashion are called active extensions. Active extensions facilitate software upgrades, new protocol implementations, and system and network monitoring agents. Other potential applications that need control functionality to be installed on demand are also made possible.
3 AANCMS Structure
In addition to work in the active network community, new standards are being proposed to assist in the distribution and maintenance of end-user applications [4-6]. While the trend toward adaptable protocol and application-layer technologies continues, the control and assessment of such mechanisms leaves open broader questions. Future networks could greatly benefit from simulation services that would allow network engineers to experiment with new network technologies on live network operations without compromising service. Live traffic-based simulation services would give engineers insight into how a proposed alteration would affect a network, without committing the network to potentially disruptive consequences. Finally, the management of adaptive networks would greatly benefit from sophisticated monitoring tools to help assess the effects of runtime alterations and detect when those effects result in activity outside a boundary of desired operation. AANCMS is intended to streamline and, at the same time, enrich the management and monitoring of active networks, while adding new support to the network management paradigm to assist network designers. The AANCMS is pursuing a unified paradigm for managing change in active network computing environments. Underlying this framework is a conceptual model for how elements of technology from network management, distributed simulation, and active network research can be combined under a single integrated environment. This conceptual model is illustrated in Fig. 1.
Fig. 1. Conceptual Framework of AANCMS.
AANCMS gains from discrete active networking the ability to dynamically deploy engineering, management, and data transport services at runtime. AANCMS leverages this capability with network management technology to (1) integrate network and system management with legacy standards (SNMP, CMIP) to provide a more flexible and scalable management framework, (2) dynamically deploy mechanisms to collect network statistics to be used as input to network engineering tools and higher-level assessment tools, and (3) assist network operators in reacting to significant changes in the network. AANCMS targets an active network environment, where powerful design and assessment capabilities are required to coordinate the high degree of dynamism in the configuration and availability of services and protocols. To this end, we have formulated architecture of a network management and engineering system that, while inheriting some components from current NM technology, introduces distributed simulation as an additional tool for design and performance assessment. Some components of the AANCMS architecture map very well to already existing technology. Recognizing this, the architecture has been explicitly designed to accommodate other network management engineering solutions. The AANCMS architecture is divided into data, assessment, and control layers. Fig. 2 shows how the data and information flow through the layers. The data layer operates at the data packet level and offers a set of services for the manipulation of network data. The assessment layer performs analytical reviews of network behavior to extract relevant semantic information from it. The control layer performs higher-order functions based on expert knowledge. The AANCMS architecture has been designed to reuse and integrate software components derived from significant advances in network alarm correlation, fault identification, and distributed intrusion detection. In particular, the assessment and control layers of the AANCMS architecture perform tasks analogous to alarm correlation and fault analysis of the types currently proposed by network management expert systems [7,8]. All the components constituting these logical layers may be independently deployed and configured on machines throughout the network using
Fig. 2. AANCMS architecture.
common system management support. The implementation of each of these logical layers may use (1) existing non-active technology properly fitted to be dynamically deployed (thus implementing the discrete active networking approach) or (2) new active networking technology. AANCMS may distribute data-layer services on machines across domains,2 but deploys assessment and control layer services on machines within the domain they manage. Several control services may then cooperate at the inter-domain level to exchange information for making better control decisions about their respective domains.3 The following sections describe the data layer, which embodies the most innovative features of our architecture. The assessment and control layers will not be discussed further in this paper. The foundation of the AANCMS architecture is the data layer, which is composed of engineering, monitoring, and data transport services. Although presented as a single layer, it is useful to recognize and distinguish the various modules that may populate this layer. For this reason, we decompose the data layer into three distinct data service types, all of which may benefit from dynamic deployment in the network.
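To make the idea of runtime deployment concrete, the following sketch models a node that installs and retires data-layer services of the three types just named. All class, function, and type names here are hypothetical illustrations of the concept, not part of AANCMS.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# The three data-layer service types named in the text.
SERVICE_TYPES = {"engineering", "monitoring", "transport"}

@dataclass
class DataLayerNode:
    """Hypothetical active node installing data-layer services at runtime."""
    services: Dict[str, Callable[[bytes], bytes]] = field(default_factory=dict)

    def deploy(self, name: str, kind: str,
               handler: Callable[[bytes], bytes]) -> None:
        if kind not in SERVICE_TYPES:
            raise ValueError(f"unknown data-layer service type: {kind}")
        self.services[name] = handler        # dynamic deployment of a service

    def retire(self, name: str) -> None:
        self.services.pop(name, None)        # services may be short-lived

    def on_packet(self, packet: bytes) -> bytes:
        for handler in self.services.values():   # every installed service
            packet = handler(packet)             # sees the packet in turn
        return packet

# Example: a byte-counting monitor whose statistics could feed the
# assessment layer as input to higher-level tools.
stats = {"bytes": 0}

def count_bytes(packet: bytes) -> bytes:
    stats["bytes"] += len(packet)
    return packet

node = DataLayerNode()
node.deploy("byte-monitor", "monitoring", count_bytes)
node.on_packet(b"hello")
assert stats["bytes"] == 5
```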
4 Distributed Simulation
Adaptable and configurable networks will require code repositories to store and retrieve deployable applications. This idea has already appeared in several network management designs where deployable monitors can be dynamically inserted at key points in a network. Under AANCMS we are reusing and extending these concepts in the development of generic and reusable simulation models, which are deliverable as part of an AANCMS simulation service. In particular, we are developing simulation models that allow network engineers to compose and design experiments dynamically, which may then use traffic models derived from network traffic

2 In this context, a domain consists of a collection of software and hardware objects managed by a single administrative authority.
3 A discussion about inter-domain information exchange between control services is beyond the scope of this paper.
observed from spatially distributed points in a network. The traffic models may be (more traditionally) derived at the NM station and then re-exported to the simulation nodes or derived in the network itself through a distributed modeling approach (i.e. deploy a specialized monitoring application that creates and feeds the models to the network engineering services).
5 Conclusion
In this paper, an adaptive control and management system for active networks (termed AANCMS) was proposed. In AANCMS, network monitoring, control, and design can coexist in an integrated paradigm. The synergy of combining distributed simulation, network monitoring, and active networking will dramatically increase the power of network management and engineering.
References
[1] D. L. Tennenhouse and D. J. Wetherall. Towards an active network architecture. ACM Computer Communication Review, 26(2):5-18, Apr. 1996.
[2] K. L. Calvert, S. Bhattacharjee, E. Zegura, and J. P. Sterbenz. Directions in active networks. IEEE Communications Magazine, 36(10), Oct. 1998.
[3] A. W. Jackson, J. P. G. Sterbenz, M. N. Condell, R. R. Main. Active network monitoring and control. Proc. DARPA Active Networks Conference and Exposition (DANCE'02), 2002.
[4] A. van Hoff, J. Giannandrea, M. Hapner, S. Carter, and M. Medin. The HTTP Distribution and Replication Protocol. http://www.marimba.com/standards/drp.html, August 1997.
[5] A. van Hoff, H. Partovi, and T. Thai. Specification for the Open Software Description (OSD) Format. http://www.microsoft.com/standards/osd/, August 1997.
[6] S. Crane, N. Dulay, H. Fossa, J. Kramer, J. Magee, M. Sloman, and K. Twidle. Configuration Management for Distributed Software Services. Integrated Network Management IV, 1995.
[7] I. Rouvellou and G. W. Hart. Automatic Alarm Correlation for Fault Identification. INFOCOM '95.
[8] P. A. Porras and P. G. Neumann. EMERALD: Event Monitoring Enabling Responses to Anomalous Live Disturbances. Proceedings of the National Information Systems Security Conference, Baltimore, MD, 1997.
A Path Based Internet Cache Design for GRID Application

Hyuk Soo Jang 1, Kyong Hoon Min 3, Wou Seok Jou 2, Yeonseung Ryu 1, Chung Ki Lee 1, and Seok Won Hong 1

1 Department of Computer Software, MyongJi University, San 38-2, Yong In, KyungGi, Korea
2 Department of Computer Engineering, MyongJi University
3 Samsung Electronics Co. Ltd., Core Lab. (WCDMA), Suwon, Korea
Abstract. Internet users typically open multiple windows and surf several sites concurrently, with frequent site changes within a relatively short period of time. This work proposes a web cache organization algorithm that can accommodate such frequent site changes effectively at low cost. The algorithm is based on collected statistics of the visited sites and pattern analysis of the site changes. Our study suggests that the proposed path-based cache scheme dramatically outperforms existing algorithms in hit ratio and response time.
1 Introduction
Most current web cache mechanisms are based on a hierarchical structure like CERN and Harvest/Squid [1]. Each host needs to specify a cache server in advance, while the cache servers are in general hardwired to each other hierarchically [2,3,4]. The cache needs to be organized to adapt to the dynamic usage patterns of current internet users, who are likely to open multiple windows and visit several sites concurrently with frequent site changes within a short period of time. Therefore, we need to build a new cache organization algorithm that adapts to the dynamic site changes of the users. We collect and analyze requested URLs, visited site sequences, and routed paths in client-server GRID environments. The analysis shows that most requests traverse the same route up to a certain branching point, and only a handful of different paths exist thereafter. We categorize the requests into a few groups based on the traversed path. Then each cache, or partition of a cache, is designed to take care of one group.
2 A New Web Cache Organization Algorithm
This paper finds that many URL requests follow a common route until they reach a certain point. Neighboring organizations running similar applications, like GRID, are likely to use the same URLs, ISPs, and routes. Since the number of ISPs is relatively small and the requested URL routes are in most cases directed to a
few ISPs, the routing paths are few enough that it is rather easy to categorize them into a few groups based on the common path. Our study shows that many internet users request similar URLs regardless of their locations when the age, interests, and education level of the users are homogeneous in similar applications.
Fig. 1. A proposed web cache algorithm
The suggested algorithm is depicted in Figure 1. In the first phase, data are collected to find the requested URLs in an organization. In the second phase, the requested URLs are sorted based on their routing paths: URLs sharing common paths are categorized into a group, and the remaining URLs not belonging to any such group are put into a separate group, so that all URLs are classified into several groups. We count the number of requests per group to decide which groups are heavily referenced, and a cache (or a portion of a cache) is preferentially allocated to the heavily referenced groups. In the third phase, actual operation is tested against the users' URL requests: each request is directed to the corresponding cache according to its URL. A sketch of these phases follows.
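The sketch below assumes each collected record is a (URL, route) pair with the route given as a sequence of router hops; it groups URLs by a shared path prefix, sizes cache partitions by request load, and routes a request to its group's partition. The record layout, prefix depth, and allocation rule are our own assumptions, not specifics from the paper.

```python
from collections import Counter, defaultdict

def group_by_path(records, depth=4):
    """Phase 2: cluster requested URLs by the shared prefix of their routes.

    records: iterable of (url, route) pairs, where route is a tuple of
    router hops (a hypothetical layout). Returns groups and per-group load.
    """
    groups = defaultdict(set)
    load = Counter()
    for url, route in records:
        key = tuple(route[:depth])    # common path up to the branching point
        groups[key].add(url)
        load[key] += 1
    return groups, load

def allocate_cache(load, total_slots):
    """Favor heavily referenced groups when partitioning the cache."""
    total = sum(load.values()) or 1
    return {key: max(1, total_slots * count // total)
            for key, count in load.items()}

def route_request(url, groups):
    """Phase 3: direct a request to the cache partition of its group."""
    for key, urls in groups.items():
        if url in urls:
            return key
    return None    # unknown URL: fall back to a default partition

# Toy data mirroring a branch after a shared router (hops are made up).
records = [
    ("http://a.example/", ("r1", "r2", "r3", "r4")),
    ("http://b.example/", ("r1", "r2", "r3", "r4")),
    ("http://c.example/", ("r1", "r2", "r3", "r5")),
]
groups, load = group_by_path(records)
partitions = allocate_cache(load, total_slots=100)
assert route_request("http://b.example/", groups) == ("r1", "r2", "r3", "r4")
```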
3 Performance Analysis of the Algorithm
We test the algorithm on two closely located sites, called "N" and "S", running similar business applications. Three computers at each of "N" and "S" are selected randomly, and the history data of their web browsers are analyzed; each of "N" and "S" uses an almost identical path up to the router 168.126.109.9, as shown in
Fig. 2. Routing paths and statistics
Fig. 3. Experiment results based on scenario 1 and 2
Fig. 4. Hit ratio based on the scenario 2 and 3
Figure 2(a), with several distinct routes thereafter. Based on the 16,000 and 14,000 data items collected from "N" and "S" respectively, we found that there are a few routing paths that most of the requests follow, as shown in Figure 2(b). The four most visited routing paths cover 68% of all requests, and the most preferred path alone accounts for 30%. The performance analysis is based on the following three scenarios: (I) no cache; (II) each of "N" and "S" has its own cache; (III) a single cache is shared by both "N" and "S". In scenario I, neither "N" nor "S" has a cache server at the institutional level; each PC uses only the local cache of its web browser. A cache server is used in scenarios II and III. The size of the cache is measured by the number of URLs it can store. The hit ratio increases in scenario II compared with scenario I, as shown in Figure 3(a). The performance of "N" looks better than that of "S", but the results are not conclusive. The response time is reduced proportionally as the hit ratio increases, as depicted in Figure 3(b). Figure 4 compares scenarios II and III: the hit ratio is not decreased even when a single cache server is shared. This suggests that the total cost can be reduced if the sites share a cache server.
4 Results and Future Work
This paper presents an algorithm for designing a cache system based on the routing paths of the requested URLs. The performance analysis shows that the new algorithm performs well in response time and hit ratio even under rapid site changes by users. Also, the hit ratio is not degraded when a cache server is shared by multiple sites.
References
1. D. Wessels and K. Claffy. ICP and the Squid web cache. IEEE Journal on Selected Areas in Communications, Apr. 1998.
2. L. Fan, P. Cao, and J. Almeida. Summary cache: a scalable wide-area web cache sharing protocol. Proc. ACM SIGCOMM, 1998.
3. K. W. Ross. Hash routing for collections of shared web caches. IEEE Network, Nov. 1997.
4. S. Bhattacharjee, K. L. Calvert, and E. W. Zegura. Self-organizing wide-area network caches. Proc. IEEE INFOCOM, 1998.
On the Application of Computational Intelligence Methods on Active Networking Technology

Mahdi Jalili-Kharaajoo

Young Researchers Club, Azad University, and Iran Telecom Research Center
[email protected]
Abstract. In this paper, we report on the characteristics of Computational Intelligence (CI) technologies and their synergy, and outline recent efforts in the design of a computational intelligence toolkit and its application to routing in a novel active networking environment.
1 Introduction
The events in the area of computer networks during the last few years reveal a significant trend toward open-architecture nodes whose behavior can easily be controlled. This trend has been identified by several developments [1,2], such as:
- emerging technologies and applications that demand advanced computations and perform complex operations;
- sophisticated protocols that demand access to network resources;
- research toward open-architecture nodes.
Active Networks (AN), a technology that allows flexible and programmable open nodes, has proven to be a promising candidate to satisfy these needs. AN is a relatively new concept, which emerged from the broad DARPA community in 1994-95 [1,3,4]. In AN, programs can be "injected" into devices, making them active in the sense that their behavior and the way they handle data can be dynamically controlled and customized. Computational Intelligence (CI) techniques have been used for many engineering applications [5,6]. CI is the study of the design of intelligent agents. Due to the highly nonlinear behavior of telecommunication systems and the uncertainty in their parameters, the use of CI and Artificial Intelligence (AI) techniques in these systems has increased widely in recent years [7,8]. However, these techniques never really made it into production systems, for two basic reasons: the first, already mentioned above, is that up to now the primary concern was to address infrastructural issues and algorithmic simplicity; the second is that researchers hardly had the opportunity to implement their work on real networking equipment. In this paper, the application of CI and AI techniques to active network technology is studied. CI can be employed to control prices within the market or be involved in the decision process.
2 Active Networks
Active networking technology signals the departure from the traditional store-and-forward model of network operation to a store-compute-and-forward model. In traditional packet-switched networks, such as the Internet, packets consist of a header and data. The header contains information such as source and destination addresses that is used to forward the packet to the next element that is closer to the destination. Apart from obvious practical advantages, there are several properties which make active networks attractive for the future of global networking, as a form of agreement on network operation for interactions between components that are logically or physically distributed among the network elements. A number of reasons have contributed to a very long standardization cycle, as observed in the activities of the Internet Engineering Task Force. Most importantly, the high cost of deploying a new function in the infrastructure required extreme care and experimentation before the whole community would agree that a standardized protocol or algorithm is good enough. The key component enabling active networking is the active node, a router or switch containing the capabilities to perform active network processing.
3 Computational Intelligence
Computational Intelligence (CI) is an area of fundamental and applied research involving numerical information processing (in contrast to the symbolic information processing techniques of Artificial Intelligence (AI)). Nowadays, CI research is very active, and consequently its applications are appearing in some end-user products. The definition of CI can be given indirectly by observing the properties exhibited by a system that employs CI components: a system is computationally intelligent when it deals only with numerical (low-level) data, has a pattern recognition component, and does not use knowledge in the AI sense; and, additionally, when it (begins to) exhibit computational adaptivity, computational fault tolerance, speed approaching human-like turnaround, and error rates that approximate human performance. The major building blocks of CI are artificial neural networks, fuzzy logic, neuro-fuzzy systems, and evolutionary computation. In the following, these topics will be briefly reviewed.
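Since fuzzy logic is named above as a CI building block, the sketch below shows the kind of elementary component such a toolkit might contain: a two-rule Mamdani-style inference that maps normalized load and delay to a congestion degree. The membership functions, rules, and function names are illustrative assumptions, not from the paper.

```python
def tri(x, a, b, c):
    """Triangular membership function supported on [a, c] with peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def congestion_degree(load, delay):
    """Two-rule Mamdani-style inference (illustrative only).

    Rule 1: IF load is high AND delay is high THEN congestion is high.
    Rule 2: IF load is low  OR  delay is low  THEN congestion is low.
    Inputs are normalized to [0, 1]; output is a crisp congestion degree.
    """
    load_high, load_low = tri(load, 0.5, 1.0, 1.5), tri(load, -0.5, 0.0, 0.5)
    delay_high, delay_low = tri(delay, 0.5, 1.0, 1.5), tri(delay, -0.5, 0.0, 0.5)

    w_high = min(load_high, delay_high)   # fuzzy AND -> min
    w_low = max(load_low, delay_low)      # fuzzy OR  -> max
    if w_high + w_low == 0.0:
        return 0.5                        # no rule fires: stay neutral
    return (w_high * 1.0 + w_low * 0.0) / (w_high + w_low)

assert congestion_degree(0.9, 0.8) == 1.0   # heavy load and delay
assert congestion_degree(0.1, 0.2) == 0.0   # lightly loaded network
```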
4 Application of Computational Intelligence in Active Networks
In the previous sections, we described in summary the foundations of computational intelligence techniques as a set of tools for solving difficult optimization problems, and active networking as a novel networking infrastructure technology. In this section, we will first elaborate on the features of active networking that make it attractive for applying computational intelligence techniques, since the problems that
we want to solve are optimization problems, the implementation domain needs to be mapped to the optimization domain. We developed a market-based approach to controlling active network resources that naturally bridged decision and optimization. Based on this, we then provide a number of simple problems that can be effectively dealt within this framework. We also discuss implications and a large set of problems that can be dealt with in a similar fashion. The features of active networks that are appealing to us in our attempt to utilize computational intelligence techniques are mainly:
- programmability
- mobility
- distributivity
4.1 Application of Computational Intelligence to Providing an Economic Market Approach to Resource Management
In this section, a market-based resource management architecture for active networks is adopted. The control problem is cast as a resource allocation problem, where resources are traded within the computational market by the exchange of resource access rights. Resource access rights are simply credentials that enable a particular entity to access certain resources in a particular way. These credentials result from market transactions, thus transforming an active network into an open service market. Dynamic pricing is used as a congestion feedback mechanism to enable applications to make policy-controlled adaptation decisions. The establishment of contracts between user and application, as well as between autonomous entities implementing the role of resource traders, is emphasized. While this architecture is more oriented to end systems, the enhancements and mechanisms described are well suited for use in active networks. Market-based control has also been applied to other problems such as bandwidth allocation, operating system memory allocation, and CPU scheduling. The control problem can be redefined as the problem of profit or utility maximization for each individual player. Access rights are made available by agents called resource brokers, which set the rules of the market by controlling price and monitoring availability. This model is used to facilitate trading of services such as connectivity, bandwidth, and CPU on the active network elements, without placing questionable assumptions on the cooperative or competitive motives of the individual agents that populate the network elements. The model and the components of this resource management framework are candidates for employing CI for optimization based on price and utility. CI can be used both for setting prices and for taking decisions based on price and other parameters that might be required by a particular service.
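As a concrete illustration of the dynamic-pricing feedback loop just described, the following sketch pairs a hypothetical resource broker, which raises its price as utilization grows, with a consumer that buys only when its utility exceeds the quoted cost. The update rule, parameters, and names are our assumptions, not the paper's model.

```python
class ResourceBroker:
    """Hypothetical broker: raises the price of access rights as utilization
    grows (congestion feedback) and lowers it when capacity sits idle."""

    def __init__(self, capacity, price=1.0, sensitivity=0.5):
        self.capacity = capacity
        self.in_use = 0
        self.price = price
        self.sensitivity = sensitivity

    def quote(self, units):
        return units * self.price

    def buy(self, units):
        if self.in_use + units > self.capacity:
            return False                  # access right not granted
        self.in_use += units
        utilization = self.in_use / self.capacity
        # Multiplicative update around a 50% utilization target (assumed).
        self.price *= 1.0 + self.sensitivity * (utilization - 0.5)
        return True

def consumer_decision(broker, units, utility_per_unit):
    """Buy only if utility exceeds cost: the policy-controlled adaptation."""
    return (utility_per_unit * units > broker.quote(units)
            and broker.buy(units))

broker = ResourceBroker(capacity=100)
assert consumer_decision(broker, units=10, utility_per_unit=2.0)
assert not consumer_decision(broker, units=10, utility_per_unit=0.5)
```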
4.2 Application of Computational Intelligence for Routing
Routing is one of the most fundamental and at the same time most complex control problems in networking. Its function is to specify a path between two elements for the transmission of a packet or the set-up of a connection for communicating packets. There is usually more than one possible path, and the routing function has to identify the one that is most suitable, taking into consideration factors such as cost, load, and connection and path characteristics. In best-effort networks, such as the Internet, routing is based on finding the shortest path, defined as the path with the least number of "hops" between source and destination. For more advanced network services, such as the transmission of multimedia streams that require qualitative guarantees, routing considers factors such as connection requirements (end-to-end delay, delay variation, mean rate) and current (or future) network conditions. Furthermore, the information available to the decision process might be inaccurate or incomplete. Given the above, routing becomes a complex problem with many aspects, including the perspective of a multi-objective optimization problem. Maximization of resource utilization or overall throughput, minimization of rejected calls, delivery of quality-of-service guarantees, fault tolerance, stability, security, and consideration of administrative policy are just a few of the requirements for an acceptable solution. There are also issues of active organization, and approaches for quality-of-service routing restricted to the case of routing connections with specific bandwidth requirements. Our goal here is to address the routing problem using CI. The solution we propose involves the following components:
- Roaming agents move from element to element to collect and distribute information on network state.
- Routing agents, at each network element, are responsible for spawning roaming agents and are also the recipients of the information collected by them.
- The CI engine is a set of active extensions that include several subcomponents, which form a generic library-like algorithmic infrastructure.
The components we have currently implemented for the CI engine are:
- an Evolutionary Fuzzy Controller (EFC),
- a Stochastic Reinforcement Learning Automaton (SELA), and
- an Evolutionary Fuzzy Time Series Predictor (EFTSP).
New components and improved algorithms can easily be added to the architecture; a simplified sketch of the agent interplay is given at the end of this section. With respect to policy, the economy acts as a form of policy enforcement. Prohibited actions, for example sending roaming agents through a network cloud that does not allow them, can simply be implemented by a high (or infinite) cost for the active packet forwarding service. This has two aspects: the ability of the system to run without administrator/owner/operator privileges, and the ability of the roaming agents to reconstruct routing tables in an otherwise collapsed infrastructure. Of course, there might be a high cost for this process, but there are obvious cases (for example, medical applications) where fault tolerance is worth paying for. Most of the above features cannot be easily attributed to CI or active networks alone. Adaptivity, however, is a clear benefit of the application of CI. Further investigation might reveal further benefits
from this approach; however, at the current stage the above are the first positive indications of the success of our approach.
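The sketch referenced above is a highly simplified illustration of the agent interplay: a roaming agent random-walks the topology accumulating link costs, and the routing agent that spawned it folds each report into a per-destination estimate. An exponential moving average stands in for the CI engine here; the names, walk strategy, and update rule are our assumptions, not the actual EFC/SELA/EFTSP implementations.

```python
import random

class RoutingAgent:
    """Sits at a network element, spawns roaming agents, absorbs reports."""

    def __init__(self, node, alpha=0.3):
        self.node = node
        self.alpha = alpha
        self.estimates = {}    # destination -> estimated path cost

    def absorb(self, destination, observed_cost):
        old = self.estimates.get(destination, observed_cost)
        # Exponential moving average as a stand-in for the CI engine.
        self.estimates[destination] = (1 - self.alpha) * old \
            + self.alpha * observed_cost

def roam(topology, link_costs, source, destination, max_hops=16):
    """Roaming agent: random walk accumulating link costs to a destination."""
    node, total = source, 0.0
    for _ in range(max_hops):
        if node == destination:
            return total
        nxt = random.choice(topology[node])
        total += link_costs[(node, nxt)]
        node = nxt
    return None    # the agent expired before reaching the destination

# Toy triangle topology with unit link costs.
topology = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
link_costs = {(u, v): 1.0 for u in topology for v in topology[u]}
agent = RoutingAgent("A")
for _ in range(20):                      # spawn a batch of roaming agents
    cost = roam(topology, link_costs, "A", "C")
    if cost is not None:
        agent.absorb("C", cost)
```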
5 Conclusion
In this paper, some applications of computational intelligence techniques in active networking technology were presented. One such application is a novel approach to routing in active networks. Another interesting application area is problems with a tighter feedback cycle, such as traffic shaping and policing. It is our belief that this is an open area of research and that this work is a stepping stone toward tackling more complex problems.
References
[1] D. L. Tennenhouse et al. A Survey of Active Network Research. IEEE Commun. Mag., vol. 35, no. 1, Jan. 1997, pp. 80-86.
[2] R. Boutaba, A. Polyrakis. Projecting Advanced Enterprise Network and Service Management to Active Networks. IEEE Network, January/February 2002, pp. 28-33.
[3] K. Psounis. Active Networks: Applications, Security, Safety, and Architectures. IEEE Commun. Surveys, vol. 2, no. 1, 1st qtr. 1999.
[4] J. M. Smith. Activating Networks: A Progress Report. Computer, vol. 32, no. 4, Apr. 1999, pp. 32-41.
[5] W. Pedrycz. Computational Intelligence: An Introduction. CRC Press, Boca Raton, 1997.
[6] M. Jalili-Kharaajoo and H. Ebrahimirad. Improvement of second order sliding mode control applied to position control of induction motors using fuzzy logic. Lecture Notes in Artificial Intelligence (Springer-Verlag), Proc. IFSA 2003, Istanbul, Turkey, 2003.
[7] G. Ascia, V. Catania, G. Ficili, S. Palazzo, and D. Panno. A VLSI fuzzy expert system for real-time traffic control in ATM networks. IEEE Transactions on Fuzzy Systems, 5(1), pp. 20-31, 1997.
[8] J. Bigham, L. Rossides, A. Pitsillides and A. Sekercioglu. Overview of Fuzzy-RED in Diff-Serv Networks. LNCS 2151, pp. 1-13, 2001.
Grid Computing for the Masses: An Overview

Kaizar Amin 1,2, Gregor von Laszewski 1, and Armin R. Mikler 2

1 Argonne National Laboratory, Argonne, IL, U.S.A.
2 University of North Texas, Denton, TX, U.S.A.
Abstract. The common goals of the Grid and peer-to-peer communities have brought them into close proximity. Both technologies overlay a collaborative resource-sharing infrastructure on existing (public) networks. In realizing this shared goal, however, they concentrate on significantly contrasting issues. The Grid paradigm focuses on performance, control, security, specialization, and standardization. The peer-to-peer paradigm, on the other hand, concentrates on fault tolerance, resilience, decentralization, and peer cooperation. In this paper, we discuss Grid usage models including traditional Grids, ad hoc Grids, and federated Grids. We compare these approaches to peer-to-peer computing and discuss the issues involved in the convergence of the two paradigms.
1 Introduction
The term "Grid computing" [1,2] is commonly used to refer to a distributed infrastructure that promotes large-scale resource sharing in a dynamic multi-institutional "virtual organization" (VO). A computational Grid is conceptually based on the principles of an electric power grid. A large number of electric power generating plants interconnect with one another, providing standardized, reliable, cheap, and ubiquitous access to electric power. Similarly, a computational Grid forms a closed network of a large number of pooled resources providing standardized, reliable, specialized, and pervasive access to high-end computational resources. Typically, in order to establish a computational Grid, several institutions pool their resources such as computational cycles, specialized software, database servers, network bandwidth, and people. Thereafter, global policies for the VO are established that identify the roles and responsibilities of participating entities. Well-trained professional administrators associated with the participating institutions enforce the global VO and local domain policies. Based on these policies, the Grid administrators provide security credentials to the Grid users, who can access the distributed Grid resources within the scope of their credentials irrespective of their geographical positions. Several applications and infrastructures have been proposed in the literature that can significantly benefit from the Grid concept [1]. Applications that can leverage more than one supercomputer can benefit from a distributed supercomputer created as a Grid by pooling several supercomputers. Applications containing "pleasantly parallel" subtasks can take advantage of the Grid to co-allocate a large number of distributed compute resources in parallel [3,4]. Data-intensive applications can use specialized data stores and replica systems available in the Grid to store and retrieve large numbers of
datasets. Advanced collaborative applications [5] can use the interactive features of the Grid to provide enhanced human-to-human interaction.

Irrespective of the application, these Grid infrastructures have some common characteristics [6]. A computational Grid is "collaborative": it comprises heterogeneous resources that are managed by more than one entity, and such distributed pooling of resources within the same institution or across multiple institutions requires significant collaboration among the participating entities. The Grid architecture respects the institutional policies of its collaborators by giving such policies preference over the global Grid policy. This not only allows collaborating institutions to protect intellectual property but also provides enough flexibility for other institutions to participate in the Grid.

A computational Grid provides non-trivial "quality of service" (QoS) assurances. High connectivity is maintained between resources via dedicated high-speed networks. Further, the Grid services themselves offer advanced high-level functionality that enables sophisticated science and commerce. A well-established Grid administration by dedicated human resources facilitates constant connectivity, monitoring, and fault tolerance in a Grid.

The Grid architecture is "standardized". In the early days of the Grid, the lack of standards for Grid service development resulted in several non-interoperable Grid middleware frameworks [7,8,9,3]. Recently, however, the Global Grid Forum (GGF) [10] has assumed the responsibility for coordinating the standardization of Grid developments. The Open Grid Services Architecture (OGSA) [11] initiative of the GGF defines the artifacts for a standard service-oriented Grid framework based on the emerging W3C-defined Web services technology [12]. A service-oriented Grid framework provides a loosely coupled, technology- and platform-independent integration environment, allowing different vendors to offer Grid-enabled services in a variety of technologies while conforming to the GGF-defined OGSA standards, thus making them interoperable.
2 Grid Usage Models In this section we discuss the usage model attributed to contemporary Grid frameworks. We also explore other usage patterns that can enhance the benefits of the Grid to a larger community.
2.1 Traditional Grids The popularity of the Grid architecture is evident from the large number of advanced scientific and commercial projects that are deploying the Grid framework. Some success stories include the DOE Science Grid [13], the European Union DataGrid [14], the Grid Physics Network (GriPhyN) [15], the Information Power Grid [16], the National Fusion Grid [17], the National Research Grid Initiative [18], the Network of Earthquake Engineering Simulation (NEES) Grid [19], the Particle Physics Data Grid [20], and the TeraGrid [21].
A prominent characteristic of each of these Grid applications is that it constitutes a “closed” network of resources. For example, the NASA Information Power Grid is available only to engineers and scientists employed by NASA for official business and research. Similarly, GriPhyN is open only to experimental physicists. In other words, the current Grid usage model caters to the needs of certain “classes”. Unless one belongs to a research or commercial organization, it is quite difficult to get access to one of the high-performance Grid infrastructures. Further, the administrative overhead involved in the initial Grid setup makes it non-trivial for an individual or small organization to establish personal Grids at will. Proponents of the current usage model justifiably argue that the philosophy of these collaborative Grids is to support the computational and data-intensive needs of an elite few, rather than providing Grid access to the “masses”. The tightly controlled administrative mechanism enables Grid service providers to offer the promised QoS guarantees. Further, contemporary Grids have a highly segregated role-based usage [1]. An individual interacting with the Grid can be conveniently categorized as one of the following: service provider, service developer, administrator, or end-user [22]. Although there can be some degree of overlap between associated roles, an extreme mixture of these roles is explicitly avoided. Such a well-defined role-based interaction facilitates a separation of concerns, allowing the end-user to concentrate on science or commerce, the service provider to concentrate on QoS assurances, the service developer to concentrate on Grid protocols, and the administrator to focus on enforcement of access control and organizational policies.
2.2 Ad Hoc Grids Even though a tightly controlled Grid framework provides the required QoS parameters, it is highly restrictive in expanding its usage beyond the traditionally proposed model. As discussed previously, it is non-trivial for individuals not belonging to advanced scientific, academic, or commercial institutions with a Grid-vision to collaborate with fellow peers at random in a Grid environment. In other words, the current Grid usage model does not facilitate “ad hoc” Grid establishment [23]. Advocates of the conventional Grid architecture may reasonably argue that, similar to the power Grid, the computational Grid provides persistent and reliable service to its users. However, a large number of scenarios require transient, short-lived collaboration that needs to be supported by the Grid [24]. For example, consider the following case. A group of geographically separated scientists require ad hoc short-term collaboration and resource sharing in a secure environment to evaluate different experimental simulations of a thermochemistry application. One scientist contributes the simulation service, one pools a visualization service to render the results of the simulated experiment, another scientist provides the data repository storing the input datasets for the experiment, and a few others want to interactively discuss the final results in an educational setting. Although simple, this example is representative of a large class of collaborative applications developed as a part of multi-domain sciences. To implement such an application using the current usage model, the scientists would need to formally establish a Grid virtual organization (VO) defining appropriate use policy and describing individual contributions and responsibilities. The VO would then
assign administrative privileges to a dedicated entity, who would then create Grid credentials for every other entity within the VO. All of the participating entities would need to support the appropriate Grid middleware [25] and expose their services as a part of this middleware. Once the administrative functions were performed, each user would be able to interact with the Grid within the context of his own rights. Although this setup provides the required functionality, the administrative overhead to establish such a short-term community (possibly one-time collaboration) surpasses its utility. Clearly, there is a need for a set of Grid tools to realize such an ad hoc Grid, thereby increasing the user base within the Grid community.
2.3 Federated Grids Motivated by the success of volunteer computing architectures such as seti@home [26], distributed.net [27], and grid.org [28], we further extend the concept of ad hoc Grids to form “federated Grids”. A federated Grid is a generic Grid architecture where a resource contribution is not limited to active collaborators alone. Individuals can pool their resources (idle computational cycles) to enhance the computational power of the Grid. One of the greatest problems with the current volunteer computing model is the lack of motivation to encourage the “masses” to contribute their resources. Success of some initial applications can be attributed to the “cool” factor of the underlying science. However, without providing sufficient incentive to the resource contributor, this does not constitute a viable economic model. Further, these architectures represent a master/slave paradigm. A single privileged master has the authority to attribute independent tasks to a large number of slaves. The slaves do not have the appropriate authorization to submit their compute-intensive tasks to the pool of available resources. In a federated Grid framework, we can overlay an economic model on the Grid, providing sufficient incentive to the resource providers to contribute their resources. Further, we shift from the master/slave model of volunteer computing to a more flexible model allowing resource providers to also become resource consumers. For example, commercial institutions can contribute their idle resources to a federated Grid in return for computational power on demand at a later time. This paradigm exposes the Grid to an important research domain of usage economics and brokering [29]. A similar ideology has been adopted in electrical power Grids with great success [30,31]. Sophisticated solar and wind electricity generation devices are installed in homes and other establishments. Advanced control mechanisms feed all excess electricity generated by these devices into the power Grid. When the system does not produce enough power, electricity is extracted from the Grid. The users pay only for the net electricity used by them. Such an economic use-case serves as a model for computational Grids too. On a larger scale, one can envision a universal version of federated Grids as ubiquitous “public Grids” or “pervasive Grids”. The concept of public Grids in computing is analogous in principle to the vision of the Internet in information science. However, the degree of complexity in public Grids is extremely high when compared with that of the contemporary Internet. Users could access the public Grids ubiquitously as service providers or consumers or both. Further, these users could dynamically establish virtual
organizations in an ad hoc fashion, thereby overlapping the functionalities of traditional as well as ad hoc Grids. Based on their personal credentials and preferences, peers could participate in these ad hoc transient groups enabling collaborative resource sharing.
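Continuing the net-metering analogy, the sketch below shows a deliberately naive credit ledger for a federated Grid in which every peer is both provider and consumer and pays only for its net resource usage. All names and the accounting rule are our assumptions; a realistic economic model [29] would price resources dynamically rather than treating all CPU-hours as equal.

```python
from collections import defaultdict

class NetMeteringLedger:
    """Naive credit ledger for a federated Grid: peers are both providers
    and consumers, and effectively pay only for their net resource usage."""

    def __init__(self):
        self.balance = defaultdict(float)    # peer -> CPU-hours of credit

    def contribute(self, peer, cpu_hours):
        self.balance[peer] += cpu_hours      # feed idle cycles into the grid

    def consume(self, peer, cpu_hours):
        if self.balance[peer] < cpu_hours:
            return False                     # insufficient credit: denied
        self.balance[peer] -= cpu_hours      # draw computation on demand
        return True

ledger = NetMeteringLedger()
ledger.contribute("lab-A", 120.0)            # overnight idle desktops
assert ledger.consume("lab-A", 80.0)         # burst demand the next day
assert ledger.balance["lab-A"] == 40.0
```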
3 Grid and Peer-to-Peer Computing
The term peer-to-peer (p2p) computing refers to an ad hoc, dynamic, unstable, and self-organizing distributed model that assists collaborative community formation and resource sharing at the edge of the network. In a p2p environment, there is no distinction between a resource user (client) and a resource provider (server); each is commonly referred to as a "peer". Several aspects of ad hoc Grids and federated Grids have already been showcased within the p2p community. Ad hoc collaborative file-sharing applications such as Napster [32] and Gnutella [33] have been successfully implemented and widely used by a large p2p community. However, ad hoc Grids focus on issues that go beyond file-sharing mechanisms and address aspects such as advanced security, trust and reputation management, and quality-of-service assurances. As discussed in Section 2.3, several volunteer computing frameworks such as seti@home and distributed.net have demonstrated the usefulness of distributed resource pooling using p2p technologies. However, federated Grids address the much larger problem of converting a master/slave model into a pure p2p model. It is evident that the Grid paradigm and the p2p paradigm have the same end goals: collaborative community formation and shared access to distributed resources. Despite the similarity in philosophy, there are some fundamental differences between the two technologies [34,35].
3.1 Organization
The Grid computing model is a client-server model where the Grid servers offer specialized, reliable, highly advanced, and sophisticated scientific and commercial applications. Grids require a pre-established administrative infrastructure enforcing the VO policy. In other words, the roles, responsibilities, and privileges of the collaborating institutions and users are pre-defined in a Grid environment. These responsibilities and privileges do not change frequently and are maintained by well-trained administrators. The p2p paradigm, on the other hand, provides direct communication between peers without requiring any pre-established management infrastructure. The responsibilities and privileges of the participating entities are not defined a priori and are often in flux. Every peer is responsible for defining and maintaining the access policies for her resource within the community.
3.2 Security

Grids are centrally controlled by dedicated administrators enforcing centralized security policies. The trust level between participating entities is high, which obviates the need for complex reputation and trust enforcement models.
Popular peer-to-peer applications such as Napster, seti@home, and Gnutella lack the concept of a pre-defined trust relationship between participating entities [36]. The absence of a centralized policy enforcement architecture requires p2p applications to deploy advanced distributed trust and reputation management services. Most p2p applications used by the masses assume an unsecured environment at all times: peers cannot trust fellow peers due to the lack of accountability for wrongdoings.
3.3 Scalability

The Grid paradigm concentrates on providing a suite of advanced services to a moderate number of privileged users. Its goal is to provide a high quality of service to a small community, rather than scalability assurances to a large group. Moreover, VO membership in a Grid incurs significant administrative overhead, which makes it difficult to manage a large user base and thus further restricts the scalability of the Grid. The p2p architecture focuses on integrating simple resources for the masses. Popular p2p applications such as Napster and seti@home have reached user bases of millions [36]. The absence of any centrally controlled administrative infrastructure and the distributed nature of resource utilization make p2p applications extremely scalable.
3.4 Quality of Service

Like most client-server models, Grid services are hosted on specialized “high-end” resources including expensive scientific instruments, clusters, and data storage systems. High connectivity is maintained between resources via dedicated high-speed networks. A well-established resource administration facilitates constant resource connectivity, resource monitoring, and fault tolerance. The required quality of service (QoS) is provided by the committed members of the VO based on their pre-agreed Grid policy and their dedication to the overall collaboration. Peer-to-peer systems cannot guarantee any QoS; they provide their services on a best-effort basis. Ongoing efforts in the p2p community aim to improve QoS by deploying dedicated peers (rendezvous peers) to improve resource connectivity and service discovery. However, the availability of such peers cannot be guaranteed, and any QoS assertion is entirely dependent on the organization of the connected peers.
4 Challenges

The concept of ad hoc and federated Grids is intriguing, and the possibilities for its application are many. Nonetheless, several technical challenges need to be addressed within the Grid community before such a paradigm can be adopted. The biggest challenge in p2p Grids is the enforcement of a dynamic and adaptive security model that overlays a secure framework on an insecure network. In the Grid approach the notion of trust and reputation is implicit within a closed VO formation. As discussed earlier, every collaborating entity trusts the other entities. Grid resources are not anonymous; they are accountable for any misconduct. Further, Grid resources adhere to well-established mechanisms for authorization and authentication, restricting the use of
resources by non-collaborating or rogue users. Several security enforcement and policy maintenance models have been proposed within the Grid community [37,38,39,40]. However, all these architectures are based on the assumption of a stable, persistent, and long-term Grid establishment with a small set of seldom-changing users. Hence, these frameworks cannot be applied “out of the box” to the proposed p2p Grid framework. Ad hoc and federated Grids require an adaptive security model that incrementally builds a secure Grid community based on the notion of trust and reputation. Similar security models are being investigated by the p2p research community [41,36]; however, their application in the Grid domain needs to be studied. Providing the adaptive authentication and authorization model described above does not guarantee the safety of Grid resources against spying, sabotage, and destruction. This issue has been addressed in detail by the mobile agent community in the context of protecting a mobile agent from a malicious host and vice versa. However, it needs to be verified whether these principles can be easily extrapolated from the agent context to the Grid context. Grid computing specializes in providing different levels of QoS guarantees. Without such assurances a p2p Grid infrastructure will lose its utility as a Grid environment. Hence, it is imperative to overlay an advanced QoS framework on p2p Grids. A sophisticated QoS environment that delivers the promised assurances despite unreliable, dynamic, and insecure Grid resources is yet to be researched. Peer-to-peer technologies assume the occurrence of system failures and service unavailability and provide mechanisms to adapt to such occurrences. However, the impact of these events on the performance of Grid applications and the effects of their fault-tolerance measures in a Grid context need to be thoroughly investigated. Many computational economic models have been proposed in the literature by the Grid and p2p communities [29,36]. Several of these models could be used in the p2p Grid framework for fair sharing of resources. However, their feasibility and utility in the context of strict yet dynamic security and QoS requirements need to be analyzed. Further, in the absence of a centrally controlled administrative framework, policing and monitoring these self-adaptive and self-configuring Grids is a formidable task. Mechanisms must be established that allow dynamic monitoring of such p2p Grid frameworks. Some researchers believe that social acceptance of such federated Grids is far from reality [42]. They argue that, irrespective of the security, economic, and QoS assurances given by the Grid community, it would be extremely unlikely for research and commercial agencies to execute critical applications in such an unpredictable environment. Although the initial success of volunteer computing infrastructures in low-risk research applications seems encouraging, their wider sociological acceptance by a much larger community remains to be seen.
5 Conclusion

Irrespective of their inherent differences, the Grid and p2p paradigms have some distinct characteristics that complement each other. Grids enable the sharing of specialized and advanced services in a secure environment providing a high quality of service. The p2p
environment offers dynamic, self-organizing, and self-configuring ad hoc collaboratories sharing a large number of resources at the edge of the network. A large number of applications can inherently benefit from combining the characteristics of both these paradigms. Indeed, the amalgamation of the two technologies, resulting in a new computing paradigm, the “peer-to-peer Grid”, will provide the necessary standards, security, and QoS assertions of the Grid paradigm and the ad hoc, self-organizing, and self-configuring attributes of the p2p architecture. This new paradigm can be further enhanced to form ubiquitously available universal public Grids that can provide a computing infrastructure similar to the Internet. Before p2p Grids can become a reality, a large number of issues need to be addressed by the Grid and p2p communities. Issues dealing with decentralized resource management, dynamic security policies, adaptive trust and reputation management, robust QoS delivery, reliable yet distributed monitoring mechanisms, and resilient computational economics need to be solved in a way that is acceptable to both communities. The peer-to-peer research group [43] of the GGF is one example of an initial attempt to address some of these issues. At present the group is undertaking a comparative study of conventional client-server Grids and p2p Grids and suggesting that the OGSA working group incorporate p2p requirements. We hope that our contribution will help this effort.

Acknowledgments. This work was supported by the Mathematical, Information, and Computational Science Division subprogram of the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract W-31-109-Eng-38. DARPA, DOE, and NSF support Globus Project research and development. The Java CoG Kit Project is supported by DOE SciDAC and NSF Alliance.
References

1. I. Foster and C. Kesselman, Eds., The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, July 1998.
2. G. von Laszewski and K. Amin, Grid Middleware. Wiley, 2004, ch. Middleware for Communications, to be published. http://www.mcs.anl.gov/~gregor/papers/vonLaszewski--grid-middleware.pdf
3. D. Thain, T. Tannenbaum, and M. Livny, Grid Computing: Making the Global Infrastructure a Reality. John Wiley, 2003, ISBN 0-470-85319-0, ch. Condor and the Grid.
4. G. Bell and J. Gray, “What’s next in high-performance computing,” Communications of the ACM, vol. 45, no. 2, pp. 91–95, Feb. 2002. http://doi.acm.org/10.1145/503124.503129
5. “The Access Grid Web Page,” Web Page. http://www-fp.mcs.anl.gov/fl/accessgrid/
6. I. Foster, “What is the Grid? A Three Point Checklist,” 22 July 2002. http://www.gridtoday.com/02/0722/100136.html
7. I. Foster, C. Kesselman, and S. Tuecke, “The Anatomy of the Grid: Enabling Scalable Virtual Organizations,” International Journal of Supercomputing Applications, vol. 15, no. 3, 2002. http://www.globus.org/research/papers/anatomy.pdf
8. “Unicore,” Web Page. http://www.unicore.de/
9. A. S. Grimshaw and W. A. Wulf, “The Legion Vision of a Worldwide Virtual Computer,” Communications of the ACM, vol. 40, no. 1, pp. 39–45, January 1997. http://legion.virginia.edu/copy-cacm.html
10. “The Global Grid Forum Web Page,” Web Page. http://www.gridforum.org
11. I. Foster, C. Kesselman, J. Nick, and S. Tuecke, “The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration,” Web page, Jan. 2002. http://www.globus.org/research/papers/ogsa.pdf
12. “World Wide Web Consortium,” Web Page. http://www.w3.org/
13. “DOE Science Grid,” Web Page. http://www.doesciencegrid.org/
14. “The DataGrid Project,” 2000. http://www.eu-datagrid.org/
15. “GriPhyN - Grid Physics Network,” Web page. http://www.griphyn.org/index.php
16. W. E. Johnston, D. Gannon, and B. Nitzberg, “Grids as Production Computing Environments: The Engineering Aspects of NASA’s Information Power Grid,” 1999.
17. K. Keahey, T. Fredian, Q. Peng, D. P. Schissel, M. Thompson, I. Foster, M. Greenwald, and D. McCune, “Computational Grids in Action: The National Fusion Collaboratory,” Argonne National Laboratory, Tech. Rep., 2002. http://www-unix.mcs.anl.gov/~keahey/papers/FusionPaperSubmitted.pdf
18. “National Research Grid Initiative,” Web Page. http://www.naregi.org/
19. T. Prudhomme, C. Kesselman, T. Finholt, I. Foster, D. Parsons, D. Abrams, J.-P. Bardet, R. Pennington, J. Towns, R. Butler, J. Futrelle, N. Zaluzec, and J. Hardin, “NEESgrid: A Distributed Virtual Laboratory for Advanced Earthquake Experimentation and Simulation: Scoping Study,” NEES, Tech. Rep. 2001-02, February 2001. http://www.neesgrid.org/html/TR_2001/NEESgrid_TR.2001-04.pdf
20. “Particle Physics Data Grid,” Web Page, 2001. http://www.ppdg.net/
21. “TeraGrid,” Web Page, 2001. http://www.teragrid.org/
22. G. von Laszewski, E. Blau, M. Bletzinger, J. Gawor, P. Lane, S. Martin, and M. Russell, “Software, Component, and Service Deployment in Computational Grids,” in IFIP/ACM Working Conference on Component Deployment, ser. Lecture Notes in Computer Science, J. Bishop, Ed., vol. 2370. Berlin, Germany: Springer, 20-21 June 2002, pp. 244–256. http://www.mcs.anl.gov/~gregor/papers/vonLaszewski--deploy-32.pdf
23. G. von Laszewski and P. Wagstrom, “Gestalt of the Grid,” in Performance Evaluation and Characterization of Parallel and Distributed Computing Tools, ser. Series on Parallel and Distributed Computing. Wiley, 2003, to be published. http://www.mcs.anl.gov/~gregor/papers/vonLaszewski--gestalt.pdf
24. G. von Laszewski, M.-H. Su, J. A. Insley, I. Foster, J. Bresnahan, C. Kesselman, M. Thiebaux, M. L. Rivers, S. Wang, B. Tieman, and I. McNulty, “Real-Time Analysis, Visualization, and Steering of Microtomography Experiments at Photon Sources,” in Ninth SIAM Conference on Parallel Processing for Scientific Computing, San Antonio, TX, 22-24 Mar. 1999. http://www.mcs.anl.gov/~gregor/papers/vonLaszewski--siamCmt99.pdf
25. “The Globus Project,” Web Page. http://www.globus.org
26. E. Korpela, D. Werthimer, D. Anderson, J. Cobb, and M. Leboisky, “SETI@home-massively distributed computing for SETI,” Computing in Science & Engineering, vol. 3, no. 1, pp. 78–83, January–February 2001.
27. “Distributed.net Homepage,” Web Page. http://www.distributed.net/
28. “grid.org Homepage,” Web Page. http://www.grid.org/
29. R. Buyya, D. Abramson, and J. Giddy, “An Economy Driven Resource Management Architecture for Global Computational Power Grids,” in The 2000 International Conference on Parallel and Distributed Processing Techniques and Applications, Las Vegas, USA, 26-29 June 2000.
30. “The Australian Greenhouse Office: Grid Interactive Systems,” Web Page. http://www.greenhouse.gov.au/renewable/power/grid.html
31. “Massachusetts Institute of Technology: Community Solar Power Initiative,” Web Page. http://solarpower.mit.edu/
32. “Napster Homepage,” Web Page. http://www.napster.com/
33. “Gnutella Homepage,” Web Page. http://www.gnutella.wego.com/
34. I. Foster and A. Iamnitchi, “On Death, Taxes, and the Convergence of Peer-to-Peer and Grid Computing,” in 2nd International Workshop on Peer-to-Peer Systems (IPTPS’03), Berkeley, CA, USA, February 2003.
35. J. Ledlie, J. Shneidman, M. Seltzer, and J. Huth, “Scooped Again,” in 2nd International Workshop on Peer-to-Peer Systems (IPTPS’03), Berkeley, CA, USA, February 2003.
36. A. Oram, Ed., Peer-To-Peer: Harnessing the Power of Disruptive Technologies, 1st ed. O’Reilly, 2001.
37. L. Pearlman, V. Welch, I. Foster, C. Kesselman, and S. Tuecke, “A Community Authorization Service for Group Collaboration,” in IEEE 3rd International Workshop on Policies for Distributed Systems and Networks, Monterey, CA, USA, 5-7 June 2002.
38. “VOMS Architecture v1.1,” Web Page, May 2002. http://grid-auth.infn.it/docs/VOMS-v1_1.pdf
39. M. Thompson, W. Johnston, S. Mudumbai, G. Hoo, K. Jackson, and A. Essiari, “Certificate-based Access Control for Widely Distributed Resources,” 1999.
40. D. W. Chadwick and A. Otenko, “The PERMIS X.509 Role Based Privilege Management Infrastructure,” in 7th ACM Symposium on Access Control Models and Technologies, 2002.
41. S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina, “The EigenTrust Algorithm for Reputation Management in P2P Networks,” in 12th International World Wide Web Conference, Budapest, Hungary, 20-24 May 2003.
42. D. B. Skillicorn, “Motivating Computational Grids,” in 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid02), Berlin, Germany, 21-24 May 2002.
43. “Peer-To-Peer Working Group,” Web Page. https://forge.gridforum.org/projects/p2p/
A Multiple-Neighborhoods-Based Simulated Annealing Algorithm for Timetable Problem*

He Yan and Song-Nian Yu

School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China
[email protected]
[email protected]
Abstract. This paper presents a simulated annealing algorithm based on multiple search neighborhoods to solve a special kind of timetable problem. The new algorithm can also solve any problem that can be solved by a local search algorithm. Various experimental results show that the new algorithm gives more satisfactory solutions than the general simulated annealing algorithm does.
1 Introduction

It is well known that timetabling problems can be very difficult to solve, especially when dealing with particularly large instances, and the timetable problem is known in general to be NP-hard. Since there are many strict constraints to be satisfied, it is very hard to generate a satisfactory solution within a short time. The timetable problem can be simply defined as follows: we must allocate a number of events to a finite number of time slots (or periods) so that the necessary constraints are satisfied. A large number of variations of the timetable problem have been proposed in the literature. In this paper we mainly solve a special kind of timetable problem, the Coursetable Problem, which consists of scheduling the lectures of a set of university courses into a given set of periods that compose the week, using the available rooms and teachers. Although the constraints of the timetable problem vary among different instances, we classify all kinds of constraints into two categories: hard constraints and soft constraints. Hard constraints must be satisfied strictly, because a timetable that violates even one hard constraint is unusable. A timetable that violates some soft constraints is still usable, but it is inconvenient for the people who use it. In general, it is too difficult to satisfy all the soft constraints in
* 1. This work is supported by the Science Foundation of the Shanghai Municipal Commission of Science and Technology, grant No. 00JC14052. 2. This work is also supported by the project of the Shanghai Municipal Commission of Education: Grid Technology E-Institute.
a real-life timetable problem. Some concrete definitions of the hard constraints and soft constraints of the Coursetable Problem will be given in the second part of the paper. In order to ascertain the quality of a timetable, a penalty function is applied to it. The penalty function of a timetable usually calculates the weighted sum of the number of times it violates hard constraints and the number of times it violates soft constraints. The penalty function of the Coursetable Problem will also be given in the second part of the paper. The paper is organized as follows: in Section 2, some definitions about the Coursetable Problem are given. In Section 3, our algorithm is explained in detail, together with how it is applied to the Coursetable Problem. Section 4 gives some experimental results and a discussion of them. The last section is devoted to conclusions and some directions for our future work.
2 Some Definitions about the Coursetable Problem

There are m courses C1, C2, ..., Cm, n teachers J1, J2, ..., Jn, p periods 1, 2, ..., p, and k rooms R1, R2, ..., Rk. Each course Ci consists of Li lectures to be scheduled in distinct periods, and it is taken by Si students. Each room Ri has a capacity Yi, in terms of number of seats.
Hard constraints:
1. The number of lectures of each course Ci must be exactly Li.
2. Two distinct lectures cannot take place in the same room in the same period.
3. Two conflicting courses cannot take place in the same period.
4. Each course may only take place in the periods in which it is available.
Soft constraints:
1. The number of students that attend a course must be less than or equal to the number of seats of all the rooms that host its lectures.
2. The interval between the first lecture and the last lecture of every course must be more than the minimum time span of the course.
We use the symbols H1, H2, H3, H4 to represent the four distinct hard constraints, and S1, S2 to represent the two soft constraints. We define the function B(x) to count the number of times that a coursetable breaks the constraint x, so B(H1), B(H2), B(H3), B(H4) count the violations of H1, H2, H3, H4, respectively, and B(S1), B(S2) count the violations of S1, S2, respectively. The following penalty function then measures the quality of a coursetable T:
Penalty(T) = [B(H1) + B(H2) + B(H3) + B(H4)] × 1000 + [B(S1) + B(S2)]
Notice that the weight of a hard-constraint violation is 1000 times the weight of a soft-constraint violation; hard constraints are thus taken into consideration much more strictly than soft constraints. The smaller the value of Penalty(T), the better the quality of the coursetable T. If a coursetable has a penalty of zero, we consider it optimal.
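For concreteness, the penalty computation defined above can be written out directly in code. The following is a minimal sketch, not taken from the paper's implementation: the Coursetable interface and its breaks() method are hypothetical, introduced only for illustration.

```java
// Minimal sketch of the penalty function Penalty(T) defined above.
// The Coursetable interface and its breaks() method are hypothetical,
// introduced only for illustration.
interface Coursetable {
    // Returns B(x): the number of times this coursetable breaks
    // constraint x, where x is one of "H1".."H4", "S1", "S2".
    int breaks(String constraint);
}

final class Penalty {
    static final int HARD_WEIGHT = 1000; // hard violations weigh 1000x soft ones

    static int of(Coursetable t) {
        int hard = t.breaks("H1") + t.breaks("H2") + t.breaks("H3") + t.breaks("H4");
        int soft = t.breaks("S1") + t.breaks("S2");
        return hard * HARD_WEIGHT + soft; // Penalty(T) = 0 means optimal
    }
}
```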
3 Multiple-Neighborhoods-Based Simulated Annealing Algorithm for the Coursetable Problem

The multiple-neighborhoods-based simulated annealing algorithm is based on the general simulated annealing algorithm and overcomes some of its disadvantages. In this section we show how to use the multiple-neighborhoods-based simulated annealing algorithm for the Coursetable Problem; the detailed definitions of the problem were given in Section 2. Here we first say something about the term “state”. State is a term from the simulated annealing algorithm; it represents different things according to the concrete problem. In the Coursetable Problem a state simply represents a coursetable and can be seen as a matrix like the coursetable. Given an instance p of a problem P, we associate a search space S with it. Each element of S corresponds to a potential solution of p and is called a state of p. The simulated annealing algorithm relies on a move m which assigns to each state s ∈ S its neighborhood N(s) ⊆ S; each state s′ ∈ N(s) is called a neighbor of s. The definition of the move is the key to a simulated annealing algorithm. We now define two kinds of move for the Coursetable Problem: P-move and R-move. Corresponding to the P-move and the R-move there are two kinds of search neighborhood, the P neighborhood and the R neighborhood, respectively.
P-move: the current state is changed to a neighbor state in its P neighborhood by a P-move. Figure 1 shows how a P-move operates.
R-move: the current state is changed to a neighbor state in its R neighborhood by an R-move. Figure 2 shows how an R-move operates.
Fig. 1. How a P-move operates
Fig. 2. How an R-move operates
Figure 3 shows how the multiple-neighborhoods-based simulated annealing algorithm applies to the Coursetable Problem. Some parameters are listed as follows:
Start temperature: t0
End temperature: c
Cooling rate: a (a < 1)
Neighbors sampled at each temperature: b
Stop criterion: the temperature is cooled to c, or the value of Penalty(Si) equals 0 (the function Penalty() is defined in Section 2).
Selection strategy: select the best solution among the different solutions generated from the different neighborhoods (random selection is another possible selection strategy).
Fig. 3. How the multiple-neighborhoods-based simulated annealing algorithm applies to the Coursetable Problem
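The loop of Fig. 3 can be sketched compactly as follows, reusing the Coursetable and Penalty sketches above. Everything here is our hedged reading of the text rather than the paper's code: at each temperature b neighbors are sampled, one candidate is generated from each neighborhood and the best is kept (the selection strategy above), and a standard Metropolis acceptance test is applied. In particular, the P-move and R-move bodies are assumptions: we take a P-move to reschedule a lecture's period and an R-move to reassign its room, consistent with the neighborhood names, since the paper defines the moves only in Figs. 1 and 2.

```java
import java.util.Random;

// Sketch of the multiple-neighborhoods-based simulated annealing loop of
// Fig. 3. All names are illustrative; the move semantics are assumptions.
interface MovableCoursetable extends Coursetable {
    MovableCoursetable randomPMove(Random rnd); // assumed: move a lecture to another period
    MovableCoursetable randomRMove(Random rnd); // assumed: move a lecture to another room
}

final class MultiNeighborhoodSA {
    static MovableCoursetable anneal(MovableCoursetable s,
                                     double t0, double c, double a, int b) {
        Random rnd = new Random();
        // Stop criterion: temperature cooled to c, or a zero-penalty solution found.
        for (double t = t0; t > c && Penalty.of(s) > 0; t *= a) {
            for (int i = 0; i < b; i++) {            // b neighbors per temperature
                MovableCoursetable byP = s.randomPMove(rnd);
                MovableCoursetable byR = s.randomRMove(rnd);
                // Selection strategy: keep the best candidate across neighborhoods.
                MovableCoursetable cand =
                        Penalty.of(byP) <= Penalty.of(byR) ? byP : byR;
                int delta = Penalty.of(cand) - Penalty.of(s);
                // Metropolis test: accept improvements always, and worsenings
                // with probability exp(-delta / t).
                if (delta <= 0 || rnd.nextDouble() < Math.exp(-delta / t)) {
                    s = cand;
                }
            }
        }
        return s;
    }
}
```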
4 Experimental Results

Two groups of experimental data are used to test our algorithm. The first group of experimental data is used to explain the relationship between the number of iterations and the value of Penalty() when using different search neighborhoods (P, R, or P+R). The second group of experimental data is used to show that the multiple-neighborhoods-based simulated annealing algorithm is also effective on some large-scale Coursetable Problems, and to show the relationship between the algorithm's arguments and its performance.
4.1 The First Group of Experimental Data (Listed in Table 1)

Now we list the experimental results of the different simulated annealing algorithms using the same arguments. SA(P) denotes the simulated annealing algorithm using the single search neighborhood P, and SA(R) denotes the simulated annealing algorithm using the single search neighborhood R. We use SA(P+R) to denote the simulated annealing algorithm using the multiple search neighborhoods R and P. The arguments used in all three algorithms are as follows:
Initial temperature: 20000
Cooling rate: 0.99
Neighbors sampled: 200
Table 2 shows the results, run from the same initial solution with a Penalty() value of 182107, using SA(P), SA(R) and SA(P+R) respectively.
Figure 4 shows the relationship between the number of iterations and the number of times the coursetable breaks hard constraints.
Fig. 4. The relationship between the number of iterations and the number of times the coursetable breaks hard constraints
Fig. 5. The relationship between the number of iterations and the number of times the coursetable breaks soft constraints
Figure 5 shows the relationship between the number of iterations and the number of times the coursetable breaks soft constraints. From the experimental results above, we find that the simulated annealing algorithm combined with search neighborhood P generates better solutions than the simulated annealing algorithm combined with search neighborhood R. But we also find that a simulated annealing algorithm combined with a single search neighborhood cannot generate the optimal solution. This is the reason why we apply the multiple-neighborhoods-based simulated annealing algorithm to the Coursetable Problem. We can also draw some conclusions from Fig. 4 and Fig. 5: SA(P) can effectively
reduce the number of hard-constraint violations: in Fig. 4, during the 320000 iterations SA(P) reduced the number of hard-constraint violations from 182 to 0. SA(P)'s performance in reducing soft-constraint violations is poor: in Fig. 5, during the 320000 iterations SA(P) only reduced the number of soft-constraint violations from 107 to 93. The situation for SA(R) is just the opposite. The advantage of SA(R) is that it can effectively reduce soft-constraint violations: in Fig. 5, during the 320000 iterations SA(R) reduced them from 107 to 22, while its capability to reduce hard-constraint violations is so limited that during the 320000 iterations it only reduced them from 182 to 136. Only the multiple-neighborhoods-based simulated annealing algorithm SA(P+R) can effectively reduce both the hard-constraint and the soft-constraint violations; in Figs. 4 and 5, SA(P+R) reduced both to zero during the 320000 iterations.
4.2 The Second Group of Experimental Data (Listed in Table 3)

For this group of experimental data, we list in Table 4 the experimental results of the algorithm SA(P+R), run several times with different arguments and the same initial solution.
From the results above, we find that on the second group of experimental data the algorithm SA(P+R) can generate a nearly optimal or an optimal solution in a short time. The algorithm SA(P+R) generates different solutions with different arguments.
From the table above, we can see that even when run several times with the same arguments, the algorithm never generates the same solution. The reason is that the algorithm performs some random operations at run time. On the whole, however, with an increase in the initial temperature, the cooling rate or the number of neighbors sampled, the algorithm runs for a longer time and generates better solutions. Up to now, the selection of arguments has depended only on our experience and experiments; the theory of how to select the algorithm's arguments should be studied further.
5 Conclusions

In this paper we presented a new simulated annealing algorithm based on multiple search neighborhoods to solve a special kind of timetable problem, the Coursetable Problem. The results shown in the previous section indicate that the algorithm is quite effective in finding the optimal solution in an enormous search space. Finally, several issues seem to be worth further investigation:
1. The theory of how to select the algorithm's arguments needs to be studied further.
2. The idea of multiple search neighborhoods can be applied to other local search algorithms such as Hill Climbing and Tabu Search; we will do some research on this later.
3. The search of the multiple neighborhoods can be assigned to different computers in a parallel environment; this method can reduce the execution time of the algorithm considerably.
Lattice Framework to Implement OGSA: Its Constructs and Composition Scenario*

Hui Liu, Minglu Li, Jiadi Yu, Lei Cao, Ying Li, Wei Jin, and Qi Qian

Department of Computer Science and Engineering, Shanghai Jiaotong University, 200030 Shanghai, China
{liuhui, li-ml}@cs.sjtu.edu.cn
Abstract. This paper presents a lattice framework to implement OGSA. It comprises nine semantic constructs: port, probe, adapter, action, activity, workflow, grid service, constraint and trigger, which form four different integration levels while each level could handle the semantics of presentation, functionality and resource separately. This paper focuses on the structures of constructs and the composition scenario based on this lattice framework.
1 Introduction

Grids are evolving toward OGSA, in which they provide a set of web services that a VO can aggregate in various ways. How to implement a grid service remains diverse, although OGSA emphasizes service-oriented virtualization. In fact, OGSA is a horizontal investigation that standardizes the composition of grid services as one uniform model. However, the underlying implementation can either improve or restrict such a high-level orchestration. One important cost-reduction driver is a flexible software architecture, which should meet the current and future needs of a diverse user population and adapt to changing business and technology requirements. This paper presents a novel lattice framework to implement OGSA for this purpose. In the remainder of this paper, related work is summarized in Section 2. The possible structures of the constructs are discussed in Section 3. Section 4 illustrates the composition scenario. Section 5 concludes this paper.
2 Related Work

Client-server and multi-tiered architectures are the popular software architectures. They segment software into distinct layers. However, operational strategies are still tightly coupled in the business logic layer.

* Supported by the National Grand Fundamental Research 973 Program of China (No. 2002CB312002), the Grand Project (No. 03dz15027) and Key Project (No. 025115033) of the Science and Technology Commission of Shanghai Municipality.
One further solution is object-oriented architecture, such as J2EE, CORBA, .NET and DCOM. However, components are very small reusable units for building software, and the automatic interoperability issues between them are often ignored. Several components can be bound together to perform certain tasks, forming container-based, process-based, agent-based, transaction-based or workflow-based integration. Their common challenge is how to avoid predefining all potentially reachable situations [1]. Very recently, semantics-based composition of e-business has attracted some attention [2]. In this paradigm, the composition of web services is generated on the fly based on the requests of the customers and can be modeled without requiring programming at all. This paradigm happens to have the same goals as that of grids.
3 Constructs of the Lattice Framework

The lattice framework comprises nine semantic constructs: port, probe, action, activity, workflow, grid service, adapter, trigger and constraint. Among them, action, activity, workflow and grid service are core constructs that form four different integration levels, while the other constructs attach themselves to different core constructs and appear at different integration levels to implement a grid service.
3.1 Port and Probe

Core constructs use ports to exchange parameters, messages, events, contracts, handles, resource links and other formal information with each other [3]. Besides the formal information, which can be encapsulated in XML documents, a port also possesses its own constraints and triggers that implement the operational strategies acting upon it. IN ports should be distinguished from OUT ports: IN ports are used to pass formal information into a core construct, while OUT ports make the formal information of a core construct available to the outside. Ports also define the boundaries of core constructs. These boundaries can be used to carry out projection, which means obtaining parts of the core constructs. Port has the following structure:
Probes are special OUT ports, which expose the runtime status of a core construct. However, they cannot be used to define boundaries of a core construct and cannot be used as delimiters of projection.
How to deploy probes depends on the agreements formally established between the requester and the provider, which form the location semantics of the probe. Therefore, probe has a slightly different structure from port:
3.2 Adapter

The formal information passed between IN ports and OUT ports (probes) may have different formats. The control model between two interacting core constructs can also create ‘gaps’ [4]. Adapters can be used to bridge these gaps. One important requirement is that the transformation must preserve the original semantics in addition to performing the syntax transformation. Therefore, an adapter can be defined as:
3.3 Action

Action is a semantic wrapper of an indivisible procedure or function, i.e., it adds the semantics of presentation, functionality and resource to it. Procedures and functions are reusable programming units for developers. To enhance the feasibility of integration, standard description languages have been used to wrap procedures and functions, such as the CORBA IDL language. Now we wrap them with XML-based documents, such as WSDL, WSFL or BPEL4WS. The concept supporting this is the action. Projection on an action must be invalid. Each action has just one IN port and one OUT port; meanwhile, it may have several probes. The structure of action is:
3.4 Activity

Activity is a semantic wrapper of a traditional transaction, a procedure characterized by the ACID properties. Each activity must possess two OUT ports: one is used to commit the wrapped transaction, while the other is used to abort, cancel or roll it back. Its structure is:
Projection on an activity must also be invalid.
3.5 Workflow

A traditional workflow encodes actions, activities and the relationships among them. It can be graphically depicted with nodes denoting actions or activities and arrows denoting precedence. Typically, there are four control relationships among actions and activities: OR-Split, AND-Split, OR-Join and AND-Join. The former two relationships are used to specify branching decisions in a workflow, while the latter two specify points where actions and/or activities converge [5]. The construct of workflow is a semantic wrapper of a traditional workflow. It may have a number of IN ports, OUT ports and probes. The presentation semantics is a visualization view of the workflow; it provides a GUI to facilitate the users, such as webflow [6]. The resource semantics is a resource view of the workflow; it indicates the underlying procedures of resource requisition. The functionality semantics indicates the tasks involved and their navigation relationships. The structure of workflow is:
With this structure, the projection, integration and outsourcing of workflows involve two phases: 1) projection, integration and outsourcing on the workflow semantics; 2) projection, integration and outsourcing on the underlying presentation, functionality and/or resource views. Workflows are concrete implementations of grid services.
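To make the four control relationships concrete, here is a minimal, purely hypothetical sketch of a workflow node model; none of these types are defined by the paper, which describes the workflow construct only abstractly.

```java
import java.util.List;

// Hypothetical sketch of a workflow graph using the four control
// relationships named above; illustrative only.
enum ControlRelationship {
    OR_SPLIT,   // branching: exactly one outgoing path is chosen
    AND_SPLIT,  // branching: all outgoing paths proceed in parallel
    OR_JOIN,    // convergence: continue when any incoming path completes
    AND_JOIN    // convergence: continue only when all incoming paths complete
}

// A node denotes an action or an activity; successors encode precedence arrows.
class WorkflowNode {
    final String name;
    final ControlRelationship relationship;
    final List<WorkflowNode> successors;

    WorkflowNode(String name, ControlRelationship relationship,
                 List<WorkflowNode> successors) {
        this.name = name;
        this.relationship = relationship;
        this.successors = successors;
    }
}
```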
3.6 Grid Service

Grid services are specific web services. Even at the very beginning, it was noticed that they should be represented as a semantic model [2]. The main characteristics that make grid service composition very different from workflow integration and software component integration are summarized in [2]. Generally speaking, a grid service is rather polymorphous; e.g., it can be data-oriented, process-oriented, transactional, or a combination of these. The construct of grid service can be represented by:
In fact, a grid service is a set of published interfaces of underlying workflows; it describes itself as XML-based profiles, such as WSDL and WSFL.
Fig. 1. The class diagram of the architecture constructs. All associations between the constructs could be summed up as ‘possess’
3.7 Constraint and Trigger

Operational strategies can be mapped onto integrity constraints and/or trigger rules. In the lattice framework, constraint and trigger are also described as semantic models. Because ports are the delimiters of core constructs, we suppose constraints and triggers can be applied to the core constructs only by linking themselves to the relevant ports or probes. Therefore, ports and probes are the entry points of constraints and triggers.
The same operational strategy may be expressed in different manners for different core constructs. For example, at the grid service level, it would be ‘orders greater than 500 units receive a 30% discount’; at the workflow level, it would be ‘when orders are greater than 500 units, the activity of discount must be executed with the parameter of 30%’; at the activity level, it would be ‘when the threshold value ranges from 500 to infinity, the discount ratio equals 30%’. We can observe two important facts from the above example: 1) constraints and triggers have their own scopes: a constraint or trigger at the grid service level affects all actions, activities and workflows involved, while a constraint or trigger at the action level affects only ‘this’ action; 2) constraints and triggers may be derived from constraints and triggers at the upper integration level. Therefore, we must pay attention to their sources and relationships. Taking all these factors into account, the structures of constraint and trigger are listed below.
Of course, besides the rule expressions, other information, such as scope, parent, etc., must also be wrapped in the SR descriptions. The class diagram of the architecture constructs is shown in Fig. 1.
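As an illustration of the scope and derivation relationships just described, the following is a minimal hypothetical sketch; the field and type names are ours, chosen to mirror the scope/parent information that the text says must be wrapped in the SR descriptions.

```java
// Hypothetical sketch of a trigger carrying scope and derivation (parent)
// information, mirroring the discount example in the text.
enum Scope { GRID_SERVICE, WORKFLOW, ACTIVITY, ACTION }

class Trigger {
    final Scope scope;       // integration level at which the rule applies
    final Trigger parent;    // upper-level rule it was derived from, or null
    final String condition;  // e.g. "orderUnits > 500"
    final String action;     // e.g. "apply a 30% discount"

    Trigger(Scope scope, Trigger parent, String condition, String action) {
        this.scope = scope;
        this.parent = parent;
        this.condition = condition;
        this.action = action;
    }

    // The same strategy expressed at two levels, as in the text's example.
    static Trigger workflowLevelDiscount() {
        Trigger serviceRule = new Trigger(Scope.GRID_SERVICE, null,
                "orderUnits > 500", "receive a 30% discount");
        return new Trigger(Scope.WORKFLOW, serviceRule,
                "orderUnits > 500", "execute the discount activity with parameter 30%");
    }
}
```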
4 Composition Scenario of the Lattice Framework

The core constructs form four different integration levels for grid applications (Fig. 2). These levels divide the business logic layer into four sub-layers. In addition, each sub-layer has its own presentation, functionality and resource semantics.
Fig. 2. The lattice framework to implement OGSA
The typical composition scenario of grid services is shown in Fig. 3. VO A builds its grid services using the lattice framework and deploys them in the local grid services repository within a certain community (step 1). VO A advertises and publishes its grid service profiles to the grid services registry, which can be exposed as a collection of WSDL and/or UDDI (step 2). VO B submits its cooperation requests to the grid services registry. After performing profile matching and compatibility checking, the grid services registry returns the integration and composition plan to VO B, including integration and composition alternatives. Different grid service requesters and providers must reach an agreement before any integration and composition, e.g. on authorization and authentication (step 6). After that, integration and composition can be carried out simultaneously at different sites (step 7). If needed, some new grid services can also be registered with the grid services registry (step 8). Afterwards, VOs A and B can cooperate dynamically to provide new grid services.
Fig. 3. Composition Scenario of OGSA
5 Discussions and Conclusions

Some experts foresee the convergence of grid services, web services and ebXML because they address many integration requirements in common, such as interoperability, ubiquity, extensibility, flexibility, security, etc. The development of OGSA represents a natural evolution of web services. By integrating support for transient, stateful service instances and defining the lifecycle of grid service instances, OGSA significantly extends the power of web services. ebXML is a complete solution focused on B2B integration scenarios, whereas the web services problem domain is broad and includes applications of B2B integration. Supposing the articles of bargaining shift from actual goods to abstract resources or services, and delivery is carried out electronically by integration and sharing, a B2B platform can also be regarded as a grid portal. Finally, grid services, web services and
ebXML have the same foundational technologies, which will be the converging points of their evolution, such as XML, SOAP, UDDI, etc. Based on this forecast, this paper argues that any progressive innovation in ebXML can also be used as a reference for grids. Therefore, nine semantic constructs, including port, probe, adapter, action, activity, workflow, grid service, constraint and trigger, are proposed to implement and enhance OGSA. These constructs form a lattice framework that has four different integration levels. Each level can handle the semantics of presentation, functionality and resource separately. The lattice framework is a vertical investigation for the purpose of implementing OGSA. The construct of grid service must comprise all the portTypes defined by OGSA. In fact, these constructs are abstracted from real systems and constitute a top-down design to implement OGSA. To support this semantics-based lattice framework, a series of technologies are needed to develop a complete solution: a semantic specification used to describe all kinds of constructs, descriptions of composition requests and composition plans, a matching and compatibility checking scheme, projection and union algorithms, a semantic interpreter used to translate the semantic specification between two adjoining integration levels, a semantic mediator used to merge incompatible constraints and triggers at the same integration level, etc. Of course, the solutions to these issues must depend on WSDL, WSFL and BPEL4WS.
References

1. Kwak, M., Han, D., Shim, J.: A Framework for Dynamic Workflow Interoperation Using Multi-Subprocess Task. In: Yanchun, Z., Amjad, U., Ee-Peng, L., Ming-Chien, S. (eds.): Proc. of the IEEE Int. Workshop on Res. Issues in Data Eng.: Eng. E-Com./e-Bus. Sys. IEEE Press, San Jose (2002) 93–100
2. Yang, J., Papazoglou, M., Heuvel, W.: Tackling the Challenges of Service Composition in E-Marketplace. In: Yanchun, Z., Amjad, U., Ee-Peng, L., Ming-Chien, S. (eds.): Proc. of the IEEE Int. Workshop on Res. Issues in Data Eng.: Eng. E-Com./e-Bus. Sys. IEEE Press, San Jose (2002) 125–133
3. Bussler, C.: B2B Integration Technology Architecture. In: Kawada, S. (ed.): Proc. IEEE Int. Workshop on Advances Issues of E-Com. and Web-Based Inf. Sys. IEEE Press, Los Alamitos (2002) 137–142
4. Froehlich, G., Liew, W., Hoover, H., Sorenson, P.: Application Framework Issues When Evolving Business Applications for Electronic Commerce. In: Ralph, H. (ed.): Proc. of the IEEE Ann. Hawaii Int. Conf. on Sys. Sci. Vol. Track 8, IEEE Press, Hawaii (1999) 182–192
5. Berfield, A., Chrysanthis, P., Tsamardinos, I., Pollack, M., Banerjee, S.: A Scheme for Integrating E-Services in Establishing Virtual Enterprises. In: Yanchun, Z., Amjad, U., Ee-Peng, L., Ming-Chien, S. (eds.): Proc. of the IEEE Int. Workshop on Res. Issues in Data Eng.: Eng. E-Com./e-Bus. Sys. IEEE Press, San Jose (2002) 134–142
6. Preuner, G., Schrefl, M.: Integration of Web Services into Workflows through a Multi-Level Schema Architecture. In: Kawada, S. (ed.): Proc. IEEE Int. Workshop on Advances Issues of E-Com. and Web-Based Inf. Sys. IEEE Press, Los Alamitos (2002) 44–53
Moving Grid Systems into the IPv6 Era

Sheng Jiang, Piers O’Hanlon, and Peter Kirstein

Department of Computer Science, University College London, Gower Street, WC1E 6BT London, United Kingdom
{S.Jiang, P.Ohanlon, P.Kirstein}@cs.ucl.ac.uk
Abstract. This paper focuses on integrating IPv6 functionality into Grid systems. We outline the advantages of IPv6 and the benefits to Grid systems. We then introduce our methodology and our efforts to provide IPv6 support in Grid systems, using the Globus Toolkit Version 3 as our concrete working example. The status of global Grid IPv6 activities is introduced. We conclude by summarising how to bring IPv6 into Grid systems.
1 Overview – Grid Systems Working with IP Networks
During the last few years, Grid systems [6, 11, 12, 13, 14, 21] have emerged to perform large-scale computation and data storage over IP-enabled data communication networks. They use distributed, potentially remote, resources to optimise computation and storage. Grid systems are normally considered network middleware [1], since they lie between applications and network resources. The data of Grid systems is transported over TCP/IP [2, 8] – currently using Internet Protocol version 4 (IPv4) [18], now twenty years old. The next-generation IP, IPv6 [10, 24, 28], is expected to replace IPv4 with a number of improvements. Since IPv6 is expected to become the core protocol for next-generation networks, Grid computing systems must track the migration of the lower-layer network protocols to IPv6. The period of transition from IPv4 to IPv6 will be long; hence, it is important that Grid systems allow both IPv4 and IPv6 networks to be used. The Globus Toolkit [15, 36], developed mainly at the Argonne National Laboratory (ANL), provides the libraries and services for Grid computing. The current edition of the Globus Toolkit – Version 3 (GT3) – is based on the latest Grid standard, the Open Grid Services Architecture (OGSA) [11, 26], and integrates Grid services with Web services. GT3 is designed to work with IPv4, though many aspects are compatible with IPv6. In this paper we discuss our attempts to provide dual-stack, IPv4 and IPv6, facilities in Grid systems. When Grid systems are IPv6-enabled, we will be able to experiment with the several features that become possible with IPv6 support, like mobility, security and auto-configuration. We have structured the paper in the following way. First we consider the potential advantages of IPv6, from which Grids could benefit. Then we discuss the general
IPv6 host and environment. During the discussion, we survey the general steps needed to build up an IPv6 experimental environment. In Sections 4 and 5, we discuss our experience of implementing IPv6 within Grid systems using the Globus Toolkit as an example. We keep the considerations as general as we can, so that others can use our approach in other Grid systems. In Section 6, we introduce the current status of global activities for getting IPv6 into Grids. Finally, we end this paper by briefly summarising how to bring IPv6 into Grid systems.
2 IPv6 Advantage for Grid Systems
As network middleware, the current release of the Globus Toolkit uses IPv4-based network resources to serve upper-layer applications and users. Like all widespread applications, the Globus Toolkit should be prepared to move into the IPv6 era. The bulk of the IPv6 standards (e.g. [10]) were ratified in the Internet Engineering Task Force (IETF) in 1998. IPv6 fulfils the future demands on address space, and also addresses other features such as multicast, encryption, Quality of Service, and better support for mobile computing. In comparison to the current IPv4 protocol family, IPv6 offers a number of significant advantages, most of which will also be very useful for Grid purposes. The IPv6 data format does not by itself provide most of these advantages. However, the design of the IPv6 protocol suite has taken the opportunity to re-design the relevant protocols as a better and more logical system; for example, the IPv6 renumbering mechanism could simplify dynamic mergers and acquisitions of Virtual Organisations in Grid systems. We address here three major advantages – bigger address space, mobility support and security support – ignoring the many other advantages and potential benefits of IPv6, such as auto-configuration [25], hooks for QoS, etc. While we take advantage of IPv6 features, we also give consideration to communication in heterogeneous IPv4/IPv6 networks.
2.1 Bigger Address Space
With its 128-bit address space and much better address aggregation properties, IPv6 potentially makes massive scaling of Grid networking possible; this is important in view of the aims to deploy Grid computation globally. With the enlarged address space, workarounds like NATs (Network Address Translators) [23] are no longer needed. This allows full, global IP connectivity for IP-based machines as well as upcoming mobile devices like PDAs and cell phones – all can benefit from full IP access through end-to-end services. There can be multiple addresses for a single interface, where the addresses can be used for different functions. The large address space allows for simpler end-to-end security, the IPv6 renumbering mechanism, separated addressing and routing, etc.
2.2 Mobility Support
Until recently, most Grid research has focused only on fixed systems. However, the mobility support within Grid systems will be needed as mobility takes an ever more
important role in modern life. In our research, Mobile-Grid-specific auto-configuration mechanisms are proposed to allow a Grid mobile node to use the Grid resources available locally. As regards next-generation mobile networks, IPv6 is mandated by the 3rd Generation Partnership Project [35]. Mobile IPv6 [16, 19] is accessible using the general IPv6 APIs, appearing transparent to the application layer. Thus, in an IPv6 implementation, there is potential support for roaming between different networks, with global notification when you leave one network and enter another. Support for roaming is possible with IPv4 too, but it is generally less efficient.
2.3 Built-in Security
While scalability, performance and heterogeneity are desirable goals for any distributed system, including Grid systems, the characteristics of computational Grids lead to security issues. Though the security improvements from IPv6 do not solve all the security problems, Grid systems can benefit from IPv6's security features. IPv6 security and the Grid Security Infrastructure operate at different levels; they can be employed together to provide better security granularity. Besides support for mobility, security was another requirement for IPv6. IPv6 protocol stacks are required to include IPsec [20], which allows authentication and encryption of IP traffic. With IPsec, all IP traffic between two nodes can be handled without adjusting any applications. Alternatively, application-level security can be employed per service if required. However, using IPsec, all applications on a machine can benefit from encryption and authentication, and policies can be set on a per-host (or even per-network) basis instead of per application/service. Full IPsec security operates over IPv4 today – when there is a full end-to-end connection. If NATs are used, as often occurs in IPv4 networks but is not needed in IPv6 ones, it is not possible to use full IPsec on the end-to-end communications.
2.4 Communication in Heterogeneous IPv4/IPv6 Networks

Since there will be a period of IP transition, consideration must be given to an interim coexistence of IPv4 and IPv6 [9]. Our effort to integrate IPv6 into Grid systems takes an IP-protocol-independent [27] approach, i.e. it supports both IPv4 and IPv6. The IP-independent server, shown in Fig. 1, has to be able to respond to client calls according to the IP family that the client uses. Since IPv4-only machines will exist for many more years, while IPv6-only machines are starting to appear, it is necessary to provide support for both IPv4-only and IPv6-only environments. The client decides which version of IP is to be used, and the Grid server responds to the client's calls according to the IP family the client uses. For instance, in Fig. 1, an IP-independent Grid server starts and listens on both its IPv4 and IPv6 interfaces. When an IPv4 client connects over IPv4, the Grid server uses the IPv4 interface to call back; only IPv4 communication takes place – similarly with IPv6. With dual-stack servers, the client can choose which IP family is the default or preferred (see Section 4.4). For communication in heterogeneous IPv4/IPv6 networks, there are a number of network transition aids [7, 22], which essentially translate the packet headers between
Fig. 1. IP Communication in heterogeneous IPv4/IPv6 networks
IPv4 and IPv6, leaving the payload untouched. These approaches may work in certain circumstances for Grid applications. A higher-level approach, which is employed by other services for transition, is application-level gatewaying. This operates on a dual-stack node and performs an application-level translation of the payload of the packets between the two communicating nodes (see Fig. 2).
Fig. 2. IP Transition between heterogeneous IPv4-only and IPv6-only networks
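The dual-stack behaviour of Fig. 1 can be sketched in a few lines of Java. This is a minimal illustration, not GT3 code, and the port number is arbitrary: on a dual-stack host, a server socket bound to the IPv6 unspecified address (::) typically accepts both IPv4 and IPv6 clients, and each reply simply flows back over whichever IP family the client used.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Minimal sketch of an IP-version-independent server (not GT3 code).
// Binding to the IPv6 unspecified address "::" on a dual-stack host
// normally accepts clients of both IP families, as in Fig. 1.
public class DualStackServer {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket();
        server.bind(new InetSocketAddress(InetAddress.getByName("::"), 8080));
        while (true) {
            try (Socket client = server.accept()) {
                InetAddress peer = client.getInetAddress();
                // peer is an Inet4Address or Inet6Address, depending on
                // which IP family the client chose to connect with.
                System.out.println("Client " + peer.getHostAddress()
                        + " connected as " + peer.getClass().getSimpleName());
            }
        }
    }
}
```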
3 IPv6 Environment for Grid

To start any IPv6 experiments, the host must be IPv6-enabled. IPv6-capable application API libraries are required in order to run IPv6-enabled or IP-independent applications over IPv6. All network-associated applications, such as network-sharing database applications and web containers, need to be IPv6-enabled to run IPv6 tests. In order to run tests over a network rather than only on local hosts, IPv6 support in the network is essential. We discuss how to build up an IPv6 environment step by step, using our IPv6-enabled Grid testbed as an example.
3.1 Operating System Support on Hosts
The IPv6 support on hosts depends on the operating system and its kernel. For the time being, we restrict ourselves to the Linux/PC platform, since GT3 is only fully working on Linux systems. UCL has set up an IPv6-enabled Grid testbed, which includes 5 nodes running Red Hat Linux 8 and 3 nodes running Red Hat Linux 7.3 with a recompiled kernel for IPv6 support. From Red Hat Linux 8 (kernel 2.4.18), the IPv6 module is provided and auto-loaded by default. On earlier Linux distributions, users had to re-compile the kernel to get IPv6 support [29]. On Windows 2000, an IPv6 preview package (free download [30]) is available with limited functionality. The IPv6 package in Windows XP, which provides better IPv6 support, is distributed with the edition but requires individual installation [31].
3.2 IPv6-Capable Application API Libraries
IPv6-capable application API libraries are needed to provide support for upper-layer applications. GT3 is mainly written in Java; for IPv6 support, we use Sun Java SDK 1.4.1 on our IPv6-enabled Grid testbed. With an IPv6-compatible kernel and the IPv6 module loaded, the Linux system libraries provide IPv6 data structures, such as sockaddr_in6, in6_addr and in6addr_loopback, and IPv6 system functions, such as inet_ntop() and inet_pton(), are available for use. These, however, are not IP-version independent. To be IP-independent, the IP-independent data structures, such as addrinfo and sockaddr_storage, and functions, such as getaddrinfo() and getnameinfo(), should be used on dual-stack servers and in server applications. As a platform-independent runtime environment, JDK 1.4 provides IPv6 support on Solaris and Linux; JDK 1.5 is planned to provide IPv6 support on Windows XP. Within Java SDK 1.4, the class java.net.InetAddress has two direct subclasses, java.net.Inet4Address and java.net.Inet6Address, which provide the support for IPv4 and IPv6 addresses. The InetAddress class uses the host name resolution mechanisms to resolve host names to the appropriate host address type. Additionally, there are various system properties that can influence protocol preferences, such as preferIPv6Addresses and preferIPv4Stack.
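As an illustration of this IP-independent style, the following JDK 1.4 sketch resolves a hostname to all of its addresses (the Java counterpart of getaddrinfo()) and distinguishes the two families only for display; the hostname is hypothetical.

import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Minimal sketch of IP-independent name resolution with JDK 1.4.
public class ResolveExample {
    public static void main(String[] args) throws UnknownHostException {
        // getAllByName returns every address registered for the host,
        // IPv4 and IPv6 alike; the application never parses literals itself.
        InetAddress[] addrs = InetAddress.getAllByName("grid.example.org");
        for (int i = 0; i < addrs.length; i++) {
            if (addrs[i] instanceof Inet6Address) {
                System.out.println("IPv6: " + addrs[i].getHostAddress());
            } else if (addrs[i] instanceof Inet4Address) {
                System.out.println("IPv4: " + addrs[i].getHostAddress());
            }
        }
    }
}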
3.3 Associated Applications
The Globus system also utilises external applications, and all network-associated applications need to be IPv6-enabled as well. In GT3, the Java run-time environment needs to be IPv6-enabled, as mentioned earlier. Java DataBase Connectivity, which is used for Reliable File Transfer, needs an IPv6 patch. As recommended by the Globus Implementation Group, Jakarta Tomcat is used as the web container for the Grid services on a Grid server; the container environment needs to provide IPv6 Web services for the Grid services. Tomcat version 5 has been tested with IPv6 capabilities. Other Web service container environments, such as IBM WebSphere and Microsoft .NET, are also being investigated in our Grid IPv6 project.
3.4 Networking Support for IPv6
In order to run IPv6 tests over a network rather than only on local hosts, IPv6 support in the network is essential. This requires IPv6-enabled routers, which provide forwarding and dynamic routing, and support from IPv6-enabled network services, such as IPv6 DNS, Web services, etc. A number of the major router manufacturers now provide basic IPv6 support and are beginning to provide more advanced support such as hardware forwarding. IPv6 support in the DNS provides hostname and IPv6 address resolution, which may be carried over an IPv4 and/or IPv6 connection. For communication in heterogeneous IPv4/IPv6 networks, there are many approaches to the provision of transition aids (see Section 2.4); they need to be considered when building an IPv6 environment within or around the current global IPv4 networks.
4 Integration of IPv6 into Globus
The integration of IPv6 into Grid systems starts with finding IP-version dependencies in the network protocols. The implementation of network APIs within applications may involve a few IP-dependent functions. We introduce our methodologies in the following sections, using the Globus IPv6 porting as an example. A number of modifications need to be made to IP-dependent operations, and in order to operate in heterogeneous IPv4/IPv6 networks, a few configuration options are needed.
4.1 Methods of Finding IP Dependencies
To find out exactly which lower-layer protocols and APIs are being used, two approaches are taken: firstly a 'top down' approach, in which we execute some upper-layer applications; secondly a 'bottom up' approach, in which we monitor all the data traffic between nodes and on the loopback interface. The following have been identified as relevant, and they have been modified to be IP-independent:
– which network protocols are involved and whether they are IP-dependent;
– where to get or generate IP addresses;
– how to generate URLs [4] and URIs [5] with IP addresses;
– how to create sockets and network connections;
– hard-coded IPv4 addresses.
4.2 GT3 Protocol Modifications for IPv6
The specifications of a few protocols have needed to be modified to suit IPv6. In the Globus Toolkit, GridFTP is being modified in a way similar to FTP (RFC 2428 [3], "FTP Extensions for IPv6 and NATs"). Correspondingly, the specific implementations of these protocols need modification as well. Within the Globus project, GridFTP is currently implemented in standard C, and a new IP-independent network module known as globus_XIO is being developed for use by GridFTP. In our porting, all protocols involved must be examined; we are in contact with the Globus implementation group on this and have examined most. The GGF IPv6-WG
is surveying all GGF specifications for IP version dependence (see Section 6). Similar work is also happening within the Internet Engineering Task Force.
4.3 IPv6 Modifications in the GT3 Implementation
While modifying the protocols and their specifications, the implementation of the Globus Toolkit has needed to be modified as well. Corresponding to the IP network functions found using the method described in Section 4.1, the following modifications have been made to various modules to realise IPv6 functionality in GT3, with the Java SDK providing the IP-independent data structures and functions:
– Where to get or generate IP addresses. "Localhost" or a particular hostname is used in both the Globus configuration file and the Globus initialisation functions. IP-independent functions (InetAddress.getByName in Java, getaddrinfo in standard C) are then used wherever a hostname needs to be translated into an IP address.
– How to generate URLs and URIs with IP addresses. All URL- and URI-generating functions have been modified to handle the particular format of literal IPv6 addresses in URLs [17], ensuring that literal IPv6 addresses in URLs are enclosed in square brackets.
– Hard-coded IPv4 addresses. All hard-coded IPv4 addresses have been replaced by "localhost" or particular hostnames; IP-independent functions are then used to look up the IP addresses when hostname-to-address translation is needed.
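A minimal sketch of the square-bracket rule of RFC 2732 [17] as it might be applied in a URL-generating helper; the method name and service path are our own, not taken from the GT3 sources.

import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch of bracketing literal IPv6 addresses when building service URLs.
public class UrlFormatter {
    static String toServiceUrl(InetAddress addr, int port) {
        String host = addr.getHostAddress();
        // Literal IPv6 addresses must be enclosed in square brackets so
        // that their colons are not mistaken for the port separator.
        if (addr instanceof Inet6Address) {
            host = "[" + host + "]";
        }
        return "http://" + host + ":" + port + "/ogsa/services";
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(toServiceUrl(InetAddress.getByName("::1"), 8080));
        // prints http://[0:0:0:0:0:0:0:1]:8080/ogsa/services
        System.out.println(toServiceUrl(InetAddress.getByName("127.0.0.1"), 8080));
        // prints http://127.0.0.1:8080/ogsa/services
    }
}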
4.4 Configuration for IP Operations
In GT3, a few configuration options are available to allow the user to choose the start-up IP bindings. To use a hostname instead of an IP address, the user needs to set the configuration option "publishHostName" to "true" in the globalConfiguration section of server-config.wsdd. If other IP addresses are associated with a host, such as a particular IPv6 hostname, the user specifies which hostname is used with the configuration option "logicalHost". By the Java default options, IPv6 addresses have higher priority on dual-stack machines. To operate in an IPv4-only network, or to give IPv4 higher priority, the user needs to set the Java system properties "preferIPv6Addresses" to "false" and "preferIPv4Stack" to "true".
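The full names of these Java system properties are java.net.preferIPv6Addresses and java.net.preferIPv4Stack. As a brief illustration (the class name is ours), they can be set on the command line or programmatically, before any networking classes are initialized:

// Sketch of forcing IPv4-only operation from code; equivalently the
// properties can be passed on the command line, e.g.
//   java -Djava.net.preferIPv4Stack=true -Djava.net.preferIPv6Addresses=false ...
public class PreferIPv4 {
    public static void main(String[] args) {
        // Must run before the first use of any java.net classes.
        System.setProperty("java.net.preferIPv4Stack", "true");
        System.setProperty("java.net.preferIPv6Addresses", "false");
        // ... start the Grid service container here ...
    }
}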
5 IPv6 Grid Tests
Having realised the IPv6-enabled Globus system, a number of experiments and tests have been run successfully in different scenarios, including ones where some of the systems are IPv4-only, some IPv6-only, and most dual-stack. These tests were designed for the scenario of a transition to IPv6, since most current Grid usage and systems are still mainly IPv4. A few upper-layer services have been run successfully over IPv6-enabled Grid systems in order to validate the adaptation between IPv6 and Grid applications.
5.1 GT3 Version Dependency
The GT3 alpha version became available at the beginning of 2003. Since then, we have mainly worked on the GT3 alpha version because it provided the most verbose debug information; we have successfully demonstrated IPv6 functionality and run our tests on it. We started to move to release GT 3.0 when it came out at the end of June 2003. More components are involved in GT 3.0, and some of them involve more IP-dependent issues; we are still surveying them. The Globus Resource Allocation Manager (GRAM), which worked in GT3 alpha, has been identified as not working in GT 3.0, due to a Java IPv6 reverse-lookup bug.
5.2 IPv6 Test Scenarios and Porting Stages
The IPv6-enabled tests and porting started from an IPv4-only test in the heterogeneous IPv4/IPv6 environment, which may differ from a purely IPv4-only environment. Then, IPv6-only porting was done with minimal modification. During this stage, IP-independent functions and data structures were used wherever possible instead of IPv4-only or IPv6-only ones. The situation becomes much more complicated for IPv4/IPv6 dual stacks: in the dual-stack environment, parallel independent support for both IPv4 and IPv6 must be provided. The Grid server starts with the IP-independent hostname and responds to client calls according to the IP family the client uses. After the modifications were made, we also ran IPv4-only tests, since most Globus users are obviously still on IPv4.
5.3 Test Services
A few upper-layer services have been run successfully over IPv6-enabled Grid systems in order to confirm the adaptation between IPv6 and Grid applications. The services and applications that are distributed with GT3 are used as general initial test services in our test scenarios. We managed to access Grid Services through IPv6 interfaces by using the OGSA graphical user interface service browser, and we managed to submit remote GRAM jobs through IPv6 interfaces. These tests were also successfully run with IPv4 in our heterogeneous IP Grid testbed. We also tested systems with externally developed GT3 services. The E-Protein project [32] has developed a remote execution service based on GT3 GRAM; it was successfully transplanted to the IPv6-enabled Globus infrastructure. We have also begun working with a bio-chemical simulation project, Materials Simulation [33], to test their OGSA-compatible simulation services over IPv6.
6 Grid IPv6 Standardisation Status
Since February 2003, when the Global Grid Forum (GGF) and the IPv6 Forum announced a liaison relationship to drive New Generation Applications deployment worldwide, IPv6 has become relevant to Grid activities, and an IPv6 Working Group has been set up in the GGF. The GGF IPv6-WG [34] is currently working on two drafts, "IP version dependencies in GGF specifications" and "Guidelines for IP independence in GGF specifications", which were presented
at GGF9 (October 2003). UCL contributes to both drafts, and is currently involved in a number of activities that bring IPv6 into other Grid systems, such as the Sun Grid Engine.
7 Conclusion and Future Work
Our main contribution is a version of GT3 at UCL with IPv6 support, which has been tested with a number of Grid applications, with reasonable success. The mechanisms and approach for integrating IPv6 into Globus introduced in this paper could benefit other efforts to integrate IPv6 into other Grid systems. While keeping the modifications to GT3 to a minimum during the tests and experiments, we have identified a few further modifications that are still needed and that will make IPv6 configuration and operation easier and smoother; they will be implemented on the stable Globus Toolkit distribution. We plan to provide a variety of services over IPv6 based on UCL's own Grid services. In later research, we will look at issues of mobility and security in Grids. Mobility support in Grid computing systems will be achieved using Mobile IPv6. Our proposed Mobile-Grid-specific reconfiguration mechanisms will be developed to meet the particular requirements for transparent, dynamic and automatic network services in Grid computing systems.
Acknowledgement. The authors wish to thank Soren-Aksel Sørensen of University College London, and Tim Chown and David Mills of the University of Southampton, who made substantial contributions to the work reported in this paper.
References
1. Aiken, B. 2000. "Network Policy and Services: A Report of a Workshop on Middleware," RFC 2768.
2. Albitz, P. 2001. "DNS and BIND, 4th Edition," O'Reilly.
3. Allman, M. 1998. "FTP Extensions for IPv6 and NATs," RFC 2428.
4. Berners-Lee, T. 1994. "Uniform Resource Locators (URL)," RFC 1738.
5. Berners-Lee, T. 1998. "Uniform Resource Identifiers (URI): Generic Syntax," RFC 2396.
6. Chervenak, A. 2001. "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets," Journal of Network and Computer Applications 23: 187-200.
7. Chown, T. 2003. "Advanced Aids to Deployment," Deliverable 17, the 6WINIT Project, UCL.
8. Comer, D. 2000. "Internetworking with TCP/IP, Volume 1, 4th Edition," Prentice Hall.
9. Davies, J. 2002. "IPv6/IPv4 Coexistence and Migration," White paper, Microsoft Corporation.
10. Deering, S. 1998. "Internet Protocol, Version 6 (IPv6) Specification," RFC 2460.
11. Foster, I. 2002. "The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration," Global Grid Forum.
12. Foster, I. 2002. "The Grid: A New Infrastructure for 21st Century Science," Physics Today 55: 42-52.
13. Foster, I. 2001. "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," International J. Supercomputer Applications 15(3).
14. Foster, I. 1998. "The Grid: Blueprint for a New Computing Infrastructure," Morgan Kaufmann.
15. Foster, I. 1997. "Globus: A Metacomputing Infrastructure Toolkit," Intl. J. Supercomputer Applications 11(2): 115-128.
16. Fritsche, W. 2000. "Mobile IPv6 – the Mobility Support for Next Generation Internet," IPv6 Forum.
17. Hinden, R. 1999. "Format for Literal IPv6 Addresses in URL's," RFC 2732.
18. ISI. 1981. "Internet Protocol: DARPA Internet Program Protocol Specification," RFC 791.
19. Johnson, D. 2003. "Mobility Support in IPv6," IETF Internet Draft.
20. Kent, S. 1998. "Security Architecture for the Internet Protocol," RFC 2401.
21. von Laszewski, G. 1999. "Grid Infrastructure to Support Science Portals for Large Scale Instruments," Proc. of the Workshop Distributed Computing on the Web (DCW).
22. Gilligan, R. 2000. "Transition Mechanisms for IPv6 Hosts and Routers," RFC 2893.
23. Srisuresh, P. 2001. "Traditional IP Network Address Translator (Traditional NAT)," RFC 3022.
24. Thomson, S. 2003. "Basic Socket Interface Extensions for IPv6," RFC 3493.
25. Thomson, S. 1998. "IPv6 Stateless Address Autoconfiguration," RFC 2462.
26. Tuecke, S. 2003. "Open Grid Service Infrastructure (OGSI) – Version 1.0 (draft)," Global Grid Forum.
27. Shin, M. 2003. "Application Aspects of IPv6 Transition," IETF Internet Draft.
28. Stevens, W. 2003. "Advanced Sockets Application Program Interface (API) for IPv6," RFC 3542.
29. http://www.bieringer.de/linux/IPv6/
30. http://msdn.microsoft.com/downloads/sdks/platform/tpipv6.asp
31. http://www.microsoft.com/windowsxp/pro/techinfo/administration/ipv6/
32. http://grid.ucl.ac.uk/proj_epro.html
33. http://grid.ucl.ac.uk/proj_mats.html
34. http://forge.gridforum.org/projects/ipv6-wg/
35. http://www.3gpp.org
36. http://www.globus.org
MG-QoS: QoS-Based Resource Discovery in Manufacturing Grid
Zhanbei Shi¹, Tao Yu², and Lilan Liu²
¹ Computer Science Dept. of Shanghai University, Shanghai, China, 200072
[email protected]
² CIMS & Robot Center of Shanghai University, Shanghai, China, 200072
Abstract. The emergence of the Open Grid Service Architecture (OGSA) and GT3 provides a solution to Network Manufacturing: the Manufacturing Grid (MG). We apply GT3 to manufacturing by developing an MG Quality of Service (MG-QoS) management system. The main focus of this framework is to provide a means for users to search for services based on QoS criteria in the Manufacturing Grid, to provide QoS guarantees for service execution, and to enforce these guarantees by establishing Service Level Agreements (SLAs). We propose a service discovery mechanism based on QoS properties with regard to SLAs between the user and the service providers, with a GPP Analyzer to decompose the requested job and a Reservation and Allocation Agent to reserve and allocate the services.
1 Manufacturing Grid
Collaborative working and resource sharing between enterprises, or between different departments of one enterprise distributed worldwide, is becoming increasingly frequent. The emergence of the Open Grid Service Architecture (OGSA) [1] and the publication of Globus Toolkit 3.0 (GT3) [2, 3] provide a solution to Network Manufacturing. We developed a Manufacturing Grid (MG) system, based on GT3 and OGSA, to standardize network-manufacturing platforms. A Manufacturing Grid is made up of a number of components, from enabling resources to end-user applications. The framework of the MG consists of four layers, namely MG Fabric, MG Core Middleware, MG User-level Middleware, and MG Application [12, 13].
2 MG-QoS
2.1 QoS and Related Works
Quality of service management has been explored in various contexts, particularly for computer networks [6] and multimedia applications. QoS management has also been explored in the context of Grid computing. Ian Foster proposed the Globus Architecture for Reservation and Allocation (GARA) [4], which addresses QoS at the level of
facilitating and providing basic mechanisms for QoS support. The work most similar to ours is G-QoSM [5], developed by Cardiff University, UK. It provides a QoS management framework that enables Service-Oriented Grid users to specify, locate and execute Grid services with QoS constraints. Compared to general computing Grid resources (CPU, network, etc.), manufacturing resources have their own particularities:
1. A manufacturing resource is offline: the manufacturing process is offline and does not need a network connection to the user application.
2. Manufacturing resources are heterogeneous and distributed. Different manufacturing resources have different functions and attributes, and thus different QoS properties.
3. A manufacturing job is usually composed of several working procedures in a predefined sequence. Each procedure needs a different manufacturing resource and transfers its working result to the next procedure in a timely manner.
From the above, we learn that no existing QoS system can simply be adapted to the Manufacturing Grid. Therefore, we developed an MG Quality of Service management system (MG-QoS), extending GT3 to the manufacturing field. The MG-QoS framework builds on the Open Grid Service Architecture (OGSA) and aims to address the following requirements:
1. mapping an application's QoS requirements to resource capabilities;
2. decomposing the requested job into subjobs that can each be finished by a single service;
3. reserving and allocating resources;
4. establishing SLAs (Service Level Agreements) with clients.
2.2 MG-QoS Architecture
In the context of our framework, QoS may be characterized at two levels: (1) application-level QoS and (2) network-level QoS. Application-level QoS is relevant to the requests and responses between the user and remote services. Due to the specificities of manufacturing resources (outlined in Section 2.1), we pay more attention to application-level QoS.
Fig. 1. MG-QoS Architecture
As illustrated in Fig.1, the Manufacturing Grid QoS Management (MG-QoS) framework is aimed at supporting manufacturing service discovery foremost, and monitoring services to ensure that the QoS requirements for these services are being met in a distributed environment – MG. It consists of the following modules: Application QoS (AQoS) manager: The AQoS manager interacts with all the other components in the MG-QoS to discover services with specific QoS requirements, coordinates with the resource managers to discovery and allocate resources for subsequent service execution. Once the AQoS received the user’s service request with a specific QoS requirement, the Global Process Planning (GPP) Analyzer decomposes the manufacturing job into several subjobs, which can be completed by single service, in pre-defined sequence according to the information from GPP knowledge database. Each subjob has it’s own QoS requirement. Then AQoS finds the services for these subjobs, and asks Reservation Manager to reserve them. Once services for all subjobs have been reserved, a service level agreements (SLAs) need to be established between user and service providers. With the help of Service Reliability Manager, the service status and job-run state are measured and aggregated by the AQoS manager. Resource Manager (RM): A RM is considered in this context as a combination of Globus Resource Allocation Manager (GRAM) and Monitoring and Discovery Service (MDS). GRAM is used to host the service and to create an execution environment with specific QoS specifications; MDS on the other hand is used as a registry service, to record service capability and QoS provisions. Network Resource Manager (NRM): The NRM is conceptually a Bandwidth Broker (BB) [7]. NRM will communicate with local enforcers to determine the state of the network as well as configure the network. It accounts for the ability of the entire network to deliver a particular policy request. Manufacturing Resource (MR) Service: Normally, a manufacturing service capability can be measured by four parameters, namely TQCS (deliver Time, Quality, Cost, Service). Depending on practical experience, MG-QoS uses one of the two QoS level services: (1) Guaranteed-service [8], (2) Controlled-load [9]. The Guaranteed-service delivers QoS based on pre-defined constraints identified by a user, and agreed upon by the provider in a SLA (the bounds of TQCS are stringent). In this type of service, the QoS parameters are enforced, monitored, and the service provider is committed to deliver the service with such specifications. In the Controlled-load, the user states the QoS requirements, the brokering service finds the most suitable resources to execute the service. However, the main difference is that the QoS requirements are a less stringent. Service Reliable Manager (SRM): Each Manufacturing Resource Service is managed by a SRM, which is responsible for recording the manufacturing status (such as running, idle, or faulty) and task scheduler. During service discovery, MDS send a request to SRM to check whether the service is usable and loadable. A much detailed framework is show in Fig.2. An AQoS is composed of Global Process Planning (GPP), Reservation Manager and Allocation Manager. Reservation and Allocation Manager plays a critical role in MG-QoS. They provide a bridge between the application and the available resources, constructing sets of resources that both match application QoS requirements and conform to the local practices and policies of resource providers.
Fig. 2. Detailed framework of MG-QoS
Fig. 3. AQoS Sequence Diagram
Fig. 3 shows a sequence diagram from the user's job submission to the allocation of the discovered service.
3 Reservation and Allocation Agent
We devise a reservation agent that takes a service request as a specification of the dataset to be analyzed, together with an indication of the desired QoS, expressed in terms of how precise the result should be, how soon results are required, and how much the user is prepared to pay. The agent uses MDS to discover the most suitable manufacturing resource under the SLA specified by the user. The GPP Analyzer aggregates the services discovered by the reservation agent for each subtask. Once all the services needed to run the submitted job are determined, the GPP Analyzer invokes the allocation agent to allocate a resource set; GRAM then takes charge of the subsequent processing. A reservation is created by a generic "Create Reservation" operation, which interacts with the local SRM to ensure that the requested quantity and quality of the manufacturing resource will be available at the requested start time and will remain available until the deadline. If the resource cannot make this assurance, the "Create Reservation" operation fails. Note that the MG-QoS concept of reservation means immediate reservations. A reservation in MG-QoS has three properties:
1. start time and deadline;
2. resource type (e.g., CAD, CAM, CNC and so on);
3. resource QoS properties (e.g., quality, cost, etc.).
A reservation agent is responsible for discovering a collection of resources that can satisfy application QoS requirements; however, rather than allocating those resources, it simply reserves them. Hence, a call to a reservation agent specifies
QoS requirements and returns a set of reservation handles that can then be passed to an allocation agent. In MG-QoS, the allocation agent has the simpler task of allocating a resource set, given a reservation handle generated by a reservation agent.
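The paper does not publish the agents' programming interfaces, but the division of labour can be sketched as follows; every name in this Java fragment is hypothetical.

import java.util.Date;

// Hypothetical sketch of the reservation/allocation split described above;
// none of these names come from the MG-QoS implementation.
class Reservation {
    final Date startTime;      // property 1: start time ...
    final Date deadline;       //             ... and deadline
    final String resourceType; // property 2: e.g. "CAD", "CAM", "CNC"
    final String qos;          // property 3: e.g. quality/cost constraints

    Reservation(Date start, Date deadline, String type, String qos) {
        this.startTime = start;
        this.deadline = deadline;
        this.resourceType = type;
        this.qos = qos;
    }
}

interface ReservationAgent {
    // "Create Reservation": asks the local SRM whether the requested
    // quantity/quality is available from startTime until deadline;
    // returns a reservation handle on success, or fails (here: null).
    Reservation createReservation(Date start, Date deadline,
                                  String resourceType, String qos);
}

interface AllocationAgent {
    // The allocation agent has the simpler task: given a handle produced
    // by a reservation agent, actually allocate the resource set.
    void allocate(Reservation handle);
}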
4 Service Level Agreement (SLA)
A Service Level Agreement (SLA) is defined as a service contract between a user and a service provider that specifies the forwarding service the user desires. In fact, an SLA is a formal contractual agreement covering issues such as network availability and payment, as well as other legal and business-related issues [10]. An SLA guarantees that traffic offered by a user that meets the agreed conditions will be carried and delivered by the service provider. Depending on the contractual agreement, failure to provide the agreed service could result in some form of monetary or legal consequences. In this context, the SLA is used as a means of managing and monitoring QoS attributes at the two levels, namely Guaranteed-service and Controlled-load, and of enforcing contracts. Once the AQoS manager has discovered a suitable service that the user requires, an SLA needs to be established between the service requester and the service provider. If the user requires Guaranteed-service QoS, the SLA consists of exact constraints which must be met by the service provider. If Controlled-load QoS is requested, the QoS requirements are more relaxed and the parameters are specified as range constraints. A sample SLA specification document is shown as follows.
5 QoS Properties
Having discussed service discovery based on QoS properties, we now need a mechanism for service providers to advertise their services with QoS capabilities.
Note that the OGSA specification for service interface definition documents [3], which defines the element tags and their grammar in WSDL, does not include any tags for QoS provisions. Therefore, we propose incorporating into the main service tag a new sub-element tag with various QoS attributes, as shown in the following WSDL service interface definition document:
In this way, service providers will be able to define their services in WSDL documents along with their QoS capabilities, and the services would subsequently be registered with MDS. Considering the technologies used by OGSA, namely WSDL, the Simple Object Access Protocol (SOAP) [11] and MDS, the Index Service is the primary registry and service discovery engine. When a service is registered with a local Index Service, the aggregator mechanism in the Index Service collects the corresponding service data, including the QoS properties provided in the WSDL. Via the service data querying mechanism provided by MDS, we can discover the desired service based on QoS properties.
6 Conclusions
A framework for QoS-based service discovery in the Manufacturing Grid has been described. The main focus of this framework is to provide a means for service requesters to search for services based on QoS criteria in the MG, to provide QoS guarantees for service execution, and to enforce these guarantees by establishing SLAs.
References
1. I. Foster, et al.: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. http://www.globus.org/research/papers/ogsa.pdf
2. Globus Toolkit Project, http://www.globus.org (2003)
3. S. Tuecke, K. Czajkowski, I. Foster, et al.: Open Grid Services Infrastructure (OGSI). http://www-unix.globus.org/toolkit/draft-ggf-ogsi-gridservice-33_2003-06-27.pdf
4. I. Foster, et al.: A distributed resource management architecture that supports advance reservation and co-allocation. Proceedings of the International Workshop on QoS, 1999
5. R. Al-Ali, O. Rana, et al.: G-QoSM: Grid service discovery using QoS properties. Computing and Informatics Journal, Special Issue on Grid Computing (2002)
6. A. Oguz et al.: The mobiware toolkit: Programmable support for adaptive mobile networking. IEEE Personal Communications Magazine, 5(4), 1998
7. B. Teitelbaum, S. Hares, et al.: Internet2 QBone: Building a testbed for differentiated services. IEEE Network, 13(5):8-16, 1999
8. S. Shenker, et al.: Specification of Guaranteed Quality of Service. RFC 2212, Internet Engineering Task Force, September 1997
9. J. Wroclawski: Specification of the Controlled-Load Network Element Service. RFC 2211, IETF, September 1997
10. S. Sohail and S. Jha: The Survey of Bandwidth Broker. Technical Report UNSW CSE TR 0206, University of New South Wales, May 2002
11. Simple Object Access Protocol (SOAP). http://www.w3.org/TR/SOAP (2003)
12. Liu Lilan, Yu Tao, Shi Zhanbei, et al.: Self-organization Manufacturing Grid and Its Task Scheduling Algorithm. Computer Integrated Manufacturing Systems (2003)
13. Shi Zhanbei, Yu Tao, Liu Lilan: Service Registry and Discovery in Rapid Manufacturing Grid. Computer Application (2003)
An Extension of Grid Service: Grid Mobile Service
Wei Zhang¹,², Jun Zhang¹, Dan Ma¹, Benli Wang², and YunTao Chen²
¹ Huazhong University of Science and Technology, 430074, Hubei, China
[email protected]
² Wuhan Ordnance N.C.O. Academy of PLA, 430075, Hubei, China
Abstract. As described in the Open Grid Service Architecture (OGSA), a Grid Service is static: it is fixed on the grid node machine providing the service, without mobility. This paper introduces the concept of Grid Mobile Service, gives a description of Grid Mobile Service based on Mobile Agent technology, and then discusses the critical factors of Grid Mobile Service and its implementation methods in detail. This has great theoretical and practical value for improving the practicability and flexibility of Grid Service.
1 Introduction
The Open Grid Services Architecture (OGSA) adopts the so-called Grid Service to represent computational and storage resources, networks, programs, databases, and the like. Until recently, a Grid Service in OGSA has been static: conforming to Web Services, it follows an "XML+RPC" mode. Such services are fixed on a host, mapped to uniform core interfaces, and called by other nodes, and the nodes involved must maintain a persistent connection. For complicated distributed computing, RPC needs to transfer parameters and results back and forth frequently. As application complexity grows, the number of invocations increases rapidly, taking up a great deal of bandwidth. Although the Simple Object Access Protocol (SOAP) handles sequences of events by multi-level Web Service invocation, every invocation at each level must be statically specified in the SOAP specification, lacking flexibility and intelligence. However, in some circumstances, services are expected to be mobile. For example: in distributed computing environments, a problem-solving algorithm moves to a high-performance computer, or entities carrying tasks move to other nodes to balance the load; in information retrieval, code wandering the network retrieves information and sends back the final results; in e-commerce, several intelligent agents travel to different web sites to discover resources, look for services, and bargain autonomously; in mobile computing, mobile users send a request to be executed in the grid and receive the results at a proper time, location and manner. Consequently, the introduction of mobile services can increase Grid Service flexibility and validity, providing more options and technological support to applications. It is here that Mobile Agent (MA) technologies enter the picture. MA technology addresses precisely the challenges that arise when we seek to build mobile services. MAs are autonomous software entities that can halt themselves, ship themselves to another agent-enabled host on the network, and continue execution, deciding where to go and what to do along the way.
We introduce MA technology to Grid Service, extending it and presenting a new type of Grid Service, named Grid Mobile Service. A Grid Mobile Service is defined as an intelligent code service that wanders among grid nodes to accomplish a certain task and provide a certain service. Grid Mobile Service provides a series of standard interfaces and conforms to specific conventions to solve such problems as mobile service discovery, dynamic service creation, lifetime management, notification, mobile service interaction, and mobile service migration. Service entities look for grid nodes autonomously, move between suitable nodes, and negotiate with other mobile entities to accomplish their assigned tasks. Grid Mobile Service is a significant extension of, and complement to, Grid Service. It can construct mobile services rapidly from the system bottom, expand the scope of Grid Service in the application layer, and enrich Grid Service objects as well.
2 Grid Mobile Service Description
A Grid Mobile Service is a Grid Service marked by a globally unique name, the Grid Service Handle (GSH), and described by a corresponding Grid Service Reference (GSR). The GSH is registered through the Registry interface, while service callers retrieve registered services by the standard FindServiceData operation. Users need not be aware of how a mobile service moves, but are guaranteed that the service can accomplish the assigned task: the mobility of the service is transparent to users. Hosts should provide an execution environment for MAs. An environment is called a homogeneous environment if, for example, mobile Java code runs only on a Java Virtual Machine; otherwise it is called a heterogeneous environment. The movement of MA code and state within a homogeneous environment is called Restricted Mobility, i.e. an MA moves to another homogeneous environment and continues to execute. When a Grid Service is extended to be a Grid Mobile Service, a Restricted Mobility Service occurs only within a homogeneous environment; such a service is not mobile in a heterogeneous environment. To solve this problem, the Unrestricted Mobile Service is introduced: a kind of Grid Service that moves its service description, calling local services in the heterogeneous environment. For events that need to be executed serially in different environments, it is more suitable to apply an Unrestricted Mobile Service: instead of being called one by one, which is extremely cumbersome, the services are called only once when unified into an Unrestricted Mobile Service.
3 Creating Grid Mobile Service Conventions Conforming to OGSA
Based on both Web Services and MA technologies, the key to implementing Grid Mobile Service is to integrate the two seamlessly. A Grid Mobile Service is a Web service that conforms to a set of conventions (interfaces and behaviors) that define how a client interacts with it. In this section, we focus on the key technical details of how a Mobile Agent system is transplanted into OGSA.
OGSA defines uniform exposed service semantics (the Grid Service); defines standard mechanisms for creating, naming, and discovering transient Grid Service instances; provides location transparency and multiple protocol bindings for service instances; and supports integration with underlying native platform facilities [1]. OGSA defines a variety of behaviors and associated interfaces; all but one of these interfaces (GridService) are optional. We add a mobile service interface to the standard interfaces, as shown in Fig. 1. The function of this interface is similar to a Mobile Agent platform, with responsibility for managing the dispatch and recall of mobile services.
Fig. 1. The structure of grid service
The Mobile Service interface is associated with a dynamic set of service data elements: named and typed XML elements encapsulated in a standard container format. Service data elements provide a standard representation for information about Grid Service instances. All MA functions are encapsulated in the Mobile Service interface, which provides the standards for program design and transplantation.
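As a rough illustration of where the new interface sits, the following Java sketch shows a MobileService port type extending a simplified GridService; the operation names are our own simplifications, not the actual OGSI/GT3 signatures.

// Hypothetical Java sketch of the added interface; OGSA/GT3 define the
// GridService port type, but the MobileService operations below are ours.
interface GridService {
    String findServiceData(String queryExpression);
    void destroy();
}

interface MobileService extends GridService {
    // Dispatch the service code (Restricted Mobility) or its service
    // description (Unrestricted Mobility) to the next itinerary entry.
    void dispatch(String itineraryUrl);   // e.g. "atp://host:4434"

    // Recall a previously dispatched agent back to this node.
    void recall(String agentId);
}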
4 Extension of the WSDL Protocol
The Web Service protocols should be extended to accommodate the Mobile Service interface. In the Web Service stack, UDDI is used to register and discover services; since it is little affected by service mobility, UDDI, XML and the underlying transport protocols need not be modified. WSDL is an XML document format for describing Web Services as a set of endpoints operating on messages containing either document-oriented (messaging) or RPC payloads. The main elements in a WSDL service description include Types, Message, Operation, PortType, Binding, Port and Service. In order to support Grid Mobile Service, the following elements should be added, as shown in Table 1. In Table 1, the ServiceType element denotes the type of service mobility. Because Restricted and Unrestricted Mobile Services differ in their migration mechanisms, the service type must be indicated explicitly. The rule of mobility is: an Unrestricted Service is allowed to move to any mobile service interface, while a Restricted Service may move only to a
restricted mobile service interface. The ExecuteEnvironment element denotes the concrete runtime environment; it applies only to Restricted Services, because a Restricted Service's program code moves within a homogeneous environment. Currently, most MAs and MA platforms are programmed in Java, so nodes providing a Java execution environment are the main nodes for restricted services. The Itinerary element denotes the host addresses the service moves to. The URL format is XXX://hostname (or IP address):port, where XXX is a mobile Agent Transport Protocol (ATP). For example, if the IBM Aglets system is adopted, a URL takes the form atp://example.mobilegrid.cn:4434. For certain projects, the hosts on which the service executes are decided beforehand, so the value of Itinerary can be assigned directly; otherwise it is NULL, and it is then the grid platform's or grid administrator's responsibility to appoint the next destination node. By using the discovery and find mechanisms, a mobile service can locate services while moving, until its task is accomplished.
5 Migration Mechanism of Mobile Service
The extensions to the SOAP and WSDL protocols above are used to publish, identify and explain mobile services. The concrete migration of a service involves mobile agent control mechanisms such as object serialization, thread migration, and mobile agent dispatch and recall. In the following we discuss the migration mechanisms of the different types of mobile service; the other mechanisms are not discussed in this paper. Because code migration differs between Restricted and Unrestricted Mobility, their migration mechanisms differ as well. For Restricted Mobility, the service code, encapsulated as a standard mobile agent, implements object serialization through the added mobile service interface. The Mobile Service interface dispatches agents according to the value of Itinerary in the WSDL. For Unrestricted Mobility, services handle a series of serial distributed applications, and the service code need not move, so no object serialization is required. The approach is to encapsulate the useful information produced by each application step into an MA and move it according to the Itinerary's value; this information includes the relevant elements and the information describing the service. On the
receiving grid node, the receive mechanism is the reverse of the dispatch mechanism. Mobile Services negotiate and communicate with each other using the SOAP protocol, embodying KQML or FIPA-ACL.
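A minimal sketch of the Restricted Mobility dispatch step, using standard Java object serialization; the agent class, payload fields, and addresses are illustrative assumptions, since a real platform such as Aglets also migrates execution state and uses ATP for transport.

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sketch: the agent is serialized and the resulting bytes
// are shipped to the next host named by the Itinerary element.
public class AgentDispatcher {
    static class MinerAgent implements Serializable {
        String task = "mine remote database";
        String resultAddress = "atp://home.example.org:4434";
    }

    static byte[] serialize(Serializable agent) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(agent); // captures the agent's object state
        out.close();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = serialize(new MinerAgent());
        // ship 'payload' to the itinerary host; the receiver
        // deserializes the agent and resumes its execution.
        System.out.println("agent serialized to " + payload.length + " bytes");
    }
}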
6 Example: A Mobile Data Mining Service
We borrow an example [5] of data mining to show how mobile services are used in a Grid environment. Fig. 2 depicts a situation in which a user wants to discover, acquire, and employ remote capabilities to create a new database using data mined from a number of databases. The figure illustrates the following steps:
Fig. 2. A mobile data mining service example
1. The user (or, more likely, a program or service acting on the user's behalf) contacts a registry that a relevant Virtual Organization maintains, to identify service providers who can provide the required data mining and storage capabilities. The services may be mobile services, transparent to the user. The user request can specify requirements such as cost, location, or performance.
2. The registry returns handles identifying a miner factory and a database factory maintained by service providers that meet the user's requirements, or perhaps a set of handles representing candidate services. In either case, the user identifies appropriate services. Suppose the user selects the mobile service.
3. The user issues requests to the miner and database factories specifying details such as the data mining operation to be performed, the form of the database to be created to hold the results, initial lifetimes for the two new service instances, and the time at which the user receives the results.
4. Assuming that this negotiation process proceeds satisfactorily, two new service instances are created with appropriate initial state, resources, and lifetimes. The user may go offline and receive the results at a proper time.
5. The miner service creates and dispatches miner agents responsible for the data mining and then goes offline from the remote databases. The agents, carrying their tasks and some information, are dispatched to different databases for execution. The information includes the destination address, the agent's lifetime, the address for storing results, and so on. During execution, the agents can communicate and cooperate with one another.
6. The miner agents return the results either to the host agent or directly to the newly created database, as Fig. 2 shows. With reference to the initially negotiated lifetimes, the overall completion time of the task is decided by the agents autonomously.
7. The hosting service recalls or kills the outstanding agents and releases the resources.
8. The results are cached at the appointed location, and the user receives them at a proper time.
7 Conclusion and Further Work
The Grid Service described in OGSA is static: it is installed in a fixed manner on a Grid node machine, without mobility. This static service results in many drawbacks, such as the need for continuous connections, wasted bandwidth, limited intelligence, and overload from service calls. This paper has introduced the concept of Grid Mobile Service, researched its framework and implementation, given a description based on MA technology, and discussed the critical factors of Grid Mobile Service and its implementation methods in detail. The use of Grid Mobile Service can increase the validity and flexibility of Grid Service, especially for applications such as mobile computing, e-commerce, and information retrieval. New problems may arise from introducing mobility into grid services, including protocol compatibility, the security of mobile services, and programming complexity. Our further work is to complete the protocol extensions, implement the transplantation of mobile agents to the Grid, and research and implement mobile agent control mechanisms in the grid environment in depth.
References
1. I. Foster and C. Kesselman, eds.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, San Francisco, 1999
2. I. Foster et al.: "The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration," tech. report, Globus Project
3. http://www.fipa.org
4. http://www.trl.ibm.com/aglets/index_e.htm
5. I. Foster, C. Kesselman, J. M. Nick: "Grid Services for Distributed System Integration," IEEE, 2002, pp. 37-46
Supplying Instantaneous Video-on-Demand Services Based on Grid Computing*
Xiao-jian He, Xin-huai Tang, and Jin-yuan You
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
{he-xj, tang-xh, you-jy}@cs.sjtu.edu.cn
Abstract. It is impossible to satisfy infinitely rapidly increasing demand with a finite addition of servers and bandwidth in a traditional VOD (Video-on-Demand) system. In this paper, we propose a novel hybrid Grid-type architecture to resolve this problem. By taking advantage of the large storage and powerful processing capability of current client-side devices, a user host serves both as a client and as a mini video server, and the system capacity is enhanced by the contribution of user hosts. This VOD system benefits both from the secure and efficient organization of Grid computing and from the flexibility and scalability of peer-to-peer models. In order to reach an optimal utilization of system resources, an adaptive video delivery policy is implemented in this system. By cooperating with each other, the relevant servers can supply instantaneous video services to local users.
1 Introduction
Recent advances in computer and communication technologies (e.g. high-speed networks) have made VOD (Video-on-Demand) services a reality. VOD services are considered an emerging trend in home entertainment, as well as in remote education, banking, home shopping, and interactive games. Designing and implementing a cost-effective VOD system that supplies instantaneous VOD services is still a challenging task. A typical VOD system consists of a set of centralized video servers and geographically distributed clients connected through high-speed networks. A large number of video files are stored on the servers and played by the clients. Under this client-server architecture, a client sends a request to the video server for a video title, and the server admits the request through an admission control policy and delivers the video stream to the client for playback. As the number of users increases, the server will eventually exceed its capacity limit. At that point, one conventional solution is to add more servers, with each server serving part of the users. Concurrent video servers only extend the server-side system capacity; the distribution network must also be upgraded with more bandwidth to satisfy the increasing amount of video traffic to the
* This paper is supported by the Shanghai Science and Technology Development Foundation project (No. 03DZ15027) and the State Natural Science Foundation project (No. 60173033).
user [1][2][3]. Nevertheless, it is impossible to satisfy infinitely rapidly increasing demand with a finite addition of servers and bandwidth. In this paper, we propose a novel hybrid Grid-type architecture to resolve this problem. Owing to the rapid development of the computer industry, current client-side devices in a VOD system are not only low-cost but also relatively powerful. By taking advantage of the increasing storage and processing capability of client-side devices, a client-side device (or user host) may serve both as a client and as a mini-server [3]. As the number of user hosts increases, the total system capacity is extended flexibly by the contribution of the user hosts. Based on this Grid-type architecture, the VOD system benefits both from the secure and efficient organization of Grid computing and from the flexibility and scalability of peer-to-peer models. Hosts may be assigned to different autonomous groups according to their interests, and workloads are apportioned fairly among the autonomous groups. After running for a long time, video data becomes distributed among the user hosts; some video services may then be supplied by the relevant mini-servers locally, which reduces the need for long-distance communication. Implementing a practical VOD system involves many problems, including dedicated video server placement, the admission control policy, the caching policy in the user hosts (namely mini-servers), and the adaptation policy for supplying instantaneous VOD services. In this paper, we focus on the adaptation policy. Based on the hybrid Grid-type architecture, this adaptive video delivery policy mainly employs a new dynamic buffering algorithm and an improved video multicast strategy to reach an optimal utilization of system resources (e.g. disk I/O bandwidth, network bandwidth, etc.).
Fig. 1. The hybrid Grid-type architecture for VOD systems
2 The Hybrid Grid-Type Architecture
As shown in Fig. 1, a VOD system based on this Grid-type architecture comprises a few dedicated video servers, a pool of non-dedicated mini-servers (namely user hosts), and some low-powered client-side devices (called Only-Viewers). Each component in this system is connected by the network. Furthermore, the VOD system is organized as a
hierarchical framework according to the network topology, the device functionality and the trust of each host.
2.1 The Functional Components in This Hierarchical Framework
Corresponding to the layout of the whole system, dedicated video servers are arranged evenly to form a typical distributed subsystem [1], which takes charge of the original distribution of video data, initial load balancing, the security policy, the global admission control policy, overall availability, and so on. As a dynamic node, a user host may not be sufficiently trusted to guarantee security and reliability. When dedicated servers run as administrators, their high-end hardware and high security provide a top-level guarantee for the VOD system with less complexity, and the information organization involving the dedicated servers reduces the number of messages needed for the video index service. Each user host has its own CPU, memory and disk storage. In general, a typical PC client for video on demand is equipped with a disk (>60 GB), RAM (>256 MB), and a powerful CPU. New set-top boxes (STBs) will be equipped with large-capacity disks. An entire video title may be replicated (or cached) on a user host after that host performs a video-on-demand service. Mini video server software can be downloaded from a higher-level video server onto the user host. When a video-on-demand request is admitted, this mini video server software runs to deliver a video title to other nodes. The process of obtaining the software and providing VOD services is based on the Grid Security Infrastructure (GSI) protocol implemented in this Grid-type architecture. An Only-Viewer node is a simple video-receiving and decoding device, taken just as a terminal for playback. In this case, its proxy node assists the relevant video servers in supplying video services, and an individual static channel is allocated for this end node. An Only-Viewer node is just a consumer with poor efficiency; it does not help to increase system capacity.
2.2 Integration of Grid Computing and P2P Models
Currently, peer-to-peer models (e.g. Napster) are becoming increasingly popular for sharing information and data through direct exchange, and computational Grids have emerged as popular platforms for deploying large-scale, resource-intensive applications. The Grid is defined as coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations; this concept of a virtual organization (VO) is central to Grid computing. Peer-to-peer models offer the important advantages of decentralization, by distributing the storage capacity and load across a network of peers, and scalability, by enabling direct and real-time communication [4][5]. As an integration of features of Grid and P2P models, this hybrid Grid-type architecture not only enables scalable system capacity, but also has a secure and efficient organization.
When a user host serves as a mini-server, a credential is needed to allocate the reserved network resources in a controllable network environment, and sufficient trust may be needed for it to provide services to other nodes. All of this can be carried out under the security policy and the global resource collaboration framework of the Grid. A node may request or provide video services, so a user host will join different autonomous groups according to its interests. We consider that two hosts have similar interests if they are able to provide video services in response to each other's requests. Hosts learn about the interests of their peers by monitoring the replies they receive to their requests; they therefore decide whom to connect to, and when to add or drop a connection, based on this local information. Hosts with a high degree of similar interests are considered good peers [5], and these form an autonomous group. The higher-level video servers maintain dynamic index information for their lower-level autonomous groups. Under this Grid-type architecture, a request for a specific video title is first processed within the autonomous group in which it originates. If this fails, the request is delivered to a higher-level server, and so on until the request reaches the top dedicated servers. When the request is admitted according to the security and resource control policies, the relevant peers provide video services with no delay and with global efficiency.
3 Video Multicast Delivery Based on Dynamic Buffering
Statistical analysis shows that masses of requests focus on a few hot video titles, and the overall system availability relies mainly on network and disk bandwidth utilization. When various users request the same video title, multicast delivery allows these users to share one channel so that more concurrent on-demand users can be admitted [6]. In a controllable high-speed network, QoS can be guaranteed with a core-stateless proportional adaptive fair bandwidth allocation mechanism; each video stream can be allocated one dynamic channel based on DiffServ, which guarantees the relevant network bandwidth [7]. Furthermore, the power of networks and storage is projected to double every 9 and 12 months respectively, while the increase in disk I/O bandwidth is slower. In current VOD systems, disk I/O bandwidth easily becomes the limit: disk performance becomes a bottleneck of the VOD system.
3.1 Video Multicast with a Stream-Merging Algorithm
An individual channel is allocated to transmit the entire video in order to satisfy the first request for a video title. When a new request for the same video title is admitted some seconds later, we can multicast the rest of the video data through the shared channel. At least two video streams are then processed in the new on-demand user host: the multicast segment of video data, and the missed initial seconds of the video received through another channel allocated at this time. These video streams are merged for playback in client-side storage. The principle of this improved multicast policy is
shown in Fig. 2. In contrast to a typical batching scheme, this multicast policy can supply instantaneous VOD services without delaying the new on-demand request.
Fig. 2. Video multicast with stream merging
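The bookkeeping behind Fig. 2 can be sketched as follows: a client that joins t seconds late needs a unicast patch of the first t seconds plus the ongoing multicast from position t onward, buffered while the patch plays. The Java fragment below is our own illustration of this arithmetic, not code from the paper.

// Hypothetical sketch of the stream-merging plan for a late joiner.
public class StreamMergePlan {
    final double joinOffsetSeconds;  // how late the new request arrived

    public StreamMergePlan(double joinOffsetSeconds) {
        this.joinOffsetSeconds = joinOffsetSeconds;
    }

    // Patch channel: unicast of the missed prefix [0, joinOffset).
    public double patchLengthSeconds() { return joinOffsetSeconds; }

    // Shared channel: the client joins the multicast at this position
    // and buffers it while the patch is being played back.
    public double multicastJoinPosition() { return joinOffsetSeconds; }

    public static void main(String[] args) {
        StreamMergePlan plan = new StreamMergePlan(30.0);
        System.out.println("patch the first " + plan.patchLengthSeconds()
                + "s; join multicast at " + plan.multicastJoinPosition() + "s");
    }
}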
3.2 Improving Disk I/O Bandwidth Utilization with Dynamic Buffering
In the conventional buffering mechanism, the video server maintains an equal-sized buffer for each video being presented in real time. Reading and transmitting video data is under the control of this buffering mechanism, so as to prevent overflow or underflow at the clients. Prior research has shown a Poisson arrival stream of requests [8]; this user pattern implies that the average disk I/O workload is less than the maximal disk I/O capability. In order to improve disk I/O bandwidth utilization, one method is to read video data at full speed. This is feasible given sufficient network bandwidth and the capability of buffering an entire video title at the clients. Based on the dynamic buffering scheme, video data is transmitted to clients with best effort. Because the time for a single on-demand video title is reduced, more resources are ready for later on-demand requests, and the total number of successful on-demand users over a long period is increased.
In a video server based on this buffering scheme, one dynamic buffer is created for each video stream. The buffer size is adjusted according to the video data reading bit rate and the channel bandwidth of that stream. The structure of this dynamic buffer is a linked list of memory pages. When the video server maintains multiple video streams, the disk operates at full speed, and each video stream is allocated a portion of the disk I/O bandwidth according to its presentation and transmission requirements.
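A minimal sketch of such a per-stream buffer follows, assuming fixed-size pages and an upper limit expressed in pages; the class and constants are illustrative, not taken from the system described here:

```java
// Hypothetical per-stream dynamic buffer: a linked list of memory pages,
// filled by the disk reader and drained by the network sender.
import java.util.ArrayDeque;
import java.util.Deque;

class DynamicBuffer {
    static final int PAGE_SIZE = 64 * 1024;  // bytes per page (assumed)
    private final Deque<byte[]> pages = new ArrayDeque<>();
    private final int upperLimitPages;       // per-stream upper limit

    DynamicBuffer(int upperLimitPages) { this.upperLimitPages = upperLimitPages; }

    // Disk reader side: returns false when the buffer hits its upper limit,
    // signalling the reader to pause this stream's disk reads.
    boolean offerPage(byte[] page) {
        if (pages.size() >= upperLimitPages) return false;
        pages.addLast(page);
        return true;
    }

    // Network sender side: drains pages toward the channel.
    byte[] pollPage() { return pages.pollFirst(); }
}
```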
4 Adaptive Video Delivery Scheme in This Grid-Type VOD System

In this Grid-type VOD system, the mini-server runs as a foreign application in the user host. As a non-dedicated video server, the mini-server has lower priority than local applications on that host. According to the power of each video server and the available network capacity, the system can provide optimal VOD services. When a new on-demand request arrives, good peers are selected to provide the VOD service. If the request is for a video title already being served by these good peers, it joins the multicast delivery for that title; otherwise, one video server starts an individual video delivery for its first presentation. We summarize in Table 1 the symbols used in this paper.
For any new on-demand request, the video server must determine whether to admit the request itself or to ask other servers for help. When a video server is in a stable state, let the number of video streams be k. Essential memory space in the video server must be reserved for each video stream; that is, an essential buffer of size B_i is reserved to guarantee real-time presentation of video stream i. When a dynamic buffer reaches its upper limit U_i for video stream i, its disk reads are paused. When a new on-demand request arrives at this video server, the number of video streams may become k+1. The server admits the request only if the essential buffers of all k+1 streams still fit in the available memory and the aggregate reading rate of all k+1 streams does not exceed the disk I/O bandwidth.
If all of these conditions can be satisfied, this request can be admitted immediately. Otherwise, another video server will be selected to continue.
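The following minimal Java sketch illustrates such an admission test, assuming the two constraints are server memory for the essential buffers and aggregate disk I/O bandwidth; all names and signatures are illustrative rather than part of the system:

```java
// Hypothetical admission test for the (k+1)-th stream.
class VodAdmission {
    // essentialBuf[i]: essential buffer size B_i (bytes) for stream i
    // bitRate[i]    : presentation bit rate (bytes/s) for stream i
    static boolean admit(long[] essentialBuf, long[] bitRate,
                         long newBuf, long newRate,
                         long memoryCapacity, long diskBandwidth) {
        long mem = newBuf, io = newRate;
        for (int i = 0; i < essentialBuf.length; i++) {
            mem += essentialBuf[i];
            io += bitRate[i];
        }
        // Admit only if both constraints still hold for all k+1 streams.
        return mem <= memoryCapacity && io <= diskBandwidth;
    }
}
```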
5 Conclusion

In this paper, a hybrid VOD system is implemented based on Grid computing. By taking advantage of the large storage and powerful processing capability of client-side devices, a user host serves both as a client and as a mini video server. With the contribution of user hosts, we achieve scalable system capacity. All user hosts are assigned to different autonomous groups, and the workload can be apportioned fairly among the groups. Based on the improved multicast policy and the dynamic buffering algorithm, this adaptive video delivery scheme improves resource utilization. Because user hosts act as non-dedicated servers, the relevant servers must cooperate with each other to supply video services with no delay. In this VOD system, all video data are stored on the dedicated servers, and parts of these data are cached in user hosts; requests for popular videos benefit from this policy. Owing to the limited storage capacity of user hosts, a video data distribution policy based on this caching method must be designed in future work. With a large aggregate caching space in the cooperating user hosts, more and more video services may be supplied locally.
References
1. S.-H. Gary Chan, Fouad Tobagi: Distributed Servers Architecture for Networked Video Services. IEEE/ACM Transactions on Networking, Vol. 9, No. 2, April 2001, 125–136
2. Sridhar Ramesh, Injong Rhee, Katherine Guo: Multicast with Cache (Mcache): An Adaptive Zero-Delay Video-on-Demand Service. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 3, March 2001, 440–456
3. J. Y. B. Lee, R. W. T. Leung: Study of a Server-less Architecture for Video-on-Demand Applications. Proceedings of the IEEE International Conference on Multimedia and Expo 2002, Lausanne, Switzerland, Aug 2002
4. I. Foster, C. Kesselman, S. Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications, Vol. 15, No. 3, 2001
5. Murali Krishna Ramanathan, Vana Kalogeraki, Jim Pruyne: Finding Good Peers in Peer-to-Peer Networks. Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 02), 2002, 232–239
6. L. Gao, D. Towsley: Supplying Instantaneous Video-on-Demand Services Using Controlled Multicast. Proc. IEEE International Conference on Multimedia Computing and Systems, 1999
7. Li FangMin, Li RenFa, Ye ChengQing: A Core-Stateless Proportional Adaptive Fair Bandwidth Allocation Mechanism. Journal of Computer Research and Development, Vol. 39, No. 3, March 2002, 269–274
8. He Xiaojian, Li Fangmin, You Jinyuan: A Video-on-Demand Delivery Policy for Improving the Disk Access Performance. Computer Science, Vol. 30, No. 4, April 2003, 76–78
A Grid Service Lifecycle Management Scheme

Jie Qiu¹, Haiyan Yu², Shuoying Chen¹, Li Cha², Wei Li², and Zhiwei Xu²

¹ Computer Science and Engineering, Beijing Institute of Technology, Beijing 100081, China
[email protected], [email protected]
² Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
{yuhaiyan,char,liwei,zxu}@ict.ac.cn
Abstract. Grid technologies enable large-scale sharing of computing resources and make them open to formal or informal consortia of individuals, who utilize these computing resources by consuming Grid Services. In this context, a more mature mechanism for Grid Service management is proposed in this paper, covering not only the creation and destruction of Grid Service instances but the whole lifecycle of Grid Services. Hosting environments incorporating this mechanism are more flexible and powerful in supporting Grid Service provisioning, such as the cloning of overloaded services and the automatic recovery of failed services. A Grid Service naming convention, security considerations for this mechanism, and our implementation of the mechanism are also presented.
1 Introduction

In the OGSI [1] specification, distributed and potentially heterogeneous computing resources are abstracted as Grid Services [2] to provide a standard means of accessing them. With the support of OGSI-compliant hosting environments, physical computing resources can easily be encapsulated into Grid Services. Although the abstraction of resources as services conceals their heterogeneity, several challenging questions remain when creating a Grid Service based application. For example, the Grid Services integrated with the application might fail without notification, requiring the system to take recovery measures such as cloning a new identical service to replace the failed one. Or the Grid Services might become overloaded and exhibit intolerable response times. In such a context, it is necessary to create new service instances on other hosting environments on the fly to satisfy the application's requests. Furthermore, a Grid Service naming scheme is required to help applications identify a Grid Service uniquely and find the right one to invoke. To address these challenges, we first propose a Grid Service management mechanism based on a formal description of the Grid Service lifecycle and a Grid Service naming
convention to identify Grid Services. The Grid Service management mechanism is integrated into the hosting environment and provides the infrastructure to support cloning and dynamic loading of Grid Services. Next, we discuss security considerations and the implementation of the mechanism. Lastly, we conclude with a discussion of related work and future directions.
2 Grid Service Lifecycle

First, consider the following two scenarios:
1. To satisfy a QoS policy, a Grid Service might be cloned automatically to another appropriate hosting environment in order to serve more consumers when it becomes overloaded. In this paper, a consumer can be a user client or even another Grid Service.
2. Under some circumstances, consumers need a proper hosting environment in which to deploy their own Grid Services at run time, and this transaction must not affect other on-serving Grid Services within the same environment, e.g., it must not require a restart.
To realize these scenarios, we must explicitly define the lifecycle of Grid Services, as distinct from the Grid Service instance lifecycle [1] already demarcated by the OGSI specification. The proposed lifecycle consists of five phases: Loading, Installation, Initialization, Serving, and Revoking.
2.1 Grid Service Loader

In the loading phase, all components of a consumer-owned Grid Service are replicated to a target hosting environment (potentially from a remote site). The components are assembled into a package in a format that can be directly deployed in the target hosting environment, referred to in this paper as a ready-to-deploy package; examples are the GAR package in the Globus Toolkit and the WAR package in Service Domain [3]. Before describing the loading process, we define a representation of a Grid Service: we represent a Grid Service by the pair ⟨N, L⟩, where N denotes the name of the Grid Service and L denotes the loader of the Grid Service. For N, we propose a Grid Service naming convention in a later section. A Grid Service Loader (GSL) is a special Grid Service that loads ready-to-deploy packages from remote servers or the local file system, or generates ready-to-deploy packages in the target hosting environment under the control of consumers. The purpose of the Loader is to support dynamic loading of Grid Services into hosting environments. There are two types of GSL: consumer-defined GSLs and the systematical GSL supplied by hosting environments. Systematical GSLs play two basic roles: checking the existence of a specified Grid Service, and loading a ready-to-deploy Grid Service from a local or remote location into their own hosting environment. Consumer-defined GSLs are written and uploaded by consumers, also in the form of ready-to-deploy packages, and give consumers complete control over the process of dynamic Grid Service loading.
A consumer-defined GSL can, for example, recompile source code belonging to a consumer-owned Grid Service, or reconfigure some of its configuration files according to information gathered from the target hosting environment, such as the MPI binary directory, whether PBS is supported, or OS information; finally, the GSL assembles all the components of the consumer-owned Grid Service into a ready-to-deploy package. A consumer-defined GSL is itself represented by ⟨N_L, L_sys⟩, and all ready-to-deploy services are represented by ⟨N, L⟩, where L_sys denotes the systematical GSL.
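The two GSL roles might be expressed as Java interfaces along the following lines; the signatures are assumptions for illustration, as the paper defines GSLs only behaviorally:

```java
// Hypothetical Java interfaces for the two GSL roles.
interface GridServiceLoader {
    // Produce (or fetch) a ready-to-deploy package in the target hosting
    // environment's repository; returns its local path.
    String load(String serviceName, String sourceLocation) throws Exception;
}

interface SystematicalGsl extends GridServiceLoader {
    // Check whether a service with the given GSN already exists locally.
    boolean exists(String gridServiceName);
}
```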
2.2 Formal Description of Grid Service Lifecycle

We now give three definitions and illustrate how to design a model of the Grid Service lifecycle using them, which will also direct our implementation of the Grid Service lifecycle.

Definition 1. K = {k_1, ..., k_n} is a set of phases, where k_i is one phase of the Grid Service lifecycle.

Definition 2. Σ = {σ_0, σ_1, ..., σ_m} is a set of inputs and outputs, where σ_0 = null and each σ_i is an input or output of a phase.

Definition 3. f is a transition relation over phases and their inputs and outputs, i.e., f ⊆ (K × Σ) × (K × Σ).

According to the definition of our proposed Grid Service lifecycle, let K = {Loading, Installation, Initialization, Serving, Revoking}.

Loading:
I. A consumer finds an appropriate hosting environment that can host his service, using Information Services [4] or a matchmaking mechanism [5], and gathers environment information, such as the version of Axis, the OGSI implementation, and the OS.
II. Using the systematical GSL, the consumer can check, via our proposed Grid Service naming mechanism, whether the service to be deployed already exists in the chosen hosting environment, retrieve some characteristics of the service (such as its version and load), and decide accordingly if it already exists. If the service does not exist, go to the next step.
III. If the consumer-owned service does not contain a consumer-defined GSL, the consumer simply employs the systematical GSL to load the service from a consumer-designated location. Otherwise, the consumer should first use the systematical GSL to load the consumer-defined Loader, and then employ the consumer-defined Loader to load the actual Grid Service.
IV. The loading process is complete when the ready-to-deploy package of the actual Grid Service has been generated and the Loaders have correctly placed it in the directory required by the target hosting environment. After these steps, a consumer-defined Loader, if one exists, should be revoked; this belongs to the revoking operation.

Installation: Installation is comparable to the deployment of Grid Services. It involves verifying ready-to-deploy packages and preparing run-time artifacts, and may trigger the loading of related services if necessary.
The representation of a Grid Service, its ready-to-deploy package, is verified to ensure that its binary representation is structurally valid and complies with the target hosting environment. Verification may cause additional Grid Services to be loaded, namely those employed by the installed services. The ready-to-deploy packages must satisfy the static and structural constraints provisioned by the target hosting environment; a hosting environment must in turn publish the relevant constraint information so that consumers know about it. Preparation involves unpacking the ready-to-deploy packages from the repository of the target hosting environment, copying the files to the correct directories, revising some files if needed, and registering static information with the target hosting environment. For example, most hosting environments have configuration files to manage their Grid Services, such as server-config.xml in Axis.

What indicates the completion of this phase? If we restarted the target hosting environment now and the newly installed Grid Service could correctly serve consumers after startup, we would declare the installation of the Grid Service complete. But we would prefer to use a Grid Service directly after its installation, without restarting the hosting environment and without affecting other on-serving services. So we explicitly define another phase, initialization, to load Grid Services into the run-time environment dynamically.

Initialization: Some hosting environments, e.g., the Globus Toolkit, support automatic service activation and deactivation for more efficient and scalable memory management; this feature is also supported by both CORBA and EJB. A Grid Service in the Globus hosting environment is activated on its first call or at hosting environment startup, and then runs through cycles of activation and deactivation according to the policies of the target hosting environment. But this mechanism does not take effect when a consumer-owned service has just finished the previous phase, installation, and the hosting environment has not been restarted, because the run-time structures in memory of the target hosting environment have no information about the newly installed service, even though installation has updated all static configuration files. Initialization of a Grid Service therefore consists of registering some of its data structures with the run-time structures in memory of the target hosting environment and loading specified components of the Grid Service into memory. After initialization, a Grid Service can correctly serve its consumers.

Serving: This phase covers the whole lifecycle of a Grid Service instance as defined in OGSI, and also includes service activation and deactivation. For more information, please refer to OGSI [1].

Revoking: A consumer may request the un-deployment of a Grid Service either through an explicit Revoking Service provided by the target hosting environment, or through a soft-state approach in which (as motivated and described in [2]) a consumer registers interest in the Grid Service for a specific period of time; if that timeout expires without any consumer reaffirming interest in the service to extend the timeout, the service may be automatically un-deployed. Periodic reaffirmation can extend the lifetime of a Grid Service for as long as necessary. Of course, a Grid Service can also be hosted in the target hosting environment perpetually, so that it cannot be revoked by consumers.
Before actually revoking a Grid Service, the target hosting environment should notify all of its instances, for example by simply calling a specific operation, let them clean up resources or send results back to their consumers, and then terminate all the instances. Revoking involves un-registering and un-loading from memory all data structures of the Grid Service that were set up in the initialization phase, and withdrawing all files and registration information generated in the installation phase. Potentially, revoking may also remove the ready-to-deploy package of the Grid Service from the repository directory of the target hosting environment.

Based on the set K, we give the sets Σ and the relations f for three types of Grid Service loading: loading a local Grid Service in the hosting environment, loading an immigrant Grid Service using the systematical Loader, and loading an immigrant Grid Service with a consumer-defined Loader.

Local Loading:
σ_1: ready-to-deploy packages outside the repository directory of the target hosting environment.
σ_2: ready-to-deploy packages in the repository of the target hosting environment, and possibly all files and static configuration information to be deployed in the target hosting environment.
σ_3: all data structures to be registered in memory.
σ_4: a SOAP request from Grid Service consumers; σ_5: a SOAP response to the Grid Service consumers.
f includes: loading local Grid Services, in the form of ready-to-deploy packages, into the repository; installing Grid Services from the repository; registering data structures in memory, making Grid Services ready for serving; and Grid Services serving.

Systematical Loading: based on the above definitions of Σ and f, we define only the additional elements.
σ_6: a systematical GSL in the serving state.
σ_7: a Revoking Grid Service.
σ_8: the remains of a revoked Grid Service, kept for results and logging.
f additionally includes: the systematical GSL loading a consumer-owned Grid Service; the hosting environment requesting revocation of a Grid Service because its time expires; and revoking a Grid Service for reasons other than time expiry.

User-Defined Loading: based on the above two definitions of Σ and f, we define only the additional elements.
σ_9: a consumer-defined GSL.
f additionally includes: loading a consumer-owned Grid Service on which its Loader has made the proper reconfiguration;
the hosting environment requesting revocation of the consumer-defined GSL; and revoking a consumer-defined GSL.
Based on the above definitions of K, Σ, and f, we must additionally provide a systematical GSL and a mechanism for the dynamic installation and initialization of Grid Services, and for the management of consumer-owned Grid Services, including loading and revoking, on existing hosting environments.
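The five phases and their ordering can be rendered as a small state machine. The following Java sketch is a minimal illustration of the lifecycle transitions only; the inputs and outputs Σ are elided, and the class names are ours, not the paper's:

```java
// A minimal sketch of the five-phase lifecycle as a transition table.
import java.util.EnumMap;
import java.util.Map;

enum Phase { LOADING, INSTALLATION, INITIALIZATION, SERVING, REVOKING }

class Lifecycle {
    private static final Map<Phase, Phase> NEXT = new EnumMap<>(Phase.class);
    static {
        NEXT.put(Phase.LOADING, Phase.INSTALLATION);        // package in repository
        NEXT.put(Phase.INSTALLATION, Phase.INITIALIZATION); // files and config deployed
        NEXT.put(Phase.INITIALIZATION, Phase.SERVING);      // run-time structures registered
        NEXT.put(Phase.SERVING, Phase.REVOKING);            // timeout expiry or explicit request
    }

    private Phase current = Phase.LOADING;

    Phase advance() {
        Phase next = NEXT.get(current);
        if (next == null) throw new IllegalStateException("service already revoked");
        return current = next;
    }
}
```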
3 Grid Service Naming

In the foregoing section, we represented a Grid Service by ⟨N, L⟩, where N denotes the name of the Grid Service. Even before a Grid Service is actually loaded into the target hosting environment, the Grid Service Naming (GSN) convention and the definition of Grid Service equivalence proposed in this paper can distinguish the Grid Service.

Definition 4. GSN := P + SV.
P: the set consisting of the QName of the portType, all operations in the portType, and all type definitions employed by the operations. All three parts are described in the Grid Service's WSDL. In practice, we use hash codes to represent P; for example, we use MD5 to hash each part of the portType file.
SV: the Semantic Version, a set of semantic strings. Each semantic string represents the semantics of one operation of a Grid Service portType. In CORBA and COM, a component is identified by its version and IDL file, but version and IDL carry no semantics. To extend this model, a Grid Service provider can define a Semantic Version for the service. A semantic string for an operation not only defines a version of the operation but also represents its semantics. Suppose a provider offers two different Grid Services with identical WSDL descriptions but different functions; the provider should then give the two services two different SVs. We do not limit the content of SV; the provider must, however, define a mechanism for interpreting SV and for distinguishing two Grid Services by their SVs.

Definition 5. S_1 = S_2 iff P_1 = P_2 and SV_1 = SV_2, where S_1 and S_2 are two Grid Services.

Definition 6. S_1 ≠ S_2 if P_1 ≠ P_2; i.e., if the portTypes of two Grid Services are different, the Grid Services are different.

Definition 7. If two Grid Services have identical WSDL descriptions but one or more of their operations have different semantic strings while the remaining operations have identical semantic strings, we say that S_1 is partly equal to S_2.
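Definition 4 lends itself to a direct implementation. The Java sketch below is illustrative: it hashes the portType parts with MD5 for P and carries the provider-supplied semantic strings as SV; the field and method names are assumptions, not part of the convention:

```java
// Illustrative sketch of GSN = P + SV and the equality tests of
// Definitions 5 and 7.
import java.security.MessageDigest;
import java.util.Arrays;

class Gsn {
    final String[] pHashes; // P: hashes of portType QName, operations, types
    final String[] sv;      // SV: one semantic string per operation

    Gsn(String[] portTypeParts, String[] semanticVersion) throws Exception {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        pHashes = new String[portTypeParts.length];
        for (int i = 0; i < portTypeParts.length; i++) {
            pHashes[i] = hex(md5.digest(portTypeParts[i].getBytes("UTF-8")));
        }
        sv = semanticVersion.clone();
    }

    private static String hex(byte[] b) {
        StringBuilder sb = new StringBuilder();
        for (byte x : b) sb.append(String.format("%02x", x));
        return sb.toString();
    }

    // Definition 5: two services are equal iff both P and SV are equal.
    boolean sameServiceAs(Gsn other) {
        return Arrays.equals(pHashes, other.pHashes) && Arrays.equals(sv, other.sv);
    }

    // Definition 7: partly equal, i.e., identical portType, differing semantics.
    boolean partlyEqualTo(Gsn other) {
        return Arrays.equals(pHashes, other.pHashes) && !Arrays.equals(sv, other.sv);
    }
}
```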
4 Security Considerations and Implementation

We adopt the security mechanism of the Grid Security Infrastructure (GSI). The Grid Service Loader (GSL) is implemented as a secure Grid Service with an authorization policy predefined by the target hosting environment, and consumers must use a valid certificate to access it. The GSL authenticates consumers and authorizes them according to the authorization policy. In addition, the hosting environment must limit what a consumer-defined GSL can do and what information it can obtain. Furthermore, Grid Service loading must not affect other on-serving Grid Services in the same hosting environment. In implementing the Grid Service management scheme, our strategy is therefore to provide a sub hosting environment: an abridged version of the hosting environment that runs separately from it, runs as a special system user, possibly different from the system user who starts up the hosting environment, and supports a run-time environment for Grid Services. Based on this design, we let the consumer-defined GSL run in the sub hosting environment; after the Grid Service is successfully loaded, the consumer-defined GSL is revoked and the sub hosting environment is shut down. Fig. 1 shows the implementation model of this scheme, and Fig. 2 shows the detailed framework of Grid Service lifecycle management.
Fig. 1. Architecture of hosting environment (left) and sub hosting environment (right)
Fig. 2. Components in Grid Service Lifecycle Management
5 Related Work

We have referred to a range of related work in the body of the paper. Here we make additional comments concerning the use of the Globus Toolkit, other work on component technologies, and the JVM. We use Globus Toolkit 3 as our infrastructure, especially its OGSI implementation. However, the Globus Toolkit does not support management of the whole
Grid Service lifecycle. We refer to the lifecycle of a class as specified by the JVM to define the Grid Service lifecycle, and the model of GSN is inspired by the version definitions and IDL of component technologies.
6 Conclusions and Future Work

The definition of the Grid Service lifecycle in this paper, complementary to the lifecycle of Grid Service instances specified in the OGSI specification, covers the whole life cycle of a Grid Service, comprising the five phases of loading, installation, initialization, serving, and revoking. Based on this definition, a powerful mechanism for dynamically loading Grid Services into hosting environments has been proposed, and the notion of Grid Service Loaders presented in this paper changes the general pattern of Grid Service usage. In the past, a Grid Service had to be fully deployed before consumers could access it, and consumers had to follow the fixed procedures prescribed by service providers, which greatly limited the interaction between consumers and providers. In the future, computing should be consumer-centered rather than provider-centered, as is now the norm. Under such circumstances, consumers can upload their own services and fulfill their whole business by employing other services provided by Grid Service providers. But this facility can also threaten the security of hosting environments, so we must give further consideration to the security implications of this mechanism and to the protection of hosting environments. The Grid Service naming convention proposed in this paper is an experimental solution to a key question in the field of Grid computing: how to identify a Grid Service. We have suggested a definition of the equivalence of two Grid Services, and we are the first to propose semantic strings as part of the identification of a Grid Service. Its implementation mechanism, however, needs further research.
References
1. S. Tuecke, K. Czajkowski, I. Foster, J. Frey, S. Graham, C. Kesselman, T. Maquire, T. Sandholm, D. Snelling, P. Vanderbilt: Open Grid Services Infrastructure (OGSI) Version 1.0. Global Grid Forum
2. I. Foster, C. Kesselman, J. Nick, S. Tuecke: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Globus Project, 2002
3. Yih-Shin Tan, Brad Topol, Vivekanand Vellanki, Jie Xing: Business Service Grid. IBM developerWorks (2003)
4. K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman: Grid Information Services for Distributed Resource Sharing. Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, August 2001
5. R. Raman, M. Livny, M. Solomon: Matchmaking: Distributed Resource Management for High Throughput Computing. In Proc. IEEE Symp. on High Performance Distributed Computing, IEEE Computer Society Press, 1998
An OGSA-Based Quality of Service Framework

Rashid Al-Ali¹,², Kaizar Amin¹,³, Gregor von Laszewski¹, Omer Rana², and David Walker²

¹ Argonne National Laboratory, Argonne, IL, USA
² Cardiff University, UK
³ University of North Texas, USA
Abstract. Grid computing provides a robust paradigm to aggregate disparate resources in a secure and controlled environment. Grid architectures require an underpinning Quality of Service (QoS) support in order to manage complex data and computation intensive applications. However, QoS guarantees in the Grid context have not been given the attention they merit. In order to enhance the functionality offered by computational Grids, we overlay the Grid framework with an advanced QoS architecture, called G-QoSM. The G-QoSM framework provides a new service-oriented QoS management model that leverages the Open Grid Service Architecture (OGSA) and has a number of interesting features: (1) Grid service discovery based on QoS attributes, (2) policy-based admission control for advance reservation support, and (3) Grid service execution with QoS constraints. This paper discusses the different components of the G-QoSM framework, in the context of OGSA architectures.
1 Introduction

Grid computing [1,2] has traditionally focused on large-scale sharing of distributed resources, sophisticated applications, and the achievement of high performance. The Grid architecture integrates diverse network environments, with widely varying resource and security characteristics, into virtual organizations (VOs). Computational Grids offer a high-end environment that can be exploited by advanced scientific and commercial applications. Soft Quality of Service (QoS) assurances are made by Grid environments by virtue of their establishment. Grid services are hosted on specialized "high-end" resources including scientific instruments, clusters, and data storage systems. High connectivity is maintained between resources via dedicated high-speed networks. A well-established resource administration facilitates constant resource connectivity, resource monitoring, and fault tolerance. Hence, some preliminary level of QoS is provided by the committed members of the VO, based on their pre-agreed Grid policy and their dedication to the overall collaboration. Nevertheless, the complexities involved in several critical Grid applications make it imperative to provide hard, guaranteed QoS assurances beyond those provided by the basic Grid infrastructure. Considering the increasing sophistication of Grid applications and the new hardware under development [3], such provisions become an inherent requirement of the Grid architecture. This implies the need for a QoS management entity that facilitates a negotiation mechanism whereby clients can select appropriate resources with QoS constraints that suit their needs.
Motivated by this need to overlay an advanced QoS framework on existing Grid architectures, allowing them to support complex QoS requirements, we propose a QoS management framework called G-QoSM. Supporting the recent standardization efforts of the Global Grid Forum [4], the G-QoSM framework is compatible with the latest Open Grid Services Architecture (OGSA) specification. The G-QoSM framework presented in this paper has a number of important features: (1) a QoS brokering service; (2) a policy service; and (3) a generic resource reservation manager that includes support for advance and immediate reservation, support for single and collective resource reservations (co-reservation), accommodation of arbitrary resource types (for example, compute, network, and disk), and scalability and flexibility through an object-oriented design that uses underlying resource characteristics at run time. The paper is structured as follows. In Section 2 we provide an overview of related research in the area of resource reservation to support QoS needs. In Section 3 we outline the general requirements of a Grid QoS model and present the OGSA-based G-QoSM framework with reservation support. In Section 7 we define the reservation model and present a reservation admission control mechanism and the reservation features. We conclude the paper with a summary of conclusions.
2 Related Work

Immediate and advance reservation is considered in a wide variety of systems, mostly in networking, communication, and distributed applications, including distributed multimedia (DMM) applications; hence it is of considerable interest to the Grid community. In the context of Grid computing, GARA [5] is a QoS framework that gives programmers convenient access to end-to-end QoS. It provides advance reservations with uniform treatment of various resource types such as network, compute, and disk. A GARA reservation is a promise that the client/application that initiated the reservation will receive a specific level of service quality from the resource manager. GARA also provides a reservation application program interface (API) to manipulate reservation requests, with operations such as create, modify, bind, and cancel. NAFUR [6] describes the design and implementation of a QoS negotiation system with advance reservation support in the context of DMM applications. NAFUR aims to compute the QoS that can be supported at the time the service request is made, and at certain carefully chosen later times. For example, if the requested multimedia service with the desired QoS cannot be supported at the time the service request is made, the proposed approach allows computation of the earliest time at which the user can start the multimedia service with the desired QoS. In [7] a resource broker (RB) model is proposed in the context of middleware for DMM applications. The proposed RB has the following design goals: (1) advance and immediate reservation; (2) a new admission control scheme based on a timely adaptive state tree (TAST); and (3) processing of brokerage requests for reservation, modification, allocation, and release.
In [8] advance reservation is formalized in the context of networking systems, and the fundamental problem of admission control associated with resource reservation is introduced. Based on the authors' literature review, it is concluded that none of the previous approaches is sufficiently flexible to cover all potential needs of all users. The solution proposed for this fundamental problem is to separate the issue into a technical part and a policy part, supported by a generic reservation service description and a corresponding policy layer; this combination improves the flexibility of advance resource reservation compared with the other approaches. None of these research efforts addresses advance reservation in the context of a service-oriented architecture, as our approach does. In general, resource reservation is not widely explored in service-oriented Grids. Nevertheless, the GGF Grid Resource Agreement and Allocation Protocol (GRAAP) Working Group has produced a 'state of the art' document that lays down properties for resource reservation in Grids [9]. We envision that our reservation model can be used to support the reservation properties outlined by the GRAAP-WG. The features that distinguish our work from existing QoS management approaches are: the generic QoS management service is not coupled to any specific resource type, nor limited to resource quantity; the object-oriented design and the abstraction approach give the proposed service the ability to integrate with any brokerage system that supports Web service interaction; dynamic gathering and management of information, such as resource characteristics and policy information, improves scalability; and usage policy frameworks for resource providers/administrators and users enable fine-grained request specification. In addition to the projects mentioned above, a general negotiation model called the Service Negotiation and Acquisition Protocol (SNAP) is introduced in [10], which proposes a resource management model for negotiating resources in distributed systems. SNAP defines three types of SLAs that coordinate management across a desired resource set and can together describe a complex service requirement in a distributed system environment: the task SLA (TSLA), the resource SLA (RSLA), and the bind SLA (BSLA). The TSLA describes the task, the RSLA describes the resources needed to accomplish the task in the TSLA, and the BSLA associates the resources from the RSLA with the application task in the TSLA. The SNAP protocol requires the existence of a resource management entity that can provide promises on resource capability, for example an RSLA; our reservation model can encapsulate such a requirement and implement the RSLA negotiation.
3 The Proposed QoS Framework

In this section we introduce the proposed Grid QoS management framework. We outline the general requirements for the framework and then discuss QoS management and the proposed system.
3.1 Requirements

The proposed framework must adhere to certain important requirements:

Service Discovery. The system should be able to discover services based on QoS attributes. These attributes are (a) quantitative and (b) qualitative. For example, quantitative attributes include computation, networking, and storage requirements, while qualitative attributes include the degree of service reputation and the service licensing cost. To support service discovery based on these attributes, a discovery mechanism needs to be employed within the proposed framework.

Resource Advance Reservation. The system should support mechanisms for advance, immediate, or 'on demand' resource reservation. Advance reservation is particularly important when dealing with scarce resources, as is often the case for high-performance and high-end scientific applications in Grids.

Reservation Policy. The system should support a mechanism that enables Grid resource owners to enforce their policies governing when, how, and by whom their resources can be used, while decoupling the reservation and policy entities in order to improve reservation flexibility [8].

Agreement Protocol. The system should assure clients of their advance reservation status and the resource quality they can expect during the service session. Such assurance can be embodied in an agreement protocol, such as Service Level Agreements (SLAs).

Security. The system should prevent malicious users from penetrating or altering the data repositories that hold information about reservations, policies, and agreement protocols. A proper security infrastructure is required, such as a Public Key Infrastructure (PKI).

Simplicity. The system should have a simple design that requires minimal overheads in terms of computation, infrastructure, storage, and message complexity.

Scalability. The system should scale to large numbers of entities, as the Grid is a global-scale infrastructure.
3.2 Grid Quality of Service Management

Grid Quality of Service Management (G-QoSM) is a new approach to supporting Quality of Service (QoS) management in computational Grids, in the context of the Open Grid Services Architecture (OGSA). QoS management includes a range of activities, from resource selection and allocation to resource release, applied over the course of a QoS session. A QoS session includes three main phases: (i) the establishment phase, (ii) the active phase, and (iii) the clearing phase [11]. In QoS-oriented architectures, during the establishment phase a client's application states the desired service and QoS specification. The QoS broker then undertakes service discovery, based on the specified QoS properties, and negotiates an agreement offer for the client's application. During the active phase, additional activities may take place, including QoS monitoring, adaptation, accounting, and possibly re-negotiation. The clearing phase is responsible for terminating the QoS session, through resource reservation expiry, agreement violation, or service completion, after which resources are freed for use by other clients.
Quality of service management has been explored in a number of contexts, particularly computer networks [12], multimedia applications [13], and Grid computing [5]. Regardless of the context, a QoS management system should address the following needs:
- specifying QoS requirements;
- mapping QoS requirements to resource capabilities;
- negotiating QoS with resource owners, where a requirement cannot be exactly met;
- establishing service level agreements (SLAs) with clients;
- reserving and allocating resources;
- monitoring parameters associated with a QoS session;
- adapting to varying resource quality characteristics;
- terminating QoS sessions.

The G-QoSM framework [14] is designed to operate in service-oriented architectures. It provides three main functions: (1) support for resource and service discovery based on QoS properties; (2) support for providing QoS guarantees at the middleware and network levels, and establishing Service Level Agreements (SLAs) to enforce these guarantees; and (3) QoS adaptation for the allocated resources. G-QoSM delivers three QoS levels: guaranteed, controlled load, and best effort. At the guaranteed level, constraints related to the client's QoS parameters must exactly match the service provision. Controlled load is similar to the guaranteed level, except that less stringent parameter constraints are defined, and the notion of range-based QoS attributes is used together with range-based SLAs. At the best-effort QoS level, the resource manager has full control in choosing the QoS level without constraints; this is the default case when no QoS requirements are specified. G-QoSM is an ongoing project, previously investigated and implemented in the context of the Globus Toolkit (GT) 2.0 [14,15] using the GARA framework to provide QoS support for compute resources. However, with the emergence of service-oriented Grids and the Open Grid Services Architecture (OGSA) [16], new features must be introduced into G-QoSM to make it OGSA-enabled and GT3-compliant. In this new G-QoSM architecture, GARA is no longer utilized; it is replaced by a new reservation manager, a policy service, an allocation manager, and a newly developed Java API for the Dynamic Soft Real Time (DSRT) scheduler [17]. The new features of the OGSA-enabled G-QoSM are:
- a QoS brokering service as a Grid service;
- a generic resource reservation manager;
- a policy service as a Grid service;
- a framework that is OGSA-enabled and can be instantiated in the context of GT3.

Figure 1 shows the new OGSA-enabled G-QoSM architecture.
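As a rough illustration of the three QoS-level matching rules, consider the following Java sketch, which assumes a single numeric QoS attribute (real attributes would be vectors, and the paper does not define these predicates as code):

```java
// Illustrative matching rules for the three G-QoSM QoS levels.
class QosMatcher {
    // Guaranteed: the offered value must match the request exactly.
    static boolean guaranteed(double requested, double offered) {
        return offered == requested;
    }
    // Controlled load: range-based attributes; the offer must fall
    // within the requested range.
    static boolean controlledLoad(double reqLo, double reqHi, double offered) {
        return offered >= reqLo && offered <= reqHi;
    }
    // Best effort: no constraints; the default when none are specified.
    static boolean bestEffort() {
        return true;
    }
}
```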
4 QoS Grid Service

The QoS Grid Service (QGS) is the focal point of this architecture and exists in every Grid node. The QGS interacts with the client's application, the QoS selection service, the reservation manager, and the policy Grid service to support:
Fig. 1. Framework Architecture.
Interaction with the Client's Application. Interaction with the client's application is needed primarily to capture the service request with its QoS constraints and to negotiate a QoS agreement (SLA). This negotiation can be summarized as attempting to find the 'best match' service based on the given properties and priority levels; for example, a client might state that cost has higher priority than service reliability, and the matching process should comply with such a requirement. Once the best service match is found and the corresponding resources are reserved, an agreement offer is proposed to the client's application. If the proposed agreement is approved, it becomes a commitment, and the QGS regards this agreement as a firm guarantee; otherwise, resources are released and no agreement takes place.

Interaction with the QoS Selection Service. The QoS selection service is queried with QoS constraints similar to those supplied by the client's application. Its main function is to provide the information needed to select the best service. Normally, the selection service replies with a list of service matches, from which the QGS must select one. To enable the best selection, we adapted a selection algorithm based on a Weighted Average (WA) concept, taking into account the proportional value of each QoS attribute using the importance level supplied by the user in the service request, rather than treating each attribute equally. The importance level associates a level of importance or priority, such as High (H), Medium (M), or Low (L), with each QoS attribute, and this importance level is mapped
to a numerical value (a real number). The algorithm computes the WA for every returned service and selects the service with the highest WA.

Interaction with the Reservation Manager. After a Grid service has been selected, the functional requirements needed to support the reservation are extracted and formulated as resource specifications. These resource specifications are then submitted to the reservation manager for resource reservation, and a reservation 'handle' is returned if the reservation succeeds. This handle can later be used to claim or manipulate the reservation.

Interaction with the Policy Grid Service. Interaction with the policy Grid service enables the QGS to capture the policy information necessary to validate the service request, for example, whether there is any limitation on resource utilization per service or on the class of service requested. The QGS validates the service request by applying the rules obtained from the policy Grid service.
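A minimal sketch of the WA selection follows; the weight values for H, M, and L are assumptions, as the paper does not fix the mapping from importance levels to real numbers:

```java
// Illustrative Weighted Average selection over candidate services.
import java.util.List;
import java.util.Map;

class WaSelector {
    // Assumed mapping from importance level to weight.
    static final Map<Character, Double> WEIGHT = Map.of('H', 1.0, 'M', 0.6, 'L', 0.3);

    // attrs : normalized QoS attribute values of one candidate, in [0,1]
    // levels: the client-supplied importance level for each attribute
    static double weightedAverage(double[] attrs, char[] levels) {
        double sum = 0, wsum = 0;
        for (int i = 0; i < attrs.length; i++) {
            double w = WEIGHT.get(levels[i]);
            sum += w * attrs[i];
            wsum += w;
        }
        return sum / wsum;
    }

    // Pick the candidate with the highest WA, as the QGS does.
    static int selectBest(List<double[]> candidates, char[] levels) {
        int best = 0;
        for (int i = 1; i < candidates.size(); i++)
            if (weightedAverage(candidates.get(i), levels)
                    > weightedAverage(candidates.get(best), levels)) best = i;
        return best;
    }
}
```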
5 QoS Allocation Manager

The allocation manager's primary role is to interact with the underlying resource managers for resource allocation and de-allocation, and to inquire about the status of resources. It has interfaces to the various resource managers employed in this framework, namely the Dynamic Soft Real Time (DSRT) scheduler [17] and a Network Resource Manager (NRM). It associates the execution of Grid services with a previously negotiated SLA; the process of associating Grid services with SLAs is beyond the scope of this paper. The allocation manager also interacts with adaptive services to enforce adaptation strategies; more details on adaptation can be found in [15]. The DSRT [17] is a user-level soft real-time scheduler based on the priority-changing mechanism supported by Unix and Linux operating systems. The highest fixed priority is reserved for the DSRT, and a real-time process admitted by the DSRT then runs under the DSRT scheduling mechanism. The real-time process can thus be scheduled to utilize a specific CPU percentage; for example, a real-time process might request the allocation of 40% of the CPU. The Network Resource Manager (NRM) is conceptually a Differentiated Services (Diffserv) Bandwidth Broker (BB), a concept described in [18], and manages network QoS parameters within a given domain based on agreed SLAs. The NRM is also responsible for managing inter-domain communication with the NRMs of neighboring domains, to coordinate SLAs across domain boundaries. The NRM may communicate with local monitoring tools to determine the state of the network and its current configuration.
6 QoS Policy Service

The policy service is a Grid service that provides dynamic information about domain-specific resource characteristics and about the domain's policy concerning when, what, and who is authorized to use resources. This policy service relies heavily on the existence of a policy repository, such as the 'policy controller' in our framework. Resource owners
include domain-specific rules in the policy repository, for example, the resource capacity that may be utilized given the user's authentication, the time of day, and the class of service. These rules are utilized by the policy service manager to provide information on resource characteristics and domain policies. Having a separate policy manager as a Grid service offers the following advantages:
- resource owners can update their policy repository without interfering with other broker services;
- a resource owner may delegate a remote 'super' policy service to act as the policy controller of its resources; similarly, a policy service might control more than a single administrative domain;
- decoupling the policy service from the other broker services makes it possible to change resource usage policies dynamically and improves system scalability.
7 QoS Reservation Manager

Reservation support plays a major role in QoS-oriented architectures. In a shared resource environment such as a Grid, QoS brokers can promise to deliver a certain resource quality to their clients if, and only if, a reservation mechanism exists. A reservation can be viewed as a promise from the resource broker to clients regarding the expected quality. Advance resource reservation is defined as "a possibly limited or restricted delegation of a particular resource capability over a defined time interval, obtained by the requester from the resource owner through a negotiation process" [9]. As pointed out earlier, resource reservation can be categorized into (a) advance reservation and (b) immediate or 'on demand' reservation, and can be for a specified duration or indefinite. The proposed reservation manager supports advance and immediate reservation for a specified duration. Indefinite reservation is undesirable, as it introduces blockages that may leave resources idle yet unusable. An important feature of this reservation approach is support for the co-reservation of various resources in service Grids. In this section we discuss the formal definition of a reservation and admission control, and outline the reservation features.
7.1 Reservation Definition

We define a reservation model for collective Grid resources with as few restrictions as possible, to increase the flexibility of admission control. A fundamental problem with advance reservation, as discussed in the literature [8], concerns the interval from when a reservation is submitted until its start time, called the 'hold-back time': utilizing, or granting, reservations during the hold-back time is a complex problem. The problem arises when clients request immediate reservations for an indefinite period, which may overrun a previously granted advance reservation. A number of solutions have been proposed: for example, requiring that all reservations, including immediate reservations, be specified within a time frame (i.e., indefinite reservation is not supported); or partitioning resources between immediate reservations and advance reservations with specified durations. In
this model we opt for the first proposal: all reservations must be accompanied by duration specifications. We consider this a valid assumption because we deal with high-performance resources and application domains, such as scientific experiments or simulations, where the need for such resources is known in advance and there are no ad hoc requests for simple resources. We formally define a reservation R in terms of the following five parameters:
- t_s: the reservation start time;
- t_e: the reservation end time;
- cl: the reservation class of service;
- r_i: a resource type; each resource has a type, such as "compute", "network", or "disk";
- c_i(t): a function that returns the capacity of resource r_i at time t.

With this notation, one can express a reservation request as a co-reservation for n resources, with start time t_s and end time t_e, using QoS reservation class cl on resources r_1, ..., r_n with the associated capacities, as follows:

R = (t_s, t_e, cl, {(r_1, c_1), ..., (r_n, c_n)})
We also introduce in this definition the concept of pre-emption priority, which has been explored in the context of networking and communication services [8]. Under pre-emption priority, when the reservation is not in effect, either before or after the reservation period, the job or service that makes use of the reserved resource is not turned down or eliminated but is instead assigned a low priority value, i.e., its status switches from 'guaranteed' to a 'best effort' type of service. In practice, supporting this concept requires the underlying resource manager to be a priority-based system, such as the Dynamic Soft Real Time (DSRT) scheduler [17]. This feature is very useful in protecting applications when reservations expire.
7.2 Admission Control

Admission control is the process of granting or denying reservation requests based on a number of factors, such as the actual load of the specified resource and the policy that governs who may be granted reservations for resource usage, how, and when. To perform admission control, an admission control mechanism must be employed. We formally describe our admission control mechanism as a Boolean function that returns true or false for a reservation request R at time t: true means the reservation can be granted for the given time with the given resource specifications, and false means otherwise. To define the admission control function's algorithm, we first define the notion of the load L(r, t) of resource r at time t:

L(r, t) = Σ_{j=1}^{m} c_j(r, t)

where m is the number of granted reservations for time t and c_j(r, t) is the amount of capacity reserved by reservation j on resource type r at time t.
We also define the total capacity of a resource as the maximum capacity the underlying resource can provide; formally, C_max(r) is the maximum capacity that resource r can provide. With these basic primitives, we can now define the admission control function: a reservation request R = (t_s, t_e, cl, {(r_i, c_i)}) is granted at time t if and only if, for every requested resource r_i and every time t' in [t_s, t_e], L(r_i, t') + c_i ≤ C_max(r_i).
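Under the formulation above, the admission function can be sketched as follows; time is discretized into slots, and the arrays and names are illustrative assumptions rather than the system's actual data structures:

```java
// Illustrative admission control: grant a co-reservation iff, for every
// requested resource and every time slot in [ts, te), the current load
// plus the requested capacity stays within the resource's total capacity.
class AdmissionControl {
    // load[r][t]: already-reserved capacity of resource r at time slot t
    // cmax[r]   : total capacity C_max of resource r
    static boolean admit(double[][] load, double[] cmax,
                         int ts, int te, int[] resources, double[] request) {
        for (int k = 0; k < resources.length; k++) {
            int r = resources[k];
            for (int t = ts; t < te; t++) {
                if (load[r][t] + request[k] > cmax[r]) return false;
            }
        }
        return true; // all constraints hold; the reservation can be granted
    }
}
```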
7.3 Reservation Features

As the reservation manager presented in this work operates in the Open Grid Services Infrastructure (OGSI), the service exposes a number of operations that can be used by other components. These operations are implemented as an API with a set of primitives, briefly described as follows:
- reserve: invoked by sending a reservation tuple R; it replies with a 'reject reservation' if the reservation cannot be granted, and otherwise returns a reservation 'handle', a reference to the newly made reservation.
- isAvailable: checks the status of a resource prior to placing the actual reservation; this operation returns a Boolean result accordingly.
- nextAvailable: supports a 'counter-proposal' brokering service when the user's reservation request cannot be granted; rather than replying with a simple yes/no answer, as most reservation systems do, the operation can reply with a 'no' together with a counter-proposal for the next availability.
- extend: modifies a reservation by extending it for a specified duration.
- find: finds a reservation and replies with all of its details.
- cancel: cancels a reservation.

With this set of reservation operations, a higher-level brokering service or agent can use the reservation manager to provide immediate and advance reservations, and to manipulate these reservations.
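The six primitives could be gathered into an interface such as the hypothetical one below; the paper describes the operations only informally, so the Java signatures are our assumptions:

```java
// Hypothetical interface over the six reservation operations.
interface ReservationManager {
    String reserve(Reservation r);            // returns a handle, or rejects
    boolean isAvailable(Reservation r);       // check before reserving
    Reservation nextAvailable(Reservation r); // counter-proposal if denied
    boolean extend(String handle, long extraSeconds);
    Reservation find(String handle);
    void cancel(String handle);
}

// Minimal reservation tuple R = (ts, te, cl, {(r_i, c_i)}).
class Reservation {
    final long ts, te;            // start and end time
    final String serviceClass;    // QoS class cl
    final String[] resourceTypes; // r_1 .. r_n
    final double[] capacities;    // c_1 .. c_n
    Reservation(long ts, long te, String cl, String[] r, double[] c) {
        this.ts = ts; this.te = te; this.serviceClass = cl;
        this.resourceTypes = r; this.capacities = c;
    }
}
```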
8 Conclusion

In this paper we propose a QoS service model for service-oriented Grids comprising a brokering service and a number of supporting modules, including a policy service, a reservation manager, an allocation manager, and a QoS selection service. Throughout this paper we describe the individual components of our framework and outline their patterns of interaction. We also discuss an OGSA-compliant prototype implementation of our G-QoSM architecture. The important features of our approach are: the QoS manager is a Grid service that dynamically interacts with the reservation and policy service modules, which makes it possible for resource owners to update or modify their policies at run time; and the reservation is abstracted as a generic service with co-reservation support, which makes it well suited to distributed computing environments such as Grids. This abstraction allows the reservation service to operate with any underlying resources, without prior knowledge of the resource characteristics; the association with resource characteristics takes place at run time by querying the policy service. This novel feature demonstrates scalability, which is highly desirable in a Grid infrastructure.

Acknowledgment. This work was supported by the Mathematical, Information, and Computational Science Division subprogram of the Office of Advanced Scientific Computing Research, Office of Science, U.S. Department of Energy, under Contract W-31-109-Eng-38. DARPA, DOE, and NSF support Globus Project research and development. The Java CoG Kit Project is supported by DOE SciDAC and NSF Alliance.
References
1. G. von Laszewski and P. Wagstrom, "Gestalt of the Grid," in Performance Evaluation and Characterization of Parallel and Distributed Computing Tools, ser. Series on Parallel and Distributed Computing. Wiley, 2003, (to be published). http://www.mcs.anl.gov/~gregor/papers/vonLaszewski--gestalt.pdf
2. I. Foster, C. Kesselman, and S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations," International Journal of Supercomputer Applications, vol. 15, no. 3, 2001. http://www.globus.org/research/papers/anatomy.pdf
3. "TeraGrid," Web Page, 2001. http://www.teragrid.org/
4. "The Global Grid Forum Web Page," Web Page, http://www.gridforum.org
5. I. Foster, C. Kesselman, C. Lee, R. Lindell, K. Nahrstedt, and A. Roy, "A Distributed Resource Management Architecture that Supports Advance Reservation and Co-Allocation," in Proceedings of the International Workshop on Quality of Service, vol. 13, no. 5, 1999, pp. 27–36.
6. A. Hafid, G. Bochmann, and R. Dssouli, "A Quality of Service Negotiation Approach with Future Reservation (NAFUR): A Detailed Study," Computer Networks and ISDN, vol. 30, no. 8, 1998.
7. K. Kim and K. Nahrstedt, "A Resource Broker Model with Integrated Reservation Scheme," in IEEE International Conference on Multimedia and Expo (ICME 2000), 2000.
8. M. Karsten, N. Berier, L. Wolf, and R. Steinmetz, "A Policy-Based Service Specification for Resource Reservation in Advance," in International Conference on Computer Communications (ICCC'99), 1999.
9. J. MacLaren, "Advance Reservations: State of the Art," GGF GRAAP-WG, http://www.fz-juelich.de/zam/RD/coop/ggf/graap/graap-wg.html, last visited: August 2003.
10. K. Czajkowski, I. Foster, C. Kesselman, V. Sander, and S. Tuecke, "SNAP: A Protocol for Negotiating Service Level Agreements and Coordinating Resource Management in Distributed Systems," in Proceedings of the 8th Workshop on Job Scheduling Strategies for Parallel Processing, 2002.
11. A. Hafid and G. Bochmann, "Quality of Service Adaptation in Distributed Multimedia Applications," ACM/Springer-Verlag Multimedia Systems Journal, vol. 6, no. 5, pp. 299–315, 1998.
12. A. Oguz et al., "The Mobiware Toolkit: Programmable Support for Adaptive Mobile Networking," IEEE Personal Communications Magazine, Special Issue on Adapting to Network and Client Variability, vol. 5, no. 4, 1998.
13. G. Bochmann and A. Hafid, "Some Principles for Quality of Service Management," Universite de Montreal, Tech. Rep., 1996.
14. R. Al-Ali, O. Rana, D. Walker, S. Jha, and S. Sohail, "G-QoSM: Grid Service Discovery Using QoS Properties," Computing and Informatics Journal, Special Issue on Grid Computing, vol. 21, no. 4, pp. 363–382, 2002.
15. R. Al-Ali, A. Hafid, O. Rana, and D. Walker, "QoS Adaptation in Service-Oriented Grids," in Proceedings of the 1st International Workshop on Middleware for Grid Computing (MGC2003) at ACM/IFIP/USENIX Middleware 2003, Rio de Janeiro, Brazil, 2003.
16. I. Foster, C. Kesselman, et al., "The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration," Argonne National Laboratory, Chicago, Tech. Rep., January 2002.
17. H. Chu and K. Nahrstedt, "CPU Service Classes for Multimedia Applications," in IEEE Multimedia Systems '99, 1999.
18. B. Teitelbaum, S. Hares, L. Dunn, R. Neilson, R. Narayan, and F. Reichmeyer, "Internet2 QBone: Building a Testbed for Differentiated Services," IEEE Networks, 1999.
A Service Management Scheme for Grid Systems Wei Li, Zhiwei Xu, Li Cha, Haiyan Yu, Jie Qiu, and Yanzhe Zhang Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China {liwei, zxu, char, yuhaiyan, zhangyanzhe}@ict.ac.cn, [email protected]
Abstract. In this paper, we propose a service management scheme named the Grid Service Space (GSS) model, which provides application developers with a high-level logical view of grid services and a set of primitives to control the full lifecycle of a grid service. The GSS model provides a novel approach to meeting the desired management requirements of large-scale service-oriented grids, including location-independent service naming, transparent service access, fault tolerance, and controllable service lifecycles.
1 Introduction
Physical resource independence, such as location transparency and access transparency, is a general design principle for resource management in distributed systems. With the emergence of grid computing, distributed resources are abstracted as Grid Services [3], which aim at hiding the heterogeneity of various resources and focus on the standardization of interface descriptions, access semantics, and information representations. Strictly speaking, the current definition of a grid service does not endow distributed resources with fully virtual properties, owing to the use of a location-dependent naming mechanism (e.g., the OGSA [3] framework leverages a URL-based naming scheme to indicate a service instance's physical location) and the lack of transparent service access mechanisms. Under such circumstances, developers have to spend extra effort on general-purpose resource management work, such as service discovery, scheduling, error recovery, etc. Another problem is that a developer has to modify his applications when the URL-based name of a service changes. How to achieve complete physical resource independence remains a challenge for grid resource management. From the design of traditional operating systems, we know that virtualization technologies, such as virtual memory [1] and the virtual file system [6], are common ways to obtain physical resource independence. The virtual memory technology fulfills the requirements of dynamic storage allocation, i.e., the desires for program modularity, machine independence, dynamic data structures, elimination of manual overlays, etc. The virtual file system technology enables the accommodation of multiple file system implementations within an individual operating system kernel, which may encompass local, remote, or even non-UNIX file systems.
To obtain full physical resource independence, we adopt a service management scheme called the Grid Service Space (GSS) model, which draws on the experience of virtual memory and virtual file systems. With this model, a programmer can refer to a service by a location-transparent name without knowing the service's location, status, capabilities, etc. Hence, the runtime system can obtain several benefits, such as load balancing (by choosing lightly loaded services), fault tolerance (by switching to a new service in response to service failure), and locality of service access (by locating a nearer service). The paper is organized as follows. Section 2 analyzes the requirements for grid service management. Section 3 presents a detailed description of the GSS model. Section 4 introduces the implementation, and Section 5 concludes this paper.
2 Requirements for Service Management
In current grid research, the main efforts have been put into standardizing physical resources as grid services. Analogous to a traditional operating system harnessing the use of hardware, a grid operating system (GOS) becomes a natural solution to manage the use of grid resources. More precisely, a GOS is a runtime system that can manage heterogeneous, distributed, and dynamic resources efficiently. To realize such a GOS, it is necessary to analyze the lifecycle of a grid application carefully, which can be divided into a programming phase and a runtime phase. In the programming phase, a programmer needs to integrate various services together to solve a problem. In most cases, programmers do not care about the location of services (i.e., where the task is to be executed). From the view of programmers, the services should be physical-resource independent, and a programmer should be able to refer to a service just by a unique name and desired attribute descriptions. In the runtime phase, when a program is running, it often encounters problems such as resource scheduling, error recovery, task migration, etc. A GOS should provide transparent service access mechanisms, including service discovery, error recovery, lifecycle control, etc., to reduce the burden of attacking these issues. From the above analysis, we can summarize the main requirements of a service management system as physical-resource-independent naming, transparent service discovery and scheduling, service lifecycle control, and fault tolerance. In addition, the GOS should also consider implementation issues such as resource topology, programming language support, performance, reliability, etc.
3 The GSS Model
The GSS model is proposed to abstract and define the key concepts of a service management system. In this model, the basic elements are virtual services and physical services, which constitute the Virtual Service Space (VSS) and the Physical Service Space (PSS). The difference is that a virtual service is a logical representation of
a computational resource, while a physical service is a computational entity with a network access interface. The functional equivalence of multiple services is indicated by a coessential identifier, which means that these services have the same processing function (though they may have different capabilities and attributes). The mapping between two coessential services is called a coessential mapping. The coessential mapping from a virtual service to a physical service is called a scheduling mapping, and the coessential mapping from a virtual service to a virtual service is called a translating mapping. For a given virtual service with a certain coessential identifier, all physical services which have the same coessential identifier are called the discoverable set of this virtual service. All physical services in a discoverable set are candidates for this virtual service to bind to. In addition to the above basic definitions, we introduce the Virtual Service Block (VSB), which is a subset of a VSS and groups related services together within a VSS. We also provide a set of primitives for service lifecycle control. Several service states are defined to indicate the different phases of a service lifecycle, and a service can switch between states via the service lifecycle control primitives.
3.1 Formal Definitions
Definition 1. A Service Space is a set $S = \{s_1, s_2, \ldots, s_n\}$, where $s_i$ is the name of a service.
Definition 2. All services in a service space $S$ can be divided into two types, denoted by the set $L = \{vs, ps\}$: $vs$ represents a virtual service, whose name is location independent, and $ps$ represents a physical service, which has a locatable address. We denote the type of a service $s$ by $t(s) \in L$. A service space $V$ is called a Virtual Service Space of $S$ if $V \subseteq S$ and $t(s) = vs$ for each service $s \in V$. A service space $P$ is called a Physical Service Space of $S$ if $P \subseteq S$ and $t(s) = ps$ for each service $s \in P$.
Definition 3. If $s_1, s_2 \in S$ have the same function, we say $s_1, s_2$ are coessential services. The function is expressed by a coessential identifier $e$. All coessential identifiers construct a set $E = \{e_1, e_2, \ldots, e_m\}$, which is called the coessential identifier set for service space $S$. Each $s \in S$ has one and only one coessential identifier; that is, there is a mapping $f: S \to E$. If two services $s_1$ and $s_2$ are coessential services, there is an equation $f(s_1) = f(s_2)$.
Definition 4. The set $C_e(S) = \{s \mid s \in S \text{ and } f(s) = e\}$ is called the coessential service set of $e$ for service space $S$. For every service space there is $\bigcup_{e \in E} C_e(S) = S$, and for any two coessential service sets with $e \neq e'$ there is $C_e(S) \cap C_{e'}(S) = \emptyset$.
Definition 5. A mapping $m$ from a service space $S$ to a service space $S'$ is called a coessential mapping if $f(m(s)) = f(s)$ for each $s \in S$, i.e., every service is mapped to a coessential service.
In particular, a coessential mapping from a VSS $V$ to a PSS $P$ is called a scheduling mapping. For each virtual service $s \in V$, the set $C_{f(s)}(P) \subseteq P$ is called the discoverable set of $s$. In addition, a coessential mapping from one VSS $V$ to another VSS $V'$ is called a translating mapping. For each virtual service $s \in V$, the set $C_{f(s)}(V') \subseteq V'$ is called the translatable set of $s$.
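To make these definitions concrete, the following minimal Java sketch (ours, for illustration only; the class and method names are assumptions, not Vega GOS APIs) models service spaces, the coessential identifier mapping $f$, and a scheduling mapping that binds a virtual service to a member of its discoverable set.

```java
import java.util.*;

// Illustrative model of the GSS definitions; names are ours, not Vega GOS APIs.
final class Service {
    final String name;            // unique name within its space
    final boolean virtual;        // t(s) = vs or ps (Definition 2)
    final String coessentialId;   // f(s) = e (Definition 3)
    final String address;         // locatable address; null for virtual services
    Service(String name, boolean virtual, String coessentialId, String address) {
        this.name = name; this.virtual = virtual;
        this.coessentialId = coessentialId; this.address = address;
    }
}

final class ServiceSpace {
    final List<Service> services = new ArrayList<>();

    // Coessential service set C_e(S) of Definition 4.
    List<Service> coessentialSet(String e) {
        List<Service> set = new ArrayList<>();
        for (Service s : services)
            if (s.coessentialId.equals(e)) set.add(s);
        return set;
    }
}

final class SchedulingMapping {
    // Maps a virtual service to a coessential physical service (Definition 5);
    // the candidates form the discoverable set of the virtual service.
    static Service schedule(Service vs, ServiceSpace pss) {
        List<Service> discoverable = pss.coessentialSet(vs.coessentialId);
        if (discoverable.isEmpty()) return null;   // no coessential physical service
        return discoverable.get(0);                // a real GOS would pick by load, locality, etc.
    }
}
```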
3.2 Semantics of GSS Management
Service naming mechanisms. The definitions of virtual service and physical service do not give the semantics of a service name explicitly. For a virtual service, the only prerequisite on its name is that it differentiate the service from other virtual services in one VSS. Location independence means this name contains no physical resource information and must be translated to a locatable resource before a program can access the virtual service. In our model, a virtual service can use a code-based name or a string-based name, which can be user-friendly or even semantics-based. With the location-independent naming mechanism of the VSS, programmers can develop applications at a virtual resource layer. For a physical service, the GSS model does not restrict the naming mechanism, provided that the service name indicates a locatable address; for example, we can use an IP address and a TCP port to indicate a service. The URL-based naming mechanism in the OGSA framework guarantees the global uniqueness of grid services and gives each resource a locatable address.
Virtual Service Block. Normally, application developers require the ability to organize a group of related services together. In addition, a programmer needs the ability to refer to this group of services by a name. Similar to virtual memory design, a service management scheme should fulfill objectives such as program modularity, module protection, and code sharing. The Virtual Service Block (VSB) can achieve these objectives. In the GSS model, a VSS is composed of a set of named VSBs. Each block is an array of service names, and a service name in a blocked VSS comprises two components (b, s), where b is the name of a VSB and s is a unique name within b. The first objective can be achieved by allocating each module to one VSB: other programs can easily share this module just by changing the block name, while the names within the block remain unchanged. The second objective can be gained by adding extra information and checking mechanisms: each VSB can carry information such as the block owner, access rights, etc., to implement space protection. By mapping one VSB into multiple VSSs under different block names, multiple programs can share the code of modules in other programs.
Virtual-Physical Service Space Mapping. Different from memory mapping technology, the virtual-physical service space mapping in the GSS model is more complex. Although a VSS is similar to a virtual memory space, the PSS is much different from the memory space due to its autonomous control and huge
size. These two limitations bring several difficulties for efficient service space mapping. The first makes it hard to deploy a physical service to a specific address (except for service owners). The second may make the performance of service locating even worse, because of the huge search space of the PSS. In the GSS model, we use the coessential mapping mechanism formally described in Definition 5, a parallel pre-mapping technology, and the discoverable set to address these problems. When mapping a virtual service to a physical service, we should consider two important issues: correctness and performance. Correctness can be guaranteed by following the definitions of the GSS model. Performance can be improved by better organization of physical services, efficient service locating, and scheduling policies; several research efforts [2][4][5] have concentrated on these issues. To improve the performance of service space mapping, we exploit the parallel pre-mapping technology together with the VSB to hide the service locating time. The idea is to keep locating and mapping physical services for a group of given virtual services (such as all the virtual services in a VSB) in parallel, before a running program actually refers to these virtual services. In addition, for each coessential identifier, we use the discoverable set defined in Definition 5 to build a small search space for service mapping, which also reduces the searching time.
Fig. 1. Using Parallel Pre-mapping technology together with VSB to hide the searching time.
Fig. 2. Using discoverable set to reduce search space.
Figure 1 illustrates the parallel pre-mapping technology used in service space mapping. When loading a program, the GOS first maps several virtual services in parallel. When the program starts to run, it can directly access the mapped physical
services. At the same time, the GOS continually maps the virtual services of subsequent VSBs in parallel. Figure 2 illustrates the use of a discoverable set to reduce the search space for a virtual service. The GOS builds a discoverable set for each coessential identifier before loading programs. When a program is loaded, the search operations can be performed within a relatively small discoverable set. The parallel pre-mapping and discoverable set technologies can be utilized together to improve the overall performance of service space mapping.
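The following sketch (ours; all names are illustrative assumptions rather than Vega GOS code) shows one way the parallel pre-mapping of a VSB could be organized: every virtual service of the block is mapped concurrently, and each lookup searches only the small discoverable set of that service.

```java
import java.util.*;
import java.util.concurrent.*;

// Illustrative sketch of parallel pre-mapping for one Virtual Service Block.
final class PreMapper {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);

    // Maps every virtual service name in the block to a physical service
    // before the program actually refers to it, hiding the locating time.
    // (Pool shutdown is omitted for brevity in this sketch.)
    Map<String, String> preMapBlock(List<String> virtualNames,
                                    Map<String, List<String>> discoverableSets) {
        Map<String, Future<String>> pending = new HashMap<>();
        for (String vs : virtualNames) {
            pending.put(vs, pool.submit(() -> {
                // Search only the small discoverable set, not the whole PSS.
                List<String> candidates = discoverableSets.getOrDefault(vs, List.of());
                return candidates.isEmpty() ? null : candidates.get(0);
            }));
        }
        Map<String, String> mapping = new HashMap<>();
        pending.forEach((vs, f) -> {
            try { mapping.put(vs, f.get()); }
            catch (InterruptedException | ExecutionException e) { mapping.put(vs, null); }
        });
        return mapping;
    }
}
```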
3.3 Service Lifecycle Control
Compared to physical memory access, the lifecycle of service access is more complex. When a user accesses a service, there may be a single send/receive operation or multiple send/receive operations, while in virtual memory systems, accessing a memory cell takes a fixed time and the access mode is determined. To perform correctly and deterministically, lifecycle control of services is needed. In our GSS model, the different capabilities and properties of virtual services and physical services imply that they have different lifecycle patterns, and different control primitives are needed to manage the state transitions of virtual services and physical services respectively. Here we mainly introduce the lifecycle control of virtual services. When a programmer refers to a virtual service, he needs not only a location-independent name but also control over the full process of service access. In this section, we provide a set of primitives to control the activities of a virtual service. In order to describe the lifecycle control of a virtual service properly, we use a more concrete entity called an mService to represent a virtual service. An mService can be defined as a tuple $(n, e, i, f, V, p, st)$, where $n$ is a unique service name in a VSS, $e$ is the coessential identifier, $i$ is the session identifier, $f$ is the coessential mapping, $V$ is a VSS, $p$ is a physical service name, and $st$ is a service state indicator, which is an element of the set ST = {Created, Binded, Running, Waiting, Terminated}. The lifecycle of a virtual service includes several relevant operations, such as virtual service creation, service discovery and scheduling, session control, etc. The lifecycle control primitives for virtual services are summarized as follows (a state-machine sketch follows the list):
- create(n), performed when we create a new virtual service and start up a new session with it. After this operation, the state of the mService is st = Created, and a session identifier i is returned.
- open(n), performed when we reopen an existing virtual service that is out of session. After this operation, the state of the virtual service remains unchanged, and a session identifier i is returned.
- delete(n), performed when we remove a virtual service from a VSS. After this operation, the virtual service with name n is deleted from the VSS.
- bind(n, p), performed when we map a physical service to a virtual service. After this operation, $f(n) = p$ and st = Binded. In addition, the virtual service n is added to the VSS V, and the coessential identifier e is added to the coessential identifier set of V.
- invoke(i), performed when we call a method of a virtual service. After this operation, st = Running.
- sleep(i), performed by the program or the GOS kernel. After this operation, st = Waiting.
- interrupt(n, i), performed when an external event occurs. After this operation, st = Running.
- close(i), performed when we cut off the current session with a virtual service. After this operation, the virtual service is out of session, and users cannot interact with it until the open primitive is used to create a new session.
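A minimal state-machine sketch of these primitives (ours; the paper does not publish the Vega GOS interfaces, so the Java names below are assumptions):

```java
// Illustrative state machine for the virtual-service lifecycle of Section 3.3.
enum ServiceState { CREATED, BINDED, RUNNING, WAITING, TERMINATED }

final class MService {
    final String name;            // n: unique name in a VSS
    final String coessentialId;   // e
    String sessionId;             // i
    String physicalName;          // p, set by bind
    ServiceState state;           // st

    MService(String name, String coessentialId) {   // create(n)
        this.name = name; this.coessentialId = coessentialId;
        this.state = ServiceState.CREATED;
        this.sessionId = java.util.UUID.randomUUID().toString();
    }

    void bind(String physicalName) {                 // bind(n, p)
        this.physicalName = physicalName;
        this.state = ServiceState.BINDED;
    }
    void invoke()    { state = ServiceState.RUNNING; }     // invoke(i)
    void sleep()     { state = ServiceState.WAITING; }     // sleep(i)
    void interrupt() { state = ServiceState.RUNNING; }     // interrupt(n, i)
    void delete()    { state = ServiceState.TERMINATED; }  // delete(n)
    void close()     { sessionId = null; }                 // out of session until open(n)
}
```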
4 Implementations
The GSS model is a key feature of the Vega GOS in the Vega Grid project [8][9], which aims at learning the fundamental properties of grid computing and at developing key techniques that are essential for building grid systems and applications. The Vega GOS is also used to build a testbed called the China National Grid (CNGrid), which is sponsored by the 863 program and aims at integrating the high-performance computers of China to provide a virtual supercomputing environment.
Fig. 3. The layered architecture of Vega Grid.
The architecture of Vega Grid conforms to the OGSA framework, and Figure 3 shows the layered architecture of Vega Grid. At the resource layer, resources are encapsulated as grid services or web services. The GOS layer aggregates these services together and provides a virtual view for developers, who can use the APIs, utilities, and development environments provided by the Vega GOS to build VSS-based applications. At this layer, the most important work is deploying and publishing a physical service to the upper layers. In our implementation, each physical service has one coessential identifier. After generating the coessential identifier for a physical service, we register the service with a resource router, together with the coessential identifier and other needed information. According to the algorithm in [7], the physical service is then published to all resource routers, and every GOS learns of the existence of this physical service. At the GOS layer, the resource router plays an important role in locating resources. The current implementation of the Vega GOS is developed as grid services as specified in [3]. In addition, the Vega GOS implements the virtual service lifecycle management defined in Section 3.3. As a fully functional integrated system, the Vega GOS also
considers implementation issues such as security, user management, communication, etc., which are not covered in this paper. At the application layer, programmers can use the GOS APIs to build custom applications. We also provide a GUI tool called the GOS Client, based on the GOS APIs, to help users utilize the services in Vega Grid.
5 Conclusions and Future Work
We have discussed the issues of grid service management. In order to overcome the obstacles in grid application development and system management, the GSS model is proposed to provide location-independent naming, transparent service access, and service lifecycle control to developers. As a fundamental component of our service management scheme, the GSS model also supports other research work, such as grid-based programming models. We are currently implementing the Vega GOS and the GSS model on the CNGrid testbed. We hope that the practical running of the Vega GOS and its applications will verify the basic concepts and technologies of the GSS model.
References
[1] P. J. Denning, "Virtual Memory", ACM Computing Surveys, vol. 2, no. 3, pp. 153-189, 1970.
[2] S. Fitzgerald et al., "A Directory Service for Configuring High-Performance Distributed Computations", Proc. 6th IEEE Symposium on High Performance Distributed Computing, pp. 365-375, 1997.
[3] I. Foster et al., "Grid Services for Distributed Systems Integration", Computer, pp. 37-46, 2002.
[4] A. Grimshaw et al., "Wide-Area Computing: Resource Sharing on a Large Scale", Computer, pp. 29-37, 1999.
[5] A. Iamnitchi et al., "On Fully Decentralized Resource Discovery in Grid Environments", International Workshop on Grid Computing, 2001.
[6] S. R. Kleiman, "Vnodes: An Architecture for Multiple File System Types in Sun UNIX", USENIX Association Summer Conference Proceedings, pp. 238-247, 1986.
[7] W. Li et al., "Grid Resource Discovery Based on a Routing-Transferring Model", 3rd International Workshop on Grid Computing (Grid 2002), LNCS 2536, pp. 145-156, 2002.
[8] Z. Xu et al., "Mathematics Education over Internet Based on Vega Grid Technology", Journal of Distance Education Technologies, vol. 1, no. 3, pp. 1-13, 2003.
[9] Z. Xu et al., "A Model of Grid Address Space with Applications", Journal of Computer Research and Development, 2003.
A QoS Model for Grid Computing Based on DiffServ Protocol*
Wandan Zeng1, Guiran Chang1, Xingwei Wang1, Shoubin Wang2, Guangjie Han1, and Xubo Zhou3
1 School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110004, China
[email protected], [email protected]
2 Beijing Institute of Remote Sensing, Beijing 100085, P.R. China
3 China National Computer Software & Technology Service Corporation, Beijing 100083, P.R. China
Abstract. Because a Grid comprises various kinds of dynamic and heterogeneous resources and provides its users with transparent services, achieving quality of service for Grid Computing faces many challenges. A QoS model for Grid Computing based on the DiffServ protocol is proposed in this paper, and its implementation method is introduced. A mathematical model for scheduling on Grid nodes is established, and the model is evaluated in a simulated Grid environment. The simulation results show that the Grid QoS model based on DiffServ proposed in this paper can improve the performance of Grid services for a variety of Grid applications. Making full use of it and optimizing it further are of great significance for the use of Grid resources and for gaining high quality of Grid services.
1 Introduction
Grid Computing combines different kinds of resources, such as supercomputers, large-scale storage systems, personal computers, and other equipment, into a unified framework. When the resources needed by a computing project surpass the local processing capability, the Grid permits the project to use the CPU and storage resources of distant machines [1]. The traditional Internet achieved the connection of computer hardware, the Web achieved the connection of web pages [2], and the Grid integrates Internet resources to realize the sharing of all kinds of resources. Distributed, heterogeneous, and dynamic resources are used by Grids to provide users with transparent, unified, and standard services [3]. These characteristics of the Grid challenge its quality of service. Many Grid applications have high quality-of-service needs, such as tele-immersion, distributed real-time computing, multi-media applications, and so on. OGSA (Open Grid Services Architecture) proposed seamless quality of service spanning integrated Grid resources. The Chinese Academy of Sciences proposed a set of consumer-satisfaction and quality-of-service evaluation standards, such as service level agreements [4]. How to efficiently use the distributed and dynamic Grid resources to provide high quality of service for Grid applications has become an indispensable part of Grid Computing research.
* This work was supported by the National Natural Science Foundation of China under Grant No. 60003006 (jointly supported by Bell Labs) and No. 70101006, and the National High-Tech Research and Development Plan of China under Grant No. 2001AA121064.
2 QoS for Grid Computing
2.1 The Basic QoS Framework for Grid Computing
IP QoS provides a set of mature protocols for QoS. They can provide highly beneficial services, increase bandwidth, and improve the quality of end-to-end services. The current leading Grid Computing operating systems, such as Globus, are based on TCP/IP, so IP QoS protocols should be important technologies for Grid Computing QoS. The fundamental framework for Grid Computing IP QoS is shown in Fig. 1.
Fig. 1. Fundamental Framework for Grid Computing IP QoS
2.2 The Grid Computing QoS Strategy
Current Grid Computing QoS pays much attention to resource management but takes little notice of task scheduling [5]. RSVP is the most commonly used protocol in Grid Computing QoS [6]. RSVP is an out-of-band signaling protocol that transmits soft-state signaling through channels outside the data streams [7]. RSVP gains high performance but at the same time makes the transmission more complicated, so it is suitable for small access networks [8]. However, the number of services Grid Computing provides is very large, and the properties of Grid resources are complicated and dynamic; the out-of-band signaling would bring a huge additional burden to network transmission and would be very hard to control.
The DiffServ protocol is used in the Grid Computing QoS model in this paper. DiffServ uses in-band signaling: the signaling is transmitted together with the data in the data stream. It supports eight priority levels and follows the DSCP scheme. The core of DiffServ is "classifying at the edge and transmitting inside". It does not add a signaling burden to the network, and it has good extensibility. This makes it more suitable for the characteristics of Grids.
3 The Implementation Method of Grid Computing QoS
The techniques used for the implementation of Grid Computing QoS include service classification, rate limiting, queue scheduling, congestion control, traffic engineering, and so on [9]. The DiffServ protocol is used in this paper: the tasks are classified by the Grid operating system, and the Grid node schedules the tasks in queues and is responsible for the congestion control of tasks.
3.1 The Implementation of Grid Computing DiffServ Grid Computing applications send task requirements including the answering time and some requirement information to the Grid operating system. After that the Grid operating system uses the DiffServ protocol to classify the tasks and then sign them. The Grid operating system converges the tasks as different assemble groups with different priorities.
Fig. 2. DiffServ QoS Model for Grid Computing
In this model, tasks are classified by their answering time. Some tasks demand quick answering, such as multimedia display, video conferencing, and so on; these tasks ask for short-delay answering, and the priorities of these kinds of
services are high. Others are not sensitive to the answering time, such as common transaction processing, e-mail, and so on; their priorities are comparatively low. In order to be compatible with the IP Precedence sub-field of the IPv4 ToS byte, there are at most eight service priorities. Here we define eight types of services differentiated by their required answering time, and each priority represents one degree of task urgency. The DS field is defined as follows:
Fig. 3. The Assignment of DS Field in Grid Computing DiffServ
Bits 0-2 are set to 000, 001, 010, 011, 100, 101, 110, or 111, representing the priority from lowest to highest. Bits 3-5 are all set to 0, and the sixth and seventh bits are reserved. The Grid operating system sends the task data packets with the DS mark to Grid nodes through the network. When a Grid node receives a packet, it parses the content of the DS field and assigns the task to the queue with the corresponding priority. Within the same queue, every task is processed in a round-robin fashion by time slice.
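A minimal sketch of this DS-field encoding (ours, not code from the paper), assuming that "bits 0-2" denote the IP Precedence sub-field, i.e. the three most significant bits of the ToS byte:

```java
// Illustrative encoding of the DS field of Fig. 3: bits 0-2 (the IP
// Precedence sub-field, taken here as the three most significant bits)
// carry the task priority; bits 3-5 are zero; bits 6-7 are reserved.
final class DsField {
    static byte encode(int priority) {              // priority in 0..7, 7 = highest
        if (priority < 0 || priority > 7)
            throw new IllegalArgumentException("priority must be 0..7");
        return (byte) (priority << 5);              // place into the top three bits
    }
    static int priorityOf(byte ds) {                // parsing on the Grid node side
        return (ds >> 5) & 0x7;
    }
}
```

A Grid node would call priorityOf on each arriving packet and enqueue the task in the queue of the corresponding priority.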
3.2 The Implementation Model of Grid Computing DiffServ on Grid Node
The arrival of each class of tasks at a Grid node obeys a Poisson distribution [10]; let $\lambda_i$ be the average arrival rate of tasks with priority $i$ (priority 1 being the highest). The service time of every task obeys the negative exponential distribution with average service time $1/\mu$, and let $W_i$ represent the average waiting and lingering (sojourn) time of a task of priority $i$. Now, the highest-priority tasks can be treated as an M/M/1 queue on their own, so that $W_1 = 1/(\mu - \lambda_1)$. If $T_{1,2}$ represents the average lingering time of tasks with priorities 1 and 2 when there are both priority-1 and priority-2 tasks in the system, then: $\lambda_1 W_1 + \lambda_2 W_2 = (\lambda_1 + \lambda_2) T_{1,2}$.
Fig. 4. Task Scheduling on Grid Nodes
$T_{1,2}$ can be worked out from the M/M/1 model queue with arrival rate $\lambda_1 + \lambda_2$, giving $T_{1,2} = 1/(\mu - \lambda_1 - \lambda_2)$ and hence $W_2 = [(\lambda_1 + \lambda_2) T_{1,2} - \lambda_1 W_1]/\lambda_2$. Similarly, we can get $W_3, \ldots, W_8$ by aggregating one further priority class at a time. We can conclude that, writing $\Lambda_i = \lambda_1 + \cdots + \lambda_i$ (with $\Lambda_0 = 0$), $W_i = \left[\frac{\Lambda_i}{\mu - \Lambda_i} - \frac{\Lambda_{i-1}}{\mu - \Lambda_{i-1}}\right] / \lambda_i$.
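To illustrate the formula, the following sketch (ours) evaluates the per-priority sojourn times numerically under the stated assumptions of Poisson arrivals and a common exponential service rate:

```java
// Illustrative computation of the average sojourn time W_i per priority
// class under the M/M/1 aggregation argument of Section 3.2 (common
// service rate mu; index 0 is the highest priority here).
final class PriorityQueueModel {
    static double[] sojournTimes(double[] lambda, double mu) {
        double[] w = new double[lambda.length];
        double cum = 0.0, prevL = 0.0;               // Lambda_{i-1} and L_{1..i-1}
        for (int i = 0; i < lambda.length; i++) {
            cum += lambda[i];                        // Lambda_i
            if (cum >= mu) throw new IllegalStateException("unstable class " + i);
            double curL = cum / (mu - cum);          // mean number in classes 1..i
            w[i] = (curL - prevL) / lambda[i];       // Little's law applied per class
            prevL = curL;
        }
        return w;
    }
    public static void main(String[] args) {
        double[] w = sojournTimes(new double[]{0.2, 0.2, 0.2, 0.2}, 1.0);
        for (double t : w) System.out.printf("%.3f%n", t);
    }
}
```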
4 The Results of Simulation
The simulation is carried out with NS2 on Globus 2.2. CBQ/WRR is used to simulate the eight queues of different priorities. We set the length of each queue to 21 packets, and the simulation time is fifty seconds. For the 8 priorities from low to high, 21 packets are considered. Considering the page limit, we choose four of the columns as representatives; their priorities are 1, 4, 6, and 7, respectively. (The unit is s, and d in the table represents a dropped packet.)
Fig. 5. The Comparison of Grid Packets Processing Time
WRR uses a circular polling strategy, and every queue has the same time-slice length. When congestion occurs, the tasks with lower priority are dropped. From the table we can see that the packet loss ratio increases gradually as the priority decreases. The processing time of the same packet in different queues differs, and it becomes longer as the priority becomes lower. This model can provide different service qualities for Grid applications. It can make full use of Grid resources and improve the performance of Grid Computing services for a variety of Grid applications.
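A sketch of the simulated queueing discipline (ours; a simplification of CBQ/WRR as configured above): queues are polled cyclically with equal time slices, and an arrival that finds its queue full is dropped.

```java
import java.util.*;

// Illustrative CBQ/WRR scheduling as simulated in Section 4: eight queues
// of bounded length are polled cyclically with equal time slices, and an
// arrival that finds its queue full is dropped.
final class WrrScheduler {
    private final List<Deque<String>> queues = new ArrayList<>();
    private final int queueLength;
    private int turn = 0;                            // circular polling pointer

    WrrScheduler(int numQueues, int queueLength) {
        this.queueLength = queueLength;
        for (int i = 0; i < numQueues; i++) queues.add(new ArrayDeque<>());
    }

    boolean enqueue(int priority, String packet) {
        Deque<String> q = queues.get(priority);
        if (q.size() >= queueLength) return false;   // congestion: packet dropped
        q.addLast(packet);
        return true;
    }

    String nextTimeSlice() {                         // serve one packet per slice
        for (int i = 0; i < queues.size(); i++) {
            Deque<String> q = queues.get((turn + i) % queues.size());
            if (!q.isEmpty()) {
                turn = (turn + i + 1) % queues.size();
                return q.pollFirst();
            }
        }
        return null;                                 // all queues empty
    }
}
```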
5 Conclusion
The characteristics of Grid services are the most important factor in the complexity of the quality of Grid Computing services. The Grid Computing QoS model discussed in this article is effective according to the results of the simulation: it can improve the performance of a variety of Grid Computing applications. However, it is still not an ideal model, owing to its own limitations. Because of the lack of end-to-end signaling, DiffServ QoS cannot ensure end-to-end Grid QoS, and it has only 8 priority grades. Further research on Grid QoS, for example further optimization of the model and the incorporation of economic concepts, is the main aim and task for the future.
References
1. Xiao, N., Ren, H., Gong, S.S.: Design and Implementation of Resource Directory in National High Performance Computing Environment. Journal of Computer Research and Development, 2002, 902 (8)
2. Foster, I., Kesselman, C.: Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications and High Performance Computing, 1997, 11(2): 115-128
3. Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure. San Francisco: Morgan Kaufmann Publishers Inc., 1998. 279-309
4. Xu, Z.W., Li, W.: Architectural Study of the Vega Grid. Journal of Computer Research and Development (in Chinese), 2002, 39(8): 923-929
5. He, X.S., Sun, X.H., von Laszewski, G.: QoS Guided Min-Min Heuristic for Grid Task Scheduling. Workshop on Grid and Cooperative Computing, J. Comput. Sci. & Technol., 2003, 18(4): 442-451
6. Foster, I., Roy, A., Winkler, L.: A Quality of Service Architecture that Combines Resource Reservation and Application Adaptation. In: Proceedings of the 8th International Workshop on Quality of Service (IWQOS 2000), 2000. 181-188
7. Xiao, X.P.: Internet QoS: A Big Picture. IEEE Network, March/April 1999: 8-18
8. Armitage, G.: Quality of Service in IP Networks: Foundations for a Multi-Service Internet. Beijing: China Machine Press, 2001
9. Subrat, K.: Issues and Architectures for Better Quality of Service (QoS). Proceedings of the 6th National Conference on Communications (NCC 2000), New Delhi, India: 181-187
10. Lu, Z.Y., Wang, S.M.: The Information Content Theory of Communication. Beijing: Publishing House of Electronics Industry, 1997. 42-45
Design and Implementation of a Single Sign-On Library Supporting SAML (Security Assertion Markup Language) for Grid and Web Services Security
Dongkyoo Shin, Jongil Jeong, and Dongil Shin
Department of Computer Science and Engineering, Sejong University
98 Kunja-Dong, Kwangjin-Ku, Seoul 143-747, Korea
{shindk, jijeong, dshin}@gce.sejong.ac.kr
Abstract. In recent years, the Grid development focus has been transitioning from resources to services. A Grid Service is defined as a Web Service that provides a set of well-defined interfaces and follows specific conventions. SAML is an XML-based single sign-on (SSO) standard for Web Services, which enables the exchange of authentication, authorization, and profile information between different entities. This provides interoperability between different security services in distributed environments. In this paper, we design and implement Java-based SAML APIs to achieve an SSO library.
1 Introduction
Grid computing denotes a distributed computing infrastructure for advanced science and engineering. It supports coordinated resource sharing and problem solving across dynamic and geographically dispersed organizations. Moreover, the sharing concerns not only file exchange but also direct access to computers, software, data, and other resources [1]. In recent years, the Grid development focus has been transitioning from resources to services. A Grid Service is defined as a Web Service that provides a set of well-defined interfaces and follows specific conventions [2]. The interfaces address discovery, dynamic service creation, lifetime management, notification, and manageability. Web Services [3], proposed by the World Wide Web Consortium (W3C), provide a standard designed to support interoperable machine-to-machine interaction over a network. A Web Service has an interface described in a machine-processable format (specifically WSDL, the Web Services Description Language). Other systems interact with the Web Service in a manner prescribed by its description using SOAP (Simple Object Access Protocol) messages, typically conveyed over HTTP with an XML serialization. Single sign-on (SSO) is a security feature which allows a user to log into many different services offered by distributed systems such as Grids [4] and Web Services, while the user only needs to authenticate once, or at least always in the same way [5, 6]. Various SSO solutions have been proposed that depend on public key
infrastructure (PKI), Kerberos, or password stores, which require an additional infrastructure on the client's side and new administrative steps [7]. Recently a new standard for the exchange of security-related information in XML, called the Security Assertion Markup Language (SAML) [8,9], was recommended by the Organization for the Advancement of Structured Information Standards (OASIS). SAML enables the exchange of authentication, authorization, and profile information between different entities to provide interoperability between different security services in distributed environments such as Grid and Web Services. The security information described by SAML is expressed in XML format as assertions, which can be authentication assertions, attribute assertions, or authorization decision assertions. Assertions are issued by SAML authorities, which can be security service providers or business organizations. Assertions provide a means of avoiding redundant authentication and access control checks, thereby providing single sign-on functionality across multiple target environments. SAML also defines the protocol by which the service consumer issues a SAML request and the SAML authority returns a SAML response with assertions. While SAML is an authentication standard for Web Services, it has also been proposed as a message format for requesting and expressing authorization assertions and decisions from an OGSA (Open Grid Services Architecture) authorization service [10]. In this paper, we design and implement a Java-based SSO library made up of SAML Application Programming Interfaces (APIs).
2 Background
The basic idea of single sign-on (SSO) is to shift the complexity of the security architecture to the SSO service and release other parts of the system from certain security obligations. In the SSO architecture, all security algorithms reside in the single SSO service, which acts as the single and only authentication point for a defined domain. The SSO service acts as a wrapper around the existing security infrastructure that exports various security features like authentication and authorization [5,6]. For SSO implementation, classical three-party authentication protocols that exploit key exchanges, such as Kerberos and Needham-Schroeder, are used. Since these protocols start with a key-exchange or key-confirmation phase, the client application uses the new or confirmed key for encryption and authentication [11]. A different approach to SSO implementation uses token-based protocols such as cookies or SAML [11]. Unlike key-exchange protocols, an authentication token is sent over an independently established secure channel. In other words, a secure channel is established without an authenticated client key, for which the Secure Socket Layer (SSL) [12] is usually used with browsers, and then an authentication token is sent in this channel without conveying a key. The main advantage of token-based protocols is that a majority of service providers already have SSL server certificates and a suitable cryptographic implementation is available on all client machines via the browsers. In addition, one can use several unrelated authenti-
cation tokens to provide information about the user in the same secure channel with the service provider [11]. SAML is a standard suitable for facilitating site access among trusted security domains after a single authentication. Artifacts, which play the role of tokens, are created within a security domain and sent to other security domains for user authentication. Since the artifacts sent to the other domains are returned to the original security domain and removed after user authentication, this resolves the problems of revealed session keys and stolen tokens in the browser. In addition, artifact destination control is fully achieved, since the artifact identifier is attached to the Uniform Resource Locator (URL) and the message is redirected to the destination [8].
2.1 SAML (Security Assertion Markup Language)
Recently, OASIS has completed SAML, a standard for exchanging authentication and authorization information between domains. SAML is designed to offer single sign-on for both automatic and manual interactions between systems. It lets users log into another domain with all of their permissions defined, and it can manage automated message exchanges between two parties. SAML is a set of specification documents that define its components:
- Assertions and request/response protocols
- Bindings (the SOAP-over-HTTP method of transporting SAML requests and responses)
- Profiles (for embedding and extracting SAML assertions in a framework or protocol)
- Security considerations while using SAML
- Conformance guidelines and a test suite
- Use cases and requirements
Fig. 1. Structure of Assertion Schema
Fig. 2. Assertion with Authentication Statement
SAML enables the exchange of authentication and authorization information about users, devices, or any identifiable entities, called subjects. Using a subset of XML, SAML defines the request-response protocol by which systems accept or reject subjects based on assertions [8,9]. An assertion is a declaration of a certain fact about a subject. SAML defines three types of assertions:
- Authentication: indicating that a subject was authenticated previously by some means (such as a password, hardware token, or X.509 public key).
- Authorization: indicating that a subject should be granted or denied resource access.
- Attribution: indicating that the subject is associated with certain attributes.
Figure 1 shows the assertion schema, and Figure 2 shows an assertion statement that includes an authentication assertion issued by a SAML authority. SAML does not specify how much confidence should be placed in an assertion. Local systems decide if the security levels and policies of a given application are sufficient to protect an organization if damage results from an authorization decision based on an inaccurate assertion. This characteristic of SAML is likely to spur trust relationships and operational agreements among Web-based businesses in which each agrees to adhere to a baseline level of verification before accepting an assertion. SAML can be bound to multiple communication and transport protocols; in particular, it can be carried by the Simple Object Access Protocol (SOAP) over HTTP [8,9].
Fig. 3. Browser/Artifact
Fig. 4. Browser/Post
SAML operates without cookies in one of two profiles: browser/artifact and browser/post. Using browser/artifact, a SAML artifact is carried as part of a URL query string, as shown in Figure 3, where a SAML artifact is a pointer to an assertion. The steps in Figure 3 are as follows (a sketch of the redirect built in step (1) appears after the browser/post steps below):
(1) The user of an authenticated browser on Server A requests access to a database on Server B. Server A generates a URL redirect, which contains a SAML artifact, to Server B.
(2) The browser redirects the user to Server B, which receives an artifact pointing to the assertion on Server A.
(3) Server B sends the artifact to Server A and gets a full assertion.
(4) Server B checks the assertion and either validates or rejects the user's request for access to the database.
With browser/post, SAML assertions are uploaded to the browser within an HTML form and conveyed to the destination site as part of an HTTP post payload, as shown in Figure 4. The steps in Figure 4 are as follows.
Fig. 5. Java Packages of SAML APIs
(1) The user of an authenticated browser on Server A requests access to a database on Server B.
(2) Server A generates an HTML form with a SAML assertion and returns it to the user.
(3) The browser posts the form to Server B.
(4) Server B checks the assertion and either allows or denies the user's request for access to the database.
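Returning to step (1) of the browser/artifact profile, the sketch below (ours) shows how the redirect carrying the artifact could be constructed; the SAMLart and TARGET query parameters follow the SAML 1.x browser/artifact binding, and the host names are placeholders.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative construction of a browser/artifact redirect URL (step 1 of
// Fig. 3). The SAMLart and TARGET query parameters follow the SAML 1.x
// browser/artifact binding; the host names are placeholders.
final class ArtifactRedirect {
    static String redirectUrl(String artifactBase64, String targetResource) {
        String art = URLEncoder.encode(artifactBase64, StandardCharsets.UTF_8);
        String target = URLEncoder.encode(targetResource, StandardCharsets.UTF_8);
        return "https://serverB.example.org/saml/artifact?SAMLart=" + art
             + "&TARGET=" + target;
    }
}
```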
3 Design and Implementation of Java-Based SAML APIs
We designed and implemented SAML APIs as Java packages, as shown in Figure 5. The classification of packages is based on the specification "Assertions and Protocol for the OASIS Security Assertion Markup Language (SAML)" [8]. We designed three basic packages named assertion, protocol, and messaging. To support the messaging function, we also designed generator, utilities, and security packages. The implemented SAML APIs are grouped into these packages. The function of each package is as follows:
- Assertion package: dealing with authentication, authorization, and attribution information.
- Protocol package: dealing with SAML request/response message pairs to process assertions.
- Messaging package: including the messaging framework that transmits assertions.
- Security package: applying digital signatures and encryption to the assertions.
- Utilities package: generating UUIDs, the UTC date format, artifacts, and so on.
- Generator package: generating SAML request/response messages.
The structure of the major packages is shown in the following figures. The structure of the assertion package is shown in Figure 6, and the structure of the protocol package in Figure 7. A protocol defines an agreed way of asking for and receiving an assertion [13]. The structure of the messaging package is shown in Figure 8. The messaging package transforms a document into a SOAP message and defines how SAML messages are communicated over standard transport and messaging protocols [13]. We verified the messages according to the SAML specifications. When we generated SAML request messages, as shown in Figure 9, we used the RequestGenerator class in the generator package (refer to step 1 of Figures 3 and 4).
Fig. 6. Structure of ac.sejong.saml.assertion package
Fig. 7. Structure of ac.sejong.saml.protocol package
Fig. 8. Structure of ac.sejong.saml.messaging
Fig. 9. Generation of SAML Request Message
This SAML request message is signed as shown in Figure 10, using the signature class in the security.sign package. The signing process of the signature class follows the XML-Signature standard in the enveloped form. Figure 11 shows the generation of SAML response messages, in which the ResponseGenerator class in the generator package is used (refer to step 4 of Figures 3 and 4). This SAML response message is also signed using the signature class in the security.sign package.
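As a self-contained illustration (ours, built with plain JAXP rather than the library's own classes, whose signatures are not given in the paper) of the kind of SAML 1.x request document that the RequestGenerator class produces in Figure 9:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// A minimal sketch of building a SAML 1.x artifact request document,
// similar to the output of the RequestGenerator class in Fig. 9; this
// uses only standard JAXP, not the paper's generator package.
public class SamlRequestSketch {
    static final String SAMLP = "urn:oasis:names:tc:SAML:1.0:protocol";

    public static Document buildArtifactRequest(String artifact) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element request = doc.createElementNS(SAMLP, "samlp:Request");
        request.setAttribute("RequestID", java.util.UUID.randomUUID().toString());
        request.setAttribute("MajorVersion", "1");
        request.setAttribute("MinorVersion", "0");
        request.setAttribute("IssueInstant", java.time.Instant.now().toString());
        // The artifact points to an assertion held by the issuing site (Fig. 3).
        Element art = doc.createElementNS(SAMLP, "samlp:AssertionArtifact");
        art.setTextContent(artifact);
        request.appendChild(art);
        doc.appendChild(request);
        return doc;
    }
}
```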
Fig. 10. SAML Request Message Signed in Enveloped Form
Fig. 11. Generation of SAML Response Message
4 Conclusion
We designed and implemented an SSO library supporting the SAML standard. The implemented SAML APIs have the following features:
- Since SAML messages are transmitted through SOAP, XML-based message structures are fully preserved. This enables valid bindings.
- Integrity and non-repudiation are guaranteed by signatures on transmitted messages.
- Confidentiality is guaranteed by encryption of transmitted messages. Since XML encryption is applied, each element can be encrypted efficiently.
Even though digital signatures on a SAML message using RSA are the default and XML signatures are optional, we fully implemented both APIs in the security package. Specific encryption methods for SAML messaging are not mentioned in the SAML specification, and XML encryption is a suitable candidate for encrypting SAML messages; we also implemented APIs for XML encryption [14]. Recently, Grid systems have adopted the Web Services standards proposed by W3C, and SAML is an XML-based SSO standard for Web Services. SAML will become widely used in Grid architectures as distributed applications based on Web Services become popular. We will further apply this SAML library to real-world systems such as Electronic Document Management Systems (EDMS) and groupware systems, and continue research on authorization for users.
References
1. Foster, I., Kesselman, C.: The Globus Project: A Status Report. Future Generation Computer Systems, Volume 15 (1999) 607-621
2. Foster, I., Kesselman, C., Nick, J.M., Tuecke, S.: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. http://www.globus.org/research/papers/ogsa.pdf
3. Web Services Activity, http://www.w3.org/2002/ws
4. Foster, I., Kesselman, C., Tsudik, G., Tuecke, S.: A Security Architecture for Computational Grids. Proc. 5th ACM Conference on Computer and Communications Security (1998) 83-92
5. Volchkov, A.: Revisiting Single Sign-On: A Pragmatic Approach in a New Context. IT Professional, Volume 3, Issue 1, Jan/Feb (2001) 39-45
6. Parker, T.A.: Single Sign-On Systems: The Technologies and The Products. European Convention on Security and Detection, 16-18 May (1995) 151-155
7. Pfitzmann, B.: Privacy in Enterprise Identity Federation: Policies for Liberty Single Signon. 3rd Workshop on Privacy Enhancing Technologies (PET 2003), Dresden, March (2003)
8. Assertions and Protocol for the OASIS Security Assertion Markup Language (SAML) V1.0: http://www.oasis-open.org/committees/security
9. Bindings and Profiles for the OASIS Security Assertion Markup Language (SAML) V1.1: http://www.oasis-open.org/committees/security
10. Global Grid Forum OGSA Security Working Group: Use of SAML for OGSA Authorization. http://www.globus.org/ogsa/Security
11. Pfitzmann, B., Waidner, M.: Token-based Web Single Signon with Enabled Clients. IBM Research Report RZ 3458 (#93844), November (2002)
12. Frier, A., Karlton, P., Kocher, P.: The SSL 3.0 Protocol. Netscape Communications Corporation, Nov 18 (1996)
13. Galbraith, B., Trivedi, R., Whitney, D., Prasad, D.V., Janakiraman, M., Hiotis, A., Hankison, W.: Professional Web Services Security. Wrox Press (2002)
14. XML Encryption WG, http://www.w3.org/Encryption/2001/
Performance Improvement of Information Service Using Priority Driven Method
Minji Lee1, Wonil Kim2*, and Jai-Hoon Kim1
1 Ajou University, Suwon 442-749, Republic of Korea
{mm223j, jaikim}@ajou.ac.kr
2 Sejong University, Gwangjin-Gu, Seoul 143-747, Republic of Korea
[email protected]
Abstract. Grids are developed to accomplish large and complex computations by gathering distributed computing resources. A Grid employs an information service to manage and provide these collected resources. Accurate information on resource providers is essential to stable service. In the Grid information service, GRIS and GIIS are used to gather and maintain the resource information. These two directory servers store resource information in a cache for only a fixed period to provide fast response and accurate access. Since the resource information search is performed in the cache, the system performance depends on the search time in the cache. In this paper, we propose a novel cache management system based on resource priority. The priority of a resource is determined by the frequency of resource usage and by the number of available resources in the GRIS. The simulated priority-driven schemes provide more accurate information and faster responses to clients than the existing cache update scheme of the information service.
1 Introduction
In a Grid, a directory service [1] is provided to keep and maintain resource information. Various types of connections are opened in the directory service to update and search resource status. Most connections between the directory server and the resource providers are used to modify the status of resources. The information transmitted by a resource provider is stored in a cache of the directory server for a fixed time interval. Current directory servers perform query processing on the cache: they search their caches for the requested information after receiving a query from a client. If the requested resource information is not in the cache, the query is transmitted to other directory servers. If there are only a few resource providers, cache size is not an important problem. However, as the number of resource providers increases, cache size becomes a critical issue, because it is a major factor for fast response and accurate access. If there is too much information in a cache, responding to a query takes a long time, whereas the information correctness increases.
* Author for Correspondence: +82-2-3408-3795
Cache update is usually performed in FIFO (First In First Out) order. In order to provide accurate information, old information that stays in the cache longer than a fixed time is also deleted. As the numbers of resource providers and requests increase, resource status changes quickly. Consequently, an effective cache update mechanism that guarantees the accuracy of information is needed. The best strategy for efficient cache management is to predict the next piece of information a client will request and store it in the cache in advance. However, it is impossible to predict every user's requests and store the corresponding information. Instead of predicting every user's request, this paper proposes two schemes to decide the importance of given information. One method assigns priority on the basis of resource access frequency; in order to know the access frequency, the GRIS logs the number of requests whenever it receives a resource request message from a client. The other method gives priority to the resource providers having more available resources than the others.
2 Information Service in Grid
The Grid is an important field in distributed systems. A Grid can provide huge amounts of resources to requestors. In order to manage those resources, the Grid maintains a systematically organized architecture.
2.1 Grid Information Service Protocol
In order to provide proper resources to a requestor, it is important to manage the resources efficiently and effectively. It is also important to provide correct information about resource status. The Grid Information Service (GIS) supports such a service [2]. GIS provides the initial discovery and ongoing monitoring of the existence and characteristics of resources, services, computations, and other entities that are part of a Grid system [2]. In order to support the functions listed above, GIS assumes that resources are subject to failure, that the total number of resource providers is large, and that the types of resources are various. The GIS architecture comprises highly distributed resource providers and specialized aggregate directory services. A resource provider furnishes dynamic information according to the prototype defined by the VO-neutral infrastructure. GIS uses two basic protocols: the Grid Resource Information Protocol (GRIP) and the Grid Resource Relationship Protocol (GRRP). GRIP is used to access information, and GRRP is used to notify aggregate directory services of the availability of information. For example, GRIP adopts the standard Lightweight Directory Access Protocol (LDAP) [5] as a protocol, and it defines a data model, query language, and wire protocol.
2.2 Current Information Service in Grid
The Monitoring and Discovery Service (MDS) is an implementation of the information service in a Grid [4]. The service is provided through GRIS and GIIS. GRIS gathers the status infor-
mation of resources and reports the status information to GIIS [4]. The GIIS form a hierarchical structure organized from several GIIS and GRIS [4]. The GRIS are located at the bottom of the hierarchy, and each is connected to several GIIS. A GIIS receives data from many GRIS and stores the data in a cache. Fig. 1 shows a prototype of the processing of a query generated by a client. Initially, a query accesses the highest-level GIIS. If the data requested by the client are in the cache, the data are returned to the client. If not, the query is sent to other GIIS to find the data. This process is repeated until the query finds the requested data. In other words, if the requested information is not in the cache, the query has to be transmitted to many GIIS, and the requestor has to wait until the search sequence finishes. Therefore, the performance of a query to a GIIS depends on the performance of the resource information servers that the query accesses.
Fig. 1. The Query processing in MDS. A query comes to the highest-level GIIS. GIIS searches its cache. If there is requested information, the information is transmitted to a requestor. If not, the query is sent to other GIIS or GRIS until requested information is found.
Warren Smith [3] showed the difference in searching time between two cases: in the first case the data are in the top-level cache, and in the second case the data are in the bottom-level cache. It takes 10 seconds in the first case and 60 seconds in the second case to receive the data. This result shows the importance of keeping data in the higher-level caches.
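The query path of Fig. 1 amounts to a recursive cache lookup; the following sketch (ours) makes explicit why a miss near the top of the hierarchy is expensive: every miss adds another server's cache search to the response time.

```java
import java.util.*;

// Illustrative recursive lookup through the GIIS/GRIS hierarchy of Fig. 1:
// a query is answered from the first cache that holds the information and
// is otherwise forwarded to the children, so top-level hits are cheapest.
final class DirectoryNode {
    final Map<String, String> cache = new HashMap<>();
    final List<DirectoryNode> children = new ArrayList<>();

    String lookup(String resourceType) {
        String hit = cache.get(resourceType);       // search this cache first
        if (hit != null) return hit;
        for (DirectoryNode child : children) {      // miss: forward downwards
            String r = child.lookup(resourceType);
            if (r != null) return r;
        }
        return null;                                // not found in this subtree
    }
}
```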
3 The Proposed Information Service System
3.1 Proposed Directory Service
Since the current directory service does not consider the importance of resources, all resource information has the same priority, and the cache update is influenced only by the registration time of the resource information. However, such a scheme cannot provide fast and accurate resource information to clients. If each resource is given a priority on the basis of some rule and the priority is applied to the cache update method, the cache update result will differ from the current cache update result. Two methods are proposed to decide the priority among resources. One method determines the resource priority by the frequency of resource usage. This scheme
implies that frequently used resources have a higher possibility of being accessed by a client than resources that are used only occasionally. The other method determines the resource priority by the number of available resources that a GRIS has.
Fig. 2. New Scheme Added Directory Service in the Simulation
Fig. 3. Cache Update of Each Algorithm when Resource Information Message Arrives
Fig. 2 shows how the resource access frequency is recorded and used in the simulated directory service. Each GRIS records the type of the requested resource whenever it receives a resource request message from a client. At this time, the GRIS records the requested resource type without considering the availability of the resource.
3.2 Data Processing of Proposed Methods
The cache update schemes of the two proposed methods are shown in Fig. 3. Both methods start data processing when a message arrives. They then check whether there is empty space in the cache to store the new data. If there is, the new data is inserted into the cache. If there is not, each method decides which data is discarded
from the cache on the basis of data priority. One method decides the priority by the number of available resources, the other by the resource access frequency. After the data to be discarded is chosen, it is compared with the data in the message. If the priority of the data in the message is higher than that of the data in the cache, the message is inserted into the cache; otherwise the message is discarded. Fig. 3 gives an example of the proposed methods. For method 1, the access frequency of a resource update message for resource X is 6 per day, while the access frequency of resource G in the cache is 1 per day; thus resource G is discarded from the cache and resource X is inserted. For method 2, the resource update message reports 6 available resources of type X, while resource D in the cache has only 1; thus resource D is discarded from the cache and resource X is inserted.
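Both proposed schemes reduce to the same eviction rule and differ only in how the priority value is computed: access frequency for method 1, available-resource count for method 2. The following is a rough sketch under that reading (class and method names are ours, not the paper's simulator):

import java.util.*;

// Illustrative priority-driven cache update: when the cache is full, the
// lowest-priority entry is found; the arriving message replaces it only if
// the message's priority is higher, otherwise the message is discarded.
class PriorityCache {
    static class Entry {
        final String resourceType;
        final double priority;   // method 1: access frequency; method 2: available count
        Entry(String type, double priority) { this.resourceType = type; this.priority = priority; }
    }

    private final int capacity;
    private final List<Entry> entries = new ArrayList<>();

    PriorityCache(int capacity) { this.capacity = capacity; }

    void onMessage(Entry incoming) {
        if (entries.size() < capacity) {           // empty space: insert directly
            entries.add(incoming);
            return;
        }
        Entry victim = Collections.min(entries, Comparator.comparingDouble((Entry e) -> e.priority));
        if (incoming.priority > victim.priority) { // replace only lower-priority data
            entries.remove(victim);
            entries.add(incoming);
        }                                          // otherwise the message is discarded
    }
}

In the Fig. 3 example, a message for resource X (frequency 6 per day) would evict resource G (frequency 1 per day) under method 1.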
4 Simulation
Simulations are performed with three types of directory service, distinguished by their cache update methods. The first simulation uses the information service with the general cache update scheme. The second uses the directory service with the new cache update scheme that considers resource access frequency and TTL. The third uses the new scheme that considers the number of available resources and TTL.
Fig. 4. Information Service Organization for the Simulation. There are five GIIS levels with 1, 2, 3, 7, and 16 nodes, respectively. There are 160 GRIS servers, and each server controls 20 resources.
4.1 Construction of the Directory Service
There are 160 GRISs in the simulation. Each GRIS controls 20 resources, all of the same type. Five types of resource are used in the simulation, with component ratios of 50%, 20%, 10%, 10%, and 10%, respectively. The resource request ratio is identical to the component ratio. Each GIIS controls two GIISs, except at level 5, where each of the 16 GIISs controls 10 GRISs. Table 1 shows the cache size at each level; cache size increases with GIIS level. Simulations are performed while changing the cache size at each level.
Fig. 5. Cache Hit Ratio at Each GIIS Level when the Cache Size of the Highest-Level GIIS is 5. To show the performance improvement of the new cache update schemes, the cache size is fixed as in simulation 1 of Table 1.
4.2 Cache Hit Ratio of Each Algorithm
Fig. 5 shows the cache hit ratio at the GRIS level. A high cache hit ratio at the GRIS level means poor performance, because the GRIS is the last directory where an information search is performed. When the highest-level cache size is 10, the cache hit ratio of the previous scheme is 25%, whereas that of the proposed schemes is 10%. When the cache size is 15, the cache hit ratio of all schemes is about 10%. An increase
in the cache hit ratio at the GRIS means an increase in the response time of a query. The performance of the previous algorithm degrades as the cache size shrinks, as shown in Fig. 5, while the priority-driven schemes show almost the same performance as the cache size changes. The proposed schemes thus outperform the previous scheme.
Fig. 6. Cache Hit Ratio as the Cache Size Changes. Nine simulations are performed, each distinguished by its cache size and cache update scheme.
Fig. 7. Accuracy of Cache Information, Measured by the Success Rate of Resource Requests
Figs. 5 and 6 show the cache hit ratio as the cache size changes. If the cache is large enough to store all the information transmitted by lower-level GIISs or GRISs, the three algorithms find information in the cache at the same rate. As the cache size decreases, their cache hit ratios diverge: the cache hit ratio at the highest-level GIIS under the new cache update schemes decreases by 50% compared with the previous cache update scheme. Furthermore, the accuracy of cache hits under the proposed schemes increases to 92%.
4.3 Accuracy of Cache Information
Fig. 7 shows the accuracy of returned information. When the previous update scheme is applied to the information service, accuracy decreases as the cache size decreases. In contrast, the accuracy of the two proposed schemes is unchanged as the cache size changes, and both show higher accuracy than the previous cache update scheme. For example, the usage-priority scheme shows almost 90% accuracy, and this rate is stable regardless of cache size.
4.4 Performance Evaluation
The proposed algorithms are fast and accurate, as shown in Figs. 5, 6, and 7. When the previous cache update algorithm is applied to the information system, the cache hit ratio of the highest-level directory server decreases as its cache size decreases: a small cache naturally holds less information than a large one and can answer fewer queries. To increase the cache hit ratio, a new cache update scheme is needed. The basic concept of the proposed algorithms is priority-driven information. When a cache is full, one item of information in the cache must be discarded. The previous algorithm considers only how long the information has stayed in the cache, whereas the proposed algorithms consider not only staying time but also the priority of the information.
5 Conclusion
The current cache update method is FIFO, based on the sequence in which information enters the cache. In this paper we proposed a novel method that assigns priority to resource information on the basis of resource usage frequency and the number of available resources. The proposed cache update method increases the accuracy of cache information: even as the cache size decreases, the cache hit ratio and the accuracy of query responses are unchanged. As resources and users increase, the performance of the previous information service may degrade. The information accuracy of the proposed schemes rises to 90%, whereas the previous scheme shows only 70%. Furthermore, the cache hit ratio at the highest-level directory server doubles compared with the previous scheme. The simulation shows that the proposed cache update schemes provide fast and accurate information.
References
1. Steven Fitzgerald, Ian Foster, Carl Kesselman, and Gregor von Laszewski, “A Directory Service for Configuring High-Performance Distributed Computations,” http://www.globus.org.
2. Ian Foster, Carl Kesselman, and Steven Tuecke, “The Anatomy of the Grid,” http://www.globus.org.
3. Warren Smith, Abdul Waheed, David Meyers, and Jerry Yan, “An Evaluation of Alternative Designs for a Grid Information Service,” Proceedings of the 9th IEEE International Symposium on High Performance Distributed Computing.
4. “MDS 2.2: Creating a Hierarchical GIIS” and “MDS 2.2: GRIS Specification Document: Creating New Resource Providers,” http://www.globus.org/mds/NewFeatures.html, 2002.
5. Gregor von Laszewski and Ian Foster, “Usage of LDAP in Globus,” http://www.globus.org/mds/NewFeatures.html, 2002.
HH-MDS: A QoS-Aware Domain Divided Information Service* Deqing Zou, Hai Jin, Xingchang Dong, Weizhong Qiang, and Xuanhua Shi Huazhong University of Science and Technology, Wuhan, 430074, China [email protected]
Abstract. Grid computing has emerged as an effective technology for coupling geographically distributed resources and solving large-scale problems in wide area networks. Resource Monitoring and Information Service (RMIS) is a significant and complex issue in grid platforms. A QoS-aware, domain-divided information service, HH-MDS, is introduced in this paper; it is an important component of our service grid platform. HH-MDS addresses the system scalability issue effectively. In addition, several effective QoS strategies are provided to improve the efficiency of resource monitoring and information searching. Based on these strategies, service-oriented definitions and an SLA specification are proposed to describe serving capability and related QoS issues.
1 Introduction
Grid technologies [1][13] enable large-scale sharing of resources within all kinds of consortia of individuals and/or institutions. In these environments, the discovery, characterization, and monitoring of resources and computations is a challenging issue. The RMIS needs to record the identity and essential characteristics of services available to community members, and to maintain service validity. A notification framework is proposed as a basic means for determining the existence and properties of an entity in a wide area network. Each framework message is propagated with timestamps. Based on a soft-state model, a robust notification mechanism coupled with graceful degradation of stale information is provided. Inter-operation between different systems is also a challenging issue in grid platforms. Web services [2] are Internet-based applications that communicate with other applications to offer service data or functional services automatically. A service level agreement is an agreement regarding the guarantees of a web service: a service provider contracts with a client to provide some measurable capability or to perform a task through a service [3]. As the resources in a particular grid system have clear geographical characteristics, it is natural to divide the whole grid into several parts and manage them separately. The rest of the paper is organized as follows. In Section 2, we discuss related work on information services. In Section 3, we propose the HH-MDS framework; a domain-divided architecture is adopted, and system scalability is discussed within this framework. In Section 4, strategies for QoS guarantees are described. Finally, we analyze HH-MDS performance and conclude the paper.
* This paper is supported by the National Science Foundation under grants 60125208 and 60273076.
2 Related Works
The peer-to-peer paradigm [4] dictates a fully distributed, cooperative network design in which nodes collectively form a system without any supervision. Its advantages include robustness under failures, extensive resource sharing, self-organization, load balancing, data persistence, and anonymity. Current search algorithms for unstructured P2P networks [5] can be categorized as either blind or informed. In a blind search, nodes hold no information relating to document locations, while in informed methods there exists a centralized or distributed directory service that assists in the search for the requested objects. The informed method is more suitable for global information searching and performance prediction than the blind method. The Globus [14] platform can be viewed as a representative peer-to-peer system: there is no centralized supervision over the various grid sources, and each task is submitted and controlled by a grid source. Globus adopts an informed method, the Monitoring and Discovery Service (MDS2) [6][7], for resource searching. MDS2 provides an effective management strategy for both static and dynamic resource information, and adopts a hierarchical structure to collect and aggregate service information; it is one of the most popular RMISs at present. However, if the information service of a grid system is constructed as a tree, the system load on high-level information servers increases sharply, and information searching efficiency cannot be guaranteed. The Index Service in Globus Toolkit 3 is developed on top of web services, but there is no effective name space for organizing services in the whole grid; it depends on domain names or IP addresses to locate the corresponding services. The Relational Grid Monitoring Architecture (RGMA) [8][12] is developed based on the relational data model and Java Servlet technologies. Hawkeye is a tool developed by the Condor group [9]; its main use is to offer monitoring information to anyone interested and to execute actions in response to conditions. Neither extends to a large scale. Based on the domain-divided principle, we propose a peer-to-peer architecture in which high-level information servers manage domain services and provide global service information. One domain includes many organizations, and one organization includes one or several service providers. A domain information server is responsible for validity management and for providing the general capability of domain services. Besides validity management, an organizational information server is also responsible for providing the current capability of services within an organization. Effective strategies, such as performance statistics, prediction, and a dynamic notification cycle, are provided to evaluate the current serving capability of a service provider.
3 Architecture of the HH-MDS Framework
Having studied the popular RMISs above, we conclude that a well-designed service-oriented RMIS should conform to five principles: 1) service usability; 2) information consistency; 3) query performance; 4) system scalability; and 5) distributed architecture. The HH-MDS framework is designed on these principles. As depicted in Figure 1, the whole framework is divided into several domains with peer-to-peer relationships. In each domain, HH-MDS information services are classified into three
levels: (1) Domain Information Server (DIS), (2) Information Server (IS), and (3) Service Provider (SP). To keep a DIS from becoming a single point of failure, two DISs are constructed in each domain.
Fig. 1. The HH-MDS Architecture
A SP registers its local services with an IS within an organization. It monitors local services, evaluates and predicts their performance, and reports their current serving capability to the IS. An IS registers the services within a domain with a DIS. Services registered in the DIS are described by their functions and general serving capability. The descriptions of registered services in the IS differ from those in the DIS: besides the service descriptions held in the DIS, the IS also records the current serving capability of services, described by an SLA specification. In the HH-MDS framework, we propose four types of protocols to achieve global resource sharing in the whole grid: (1) the inter-domain GRid Registration Protocol (Inter-domain GRRP), which is used when a new domain is added to the current grid system; first, a new legal certificate signed by the Domain CA is required for the new DIS; (2) the Inter-domain GRid Information Protocol (Inter-domain GRIP), which is used by a DIS to query service information from the other DISs; (3) the Intra-domain GRid Registration Protocol (Intra-domain GRRP), which is used by a SP or an IS when it registers services with a higher-level IS or DIS; and (4) the Intra-domain GRid Information Protocol (Intra-domain GRIP), which is used by users when they query service information from a DIS or an IS. A hierarchical naming method described in an XML schema is proposed to organize registered elements. The elements "HHMds-Domain-name", "HHMds-Organization-name = XXX", and "HHMds-Host-name" are used to describe a host, and these elements together with the element "HHMds-Service-name" are used to describe a service.
Fig. 2. QoS Framework of HH-MDS
3.1 HH-MDS QoS Criteria
As depicted in Figure 2, the HH-MDS QoS framework [10][15] is divided into three levels: DIS-level QoS, IS-level QoS, and SP-level QoS. DIS-level QoS guarantees service usability in a domain and provides global service information. A notification mechanism is adopted by the DIS to send service information to users subscribing to the corresponding services when a notification event occurs. IS-level QoS guarantees service validity and provides the current serving capability of services in an organization. SP-level QoS monitors local services. Based on the user request rate, available bandwidth, and local available resources, SP-level QoS determines the current serving capability of a service and predicts its future serving capability.
3.1.1 DIS-Level QoS
DIS-level QoS includes three parts: service SLA specification, subscription and notification, and service validity management. Through the subscription and notification mechanism, a user can subscribe to services of interest and get a notification message in time whenever such a service registers with the DIS or becomes invalid. A grid service is a WSDL-defined service that conforms to a set of conventions relating to its interface definitions and behaviors, and it includes many kinds of service data describing its functions. The service SLA specification provides general serving capability and policy descriptions. Its definition part consists of many SLA parameters, each assigned a metric defining how its value is measured or computed; these parameters include total response time, total throughput ratio, performance curve, and usage policy. The service SLA specification also provides the service lifetime and the service obligations, which define the QoS guarantees the service provider offers to the service consumer. These guarantees represent promises with respect to the state of SLA parameters and promises to perform actions. There are three kinds of service status: normal, failure, and revised. When the status is revised, the status report is coupled with the revised service description.
3.1.2 IS-Level QoS
IS-level QoS is embodied in three aspects: dynamic SLA management of services (such as current response time, current throughput ratio, etc.), cache management, and subscription and notification. The subscription and notification mechanism is realized at two levels in the IS: the service level and the node level. At the service level, a user interacts with a service directly and subscribes to information about its current capability; once the current capability meets the requirement of the user request, the service notifies the user. The IS container sets up a queue for each service, recording the users who subscribe to it. At the node level, the IS container sets up a common queue for user subscriptions; a user can subscribe to services of interest and get notifications. Dynamic service information is obtained from the SP and cached in IS memory. Cache management is introduced to speed up information searching, but it reduces information accuracy. It is unwise to set a fixed notification interval for a service when it registers with the IS, because its current serving capability is dynamic and undetermined: when the current serving capability of a service changes slowly, the notification interval should be large, and vice versa. Alterable notification improves information accuracy and lightens the system load.
3.1.3 SP-Level QoS
SP-level QoS includes two parts: statistic decision-making and change detection. Statistic decision-making is used for performance evaluation and alterable notification; change detection is used to detect immediate changes in local services. As each service type has its own resource requirements, we take the data-intensive computing service type as an example to describe the statistic decision-making process. P denotes the available node resources, including CPU, memory, and disk, and P(t) denotes the available-resource changing rate, both covering the available amount and the current access speed. N denotes the currently available bandwidth, and N(t) denotes the changing rate of available network bandwidth. F denotes the serving capability of a service, and F(t) denotes the changing rate of serving capability. If memory and disk space are large enough, F is determined by CPU and network bandwidth, and we denote F by F1(CPU, N). If the network bandwidth is equal to a constant n, we conclude:
The statistic decision-making flow is described as follows: for each n in the set of sampled bandwidth values, I denotes the bandwidth interval and K denotes the CPU-available-rate interval. According to Eq. 1 and the linear interpolation formula, we have:
We get the following performance evaluation aggregation:
Based on Eq. 3, we get the current serving capability of a service. P(0) and N(0) are obtained from the operating system and a network detector, respectively. If the difference between F1(P(0), N(0)) and the capability reported at the last notification time exceeds BasicValue, the SP notifies F1(P(0), N(0)) to the IS. BasicValue is a threshold specified as a performance parameter of a service. Some gusty events occur at the SP that cause service performance to change violently. Four situations exist: (1) the number of user requests changes violently, (2) the system load changes violently, (3) resources are reserved, and (4) resources are released. In these situations, F is calculated again and notified to the IS at once.
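A minimal sketch of the SP-level changing detection described above, assuming (as the text suggests) that notification is triggered when the newly computed capability differs from the last reported value by more than BasicValue; the helper types and names are hypothetical:

// Illustrative SP-level changing detection. The capability value passed in
// stands for F1(P(0), N(0)), computed elsewhere from the detected CPU
// available rate and bandwidth via the interpolated performance aggregation.
class ServingCapabilityMonitor {
    interface InformationServer { void notifyCapability(double capability); }

    private final double basicValue;   // notification threshold for this service
    private double lastNotified;       // value sent at the last notification time

    ServingCapabilityMonitor(double basicValue, double initialCapability) {
        this.basicValue = basicValue;
        this.lastNotified = initialCapability;
    }

    // Regular detection cycle: notify only when the change exceeds BasicValue.
    void onDetectionCycle(double capability, InformationServer is) {
        if (Math.abs(capability - lastNotified) > basicValue) {
            is.notifyCapability(capability);
            lastNotified = capability;
        }
    }

    // Gusty events (request bursts, load spikes, reservation, release):
    // recompute F and notify the IS at once, regardless of the threshold.
    void onGustyEvent(double capability, InformationServer is) {
        is.notifyCapability(capability);
        lastNotified = capability;
    }
}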
4 Performance Analysis
A user obtains global service information from a DIS. Information inconsistency exists when a service fails between status reports; if the report interval is large, the inconsistency is apparent, so the interval should be adjustable according to network status. A user queries the current serving capability of services from the IS directly. As our service grid system is developed on Globus Toolkit 3, information server scalability with both users and service providers, as well as query performance, has been studied in [11]. In this section, we focus on the information service within an organization. Three quantities are studied: serving capability, looking-up error rate, and communication load. The experiments were run at two sites. The server-side services are provided at the Internet and Cluster Computing Center (ICCC) at Huazhong University of Science and Technology (HUST) on a 16-node cluster, each node with a Xeon 1 GHz CPU, 40 GB disk, and 512 MB memory; the IS runs on one node, and the other nodes serve as SPs. The client is a personal computer with a Power604E 200 MHz CPU, 2 GB disk, and 128 MB memory at the High Performance Computing Center (HPCC), HUST. The bandwidth between ICCC and HPCC is around 50 Mb/s, and 100 Mbps bandwidth is provided within the cluster. A data conversion service is provided on one of the SPs. The service has four parameters: 1) size of input data: 1.2 Gb; 2) size of output data: 800 Mb; 3) size of application code: 2 MB; and 4) execution time on a base machine (P4 1 GHz CPU, 512 MB memory): 40 sec. The response time of this service is depicted in Figure 3. When the output bandwidth at the edge router is limited to a fixed value, the response time decreases as the CPU available rate increases; but once the CPU available rate reaches a certain value, the response time is basically stable. During the serving process, the CPU available rate and output bandwidth are detected continuously. Based on the response times in Fig. 3, the current response time of the service is derived.
Fig. 3. Response Time Evaluation
We send queries for the service to the IS from the client side at regular intervals. By comparing the service information at the two sites at the same time, the looking-up error rate is obtained. BasicValue determines when the SP reports service information to the IS, and its value is set to a certain percentage of the average response time. The looking-up error rate of the service under different BasicValue settings is depicted in Figure 4, and the corresponding communication load is depicted in Figure 5. The SP computes the response time every second and decides whether to send information to the IS. Communication load statistics are collected at one-minute intervals. Generally, the looking-up error rate and communication load are closely related to the service type.
Fig. 4. Looking-up Error Rate
Fig. 5. Communication Load
5 Conclusions and Future Works
In this paper, we propose a QoS-aware, domain-divided information service, HH-MDS, as an important component of our service grid platform. We have proposed four types of protocols to construct HH-MDS, and both distributed management and distributed information searching are provided in it. The three-level QoS framework of HH-MDS is discussed to guarantee service usability and information consistency and to improve query performance. In future work, we will study the characteristics of all kinds of services in depth and propose a more complete statistical model for the serving capability of grid services.
References
[1] I. Foster, C. Kesselman, and S. Tuecke, “The Anatomy of the Grid”, Intl. Journal of Supercomputer Applications, 2001.
[2] The Web Services Industry Portal, http://www.webservices.org/.
[3] H. Ludwig, A. Keller, A. Dan, and R. King, “A Service Level Agreement Language for Dynamic Electronic Services”, Proceedings of the 4th IEEE Int’l Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems, IEEE, 2002.
[4] R. Schollmeier, “A Definition of Peer-to-Peer Networking for the Classification of Peer-to-Peer Architectures and Applications”, Proceedings of the First International Conference on Peer-to-Peer Computing (P2P’01), IEEE, 2002.
[5] M. Kelaskar, V. Matossian, P. Mehra, D. Paul, and M. Prashar, “A Study of Discovery Mechanisms for Peer-to-Peer Applications”, Proceedings of CCGrid’02, pp. 414–415.
[6] G. Aloisio, M. Cafaro, I. Epicoco, and S. Fiore, “Analysis of the Globus Toolkit Grid Information Service”, Technical Report GridLab-10-D.1-0001-GIS_Analysis, GridLab project, http://www.gridlab.org/Resources/Deliverables/D10.1.pdf.
[7] K. Czajkowski, S. Fitzgerald, I. Foster, and C. Kesselman, “Grid Information Services for Distributed Resource Sharing”, Proceedings of the 10th IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), 2001.
[8] S. Fisher, “Relational Model for Information and Monitoring”, Technical Report GWD-Perf-7-1, GGF, 2001.
[9] M. Litzkow, M. Livny, and M. Mutka, “Condor – A Hunter of Idle Workstations”, Proceedings of the 8th International Conference on Distributed Computing Systems, pp. 104–111, June 1988.
[10] J. Al-Ali, F. Rana, W. Walker, S. Jha, and S. Sohail, “G-QoSM: Grid Service Discovery Using QoS Properties”, Computing and Informatics Journal, Special Issue on Grid Computing, 21(4), pp. 363–382, 2002.
[11] X. Zhang, L. Freschl, and M. Schopf, “A Performance Study of Monitoring and Information Services for Distributed Systems”, Proceedings of HPDC’03, 2003.
[12] DataGrid, “DataGrid Information and Monitoring Services Architecture: Design, Requirements and Evaluation Criteria”, Technical Report, 2002.
[13] W. E. Johnston, D. Gannon, and B. Nitzberg, “Grids as Production Computing Environments: The Engineering Aspects of NASA’s Information Power Grid”, Proceedings of the 8th IEEE Symposium on High Performance Distributed Computing, 1999.
[14] I. Foster and C. Kesselman, “Globus: A Metacomputing Infrastructure Toolkit”, International Journal of Supercomputer Applications, Vol. 11, No. 2, pp. 115–128, 1997.
[15] C. Li, G. Peng, K. Gopalan, and T. Chiueh, “Performance Guarantees for Cluster-Based Internet Services”, Proceedings of CCGrid’03, IEEE, 2003.
Grid Service Semigroup and Its Workflow Model
Yu Tang¹, Haifang Zhou², Kaitao He³, Luo Chen¹, and Ning Jing¹
¹ School of Electronic Science and Engineering, National University of Defense Technology, Changsha, Hunan, P.R. China
[email protected], {luochen, ningjing}@nudt.edu.cn
² School of Computer, National University of Defense Technology, Changsha, Hunan, P.R. China
[email protected]
³ China Geological Survey, Beijing, P.R. China
[email protected]
Abstract. A Grid service is defined by OGSA as a web service that provides a set of well-defined interfaces and that follows specific conventions. To classify different Grid services and describe their relations, we present the Grid Service SemiGroup (GSSG) based on group theory. Moreover, a novel concept, the meta-service, is proposed based on the definition of the generating element in a cyclic monoid. To meet the integration and collaboration demands of distributed, heterogeneous Grid services, some special elements, such as time and resource taxonomy, are introduced to extend the basic Petri net for workflow modeling. A new workflow model for GSSGs, named Grid Service/Resource Net (GSRN), is proposed, and some new analysis methods based on graph theory, which complement the traditional analysis methods of the basic Petri net, are introduced to analyze and evaluate GSRN. The practicability and effectiveness of GSRN are demonstrated in an application project.
1 Introduction
As a novel technology defined by the Open Grid Services Architecture (OGSA) to implement resource sharing and cooperation, the Grid service has become a focus of research and of web-based applications. A Grid service is a web service that provides a set of well-defined interfaces and that follows specific conventions: the interfaces address discovery, dynamic service creation, lifetime management, notification, and manageability, while the conventions address naming and upgradeability [1,2]. According to their purposes, developers, and service types, Grid services belong to different organizations and should be classified into different Grid service sets. Since relations exist among these sets, it is necessary to provide a mechanism to describe them; to the best of our knowledge, this problem has not yet attracted appropriate attention. We therefore propose a novel concept and framework called the Grid service semigroup (GSSG) to classify Grid service sets and describe their relations. Based on the definition of the generating element in a cyclic monoid [3], the meta-service is presented for the first time.
On the other hand, for most web-based applications and tasks, the integration and cooperation of different Grid services is indispensable. The course of Grid service cooperation is a service chain and can be described by a workflow model. There exist many ways to define and describe workflow models, such as the WfMC definition language [4], RAD graphs, and the EPCM model, among which the Petri net has been the focus of research and application. However, the basic Petri net cannot model dynamic, timed, and conditional workflows [5-7]. We therefore introduce additional elements, such as time, condition, and resource taxonomy, to extend the basic Petri net, and propose a new workflow model named Grid Service/Resource Net (GSRN). Because GSRN is an extended Petri-net-based workflow model for GSSGs, we introduce new algorithms and methods based on graph theory [3, 8] to complement the traditional Petri net methods for analyzing and evaluating GSRN. The remainder of this paper is organized as follows: the definitions and concepts of the GSSG are introduced in Section 2; Section 3 is devoted to the definitions and related rules of the extended Petri-net-based workflow model for GSSGs (GSRN); in Section 4, new methods for analyzing and evaluating GSRN are discussed in detail; GSRN is illustrated and demonstrated by means of an application example in Section 5; finally, Section 6 provides some concluding remarks.
2 Grid Service Semigroup
A Grid service can connect to and invoke other Grid services through standard interfaces to implement sharing and integration. The connecting and invoking relations among Grid services can be regarded as an operation (we define it as Join). Based on Grid service sets and the Join operation, we find that Grid service sets are similar to semigroups [3]. We therefore propose the Grid service SemiGroup (GSSG) and derive some useful definitions and theorems to determine the structural similarity of GSSGs.
Definition 1: A semigroup (S,*) is a nonempty set S with a binary operation * such that (a*b)*c = a*(b*c) for all a, b, c ∈ S. Generally, the symbol * can be omitted, i.e., ab = a*b.
Definition 2: A semigroup (S,*) is a monoid if there exists e ∈ S such that e*a = a*e = a for all a ∈ S. A monoid is denoted by (S,*,e), and such an e is called the identity.
Definition 3: A monoid is a cyclic monoid if there exists h ∈ S such that every element of S is a power of h; such an h is called a generating element.
Definition 4: Join, denoted by +, is a binary operation which describes the connecting and invoking relations between any two Grid services.
Definition 5: A Grid service semigroup (GSSG) is a semigroup (GS,+) in which GS is a set of Grid services.
Definition 6: The empty service is a service which has no operation and no function.
Definition 7: A Grid service monoid is a GSSG whose identity is the empty service.
Definition 8: A meta-service (ms) is a basic Grid service such that the monoid it generates is a cyclic Grid service monoid, i.e., ms is a generating element.
Definition 9: If T is a subset of a GSSG (GS,+) and (T,+) is itself a GSSG, then T is a sub-GSSG of (GS,+).
Definition 10: Given two GSSGs (GS₁,+) and (GS₂,⊕), a map f: GS₁ → GS₂ is a homomorphism if f(a+b) = f(a)⊕f(b) for all a, b ∈ GS₁. Monomorphism, epimorphism, and isomorphism between GSSGs are defined accordingly. Different GSSGs may have the same elements or similar structure, so homomorphism is necessary and useful in determining the structural similarity of GSSGs. The related theorems are built on Definition 10 and will be discussed in our subsequent papers.
3 Grid Service Semigroup Workflow Model: GSRN
We extend the basic Petri net to describe and model the workflow of GSSGs. The basic definitions and concepts of Petri nets are given in [5-7].
Definition 11: A Grid Service/Resource Net (GSRN) is an extended Petri net, i.e., a tuple (P, T, F, K, CLR, CLS, AC, CN, TM, W, M) where:
P is a finite set of places, comprising a set of resource places (data, information, etc.) and a set of Grid service places;
T is a finite set of transitions representing the activities;
F is a set of flow relations;
K is a place capacity function;
CLR is a resource taxonomy function, and CLS is a service taxonomy function, where the set of Grid services is a GSSG;
AC is a flow relation markup function;
CN is a condition restriction function on F;
TM is a time function on T.
denotes the system marking is the initial marking.
Definition 12: The pre-set of a GSRN place/transition x is the set •x = {y | (y,x) ∈ F}; the post-set of x is the set x• = {y | (x,y) ∈ F}.
GSRN is a directed graph composed of places, transitions, and arcs. In GSRN, we use tokens (black dots) held in places to mark the resource distribution, and arcs to express the flow relations between places and transitions. As for the extended elements, time elements are marked on transitions, and conditions are kept on the corresponding transitions or places. The running of GSRN is implemented by firing transitions. A transition can fire only if its input places hold the corresponding tokens (markings). After the transition is triggered and fired, the number of tokens in its pre-set decreases and the number of tokens in its post-set increases accordingly.
Definition 13: A transition t is enabled under M, in symbols M[t⟩, iff M(p) ≥ W(p,t) for every p ∈ •t. If M[t⟩ holds, the transition t may occur (fire), in symbols M[t⟩M′, resulting in a new marking M′ with:
M′(p) = M(p) − W(p,t) + W(t,p) for every p ∈ P.
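A compact sketch of Definition 13's enabling and firing rule in plain Java (our own data structures, ignoring the extended time and condition elements of GSRN):

import java.util.*;

// Minimal place/transition net core behind Definition 13: a transition is
// enabled iff every input place holds at least the arc weight in tokens;
// firing subtracts input-arc weights and adds output-arc weights.
class Net {
    final Map<String, Integer> marking = new HashMap<>();          // M: place -> tokens
    final Map<String, Map<String, Integer>> pre = new HashMap<>(); // t -> (input place -> W(p,t))
    final Map<String, Map<String, Integer>> post = new HashMap<>();// t -> (output place -> W(t,p))

    boolean enabled(String t) {
        for (Map.Entry<String, Integer> arc : pre.getOrDefault(t, Map.of()).entrySet()) {
            if (marking.getOrDefault(arc.getKey(), 0) < arc.getValue()) return false;
        }
        return true;
    }

    void fire(String t) {
        if (!enabled(t)) throw new IllegalStateException(t + " is not enabled under M");
        pre.getOrDefault(t, Map.of()).forEach((p, w) -> marking.merge(p, -w, Integer::sum));
        post.getOrDefault(t, Map.of()).forEach((p, w) -> marking.merge(p, w, Integer::sum));
    }
}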
The transition structures describe the dependence relations among different resources and services. The basic transition structures of GSRN fall into six types (as Figure 1 shows), and workflow models can be composed of these six basic structures (the basic place structures are similar to the basic transition structures).
Fig. 1. Basic transition structures of GSRN
4 Analysis and Evaluating Methods of GSRN
The characteristics of GSRN are very important in analyzing and evaluating GSRN. The main characteristics include boundedness, reachability, and liveness. Their
definitions are the same as those of the basic Petri net (see [5-7]). Because GSRN is an extended Petri-net-based workflow model, traditional Petri net methods should be combined with new graph-theoretic algorithms to form a new analysis and evaluation system for GSRN. Two such methods follow.
4.1 Resource Matching Algorithms of GSRN
In GSRN, different sub-flows may request the same resources and Grid services, but a resource cannot meet all demands at the same time. The conflict between resources and requests thus reduces to the resource matching problem [3, 8]. Based on graph theory, we put emphasis on an algorithm for bigraph maximal matching for GSRN; algorithms for bigraph optimal matching [3, 9] will be discussed in another paper.
Definition 14: Given a graph G and an edge subset M, if no two edges in M share a vertex, then M is a matching of G. The vertices covered by edges of M are called saturated points; the other vertices are called non-saturated points.
Definition 15: Given a matching M of a graph G = (V, E), if |M′| ≤ |M| for any matching M′ of G (|M| is the number of edges in M), then M is a maximal matching of G.
Definition 16: Given a matching M of a graph G = (V, E), an interleaved path is a path composed alternately of edges belonging to M and edges not belonging to M.
Definition 17: Given an interleaved path P of a matching M of G, if the two end vertices of P are non-saturated points, then P is called an augmenting path.
Theorem 1: M is a maximal matching of G iff there is no augmenting path with respect to M. Proof: see [3].
Theorem 1 is the foundation of bigraph maximal matching algorithms, and we use the Hungarian algorithm [3] to obtain a maximal matching for GSRN.
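The augmenting-path idea behind Theorem 1 translates directly into code. The following is a generic Kuhn-style bigraph maximal matching sketch in Java (variable names are ours), with requests on the left side and resources/services on the right:

import java.util.*;

// Maximum bipartite matching by repeated augmenting-path search (Theorem 1:
// a matching is maximal iff no augmenting path exists). adj.get(u) lists the
// resources that can satisfy request u.
class BipartiteMatcher {
    private final List<List<Integer>> adj;
    private final int[] matchOfResource;   // resource -> matched request, -1 if free

    BipartiteMatcher(List<List<Integer>> adj, int numResources) {
        this.adj = adj;
        this.matchOfResource = new int[numResources];
        Arrays.fill(matchOfResource, -1);
    }

    int maximumMatching() {
        int matched = 0;
        for (int u = 0; u < adj.size(); u++) {
            if (tryAugment(u, new boolean[matchOfResource.length])) matched++;
        }
        return matched;
    }

    // DFS for an augmenting path starting at request u.
    private boolean tryAugment(int u, boolean[] visited) {
        for (int v : adj.get(u)) {
            if (visited[v]) continue;
            visited[v] = true;
            // v is free, or the request currently holding v can be re-matched
            if (matchOfResource[v] == -1 || tryAugment(matchOfResource[v], visited)) {
                matchOfResource[v] = u;
                return true;
            }
        }
        return false;
    }
}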
4.2 Linear Temporal Inference Rules
As defined in Section 3, the time element is an important element in evaluating GSRN, so we deduce some linear temporal inference rules as an evaluation method [10, 11]. Before giving the rules, we define some symbols: for each transition in GSRN, we distinguish its scheduled time from its actual execution time. According to the transforming structures shown in Figure 2 and the definition of the time element, the linear temporal inference rules are proposed as follows [11].
1. Rule 1 (sequence): based on the sequence structure, we get the following equations.
Fig. 2. Transforming structures of linear temporal inference
By (1), (2), and (3), we obtain Rule 1. The derivations of the other rules are similar to that of Rule 1 and are omitted in this paper.
2. Rule 2 (paralleling);
3. Rule 3 (free choice);
4. Rule 4 (conditional choice);
5. Rule 5 (circle), where k is the number of circle iterations.
These rules cannot be used for all GSRN models; their applicable conditions are the same as those discussed in [11].
5 A GSRN Example
In our research project, we use GSRN to model the workflow of layout planning for the area near a bridge. The application course is as follows:
1. The urban planning bureau proposes the application request, and the workflow begins.
2. The mapping bureau provides the area map.
3. The traffic bureau provides related traffic data, the geological bureau provides related geological data, and business enterprises provide related business data.
4. The corresponding services process and integrate the map and the various data.
5. Eventually, the results return to the urban planning bureau and the workflow finishes.
The corresponding GSRN model is shown in Figure 3, and the meaning of the resource and service taxonomy elements in this model is explained in the accompanying tables. In accordance with the GSRN model in Figure 3, Figure 4 demonstrates the application flow.
Fig. 3. GSRN model for layout planning
As the experimental results show, GSRN is effective and practical in modeling Grid service workflows. Based on GSRN, we can combine and aggregate distributed Grid services belonging to different GSSGs to fulfill large tasks.
6 Conclusion
According to application demands, we propose the GSSG and its related theorems based on group theory, and a new concept, the meta-service, is presented. To describe and model the workflow of grid services in different GSSGs, a novel extended Petri-net-based workflow model (GSRN) is proposed and discussed in detail. Moreover, some new algorithms and methods based on graph theory are introduced to analyze and evaluate GSRN, and the practicability of GSRN is verified in an application example. GSSG and GSRN are novel concepts and technologies. We will refine and extend the definitions, theorems, and algorithms in the future, and will put research emphasis on further key technologies, such as additional theorems of the GSSG, rules for GSRN model simplification, new theories and methods for analyzing and evaluating GSRN, and an optimal resource matching algorithm for general graphs.
Fig. 4. Application flow of GSRN example
Acknowledgements. This work is supported in part by the National High Technology Research and Development 863 Program of China (Grant Nos. 2002AA104220, 2002AA131010, 2002AA134010).
References
1. I. Foster, C. Kesselman, et al. The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. June 2002. See http://www.gridforum.org/ogsi-wg/drafts/ogsa_draft2.9_2002-06-22.pdf.
2. S. Tuecke, K. Czajkowski, et al. Grid Service Specification. Open Grid Service Infrastructure WG, Global Grid Forum, Draft 2, July 2002. See http://www.globus.org.
3. Y.Q. Dai, G.Z. Hu, and W. Chen. Graph Theory and Algebra Structure (in Chinese). Tsinghua University Press, Beijing, China, 1999.
4. D. Hollingsworth. Workflow Management Coalition: The Workflow Reference Model. Document Number WFMC-TC00-1003, Brussels, 1994.
5. T. Murata. Petri Nets: Properties, Analysis and Applications. In Proceedings of the IEEE, 77(4), pages 541-580, April 1989.
6. J. Peterson. Petri Net Theory and the Modeling of Systems. Prentice Hall, Englewood Cliffs, New Jersey, 1981.
7. C.Y. Yuan. Petri Net Theory (in Chinese). Publishing House of Electronics Industry, Beijing, China, 1998.
8. R. Johnsonbaugh. Discrete Mathematics, 4th Edition. Prentice Hall, Englewood Cliffs, New Jersey, 1997.
9. J. Edmonds. Paths, trees, and flowers. Canadian J. Math., 17:449-467, 1965.
10. M. Silva, E. Teruel, and J. M. Colom. Linear algebraic and linear programming techniques for the analysis of place/transition net systems. In Lectures on Petri Nets I: Basic Models, W. Reisig and G. Rozenberg, Eds., Vol. 1491, Lecture Notes in Computer Science, pages 309-373, Springer-Verlag, 1998.
11. T. Liu, C. Lin, and W.D. Liu. Linear Temporal Inference of Workflow Management System Based on Timed Petri Net Models (in Chinese). ACTA ELECTRONICA SINICA, 30(2):245-248, Feb 2002.
A Design of Distributed Simulation Based on GT3 Core
Tong Zhang, Chuanfu Zhang, Yunsheng Liu, and Yabing Zha
College of Mechatronics Engineering and Automation, National University of Defense Technology, Changsha 410073
[email protected]
Abstract. Aiming at coordinated resource sharing in distributed, heterogeneous, dynamic environments, OGSA supports distributed simulation effectively in resource management. GT3 Core provides the structure of a service container, based on which a new mode of distributed simulation system has been designed. The new mode realizes the separation of simulation resources from simulation applications, and supplies a simulation server responsible for the organization of simulation resources. The server provides a service index for higher-level simulation applications and enables the interaction among them. Under this new simulation mode, a combat simulation application has been developed as a prototype. It achieves good reusability and portability of simulation resources, and supports heterogeneous, cross-platform application development.
1 Introduction
Building on technologies from the Grid [1, 2] and Web services [3], OGSA [4] has emerged as the most important Grid architecture. It defines a uniform exposed service semantics, the Grid service, and provides well-defined interfaces for the components in the Globus Toolkit (GT) [5]. GT3 is built on a new core infrastructure compliant with OGSA and is an open-source implementation of OGSI [6]. GT3 Core [7] offers a runtime environment hosting Grid services, and mediates between the application, the underlying network, and the transport protocol engines. Distributed simulation consists of geographically distributed simulators interconnected via LAN or WAN, and aims to achieve interoperability and reusability. In current systems based on the High Level Architecture (HLA) [8], reuse and cooperation are conditional and lack wide applicability, which can hardly satisfy increasing simulation requirements; such systems care more about the operations in applications than about resource management. OGSA provides a new method for building and managing distributed systems. Based on its service-oriented mechanism, resources can be encapsulated in a more standard and effective way. Therefore, OGSA serves as a middleware between the simulation resources and applications, and supports the system with great power.
2 Backgrounds
2.1 The Framework of Web Services
Web services are one of the bases that support the OGSA architecture. A Web service describes a collection of operations that are network-accessible through standard XML messaging. Web services are intended to facilitate communication between computer programs, and build on such standards as HTTP, XML, SOAP, WSDL, and UDDI. They define techniques for describing software components and methods for accessing them, as well as discovery methods that enable the identification of service providers. OGSA takes great advantage of Web services. First, the dynamic discovery and composition of services in heterogeneous environments necessitates mechanisms for registering and discovering interface definitions and endpoint implementation descriptions, and for dynamically generating proxies based on bindings for specific interfaces; WSDL supports this requirement by providing a standard mechanism for defining interfaces separately from their embodiment within a particular binding. Second, the widespread adoption of Web service mechanisms means that a framework based on Web services can exploit numerous tools and extant services. [4]
2.2 GT3 Core – A Grid Service Container [7]
The model of GT3 Core is based on the notion of a container that hosts various logic components. These components can be deployed into the container with varying quality of service (QoS) and behaviors, and the container must be flexible enough to be deployed into a wide range of heterogeneous hosting environments. Compared to conventional Web service toolkits, it provides three major functions. First, it supports light-weight service introspection and discovery, where information flows in both pull and push modes. Second, it provides dynamic deployment and soft-state management of stateful service instances that can be globally referenced using an extensible resolution scheme. Third, it has a transport-independent Grid Security Infrastructure (GSI) supporting credential delegation, message signing, encryption, and authorization.
3 Key Points in GT3 Core
3.1 Service Data
Service data refers to descriptive information about Grid service instances, which can support service discovery, introspection, and monitoring. It is a structured collection of information: each instance has a set of service data elements (SDEs) of different types. OGSI defines extensible operations for querying, updating, and subscribing to notification of changes in SDEs. One application of the GridService interface's findServiceData operation is service discovery. The essence of service discovery is to obtain the GSH (Grid Service Handle) of a desired service; a Grid service that supports service discovery is called a registry.
4 Distributed Simulation Based on GT3 Core
4.1 Design of the Framework
We suppose a combat simulation scene in a two-dimensional world, where a missile is launched from the ground to fire at a plane; when the missile hits the plane, the combat is over. In this simulation, three members are designed: a plane member, a missile member, and a manager member, the last of which controls the process of the whole combat. They are distributed on different computers. Based on the members' requirements, the entity model and manager model resources are abstracted as services. The entity model is responsible for calculating the state of an entity in the combat and holding all the necessary information about it, and the manager model takes charge of simulation time to support the manager member.
Fig. 2. Logical Structure of the Application
Figure 2 shows the logical structure of our application, which is based on a client-server model. The server is realized as a registry server holding simulation services, namely the simulation service container. The model resources in the simulation are encapsulated as grid services, including the entity model and the manager model. On the client side, three members are involved: plane, missile, and manager. They complete their tasks using the underlying services and interact with the others through GT3 Core. The separation of simulation resources from applications is the key point here, and it shows the superiority of a grid-based system: it largely reduces the coupling between systems and resources, and facilitates the reusability of resources. Further, OGSA specifies interactions between services in a manner independent of any hosting environment, so the services are portable to heterogeneous platforms.
4.2 Design of the Server – Simulation Service Container
The server provides the clients with an index of all the services related to the simulation application. Based on the registry service, it gathers all simulation services together logically, so that a client can look up the desired one, while physically the services are distributed and implemented in various local containers.
The simulation services registered in the server are factory services, including the entity factory and the manager factory. A client has to create its own service instance from the proper factory, and make the instance serve as one member in the simulation to finish its assigned task. Here, the concept of an instance is similar to that of a federate in an HLA federation [8]. The relationship among the service container, the service provider, and the client is shown in Figure 3; the structure shows how the server provides the registered services to the clients. The content of the registry list can be defined as service data in the registry service, which can be subscribed to as notification. All these operations are supported by GT3 Core.
Fig. 3. Structure of the simulation server
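As an illustration of this flow, the following plain-Java sketch uses hypothetical interfaces standing in for the GT3 registry and factory mechanisms (the real GT3 APIs are not reproduced here): the client looks a factory up in the simulation service container, creates an instance, and uses the instance as one simulation member.

// Hypothetical interfaces sketching the client's use of the simulation
// service container: look up a factory in the registry, create an instance,
// and let the instance play the role of one simulation member.
interface Registry {
    Factory lookup(String serviceName);          // e.g. "EntityFactory", "ManagerFactory"
}
interface Factory {
    ServiceInstance createInstance(String memberName);
}
interface ServiceInstance {
    String getHandle();                          // globally unique name (GSH-like)
}

class SimulationClient {
    ServiceInstance join(Registry container, String factoryName, String member) {
        Factory factory = container.lookup(factoryName);  // index held by the server
        return factory.createInstance(member);            // instance acts as the member
    }
}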
4.3 Design of the Whole Process
Based on the structure of the above framework, the development of a simulation application can be summarized in the following steps:
Step 1: Define the conceptual model of the simulation application and specify the distributed tasks. Then abstract the desired simulation services from the requirements.
Step 2: Design the server. Based on the common structure of the simulation server, the different simulation services are developed and deployed in their local containers, and are required to register with the simulation service container. The service interfaces and their service data should be designed and implemented under the mechanisms of GT3 Core.
Step 3: Design the client. The client programs enable the utilization of, and interoperation among, the simulation services. The simulation members execute these client programs to finish the whole simulation task.
5 Realization of the Application
5.1 Simulation Services in the Server
The manager service takes charge of the management of the distributed simulation, especially the advancement and management of simulation time, which is defined as its service data. The service interface provides operations for setting/getting the value of the time and for advancing the time with the progress of the simulation. This service must be able to send notifications, which ensures synchronization in the simulation. The entity service describes the model of an entity participating in the simulation. Its service data is the state information, including the entity's position, time, and entity ID. The position is a two-dimensional coordinate, and the entity ID appears as a GSH, which is a globally unique name. The interface defines operations to calculate the entity's position at the next moment on the basis of its current position and a calculation formula. To enable interoperation between different combat members, the ability to send notifications is also required.
5.2 Simulation Client
The realization of the three members in the combat simulation shows the execution of a distributed task through the utilization of, and interaction among, service instances. The manager client serves as a command center: it orders the simulation to start and advances the simulation time. It collects information from all the members in the combat; when it is sure that all members have finished their tasks at the current moment, it advances the time to the next moment and sends notifications. The manager subscribes to notification messages from more than one member. The time at which those messages will be sent is uncertain, and they all invoke the same callback, so it is important to identify the source of each message and to guarantee that every message is received exactly once. Therefore, the entity ID is used to identify the notification source, and a boolean flag is kept for each member: when a member's message has arrived, its flag is set to true, and otherwise it remains false. Only when all the flags are true can the simulation time be advanced. The other two clients are quite similar in function: they set their initial position and velocity and subscribe to the simulation time; when a notification arrives, they calculate the entity position for the next moment, which causes a position-data-change notification to the manager.
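A sketch of the manager-side bookkeeping just described (types and method names are ours; the GT3 notification API is not shown): each notification carries the entity ID, a per-member flag records its arrival, and the time advances only when every flag is true.

import java.util.*;

// Illustrative manager-side callback handling: notifications from several
// members invoke the same callback, so the entity ID identifies the source
// and a per-member boolean flag records whether its message has arrived.
class CombatManager {
    private final Map<String, Boolean> flags = new HashMap<>(); // entityId -> reported?
    private int simulationTime = 0;

    CombatManager(Collection<String> memberIds) {
        for (String id : memberIds) flags.put(id, false);
    }

    // Shared notification callback; entityId (a GSH) identifies the sender.
    synchronized void onNotification(String entityId) {
        flags.put(entityId, true);
        if (!flags.containsValue(false)) {       // all members done at this moment
            simulationTime++;                    // advance to the next moment
            flags.replaceAll((id, f) -> false);  // reset the flags for the next step
            // ... send time-advance notifications to the plane and missile members
        }
    }
}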
6 Conclusion
This paper uses grid technology to build a distributed simulation environment and to develop a simple combat simulation application. The new system architecture supports the reuse and standardization of simulation resources better than before, and achieves good heterogeneity and portability across various platforms. With the development of grid technology, research on the combination of grid and distributed simulation will advance rapidly, and such systems will become more powerful.
References
[1] I. Foster, C. Kesselman: The Grid: Blueprint for a Future Computing Infrastructure. Morgan Kaufmann Publishers, San Francisco (1999)
[2] I. Foster, C. Kesselman, S. Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications, Vol. 15 (2001) 200-222
[3] S. Graham et al.: Building Web Services with Java: Making Sense of XML, SOAP, WSDL, and UDDI. Sams Technical Publishing, Indianapolis, Ind. (2001)
[4] I. Foster, C. Kesselman, J. Nick, S. Tuecke: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Globus Project, http://www.Globus.org/research/papers/ogsa.pdf (2002)
[5] I. Foster, C. Kesselman: Globus: A Metacomputing Infrastructure Toolkit. International Journal of Supercomputer Applications, Vol. 11 (1997) 115-128
[6] S. Tuecke, K. Czajkowski, I. Foster, J. Frey, S. Graham, C. Kesselman, T. Maquire, T. Sandholm, D. Snelling, P. Vanderbilt: Open Grid Service Infrastructure (OGSI) Version 1.0. http://www.ggf.org/ogsi-wg (2003)
[7] Thomas Sandholm, Jarek Gawor: Globus Toolkit 3 Core – A Grid Service Container Framework. http://www-unix.globus.org/toolkit/3.0/ogsa/docs/gt3_core.pdf (2003)
[8] IEEE Standard for Modeling and Simulation (M&S) High Level Architecture (HLA) – Framework and Rules. IEEE Std 1516-2000 (2000)
A Policy-Based Service-Oriented Grid Architecture*
Xiangli Qu, Xuejun Yang, Chunmei Gui, and Weiwei Fan
School of Computer Science, National University of Defence Technology, Changsha, China, 410073
[email protected]
Abstract. Recently, a promising trend towards powerful and flexible Grid execution environments is the adoption of a service-oriented infrastructure. Meanwhile, to meet such requirements as QoS, load balance, security, and scalability, the network paradigm is being shifted from the current hardware-based, manually configured infrastructure to a programmable, automated, policy-based one. Based on these two observations, in this paper we propose a policy-based service-oriented grid architecture, and outline its basic model, primary components, and corresponding functionalities.
Keywords: Grid, policy-based, service-oriented, small world
1 Introduction
The Grid concept was first introduced to enable resource sharing within disparate, geographically dispersed scientific collaborations [4],[5],[6]. In [3], Grid technologies and infrastructures are defined as supporting the sharing and coordinated use of diverse resources in dynamic, distributed "virtual organizations" (VOs). With the boom in Web Services, component-based programming and middleware technologies, the Grid has recently come to be viewed more as an extensible set of Grid services. Both in e-commerce and in e-science, integrating services from distributed, heterogeneous, dynamic VOs is needed [2]. Therefore, many service-oriented Grid infrastructures and solutions have been presented, among which OGSA is a typical instance. OGSA "defines standard mechanisms for creating, naming, and discovering transient Grid service instances; provide location transparency and multiple protocol bindings for service instances; and supports integration with underlying native platform facilities" [2]. Some specific implementations of this architecture have already come into being, such as ICENI (Imperial College e-Science Networked Infrastructure) [9].
In terms of service orientation, such network-enabled entities as computational resources, storage resources, networks, programs, databases and mediums can all be classified as kinds of Grid services. Given the diverse natures of these entities, the various requirements from users, the dynamics of the environment, and the end-to-end guarantee of QoS, finding the "best" service capable of meeting the needs of a user, or a community of users, is inherently complex and challenging, and no single policy can satisfy every situation. Therefore, to enable transparent, organic and efficient composition of services, to realize the blueprint of a Semantic Service Grid, and to turn a loosely coupled system into a tightly coupled one, with security, scalability, fault tolerance and interoperability in mind, it is of great importance to introduce dynamic, adaptive multi-policies into the whole infrastructure. Meanwhile, today's network is moving beyond simple, insecure, best-effort data communication, heading for policy-based infrastructures that enable advanced features such as dynamic traffic engineering, guaranteed bandwidth and secure traffic tunneling [1]. Driven by these two changes, this policy-based service-oriented grid architecture is suggested.
The rest of the paper is organized as follows: the targets of this architecture are listed in Section 2; Section 3 outlines the basic model; Section 4 details the components and corresponding functionalities; and a brief summary and an outlook on future work conclude the paper in Sections 5 and 6, respectively.
2 Targets
The targets of the policy-based service-oriented grid architecture outlined in this paper are:
High performance: efficient, flexible and succinct organization.
Context sensitivity: policies make dynamic adjustments according to network status, workload distribution and requirement variations.
QoS capability: satisfying the needs of both service requesters and service providers, and providing differentiated services.
Multi-protocol interoperability: enabling seamless cooperation between incompatible domains running different protocols.
Scalability: allowing the infrastructure to scale flexibly.
Security.
Fault tolerance.
Mechanism, not policy.
3 Basic Model
According to analyses of network behavior, network interaction patterns exhibit a "small world" feature [10],[11], with two characteristics: high clustering, and a small average shortest path between two random nodes (sometimes called the diameter of the network) that scales logarithmically with the number of nodes.
Taking this into account, we adopt a two-level hierarchical structure in this architecture: an inter-domain policy manager, a backup policy manager, and one edge policy manager per domain. The whole Grid infrastructure is divided into a number of relatively independent domains, each in the charge of a corresponding edge policy manager. The inter-domain policy manager takes responsibility for coordinating the different domains, making system-wide policies, and managing the edge policy managers. The backup policy manager, as its name implies, mainly serves as a backup for the inter-domain policy manager. The whole infrastructure is depicted in Fig. 1.
Fig. 1. Basic Model
4 Components and Corresponding Functionalities
Having outlined the basic model, we now focus on the components and their corresponding functionalities.
4.1 Inter-domain Policy Manager
Generally speaking, a policy takes the form "if <condition(s)> then <action(s)>". A policy thus consists of two parts: a set of conditions under which the policy applies, covering application types, protocol bindings, QoS priorities, workload distributions and so on; and a set of actions that apply as a consequence of satisfying (or failing to satisfy) those conditions, such as service matching, protocol
selection, channel allocation and data migration. Accordingly, a bundle of active and passive components constitutes this architecture: a policy maker, a policy base, a service repository, a multi-protocol interactor, a monitoring server, an auditing server and an event logger, as illustrated in Fig. 2.
Fig. 2. Inter-domain Policy Manager
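As a rough illustration of the "if <condition(s)> then <action(s)>" form, a policy entry in the policy base might be modeled along the following lines; the types shown are hypothetical and not part of the Policy Description Language of [7].

import java.util.List;
import java.util.function.Predicate;

// Hypothetical condition/action pair mirroring "if <condition(s)> then <action(s)>".
public class Policy {
    public interface Action { void apply(RequestContext ctx); }

    private final List<Predicate<RequestContext>> conditions; // e.g. QoS priority, workload
    private final List<Action> actions;                       // e.g. service matching, protocol selection

    public Policy(List<Predicate<RequestContext>> conditions, List<Action> actions) {
        this.conditions = conditions;
        this.actions = actions;
    }

    // Apply the actions only if every condition holds for the request.
    public boolean evaluate(RequestContext ctx) {
        for (Predicate<RequestContext> c : conditions)
            if (!c.test(ctx)) return false;
        for (Action a : actions) a.apply(ctx);
        return true;
    }
}

// Placeholder for external state (workload, traffic, topology, QoS needs).
class RequestContext { }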
The corresponding functionalities of each component are:
Auditing Server is responsible for such security concerns as access control and user identification. A service request first enters this component, and does not get through unless it is qualified.
Service Repository, as its name indicates, is a service collector in charge of service registry, service discovery, service aggregation, service caching and service labeling. As an inter-domain service repository, it mainly interacts with the edge service repositories for service information.
Policy Base is filled with all kinds of policies, such as security-assurance policies, load-balancing policies, protocol-selection policies and service-matching policies, to cope with different situations. Administrators can input policies in a Policy Description Language [7]. It also accepts feedback from service transactions to dynamically adjust some policies, embodying a degree of self-learning.
Policy Maker is, in a way, the critical component here. The final decisions, involving service matching, protocol selection, channel allocation and data migration, are made here according to specific policies in the policy base, while taking external conditions such as workload distribution, network traffic, network topology and QoS requirements into consideration. The final result is logged to the event logger for fault tolerance and service feedback.
Multi-protocol Interactor mainly functions to bridge domains running different protocols, and can be implemented with the Proteus multiprotocol message library [8].
Monitoring Server is an observer of external conditions, including workload distributions, network traffic, network topologies and service availability. The information collected is saved in the info base to enable workload balancing, dynamic depiction of network topology, and traffic shaping, so as to provide more powerful aids for real-time policy decisions.
Event Logger is responsible for logging servicing transactions. Each time a service interaction is initiated, each participant is logged. As soon as the transaction succeeds, a "success" signal is sent here; in this way, servicing information can also be offered to the service repository for service labeling. If the signal times out or a "failure" signal is received, the servicing transaction is rolled back and the policy maker is notified to make another policy. By this means, faults are tolerated to a certain degree.
Synchronizing Server is responsible for synchronization with the backup policy manager on the service repository, event log, info base and policy base, and with the edge policy managers for dynamic service refreshment. For efficiency, synchronized data can be transmitted in a wormhole manner.
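The success/failure/timeout handling of the event logger could be sketched as follows; the timeout value and the callback hooks are assumptions for illustration.

import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the event logger's protocol: a transaction is logged when
// initiated; on a "failure" signal or a timeout it is rolled back and the
// policy maker is asked to make another policy.
public class EventLogger {
    private static final long TIMEOUT_MS = 30_000;        // assumed value
    private final Timer timer = new Timer(true);
    private final Map<String, TimerTask> pending = new ConcurrentHashMap<>();

    public void logTransactionStart(String txId, Runnable rollback, Runnable replan) {
        TimerTask watchdog = new TimerTask() {
            @Override public void run() { onFailure(txId, rollback, replan); }
        };
        pending.put(txId, watchdog);
        timer.schedule(watchdog, TIMEOUT_MS);             // a timed-out signal counts as failure
    }

    public void onSuccess(String txId) {
        TimerTask t = pending.remove(txId);
        if (t != null) t.cancel();
        // ... feed servicing information back to the service repository ...
    }

    public void onFailure(String txId, Runnable rollback, Runnable replan) {
        if (pending.remove(txId) == null) return;         // already settled
        rollback.run();                                   // roll the transaction back
        replan.run();                                     // notify the policy maker
    }
}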
4.2 Edge Policy Manager
An edge policy manager is responsible for local domain policing; its components are quite similar to those of the inter-domain policy manager. Since each domain, in a way, constitutes a small world running almost the same protocol, the multi-protocol interactor can be removed. In this setting service transactions occur much more frequently, so a channel selector is configured for the proper assignment of channels. Meanwhile, for services that cannot be fulfilled within the same domain, an outgoing interactor relays the service requests to the inter-domain policy manager, and the final policy from the inter-domain policy manager is passed down by way of this component. It is also responsible for periodically sending an "alive" signal to the backup policy manager. This infrastructure is shown in Fig. 3.
Fig. 3. Edge Policy Manager

4.3 Backup Policy Manager
The backup policy manager acts as a backup for the inter-domain policy manager. It is configured with almost the same components, but its monitoring server plays a different part: as long as the inter-domain policy manager is alive, it monitors only the aliveness of all the other policy managers rather than their status; otherwise, it takes the place of the inter-domain policy manager. If some edge policy manager is observed to be offline, the backup policy manager chooses another node from the affected domain to act as the edge policy manager.
5 Summary
Our policy-based service-oriented grid architecture is put forward based on two observations: the grid is evolving towards service orientation, while policy-based networking is springing up. From the illustration of its primary components and corresponding functionalities, it can be concluded that this architecture is dynamic, context-sensitive, QoS-capable, capable of workload balancing, secure, scalable, multi-protocol interoperable and fault-tolerant.
6 Future Work So far, this architecture remains just a blueprint; we will try to implement a prototype in the future.
Given the limited fault-tolerance of this architecture and the frequent volatility of networks, strong and efficient fault-tolerance measures such as service dependence analysis will be taken up later. Since the policy maker plays a critical role in this architecture, parallelism will be exploited to avoid a bottleneck.
References
1. Durham, D.: A New Paradigm for Policy-Based Network Control. Intel Developer Update Magazine, November 2001
2. Foster, I., Kesselman, C., Nick, J., Tuecke, S.: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. http://www.globus.org/ogsa/
3. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of High Performance Computing Applications, 15 (3), (2001) 200-222
4. Catlett, C.: In Search of Gigabit Applications. IEEE Communications Magazine (April 1992) 42-51
5. Catlett, C., Smarr, L.: Metacomputing. Communications of the ACM, 35 (6), (1992) 44-52
6. Foster, I.: The Grid: A New Infrastructure for 21st Century Science. Physics Today, 55 (2), (2002) 42-47
7. Lobo, J., Bhatia, R., Naqvi, S.: A Policy Description Language. Proceedings of AAAI (1999) 291-298
8. Chiu, K., Govindaraju, M., Gannon, D.: The Proteus Multiprotocol Message Library. Proceedings of the IEEE/ACM SC2002 Conference, November 16-22, 2002, Baltimore, Maryland, p. 30
9. Furmento, N., Lee, W., Mayer, A., Newhouse, S., Darlington, J.: ICENI: An Open Grid Service Architecture Implemented with Jini. Proceedings of the IEEE/ACM SC2002 Conference, November 16-22, 2002, Baltimore, Maryland, p. 37
10. Davidsen, J., Ebel, H., Bornholdt, S.: Emergence of a Small World from Local Interactions: Modeling Acquaintance Networks. Physical Review Letters (2002)
11. Watts, D.J.: Small Worlds: The Dynamics of Networks between Order and Randomness. Princeton University Press (1998)
Adaptable QOS Management in OSGi-Based Cooperative Gateway Middleware

Wei Liu 1, Zhang-long Chen 1, Shi-liang Tu 1, and Wei Du 2

1 Department of Computer Science and Engineering, Fudan University, Shanghai 200433
{wliu, chenzl, sltu}@fudan.edu.cn
2 College of Management, University of Shanghai for Science and Technology, Shanghai 200093
[email protected]
Abstract. The Open Services Gateway Initiative (OSGi) specification defines a service-oriented cooperative framework between the home and the outside world. It uses OSGi gateways to deliver products and services to end users, such as home security control and intelligent home equipment. This paper studies the QOS problem of OSGi technology, identifies the QOS limitations of the framework, and integrates the Real-Time Specification for Java (RTSJ) and dynamic adaptable QOS management into the OSGi framework to address the QOS problem.
1 Introduction
Internet connections for private users are becoming much cheaper and faster. As embedded and telecommunication equipment gets smaller and more powerful, an embedded server inserted into the network is needed to connect the external internet to internal clients. The Open Services Gateway Initiative (OSGi) is making developers and enterprises realize the potential of the consumer-equipment market, for example virtual intelligent homes and intelligent home health care. But how to provide reliable quality-of-service management in OSGi-based open middleware is a stringent problem. The central component of the OSGi specification is the service gateway, which acts as the platform for many communication-based services; it can enable, consolidate and manage voice, data, internet and multimedia communications from the home, office and other locations.
2 Adaptable QOS Management of OSGi-Based Cooperative Middleware

2.1 Limitation of QOS in OSGi Framework
The current OSGi specification is release 3.0. It does not provide a rational QOS solution in the middleware layer and framework, yet OSGi-based applications
may have requirements for real-time behavior and predictability, as in virtual intelligent homes and intelligent home health care. Increasingly, applications in these domains perform more demanding functions over highly networked environments, which in turn places more stringent requirements on the underlying computing and network systems. OSGi-based middleware therefore requires a broad range of features, such as service guarantees and adaptive resource management, to support widening demands for performance, secure operation and predictability.
2.2 Software Solution: Adaptable QOS Management
To meet these research challenges, it is necessary to preserve and extend the benefits of existing middleware while defining new middleware services, protocols and interfaces in OSGi-based middleware. This paper proposes integrating a real-time Java specification, namely the RTSJ, into the OSGi-based specification.
OSGi-based Middleware by RTSJ. To develop a standard for real-time Java, IBM, Sun and other organizations from industry and academia formed the Real-Time for Java Expert Group and proposed the Real-Time Specification for Java (RTSJ). RTSJ is the definitive reference for the semantics, extensions and modifications to the Java programming language that enable the Java platform to meet the requirements and constraints of real-time systems in performance, predictability and capabilities. The specification gives programmers the ability to model applications and program logic that require predictable execution, meeting hard and soft real-time constraints. However, development of RTSJ-compliant Java Virtual Machines has been slow for most vendors of real-time operating systems, so the project decided to design a real-time extension library that satisfies the basic requirements of developing real-time programs.
Dynamic Adaptable QOS Management. Methods such as QOS monitoring are needed to provide reliable QOS in OSGi-based middleware using the RTSJ. QOS violations are reported to diagnosis functions that identify the causes of poor QOS; allocation analysis identifies possible reallocation actions to improve the QOS and selects the best node among these possible actions. This section illustrates the use of the system model for QOS monitoring, QOS forecasting and allocation analysis.
QOS Forecasting. Monitoring of real-time QOS involves the collection of time-stamped events sent from applications and the synthesis of these events into path-level QOS metrics. Forecasting of real-time QOS allows early prediction of QOS overload or underload violations; such conditions may occur when an unanticipated increase in tactical data causes resource utilization to exceed the appropriate threshold levels. For forecasting QOS violations, the system model must be flexible enough to adapt to dynamic changes in resource utilization.
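For illustration, a periodic QOS-monitoring task under the standard javax.realtime API of RTSJ 1.0 might look like the following sketch; the priority and period values are assumptions, and the code is not taken from the paper's implementation.

import javax.realtime.PeriodicParameters;
import javax.realtime.PriorityParameters;
import javax.realtime.RealtimeThread;
import javax.realtime.RelativeTime;

// A QOS monitor released every 100 ms with predictable scheduling.
public class QosMonitor {
    public static void main(String[] args) {
        PriorityParameters prio = new PriorityParameters(25);            // assumed priority
        PeriodicParameters period = new PeriodicParameters(
                null, new RelativeTime(100, 0), null, null, null, null); // 100 ms period

        RealtimeThread monitor = new RealtimeThread(prio, period) {
            @Override public void run() {
                while (waitForNextPeriod()) {
                    // ... collect time-stamped QOS events, synthesize path-level metrics ...
                }
            }
        };
        monitor.start();
    }
}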
QOS Adaptation and Allocation Mechanism Analysis. To perform monitoring, the QOS requirements specified in application-level terminology need to be translated into transport-level terminology, for example from video frames or audio packets to transport protocol data units; a further level of translation is also needed. This rescaling of QOS parameters is called QOS parameter mapping. A mapping between the types of service the transport protocol offers and the traffic classes the network offers is needed as well. In this section we illustrate the use of the system model and the load indices for selecting the best node for allocation purposes. In describing the best-node selection algorithm, we use LI(h_i, t) and LI(l_i, t) to denote the load indices of host h_i and LAN l_i at time t, since a variety of different load-index functions may be used. The best-node algorithm determines the best node on which to restart or scale a candidate application. The best host is determined using a fitness function that simultaneously considers both host and LAN load indices. The algorithm first computes the trend values of the load indices of hosts and LANs over a moving set of samples; the trend values are determined as the slope of a simple linear regression line that plots the load-index values as a function of time.
QOS Levels. The application set has four applications, each having between four and nine levels with associated benefit and CPU usage numbers. While these applications and levels do not correspond exactly to particular real applications, the ranges of CPU usage and benefit values exercise the QOS level model and vary at least as much as one would find in most actual applications. For the next set of experiments, the application period is fixed at 1/10 of a second for all QOS levels of all applications.
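A condensed sketch of the best-node computation just described is given below; the least-squares slope serves as the trend value, and the fitness function is assumed to be a weighted sum of host and LAN trends, which is one plausible choice rather than the paper's exact function.

// Sketch: regression slope over a moving window of load samples gives the
// trend value; the best node minimizes a fitness function combining host and
// LAN load trends (the weights are assumptions).
public class BestNodeSelector {

    // Least-squares slope of y[0..n-1] sampled at times t[0..n-1].
    static double trend(double[] t, double[] y) {
        int n = t.length;
        double st = 0, sy = 0, sty = 0, stt = 0;
        for (int i = 0; i < n; i++) { st += t[i]; sy += y[i]; sty += t[i] * y[i]; stt += t[i] * t[i]; }
        return (n * sty - st * sy) / (n * stt - st * st);
    }

    // Lower fitness means a better candidate; wHost/wLan are tunable weights.
    static double fitness(double hostTrend, double lanTrend, double wHost, double wLan) {
        return wHost * hostTrend + wLan * lanTrend;
    }

    static int bestNode(double[][] times, double[][] hostLoads,
                        double[] lanTrend, int[] lanOfHost) {
        int best = 0;
        double bestFit = Double.POSITIVE_INFINITY;
        for (int h = 0; h < hostLoads.length; h++) {
            double f = fitness(trend(times[h], hostLoads[h]), lanTrend[lanOfHost[h]], 1.0, 1.0);
            if (f < bestFit) { bestFit = f; best = h; }
        }
        return best;
    }
}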
3 Experiment and Related Analysis
The test results reported in this section were obtained on an Intel Pentium 1.7 GHz with 256 MB DDR RAM, running Red Hat Linux 7.3 with the TimeSys Linux/RT 3.0 GPL5 kernel. The Java platform used to test the RTSJ features is the TimeSys RTSJ Reference Implementation (RI). TimeSys has developed the official RTSJ Reference Implementation, a fully compliant implementation of Java that provides all the mandatory features of the RTSJ. The RI is based on a Java 2 Micro Edition (J2ME) Java Virtual Machine (JVM) and supports an interpreted execution mode, i.e., there is no just-in-time (JIT) compilation; run-time performance was intentionally not optimized, since the main goal of the RI was predictable real-time behavior and RTSJ compliance. The results show the QOS levels at which the four applications run with a skip value of 0. The QOS levels change quickly at the beginning, because the system starts in a state of CPU overload: the combined QOS requirement of the complete application set running at the highest level (level 1) is about 200% of the CPU. By the 10th sample, the applications have stabilized at levels that can operate within the available CPU resources. There is an additional
level adjustment of application 3 at the 38th sample, due to an additional missed deadline probably resulting from transient CPU load generated by some non-QOS applications. The test results also show the requested CPU allocation for the applications in the same experiments: the total requested CPU allocation starts out at approximately twice the available CPU, and then drops to about 100% as the applications adjust to stable levels. Note also the adjustment at sample 38, lowering the total requested CPU allocation to approximately 80%. As depicted above, when the CPU run-queue length, or the load average (which is mainly based on the CPU run-queue length) is used as the load index, the path latencies are the best. This indicates that, unlike with the other load indices considered, the resource manager component of the middleware made the best allocation decisions using these indices. The most important part of the project is the design and development of an open OSGi-based middleware, which will allow services to be remotely deployed and administered onto home network gateways such as set-top boxes and DSL modems.
4 Conclusions and Future Work
This paper has proposed integrating the RTSJ and adaptable QOS management into OSGi-based cooperative middleware to solve the QOS problem in OSGi. For highly dynamic systems, adaptive QOS-driven resource management is necessary to utilize system resources efficiently and to provide appropriate end-to-end application-level QOS support. Future work includes adapting the transport to wireless environments, designing a scalable feedback scheme for multicast, and deriving equations for exact QOS mapping.
Design of an Artificial-Neural-Network-Based Extended Metacomputing Directory Service*

Haopeng Chen and Baowen Zhang

Distributed Computing Technique Centre, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
{Chen-hp, Zhang-bw}@cs.sjtu.edu.cn
http://www.cs.sjtu.edu.cn

* This paper is supported by the Shanghai Science and Technology Development Foundation under Grant No. 03DZ15027.
Abstract. This paper analyzes a serious limitation of the existing metacomputing directory service of the Globus project, namely that it does not support application-oriented queries, and designs an artificial-neural-network-based GRC (grid resources classifier) to eliminate this limitation. The classifier extends the metacomputing directory service by classifying grid resources into application-oriented categories, and its classification precision can be continuously improved by self-learning. The new metacomputing directory service remains compatible with the old one, improving the practicality of the metacomputing directory service.
1 Introduction
Globus is the most influential of the current grid computing projects. In Globus, MDS (metacomputing directory service) provides functions for users to discover, register, query, and modify information about the grid computing environment, reflecting its real-time state [1]. Users can locate grid resources and obtain their attributes by invoking MDS [2]. However, the functions provided by the existing MDS are incomplete, because it does not support application-oriented queries; for example, it cannot answer a query about which resource is suitable for massive data analysis. For most users, application-oriented queries are more useful, so the functions of MDS need to be extended. Aiming at this limitation, this paper puts forward an ANN (artificial neural network) based solution that extends the existing MDS to support application-oriented queries, improving the practicality of MDS.
2 The ANN Topology of the GRC
We designed a GRC (grid resources classifier) that performs application-oriented classification from the information about grid resources, thereby extending the existing MDS. We chose an ANN to design the application-oriented GRC because the input attributes of grid resource instances are stored in the LDAP server in the form of attribute-value pairs, and the result of classification is represented by a vector in which each element gives the probability that the instance should be classified into the category that element represents; ANN learning suits this setting [3].
We employ sigmoid units as the basic units of the ANN of the GRC. The network is a two-layer network with a hidden layer of three sigmoid units. The main reason for this design is practical: such a network can represent most target functions, and adding more layers or more hidden sigmoid units does not improve precision markedly but greatly prolongs training time.
The input vector of the ANN includes all static and dynamic information of the specified grid resource; a linear function can be used to scale numerical information up or down into a suitable range. The output layer contains as many sigmoid units as there are application-oriented categories of grid resources, each element representing the probability that the instance belongs to the corresponding category. The resulting topology of the ANN of the GRC is illustrated in Figure 1.
Fig. 1. The topology of ANN of GRC
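A sketch of the forward pass through this topology, with sigmoid units, one hidden layer of three units, and one output per application-oriented category, is given below; the bias handling and weight layout are implementation assumptions.

// Forward pass for the two-layer sigmoid network described above:
// n inputs -> 3 hidden sigmoid units -> one sigmoid output per category.
public class GrcNetwork {
    private final double[][] wHidden; // [3][nIn + 1], last column is the bias
    private final double[][] wOut;    // [nOut][4], last column is the bias

    public GrcNetwork(double[][] wHidden, double[][] wOut) {
        this.wHidden = wHidden;
        this.wOut = wOut;
    }

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Input: scaled static and dynamic resource attributes.
    // Output: a probability-like score per application-oriented category.
    public double[] classify(double[] in) {
        return layer(layer(in, wHidden), wOut);
    }

    private static double[] layer(double[] in, double[][] w) {
        double[] out = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double net = w[j][in.length]; // bias term
            for (int i = 0; i < in.length; i++) net += w[j][i] * in[i];
            out[j] = sigmoid(net);
        }
        return out;
    }
}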
3 The Employed ANN Learning Algorithm
The BP (Back Propagation) algorithm is the most common ANN learning algorithm [4]. To reduce the risk of getting stuck in local minima of the error surface, and to gradually increase the step size of the search in regions where the gradient is un-
changing, we employ a BP algorithm with a momentum term. The algorithm is described as follows:

Backpropagation_for_GRC(training_examples, $\eta$, $n_{in}$, $n_{out}$, $n_{hidden}$, $\alpha$)

Statement of symbols: training_examples are the training instances of grid resources; each is a pair $\langle \vec{x}, \vec{t} \rangle$, where $\vec{x}$ is the vector of network input values and $\vec{t}$ is the vector of target network output values. $\eta$ is the learning rate, a constant with a very small value that can be specified according to the characteristics of the grid. $n_{in}$ is the dimension of the network input vector; $n_{out}$ is the dimension of the network output vector, equal to the number of units in the output layer; $n_{hidden}$ is the number of units in the hidden layer, which we set to 3. The input from unit $i$ to unit $j$ is denoted $x_{ji}$, and the weight from unit $i$ to unit $j$ is denoted $w_{ji}$. $\alpha$ is a momentum constant with a very small value, for example 0.1.

The process of the algorithm is as follows. Create a feed-forward network with $n_{in}$ inputs, $n_{hidden}$ hidden units, and $n_{out}$ output units, and initialize all network weights to small random numbers. Until the termination condition is met, for each $\langle \vec{x}, \vec{t} \rangle$ in training_examples, do:
1. Input the instance $\vec{x}$ to the network and compute the output $o_u$ of each unit $u$ in the network.
2. For each network output unit $k$, calculate its error term: $\delta_k \leftarrow o_k (1 - o_k)(t_k - o_k)$.
3. For each hidden unit $h$, calculate its error term: $\delta_h \leftarrow o_h (1 - o_h) \sum_{k \in outputs} w_{kh} \delta_k$.
4. Update each network weight: $w_{ji} \leftarrow w_{ji} + \Delta w_{ji}(n)$, where $\Delta w_{ji}(n) = \eta \delta_j x_{ji} + \alpha \Delta w_{ji}(n-1)$ and $n$ indexes the current iteration.

This completes the description of the algorithm. In this scheme the GRC always uses the currently learned function to classify grid resources in real time; according to user feedback, the GRC then modifies the current function to obtain a new one. The algorithm therefore never stops: we simply keep using the newest learned function. The values of $\eta$ and $\alpha$ should be specified according to the characteristics of different grids; it is unnecessary, and impossible, to give a single set of values applicable to any grid.
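The following sketch performs one stochastic weight update under the rules above, keeping the previous weight changes for the momentum term; the weight-array layout and the example values of $\eta$ and $\alpha$ are assumptions.

import java.util.Random;

// One backpropagation update with momentum:
// deltaW(n) = eta * delta_j * x_ji + alpha * deltaW(n-1).
public class GrcTrainer {
    final double eta = 0.05, alpha = 0.1;  // example values, grid-specific in practice
    final double[][] wH, wO, dwH, dwO;     // weights and previous changes (bias in last column)

    GrcTrainer(int nIn, int nHidden, int nOut) {
        Random r = new Random(1);
        wH = init(nHidden, nIn + 1, r);   dwH = new double[nHidden][nIn + 1];
        wO = init(nOut, nHidden + 1, r);  dwO = new double[nOut][nHidden + 1];
    }

    void trainOne(double[] x, double[] target) {
        double[] h = forward(x, wH);
        double[] o = forward(h, wO);
        double[] dOut = new double[o.length];            // delta_k = o_k(1-o_k)(t_k-o_k)
        for (int k = 0; k < o.length; k++) dOut[k] = o[k] * (1 - o[k]) * (target[k] - o[k]);
        double[] dHid = new double[h.length];            // delta_h = o_h(1-o_h) sum_k w_kh delta_k
        for (int j = 0; j < h.length; j++) {
            double s = 0;
            for (int k = 0; k < o.length; k++) s += wO[k][j] * dOut[k];
            dHid[j] = h[j] * (1 - h[j]) * s;
        }
        update(wO, dwO, h, dOut);
        update(wH, dwH, x, dHid);
    }

    void update(double[][] w, double[][] dw, double[] in, double[] delta) {
        for (int j = 0; j < w.length; j++) {
            for (int i = 0; i <= in.length; i++) {       // last index is the bias (input 1)
                double xi = (i < in.length) ? in[i] : 1.0;
                dw[j][i] = eta * delta[j] * xi + alpha * dw[j][i];
                w[j][i] += dw[j][i];
            }
        }
    }

    static double[] forward(double[] in, double[][] w) {
        double[] out = new double[w.length];
        for (int j = 0; j < w.length; j++) {
            double net = w[j][in.length];                // bias term
            for (int i = 0; i < in.length; i++) net += w[j][i] * in[i];
            out[j] = 1.0 / (1.0 + Math.exp(-net));
        }
        return out;
    }

    static double[][] init(int rows, int cols, Random r) {
        double[][] w = new double[rows][cols];
        for (double[] row : w)
            for (int i = 0; i < cols; i++) row[i] = 0.1 * (r.nextDouble() - 0.5);
        return w;
    }
}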
Fig. 2. The architecture of the extended MDS which has a GRC
4 The Architecture of the Extended MDS Which Has a GRC
The architecture of the extended MDS with a GRC is shown in Figure 2. Users B and C access MDS in the original way. User A accesses MDS through the GRC: the GRC obtains the static and dynamic information of the currently available resources, filters it using the learned function, and returns the information of suitable resources to user A. User A then sends feedback to the GRC according to his final choice, and the GRC modifies its classification function by learning from this feedback, improving the precision of classification.
5 Conclusion
This paper has designed an artificial-neural-network-based GRC that extends the metacomputing directory service by classifying grid resources into application-oriented categories. Several aspects of the GRC still need further research, such as the time complexity of the training process, the space complexity of the instance space, the training algorithm, and simulation.
References
1. Dou, Z., Chen, Y., Liu, P.: Grid Computing. Internal materials (2002) 87-96
2. The Globus Toolkit 2.2 MDS Technology Brief, Draft 4, January 30, 2003. http://www.globus.org/mds/mdstechnologybrief_draft4.pdf
3. Mitchell, T.M.: Machine Learning. McGraw-Hill Companies, Inc. (1997) 70-74
4. Hagan, M.T., Demuth, H.B., Beale, M.H.: Neural Network Design. PWS Publishing Company (1996) 197-207
Gridmarket: A Practical, Efficient Market Balancing Resource for Grid and P2P Computing*

Ming Chen, Guangwen Yang, and Xuezheng Liu

Dept. of Computer Science and Technology, Tsinghua University
{cm01, ygw, liuxuezheng00}@mails.tsinghua.edu.cn

* Supported by National Natural Science Foundation of China (60373004, 60373005)
Abstract. The emergence of computational Grid and Peer-to-Peer (P2P) computing systems is promising. It challenges us to build a system that maximizes collective utility through participants' presumed rational behavior. Although economic theories sound reasonable, many existing or proposed solutions based on them face feasibility problems in practice. This paper proposes Gridmarket: an infrastructure relying on resource standardization, continuous double auction, and straightforward pricing algorithms based on the price elasticity inherent in consumers and suppliers. Gridmarket efficiently equates resource demand with supply through continuous double auction and a price-tracing mechanism within the required price ranges. Software agents employing Gridmarket's schedule are easy to write. To demonstrate its efficacy and efficiency, we have designed and built a simulation prototype and found the experimental results promising.
1 Introduction
The emergence of computational Grids and P2P computing systems offers promising solutions for cooperatively solving large-scale computing problems. Such systems consist of organizationally and economically independent entities. The rationality of human beings derives from individual selfishness, and the contribution of resources otherwise depends only on the fickle concept of goodwill; lacking a mechanism to temper suppliers and demanders, a system tends to become unbalanced and eventually to collapse. A good incentive mechanism can allocate resources efficiently and boost the system's prosperity.
Mechanisms built on economic models are better than scheduling schemes that consider only system parameters: in geographically distributed systems spanning multiple independent organizations and entities, they provide a clear and familiar model for users. Several approaches [2-11] in this direction have been proposed to bring balance between demand and supply to these systems, but they are not practical in real settings, lacking feasibility, or a price-formation mechanism [2], or complete support for the required price ranges set by consumers and
suppliers, or scheduling scope [3][4][5], or transaction efficiency [10][11]; all of these are necessary preconditions for a productive market.
In this paper we present Gridmarket, a practical infrastructure aiming to balance demand and supply in grid and P2P systems. Gridmarket is composed of the following components: traded-resource standardization, continuous double auction, and intuitive, straightforward pricing algorithms based on the price elasticity set by consumers and suppliers. The pricing algorithms greatly lift the burden on consumers and suppliers: once they set three simple parameters, consumers' software agents automatically bid for resources to execute tasks, while suppliers' software agents sell their idle resources in the continuous double-auction resource market. Gridmarket features maneuverability, simplicity, efficacy, and efficiency.
2 Market Model
Enlightened by economic phenomena in the real world, we reason that a fully competitive resource market, in which consumers and suppliers trade standardized resources, is necessary for Grid and P2P computing. The resource market provides the basic exchange function for resource consumers and suppliers, and resources traded in it are immediately consumable right after transactions. Because it is a perfectly competitive market, resource suppliers cannot manipulate prices to keep them high and fleece consumers unless they conspire together; considering the potentially prohibitive legal penalties and the difficulty of colluding among a large number of independent suppliers, the possibility of collusion is very low. Resource suppliers have to sell their resources at market prices to earn returns on and of their sunk investments; this is their only available choice. In such a completely competitive market, a supplier's marginal revenue equals the market resource price, and a supplier can maximize its profit by providing as many resources as possible at a cost less than or equal to the market price. On the other side, consumers also cannot manipulatively depress the market price to extort suppliers. A lower price stimulates demand and restrains supply, while a higher price chokes off demand and fuels supply, so the market tends toward equilibrium: the invisible hand [1] of the market guarantees the full employment of resources.
To increase the liquidity of the resource market, all items traded in it must be standardized: resources are classified into predefined standard categories with unique identifiers, and consumers and suppliers can bid or ask only for standard resources. Although this design may limit the flexibility of resource expression, it provides standardization and reduces the complexity of communication and matching both for programs and for human beings. Backed by the history of human standardization, we envision that as P2P and Grid systems evolve, traded resources will gradually be standardized too.
Every transaction price in the resource market is published to market participants. This publication makes the market transparent, fair and efficient.
Consumers and suppliers can make orders according to transaction prices and their own pricing strategies; orders are sent directly to the resource market for matching.
3 Market Components
In this section, we describe the match process and the pricing algorithms in detail, in that order.
3.1 Match Process
The resource market periodically uses a price-driven continuous double auction process to match consumers' bid orders and suppliers' ask orders. The double auction is one of the most common exchange institutions in the marketplace; most stock markets (e.g., NASDAQ, the Shanghai Stock Exchange, and the Shenzhen Stock Exchange) use double auctions to exchange equities, bonds, and derivatives. In the double-auction model, bid orders (buy orders) and ask orders (sell orders) can be submitted at any time during the trading period. At the end of a match period, if there are open bids and asks that match or are compatible in terms of price and requirements (e.g., quantity of goods or shares), a trade is executed. Bids are ranked from highest to lowest price while asks are ranked from lowest to highest price, and the match process starts from the beginning of the ranked bids and asks. Complex algorithms [12][13] have been developed to automate bidding in double-auction processes for stock trading. If prices are equal, match priority is based on the principle of time first and quantity first: earlier orders take precedence over later ones, and among orders arriving at the same time, those with larger quantities precede those with smaller quantities.
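A sketch of one match round under the ranking and tie-breaking rules just described is given below; the order fields are simplified and partial-fill bookkeeping is kept minimal.

import java.util.Comparator;
import java.util.List;

// One continuous-double-auction match round: bids ranked high-to-low and asks
// low-to-high by price; equal prices break ties by time first, then by larger
// quantity.
public class DoubleAuction {
    public static class Order {
        public double price; public int quantity; public long time;
        public Order(double p, int q, long t) { price = p; quantity = q; time = t; }
    }

    private static final Comparator<Order> TIE =
            Comparator.comparingLong((Order o) -> o.time).thenComparingInt(o -> -o.quantity);

    public static void match(List<Order> bids, List<Order> asks) {
        bids.sort(Comparator.comparingDouble((Order o) -> -o.price).thenComparing(TIE));
        asks.sort(Comparator.comparingDouble((Order o) -> o.price).thenComparing(TIE));
        int b = 0, a = 0;
        while (b < bids.size() && a < asks.size()
                && bids.get(b).price >= asks.get(a).price) {
            int traded = Math.min(bids.get(b).quantity, asks.get(a).quantity);
            // ... execute the trade of `traded` units and publish the transaction price ...
            bids.get(b).quantity -= traded;
            asks.get(a).quantity -= traded;
            if (bids.get(b).quantity == 0) b++;
            if (asks.get(a).quantity == 0) a++;
        }
    }
}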
3.2 Pricing Algorithms
We propose two pricing algorithms: a consumer pricing algorithm (Fig. 1) and a supplier pricing algorithm (Fig. 2). The consumer pricing function is $p_c = f_c(p_b, \epsilon, t)$, where $p_b$ denotes the base price and $\epsilon$ the price elasticity, both consumer-specific coefficients, and $t$ is the time parameter. This function is intuitive and straightforward: as time elapses, a consumer bids a higher and higher price if he cannot successfully buy the needed resource. The supplier function is $p_s = f_s(p_b, \epsilon, t)$, with supplier-specific base price and elasticity coefficients; it is also easy to understand: as time elapses, a supplier offers his idle resource at a lower and lower price if he cannot successfully sell it. These two functions automatically make the temporal difference between bid price and ask price converge so as to clear the market. Consumers can set ceiling prices and suppliers can set floor prices; raising the ceiling price improves a consumer's demand power, and lowering the floor price builds up a supplier's competitiveness.
Fig. 1. Consumer Pricing Algorithm
Fig. 2. Supplier Pricing Algorithm
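Since Figures 1 and 2 are not reproduced here, the following sketch illustrates how such pricing agents might behave, assuming a simple linear ramp from the base price up toward the consumer's ceiling and symmetrically down toward the supplier's floor; the paper characterizes the functional forms only qualitatively, so the ramp is an assumption.

// Illustrative pricing agents: a consumer raises its bid over time up to a
// ceiling; a supplier lowers its ask over time down to a floor. The linear
// ramp p(t) = base +/- elasticity * t is an assumed form.
public class PricingAgents {
    public static double consumerBid(double base, double elasticity, double t, double ceiling) {
        return Math.min(base + elasticity * t, ceiling);
    }

    public static double supplierAsk(double base, double elasticity, double t, double floor) {
        return Math.max(base - elasticity * t, floor);
    }

    public static void main(String[] args) {
        // Bid and ask converge over time, which clears the market.
        for (int t = 0; t <= 5; t++)
            System.out.printf("t=%d bid=%.2f ask=%.2f%n", t,
                    consumerBid(10, 2, t, 18), supplierAsk(25, 3, t, 12));
    }
}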
4 Analysis and Experiments

4.1 Analysis
The system is modelled as an M/M/N queuing network [17]: the task streams of all consumers are merged into a single stream as the system input. Let $m$ and $N$ be the numbers of consumers and suppliers respectively, let $\lambda$ be the aggregate task arrival rate and $\mu$ the service rate of each supplier. Following the standard M/M/N results in [17], the system resource usage rate is

$\rho = \lambda / (N\mu)$    (1)

and the system responsive time is

$W = \frac{1}{\mu} + \frac{C(N, \lambda/\mu)}{N\mu - \lambda}$    (2)

where $C(N, a)$ is the Erlang-C probability that an arriving task has to wait.
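The theoretical curves can be computed from equations (1) and (2); the following sketch evaluates them with the usual Erlang-C formula, and the numbers in main are examples rather than the paper's parameters.

// Standard M/M/N results (cf. [17]): rho = lambda/(N*mu) and
// W = 1/mu + C(N, a)/(N*mu - lambda), with a = lambda/mu.
public class MmnModel {
    static double erlangC(int n, double a) {
        // C(n,a) = (a^n/n!)(n/(n-a)) / ( sum_{k=0}^{n-1} a^k/k! + (a^n/n!)(n/(n-a)) )
        double term = 1.0, sum = 1.0;            // k = 0 term
        for (int k = 1; k < n; k++) { term *= a / k; sum += term; }
        double last = term * a / n * (n / (n - a));
        return last / (sum + last);
    }

    public static void main(String[] args) {
        double lambda = 8.0, mu = 1.0; int n = 10;   // example numbers only
        double rho = lambda / (n * mu);
        double w = 1.0 / mu + erlangC(n, lambda / mu) / (n * mu - lambda);
        System.out.printf("rho=%.2f  W=%.3f%n", rho, w);
    }
}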
4.2 Experiments
We use an event-driven prototype to evaluate our algorithms. The prototype takes its measurements before the 3000th task arrives.
Fig. 3. Transaction mean price and transactions with different ceiling prices and floor prices (2 consumers vs. 2 suppliers)
Synthetic experiment. Two consumers with different ceiling prices and two suppliers with varying floor prices play a bargaining game in this experiment. The results show that comparatively lower floor prices and relatively higher ceiling prices are good choices for suppliers and consumers respectively, under cost/wealth constraints. There is no absolute panacea for consumers and suppliers; game theory dominates, as expected.
Schedule efficiency. In this section we explore the scheduling efficiency of the algorithm in terms of task responsive-time penalty and resource utilization rate under varying elasticity coefficients (Fig. 4 and Fig. 5). First, the figures show that our scheduling algorithm is highly efficient: the theoretical curves (plotted according to Equations 2 and 1 respectively) are closely approximated by the experimental curves when the system load is not high. Second, the time burden due to bargaining between consumers and suppliers
increases sharply as the system approaches saturation, and the degree of the increased burden is negatively related to the elasticity coefficients. The reason is straightforward: bargaining time costs are negligible relative to the long arrival intervals when the system load is light, but they do matter under high load. These costs reduce resource utilization rates and increase responsive times.
Fig. 4. Responsive Time (1 consumer vs. 1 supplier)
Fig. 5. Usage Rates (1 consumer vs. 1 supplier)

5 Related Work
There is much work in this area, which can be classified into four categories: commodity-market models, auction models, credit-based models, and theoretical analysis. We outline them by category.
5.1 Commodity-Market Model
Nimrod-G [2], Mungi [3], and Enhanced MOSIX [4] fall into this category. Nimrod-G claims to support multiple economic models, but its implementation focuses on the commodity-market model; it assumes that exogenous, predefined and static prices exist for resources and that the run time of a program can be accurately estimated, which may be unrealistic in practice. In Mungi [3], a single-address-space operating system, applications obtain bank accounts from which rent is collected for the storage occupied by objects; rent automatically increases as available storage runs low, forcing users to release unneeded storage. Its main concern is garbage collection. Enhanced MOSIX [4], deployed in cluster environments, uses an opportunity-cost method that converts the usage of several heterogeneous resources in a machine into a single homogeneous cost; it does not take into account the prices that consumers can afford.
5.2 Auction Model
This class includes Spawn [5], Rexec/Anemone [7], and JaWS [8]. Spawn employs the Vickrey auction [6], a second-price sealed-bid auction, to allocate resources among
bidders. Bidders receive periodic funding and use their fund balances to bid for hierarchical resources; a task-farming master program spawns and withdraws subtasks depending on its balance relative to its counterparts. It does not consider heterogeneous resources and mainly targets Monte Carlo simulation applications. Rexec/Anemone [7] implements proportional resource sharing in clusters: users assign utility values to their applications and the system allocates resources proportionally; cost requirements are not its concern. In JaWS (Java Web-computing System) [8], machines are assigned to applications via an auction process in which the highest bidder wins. None of these solutions makes use of continuous double auction.
5.3 Credit-Based Model
Mojo Nation [10] and Samsara [11] are of this type. In Mojo Nation and Samsara, storage contributors earn credits or claims by providing storage space and spend them when needed. It is a bartering methodology.
5.4 Theoretical Analysis
[14] explored the interaction between human subjects and software bidding agents using strategies based on extensions of the Gjerstad-Dickhaut [12] and Zero-Intelligence-Plus [13] algorithms in a continuous double auction process; the gains of human subjects and software agents and the trading equilibrium are its main concerns. [15] measured the efficiency of resource allocation under two different market conditions, commodities markets and auctions, in terms of price stability, market equilibrium, consumer efficiency, and producer efficiency, using a hypothetical mathematical model.
6 Conclusion
Using economic models to schedule tasks in a worldwide, geographically distributed environment is an effective approach. In this paper, we presented Gridmarket, a practical, simple but efficient scheduling infrastructure built on resource standardization, continuous double auction, and intuitive, straightforward pricing algorithms based on the price elasticity inherent in consumers and suppliers. Through Gridmarket, software agents for consumers and suppliers can automatically bid for resources to execute tasks and sell idle resources, respectively. Gridmarket efficiently equates resource demand with supply through continuous double auction and a price-tracing mechanism within reasonable price ranges. Preliminary simulation results demonstrate its efficacy in terms of resource allocation and its efficiency in terms of resource utilization.
References
1. Adam Smith: An Inquiry into the Nature and Causes of the Wealth of Nations. 1776
2. R. Buyya, D. Abramson, J. Giddy, and H. Stockinger: Economic Models for Resource Management and Scheduling in Grid Computing. Special Issue on Grid Computing Environments, The Journal of Concurrency and Computation: Practice and Experience (CCPE), Wiley Press, May 2002
3. G. Heiser, F. Lam, and S. Russell: Resource Management in the Mungi Single-Address-Space Operating System. Proceedings of the Australasian Computer Science Conference, February 4-6, 1998, Perth, Australia. Springer-Verlag, Singapore, 1998
4. Y. Amir, B. Awerbuch, A. Barak, S. Borgstrom, and A. Keren: An Opportunity Cost Approach for Job Assignment in a Scalable Computing Cluster. IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 7, pp. 760-768, IEEE CS Press, USA, July 2000
5. C. Waldspurger, T. Hogg, B. Huberman, J. Kephart, and W. Stornetta: Spawn: A Distributed Computational Economy. IEEE Transactions on Software Engineering, Vol. 18, No. 2, pp. 103-117, IEEE CS Press, USA, February 1992
6. W. Vickrey: Counterspeculation, Auctions, and Competitive Sealed Tenders. Journal of Finance, Vol. 16, No. 1, pp. 9-37, March 1961
7. B. Chun and D. Culler: Market-based Proportional Resource Sharing for Clusters. Technical Report CSD-1092, University of California, Berkeley, USA, January 2000
8. S. Lalis and A. Karipidis: An Open Market-Based Framework for Distributed Computing over the Internet. Proceedings of the First IEEE/ACM International Workshop on Grid Computing (GRID 2000), December 17, 2000, Bangalore, India. Springer-Verlag, Germany, 2000
9. K. Reynolds: The Double Auction. Agorics, Inc., 1996. http://www.agorics.com/Library/Auctions/auction6.html
10. Mojo Nation - http://www.mojonation.net/, October 2003
11. Landon P. Cox and Brian D. Noble: Samsara: Honor Among Thieves in Peer-to-Peer Storage. Proceedings of the 19th ACM Symposium on Operating Systems Principles, October 2003
12. S. Gjerstad and J. Dickhaut: Price Formation in Double Auctions. Games and Economic Behavior, 22, 1-29, 1998
13. D. Cliff and J. Bruten: Minimal-Intelligence Agents for Bargaining Behaviors in Market-Based Environments. Technical Report HPL-97-91, Hewlett-Packard Labs, 1997
14. R. Das, J. Hanson, J. Kephart, and G. Tesauro: Agent-Human Interactions in the Continuous Double Auction. Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), August 4-10, 2001, Seattle, Washington, USA
15. R. Wolski, J. S. Plank, J. Brevik, and T. Bryan: Analyzing Market-Based Resource Allocation Strategies for the Computational Grid. The International Journal of High Performance Computing Applications, Sage Science Press, Volume 15, Number 3, Fall 2001, pp. 258-281
16. R. Raman, M. Livny, and M. Solomon: Matchmaking: Distributed Resource Management for High Throughput Computing. Proceedings of the Seventh IEEE International Symposium on High Performance Distributed Computing, July 28-31, 1998, Chicago, IL
17. Hock, N. C.: Queueing Modelling Fundamentals. John Wiley & Sons Ltd., 1997
A Distributed Approach for Resource Pricing in Grid Environments

Chuliang Weng, Xinda Lu, and Qianni Deng

Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200030, People's Republic of China
{weng-cl, lu-xd, deng-qn}@cs.sjtu.edu.cn
Abstract. A distributed group-pricing algorithm is presented for market-based resource management in the grid context, combining the quick convergence of centralized algorithms with the scalability of distributed algorithms. In the new algorithm, resources in the system are divided into multiple resource groups according to the degree of price correlation among resources. When the demand and supply of resources in the system changes, each auctioneer in the defined system structure simultaneously adjusts the prices of one resource group until the excess demand of all resources becomes zero. We test the distributed group-pricing algorithm against an existing distributed algorithm and analyze its properties. Experimental results indicate that the distributed group-pricing algorithm achieves equilibrium more quickly than the existing distributed algorithm.
1 Introduction
As a new infrastructure for next-generation computing, grid systems enable the sharing, selection, and aggregation of geographically distributed heterogeneous resources for solving large-scale problems in science, engineering and commerce [1]. Many studies have focused on providing middleware and software programming layers to facilitate grid computing, and a number of projects such as Globus [2] and Legion [3] deal with problems such as resource specification, information service, and security in grid computing environments involving different administrative domains.
Grid resources are geographically distributed across multiple administrative domains and owned by different organizations. This characteristic creates several difficulties: there is no uniform strategy for resource management, because resources belong to different organizations with their own local management strategies; the dynamic behavior of resources should be made transparent to grid users by appropriate methods; and resources are heterogeneous, differing in many aspects. Dynamics and heterogeneity are inherent characteristics of grid computing systems, and autonomy is a further special characteristic of the grid, since resources are distributed across multiple administrative domains. The market mechanism is well suited to the problem of resource management in the grid
context: market mechanisms in economics are based on distributed self-determination, which also suits resource management in the grid context; the variation of price reflects the supply and demand of resources; and market theory in economics provides a precise depiction of the efficiency of resource allocation.
Market-based resource allocation can be divided into two sub-problems: how to determine the price of resources, and how to allocate resources so as to achieve highly effective utilization in response to current resource prices. In this paper, we focus on the first problem, i.e., how to determine the general equilibrium price. According to general equilibrium theory, a tâtonnement process [4] varies the price of resources until an equilibrium is reached. Generally, there are two kinds of pricing methods: distributed independent pricing, which adjusts the price of each individual resource according to the equilibrium of its own supply and demand in a distributed manner; and centralized simultaneous pricing, which varies the prices of all resources simultaneously according to the equilibrium of supply and demand of all resources in a centralized manner.
In Section 2, a brief overview of current research on resource pricing for grid computing is given. A system structure for resource pricing is described in Section 3, and a distributed group-pricing algorithm is presented in Section 4. In Section 5 we test the performance of the presented algorithm, and we conclude the paper in Section 6.
2 Related Works
Research efforts on resource management for grid computing based on economic principles include [5,6,7,8]: the distributed pricing method is studied in [5], and the centralized pricing method in [6,7]; in GRACE [8], resource prices were set artificially in economics-based resource scheduling experiments, leaving no room for optimizing resource allocation. The distributed pricing WALRAS algorithm is presented in [9], where its properties are also discussed, and distributed independent pricing methods are compared with centralized simultaneous pricing methods in [10].
The distributed independent pricing method suits large-scale distributed systems and has relatively low complexity, but it reaches equilibrium slowly because it does not consider the correlation between the prices of different resources. In contrast, the centralized simultaneous pricing method converges quickly by exploiting price correlation, but its centralized manner does not suit large-scale grid computing systems, and its complexity increases quickly as the number of resources increases [11].
3 System Structure
Resources in the grid are organized as resource domains, i.e., individual and autonomous administrative domains, and multiple resource domains are integrated into a seamless grid by grid middleware such as Globus and Legion. A grid thus consists of multiple distributed resource domains in which resources are utilized by self-determination, and a resource domain contains different kinds of resources whose supply and demand vary over time. The price of resources should reflect this variation. A system structure for pricing resources in the grid context, based on the Globus toolkit, is depicted in Fig. 1.
Fig. 1. System Structure
In Fig. 1, one kind of agent manages local resource domains based on GRAM in the Globus toolkit; it is denoted R-Agent (resource domain agent) and is responsible for assembling information on the supply and demand of resources within its resource domain in response to a given price vector, calculating the excess demand of resources in the domain, and submitting this excess demand information to the auctioneer. The other kind of agent is responsible for pricing resource groups to achieve an equilibrium and is denoted R-Auctioneer (resource group auctioneer). Located in the WAN, each R-Auctioneer is in charge of pricing the resources of one resource group in the grid system and communicates with R-Agents to collect information on the supply and demand of resources through the middleware modules provided by the Globus toolkit.
The pricing system thus consists of two kinds of agents. The pricing process usually needs more than one iteration when the supply and demand of resources changes, so the WAN communication between R-Agents and R-Auctioneers required to adjust prices to an equilibrium should be minimized; the following pricing approach is presented to meet this requirement.
4 The Pricing Algorithm
In this section, we present a distributed group-pricing algorithm. First, resources in the grid are divided into multiple resource groups according to the degree of price correlation. Then, after a change in the supply and demand of resources invokes a tâtonnement process, the price of each resource group is adjusted independently of the other groups, while the prices of the resources within the same group are adjusted simultaneously according to the equilibrium of that group. This procedure is repeated until the global equilibrium of all resources is reached. The algorithm is described formally as follows. The total number of resource domains in the grid system is denoted by M, and N denotes the total number of resources. According to the degree of price correlation, resources are divided into G groups, and correspondingly the number of R-Auctioneers is also G. Assume that the number of resources in resource group k is $n_k$; then we have:

$\sum_{k=1}^{G} n_k = N.$
The prices of resource group k are adjusted by R-Auctioneer k and are denoted by the price vector $P_k$. The algorithm for R-Auctioneer k is as follows:
1. Initialize the price vector $P_k$ with the previous equilibrium price.
2. Receive the excess demand function $z_i$ from R-Agent i, i = 1, 2, …, M, and compute the group's total excess demand $Z_k(P_k) = \sum_{i=1}^{M} z_i(P_k)$.
3. Calculate the new price vector. Applying the Taylor series for a multivariable function, we have the approximation

$Z_k(P') \approx Z_k(P) + Z_k'(P)\,(P' - P), \qquad (3)$

where $Z_k'(P)$ is the first derivative of the vector function $Z_k$, i.e., the matrix whose element in row j and column l is $\partial Z_{k,j}(P) / \partial p_l$. Setting $Z_k(P') = 0$ in equation (3), a new price vector $P'$ can be obtained from the initial vector $P$; we then substitute the new price vector for $P$ in equation (3), and equation (3) is solved repeatedly until $\lVert Z_k(P') \rVert < \varepsilon_k$, where $\varepsilon_k$ is a given equilibrium threshold. We denote the resulting price vector by $P_k^*$.
4. Determine the amplitude of the price variation, $\Delta_k = \lVert P_k^* - P_k \rVert$.
According to the given price threshold $\delta$, the flag for the price variation is determined as follows: $flag_k = 1$ if $\Delta_k > \delta$, and $flag_k = 0$ otherwise.
5. Send the new price vector and the flag for the price variation to all R-Agents.
Each R-Agent obtains the overall new price vector P and the flag vector by combining the individual price vectors and individual flags received from the R-Auctioneers. If every flag equals zero, i.e., P satisfies that the total excess demand function Z(P) approximately equals zero, then P is a new equilibrium price vector, denoted by $P^*$. Otherwise, the R-Agent needs to calculate the new excess demand function of its resource domain with the algorithm shown in Fig. 2.
Fig. 2. The algorithm for R-Agents that is used to calculate the new excess demand function
When the supply and demand of resources in the grid system changes, each R-Auctioneer and each R-Agent repeatedly carry out the above algorithms in turn until every price-variation flag equals zero, which means that a new equilibrium has been achieved.
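To make the iteration concrete, the following Python sketch outlines the R-Auctioneer side of the price update under the Newton-style tâtonnement described above; the function names excess_demand and jacobian, and the NumPy-based formulation, are illustrative assumptions rather than part of the paper's system.

import numpy as np

def auctioneer_update(prices, excess_demand, jacobian, eps=1e-6, max_iter=100):
    # Adjust the price vector of one resource group until the group's
    # total excess demand Z_k is approximately zero (step 3 above).
    p = np.asarray(prices, dtype=float)
    for _ in range(max_iter):
        z = excess_demand(p)                  # Z_k(P), aggregated from the R-Agents
        if np.linalg.norm(z) < eps:           # group equilibrium reached
            break
        # Newton step: solve Z_k(P) + Z_k'(P) (P' - P) = 0 for P'
        p = p - np.linalg.solve(jacobian(p), z)
    return p

Note that with G = N groups of one resource each, this update degenerates to independent per-resource adjustment, which is the sense in which the scheme borrows its scalability from the distributed WALRAS algorithm.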
5 Experiments and Discussion
In this section, we test the performance of the distributed group-pricing algorithm by simulation experiments against the performance of the WALRAS algorithm [9], and analyze the properties of the presented algorithm. The CES (constant elasticity of substitution) utility function [12] is chosen for a valid comparison between the two algorithms. The utility function is:

$u(x) = \Bigl( \sum_{j=1}^{N} \alpha_j x_j^{\rho} \Bigr)^{1/\rho}.$
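As an illustration, the following Python sketch computes the excess demand of one resource domain under this CES utility; the closed-form Marshallian demand used below is the textbook solution for a CES consumer maximizing utility under the budget constraint p·x = p·e, and is an assumption here, since the paper does not spell it out.

import numpy as np

def ces_excess_demand(prices, alpha, endowment, rho):
    # Demand of a CES consumer with u(x) = (sum_j alpha_j x_j^rho)^(1/rho),
    # rho < 1, facing prices p and holding the given endowment e;
    # the excess demand is demand minus endowment.
    p = np.asarray(prices, dtype=float)
    e = np.asarray(endowment, dtype=float)
    a = np.asarray(alpha, dtype=float)
    sigma = 1.0 / (1.0 - rho)             # elasticity of substitution
    wealth = p @ e                        # market value of the endowment
    demand = a**sigma * p**(-sigma) * wealth / np.sum(a**sigma * p**(1.0 - sigma))
    return demand - e

An R-Agent would evaluate such a function for every consumer in its domain and report the aggregate to the R-Auctioneers.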
We choose two performance metrics: (1) the number of iterations of the R-Auctioneers and R-Agents executing their algorithms in turn until an equilibrium is achieved; (2) the square root of the sum of the squares of the excess demands of all resources, which reflects the overall degree of convergence during the pricing process. In the experiments, the termination condition for the iteration is $\lVert Z(P) \rVert < \varepsilon$. We set the CES substitution parameter to 2, and randomly generate the coefficients $\alpha_j$ from a uniform distribution on [0.1, 200]. The number of resource domains is 50, and the endowments of the resource domains are uniformly distributed in the range [2000, 3000]. In the first situation, the number of resources is 10; correspondingly, there are 10 R-Auctioneers in the WALRAS algorithm, where each R-Auctioneer is responsible for adjusting the price of one resource. For the distributed group-pricing algorithm, the resources are divided into 3 resource groups according to the degree of price correlation, and correspondingly there are 3 R-Auctioneers, one for each resource group. The experimental result of the price adjusting process is illustrated in Fig. 3(a). In the second situation, the number of resources is 20 and the resources are divided into 3 resource groups for the distributed group-pricing algorithm; the experimental result is illustrated in Fig. 3(b). The third situation is the same as the second except that in the distributed group-pricing algorithm the resources are divided into 6 resource groups; the experimental result is illustrated in Fig. 3(c). According to Fig. 3(a) and Fig. 3(b), the number of iterations needed to achieve the equilibrium increases for both pricing algorithms as the number of resources in the system increases, because the price of one resource is influenced more when more kinds of other resources exist in the system. We can also find, from Fig. 3(b) and Fig. 3(c), that the more groups a fixed number of resources are divided into, the more iterations the distributed group-pricing algorithm needs to achieve the equilibrium; this can be explained by the fact that the more groups the resources are divided into, the more price interaction exists among the groups. The experimental results indicate that an equilibrium can be achieved more quickly by the distributed group-pricing algorithm than by the WALRAS algorithm. The rationale behind distributed group-pricing is twofold: all resources are divided into resource groups for scalability, borrowed from the distributed WALRAS algorithm, and prices are adjusted simultaneously within one resource group for quick convergence, borrowed from the traditional centralized pricing method. Another important issue is that the WALRAS algorithm is suitable for asynchronous pricing [9]. Considering that the manner of pricing between resource groups is similar to the manner of pricing between individual resources in the WALRAS algorithm, the distributed group-pricing algorithm is also suitable for asynchronous pricing; that is, each R-Auctioneer can adjust the price of its resource group asynchronously, and in the long run this process will also lead to a global general equilibrium.
Fig. 3. Price adjusting. (a) The first situation; (b) The second situation; (c) The third situation
6 Conclusions
How to determine the price of resources is one of the key issues for market-based resource management in grid computing systems. In this paper, a distributed group-pricing algorithm is presented for determining the price according to general equilibrium theory. Based on group pricing with good scalability, this algorithm is suitable for grid computing systems; its performance is tested in different situations, and the properties of the presented algorithm are also analyzed. Experiments show that the price can be adjusted quickly to achieve an equilibrium by this algorithm when the supply and demand of resources changes.
Acknowledgements. This research was supported by the National Natural Science Foundation of China, No. 60173031.
References
1. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. The International Journal of Supercomputer Applications, Vol. 15, No. 3 (2001) 200-222
2. Foster, I., Kesselman, C.: The Globus Project: A Status Report. Future Generation Computer Systems, Vol. 15, No. 5 (1999) 607-621
3. Natrajan, A., Humphrey, M.A., Grimshaw, A.S.: The Legion Support for Advanced Parameter-space Studies on a Grid. Future Generation Computer Systems, Vol. 18, No. 8 (2002) 1033-1052
4. Varian, H.: Microeconomic Analysis. 3rd edn, W. W. Norton & Company, Inc., New York (1992)
5. Cao, H., Xiao, N., Lu, X., Liu, Y.: A Market-based Approach to Allocate Resources for Computational Grids. Computer Research and Development (Chinese), Vol. 39, No. 8 (2002) 913-916
6. Wolski, R., Plank, J., Brevik, J., Bryan, T.: Analyzing Market-based Resource Allocation Strategies for the Computational Grid. The International Journal of High Performance Computing Applications, Vol. 15, No. 3 (2001) 258-281
7. Subramoniam, K., Maheswaran, M., Toulouse, M.: Towards a Micro-Economic Model for Resource Allocation in Grid Computing Systems. In: Proceedings of the 2002 IEEE Canadian Conference on Electrical & Computer Engineering (2002) 782-785
8. Buyya, R.: Economic-based Distributed Resource Management and Scheduling for Grid Computing. Ph.D. Dissertation, School of Computer Science and Software Engineering, Monash University, Australia (2002) 61-79
9. Cheng, J., Wellman, M.: The WALRAS Algorithm: A Convergent Distributed Implementation of General Equilibrium Outcomes. Computational Economics, Vol. 12, No. 1 (1998) 1-24
10. Ygge, F.: Market-Oriented Programming and Its Application to Power Load Management. Ph.D. Dissertation, Department of Computer Science, Lund University, Sweden (1998) 65-78
11. Zhang, J.: Economic Cybernetics (Chinese). Tsinghua University Press, Beijing (1989)
12. Zhang, J.: Mathematical Economics – Theory and Application (Chinese). Tsinghua University Press, Beijing (1998)
Application Modelling Based on Typed Resources* Cheng Fu and Jinyuan You Department of CS, Shanghai Jiao Tong Univ., China {fucheng, you-jy}@cs.sjtu.edu.cn
Abstract. We have developed a type system for the calculus of Safe Mobile Resources (SR), a variant of Mobile Resources (MR). In this paper, we show the expressive power of the calculus. Some examples are examined to illustrate how to use the features of SR to model common distributed applications in a mobile or cooperative environment.
1 SR Review
The calculus of Safe Mobile Resources (SR) is a variant of the calculus of Mobile Resources (MR), with enhanced capabilities to enforce full coactions. Its three essential behaviors are listed below:
The first one shows the operation of resource consumption. In SR, a resource can be embedded into ambients at any level and remains accessible to any outer process, as long as the outer process has the specific relative path name for the resource; in (1), for example, the relative path name is given. The second reduction shows how a resource (process) moves between two places; three processes join in this reduction. A resource is moved passively with the permission (coactions) of a sender process and a receiver process. The last formula shows the deletion operation for an ambient; semantically, ambient deletion does not support path-name access. Let a countable set of names be ranged over by a, b, …, n, m; the set of all processes is ranged over by p, q, …, and capabilities and simple capabilities are defined as in SA. In the typed version, we use a set of restricted names to represent the typed names in an abbreviated form. We write fn(p) for the free names of the process p, and similarly for capabilities; an arrow stands for one-step
* This paper is supported by the Shanghai Science and Technology Development Foundation project (No. 03DZ15027) and the State Natural Science Foundation project (No. 60173033).
reduction. The definition of the structural congruence relation is standard; a number of structural congruence rules are listed in Table 2 for commuting the positions of parallel compositions and for stretching restriction scopes. Contexts and path contexts are defined as in MR. The SR grammar and syntax are shown in Table 1, the reduction rules in Table 3, and the structural congruence relation in Table 2.
2 Type System
The SR type system is designed to express the mobility, threadness and resource-type attributes of a capability, an ambient, and a process. The type grammar for SR is shown in Table 4. The notation used in our type system is inspired by SA and ETS-MT [3]. For instance, an immobile single-threaded process can be typed as Proc[…], where the mobility component stands for an immobile process and the superscript indicates the total number of threads in the process, while a process typed with the mobile, multi-threaded annotations is a mobile multi-threaded process. We use X to range over the ambient type
set, Z over the capability type set, and T over the inner process type set. The concept we use to identify the mobility [1,5] of a process is to observe whether the process remains in the same place during reduction; in SR, only two forms of process satisfy this property. We have two mobile types: mobile and premobile. A premobile process has one more feature than a mobile one: the former can cause the enclosing ambient to be a mobile process, while the latter cannot. There are also two immobile types: immobile and accompanyable. If a process is accompanyable, then it can cause a process to emerge in parallel with it, while an immobile one cannot.
To get started, we also need some basic types to denote the types of the resources over which the mobile resource types are built. We use a collection of all the basic resource types, subsets of it to indicate the basic types used in the current process, and individual basic types within such a subset. The intuitive meaning of the grammar productions is explained as follows: Amb[T, …] is the type of an ambient which can contain a process typed as Proc[T, …]. Cap[T, K] is the type of a capability; if K is Shh, the capability will not consume or be consumed as a resource. Otherwise, the target resource
or capability to be consumed should coincide with the specified resource type. Proc[T, …] is the type of a process that has inner type T and can cause resource consumption of the indicated types. The typing rules are shown in Table 5. The commutative operators · and | on the respective type sets are defined as follows:
We then define a transitive, reflexive subtyping relation [6,7] on the types of processes, which is summarized below, together with the corresponding relations on the component type sets.
We only allow subtyping on processes, because without the reduction behavior of processes it makes no sense for ambients and capabilities to have such a relation; moreover, subtyping on processes suffices to show the relation between the processes in our calculus.
Theorem 1. If … and … and …, then … with ….
Proof. The proof is by induction on the typing derivation.
Example 2. Consider the following process:
Under the stated typing assumptions, we can easily derive the type of the process. With a similar form of process we can model an immobile server [3] that provides different services within a named boundary.
Example 3. Consider the following process:
Although most processes like the previous example must be typed as immobile, by (SRT Cap) and (SRT Amb 2) the resource ambient res can only be typed as premobile, which differs from the example mentioned above. We can thus model mobile resources using this form. But if we want a mobile ambient to be truly movable during reduction, it should be placed in a container ambient to hold it; after doing so, outer processes can fetch and make use of it.
3 Applications
3.1 An Auto Delivery Cola Machine
This model is an enhanced version of the vending machine in MR. We divide the model into two parts: one is the machine; the other is the consumer. We use one ambient to denote a slot for the credit card, another to denote a can that contains cola, and pck to denote the pocket. Other names are intuitively clear. When consuming occurs, card must be a private name so that other processes cannot access any resources inside card.
We now show how the interaction is performed between the machine and the consumer.
In step (4), the ambient card is taken from pck into the slot; in (5), the resource ecash is consumed; in (6), the cola in the can is consumed; next, the card is fetched back; at last, the can is removed (into the litter bin). We then apply the SR type system to this model. Assuming types for ecash and cola, we deduce the following typing results for the other names and processes:
We omit the formal deduction steps. The result shows that two resource consumptions occur in the process consuming. The machine and the consumer remain immobile. The can ambient is mobile because it can be deleted; the slot is mobile because it can contain a mobile card; card is mobile because it can be taken and given; pck is mobile for the same reason.
3.2 Digital Signature Card
The digital signature card is one of the main examples in paper [2]. Here we provide a modified version in which any movement is controlled by all participants. We then apply the type system to the model and make it well typed.
By assuming the types below, we have the following result:
The deduction steps for the results are omitted, but intuitively we give the following explanations. Since there is no resource consumption in this model, the resource-type component is empty. The card is a secure mobile place which can hold classified data to be sent through a network. reg is an internal place where encrypting and decrypting operations occur; an internal place is immobile. in and out are something like buffers that hold incoming and outgoing data. The local places allow the encryption and decryption processes to perform their tasks, and the two message boxes serve one for sending and the other for receiving. msg is something like an envelope holding the message data and the signature. network is a physical place where the data transmission takes place. The encryption and decryption processes provide their functionality within a bound and are thus immobile; they are modelled as immobile services providing unlimited encryption and decryption operations, so they are multi-threaded. The sender and the receiver processes both remain immobile, and each contains only one thread to perform its operations. The whole model process, viewed from the outside, is an autonomous immobile system with two actual threads.
3.3 Resource Pooling
In a large cooperative environment, resource pooling is one of the main issues. Here we give the model of a servicing resource pool designed in the language of SR. In the following example, the pool body is a k-parameterized process in which all wanted resources gather in parallel. Each resource process and its carrier ambient represent a single servicing resource, and all servicing resources are located in the ambient svr. The pooling process is parameterized by the number of resources in the pool, a job process requests a resource from inside the pool, and the whole model is parameterized by the number of jobs requesting resources.
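Before looking at how jobs access the pool, the following rough Python analogue may help fix the intuition of the model; it mirrors only the pooling behaviour (a job fetches a resource, uses it, and returns it to the pool), not the SR semantics, and all names (pool, job, res-0, …) are illustrative.

import threading, queue

pool = queue.Queue()
for rid in range(3):                     # n = 3 servicing resources in svr
    pool.put(f"res-{rid}")

def job(job_id):
    res = pool.get()                     # fetch a resource from the pool
    try:
        print(f"job {job_id} using {res}")
    finally:
        pool.put(res)                    # analogue of 'mem reg pool': return it

threads = [threading.Thread(target=job, args=(k,)) for k in range(5)]   # m = 5 jobs
for t in threads:
    t.start()
for t in threads:
    t.join()

Blocking on pool.get() already hints at the starvation concerns discussed below when jobs outnumber resources.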
The process in each job can access the resource process inside the ambient reg. Moreover, it must contain the capability mem reg pool to return the resource to the pool when the job no longer needs the resource. We can furthermore build a larger model with multiple pools for different resources, in which jobs can fetch different kinds of resources from different pools; in such a model, we should be more concerned about concurrency problems such as deadlock and starvation. To type the above model, we make suitable typing assumptions; we then have the following results:
4 Conclusions and Related Works
In this paper, we briefly introduced a type system for SR and examined some examples to show how to use SR to model common applications. Most applications modelled in SR are a little more complex than in MR, but the chance of security risks decreases dramatically, because any mobile action must reach an agreement among all participants. Besides, a type system is implemented on the calculus to type the mobility, threadness and resources of SR processes. To fully eliminate the grave interferences [4] in SR, a more complex type system is under research. Furthermore, bisimulation congruences under the typed calculi need to be developed, and the expressive power of SR seems far from sufficient; there is much necessary work to do, such as encoding other ambient calculi.
References
[1] L. Cardelli, G. Ghelli, and A. D. Gordon. Mobility types for mobile ambients. Technical Report MSR-TR-99-32, Microsoft Research, 1999.
[2] J. C. Godskesen, T. Hildebrandt, and V. Sassone. A calculus of mobile resources. In Proc. CONCUR '02, volume 2412 of Lecture Notes in Computer Science, 2002.
[3] X. Guan, Y. Yang, and J. You. Typing evolving ambients. Information Processing Letters, 80(5):265–270, 2001.
[4] F. Levi and D. Sangiorgi. Controlling interference in ambients. Short version appeared in Proc. 27th POPL, ACM Press, 2000.
[5] M. Coppo, M. Dezani-Ciancaglini, E. Giovannetti, and I. Salvo. M3: Mobility types for mobile processes in mobile ambients. In Proc. CATS '03, volume 78 of Electronic Notes in Theoretical Computer Science, 2003.
[6] B. Pierce and D. Sangiorgi. Typing and subtyping for mobile processes. Journal of Mathematical Structures in Computer Science, 6(5):409–454, 1996. An extended abstract in Proc. LICS '93, IEEE Computer Society Press.
[7] P. Zimmer. Subtyping and typing algorithms for mobile ambients. In Proc. FoSSaCS '00, volume 1784 of Lecture Notes in Computer Science, pages 375–390, 2000.
A General Merging Algorithm Based on Object Marking* Jinlei Jiang and Meilin Shi Department of Computer Science and Technology, Tsinghua University, Beijing, P. R. China, 100084 {jjlei, shi}@csnet4.cs.tsinghua.edu.cn
Abstract. It is an ordinary need for cooperative applications to merge different versions of an object to a common state. Though many approaches exist, they are either too complex to implement or not flexible enough to meet the various high-level requirements. To solve the problem, a general merging algorithm is developed based on object marking, i.e., the contents of an object are marked with appropriate labels. The paper details the algorithm and shows how to recover operation context and how to detect and resolve operation conflicts with an example. The algorithm is efficient and flexible enough to allow users to specify various merging policies. Therefore, it can be implemented as a common service for cooperative applications.
1 Introduction
For cooperative applications, it is an ordinary need to merge different versions of an object to a common state [4]. For example, in the course of collaboratively producing a document or some other artifact, collaborators often find that they have created two versions, each containing revisions that they wish to have in a single version. It then becomes a task to take the set of revisions from one version and re-apply them to the other version of the object. Another scenario requiring merging is mobile computing, where users replicate objects to local machines while online, then disconnect from the server and manipulate the objects offline as they move; eventually, the different copies of an object are gathered somewhere and merged into a single one. To merge the contents, first of all we must identify the differences, which can be done with the help of differencing tools. After that, we can re-apply the set of changes made to one object to the other object to obtain a new version of the object. This procedure is usually error-prone and time-consuming; therefore, a tool performing the merge automatically would be highly useful. Existing merging tools can be divided into two categories: text-oriented and object-oriented. In text-oriented merging tools, the contents under operation are simple text documents; examples are rcsmerge [7], semantic diff [3] and flexible diff [5]. In object-oriented merging tools, the contents under operation are objects, which may have sophisticated structure; examples include GINA [1], transformation-based concurrency control [2,6] and the flexible object merging framework [4]. As we all know, objects
*
This work is co-supported by the National Natural Science Foundation of China under Grant No.60073011, the National High Technology Research and Development 863 Program under Grant No.2001AA113150 and 985 Project of Tsinghua University.
within cooperative applications usually have complex structures beyond text documents and computer programs; therefore, the applicability of text-oriented merging algorithms is limited. Though the object-oriented paradigm solves the data representation problem, deficiencies still exist: for example, GINA does not support automatic merging, transformation-based concurrency control algorithms are hard to implement, and the flexible object merging framework cannot handle whole-object operations while its matrix entries explode for complex objects. To address these issues, a general merging algorithm based on object marking is developed in this paper. It defines a set of symbols (called labels) to denote the changes made to an object; based on the semantics of the labels, one can easily recover operation contexts and detect and resolve operation conflicts. The rest of the paper is organized as follows. The coming section explains the basic idea behind our scheme. Section 3 describes the details of the proposed algorithm. Following that, an example illustrating the algorithm is given in Section 4. The paper ends with some conclusions drawn in Section 5, where future work is also given.
2 Object Marking
We deploy the Cova Object Description Language (CODL) [8] to represent the artifacts under operation, which provides a basis of general structured objects and allows us to take an application-independent approach to merging objects. The idea behind object marking is very simple: if we can record the changes made by concurrent operations by adding labels to the shared object, the difficulty caused by different original contexts can be overcome.
2.1 Labels and Context
A label is a symbol identifying the changes made to the contents of an object. Obviously, the differences between the revised object and the base version fall into no more than three cases: new contents are inserted, existing contents are removed, or existing contents are updated. Therefore, three labels are defined to mark the object as follows. The insert label, denoted by I, is used to mark inserted contents. The delete label, denoted by D, is used to mark deleted contents. The update label, denoted by U, is used to mark updated contents. For the changed contents, we also need to specify their scope. To do so, we borrow the idea of tagged data items from HTML/XML documents; in more detail, we exploit start tags and matching end tags (e.g., <I>…</I> and <D>…</D>) to delimit the inserted/deleted contents respectively. For the update label, the format looks like "X/Y", where X and Y indicate the new and the old value respectively, while "/" acts as the separator between X and Y. Note that only the changed data items are labeled. Users may join a session at different times. This causes a problem: the original object copies of different users are different, which will affect the differencing procedure and result in a wrong merged result without careful treatment. To solve this
problem, a version number is introduced to track the original context information of a participant. In our algorithm, a version is represented by a natural number, and the original version of an object is always 0 (other values are also feasible). Combining version numbers with labels, we get the full object-marking scheme. With versions introduced, the marking labels can be uniformly represented as "<vX>…</vX>" (e.g., <1I>…</1I>), where the version v is the context identifier and X is the label key.
2.2 Content Retrieval and Context Recovery
Content retrieval is exploited to remove the labels (the contents sent to users contain no labels). According to the semantics of the labels, this process is straightforward: contents without labels or labeled with I are copied directly, and contents labeled with D are omitted; as for updated contents, the new values are copied. During this process, we care nothing about the version numbers. This procedure functions when a user opens an object; meanwhile, the server records the user's context, which will be used for resolving conflicts after the operation results are submitted. After the object is opened, different users can then work independently on their own copies. Context recovery is used to recover the original object context for differencing and merging. Unlike in content retrieval, labels must be treated carefully during this process because the object may have been modified in the meantime. In more detail, suppose the original context of a user is N. To recover his/her original context, contents labeled with a version number no greater than N need not be reversed, for they have been perceived by the user according to the retrieval procedure. But a version number greater than N indicates that the corresponding contents have not been perceived by the user, so the labels with version numbers greater than N must be reversed; that is, inserted or updated contents are omitted while deleted contents are restored.
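The following Python sketch illustrates both procedures over a flat segment representation of a marked object; the (text, label, version, old) encoding is an illustrative assumption, not the actual CODL representation.

def retrieve_content(segments):
    # Return the user-visible contents with all labels removed:
    # plain, inserted and updated (new-value) segments are kept,
    # deleted segments are omitted; version numbers are ignored.
    out = []
    for text, label, _version, _old in segments:
        if label != 'D':
            out.append(text)
    return ''.join(out)

def recover_context(segments, n):
    # Recover the object as it looked to a user whose original context is n.
    out = []
    for text, label, version, old in segments:
        if label is None or version <= n:     # changes already perceived
            if label != 'D':
                out.append(text)
        elif label == 'D':                    # reverse unperceived changes:
            out.append(text)                  # restore deleted contents,
        elif label == 'U':
            out.append(old)                   # restore the old value,
        # and omit unperceived insertions (label == 'I')
    return ''.join(out)

For example, recover_context([("A", None, 0, None), ("1", 'I', 1, None)], 0) returns "A", while retrieve_content on the same segments returns "A1".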
2.3 Marking Criteria
During the marking process, the following rules must be obeyed.
Rule 1 (scope maximization). A label should mark as many contents as possible at a time. This rule is introduced to reduce the number of labels present, so that the space occupied can be saved. According to this rule, the labels "<5I>a</5I><5I>b</5I>" should be changed into "<5I>ab</5I>".
Rule 2 (well structured). By well structured we mean that 1) each label must have an end label, and 2) no label starts within another label and ends outside of it. For example, the labels "<2I>...<3D>...</2I>...</3D>" are not well structured, because the label <3D> starts in between <2I> and </2I> and ends outside of them. However, the labels "<2I>ab<3D>cd</3D>ef</2I>" are well structured.
Rule 3 (label nesting). It says that 1) nested labels must be well structured, and 2) the versions of nested labels must be greater than those of the labels nesting them. This rule is specially designed to accelerate the context recovery process.
3 Merge Algorithm
This section looks into the merge algorithm.
3.1 Conflicts and Merge Policies
A merge policy is a set of rules that determine which revisions will be included in the merged object. We borrow the basic idea of the merge matrix from the flexible object merging framework, i.e., a merge policy is defined for each level of a structured object. However, some fundamental modifications have been made. First, the merge matrix in our scheme is specified only on the three primitive operations (i.e., Insert, Delete and Update as mentioned previously), and a complex operation is viewed as a combination of primitive operations; in this way, the entry explosion issue is avoided. Second, merge policies are pre-defined for the 8 primitive types and 5 collection types (i.e., list, array, set, bag and dictionary) at the object level; with no user-defined policies specified, the default ones are deployed, which eases the burden of specifying merge policies. Third, our scheme can handle whole-object operations, which is impossible in the flexible object merging framework. Similar to other object-oriented languages, once a class is instantiated, it is not allowed to be re-structured. Therefore, the reason causing a conflict is that two concurrent operations update the same object element with different values. The merge policy for primitive types is very simple, as defined in Table 1, where "–" means the corresponding case will never occur. The three choices provided are users (denoted by F), which means it is up to a user or a program to decide which value to keep (as in the flexible object merging framework, it is a function that presents the users with the alternative changes and requests that they select one of them, or a function that accepts the changes and returns the choice); both (B for short), which means the system will keep both values; and overwrite (O for short), which means the old value will be replaced by the one submitted most recently (called the newest value hereafter). The default policy for primitive types is O. The policy B is deployed when users want to keep the revision history; however, though many values are recorded with the B policy, only the newest value is used.
Merge policies for collection types are divided into three categories, that is, list and array, set and bag, and dictionary. In the following we will explain them in detail. Conflicts for list and array are as follows. Insert-Insert conflict. Users insert different elements at the same place concurrently.
Delete-Update conflict. Some users alter an element while concurrent ones delete it.
Update-Update conflict. Two participants update the same element with different values concurrently.
There is no Delete-Delete conflict, since either the targets are different or the users' intentions are the same. In addition, there is no Insert-Delete conflict, because the results of concurrent operations cannot be perceived by each other, as the work procedure requires; the same holds for the Insert-Update conflict. The merge policy for list and array is illustrated in Table 2, where "–" indicates that the two operations are never paired for comparison or that they are compatible. The default policy is also O. For the Delete-Update conflict, by O we mean that the element will be deleted if the last submitted operation is Delete; otherwise, it will be kept. With policy B selected, both values are kept and used for the Insert-Insert conflict; however, only the newest one is used for the Update-Update conflict. Set and bag have no index defined on them, and conflicts for them are as follows.
Delete-Update conflict. It occurs when one user deletes an element while another user alters its value.
Update-Update conflict. It occurs when two users update the same element.
The merge policy for set and bag is shown in Table 3. Note that it must hold that the elements of a set are unique. The default policy is B. For the Delete-Update conflict, by B we mean that the updated results are kept. For the Update-Update conflict, by O we mean that the result submitted most recently (the newest value) is kept. In addition, if two users insert the same value into a bag object concurrently, only one value is kept, because their intentions are consistent in this case.
The index of the dictionary type is played by keys, which distinguishes it from list and array. Conflicts for it are as follows.
Insert-Insert conflict. Two users insert the same key with different values.
Update-Update conflict. Two concurrent users update the value corresponding to the same key.
Delete-Update conflict. One user deletes a key while some others alter the corresponding value concurrently.
The merge policy for the dictionary type is described by Table 4. For the Insert-Insert conflict, we keep both users' intentions by default (note that one key must be altered in this case). The default policy for the other conflicting cases is O, as explained previously. Merge policies for user-defined objects are specified recursively on their attributes; that is, 1) if the attribute is of a primitive or collection type, the policies discussed above are applied, and 2) if the attribute is also of a user-defined type, merge
policies for its attributes are used instead. With no policy specified, the default ones are used. Merge policies are supplied as a policy profile, which is loaded each time the merge procedure is invoked; the profile can also be modified as time goes on. In our scheme, it is allowed to treat an object as an atomic unit, which is accomplished by marking certain attributes of object types with the word "atomic" in the policy profile. For an atomic object, it makes no sense to merge changes; therefore, the merge policy for an atomic object is the same as that for primitive types. Finally, we should point out that the policies for collection types only hold when the element is of a primitive type or an atomic object type.
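The following Python sketch shows one possible shape of such a policy profile and its lookup; the dictionary encoding is an illustrative assumption (with 'list' standing for list and array, 'set' for set and bag), and the default entries merely summarize the defaults described above for Tables 1-4.

# 'O' = overwrite, 'B' = both, 'F' = user/function choice
DEFAULTS = {
    ('primitive', 'Update-Update'): 'O',
    ('list',      'Insert-Insert'): 'O',
    ('list',      'Delete-Update'): 'O',
    ('list',      'Update-Update'): 'O',
    ('set',       'Delete-Update'): 'B',
    ('set',       'Update-Update'): 'B',
    ('dict',      'Insert-Insert'): 'B',
    ('dict',      'Delete-Update'): 'O',
    ('dict',      'Update-Update'): 'O',
}

def merge_policy(profile, type_name, conflict):
    # User-supplied profile entries override the pre-defined defaults;
    # an attribute marked "atomic" in the profile would be looked up
    # as a primitive type.
    return profile.get((type_name, conflict),
                       DEFAULTS.get((type_name, conflict), 'O'))

For instance, merge_policy({}, 'set', 'Update-Update') yields 'B', while a profile containing {('set', 'Update-Update'): 'O'} switches that case to overwrite.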
3.2 Merge Procedure
The merge algorithm, denoted by MergeObject(oid, NC, ov), is illustrated as follows, where oid is the ID of the object to merge, NC denotes the operation results, and ov represents the operation context, i.e., the original version number.
The algorithm first loads the merge policies, and then one final common object is obtained by differencing and merging its attributes one by one. The function CompareObject first recovers the context to ov and then does the comparison. This algorithm is invoked each time users submit their results; afterwards, the server refreshes the labels marking the object and increases the context version it maintains.
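The following Python sketch captures the attribute-by-attribute merging in a deliberately simplified setting (primitive attributes only, with the O and B policies); it is an illustrative reduction of MergeObject, not the algorithm's actual code, and all names are assumptions.

def merge_object(server_obj, submitted, base, policy='O'):
    # server_obj: current merged state at the server
    # submitted:  the user's submitted version (NC)
    # base:       the user's original context, recovered via the labels (ov)
    merged = dict(server_obj)
    for attr, new_value in submitted.items():
        if new_value == base.get(attr):
            continue                          # attribute untouched by this user
        if merged.get(attr) == base.get(attr):
            merged[attr] = new_value          # no concurrent change: apply it
        elif policy == 'O':
            merged[attr] = new_value          # overwrite: newest submission wins
        elif policy == 'B':
            merged[attr] = (merged.get(attr), new_value)   # keep both values
        # policy 'F' would defer the choice to a user-supplied function
    return merged

In the full algorithm, the comparison against the base version is what CompareObject performs after recovering the context to ov; the server then refreshes the labels and bumps the version.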
4 A Case Study
In this section, we illustrate the algorithm with a string object, which is defined as follows, with most operations omitted.
The merge process with default policies is shown in Fig. 1, where the vertical axis represents time while the horizontal one represents operation-related information such as participants, actions, operation results and so on.
Fig. 1. Merging Process for a String Object
The digits in the Ver. column record the changes of the object version, where the number within brackets is the original context of the corresponding user, while the one outside the brackets is the latest version maintained at the server side. Users C1, C2 and C3 form a session and have the same original context. When the result from C1 is submitted, the character "1" is found to result from an insert operation and is therefore marked with an insert label. When C4 joins the session, the content retrieval procedure (i.e., RetrieveContent) returns "A1B2CDEF". After the results from C3 are merged,
the object obtained is "A<1I>1</1I>B<2I>2</2I>C<3I>3</3I>DE<3D>F</3D>". Since labels with version no greater than 2 are no longer of any use, they are removed by the label refreshing procedure; as a result, we obtain "A1B2C<3I>3</3I>DE<3D>F</3D>". The operations from C3 and C4 cause a conflict; according to the merge policy, the result from C4 is kept. At last, a final object ("A2B3C4DE4") is obtained by removing all the labels.
5 Conclusions and Future Work
In cooperative applications, the need to merge different versions of an object to a common state is often encountered for several reasons, including optimistic concurrency control, asynchronous coupling and the absence of access control. To meet this requirement, we have developed a general merging algorithm based on object marking. Our algorithm has the following characteristics, and it can therefore be implemented as a common service. It is based on general objects and thus can cover a wide spectrum of applications. Automatic conflict resolution eases the burden put on users. Flexible merge policies make it possible to meet various high-level requirements. It is efficient and easy to implement. Consistency guarantee is an important issue in CSCW. Although the algorithm proposed in this paper can meet the consistency requirement, it assumes that participants work unaware of each other. The computing environment of the future will contain both online and offline users; under this circumstance, our next goal is to keep data consistent without losing efficiency or work awareness.
References
1. Berlage, T. and Genau, A.: A Framework for Shared Applications with Replicated Architecture. In: Proc of ACM Symposium on UIST (1993) 249–257
2. Ellis, C. A. and Gibbs, S. J.: Concurrency Control in Groupware Systems. In: Proc of ACM Conf on Management of Data (1989) 399–407
3. Horwitz, S., Prins, J. and Reps, T.: Integrating Noninterfering Versions of Programs. ACM Transactions on Programming Languages and Systems, 3 (1989) 345–387
4. Munson, J. and Dewan, P.: A Flexible Object Merging Framework. In: Proc of ACM Conf on CSCW (1994) 231–242
5. Neuwirth, C. M., Chandhok, R. et al.: Flexible Diff-ing in a Collaborative Writing System. In: Proc of ACM Conf on CSCW (1992) 147–154
6. Suleiman, M., Cart, M. and Ferrie, J.: Serialization of Concurrent Operations in a Distributed Collaborative Environment. In: Proc of ACM Conf on Supporting Group Work (1997) 435–445
7. Tichy, W. F.: RCS – A System for Version Control. Software – Practice and Experience, 7 (1985) 637–654
8. Yang, G. X. and Shi, M. L.: Cova: An Object-oriented Programming Language for Cooperative Applications. Science in China (Series F), 1 (2001) 73–80
Charging and Accounting for Grid Computing System Zhengyou Liang1,2, Ling Zhang1, Shoubin Dong1, and Wenguo Wei1 1
GuangDong Key Laboratory of Computer Network , South China University of Technology, GuangZhou, 510641, P.R.China {zhyliang, ling, sbdong, wgwei}@scut.edu.cn 2
College of Computer and Information Engineering, GuangXi University, NanNing, 530004, P.R.China
Abstract. Grid computing is a key technology of the next generation Internet. Today, grid research mostly focuses on communication, security, resource management and information management. Charging and accounting is a basic activity in an economic society, so it should become part of a grid computing system in a computational economy environment. In this paper, we introduce the charging and accounting items in a grid computing system and propose a method for calculating the cost of grid usage. This method shows how to calculate the standardization technology cost of a job's usage and how to translate the standardization technology cost into a currency cost. Furthermore, we analyse the demands on a charging and accounting system in a computational-economy-based grid. An architecture for the charging and accounting system and its support system is designed in this paper.
1 Introduction
With the popularization of the Internet and the availability of powerful computers and high-speed networks, these low-cost commodity components are changing our way of using computers. It is possible for us to use the computer network as a simple uniform computing resource, and it is possible to connect geographically distributed computing resources of all kinds and aggregate them into a single uniform resource. This form of resource is usually called a "computational grid" [1]. The solution framework for 21st-century scientific problems will be based on the heterogeneous, complicated "grid". The applications based on grids include security mechanisms, web browsing, remote collaborative engineering, distributed petabyte data analysis, and live equipment control systems [1,2,3]. Grid computing is the key technology of the next generation Internet. The key conception of the grid is coordinated resource sharing and problem solving in dynamic multi-institutional virtual organizations [2]. Worldwide, many ambitious projects are using the grid computing conception to deal with challenging problems, such as the distributed analysis of experimental physics data, shared access to earthquake engineering equipment, the creation of "science portals" for thin clients to access all kinds of remote data and systems, and the transaction processing of extra-large data [1,4]. In a grid system, the resource provider and the resource consumer are often not the same person, and often do not belong to the same organization. So when one uses the
resources of the grid, an economic activity takes place between the resource provider and the resource consumer. From the viewpoint of economics, the resource provider should obtain a suitable benefit from providing the service, and the resource consumer should pay for the service; the grid system can be maintained by this pay-for-service model. Today the research on grids mainly focuses on communication, security mechanisms, resource management and information management; it lacks the basic services that support economic activity in a grid system. So it is necessary to develop a charging and accounting service for the grid, which will manage the cost of grid usage and support economic activity according to the computational economy. In this paper, we introduce the charging and accounting items in a grid computing system and propose a method for calculating the cost of grid usage. Furthermore, we propose a charging and accounting system for the grid.
2 Related Work
Charging and accounting for grids has been taken into account by several grid projects and researchers. [5] discusses what kinds of resources are to be charged and accounted for in the DataGrid project, and proposes a cost-calculating method that translates the cost into credits; a working scheme of their accounting model is also presented. This scheme monitors and controls job execution in real time, which may be too complicated in many cases. [6] introduces which resources need to be accounted and charged for in a computational-economy-based grid, and discusses which elements affect the resource value, but does not cover how to implement the accounting and charging. [7] proposes an architecture for charging for the distributed services of a computational grid. It considers a computational grid to consist of four layers: a communication service layer, a computation service layer, an information service layer, and a knowledge layer; the former three provide QoS service using the CPS [8] pricing scheme. The CPS [8] pricing scheme was first used to deal with congestion in TCP/IP-based networks for QoS. A charging system for grids was designed in [7], but how to deal with "computing power", which is the computational grid's main issue, is not discussed. In this paper, we propose a cost-calculating method that translates the standardization technology cost into the currency cost, and propose a scheme for a grid charging and accounting system that deals with "computing power" metering and cost calculation.
3 Charging, Accounting, and Calculating Cost
3.1 What Should Be Charged and Accounted in a Computational Grid
It is necessary to decide for what kinds of resource elements one should pay in a grid system. User applications require different resources, and the requirement depends on the computations performed and the algorithms used in solving the problems. The resource demands of different applications are usually different: some applications can be CPU
intensive while others can be I/O intensive or a combination. Therefore, the consumption of the following resources may be accounted and charged [6]:
CPU: user time (consumed by the user application) and system time (consumed while serving the user application)
Memory: maximum resident set size, page size, amount of memory used, page faults
Storage used
Network bandwidth consumption
Signals received, context switches
Software and libraries accessed
Obviously, not every resource will be billed on every chargeable element; for a specific application, only the resources it actually uses will be billed.
3.2 Cost Calculation
It is important to determine the value of every resource element and, consequently, the price. The price, together with the amount of usage of the resource, determines the cost. It is important to note that the value and the price of a resource are conceptually different: the value of a resource should essentially be a way to quantify the real capabilities of the resource itself [5]. [5] assumes that the price of a resource should be related to the value of that resource and that two resources with the same value should have comparable prices. But in fact, the price differs according to the economic model, and a resource has different prices under different conditions. In this paper, we calculate the standardization technology cost of an application using the charging and accounting system, and then transform the standardization technology cost into the currency cost according to the pricing policy. The pricing policy is determined by the economic model and is beyond the scope of this paper. The details of the cost calculation follow. The computing usage is defined as the product p·u, where p is a performance factor and u is the amount of usage of that resource element [5]. For example, if p refers to the CPU power, u should be the amount of CPU time used by the job. This product is called the technology cost in this paper, and ideally it is nearly constant for a given job executed on processors with varying CPU power. Furthermore, we define the standardization technology cost as the product k·(p·u), where p·u is the technology cost and k is a normalization coefficient used to reflect the value of the specific resource. The standardization technology cost of the whole job can be obtained from the standardization technology costs of all the resource components by computing:
$P = \sum_{i} k_i \cdot (p_i \cdot u_i) \qquad (1)$

where $k_i$, $p_i$ and $u_i$ are defined above, and the index i runs over the resource elements (i.e., CPU, RAM, disk, etc.).
The price of a standardization technology cost unit is given by a pricing algorithm, which is determined by the economic model used by the grid system. According to economic rules, the price is usually related to the user's demand, the user's willingness, the relation of demand and supply in the market, the provider's current load, and the provider's willingness. So the price is a function of the elements mentioned above, which we can express as follows:
$Price = f(D, PU, R, L, PP) \qquad (2)$

where Price is the price of a standardization technology cost unit, D is the demand for resource usage (usually an assessed value), PU is the user's pricing policy, which reflects the user's willingness, R is the relation of demand and supply in the market, L is the current load of the provider, and PP is the provider's pricing policy, which reflects the provider's willingness. The currency cost of the whole job is computed by:

$C = P \cdot Price \qquad (3)$

where P is the standardization technology cost calculated by (1), Price is calculated by (2), and C is the currency cost of the job.
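Putting equations (1)-(3) together, the following Python sketch computes a job's currency cost; the callable pricing stands for the function f of equation (2), whose concrete form depends on the chosen economic model, so its name and arguments are assumptions.

def job_cost(usage, pricing, demand, user_policy, market, load, provider_policy):
    # usage: list of (k_i, p_i, u_i) triples, one per resource element
    tech_cost = sum(k * p * u for k, p, u in usage)              # equation (1)
    unit_price = pricing(demand, user_policy, market, load, provider_policy)  # (2)
    return tech_cost * unit_price                                # equation (3)

For example, with usage = [(1.0, 2.0, 3.5), (0.5, 1.0, 8.0)] and a constant pricing function returning 0.1, the job's currency cost is (7.0 + 4.0) * 0.1 = 1.1.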
4 A Charging and Accounting System for Grid
In this section, we suppose a trade scene in a computational-economy-based grid system and, subsequently, deduce the demands on charging and the other related items. We then design the charging and accounting system for the grid, together with its support modules.
Fig. 1. Process of an application in a computational-economy-based grid
4.1 The Trade in a Grid and Its Demands
In order to understand the running scene of the charging and accounting system, we describe how a consumer submits an application to the grid and how the grid handles the job in a computational-economy-based grid system. Fig. 1 shows the process:
(1) The consumer (application) submits his job with some parameters to the Grid Resource Broker. The parameters include the consumer's pricing policy and deadline.
(2) The Grid Resource Broker inquires of the Information Service which Grid Service Providers are available.
(3) The Information Service returns a set of available Grid Service Providers to the Grid Resource Broker.
(4) If no Grid Service Provider is available, the Grid Resource Broker notifies the consumer that no grid resource is available, and the process ends. Otherwise, the Grid Resource Broker submits the job to one of the Grid Service Providers.
(5) The Grid Service Provider evaluates the job's technology cost and gives a price for the job to the Grid Resource Broker according to the technology cost, the demand and supply state in the market, the provider's pricing policy, and its current load. The Grid Resource Broker decides whether to accept that price according to the consumer's pricing policy. If it accepts the price, it continues to step (6); otherwise it negotiates the price with the provider. If they arrive at a price accepted by both of them, it continues to step (6); otherwise the Grid Resource Broker chooses another available Grid Service Provider and goes back to step (4). (A sketch of this negotiation loop is given at the end of this subsection.)
(6) The Grid Service Provider signs a contract with the broker. Then the provider allocates resources for the job and executes it. A component named "Data gather" meters and samples the usage of the job (see Fig. 2) and sends the data to the Charging and Accounting System (CAS). After the job finishes, the CAS calculates the currency cost and sends the currency cost data to the Billing System. Then the provider returns the job results to the broker.
(7) The broker returns the results to the consumer.
All the interaction between the broker and the provider is supported by the grid middleware; we do not discuss the grid middleware in detail, in order to simplify the description of the job handling process. From the description above, we obtain the demands on the provider. In a computational economy environment, a provider should meet the following demands:
(1) Evaluate a job's technology cost.
(2) Evaluate the supply and demand in the computing power market.
(3) Give a price for the job based on the evaluated technology cost, the evaluated supply and demand state, the provider's pricing policy, and its current load.
(4) Meter and sample the usage of a job.
(5) Calculate the currency cost of a job.
(6) Provide a billing system.
Besides the demands mentioned above, it must support the usual functions, such as resource reservation, resource allocation and a trade server; we do not discuss them here because they are often discussed in articles about grid resource management.
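The following Python sketch outlines the negotiation loop of steps (4)-(6); the provider methods quote, negotiate and execute, and the predicate acceptable, are illustrative assumptions rather than part of any existing grid middleware API.

def schedule(job, providers, acceptable):
    # Try the available providers in turn until one offers an acceptable price.
    for provider in providers:
        price = provider.quote(job)            # step (5): evaluate cost, offer a price
        if not acceptable(price):
            price = provider.negotiate(job)    # bargain over the price
        if acceptable(price):
            return provider.execute(job)       # step (6): sign contract, run, account
    return None                                # step (4): no available provider

Here acceptable would encode the consumer's pricing policy carried by the broker.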
Fig. 2. Grid Service Provider and Charging & Accounting System
4.2 The Architecture of the Grid Service Provider and Its Charging and Accounting System
We map the demands into our design, as shown in Fig. 2. The important components related to cost calculation are introduced below. The supply and demand assess module is used to evaluate the supply and demand state in the computing power market. According to economic theory, the relationship between supply and demand is an important element in deciding the price of goods: the price is low when supply exceeds demand, and high when supply falls short of demand. The consumer's job assess module is used to evaluate how much technology cost a job will incur on the provider's machine. The broker needs to know the machine performance, but this is not decided by the machine's hardware alone; it also depends on the algorithms the job uses to solve the problem. So if different providers use different algorithms, the same job may incur different technology costs even on the same type of machine and, consequently, different currency costs. A broker needs to know how much technology cost will be incurred before it accepts a price; it can compare the costs among providers and choose a suitable one. The provider also needs the evaluated technology cost to help calculate a price, in order to give a competitive price to the consumer. The Data gather module is the process that provides general ways to meter and sample the usage of the grid resources by a job, such as the usage of CPU, memory, storage, etc. When a job is dispatched on the grid resources, it is metered and sampled by the Data gather module. After the job finishes, the data on its usage of the
grid resources is sent to the Accounting module, the Pricing algorithms module and the Consumer's job assess module. The Pricing algorithms module calculates optimal prices given the current load, the evaluated consumer-job technology cost, the evaluated supply and demand state, and the provider's pricing policy. These prices are sent to the consumer to negotiate a price acceptable to both provider and consumer; if they arrive at an acceptable price, it is sent to the Charge calculation module. A Pricing algorithms module may contain more than one algorithm. The Accounting module collects the data about the per-task or bulk usage of each customer provided by the Data gather module. The Charge calculation module receives the prices sent from the Pricing algorithms module and the data from the Accounting module; it calculates the charges for the finished computing task, and its output is in turn the input for the billing mechanisms of the provider.
5 Conclusion
In this paper, we introduced what kinds of resources are charged and accounted for in the grid. A cost calculation method was proposed, which transforms the standardization technology cost of a job into a currency cost, and an architecture for charging, accounting and their support modules was proposed. The scheme discussed in this paper will be applied to our campus grid. Further research concerns the pricing scheme and the related economic model suitable for our grid.
Acknowledgements. This work has been supported by the GuangDong Key Laboratory of Computer Network (Grant No. 2002B60113) and the GuangXi University Science Research Foundation (Grant No. CC1407).
Integrating New Cost Model into HMA-Based Grid Resource Scheduling

Jun-yan Zhang, Fan Min, and Guo-wei Yang

College of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610051, China
{jyzhang, minfan, gwyang}@uestc.edu.cn
Abstract. Grid systems can provide a virtual framework for the management and scheduling of resources across different domains. This paper proposes an HMA-based grid resource scheduling system to implement resource finding and scheduling. A new cost model is also given, considering resource finding cost and resource deciding cost beyond the traditional model. The new cost model is then integrated into the HMA-based grid resource scheduling system. Our experiment shows that an optimal solution under the traditional cost model is no longer optimal under our model.

Keywords. Grid, resource scheduling, Agent, cost model
1 Introduction

In traditional distributed computing environments (DCEs), resource management systems (RMSs) were primarily responsible for allocating resources to tasks [3]. They also performed functions such as resource discovery and monitoring to support their primary roles. With large numbers of distributed resources and users, grid systems can provide a virtual framework for the management and scheduling of resources across different domains, and they have been the focus of much research activity in recent years. A computational grid is an emerging computing infrastructure that enables effective access to high performance computing resources [5]. Resource management and scheduling are key grid services, where utilizing grid resources reasonably by minimizing total cost is a common concern for most grid infrastructure and scheduling algorithm developers. Resource management and scheduling in grid systems [1][2] is challenging due to: (a) geographical distribution of resources; (b) resource heterogeneity; (c) autonomously administered grid domains having their own resource policies and practices; and (d) grid domains using different access and cost models. In this paper, we adopt a Hierarchical Multi-Agent-based (HMA-based) methodology for grid resource scheduling, achieved by integrating cost into the HMA-based grid resource scheduling system to implement minimal-cost resource scheduling.
The paper is organized as follows: Section 2 introduces the traditional cost model. In Section 3, the HMA-based grid resource scheduling system is described. In Section 4, we integrate the new cost model into our HMA-based grid resource scheduling system. Comparative experimental results are presented in Section 5, and the paper concludes in Section 6.
2 Traditional Cost Model

The traditional cost model of the Internet forms the basis of most call admission, routing and reservation algorithms today. The model combines the cost of classifying, switching, queuing and scheduling at a node with the cost of transmission over the next link into one abstract figure associated with each node-link pair (see Figure 1) [6]. Based on this model, the total cost of a system can be defined as

C = Cs + Cq + Cl,

where C denotes the total cost of the system, Cs the cost of switching and scheduling, Cq the cost of queuing, and Cl the cost of the link.
Fig. 1. Traditional Cost Model
Although simple, this model has proven effective and robust in designing many popular network protocols. Shortest Path Routing, for example, uses this cost model to find the route between a pair of origin-destination nodes by minimizing the sum of per-hop costs.
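The sketch below shows this use of the model: each node-link pair carries one abstract per-hop cost (already aggregating the switching/scheduling, queuing and link costs), and Dijkstra's algorithm minimizes their sum. The graph data are made up for illustration.

import heapq

def min_cost_route(graph, src, dst):
    """graph: {node: [(neighbor, hop_cost), ...]}; returns (total_cost, path)."""
    frontier = [(0.0, src, [src])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == dst:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, hop in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(frontier, (cost + hop, nxt, path + [nxt]))
    return float("inf"), []

# each hop cost stands for Cs + Cq + Cl of one node-link pair
graph = {"A": [("B", 1.0), ("C", 2.5)], "B": [("D", 2.0)], "C": [("D", 0.25)], "D": []}
print(min_cost_route(graph, "A", "D"))   # (2.75, ['A', 'C', 'D'])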
3 HMA-Based Grid Resource Scheduling System

In our resource scheduling system, we consider the whole grid system as a Global Grid, which is made up of n Grid Domains. Let GDi (i = 1, 2, ..., n) denote the ith Grid Domain (see Figure 2). Each GDi is an autonomous, administrative and interactive entity consisting of a set of resources, services and users managed by a single Management Agent (MAgent). In our system, we divide each GD into two subdomains: (a) a resource domain (RD), which signifies the resources within the GD; and (b) a user domain (UD), which signifies the users within the GD.
Fig. 2. HMA-Based Grid Resource Scheduling System

First of all, we define the system resource matrix R = (rij), where rij specifies how many type-j resources GDi possesses. We also define the system processing power matrix P = (pij), where pij specifies the processing power of GDi for the jth type of resource; clearly pij >= 0 for any legal i, j. In this system, the Resource Deciding Agent (RDAgent) maintains an n x n table D which records the distances between GDs, where dij denotes the distance between GDi and GDj. It also holds a quadruple describing each GD in detail, provided by each MAgent, whose elements (a) denote the serial number of the current GD, (b) specify its local resources, (c) denote the processing power of the current GD, and (d) denote the users within it. At the beginning, the RDAgent initializes the distance table D and the quadruples according to the information submitted by all existing MAgents. When the RDAgent polls the MAgents on its own initiative, the MAgents refresh the table and the quadruples if the update happens within the current GD; otherwise, the MAgents need not submit the update and only keep the update information locally.

Fig. 3. A New Cost Model

Let TR denote a task requirement originated from a GD; it includes the explicit user IP address and its original GD. It also includes the resource requirement information: the required amount of each type of resource is denoted q1, ..., qk respectively. After a TR is recognized and analyzed by the Resource Scheduling Agent (RSAgent), the RSAgent orders the Resource Finding Agent (RFAgent) to find the location of resources that can meet the requirement TR. First, the RFAgent queries the RDAgent for resource information. Because the RDAgent has the distance table D and the quadruples, after the RFAgent finishes the finding process it can build a resource deployment matrix M, whose columns specify resource types and whose rows specify Grid Domains: if the required resource j is located at GDi we let mij = 1, else we let mij = 0. Now the RFAgent examines this matrix: if it finds a column whose elements are all 0, it asks the RDAgent to update table D and the quadruples. If, after updating table D and the quadruples, the matrix still does not satisfy the TR, the next update starts after a period of time (which we set to 5 minutes). If an appropriate resource combination cannot be found after some number of attempts (which we set to 3), the RFAgent concludes that the TR cannot be met and the task cannot be executed, and sends this message to the RSAgent, which cancels the task and notifies the user. If an appropriate resource combination can be found, the RFAgent sends the resource combination information to the RSAgent, which schedules resources according to this information and returns the result to the user; the procedure is given in the next section. A sketch of the finding step follows.
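The following sketch shows one possible shape of the finding step: building the deployment matrix, detecting all-zero columns, and retrying the update at most 3 times with a 5-minute wait, as described above. The data layout and helper names are assumptions, not the paper's interfaces.

import time

def find_resources(domains, requirement, refresh, max_retries=3, wait_s=300):
    """domains: {gd: {resource_type: amount}}; requirement: {resource_type: amount}."""
    for attempt in range(max_retries + 1):
        # m[gd][r] = 1 if gd alone can satisfy the requirement for resource type r
        m = {gd: {r: int(res.get(r, 0) >= q) for r, q in requirement.items()}
             for gd, res in domains.items()}
        unmet = [r for r in requirement if not any(m[gd][r] for gd in m)]
        if not unmet:                 # no all-zero column: a combination exists
            return m                  # handed to the RSAgent for scheduling
        if attempt < max_retries:
            time.sleep(wait_s)        # wait ~5 minutes, then refresh D and the quadruples
            domains = refresh()
    return None                       # TR cannot be met; RSAgent cancels and notifies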
4 Integrating the New Cost Model into the HMA-Based Grid Resource Scheduling System

Generally speaking, resource schedulers and resource managers tend to choose the nearest resource, because they make decisions based on the traditional cost model [4]. We now present a new cost model for grid systems, in which the cost of resource finding and the cost of resource deciding also affect the total cost significantly.
Thus the system total cost C consists of three parts: the cost of resource finding Cf, the cost of resource deciding Cd, and the cost of resource scheduling Cs; the architecture is illustrated in Figure 3. As Figure 3 shows, Cs can be further divided into processing cost Cp and transmission cost Ct. The total cost of resource scheduling can be defined as

C = Cf + Cd + Cs,    (1)
Cs = Cp + Ct.    (2)

Now we can integrate the new cost model into the HMA-based grid resource scheduling system to implement reasonable scheduling with minimal cost. On receiving a TR, the RSAgent forwards the resource and processing power requirements to the RFAgent. The RFAgent examines the deployment matrix; if it finds a column whose elements are all 0, it asks the RDAgent to update table D and the quadruples. We use an integer variable count to record the number of updates. After refreshing table D and the quadruples, if the matrix still does not satisfy the TR, the next update starts after a period of time (about 5 minutes). Therefore the cost of finding resources can be measured by count, that is, Cf = count. If count > M (M is a constant and M > 0; in this paper we set M = 3), the RFAgent concludes that the TR cannot be met and reports this to the RSAgent, which cancels the task and notifies the user.
If only one element in each column is nonzero, only one combination of resources and Grid Domains can satisfy the TR, so the RSAgent has no choice but to select this combination as the only solution, regardless of the cost C. If more than one resource combination can satisfy the TR, the RDAgent lists all possible resource combinations and decides which combination to choose. The combinations can be written as

CB = (RA1, RA2, ..., RAk),
where RAi represents the GD which offers the ith resource, and 0 stands for no requirement for the respective resource. The requirement for each type of resource can be met by some GD. Since each type of resource is selected independently, we need to choose among at most n values each time; accordingly, the time complexity is O(n x k) instead of O(n^k). If a combination CB satisfies TR and includes the originating GD, some kinds of resources can be provided locally while the remaining kinds are located at other GDs; Cd is defined accordingly.
The resource scheduling cost is made up of processing cost Cp and transmission cost Ct. The stronger the processing power of a GD, the smaller its Cp. Since pij represents the computational power of GDi for resource j,

P(CB) = sum_{i=1}^{k} p_{RAi, i}

is the total computational power when choosing the respective combination.
Accordingly, we use the reciprocal of the processing power to signify Cp, i.e. Cp = 1 / P(CB). Because we assume the task to be processed cannot be divided further, the processing GD is unique. The farther the distance between two GDs, the larger the Ct. Accordingly, we use the logarithm of the distance between the originating GD and each remote RAi to signify Ct, so we have

Ct = sum_{RAi != GD_TR} log d(GD_TR, RAi).
Fig. 4. Performance comparison
Based on equations (1) and (2), the system total resource scheduling cost is

C = Cf + Cd + 1/P(CB) + sum_{RAi != GD_TR} log d(GD_TR, RAi).
The RDAgent chooses the CB with minimal C and recommends it to the RSAgent, which performs scheduling according to this CB and returns the results to the corresponding user; a sketch of this selection is given below. In this paper we do not compare the costs of two different tasks, so how many resources (such as communication traffic, computing overhead and so on) a task needs is not our concern when we calculate the total cost C. That is to say, the cost we compute is relative, not absolute.
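Under the formulas reconstructed above (C = Cf + Cd + Cp + Ct, with Cp the reciprocal of the combination's total processing power and Ct the summed log-distances to remote GDs), the selection can be sketched as follows; the data layout is an assumption.

import math

def scheduling_cost(cb, origin, power, dist, c_f, c_d):
    """cb[j] is the GD offering resource type j; power[gd][j] is its processing power."""
    total_power = sum(power[gd][j] for j, gd in enumerate(cb))
    c_p = 1.0 / total_power                      # Cp = 1 / P(CB)
    c_t = sum(math.log(dist[origin][gd])         # Ct = sum of log-distances
              for gd in cb if gd != origin)      # local resources add no transmission cost
    return c_f + c_d + c_p + c_t                 # C = Cf + Cd + Cp + Ct

def choose_combination(combinations, origin, power, dist, c_f, c_d):
    # the RDAgent recommends the combination with minimal total cost C
    return min(combinations,
               key=lambda cb: scheduling_cost(cb, origin, power, dist, c_f, c_d))

power = {"GD1": [4.0, 2.0], "GD2": [1.0, 3.0]}
dist = {"GD1": {"GD2": 3}}
cbs = [("GD1", "GD1"), ("GD1", "GD2")]
print(choose_combination(cbs, "GD1", power, dist, c_f=1.0, c_d=0.5))  # ('GD1', 'GD1')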
5 Comparing Experiment

We simulate a system containing 10 GDs providing 4 types of resources. The distances between these GDs are listed in Table 1, the initial information about all GDs is listed in Table 2, and the task requirement information is listed in Table 3. Figure 4 shows the average cost of tasks under two conditions: (1) the optimal solution under the traditional cost model, and (2) the optimal solution under the new cost model. We can see that the optimal solution under the traditional cost model may not be optimal under the new cost model.
6 Conclusion

With the growing popularity of middleware dedicated to building so-called grids of processing and storage resources, network-based computing will soon offer users a
dramatic increase in the available aggregate processing power. We have proposed an HMA-based grid resource scheduling system together with a new cost model; in particular, we formalize resource information and task requirements in detail. The comparing experiment shows that an optimal solution under the traditional cost model is no longer optimal under our model.
References
[1] I. Foster, C. Kesselman, and S. Tuecke, "The anatomy of the Grid: Enabling scalable virtual organizations," Int'l Journal on Supercomputer Applications, 2001.
[2] K. Krauter, R. Buyya, and M. Maheswaran, "A taxonomy and survey of Grid resource management systems," Software Practice and Experience, Vol. 32, No. 2, Feb 2002, pp. 135–164.
[3] F. Berman, R. Wolski, S. Figueira, J. Schopf, and G. Shao, "Application-level scheduling on distributed heterogeneous networks," in Proc. 1996 Supercomputing, Pittsburgh, PA, USA, 1996.
[4] B. Davie, S. Casner, C. Iturralde, D. Oran, and J. Wroclawski, "Integrated Services in the Presence of Compressible Flows," Internet Draft, February 1999.
[5] I. Foster and C. Kesselman, "The GRID: Blueprint for a New Computing Infrastructure," Morgan Kaufmann, 1998.
[6] K. Najafi and A. Leon-Garcia, "A Novel Cost Model for Active Networks," Communication Technology Proceedings 2000, vol. 2, pp. 1073–1080.
CoAuto: A Formal Model for Cooperative Processes*

Jinlei Jiang and Meilin Shi

Department of Computer Science and Technology, Tsinghua University, Beijing, P. R. China, 100084
{jjlei, shi}@csnet4.cs.tsinghua.edu.cn
http://cscw.cs.tsinghua.edu.cn

* This work is co-supported by the National Natural Science Foundation of China under Grant No. 60073011, the National High Technology Research and Development 863 Program under Grant No. 2001AA113150 and the 985 Project of Tsinghua University.
Abstract. A formal model called CoAuto (Cooperative Automaton) is proposed to describe and analyze cooperative processes. A basic CoAuto abstracts the behavior of a single active entity. It separates data from control states and thus can describe various cooperation scenarios (e.g. synchronous and asynchronous) by composition in a uniform yet flexible way. The composition can be done at two different levels (i.e. data sharing and action/control sharing), so more complex cooperative processes can be depicted. The paper details the structural elements of CoAuto and shows how to model real-world cooperative processes and analyze some basic properties (e.g. safety and liveness).
1 Introduction

Computer Supported Cooperative Work (CSCW) is concerned with understanding how people work together and the ways in which technology can assist. Interest in it has intensified in the last few years. As a result, numerous groupware systems have emerged, such as MMConf [4], GroupKit [9] and workflow systems, to name a few. Though these products have different emphases and are designed to support cooperation of a certain type, further study shows that they present some regularities. For example, each system should support communication among the computational entities. This makes it possible to develop a general-purpose platform that provides services common to groupware development. To achieve this purpose, we believe a formal model will help us better understand the essential concepts and some interesting properties of cooperative processes.
Indeed, people have done a lot towards the purpose above. Examples are various coordination models and languages [8], and team automata [5]. Coordination models and languages are exploited to describe concurrent and distributed computations. Here we take IWIM [1] as an example, which is a control-driven model. In IWIM, different computing entities are interconnected by streams and communicate with each other through input/output ports. The formal semantics of a kernel of
MANIFOLD, a language implementing IWIM, is presented in [2] based on a two-level transition system: the first level specifies the ideal behavior of each single component, whereas the second level captures their interactions. Although the approach is interesting, it is specially designed for MANIFOLD and loses generality: it is not suitable for describing synchronous activities. Team automata are a framework and mathematical model for describing and analyzing groupware systems; they concern how to build groupware systems and how groupware systems work rather than how to analyze cooperative processes.
In this paper, a formal model called CoAuto is proposed to describe cooperative processes in a uniform yet flexible way. A CoAuto is a two-level transition system. At the first level, there is a set of transition systems, each defining the behavior of a single participant; different systems are combined via data dependency to model synchronous activities in a cooperative process. At the second level, there is a single transition system that defines the interactions between the first-level transition systems via control dependency. With this model, we can then analyze the properties of cooperative processes.
The rest of the paper is organized as follows. In Section 2, we revisit cooperative procedures and identify some key characteristics; these observations form the foundation of our work. CoAuto is formally defined in Section 3, where composition rules are also given. Section 4 presents a simple example to illustrate the model. Afterwards, we discuss related issues such as safety and liveness in Section 5. In the end, conclusions are drawn in Section 6, together with future work.
2 Cooperation Revisited

A cooperative procedure can be studied from three aspects: task decomposition, dependency and cooperation mode. Their details are as follows.
Generally speaking, a cooperative task has a large goal and involves multiple participants. To accomplish it, the task is usually divided into sub-tasks, which are then assigned to different people. The decomposition is recursive and does not stop until all final sub-tasks cannot be divided any further or some conditions are met. In this paper, we use the term atomic task to denote such sub-tasks; by atomic we mean that only one participant engages in a single and simple objective according to some pre-defined rules. A cooperative task is said to be modeled if all the atomic tasks and their relations have been identified and the composition rules are properly given.
There are two typical inter-dependencies between two atomic tasks: data dependency and control dependency. A data dependency exists among atomic tasks if these tasks access the same data object; such tasks are usually called peer tasks. In this case, any operation executed by one atomic task should be multicast to the other tasks if the operation has modified the data object. This often occurs in synchronous groupware, and one crucial related issue is maintaining the consistency of the shared data object. Control dependency means there are causal relations between two tasks; for example, task B cannot start until task A completes. It is very common in workflow
management systems. In addition, more complex dependencies can be expressed using logic connectives (e.g. AND, OR, NOT).
Real-world cooperation is far more complex and cannot simply be treated as a single asynchronous or synchronous process. For example, a report or an article can be co-authored synchronously by multiple authors, and the resulting document may then be transferred asynchronously to others for further review or approval. To facilitate further discussion, tasks are divided into three classes, as shown in Fig. 1.
Fig. 1. Cooperation Modes
The above three aspects are related to each other. The rules of decomposition depend on the type of dependency. For the sake of simplicity, we assume that a complex task is decomposed into a collection of atomic tasks such that there is at most one type of dependency between any two of them. With atomic tasks and their dependencies identified, a cooperative task of any type can be described by combining them. In more detail, (1) a single-user task is an atomic task; (2) a synchronous task can be modeled as a set of atomic tasks with data dependency; (3) an asynchronous task can be modeled as a set of atomic tasks with control dependency; and (4) for an integrated task, the atomic tasks with data dependency are composed first, and the results are then combined with the other tasks according to control dependency.
3 Formal Definitions

CoAuto is derived from the I/O automaton [7], which is known to have the power to specify synchronous or asynchronous, blocking or nonblocking systems [5]. Before the formal definitions, we introduce some notation: the empty set is written as usual, x means Cartesian product, and i, j denote indices.

Definition 1 (Cooperative Automaton). A cooperative automaton is a tuple (D, S, Ain, Aint, Aout, I, F), where
D is a set of data variables representing the data object under manipulation. Each variable has a value; we denote by V the values assigned to D at a certain time.
S is a nonempty state set.
Ain is a set of input actions, which are generated by the environment and computed by the automaton; Aint is a set of internal actions, which are generated and computed by the automaton; Aout is a set of output actions, which are generated by the automaton and computed externally by the environment. These three sets are pairwise disjoint. Each action in them has the form <e, c, o>, where e is an event, c is the pre-
condition and o is the operation. An operation o can be executed if and only if event e occurs and condition c is met. We use A (called the action signature) to denote all the actions of a CoAuto, i.e. A is the union of Ain, Aint and Aout.
I is a nonempty initial state set, of the form (s, Vs), where s is a state in S and Vs is a set of values assigned to D at state s.
F is a set of transition rules. It has two forms:
Normal: a rule of the form (s, a, s') with s, s' in S and a in A. This rule applies to the situation where the corresponding atomic task has no data dependency with the others.
Abnormal: a rule of the same form augmented with a transformation function T. This rule applies to the situation where data dependency occurs. The purpose of T is to convert the defined actions into actual internal actions in order to keep the shared data consistent among distributed sites. An example of T is the set of transformation functions in dOPT [6].
A CoAuto defined above (called a basic CoAuto) abstracts the behavior of a single actor in a cooperative process. Though these actors can perform operations freely as long as they are permitted, they will eventually achieve a common group decision. So they can be regarded as a single coherent cognitive unit (i.e. an activity in a cooperative process) which performs externally observable behaviors and interacts with environments. This is the philosophy behind CoAuto. With basic CoAutos defined, we can describe various cooperation modes uniformly by composition.

Definition 2 (single user or atomic task). A single user or atomic task is a CoAuto with transition rules only of the normal form, since no data or control dependency is present.

Definition 3 (synchronous task). Given a collection of CoAutos sharing the same data and with no disjoint internal actions (called peer CoAutos), a CoAuto C describing the synchronous task can be obtained according to the following rules:
C.D = D
C's state set, action sets and initial states are built from those of the component automata, and F contains all admissible transitions described below: an action shared by several peer automata is taken jointly by all of them, with the transformation functions applied so that the shared data stay consistent; any other action changes only the state of the automaton that owns it.

Definition 4 (asynchronous task). Given a collection of CoAutos sharing no data and no internal actions between any two of them,
a CoAuto describing the asynchronous task can be obtained according to the following rules: C.D is the union of the component data sets D1, ..., DM, where M is the number of component CoAutos. This rule implies that the data of each component automaton are treated as a whole.
The transition rules in this case are similar to those in Definition 3, except that an input action only changes the states of the component automata that can recognize it, whereas the other component automata stay dormant during that cycle.

Definition 5 (cooperative process). Given a collection of CoAutos satisfying the conditions above, a CoAuto describing a cooperative process can be obtained via the following two steps:
1. Compose the peer automata according to the rules given in Definition 3.
2. Compose the remaining automata and the results of step 1 according to the rules given in Definition 4. Note that this can also be done hierarchically.
Though derived from the I/O automaton, CoAuto differs from its parent in several respects: 1) a CoAuto shares not only joint actions but also common data, which makes it convenient to specify data-consistency requirements; 2) two component automata are allowed to have the same output actions; and 3) the composition of CoAutos can be done at different levels according to the dependencies present. A data-structure sketch follows.
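As a minimal data-structure sketch, a basic CoAuto and the peer (Definition 3) composition can be encoded as below; the concrete representation, field names and the simplified composition body are illustrative assumptions, not part of the formalism.

from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    event: str       # e: the triggering event
    condition: str   # c: pre-condition that must hold
    operation: str   # o: executed when e occurs and c is met

@dataclass
class CoAuto:
    data: dict             # D together with its current values V
    states: frozenset      # S
    inputs: frozenset      # input actions (from the environment)
    internals: frozenset   # internal actions
    outputs: frozenset     # output actions (to the environment)
    initial: frozenset     # I: (state, data-values) pairs
    transitions: tuple     # F: normal/abnormal rules

def compose_sync(a, b):
    """Peer composition (Definition 3): same shared data, joint actions."""
    assert a.data is b.data, "peer CoAutos share the same data object"
    return CoAuto(a.data,
                  frozenset((s, t) for s in a.states for t in b.states),
                  a.inputs | b.inputs, a.internals | b.internals,
                  a.outputs | b.outputs,
                  frozenset(), tuple())  # initial states and F built per the rules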
4 A Case Study

In this section we illustrate the model defined previously. The example is an abstraction of a B2C e-commerce process in which buyers interact with dealers to purchase goods. The process is shown in Fig. 2.
Fig. 2. An E-Commerce Process
The process contains two activities: Deal and Consult. The Deal activity has one participant, the seller, an input action br (Buyer Request), an internal action hr (Handle Request) and an output action cs (Consult Scheduling). The Consult activity is a synchronous activity involving two participants, the seller and the buyer. Both the seller and the buyer can suggest a delivery time (dt). Once the suggestion is accepted, the activity
ends, producing an output action dr (Deliver Request); otherwise dt must be set again. Thus this activity has three input actions: cs from the Deal activity, and SetDT and SetAgree from the seller and the buyer. Let S = {Ready, Running, Completed, Aborted}. The component automata and the composed result for this process are as follows.
Deal: the automaton uses a variable dealt indicating whether the request has been handled; we omit the other variables here for simplicity.
Consult: the component automata for the seller and the buyer are the same and given below. In this case, one participant simply accepts results from the other, and the variable agree indicates whether the negotiation has completed.
According to Definition 5, the composed automaton for the process is obtained via the following two steps: (1) compose the two peer Consult automata first; (2) compose the result of step (1) with the Deal automaton. The result of step (2), which is what we want, is as follows.
F contains all admissible transitions. The state transition diagram is shown in Fig. 3, where
s0 is the initial state and sf the final one. One state is compound: it in fact contains many states with different dt values. In the diagram, input actions and output actions are marked distinctly. Output actions are important for identifying the relations between CoAutos; however, when analyzing a single CoAuto they are usually removed. In addition, actions causing no state transition can also be removed from the diagram.
Fig. 3. State Transition of Composed Automaton
5 Process Analysis

Safety and liveness are two important properties when proving the correctness of cooperative processes. Roughly speaking, a liveness property specifies that certain desirable events will eventually occur, while a safety property specifies that undesirable events will never occur. With the CoAuto model, we can formally define them as follows, based on the normal state transition diagram.

Definition 6 (normal state transition diagram). A state transition diagram of a CoAuto is normal if each action in the diagram connects two (same or different) states. This definition is used to remove disturbing actions to guarantee the correctness of analysis results. The diagram given in Fig. 3 is not normal since dr connects only one state; however, if we remove dr from the diagram, it becomes normal.

Definition 7 (live process). A cooperative process is live iff for each state s reachable from the initial state s0 in the normal state transition diagram of the CoAuto corresponding to this process, there exists a transition sequence leading to a final state sf. A state s' is called reachable from s iff there is a transition sequence leading from state s to state s'.

Definition 8 (safe process). A cooperative process is safe iff for each non-final state s in the normal state transition diagram of the CoAuto corresponding to this process, there exists at least one output action, i.e. Out(s) is nonempty for every s outside the set Sf of final states, where Out(s) represents all admissible output actions at state s. From this definition we can see it is necessary to remove disturbing actions to avoid drawing wrong conclusions.
Definition 9 (sound process). A cooperative process is sound iff it is live and safe.
Finally, we point out that although the properties are defined in terms of the state transition diagram, other formats are also viable. Indeed, our formalism supports a variety of verification techniques such as simulation methods, compositional reasoning [3] and temporal logic methods [10]; a reachability-based sketch of the checks in Definitions 7 and 8 is given below.
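The sketch below checks Definitions 7 and 8 by simple reachability over a normal state transition diagram encoded as labelled edges; the encoding is an assumption, not part of the formalism.

def reachable(edges, start):
    """Depth-first search over (src, action, dst) edges."""
    seen, stack = {start}, [start]
    while stack:
        for (src, _act, dst) in edges:
            if src == stack[-1] and dst not in seen:
                seen.add(dst); stack.append(dst); break
        else:
            stack.pop()
    return seen

def is_live(edges, s0, finals):
    """Every state reachable from s0 can still reach some final state."""
    return all(reachable(edges, s) & finals for s in reachable(edges, s0))

def is_safe(edges, states, finals, outputs):
    """Every non-final state admits at least one output action."""
    return all(any(src == s and act in outputs for (src, act, _d) in edges)
               for s in states - finals)

def is_sound(edges, states, s0, finals, outputs):
    return is_live(edges, s0, finals) and is_safe(edges, states, finals, outputs)

edges = [("s0", "a1", "s1"), ("s1", "a2", "sf")]
print(is_sound(edges, {"s0", "s1", "sf"}, "s0", {"sf"}, {"a1", "a2"}))  # True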
6 Conclusions and Future Work

This paper has established a mathematical foundation called CoAuto for a deeper understanding of cooperation involving human activities of different types. The model is based on the observation that a complex cooperative process can be divided into a set of atomic computation entities that cooperate with each other to achieve a common goal; the cooperation among them differs in inter-dependency (data or control) and cooperative mode (synchronous or asynchronous). The outstanding feature of CoAuto is its uniformity and flexibility, resulting from the separation of data and control dependencies. Moreover, thanks to the formal automata set-up, results and methodologies from automata theory are applicable.
Our work is far from complete and there are still many open problems. For example, are there redundant states/actions in the state transition diagram? How long will a process last? In the future, we will study these two problems further. In addition, we will also investigate efficient algorithms to verify properties of cooperative processes.

Acknowledgements. Some of this work was done at Bell Labs Research China. Thanks are owed to Yan Wu and Guangxin Yang for their significant contributions to the ideas presented in this paper.
References
1. Arbab F.: The IWIM Model for Coordination of Concurrent Activities. In: LNCS 1061, Springer-Verlag (1996) 34–56
2. Bonsangue M. M., Arbab F.: A Transition System Semantics for the Control-Driven Coordination Language Manifold. Report SEN-R9829, 1998, CWI, Amsterdam, The Netherlands
3. Cheung S. C., Giannakopoulou D. and Kramer J.: Verification of Liveness Properties Using Compositional Reachability Analysis. ACM SIGSOFT Software Engineering Notes, 6(1997) 227–243
4. Crowley T., Milazzo P. et al.: MMConf: an Infrastructure for Building Shared Multimedia Applications. In: Proc. of ACM Conf. on CSCW (1990) 329–342
5. Ellis C. A.: A Framework and Mathematical Model for Collaboration Technology. In: LNCS 1364, Springer-Verlag (1998) 121–144
6. Ellis C. A. and Gibbs S. J.: Concurrency Control in Groupware Systems. In: Proc. of ACM Conf. on Management of Data (1989) 399–407
7. Lynch N. A. and Tuttle M. R.: An Introduction to Input/Output Automata. CWI Quarterly, 3(1989) 219–246
8. Papadopoulos G. A. and Arbab F.: Coordination Models and Languages. Report SEN-R9834, 1998, CWI, Amsterdam, The Netherlands
9. Roseman M. and Greenberg S.: GroupKit: A Groupware Toolkit for Building Real-time Conferencing Applications. In: Proc. of ACM Conf. on CSCW (1992) 43–50
10. Sistla A. P.: On Characterization of Safety and Liveness Properties in Temporal Logic. In: Proc. of ACM Symposium on Principles of Distributed Computing (1985) 39–48
A Resource Model for Large-Scale Non-hierarchy Grid System*

Qianni Deng 1, Xinda Lu 1, Li Chen 2, and Minglu Li 1

1 Dept. of Computer Science & Eng., Shanghai JiaoTong Univ., Shanghai 200030, P.R. China
{deng-qn, lu-xd, li-ml}@cs.sjtu.edu.cn
2 Dept. of Mechanical Eng., Shanghai JiaoTong Univ., Shanghai 200030, P.R. China
[email protected]
Abstract. Computational Grids and Peer-to-Peer computing systems are progressively overlapping with each other. This paper brings forward a resource model for future large-scale non-hierarchy grid systems. With this model we can represent heterogeneous resource-sharing relationships. Based on the assumption of a power-law degree distribution and the result of Lada A. Adamic [20], we find that the unstructured locating algorithms used in P2P systems do not suit this resource model. Finally we suggest that it is necessary to classify Grid systems, observe network topology and build classified resource models for different types of Grid system.
1 Introduction

Early Computational Grids [1,2] and Peer-to-Peer computing systems [3] were two different types of distributed systems; now they are progressively overlapping with each other, and future large-scale resource-sharing Grid systems will have the following features.
Large scale, lack of centralized control.
Dynamically changing membership. Some Grid resources are stable and reliable, but others may join or leave the system dynamically.
Resource diversity. Diversity means (a) multiple types of resources, e.g. computational resources, data, services, instruments, storage and so on; resources of the same type are heterogeneous, e.g. different operating systems, different numbers and speeds of CPUs, different sizes of data, and different provided services; and (b) multiple characteristics: some resources can provide stable services, for example serving all users from 6:00 AM to 6:00 PM, while others are unstable and can, for example, only be shared when idle.
To manage and locate diverse resources in this large-scale, dynamic and heterogeneous environment, one needs an effective resource organization model. This model must be different from the hierarchical resource model of the early computational Grid.

* This work is supported by the National Natural Science Fund of China (No. 60173031).
In the following sections of this paper, we first analyze the network topology of large-scale resource-sharing Grid systems, then put forward a non-hierarchy resource model; based on the proposed model we compare two resource locating mechanisms by theoretical analysis, and finally we bring forward possible research directions for Grid resource models.
2 Network Topology of Large-Scale Resource Sharing Systems

The Internet is the base infrastructure of large-scale distributed resource-sharing Grid systems. The network topology of the Internet is unpredictable and changes dynamically, and it is one of the key factors affecting the efficiency of resource management and locating in a Grid system.
Random graphs are a basic mathematical tool used to study large-scale complicated network systems. The classical model [4,5,6,7] of random graph theory supposes that there are N labeled nodes in the graph and that the connection probability between any two randomly chosen nodes is p; therefore the graph contains pN(N-1)/2 links in total. Each node has several links to neighbor nodes, but not all nodes in a network have the same number of links. The spread in the number of links of the nodes, or a node's degree, is characterized by a distribution function p(k), which gives the probability that a randomly selected node has exactly k edges. Since in a random graph the edges are placed randomly, the majority of nodes have approximately the same degree, close to the average degree ⟨k⟩ of the network. The degree distribution of a random graph is a Poisson distribution:

p(k) = e^{-⟨k⟩} ⟨k⟩^k / k!,  with ⟨k⟩ ≈ pN.
But it has been discovered that for most large networks the degree distribution significantly deviates from a Poisson distribution. In particular, for a large number of
networks [8,9,10,11,12,14], including the World Wide Web and the Internet, the degree distribution has a power-law tail

p(k) ~ k^{-τ},

where the exponent τ is approximately a constant; that is, p(k) is not affected by the scale of the network, and such networks are called scale-free. Most research and analysis has observed that the Internet graph follows power laws at three levels: router topology, inter-domain topology and the World Wide Web, and the degree-distribution power-law exponents all lie between 2 and 3. The related exponents are shown in Table 1.
3 A Non-hierarchy Resource Discovery Model

Wei Li et al. [17] have proposed a hierarchical Grid resource model. This paper mainly focuses on building a resource model for the P2P grid environment proposed by Iamnitchi A. [13]. In the context of decentralized resource discovery in large-scale, distributed, heterogeneous systems, we assume that every participant in the virtual organization has one or more servers that store and provide access to resource information. We call these server-nodes or peers. A node may provide information about a small set of resources (e.g. locally stored files or the node's computing power, as in a traditional P2P scenario) or a large set of resources (e.g. all resources shared by an institution, as in a typical Grid scenario). From the perspective of resource discovery, the grid is thus a collection of geographically distributed nodes that may join and leave at any time without notice. We now formalize a resource discovery model based on the above assumptions.

Definition 1. We regard the resource discovery model as an undirected graph G. In G the number of nodes and the connections between nodes change dynamically; the number of nodes is at most N. All nodes in G are peers with the same function: providing information about (local or community) resources. If two randomly chosen nodes can access each other and exchange information, one link exists between them. When a request reaches a node, the resource information at this node needs to be checked; we suppose the checking time at each node is a constant.

Definition 2. Because of the diversity of resources, the access probabilities of some high-performance nodes are higher than those of the other common nodes; we call these capable nodes, e.g. a high-performance server can provide high-quality service and information about a large set of resources. Therefore capable nodes have more links than common nodes. Based on the observed conclusions about Internet topology, we assume without loss of generality that the graph G is a scale-free
work, the degree distribution of graph G is a
the power tail
is between
2 and 3, and the max degree in graph G is Definition 3. We denote any one resource as r. For any one node, we denote it as the relationship between and r can be denoted as If node has record of the information of resource r, we give but if node has no record of the information of resource r , we give Definition 4. Locating process of resource can be regarded as a request going from initiative node until reaching information provider node where We denote this process as a search path and
4 Analysis of Resource Discovery Algorithms

4.1 Random Walk

Description. This algorithm considers neither node heterogeneity nor the disequilibrium of the degree distribution. Searching for a requested resource can be seen as a random walk: starting from the initiating node, a random neighbor of the current node is chosen as the next node to check, and this is repeated until the requested resource is found.

Analysis. In the worst case, resource r is found only after all information nodes have been visited. Therefore the searching cost of the random walk, denoted s, is the length of the walking path needed to scan all nodes in the whole graph. We use the generating function formalism introduced by Newman [19] and Lada A. Adamic [20] for graphs with arbitrary degree distributions to analytically characterize the search cost in power-law graphs. Suppose that we have an undirected graph of N vertices, with N large. We define the generating function for the distribution of the vertex degrees k:

G_0(x) = Σ_k p_k x^k,

where p_k is the probability that a randomly chosen vertex on the graph has degree k. For a graph with a power-law distribution with exponent τ, minimum degree k = 1 and an abrupt cutoff at k_max, the generating function is given by

G_0(x) = c Σ_{k=1}^{k_max} k^{-τ} x^k,

with c a normalization constant. The distribution is assumed correctly normalized, so that G_0(1) = 1.
The average degree of a vertex, in the case 2 < τ < 3, is given by

⟨k⟩ = G_0'(1) = c Σ_{k=1}^{k_max} k^{1-τ},

which converges to a constant as k_max grows.
Another very important quantity is the distribution of the degree of the vertex we arrive at by following a randomly chosen edge. Such an edge arrives at a vertex with probability proportional to the degree of that vertex, and the vertex therefore has a probability distribution of degree proportional to k p_k. The correctly normalized distribution is generated by

x G_0'(x) / G_0'(1).
If we only consider the number of outgoing edges from the vertex we arrived at, not including the edge we just came from, we need to divide by one power of x. Hence the number of new neighbors encountered on each step of a random walk is given by the generating function

G_1(x) = G_0'(x) / G_0'(1).
Therefore the average number of new neighbors visited in each step of the random walk is G_1'(1).
Supposing that the cutoff scales as k_max ≈ N^{1/τ}, we impose the result of Lada A. Adamic [20] and obtain

G_1'(1) ~ k_max^{3-τ} = N^{3/τ - 1}.
The cost for scanning the whole graph is therefore

s ≈ N / G_1'(1),  so  s ~ N^{2 - 3/τ}.

A numeric sketch of this estimate follows.
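The following numeric sketch checks the reconstructed estimate: it builds the truncated power-law distribution, evaluates G_1'(1) as the ratio of the first two factorial moments, and compares s = N / G_1'(1) with the predicted scaling N^(2 - 3/τ). The cutoff choice k_max ≈ N^(1/τ) follows the assumption above.

def random_walk_cost(n, tau):
    k_max = max(1, round(n ** (1.0 / tau)))
    weights = [k ** -tau for k in range(1, k_max + 1)]
    c = 1.0 / sum(weights)                        # normalization: G_0(1) = 1
    p = [c * w for w in weights]                  # p[i] is p_k for degree k = i + 1
    z1 = sum((i + 1) * pk for i, pk in enumerate(p))            # <k> = G_0'(1)
    z2 = sum((i + 1) * i * pk for i, pk in enumerate(p))        # sum of k(k-1) p_k
    return n / (z2 / z1)                          # steps s ~ N / G_1'(1)

tau = 2.1
for n in (10**4, 10**5, 10**6):
    print(n, random_walk_cost(n, tau), n ** (2 - 3 / tau))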
4.2 High Degree Neighbor Routing

Description. This algorithm takes node heterogeneity and degree-distribution disequilibrium into account. In the resource locating process, if a node fails to find the requested resource information in its local store, it passes the request message to the neighbor with the most neighbors. Each step of locating resource r is: choose the highest-degree neighbor (denoted A) of the current node as the next checking node, check whether A has information about r, and repeat until the requested resource r is found.

Searching Cost. In the worst case, resource r is found only after all information nodes have been visited. Therefore the searching cost, again denoted s, is the length of the walking path needed to scan all nodes in the whole graph.

Note. In fact, because of the dynamicity of resources, even though a certain node has an information record saying that resource r is located at some other node, this does not mean the search can stop there; the above algorithm ignores the cost of this scenario.

Let k_s be the degree of the last node we need to visit in order to scan a certain fraction of the graph; the number of first neighbors scanned then follows from the degree distribution, where we make the assumption that the degree of the visited node has not dropped too much by the time we have scanned that fraction of the graph. From this, the number of steps needed to scan a fraction of the graph, and hence the cost for scanning the whole graph, can be derived. Under this assumption, the searching cost of Algorithm 2 is higher than that of Algorithm 1 in the worst situation. A small simulation comparing the two strategies follows.
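A small simulation sketch of the two strategies on a scale-free graph grown by preferential attachment is given below; the graph model, parameters and step caps are illustrative, not the topology analyzed in the text.

import random

def scale_free_graph(n, m=2, seed=1):
    """Grow an undirected graph by preferential attachment (BA-style)."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    targets, repeated = list(range(m)), []
    for v in range(m, n):
        for t in set(targets):
            adj[v].add(t); adj[t].add(v)
        repeated += targets + [v] * m
        targets = [rng.choice(repeated) for _ in range(m)]
    return adj

def search(adj, start, goal, high_degree, rng, cap=20000):
    node, steps, visited = start, 0, {start}
    while steps < cap:
        if goal in adj[node]:
            return steps + 1
        nbrs = list(adj[node])
        fresh = [u for u in nbrs if u not in visited] or nbrs
        node = (max(fresh, key=lambda u: len(adj[u])) if high_degree
                else rng.choice(fresh))
        visited.add(node); steps += 1
    return cap

rng = random.Random(7)
adj = scale_free_graph(2000)
pairs = [(rng.randrange(2000), 999) for _ in range(20)]
walks = [search(adj, s, g, False, rng) for s, g in pairs if s != g]
greedy = [search(adj, s, g, True, rng) for s, g in pairs if s != g]
print("random walk avg steps:", sum(walks) / len(walks))
print("high-degree avg steps:", sum(greedy) / len(greedy))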
5 Conclusion and Future Work

From its definition we know the proposed resource model is a generalized model with which we can represent sharing relationships among heterogeneous resources. Because the model does not distinguish resource types, it is hard to find a uniform structured method to represent and locate resources. Based on the assumption of a power-law degree distribution and the result of Lada A. Adamic [20], Section 4 analyzed the scaling of two unstructured locating algorithms; we find that even the locating algorithm that exploits the power-law degree distribution of the network topology does not suit this resource model. We believe the resource model can be improved as follows.
Construct different resource models for different types of Grid, for example Computational Grid, Data Grid and Service Grid; then we can find a uniform resource representation within each type of grid. Besides the unstructured methods we have analyzed, we hope to find more effective structured methods to improve resource management and discovery in large-scale, non-hierarchy grid systems.
Observe the realistic network topologies of computational grids, data grids and service grids. We want to answer the following questions: what are the topologies of the different types of grid? Are they all scale-free networks? Are their power-law exponents, like the Internet's, between 2 and 3?
Design corresponding resource management and locating methods according to the network topologies of the different types of grid.
References
1. I. Foster, C. Kesselman, S. Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. IJSA, 2001. http://www.globus.org
2. I. Foster: What is the Grid: A Three Point Checklist. Grid Today, July 20, 2002. http://www-fp.mcs.anl.gov/~foster/Articles/WhatIsTheGrid.pdf
3. A. Oram (ed.): Peer-to-Peer: Harnessing the Power of Disruptive Technologies. O'Reilly & Associates, 2001.
4. B. Bollobás: Random Graphs. Academic Press, New York (1985).
5. P. Erdős and A. Rényi: "On random graphs," Publicationes Mathematicae 6, 290–297 (1959).
6. P. Erdős and A. Rényi: "On the evolution of random graphs," Publications of the Mathematical Institute of the Hungarian Academy of Sciences 5, 17–61 (1960).
7. P. Erdős and A. Rényi: "On the strength of connectedness of a random graph," Acta Mathematica Scientia Hungary 12, 261–267 (1961).
8. R. Albert and A.-L. Barabási: Statistical Mechanics of Complex Networks. Rev. Mod. Phys. 74, January 2002.
9. B. A. Huberman and L. A. Adamic: "Growth dynamics of the world-wide web," Nature 401, 131 (1999).
10. J. M. Kleinberg, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins: "The web as a graph: Measurements, models, and methods," in Lecture Notes in Computer Science, No. 1627, T. Asano, H. Imai, D. T. Lee, S.-I. Nakano, and T. Tokuyama (eds.), Springer-Verlag, Berlin (1999).
11. A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener: "Graph structure in the web," Computer Networks 33, 309–320 (2000).
12. M. Faloutsos, P. Faloutsos, and C. Faloutsos: "On power-law relationships of the internet topology," Comp. Comm. Rev. 29, 251–262 (1999).
13. A. Iamnitchi, I. Foster, D. C. Nurmi: "A peer-to-peer approach to resource location in grid environments," in Proc. 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), 2002, p. 419.
14. G. Pandurangan, P. Raghavan, E. Upfal: "Building low-diameter P2P networks," in Proc. 42nd IEEE Symposium on Foundations of Computer Science, 2001, pp. 492–499.
15. R. Albert, H. Jeong, and A.-L. Barabási: "Diameter of the world-wide web," Nature 401, 130–131 (1999).
16. R. Govindan and H. Tangmunarunkit: in Proc. IEEE INFOCOM 2000, Tel Aviv, Israel (IEEE, Piscataway, NJ), 3, 1371 (2000).
17. Wei Li, Zhiwei Xu, Fangpeng Dong, Jun Zhang: Grid Resource Discovery Based on a Routing-Transferring Model. 3rd International Workshop on Grid Computing (Grid 2002).
18. L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman: Search in power-law networks. Physical Review E 64, 046135 (2001).
19. M. E. J. Newman, S. H. Strogatz, and D. J. Watts: Random graphs with arbitrary degree distributions and their applications. Physical Review E 64, 026118 (2001).
20. L. A. Adamic, R. M. Lukose, A. R. Puniyani, and B. A. Huberman: Search in power-law networks. Physical Review E 64, 046135 (2001).
A Virtual Organization Based Mobile Agent Computation Model

Yong Liu, Cong-fu Xu, Zhao-hui Wu, Wei-dong Chen, and Yun-he Pan

College of Computer Science, Zhejiang University, Hangzhou 310027, P.R. China
[email protected], {xucongfu, wzh, chenwd, yhpan}@zju.edu.cn
Abstract. Mobile agents have been developed for a decade and are widely used in distributed computation. However, traditional mobile agents, including strong-migration and weak-migration mobile agents, still have some weaknesses. Built on the fabric of the Grid virtual organization architecture, mobile agents gain great advantages over those older mobile agent systems. In this paper, a novel formalized mobile agent computation model based on the virtual organization is presented. In this model, all actions of the mobile agents are treated as states, and the mobile agents' workflow is controlled by a finite state machine. This ensures atomic actions for each mobile agent and avoids the abnormal condition of communication mismatch. The model takes full advantage of the strong-migration mode, such as robustness and intelligence, while overcoming the serious weakness of large data transmission found in strong-migration mobile agent systems.
1 Introduction

Mobile agents are programs that can migrate and execute between different network hosts. They locate appropriate computation resources, information resources and network resources, and combine these resources on a certain host to achieve their computing tasks [1]. There are two types of mobile agents, classified by migration ability: strong-migration and weak-migration mobile agents. Ordinary mobile agent systems such as AgentTCL [6], the Voyager system and the Aglet system [7] can all be placed into these two types. AgentTCL uses the strong-migration policy, by which the mobile agent carries not only the executable code and the data used in the executing process, but also the states of the executing process. Voyager and Aglet use the weak-migration policy, by which the mobile agent only carries the executable code and the states of the data. When using the strong-migration policy, the mobile agent system needs to record all the states and related data at each position of the agent, which costs a great deal of time and space in transport and leads to low efficiency. When using the weak-migration policy, the transported data decrease greatly, but so does the ability to adapt to complicated network topologies. Therefore, how to design a reliable
and high-efficiency work pattern for the mobile agent becomes the key problem. In fact, grid [3] technology provides a powerful platform for mobile agents: a series of grid protocols [2,3], such as GNSP, GGMP and QDP, make a new work pattern available. In this paper, we introduce a finite-state-machine-based mobile agent computation model whose migration ability lies between strong migration and weak migration. In this model, the executing process is divided into several states controlled by a finite state machine, and only the agent body and the communication data need to be transported; in this way, we increase the transport efficiency and the adaptability of the mobile agent.
2 VO-Based Fabric Architecture of the Computation Model

The VO-based architecture mentioned in this paper is a structure similar to the fabric layer in [2]. In the following, some definitions are given.

Definition A. Node: a minimal device that can load and execute mobile agents in the network, denoted n_i. There is a kind of node called a Key Node, which handles remote communications.

Definition B. Group: a set including one or several nodes, denoted g_j. The group can identify each node in the VO, which means each node except a key node belongs to exactly one group; n_i in g_j means node n_i belongs to group g_j. A group is a comparatively stable organization; the nodes belonging to a certain group can leave it and join another group dynamically. The login and logout of nodes use the GGMP (Grid Group Management Protocol) [2], which is similar to IGMP.

Definition C. Node Distance: the least number of routes between two nodes; the node distance between nodes i and j is denoted d_{ij}.

Given the above definitions, we can define the VO.

Definition D. Virtual Organization: a VO is a fabric structure composed of nodes and established by a series of protocols. Normally, a group contains similar and adjacent nodes. There is a Key Node in each group, denoted kn_j for group g_j. The function of the key node in a group is similar to that of the gateway in a LAN: it communicates with nodes outside the group. A protocol called GNSP (Gateway Node Selective Protocol) [3] is used to determine the key node. Among all the nodes and groups, the key nodes constitute a virtual group called the Kernel Group. It is the most important portion: it serves the other nodes and handles communication, lookup, etc. between nodes in the virtual organization topology.
3 VO-Based Mobile Agent Computation Model

To implement the VO-based computation model, the first requirement is that mobile agents can be executed on different nodes, so a finite-state mobile agent is introduced; in this model, data and resources are distinguished. The definitions are as follows:
3.1 Finite-State Mobile Agent

Definition E. A finite-state mobile agent is a resource-driven mobile agent. In fact, a mobile agent whose migration ability lies between strong migration and weak migration can be seen as a finite state machine moving automatically, driven by resources and data.

Definition F. Data are all the local data that affect the mobile agent's state changes.

Definition G. Resource is a general designation for all the data, devices and software on VO nodes. In our model, all the runtime parameters and devices on remote nodes are called resources, which differ from the data that affect the mobile agent's state changes. Therefore we can ignore the influence of the network topology on the execution of the mobile agents and only care about the state changes of the mobile agents while they are executing and migrating; in other words, this mobile agent system need not care which node a mobile agent moves from or to, and the current state of a mobile agent is the only parameter that must be recorded in the agent, migrated with it, and updated in time.

A finite-state mobile agent consists of several parts: finite states, a transition relation, exterior input symbols (data or resource), etc. We write a finite-state mobile agent as (Aid, Sr, Ur, Fr). Aid is the identity of the mobile agent; it retains this value at runtime, and the VO architecture can locate the right mobile agent by Aid. A universal finite state set and a common transition relation can be adopted in a practical service, such as a virtual-experiment device-sharing service mobile agent, while Aid can be seen as an instance handle. Sr is the finite state set, including the request state, suspend state, block state and service state. The input symbols Ur cover two conditions: one is the resource input symbol for the service state, the other is the service time. Fr is the transition relation. A sketch of such an agent is given below.
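A minimal sketch of such an agent as a table-driven finite state machine follows; the state names match the definition, while the event encoding and the transition entries are illustrative assumptions.

REQUEST, SUSPEND, BLOCK, SERVICE = "request", "suspend", "block", "service"

class FiniteStateMobileAgent:
    def __init__(self, aid, fr):
        self.aid = aid        # Aid: identity, retained across migrations
        self.state = REQUEST
        self.fr = fr          # Fr: {(state, input_symbol): next_state}

    def on_input(self, symbol):
        """symbol is a resource-arrival event or a service-time tick (Ur)."""
        self.state = self.fr.get((self.state, symbol), self.state)
        return self.state

# a tiny transition relation: request -> service when the resource arrives,
# service -> block on a resource miss, block -> service when it is found again
fr = {(REQUEST, "resource"): SERVICE,
      (SERVICE, "miss"): BLOCK,
      (BLOCK, "resource"): SERVICE,
      (SERVICE, "timeout"): SUSPEND}

agent = FiniteStateMobileAgent("va-experiment-001", fr)
print(agent.on_input("resource"), agent.on_input("miss"), agent.on_input("resource"))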
3.2 VO Based Mobile Agent Computation Model (MACM)

Having defined the finite-state mobile agent, we can define the computation model for VO based mobile agents.

Definition H. The VO based mobile agent computation model is a six-tuple (R, S, M, v0, Vf, F), where R is the node set and S is the finite state set of the mobile agent. S does not include the migration state or the null state: the migration state means the mobile agent starts to move to another node to execute a new state, and the null state means the mobile agent performs no action (neither executing nor migrating). M is the set of all message operation states for the mobile agent; it contains the state of sending a message and the state of receiving a message. v0 is the initial node at which the mobile agent is produced; a mobile agent's service
first originates at node v0 and then cycles, driven by the finite states. Vf is the set of final nodes for the mobile agent; only at a final node can the mobile agent be destroyed and its service end. F, the transition relation, is a finite subset of the product of the state, input, and node sets, defined by four cases (1)-(4): in each case, a condition on the current state and input determines the next transition state.
In this computation model, the migration state is established by the communication of the nodes in the VO. By the definition of the computation model, all migration, communication (message passing), and remote execution are regarded as states of the computation model. Traditional message-passing mobile agent systems commonly suffer from a communication invalidation problem [4], caused by the asynchrony between a mobile agent's migration and its messages: after a mobile agent sends out a message and moves to another node, the return message may be unable to find the position of the original agent. Analogously, a broadcast-based mode [5] for locating and communicating with mobile agents has problems such as a huge volume of transmitted data, susceptibility to blocking under finite bandwidth, and low reliability. In our VO based computation model, communication is treated as states, and a series of rules is defined to ensure the symmetry of the send and receive operations, so that the communication process and the migration process can form an atomic operation. This keeps communication and migration logically sequential, so the communication invalidation problem cannot arise.
3.3 Transition Forecast Algorithm in MACM

In the VO based MACM, data and resources are unified, so a minimal-distance transition policy can be implemented, which greatly decreases the migration cost. The transition problem can be described as follows: in a node set, when a mobile agent arrives at a node, which node should be the next migration position for this mobile agent? The algorithm is given below.

Algorithm. Minimal distance transition forecast algorithm.
Step 1. The mobile agent serving at node vi detects a resource miss, that is, there is not enough resource for the service to continue.
Step 2. Node vi broadcasts for the resource; only a node that has the resource the agent needs replies to this broadcast.
Step 3. When vi receives the reply messages, they form a possible-transition node set; each node in this set has the resource the mobile agent needs. The node at minimal node distance from vi over this set is chosen as the next migration node.
Step 4. The agent moves from node vi to the chosen node; the algorithm ends.
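A compact Python sketch of Step 3's selection rule, assuming the set of replying nodes and a node-distance function (such as the BFS sketch in Sect. 2) are available:

```python
def forecast_next_node(current, replies, node_distance):
    """Step 3 of the algorithm: among the nodes that replied to the
    resource broadcast, pick the one at minimal node distance."""
    if not replies:
        return None  # no node holds the missing resource
    return min(replies, key=lambda node: node_distance(current, node))
```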
4 Analysis

The Chinese Ministry of Education began a resource-sharing project among the universities of China in 1999. The aim of this CSCR (Computer Support Cooperation Research) project is to fully utilize the devices and data distributed across the universities. The greatest difficulty is building a smart, highly efficient, stable, and reliable CSCR platform; we implemented the MACM in this platform. We propose the concept of Service Availability to evaluate the performance of the MACM. Service Availability (SrvAvl) is the ratio of the agent's service time on nodes to the total time (service time plus migration time) of the mobile agent:
SrvAvl = Ts / (Ts + Tm).

Here, Ts is the service time on the nodes and Tm is the migration time of the mobile agent. From this equation we can conclude that decreasing the migration time efficiently increases the service availability while the service time stays stable. In our MACM, the migration time can be calculated by the following equation.
Tm = Σq (L / Bq) ≈ n · L / B,

where L is the size of the mobile agent, Bq is the transfer velocity of route section q between nodes, and n is the number of migrations; commonly we use an average transfer velocity B between nodes. The relation between the mobile agent's size, the number of migrations, and the service availability is shown in Fig. 1.
Fig. 1. Relation between agent size, number of migrations, and service availability. The red curve shows the smallest agent, of size M; the blue curve the middle agent, of size 2M; and the black curve the biggest agent, of size 3M.
As Fig. 1 shows, mobile agents of different sizes yield distinct service availabilities: the service availability of a smaller agent is higher than that of a bigger one. Therefore, decreasing the size of the mobile agent can greatly increase system performance.
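The shape of these curves follows directly from the two equations; a quick Python check with purely illustrative numbers (not taken from the figure) shows the same trend:

```python
def srv_avl(service_time, agent_size, migrations, bandwidth):
    """SrvAvl = Ts / (Ts + Tm), with Tm = migrations * size / bandwidth."""
    tm = migrations * agent_size / bandwidth
    return service_time / (service_time + tm)

# Illustrative values only: Ts = 100 s, B = 1 MB/s, 10 migrations.
for size_mb in (1, 2, 3):  # agent sizes M, 2M, 3M as in Fig. 1
    print(size_mb, round(srv_avl(100, size_mb, 10, 1.0), 3))
# Larger agents spend proportionally more time migrating, so SrvAvl drops.
```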
5 Conclusion

In this paper, we propose a virtual organization based mobile agent computation model to solve the problems of traditional mobile agent systems. This model integrates the advantages of strong-migration and weak-migration agents, so it provides a more intelligent and robust mobile agent computation model that avoids much of the frequent data transmission. With the aid of the virtual organization architecture, this computation model can effectively avoid the communication invalidation problem. However, the model demands high performance from the key nodes, and the average bandwidth of the kernel group greatly affects the system's capabilities.

Acknowledgement. This paper is supported by projects of the Zhejiang Provincial Natural Science Foundation of China (No. 602045 and No. 601110), and by an advanced research project sponsored by the China Defense Ministry and Education Ministry.
References
1. Tao, X.-p., Liu, J., et al.: Mobile agent: a kind of future distributed computation model. Computer Science, 1999, 26(2): 1-6.
2. Huang, L., Wu, Z., Pan, Y.: Virtual and Dynamic Hierarchical Architecture for E-Science Grid. International Journal of High Performance Computing Applications, 17(3), August 2003.
3. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications, 15(3), 2001.
4. Tao, X.-p., Feng, X.-y., Li, X., et al.: Communication mechanism in Mogent system. Journal of Software, 2000, 11(8): 1060-1065.
5. Murphy, A., Picco, G.P.: Reliable communication for highly mobile agents. In: Proceedings of Agent Systems and Architectures/Mobile Agents (ASA/MA)'99, CA, USA, 1999, pp. 141-150.
6. Gray, R.S.: Agent TCL: a transportable agent system. In: Proceedings of the International Conference on Information and Knowledge Management (CIKM'95), Workshop on Intelligent Information Agents, Dec. 1995.
7. Lange, D., Oshima, M.: Programming Mobile Agents in Java - with the Java Aglet API. http://www.cis.upenn.edu/~bcpierce/courses/629/papers/AgletsBook-index.html
Modeling Distributed Algorithm Using B
Shengrong Zou
Department of Computer Science and Technology, Yangzhou University, Yangzhou 225009, China
[email protected]
Abstract. Although there have been several attempts to create grid systems, there is no clear definition for grids. In this paper, a formal approach is presented for defining the elementary functionalities of distributed systems. We illustrate the use of a certain formal technique for developing distributed algorithms. This technique uses a so-called "event driven" approach together with the B-Method. The resulting general machines for distributed systems can serve as a framework for defining new systems or analyzing existing ones.
1 Introduction

The design and operation of large systems is becoming increasingly complex, and the interaction of cooperation and competition relationships leads to subtle and even paradoxical behaviours. Formal methods are therefore increasingly required in engineering practice. This is particularly true for performance evaluation, a natural starting point for the design and construction of large and complex systems. B [1], Z [2] and VDM [3] are formal methods based on the construction of models (as opposed to those based on an algebraic approach, like LARCH, OBJ and ASM [4]). B is the most recent of the three notations, and it is also a development method covering all the steps from specification to code. B is based on an explicit axiomatization of typed set theory. B contains a structuring mechanism (composition/decomposition), the abstract machine, which is transformed during development into a refinement and then an implementation. The development method is based on mathematical theories that are fully stated: the theory of generalised substitutions, the theory of refinement, and the theory of layered software architecture. The definition of the system dynamics is done not by means of pre- and post-conditions but by a generalisation of the notion of substitution, based on the theory of predicate transformers. The critical advantages of B, enabling its effective uptake and use within industry, are: the relatively simple and familiar notation (generalised substitutions) used to specify state transformations, whose uniform use from specification to code reduces both the cost of learning the notation and the possibility of semantic errors through translations;
constructs for supporting modularity in specification and implementation, allowing decomposition of the task of verification and specification into more feasible subtasks (the unusual nature of these constructs may be an initial problem for those familiar with other specification languages, but they represent no greater learning difficulty than the structuring facilities of Ada or C++); the existence of robust tool support, the B-toolkit, for all stages of the software development lifecycle, including animation and document production, a collection of facilities not currently offered for any other formal method; and the successful application of the method and language to large industrial systems in a range of technical areas: real-time, simulation, information processing, and engineering.

It is commonly accepted that, through the advent of high speed network technology, high performance applications and unconventional applications emerge by sharing geographically distributed resources in a well controlled, secure, and mutually fair way. Such a coordinated large scale virtual pool of resources requires an infrastructure called a grid. Although the motivations and goals for grids are obvious, there is no clear definition of a grid system. The grid is a framework for "flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources" [5]; "a single seamless computational environment in which cycles, communication, and data are shared, and in which the workstation across the continent is no less than one down the hall" [6]; "a wide area environment that transparently consists of workstations, personal computers, graphic rendering engines, supercomputers and non-traditional devices: e.g., TVs, toasters, etc." [7]; or "a collection of geographically separated resources (people, computers, instruments, databases) connected by a high speed network [and] a software layer, often called middleware, which transforms a collection of independent resources into a single, coherent, virtual machine". With varying degrees of precision these definitions describe the central notion of a grid system; yet they are unable to clearly distinguish between grid systems and conventional distributed systems. In this paper we give a model for a conventional distributed system using B.
2 One Model of a Distributed System

Here we introduce a distributed system that has a possibly large (but finite) number of agents. These agents are disposed on different sites that communicate with each other by means of unidirectional channels forming a ring [8][9]. Each agent is thus able to send messages to its "right" neighbour and receive them from its "left" neighbour. Such messages are not supposed to be transmitted immediately from one node to the next; in fact, we suppose that they can be "buffered" between the two, and also reordered or duplicated. Moreover, each agent is supposed to execute the same piece of code. The distributed execution of all these programs should result in a unique agent being "selected the leader". This decision, based on certain local criteria, should be made by the winning agent itself. Of course, it must be
proved that no other agent can reach the same conclusion. The determination of such a privileged agent might be useful when the ring is started or re-initiated.
Since every agent executes the same code, the problem seems to be unsolvable: what kind of distinction between the agents could introduce a difference in their otherwise homogeneous behaviour? Their position in the ring is certainly not such a distinction, since the very shape of the ring gives the position of an agent no special status (no first, no last, no middle position, etc.). In fact, the only attribute that makes one agent different from the others is its name: the agents are indeed supposedly named, and named differently. But by itself, this difference in names is still a homogeneous property: there is, a priori, no "more" distinction than the distinction itself.
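For reference, the kind of name-based election that such a development typically targets can be sketched as follows in Python; this is our own illustration in the style of the classic Chang-Roberts algorithm, with idealized channels (no loss, reordering, or duplication) and distinct names.

```python
def ring_leader_election(names):
    """Each agent injects its own name clockwise; a name is forwarded
    only past agents with smaller names, so exactly the maximal name
    returns to its originator, which then knows it is the leader."""
    n = len(names)
    in_transit = {i: [] for i in range(n)}
    for i in range(n):                       # round 0: send own name right
        in_transit[(i + 1) % n].append(names[i])
    while True:
        nxt = {i: [] for i in range(n)}
        for i in range(n):
            for name in in_transit[i]:
                if name == names[i]:
                    return names[i]          # own name survived the full ring
                if name > names[i]:
                    nxt[(i + 1) % n].append(name)  # forward larger, drop smaller
        in_transit = nxt

print(ring_leader_election([2, 5, 3]))  # -> 5
```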
3 Modeling a Distributed System Using B

We can now give some machines for a distributed system. The model presented here is a distributed multi-agent system [10][11][12] whose agents are processes. The Self function, represented here as p, allows an agent to identify itself among the other agents; it is interpreted differently by different agents. The following machines constitute a module, i.e. a single-agent program that is executed by each agent.

Machine 1: Map. The working cycle of a distributed system is based on the notion of a pool of computational nodes. Therefore, first of all, every process must be mapped to a node chosen from the pool. The other machines cannot work until the process is mapped.
Note the declarative style of the description: it is not specified how the appropriate node is selected; any of the nodes where the conditions are true can be chosen. The selection may be done by the user, prescribed in the program text, or left to a scheduler or a load-balancer layer, but at this level of abstraction that is irrelevant. Actually, the conditions listed here (login access and the presence of the binary code) are the absolute minimal conditions, and in a real application there may be others concerning the performance of the node, the actual load, the user's priority, and so on.

Machine 2: Resource grant. Once a process has been mapped and there are pending requests for resources, they can be satisfied if the requested resource is on the same node as the process. If a specific type of resource is required by the process, it is the responsibility of the programmer or user to find a mapping where the resource is local with respect to the process. Furthermore, if a user can log in to a node, she is authorized to use all resources belonging or attached to the node: those where BelongsTo(r, n) = true. Therefore, at this level of abstraction it is assumed, realistically, that resources are available or will be available within a limited time period. The model does not record whether a resource is shared or exclusive.
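The guard/action flavour of the Map and Resource-grant events can be conveyed by a small Python sketch; the predicate names echo the conditions described above (login access, presence of the binary, BelongsTo), and everything else is our own scaffolding rather than the B machine text.

```python
from dataclasses import dataclass, field

@dataclass
class Process:
    user: str
    node: str = None
    state: str = "unmapped"
    pending_requests: set = field(default_factory=set)
    granted: set = field(default_factory=set)

def map_process(p, pool, can_login, has_binary):
    """Map event: fire only if some node satisfies both guards;
    the choice among candidates is deliberately left nondeterministic."""
    candidates = [n for n in pool if can_login(p.user, n) and has_binary(p, n)]
    if not candidates:
        return False            # guards false: the event cannot fire yet
    p.node = candidates[0]      # any candidate would do
    p.state = "mapped"
    return True

def grant_resources(p, belongs_to):
    """Resource-grant event: satisfy pending requests local to p's node."""
    if p.state != "mapped":
        return False            # other machines wait until Map has fired
    for r in list(p.pending_requests):
        if belongs_to(r, p.node):
            p.pending_requests.discard(r)
            p.granted.add(r)
    return True
```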
Machine 3: State transition. If all the resource requests have been satisfied and there is no pending communication, the process can enter the running state.
The running state means that the process is performing the activities prescribed by the task. This model is aimed at formalizing the model of distributed execution, not the semantics of a given application.

Machine 4: Resource request. During the execution of the task, events can occur, represented by the external event function. The event in this machine represents the case when the process needs additional resources during its work. In this case the process enters the waiting state, and the request relation is raised for every resource in the reslist.
Machine 5: Send (communication). Processes of a distributed application interact with each other via message passing. Although modern programming environments offer higher level constructs, virtual object spaces, etc., and sophisticated message passing libraries like MPI provide a rich set of communication patterns for virtually any kind of data, at the low level they are all based on some form of send and receive primitives. This model restricts its scope to the blocking and nonblocking versions of message passing. In the following, the code fragments for the blocking versions are bracketed and are supplementary to the nonblocking code. Upon encountering a send instruction during the execution of the task, a new message is created with the appropriate sender and receiver information. If it is a blocking send and the communication partner p is not waiting for this message, the process goes to the waiting state and expects p to receive.
Machine 6: Receive (communication). Normally, receive procedures explicitly specify the source process of the expected messages. However, message passing systems must be able to handle indeterminacy, i.e. in some situations there is no way to specify the order in which messages are accepted. If the task reaches the receive instruction and there exists a message that can be accepted, the message is removed from the universe MESSAGE and the process resumes its work.
MESSAGE(msg) := false means that msg is no longer part of the MESSAGE universe. It is assumed that the content of the message is then in the possession of the recipient. The concept of a message is like a container: the information held by the sender is transformed into a message, and the message exists until the receiver extracts the information. The actual handling of the message (queued, buffered, or transmitted) is up to the lower levels of abstraction. If the expected message does not exist and the operation is a blocking call, the process goes into the receive waiting state and updates the expecting function.
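A rough Python rendering of the blocking send and receive behaviour described for Machines 5 and 6; the message universe is modelled as a plain set, and all names are illustrative rather than taken from the B machines.

```python
class Proc:
    def __init__(self, pid):
        self.pid, self.state, self.expecting = pid, "running", None

class Messaging:
    def __init__(self):
        self.universe = set()            # the MESSAGE universe

    def send(self, sender, receiver, content, blocking=False):
        msg = (sender.pid, receiver.pid, content)
        self.universe.add(msg)           # the message exists until extracted
        if blocking and receiver.expecting != sender.pid:
            sender.state = "waiting"     # blocking send: wait for the partner
        return msg

    def receive(self, proc, source=None, blocking=False):
        for msg in list(self.universe):
            s, r, content = msg
            if r == proc.pid and (source is None or s == source):
                self.universe.discard(msg)   # MESSAGE(msg) := false
                proc.state = "running"
                return content
        if blocking:                     # no acceptable message yet
            proc.state = "receive_waiting"
            proc.expecting = source
        return None
```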
Machine 7: Termination. This machine represents the event of termination. PROCESS(p) := false means that process p is removed from the universe PROCESS: it does not exist anymore.
4 Conclusions

The outcome of our analysis is a highly abstract declarative model. The model is declarative in the sense that it does not specify how to realize or decompose a given functionality, but rather what it must provide. Without any restriction on the actual implementation, if a certain distributed environment conforms to the definition, i.e. it provides the necessary functionalities, it can be termed a distributed system. In this paper the most elementary and inevitable services are defined. They form a minimal set: without them no application can be executed under the assumptions made for a distributed system, although a number of applications may also require additional services. Our model adopts an architectural, system developer's point of view. The resulting formal model can be applied in several ways. First, it enables checking or comparing existing systems to determine whether they provide the necessary functionalities. Furthermore, it can serve as a basis for the high level specification of a new system or of components, or for the modification of an existing one. Finally, the model is also useful in reasoning about the properties of grids [13].
References
1. Lano, K.: The B Language and Method. Springer (1996)
2. Wang, Y., Li, B., Pang, J., Zha, M., Zheng, G.: A Formal Software Development Approach Based on COOZ and Refinement Calculus. 31st International Conference on Technology of Object-Oriented Languages and Systems. IEEE Press (1999)
3. Satpathy, M., Snook, C., Harrison, R., Butler, M., Krause, P.: A Comparative Study of Formal and Informal Specification through an Industrial Case Study. Proc. IEEE/IFIP Workshop on Formal Specification of Computer Based Systems (2001)
4. Börger, E.: High Level System Design and Analysis Using Abstract State Machines (ASM). In: Hutter, D. (ed.): Current Trends in Applied Formal Methods (FM-Trends 98). Lecture Notes in Computer Science, Vol. 1641. Springer (1999) 1-43
5. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid. International Journal of Supercomputer Applications (2001)
6. Grimshaw, A.S., Wulf, W.A., French, J.C., Weaver, A.C., Reynolds, P.F.: Legion: The Next Logical Step Toward a Nationwide Virtual Computer. Technical Report No. CS-94-21 (1994)
7. Grimshaw, A.S., Wulf, W.A.: Legion - A View From 50,000 Feet. Proceedings of the Fifth IEEE International Symposium on High Performance Distributed Computing. IEEE Computer Society Press, Los Alamitos, California (1996)
8. Abrial, J.R.: Extending B Without Changing It (for Developing Distributed Systems). In: Habrias, H. (ed.): Conference on the B Method (1996)
9. Abrial, J.R., Mussat, L.: Introducing Dynamic Constraints in B. In: Bert, D. (ed.): B'98: Recent Advances in the Development and Uses of the B Method. LNCS Vol. 1393 (1998)
10. Czajkowski, K., Fitzgerald, S., Foster, I., Kesselman, C.: Grid Information Services for Distributed Resource Sharing. Proc. 10th IEEE International Symposium on High-Performance Distributed Computing (HPDC-10). IEEE Press (2001)
11. Foster, I., Kesselman, C.: The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers (1999)
12. Geist, A., Beguelin, A., Dongarra, J., Jiang, W., Manchek, R., Sunderam, V.: PVM: Parallel Virtual Machine - A User's Guide and Tutorial for Network Parallel Computing. MIT Press, Cambridge (1994)
13. Németh, Z., Sunderam, V.: A Formal Framework for Defining Grid Systems. 2nd IEEE/ACM International Symposium on Cluster Computing and the Grid (2002)
Multiple Viewpoints Based Ontology Integration Kai Zhang, Yunfa Hu, and Yu Wang
Department of Computing and Information Technology, Fudan University, Shanghai 200433, P.R. China
{011021381, yfhu, 011021395}@fudan.edu.cn
Abstract. Ontology integration is a focus of the ontology application field. An ontology can be viewed as a kind of software product, so ontology integration needs to be directed by methodology. In many applications, we need to integrate existing ontologies into a unified ontology to meet application requirements. Each ontology to be integrated can be viewed as a viewpoint of the unified ontology. We introduce a multiple-viewpoints-based ontology integration approach, drawing on the multiple viewpoints theory of requirements engineering. We define the ontology viewpoint through the characteristics of an ontology and use conceptual graphs to represent the semantics in an ontology. We discuss inconsistency checking within an ontology viewpoint and among ontology viewpoints. Finally, we use a concept lattice to construct the concept hierarchy.
1 Introduction

Recently, ontology has been used widely in many kinds of data integration applications as a powerful tool for knowledge sharing. Defined as "an explicit specification of a conceptualization" [1], an ontology unifies all the concepts and relations of a domain. In many practical applications, we often need to integrate several existing ontologies into a unified ontology. The target ontology (the integrated unified ontology; we call it the "target ontology" below) is commonly constructed for a special ontology application, so we should consider the application requirements in the ontology integration procedure. There is no explicit formal definition of ontology so far. In this paper, we define an ontology as a tuple O = (E, R, F, A, I), where E is the set of classes in the ontology, which are usually organized in taxonomies; R is the set of relations, which represent types of interaction between concepts of the domain; F is the set of functions; A is the set of axioms, which model sentences that are always true; and I is the set of instances, which represent specific elements. If we write O1, ..., On for the ontologies to be integrated, Ot for the target ontology, K for the knowledge used in ontology integration, and Σ for an ontology integration system, then the ontology integration procedure can be formalized as Ot = Σ(O1, ..., On, K).
In this paper, we bring the multiple viewpoints theory of requirements engineering to ontology integration. The aim is to study ontology integration from a methodological angle. In our work, we focus on inconsistency checking and the construction of the concept hierarchy during ontology integration, and we use conceptual graphs as the ontology knowledge representation tool. The remainder of the paper is as follows. In Section 2 we discuss in detail how to integrate ontologies based on multiple viewpoints theory; the emphasis is on inconsistency checking, the management strategy among ontologies, and concept hierarchy construction. In Section 3 we compare related work and draw some conclusions.
2 Ontology Integration Procedure

2.1 Ontology Viewpoint

Every ontology to be integrated can be viewed as one viewpoint of the target ontology. To distinguish it from the concept of viewpoint in requirements engineering, we call the viewpoint corresponding to an ontology an ontology viewpoint. We can formalize the ontology viewpoint as follows.

Definition 1 (ontology viewpoint). An ontology viewpoint is a tuple (P, C, R, A, V, M). P is the name of the ontology viewpoint. C is the set of concepts in P. Each element of R is a triple (c1, r, c2), where r is the relation between concepts c1 and c2. A is a set of first-order predicate formulas representing the constraints in P. V is a set of first-order predicate formulas representing the relations between P and other viewpoints. M is a set of first-order predicate formulas representing the relations between concepts in P and concepts in other viewpoints. In Definition 1, C and R form the conceptual graph of the viewpoint, while A is often used for inconsistency checking within the viewpoint; it corresponds to the axioms of the ontology.
2.2 Inconsistency Checking and Management Different ontology created by different organization. There must be semantic inconsistency. As in multiple viewpoints theory in requirements engineering, checking and managing inconsistency between viewpoints and inconsistency in viewpoint is also important for our research. For an ontology viewpoint, the axioms in ontology are the constraints of concepts and relations in ontology. So the axioms can be used for inconsistency checking in ontology viewpoint. To check inconsistency in ontology viewpoint, we have definition below. Definition 2 (inconsistency in ontology viewpoint). For an ontology viewpoint P, if then we call there exists inconsistency in P. Inconsistency among ontology viewpoints are mainly structure inconsistency. We define it below.
Definition 3 (structure inconsistency). If the structures of the ontology viewpoints conflict when compared with one another, this kind of inconsistency is called structure inconsistency.

To remove and manage inconsistency, we adjust the relations between concepts in an ontology viewpoint. Adjusting relations is based on the two kinds of operations below: 1) delete(r): delete the inconsistent relation r; 2) rewrite(r): delete the inconsistent relation r and at the same time create another relation to keep consistency.
2.3 Uniting Conceptual Graphs by a Concept Lattice

We need to derive the target ontology from all the ontology viewpoints. The basis of the target ontology is the conceptual graphs in the ontology viewpoints, but the target ontology does not unite the conceptual graphs arbitrarily: we need to generate the conceptual graphs of the target ontology based on the concept hierarchy over all the conceptual graphs. We take the set of conceptual graphs in all the ontology viewpoints as a formal context, with the aim of obtaining a concept lattice [2] that represents the concept hierarchy. In this formal context, all the classes in the ontology viewpoints are viewed as the set of objects, and all the attributes of these classes are viewed as the attributes of the objects. The aim of constructing the concept lattice is to rewrite the conceptual graphs; Fig. 1 shows an instance of rewriting conceptual graphs.
Fig. 1. Conceptual graph rewriting
In Fig. 1, concept A in graph 1 and concept B in graph 2 are both sub-concepts of C in the concept lattice. The common ground of A and B is that, in their own conceptual graphs, they have the same concepts and relations associated with them. We use C (the least upper bound of A and B in the concept lattice) to replace A and B in graphs 1 and 2, and at the same time create the relations between A, B, and C. The rewritten conceptual graphs remove redundancy and clarify the concept hierarchy. Although this yields the conceptual graphs of the target ontology, the work of ontology integration is not yet complete: the conceptual graphs still need to be evaluated and modified by human experts (adding validated association assertions). When this step completes, the ontology integration completes.
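The rewriting idea can be sketched in Python: build a formal context from class attributes, take the shared attributes of two classes as the generator of their common super-concept, and substitute the new concept in the graphs. The simplified lattice (attribute intersection only) and all data below are our own illustration.

```python
def least_upper_bound(context, a, b):
    """In FCA terms, the join of the object concepts of a and b is
    generated by the attributes the two classes share."""
    return frozenset(context[a] & context[b])

def rewrite_graphs(graphs, context, a, b, new_name="C"):
    """Replace occurrences of a and b by their common super-concept and
    record the sub-concept links (a -> C, b -> C)."""
    lub = least_upper_bound(context, a, b)
    rewritten = []
    for edges in graphs:  # each graph is a set of (source, relation, target)
        rewritten.append({
            (new_name if s in (a, b) else s, r, new_name if t in (a, b) else t)
            for (s, r, t) in edges
        })
    subsumption = {(a, "isa", new_name), (b, "isa", new_name)}
    return rewritten, subsumption, lub

# Tiny illustrative context: class -> attribute set (made-up data).
ctx = {"A": {"name", "id", "price"}, "B": {"name", "id", "weight"}}
graphs = [{("A", "sells", "Item")}, {("B", "ships", "Item")}]
print(rewrite_graphs(graphs, ctx, "A", "B"))
```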
3 Related Work and Conclusions

Several researchers have done much work in the ontology integration field; the representative work can be classified into two classes: syntactic-level integration and concept-level integration. Omelayenko [3] uses syntactic heuristic rules to carry out ontology integration in e-business. Stumme [4] introduces the FCA-MERGE approach, which is bottom-up: FCA-MERGE merges two ontologies by formal concept analysis. These approaches are mainly technological and seldom consider the methodology of ontology integration. In this paper, we view an ontology to be integrated as a viewpoint of the target ontology and apply the multiple viewpoints theory of requirements engineering to ontology integration. The aim is to contribute to a methodology of ontology integration. Our practice shows that this line of research is valuable.
References
1. Gruber, T.R.: A translation approach to portable ontology specifications. Knowledge Acquisition, Vol. 5, 1993.
2. Wille, R.: Concept lattices and conceptual knowledge systems. Computers and Mathematics with Applications, 23(6-9): 493-515, 1992.
3. Omelayenko, B.: Syntactic-Level Ontology Integration Rules for E-commerce. Proceedings of the 14th International FLAIRS Conference (FLAIRS-2001), May 2001.
4. Stumme, G., Maedche, A.: FCA-MERGE: Bottom-Up Merging of Ontologies. In IJCAI, pp. 225-234, 2001.
Automated Detection of Design Patterns
Zhixiang Zhang and Qinghua Li
Department of Computer Science, Huazhong University of Science and Technology, Wuhan, Hubei Province, China 430033
[email protected]
Abstract. Detecting instances of design patterns is useful for software maintenance. This paper proposes a new framework for the automated detection of instances of design patterns. The framework uses a reengineering tool to analyze C++ source code. Prolog is used to infer instances of design patterns, with the elemental design patterns used as intermediate results on the way to the final target (the design patterns). A two-phase query makes the discovery process more efficient.
1 Introduction

A pattern provides knowledge about the role of each class within the pattern and the reasons for certain relationships among the pattern constituents and/or the remaining parts of a system. Consequently, in maintenance, the identification of design pattern instances provides insight into the structure of a software artifact and reveals places where changes, reuse, or extensions are expected. Moreover, design patterns can also give managers some indication of the quality of the overall system. Design patterns are a relatively young field, and few works in program understanding and reverse engineering have addressed design pattern detection [1][2][3][4][5]. Most of them use only structural information about the system, or can recognize only a few patterns. This paper uses a new representation of the structural and behavioral features of design patterns, designs a system for the automated detection of design pattern instances, and discusses the related techniques.
2 System Framework

The automated system for detecting design pattern instances adopts three techniques:
1. Use Prolog to infer instances of design patterns. Prolog facts represent the structural information between the elements of a design pattern, and a design pattern is represented by one or several Prolog rules.
2. Use the elemental design patterns (EDPs) [6] as intermediate results on the way to the final target (the design patterns). These EDPs capture the elemental components of object-
oriented languages and the salient relationships used in the vast majority of software engineering. Because they include the calling information among classes and the methods of classes, EDPs are considered the most complete means of formalizing design patterns, and most patterns are composed of several EDPs.
3. A reengineering tool called the Columbus Schema for C++ is used to convert C++ source code into an intermediate representation of the classes and of the high level structural relationships between classes.

Fig. 1 shows the framework of the automated system for design pattern instance detection. The output of the Columbus Schema for C++ is converted into design description files. The automated detection of design patterns consists of three steps: 1. design Prolog rules based on the structural features of the design patterns; 2. convert the specific source code into structural information files, and convert these files into Prolog facts; 3. find instances of design patterns by Prolog queries, and present the results to users in a suitable form.
Fig. 1. The pattern detection framework
3 Implementation

3.1 Elemental Design Patterns

As the number of Prolog facts describing the structure of a large software system grows, the efficiency of searching may decline, so we need a certain level of abstraction over the structures of design patterns. Here we use the EDPs as intermediate search results. For example, Fig. 2 shows that the Decorator pattern is composed of the ObjectRecursion and ExtendMethod EDPs. If we can identify an ObjectRecursion EDP instance and an ExtendMethod EDP instance, we can obtain a Decorator pattern.
3.2 Facts and Rules

The structural relationships can be mapped into Prolog facts. Because identifiers in Prolog cannot contain the "." character, the representation of a method in a class must be converted; for example, "Decorator.Draw" is converted to "Decorator_Draw".
Fig. 2. The Decorator pattern composed of the ObjectRecursion and ExtendMethod EDPs
An instance of a design pattern or elemental design pattern is represented by a unique identifier and a set of participants. For example, pattern(factoryMethod, Creator, Product, CCreator, CProduct, FacMethod) represents a FactoryMethod pattern consisting of Creator, Product, CCreator, and CProduct, where FacMethod is the factory method. The rule corresponding to the ObjectRecursion EDP has the head edpPattern(objectRecursion, Handler, Recurser, Terminator, Initiator), whose body encodes the structural and calling relations among the four participants.
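To make the fact-generation step concrete, here is a rough Python sketch that turns structural records into Prolog facts, including the "." to "_" renaming; the record layout and predicate names are illustrative and are not the Columbus schema.

```python
def prolog_atom(name):
    """'Decorator.Draw' -> 'Decorator_Draw', quoted so that Prolog reads
    the capitalised name as an atom rather than a variable."""
    return "'" + name.replace(".", "_") + "'"

def emit_facts(relations):
    """relations: iterable of (predicate, arg1, arg2) structural records
    produced by the source-code analysis step."""
    for pred, a, b in relations:
        yield f"{pred}({prolog_atom(a)}, {prolog_atom(b)})."

# Hypothetical structural records extracted from C++ sources.
records = [
    ("inherits", "ConcreteDecorator", "Decorator"),
    ("calls", "Decorator.Draw", "Component.Draw"),
]
print("\n".join(emit_facts(records)))
```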
According to the relations among ObjectRecursion, ExtendMethod, and Decorator, we can redefine the rule that queries for the Decorator pattern in terms of the ObjectRecursion and ExtendMethod EDP instances found in the first phase.
4 Conclusion

This paper implemented an automated system for detecting design pattern instances. The Prolog rules for each pattern and EDP gather the features required to diagnose a pattern instance. As described above, the process consists of two phases. The benefits of two-phase recognition are: 1. EDPs themselves are the core primitives that underlie the construction of patterns in general, and they are useful for understanding the code; 2. as intermediate results (facts), EDPs can greatly reduce the cost of the Prolog queries, which significantly improves efficiency. Further work could study formalization methods for design patterns so as to describe them more precisely: the more precisely the design patterns are described, the higher the precision and recall that can be achieved.
References
1. Kramer, C., Prechelt, L.: Design Recovery by Automated Search for Structural Design Patterns in Object-Oriented Software. In: International Workshop on Program Comprehension, pp. 208-215.
2. Keller, R.K., Schauer, R., Robitaille, S., Pagé, P.: Pattern-Based Reverse-Engineering of Design Components. In: Proceedings of the International Conference on Software Engineering (ICSE'99), Los Angeles, USA, May 1999.
3. Antoniol, G., Casazza, G.: Object-oriented design patterns recovery. The Journal of Systems and Software 59 (2001) 181-196.
4. Seemann, J., Wolff von Gudenberg, J.: Pattern-based design recovery of Java software. ACM SIGSOFT Software Engineering Notes, 23(6), pp. 10-16, Nov. 1998.
5. Antoniol, G., Casazza, G., Di Penta, M., Fiutem, R.: Object-Oriented Design Patterns Recovery. Journal of Systems and Software 59, pp. 181-196 (2001).
6. Smith, J.McC., Stotts, D.: Elemental Design Patterns: A Logical Inference System and Theorem Prover Support for Flexible Discovery of Design Patterns. Technical Report TR02-038, Department of Computer Science, Univ. of North Carolina at Chapel Hill, Sep. 2002.
Research on the Financial Information Grid
Jiyue Wen 1,2 and Guiran Chang 1
1 College of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110004, China
[email protected]
2 No. 208 Yanan Three Road, Qingdao, Shandong 266071, China
[email protected]
Abstract. An information grid uses grid technologies to achieve the sharing and management of information resources and the provision of information services. In this article, the financial industry's demand for grid technology is analyzed and the architecture of a financial information grid is proposed.
1 Introduction

The financial industry can use grid technology to integrate financial information, to operate in a multi-channel manner, to strengthen supervision and management, to guard against and resolve financial risks, and to improve the efficiency and quality of financial services. Section 2 of this article analyzes the financial industry's demand for grid technology. Section 3 discusses the architectural idea of a financial information grid, considering the heterogeneity in the financial industry. Section 4 presents an implementation proposal and plan based on the real conditions of the financial industry today.
2 The Demand for Grid Technology by the Financial Industry

The financial industry's information needs come from inside and outside. The inside information needs are closely related to the daily operations and management of financial businesses; the contents include credit loan information, deposit information, settlement information, savings information, international business information, trust business information, and leasing business information. The outside information needs are related to the integrated management and decision-making of financial businesses; the contents include macroeconomic information, financial policy information, industrial policy information, information on domestic and foreign financial organizations, financial risk prevention information, and other financial computerization information. To meet the demand for and management of these two kinds of financial information, banks have constructed their own intranets. However, these networks run separately on different architectures and operating systems. Between the central
bank and the commercial banks there is no interconnection: the central bank receives the data reports of the commercial banks passively, so the off-site data are not timely, authenticated, or reliable. This leads to after-the-fact supervision by the central bank and after-the-fact monitoring by the commercial banks. The traditional model of financial information statistics and audit, with its on-site and off-site checking, thus fails to meet the requirements for monitoring the operational legality and risk prevention of commercial banks. The central bank needs a new collateral supervision system allowing it to log on to the computer systems of the commercial banks and actively collect the latest original operation data. In this way, the central bank can learn the general operations of the various banks timely and correctly, can readily find those banks' operational lapses, and can make objective and feasible financial policies that guide the financial industry to operate legally. One solution is to use grid technology.
3 Design Idea of the Architecture of a Financial Information Grid

Globus is a platform for the construction of grid infrastructure; it serves as a grid operating system and undertakes the task of administering grid resources. Owing to the complexity of commercial bank systems, the customization of grid functions becomes exceptionally difficult and complicated. To solve this problem, a Financial Grid Middleware (FGM) can be built between Globus and the application programs, providing a distributed heterogeneous computing environment that helps programs move between commercial banks. FGM is built as a middleware connecting both the application programs and the grid infrastructure, much like Cactus. FGM is not created on top of Globus but uses Globus as a branch of the main body of the financial network system, so that application programs with MPI (Message Passing Interface) can run on the grid unmodified. This is shown in Fig. 1.

The first layer of this architecture is the fabric layer, whose basic function is to control local resources and to provide the upper layers with interfaces for accessing these resources, which broadly include compute resources, memory resources, network resources, data and information resources, etc. It can be a host computer, a preprocessing computer, or the whole computer cluster of a certain commercial bank. The second layer is the Globus layer. Its grid service functions serve as a grid operating system to solve problems such as resource discovery, validation, reservation, and the management of memory, communication, and safety. It allocates resources through DUROC (Dynamically Updated Request Online Co-allocator) and provides a parallel programming interface for the heterogeneous grid environment via MPICH-G2. The third layer is FGM, comprising the financial network trunk and branches. The trunk is the basis of FGM; it provides a group of APIs through which the branches can be linked dynamically in a plug-and-play manner, and it coordinates and controls data transfer and program execution between branches. A branch here is a set of software modules or subprograms written in C, C++, Fortran, or Java for abstract virtual computers, without considering the complicated grid environment. It can be easily
Fig. 1. The architecture of a financial information grid. FA: Financial Application, AP: Application Program, L-DC: Long-Distance Control, WI: Web Interface, I/O PK: I/O Package, FH-C: Financial Host-Computer.
ported into the existing application programs of the financial institutions. The branches can be divided into two types: application branches and grid branches. The application branches can be used for account checking, information indexing, statistics, supervision criterion functions, etc. The grid branches provide functions for parallel computation, I/O, the Web interface, high-performance communication, and data mapping, supporting the applications. The fourth layer is the application layer, on which users operate directly. The users can be counter clerks, post-hoc monitors, statisticians, policy decision-makers, or on-site and off-site auditors. They can get any results obtained from inquiry, checking, or calculation simply by telling the terminal what they want: the data for making statistics, the items for checking, the supervision index to calculate, or the precision and range of answers. They need not consider the heterogeneous environment, the dynamic coordination of resources, or the optimization algorithms used.
4 Suggestions for the Implementation of a Financial Information Grid

Considering the complexity of the financial institutions, the heterogeneity of the network systems, the information needs of the financial industry, the incomplete support of current technology, and the future goal, the application of grid technology can start from auditing, reporting, and supervision. The development should not affect the normal operation of the commercial banks and should be carried out step by step. Under present networking conditions, the central bank can set up a financial grid data processing center to optimize system performance, improve the application
environment, administer the equipment resources and information resources of all the banks, and provide timely and accurate financial information for statistics, monitoring, and auditing. The collateral supervising system makes it possible for specialized banks to monitor the operation flow of their front offices, to eliminate operational holes, and to avoid man-made risks. The second step is to set up broadband networks. Broadband network systems are essential for the grid environment to provide high performance communication: a high quality broadband network supports "connect and play" of computing capacity and information gathering, and provides users with low-delay, highly reliable communication services. The grid may take advantage of the banks' current automatic banking systems and of the central bank's satellite-terrestrial communication system and transfer system, increasing the bandwidth of these systems, improving their communication ability, and ensuring the interoperability of grid resources. The third step is to set up a Financial Certification Authority. The security of grid applications depends on digital identity certification. The grid issues an X.509 certificate to each of the statisticians, supervisors, and auditors, whose digital signature is required whenever they request to log on; once an applicant logs on, he or she can access all the authorized resources. The leading bank may be the People's Bank of China, which organizes the commercial banks to establish a national authorized financial certificate agency, the China Financial Certification Authority (CFCA), to be responsible for digital identification on the financial grid. The ITU X.509 V3 standard is used and the international specification for grid data signatures is followed. The grid management software is the key element of financial information grid service; its core techniques include an integrated information platform (single system image), the semantic web, intelligent agents, and ontology. The grid operating system can be applied to the preprocessing computers with their heterogeneous financial information report systems, and the application programs upgraded to grid programs. Features such as financial information statistics, supervision, and real-time sharing can thus be realized.
RCACM: Role-Based Context-Awareness Coordination Model for Mobile Agent Applications*
Xinhuai Tang, Yaying Zhang, and Jinyuan You
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
{tang-xh, zhang-yy, you-jy}@cs.sjtu.edu.cn
Abstract. In this paper, we present the RCACM coordination model for mobile computing applications based on mobile agents. The key idea in RCACM is a role-based, context-aware hierarchical coordination model. In this model, a local programmable reactive tuple space is introduced to address context-aware coordination problems, and hierarchical distributed tuples let agents dynamically acquire information about resource location and availability according to their permissions; a role mechanism is adopted for access control to prevent unauthorized access.
1 Introduction

In mobile computing systems, an application may be composed of several mobile agents that cooperatively perform a task [1]. Multiple mobile agents need to coordinate their activities with each other and to access resources in their hosting execution environments. Furthermore, when an agent transfers to a new environment, the interaction information it accesses and the outside world it perceives may have changed: an agent's execution result on one site may differ from its result on another site because of the different execution environments. The migration of mobile agents thus introduces context-aware coordination issues [2]. Generally, coordination technologies are concerned with enabling interaction among agents and helping them cooperate with each other [3]. However, access control should also be considered, to constrain interaction and ensure data privacy and integrity, especially when agent mobility is introduced. At present, the combination of coordination and access control remains an open problem in the design and implementation of mobile agent applications. This paper presents a role-based context-aware coordination model (RCACM) for mobile agent applications. We focus on context-aware secure coordination, that is, the coordination of activities with awareness of the context changes due to agent mobility and with assurance of data integrity.
* This paper is supported by the Shanghai Science and Technology Development Foundation under Grant No. 03DZ15027.
2 Role-Based Context-Aware Coordination

2.1 Overview

In order to overcome the weaknesses of the Linda tuple space while preserving its advantages, we rebuild the tuple space from a passive data store into a reactive, programmable coordination medium that embodies computing capabilities within the tuple space. On this basis we propose the role-based context-aware coordination model (RCACM) for mobile agent applications, which extends the classical Linda model. There is no global tuple space in the mobile agent system; instead, multiple distributed tuple spaces are used for agent coordination. The model transfers globally coupled interactions into locally uncoupled interactions: agent interactions take place in the local execution environment at the destination site of a migrating agent. The local space can be accessed in an asynchronous and anonymous way, as in Linda. The interaction tuple space can therefore act as a gateway for agents to access resources in the network. RCACM supports a coordination paradigm in which agents migrate from one computing environment to another to interact with each other. The architecture of role-based context-dependent coordination is shown in Fig. 1.
Fig. 1. Architecture of role-based context-aware coordination
In RCACM, without a global shared tuple space, multiple tuple spaces are distributed over the network to assist the coordination task. The reactive programmable tuple space is responsible for handling coordination activities, including environment-aware and application-aware coordination.
2.2 Role Mechanism and Reactions in Tuple Space

Coordination and access control are two strictly related topics in open distributed applications. One can easily imagine malicious agents attempting to access private information or modify private data, so a server receiving interactions from an external agent needs to impose some requirements to ensure that there is no violation of the security concerns of the host
server. Similarly, the external agent needs to ensure that its execution at the host server's site will not compromise its own integrity or security. A role can be defined as the behavior and the set of capabilities expected of the agent that plays the role. An agent may hold multiple roles, and every role has its corresponding capabilities. We use {read, take, write, execute, new, ...} to denote the set of agent operation abilities; each element indicates the capability to execute the corresponding operation. Assigning roles to an agent means associating with it capabilities that describe all the operations the agent intends to perform, while ignoring the specific location at which it will execute. In RCACM, the tuple space is programmable and reactive: the site manager at every site in the mobile agent system can implement and enforce application-specific and local environment-specific policies by programming the behaviors of the tuple space, and operation behaviors can be associated with specific events.
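The paper implements the tuple space in Java; the following minimal Python sketch conveys the same idea, with roles as capability sets and policy reactions keyed on operation events. All class and method names are illustrative.

```python
class ReactiveTupleSpace:
    def __init__(self):
        self.tuples = []
        self.reactions = {}          # event name ("write", "take", ...) -> [callback]

    def on(self, event, callback):
        """Site manager programs a policy reaction for an operation event."""
        self.reactions.setdefault(event, []).append(callback)

    def _check(self, agent, op):
        # a role is a set of capabilities; the agent may hold several roles
        if not any(op in role for role in agent["roles"]):
            raise PermissionError(f"{agent['name']} lacks capability '{op}'")

    def write(self, agent, tup):
        self._check(agent, "write")
        self.tuples.append(tup)
        for react in self.reactions.get("write", []):
            react(agent, tup)        # context-aware policy runs inside the space

    def take(self, agent, pattern):
        self._check(agent, "take")
        for tup in self.tuples:
            if all(p is None or p == v for p, v in zip(pattern, tup)):
                self.tuples.remove(tup)
                return tup
        return None                  # nonblocking miss

# Usage: a role granting read/write but not take.
worker = {"name": "agent-1", "roles": [{"read", "write"}]}
space = ReactiveTupleSpace()
space.on("write", lambda a, t: print("audit:", a["name"], t))
space.write(worker, ("result", 42))
```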
2.3 Environment-Aware and Application-Aware Coordination

In mobile agent applications there are always both agent-related, application-specific policies and environment-specific policies. When a site in the network opens itself to external agents for execution, it must prevent malicious agents from damaging its data and resources. On the other side, when an agent migrates to a new site, it cannot predict the policy of that site [4]. If the mobile agent had to handle all unexpected policy-related problems itself, this would make it far more complex and make the system difficult to scale. With a programmable reactive tuple space, environment-awareness policies such as security policies can instead be integrated into the behavior of the tuple space, and mobile agents come under the restrictions of the new site transparently. In RCACM, when an agent arrives at a site, it is bound to the tuple space of the destination site and can use it to coordinate with other agents and to access local resources. The local tuple space is implemented as Java objects, and every tuple is implemented as a Java object. To define the environment infrastructure, an administrator has to choose the roles the environment supports and define the application and environment policies that coordinate activities via the tuple space.
3 Conclusion

Mobile agent systems have demonstrated great potential in designing and implementing complex distributed and concurrent software systems, and they are involved in many applications such as e-commerce, remote information retrieval, remote diagnostic clinics, and military war simulation. We propose a role-based context-awareness coordination
model, which is suitable for interactions between agents and between agents and the environment in mobile agent systems. The model consists of three parts: (1) a role mechanism for security concerns; (2) a globally coupled interaction space changed into locally uncoupled interaction spaces to facilitate information access for mobile agents; (3) a programmable reactive tuple space used to solve the problems introduced by context-aware coordination. Environment-awareness and application-awareness coordination policies can be integrated into the tuple space.
References
1. Picco, G.P.: Mobile Agents: An Introduction. Journal of Microprocessors and Microsystems, 25(2) (2001) 65-74.
2. Cabri, G., Leonardi, L.: Engineering Mobile Agent Applications via Context-Dependent Coordination. IEEE Transactions on Software Engineering 28(11) (2002) 1040-1056.
3. Cremonini, M., Omicini, A., Zambonelli, F.: Coordination and Access Control in Open Distributed Agent Systems: The TuCSoN Approach. Proceedings of the 4th International Conference on Coordination Languages and Models (COORDINATION 2000), LNCS 1906, Springer, Limassol, Cyprus, 2000, pp. 99-114.
4. Rossi, D., Cabri, G., Denti, E.: Tuple-based technologies for coordination. In: Omicini, A., Zambonelli, F., Klusch, M., Tolksdorf, R. (eds.): Coordination of Internet Agents, Springer (2001) 83-109.
A Model for Locating Services in Grid Environment Erfan Shang, Zhihui Du, and Mei Chen Department of Computer Science and Technology, Tsinghua University, Beijing, 100084 [email protected]
Abstract. A model for locating services in a grid is described here. It integrates with the Grid framework OGSA (Open Grid Services Architecture) [1], using the VO (Virtual Organization) [2] concept to divide logical grid services into different organizations according to their purposes and their requirements on resource sharing and service provision. A CARP hash-based information caching mechanism and a hierarchical message dissemination algorithm are presented, and the performance of the algorithm is analyzed theoretically.
1 Introduction
We present a hierarchical message dissemination algorithm and integrate web cache sharing techniques with our service location mechanism. The establishment of each virtual organization has its purpose and its requirements on resource sharing and service provision. In general, a VO domain is a collection of services with logically close localities and similar attributes; we characterize the collection's similarities by describing the VO's properties. The relation among the ubiquitous virtual organizations in the grid system is flat. We establish a distributed grid service information model and analyze the performance of the information propagation algorithm theoretically. Our purpose is to combine the CARP [3] protocol with the service location mechanism. One may associate the ICP [4] protocol with the Gossip protocol [5], a typical multicast group communication protocol; the difference between them is that ICP is a centralized Gossip-like protocol that disseminates messages only among proxies. CARP is a hash-based routing mechanism that yields a deterministic location for all cached information. This mechanism is similar to Plaxton-based [6] distributed systems, including Tapestry [7], Pastry [8], etc.
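To make the deterministic-location property concrete, here is a minimal Java sketch of CARP-style hash routing: every requester independently scores each proxy against the key (e.g., a GSID) and picks the highest score, so all requesters agree on the cache location without exchanging any lookup messages. The hash combination shown is a generic stand-in, not the exact CARP formula.

import java.util.List;

class HashRouter {
    static String selectProxy(String key, List<String> proxies) {
        String best = null;
        int bestScore = Integer.MIN_VALUE;
        for (String p : proxies) {                       // score every proxy against the key
            int score = combine(key.hashCode(), p.hashCode());
            if (score > bestScore) { bestScore = score; best = p; }
        }
        return best;                                     // deterministic for a given proxy set
    }

    private static int combine(int keyHash, int proxyHash) {
        int h = keyHash ^ proxyHash;                     // mix key and proxy hashes
        h *= 0x9E3779B1;                                 // scramble (Fibonacci hashing constant)
        return h ^ (h >>> 16);
    }
}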
2 Grid Service Locating Model
The model uses Virtual Organizations to divide grid services into different organizations logically. Each node knows partial service information of its own virtual organization and a little service information of other organizations. Each kind of service is assigned a Grid
Service Identification (GSID). This ID is categorized by a standard service taxonomy, so that similar service attributes map to close service taxonomy code numbers. There are two types of nodes in a VO: general nodes and VO servers. Servers are nodes that can serve stably and reliably over long periods, and every node knows the locations of the servers of its own VO. The model adopts a two-level hierarchical framework: the first level is the proxy server based on the CARP protocol, and the second level is a hierarchical Gossip mechanism. Servers therefore act as CARP proxy servers: each collects information of its own organization to maintain the characteristics of its VO, and exchanges VO information with other VO servers. A VO server is transparent to other VOs' general nodes. A VO server maintains two kinds of information: 1) service locations and descriptions cached by means of CARP, and 2) other VOs' properties and the locations of other VO servers. There are three types of information operations in our message protocol: 1) general nodes proactively send local GSIDs and request counts to their VO servers to maintain domain properties, and servers synchronize this data among themselves periodically; 2) information publication: general nodes propagate the latest service information to other general nodes by the Gossip protocol and to servers based on CARP; 3) information request: this is described in Sect. 3.1.
3 System Architecture
3.1 Message Propagation Algorithm
The model is a two-level hierarchical framework. The core of the algorithm is composed of CARP routing and a hierarchical (between VOs and within a VO) Gossip message protocol. The request message propagation process is as follows: 1) a service request including a GSID is sent to the service's default VO server; 2) if qualified service information is found, go to 6); 3) if there is no qualified cached information, the server searches its local information and forwards the request to other VOs' servers by the Gossip mechanism; 4) the request disseminates among organizations with a TTL (Time-to-Live); if the query hits qualified information, go to 6); 5) if time runs out with no hit, go to 6); 6) if qualified information is found, the server returns the reservation handle to the client; otherwise a "no qualified information found" message is returned to the requester. The propagation process then stops. A sketch of this process appears below.
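A minimal Java sketch of this six-step process at a VO server, assuming illustrative types (Cache, ServiceInfo) since the paper fixes no API; peer-to-peer messages are modeled as direct method calls, and a real implementation would gossip to a random subset of peers rather than to all of them.

import java.util.List;

class ServiceInfo { String gsid, url, description; }

interface Cache { ServiceInfo lookup(String gsid); }

class VoServer {
    private final Cache localCache;      // CARP-placed cache entries (bottom layer)
    private final Cache localInfo;       // this VO's own service information
    private final List<VoServer> peers;  // other VO servers reached by gossip

    VoServer(Cache localCache, Cache localInfo, List<VoServer> peers) {
        this.localCache = localCache; this.localInfo = localInfo; this.peers = peers;
    }

    // Steps 1-6 of the request propagation process, with gossip bounded by a TTL.
    ServiceInfo handleRequest(String gsid, int ttl) {
        ServiceInfo hit = localCache.lookup(gsid);       // step 2: qualified cached info?
        if (hit != null) return hit;                     // step 6: return reservation handle
        hit = localInfo.lookup(gsid);                    // step 3: search local information
        if (hit != null) return hit;
        if (ttl <= 0) return null;                       // step 5: time out, no hit
        for (VoServer peer : peers)                      // steps 3-4: forward by gossip
            if ((hit = peer.handleRequest(gsid, ttl - 1)) != null) return hit;
        return null;                                     // "no qualified information found"
    }
}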
3.2 VO Server Information Architecture
The VO server acts as the VO's local representative and provides caching storage, so its information architecture is an important issue for efficient routing and service location. It contains three layers. The bottom layer, called the local cache, maintains a service information table including GSID, URL, and service description items, populated according to CARP routing. The second layer is the global cache: servers share cache contents and coordinate replacement so that they appear as one unified cache with global LRU replacement to the users [9]. The top layer is the neighbor VOs' information: it contains the rate of service requests served by the local VO and the GSID range of this rate.
4 The Theoretic Analysis of the Model
To simplify the model, some assumptions are made. N denotes the number of grid nodes, and every virtual organization has an equal number M of nodes. We assume that all nodes share the same number of services and have the same service frequency, and that distinct user requests are randomly distributed. According to [10], if each node gossips exactly k = log N + C messages (N is the grid size and C is a constant), then the probability that everyone gets the message goes to exp(-e^(-C)). For a grid size of 10^7 and C = 0, every node multicasts log N = 16.118 messages to its neighbors, which gives full coverage with probability about 36.8%. Table 1 presents the per-node load of message dissemination. The fanout parameter EX is the expectation of each node's message dissemination count under domain management and the hierarchical Gossip routing framework.
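As a worked check of these figures (with log taken as the natural logarithm, and N = 10^7 recovered from log N = 16.118):

\[
k = \ln N + C = \ln 10^{7} \approx 16.118, \qquad
\Pr[\text{full coverage}] \to \exp(-e^{-C}) = e^{-1} \approx 0.368 \quad (C = 0).
\]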
It can be seen that the expected per-node load is about 62% or less of that of the normal Gossip mechanism when the VO size varies from 1000 to 20000. The larger the VO size M, the heavier the load on each grid node.
The cache sharing mechanism is integrated as follows. Suppose there is only one server per VO and the average cache hit ratio is H. According to recent research [9], the hit ratio H is about 30% in distributed systems, which means that the expected number of messages disseminated per node is reduced by roughly a factor of (1 - H). We also analyze the average number of hops per request in our model. The hierarchical Gossip algorithm divides participant nodes into domains so as to better control the hop count of a request, and the CARP protocol resolves a request in a single hop with high probability, decreasing the hop count to (1 - hit ratio) * (hop number).
5 Related Work, Conclusion, and Future Work Universal Description, Discovery and Integration (UDDI) [12] is a specification for distributed Web-based information registries of Web services. UDDI is actually a
central registry; the information synchronization pattern in our VO server will refer to UDDI. Existing peer-to-peer substrates such as Tapestry [7] and Pastry [8] implement distributed hash tables, and the CARP protocol is a similar hash-based routing mechanism. The difference is that in Tapestry or Pastry all nodes in the system participate in information storage, while in CARP the proxies maintain the information and the other nodes only provide services. The Monitoring and Discovery Service (MDS) [13] is a prominent grid information service; it proposes the virtual organization as the basic logical information management unit, but its drawback is that MDS is also a central registry. We propose to integrate the web cache sharing technique with domain service management (virtual organizations) to improve the efficiency of locating services. More attention should be paid to the standardization of service description and identification in grid environments, so that services with similar attributes correspond to close service taxonomy numbers, giving a reference to service providers. We will build an emulated grid to evaluate whether our model and mechanism are appropriate in terms of response time, response quality, and scalability, and will consider more parameters in our experimental environment, such as the logical distribution of services, distinct user request patterns, node number, and service number.
References
1. Foster, I., Kesselman, C., Nick, J. M., Tuecke, S.: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Technical report, Argonne National Laboratory (2002)
2. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of Supercomputer Applications (2001)
3. White Paper: Cache Array Routing Protocol and Microsoft Proxy Server 2.0 (1997)
4. ICP Working Group. National Lab for Applied Network Research
5. Kempe, D., Kleinberg, J., Demers, A.: Spatial Gossip and Resource Location Protocols. Proc. 33rd ACM Symp. on Theory of Computing (2001) 163-172
6. Plaxton, C. G., Rajaraman, R., Richa, A. W.: Accessing Nearby Copies of Replicated Objects in a Distributed Environment. In: Proceedings of ACM SPAA, ACM, June (1997)
7. Zhao, B. Y., Kubiatowicz, J. D., Joseph, A. D.: Tapestry: An Infrastructure for Fault-resilient Wide-area Location and Routing. Technical Report UCB//CSD-01-1141, U. C. Berkeley, April (2001)
8. Druschel, P., Rowstron, A.: Pastry: Scalable, Distributed Object Location and Routing for Large-scale Peer-to-peer Systems. Submission to ACM SIGCOMM (2001)
9. Fan, L., Cao, P., Almeida, J., et al.: Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol. Tech. Rep. 1361, February (1998)
10. Kermarrec, A., Massoulié, L., Ganesh, A.: Reliable Probabilistic Communication in Large-scale Information Dissemination Systems. MSR-TR-2000-105
11. Iamnitchi, A., Ripeanu, M., Foster, I.: Locating Data in (Small-World?) Peer-to-Peer Scientific Collaborations. In: 1st International Workshop on Peer-to-Peer Systems (2002)
12. UDDI project. http://www.uddi.org
13. Fitzgerald, K., Foster, I., Kesselman, C.: Grid Information Services for Distributed Resource Sharing. In: High Performance Distributed Computing, IEEE Press (2001) 181-184
A Grid Service Based Model of Virtual Experiment* Liping Shen, Yonggang Fu, Ruimin Shen, and Minglu Li Department of Computer Science & Engineering, Shanghai Jiaotong Univ., HuaShan Rd. 1954#, Shanghai, 200030, China {lpshen,fyg,rmshen,[email protected]}
Abstract. There is increasing recognition of the need for laboratory experience: it is through such experiences that students deepen their understanding of conceptual material, especially in science and engineering courses. Virtual Experiments have advantages over physical laboratories in many respects. Nowadays virtual experiments are mostly stand-alone applications without standard interfaces, which makes them difficult to reuse. In this paper we propose a virtual experiment model based on the novel grid service technology. We employ two-layered virtual experiment services to provide a cheap and efficient distributed virtual experiment solution. This model can reuse not only virtual instruments but also composite virtual experiments.
1 Introduction
A Virtual Experiment (VE) is a powerful software system that can provide students with a highly immersive and rich experience. It has many advantages over a physical laboratory. It is a cost-effective way to leverage expensive equipment and maintain a physical laboratory, and it provides concurrent on-line instruction, visualization, repeated practice, and feedback, breaking geographical, lab-space, and time constraints. It can also provide experiments that cannot really be done in a physical lab, e.g. the simulation of a nuclear power plant. Finally, it enables convenient and economic reuse of expensive and specialized instruments through remote control, and enables cooperative experimentation and research. Early VE efforts, including the Virtual Physics Laboratory [5] at the University of Oregon, Control the Nuclear Power Plant [6] at Linkopings University in Sweden, and The Interactive Frog Dissection [7] at the University of Virginia, share common drawbacks: the components of a VE are difficult to reuse, and the technology used is typically beyond an average educator. There is therefore an urgent need for an intelligent mechanism that lets teachers design a VE without unnecessary effort. The outline of this paper is as follows. Section 2 sets forth the layered structure of VE Services, which is based on grid services and the Globus Toolkit 3. The model of the VE grid employing VE Services is introduced in Section 3, and Section 4 concludes the paper.
* This paper is supported by 973 project (No.2002CB312002).
2 Virtual Experiment Services
Our proposed VE architecture is based on the widely acknowledged middleware product, the newly released version of Globus (GT3) [2]. Fig. 1 depicts the layered architecture of VE Services, which are organized in two hierarchical levels: the core VE Services layer and the high-level VE Services layer.
2.1 Core VE Services Layer
This layer employs basic grid services to provide data and resource management. For the VE Services, data are the input/output data of the VE, while resources include Virtual Instruments (VI), analysis & visualization tools, constraint computing tools, and stored VE processes, besides generic grid resources such as CPU, memory, and databases. VIs are the main components of a VE, while constraints denote the VE principles, the expressions holding the VIs together. The Core VE Services layer comprises three main services. VE Directory Service (VEDS). VEDS extends the basic Globus Monitoring and Discovery Service; it is responsible for maintaining a description of all resources used in the VE grid and for responding to queries about available resources. The metadata is presented as XML documents and stored in a VE Metadata Repository (VEMR). Another important repository is the VE Knowledge Repository (VEKR), which stores and provides access to VE processes performed within the VE grid. It warehouses the VEs' process information (past experience) and allows this knowledge to be reused: once users have constructed successful VE processes they wish to be reused, they can publish them as new services. To enable this function, we need a uniform description of a VE. The information needed here includes the organization of the resources, the resources' metadata descriptions, the steps of the process, and the experiment principles (constraints).
Fig. 1. Layered Architecture of VE Services
Resource Allocation Service (RAS). This service is used to find the best mapping between a VE design and the available resources, with the goal of satisfying the application requirements (network bandwidth, latency, computing power, and storage) and grid constraints. RAS is directly based on the Globus Resource Allocation Manager services. The location where each service of a VE is executed may have a strong impact on the overall performance of the VE. When dealing with very large data, it is more efficient to keep as much of the computation as near to the data as possible [4]. To create a reasonably responsive virtual experiment, a compromise has to be made to balance the requirements of communication and computation [3]. A simple allocation algorithm leveraging the above considerations is used to determine the "best" resources as follows (a sketch is given at the end of this subsection):
1. CompuTime = Typical Execution Time stored in VEMR;
2. CommuTime = inLatency + inData/inBandwidth + outLatency + outData/outBandwidth;
3. Coex = 1.2;
4. ExecTime = CompuTime + CommuTime * Coex;
5. Rank = 1/ExecTime;
Line 1 gives the computation time, which is estimated as the Typical Execution Time stored in the VEMR. Line 2 computes the time needed to transfer the input/output data, where inLatency/outLatency is the network latency of the input/output channel, inData/outData is the amount of input/output data measured in bits, and inBandwidth/outBandwidth is the bandwidth of the input/output channel. Lines 3 and 4 give the value of ExecTime, where we give more weight to CommuTime because communication time tends to grow under congestion. Finally, Rank is the reciprocal of ExecTime and is the basis for selection. Data Management Services (DMS). The DMS is responsible for the search, collection, extraction, transformation, and delivery of the data required or produced by the VIs, analysis & visualization tools, and constraint computing tools. Data produced by a remote service may be stored at the same host where the service executed, collected at a central database, or transferred to the next service directly; this information is managed by the DMS. The DMS is based on the Globus GridFTP and Replica Location services. Its goal is to realize a single, large, virtual warehouse of a VE's data: it deploys a data grid for a VE.
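A direct Java transcription of the five-line ranking above, with illustrative parameter names; data amounts are in bits and bandwidths in bits per unit time.

class ResourceRank {
    static double rank(double typicalExecTime,                       // from VEMR (line 1)
                       double inLatency, double inBits, double inBandwidth,
                       double outLatency, double outBits, double outBandwidth) {
        double compuTime = typicalExecTime;                          // line 1
        double commuTime = inLatency + inBits / inBandwidth
                         + outLatency + outBits / outBandwidth;      // line 2
        double coex = 1.2;                                           // line 3
        double execTime = compuTime + commuTime * coex;              // line 4
        return 1.0 / execTime;                                       // line 5: higher rank = better
    }
}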
2.2 High-Level VE Services Layer
This layer provides the programming interfaces for VE application developers. The main services are as follows. VI Access Services are responsible for the search, selection, and deployment of distributed VIs, employing the services provided by VEDS and RAS. The VIs may be simulation software or remotely controlled physical instruments; they may be implemented as Java applets downloaded to the client side, as web services run at the server side, or as grid services executed in a Virtual Organization [1]. Tool Access Services are responsible for the
search, selection, and deployment of distributed VE tools, which may provide services for data analysis and management, VE constraint computing, and data visualization. Result Presentation Services support a significant step in the VE process that can help students interpret VE results. This service specifies how to generate, present, and visualize the data produced by VIs and analysis tools; the results can be recorded and stored either in XML format or in a visualization format.
3 Model of Virtual Experiment Grid
After the general description of the VE Services, we now describe how they are exploited to model the VE grid. Fig. 2 shows the different components of the VE grid. In this model, teachers and students at the client side can access the resources at the back end through VE Services transparently.
Fig. 2. Model of the Grid Service Based VE
The clients are environments for authoring and executing VEs and for accessing VE Services. A VE Authoring & Executing Tool (VEAET) is offered at the client side. The VEAET provides services for teachers to design VE plans easily and for students to execute them. A VE plan is represented by a graph describing resource composition: a node in the plan graph denotes access to one of the distributed resources, including VIs and tools, and a line between nodes describes the interaction and data flows between the services and tools. With this visual tool, a teacher can design the VE plan directly by selecting and dragging. A VE plan can be recorded and stored in XML format locally or published remotely. When a VE plan is loaded and set to start up, it is first initialized by the VEAET. The VEAET, acting on the user's behalf, contacts a VE registry maintained by a relevant Virtual Organization to identify VE service providers. The registry returns handles identifying VE Services that meet the user's requirements. The VEAET then issues requests to the VE service factory specifying details such as the VE operation to be
performed and the initial lifetime of the new VE service instance. Assuming that this negotiation proceeds satisfactorily, a new VE service instance is created with appropriate initial state, resources, and lifetime. The VE service afterwards initiates queries against the appropriate remote VIs, tools, and constraint computing services, acting as a client on the user's behalf. Appropriate factories of the relevant resources are selected and returned from the VE Services to the client VEAET. The VEAET is responsible for activating execution on the selected resources as per the scheduler's instructions, and then binds the new service instances to the VE plan. A successful outcome of this process is that the VE plan is transformed into an executable VE. During execution, the VEAET periodically updates the status of the VE execution and records the VE process in XML format. Teachers and students can publish a successfully executed VE process through VEDS for further reuse.
4 Conclusion and Future Work
The Grid Services infrastructure is growing very quickly and is becoming more and more complete and complex, both in the number of tools and in the variety of supported applications. In this paper we proposed a VE model based on the novel grid service technology. The model puts forward two-layered VE Services to provide a cheap and efficient distributed VE solution, and it can reuse not only VIs but also composite VEs. Moreover, we provide a visual VE authoring tool that lets teachers design an experiment with little effort. To enable comprehensive communication between the VE environment and VIs, future work will focus on the standardization of virtual instrument interfaces and VE workbench APIs.
References
1. Foster, I., et al.: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Tech. report, Open Grid Service Infrastructure WG, Global Grid Forum (2002)
2. Sandholm, T., Gawor, J.: Globus Toolkit 3 Core - A Grid Service Container Framework. http://www-unix.globus.org/toolkit/3.0/ogsa/docs/gt3_core.pdf (2003)
3. Liu, C., Yang, L., Foster, I., Angulo, D.: Design and Evaluation of a Resource Selection Framework for Grid Applications. In: Proceedings of the 11th IEEE Symposium on High-Performance Distributed Computing (2002)
4. Curcin, V., Ghanem, M., et al.: Discovery Net: Towards a Grid of Knowledge Discovery. In: Knowledge Discovery and Data Mining Conference (2002), ACM 1-58113-567-X/02/0007
5. Virtual Physics Laboratory. http://jersey.uoregon.edu/vlab/
6. Control The Nuclear Power Plant. http://www.ida.liu.se/~her/npp/demo.html
7. The Interactive Frog Dissection. http://curry.edschool.virginia.edu/go/frog/
Accounting in the Environment of Grid Society* Jiulong Shan, Huaping Chen, Guoliang Chen, Haitao Tian, and Xin Chen Department of Computer Science and Technology, University of Science and Technology of China 230027 Hefei, Anhui, China {jlshan, tht, chxin}@mail.ustc.edu.cn {glchen, hpchen}@ustc.edu.cn
Abstract. Grid and P2P are both emerging technologies aiming at efficient resource sharing [1, 2]. Reference [3] named the coexisting environment of Grid and P2P the "Grid Society". In the Grid Society there exists a large pool of users as well as resource providers, and, just as in Human Society, whether all of them can be efficiently managed greatly affects the performance of the whole system [4]. On the basis that Human Society and Grid Society are Similarity Systems, we use the method of migration to explore the problem of accounting management and propose a Society-based Accounting Management model for the Grid Society.
1 Introduction
Grid and P2P are both emerging technologies aiming at efficient resource sharing, and more and more researchers are inclined to combine the research work in these two fields. Following this idea, we put forward a system model that merges Grid and P2P, which we entitle "Grid Society"; it inherits both from the Grid environment and from the P2P environment. Based on a comparative study of Human Society and Grid Society, reference [3] drew the conclusion that Grid Society and Human Society are Similarity Systems, so issues in the Grid Society can be solved using the corresponding solutions of similar issues in Human Society. In the Grid Society there exists a large pool of users as well as resource providers, belonging to separate domains, and, just as in Human Society, whether all of them can be efficiently managed greatly affects the performance of the whole system. The accounting problem in the Grid Society includes: Goal 1: managing users' behaviors and preventing illegal operations. Goal 2: recording each user's usage accurately and charging for it. Goal 3: enhancing resource sharing between consumers and providers. *
This work was supported by the National ‘863’ High-Tech Programme of China under the grant No. 2002AA104560, National Science Foundation of China under the grant No. 60273041, and SRF for ROCS, SEM.
Several works have already set out to solve the accounting problem. First, the Globus Toolkit [5], the de facto standard Grid computing software, uses GSI to manage accounts. In [6] a Virtual Account System is proposed to simplify the user management problem. In the schema of United Devices' Grid MP platform [7], a layered structure is used for dispatching jobs and maintaining users' account information. In the remainder of this paper we mainly discuss the accounting problem in the Grid Society. In Section 2 we compare Human Society and Grid Society, and in Section 3 a new society-based accounting model is described. A summary and future work are presented in Section 4.
2 Accounting in Human Society and Grid Society
According to the conclusion drawn in [3], we can explore the similar problem of accounting in Human Society and migrate its solution into the Grid Society. The fundamental elements of Human Society are People and Nature Resources. The People element acquires various resources to meet its own requirements and offers human labor to serve Human Society; People elements then form various organizations, on the basis of which various kinds of social affairs come into being. Therefore, in Human Society the account objects to be managed can be categorized into two classes, individual users and group users; from another point of view, they can also be identified as Consumer, Provider, and Agency. The Grid Society likewise consists of two kinds of fundamental elements: Computers and Grid Resources. A Computer element here means a single processor with some auxiliary devices, having abilities of computing, storage, routing, etc., and Computer elements can be composed into Machine Teams of various sizes and abilities. As in Human Society, resource sharing in the Grid Society can also be regulated under economic mechanisms. In the Grid Society the object of accounting management is the user of a Computer or a Machine Team, and the management requirements described in Section 1 are consistent with those of Human Society. Therefore, we can migrate the experience of Human Society into the Grid Society to improve accounting management.
3 Society Based Accounting Management Model
In our SAM model, the accounts in the Grid Society are classified into three kinds. Consumer: one who refers job requests to the Grid Society. Provider: one who shares his own resources with others in the Grid Society and earns rewards by helping others fulfill their tasks. Agency: one who schedules the interaction between Consumer and Provider; the Agency holds the service information from Providers as well as the requirements from Consumers, and by a certain scheduling policy works for efficient resource sharing in the Grid Society.
3.1 General Components Interaction
The general interaction among Consumer, Provider, and Agency is described in Fig. 1. Through the Consumer component, an end-user submits his job to an Agency selected from a list the component maintains. After checking the user's identity and analyzing the job request, the Agency can react in two ways: it can return a list of available services for the end-user to choose from, or it can use a scheduling algorithm (authorized by the end-user) to search for the most appropriate service Provider and submit the job to it. When the Provider completes a job, it returns the result to the Consumer through the Agency, and the Agency balances the fees among the three parties according to the execution of the trilateral protocols. The Finance Module used in this settlement can be viewed as a financial-service Provider.
Fig. 1. General Components Interaction in SAM
The Agency is responsible for recording the contract information and charging accordingly. In particular, just as in Human Society, every contract is signed by all three involved parties; each party makes its own decisions, which creates a free market. A sketch of this interaction follows.
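A minimal Java sketch of the general interaction, with hypothetical types (Job, Result, Finance) and a placeholder scheduling policy, since the paper specifies roles rather than concrete signatures.

import java.util.List;

class Job { String description; }
class Result { Object payload; }

interface Provider { Result execute(Job job); }
interface Finance  { void balance(String consumerId, Provider p, Job job); }  // settle the trilateral contract

class Agency {
    private final Finance finance;
    Agency(Finance finance) { this.finance = finance; }

    // Check identity, schedule the job, relay the result, and balance the fee
    // among the three parties; the Finance Module acts as a financial Provider.
    Result broker(String consumerId, Job job, List<Provider> serviceList) {
        Provider chosen = serviceList.get(0);        // placeholder for the scheduling policy
        Result result = chosen.execute(job);
        finance.balance(consumerId, chosen, job);    // charge according to the contract
        return result;                               // returned to the Consumer via the Agency
    }
}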
3.2 Complex Components Interaction
One individual component may hold multiple identities, so more complex interactions also exist among these entities, as shown in Fig. 2. First, one entity can be Consumer, Provider, and Agency at the same time, as shown in Fig. 2(a); this kind of interaction achieves workflow management on the Provider side.
Fig. 2. Complex Components Interaction in SAM
Second, for a service requested by an end-user, the Agency can map the request to be fulfilled on multiple Providers, as depicted in Fig. 2(b): the Agency can decompose the request according to its details, and workflow management is then achieved on the Agency side. Third, when an Agency finds no match for an end-user's request in its service list, it can forward the request to another Friend Agency with a similar function, as Fig. 2(c) shows.
With the SAM model described above, we can achieve all three goals listed in Section 1. First, the design of the society-based model eliminates the gulf between Provider and Consumer and efficiently matches them. Second, the architecture ensures the independence of components, so that each component can set up its own self-management policy without influencing the whole architecture. Third, the layered structure makes decentralized user management possible in the Grid Society environment: only the Agency needs a certificate on the destination Provider, and the end-user only needs to communicate with the selected Agency, which lightens the burden on every service Provider. The Provider component may also use the method of Account Templates for further improvement [8]. Finally, different service selection policies can be used in the Agency component, e.g. an economic market based auction mechanism, which can enhance the economic efficiency of resource sharing.
4 Conclusion and Future Works
This article is based on the principle that methods can be migrated between Similarity Systems, and we migrate the basic accounting management policies from Human Society to manage accounting in the Grid Society. Our future work will focus on adding user-facing Agencies to provide more available services while protecting users' rights and interests. We also plan to study the organization of the Agencies, as well as their influence on the resource usage efficiency of the whole Grid Society.
References
1. Foster, I., Kesselman, C., et al.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications (2001)
2. Saroiu, S., Gummadi, P. K., et al.: Exploring the Design Space of Distributed and Peer-to-Peer Systems: Comparing the Web, TRIAD, and Chord/CFS. 1st International Workshop on Peer-to-Peer Systems (2002)
3. Shan, J., Chen, G., et al.: Grid Society: A System View of the Grid and P2P Environment. International Workshop on Grid and Cooperative Computing (2002)
4. Hacker, T., Thigpen, B.: Distributed Accounting Working Group Charter. http://www.gridforum.org/5_ARCH/ACCT.htm (2000)
5. The Globus Toolkit: http://www.globus.org
6. Meyer, N., Wolniewicz, P., Kupczyk, M.: Simplifying Administration and Management Processes in the Polish National Cluster. CUG SUMMIT (2000)
7. United Devices - Grid Computing Solutions: http://www.ud.com
8. Hacker, T. J., Athey, B. D.: A Methodology for Account Management in Grid Computing Environments. The 2nd International Workshop on Grid Computing (2001)
A Heuristic Algorithm for Minimum Connected Dominating Set with Maximal Weight in Ad Hoc Networks
Xinfang Yan 1,2, Yugeng Sun 1, and Yanlin Wang 1
1 School of Electrical Engineering & Automation, Tianjin University, Tianjin 300072;
2 College of Information Engineering, Zhengzhou University, Zhengzhou 450052
Abstract. Routing based on a minimum connected dominating set (MCDS) is a promising approach in mobile ad hoc networks, where the search space for a route is reduced to the nodes in the set (also called gateway nodes). This paper introduces MWMCDS, a simple and efficient heuristic algorithm for calculating an MCDS with maximal weight. Selecting gateway nodes by maximal weight guarantees that the most suitable nodes are chosen for the gateway role, so that they can properly coordinate all the other nodes. As a result, the method keeps the MCDS stable and provides a highly effective communication base for broadcast and routing operations in the whole network.
1 Introduction
A Mobile Ad hoc Network (MANET) is a temporary, autonomous multihop system consisting of hosts with wireless receivers and transmitters, in which each host assumes the role of a router for its neighbors and relays packets toward final destinations. A MANET has no established infrastructure or centralized administration; every host can move in any direction at any speed and any time, which induces a dynamic topology. These characteristics pose special challenges for routing protocol design, because finding a route again after a topology change is expensive (in bandwidth, CPU, battery, etc.), so routing algorithms should converge quickly. Recently, a hierarchical routing approach based on an MCDS has been proposed in [1,8,9,10]. The gateway hosts in the MCDS form a high-level virtual backbone network, and each gateway host acts as a control center in its own cluster. Clearly, the efficiency of this approach depends largely on the process of finding and maintaining an MCDS and on the size of the corresponding subnetwork. Unfortunately, computing an MCDS in a unit graph is NP-hard [3], so approximate algorithms for the MCDS are needed in practical applications. Wu [8] compared several existing algorithms, all of which assume that the hosts of the network have the same characteristics. But the hosts in an actual MANET may be quite varied, such as computers, PDAs, and various mobile telephones, and the state of a host's power or its online time plays an important role in the choice of gateway nodes. In view of this, the paper introduces MWMCDS. In order to reflect the influence
of a host's power or online time, every node v is assigned a weight (a real number); furthermore, the sum of all the selected nodes' weights is kept as large as possible when the MCDS is formed. Because each node has exactly one gateway node with a routing table, a source node can find a route promptly with low communication delay. As a result, the method minimizes the amount of storage for communication information and the amount of data exchanged in order to maintain routing and control information in a mobile environment. If a route is congested, another gateway router in the node's neighbor set is reachable. In addition, only the local topology information is amended when the topological structure changes (e.g., hosts switch on, switch off, or move), so the algorithm is well distributed and scalable.
2 Preliminaries
We model a MANET by an undirected graph G = (V, E), in which V is the set of mobile hosts and E represents the set of edges. There is an edge (u, v) in E if and only if u and v can mutually receive each other's transmission (this implies that all links between hosts are bidirectional); that is, connections of hosts are based on their geographic distances. We assume each host mounts an omni-directional antenna, so the transmission range of a host is a disk and the corresponding graph is called a unit disk graph [3], or simply a unit graph. A dominating set (DS) D of G is a subset of V such that any node not in D has at least one neighbor in D. If the induced subgraph G[D] of D is connected, then D is a connected dominating set (CDS). Among all CDSs of graph G, one with minimum cardinality is called a minimum connected dominating set (MCDS). Vertices in a DS are called gateway nodes (or dominators), while vertices outside a DS are called non-gateway nodes (or dominatees). We assume a given MANET instance contains n hosts (nodes), and every node v in the network is assigned a unique identifier (ID). We consider weighted networks, i.e., the weight of each node stands for the power of the host or the time the host has been online. In the procedures, we use the following notation: hop_count(u, v) --- the number of edges in the shortest path from u to v, for u, v in V; N(v) --- the one-hop (open) neighbor set of v; N2(v) --- the two-hop (open) neighbor set of v; m(v) --- a marker for node v, which is 0, 1, or 2, corresponding to v's role being undecided, non-gateway, or gateway, respectively. A node in the initial state has m(v) = 0; a node in the dominatee state is a non-gateway and has m(v) = 1; a node in the dominator state is a gateway and has m(v) = 2. G(v) --- used by a node v to make its neighbors aware that it is a gateway node; G(v) = t means node t is the gateway node of node v. join(v, t) --- announces that node v, a non-gateway, takes node t as its gateway. GW = {v | m(v) = 2} --- the set of all gateway nodes.
3 Algorithm
Initially, every node v in the network is assigned a unique identifier (ID) randomly, and m(v) = 0, i.e., all nodes are in the initial state. Every node v exchanges its open neighbor set with all its neighbors; thus v knows N2(v), that is, its neighbors' neighbor information. A node with exactly one neighbor first decides its own role based on rule (a), and then that neighbor's role is decided based on rule (b); every other node v calculates its own state from its N2(v) information. The algorithm rules are as follows. (a) If node u has just one neighbor t, it sets m(u) = 1, goes to the dominatee state, and broadcasts the message join(u, t). Turn to the next node. (b) On receiving a join(u, t) message, t goes to the dominator state and broadcasts a G(t) message to its neighbors stating that it will be a gateway node. Turn to the next node. (c) If v has received the role decisions of all its neighbors z with bigger weights than v, it calculates its own role based on rule (d); otherwise control turns to the node with the biggest weight among v's undecided neighbors (this ensures that nodes with bigger weights have top priority). (d) If v has two unconnected neighbors x and y, it checks whether the gateway nodes among its neighbors are connected (this is an optimization for decreasing the size of the CDS). If they are unconnected, v sets m(v) = 2, goes to the dominator state, and broadcasts the message G(v). Turn to the next node; otherwise v calculates its role based on rule (e). (e) v goes to the dominatee state: it selects the gateway node t with the biggest weight among all its neighbor nodes. If no such t exists, v keeps waiting until one appears; then v sets m(v) = 1 and broadcasts the message join(v, t). Turn to the next node. It can be shown that all nodes terminate the algorithm as either gateway nodes or non-gateway nodes, and that the resulting set GW of gateway nodes is indeed an MWMCDS. A sketch of the gateway test in rule (d) follows.
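A Java sketch of one plausible reading of the gateway test in rule (d); the graph representation and names are illustrative assumptions, not the authors' implementation.

import java.util.*;

class Node {
    final int id; final double weight;
    int m = 0;                                   // 0 undecided, 1 non-gateway, 2 gateway
    final Set<Node> neighbors = new HashSet<>();
    Node(int id, double weight) { this.id = id; this.weight = weight; }

    // Rule (d): become a gateway if some pair of unconnected neighbors
    // is not already bridged by an existing gateway neighbor.
    boolean shouldBeGateway() {
        for (Node x : neighbors)
            for (Node y : neighbors) {
                if (x == y || x.neighbors.contains(y)) continue;   // need x, y unconnected
                if (!bridgedByGateway(x, y)) return true;          // rule (d) fires: m(v) = 2
            }
        return false;                                              // fall through to rule (e)
    }

    // Optimization in rule (d): if a gateway neighbor already connects x and y,
    // this node need not become a gateway, which decreases the CDS size.
    private boolean bridgedByGateway(Node x, Node y) {
        for (Node g : neighbors)
            if (g.m == 2 && g.neighbors.contains(x) && g.neighbors.contains(y)) return true;
        return false;
    }
}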
Fig. 1. When N = 20
Fig. 2. When R = 30
4 Simulation
We measure the performance of the proposed MWMCDS algorithm using computer simulation. Assume that N hosts are scattered randomly in a 250×250 square units 2-D simulation area; only connected graphs are taken into consideration. We vary
N and R (transmission radius) to analyze how the network size and connectivity affect the performance. We run the algorithm 80 times on different sets of parameters including N and R, and at the end we simply take the average of the ratios over all cases. Two of the averaged results are reported in Fig. 1 and Fig. 2, where the "gate of all" curve represents the ratio of the number of gateways to the number of hosts in the network, and the "gate of high" curve represents the ratio of gateways with high weight to all gateways in the network. The experimental curves show that the ratio of gateways to hosts decreases as the number of nodes in the topology graph increases. At the same time, the ratio of gateways with high weight to all gateways stays high (over 80%), i.e., when gateways are chosen by the procedure, nodes with high weight take first priority.
5 Conclusions
Simulation results show that the proposed algorithm MWMCDS can ensure the maximality of the CDS's weight sum and the minimality of the CDS's size, so the scheme can potentially be used in designing efficient routing algorithms based on an MCDS.
References
[1] Alzoubi, K. M., Wan, P. J., Frieder, O.: New Distributed Algorithm for Connected Dominating Set in Wireless Ad Hoc Networks. Proc. 35th Hawaii Int'l Conf. System Sciences (2002) 3881-3887
[2] Basagni, S.: Finding a Maximal Weighted Independent Set in Wireless Networks. Telecommunication Systems 18(1-3) (2001) 155-168
[3] Clark, B. N., Colbourn, C. J., Johnson, D. S.: Unit Disk Graphs. Discrete Mathematics 86 (1990) 165-177
[4] Guha, S., Khuller, S.: Approximation Algorithms for Connected Dominating Sets. Algorithmica 20(4) (1998) 374-387
[5] Lim, H., Kim, C.: Flooding in Wireless Ad Hoc Networks. Computer Communications 24 (2001) 353-363
[6] Royer, E. M., Toh, C. K.: A Review of Current Routing Protocols for Ad Hoc Mobile Wireless Networks. IEEE Personal Comm. 4 (1999) 46-55
[7] Sinha, P., Sivakumar, R., Bharghavan, V.: Enhancing Ad Hoc Routing with Dynamic Virtual Infrastructures. INFOCOM 3 (2001) 1763-1772
[8] Wu, J., Li, H.: A Dominating-Set-Based Routing Scheme in Ad Hoc Wireless Networks. Telecomm. Systems, special issue on Wireless Networks 18(1-3) (2001) 13-36
[9] Wu, J.: Extended Dominating-Set-Based Routing in Ad Hoc Wireless Networks with Unidirectional Links. IEEE Trans. on Parallel and Distributed Systems 13(9) (2002) 866-881
[10] Peng, W., Lu, X.-C.: A Novel Distributed Approximation Algorithm for Minimum Connected Dominating Set. Chinese J. Computers 24(3) (2001) 254-258
Slice-Based Information Flow Graph*
Wan-Kyoo Choi 1 and Il-Yong Chung 1
1 School of Computer Engineering, Chosun University, Kwangju, 501-759, Korea, [email protected], [email protected]
Abstract. In this paper we represent the information flow of a program on the basis of slices, in what we refer to as the slice-based information flow graph (SIFG). SIFG captures the information flow among data tokens. Using SIFG, we can find the elementary characteristics of the information flow of a program and increase our understanding of the program.
1 Introduction
The nature of information flow, which is related to the deep structure of a program, must be considered for understanding a program. Programmers tend to group statements in ways based on other than sequential relationships when attempting to understand programs; in general, the criteria used for the groupings are related to data and control flow [2]. This information is explicit in the program slice [1]. A slice is an abstraction of the set of statements that influence the value of a variable at a particular program location [2]. Slicing is a method of program reduction; slices were proposed as potential debugging tools and program understanding aids, and as originally defined they capture the "use" relationship of traditional flow analysis [5]. In particular, a data slice [4,5] represents a slice abstraction of a program. Data slices modify the concept of metric slices [3] to use data tokens rather than statements as the basic unit. Like metric slices, data slices are computed on the set of variables that are outputs from a module. Data tokens are the variable and constant definitions and references appearing in statements; using data tokens ensures that any elementary change to an interesting variable will cause a change in at least one slice of the program [4]. If programmers use slices when understanding a program, understanding can be regarded as a search over the information flow on slices. Data slices, however, do not capture the information flow among data tokens. If we can capture that flow, we can find the elementary characteristics of the information flow of a program and increase our understanding of it. We therefore propose the slice-based information flow graph (SIFG), which represents the information flow of a program by capturing and modeling the information flow of data tokens on data slices.
* This study was partially supported by research funds from Chosun University, 2003
Since SIFG can show the information
flow on data slices explicitly, it can represent well the interesting elementary changes of the information flow of a program and can enhance our understanding of the program.
2 Slice-Based Information Flow Graph
We developed the slice-based information flow graph, which models the information flow on a data slice using data tokens as the basic unit.
Definition 1. The information flow graph on a data slice S, denoted SIFG(S), is a directed graph SIFG(S) = (N, E), where N is the set of data tokens defined in the statements of S and E is a set of edges. An edge of SIFG represents information flow among data tokens. Information flow edges are defined both among data tokens contained in a statement and among data tokens related to a variable. An information flow edge defined among data tokens contained in a statement represents a use-used relation between data tokens. Let s be a statement of S, let T(s) be the set of data tokens used or contained in s, and let t1, t2 belong to T(s). SIFG contains a direct information flow edge from t1 to t2 (that is, t2 directly uses t1) if one of the following holds:
1. The result of a computation using t1 brings on a change of t2.
2. t2 is a data token related to an array variable and t1 is an index of it; in this case, a change of t1 leads to a change of an array element.
SIFG contains an indirect information flow edge from t1 to t2 (that is, t2 indirectly uses t1) if the following holds:
1. s contains a logical operator, and t2 is on the left-hand side and t1 on the right-hand side of it.
An information flow edge defined among data tokens related to a variable represents the flow of a data value between data tokens. Let T(x) be the set of all data tokens of a variable x used in S, and let t1, t2 belong to T(x). SIFG contains a direct information flow edge from t1 to t2 if all of the following hold:
1. There is a control flow path from t1 to t2 in the standard control flow graph for S.
2. There is no statement containing another data token of x on a control flow path from t1 to t2.
3. There does not exist a data token t3 such that there is information flow from t1 to t3 and from t3 to t2.
The slice-based information flow graph for a procedure P, denoted SIFG(P), is defined as the concatenation of every SIFG(S) related to P.
Definition 2. Let S1, ..., Sk be the slices for a procedure P, and let Ni and Ei be the sets of nodes and edges of SIFG(Si), respectively. Then SIFG(P) for procedure P is the graph whose node set is the union of the Ni and whose edge set is the union of the Ei.
For example, S(sumX) and S(sumSqrX) are the slices on the outputs of procedure Sum1 in Figure 1, where the data tokens for the variables and the statements of the procedure are labeled. The data slices for sumX and sumSqrX are the sequences of data tokens used in S(sumX) and S(sumSqrX). Figure 2 shows the SIFG for procedure Sum1, and a sketch of such a procedure follows.
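A plausible shape for a procedure like Sum1, written here in Java under the assumption that it computes the classic sum and sum of squares from the slicing literature; the exact statements in Figure 1 may differ.

class SumExample {
    // Two outputs, sumX and sumSqrX; S(sumX) and S(sumSqrX) are the slices
    // computed on these outputs.
    static double[] sum1(double[] x) {
        double sumX = 0.0;
        double sumSqrX = 0.0;
        for (int i = 0; i < x.length; i++) {
            sumX += x[i];            // data tokens here belong to S(sumX)
            sumSqrX += x[i] * x[i];  // data tokens here belong to S(sumSqrX)
        }
        return new double[] { sumX, sumSqrX };
    }
}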
Fig. 1. Procedure Sum1
Fig. 2. SIFG(Sum1)
3 Related Works
The representative methods for representing the information flow of a program are the control flow graph (CFG) [6], the data flow graph (DFG) [7], and the program dependence graph (PDG) [1]. A CFG, in which the nodes represent statements and the edges represent transfers of control between statements, encodes control flow information. A DFG, in which the nodes represent statements and variables and the edges represent data flow between statements and variables, encodes data flow information and the points of input and output. A PDG, in which the nodes represent statements or regions of code and the edges represent control and data dependencies, encodes both control and data dependence information.
Using CFG, DFG, and PDG, we are able to understand control flow, data flow, control dependency, and data dependency between statements, and between statements and variables, in a program. SIFG uses variables, represented by data tokens, whereas CFG, DFG, and PDG use statements as the basis of program analysis. Thus, while CFG, DFG, and PDG cannot represent the data flow structure between variables or the elementary characteristics of the information flow of a program, SIFG can represent this information.
4 Conclusion
In this paper we proposed the slice-based information flow graph (SIFG). Existing representations of the information flow of a program are based on statements, but SIFG is based on variables, represented by data tokens; in particular, it captures the characteristics of the information flow between variables on slices. Thus SIFG is able to represent the elementary changes of interesting variables and the data flow structure between variables. It can show the nature of the information flow of a program more clearly and enhance our understanding of the program, and it can also serve as a tool for the partial analysis and partial debugging of a program.
References
1. Ottenstein, K. J., Ottenstein, L. M.: The Program Dependence Graph in a Software Development Environment. Proceedings of the ACM SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, ACM SIGPLAN Notices 19 (1984) 177-184
2. Weiser, M.: Programmers Use Slices When Debugging. Communications of the ACM 25 (1982) 446-452
3. Ott, L. M., Thuss, J. J.: Slice Based Metrics for Estimating Cohesion. Proc. IEEE-CS International Software Metrics Symposium (1993) 71-81
4. Bieman, J. M., Ott, L. M.: Measuring Functional Cohesion. IEEE Transactions on Software Engineering 20 (1994) 111-124
5. Bieman, J. M., Kang, B. K.: Measuring Design-Level Cohesion. IEEE Transactions on Software Engineering 24 (1998) 111-124
6. Ott, L. M.: Using Slice Profiles and Metrics during Software Maintenance. Proc. 10th Annual Software Reliability Symposium (1992) 16-23
7. Aho, A., Sethi, R., Ullman, J.: Compilers: Principles, Techniques and Tools. Addison-Wesley, Reading, MA (1986)
Semantic Rule Service Model: Enabling Intelligence on Grid Architecture Qi Gao, HuaJun Chen, ZhaoHui Wu, and WeiMing Lin Grid Computing Lab, College of Computer Science, Zhejiang University, Hangzhou, 310027, P.R.China {hyperion, huajunsir, wzh}@zju.edu.cn, [email protected]
Abstract. Based on Semantic Web technology and the OGSA architecture, we propose a Semantic Rule Service Model to enable intelligence on the grid. In this model, we regard rules and inference engines as resources, and employ rule base services and inference services to encapsulate them to perform inference on ontology knowledge. With the support of the OGSA architecture, we organize the services as grid services in order to support the dynamic discovery and invocation of rules and inference engines. The function of this model is to provide intelligent support to other grid services and software agents. In addition, we illustrate the application of this model in a Traditional Chinese Medicine (TCM) system.
1 Background and Introduction
The Grid [1] is an integrated infrastructure for coordinated resource sharing and problem solving in distributed environments. In the OGSA model [2], various resources, including information and knowledge, are encapsulated in Grid services; the Grid infrastructure is a sound base for dynamic, large-scale web applications. On the other side, with the goal of "making the web machine-understandable", the Semantic Web [3] research community focuses on the semantic integration of the web. Several ontology languages, such as RDF [4], DAML+OIL [5], and OWL [6], have been developed to represent data and information semantically by defining terms with explicit semantics and clearly indicating the relations between them. In the future, the Internet will be integrated both physically, with the Grid architecture, and semantically, with Semantic Web technology. Research on the Knowledge Base Grid (KB-Grid) [7] has put effort into utilizing the two new technologies together to enable knowledge sharing and grid intelligence. The Semantic Rule Service, a sub-project of KB-Grid, aims at bringing reasoning support to web applications. In our opinion, not only should the descriptive knowledge in ontologies be considered a resource; rules and inference engines are also resources to be published and shared. In the Semantic Rule Service model, we construct a suite of services that enable various organizations to publish their rules and inference engines on the web, so that other web applications and software agents are able to utilize these resources to solve
specific problems. Different from many traditional rule-based systems, this model is a grid-based rule service model for ontology knowledge on the web. As the services are constructed as Grid services, we can employ the technologies provided by OGSA to support dynamic service registry, discovery, and access.
2 Related Work
With the rapid development of the Semantic Web, many research organizations have put effort into the representation of logic. RuleML [8] is an effort to build a standard rule description language for the Semantic Web; based on research on situated courteous logic programs (SCLP), RuleML supports prioritized conflict handling as well as sensors and effectors. Based on Horn logic, TRIPLE [9] is designed especially for querying and transforming RDF models. DAMLJessKB [10] adopted the approach of translating DAML descriptions to Jess [11] rules to reason with DAML knowledge. Besides, the DAML research community has initiated DAML-Rule [12], based on research on Description Logic Programs (DLP) [13], which aims at combining Description Logic (DL) reasoning with Logic Programs (LP) inference. In our view, rules should only be used to represent heuristic knowledge derived from experience, and descriptive knowledge (facts) should be left to other representations, e.g. ontologies; rules should be utilized together with descriptive knowledge in an ontology language. In this paper we use the word "knowledge" solely to refer to descriptive knowledge. Here we focus on rules' ability to process knowledge for other web applications and to make decisions for software agents. Actually, the decision-making procedure is itself similar to knowledge processing, since related descriptive knowledge must be processed to reach a conclusion. From this perspective, rules are different from descriptive knowledge, and a set of rules can process a certain kind of descriptive knowledge in a specific way. We also note some differences between Description Logic (DL) reasoners, e.g. FaCT [14] and RACER [15], and rule inference engines. Since the ontology languages of the Semantic Web are based on description logics, DL reasoners are especially suitable for computing subsumption relations between classes and discovering inconsistencies in an ontology model, and that kind of reasoning is essential to building and maintaining large ontologies. Rule inference engines, on the other hand, are more flexible: people can design rules straightforwardly to make inferences on certain knowledge, and the engine can focus only on the related pieces of knowledge and can tolerate some inconsistency in the knowledge model. Therefore, rule inference engines are suitable for domain-specific applications, which need flexible, domain-related rules.
3 Semantic Rule Service Model

Figure 1 depicts the layers of the Semantic Rule Service Model. The service layer has three related Grid services, each of which encapsulates one kind of resource. The rule base service and the inference service are discussed in detail in this section. The ontology KB service is the central part of KB-Grid [7], providing interfaces for knowledge sharing, querying, and management; here it mainly acts as a knowledge source, so we do not discuss it in detail. Directory services sit on the index layer, playing an important role in service registry and dynamic discovery. The uppermost layer is the application layer, on which other grid services, software agents, and the semantic browser can utilize the services of the lower layers. The rule editor is a supporting tool that enables users to edit rules visually.
Fig. 1. Semantic Rule Service Model
3.1 Rule Base Service

The rule base service provides interfaces for web users to share their rules. On the one hand, rule publishers can register their rules in the rule base; on the other hand, all web users can access the registered rules and use them to process knowledge. Rules are organized into RuleSets by function in the rule base. A RuleSet is a group of rules that are written in a certain rule representation language, based on a certain kind of knowledge, and applied to process that knowledge in a certain way. The rules in a RuleSet should be closely related and should cooperate with each other to implement an inference. It should be noted that the rule base does not prescribe a rule language for RuleSets; in other words, a RuleSet can be written in any rule language, such as RuleML, TRIPLE, etc.
To organize the various RuleSets, the rule base must maintain their meta-information. Each RuleSet has its own meta-information, so that any web user can find the RuleSet that meets their requirements. The basic meta-information includes the URI, the rule language, the knowledge type, the publisher, the version, the last-update time, and a description of the function. For example, the meta-information of a RuleSet in the TCM application takes the following form.
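A hedged RDF/XML sketch of such a description is shown below; the rb: vocabulary, property names, and values are hypothetical illustrations of the fields listed above, not the actual schema.

<!-- Hypothetical RuleSet meta-information; names and values are illustrative only. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rb="http://example.org/rulebase#">
  <rb:RuleSet rdf:about="http://example.org/rulesets/TCM-PrescriptionAnalysis">
    <rb:ruleLanguage>WRIL</rb:ruleLanguage>
    <rb:knowledgeType>TCM ontology (RDF)</rb:knowledgeType>
    <rb:publisher>Grid Computing Lab, Zhejiang University</rb:publisher>
    <rb:version>1.0</rb:version>
    <rb:lastUpdate>2003-10-01</rb:lastUpdate>
    <rb:description>Rules for analyzing TCM prescriptions and detecting
      contraindications</rb:description>
  </rb:RuleSet>
</rdf:RDF>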
3.2 Inference Service

The inference service performs the task of processing knowledge according to certain RuleSets. Generally, an inference service encapsulates an underlying inference engine. With a shared basic interface, various inference services can be provided by different organizations. Some can be built upon traditional rule-based systems, such as Jess [11], with an outer layer to translate knowledge between classical representations and the Semantic Web standards. Others may be built on newly designed inference engines that support Semantic Web languages directly. In our prototype we designed a new inference engine based on RDF and WRIL, which is discussed in detail in Section 4. Although the implementation is transparent to users, they still need to know which rule languages and knowledge languages an inference service supports. Therefore, every inference engine should have its own meta-information, which may also be used for service discovery and locating. The meta-information consists of the address of the service, the rule language and the knowledge language supported by the engine, the publisher, the version, the last-update time, etc. An example follows.
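Again as a hedged sketch, such meta-information could be encoded in RDF/XML as follows; the inf: vocabulary and the service address are hypothetical.

<!-- Hypothetical inference-service meta-information; names are illustrative only. -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:inf="http://example.org/inference#">
  <inf:InferenceService rdf:about="http://example.org/services/wril-engine">
    <inf:serviceAddress>http://example.org/services/wril-engine</inf:serviceAddress>
    <inf:ruleLanguage>WRIL</inf:ruleLanguage>
    <inf:knowledgeLanguage>RDF</inf:knowledgeLanguage>
    <inf:publisher>Grid Computing Lab, Zhejiang University</inf:publisher>
    <inf:version>1.0</inf:version>
    <inf:lastUpdate>2003-10-01</inf:lastUpdate>
  </inf:InferenceService>
</rdf:RDF>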
3.3 Rule Editor

The rule editor serves as a visual tool for editing rules. Although some traditional rule-based systems have corresponding facilities for editing rules, few of them can edit rules based on ontologies. Since we consider rules to be descriptions of the business logic of processing knowledge, rules should support ontologies and should be closely related to knowledge representation languages such as RDF, OWL, etc. Our rule editor can incorporate the vocabulary of basic ontologies and of customized ontologies defined by users, so that users can choose the terms defined in ontologies to design their rules.
3.4 Directory Service

The directory service on the index layer handles the registry, discovery, and locating of the rule base service and the inference service. Service providers register the meta-information of RuleSets and inference services with the directory. Users or user applications can then query for the services they require and obtain their addresses.
The directory service maintains meta-information with a certain life cycle: if an item is not re-registered within a certain period, it is removed from the directory as obsolete information. Query is implemented by a matching function that compares the meta-information in the request with the meta-information in the directory and returns the matched items, including the meta-information and the address of
the service. The mechanism for communication between distributed directory services is as follows: every directory records a group of neighboring directories. When a new item is registered with a directory, that directory registers it with its neighbors. When a directory receives a registration from another directory, it subscribes to the item from the source and updates its local copy according to the source.
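The life cycle and neighbor propagation just described can be sketched as follows in Python; the data structures, the single-hop propagation, and the period value are assumptions for illustration, not the actual implementation:

import time

class Directory:
    TTL = 3600  # assumed re-registration period, in seconds

    def __init__(self, neighbors=None):
        self.items = {}                   # URI -> (meta, source, expiry)
        self.neighbors = neighbors or []  # known neighboring directories

    def register(self, uri, meta, source=None):
        self.items[uri] = (meta, source, time.time() + self.TTL)
        if source is None:  # direct registration: forward to neighbors
            for n in self.neighbors:
                n.register(uri, meta, source=self)

    def purge(self):
        # Items not re-registered within the life cycle become obsolete.
        now = time.time()
        self.items = {u: v for u, v in self.items.items() if v[2] > now}

    def query(self, **required):
        # Match requested meta-information fields against stored items.
        self.purge()
        return [(u, meta) for u, (meta, _, _) in self.items.items()
                if all(meta.get(k) == v for k, v in required.items())]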
3.5 Application Layer

In the simplest case, the semantic browser serves as a client tool for this model. The fundamental function of the semantic browser is to display RDF knowledge in a graphical view; here, it can also access the rule services visually. It focuses on personal use of the services, enabling a rule publisher to publish RuleSets, query the rule base, and access inference services to process knowledge. Software agents can also benefit from the rule services. Many software agents rely on built-in rules and inference engines to analyze their environment and behave intelligently. However, as the environment is complex and continuously changing, the built-in rules must be updated frequently, and the work of the agents may be interrupted. To solve this problem, we can store the rules in the rule base and keep updating them there; the agents can then access the rule base service and obtain the appropriate RuleSets according to the external environment. In the most common case, rule services are used to support users or other web applications in processing knowledge. To solve an application problem, experts or organizations in the specific domain design RuleSets for the problem using the rule editor. The rules can be represented in any rule language, provided that corresponding inference services are available on the Web. In Section 4, we discuss this with a real-world application.
4 A Case Study of the Application in Traditional Chinese Medicine

In this section, we discuss the rule service model in the context of a web application in Traditional Chinese Medicine (TCM). TCM is a knowledge-intensive domain. In previous work, a Unified TCM Language System [16] was built as an abstract upper level of the TCM ontology. The TCM ontology contains many specialized concepts, which are represented as classes and properties in RDF. We have now finished building the complete class definitions of the TCM ontology and have edited about 100,000 records of TCM ontology instances. As a sub-project of the TCM KB-Grid, we have developed a computer-aided prescription analyzing system based on the rule service model and ontology KBs. The function of the system is to analyze prescriptions and provide suggestions and warnings according to TCM knowledge and analyzing rules. To represent these rules, we designed a language, especially aimed at processing RDF knowledge, called the Web
Rule Inference Language (WRIL). We designed a rule ontology and use RDF to represent rules. The definitions of WRIL are as follows.

A Rule is defined as a 3-tuple <ASet, CSet, f>, where ASet is the set of antecedents of the rule, consisting of BodyUnits; CSet is the set of consequents of the rule, consisting of HeadUnits; and f is the number of times the rule can be fired.

A RuleStmt is defined as a 3-tuple <S, P, O>, where S, P, and O are the subject, predicate, and object of the rule statement, respectively. Any of them can be a variable or a constant.

A BodyUnit is defined as a 2-tuple <RS, T>, where RS is a RuleStmt and T is the test condition on RS. The test condition has 2 types: Existence and Nonexistence.

A HeadUnit is defined as a 2-tuple <RS, A>, where RS is a RuleStmt and A is the action on RS. The action has 4 types: AddStmt, RemoveStmt, ChangeNum, and ChangeStr.

Variables are also defined to represent the indefinite parts of a RuleStmt. There are 3 types of variables: ResVariable, StrVariable, and NumVariable. The RDF Schema can be found on our web site: http://grid.zju.edu.cn. A simplified example rule for detecting contraindications in TCM prescriptions is given next.
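A rough RDF/XML sketch of such a rule is shown below; the wril: and tcm: names are hypothetical renderings of the definitions above, not the exact schema published on the web site:

<!-- Hypothetical WRIL serialization; vocabulary names are illustrative only. -->
<wril:Rule rdf:about="#ContraindicationRule">
  <wril:hasBodyUnit>
    <wril:BodyUnit wril:test="Existence">
      <wril:stmt wril:s="V_P" wril:p="tcm:hasHerb" wril:o="V_Herb_X"/>
    </wril:BodyUnit>
  </wril:hasBodyUnit>
  <wril:hasBodyUnit>
    <wril:BodyUnit wril:test="Existence">
      <wril:stmt wril:s="V_P" wril:p="tcm:hasHerb" wril:o="V_Herb_Y"/>
    </wril:BodyUnit>
  </wril:hasBodyUnit>
  <wril:hasBodyUnit>
    <wril:BodyUnit wril:test="Existence">
      <wril:stmt wril:s="V_Herb_X" wril:p="tcm:contraindication" wril:o="V_Herb_Y"/>
    </wril:BodyUnit>
  </wril:hasBodyUnit>
  <wril:hasHeadUnit>
    <wril:HeadUnit wril:action="AddStmt">
      <wril:stmt wril:s="V_P" wril:p="tcm:hasWarning" wril:o="Contraindication"/>
    </wril:HeadUnit>
  </wril:hasHeadUnit>
</wril:Rule>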
The meaning of this rule is: if a prescription, represented by the variable "V_P", has two herbs, represented by the variables "V_Herb_X" and "V_Herb_Y", and these two herbs have a contraindication with each other, then a warning about the contraindication is added to the prescription. We have developed an inference engine to execute WRIL rules. It employs the Jena API [17] to build RDF models and access RDF knowledge.
Fig. 2. Prescription analyzing system on the Semantic Rule Service Model
The computer-aided prescription analyzing system is published as a web service. A request includes patient information, a diagnosis, and a prescription, all represented in RDF according to the TCM ontology. The system locates the analyzing RuleSet and the corresponding inference service by querying the directory service, then accesses the rule base service to get the RuleSet and sends it to the inference service. The inference service processes the patient case and the rules with the support of the TCM ontology KB and returns the results to the analyzing system, which reorganizes them and replies to the user. With the support of the TCM ontology KB, which includes a huge amount of descriptive knowledge, the system achieves high performance based on a relatively small set of general rules.
5 Summary and Future Work

In this paper, we describe semantic rule services that address the problem of processing ontology knowledge against the background of the Grid and the Semantic Web. In this model, rules and inference engines are considered resources to be shared through Grid services, and rule inference is employed to enable intelligent grid-based applications and software agents. In the future, we will put effort into other uses of rule inference in the Grid architecture, e.g. rule-based service integration and process model validation. We may also combine rule inference and description logic reasoning to achieve better utilization of ontology knowledge. The rule service model may evolve as the research progresses.
References

[1] Ian Foster, Carl Kesselman, Steven Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. Intl. J. Supercomputer Applications, 2001
[2] I. Foster, C. Kesselman, J. Nick, S. Tuecke: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Open Grid Service Infrastructure WG, Global Grid Forum, June 22, 2002
[3] Tim Berners-Lee, James Hendler, Ora Lassila: The Semantic Web. Scientific American, May 2001
[4] Resource Description Framework (RDF), http://www.w3.org/RDF/
[5] Deborah L. McGuinness, Richard Fikes, James Hendler, Lynn Andrea Stein: DAML+OIL: An Ontology Language for the Semantic Web. IEEE Intelligent Systems, Sep.-Oct. 2002
[6] Web Ontology Language (OWL), http://www.w3.org/2001/sw/WebOnt/
[7] Huajun Chen, Zhaohui Wu: OKSA: An Open Knowledge Service Architecture for Building Large-Scale Knowledge Systems in Semantic Web. In: Proceedings of the IEEE Conference on Systems, Man and Cybernetics, 2003
[8] Rule Markup Language Initiative, http://www.dfki.uni-kl.de/ruleml/
[9] Michael Sintek, Stefan Decker: TRIPLE – A Query, Inference, and Transformation Language for the Semantic Web. International Semantic Web Conference (ISWC), Sardinia, June 2002
[10] Joseph B. Kopena, William C. Regli: DAMLJessKB: A Tool for Reasoning with the Semantic Web. 2nd International Semantic Web Conference (ISWC 2003), Sanibel Island, Florida, USA, October 20-23, 2003
[11] Java Expert System Shell (Jess), http://herzberg.ca.sandia.gov/jess/
[12] DAML-Rule, http://www.daml.org/rules/
[13] Benjamin N. Grosof, Ian Horrocks, Raphael Volz, Stefan Decker: Description Logic Programs: Combining Logic Programs with Description Logic. In: Proceedings of the Twelfth International World Wide Web Conference, May 20-24, 2003
[14] I. Horrocks: FaCT and iFaCT. In: Proceedings of the International Workshop on Description Logics (DL'99)
[15] V. Haarslev, R. Moller: RACER System Description. In: Proceedings of IJCAR-01, LNAI 2083, Springer-Verlag, 2001
[16] Xuezhong Zhou, Zhaohui Wu: UTMLS: An Ontology-based Unified Medical Language System for Semantic Web. 2002
[17] Jena 2 Development, http://www.hpl.hp.com/semweb/jena2.htm
CSCW in Design on the Semantic Web*

Dazhou Kang¹, Baowen Xu¹,², Jianjiang Lu¹,²,³, and Yingzhou Zhang¹

¹ Department of Computer Sci. & Eng., Southeast University, Nanjing 210096, China
² Jiangsu Institute of Software Quality, Nanjing 210096, China
³ PLA University of Science and Technology, Nanjing 210007, China
[email protected]
Abstract. Computer-Supported Cooperative Work (CSCW) in Design explores the potential of computer technologies to support cooperative design. It requires more efficient technologies for communication and for reusing knowledge in the design process. This paper looks at CSCW in Design on the Semantic Web and shows how Semantic Web technologies may improve the current design process. It describes using Semantic Web technologies to represent design knowledge in a unified, formal form that can be understood by both people and machines, and shows how this improves all kinds of communication processes in cooperative design. We also study the great advantage of sharing and reusing design knowledge on the Semantic Web, which is very helpful in the design process and may completely change the current way of designing.
1 Introduction

In the contemporary world, the design of complex new artifacts, including physical artifacts such as airplanes as well as informational artifacts such as software, increasingly requires expertise in a wide range of areas. Concurrent engineering is needed in order to manage increasing product diversity and satisfy customer demands, while trying to accelerate the design process to deal with the competitive realities of a global market and decreasing product life cycles [1]. Complex designs may involve many, sometimes thousands of, participants working on different elements of the design. The cooperative design process usually has strong interdependencies between design decisions, which makes it difficult to converge on a single design that satisfies these dependencies and is acceptable to all participants [2]. Current cooperative design processes are typically expensive and time-consuming, incorporate some important design concerns poorly, and reduce creativity [3].
* This work was supported in part by the Young Scientist's Fund of NSFC (60303024), National Natural Science Foundation of China (NSFC) (60073012), National Grand Fundamental Research 973 Program of China (2002CB312000), National Research Foundation for the Doctoral Program of Higher Education of China, Natural Science Foundation of Jiangsu Province, China (BK2001004), Opening Foundation of State Key Laboratory of Software Engineering in Wuhan University, and Opening Foundation of Jiangsu Key Laboratory of Computer Information Processing Technology in Soochow University.
CSCW explores the potential of computer technologies to help people work together on two fronts: managing task interdependencies and managing common information spaces [1]. Researchers in both the technological and the social domains have published many successful theories and applications, such as email management systems, workflow systems, and groupware. The coordination and integration of the myriad interdependent yet distributed and concurrent design activities becomes enormously complex; it thus seems that CSCW technologies may be indispensable if cooperative design is to succeed. When applying CSCW approaches to cooperative design, researchers address the issue of supporting cooperation among designers and other actors over distance by means of a series of shared display facilities. In addition, they explore different approaches to capturing "design rationale" and supporting "organizational memory" [4]. CSCW in Design requires research in several areas:

1. new computer technologies to increase the efficiency of communication;
2. the reuse of research and design results from previous projects;
3. taking into account results and criticisms occurring during the entire life cycle of designed products [5].

As the Web plays an increasingly important role in people's work, including design work, it presents both an opportunity and a challenge for designers. The Semantic Web enables computers and people to work in cooperation and can greatly help people share and reuse knowledge on the Web. It will help CSCW in Design both in the communication processes of design and in the sharing and reuse of design knowledge. This paper studies how Semantic Web technologies can improve the current design process. It is organized as follows: Section 2 presents the basic technologies and ideas of the Semantic Web. Section 3 describes the representation of design on the Semantic Web. Section 4 discusses the communication processes of design on the Semantic Web. Section 5 studies the sharing and reuse of design knowledge on the Semantic Web. Finally, we give a conclusion.
2 The Semantic Web

The Semantic Web is a new form of Web content in which information is no longer intended only for human readers but also for processing by machines. It is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation [6]. The most important technologies for developing the Semantic Web are the Extensible Markup Language (XML), the Resource Description Framework (RDF), and ontologies; together they can express the semantics of information on the Web. XML data is formally structured and can be processed by machines. RDF expresses the meaning of data, letting machines know how to process it automatically. Human language thrives on homonyms and synonyms, but these usually confuse machines; an ontology formally defines the relations among concepts and has a set of inference rules, so it can deal with the problem of homonyms and synonyms. Ontologies are the basis of sharing and reusing knowledge on the Web. The results provided by current search engines are full of useless and irrelevant items. On the Semantic Web, people can retrieve pages that refer to a
precise concept. Machines can even relate the information on a page to the associated knowledge structures and inference rules. Software agents can help people do many complex tasks: they can exchange data with each other and can find and use Web services automatically via ontologies. Both agents and services can describe themselves to others. Everyone can express newly invented concepts and ideas with minimal effort; the knowledge is expressed in a unifying logical language and is meaningful to both software agents and other people. Semantic Web technologies can be used to represent designs in a unified, formal form that can be understood by both people and machines. These representations provide structures for capturing and retrieving design knowledge, which will greatly help the communication processes in design projects. Most designers will be interested in the remarkable advantage of sharing and reusing knowledge.
3 Representations of Design

3.1 Current Representation of Design

Currently, design information is mainly in visual or natural-language form intended for human reading. This makes capturing, indexing, and reusing design knowledge heavy, time-consuming work for designers. Designs need to be represented formally in order to share and reuse them and to improve the design process. Different design projects often use different tools and systems, so the knowledge is represented in different forms. Current techniques for representing design knowledge are often based on special vocabularies and forms, which makes communication, as well as the sharing and reuse of design results, very difficult. Another problem is the lack of a formal representation of requirements and functionality: different people may understand the same document differently, and the same problem arises when describing the intentions of designers. The process of design also needs to be represented, including the documents and the discussions among the designers. Currently, this data is often stored as mostly unstructured natural-language texts, pictures, audio, or video, and is hard to retrieve. We need to represent a design, including its requirements, functionality, and process, in a unified, formal way that machines can process automatically.
3.2 Representation on the Semantic Web

Semantic Web techniques can be used to represent designs in a unified, formal way; ontologies are a capable tool for this. Designers can use a general ontology to capture design knowledge, no matter what specific CAD systems they are using. The ontology can describe the relations among the terms used by different designers, so designers can easily come to a common understanding via the ontology. Knowledge in old, informal forms can be processed automatically as well by using RDF technology. Objects included in this general ontology may be parts, features, requirements, and constraints [7]. A part is a component of the artifact being designed. The structure of a
part is defined in terms of the hierarchy of its component parts; these parts describe the model of the artifact. There are different kinds of features associated with a part, e.g., geometrical features, functional features, assembly features, mating features, physical features, etc. [8]. Relationships between parts also need to be represented, such as joints, constraints, and behaviors. Parts, features, and other parameters and constraints can describe the artifact being designed. However, it is difficult to describe designers' intentions, called design rationales [9, 10], especially the functionality of the design, which is an important part of the design rationale. A conceptual framework enabling systematic description of functional knowledge is needed so that designers can share conceptual engineering knowledge about functionality; the framework should consist of a categorization of the functional knowledge and layered ontologies [11]. Requirements of design include physical requirements, structural requirements, performance requirements, functional requirements, cost requirements, and so on [7]. Clients' requirements are decomposed into requirements for the various sub-systems; analysis and design are driven by the decomposed requirements; finally, designers integrate the sub-systems to meet the customer requirements. A requirement ontology can provide unambiguous, precise terminology that each designer can understand and use in describing requirements, and it can describe the dependencies and relationships among the requirements to support the decomposition and integration work. The most powerful ability of a requirement ontology is to check whether the functionality of a design meets the requirements. Clients and designers can reach a shared understanding by exchanging ontologies. Usually requirement analyzers build the requirement ontology instead of the clients, because it is impractical to ask clients to provide their requirements in a formal way.
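As a hedged illustration of how such a general design ontology might be written down, the RDF Schema fragment below defines a part hierarchy and its links to features and requirements; all class and property names are invented for illustration and are not taken from [7] or [8]:

<!-- Illustrative RDF Schema fragment; all names are hypothetical. -->
<rdfs:Class rdf:ID="Part"/>
<rdfs:Class rdf:ID="Feature"/>
<rdfs:Class rdf:ID="Requirement"/>
<rdf:Property rdf:ID="hasComponent">   <!-- the part hierarchy -->
  <rdfs:domain rdf:resource="#Part"/>
  <rdfs:range rdf:resource="#Part"/>
</rdf:Property>
<rdf:Property rdf:ID="hasFeature">
  <rdfs:domain rdf:resource="#Part"/>
  <rdfs:range rdf:resource="#Feature"/>
</rdf:Property>
<rdf:Property rdf:ID="satisfies">      <!-- links a part to a requirement -->
  <rdfs:domain rdf:resource="#Part"/>
  <rdfs:range rdf:resource="#Requirement"/>
</rdf:Property>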
4 Communication

Cooperative design mainly involves three kinds of communication and coordination processes: between designers and clients, within design teams, and between designers and design environments.
4.1 Between Designers and Clients

Designers and clients need shared knowledge and artifacts for mutual understanding. Many clients do not know what they really need and are unfamiliar with the artifacts they want, so they cannot state their ideas accurately. The first step in design is therefore to have experts analyze the client's current situation as precisely as possible and determine what the client really needs by communicating with the client in both informal and formal interviews. This leads to a common understanding of the real requirements. The experts then use the requirement ontology to represent the requirements in a form other designers can understand. These requirements can be visualized or translated back into natural language and shown to the clients; they can also be processed by machines to find and reuse past design results that meet the requirements. When designers have completed a design, it is important to know whether it meets the clients' needs. Clients often require externalizations for mutual understanding rather than formal representations: they prefer to see visualized results
and try out prototypes of artifacts. Visual representations of information can be based on ontological classifications of that information [12]. Designers use the ontology to provide design information, which can then be visualized for clients. Of course, experts are needed to provide more intuitive presentations, such as VR spaces and rapid prototypes. These experts do not need to know the details of the design; they can get precise information via the ontology provided by the designers. Clients may offer criticisms, and the designers then need to improve or redesign the product accordingly.
4.2 Within Design Teams

Most real tasks are done not by individuals but by groups of people, whose members may have very different interests. This kind of communication process is mainly about managing task interdependencies to deal with the strong interdependencies between design decisions. Communications within teams are usually informal and unstructured; there is no need to formalize them, because they are meant for people to understand. Many CSCW technologies increase the efficiency of communication. One main objective is to let people in different places communicate as if they were face to face, as in virtual conferences, video telephony, blackboard systems, and VR organizations. The other is to help designers communicate across time, e.g. with email and workflow systems. Both require making the communications visual and lifelike using video, audio, and VR technologies. Semantic Web technologies can help communicators share knowledge more easily and precisely. Another important advantage is the ability to annotate the content of communications. Currently it is difficult to organize and retrieve video and audio records. On the Semantic Web, the information contained in these records can be captured by the communicators themselves or by machines analyzing speech, images, and text; XML and RDF can then be used to annotate the records and describe their main topic and synopsis. We can build indexes and store the records in a structured space for sharing and reuse.
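As a hedged illustration, an annotation of a recorded design meeting could look like the RDF fragment below; the ann: vocabulary and the resource names are hypothetical, while dc: denotes the standard Dublin Core terms:

<!-- Hypothetical annotation of a design-meeting video record. -->
<rdf:Description rdf:about="http://example.org/records/meeting-0412.avi">
  <dc:title>Design review: wing assembly</dc:title>
  <dc:date>2003-04-12</dc:date>
  <ann:topic rdf:resource="http://example.org/design#WingAssembly"/>
  <ann:synopsis>Discussion of the mating features between spar and rib
    parts.</ann:synopsis>
</rdf:Description>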
4.3 Between Designers and Environments

These records, as well as the documents and results of the design project, are stored in common spaces. They are also design artifact memories that can be used to support indirect, long-term communication: they show the results of the designers' distributed activities, and designers can see their own contribution, the contributions of others, and the interactions between them. Designers can share results and ideas in these information spaces. Designers should be able to access the spaces and add or retrieve information, while private information should be protected. There are already many technologies for managing the communication between designers and the common spaces, such as telnet, FTP, etc. There are two main problems in managing shared information spaces: one is indexation, that is, the provision of means that allow an individual to assign a publicly visible and permanent 'pointer' to each item so as to enable other individuals to locate the items relatively easily and reliably [1]; the other is the requirements of privacy and safety.
If the results and the communication records have been recorded and structured with semantic information, designers can easily find the information they want and can learn more about previous design processes to help the current design work. When these spaces are linked to the Web and share their information, they can become a treasury for designers all over the world. On the Semantic Web, the documents and records of previous design projects are well structured and meaningful to machines; designers can search and process them easily and precisely with the help of agents. They no longer have to hunt for a piece of information in piles of paper documents or sift through the useless results returned by current search engines. Privacy requirements can also be attached to the original data, and digital signature technology can be used for trust and encryption. Design environments also include the design tools used by designers. These tools, such as CAD or CSCW systems, are developed by different companies and may not be compatible: they have different user interfaces and data formats, and designers may spend a long time becoming familiar with a new design tool. The HCI (Human-Computer Interaction) problem is one of the central challenges for CSCW.
5 Sharing and Reusing Knowledge

An efficient way to accelerate the design process and reduce workload is to reuse the results of previous design projects. Knowledge sharing is a dream of all designers, who have long suffered from the difficulty of sharing the conceptual knowledge representing designs because of the lack of a rich common vocabulary. It is extremely difficult to share knowledge across different domains and representations. Sharing and reusing knowledge on the Web may greatly help design work: designers can share experiences and methods of design and can share and reuse design results on the Web. There are six challenges in sharing and reusing knowledge: acquiring, modeling, retrieving, publishing, reusing, and maintaining it [13]. The experiences and methods of design are currently written mainly in natural language, and the documents and records of design are in different forms and representations too, sometimes as media files, so it is difficult to find the information needed. RDF technology can be used to describe all the resources on the Web: it can give a URI to each resource, such as a text, a document, or a media file, and then make statements describing the attributes of the resources identified by the URIs and the relations among them. These statements can help us search for and manage the information. Designers also need specific domain knowledge about the current design work. Such knowledge is currently represented mainly using the special terms of the domain, and most of the information is hidden in natural-language texts; a newcomer may spend much time searching for the information he needs and may have trouble understanding what it means. The Semantic Web makes information retrieval on the Web easy: domain knowledge is represented using domain ontologies, and the relations between the terms of two domains can be well defined, so users can share and reuse knowledge quickly and precisely via ontologies. Everyone can publish results, ideas, and concepts of design on the Semantic Web with minimal effort. Its unifying logical language will enable the knowledge to be progressively linked into a universal Web. Designs can be represented formally,
and computers can find and reuse them automatically. This will open up the knowledge to meaningful analysis by software agents and by people in different domains, and it will greatly increase the volume of knowledge on the Web available to all designers. How does one find reusable design results? Which parts should be reused? If there are many suitable parts, which one is best to reuse? How should they be reused? These questions are mainly answered by designers themselves, at the cost of much time and manpower; designers need to be skilled in reusing design knowledge and in preparing their own design solutions to facilitate reuse [14]. The results of previous designs are growing very quickly, which makes finding and reusing suitable results increasingly difficult. A general design ontology can represent requirements and design results in a formal way, so a machine can determine whether a design meets its requirements. When designers start a new design project with a set of requirements, they may ask agents to find previous design results that meet one or more of these requirements, searching not only the projects of their own organization but also design results from all over the world, if these are represented on the Web. Adding a suitable previous design to the current design artifact is a difficult process, and there may be many compatibility problems. Today there are many industry standards and software methods that help with reuse; for example, the COM mechanism in the Windows system helps programmers reuse software modules. On the Semantic Web, an artifact can even describe how to reuse itself, using a unified language, and designers can easily reuse it according to this information. There is no central maintenance system on the Web, but documents can be self-describing; homonymous resources and different versions of data can be easily managed on the Semantic Web, while most of the heavy maintenance work can be done by machines. Sharing and reusing knowledge on the Web may greatly change the current way of designing: everyone may be both designer and client on the Web, and design work may be done by people all over the world.
6 Conclusion

The design of complex artifacts requires the cooperation of experts working in different domains. CSCW in Design explores the potential of computer technologies to support cooperative design, which is challenging and complex. The Web greatly extends the range of cooperation and provides ever more knowledge to designers; it requires new technologies to increase the efficiency of communication and knowledge sharing. The Semantic Web provides technologies such as XML, RDF, and ontologies to represent designs in formal, structured, unified forms. These cover not only representations of the design artifacts but also the design rationales, including functions and requirements. It can also improve the efficiency of all the communication and coordination processes in cooperative design. The Semantic Web will give designers a more efficient way to share and reuse knowledge all over the Web, and it will greatly change the current way of designing.
References

1. Schmidt, K.: Cooperative Design: Prospects for CSCW in Design. Design Sciences and Technology, Vol. 6, No. 2, 1998, pp. 5-18
2. Klein, M., Sayama, H., Faratin, P., Bar-Yam, Y.: The Dynamics of Collaborative Design: Insights From Complex Systems and Negotiation Research. Journal of Concurrent Engineering: Research and Applications, 2003, in press
3. Klein, M., Sayama, H., Faratin, P., Bar-Yam, Y.: A Complex Systems Perspective on Computer-Supported Collaborative Design Technology. Communications of the ACM, Vol. 45, No. 11, 2002, pp. 27-31
4. Carstensen, P.H., Schmidt, K.: Computer Supported Cooperative Work: New Challenges to Systems Design. Handbook of Human Factors, Tokyo, 2002
5. Chan, S., Ng, V., Lin, Z.: Guest Editors' Introduction: Recent Developments in Computer Supported Cooperative Work in Design. International Journal of Computer Applications in Technology, Vol. 16, Nos. 2/3, 2002
6. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American, Vol. 284, No. 5, 2001, pp. 34-43
7. Lin, J., Fox, M.S., Bilgic, T.: A Requirement Ontology for Engineering Design. Concurrent Engineering: Research and Applications, Vol. 4, No. 4, 1996, pp. 279-291
8. Dixon, J.R., Cunningham, J.J., Simmons, M.K.: Research in Designing with Features. Workshop on Intelligent CAD, Elsevier, 1987, pp. 137-148
9. Kopena, J.: Assembly Representations for Design Repositories and the Semantic Web. Report, Geometric and Intelligent Computing Laboratory, Computer Science Department, Drexel University, 2002
10. Hu, X., Pang, J., Pang, Y., Sun, W., Atwood, M., Regli, W.: Design Rationale: A Background Study. Report, Geometric and Intelligent Computing Laboratory, Computer Science Department, Drexel University, 2002
11. Kitamura, Y., Mizoguchi, R.: An Ontological Schema for Sharing Conceptual Engineering Knowledge. International Workshop on Semantic Web Foundations and Application Technologies, 2003, in press
12. Harmelen, F.V., Broekstra, J., Fluit, C., Horst, H., Kampman, A., Meer, J., Sabou, M.: Ontology-based Information Visualization. Proceedings of the 15th International Conference on Information Visualization, London, 2001, pp. 546-554
13. Troxler, P.: Knowledge Technologies in Engineering Design. Proceedings of the 7th International Design Conference, Dubrovnik, 2002, pp. 429-434
14. Zdrahal, Z., Mulholland, P., Domingue, J., Hatala, M.: Sharing Engineering Design Knowledge in a Distributed Environment. Journal of Behaviour and Information Technology, Vol. 19, No. 3, 2000, pp. 189-200
SIMON: A Multi-strategy Classification Approach Resolving Ontology Heterogeneity – The P2P Meets the Semantic Web*

Leyun Pan, Liang Zhang, and Fanyuan Ma

Department of Computer Science and Engineering, Shanghai Jiao Tong University, 200030 Shanghai, China
{pan-ly, zhangliang}@cs.sjtu.edu.cn, [email protected]
Abstract. Semantic Web technology is seen as a key to realizing peer-to-peer resource discovery and service combination in the ubiquitous communication environment. In a peer-to-peer environment, however, we face the situation where individual peers maintain their own view of the domain in terms of the organization of their local information sources. Ontology heterogeneity among individual peers is thus becoming an ever more important issue. In this paper, we propose a multi-strategy learning approach to resolve this problem. We describe the SIMON (Semantic Interoperation by Matching between ONtologies) system, which applies multiple classification methods to learn the matching between ontologies. We use a general statistical classification method to discover category features in data instances and use the first-order learning algorithm FOIL to exploit the semantic relations among data instances. The system combines the prediction results of the individual methods using our matching committee rule, called the Best Outstanding Champion. Experiments show that the SIMON system achieves high accuracy on a real-world domain.
1 Introduction

Today's P2P solutions support only limited update, search, and retrieval functionality, which makes current P2P systems unsuitable for knowledge sharing purposes. Metadata plays a central role in the effort to provide search techniques that go beyond string matching. Ontology-based metadata facilitates access to domain knowledge; furthermore, it enables the construction of semantic queries [1]. Existing approaches to ontology-based information access almost always assume a setting where information providers share an ontology that is used to access the information. In practice, however, individual peers maintain their own view of the domain in terms of the organization of the local file system and other information sources. Enforcing the use of a global ontology in such an environment would mean giving up the benefits of the P2P approach mentioned above. *
Research described in this paper is supported by The Science & Technology Committee of Shanghai Municipality Key Project Grant 02DJ14045 and by The Science & Technology Committee of Shanghai Municipality Key Technologies R&D Project Grant 03dz15027.
We can consider the process of addressing semantic heterogeneity as a process of ontology matching (ontology mapping) [2]. Matching processes typically involve analyzing the data instances associated with the ontologies and comparing them to determine the correspondence among concepts. Given two ontologies in the same domain, we can find, for each concept node in one ontology, the most similar concept node in the other. However, at Internet scale, finding such mappings by hand is tedious, error-prone, and clearly not feasible; it cannot satisfy the need for online ontology exchange between two peers that do not already share an agreement. Hence, we must find approaches that support (semi-)automatic ontology matching. In this paper, we discuss the use of the data instances associated with ontologies to address semantic heterogeneity. We propose the SIMON (Semantic Interoperation by Matching between ONtologies) system, which applies multiple classification methods to learn the matching between a pair of ontologies that are homogeneous and whose elements overlap significantly. Given a source ontology B and a target ontology A, for each concept node in the target ontology A we can find the most similar concept node in the source ontology B. SIMON treats ontology A and its data instances as the learning resource: all concept nodes in ontology A are the classification categories, and the relevant data instances of each concept are the labeled learning samples in a classification process. The data instances of the concept nodes in ontology B are unseen samples. SIMON classifies the instances of each node in ontology B into the categories of ontology A according to the classifiers learned for A. SIMON uses multiple learning strategies, i.e., multiple classifiers, each of which exploits a different type of information, either in the data instances or in the semantic relations among them. Using an appropriate matching committee method, we can get better results than with any single classifier. This paper is organized as follows. In the next section, we give an overview of the ontology matching system. In Section 3, we discuss multi-strategy classification for ontology matching. Section 4 presents experimental results with our SIMON system. Section 5 reviews related work. We give conclusions and future work in Section 6.
2 Overview of the Ontology Matching System

The ontology matching system is trained to compare two ontologies and to find the correspondence among concept nodes. An example of such a task is illustrated in Figures 1 and 2, which show two movie-database ontologies. When a software agent wants to collect information about movies, it accesses a P2P movie system. The movie information on individual peers is marked up using an ontology such as that of Figure 1 or Figure 2. Here the data is organized into a hierarchical structure that includes movie, person, company, awards, and so on. Movies have attributes such as title, language, cast&crew, production company, genre, and so on. Some classes link to each other through attributes, shown in italics in the figures. However, because each peer may use a different ontology, it is difficult for an agent that masters only one ontology to integrate all the data completely. For example, an agent may assume that "Movie" in Allmovie is equivalent to "Movie" in IMDB; in fact, "Movie" in IMDB is just an empty ontology node, and "MainMovieInfo" in IMDB is the node most similar to "Movie" in Allmovie. Similar
mismatches may also occur between "MoviePerson" and "Person", "GenreInstance" and "Genre", and "Awards and Nominations" and "Awards".
Fig. 1. Ontology of movie database IMDB
Fig. 2. Ontology of movie database Allmovie
SIMON uses multi-strategy learning methods, including both statistical and first-order learning techniques. Each base learner exploits a certain type of information in the training instances to build matching hypotheses. We use a statistical bag-of-words approach to classify pure text instances; furthermore, the relations among concepts can help in learning the classifiers. On the prediction results of the individual methods, the system combines their outcomes using our matching committee rule, called the Best Outstanding Champion, which is a weighted voting committee. In this way, we can achieve higher matching accuracy than with any single base classifier alone.
3 Multi-strategy Learning for Ontology Matching

3.1 Statistical Text Classification

One of the methods we use for text classification is naive Bayes, a probabilistic model that ignores the word sequence and naively assumes that the presence of each word in a document is conditionally independent of all other words in the document. Naive Bayes for text classification can be formulated as follows. Given a set of classes $C$ and a document consisting of $k$ words $(w_1, \ldots, w_k)$, we classify the document as a member of the class $c^*$ that is most probable, given the words in the document:

$$c^* = \arg\max_{c \in C} \Pr(c \mid w_1, \ldots, w_k) \quad (1)$$

$\Pr(c \mid w_1, \ldots, w_k)$ can be transformed into a computable expression by applying Bayes' rule (Eq. 2); rewriting the expression using the product rule and dropping the denominator, since this term is a constant across all classes (Eq. 3); and assuming that words are independent of each other (Eq. 4):

$$\Pr(c \mid w_1, \ldots, w_k) = \frac{\Pr(c)\,\Pr(w_1, \ldots, w_k \mid c)}{\Pr(w_1, \ldots, w_k)} \quad (2)$$

$$\propto \Pr(c)\,\Pr(w_1, \ldots, w_k \mid c) \quad (3)$$

$$\approx \Pr(c)\,\prod_{i=1}^{k} \Pr(w_i \mid c) \quad (4)$$

$\Pr(c)$ is estimated as the portion of training instances that belong to $c$. So a key step in implementing naive Bayes is estimating the word probabilities $\Pr(w_i \mid c)$. We use Witten-Bell smoothing [3], which depends on the relationship between the number of unique words and the total number of word occurrences in the training data for the class: if most of the word occurrences are unique words, the prior is stronger; if words are often repeated, the prior is weaker.
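A minimal Python sketch of this classifier, assuming one common form of Witten-Bell smoothing (the exact variant used in the implementation may differ), directly follows Eqs. (1)-(4):

import math
from collections import Counter, defaultdict

def train(instances):
    # instances: list of (class_label, [words]) training pairs.
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, words in instances:
        class_docs[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    total = sum(class_docs.values())
    priors = {c: class_docs[c] / total for c in class_docs}  # Pr(c)
    return priors, word_counts, vocab

def word_prob(w, c, word_counts, vocab):
    # Witten-Bell smoothed Pr(w | c): the weight given to unseen words
    # depends on the ratio of unique words to total occurrences in class c.
    counts = word_counts[c]
    n = sum(counts.values())    # total word occurrences in class c
    t = len(counts)             # unique words seen in class c
    z = max(len(vocab) - t, 1)  # vocabulary words unseen in class c
    if counts[w] > 0:
        return counts[w] / (n + t)
    return t / (z * (n + t))

def classify(words, priors, word_counts, vocab):
    # Eq. (1): choose c maximizing log Pr(c) + sum_i log Pr(w_i | c).
    def score(c):
        return math.log(priors[c]) + sum(
            math.log(word_prob(w, c, word_counts, vocab)) for w in words)
    return max(priors, key=score)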
3.2 First-Order Text Classification

As mentioned above, the data instances under an ontology form a richly structured dataset, best described by a graph in which the nodes are objects and the edges are links or relations between objects. The classification method discussed in the previous section considers only the words in a single node of the graph; it cannot learn models that take into account features such as the pattern of connectivity around a given instance or the words occurring in instances of neighboring nodes. For example, we may want to learn a rule such as "a data instance belongs to movie if it contains the words minute and release and is linked to an instance that contains the word birth." Clearly, rules of this type, which can represent general characteristics of a graph, can be exploited to improve the predictive accuracy of the learned models. Such rules can be concisely represented in a first-order representation, and we can learn to classify text instances using a learner able to induce first-order rules. The learning algorithm we use in our system is Quinlan's Foil algorithm [4]. Foil is a greedy covering algorithm for learning function-free Horn clause definitions of a relation in terms of itself and other relations. Foil induces each Horn clause by beginning with an empty tail and using a hill-climbing search to add literals to the tail until the clause covers only positive instances. When the Foil algorithm is used as a classification method, the input file for learning a category consists of the following relations (the movie example is rendered as a clause after this list):

1. category(instance): the target relation to be learned from the other background relations. Each learned target relation represents a classification rule for a category.
2. has_word(instance): a set of relations, one per word, indicating which words occur in which instances. An instance belongs to the relation has_word exactly when the word word occurs in it.
3. linkto(instance, instance): a relation representing the semantic links between two data instances.
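In this notation, the movie rule quoted above would come out of Foil as a Horn clause over these relations, roughly as follows (an illustrative rendering with hypothetical relation names, not output copied from the system):

% "A data instance belongs to movie if it contains the words minute and
%  release and is linked to an instance that contains the word birth."
movie(A) :- has_word_minute(A), has_word_release(A),
            linkto(A, B), has_word_birth(B).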
We apply Foil to learn a separate set of clauses for every concept node in the ontology. When classifying the other ontology's data instances, if an instance does not match any clause of any category, we treat it as an instance of an "other" category.
3.3 Evaluation of Classifiers for Matching and Matching Committees

The method of committees (a.k.a. ensembles) is based on the idea that, given a task requiring expert knowledge, k experts may be better than one if their individual judgments are appropriately combined [7]. For obtaining the matching result, there are two different matching committee methods, depending on whether a classifier committee is used.

Microcommittees: the system first uses a classifier committee, which negotiates the category of each unseen data instance; the system then makes the matching decision on the basis of the single combined classification result.

Macrocommittees: the system does not use a classifier committee; each classifier individually decides the category of each unseen data instance, and the system then negotiates the matching on the basis of the multiple classification results.

To optimize the combined result, we would ideally give each committee member a weight reflecting its expected relative effectiveness. There are, however, some differences between evaluation in text classification and in ontology matching. In text classification, the initial corpus can easily be split into a training (and validation) set and a test set. In the ontology matching process, the boundary between the training set, the test set, and the unseen instances is not obvious. First, there is no test set: the instances of the target ontology serve as the training set, and the instances of the source ontology are the unseen samples. Second, the unseen data instances are not completely "unseen", because the instances of the source ontology all carry labels; we simply do not know what each label means. Because of the absence of a test set, it is difficult to evaluate the classifiers in microcommittees; microcommittees can only rely on prior experience and manually assigned classifier weights, as in [2]. We therefore adopt macrocommittees in our ontology matching system. Note that the instances of the source ontology are only relatively "unseen": when they are classified, the unit is not a single instance but a whole category, so we can observe the distribution of a category of instances. Each classifier finds a champion, the target-ontology category that gains the maximal similarity degree. Some champions have an obvious predominance, while others lead the remaining nodes by only a little. Generally, the more outstanding a champion is, the more we believe it. We can thus adopt the degree of outstandingness of a candidate as the evaluation of each classifier's effectiveness; it can be observed directly from the classification results and need not be adjusted and optimized on a validation set. We propose a matching committee rule called the Best Outstanding Champion: the system chooses as the final match the candidate with the maximal accumulated degree of outstandingness among the champion candidates. The method can be regarded as a weighted voting committee: each classifier votes for the most similar node according to its own judgment, but each vote has a different weight, measured by the degree of the champion's outstandingness, which we define as the ratio of the champion's score to that of the second-ranked node.
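A minimal Python sketch of the Best Outstanding Champion rule as described above is given below; the input layout, one similarity table per classifier, is an assumption for illustration, not the actual code:

from collections import defaultdict

def best_outstanding_champion(similarity_tables):
    # Each table maps target-ontology categories to the fraction of the
    # source category's instances that a classifier assigned to them.
    votes = defaultdict(float)
    for table in similarity_tables:
        ranked = sorted(table.items(), key=lambda kv: kv[1], reverse=True)
        champion, top = ranked[0]
        second = ranked[1][1] if len(ranked) > 1 else 1e-9
        outstandingness = top / max(second, 1e-9)  # champion / runner-up
        votes[champion] += outstandingness         # weighted vote
    return max(votes, key=votes.get)

# A narrow statistical-classifier win for Director is overruled by a
# strongly outstanding first-order champion, Actor:
stat = {"Actor": 0.40, "Director": 0.45, "Movie": 0.15}
foil = {"Actor": 0.56, "Director": 0.08, "Movie": 0.06}
print(best_outstanding_champion([stat, foil]))  # -> Actor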
4 Experiments

We take movies as our experimental domain. We chose as experimental objects the three movie websites ranked highest in the Google directory Arts > Movies > Databases: IMDB, AllMovie, and Rotten Tomatoes. We manually matched the three ontologies to each other to measure the matching accuracy, defined as the percentage of the manual mappings that the machine predicted correctly. We collected about 150 movies from each website; we then exchanged the keywords and collected another 300 movies, so each ontology holds about 400 movie data instances after removing duplicates. We use a three-fold cross-matching methodology to evaluate our algorithms: we conducted three runs, in each of which we performed two experiments mapping the ontologies to each other. In each experiment, we train classifiers using the data instances of the target ontology and classify the data instances of the source ontology to find the matching pairs from the source ontology to the target ontology.
Table 1 shows the classification result matrices for some of the categories in the Allmovie-IMDB experiment, for the statistical classifier and the first-order classifier respectively (the numbers in parentheses are the results of the first-order classifier). Each column of the matrix represents one category of the source ontology Allmovie and shows how the instances of that category are classified into the categories of the target ontology IMDB. Boldface indicates the leading candidate in each column. These matrices illustrate several interesting results. First, for most classes the coverage of the champion is high enough for a matching judgment; for example, 63% of the Movie column in the statistical classifier and 56% of the Player column in the first-order classifier are correctly classified. Second, there are notable exceptions to this trend: Player and Director in the statistical classifier, and Movie and Person in the first-order classifier. The Player column in the statistical classifier would lead to a wrong matching decision, with Player in AllMovie matched not to Actor but to Director in IMDB. In the other problematic columns, the first and second candidates are so close that we cannot fully trust matching results based on these classifications alone. The low coverage of the champion for Player and Director is explained by a characteristic of these categories: both lack distinguishing feature properties.
For this reason, many of the instances of the two categories are classified into many other categories. Our first-order classifier, however, can repair this shortcoming: by mining the information of neighboring instances (awards and nominations), we can learn rules for the two categories and classify most instances into the proper categories, because players often win best actor awards and vice versa. Neighboring instances do not always provide correct evidence for classification; the Movie and Person columns in Table 1 illustrate this situation. Because many data instances of these two categories link to each other, the effectiveness of the learned rules declines. Fortunately, the statistical classifier's results for these two categories are good. Using our matching committee rule, we can easily integrate the better classification results of both classifiers: after calculating and comparing the degrees of outstandingness, we trust the matching results for Movie and Person from the statistical classifier and those for Player and Director from the first-order classifier.
Fig. 3. Ontology matching accuracy
Figure 3 shows the three runs and six groups of experimental results. We match the two ontologies to each other in each run, with small differences between the two experiments' results. The three bars in each experiment represent the matching accuracy produced by (1) the statistical learner alone, (2) the first-order learner alone, and (3) the matching committee using the two learners.
5 Related Works

From the perspective of ontology matching using data instances, several works are related to our system. In [2], some strategies classify the data instances, while another strategy, the Relaxation Labeler, searches for the mapping configuration that best satisfies the given domain constraints and heuristic knowledge. In contrast, automated text classification is the core of our system. We focus on fully mining the data instances for automated classification and ontology matching. By constructing the classification samples according to the feature property set and exploiting the classification features within and among data instances, we make the fullest use of text classification methods.
Furthermore, as regards the combination of multiple learning strategies, [2] uses micro-committees and manually evaluates the classifier weights. In our system, by contrast, we adopt the degree of outstandingness as the classifier weights, which can be computed directly from the classification results. Without using any domain or heuristic knowledge, our system automatically achieves matching accuracy similar to that of [2]. [5] also compares ontologies using similarity measures, but computes the similarity between lexical entries. [6] describes the use of the FOIL algorithm in classification and extraction for constructing knowledge bases from the web.
6 Conclusions
The completely distributed nature and the high degree of autonomy of individual peers in a P2P system bring new challenges for the use of semantic descriptions. We propose a multi-strategy learning approach for resolving ontology heterogeneity in P2P systems. In this paper, we introduce the SIMON system and describe its key techniques. We take movies as our experimental domain and extract the ontologies and data instances from three different movie database websites. We use a general statistic classification method to discover category features in data instances, and use the first-order learning algorithm FOIL to exploit the semantic relations among data instances. The system combines their outcomes using our matching committee rule, called the Best Outstanding Champion. A series of experimental results shows that our approach achieves high accuracy on a real-world domain.
References
1. J. Broekstra, M. Ehrig, P. Haase. A Metadata Model for Semantics-Based Peer-to-Peer Systems. In Proceedings of SemPGRID '03, 1st Workshop on Semantics in Peer-to-Peer and Grid Computing, 2003.
2. A. Doan, J. Madhavan, P. Domingos, A. Halevy. Learning to Map between Ontologies on the Semantic Web. In Proceedings of the World Wide Web Conference (WWW 2002).
3. I. H. Witten, T. C. Bell. The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 37(4), July 1991.
4. J. R. Quinlan, R. M. Cameron-Jones. FOIL: A midterm report. In Proceedings of the European Conference on Machine Learning, pages 3-20, Vienna, Austria, 1993.
5. A. Maedche, S. Staab. Comparing Ontologies: Similarity Measures and a Comparison Study. Internal Report No. 408, Institute AIFB, University of Karlsruhe, March 2001.
6. M. Craven, D. DiPasquo, D. Freitag, A. McCallum, T. Mitchell. Learning to Construct Knowledge Bases from the World Wide Web. Artificial Intelligence, Elsevier, 1999.
7. F. Sebastiani. Machine Learning in Automated Text Categorization. ACM Computing Surveys, Vol. 34, No. 1, March 2002.
SkyEyes: A Semantic Browser for the KB-Grid
Yuxin Mao, Zhaohui Wu, and Huajun Chen
Grid Computing Lab, College of Computer Science, Zhejiang University, Hangzhou 310027, China
{maoyx, wzh, huajunsir}@zju.edu.cn
Abstract. KB-Grid was introduced for publishing, sharing and utilizing an enormous amount of knowledge base resources on the Semantic Web. This paper proposes a generic architecture of a Semantic Browser for KB-Grid. The Semantic Browser is a widely adaptable and expandable client to the Semantic Web that provides users with a series of functions, including Semantic Browse and Semantic Query. We introduce the key techniques used to implement a prototype Semantic Browser, called SkyEyes. Also, an application of SkyEyes to Traditional Chinese Medicine (TCM) is described in detail.
1 Introduction
The emergence of the Semantic Web [1] will result in an enormous number of knowledge base (KB) resources distributed across the web. In such a setting, we must face the challenges of sharing, utilizing and managing knowledge on a huge scale. Traditional web architecture seems quite insufficient to meet these requirements. Since Grid technologies have the ability to integrate and coordinate resources among users without conflict or insecurity, we propose a generic model of a Semantic Browser based on the basic ideas of the Grid and the Semantic Web, and implement a prototype Semantic Browser, called SkyEyes. In the following two subsections, we introduce the background knowledge as well as some related work. We then propose a generic architecture of a Semantic Browser for KB-Grid and discuss the key techniques used to implement the prototype Semantic Browser, SkyEyes. Besides, an application of SkyEyes to Traditional Chinese Medicine (TCM) is described in detail. In the end, we briefly summarize our work and look forward to future work.
1.1 Background
The scale of the Internet has grown at a startling rate and provides us with a large amount of information. We have to face the problems of publishing, sharing and utilizing web information, so the Knowledge Base Grid (KB-Grid) [2] was introduced to meet these requirements. KB-Grid is a project being developed by the Grid Computing Lab of Zhejiang University. KB-Grid suggests a paradigm that emphasizes how to organize, publish, discover, utilize, and manage web KB resources. In KB-Grid, distributed knowledge is presented by lightweight ontology languages such as RDF(S) [3].
In such a setting, traditional web browsers would be inadequate, and a particular kind of browser for KB-Grid should be promoted and developed. SkyEyes is just such a new type of Semantic Browser, aimed at browsing, querying, managing and updating knowledge from distributed KBs for KB-Grid.
1.2 Related Work
There has been a variety of research and applications related to our work. Ideagraph [4] is a personal knowledge management tool for creating, managing, editing and browsing a personal KB, and is easy to use. However, it is just a local tool for personal use rather than a distributed application. IsaViz [5] is a visual working environment for creating and browsing RDF models. It makes use of the GraphViz library to display an RDF model as a bitmap, which lacks a proper layout, so its interaction and visual effect are not satisfying. OntoRama [6] is a prototype ontology browser, which takes RDF/XML as its input format and models an ontology as a configurable graph. However, the functions of OntoRama are confined to browsing and display, and it does not yet support queries. There is still no Semantic Browser in the real sense: most of these related research efforts and applications aim at visualizing or browsing ontologies or knowledge, and lack more intelligent functions such as Semantic Query and reasoning. Besides, many applications are only for local use rather than a distributed environment. The application background of our work is the TCM Information Grid, and our goal is to build an Open Knowledge Service Architecture to provide a wide series of knowledge services. The idea of SkyEyes is to build a widely adaptable and expandable intelligent client to the Semantic Web, which will provide more intelligent functions. The immediate application of SkyEyes is browsing and querying the TCM ontology.
2 Overview
The Semantic Browser works as an intelligent client to the Semantic Web and is an interface between KB-Grid and KB-Grid users. Any user of KB-Grid can publish, browse, query, manage and utilize knowledge via this browser. Here we propose a generic architecture of a Semantic Browser for KB-Grid, shown in figure 1.
Fig. 1. A generic architecture of Semantic Browser
2.1 Knowledge Server
KB-Grid consists of many decentralized KB-Grid nodes. Each node may include several KBs. KB-Grid nodes exchange knowledge and deliver knowledge services through Grid Service interfaces. These nodes work collectively as a super knowledge server, and the inner structure of KB-Grid is transparent to clients. Clients interact with the knowledge server through Grid Service. A meta-information register center is set up to coordinate KB resources. A shared ontology described in RDF(S) is stored in the register center, which serves as an index for the distributed KBs.
2.2 Semantic Browser Plugins
The Semantic Browser remotely accesses knowledge services through Grid Service. For each type of knowledge service, we accordingly develop a plugin, which is an independent module in the Semantic Browser.
Service Discovery Plugin. The knowledge server dynamically delivers various knowledge services, and the Service Discovery plugin accesses Service Discovery services to get meta-information about services.
Semantic Browse Plugin. Semantic Browse visualizes explicitly described concepts, their instances, and the relationships among them as semantic graphs, and assists users in browsing semantic information along semantic links. The Semantic Browse plugin accesses Semantic Browse services to carry out Semantic Browse.
Semantic Query Plugin. When the KB-Grid becomes very large, or when users are not familiar with the structure of the knowledge, they had better query the knowledge instead of browsing it. Semantic Query queries semantic information or knowledge along semantic links. The Semantic Query plugin accesses Semantic Query services to query semantic information from distributed KBs and optimizes the query results.
Knowledge Management Plugin. The Knowledge Management plugin accesses Knowledge Management services to manage both local and remote knowledge.
Reasoning Plugin. The Reasoning plugin accesses reasoning services to perform reasoning based on domain ontologies to solve practical problems. Users can dynamically choose the rule set and case base according to specific problems.
Besides, SkyEyes reserves slots for extended plugins to access possible knowledge services that may be delivered by the knowledge server in the future.
2.3 Intelligent Controller
Since each plugin implements only a single function, the Semantic Browser needs an intelligent controller to combine them as a whole. The intelligent controller is the kernel of the Semantic Browser; it coordinates and schedules the various plugins, making the proper plugins access the proper services of KB-Grid.
2.4 SGL-Parser and SG-Factory
For different formats of semantic information, such as RDF(S), XML, OWL and so on, the Semantic Browser should display a uniform semantic graph. Besides, if we want to make semantic graphs look clear without loss of semantics, we have to take more into account. Since it is hard to draw semantic graphs based directly on RDF or other existing languages, we developed the Semantic Graph Language (SGL) for displaying semantic graphs. The semantic information acquired from the server is translated uniformly into SGL by the Semantic Browser plugins. The SGL-Parser reads and parses SGL, and the SG-Factory produces a uniform, standard semantic graph, regardless of the format of the semantic information.
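The division of labour just described can be summarized with a few interfaces. This is a hypothetical Java sketch, not SkyEyes code; every type name is invented.

```java
// Hypothetical pipeline: plugin payload -> SGL text -> parsed model -> graph.
interface SGLTranslator {
    /** Translate RDF(S)/OWL/XML payloads from a plugin into SGL text. */
    String toSGL(byte[] semanticPayload, String sourceFormat);
}

interface SGLParserComponent {
    /** Read and parse an SGL document into an in-memory model. */
    SGLDocument parse(String sglText);
}

interface SGFactoryComponent {
    /** Produce a uniform semantic graph, whatever the original format was. */
    SemanticGraph build(SGLDocument document);
}

// Placeholder types so the sketch is self-contained.
class SGLDocument { /* nodes, arcs and their semantic annotations */ }
class SemanticGraph { /* vectographic components ready for display */ }
```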
3 Implementation and Key Techniques
According to the generic architecture of the Semantic Browser, we developed a prototype Semantic Browser, called SkyEyes, and have implemented two major functions, Semantic Browse and Semantic Query. The user interface is similar to that of traditional web browsers, as figure 2 shows, so common users can operate it easily. However, to solve some problems with SkyEyes, users should have enough domain knowledge. SkyEyes was implemented in Java, so it is portable and can be used in different environments. There are several key techniques to implement SkyEyes.
Fig. 2. SkyEyes: a Semantic Browser
3.1 Expandable Plugin Mechanism
SkyEyes is a lightweight client, which calls dedicated plugins to access remote knowledge services to solve problems. As the scale of knowledge increases or users' requirements change, the knowledge services of KB-Grid may be dynamically changed and delivered. The expandable plugin mechanism allows SkyEyes to extend its functions easily just by adding or updating plugins, without modifying its code or structure. In this way, users can even customize their own browsers by subscribing to or unsubscribing from services as they wish.
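A plugin mechanism of this kind is commonly realized with a small registry keyed by service type, so the controller can route requests without knowing concrete plugin classes. The following Java sketch shows one plausible shape; none of the names are taken from SkyEyes.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative plugin contract: each plugin wraps one kind of knowledge service.
interface BrowserPlugin {
    String serviceType();                            // e.g. "SemanticBrowse"
    Object invoke(String endpoint, Object request);  // call the remote Grid Service
}

// New plugins can be registered at run time without touching existing code.
class PluginRegistry {
    private final Map<String, BrowserPlugin> plugins = new HashMap<>();

    void register(BrowserPlugin plugin) {
        plugins.put(plugin.serviceType(), plugin);
    }

    Object dispatch(String serviceType, String endpoint, Object request) {
        BrowserPlugin plugin = plugins.get(serviceType);
        if (plugin == null)
            throw new IllegalStateException("No plugin subscribed for " + serviceType);
        return plugin.invoke(endpoint, request);
    }
}
```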
3.2 Operatable Vectographic Components
Each user operation in SkyEyes results in a semantic graph, which is composed of vectographic components. Vectographic components can be scaled and dragged freely without reducing the quality of the graph, so the display of a semantic graph is much better and more acceptable to users. A vectographic component itself does not contain or store any data; it is used as a proxy or view for semantic information. In a semantic graph, each vectographic component provides users not only with a view of semantic information but also with a series of functions. If a great deal of semantic information is returned from the server, the structure of the corresponding semantic graph can become so complex that many nodes and arrows overlap with each other in one graph. Many visualization tools suffer from this problem, which is quite inconvenient to users and sometimes intolerable. To solve this problem, SkyEyes adopts the radial layout algorithm [6] to arrange the global layout of a semantic graph, so as to avoid overlapping.
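The cited radial layout places the focus node at the centre and each level of neighbours on a larger concentric ring, giving every subtree its own angular wedge so that sibling subtrees never overlap. The sketch below illustrates the idea under simplifying assumptions (a tree-shaped neighbourhood and uniform wedge division); it is not the SkyEyes implementation.

```java
import java.util.ArrayList;
import java.util.List;

class GraphNode {
    final List<GraphNode> children = new ArrayList<>();
    double x, y; // position computed by the layout
}

class RadialLayout {
    /** Place 'node' on ring 'depth', spreading its subtree over [startAngle, endAngle). */
    static void layout(GraphNode node, int depth, double ringStep,
                       double startAngle, double endAngle) {
        double angle = (startAngle + endAngle) / 2.0;
        double radius = depth * ringStep;
        node.x = radius * Math.cos(angle);
        node.y = radius * Math.sin(angle);
        // Divide this node's wedge evenly among its children, so each
        // subtree keeps its own angular sector and nodes do not overlap.
        double wedge = (endAngle - startAngle) / Math.max(1, node.children.size());
        double a = startAngle;
        for (GraphNode child : node.children) {
            layout(child, depth + 1, ringStep, a, a + wedge);
            a += wedge;
        }
    }
}
```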
3.3 SGL: Semantic Graph Language
Unlike general graph languages, SGL takes semantics into account and treats semantics as part of the graph elements. Graph elements described in SGL are related to each other by semantic links, not hyperlinks or graphic links. We can use SGL to describe both the semantics and the appearance of a semantic graph. SGL is an XML-based language, and here is part of a brief BNF definition of SGL.
The structure of a semantic graph is clear in such an SGL document, and therefore SkyEyes can draw a standard semantic graph from it. Figure 3 shows a simple example.
Fig. 3. A simple example of SGL
4 Application: Semantic Browse on Traditional Chinese Medicine
A practical domain for SkyEyes is Traditional Chinese Medicine (TCM). We have been building an ontology of the Unified TCM Language System, which includes TCM concepts, objects, and their relationships with each other. We chose Protégé 2000 [7] as the ontology editor to build the ontology in RDF(S). The TCM ontology is distributed across the web over more than twenty nodes throughout the country. The nodes share a common ontology in the meta-information register center and are related by semantic links, URIs. We have now finished building the complete concept definitions of the TCM ontology and have edited more than 100,000 instance records. Users of this TCM ontology can download SkyEyes from our server and install it. Then they can use it to acquire the useful information they need from the TCM ontology. TCM experts can solve practical problems with the help of SkyEyes, or they can take the results returned by SkyEyes as a reference. For example, if a doctor is not sure about the use of a medicine, he can turn to SkyEyes.
4.1 Semantic Browse
Before starting to browse, the doctor inputs the URL of the TCM ontology into the address field of SkyEyes and begins to perform Semantic Browse. The process of Semantic Browse can be divided into several steps. First, given a URL, SkyEyes connects to the knowledge server and gets the RDF(S) files. It calls the Jena [8] API to parse the RDF(S) files into a data model that can be understood and processed by the client. SkyEyes extracts meta-information about the RDF(S) and parses it into a class hierarchy tree to finish the initialization. Next, the doctor can expand a tree node to browse its sub-classes, or click it to list its direct instances in the instance list area. The relationships between the class and its properties are displayed as a semantic graph in the semantic graph area. Then, if the doctor finds the instance of the medicine in the instance list, he can click the instance, and all its properties and property values, including composition, effect and so on, are displayed around the node standing for the medicine in a graph, so the use of that medicine becomes very clear. If the doctor wants to know more about one composition of that medicine, he can click the node standing for that composition, and a detailed semantic graph about it is displayed.
During Semantic Browse, each user operation sends in a URI and acquires further related semantic information from the server through this URI. SkyEyes then calls the JGraph [9] API to draw the semantic graph according to the information. If the doctor cannot find the information this way, he can turn to Semantic Query.
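For illustration, the first two steps (fetching the RDF(S) model and listing the direct instances of a clicked class) might look as follows with the Jena API. The ontology URL and class URI are hypothetical, the modern org.apache.jena package names are an assumption, and error handling is omitted.

```java
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.ResIterator;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.RDF;

public class SemanticBrowseSketch {
    public static void main(String[] args) {
        // Step 1: fetch and parse the ontology published at the given URL.
        Model model = ModelFactory.createDefaultModel();
        model.read("http://example.org/tcm-ontology.rdfs"); // hypothetical URL

        // Step 2: list the direct instances of the class the user clicked.
        Resource medicine = model.getResource("http://example.org/tcm#Medicine");
        ResIterator instances = model.listSubjectsWithProperty(RDF.type, medicine);
        while (instances.hasNext()) {
            System.out.println(instances.nextResource().getURI());
        }
    }
}
```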
4.2 Semantic Query
SkyEyes itself is unable to query; querying is done by accessing query services. SkyEyes just provides an easy-to-use interface for Semantic Query and visualizes the query results. Query results are returned from the server as semantic information, which is optimized by the Semantic Query plugin and displayed as semantic graphs. For the moment, SkyEyes provides four kinds of Semantic Query, and for each, SkyEyes displays a corresponding type of semantic graph.
Class-class query returns semantic information about the specified class, its up-classes and its sub-classes.
Class-instance query returns semantic information about the specified class and its direct instances.
Instance-property query returns semantic information about the specified instance and its properties.
Correlative query returns semantic information about the specified class, its up-classes, its sub-classes, and their instances. It fits the case in which the queried class has no direct instances at all but its sub-classes or up-classes do.
The doctor can input restrictive information about the medicine to perform a Semantic Query within the TCM ontology. The query result will not be documents simply containing the keyword but a semantic graph focusing on the medicine, just like the semantic graph displayed when browsing. He can also set the depth of each query and control the semantic graph by configuring some parameters of SkyEyes. A greater depth means more semantic information is returned and displayed.
4.3 Reasoning
If the problem is more than querying a medicine, for example treating a patient, the doctor can take advantage of the reasoning function to perform more complex work. First, he describes the symptoms of the patient in a particular format. Next, he chooses a case base that may contain a similar case, together with a rule set on diagnosis and treatment. The Reasoning plugin of SkyEyes accesses reasoning services to perform the complex reasoning, and the results are returned as semantic information. Finally, SkyEyes displays the results as semantic graphs, and the doctor can take them as a useful reference when treating the patient.
5 Summary
SkyEyes is a prototype Semantic Browser, implemented according to a generic architecture of a Semantic Browser for KB-Grid. As an intelligent client to the Semantic Web, it provides users with the major functions of a Semantic Browser and a
friendly user interface. SkyEyes has several important features that distinguish it from traditional web browsers:
Open. SkyEyes is based on Grid Service and works as part of KB-Grid, not subject to the traditional C/S structure.
Exact. Browse and query utilize semantic links to locate and return the more exact and useful information users require.
Intelligent. SkyEyes accesses knowledge services to understand and solve complex and practical problems that previously called for domain experts.
Expandable. The expandable plugin mechanism allows functions to be extended dynamically according to the services delivered by the knowledge server.
Universal. SkyEyes converts various formats of semantic information into uniform semantic graphs based on SGL.
Convenient and Operatable. The use of vectographic components provides users with an excellent view of semantic information and a series of interactive functions.
Our future work is to build a knowledge sharing and knowledge management platform towards the Semantic Web. As part of this platform, the functions of SkyEyes are still insufficient, so a series of knowledge services, especially reasoning services and knowledge management services, will be developed based on Grid Service, and more Semantic Browser plugins will be added to SkyEyes to extend its functions. Besides, we will go on with the TCM Information Grid for TCM scientific research.
References
[1] Berners-Lee, T., Hendler, J., Lassila, O. The Semantic Web. Scientific American, May 2001.
[2] Wu ZhaoHui, Chen HuaJun, Xu JieFeng. Knowledge Base Grid: A Generic Grid Architecture for Semantic Web. JCST, Vol. 18, No. 4, July 2003.
[3] Resource Description Framework (RDF) Model and Syntax Specification. http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
[4] Ideagraph, an Idea Development Tool for the Semantic Web. http://ideagraph.net/
[5] IsaViz: A Visual Authoring Tool for RDF. http://www.w3.org/2001/11/IsaViz/
[6] Peter Eklund, Nataliya Roberts, Steve Green. OntoRama: Browsing RDF Ontologies Using a Hyperbolic-Style Browser. In The First International Symposium on CyberWorlds (CW2002), pp. 405-411, IEEE Press, 2002.
[7] Protégé 2000. http://protege.stanford.edu/
[8] Jena Semantic Web Toolkit. http://www.hpl.hp.com/semweb/jena.htm
[9] JGraph. http://jgraph.sourceforge.net/
Toward the Composition of Semantic Web Services
Jinghai Rao and Xiaomeng Su
Department of Computer and Information Science, Norwegian University of Science and Technology, N-7491, Trondheim, Norway
{jinghai, xiaomeng}@idi.ntnu.no
Abstract. This paper introduces a method for the automatic composition of semantic web services using linear logic theorem proving. The method uses a semantic web service language (DAML-S) for the external presentation of web services, while internally the services are presented by extralogical axioms and proofs in linear logic. Linear logic (LL) [2], as a resource-conscious logic, enables us to define the attributes of web services formally (in particular, the qualitative and quantitative values of non-functional attributes). The subtyping rules that are used for semantic reasoning are presented as linear logic implications. We propose a system architecture in which the DAML-S parser, the linear logic theorem prover and the semantic reasoner work together. This architecture has been implemented in the Java programming language.
1 Introduction
The Grid is a promising computing platform that integrates resources from different organizations in a shared, coordinated and collaborative manner to solve large-scale science and engineering problems. The current development of the Grid has adopted a service-oriented architecture and, as a result, Grid technologies are evolving towards an Open Grid Services Architecture (OGSA). The convergence of Web services with Grid computing will accelerate the adoption of Grid technologies. [1] defines a Grid service as a Web service that provides a set of well-defined interfaces and follows specific conventions. As such, Grid services will inherently share some of the same problems and technical challenges of Web services in general. The ability to efficiently and effectively select and integrate inter-organizational services on the web at runtime is a critical step towards the development of the online economy. In particular, if no single web service can satisfy the functionality required by the user, there should be a possibility to combine existing services in order to fulfill the request rapidly. However, the task of web service composition is a complex one. Firstly, web services can be created and updated on the fly, and it may be beyond human capabilities to analyze the required services and compose them manually. Secondly, web services are developed by different organizations that use different semantic models to
describe the features of services. These different semantic models mean that the matching and composition of web services have to take the semantic information into account. In this paper, we propose a candidate solution which we believe contributes to solving these two challenges. We describe a method for automated web service composition which is based on proof search in the (propositional) Multiplicative Intuitionistic fragment of Linear Logic (MILL [3]). The idea is, given a set of existing web services and a set of functionality and non-functional attributes, to find a composition of atomic services that satisfies the user requirements. The fact that Linear Logic is resource conscious makes it possible to reason about both qualitative and quantitative non-functional attributes of web services. Because of the soundness of the logic fragment, the correctness of composite services is guaranteed with respect to the initial specification. Further, the completeness of the logic ensures that all composable solutions will be found. The rest of this paper is organized as follows: Section 2 presents a system architecture for the composition of semantic web services. Section 3 presents the methods for transforming between DAML-S documents and Linear Logic axioms. Section 4 discusses the usage of a type system to enable semantic composition. Section 5 presents related work and concludes the paper.
Fig. 1. Architecture for Service Composition
2 The Service Composition Architecture
Figure 1 depicts the general architecture of the proposed web service composition process. The approach proceeds as follows. First, a description of the existing web services (in the form of DAML-S Profiles) is translated into axioms of Linear Logic, and the requirements for the composite service are specified in the form of a Linear Logic sequent to be proven. Second, the Linear Logic theorem prover determines whether the requirements can be fulfilled by a composition of existing atomic services. On reading each propositional variable, the theorem prover asks the semantic reasoner for possible subtyping inferences. The subtypings are inserted into the theorem prover as logical implications. If one or more proofs are generated, the last step is the construction of flow models (written in DAML-S Process). The process is controlled by the coordinator,
especially when components are distributed. During the process, the user can interact with the system through a GUI. In this paper, we pay special attention to the DAML-S Parser and the Semantic Reasoner. The details of the theorem proving part have already been introduced in [7]; readers with knowledge of Linear Logic or theorem proving can understand this part easily without referring to that separate publication.
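The coordinator's control flow can be summarized in a few lines of Java. All of the types below are invented placeholders for the components of Fig. 1; the sketch fixes only the order of interactions, not any real API.

```java
import java.util.ArrayList;
import java.util.List;

// Placeholder types mirroring the components of Fig. 1; every name is invented.
class Axiom {}
class Sequent {}
class Proof {}

interface DamlsParser {
    List<Axiom> toAxioms(String damlsProfile);  // advertised atomic services
    Sequent toSequent(String requirement);      // the user's composition request
}
interface SemanticReasoner {
    List<Axiom> subtypingsFor(String propositionalVariable); // as LL implications
}
interface TheoremProver {
    List<Proof> prove(Sequent goal, List<Axiom> axioms, SemanticReasoner reasoner);
}
interface FlowBuilder {
    String toProcessModel(Proof proof);         // DAML-S Process output
}

class Coordinator {
    String compose(String requirement, List<String> profiles, DamlsParser parser,
                   TheoremProver prover, SemanticReasoner reasoner, FlowBuilder builder) {
        // 1. Translate every DAML-S profile into LL axioms, the request into a sequent.
        List<Axiom> axioms = new ArrayList<>();
        for (String profile : profiles) axioms.addAll(parser.toAxioms(profile));
        Sequent goal = parser.toSequent(requirement);
        // 2. Prove the sequent; the prover consults the reasoner for subtypings.
        List<Proof> proofs = prover.prove(goal, axioms, reasoner);
        if (proofs.isEmpty()) return null;      // no composition satisfies the request
        // 3. Extract a flow model from one of the proofs.
        return builder.toProcessModel(proofs.get(0));
    }
}
```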
3 Transforming from DAML-S to Linear Logic Axioms
In our system, web services are specified by DAML-S profiles externally and presented by LL axioms internally. The transformation is made automatically by the DAML-S Parser. A detailed presentation of DAML-S can be found in [4]. Here, we focus on the presentation of the LL axioms. Generally, a requirement for a composite web service, including functionalities and non-functional attributes, can be expressed by the following formula in LL:
$$\Gamma;\; \Delta_C \;\vdash\; I \multimap (O \otimes \Delta_R)$$
where Γ is a set of logical axioms representing the available atomic web services, Δ_C is a conjunction of non-functional constraints, and Δ_R is a conjunction of non-functional results; we will distinguish these two concepts later. I ⊸ O is a functionality description of the required composite service. Both I and O are conjunctions of literals: I represents the set of input parameters of the service and O represents the output parameters produced by the service. Intuitively, the formula can be read as follows: given a set of available atomic services and the non-functional attributes, find a combination of services that computes O from I. Every element of Γ has the form ⊢ I′ ⊸ O′, where the meanings of I′ and O′ are the same as described above. Next, we describe the detailed procedure for transforming a DAML-S document into a linear logic expression. We present in sequence the transformation of functionalities and of non-functional attributes; afterwards, we illustrate the whole process with an example.
3.1 Transforming Functionalities
The functionality attributes are used to connect atomic services by means of inputs and outputs. Composition is possible only if the output of one service can be transferred to another service as its input. A web service is presented externally by its DAML-S profile. The functionality attributes of the "ServiceProfile" specify the computational aspect of the service, denoted by the input, output, precondition and postcondition. Below is an example of the functionalities for a temperature report service:
Toward the Composition of Semantic Web Services
763
From the computational point of view, this service requires an input of type "&zo;#ZipCode" and produces an output of type "&zo;#CelsTemp", the value of the temperature measured in Celsius. Here, we use entity types as a shorthand for URIs. For example, &zo;#ZipCode refers to the URI of the definition of the zip code parameter: http://www.daml.org/2001/10/html/zipcode-ont#ZipCode. When translating to a Linear Logic formula, we translate the field "restrictedTo" (the variable type) instead of the parameter name, because we regard the parameters' types as their specification. The above DAML-S document is expressed by the propositional linear logic formula
$$ZipCode \multimap CelsTemp$$
3.2 Non-functional Attributes
Non-functional attributes are useful in evaluating and selecting services when many services have the same functionality. In the service presentation, the non-functional attributes are specified as facts and constraints. We classify the attributes into four categories:
Consumable Quantitative Attributes: These attributes limit the amount of resources that can be consumed by the composite service. The total amount of a resource is the sum over all atomic services that constitute the composite service.
Non-consumable Quantitative Attributes: These attributes limit the quantitative attributes of each single atomic service. They can present either an amount or a scale.
Qualitative Constraints: Attributes that cannot be expressed by quantities are called qualitative attributes. Qualitative constraints are those qualitative attributes that specify the requirements for executing a web service.
Qualitative Facts: Another kind of qualitative attribute, such as service type, service provider or geographical location, specifies facts regarding the service's environment. These attributes can be regarded as goals in LL.
The different categories of non-functional attributes are presented differently in the logical axioms. The non-functional attributes can be described as either constraints or results, which can be presented as follows:
The constraints to the service:
The results produced by the service:
3.3 Example
Here, we illustrate the LL presentation of the temperature report service example, where both functional and non-functional attributes are taken into consideration. The complete DAML-S description of this example can be found at http://bromstad.idi.ntnu.no/services/TempService.daml. For the sake of readability, we omit the namespaces in the parameter names. The available atomic web services in the example are specified as follows:
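Read together with the explanation that follows, the three axioms can be rendered roughly as follows (a reconstruction in the MILL notation of [7]; the exact placement of the non-functional terms is an assumption):

$$\Gamma = \{\; 10{\cdot}NOK \otimes CityName \multimap ZipCode,\;\; CA\_MICROSOFT \otimes ZipCode \multimap CelsTemp \otimes LOC\_NORWAY,\;\; 5{\cdot}NOK \otimes QL_2 \otimes CelsTemp \multimap FahrTemp \otimes QL_2 \;\}$$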
The formula presents three atomic services. name2code outputs the zip code of a given city. temp reports the Celsius temperature of a city, given the zip code of the city. trans transforms the Celsius temperature into the Fahrenheit temperature. The cost term on the left-hand side of the name2code axiom denotes that 10 Norwegian Krones (NOK) are consumed by executing the service. The service trans costs 5 NOK and has quality level 2. The quality level is not a consumable value, so it appears on both the left- and right-hand sides. The specification also says that the temperature reporting service temp is located in Norway and only responds to execution requests that have been certificated by Microsoft. Attributes that are not specified in a service specification are not considered. The required composite service takes a city name as input and outputs the Fahrenheit temperature in that city. It is specified in LL as follows:
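A plausible rendering in the same notation is
$$CityName \multimap FahrTemp$$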
The non-functional attributes for the composite service are:
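Plausibly (with the placement of the ! operator following the explanation below):
$$\Delta_C = 20{\cdot}NOK \;\otimes\; !QL_3 \;\otimes\; !CA\_MICROSOFT \;\otimes\; !LOC\_NORWAY$$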
This means that we would like to spend no more than 20 NOK for the composite service. The quality level of all the selected services should be no higher than 3. The composite service consumer has certification from Microsoft (!CA_MICROSOFT) and it requires that all location-aware services are located
within Norway (!LOC_NORWAY). The ! symbol indicates that we allow an unbounded number of atomic services in the composite service. For the qualitative constraints (location), a service uses LOC_NORWAY to declare its value, and we can determine from the set of requirements whether a service meets the requirement. So far, we have discussed how DAML-S specifications are translated into LL extralogical axioms. The next step is to derive the process model from the specification of the required composite service. If the specification can be proven to be correct, the process model is extracted from the proof. We have treated the proof procedure in a separate publication [7] and therefore do not go into detail here. The resulting dataflow of the selected atomic services is presented through a graphical user interface. A screenshot is presented in figure 2: the interface of the user-required service is presented in the ServiceProfile panel (upper right) and the dataflow of the component atomic services is presented in the ServiceModel panel (lower right).
Fig. 2. The Screen Shot
4 Composition Using Semantic Description
So far, we have considered only exact matches of the parameters in composition. In reality, however, two services can be connected together even if the output parameters of one service do not exactly match the input parameters of the other. In general, if the type assigned to an output parameter of service A is a subtype of the type assigned to an input parameter of service B, it is safe to transfer data from the output to the input. If we consider resources and goals in LL, subtyping is used in two cases: 1) given a goal of type T, it is safe to replace it by another goal of type S, as long as it holds that T is a subtype of S; 2) conversely, given a resource of type S, it is safe to replace it by another resource of type T, as long as it holds that T is a subtype of S. In the following we extend the subsumption rule to both resources and goals. Here we should mention that these rules are not extensions to LL; the subtyping can be explained by inference figures of LL. We write the rules as inference figures in the following to
emphasize that these inference rules are for typing purposes, not for sequencing methods, when constructing programs. First of all, the subtype relation is transitive.
In addition, subsumption rules state the substitution between types.
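The corresponding inference figures, reconstructed from this textual description (the sequent-style presentation is an assumption), read:

$$\frac{S <: T \qquad T <: U}{S <: U}\;(trans) \qquad\quad \frac{\Gamma, S \vdash \Delta \qquad T <: S}{\Gamma, T \vdash \Delta}\;(resource) \qquad\quad \frac{\Gamma \vdash \Delta, T \qquad T <: S}{\Gamma \vdash \Delta, S}\;(goal)$$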
Such subtyping rules can be applied to either functionality (parameters) or non-functional attributes. Here we use two examples to illustrate the basic idea. First, let us assume that the output of the temperature reporting service is the air temperature measured on the Celsius scale, while the input of the temperature translation service is any Celsius temperature. Because the latter is more general than the former, it is safe to transfer the more specific output to the more general input. The second example concerns qualitative facts. If an atomic service is located in Norway, we regard Norway as a goal in LL. Because Norway is a country in Europe, it is safe to replace Norway with Europe. Intuitively, if the user requires a service located within Europe, a service located within Norway meets that requirement. In this paper, we assume that the ontology used by the service requester and that used by the service provider are interoperable; ontology integration is a separate issue beyond the scope of this paper.
5 Conclusion
This paper approaches the important issue of automatic semantic web service composition. It argues that Linear Logic, combined with semantic reasoning for the relaxation of service matching (choosing), offers a potentially more efficient and flexible approach to the successful composition of web services. To that end, an architecture for automatic semantic web service composition is introduced. The functional settings of the system are discussed, and techniques for DAML-S presentation, Linear Logic presentation, and semantic relaxation are presented. A prototype implementation of the approach is proposed to fulfill the task of representing, composing and handling services. This paper concentrates on the automatic translation part and the semantic relaxation part, while the theorem proving part has been treated elsewhere [7]. Some work has been done on planning based on semantic descriptions of web services. In [5], the authors adapt and extend the Golog language for the automatic construction of web services; they address the web service composition problem through the provision of high-level generic procedures and customizing constraints. SWORD [6] is a developer toolkit for building composite web services. SWORD uses an ER model to specify the inputs and outputs of web services; as a result, the reasoning is based on the entity
and attribute information provided by the ER model. [8] presents a semi-automatic method for web service composition, in which the choice of possible services is based on functionality and filtered on non-functional attributes. The main difference between our method and the above methods is that we consider the non-functional attributes during planning. Using Linear Logic as the planning language allows us to formally define the non-functional characteristics of web services, in particular quantitative attributes. In addition, we distinguish constraints and facts among the qualitative attributes; the planner treats them differently in the logical formulas. Also, as more and more organizations and companies embrace the idea of using the web service interface as a cornerstone of future Grid computing architectures, the authors hope that revealing and discussing these semantics-related issues will inform researchers in Grid computing of the intricate problem of service composition, which may well arise in Grid service research too. Our current work is directed at adding the disjunction connective to the logical specification of service outputs. This is useful when we must consider exceptions or optional outputs of atomic services. By using disjunction, the planner is also able to generate control constructs such as choice and loop. Although the introduction of disjunction is easy in the logical presentation, the proving speed is slowed down significantly. Mechanisms to improve the computational efficiency of proving are also under consideration.
References
1. I. Foster, C. Kesselman, J. Nick, and S. Tuecke. The physiology of the grid. Online: http://www.gridforum.org/ogsi-wg/drafts/ogsa_draft2.9_2002-06-22.pdf, January 2002.
2. J.-Y. Girard. Linear logic. Theoretical Computer Science, 50:1-102, 1987.
3. Patrick Lincoln. Deciding provability of linear logic formulas. In London Mathematical Society Lecture Note Series, volume 222. Cambridge University Press, 1995.
4. David Martin et al. DAML-S (and OWL-S) 0.9 draft release. Online: http://www.daml.org/services/daml-s/0.9/, May 2003.
5. Sheila McIlraith and Tran Cao Son. Adapting Golog for composition of semantic web services. In Proceedings of the Eighth International Conference on Knowledge Representation and Reasoning (KR2002), Toulouse, France, April 2002.
6. Shankar R. Ponnekanti and Armando Fox. SWORD: A developer toolkit for web service composition. In The Eleventh World Wide Web Conference, Honolulu, HI, USA, 2002.
7. Jinghai Rao, Peep Kungas, and Mihhail Matskin. Application of linear logic to web service composition. In The First International Conference on Web Services, Las Vegas, USA, June 2003. CSREA Press.
8. Evren Sirin, James Hendler, and Bijan Parsia. Semi-automatic composition of web services using semantic descriptions. In "Web Services: Modeling, Architecture and Infrastructure" workshop in conjunction with ICEIS 2003, 2003.
A Viewpoint of Semantic Description Framework for Service*
Yuzhong Qu
Dept. of Computer Science and Engineering, Southeast University, Nanjing 210096, P. R. China
[email protected]
Abstract. The evolution of Semantic Web Service technology is synergistic with the development of the Semantic Grid, reinforced by the adoption of a service-oriented approach in the Grid through OGSI. However, Semantic Web Service technology is far from mature enough to pursue the vision of semantic services. This paper illustrates the semantic description framework of DAML-S using the RDF graph model, offers some thoughts on improving DAML-S and designing "semantic" service description languages, presents a novel semantic description framework for services, and then illustrates the usage of our framework with two examples describing a web service and a grid service.
1 Introduction
Building on both Grid and Web services technologies, the Open Grid Services Infrastructure (OGSI) [1,2] defines mechanisms for creating, managing, and exchanging information among entities called Grid services. The main motivation of OGSI is the need for open standards that define the interaction of, and encourage interoperability between, components supplied from different sources. Web/Grid services are represented and described using WSDL, which uses XML to describe services as a set of endpoints operating on messages. Based on this description language, it is usually impossible for software agents to figure out the precise meaning of the service identifiers and the functionality provided by the service. The lack of semantics in the description of a service's capabilities makes it difficult for machines to discover and use the service at the right time. To bring semantics to web services [3], Semantic Web technologies such as RDF Schema, DAML+OIL and OWL have been used to provide more explicit and expressive descriptions of web services. DAML-S [4,5] is a key component of the Semantic Web Services vision. DAML-S can be used to characterize the service portfolio offered by a web service in a more expressive manner than the existing WSDL, thereby opening up the possibility of automatic service discovery and use. Recently, some applications of Semantic Web technologies in Grid applications [6] have become well aligned with the research and development activities in the Semantic Web and Grid communities, most notably in areas where there is established use of ontology.
* This paper is jointly supported by NSFC with project no. 60173036 and JSNSF with project no. BK2003001.
However, the application of Semantic Web technologies inside the grid infrastructure is less well developed. The emerging work on Semantic Web Services is synergistic with the development of the Semantic Grid, reinforced by the adoption of a service-oriented approach in the Grid through OGSI, as pointed out by David De Roure [7]. We see the importance of semantic descriptions for Grid services in the vision of the Semantic Grid [8], and we observe a trend towards convergence between Web services and Grid services. We also note that Semantic Web Service technology (e.g. DAML-S) is far from mature enough to pursue the vision of semantic services. Against this background, we focus on the semantic description framework for web services and grid services in this paper. The paper illustrates the semantic description framework of DAML-S using the RDF graph model, discusses some considerations in improving DAML-S and designing other "semantic" service description languages (SSDLs, for short), presents a novel semantic description framework for services, and then illustrates the usage of our framework with two examples.
2 Semantic Description Framework of DAML-S
DAML-S provides a set of basic classes and properties for declaring and describing services, as well as an ontology structuring mechanism inherited from DAML+OIL. The set of basic classes and properties is defined in DAML-S Service (an upper ontology) and three subparts: DAML-S Profile, DAML-S Process and DAML-S Grounding. An overview of the DAML-S Ontology and the DAML-S Process Model will be given in the following subsections using the RDF graph model. Some conventions of the RDF graph model are as follows:
The above notation means that the object labeled "p" is a property whose domain and range are specified to be classes A and B, respectively. In addition, the label "m:n" is the multiplicity specification of class A with respect to property "p", if the multiplicity is present.
The above notation means that the object labeled "C" and the object labeled "D" stand in the binary relationship denoted by the property labeled "p"; in other words, object C has object D as a value of property p. Usually, a class or a property is assumed to be in a namespace. We use prefixes such as "xsd", "rdf", "rdfs" and "owl" to denote the commonly used namespaces. In addition, prefixes such as "service", "process" and "profile" are used to denote the namespaces defined in DAML-S. Note that the namespace prefix of a name may be omitted when there is no ambiguity.
2.1 An Overview of the DAML-S Ontology
An RDF graph model of the DAML-S ontology is roughly depicted in Fig. 1. Each instance of Service "presents" zero or more instances of (a descendant class of) ServiceProfile, and may be "describedBy" at most one instance of (a descendant class of) ServiceModel. ProcessModel is a subclass of ServiceModel. Each instance of ProcessModel can have at most one process through the "hasprocess" property. DAML-S adopts the processes-as-classes approach, i.e. any application-specific process should be defined as a subclass (or a descendant class) of Process. ProcessPowerSet is defined to be the class of all subclasses of Process. In addition, each instance of ProcessModel can have at most one instance of ProcessControlModel, and there is some debate about the necessity of the process control model. Further, processes have IOPE (inputs, outputs, preconditions, effects) properties to describe their functionalities; more discussion is given in section 2.2. Profile is a subclass of ServiceProfile, and each instance of Profile can point to at most one process through the functional property "has_process". In addition, each instance of Profile has its IOPEs to present the functional description. But these IOPEs in the profile just refer to the corresponding IOPEs in the process, e.g. an input in a profile can only refer to a sub-property of input in the process model.
Fig. 1. RDF Graph Model of the DAML-S Ontology
In the Congo example of DAML-S 0.9 [4], ExpressCongoBuyService is defined as an instance of Service. This service presents Profile_Congo_BookBuying_Service (an instance of Profile), and is described by ExpressCongoBuyProcessModel (an instance of ProcessModel). ExpressCongoBuyProcessModel has a user-defined process ExpressCongoBuy,
which is defined as a subclass of AtomicProcess, and can thus be seen as a subclass of Process.
2.2 Framework of the DAML-S Process Ontology
Within the DAML-S Process Model, there are two chief components of a process model: the Process Ontology and the Process Control Ontology. The latter is outside the scope of this paper; the former describes a service in terms of its IOPEs and, where appropriate, its component sub-processes.
Fig. 2. RDF Graph Model of the DAML-S Process Ontology
An RDF graph model of the DAML-S process ontology is roughly depicted in Fig. 2. Processes can have IOPEs and participants. Among them, input, output and participant are sub-properties of parameter in the process model. The ranges of precondition, effect and output are specified to be Condition, ConditionalEffect and ConditionalOutput, respectively. The representation of preconditions and effects, as well as of coCondition, ceCondition and ceEffect, depends on the representation of rules in the DAML language, but no proposal for specifying rules in DAML has been put forward; for this reason, they are currently mapped to anything possible. In addition to its action-related properties, a process has a number of bookkeeping properties such as name (rdfs:Literal) and address (URI). There are three different types of processes: atomic, simple, and composite. A composite process can be described as atomic processes chained together in a process model. A composite process must have exactly one composedOf property, which indicates the control structure of the composite process. The use of control constructs, such as if-then-else, allows the ordering and conditional execution of the sub-processes (or control constructs) in the composition. Again, take the atomic process ExpressCongoBuy [4] as an example. The ExpressCongoBuy process has two properties (congoBuyBookISBN and congoBuySignInInfo) as its inputs, two properties (congoOrderShippedOutput and
congoOutOfStockOutput) as its outputs, two properties as its preconditions (congoBuyAcctExistsPrecondition and congoBuyCreditExistsPrecondition), and one property as its effect (congoOrderShippedEffect). These concrete IOPEs of the ExpressCongoBuy process are specified as sub-properties of input, output, precondition and effect, with some constraints on their domains and ranges (using anonymous subclasses via "Restriction").
3 Some Design Considerations
DAML-S can be seen as a "semantic" service description language (SSDL, for short). We use the term SSDL to mean an ontology for describing services plus an ontology structuring mechanism inherited from the ontology-defining language (the base language). In the case of DAML-S, the base language is DAML+OIL, and the provided ontology is just what we discussed in the previous section. The base language of DAML-S will shift to OWL in the next release. DAML-S is definitely a great propellant towards Semantic Web Services. However, we have also learned some lessons from the evolution of DAML-S. These lessons are helpful for the design of SSDLs as well as for the improvement of DAML-S. The following are our corresponding considerations.
(1) The Use of Meta-modeling Facilities of RDF Schema
In version 0.9 of DAML-S, there are some misuses of the meta-modeling facilities of RDF Schema, e.g. the rdfs:range of the "has_process" property. In fact, ontology developers should take care when they define classes of classes or attach properties to classes. As we know, OWL provides three increasingly expressive sub-languages designed for use by specific communities of implementers and users. OWL Full can be viewed as an extension of RDF, while OWL Lite and OWL DL can be viewed as extensions of a restricted view of RDF. When using OWL Full, as compared to OWL DL, reasoning support is less predictable since complete OWL Full implementations do not currently exist. Our position is as follows: the designer of a "semantic" service description language (SSDL), such as DAML-S, may use some meta-modeling facilities of RDF Schema, and should be careful in doing so, but users of the SSDL should not use any meta-modeling facilities in their applications. In the case of adopting OWL, the designer of an SSDL may use some constructs from OWL Full with care, but users of the SSDL should use only the ontology and constructs from the SSDL and OWL DL to describe their web services and/or grid services.
(2) The Functional Description of Services
In the DAML-S Process Model, as illustrated in section 2.1, any application-specific process should be defined as a subclass (or a descendant class) of Process. In other words, the class Process is the super-class of every user-defined class representing an application-specific process. This processes-as-classes representation harms the usability of DAML-S. Fortunately, DAML-S 1.0 will adopt the processes-as-instances approach.
Secondly, an instance of Service can be described by at most one instance of ProcessModel, and each instance of ProcessModel can have at most one subclass of Process through the "hasprocess" property. This means that each service can be associated with at most one process, which lacks the capability to describe service types having multiple functions. Of course, there is a trade-off between fine and coarse granularity; we think a good description framework should be flexible enough to cover coarse granularity. Thirdly, the DAML-S Profile also provides the functional description of a service through its IOPEs. But these IOPEs in the profile just refer to the corresponding IOPEs in the process, and in principle have little added value. As we noted, in myGrid [6], additional properties such as performs_task and uses_resource are used with profiles to specify which task a service performs and which resources it uses. We think that the task being performed is an important aspect of a functional description; e.g. retrieving is an example of a generic task. The structure of the DAML-S Profile could be improved accordingly. Finally, the representation of preconditions and effects, as well as of the condition of a conditional output, depends on the representation of rules in the DAML language. A decision was made to align the DAML Rules more closely with the Rule Markup Language (RuleML [9], for short). Currently, RuleML supports user-level roles, URI grounding, and order-sortedness [10]. Rules will definitely play an important role in representing functional behavior; however, the variable and scope issue should be resolved when integrating rules into DAML-S or other SSDLs, as will be discussed in more detail in section 4.
(3) The Representation of Stateful Services
There is some debate on "Service and State: Separation or Integration?" As we know, pure stateless services are rare and usually uninteresting. The question is through which mechanism to expose the state: through the service itself, a specific operation, a session, or contextualization? Within OGSI, the state of a grid service is exposed directly through the service itself. We notice the trend towards convergence between Web services and Grid services. A good description framework should be flexible enough to cover the requirements of grid services directly, including stateful grid services.
4 A Semantic Description Framework for Service
Based on the previous considerations, we propose a semantic description framework for services in this section. Our framework has the following characteristics:
(1) Services are organized by service types using a "service implements service types" mechanism;
(2) Operations (or processes) are associated with service types;
(3) Each operation is represented as an instance;
(4) Multiple inputs/outputs of an operation are aggregated into a message with part names, so that each operation has exactly one input message and one output message;
(5) Service state can be directly exposed at the service type level;
(6) Rules are used to prescribe the behaviors of services at both the service type and operation levels, and three pseudo-variables, i.e. "hostService", "input" and "output", are introduced for use within rules.
An RDF graph model of our description framework for services is roughly depicted in Fig. 3. Note that the classes and properties of our description framework are assumed to be in a fictitious namespace "mySSDL" to avoid name collisions, and the prefix of a name may be omitted when there is no ambiguity.
Fig. 3. A Semantic Description Framework for Service
A service has service profiles and implements zero or more service types. A service type is an instance of the class ServiceType. ServiceType is defined to be a subclass of owl:Class; the intended meaning of ServiceType is the class of all service types, just as owl:Class is the class of all OWL classes. Each service type may have service data templates, service rules and operations. A service data template describes the name and type of an exposed service data element. A service rule prescribes the behaviors of a service type, such as consistency constraints and the dependency and concurrency between operations. An operation is an instance of the class Operation, just as a property is an instance of the class rdf:Property. An operation has codomain and corange properties, just as a property can have rdfs:domain and rdfs:range properties; however, an operation must have exactly one codomain and one corange, and the codomain and corange of an operation must be instances of MessageType. In addition, an operation can have rules describing its functional behaviors. The class MessageType is defined to be a subclass of owl:Class; the intended meaning of MessageType is the class of all message types. Any message type (an instance of MessageType) can have zero or more part templates. A data template must have exactly one data name and one data type. The data type of a data template can be one of the built-in OWL datatypes, including many of the built-in XML Schema datatypes, or any other OWL class, including application-specific data types
and even user-defined service types. We define a specific message type (named myssdl:Void) to be an instance of MessageType, with the constraint that no data template is associated with Void. Rules are used to specify the behaviors of a service as well as the functional behaviors of an operation. As to the variable and scope issue, we propose: the pseudo variable “hostService”, the service data names and the operation names of a service type can be used within the service rules of that service type; the service data names of the containing service type, the three pseudo variables (“hostService”, “input” and “output”), as well as the data names of the codomain and corange message types, can be used within the operation rules of an operation. The pseudo variable “hostService” denotes the service that implements the corresponding service type. Within the operation rules of an operation, “hostService” denotes the service requested to execute the operation, while the pseudo variables “input” and “output” denote the input message and the output message of the corresponding operation, respectively. It should be noted that further research work is needed to integrate RuleML and logic reasoning into our description framework for service, although some theoretical results [10, 11] have been achieved.
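To make these definitions concrete, the following is a minimal Python sketch of the core notions of the framework: service types with service data templates, operations with exactly one input and one output message type, the Void message type, and rule slots over the pseudo variables. We read codomain as the input message type and corange as the output, by analogy with rdfs:domain and rdfs:range; the class layout and field names (beyond the terms taken from the text) are our own illustrative assumptions, not part of mySSDL:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class MessageType:
    # Data templates: part name -> data type (an OWL/XSD datatype
    # or another class, possibly a user-defined service type).
    parts: Dict[str, str] = field(default_factory=dict)

VOID = MessageType()  # myssdl:Void: no data template is associated with it

@dataclass
class Operation:
    name: str
    codomain: MessageType  # exactly one input message type
    corange: MessageType   # exactly one output message type
    # Operation rules may refer to hostService, input and output.
    rules: List[Callable] = field(default_factory=list)

@dataclass
class ServiceType:
    name: str
    service_data: Dict[str, str] = field(default_factory=dict)  # templates
    operations: Dict[str, Operation] = field(default_factory=dict)
    # Service rules may refer to hostService, service data and operations.
    rules: List[Callable] = field(default_factory=list)

@dataclass
class Service:
    name: str
    implements: List[ServiceType] = field(default_factory=list)
    data: Dict[str, object] = field(default_factory=dict)  # exposed state
```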
4.1 Examples
This subsection illustrates the usage of our framework with two examples. The first one describes a Web service, while the second one describes a grid service.
(1) The Congo Example
We define a service type BuyBook as an instance of ServiceType, and a service congoBuyBook as an instance of Service. The service congoBuyBook implements the service type BuyBook, which has two or more operations: expressBuyBook, fullBuyBook and other possible exposed operations (e.g. createAcct, createProfile, locateBook). These operations are defined to be instances of Operation, with corresponding input/output message types and operation rules. For an operation that uses other exposed operations, we can use a service rule to specify the control flow of the composition. Take the expressBuyBook operation as an example. It has an input message type with two parts: a String value named buyBookISBN and an instance of SignInData named buyBookSignInInfo. The operation has an output message type with one part: a String value named replyMessage. The operation has an operation rule, which prescribes that if the book is in stock, the reply message indicates that the order is shipped, and that if the book is out of stock, the reply message indicates that the book is out of stock. In the case that the quantity of available books is exposed as a service data element, the rule could say more about the precondition and effect of the operation.
(2) The GridService portType
As we know from OGSI [2], the GridService portType MUST be implemented by all Grid services and thus serves as the base interface definition in OGSI. This portType
is analogous to the base Object class within object-oriented programming languages such as Smalltalk or Java, in that it encapsulates the root behavior of the component model. The behavior encapsulated by the GridService portType is that of querying and updating the serviceData set of the Grid service instance and managing the termination of the instance. Now let us define a service type, also named “GridService”, to reflect the above idea. First, we define GridService as an instance of ServiceType. GridService is defined to have several service data templates, including ones with the following data names: interface, serviceDataName, gridServiceHandle, gridServiceReference, findServiceDataExtensibility, setServiceDataExtensibility, factoryLocator, and terminationTime. The first six have list types as their data types; the last two have ogsi:LocatorType and ogsi:TerminationTimeType as their data types, respectively. The mutability, modifiability and nillability of these service data elements can be described by service rules, and rules can also be used to describe the multiplicity constraints on the first six service data elements (e.g. a grid service has at least one interface and two setServiceDataExtensibility elements) as well as other consistency constraints (e.g. the initial setting of a service data value). Second, we define the following operations for GridService: findServiceData, setServiceData, requestTerminationAfter, requestTerminationBefore and destroy, with their corresponding input/output message types. By doing so, every service that implements the service type “GridService” can be seen as a grid service; this is exactly the set of service data and behaviors that OGSI requires every grid service to have. As to domain-specific grid applications, an application domain community can design a couple of service types for reuse in its grid applications.
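Reusing the hypothetical classes from the sketch in Section 4, the GridService service type of this example might be written down as follows; the message types and the rule body are placeholders, since the paper leaves their concrete form open:

```python
grid_service = ServiceType(
    name="GridService",
    service_data={
        "interface": "list", "serviceDataName": "list",
        "gridServiceHandle": "list", "gridServiceReference": "list",
        "findServiceDataExtensibility": "list",
        "setServiceDataExtensibility": "list",
        "factoryLocator": "ogsi:LocatorType",
        "terminationTime": "ogsi:TerminationTimeType",
    },
)

# The operations required by OGSI, with placeholder message types.
for op_name in ("findServiceData", "setServiceData",
                "requestTerminationAfter", "requestTerminationBefore",
                "destroy"):
    grid_service.operations[op_name] = Operation(
        name=op_name, codomain=MessageType(), corange=MessageType())

# A multiplicity constraint as a service rule: at least one interface.
grid_service.rules.append(
    lambda hostService: len(hostService.data["interface"]) >= 1)
```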
5 Conclusion
In this paper, the semantic description framework of DAML-S was illustrated using an RDF graph model. We discussed several issues to be considered when improving DAML-S and/or designing another SSDL, and presented a novel description framework for service. The six characteristics of our description framework were outlined and illustrated by two examples. There are some research works related to this paper. For example, the Web Service Modeling Framework (WSMF) is proposed in [12] to enable fully flexible and scalable e-commerce based on Web services, and IRS-II (Internet Reasoning Service) [13] is a framework and implemented infrastructure whose main goal is to support the publication, location, composition and execution of heterogeneous Web services, augmented with semantic descriptions of their functionalities. In contrast to these related works, system infrastructure and service composition are not the focus of this paper; our main concerns are the semantic description framework for service and the design issues of SSDLs. We hope that the six characteristics of our description framework given in Section 4 and our thinking on designing SSDLs given in Section 3 can push forward the improvement of DAML-S as well as the design of other SSDLs. To make the vision of Semantic Grid Services and Semantic Web Services a reality, a number of research challenges need to be addressed. Our further research work
includes the integration of rules into SSDLs as well as the trust and provenance of semantic services. We believe that, with the evolution of SSDLs and the emergence of supporting system infrastructures and frameworks, the Semantic Grid/Web Service will become a reality in the future.
References
1. I. Foster, C. Kesselman, J. Nick, S. Tuecke: Grid Services for Distributed System Integration. IEEE Computer, 35(6): 37-46, 2002.
2. S. Tuecke, K. Czajkowski, I. Foster, J. Frey, S. Graham, C. Kesselman et al.: Open Grid Services Infrastructure (OGSI), 2003. Available online at http://www-unix.globus.org/toolkit/draft-ggf-ogsi-gridservice-33 2003-06-27.pdf
3. Sheila A. McIlraith, David L. Martin: Bringing Semantics to Web Services. IEEE Intelligent Systems, 18(1): 90-93, 2003.
4. DAML-S 0.9 Draft Release (2003). DAML Services Coalition. Available online at http://www.daml.org/services/daml-s/0.9/
5. DAML-S Coalition: DAML-S: Web Service Description for the Semantic Web. In Proc. First International Semantic Web Conference (ISWC 2002), Sardinia, Italy, June 2002, pp. 348-363.
6. C. Wroe, R. Stevens, C. Goble, A. Roberts, M. Greenwood: A Suite of DAML+OIL Ontologies to Describe Bioinformatics Web Services and Data. International Journal of Cooperative Information Systems, Vol. 12, No. 2 (2003) 197-224.
7. David De Roure: Semantic Grid and Pervasive Computing. Available online at http://www.semanticgrid.org/GGF/ggf9/gpc/
8. De Roure, D., Jennings, N., Shadbolt, N.: Research Agenda for the Future Semantic Grid: A Future e-Science Infrastructure, 2001. Available online at http://www.semanticgrid.org/v1.9/semgrid.pdf
9. Harold Boley, Said Tabet, Gerd Wagner: Design Rationale of RuleML: A Markup Language for Semantic Web Rules. In Proc. Semantic Web Working Symposium (SWWS'01), pp. 381-401, Stanford University, July/August 2001.
10. Harold Boley: Object-Oriented RuleML: User-Level Roles, URI-Grounded Clauses, and Order-Sorted Terms. In Workshop on Rules and Rule Markup Languages for the Semantic Web (RuleML-2003), Sanibel Island, Florida, USA, 20 October 2003.
11. Benjamin N. Grosof, Ian Horrocks, Raphael Volz, Stefan Decker: Description Logic Programs: Combining Logic Programs with Description Logic. In Proc. 12th Intl. Conf. on the World Wide Web (WWW 2003), Budapest, Hungary, May 2003.
12. Fensel, D., Bussler, C.: The Web Service Modeling Framework WSMF. In 1st Meeting of the Semantic Web Enabled Web Services Workgroup, 2002. Available at http://informatik.uibk.ac.at/users/c70385/wese/wsmf.bis2002.pdf
13. Enrico Motta, John Domingue, Liliana Cabral, Mauro Gaspari: IRS-II: A Framework and Infrastructure for Semantic Web Services. In Proc. 2nd International Semantic Web Conference (ISWC 2003), 20-23 October 2003, Sanibel Island, Florida, USA.
A Novel Approach to Semantics-Based Exception Handling for Service Grid Applications*
Donglai Li, Yanbo Han, Haitao Hu, Jun Fang, and Xue Wang
Software Division, Institute of Computing Technology, Chinese Academy of Sciences, 100080, Beijing, China
Graduate School of the Chinese Academy of Sciences, 100039, Beijing, China
{ldl, yhan, huhaitao, fangjun, wxue}@software.ict.ac.cn
Abstract. Whenever the characteristics of a service grid environment are addressed, issues related to openness and dynamism pop up first. Such issues do affect the definition and handling of application exceptions, and traditional approaches to exception handling lack proper mechanisms for capturing exception semantics and handling exceptions. In this paper, after analyzing the newly arisen problems of exception handling in a service grid environment, we focus on exceptions caused by runtime mismatches between users’ requests and underlying services, and propose a semantics-based approach to handling this kind of exception. The approach was first developed within the FLAME2008 project, and some promising results have been achieved.
1 Introduction
In a service grid environment, services evolve autonomously, their coupling is highly loose, and system boundaries are no longer clearly in control. Exceptions [8][10] may happen more frequently when building applications in such an open and dynamic environment, especially in connection with frequent changes of user requirements. Exception handling has always been an important topic, and some previous efforts have led to remarkable achievements [3][9][11]. But in a service grid environment, new challenges appear. Applications may use any network-based service, and most users don’t have the ability to describe all the potential runtime mismatches that lead to exceptions; we thus intend to detect such mismatches automatically. The handling process should be determined dynamically, and mostly by the system instead of by users. Often, because of a lack of background knowledge about the potential services, one cannot describe the details of most mismatch exceptions. A mechanism is needed to judge what a mismatch exception is and whether it has happened, and a mechanism is needed to determine the way of handling a mismatch exception. Moreover, the coupling among the aspects of the exception handling process is “tight” in current exception handling technology. To solve these problems, semantics is naturally introduced. In the information systems context, semantics can be viewed as a mapping between an object modeled, represented and/or stored in an information
system and the real-world object(s) it represents [1]. Finding out the relationship between the semantics of a user’s request and the semantics of the services may help to solve the mismatch exception handling problem. Based on semantics, we propose an approach named ASEED (a novel approach to semantics-based exception handling for service grid applications). We developed this approach within FLAME2008 [12], which adopts the service grid paradigm and develops a service integration platform targeted at a real-world application scenario: an information service provision center for the Olympic Games Beijing 2008. The paper is organized as follows: Section 2 discusses the prerequisites of the approach; Section 3 illustrates the approach; Section 4 demonstrates and evaluates the approach; Section 5 compares it with related works; the last section lists some future directions.
* This paper is supported by the National Natural Science Foundation of China under Grant No. 60173018 and the Young Scientist Fund of ICT, CAS under Grant No. 20026180-22.
2 Prerequisites of ASEED
Although services are encouraged to be encapsulated in a unified form (for example, Web Services follow open standards and a service is described by WSDL), there is no global system for publishing services in such a way that they can easily be processed by anyone else. The problem is that, in some contexts, it is difficult to use the services in the ways that their designers might want [4]. The unclear meaning of users’ requests and of services makes the situation more complicated; thus the meaning of users’ requests and services, the so-called semantics, needs to be exposed. In order to solve mismatch exceptions, some prerequisites are needed.
2.1 Semantics Infrastructure
We take ontology-based semantics as the infrastructure of our approach. An ontology is a formal, explicit specification of a shared conceptualization, and it serves as a mediator to share common concepts among different parties. We depict the relationships between concepts. For example, a simple ontology includes the concepts mammal, human, woman, etc.: woman is a subclass of human, human is a subclass of mammal, and woman inherits all properties of human just as human inherits all properties of mammal. Thus we have different granularities of semantics. To illustrate these layered, structured ontologies, we use a graph in which nodes stand for specified semantics and edges show the relations among them. Many research projects have produced large ontologies, such as WordNet, which provides a thesaurus of over 100,000 terms explained in natural language. In a grid environment, resource sharing is done with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions defined by such sharing rules forms what we call a virtual organization (VO) [7]. Besides their domain knowledge, such rules are also guidance and reference for experts when they construct ontologies.
2.2 Semantics of Services and Users’ Requests
Usually a service has two kinds of properties: functional ones and non-functional ones. Functional properties illustrate what the service can do, while non-functional properties show other information such as QoS. They have their respective usages:
- Semantics of functional properties: manifests the service’s ability. Though languages like WSDL can describe it at the syntactic level, they don’t make it machine-understandable. Services published with accessible semantics of their functional properties can be searched and invoked automatically.
- Semantics of non-functional properties: services having the same functional properties may still differ in many other aspects, such as QoS. The semantics of non-functional properties describes the constraints of a service so that the service can be understood more precisely.
In a user’s request, each unit, a so-called activity, illustrates part of the user’s desire. Similar to a service, an activity usually has two aspects of semantics: the semantics of functional desire and the semantics of non-functional desire. With the semantics infrastructure, it is possible for users to describe their requests, and for service suppliers to publish services, at the semantic level. Mismatches between them can then be checked automatically by the system, because semantics is machine-understandable.
3 The Semantics-Based Exception Handling Approach
To ensure the smooth execution of service grid applications, mismatches between users’ requests and services should be observed and handled, as mentioned before. Fig. 1 illustrates the central idea of our approach.
Fig. 1. Reference model of ASEED
When an exception is signaled, the semantics of the user’s request and of the services are derived from the Exception Context, and the compatibility between them is analyzed by the Mismatch Analysis to find out whether it is a mismatch exception. There are common factors among different handling processes, such as the exception context and the strategy of handling exceptions. The Handling Pattern contains patterns, which abstract the common information of exception handling processes. Patterns are consulted when similar exceptions happen.
When a mismatch exception happens, the Mismatch Analysis informs the Strategy Selection, and a specified handling strategy is then found by consulting the Exception Context and the Handling Pattern. The semantics infrastructure is the basis for connecting the aspects of exception handling: based on it, exceptions caused by mismatches between users’ requests and services can be observed, and a specified handling strategy can be produced. The main components of the approach are illustrated in the following sections.
3.1 Mismatch Exception Analysis
During execution, the semantics of the selected services should match the semantics of the user’s request in both the functional aspect and the non-functional aspect. When an exception is signaled, the compatibility between the semantics of the services and that of the user’s request is calculated. The algorithm is:
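A minimal sketch of such a compatibility check, assuming the ontology of Section 2.1 is given as a graph of subclass edges and that a service concept is compatible with a requested concept when it equals the concept or is (transitively) one of its subconcepts; the function and attribute names below are illustrative assumptions rather than the original listing:

```python
from typing import Dict, Set

def is_compatible(service_sem: str, request_sem: str,
                  superconcepts: Dict[str, Set[str]]) -> bool:
    """A service concept is compatible with a requested concept if it
    equals the concept or is (transitively) one of its subconcepts;
    superconcepts maps each concept URI to its direct superconcepts."""
    if service_sem == request_sem:
        return True
    return any(is_compatible(parent, request_sem, superconcepts)
               for parent in superconcepts.get(service_sem, set()))

def is_mismatch(activity, service, superconcepts) -> bool:
    """Both the functional and the non-functional semantics must be
    compatible; either one failing signals a mismatch exception."""
    if not is_compatible(service.functional, activity.functional,
                         superconcepts):
        return True
    return not all(is_compatible(s, r, superconcepts)
                   for s, r in zip(service.non_functional,
                                   activity.non_functional))
```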
Compatibility of both the functional and the non-functional semantics is checked, because either of them may lead to mismatch exceptions. With the semantics infrastructure, these kinds of mismatch exceptions can be distinguished automatically by the system when exceptions happen, and users don’t have to specify them when describing their requests.
3.2 Handling Pattern
There are common factors among different handling processes, and a pattern [2] is used to describe those common factors. A pattern may consist of many parts; we use the following four as a minimum: name, problem, context and solutions.
- Name: each pattern has a specific name, which identifies it.
- Problem: describes what kind of mismatch exception the pattern tries to solve.
- Context: a mismatch exception resides in a specific context, which consists of the structure of the user’s request and the semantics of the user’s request or of the services; the context affects the exact meaning of an exception.
- Solutions: the solutions contain the necessary actions to handle the exception; they may contain one or more steps to guide the handling of a specific exception.
The instantiations of patterns, which we call cases, carry the solid information of the exception handling processes. For the context of a case, an expression is used to describe the structure: a_x stands for a single activity, infix operators stand for Sequence, Concurrent and Choice, and a power stands for Loop:
Fig. 2. Illustration of a process’s structure
The semantics of the user’s request or of the services in the case are illustrated by a set of value pairs, and the Solutions contain a set of semantic notations illustrating the handling process. Cases are populated during the exception handling processes.
3.3 Strategy Selection
The handling strategy is the solutions property of a case. In order to solve a mismatch exception, information from the exception context is consulted to retrieve a suitable case. Case matching consists of two parts: structure matching and semantics matching.
- Structure matching: for the case and the user’s request, the structure can be described as an expression as mentioned above; structure matching can then be treated as expression matching. Only those cases which have the same structure as the user’s request are considered matched.
- Semantics matching: each activity in the case or in the user’s request maps to a (set of) specified semantics, and a single case or user request usually consists of many activities. The semantics matching result is defined by a least-squares distance function, which assumes that the best fit is the one with the minimal sum of squared deviations from a given set of semantics:

$$SM = \sum_{i=1}^{n} w_i \cdot sm(a_i, c_i)^2$$

where $a_i$ and $c_i$ are the corresponding activities in the user’s request and the case, $w_i$ is the weight of $c_i$, which illustrates the relative importance of $c_i$ in the case, and $sm$ is the function that calculates the semantics matching degree of two corresponding activities from the user’s request and the case.
The smaller SM is, the closer the semantic compatibility. Structure similarity and semantics similarity enable the Strategy Selection to retrieve a suitable case; the handling strategy is then retrieved and used to guide the exception handling.
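A minimal sketch of this two-stage retrieval, assuming the structure expression can be compared for equality and that sm returns a semantic distance between two corresponding activities (0 when identical); all names are illustrative:

```python
def retrieve_case(request, cases, sm):
    """Two-stage case retrieval: structure matching first, then pick
    the case minimizing the weighted sum of squared semantic distances."""
    # Stage 1: only cases with the same structure expression qualify.
    candidates = [c for c in cases if c.structure == request.structure]

    def score(case):  # SM = sum of w_i * sm(a_i, c_i)^2
        return sum(w * sm(a, c) ** 2
                   for a, c, w in zip(request.activities,
                                      case.activities, case.weights))

    # Stage 2: the smaller SM, the closer the semantic compatibility.
    return min(candidates, key=score) if candidates else None
```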
4 Implementation in FLAME2008
4.1 Case Study
A common travel scenario from FLAME2008: first book the flight, then reserve a hotel room and rent a car in parallel; when both are done, book a ticket for a swimming game of the Olympic Games 2008.
Fig. 3. Demonstration of travel scenario
Fig. 3 shows the scenario, illustrating the functional semantics mapping (two dotted broken lines) and the non-functional semantics mapping (dotted line) for GTB. Before starting GTB, the user changes his mind and wants to watch a fencing game instead of a swimming game, so he modifies GTB by changing a non-functional semantics (light-colored dotted line). The continuous execution of the application then stops due to a wrong returned result, which signals an exception. The following shows the mismatch exception handling process:
(1) Mismatch Analysis: for GTB, the semantics of the user’s new request (fencing ticket booking), the semantics of the user’s old request (swimming ticket booking) and the semantics of the selected service have the same functional semantics, “http://flame/KgB/travel.daml#sports.ticketbooking”, but their non-functional semantics are different:
Activity name       Non-functional semantics
GTB (new)           http://flame/KgB/travel.daml#sports.ticketbooking.fencing
GTB (old)           http://flame/KgB/travel.daml#sports.ticketbooking.swimming
Selected Service    http://flame/KgB/travel.daml#sports.ticketbooking.swimming

After the modification, the functional semantics of the service still satisfies the user’s request, but the non-functional semantics of the service is no longer compatible (the two are siblings in the semantics graph). Thus a mismatch exception is observed.
(2) Strategy Selection: in order to get the handling strategy, the structure similarity and the semantics similarity are calculated so as to retrieve a suitable case:
- Structure matching: the structure of the exception context is the expression shown in Fig. 3; by expression comparison, cases with the same structure are selected.
- Semantics matching: among the cases selected by structure comparison, the semantics similarity is calculated. Using the algorithm mentioned above, a suitable case is selected, namely:

Name       Service ReSelect
Problem    A new activity replaces the old one to satisfy a different goal.
Context    Structure: the same sequence/parallel structure as the user’s request
           Semantics (semantics of non-functional properties are omitted here):
           http://flame/KgB/travel.daml#traffic.plane-ticketbooking
           http://flame/KgB/travel.daml#accomodation.hotelreserving
           http://flame/KgB/travel.daml#traffic.carrenting
           http://flame/KgB/travel.daml#sightseeing.ticketbooking
Solutions  http://flame/KgB/task/Execute.daml#ServiceReSelect

In the case, the solution is “http://flame/KgB/task/Execute.daml#ServiceReSelect”, which guides the execution of the application by allowing it to select another service. Meanwhile, if no suitable case is available, a default case is consulted.
4.2 Evaluation of ASEED
Based on semantics, we have proposed an approach to handle runtime mismatch exceptions in service grid applications. Semantics helps the system know what a runtime mismatch exception is, and it helps to detect these exceptions automatically and to handle them dynamically. In a service grid environment, handling exceptions with our approach has some promising effects:
- Accuracy of catching mismatch exceptions: explicit semantics from several aspects of the application, such as the semantics of the services, are used, so exceptions caused by mismatches between users’ requests and services can be detected precisely.
- Flexibility of handling exceptions: as mentioned, exceptions may have different meanings and should be handled in different ways. Our approach tries to find a suitable way each time an application encounters a mismatch exception, and during the handling process the handling strategies are located dynamically.
Still, some points affect the handling process:
- How minutely the semantics is described: the precision of catching mismatch exceptions depends largely on the granularity of the semantics description. If the granularity is rough, the catching result is poor.
- Similarity matching of cases: it is easy to compare the structure similarity, but the semantics similarity may need to take more factors into account so as to make the matching more precise.
- The handling patterns: in order to solve all kinds of mismatch exceptions, the handling patterns and their instances need to be enriched and better managed.
5 Related Works
In a service grid environment, exception handling is still in its infancy, and some research groups have begun to pay attention to this issue. IBM’s BPEL4WS [6] pays attention to flexible control of reversal by providing the ability to define exception handling and compensation in an application-specific manner. But it mostly deals with local exceptions and exceptions (and handlers) that are predefined by users themselves; if unexpected exceptions occur, the system cannot be aware of them. Globus [5] provides a range of basic services to enable the construction of application-specific fault recovery mechanisms. It focuses on providing a basic, flexible service that can be used to construct a range of application-specific fault behaviors, but it is difficult to build such services for common use. There are other works in related domains such as workflow: [11] provides a model with a rule base that consists of a set of rules for handling exceptions, but the rule base is a separate component functionally disjoint from the exception database; because of the disconnection between the two bases, the rules sometimes cannot describe the scenario even when an approach has been adopted to resolve many exceptions. METEOR [13] tried to solve conflict resolution in cross-organizational workflows. Their approach bundles knowledge sharing, coordinated exception handling and intelligent problem solving; the attention is paid to the conflicts between the handling participants, not to the mismatches we address, but the case matching algorithm is worthy of reference. In most works, if mismatch exceptions are not defined by users, they cannot be detected and handled. But it is quite difficult for users to describe all mismatch exceptions in a service grid environment, due to its open and dynamic characteristics, and the system also lacks means of automatically detecting and handling mismatch exceptions. Moreover, most works focus on problems within a bounded environment, either intra-organizational or inter-organizational, where exceptions and exception handling processes are stable in some sense.
6 Conclusions
In this paper, we analyzed the newly arising problem of mismatch exceptions between users’ requests and the underlying services in a service grid environment. In order to build reliable applications, these exceptions need to be detected and handled in an effective way. We proposed a novel approach named ASEED, which gives semantics a dominant role in providing a basis throughout the exception handling process. Semantics makes the mismatch exceptions and the handling processes machine-understandable. The approach makes it possible that these exceptions don’t
have to be described by users; mostly, they are detected and handled by the system itself. The approach has been implemented in the FLAME2008 project, and some promising results have been achieved. Still, some problems need to be solved to perfect the approach. In our future research, we will pay attention to the following problems: the spectrum of the exception context needs to be broadened, and the details of the context will be studied more thoroughly; the elements of a pattern will be given thorough consideration, and a thorough classification of the patterns needs to be done; effective management of the cases is needed to offer better support.
References
1. A. Sheth: Data Semantics: What, Where, and How? In: R. Meersman, L. Mark (eds.): Data Semantics (IFIP Transactions), Chapman and Hall, UK (1996) 601-610.
2. E. Gamma, R. Helm, R. Johnson, J. Vlissides: Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, ISBN 0-201-63361-2 (1995).
3. F. Casati, S. Ceri, S. Paraboschi, G. Pozzi: Specification and Implementation of Exceptions in Workflow Management Systems. ACM TODS, Vol. 24, No. 3 (1999) 405-451.
4. http://infomesh.net/2001/swintro/
5. http://www.globus.org/details/fault_det.html
6. http://www-900.ibm.com/developerWorks/cn/webservices/ws-bpel_spec/index_eng.shtml
7. I. Foster, C. Kesselman, S. Tuecke: The Anatomy of the Grid: Enabling Scalable Virtual Organisations. International Journal of Supercomputer Applications, 15(3) (2001).
8. J.B. Goodenough: Exception Handling: Issues and a Proposed Notation. Communications of the ACM, Vol. 18, No. 12 (1975) 683-696.
9. J. Eder, W. Liebhart: Contributions to Exception Handling in Workflow Systems. EDBT Workshop on Workflow Management Systems, Spain (1998).
10. J.L. Knudsen: Better Exception-Handling in Block-Structured Systems. IEEE Software, Vol. 17, No. 2 (1987) 40-49.
11. S.Y. Hwang, S.F. Ho, J. Tang: Mining Exception Instances to Facilitate Workflow Exception Handling. Proc. 6th International Conference on Database Systems for Advanced Applications, Taiwan (1999) 45-52.
12. Y. Han, H. Geng, H. Li, J. Xiong et al.: VINCA: A Visual and Personalized Business-level Composition Language for Chaining Web-based Services. Proc. International Conference on Service Oriented Computing, Italy (2003).
13. Z. Luo, A. Sheth, K. Kochut, B. Arpinar: Exception Handling for Conflict Resolution in Cross-Organizational Workflows. Distributed and Parallel Databases, Vol. 11 (2003).
A Semantic-Based Web Service Integration Approach and Tool* Hai Zhuge, Jie Liu, Lianhong Ding, and Xue Chen Knowledge Grid Research Group, Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100080, Beijing, China [email protected], [email protected], [email protected]
Abstract. Integration of Web Services for large-scale applications concerns complex processes. Component technology is an important way to decompose a complex process and to promote efficiency and quality. This paper proposes a service integration approach considering both the integration of service flows and the integration of data flows. The approach includes a component-based Web Service process definition tool, a mechanism for retrieving services in a well-organized service space and in UDDI repositories, algorithms for heterogeneous data flow integration, and rules for service verification. The proposed approach has been integrated into an experimental service platform and used in an online book sale business application. Comparisons show the features of the proposed approach.
1 Introduction
Integration of Web Services for large-scale applications is a challenging issue due to the unmanageable efficiency and quality of the complex service processes involved. Another issue arising from service integration is how to conveniently, accurately and efficiently retrieve services from rapidly expanding, large-scale service repositories. Data returned from multiple services may be heterogeneous in semantics, structure and value [1, 4], so a third issue of service integration is how to integrate the heterogeneous data flows returned from different services so as to provide a unified view for users. Previous research on Web Service integration mainly concerns approaches for automatically integrating relevant services using semantic markup [8], Petri-net-based and ontology-based approaches for service description, simulation, verification and composition [3, 10], and languages for describing behavioral aspects of the service flow [6, 11]. However, these research works seldom address the above three issues. The current UDDI registry nodes only provide keyword-based service retrieval [9]. If users are not familiar with the pre-specified service categories, they usually cannot get satisfactory retrieval results.
* The research work was supported by the National Science Foundation of China (NSFC).
Applications show that the current UDDI repositories cannot meet the needs of business processes in efficiency and accuracy. This paper solves the issue of complex service process construction by making use of component technology, an important way to decompose a complex service process. A component-based service process definition tool has been implemented to assist users in transforming a business process into a service process and then specifying the requirements for the related service components, which are integrated by service flow and data flow. Interactions between the components in the service process are based on XML, SOAP and WSDL. We solve the issue of improving the accuracy and efficiency of service retrieval by making use of the service space model, which organizes Web Services in a normalized, multi-dimensional service space so that services can be retrieved efficiently. We solve the issue of heterogeneous data flow integration by establishing mappings between the global schema and the source schemas; semantic heterogeneity, structural heterogeneity, and data value heterogeneity are all considered.
2 General Architecture
The general architecture of the proposed Web Service integration approach is illustrated in Fig. 1. It mainly consists of the following modules: Process Definition, Definition Verification, Requirement Description, Web Service Retrieval, Integration, Integration Verification, and Registration.
Fig. 1. General architecture of the proposed approach
We have developed a component-based Web Service process definition tool to assist users in transforming a business process into a service process. The process definition is accomplished by drawing nodes and arcs on the interface with the help of operation buttons. After definition, the completeness and time constraints of the defined process components are verified; modification is required if errors occur. Otherwise, users can specify the requirements for the related service components using the definition tool. A service space is a multi-dimensional space with a set of uniform service operations (http://kg.ict.ac.cn). A referential model for the service space can be expressed as Service-Space = (Classification-Type, Category, Registry-Node). In order to retrieve the required services effectively and efficiently, multi-valued specialization relationships and similarity degrees between services are constructed [13]. Besides the GUI, the service space allows applications to retrieve services using SOAP messages. If no matching services are retrieved, the service space automatically communicates with the UDDI repositories through SOAP messages to get the required services. The components in the service process are integrated through service flow and data flow. The service flow reflects the control dependences, while the data flow denotes the data dependences among the service components. The Integration Verification module checks the reachability, deadlock and execution state of the service process. If no error occurs, the new service is registered in the service space and the UDDI repositories; otherwise, modification of the process definition is triggered.
3 Semantic Heterogeneous Data Flow Integration
In order to form a single semantic image for the heterogeneous data returned by the service components [15], we use a triple DIS = ⟨G, S, M⟩ to represent a data integration system, where G is the global schema (the XML schema defined by the application developers), S is the set of source schemas (the XML schemas of the data sources returned by the service components), and M is the mapping from G into S. The process of heterogeneous data integration consists of the following four steps.
The first step is global schema definition. The application developers define the basic information, the data dependence relationships (i.e., the semantic constraints), and the structure of the global schema according to the requirements. The basic information is represented by the structure GNode (GnodeID, Gnode, Gtype), where GnodeID is the node identifier, Gnode is the node name, and Gtype is the node type. A data dependence relationship is represented by a set of pairs in which one element is the key of the other, just as keys work in relational database systems. Paths in the global schema are expressed by the structure GSchema (GpathID, Gpath, GpathLID, Gtype), where GpathID is the path identifier, Gpath is the label path (i.e., a sequence of slash-separated labels starting from the root to the current node), GpathLID is the identifier path (i.e., a sequence of slash-separated node identifiers starting from the root to the current node), and Gtype is the terminal node type.
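Rendering these structures concretely, here is a minimal Python sketch; the field names follow the text, while the class layout and the example values are our own assumptions:

```python
from dataclasses import dataclass

@dataclass
class GNode:
    gnode_id: int   # GnodeID: node identifier
    gnode: str      # Gnode: node name
    gtype: str      # Gtype: node type

@dataclass
class GSchema:
    gpath_id: int   # GpathID: path identifier
    gpath: str      # Gpath: label path, e.g. "Book/Title"
    gpath_lid: str  # GpathLID: identifier path, e.g. "1/2"
    gtype: str      # Gtype: terminal node type

# Example: the Title path of the book global schema used in Section 5.
title_path = GSchema(gpath_id=2, gpath="Book/Title",
                     gpath_lid="1/2", gtype="string")
```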
The second step is source schema extraction, which loads each data flow of the componential services, traverses the source schema recursively in preorder, and extracts the node name and the label path of each leaf node (or attribute node) from each source. The node information is kept in the structure SNode (SourceID, SnodeID, Snode, Stype), and the label path information is recorded in the structure SSchema (SourceID, SpathID, Spath, SpathLID, Stype), where SourceID is the source identifier, SpathID is the path identifier, Spath is the label path, SpathLID is the identifier path, and Stype is the terminal node type.
The third step is mapping construction between the global schema and the source schemas, which resolves the semantic conflicts, structure conflicts, and data value conflicts among the involved service components. To resolve semantic conflicts, such as synonymy in node naming, each node in the global schema is associated with a semantic attribute set (i.e., a set of semantically related terms) generated with the help of WordNet; the set can be added to, modified and deleted on demand. Structure conflicts are resolved through node mapping, path mapping, and tree mapping between the global schema and the source schemas. The node mapping maps nodes in the global schema to nodes in the source schemas according to the established semantic attribute sets; human intervention is needed to designate mapping nodes on demand when they are not covered by the semantic attribute set. The path mapping maps the label paths in the global schema to paths in the source schemas. The tree mapping maps the global schema, as a tree, to the source schemas. The tree structure sequence derived from the global schema is denoted by pairs of a path identifier and the label path from the root to the leaf node, and the tree structure of a source schema is denoted by triples of a source identifier, a path identifier, and the label path from the root to the leaf node.
The fourth step is data integration. To integrate the heterogeneous data flows satisfying users’ queries, global query sequences including all the possible sub-queries over the global schema are established. Each element of a global query sequence is a triple consisting of a sub-query identifier, the path expression of the sub-query, and the condition to be satisfied. Each user query corresponds to a set of non-continuous branches in the global query sequence, which further correspond to non-continuous branches in each source tree and execute at each source. Data returned from the service components satisfying the sub-query branches with Boolean conditions is integrated. To resolve data value conflicts, the involved sources are ranked considering reliability, data accuracy, and data quality; data returned from sources with a higher rank has higher priority.
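A minimal sketch of the node mapping and sub-query evaluation described above, assuming semantic attribute sets are plain term sets and that each extracted source is rendered as a mapping from label paths to values; all names are illustrative:

```python
from typing import Callable, Dict, List, Optional, Set

def map_node(g_node: str, source_nodes: List[str],
             attr_set: Dict[str, Set[str]]) -> Optional[str]:
    """Node mapping: a global node matches a source node whose name is
    the node name itself or one of its semantically related terms."""
    terms = {g_node} | attr_set.get(g_node, set())
    for s_node in source_nodes:
        if s_node in terms:
            return s_node
    return None  # left for human intervention on demand

def eval_subquery(source: Dict[str, List[str]], path: str,
                  cond: Callable[[str], bool]) -> List[str]:
    """Evaluate one sub-query branch (path expression + condition)
    against a source rendered as a label-path -> values dictionary."""
    return [v for v in source.get(path, []) if cond(v)]
```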
4 Verification
The Definition Verification module validates the completeness and time constraints of the process definition. First, a component should be independent. The independency
requires the component to reflect an independent business and to be able to execute independently. Second, a component should be encapsulated so that it interacts with the rest of the components in the process through SOAP messages. Third, the start node and the end node should be unique, and the internal process completeness should be satisfied, as discussed in [14]. Considering the time factor and the logical relationships among services, the following rules are used for verification (a sketch of these rules in code is given at the end of this section):
- The start time of a single node must not be earlier than its predecessor’s end time.
- The end time of a single node must not be later than its successor’s start time.
- The start time of a node with “And-join” predecessors must not be earlier than any of its predecessors’ end times.
- The start time of a node with “Or-join” predecessors must not be earlier than all of its predecessors’ end times.
- The end time of a node with “And-split” successors must not be later than any of its successors’ start times.
- The end time of a node with “Or-split” successors must not be later than all of its successors’ start times.
- The start time of an arc must not be earlier than its predecessor’s end time.
- The end time of an arc must not be later than its successor’s start time.
The Integration Verification focuses on the following aspects. First, the components in the service process should be reachable from the start node, and deadlocks and loops should be checked and eliminated. Second, the components to be retrieved should be found in the service space and the UDDI repositories. Third, the execution conditions of the components should be satisfied during the execution process, and data returned in the data flows should satisfy the requirements.
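A minimal Python sketch of the time-constraint rules above, under one reading of the And/Or rules: an And-join must wait for every predecessor, while an Or-join needs only one finished predecessor (and dually for the splits). The node attributes and names are illustrative assumptions:

```python
def check_node_times(node) -> bool:
    """Check the join/split time-constraint rules for one node."""
    pred_ends = [p.end for p in node.predecessors]
    succ_starts = [s.start for s in node.successors]
    ok = True
    if pred_ends:
        if node.join == "And-join":    # wait for every predecessor
            ok = ok and node.start >= max(pred_ends)
        else:                          # "Or-join": one predecessor suffices
            ok = ok and node.start >= min(pred_ends)
    if succ_starts:
        if node.split == "And-split":  # finish before every successor
            ok = ok and node.end <= min(succ_starts)
        else:                          # "Or-split"
            ok = ok and node.end <= max(succ_starts)
    return ok
```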
5 Application in Online Book Sale
The purpose of this application is to demonstrate the integration of book information from multiple booksellers. Following the business process of book sale, users can use the definition tool to define the service process and specify requirements for service components, as shown in Fig. 2, where the background is the top-level service process and the middle window is the service component requirement specification. Clicking the “Search” button in the middle window triggers the search process: users are asked to select services from a name list in the front window, and information about the service components is then automatically returned. The global schema defined by the application developers is shown in Fig. 3. The basic information of a book is Book = {ISBN, Title, Author, Publisher, Year, Abstract, Vendor, Price, Stock}. The semantic constraints state that ISBN is the key of Title, Author, Publisher, Year, and Abstract, and that ISBN together with VendorID is the key of Price and Stock. The semantic attribute set consists of {ISBN (Book No, Book ID), Title (Name), Author (Writer),
Publisher (Bookman), Abstract (Outline, Abstraction), Price (Cost), Stock (Inventory, Amount, Quantity)}.
Fig. 2. An interface of the component-based service process definition tool
Fig. 3. The global schema of application in online book sale
After extracting the source schemas of the involved service components, the node mapping, path mapping, and tree mapping between the global schema and the source schemas can be constructed automatically. The global query sequence enumerating all possible sub-queries is then established. Each user query corresponds to non-continuous branches in the global query sequence, which further correspond to branches of the source schemas. We use the example
“To retrieve books about Web Services under $60” to illustrate the query matching process. This user query corresponds to two sub-query branches, which further correspond to sub-queries at the three involved service components: {(1, 4, Amazon/Book/Title, “Web Services”) AND (1, 11, Amazon/Book/Price, “<60$”)} OR {(2, 4, Barnes/Book/Name, “Web Services”) AND (2, 10, Barnes/Book/Price, “<60$”)} OR {(3, 8, eCampus/Book/BasicInfo/Title, “Web Services”) AND (3, 14, eCampus/Book/OtherInfo/Price, “<60$”)}. In each 4-tuple, the first parameter represents the source identifier, the second the path identifier, the third the label path, and the fourth the query condition. Data satisfying both the title sub-query and the price sub-query is then integrated. A prototype of the online book sale service integration system has been implemented. The advantages of integrating multiple book sale services are as follows. First, it enables a single point of access to multiple booksellers; without service integration, users would have to visit multiple book-sale websites to get enough supply information before making a purchase decision. Second, the newly integrated book sale service can provide additional information, such as price comparison, that cannot be provided by the separate services.
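Reusing the hypothetical eval_subquery helper sketched in Section 3, the example query could be decomposed and evaluated roughly as follows; the source contents, branch layout and price parsing are invented for illustration:

```python
def under_60(price: str) -> bool:
    return float(price.rstrip("$")) < 60  # assumes values like "59.95$"

# Each branch: (source id, [(label path, condition), ...]).
branches = [
    (1, [("Amazon/Book/Title", lambda t: t == "Web Services"),
         ("Amazon/Book/Price", under_60)]),
    (2, [("Barnes/Book/Name", lambda t: t == "Web Services"),
         ("Barnes/Book/Price", under_60)]),
    (3, [("eCampus/Book/BasicInfo/Title", lambda t: t == "Web Services"),
         ("eCampus/Book/OtherInfo/Price", under_60)]),
]

def run_query(sources, branches):
    results = []
    for source_id, subqueries in branches:
        # AND within a branch: every sub-query must return data.
        hits = [eval_subquery(sources[source_id], path, cond)
                for path, cond in subqueries]
        if all(hits):
            results.append((source_id, hits))
    return results  # OR across branches: integrate all matching sources
```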
6 Comparison
The major differences between the proposed approach and previous works concern three aspects: component-based service process integration, normalized service organization, and heterogeneous data flow integration. The component-based Web Service integration divides and conquers the complexity of complex service processes. Service components can increase the flexibility and efficiency of building service processes by realizing component reuse at different granularities. The encapsulation of service components also localizes possible modifications and errors. Comparisons between the proposed Web Service retrieval approach and previous research works can be found in [13]. Much work has been done on data integration. A system for indexing and storing XML data using single elements and attributes as the basic unit of query was proposed in [7]. Approaches indexing all the raw paths starting from the root node were proposed in [5]. An adaptive path index that utilizes a data-mining algorithm to summarize frequently appearing XML path information was proposed in [2]. A novel index approach integrating both the content and the structure of each source was proposed in [12]. However, these approaches, based on static structure and content indexes, are unsuitable for integrating dynamically generated data. Our approach resolves the following three types of heterogeneity in service integration: semantic conflicts are managed by establishing the semantic attribute sets and semantic constraints in the global schema; structure conflicts are resolved through node mapping, path mapping, and tree mapping between the global schema and the source schemas; and data value conflicts are managed according to the quality of the involved service components.
7 Conclusion
To address the issues of complexity and efficiency in large-scale service integration, this paper presents a semantic-based Web Service integration approach based on componential process construction and a well-organized service space. The first contribution is a component-based Web Service process definition tool. The second contribution is the use of the service space to organize Web Services in a normalized way so as to improve the accuracy and efficiency of service retrieval. The third contribution is a tree-based mapping approach to integrate data returned by multiple service components into a single semantic image [15].
References
1. Bergamaschi, S., Castano, S., Vincini, M.: Semantic Integration of Semistructured and Structured Data Sources. SIGMOD Record 28 (1999) 54-59
2. Chung, C., Min, J., Shim, K.: APEX: An Adaptive Path Index for XML Data. ACM SIGMOD 2002 Conference, Madison, Wisconsin (2002)
3. Cui, Z., Jones, D., O’Brien, P.: Semantic B2B Integration: Issues in Ontology-Based Approaches. SIGMOD Record 31 (2002) 43-48
4. Delobel, C., et al.: Semantic Integration in Xyleme: A Uniform Tree-Based Approach. Data & Knowledge Engineering 44 (2003) 267-298
5. Goldman, R., Widom, J.: DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases. 23rd International Conference on Very Large Data Bases, Athens, Greece (1997)
6. Leymann, F.: Web Services Flow Language (WSFL 1.0). IBM Corporation, May 2001
7. Li, Q., Moon, B.: Indexing and Querying XML Data for Regular Path Expressions. 27th International Conference on Very Large Data Bases, Roma, Italy (2001)
8. McIlraith, S., Son, T.: Adapting Golog for Composition of Semantic Web Services. 8th International Conference on Principles of Knowledge Representation and Reasoning, Toulouse, France (2002)
9. Microsoft Advanced UDDI Search, http://uddi.microsoft.com/search.aspx
10. Narayanan, S., McIlraith, S.: Simulation, Verification and Automated Composition of Web Services. 11th International World Wide Web Conference, Honolulu, Hawaii (2002)
11. Thatte, S.: XLANG: Web Services for Business Process Design. Microsoft Corporation, May 2001
12. Wang, H., et al.: ViST: A Dynamic Index Method for Querying XML Data by Tree Structures. ACM SIGMOD/PODS 2003 Conference, San Diego, California (2003)
13. Zhuge, H., Liu, J.: Flexible Retrieval of Web Services. Journal of Systems and Software. http://www.elsevier.com/locate/jss
14. Zhuge, H.: Component-Based Workflow Systems Development. Decision Support Systems 35 (2003) 517-536
15. Zhuge, H.: Clustering Soft-Devices in Semantic Grid. IEEE Computing in Science and Engineering 4 (2002) 60-63
A Computing Model for Semantic Link Network*
Hai Zhuge¹, Yunchuan Sun¹,², Jie Liu¹, and Xiang Li¹
¹ Knowledge Grid Research Group, Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, 100080, Beijing, China
[email protected]
² Beijing Normal University, 100875, Beijing, China
[email protected]
Abstract. The Semantic Link Network (SLN) is a model for the future Semantic Web and Knowledge Web. Based on our previous SLN model, this paper enriches the semantic links and reasoning rules, proposes a computing model for SLN, and develops relevant reasoning and management mechanisms using a semantic matrix representation of SLN.
Keywords: Knowledge Grid, Matrix, Semantic Grid, Semantic Link, Semantic Web, Web
1 Introduction
As an open system, the WWW has become an important means to distribute and retrieve information. However, the current Web has a severe limitation: hyperlinks are only pointers from page to page without any semantic relationship. Application systems can hardly provide accurate information retrieval and intelligent services because they cannot understand the contents of Web pages. Tim Berners-Lee and other scientists proposed the Semantic Web to improve the current Web by giving Web pages well-defined meaning [1,3]. It is mainly based on ontology mechanisms and markup languages such as the Resource Description Framework (RDF), which integrates a variety of applications using XML for syntax and URIs for naming [4]. The Semantic Grid, the natural development of the Semantic Web and the Grid, intends to incorporate the advantages of both [3,7]. The motivation of the Semantic Web is obviously right; however, ease of use and ease of construction are two important criteria that determine the success or failure of new techniques. We have proposed the Semantic Link Network (SLN) as a model to facilitate the Semantic Web [6]. The links among resources (documents, images, concepts, etc.) are not simply hyperlinks but semantic links, which support certain semantic reasoning. By defining semantic link primitives such as the cause-effect link and the sequential link, the semantic relationships between resources can be represented as easily as hyperlinks. Based on our previous SLN model, this paper first enriches the semantic links and reasoning rules, and then proposes a computing model for SLN including semantic link reasoning and operations. Based on a semantic matrix representation of SLN, we develop a matrix-based reasoning theory and management approach for SLN.
* This work was supported by the National Science Foundation of China.
2 Semantic Link Reasoning
2.1 Enriched Semantic Links
Our previous SLN model has seven semantic link primitives: the cause-effect link (ce), the implication link (imp), the subtype link (sub), the similar-to link (sim), the instance link (ins), the sequential link (seq), and the reference link (ref) [6]. A semantic link with semantic factor α (a type or relation) between two resources r1 and r2 is denoted as r1—α→r2. We herein add the following new semantic links and operations.
1. Equal-to link, denoted as e, which indicates that two resources are identical in meaning. Clearly, a resource is equal to itself.
2. Empty link, which shows that two resources are absolutely irrelevant.
3. Null link or unknown link, denoted as Null or N, which shows that the semantic relation between two resources is unknown or uncertain. A Null relation means that there are some possible semantic relations whose exact meaning we do not yet know; it can be replaced with a certain relationship once one is provided by users or deduced by the reasoning mechanism.
4. Reverse relation operation of a semantic relation α, denoted as Reverse(α). If there is a semantic relation α from r1 to r2, then there is a reverse semantic relationship from r2 to r1, which we call the reverse relationship. For example, a cause-effect link from r1 to r2 means that r1 is the cause of r2 and r2 is the effect of r1, i.e., a Reverse(ce) relation from r2 to r1. A semantic relation and its reverse declare the same thing, but the reverse relationship is useful in reasoning.
5. Non-relation, for a semantic relation α, which means that the α relation does not hold from one resource to another. Sometimes it is useful in reasoning to know clearly that a certain semantic relation does not hold between two resources.
6. Opposite relation, which states that the successor declares the opposite idea of the predecessor. The opposite relation is symmetrical, i.e., r1 being opposite to r2 is the same as r2 being opposite to r1.
2.2 Reasoning Rules
Reasoning with semantic relations is to derive an unknown semantic relation between two resources from a group of semantic relations over an SLN using certain reasoning rules. We have given 22 rules in [6], and more domain-dependent reasoning rules can be developed according to application requirements. Based on the new semantic relations, some new rules can be obtained, as shown in Table 1. We call a semantic factor α stronger than β, denoted as α ⇒ β, if β is implied by α in meaning; that is, if there exists a semantic relation α between two resources, then there must also exist the semantic relation β. The implication relation among semantic factors is non-reflexive, asymmetric and transitive.
Semantic relations that have not been explicitly marked between two resources can be derived by logical reasoning based on the reasoning rules and the existing links. A semantic link can be added to the SLN once the semantic relation between two resources is obtained.
3 Operations on Semantic Relations
We herein use an addition operation and a multiplication operation to represent two types of semantic link composition.
Definition 1. (Addition) If there exist two semantic links with semantic factors α and β from r1 to r2 over an SLN, then the two semantic links can be merged into one with the semantic factor α + β. Such a semantic link merge is determined by the addition operation on α and β.
Fig. 1. Addition operation of semantic relations
For example, if there exist two semantic links ce and seq from r1 to r2 over an SLN, then the meaning of the merged semantic link from r1 to r2 is ce + seq, which means that r1 is not only the cause but also the sequential predecessor of r2. The addition operation extends from two semantic links to n semantic links, with the result denoted as α1 + α2 + … + αn. Clearly, if α + Non(α) occurs during reasoning, a semantic conflict occurs; combinations such as e + Opp likewise lead to semantic conflicts. Once a conflict occurs, the most important thing is to resolve it: only a consistent SLN can properly support problem solving and question answering. In the following discussion, unless otherwise specified, operations and reasoning are carried out only over a consistent SLN. According to the definition of addition, we have the following operation laws and characteristics.
Laws for addition include commutativity (α + β = β + α), associativity ((α + β) + γ = α + (β + γ)), idempotence (α + α = α), and the identity of the Null relation (α + N = α).
Characteristic 1. For semantic factors α and β involved in a consistent SLN, we have (α + β) ⇒ α and (α + β) ⇒ β.
Characteristic 2. For semantic factors α and β involved in a consistent SLN, if α ⇒ β, then α + β = α.
Definition 2. (Multiplication) Assume there exist two semantic relations over a consistent SLN: α from r1 to r2, and β from r2 to r3. If we can get a semantic factor γ from r1 to r3 by reasoning based on these premises, then we call the reasoning process the multiplication operation, denoted as α × β = γ. Fig. 2 shows an example of the multiplication process, where the multiplication is based on the reasoning rules.
Fig. 2. An example of the multiplication operation
The multiplication of n semantic relations, denoted as α1 × α2 × … × αn, is essentially a process of logical reasoning; its objective is to get the semantic relation from the first resource to the last one. Based on the above definitions, we can state the following multiplication operation laws, which include the identity of the equal-to relation (α × e = α and e × α = α) and the annihilation of the Null relation (α × N = N and N × α = N).
Lemma 1. For semantic relations α and β involved in a consistent SLN, and a semantic relation γ from a resource to itself, (α × β) ⇒ (α × γ × β).
Corollary 1. For semantic relations involved in a consistent SLN, the semantics derived along a path containing rings is implied by the semantics derived along the corresponding ring-free path.
Lemma 2. For semantic relations α, β, and γ involved in a consistent SLN, if α ⇒ β, then (α × γ) ⇒ (β × γ).
In many cases the reasoning rules are commutative; for example, ce × sub = sub × ce and ce × st = st × ce hold. However, we cannot guarantee that commutativity holds for any two semantic relations, nor can we guarantee that the associative law for multiplication holds. A feasible way to compute α1 × α2 × … × αn is therefore to take the summation of the results obtained under all feasible reasoning orders.
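To make these operations concrete, the following minimal Python sketch models a semantic factor set with merge (addition) and rule-based composition (multiplication). The rule table RULES is a small hypothetical excerpt, not the paper's full rule set, and the conflict test covers only the α + Non(α) case:

NULL = "N"

# Hypothetical excerpt of composition rules: (a, b) -> a x b.
RULES = {
    ("ce", "ce"): "ce",      # cause-effect composes transitively
    ("sub", "sub"): "sub",   # subtype composes transitively
    ("ce", "sub"): "ce",
}

def add(fa, fb):
    """Addition: merge the factor sets of two parallel links (a + b)."""
    merged = set(fa) | set(fb)
    for f in merged:                        # a + Non(a) signals a conflict
        if ("non-" + f) in merged:
            raise ValueError("semantic conflict: %s + non-%s" % (f, f))
    return (merged - {NULL}) or {NULL}      # Null adds nothing: a + N = a

def mul(fa, fb):
    """Multiplication: compose r1 -a-> r2 with r2 -b-> r3 into a x b."""
    out = set()
    for a in fa:
        for b in fb:
            if b == "e":
                out.add(a)                  # a x e = a
            elif a == "e":
                out.add(b)                  # e x b = b
            else:
                out.add(RULES.get((a, b), NULL))  # unknown composition -> Null
    return (out - {NULL}) or {NULL}

With this representation, the ce + seq example above is simply add({"ce"}, {"seq"}) = {"ce", "seq"}.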
4 Matrix Representation of SLN and Matrix-Based Reasoning
4.1 Concept Definition
Like other networks, an SLN can be represented as a directed graph with semantic relations. An SLN can thus be expressed as S(N, L), where N is a set of resource
nodes and L is a set of directed semantic links. We can obtain the closure of an SLN by adding all derivable semantic links to it.
Definition 3. (Closure of an SLN) The closure of an SLN S(N, L) is a new SLN (N, L′), denoted as Closure(S), where L′ is constructed as follows: 1) all semantic links in L are included in L′; 2) a semantic link from one resource to another is appended to L′ if the semantic relation between the two resources can be obtained by reasoning on L. The closure is unique for a given SLN. We say two SLNs are equivalent if their closures are identical; clearly, an SLN is equivalent to its closure. An SLN S(N, L) is said to be included in T(N′, L′), denoted as S ⊆ T, if N ⊆ N′ and L ⊆ L′. Obviously, S ⊆ Closure(S).
Lemma 3. For SLNs S and T, S is equivalent to T if and only if S ⊆ Closure(T) and T ⊆ Closure(S). The proof is similar to that of the lemma on two equivalent rule bases discussed in [5].
Definition 4. (Minimal Cover of an SLN) An SLN M is the minimal cover of another SLN S if 1) M is equivalent to S, and 2) no semantic link sl exists in M such that M − {sl} is still equivalent to S. The minimal cover is a refined SLN that involves the least number of semantic links while remaining equivalent to the original. We can use the rule-base refinement approach to obtain the minimal cover, which is important for maintaining an SLN [5].
4.2 Matrix Representation of SLN
An SLN can be represented by an adjacency matrix. Given an SLN with n resources r1, r2, …, rn, it can be represented by the n × n matrix

    M = (αij), 1 ≤ i, j ≤ n,    (1)

where αij represents the semantic factor from ri to rj. We call M the semantic relationship matrix (SRM).
As discussed, the semantic relation from a resource to itself can be regarded as e, so for any i we have αii = e; and for any i and j we have αji = Reverse(αij). If there is no marked semantic link between two resources ri and rj, then we set αij = N and αji = N. For a given SLN, the matrix defined by (1) is unique, and vice versa. For example, the right part of Fig. 3 is the semantic relationship matrix of the SLN shown on the left.
Fig. 3. A simple SLN and its semantic relationship matrix.
4.3 Reasoning with the SLN Matrix
Reasoning in an SLN derives the semantic relation between two resources via a series of semantic relations (links). Assume a consistent SLN consists of n resources r1, r2, …, rn and its semantic relationship matrix is M. We can read off αij as the semantic relation between ri and rj if it is marked in the matrix. Sometimes, however, the matrix contains the Null relation between ri and rj even though other semantic relations could be deduced by reasoning. So the reliable semantic relation, denoted as α′ij, should be derived by reasoning.
Theorem 1. In a consistent SLN, a reliable semantic relation can be computed by the formula

    α′ij = Mi × M^(n−3) × Mj,

where M is the semantic relationship matrix, Mi is the ith row of M, and Mj is the jth column of M; that is, α′ij is the semantic summation over all paths of length n − 1 from ri to rj.
Proof: The solution for obtaining the reliable semantics from ri to rj is as follows: 1) find all paths from ri to rj in the SLN; 2) reason along each path obtained in 1); 3) take the summation of all reasoning results as the final result. Any two resources in the SLN are connected, because all unknown semantic relations between them are regarded as the Null relation. The paths from ri to rj can be classified by length as follows.
Length = 1: the semantic meaning is αij.
Length = 2: the semantic summation is Mi × Mj.
Length = 3: the semantic summation is Mi × M × Mj.
…
Length = n − 1: the semantic summation is Mi × M^(n−3) × Mj.
In the following, we show that the summation over paths of length n − 1 includes all the others. We need to consider two cases.
Case 1: Length > n − 1. Such a path must contain at least one ring, since there are only n resources. According to Lemma 1, the semantic meaning along a path containing a ring is implied by the semantic meaning obtained by reasoning along the same path with the ring removed. Rings can be removed in this way until the path length is smaller than n. This means the semantics of any path with length > n − 1 is implied by that of a path with length ≤ n − 1.
Case 2: Length < n − 1. Since αii = e for every resource, a path of length l < n − 1 can be extended with equal-to self-links into a path of length n − 1 with the same semantics, so its contribution is already included in the length-(n − 1) summation.
According to Case 1, Case 2, and Corollary 1, the summation of all reasoning results equals the summation over paths of length n − 1, i.e., α′ij = Mi × M^(n−3) × Mj, which is exactly the formula of the theorem.
Corollary 2. Let M and α′ij have the same meanings as defined above; then α′ij equals the (i, j) element of M^(n−1). Corollary 2 can easily be proved from Theorem 1.
If we compute the reliable semantics between every two resources of a consistent SLN by the above formula, we obtain a new semantic relationship matrix, called the full semantic relationship matrix (FSRM), i.e., the SLN matrix of the closure of the original SLN. From the FSRM we can read the semantic relation between any two resources of the SLN. Some of these relations are marked explicitly with semantic factors in the original SLN; the others are derived by logical reasoning. In fact, any logical reasoning over the semantic relations can be realized by multiplying the SLN matrix by itself.
Corollary 3. For a semantic relationship matrix M and its FSRM F, we have F = M^(n−1).
The full semantic relationship matrix for the SLN shown in Fig. 3 can be computed accordingly; it is easy to verify that the semantic relation between any two resources derived from this matrix agrees with that derived by logical reasoning over the original SLN of Fig. 3.
Corollary 4. For a semantic relationship matrix M and its FSRM F, we have F × M = F.
Proof: We only need to verify that each element of F × M equals the corresponding element of F, i.e., Σm (Fim × Mmj) = Fij for all 1 ≤ i, j ≤ n. Since Mjj = e, the term Fij × Mjj = Fij appears in the summation. For every m (1 ≤ m ≤ n), Fim × Mmj is a semantic relation from ri to rj obtained by reasoning, so Fij ⇒ (Fim × Mmj), because F is the full semantic relationship matrix. By Characteristic 2, the summation therefore equals Fij, which means that F × M = F.
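Corollaries 3 and 4 suggest a direct way to compute the FSRM: repeatedly multiply the matrix by itself until the fixpoint F × M = F is reached, which happens by M^(n−1) at the latest. A minimal Python sketch, assuming matrix entries are factor sets, diagonal entries are {"e"}, and add/mul behave as in the sketch of Section 3:

NULL = "N"

def semantic_matmul(A, B, add, mul):
    """(A x B)ij is the semantic summation over m of Aim x Bmj."""
    n = len(A)
    C = [[{NULL} for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = {NULL}
            for m in range(n):
                acc = add(acc, mul(A[i][m], B[m][j]))
            C[i][j] = acc
    return C

def fsrm(M, add, mul):
    """Full semantic relationship matrix via the fixpoint of Corollary 4."""
    F = M
    while True:
        G = semantic_matmul(F, M, add, mul)
        if G == F:                 # F x M = F: the closure has been reached
            return F
        F = G

Because the diagonal entries are e, each iteration can only enlarge the entries, so the loop terminates once the closure is reached.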
5 SLN Management Based on the Matrix Representation
It is vital to ensure the consistency of an SLN, because any inconsistency can corrupt logical reasoning. The SLN provides richer semantics than the hyperlink Web, but its maintenance cost is higher; fortunately, the semantic matrix provides a very useful tool. SLN management concerns resource management and semantic link management, with operations for adding, deleting, and updating. In general, consistency checking must be executed
during these operations. For example, when adding a new semantic relation to an SLN, we need to: 1) decide whether the new semantic relation conflicts with the SLN; and 2) if it conflicts, cancel the operation, otherwise add the semantic link to the SLN. Obviously, the first step is the key one, and it can be carried out using the FSRM. Other operations can be executed with the support of the semantic relationship matrix in a similar way. Just as with the current Web, constructing a worldwide SLN requires building local SLNs first and then merging them into the main SLN. To ensure consistency, we need to detect whether a local SLN agrees with the main one. In many cases users need only a very small portion of the whole SLN, so a meaningful part should be extracted from the main SLN. The semantic matrix is a useful tool for merging a local SLN into the main one, for cutting an SLN into small parts, and for ensuring their consistency and integrity.
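A minimal sketch of this add-with-check procedure, reusing the hypothetical fsrm and the conflict-raising add from the earlier sketches:

def try_add_link(M, i, j, factor, add, mul):
    """Step 1): test the new link against the closure; step 2): cancel on
    conflict, otherwise record the semantic link in the matrix."""
    F = fsrm(M, add, mul)                # derive all reliable relations
    try:
        add(F[i][j], {factor})           # does factor clash with F_ij?
    except ValueError:
        return False                     # conflict: cancel the operation
    M[i][j] = add(M[i][j], {factor})     # no conflict: add the link
    return True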
6 Discussion and Conclusion
To maintain semantic consistency, the future Semantic Web should establish an authority certification mechanism, just as UDDI does for current Web services. The future Semantic Web would consist of three layers: at the bottom, the hyperlink network; in the middle, the uncertified SLN, which can provide useful reference information but does not guarantee correctness; and at the top, the certified SLN, which can provide provable information and explanations. The SLN is a promising model for the Semantic Web. This paper enriches the semantic relations and reasoning rules of our previous SLN model, presents a semantic matrix model for SLN based on the addition and multiplication operations, and proposes the corresponding theory for SLN reasoning and management.
References
1. T. Berners-Lee, J. Hendler, and O. Lassila, The Semantic Web, Scientific American, vol. 284, no. 5, 2001, pp. 34-43.
2. I. Foster, C. Kesselman, J. M. Nick, and S. Tuecke, Grid Services for Distributed System Integration, Computer, June 2002, pp. 37-46.
3. J. Hendler, Agents and the Semantic Web, IEEE Intelligent Systems, vol. 16, no. 2, 2001, pp. 30-37.
4. M. Klein, XML, RDF, and Relatives, IEEE Internet Computing, vol. 5, no. 2, 2001, pp. 26-28.
5. H. Zhuge, Y. Sun, and W. Guo, Theory and Algorithm for Rule Base Refinement, in Proc. of the 16th International Conference on IEA/AIE, Springer LNAI 2718, 2003, pp. 187-196.
6. H. Zhuge, Active Document Framework ADF: Model and Tool, Information and Management, vol. 41, no. 1, 2003, pp. 87-97.
7. H. Zhuge, Clustering Soft-Devices in Semantic Grid, IEEE Computing in Science and Engineering, Nov./Dec. 2002, pp. 60-63.
A Semantic Web Enabled Mediator for Web Service Invocation Lejun Zhu, Peng Ding, and Huanye Sheng Department of Computer Science & Engineering, Shanghai Jiaotong University, 1954 Huashan Rd., Shanghai, China, 200030 [email protected] [email protected] [email protected]
Abstract. The increasing number of web services in the global Grid environment brings various interoperability problems to service providers and client application developers. Given machine-understandable descriptions, autonomous agents can adapt client applications to heterogeneous services with little human interference. This paper introduces a multiagent scenario on the Semantic Web that helps client applications invoke heterogeneous services without reprogramming, and presents an example of this scenario.
1 Introduction
Web services are the basis of modern Grid architecture [1]. Ideally, programs could automatically discover and use services with a given function. In the real world, however, programs are usually bound to the "stub" code generated for a specific service interface: whenever a service interface is created or changed, existing application programs cannot use it until a human programmer rewrites them for the new interface. The emerging Semantic Web aims to provide well-defined information for machines to understand [2]. With a given upper ontology, autonomous agents can read semantic service descriptions and figure out how to use these services and what is needed to use them. This makes it possible for programs to discover and employ such services with little or no help from human programmers and end users.
2 Semantic Web and Mediator in Web Service Invocation
We designed our scenario to support programs and services along with their semantic descriptions. An agent between applications and services can coordinate the interactions between them, i.e., act as a mediator [3] between requestors and services. When an application uses some web service in its process, it becomes a consumer of services. The programmer expresses the service requirement as a semantic service request instead of retrieving a WSDL description of a concrete service. At runtime, the agent receives the request, executes the query it contains, and selects a proper service either
automatically or interactively. A dataflow between the inputs/outputs of the application and those of the service is then constructed; this can be done with a heuristic algorithm or through user interaction. Finally, the agent invokes the service with the parameters given by the consumer program, and the outputs of the service are dispatched to the consumer program where they are expected.
Fig. 1. Agent Coordinated Service Invocation
Figure 1 shows the whole scenario. There are three major participants: the service, the application, and the agent. The service provides its semantic description and an interface for SOAP messages; the application contains the service request and API calls to the agent; and the agent works as a mediator between applications and services.
2.1 Semantic Service Description To make use of services on the Web, the agent needs computer-interpretable descriptions of these services. It is possible to compose these descriptions with a common upper ontology such as DAML-S [4]. With DAML-S, agents will be able to accomplish tasks such as service discovery, invocation, composition and monitoring.
2.2 Service Request
The service request is sent to the middle agent to specify what applications need and what they can do. The agent accepts such requests in DAML+OIL [5] format. Applications express a required function by giving a semantic query over the semantic descriptions of services; their inputs and outputs are also included. The organizing concept of our request ontology is the class ServiceRequest, on which the properties requestName, query, inputs, and outputs are defined. Property requestName is the human-readable title of the request, representing the function the application requires. Property query has a subproperty rdqlQuery, which is used in our prototype. Properties inputs and outputs are collections whose items are instances of the class Parameter, which has the properties parameterName and dataType.
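The request ontology maps naturally onto a small data structure. The Python sketch below mirrors the named classes and properties; the example values and the RDQL query string are illustrative assumptions, not taken from the prototype:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Parameter:                  # class Parameter of the request ontology
    parameterName: str
    dataType: str                 # e.g. an XML Schema datatype name

@dataclass
class ServiceRequest:             # class ServiceRequest
    requestName: str              # human-readable title of the request
    rdqlQuery: str                # subproperty of 'query' used in the prototype
    inputs: List[Parameter] = field(default_factory=list)
    outputs: List[Parameter] = field(default_factory=list)

# A hypothetical request for a data source service:
req = ServiceRequest(
    requestName="Find a stock-quote data source",
    rdqlQuery="SELECT ?s WHERE (?s, <rdf:type>, <ex:DataSourceService>)",
    inputs=[Parameter("tickerSymbol", "xsd:string")],
    outputs=[Parameter("quote", "xsd:decimal")],
)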
2.3 Agent as Mediator
The agent isolates from applications the functions of accessing information on the Semantic Web, executing queries, interacting with users, and exchanging SOAP messages. It finds a proper service using the query from the application, dispatches parameters from the application to the service, and then dispatches the return values back to the application. The agent can use algorithms, inference rules, or user interaction to construct a proper dataflow between applications and services: every input of the service and every return value required by the application must be bound. Previous invocation histories and user-specified axioms can be used to generate the binding solutions.
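One plausible heuristic for the dataflow construction step, sketched in Python using the Parameter structure above: match each service input to an application output by name and type, fall back to type alone, and hand anything unresolved to the user. This illustrates the idea and is not the agent's actual algorithm:

def bind_dataflow(app_outputs, service_inputs):
    """Greedy name/type matching; unresolved inputs go to user interaction."""
    bindings, unresolved = {}, []
    for p in service_inputs:
        exact = next((a for a in app_outputs
                      if a.dataType == p.dataType
                      and a.parameterName.lower() == p.parameterName.lower()),
                     None)
        match = exact or next((a for a in app_outputs
                               if a.dataType == p.dataType), None)
        if match:
            bindings[p.parameterName] = match.parameterName
        else:
            unresolved.append(p)   # ask the user to confirm or adjust links
    return bindings, unresolved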
3 Implementation of Prototype
We have implemented a prototype of the mediator agent in our scenario, which can read semantic service descriptions written in DAML-S and service requests as explained in Section 2.2. Figure 2 shows an example invocation procedure. The application (1) asks the agent to find a data source service. The agent finds three such services (2), and the actual service to be used is selected by the user. The agent then searches for and infers the I/O relationships between the service and the application; the user may confirm or adjust the links (3), and unused data is discarded. After that, the agent invokes the service with data from the application. The application receives the results from the agent, performs some domain-specific calculation, and shows the result (4).
Fig. 2. Example of Service Invocation Mediation
4 Related Works
XSRL [6] proposes a request language for querying services in UDDI repositories. It takes an XML-based approach and uses EaGLe as its query language; however, it lacks a way to express applications' inputs and outputs, which makes it unsuitable for our scenario. WSMF [7] uses mediators to solve conflicts, especially invocation-sequence mismatches in service composition. Our scenario emphasizes agent-human collaboration as well as computational approaches for conflicts that agents cannot solve.
5 Conclusion
In this paper we introduced a Semantic Web enabled scenario to mediate web service invocation in WWW and Grid environments. The main components involved are web services, application programs, and the mediator agent. Services provide semantic service descriptions; applications provide semantic service requests for service invocations; and the mediator agent reads both and performs the invocation. Our future work will focus on agent collaboration for distributed interaction and composition of web services. Other possible directions include better planning algorithms for dataflow construction and automatic service rating based on previous invocations.
Acknowledgement. The authors appreciate the valuable advice from Professor Toru Ishida of Kyoto University and the members of the Ishida Lab during the conception of this paper.
References
1. Roure, D., Jennings, N., Shadbolt, N.: The Semantic Grid: A Future e-Science Infrastructure. In: Grid Computing: Making the Global Infrastructure a Reality, pp. 437-470, 2003
2. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American, May 2001
3. Hayden, S., Carrick, C., Yang, Q.: Architectural Design Patterns for Multiagent Coordination. International Conference on Agent Systems '99, May 1999
4. The DAML Services Coalition: DAML-S: Semantic Markup for Web Services. http://www.daml.org/services/daml-s/0.7/daml-s.html
5. McGuinness, D., et al.: DAML+OIL: An Ontology Language for the Semantic Web. IEEE Intelligent Systems, Sept./Oct. 2002
6. Papazoglou, M., et al.: XSRL: A Request Language for Web Services. http://www.webservices.org/index.php/article/articleview/990/1/24/
7. Fensel, D., Bussler, C.: The Web Service Modeling Framework WSMF. http://informatik.uibk.ac.at/users/c70385/wese/wsmf.paper.pdf
A Data Mining Algorithm Based on Grid
Xue-bai Zang 1,2, Xiong-fei Li 1,2, Kun Zhao 1, and Xin Guan
1 College of Computing Science and Technology, Jilin University, Changchun 130025, China
2 State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210016, China
[email protected]
Abstract. In this paper, a data mining algorithm that can work on the grid is presented. First, we built an experimental grid platform and completed the design and debugging of its infrastructure. We then developed data mining software that runs on this platform; load balancing and the generation of large itemsets using multi-segment support are discussed in our algorithm. The experimental results indicate that grid-based data mining has a clear computational advantage over a uniprocessor, and that the grid platform we have built is capable of supporting practical data mining development.
1 Introduction
Grid computing is a newly developed Internet-based technology. It combines computing resources, storage resources, and network resources spread across different places into one virtual supercomputer through a shared network, and can provide powerful computing capability [1,2]. Users can enjoy integrated, dynamic, flexibly controllable, and intelligent cooperative services on the Grid [3]. A grid computing system commonly consists of grid hardware, a grid operating system, a grid interface, grid applications, etc. Its most prominent characteristics are resource sharing, cooperation, and open standards. The "resources" of the Grid include the processors managed by the Grid Resource Management Service (GRMS), storage entities, network resources, etc.; these resources are typically heterogeneous and distributed [4]. We have studied and established a grid platform, developed data mining software on the grid, and compared its performance experimentally with that of equivalent uniprocessor software. Finally, the experimental results are presented.
2 Building the Infrastructure of the Grid Platform
The grid is regarded as the next-generation network. Considering both general grid research and the practical requirements of the automobile dynamic simulator (another grid service in our
university), we have established the grid infrastructure shown in Fig. 1. In this infrastructure, we set up two subnets, connected through the X.25 protocol and linked into the campus network of Jilin University. They contain computing and storage resources of different classes, such as PII/266, PIII/550, and PIV/2.4G machines, a PIV/2.0 machine with two CPUs, a two-server cluster (4 CPUs per server), a disk array, etc. Windows or Linux is used as the operating system, which allows us to simulate heterogeneous wide-area environments. The equipment shown in Fig. 1 belongs to the same virtual organization (grid.jlu.edu.cn).
Fig. 1. The infrastructure of the grid platform
3 The Structural Design and Configuration of the Grid
This grid platform is named the Jilin University Grid (grid.jlu.edu.cn); its fully qualified domain names include ca.grid.jlu.edu.cn, alpha.grid.jlu.edu.cn, bate.grid.jlu.edu.cn, etc., among which ca.grid.jlu.edu.cn is the certification center and bate.grid.jlu.edu.cn is the NTP time server. The PII/266 machine is used only as a client; every other machine is both a server and a client. Before installing GT3 alpha 2, we need to install JDK 1.3.1 and Jakarta Ant 1.5. GT3 alpha 2 ships with its own service container and can also use Tomcat as an application server. We create an account named ogsa for installing and using OGSA. The Globus Toolkit installation package is saved in /home/ogsa. At the same time, the installation files (.tar.gz) of tools such as JDK 1.3.1, Jakarta Ant 1.5.1, Jakarta ORO, and Jakarta Tomcat are saved in the directory /usr, so when we run tar -xvzf in this directory, the corresponding folders are unpacked into /usr. After the service container starts, we can execute the grid service client program to call the grid services in the container.
OGSA is service-oriented, and all services inherit from one general base class, ServiceSkeleton, which implements the core functionality that all grid services provide. The class PersistentServiceSkeleton implements the common behavior of all persistent services, which are not created dynamically by a factory but join the skeleton statically through the hosting environment. Every grid service includes one ServiceDataSet, which contains all service data elements; these can be obtained by findServiceData (pull) or by subscription (push). To create a service or service framework that can actually be executed, we inherit from ServiceSkeleton and implement the WSDL portType interface. A factory can provide a service by inheriting FactoryServiceSkeleton; its most important method is CreateServiceObject, through which a new service is created. Grid service clients can be written on the JAX-RPC client APIs. We provide some application classes to simplify the resolution from GSH to GSR and the automatic detection of GSRs. To extend the JAX-RPC stubs, we provide a user stub generator so that the utility program integrates seamlessly into the client-side programming pattern. Typically, a client will get a handle to a service instance through a registry or by other means. This handle is given to a ServiceLocator, which creates a proxy or stub that takes charge of establishing calls according to the network protocol binding defined in the service.
4 A Data Mining Algorithm Working on the Grid
There are two methods for finding association rules in parallel. The first shares one hash tree among the processors but partitions the database into slices assigned to the processors; every processor scans its local database slice to compute the support of the candidate itemsets in the global hash tree. In this case, the count field of each candidate itemset is a shared variable, and a lock mechanism must be applied to it. The database is partitioned evenly into P parts for P processors. In the second method, every processor has the global database, but the hash tree is partitioned into slices assigned to the processors; every processor scans the whole database to compute the support of the candidate itemsets in its local hash tree. Accordingly, we establish two feasible parallel schemes:
1. Common candidate itemsets, partitioned dataset (CCPD)
2. Partitioned candidate itemsets, common dataset (PCCD)
The client first creates a GSI proxy, then submits the job to the server and runs the GRAM watch process on the server side. When it detects that all tasks have been submitted, the server uses the SSL protocol to verify the client's identity and, after verification passes, begins to search for files locally according to the contents of the RSL file received from the client. We now give a parallel algorithm for discovering large itemsets. When counting the support of candidate itemsets whose size is greater than k, transactions of size k are useless. So, while passing over the data to determine the set of large k-itemsets, the algorithm deletes the size-k
transactions from the data in order to reduce its size. Let L_k be the set of large k-itemsets, which serves as the seed set; from it the algorithm generates C_(k+1), the set of potentially large (k+1)-itemsets, i.e., the candidate set. The size of C_(k+1) directly affects the efficiency of the algorithm, and using support vectors during counting can reduce it. In our data mining algorithm, each itemset has a support vector. For a k-itemset I, its support vector is V(I) = <v_k, v_(k+1), ..., v_q>. For k <= i < q, v_i is the number of times itemset I appears in transactions of size i; for i = q, v_q is the number of times I appears in transactions whose size is not less than q. The support vector of a k-itemset I thus starts counting from transactions of size k, and the sum of the sections of the vector is taken as the support of I. Based on this method, we have implemented the parallel program for multi-segment support [5].
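A Python sketch of multi-segment support counting under this definition; candidate k-itemsets are represented as sorted tuples, and the slot layout (one slot per exact size from k to q − 1, plus one slot for sizes >= q) is our reading of the definition above:

from itertools import combinations

def support_vectors(transactions, candidates, k, q):
    """For each candidate k-itemset I, build <v_k, ..., v_q> and its sum."""
    vec = {I: [0] * (q - k + 1) for I in candidates}
    for t in transactions:
        size = len(t)
        if size < k:
            continue                      # too small to hold any k-itemset
        slot = min(size, q) - k           # sizes >= q share the last slot
        for I in combinations(sorted(set(t)), k):
            if I in vec:
                vec[I][slot] += 1
    return {I: (v, sum(v)) for I, v in vec.items()}   # vector and support

The pruning described above then amounts to discarding the size-k transactions once the pass for the large k-itemsets is finished.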
5 Conclusion
From the grid platform we have set up and the experiments with grid-based data mining software, we draw the following conclusions:
1. An ordinary program needs considerable restructuring to run on the grid.
2. We can create processes on other computers and execute tasks across the grid.
3. The performance of the grid with four CPUs is lower than four times that of a one-CPU computer.
The resource-integrating capability of the grid can be used for the automobile dynamic simulator. However, when a process terminates abnormally, it leaves residue on the computing resource.
References
1. I. Foster and C. Kesselman (Eds.), The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann, 1998.
2. I. Foster, C. Kesselman, J. Nick, S. Tuecke, The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration, January 2002. http://www.Globus.org/research/papers/ogsa.pdf
3. K. Czajkowski, I. Foster, C. Kesselman, N. Karonis, S. Martin, W. Smith, and S. Tuecke, A Resource Management Architecture for Metacomputing Systems, in Proc. IPPS/SPDP'98 Workshop on Job Scheduling Strategies for Parallel Processing, Springer-Verlag LNCS 1459, 1998, pp. 62-82.
4. I. Foster, C. Kesselman, S. Tuecke, The Anatomy of the Grid: Enabling Scalable Virtual Organizations, International J. Supercomputer Applications, 2001, Vol. 15, No. 3.
5. Li Xiong-Fei, A Data Mining Algorithm Based on Calculating Multi-Segment Support, Chinese J. Computer, 2001, 6, 661-665.
Prototype a Knowledge Discovery Infrastructure by Implementing Relational Grid Monitoring Architecture (R-GMA) on European Data Grid (EDG)
Frank Wang 1, Na Helian 1, Yike Guo 2, Steve Thompson 3, and John Gordon 4
1 Department of Computing, London Metropolitan University, London N7 8DB, United Kingdom
[email protected]
2 Department of Computing, Imperial College, London SW7 2BZ, United Kingdom
3 Xyratex, Hampshire PO9 1SA, United Kingdom
4 CCLRC Rutherford Appleton Laboratory, Oxfordshire OX11 0QX, United Kingdom
Abstract. This paper describes the implementation of a ScanOnce algorithm in SQL for quick association rule mining and the development of a data mining infrastructure, JetGrid. The architecture of JetGrid is designed to be compatible with lower-level grid mechanisms, since it operates on top of the Relational Grid Monitoring Architecture (R-GMA) provided by the European Data Grid (EDG). JetGrid for quick knowledge discovery was preliminarily prototyped; it is extensible, using an object-oriented design coded in C++. Mining agents will be staged to one or more computing elements on the EDG and will mine data delivered using GridFTP and/or the Globus transport mechanism. Some initial experimental results are presented, which give some indication of the performance of the JetGrid data mining infrastructure on top of the EDG.
1 Introduction
The Grid is a natural platform for deploying a high-performance data mining service [Hinke, 2000] [Orlando, 2002] [Cannataro and Talia, 2003]. The Grid has also developed standards-based infrastructures and services relevant to data mining; these new standards-based services and platforms have the potential to change the way data mining is used. In this paper, an integration of a new ScanOnce algorithm for quick association rule mining in SQL with the European Data Grid (EDG) has been attempted, which enriches the National e-Science Programme launched in 2001. The EDG is a project funded by the European Union that aims to enable access to geographically distributed computing power and storage facilities belonging to different institutions [http://eu-datagrid.web.cern.ch/eu-datagrid/]. A grid
infrastructure named JetGrid for quick knowledge discovery was prototyped. The architecture of JetGrid is designed to be compatible with lower-level grid mechanisms, since it operates on top of the Relational Grid Monitoring Architecture (R-GMA) provided by the EDG. The JetGrid data mining infrastructure is extensible, using an object-oriented design coded in C++. For the scientist in an e-Science scenario, the vision that is now becoming reality is as follows: 1. The user submits a request through a Graphical User Interface (GUI), specifying only high-level requirements (the kind of application to use, the operating system, ...) and optionally providing input data; 2. The Grid finds and allocates suitable resources (computing systems, storage facilities, ...) to satisfy the user's request; 3. The Grid monitors request processing; 4. The Grid notifies the user when the results are available and presents them. On JetGrid, the user must specify what is to be mined, how it is to be mined, and where it is to be mined. Under our approach, the user specifies what is to be mined by storing the names and grid locations of a set of data in a database associated with the EDG miner; these are communicated to a mining agent over the Grid. The user specifies how the data is to be mined by providing a mining plan that lists the sequence of mining operations to be applied to the data, together with any required parameters. The user specifies where the mining is to take place by naming the EDG computing elements (CEs) on which the mining agent is to be staged. With these requirements specified, the user invokes the miner, which sends mining agents to the designated EDG computing elements. On these computing elements, each agent acquires the data to be mined, mines it, and sends the results back to the user.
2 Data Mining Architecture
We are currently implementing the ScanOnce algorithm in SQL and deploying it on top of the EDG using Globus middleware services. The architecture of JetGrid is designed to be compatible with lower-level grid mechanisms, and the JetGrid data mining system is extensible, using an object-oriented design coded in C++. The EDG provides a monitoring and information management service for distributed resources that exposes a relational model with SQL support, providing static as well as dynamic information about Grid resources for use in application monitoring. R-GMA is a relational implementation of the Grid Monitoring Architecture (GMA). The relational model of R-GMA is very flexible and allows complex queries that make use of information in multiple objects. R-GMA makes information from Producers available to Consumers as relations (tables). The R-GMA implementation uses HTTP servlet technology; communication with the servlets is achieved via an API, which has been implemented in Java and C++, with C and other languages to be provided soon. The response from a servlet is an XML document that corresponds to an XML Schema definition. It is highly dynamic. For our infrastructure on top of R-GMA,
we consider a virtual organization (VO)'s information to be organized logically as one huge relational database, although the implementation is based on a number of loosely coupled components. The huge database is partitioned, with the description of the partitioning held in the Registry server [http://eu-datagrid.web.cern.ch/eu-datagrid]. To begin the mining operation, a mining agent is initially staged to an EDG computing element. It is envisioned that these agents may acquire mining operations from multiple sites on the EDG; some will be acquired from public repository sites that contain a standard set of mining operations. It is hoped that once the mining system is fully operational, mining users will contribute new operations to this repository [Hinke et al, 2000]. The agent also acquires the data, which can come from EDG-based repositories as well as various data repositories that provide FTP access to their holdings. By using data delivery, the storage requirements at the target mining site are minimized. Applying the ScanOnce algorithm [Wang et al, 2002] [Wang et al, 2003] implemented in SQL over the EDG to mine global association rules does not require shipping all local data to one site, and thus avoids excessive network communication costs. In this algorithm the contribution of each transaction is taken into account comprehensively by growing a prefix tree for each local transaction and enumerating all subsets of the transaction itemset. There is no need to store and re-scan previously scanned transactions, which are discarded after a single pass.
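In the spirit of that single-pass scheme, the sketch below counts every subset of each transaction as the transaction is seen, then discards it. The actual ScanOnce algorithm grows a prefix tree per transaction and is implemented in SQL; the flat Python dictionary here merely stands in for that structure:

from itertools import combinations
from collections import defaultdict

def single_pass_counts(transactions, max_len=3):
    """One pass over the data: enumerate and count the subsets (up to
    max_len items) of each transaction; never revisit a transaction."""
    counts = defaultdict(int)
    for t in transactions:
        items = sorted(set(t))
        for r in range(1, min(max_len, len(items)) + 1):
            for subset in combinations(items, r):
                counts[subset] += 1
    return counts   # an itemset is frequent if counts[I] >= minsup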
3 Preliminary Experiments
We performed a very simple experiment on a synthetic database in which the average transaction size and the average size of maximal potentially frequent itemsets are configurable. Items are drawn from a universe of I = 10K unique items; item IDs are 4-byte integers. The number of transactions ranges from 200,000 to 5 million, occupying up to 200 MB. For this experiment, the data mining system used data delivery to acquire the data, similar to [Hinke, et al, 2000]: the data was acquired from a remote host using both FTP and Globus transport mechanisms. As a reference point, the EDG miner was also used to mine the same data stored locally. The Linux csh time command was used to time the runs, with wall-clock times reported in the following table. These are preliminary results, since they represent only one run per experiment; nevertheless, they give some indication of the performance of the JetGrid data mining infrastructure.

    Source of Data                                        Time to Acquire and Mine Data
    Using Globus transport mechanism from remote host     25 minutes 13 seconds
    Using GridFTP from remote host                        24 minutes 49 seconds
    Local host                                            15 minutes 21 seconds

We have reported our experimental results on a synthetic database and are currently running our query over two other datasets. The first is a collection of Web pages crawled by WebBase, a web crawler developed at Stanford University
[Hirai, 2000]. Words in each document are identified, and common stop-words [Salton, 1988] are removed. The resulting input file is 54 MB. The second dataset is the well-known Reuters newswire dataset, containing 806,000 news articles [http://www.reuters.co.uk/]. The input file resulting from this dataset after stop-word removal is roughly 210 MB.
4 Conclusion and Work in Progress
This paper is a snapshot of an ongoing project, presenting a scenario for grid-based mining, an architecture for a grid-based miner, and some preliminary experimental results. JetGrid for quick knowledge discovery has been preliminarily prototyped, and the results above give some indication of its performance on top of the EDG. Intensive tests are being carried out using GridFTP and other Globus transport mechanisms, both for shipping raw data D across the grid to one node for processing (MD: Move Data model) [Chan, 1999] and for processing the data locally until a result R is obtained and shipping that result to one node for further processing (MR: Move Results model) [Sunderraman, 1998]. More test results will be reported at the Workshop.
References
Cannataro, M. and Talia, D., 2003. The Knowledge Grid. Communications of the ACM, Volume 46, Issue 1.
Foster, I. and Kesselman, C., 1997. Globus: A Metacomputing Infrastructure Toolkit. Intl. J. of Supercomputer Applications, 11(2): 115-128.
Hinke, T., Novotny, J., 2000. Data Mining on NASA's Information Power Grid. Proc. of the 9th IEEE International Symposium on High Performance Distributed Computing, Pittsburgh, August 1-4, 2000.
http://eu-datagrid.web.cern.ch/eu-datagrid
Manku, G., Motwani, R., 2002. Approximate Frequency Counts over Data Streams. Proceedings of the 28th VLDB Conference, Hong Kong, China, 2002.
Orlando, S., Palmerini, P., Perego, R. and Silvestri, F., 2002. Scheduling High Performance Data Mining Tasks on a Data Grid Environment. Proceedings of Euro-Par 2002.
Wang, F., 2002. A ScanOnce Algorithm for Large Database Mining. Proceedings of the 2002 International Conference on Data Mining, ISBN 1-85312-925-9, 25-27 September 2002, Bologna, Italy.
Wang, F. and Helian, N., 2003. Scanning Once a Large Database to Mine Association Rules by Growing a Prefix Tree for Each Transaction. The International Conference on Information and Knowledge Engineering (IKE'03), June 23-26, 2003, USA.
The Consistency Mechanism of Meta-data Management in Distributed Storage System Zhaofu Wang, Wensong Zhang, and Kun Deng
P&DP Lab, National University of Defense Technology, Changsha, 410073 {wangzhaofu, dengkun}@vip.sina.com [email protected]
Abstract. Data consistency is an important issue in maintaining data integrity across a sequence of write operations. Meta-data consistency management is a vital technology in implementing clustered storage systems, and the lock mechanism is popular in this area. In this paper, we discuss the consistency mechanism in our LCFS system. The traditional locking mechanism and a versioning-based consistency protocol are compared. ...
1 Introduction
The notion of data consistency is used in many senses. In this paper, it is restricted to the sequencing of write operations, which is essential for maintaining data integrity: the set of fragments involved in one read operation must correspond to the same update operation. In distributed storage systems, such consistency is hard to manage because of concurrent updates from distinct clients; without it, system operation may produce unpredictable results. For example, classical NFS servers had no consistency mechanism and could not preserve local single-copy semantics, while the new NFSv4 protocol is a stateful file system protocol that maintains file-access consistency. Typical distributed file systems such as NASD [1], Lustre, and GFS implement consistency at different levels. There has been much work on protocols for maintaining consistency across distributed storage systems. The popular method is the write-ownership protocol, widely known as the lock mechanism; the other two methods are the timestamp-based consistency protocol and the lease-based consistency protocol. We discuss only the lock protocol in this paper. In implementing our asymmetric clustered storage system, named LCFS, we need to hold a consistent view of the most recent updates across the storage nodes, and especially across the meta-data nodes. A lock-based consistency solution for clustered storage systems is presented in this paper, and the implementation of the mature protocols is also discussed.
Fig. 1. The architecture of the LCFS storage system
2 System Model
The structure of the distributed storage system LCFS is shown in Fig. 1. The system consists of three subsystems.
Clients: Applications running on clients share one file system, which is built up of filesets and provides a global namespace.
Metadata Control Systems: Managing the namespace, file system meta-data coherence, and cluster recovery, and coordinating storage management functions. With specialized metadata management, the system can maintain standard directory and file semantics with an affordable slowdown in system performance.
Storage Targets: Providing persistent storage for objects. Objects can represent files, stripes, or extents of files. Files are represented by container objects in the metadata cluster or constituent objects in storage targets. This includes handling large numbers of files ranging from bytes to terabytes in size, supporting very small and very large directories, and serving tens or hundreds of thousands of parallel accesses to different files in different directories, to different files in the same directory, and even to the same file.
One key aspect of the LCFS system is the integration of caching, cluster consistency, and recovery, which is vital to achieving the best performance from a client perspective.
3 Consistency Model
When the durations of several operations overlap, we call them concurrent operations. The duration of a read operation is measured from the start time at the issuing client to the end time at the same client; a read operation remains active until a result value is returned or the operation is canceled by its submitter. LCFS offers Unix semantics across multiple storage nodes. Its characteristics include:
1. Updates are immediately visible to all processes in the cluster.
2. Updates are atomic with respect to other processes reading and updating file system information.
Modern filesystems have implemented a variety of locking models:
1. Strict distributed filesystem semantics: Before a node is allowed to update an object, it waits until all other nodes have released their read locks on the object.
2. Callback distributed filesystem semantics: As soon as a node begins updating an object, other nodes are notified that they must re-acquire read locks. However, the update proceeds without waiting for processes that hold a read lock (AFS; Coda in connected mode).
3. Optimistic distributed filesystem semantics: Updates require a shared write lock on objects, i.e., one that can co-exist with read locks on the same object (InterMezzo; Coda in weakly connected mode).
4. Lack of distributed filesystem semantics: Multiple nodes can read and update objects (NFS).
The consistency semantics for which we strive is a variation of linearizability for meta-data updating. Our goal for this protocol is practical efficiency for survivable meta-data updating.
4 Implementation in LCFS
4.1 Traditional Lock Mechanism
Locking is the most commonly used mechanism for concurrency control. We adopt two locking algorithms: simple server locking and its caching variant, callback locking, similar to the locking protocol of the Lustre project [2]. The protocols have a two-phase property by which serializability can be guaranteed, and they are free from deadlock. To implement strict semantics for a Linux cluster filesystem, some obstacles must be overcome at the VFS level. As correct lookups depend on directory reading and inode information, a filesystem with strict semantics needs to acquire read locks during
lookup (just as the GFS system does). However, there is no interface in the VFS for holding read locks beyond the lookup. Consider the following scenario: node A issues a mkdir(d) operation, and another node B issues rmdir(d). If node B completes the remove operation after node A's lookup but before A's operation takes effect, node A's operation will fail to behave correctly, because directory d has been deleted. We use two paths to perform meta-data updates in LCFS. In order to make an update, a CFS node must perform a lookup on the data nodes involved and acquire certain locks. The traditional process of acquiring a lock and performing an update is shown in Fig. 2.
Fig. 2. The typical lock acquisition process
In cases of successive meta-data updates, it is considerably more efficient to let the host keep the update lock for a while (the lease time). In this scheme, the CFS delays the unlock message and caches the lock at the host. A host that has cached a lock is likely to access the same meta-data again in the near future, avoiding subsequent lock messages to the meta-data server. If a host requests a lock that is currently cached by another host, the meta-data server asks the lock owner (via the callback message) to relinquish it before granting the lock to the newly requesting host. The meta-data server, acting as the lock manager, performs this callback locking. The implementation is only a little more complicated, but callback locking can reduce lock server load and lock acquisition latency when locks are commonly reused by the same host multiple times before a lease expires.
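A toy sketch of this callback behavior in Python, with network messaging, leases, and failure handling omitted and a hypothetical interface:

class CallbackLockManager:
    """Meta-data server side of callback locking: a lock stays cached at
    its holder; a competing request triggers a callback to relinquish."""
    def __init__(self):
        self.owner = {}                   # object id -> host caching the lock

    def acquire(self, obj, host, callback):
        holder = self.owner.get(obj)
        if holder is not None and holder != host:
            callback(holder, obj)         # ask the caching host to give it up
        self.owner[obj] = host            # grant, and cache, the lock

    def release(self, obj, host):
        # With callback locking the unlock is delayed: an explicit release
        # happens only on a callback or when the lease expires.
        if self.owner.get(obj) == host:
            del self.owner[obj]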
4.2 Versioning Consistency Mechanism
Traditional locking undoubtedly limits the concurrency of the storage system. The Lustre system adopts a more concurrent protocol, executed in a client/server model based on intents; NFSv4 uses compound operations for batch processing. In our replication-based clustered MDS we adopt a new consistency mechanism using versioning. In this scenario, the meta-data is distributed and mirrored on all servers, and each read or write of meta-data may be directed to any server. Consistency is guaranteed by the following protocol; a correctness argument can be found in [1]. Meta-data servers are responsible for maintaining versions of meta-data items. Every write request creates a new version of the meta-data item at the meta-data server. Versions of a meta-data item are distinguished by logical timestamps (i.e., logical timestamps are unique per meta-data item).
Fig. 3. The write process of versioning protocol
The write operation is divided into two phases. In the first phase, a logical timestamp for the meta-data item is determined. In the second phase, write requests are issued to a set of meta-data servers until at least a write threshold of the requests have been acknowledged. The process of a write operation to meta-data item D is shown in Fig. 3. Partial writes, and write operations concurrent with the query, may contain timestamps later than that of the latest complete write. The client can consider the write complete once W meta-data servers have acknowledged the write requests.
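A minimal sketch of the two-phase write in Python, assuming a hypothetical server interface with get_ts/put; a real implementation would query only a quorum in the first phase and handle concurrent writers and failures:

def versioned_write(servers, item, value, W):
    """Phase 1: choose a logical timestamp later than any version seen.
    Phase 2: issue writes until the write threshold W has acknowledged."""
    ts = 1 + max(s.get_ts(item) for s in servers)       # phase 1
    acks = 0
    for s in servers:                                   # phase 2
        if s.put(item, ts, value):                      # store version (ts, value)
            acks += 1
            if acks >= W:
                return ts        # client may consider the write complete
    raise RuntimeError("write threshold W not reached")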
The read operation also has two phases. In the first phase, the client establishes an initial latest candidate write; in the second phase, it determines whether the latest candidate write is a complete write. The process of a read operation to meta-data item D is shown in Fig. 4.
Fig. 4. The read process of versioning protocol
There is a chance that the latest candidate write can be classified as neither complete nor partial. This may occur if some of the meta-data servers that have not responded hold copies of the latest candidate write. If the latest candidate write is a complete write, it is the latest complete write, and the read operation completes by returning its value. If the latest candidate write cannot be classified, the read operation aborts. If the latest candidate write is an identifiable partial write, it is discarded, and all meta-data servers hosting the discarded candidate are removed from the set of read responses. Most importantly, we believe the versioning consistency mechanism is more efficient than the traditional one in the clustered meta-data server scenario.
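The matching read-side classification, again as a sketch: responses are (timestamp, value) pairs from the servers that answered, and servers that stayed silent may still hold copies of a candidate, which is exactly what can make a candidate unclassifiable:

def versioned_read(responses, n_servers, W):
    """Classify the latest candidate write as complete, unclassifiable
    (abort), or an identifiable partial write (discard and continue)."""
    silent = n_servers - len(responses)
    candidates = list(responses)
    while candidates:
        ts, value = max(candidates, key=lambda c: c[0])  # latest candidate
        copies = sum(1 for c in candidates if c[0] == ts)
        if copies >= W:
            return value              # the latest complete write
        if copies + silent >= W:
            raise RuntimeError("candidate unclassifiable: abort the read")
        candidates = [c for c in candidates if c[0] != ts]  # partial: discard
    raise RuntimeError("no complete write among the responses")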
5 Summary
The consistency of distributed systems has been studied widely, but consistency in clustered storage systems has its own particular features. We implemented a locking protocol in our LCFS system. The locking mechanism preserves update correctness across the CFS hosts, but more optimization is needed for better performance. We also proposed a
versioning-based consistency model for clustered meta-data servers in this paper. More work is needed to implement this more efficient consistency protocol in our large-scale storage systems.
References
1. Garth R. Goodson, Jay J. Wylie, Gregory R. Ganger, Michael K. Reiter, Decentralized Storage Consistency via Versioning Servers, School of Computer Science, Carnegie Mellon University, CMU-CS-02-180, September 2002.
2. Peter J. Braam, The Lustre Storage Architecture, Cluster File Systems, Inc., [email protected], December 2002.
3. Brian Pawlowski, Spencer Shepler, Carl Beame, Brent Callaghan, Michael Eisler, David Noveck, David Robinson, Robert Thurlow, The NFS Version 4 Protocol. http://www.citi.umich.edu/projects/nfsv4, 2002.
4. Scott A. Brandt, Ethan L. Miller, Darrell D. E. Long, Lan Xue, Efficient Metadata Management in Large Distributed Storage Systems, in Proc. of the 20th IEEE / 11th NASA Goddard Conference on Mass Storage Systems and Technologies, San Diego, CA, April 2003.
5. Khalil Amiri, Garth A. Gibson, Richard Golding, Highly Concurrent Shared Storage, in Proceedings of the International Conference on Distributed Computing Systems, Taipei, April 2000.
6. Feng Zhou, Chao Jin, Yinghui Wu, Weimin Zheng, TODS: Cluster Object Storage Platform Designed for Scalable Services. [email protected]
7. Erik Riedel, Susan Spence, Alistair Veitch, When Local Becomes Global: An Application Study of Data Consistency in a Networked World, in Proceedings of the 20th IEEE Int'l Performance, Computing and Communications Conference (IPCCC 2001), April 2001.
Link-Contention-Aware Genetic Scheduling Using Task Duplication in Grid Environments*
Wensheng Yao, Xiao Xie, and Jinyuan You
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200030, China
[email protected]
* Supported by the Shanghai Science and Technology Development Foundation under grant No. 03DZ15027 and the NSF of China under grant No. 60173033.
Abstract. In this paper, we consider the problem of scheduling precedence-constrained tasks as well as communications in a grid environment where computers and links are heterogeneous and time-shared. We propose a novel genetic scheduling algorithm for grid computing. The new algorithm adopts a special chromosome encoding scheme in order to make better use of task duplication. Moreover, knowledge-based genetic operators are developed to improve the performance of the algorithm. We perform comparison studies in a simulated grid environment, and the experimental results show the effectiveness of the enhanced genetic scheduling algorithm.
Keywords: Genetic scheduling, task duplication, link contention
1 Introduction
The problem of grid scheduling has received considerable attention over the past few years. Our work focuses on matching and scheduling tasks and communications on given grid resources: according to information about an application program and the underlying system state, grid schedulers generate proper schedules. The varying performance of the available resources can be predicted by tools such as the Network Weather Service (NWS), and a long-term application-level prediction model can be used to predict the execution time of a task on resources. A program can be modeled as a directed acyclic graph (DAG); the scheduling problem is known to be NP-complete. Task scheduling considering link contention for heterogeneous systems remains a relatively unexplored research topic. So far, several heuristic algorithms considering link contention have been proposed, and a genetic algorithm (GA) based approach was also proposed for this problem. Nevertheless, an effective technique called task duplication was not considered in these algorithms, and most GA-based scheduling algorithms fail to make use of task duplication. In this paper, we propose a link-contention-aware GA-based scheduling algorithm using task duplication. A simulator was developed for evaluating the performance of
the algorithms, using the SimGrid toolkit. We compare the algorithms on a subset of the benchmarks proposed by Kwok and Ahmad. This paper is organized as follows. In Section 2, we define the scheduling problem. Section 3 describes the definition of main sequences in a task graph. We present the algorithms in Sections 4 and 5, respectively. Experimental results and comparisons are shown in Section 6, and concluding remarks are given in Section 7.
2 Background
In this paper, the underlying grid system U = (P, L) consists of a set of k heterogeneous processing elements (PEs) P = {p1, p2, …, pk}, which are connected via m heterogeneous links L = {l1, l2, …, lm}. Each processing element executes only one task at a time without interruption, and task preemption is not allowed. A message cannot be transmitted through a link while the link is occupied by another message. We also assume that a static routing table is available to the scheduling algorithms. A parallel program is modeled as a weighted DAG G = (V, E, C, M), where each node represents a task and each edge (u, v) represents the precedence relation that task u must be completed before task v can start. The execution time of v on p, denoted as C(v, p), and the communication time of (u, v) on l, denoted as M(u, v, l), can be provided by the prediction tools. The critical path (CP) is the longest path from an entry node to an exit node, i.e., the path for which the sum of computation cost and communication cost is maximal. The communication-to-computation ratio (CCR) of a parallel program is defined as its average communication cost divided by its average computation cost. A schedule of G, denoted by S(G), consists of a mapping of tasks and communications onto PEs and links, respectively, together with a start time for each task and each communication. Each task v is scheduled onto some PE p and assigned a start time ST(v, p); the finishing time of v on p, denoted by FT(v, p), is therefore FT(v, p) = ST(v, p) + C(v, p). The length, or makespan, of a schedule S is the maximal finishing time over all tasks, that is, makespan(S) = max over v in V of FT(v, p). Given a weighted DAG G = (V, E, C, M) and a grid system U = (P, L), the goal of scheduling is to find a schedule with minimal makespan.
3 Main Sequences in Task Graph
In order to capture the characteristics of task graphs, we give the following definition. Definition 1. Of all paths from an entry node to a node v, the one with the largest sum of computation costs is called the Main Sequence of v, denoted MS(v). Of all main sequences, the one with the largest sum of computation costs is called the DAG Main Sequence (DAGMS). Fig. 1 shows that it is possible to shorten the final schedule length by duplicating the nodes on the main sequence of the task graph.
Fig. 1. (a) a DAG, (b) an optimal schedule for the task graph, (c) a simple Grid.
Considering only the nodes on the DAGMS may not lead to optimal schedules. We find the useful main sequences in a task graph using the following algorithm.
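The algorithm referred to here appears as a figure in the original and is not reproduced in this text. A minimal sketch of the underlying computation, a longest-computation-path traversal of the DAG consistent with Definition 1, might look as follows in Python (all names are ours, and the paper's filtering down to "useful" main sequences is not shown):

    from collections import defaultdict

    def main_sequences(tasks, edges, cost):
        # tasks: list of node ids; edges: (u, v) precedence pairs;
        # cost[v]: computation cost of task v.  Communication costs are
        # ignored, since Definition 1 sums computation costs only.
        preds, succs = defaultdict(list), defaultdict(list)
        indeg = {v: 0 for v in tasks}
        for u, v in edges:
            preds[v].append(u)
            succs[u].append(v)
            indeg[v] += 1
        order = [v for v in tasks if indeg[v] == 0]   # entry nodes
        best, back = {}, {}                           # DP in topological order
        for v in order:                               # 'order' grows as we go
            if preds[v]:
                u = max(preds[v], key=lambda p: best[p])
                best[v], back[v] = cost[v] + best[u], u
            else:
                best[v], back[v] = cost[v], None
            for w in succs[v]:
                indeg[w] -= 1
                if indeg[w] == 0:
                    order.append(w)
        def ms(v):                                    # MS(v): walk back to an entry node
            path = [v]
            while back[path[-1]] is not None:
                path.append(back[path[-1]])
            return path[::-1]
        return {v: ms(v) for v in tasks}, max(tasks, key=lambda v: best[v])

The second returned value is the node whose main sequence is the DAGMS; the paper's algorithm would then select the useful main sequences from this set according to the criterion given in its original listing.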
For example, the useful main sequences in Fig. 1(a) can be calculated in this way.
If the first MS is mapped on P2 and the other MSs are mapped on P1, and the order is properly arranged, we obtain an optimal schedule as shown in Fig. 1(b).
4 The Starting Scheduling Algorithm
We begin with a simple genetic algorithm using a novel chromosome encoding scheme [3]. We call it the Pure Genetic Scheduling (PGS) algorithm.
A chromosome is defined as two strings {PE, priority}, whose lengths are equal to the number of tasks [3]. The decoding process of a chromosome determines exactly one schedule. To do so, we need to calculate the communication time of all possible messages on the links. For the different immediate successors of task v, we first consider the communication from v to the successor with the smaller pri value. If a successor has several duplications, we first consider the duplication whose allocated processing element has the earliest available time. Hence, different PE and pri strings may lead to different message schedules. The operations, such as initialization, selection, crossover and mutation, are applied in the same way as in [3].
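As an illustration of the encoding, the two strings and a simplified decoding step could be sketched as follows (this is our sketch, not the authors' implementation; a full decoder would also respect precedence constraints and schedule messages on links):

    import random

    # A chromosome for n tasks: pe[t] is the set of PEs task t runs on
    # (task duplication allows more than one), pri[t] is a priority used
    # to order tasks and messages during decoding.
    def random_chromosome(n_tasks, n_pes):
        pe = [{random.randrange(n_pes)} for _ in range(n_tasks)]
        pri = [random.random() for _ in range(n_tasks)]
        return pe, pri

    def tasks_per_pe(pe, pri):
        # Partial decode: group (possibly duplicated) tasks by PE and
        # order each group by priority.
        groups = {}
        for task, pes in enumerate(pe):
            for p in pes:
                groups.setdefault(p, []).append(task)
        for p in groups:
            groups[p].sort(key=lambda t: pri[t])
        return groups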
5 The Proposed Scheduling Algorithm
Based on the DAGMS and the other useful main sequences in a task graph, we propose the Main Sequences Genetic Scheduling (MSGS) algorithm.
5.1 MS-Based Initialization
The initial population is generated as follows. First, we sort all processing elements in descending order of their computing capabilities and link bandwidths. Then, for each useful main sequence yielded by the algorithm in Section 3, we allocate the first available processing element to all tasks on the MS; thus, all tasks on the same MS are executed on the same processing element. If no processing element is available, a processing element is randomly selected to accommodate the tasks. Second, for each task, the priority element value is generated in the same way as in the PGS algorithm. This process is repeated Pop_Size/2 times. To generate the other Pop_Size/2 individuals, we repeat the following process Pop_Size/2 times: first, for each useful MS, a processing element is randomly selected to accommodate the tasks on the MS; second, for each task, the priority element value is generated as described above.
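A rough sketch of this two-part initialization, reusing the illustrative encoding above (useful_ms and pes_sorted are assumed inputs, not names from the paper):

    import random

    def ms_initial_population(pop_size, useful_ms, n_tasks, pes_sorted):
        # pes_sorted: PE ids in descending order of capability/bandwidth.
        # First pop_size/2 individuals map each useful MS to the fastest
        # still-unused PE; the rest map each MS to a random PE.
        population = []
        for greedy in (True, False):
            for _ in range(pop_size // 2):
                pe = [set() for _ in range(n_tasks)]
                free = list(pes_sorted)
                for seq in useful_ms:
                    p = free.pop(0) if (greedy and free) else random.choice(pes_sorted)
                    for task in seq:
                        pe[task].add(p)            # whole MS on one PE
                for task in range(n_tasks):        # tasks on no MS: random PE
                    if not pe[task]:
                        pe[task].add(random.choice(pes_sorted))
                pri = [random.random() for _ in range(n_tasks)]
                population.append((pe, pri))
        return population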
5.2 MS-Based Crossover and Mutation
In the MSGS algorithm, we use two different crossover methods for the two strings: uniform crossover for the priority string and main-sequence-based crossover for the PE string. The crossover operator works as follows. First, chromosomes are selected for crossover. Second, for the two selected chromosomes, a node is randomly selected as a crossover candidate with a certain probability, and the alleles of its priority elements are swapped. Third, for the two selected chromosomes, all nodes on the main sequence of a randomly selected node are identified, and the alleles of the PE elements of these nodes are swapped. Finally, the two newly generated chromosomes replace their parents. The main-sequence knowledge is also integrated into the mutation operator as follows. For each chromosome, a task is randomly selected with the mutation probability. Then, for each selected task, one of the following operations is executed:
changing its priority value or changing its PE value. To change the priority value, the same process is executed as in the PGS algorithm, that is, a new value is generated as the priority of the gene. To change the PE value, the algorithm first checks whether the corresponding task is on the DAGMS. If it is, a processing element is randomly selected: if the task is not yet scheduled to that processing element, the algorithm adds the element to the PE element of the task; otherwise, it deletes the processing element from the PE element of the task. If the PE element is empty after deletion, a randomly selected processing element is assigned to the task. If the task is not on the DAGMS, the algorithm performs the same operation with a certain probability; in the remaining cases it generates a new PE element value to replace the old one. Finally, a new chromosome is generated to replace its parent.
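The crossover operator could be sketched roughly as follows (again our illustration of the description above, with the exact selection probabilities left as parameters; ms_of could be built from main_sequences() earlier):

    import random

    def ms_crossover(parent_a, parent_b, ms_of, p_swap=0.5):
        # ms_of[t]: the main sequence (list of nodes) of task t.
        (pe_a, pri_a), (pe_b, pri_b) = parent_a, parent_b
        n = len(pri_a)
        pe_a = [s.copy() for s in pe_a]
        pe_b = [s.copy() for s in pe_b]
        pri_a, pri_b = pri_a[:], pri_b[:]
        for t in range(n):                      # uniform crossover on priorities
            if random.random() < p_swap:
                pri_a[t], pri_b[t] = pri_b[t], pri_a[t]
        seed = random.randrange(n)              # MS-based crossover on PE string:
        for t in ms_of[seed]:                   # swap PE alleles along one whole MS
            pe_a[t], pe_b[t] = pe_b[t], pe_a[t]
        return (pe_a, pri_a), (pe_b, pri_b)

Swapping the PE alleles of an entire main sequence at once keeps the co-location of MS tasks intact across generations, which is the point of integrating the MS knowledge into the operator.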
6 Experiments
In this section, we present the comparison results and analysis of the different approaches. We implemented the two algorithms, PGS and MSGS, in the C language and performed simulation studies in a Linux environment (Intel P4, 2.0 GHz). For this purpose, we developed a simulator, SimTDB, using the SimGrid toolkit [12]. SimGrid provides trace-based simulation and performance prediction of resources. All results correspond to tests run with the benchmarks [13] on a 14-PE grid system. The varying performance of the resources is monitored and predicted by NWS [4]. Running times, including predictions, are given in seconds. We did not compare the algorithms with algorithms that do not consider task duplication, such as [7, 8, 9]. To produce the results in this section, the following parameters were adopted: Pop_Size = 30, Pc = 0.6, Max_G = 10000. We set a large maximum generation count so as to allow the algorithms to search for solutions exhaustively. Each test was performed five times to provide a fair comparison. Table 2 shows the characteristics of the graphs [13], given by the number of tasks, the number of edges, the total computation cost, the total communication cost, and the CCR.
6.1 Initial Population
In Table 3, we present the characteristics of the initial populations of PGS and MSGS with perfect predictions. It shows that our method produces better initial populations.
6.2 Final Results
To show the effectiveness of the proposed algorithm, we compare the two algorithms in terms of schedule length, virtual clock, and the running time needed to find the schedule. Table 4 shows the comparative results. In general, the knowledge of the useful main sequences of a task graph is helpful for finding a shorter schedule; the knowledge-based crossover and mutation allow the algorithm to find a better solution.
Clearly, MSGS produces the best results. The running times needed to find schedules are shorter when the CCR is small. In fact, the predictions occupy most of the running time, while the other parts of the algorithms take only about 15% of the total time. This good performance can be attributed to the use of knowledge of the useful main sequences in the initialization and the genetic operations. The multiple main sequences
in a task graph capture the characteristics of the task graph. Hence, initialization that integrates the knowledge of main sequences produces better solutions. Meanwhile, the knowledge-based crossover and mutation improve the quality of the solutions. Fig. 2 shows the convergence of the different approaches. MSGS starts with a better solution, improves the quality of the solution quickly, and then gradually refines the result in the later evolutionary process. PGS starts with a worse solution and often generates worse solutions because it adopts the simple crossover and mutation operators.
Fig. 2. Evolutionary processes. (a) Random task graph (R=1.0), (b) Gaussian task graph (R=1.0).
We also study the impact of different prediction errors on the performance of PGS and MSGS. Five situations are evaluated, with prediction errors of 0, 0.1, 0.2, 0.5 and 1, respectively. A prediction error of 0.2 means that the error follows a uniform distribution over [-0.2, 0.2]. Fig. 3(a) shows the relative errors of the virtual clock of the algorithms under different prediction errors. Fig. 3(b) shows the deviations from the virtual clock obtained with perfect prediction. It seems that MSGS is more sensitive to prediction errors than PGS, which can be explained by MSGS's stronger search ability. It can also be observed that MSGS has smaller relative errors and deviations when prediction errors are within 0.2.
Fig. 3. (a) Relative errors, (b) Deviation from the virtual clock with perfect prediction.
7 Conclusion
We have presented a genetic algorithm for the link-contention-aware scheduling problem. A novel chromosome encoding scheme is used to make better use of task duplication, and knowledge of the main sequences in a task graph is integrated into the algorithm. The proposed algorithm is compared with the PGS algorithm in terms of makespan, virtual clock and running time. The experimental results demonstrate the effectiveness of the MSGS algorithm.
References
[1] I. Foster and C. Kesselman, editors, The Grid: Blueprint for a New Computing Infrastructure, Morgan Kaufmann Publishers, Calif., 1998.
[2] E. Heymann, M. A. Senar, E. Luque and M. Livny, "Adaptive Scheduling for Master-Worker Applications on the Computational Grid," LNCS 1971, Springer-Verlag, Berlin, (2001) 214-227.
[3] W. Yao, B. Li and J. You, "Genetic Scheduling on Minimal Processing Elements in the Grid," LNCS 2557, Springer-Verlag, Berlin, (2002) 465-476.
[4] R. Wolski, N. T. Spring and J. Hayes, "The Network Weather Service: a distributed resource performance forecasting service for metacomputing," Journal of Future Generation Computing Systems, 15 (October 1999) 757-768.
[5] X. Sun and M. Wu, "Grid Harvest Service: A System for Long-Term, Application-Level Task Scheduling," Proceedings of the 17th IPDPS, (2003).
[6] J. D. Ullman, "NP-complete scheduling problems," Journal of Computer and System Sciences, 10 (1975) 384-393.
[7] Y. K. Kwok and I. Ahmad, "Link contention-constrained scheduling and mapping of tasks and messages to a network of heterogeneous processors," Cluster Computing, 3 (2000) 113-124.
[8] G. C. Sih and E. A. Lee, "A compile-time scheduling heuristic for interconnection-constrained heterogeneous processor architectures," IEEE Transactions on Parallel and Distributed Systems, 4(2) (1993) 75-87.
[9] L. Wang, H. J. Siegel, V. P. Roychowdhury and A. Maciejewski, "Task matching and scheduling in heterogeneous computing environments using a genetic-algorithm-based approach," JPDC, 47 (1997) 8-22.
[10] S. Ranaweera and D. P. Agrawal, "A Task Duplication Based Scheduling Algorithm for Heterogeneous Systems," Proceedings of the International Parallel and Distributed Processing Symposium, (2000) 445-450.
[11] J. Holland, Adaptation in Natural and Artificial Systems, MIT Press, Cambridge, 1992.
[12] H. Casanova, "Simgrid: a Toolkit for the Simulation of Application Scheduling," Proceedings of the First IEEE/ACM International Symposium on Cluster Computing and the Grid, (2001) 430-437.
[13] Y. K. Kwok and I. Ahmad, "Benchmarking and comparison of the task graph scheduling algorithms," JPDC, 59(3) (1999) 381-422.
An Adaptive Meta-scheduler for Data-Intensive Applications* Xuanhua Shi, Hai Jin, Weizhong Qiang, and Deqing Zou Huazhong University of Science and Technology, Wuhan, 430074, China {xhshi, hjin, wzqiang, deqingzou}@hust.edu.cn
Abstract. In data-intensive applications, such as high-energy physics and bioinformatics, we encounter applications involving numerous jobs that access and generate large datasets. Effectively scheduling such applications is challenging due to the need to consider both computational resources and data storage resources. In this paper, we describe an adaptive scheduling model that considers the availability of computational, storage and network resources. Based on this model we implement a scheduler used in our campus grid. The results achieved by our scheduler are analyzed by comparison with the Greedy algorithm that is widely used in computational grids and some data grids.
1 Introduction
A grid is a coordinated resource-sharing and problem-solving mechanism in dynamic, multi-institutional virtual organizations [1]. Grid computing is now moving in two directions: the computational grid, which focuses on reducing the execution time of applications that require a great number of processing cycles, and the data grid, which handles large-scale data management problems. Job scheduling has been widely discussed in the past, but most approaches treated data and storage as secondary concerns. In some applications the data scale is already measured in terabytes and will soon be in petabytes; e.g., high-energy physics experiments at CERN are expected to generate petabytes of scientific data by 2005 [5]. The Data Grid Project [2] is trying to solve such problems. However, scheduling on the data grid is still a recent grid-computing activity. Our focus here is on scheduling data-intensive applications. Scheduling such applications is challenging due to the need to address a variety of metrics and constraints (e.g., resource utilization, response time, global and local allocation policies) while dealing with multiple, potentially independent sources of jobs and a large number of storage, compute, and network resources [3]. The nature of such applications makes it important to consider the data scale and the data locations when determining the job schedule; scheduling algorithms that focus on maximizing processor utilization are unlikely to be efficient. In this paper, we present a scheduling model that takes both replicated data locations and processor utilization into account, using adaptive algorithms to meet real-time and fault-tolerance requirements [4]. * This paper is supported by National Science Foundation under grant 60125208 and 60273076.
The outline of this paper is as follows. In Section 2 we review related work in grid job scheduling and data placement. The adaptive scheduling model is discussed in Section 3. We apply the adaptive model to our campus grid, and the results are discussed in Section 4. In Section 5 our conclusions are presented.
2 Related Works
There have been a number of efforts in grid scheduling. The Condor project [14] develops a high-throughput computing system whose scheduling decisions aim to utilize idle resources; our meta-scheduler instead aims to improve the performance of the applications. The Resource Broker in the EDG Workload Management System [15] is derived from Condor and has the same limitation. The Condor team has since taken data location into account in Condor [6, 7]. Thain et al. [6] describe a system that links jobs and data by binding execution and storage sites into I/O communities that reflect physical reality, but they do not address policy issues [8]. Basney et al. [7] define a framework, execution domains, which has the same shortcoming. Nimrod-G [17] uses grid economies to implement its scheduling policies, but does not consider the impact of grid applications on system performance. The AppLeS project [16] schedules applications through their own application-level schedulers; the key to the AppLeS approach is that everything in the system is evaluated in terms of its impact on the application, so system performance depends heavily on how accurately the user forecasts the application's characteristics and the grid status. GrADS follows the AppLeS approach to scheduling. Ranganathan et al. [3] describe scheduling considerations in a data grid environment and present a scheduling framework that addresses both job and data scheduling; their simulation results show that job scheduling that considers data locations performs better. Park et al. [9] implement a data grid scheduler that considers both data location and processor cycles. Our thinking is similar to theirs, but they use a Greedy algorithm (shortest response time first), and Greedy scheduling has high resource cost [13] and other shortcomings. We use adaptive algorithms to schedule the jobs and the data.
3 Adaptive Scheduling Model
We propose here an adaptive scheduling model for data-intensive applications. We assume a data grid model similar to the one CERN presents [11]. Every site has different computational capabilities and data stores, and the input datasets are replicated among them.
3.1 Influence Factors
We distinguish between static and dynamic factors. The dynamic factors change over time, whereas the static ones are determined before the tasks are executed.
3.1.1 Static Factors
Static factors are assumed to be stable over time, and they are easy to measure compared to the dynamic factors. Application-Specific Factors: These concern the size of the data to be transferred, which has three aspects: the size of the input data, the size of the output data, and the size of the application code. Most data-intensive applications have large amounts of input data. The data may be replicated at the local site; if not, fetching the data has a great impact on performance. If the local site does not have the required data, the application can be staged to execute remotely, and the size of the application code then influences the performance of the grid. If a job executes at a remote site, the results must be collected at the local site, so the size of the output also influences performance. System Attributes of the Sites: These include three aspects: the computing capability, the storage capacity, and the physical attributes of the network, such as bandwidth and the physical distance between two endpoints. Since we are referring to static factors here, the current dynamic loads of these three aspects are not considered. 3.1.2 Dynamic Factors
These factors change over time and require a monitoring system; to achieve high performance, a forecasting system is also needed. There are two dynamic factors. Network Bandwidth: In the grid environment there is a high degree of uncertainty about the bandwidth available at a given time. Network bandwidth is one of the most important factors that influence performance. We use NWS [10] to measure and forecast the bandwidth, and GARA [12] to make reservations so as to meet real-time requirements. Resource Status: Resources include computing utilities and storage resources. The computing utilities comprise the load of the sites and the number of available nodes. Storage resources change over time, but the fluctuation is mild, so we consider only their static information.
3.2 Definitions
We first give some definitions. A task is defined as a tuple T = {S, Q, D}, where S is the starting time, Q = {J_1, J_2, ..., J_n} is the collection of jobs, and D is the deadline of the task. For each job J_i, i is its serial number, w_i is its standard computing time measured in Gflops, and d_i is the data the job is related to, including input and output datasets. The other parameters are defined as follows: RT_i is the response time of the user's job i; N_i is the number of available nodes at site i; S_in, S_out and S_code are the sizes of the input data, the output data and the application code, respectively; B_ij is the bandwidth of the WAN connection between sites i and j; T_proc(i) is the processing time at site i; T_trans(i, j) is the time to transfer from site i to site j; T_read is the time to read the input data; and T_write is the time to write the output data.
3.3 Computing Model
We use Little's law to compute the processing time and the file transfer time. The processing time of a job with standard computing time w at site i is

    T_proc(i) = (Q_i + w) / C_i    (1)

where Q_i is the length of the waiting queue at site i (the queued work, in Gflops) and C_i is the computing capability of site i.
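For instance, with illustrative numbers, if site i has Q_i = 100 Gflops of queued work, the job needs w = 50 Gflops, and C_i = 10 Gflops/s, then Eq. (1) gives T_proc(i) = (100 + 50) / 10 = 15 seconds.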
The time to read the input data is computed as

    T_read = S_in / B_ij    (2)

where i = j means that the input data are at the site the job is running on. The time to transfer the output data to the local site is computed as

    T_write = S_out / B_ij    (3)

where i = j means that the output data are at the local site. The time to stage the application code is computed analogously:

    T_trans(i, j) = S_code / B_ij    (4)

The response time for a task is given as

    RT = T_trans + T_read + T_proc + T_write    (5)

From Eqs. (1)-(5), the response time for task i is given as

    RT_i = S_code / B_ij + S_in / B_kj + (Q_j + w_i) / C_j + S_out / B_ji    (6)

where the user submits the job from site i, the job executes on site j, and the input data are on site k; i, j and k can be any sites in the grid, and they can be equal. If i = j, we can take B_ij to be infinite and the corresponding transfer time to be 0; the same assumption applies to the other file transfers. For different i, j, k, five different scenarios may occur: local data and local execution, local data and remote execution, remote data and local execution, remote data and execution at the same remote site, and remote data and execution at a different remote site.
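With the notation reconstructed above, the prediction of Eq. (6) translates directly into code; the following Python sketch (our illustration, not the paper's implementation) takes the transfer time to be zero when the two sites coincide, as stated:

    def response_time(i, j, k, w, s_code, s_in, s_out, Q, C, B):
        # Predicted response time (Eq. (6)) for a job submitted at site i,
        # executed at site j, with input data at site k.
        # Q[j]: queued work at j (Gflops); C[j]: capability (Gflops/s);
        # B[a][b]: bandwidth between sites a and b.
        def t(size, a, b):                 # transfer time; 0 if local (a == b)
            return 0.0 if a == b else size / B[a][b]
        return (t(s_code, i, j)            # stage the application code
                + t(s_in, k, j)            # fetch the input data
                + (Q[j] + w) / C[j]        # queueing + processing, Eq. (1)
                + t(s_out, j, i))          # return the output to site i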
3.4 Adaptive Scheduling Algorithm
We use a deadline-based scheduling method to schedule the tasks. First, D is checked in the task tuple. If D is null, the response time for that task is unimportant, and we dispatch the task to the servers of the grid environment. If the task has a QoS requirement, that is, D is not null, we use the following method. We set a target function Diff as

    Diff = T_target - RT_i    (7)

where T_target is the target time to finish the task; it can be calculated as

    T_target = Opt * T_left    (8)

where Opt is a parameter expressing the expected accuracy of the prediction, with a value between 0 and 1, and T_left is computed as

    T_left = D - now    (9)

where now is the current time. If Diff >= 0, the task can be finished before the deadline, and the scheduler chooses to submit the job to the site with the smallest Diff, including transferring the application code and the input data. If Diff < 0, the task cannot be finished before the deadline, and the scheduler uses resource reservation to handle this situation. Reservation covers computing and network resources. If the computing resources are reserved, the computing capability will change from time to time, as illustrated in Eq. (10). If the network resources are reserved, such as the network bandwidth between sites i and j, then B_ij also changes from time to time, as illustrated in Eq. (11). The scheduler then recomputes the target function to re-schedule the task.
For the accuracy of the prediction of the task response time, the scheduler considers other factors that influence the target function. First, the scheduler uses a corrected load value to modify the queue length Q_i used in Eq. (1), since the submission of jobs to site i increases the queue length of site i. The corrected queue length is computed as

    Q_i' = Q_i + pload * n_i    (12)

where n_i is the number of jobs newly submitted to site i and pload is a correction parameter; we use pload = 1 in this paper. Once the transfer has finished, the site can estimate whether it can complete the task by the required deadline. If not, the task is re-submitted to other sites. The conditions under which a task is re-submitted are

    now + RT_i > D and r < R_max    (13)

where r is the number of re-submissions and R_max is the maximum number of re-submissions allowed. As the applications here are data-intensive, the cost of re-submission is very high, so we keep R_max small in this paper.
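Putting the pieces together, the deadline-driven site selection described above might be sketched as follows (our reading of Eqs. (7)-(9); the reservation fallback for Eqs. (10)-(11) is only indicated in a comment):

    def select_site(deadline, sites, now, opt, predict_rt):
        # Pick the execution site with the smallest non-negative Diff.
        # predict_rt(j) returns RT_i for candidate site j, e.g. via
        # response_time() above.
        t_target = opt * (deadline - now)          # Eqs. (8) and (9)
        best_site, best_diff = None, None
        for j in sites:
            diff = t_target - predict_rt(j)        # Eq. (7)
            if diff >= 0 and (best_diff is None or diff < best_diff):
                best_site, best_diff = j, diff
        # best_site is None when Diff < 0 everywhere: the scheduler would
        # then reserve computing/network resources (Eqs. (10)-(11)) and
        # recompute the target function.
        return best_site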
4 Performance Evaluation
We evaluate the performance of the adaptive scheduling algorithm on our campus grid. Our campus grid has five sites, most of which are based on clusters. We have two
clusters at the Internet and Cluster Computing Center (ICCC): a 16-node cluster (each node with a Xeon 1 GHz, 40 GB HD and 512 MB memory) and a 2-node cluster (each node with an Intel IA-64 1 GHz, 73 GB HD and 6 GB memory); and one cluster at BlueGrid Co.: 16 nodes, each with a Xeon 1 GHz, 40 GB HD and 512 MB memory. The National Hydro Electric Energy Simulation Laboratory (NHEESL) has one cluster of 5 nodes, each with an Alpha 500 MHz, 376 MB memory and 20 GB HD. The High Performance Computing Center (HPCC) has two clusters (a 3-node cluster, each node with four PowerPC 375 MHz CPUs and 9 GB HD, and an 8-node cluster, each node with a Power604e 200 MHz, 2 GB HD and 128 MB memory) and a RAID with 500 GB. We test two applications on our campus grid: a gene sequence matching application, FASTA, which requires intensive analysis of a large database, and a video conversion application, which is both computing-intensive and data-intensive. The parameters are listed in Table 1. We run 100 FASTA jobs and 1000 video conversion jobs to test our adaptive scheduling algorithm.
We compare the Greedy algorithm with our adaptive scheduling algorithm, study the impact of different Opt parameters on deadline misses, and also study the impact of different numbers of replicas. Figure 1 shows the missing rate for the Greedy algorithm and for the adaptive algorithm with different values of Opt (denoted by Opt on the x axis). The missing rate is defined as the percentage of requests that missed their deadlines. From Fig. 1, we make the following observations: (1) a larger input data size leads to a higher missing rate, because the bandwidth of our campus grid is limited and changes over time; when the input dataset is larger, the latency of data transfer is larger, which leaves more computing resources idle (waiting for data); (2) the Greedy algorithm has almost the same missing rate as the adaptive algorithm with Opt = 0.6 or 0.7. Figure 2 shows the impact of the number of replicas on the missing rate. In these experiments, we choose Opt = 0.7. The number of replicas and the sites on which they are placed are as follows: (1, local), (2, ICCC XEON), (3, ICCC XEON, BlueGrid), (4, ICCC XEON, BlueGrid, HPCC), (5, ICCC XEON, BlueGrid, HPCC, NHEESL). As the number of replicas increases, the missing rate decreases; if a replica is on a more powerful site, it decreases even more.
Fig. 1. Missing rate for Greedy algorithm and the adaptive algorithm with different Opt value
Fig. 2. Missing rate decreases with the number of replica increases
Figure 3 shows the performance of our campus grid with the different scheduling algorithms. Again, we observe: (1) the performance of the Greedy algorithm varies greatly, because the Greedy algorithm always selects the best grid resources for the current job, which unbalances the load; other jobs cannot get the resources they need and have to wait, leaving more resources idle; (2) a larger input dataset decreases the Gflops delivered by the grid, for the same reason described for Figure 1.
Fig. 3. Performance comparison between Greedy algorithm and the adaptive algorithm
5 Conclusions and Future Works
In this paper, we proposed an adaptive scheduling algorithm for data-intensive applications that considers both computing capability and data locations. We compared our scheduling algorithm with the Greedy algorithm on two data-intensive applications and concluded that our adaptive scheduler performs better than the Greedy algorithm. Our experimental results also show that the size of the dataset has a great impact on the performance of the grid. Due to the limited network bandwidth in our campus grid, we did not test an application partitioned and running on multiple sites simultaneously; our meta-scheduler assigns job execution to one site. For future work, we plan to design a more
complex scheduling algorithm for job execution on multiple sites, and to study the performance of partitioning tasks into individual jobs and utilizing more computing resources, such as PCs.
References
[1] I. Foster, C. Kesselman, and S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", International J. Supercomputer Applications, 2001.
[2] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, and S. Tuecke, "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets", Journal of Network and Computer Applications, 2001.
[3] K. Ranganathan and I. Foster, "Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications", Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), Edinburgh, Scotland, July 2002.
[4] H. Jin, D. Zou, S. Wu, and H. Chen, "Grid Fault-Tolerant Architecture and Practice", Journal of Computer Science and Technology (JCST), No. 4, 2003.
[5] W. Hoschek, J. Jaen-Martinez, A. Samar, H. Stockinger, and K. Stockinger, "Data management in an international data grid project", Proceedings of the 1st IEEE/ACM International Workshop on Grid Computing (Grid'2000), Bangalore, India, Dec 2000.
[6] D. Thain, J. Bent, A. Arpaci-Dusseau, R. Arpaci-Dusseau, and M. Livny, "Gathering at the Well: Creating Communities for Grid I/O", Proceedings of Supercomputing 2001, Denver, Colorado, November 2001.
[7] J. Basney, M. Livny, and P. Mazzanti, "Utilizing Widely Distributed Computational Resources Efficiently with Execution Domains", Computer Physics Communications, 2000.
[8] F. Berman, "High Performance Schedulers", The Grid: Blueprint for a New Computing Infrastructure (I. Foster and C. Kesselman, editors), pp. 279-309, Morgan-Kaufmann, 1999.
[9] S.-M. Park and J.-H. Kim, "Chameleon: A Resource Scheduler in A Data Grid Environment", Proceedings of the 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, Tokyo, 2003.
[10] R. Wolski, N. Spring, and J. Hayes, "The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing", Journal of Future Generation Computing Systems, Vol. 15, October 1999.
[11] H. Stockinger, K. Stockinger, E. Schikuta, and I. Willers, "Towards a Cost Model for Distributed and Replicated Data Stores", Proceedings of the 9th Euromicro Workshop on Parallel and Distributed Processing (PDP 2001), Mantova, 2001.
[12] GARA, http://www-fp.mcs.anl.gov/qos/.
[13] A. Takefusa, H. Casanova, S. Matsuoka, and F. Berman, "A Study of Deadline Scheduling for Client-Server Systems on the Computational Grid", Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10), 2001.
[14] Condor, http://www.cs.wisc.edu/condor/.
[15] Data Grid Project WP1, "Definition of Architecture, Technical Plan and Evaluation Criteria for Scheduling, Resource Management, Security and Job Description", DataGrid document DataGrid-01-D1.2-0112-0-3, 14/09/2001.
[16] Application Level Scheduling (AppLeS), http://apples.ucsd.edu/.
[17] R. Buyya, D. Abramson, and J. Giddy, "Nimrod-G Resource Broker for Service-Oriented Grid Computing", IEEE Distributed Systems Online, 2(7), November 2001.
Dynamic Data Grid Replication Strategy Based on Internet Hierarchy* Sang-Min Park1, Jai-Hoon Kim1, Young-Bae Ko2, and Won-Sik Yoon2 1
Graduate School of Information and Communication Ajou University, South Korea {smpark, jaikim}@ajou.ac.kr 2 College of Information Technology Ajou University, South Korea {youngko, wsyoon}@ajou.ac.kr
Abstract. In a data grid, large quantities of data files are produced, and data replication is applied to reduce data access time. Efficiently utilizing grid resources becomes an important research issue, since the available resources in a grid are limited while large numbers of workloads and large data files are produced. Dynamic replication in a data grid aims to reduce data access time and to utilize network and storage resources efficiently. This paper proposes a novel dynamic replication strategy, called BHR, which reduces data access time by avoiding network congestion in a data grid network. With the BHR strategy, we can benefit from "network-level locality", meaning that a required file is located at a site that has broad bandwidth to the site of job execution. We evaluate the BHR strategy by implementing it in OptorSim, a data grid simulator initially developed by the European Data Grid Project. The simulation results show that the BHR strategy can outperform other optimization techniques in terms of data access time when a hierarchy of bandwidth appears in the Internet. BHR extends current site-level replica optimization study to the network level.
1 Introduction
A grid is a large-scale resource sharing and problem solving mechanism in virtual organizations [6]. Large numbers of computational and storage resources are linked globally to form a grid. In some scientific application areas, such as high-energy physics, bioinformatics, and earth observation, we encounter huge amounts of data, and the size of data is expected to reach terabyte or even petabyte scale in some applications [7]. Managing such huge amounts of data in a centralized manner is almost impossible due to extensively increased data access time. Data replication is a key technique to manage large *
This work is supported by a grant of the International Mobile Telecommunications 2000 R&D Project, Ministry of Information & Communication in South Korea, ITA Professorship for Visiting Faculty Positions in Korea (International Joint Research Project) by Ministry of Information & Communication in South Korea, and grant No. (R01-2003-000-10794-0) from the Basic Research Program of the Korea Science & Engineering Foundation.
data in a distributed manner: we can achieve better performance (access time) by replicating data in geographically distributed data stores. In a data grid, a user's jobs require access to a large number of files. If the required files are replicated at the site where the job is executed, the job can process data without any communication delay. However, if the required files are not at the site, they must be fetched from other sites, which usually takes a very long time, because the size of a single replica may reach gigabyte scale in some applications and the network bandwidth between sites is limited. As a result, job execution time becomes very long due to the delay of fetching replicas over the Internet. Dynamic replication is an optimization technique that aims to maximize the chances of data locality; in other words, a dynamic replica optimizer running at a site tries to hold files that are likely to be requested in the near future. As the file hit ratio increases, job execution time reduces significantly. Various dynamic replication strategies have been introduced so far [2, 3, 11]. In this paper, we propose a novel dynamic replication strategy called BHR (Bandwidth Hierarchy based Replication). The existing replication strategies try to maximize file locality to reduce data access time. However, grid sites may be able to hold only a small portion of the overall data, since a very large quantity of data is produced in a data grid and the storage space at a site is limited; therefore, the effect of this locality is limited to a certain degree. The BHR strategy takes benefit from another form of locality, called network-level locality: although the required file is not at the site performing the job, there will be no long delay in fetching the replica if the replica is located at a site having broad bandwidth to the site of job execution. In a data grid, some sites may be located within a region where sites are linked closely; for instance, a country can be regarded as such a network region. Network bandwidth between sites within a region will be broader than bandwidth between sites across regions. Thus, a hierarchy of network bandwidth may appear in the Internet. If the required file is located in the same region, less time is consumed fetching the file; in other words, the benefit of network-level locality can be exploited. The BHR strategy reduces data access time by maximizing this network-level locality.
2 Related Works
Dynamic replication is a long-term optimization technique that aims at reducing the average job execution time in a data grid. Since very large quantities of data files are deployed in a data grid, there is a limit to the number of files that can be stored at each site. If the SE (Storage Element) at a grid site is already filled with replicas, some of them must be deleted in order to store newly requested data. Ranganathan et al. present various traditional replication and caching strategies and evaluate them from the perspective of a data grid in [11]. They measure the access latency and bandwidth consumption of each strategy with a simulation tool, and their simulation results show that Cascading and Fast Spread perform best among the traditional strategies. Economy-based replication is proposed in [2, 3]. In the economic approach, a kind of auction protocol is used to select the best replica for a job and to trigger long-term
optimization (dynamic optimization) using file access patterns. The authors show the improvement over traditional replication techniques by performing simulations with OptorSim. OptorSim is a data grid simulation tool developed as part of the European Data Grid Project [8]. General data grid scenarios are modeled in OptorSim, and one can evaluate the various replication strategies implemented in it. The existing replication techniques mentioned above are based on file access patterns at each site. If a grid site requests some files more frequently than others, it is better for the site to hold these files for near-future usage. Even though this site-level locality can reduce data access time to some extent, a limitation remains: the performance gain from site-level locality materializes only when grid sites have enough space to store a large portion of the data and predictable file access patterns emerge. However, in many cases we cannot be sure that a single grid site will have enough space to store a large portion of the whole data, or that predictable file access patterns will occur. We find another key to performance improvement by broadening our view of locality to the network level.
3 Dynamic Replication Strategy Based on Bandwidth Hierarchy
In this section, we propose a novel dynamic replication strategy, called BHR, which is based on the bandwidth hierarchy of the Internet. Our BHR strategy takes benefit from the network-level locality of files. The idea of the proposed strategy is motivated by the assumption that a hierarchy of bandwidth appears in the Internet. Figure 1 illustrates this assumption.
Fig. 1. Bandwidth hierarchy in Internet
We assume that groups of sites are located in the same network region. A network region is a network topological space where sites are located close together; it can be seen as the Internet topology within a country. It is generally known that lower bandwidth is allocated to network links between sites across countries than to links between sites within a country. In many cases, a network region corresponds to a geographical space such as a country or a continent. If the required
replica is in the same region, the job can fetch the replica easily, since broader bandwidth is available within a region. In contrast, if the required replica is located at a site in another region, much time is consumed fetching this replica via many links, including highly congested ones. Thus, a form of locality emerges that we call network-level locality. The main purpose of the BHR strategy is to maximize this network-level locality within the job execution model of a data grid. BHR tries to replicate files that are likely to be used frequently within the region in the near future. The BHR optimizer runs both on a region and on a site, cooperating with each other. Figure 2 describes the BHR algorithm in detail.
Fig. 2. BHR replication algorithm
The access frequency gathered by the region optimizer is the number of file requests made by jobs run at the sites within a region; it reflects the regional popularity of files. If a job fetches a file from another site and the SE is already filled with replicas, we must determine whether storing the newly received file is beneficial. If it turns out to be profitable, we choose a file that should be deleted in order to store the new replica. We apply a two-step decision process. The first step is avoiding duplication: procedure 4 in Figure 2
locates as many different replicas as possible in the region without duplication. Secondly, we take account of the popularity of files, as represented by procedure 5. In a data grid, there can be popularity of file accesses; that is, certain files will be requested more frequently than others by grid jobs. While previous strategies consider popularity of files at the site level, we focus on access popularity at the region level: BHR replaces files that are unpopular from the regional point of view. By applying these two steps, the chance of hitting network-level locality can be maximized.
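Since Fig. 2 survives in this text only as a caption, the two-step replacement decision it describes can be sketched roughly as follows in Python (region_copies, region_freq and the exact benefit test are our assumptions, not names from the paper):

    def choose_victim(se_files, new_file, region_copies, region_freq):
        # BHR-style eviction sketch.  Returns the file to delete from the
        # local SE to make room for new_file, or None to reject new_file.
        # Step 1: avoid duplication -- prefer evicting a file that is
        # already replicated elsewhere in the region, keeping regional
        # variety high.
        duplicated = [f for f in se_files if region_copies[f] > 1]
        candidates = duplicated if duplicated else list(se_files)
        # Step 2: among the candidates, evict the regionally least
        # popular file.
        victim = min(candidates, key=lambda f: region_freq[f])
        # Storing new_file is only beneficial if it is regionally more
        # popular than the file it would displace.
        if region_freq[new_file] <= region_freq[victim]:
            return None
        return victim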
4 Experiments
4.1 Simulation Tool
We evaluate the performance of BHR by implementing it in OptorSim, a data grid simulator developed to test dynamic replication strategies [8]. In OptorSim, a general job execution scenario for a data grid is modeled, and various dynamic replica optimizers are implemented to test their effectiveness. After jobs are distributed to grid sites through the Broker, they run on the CE (Computing Element) at each site. Each job on a CE has a list of required replicas. In the first phase of replica optimization, the Optimizer selects the best site from which to fetch a replica, based on the available network bandwidth between sites. The Optimizer then performs the second phase of optimization (dynamic optimization) by deciding whether storing (replicating) the fetched file is beneficial.
4.2 Configuration
We perform simulations with an assumed grid network topology and job execution scenarios. Figure 3 describes the network topology assumed in our simulations. We assume there are 4 regions, and each region has 13 sites on average. File transfer time is determined by the narrowest bandwidth along the path to the destination. Broader bandwidth is provided between sites within a region, whereas bandwidth between sites across regions is relatively narrow. Since many sites within a region try to fetch files from other regions through a single inter-region link, this link is highly congested with network traffic, which causes the hierarchy of bandwidth. In a data grid environment, various job execution scenarios will be present; we apply general scenarios, as presented in Table 1. In the simulation, 1000 jobs are distributed to grid sites through the broker. According to the file set each job accesses, we classify jobs into 50 job types. Each file set consists of 15 files. We assume that a certain preference among job types appears; this preference makes some files popular. Each job sequentially requests access to the files in its file set. There is no overlap between the file sets that different job types access, and the size of a single file is 1 GB. Therefore, the total size of data in this configuration is 750 GB (50 job types x 15 files per file set x 1 GB per file). We assume all files are initially held at the master site; replication takes place after jobs start to execute at each site.
Fig. 3. Grid topology in simulation
4.3 Results
We compare the performance of BHR with two site-level file replacement schemes, LRU Delete and Delete Oldest. In LRU Delete, the least recently accessed file is chosen for deletion whenever replacement takes place. Delete Oldest is another replacement-based scheme, which deletes the oldest file in the SE first when a newly required replica is received and replacement is necessary. To make the results easy to interpret, we assume that all intra-region network links have the same bandwidth, and likewise that all inter-region links have the same bandwidth. Initially, we roughly set the bandwidth and storage space as shown in Table 2. We set the bandwidth between the master site and its adjacent router to 2000 Mbps, much broader than the other links, to avoid effects from network traffic congestion at the master site.
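For reference, the two baseline eviction policies amount to one-line selections (a sketch; access_time and store_time are assumed per-file timestamps, not names from the paper):

    def lru_delete(se_files, access_time):
        # LRU Delete: evict the least recently accessed file.
        return min(se_files, key=lambda f: access_time[f])

    def delete_oldest(se_files, store_time):
        # Delete Oldest: evict the file that has been stored the longest.
        return min(se_files, key=lambda f: store_time[f])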
Figure 4 shows the results achieved with the initial parameters. BHR takes the least total job execution time among the strategies: 33,174 seconds, about 30% less than the other strategies. Since the 50 GB SE at each site is not large enough to hold a large portion of the overall data (750 GB), we cannot achieve much performance improvement with the site-level replacement schemes. The BHR strategy, however, benefits from network-level locality by locating as many different files as possible in a region, and it places files that are likely to be used in the region based on the regional access history. In this simulation, Delete LRU and Delete Oldest show almost the same job execution time, because we do not assume any specific file access pattern in the data grid system.
Fig. 4. Total job times with parameters shown in Table 1 and Table 2
We continue the performance evaluation with varying bandwidths and storage spaces. As we increase the inter-region bandwidth, file transfer via the inter-region link does not take long even though the link is highly congested with data traffic; thus, the hierarchy of bandwidth becomes indistinct. Figure 5(a) shows the results with varying bandwidths. When we set narrow bandwidth on the inter-region link, BHR outperforms the other strategies considerably. However, the differences in job execution time become smaller as broader inter-region bandwidth is set, and they become negligible when more than 1800 Mbps is provided for the inter-region bandwidth. We conclude that the BHR strategy can be effectively utilized when the hierarchy of bandwidth appears clearly. The size of the SE at a grid site also affects the result significantly. As mentioned, traditional replacement-based schemes can be effective when large storage space is provided at a grid site. In Figure 5(b), as the storage space at grid sites decreases, BHR outperforms the other strategies greatly. However, as the storage size increases, the job execution time of the two replacement-based schemes drops sharply, because the file hit ratio at a site increases when a large number of replicas can be stored there, and the regional file hit ratio also increases even though no region-based optimization strategy is applied. Eventually, the efficiency of all three strategies becomes almost the same when a large quantity of storage is provided at a site. Our BHR strategy is more effective when grid sites have relatively small storage. One may argue that 100 GB is not an impractical storage size for a grid site; however, the storage size that makes the site-level replacement schemes effective is relative to the total size of data in the data grid system.
In this simulation, only 750 GB of data is assumed to be in the data grid, while in practice terabyte or even petabyte scales of data are expected to be common in data grids.
Fig. 5. Total job time with varying bandwidth and storage size
5 Conclusion and Future Works
In this paper, we propose a novel dynamic replica optimization strategy based on network-level locality. BHR tries to replicate popular files as widely as possible within a region, where broad bandwidth is provided between sites. The simulation results show that BHR takes less job execution time than the other strategies, especially when grid sites have relatively small storage and the hierarchy of bandwidth appears clearly. BHR extends current site-level replica optimization to a more scalable approach by exploiting network-level locality. In future work, we plan to survey the actual Internet topology for data grids and study how to group grid sites into regions. We will also collect experimental data, such as data access patterns, from real data grid applications and apply them to our BHR strategy to verify its performance in practical applications.
References
[1] W. H. Bell, D. G. Cameron, L. Capozza, A. P. Millar, K. Stockinger, and F. Zini, "Simulation of Dynamic Grid Replication Strategies in OptorSim", Proceedings of the 3rd Int'l IEEE Workshop on Grid Computing (Grid'2002), Baltimore, USA, November 2002, Springer-Verlag, Lecture Notes in Computer Science.
[2] W. H. Bell, D. G. Cameron, R. Carvajal-Schiaffino, A. P. Millar, K. Stockinger, and F. Zini, "Evaluation of an Economy-Based File Replication Strategy for a Data Grid", International Workshop on Agent Based Cluster and Grid Computing at CCGrid 2003, Tokyo, Japan, May 2003, IEEE Computer Society Press.
[3] M. Carman, F. Zini, L. Serafini, and K. Stockinger, "Towards an Economy-Based Optimisation of File Access and Replication on a Data Grid", International Workshop on Agent Based Cluster and Grid Computing at the International Symposium on Cluster Computing and the Grid (CCGrid'2002), Berlin, Germany, May 2002, IEEE Computer Society Press.
[4] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, and S. Tuecke, "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets", Journal of Network and Computer Applications, 23:187-200, 2001.
[5] EU Data Grid Project, http://www.eu-datagrid.org.
[6] I. Foster, C. Kesselman, and S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", International J. Supercomputer Applications, 15(3), 2001.
[7] W. Hoschek, J. Jaen-Martinez, A. Samar, H. Stockinger, and K. Stockinger, "Data Management in an International Data Grid Project", 1st IEEE/ACM International Workshop on Grid Computing (Grid'2000), Bangalore, India, Dec 2000.
[8] OptorSim - A Replica Optimizer Simulation, http://edg-wp2.web.cern.ch/edg-wp2/optimization/optorsim.html.
[9] S.-M. Park and J.-H. Kim, "Chameleon: A Resource Scheduler in a Data Grid Environment", 2003 IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid'2003), Tokyo, Japan, May 2003, IEEE Computer Society Press.
[10] K. Ranganathan and I. Foster, "Design and Evaluation of Dynamic Replication Strategies for a High Performance Data Grid", International Conference on Computing in High Energy and Nuclear Physics, Beijing, September 2001.
[11] K. Ranganathan and I. Foster, "Identifying Dynamic Replication Strategies for a High Performance Data Grid", International Workshop on Grid Computing, Denver, November 2001.
Preserving Data Consistency in Grid Databases with Multiple Transactions Sushant Goel1, Hema Sharda1, and David Taniar2 1
School of Electrical and Computer Engineering, Royal Melbourne Institute of Technology, Australia [email protected] [email protected] 2
School of Business Systems, Monash University, Australia [email protected]
Abstract. High performance grid computing provides an infrastructure for access to and processing of large volumes of distributed data, terabytes or even petabytes. Research in data grids has focused on security issues, resource ownership, infrastructure development and replication issues, assuming the presence of a single transaction in the system. In this paper we highlight that the grid infrastructure brings a new set of problems in maintaining the consistency of databases in the presence of multiple transactions. Traditional distributed data management techniques may not meet the requirements of databases in a grid environment. We first show the circumstances under which the grid infrastructure may produce incorrect results and then propose a correctness condition, the Grid Serializability Criterion, that preserves the consistency of data in data grids.
1 Introduction
Grid computing can be defined as a type of parallel and distributed system that enables the sharing, selection and aggregation of geographically distributed autonomous resources dynamically at runtime, depending on their availability, performance, cost and the users' quality of service [9]. Following the grid infrastructure, the need for the data grid [3] was realized; the intention behind developing the data grid was to access and process geographically distributed data in a computationally effective way. Handling distributed data raises many research issues, such as scheduling of transactions, query execution, and maintaining the consistency of data. Recent research in data grids has focused mainly on file structures and read-only transactions [2, 5, 8]. Various replication strategies [1, 3, 5, 6] have also been proposed to maintain the consistency of the data in these Network File System (NFS) and Distributed File System (DFS) settings. Many applications need to modify distant data, but with a replication strategy it may be difficult to maintain the correctness of the data in the data repository. Sometimes data consistency is taken to mean keeping the replicated data items synchronized [5]; this definition assumes a single transaction in the system. We model the scenario in which multiple transactions are in action and the transactions can be any combination of read/write operations. In this paper we
always refer to consistency as the correctness of data in the data repository, and we use correctness and consistency interchangeably. Thus, we focus on the interleaving of multiple transactions; the concept of interleaving transactions is known as serializability [10] in database systems. Based on centralized serializability, we name the correctness criterion in the grid environment the Grid Serializability Criterion (GSC). The motive behind modelling multiple transactions with any read/write combination will become clear in the following sections, where we use an example to demonstrate that under this scenario the data grid may produce incorrect results. Large-scale science and engineering problems need to access geographically separated data: weather forecasting, astronomical prediction, earth observation, and applications that need collaborative analysis all have to retrieve data from different data sources. Traditional file systems like NFS and DFS, and Database Management Systems (DBMS) [11], have either of two strategies to manage transactions [9]: (a) a central scheduling scheme, or (b) a decentralized consensus-based scheme. With these scheduling strategies, the scheduler may become a performance bottleneck due to an overloaded central site. The motivation behind this research into decentralized processing of data is that, given the enormous amount of data that today's applications have to handle, it is difficult to ship data between sites, and at the same time it is computationally very expensive to maintain a centralized scheduler. Moreover, the centralized scheme and the decentralized consensus-based scheme may be suitable for read-only transactions, but if a transaction of any application has to modify a data source in the distributed data repository, the central scheduler can become a performance bottleneck. We show that a new set of problems comes into existence with the introduction of distributed schedulers in the grid environment. One of the major issues in decentralizing scheduling responsibilities is the threat to the consistency of the data. This paper looks at future data-intensive grid applications from the data correctness point of view. Section 2 gives an overview of data grids. Section 3 discusses the new dimension of the problem that the grid infrastructure introduces in a multi-transaction environment. Section 4 explains the Grid Serializability Criterion, and Section 5 shows the correctness of the algorithm. Section 6 concludes the work with future directions.
2 Database on Grid: An Overview
Most of the work on data grid infrastructure assumes the existence of file systems like NFS and DFS for data storage, and the execution of a single transaction at a given time [5, 6, 7, 8]. NFS and DFS are not designed to meet the stringent requirements of high-performance data-intensive applications [7].
2.1 Data-Grid
With the increasing diversity of scientific disciplines, the amount of data collected is increasing. In domains as diverse as global climate change, high-energy physics and computational genomics, the volume of data is already measured in terabytes and will soon be measured in petabytes [3]. It becomes increasingly difficult to handle
Preserving Data Consistency in Grid Databases with Multiple Transactions
this volume of data if the data itself is geographically distributed. Grids will enable ubiquitous access to data and data-collection systems spread across the globe. Two major challenges for data-intensive applications are to adapt to the distributed data-scheduling environment and to keep the number of messages in the system to a minimum while maintaining the correctness of data. Under these circumstances we need to integrate end systems more effectively, using new hardware and software infrastructure. Several design questions have been identified to meet the requirements of this architecture: How do we provide metadata that describes a data set's location? What is the effect of caching of data on application performance? How does replication affect performance? What happens if the total amount of data is larger than the transmission capacity of the grid? How do we synchronize the remote copies of data? What are the interfacing techniques between different databases? Past research [1,4,5,6,8] has focused on these issues, but it assumes that only a single, read-only transaction is executing in the system and that the data is stored in file structures. Our main concern in this paper is to deal with transactions that modify the data items in the data repository under multiple-transaction execution.
2.2 Data-Grid Architecture Fig. 1 shows a general architecture for a data-intensive application in which data is collected and stored at multiple remote sites. Initial research in grid computing focused on the management of computational resources, and most of it was limited to file I/O operations, management of file caches, security, and replication issues [4,12,6]. Lately, researchers have realized the importance of integrating data sites with the grid infrastructure [3]. Work on replication and cache management assumes a read-only environment in data-intensive systems [3], which is not always the case.
Fig. 1. Geographically separate databases using grid infrastructure
Traditional NFS provides access to remote data with a uniform data namespace, but because it lacks replication and batch I/O it cannot deliver good performance. Parallel database systems such as Bubba and Gamma [11] can provide collective I/O but are not designed to meet the distributed nature of grid computing [7]. Present database systems need a central (or consensus-based decentralized) coordinator to achieve correctness of data, widely known and accepted as database serializability [10]. The
centralized serializability criterion has to be extended, and should evolve to the point where it can be implemented on the grid infrastructure. Given the database applications we expect to emerge in the near future and the amount of data that will be generated (petabytes of data per year), it is impossible for any of the present serializability criteria to maintain the correctness of data in a grid database environment.
3 Data Consistency in Multi-transaction Environment Major research interest in grid computing has been directed towards developing fast interconnection networks, high-throughput applications, moving data efficiently between sites, and developing fast and scalable applications [1,3,5,6,8]. Accepting the importance of these infrastructural developments, in this section we show the importance of maintaining the consistency of the database in a multi-transaction environment where transactions intend to modify the data. The correctness criterion for read-only transactions differs from that for transactions that modify the data. With applications spanning the globe, it becomes difficult to manage a central coordinator, and the possibility of mutually independent scheduling techniques has to be explored. Distributing the scheduling responsibilities to local database sites may pose a threat to the consistency of the data. We explain the problem with the following example.
Fig. 2. Consistency of data in distributed data set
Without loss of generality, we assume a scenario where an application has to access data from three different data-sites: data-site 1, data-site 2 and data-site 3 (see Fig. 2). These three sites are connected by state-of-the-art grid middleware and can communicate with each other over high-bandwidth network connections. Let us assume that two transactions, transaction 1 and transaction 2, are submitted to data-site 1, and that each also needs to access data from the other two data-sites, data-site 2 and data-site 3. Using the metadata information, the database system at site 1 can form two subtransactions for each transaction, i.e., one subtransaction of transaction 1 for site 2 and one for site 3, and likewise one subtransaction of transaction 2 for site 2 and one for site 3,
and then submit the subtransactions to the respective data-sites. If a transaction is read-only, it does not pose any threat to consistency; but if the transaction intends to modify any data item, it must synchronize access to the data items, for example by using semaphores or locks. By synchronization we mean that if transaction 1's subtransaction precedes transaction 2's subtransaction at site 2, then transaction 1's subtransaction must also precede transaction 2's subtransaction at site 3. Under these circumstances, delegating scheduling responsibilities to the individual data-sites may produce an incorrect interleaving. Thus we see that there is a genuine scheduling problem that may threaten the correctness and consistency of the data.
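To make the threat concrete, the following minimal Python sketch (the site names, transaction labels and arrival jitter are all made up for illustration) shows how two sites that schedule independently by local arrival time can order the same two transactions differently:

```python
# Minimal sketch: two sites schedule subtransactions independently, so the
# same pair of transactions may be ordered differently at each site.
import random

def local_schedule(arrivals):
    """Each site orders subtransactions by (simulated) local arrival time only."""
    return [txn for _, txn in sorted(arrivals)]

random.seed(1)
# Subtransactions of T1 and T2 reach both sites with independent network jitter.
site2 = [(random.random(), "T1"), (random.random(), "T2")]
site3 = [(random.random(), "T1"), (random.random(), "T2")]

order2, order3 = local_schedule(site2), local_schedule(site3)
print("site 2 order:", order2)
print("site 3 order:", order3)
if order2 != order3:
    print("Inconsistent interleaving: T1 and T2 execute in different orders "
          "at the two sites, so the global schedule is not serializable.")
```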
4 Proposed Grid Serializability Criterion In this section we propose a Grid Serializability Criterion (GSC) that enforces a total order on the schedule to ensure the correctness of data in a multi-transaction environment. We include the word "Serializability" in the name to relate the criterion to the correctness criterion already used in single DBMSs, conflict serializability [10]. The total order is required only for those transactions that access more than one data-site simultaneously, and it is implemented by using a unique timestamp value.
4.1 GSC Algorithms The following functions are used to describe the algorithm: one function returns the set of subtransactions of a transaction that access different data-sites; site_accessed() returns the set of data-sites where the subtransactions are to be executed; Cardinality() returns the number of elements in a set; Append_TS(subtransaction) appends a timestamp to a subtransaction. We assume that the architecture is capable of producing unique timestamp values. Phase I: The transaction starts execution (Algorithm 1). 1. As soon as the transaction arrives at the local database, it is split into multiple subtransactions according to the allocation of data. 2. If the transaction requires only one subtransaction, it can be submitted to the data-site immediately, without any delay; a transaction with one subtransaction will not conflict with other transactions. 3. If the transaction requires multiple subtransactions, the grid infrastructure appends a timestamp to every subtransaction before submitting it to the corresponding data-site. 4. If a transaction accesses more than one data-site, its subtransactions are submitted to each data-site's local scheduler, which executes subtransactions strictly in timestamp order. 5. Subtransactions from step 2 can be assumed to have the lowest timestamp value (e.g., 0) and can be scheduled immediately. 6. Only when all subtransactions of a transaction have completed execution at all the sites does the transaction commit (Algorithm 2). The working of the algorithm is explained below:
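Before that explanation continues, here is a minimal Python reading of Phase I; the paper gives the steps only in prose, so the helper names split_fn and wait_queues below are our own, not the authors'.

```python
import itertools

_ts_counter = itertools.count(1)  # stand-in for the unique timestamp source

def submit_transaction(txn, split_fn, wait_queues):
    """Phase I of GSC (sketch): split, timestamp, and submit subtransactions.

    txn         -- any identifier for the arriving transaction
    split_fn    -- maps txn to {site: subtransaction} per the data allocation
    wait_queues -- per-site lists kept in timestamp order (local schedulers)
    """
    subs = split_fn(txn)                       # step 1: split by data allocation
    if len(subs) == 1:                         # step 2: single site, no conflict
        (site, sub), = subs.items()
        wait_queues[site].append((0, sub))     # step 5: lowest timestamp, run now
        return
    ts = next(_ts_counter)                     # step 3: one timestamp per txn
    for site, sub in subs.items():             # step 4: hand to local schedulers
        wait_queues[site].append((ts, sub))
        wait_queues[site].sort(key=lambda x: x[0])  # execute strictly by ts
```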
Phase II: Termination condition for a transaction. 1. When a subtransaction finishes execution, it reports to the originating site. 2. The originating site checks whether the subtransaction is the last subtransaction to terminate. 2a. If the subtransaction is not the last to terminate, then that site is removed from the site_accessed() set. 2b. If the subtransaction is the last subtransaction of the transaction to terminate, then the transaction is removed from the Active_Trans() set. We intend to achieve two major advantages by using the Grid Serializability Criterion with its decentralized scheduling approach. Reducing the load on the originating site: central scheduling schemes and decentralized consensus-based policies designate the originating site of the transaction as the coordinator. The proposed criterion delegates the scheduling responsibility to the respective sites where the data resides, and thus avoids the originating site becoming a bottleneck. Reducing the number of messages in the network: centralized and consensus-based decentralized scheduling schemes need to communicate with the coordinator to achieve correct schedules, which increases the number of messages in the system. Messages are among the most expensive things to handle in any distributed infrastructure, and the algorithm is designed to reduce their number.
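A minimal sketch of this Phase II bookkeeping at the originating site follows; the dictionary and set names are ours, standing in for the site_accessed() and Active_Trans() sets.

```python
def on_subtransaction_done(txn, site, sites_accessed, active_trans):
    """Phase II of GSC (sketch): originating-site bookkeeping on completion.

    sites_accessed -- {txn: set of sites still running its subtransactions}
    active_trans   -- set of transactions that have not yet committed
    """
    sites_accessed[txn].discard(site)     # step 2a: drop the finished site
    if not sites_accessed[txn]:           # step 2b: last subtransaction done
        active_trans.discard(txn)
        del sites_accessed[txn]
        commit(txn)                       # only now may the transaction commit

def commit(txn):
    print(f"{txn} committed")

# Toy run: T1 touches two sites; it commits only after both report back.
sites, active = {"T1": {"site2", "site3"}}, {"T1"}
on_subtransaction_done("T1", "site2", sites, active)  # not last: no commit
on_subtransaction_done("T1", "site3", sites, active)  # last: prints commit
```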
5 Correctness of the Algorithm We assume that the local databases are capable of guaranteeing the correctness of data, as stated in Proposition 1; but, as discussed earlier, additional strictness is needed in the grid infrastructure to maintain the correctness of data. We achieve this additional strictness by applying the concept of total order, which in turn is achieved by using timestamps. In this section we show that GSC preserves the correctness and consistency of data in the grid. Proposition 1: All local database sites always schedule all transactions in correct order. To prove the correctness of the proposed algorithm, we show that the additional criterion enforced by the total order guarantees GSC in grid systems. The total order only determines the way subtransactions are submitted to the sites. If a transaction accesses more than one geographically separated database, its subtransactions cannot be scheduled immediately and must instead be submitted to the wait_queue (Proposition 2). Each site's queue schedules the subtransactions according to their timestamps, thus guaranteeing the total order of the transaction from a global perspective. Proposition 2: The originating site submits the subtransactions to the wait_queues of the sites being accessed by a transaction if the transaction accesses more than one mutually exclusive database. Transactions that access more than one local database can produce incorrect scheduling of subtransactions; Proposition 2 ensures that subtransactions are executed according to the total order and thus guarantees the correctness of data in data grids. Lemma 1: For any two transactions that follow the Grid Serializability Criterion, either all subtransactions of the first are executed before those of the second at all the sites where the subtransactions execute, or vice versa. Proof: The following two cases are to be considered. Case 1) A transaction requires only a single subtransaction: this situation is handled by the if condition of the algorithm, and the subtransaction is submitted immediately, as shown in the algorithm flow chart. From Proposition 1 it follows that any other subtransaction either precedes or follows this subtransaction, and, as we have seen, a transaction with only one subtransaction cannot produce an incorrect interleaving. Case 2) A transaction splits into multiple subtransactions: this situation is handled by the else condition of the algorithm. Under this condition the schedulers at the sites might produce an incorrect interleaving; hence the subtransactions are submitted to the wait_queue at each site instead of being scheduled immediately (Proposition 2), and the queue executes them strictly in timestamp order. Say a transaction T1 has two subtransactions already executing at two sites, and a transaction T2 arrives that also has subtransactions for the same two sites. The if condition of the algorithm fails, and the subtransactions are submitted to the wait_queue at each site. A unique timestamp is appended to the subtransactions of T1 and T2; assuming T1 carries the smaller timestamp, T1 will precede T2 at both sites, because the transactions are scheduled strictly according to the timestamp value,
avoiding the execution of incorrect schedules and thus ensuring the total order of transactions. Thus GSC avoids any incorrect interleaving of subtransactions.
6 Conclusion We have seen that, despite extensive infrastructural development aimed at deploying grid computing and making it a reality, data-intensive applications and the maintenance of data correctness in such applications have been neglected. In this paper we first showed that a high-performance grid infrastructure may pose a threat to the consistency of data in data-intensive applications that implement existing correctness criteria. We then proposed a correctness criterion based on a total order of subtransactions that ensures the correctness of data. Finally, we demonstrated the correctness of the proposed criterion. The present work does not take failure of sites into consideration; future work will focus on extending the algorithm to handle local site failures.
References
[1] W. Hoschek, J. Jaen-Martinez, A. Samar, H. Stockinger, and K. Stockinger, "Data Management in an International Data Grid Project," ACM Workshop on Grid Computing (GRID '00), India, 17-20 Dec. 2000, pp. 77-90.
[2] B. Allcock, I. Foster, V. Nefedova, A. Chervenak, E. Deelman, C. Kesselman, J. Leigh, A. Sim, A. Shoshani, B. Drach, D. Williams, "High-Performance Remote Access to Climate Simulation Data: A Challenge Problem for Data Grid Technologies," Proc. of SC, 2001.
[3] A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke, "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets," Journal of Network and Computer Applications, vol. 23, pp. 187-200, 2001.
[4] C. Baru, R. Moore, A. Rajasekar, M. Wan, "The SDSC Storage Resource Broker," Proceedings of CASCON '98, 1998.
[5] D. Dullmann, W. Hoschek, J. Jaen-Martinez, B. Segal, A. Samar, H. Stockinger, K. Stockinger, "Models for Replica Synchronization and Consistency in a Data Grid," Proc. IEEE Symposium on High Performance Distributed Computing, 2001.
[6] I. Foster, C. Kesselman, C. Lee, R. Lindell, K. Nahrstedt, A. Roy, "A Distributed Resource Management Architecture that Supports Advance Reservations and Co-allocation," Proceedings of the International Workshop on Quality of Service, pp. 27-36, 1999.
[7] I. Foster and C. Kesselman, "Globus: A Metacomputing Infrastructure Toolkit," International Journal of Supercomputer Applications, vol. 11(2), pp. 115-128, 1997.
[8] H. Stockinger, "Distributed Database Management Systems and the Data Grid," IEEE Symposium on Mass Storage Systems, 2001.
[9] R. Buyya, "Economic-based Distributed Resource Management and Scheduling for Grid Computing," PhD thesis, Monash University, Australia, 2002.
[10] P. A. Bernstein, V. Hadzilacos, N. Goodman, Concurrency Control and Recovery in Database Systems, Addison-Wesley, 1987.
[11] M. T. Ozsu, P. Valduriez, "Distributed and Parallel Database Systems," ACM Computing Surveys, vol. 28, no. 1, pp. 125-128, March 1996.
[12] I. Foster, C. Kesselman, G. Tsudik, S. Tuecke, "A Security Architecture for Computational Grids," ACM Conf. on Computers and Security, pp. 83-91, ACM Press, 1998.
Dart: A Framework for Grid-Based Database Resource Access and Discovery Chang Huang, Zhaohui Wu, Guozhou Zheng, and Xiaojun Wu Grid Computing Laboratory, College of Computer Science, Zhejiang University, 310027 Hangzhou, China {changhuang, wzh, zzzgz, wuxiaojun}@zju.edu.cn
Abstract. The Data Grid serves as the data management solution widely adopted by existing data-intensive Grid applications. However, we argue that the core Grid data management demands can be better satisfied with the introduction of databases. In this paper, we provide a database-oriented resource management framework intended to integrate database resources with the Grid infrastructure. We start by outlining a sketch of the proposed Database Grid architecture and then focus on two base-level services, remote database access and database discovery, which we think should be settled first as a necessary foundation for other, more application-driven, high-level database services. We discuss how the design principles that apply to these base-level services adapt to the characteristics of the Grid environment and how they can be nested within the OGSA paradigm.
1 Introduction As a pillar of the Grid computing infrastructure, Grid data management was originally motivated by large data-intensive e-science applications in the distributed, wide-area context. The two components of the Data Grid [4][5][6][7], GridFTP and Replica Management, respectively address the resource access and discovery issues for storage resources. Data Grid technology has thus far been successfully applied in many e-science applications (e.g., high-performance physics experiments and climate simulation). Though this file-based data management strategy has proved to be a simple and effective solution in many Grid application testbeds, it has weaknesses in some respects. Firstly, GridFTP-interfaced storage resources function as networked file servers, which do not offer data query capability. The processing of data files has to be done on the client side by specific programs, which increases the end-users' burden and probably leads to duplicated development work. Secondly, since files are not bound to a fixed structural and semantic definition scheme, this flexibility in structure implies that data integration across different organizations becomes an extremely difficult task.
We argue that the weaknesses shown above can be relieved with the introduction of databases. First of all, a database is a compound computing facility that not only serves as data storage but also has strong data query capability. In database-involved application scenarios, data is communicated between data users and providers as structured records rather than blocks. A DBMS provides a variety of statement operations, which end users can utilize to achieve complex data processing tasks; the client-side programs can therefore be simplified considerably. In addition, the introduction of databases also benefits data integration: being conformant to a common data model (e.g., the relational model), data formats or schemas are much easier to extract, compare and integrate. Moreover, the call for database support is also raised by the increasing data-sharing needs in both today's e-science and e-business areas. Many organizations have stored tremendous volumes of data in very large databases. To prevent these information resources from becoming isolated islands, a novel infrastructure is needed to make these database resources accessible and interoperable in the dynamic, open, loosely coupled Grid computing environment. We believe the virtualization of databases as a type of Grid resource, and their direct participation in Grid computing activities, will serve the needs of a wider range of Grid applications better than the current Grid data management infrastructure dominated by the Data Grid. For this purpose, we propose the Database Grid, a framework that wraps existing relational database systems and exposes a series of functional services at different levels in support of database resource management in the Grid context. We are currently working on a middleware project named DART (Database Access Remotely & Transparently), which aims to design and implement the specification of the Database Grid. In this paper, we introduce two core components of DART: the remote database access service and the database resource discovery service, which we think set a necessary foundation for other high-level database-related Grid services. The remainder of the paper is organized as follows. In the next section, we give an overview of the Database Grid, introducing the major components and their relationships. In Sections 3 and 4, we describe the two core components, the database resource access and discovery services, focusing on the design issues that adapt them to the Grid context. We give a brief introduction to related work in Section 5. We conclude with a summary of system attributes and a future work plan.
2 The Database Grid Overview We envision constructing a layer of middleware between existing relational database systems and Grid applications (Figure 1). This middleware is comprised of a number of OGSA services, which we divide into two categories: the core layer and the adjunct layer. The core layer focuses on two basic functionalities, database resource access and discovery. The remote database access paradigm is defined in the GridSQL protocol and implemented as the Grid Database Access & Management Service (hereafter GDAMS). GridSQL adopts SQL as the communication language with relational
database resources, and introduces some advanced mechanisms to increase the reliability and performance of database access in the wide-area environment. The discovery functionality aims at two goals at different levels: general resource metadata publication, which applies to any type of Grid resource, and content-based data source location, which is specific to database resources. We implemented the first within the OGSA information service framework and developed a knowledge-based database index service for the second. The adjunct layer, which is expected to expand its scope continually, contains functional components that perform more application-driven activities. For example, in some applications it is convenient to access tables in different databases via a global view; a schema management service is therefore devised to support users in defining and maintaining such global views. Since a local database schema may change over the application's lifetime, it is indispensable to monitor local schema changes. In addition, to avoid site overload due to multiple simultaneous query-processing sessions connected to the same access point, we separate the functionality of federated data access from view definition and assign them to different functional components. The specialized components that perform complex computation, such as query processing, optimization and transaction management, are called query engines. Query engines are independent of global views, and several engines may exist in a database VO. Applications can locate an appropriate one through the information service (such as MDS) according to criteria like throughput, current working load, etc. Query engines can dynamically load view definitions from the schema management service and coordinate the corresponding databases to cooperate on a distributed query or transaction. Besides the above services, there are many other database-related functions we can virtualize as services. Raman et al. [11] list a group of such database-related services on the Grid. From the OGSA point of view, they are "higher services" at the collective level, which can invoke the base-level services to serve more application-driven needs.

Fig. 1. The Service architecture of the Database Grid
3 Remote Database Access and Management The Grid differs from other distributed computing environments in that a typical Grid application is joined and accomplished by a number of autonomous resource nodes across the Internet. In an environment without central control, these resources are dynamically configurable and available to all applications. We provide a solution that enables database systems to participate directly in Grid computing, accessible and discoverable to Grid applications, just like the computational resources, storage resources and other existing Grid computing facilities that have already been moved onto the Grid. In the traditional database access mode, a database session is based on a stable physical connection between the client and the server. Unfortunately, we cannot assume invariable network conditions in the wide-area environment. In addition, the traditional database usage paradigm lays little emphasis on data transfer in terms of
efficiency, since data transfer is normally accomplished under excellent network conditions with high bandwidth. In the dynamic wide-area context, however, efficient and reliable data transfer becomes a very critical performance issue, especially in cases where large datasets need to be transferred over a long distance. Even under good network conditions, the cost of data transfer may be considerable when large datasets are involved. On the point of large dataset transfer, traditional database technology does not provide a satisfactory solution. We propose GridSQL, a remote database access protocol that adopts SQL statements as command messages for clients to contact heterogeneous relational databases, and that includes the following new features: Connectionless client-server mode: GridSQL is based on connectionless web-service/SOAP message passing. A virtual session still exists between client and server, but the interaction is based on a series of service invocations that do not rely heavily on a stable physical connection. Bulkload: in order to reduce the access latency caused by large dataset transmission, we devise a separate data channel to handle large dataset transfer. We choose GridFTP to implement this data channel, as we believe its advantage in parallel data transfer can be exploited; its other fault-tolerance features, like restartable transfer, are also desirable in the dynamic environment. Multicast result delivery: multicast is a one-requestor, multiple-receiver invocation pattern, meaning that the result set of a query submitted by one client can be sent out to multiple designated clients. It can be used as an implicit form of database authorization: by this means, data consumers without database access permission can retrieve data from the database through some authorized database client.
Third-party data unit movement control: in analysis-oriented applications, a data query usually involves a large result set and needs to be executed frequently. It is then convenient to build a mirror table in the client's local database to optimize query performance and reduce network load. GridSQL allows a third-party client to perform such a data unit movement between two database servers.
Fig. 2. A query scenario using GridSQL
Figure 2 depicts a query process using Bulkload and Multicast result delivery. Note that the client library of GridSQL provides an API similar to that of other database connectivity tools, such as JDBC or ODBC, although they are implemented differently.
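The paper does not publish the GridSQL client API itself, so the sketch below is purely illustrative: the class name, method signature, endpoint URL and field names are all our own inventions, chosen only to show the connectionless invocation, the Bulkload channel choice and the multicast receiver list in a single call.

```python
# Hypothetical client-side usage of a GridSQL-style service; none of these
# names come from the paper -- they only illustrate the features listed above.
class GridSQLClient:
    def __init__(self, service_url):
        self.service_url = service_url     # connectionless: no persistent socket

    def query(self, sql, deliver_to=None, bulkload=False):
        """Send one SOAP-style request; each call is an independent invocation."""
        request = {"sql": sql,
                   "receivers": deliver_to or [],              # multicast delivery
                   "channel": "gridftp" if bulkload else "inline"}  # Bulkload
        print("would invoke", self.service_url, "with", request)

client = GridSQLClient("http://example.org/gdams")   # placeholder endpoint
# Large result set: ask for a separate GridFTP data channel and multicast
# the rows to two designated consumers (implicit authorization pattern).
client.query("SELECT * FROM climate_obs",
             deliver_to=["client-a", "client-b"], bulkload=True)
```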
4 Database Resource Discovery Classified by purpose, the various discovery services can be divided into two categories. At the base, there is a general hierarchical discovery service serving primarily as a name-based location service, allowing users to discover what resources are available within a VO but not supporting sophisticated queries on those resources [2]. Above these general directory services are more specialized aggregate directories, which define an alternative organization or namespace to create a view optimized towards resource-specific usage patterns. These high-level metadata services vary by resource type. For example, in the computational Grid, a specialized component, the Broker [9], communicating with the MDS service, is used to answer RSL queries from clients to help locate computing resources. In the Data Grid, the replica catalog [7] is queried to determine which physical replica is the best among a number of candidates in terms of access efficiency. Similarly, the database resource discovery framework is organized in two levels.
The name-based Database Metadata Service is responsible for physical database metadata retrieval, while the content-based Knowledge Index Service handles database resource location.
4.1 Database Metadata Service A database resource is characterized in terms of a number of physical database parameters, which must be collected by clients prior to connection building and statement execution. The database resource information model is composed of the following items. Schema definition: the structural information of accessible tables or views. DBMS description: such as the vendor name and version ID. Service attributes: physical parameters relevant to the GDAMS service. Privilege information: grantable access privileges, including system-level privileges (e.g., create table) and table-level privileges (e.g., select and insert). Statistics: dynamic system attributes, such as CPU utilization, available storage space, active session count and so on. These metadata items can be used for system performance evaluation and resource selection. The OGSA information service, or MDS3, provides an extensible framework that allows user-defined metadata models to be integrated with the local index service and aggregated into high-level index services through the soft-state registration protocol. We developed a database service data provider program to maintain the database resource information model and integrate it with the MDS service.
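One way to picture the five-item information model is as a single registrable record; the following Python dictionary is a hypothetical rendering with field names and values of our own, not the authors' actual schema.

```python
# Hypothetical rendering of the database resource information model; the
# field names and sample values below are ours, chosen only to mirror the
# five items listed in the text.
db_resource_metadata = {
    "schema": {"tables": ["patients", "prescriptions"]},     # schema definition
    "dbms": {"vendor": "ExampleDB", "version": "9.1"},       # DBMS description
    "service": {"endpoint": "http://example.org/gdams"},     # GDAMS attributes
    "privileges": {"system": ["create table"],
                   "table": {"patients": ["select", "insert"]}},  # grantable rights
    "statistics": {"cpu_util": 0.35, "free_storage_gb": 120,
                   "active_sessions": 7},                     # dynamic attributes
}
# A provider program would publish such a record into the local index
# service and on to aggregate indexes via soft-state registration.
print(db_resource_metadata["dbms"]["vendor"])
```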
4.2 Knowledge Index Service The Knowledge Index Service is intended to achieve location-transparent database discovery. The premise for achieving this goal is a method for organizing database resources with semantic conflicts into a semantically coherent space. Much previous work has adopted knowledge-based approaches to address the semantic conflicts between heterogeneous information sources, and a number of prototypes have been proposed [14][15]. A common point in these approaches is the introduction of a domain model, or ontology, which consists of a number of domain concepts that can be referenced by information sources in order to publish their local data. We employed this ontology-based approach to construct the Knowledge Index Service, which serves as a semantic registry for database resources. We adopted a classic method to model database schemas with ontology concepts; this method originated in [13], in the discussion of the problem of rewriting queries using views. We omit the description of this method for lack of space. The Knowledge Index Service provides two basic functions: Annotation. This interface provides operations that allow database providers or third-party users to build mapping relationships from database tables to ontology concepts. Annotation can be made at different granularity levels. At the fine granularity, annotators must specify information such as which attribute is mapped to which property. A simpler alternative only requires specifying a rough mapping relationship between tables and concepts, without indicating the attribute-level structural mapping information.
Location. Corresponding to the two forms of annotation, there are two approaches to database discovery. In the simple query mode, users can look up which database tables are relevant to a specified concept; they can then find detailed information via the Database Metadata Service and construct and submit a SQL query. In the advanced mode, users can submit a fully qualified knowledge query, which is a structured expression in terms of ontology concepts and predicates. A reasoner program parses this query and returns a query plan made up of a number of SQL queries on the relevant tables. The figure below gives an example of this query conversion process.
Fig. 3. An example of knowledge index query
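To illustrate the simple query mode (Fig. 3 shows the advanced mode), the sketch below uses a made-up annotation map from ontology concepts to tables; every database, table and concept name here is fabricated for illustration.

```python
# Sketch of the simple query mode: look up tables annotated with a concept,
# then (in a real client) fetch their schemas from the metadata service.
annotations = {                      # concept -> annotated (database, table)
    "Prescription": [("tcm_db_1", "herb_rx"), ("tcm_db_2", "formulas")],
    "Patient": [("tcm_db_1", "patients")],
}

def locate(concept):
    """Return the database tables mapped to an ontology concept."""
    return annotations.get(concept, [])

for db, table in locate("Prescription"):
    # A real client would now query the Database Metadata Service for the
    # table's schema and construct a SQL statement against it.
    print(f"candidate source: {db}.{table}")
```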
5 Related Work There are several ongoing projects that investigate the use of databases in wide-area, autonomous environments. The most recent project corresponding to this topic is the OGSA-DAI project [10][11], which is drafting a general grid service interface for accessing grid data sources, including relational and XML databases, through a variety of query languages. The aims of our project and of the related projects above are similar in providing solutions for database resource management in the Grid context, and we used much of their investigation of Grid data requirements. However, we focus on different subproblems and adopt different methods and architecture. We believe the mutual ideas contributed by the different sides will be complementary in formalizing the final standard.
6 Conclusions and Future Work We have described a Grid-oriented resource management and discovery architecture for database systems. We focused on two fundamental functionalities, high-performance, efficient remote database access and database resource discovery, and we presented our initial design of these two services. These core database services are the stage goal of our Dart I project. We are going to deploy the middleware on tens of distributed TCM (Traditional Chinese Medicine)
databases, which are part of our TCM Info-Grid program [12], to make further performance evaluations. After that, we plan to start Dart II, which aims to design and implement some high-level Database Grid services driven by the TCM applications.
References
1. I. Foster, C. Kesselman, J. Nick, S. Tuecke, "Grid Services for Distributed Systems Integration", IEEE Computer, 35 (6), 2002, pp. 37-46.
2. K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman, "Grid Information Services for Distributed Resource Sharing", Proceedings of the IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, 2001.
3. I. Foster, C. Kesselman, J. Nick, S. Tuecke, "The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration", Open Grid Service Infrastructure WG, Global Grid Forum, June 22, 2002.
4. B. Allcock, J. Bester, J. Bresnahan, A. L. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnal, S. Tuecke, "Data Management and Transfer in High Performance Computational Grid Environments", Parallel Computing Journal, 28 (5), May 2002, pp. 749-771.
5. B. Allcock et al., "Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing", IEEE Mass Storage Conference, 2001.
6. A. Chervenak, I. Foster, C. Kesselman, C. Salisbury, S. Tuecke, "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Data Sets", Journal of Network and Computer Applications, pp. 187-200, 2001.
7. S. Vazhkudai, S. Tuecke, I. Foster, "Replica Selection in the Globus Data Grid", IEEE International Symposium on Cluster Computing and the Grid (CCGrid), May 2001.
8. I. Foster and D. Gannon, "The OGSA Platform", OGSA work group draft V3, Global Grid Forum, Feb 16, 2003.
9. K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith, S. Tuecke, "A Resource Management Architecture for Metacomputing Systems", Proc. IPPS/SPDP '98 Workshop on Job Scheduling Strategies for Parallel Processing, pp. 62-82, 1998.
10. P. Watson, "Databases and the Grid", DAIS WG, Global Grid Forum, Jan 2002.
11. V. Raman, I. Narang, C. Crone, L. Haas, S. Malaika, T. Mukai, D. Wolfson, C. Baru, "Services for Data Access and Data Processing on Grids", DAIS WG, Global Grid Forum, Feb 9, 2003.
12. Huajun Chen, Zhaohui Wu, Chang Huang, "Open Grid Services of Traditional Chinese Medicine", Proceedings of the IEEE Conference on Systems, Man and Cybernetics (SMC2003), Hyatt Regency, Washington, D.C., Oct 2003.
13. A. Levy, A. Mendelzon, Y. Sagiv, D. Srivastava, "Answering Queries Using Views", Proceedings of the 14th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 1995.
14. A. Levy, A. Rajaraman, J. Ordille, "Querying Heterogeneous Information Sources Using Source Descriptions", Proceedings of VLDB '96, 1996.
15. Y. Arens, C.-N. Hsu, C. A. Knoblock, "Query Processing in the SIMS Information Mediator", Proceedings of the ARPA/Rome Laboratory Knowledge-Based Planning and Scheduling Initiative Workshop, 1996; reprinted in Readings in Agents, Huhns and Singh (eds.), Morgan Kaufmann.
An Optimal Task Scheduling for Cluster Systems Using Task Duplication* Xiao Xie, Wensheng Yao, and Jinyuan You Department of Computer Science and Engineering Shanghai Jiao Tong University 1954, HuaShan Road, Shanghai, 200030, China [email protected], {yao-ws, you-jy}@cs.sjtu.edu.cn
Abstract. This paper addresses the problem of scheduling a parallel program, represented by a directed acyclic task graph, onto homogeneous cluster systems so as to minimize its execution time. Many sub-optimal algorithms have been proposed that promise acceptable running times but not optimal solutions. We propose an algorithm that guarantees optimal solutions but not acceptable running time. Experiments show that in many practical cases the proposed algorithm performs very effectively in comparison with sub-optimal algorithms while still obtaining the optimal solutions.
1 Introduction Good scheduling is important for the efficiency of running programs on parallel or distributed systems such as cluster and Grid systems. Although much scheduling research has focused on heterogeneous systems, scheduling on homogeneous systems is still of concern because of its wide use and the relative simplicity of its model. A homogeneous system can be easily modeled as n processing elements (PEs) that are fully connected via a network. The parallel programs running on it can be modeled as weighted directed acyclic graphs (DAGs) [1]. In a DAG model, the nodes represent the tasks, while the edges represent the data dependencies and communications between tasks. A task here refers to a set of instructions that must be executed on one processor without preemption. Every node of the DAG has a weight, which stands for its computation cost on a processor. Every edge of the DAG also has a weight, which stands for the communication cost. The problem is to find a schedule that minimizes the execution time of the program on the system. However, finding a schedule that minimizes the execution time of a parallel program is known to be NP-complete in most cases, whether task duplication is allowed
* This paper is supported by the Shanghai Science and Technology Development Foundation under Grant No. 03DZ15027 and the National Natural Science Foundation of China under Grant No. 60173033.
or not [10][15]. Many heuristic algorithms have been proposed to look for sub-optimal solutions. Techniques such as task duplication (meaning that a task may be allocated to more than one PE) have also been introduced to improve the solutions. We support task duplication in the proposed algorithm for better solutions. Although heuristic algorithms can find sub-optimal solutions efficiently, there is still an embarrassment: few optimal solutions are available to assess how far those sub-optimal results deviate from optimal ones, even in the case of unbounded underlying systems. This is due to the difficulty of finding optimal solutions. Our objective in this paper is to propose an optimal scheduling algorithm that is feasible for obtaining optimal solutions at problem scales as large as possible, and in as many situations as possible. We assume that the number of PEs that can be used is unlimited, under which assumption the search space of the problem can be dramatically reduced. Unlike the sub-optimal algorithms, the proposed algorithm promises optimal solutions but does not guarantee acceptable time. We call the proposed algorithm the Large Scale Optimal Algorithm (LSOA). This paper is organized as follows. In Section 2, we formally define the homogeneous-system scheduling problem and define some terms used in this paper. In Section 3, we introduce a sub-optimal algorithm named CPFD [15], against which we compare our algorithm. In Section 4, we describe the proposed algorithm. In Section 5, we show the experimental results for LSOA. We give our conclusions in the last section.
2 Background In this section, we formally define the scheduling problem. In this paper, a homogeneous cluster system consists of a set of k identical PEs which are completely connected via a network. Each PE executes only one task at a time; once a task begins executing on a PE, it is not interrupted until it finishes. A parallel program is modeled as a weighted directed acyclic graph G = (V, E, C, M), where each node represents a task whose computation cost is C(v), and each edge represents the precedence relation that task u must be completed before task v can start. In addition, at the end of its execution, u sends data to v, and the communication cost is M(u, v). The communication cost is zero if u and v are scheduled on the same PE. If there is a path from u to v, then u is called a predecessor of v, while v is called a successor of u; the immediate predecessors of v are denoted by IPred(v), and the immediate successors of u by ISucc(u). A node without predecessors is called an entry node and a node without successors is called an exit node. The in-degree of node v is defined as the number of nodes in IPred(v). The communication-to-computation ratio (CCR) of a parallel program is defined as its average communication cost divided by its average computation cost.
For example, in Fig. 1(a) the node weights and the edge weights denote the computation costs and the communication costs, respectively: one node has computation cost 2, one edge has communication cost 1, the graph has an entry node and an exit node, and the CCR of the task graph is 0.54. A schedule of G, denoted by S(G), is a mapping of tasks onto PEs that assigns a start time to each task. A task v scheduled onto PE p is assigned a start time ST(v, p); the finishing time of v, denoted by FT(v, p), is therefore FT(v, p) = ST(v, p) + C(v). If v is not mapped onto p, then FT(v, p) = 0. A node can be mapped onto several PEs; in such a case, task duplication is used. The length, or makespan, of a schedule S is the maximal finishing time over all tasks. Our problem is: given a weighted DAG G = (V, E, C, M), find a schedule with the smallest makespan.
Fig. 1. (a) An example DAG; (b) the optimal schedule created by LSOA
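To make the model concrete, the following Python sketch encodes a small DAG of our own devising (not the graph of Fig. 1) and computes its CCR and the makespan of one schedule; note how duplicating the entry task on both PEs avoids a communication delay.

```python
# Toy instance of the Section 2 model: C(v) are node weights, M(u, v) are
# edge weights; the graph and numbers are ours, not those of Fig. 1.
C = {"v1": 2, "v2": 3, "v3": 3, "v4": 4}                    # computation costs
M = {("v1", "v2"): 1, ("v1", "v3"): 4,
     ("v2", "v4"): 1, ("v3", "v4"): 1}                      # communication costs

ccr = (sum(M.values()) / len(M)) / (sum(C.values()) / len(C))
print(f"CCR = {ccr:.2f}")                                   # 0.58 here

# A schedule maps (task, PE) pairs to start times ST(v, p); duplicating v1
# on PE2 lets v3 start without paying the v1 -> v3 communication cost of 4.
schedule = {("v1", "PE1"): 0, ("v2", "PE1"): 2,
            ("v1", "PE2"): 0, ("v3", "PE2"): 2,
            ("v4", "PE1"): 6}
# makespan = max FT(v, p) = max ST(v, p) + C(v) over all scheduled copies.
makespan = max(st + C[v] for (v, _), st in schedule.items())
print(f"makespan = {makespan}")                             # 10
```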
3 Related Works We introduce a typical sub-optimal algorithm in this section for later comparison. The CPFD (Critical Path Fast Duplication) algorithm [15] is reported to be the best among LWB, LCTD, DSH, BTDH, PY and itself, with a time complexity of O(v^4). CPFD is prone to find optimal solutions when the DAG's CCR is small (< 0.5). In Section 5, we compare the proposed algorithm with CPFD only in terms of running time. CPFD is based on partitioning the DAG into three categories: critical-path nodes (CPNs), in-branch nodes (IBNs), and out-branch nodes (OBNs). An IBN is a node from which there is a path reaching a CPN; an OBN is a node that is neither a CPN nor an IBN. Using this partitioning of the graph, the nodes can be ordered by decreasing priority into a list called the CPN-Dominant Sequence. CPFD then schedules all the nodes according to the CPN-Dominant Sequence, following a set of rules.
4 The Proposed Optimal Algorithm 4.1 Definitions First we introduce some definitions. Definition 1: Of all paths from an entry task to a node v, the one(s) with the largest sum of computation costs is called a Main Sequence of v, denoted by MS(v). The Main Sequence Length, denoted by MSL(v), is defined as the sum of the computation costs along MS(v). Clearly, MSL(v) is a lower bound on the time needed to finish node v. Definition 2: The Computation Cost of a cluster P at time t, denoted by CC(P, t), is the sum of the computation costs of all the nodes on cluster P at time t. Definition 3: The Computation Cost of a list at time t, denoted by CC(list, t), is the sum of the computation costs of all the nodes in that list at time t. Definition 4: A Must Parent (MP) of a node on cluster P under value F is a parent node whose result must be obtained on P for the node to start by the given value F of its earliest start time.
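Definition 1 amounts to a longest-path computation over computation costs alone. A minimal sketch, reusing the toy graph above with node names of our own:

```python
import functools

C = {"v1": 2, "v2": 3, "v3": 3, "v4": 4}                    # computation costs
parents = {"v1": [], "v2": ["v1"], "v3": ["v1"], "v4": ["v2", "v3"]}

@functools.lru_cache(maxsize=None)
def msl(v):
    """MSL(v): largest sum of computation costs on any entry-to-v path."""
    return C[v] + max((msl(u) for u in parents[v]), default=0)

print(msl("v4"))   # 4 + max(5, 5) = 9: v4 cannot finish before time 9
```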
4.2 The Proposed Algorithm The proposed algorithm is composed of three functions. Function 1 is the outline of the algorithm. Function 2 calculates the earliest start time (EST) of a node v in the DAG; we use binary search to locate EST(v) between minValue and maxValue. Function 3 is a procedure that decides whether a node v can be finished before time F on a cluster P.
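The bodies of Functions 1-3 are given in the paper only as prose and flow charts, so the sketch below shows just the binary-search skeleton of Function 2; the predicate argument stands in for the test built on Function 3 and is a made-up placeholder here.

```python
def est(v, min_value, max_value, feasible_at):
    """Sketch of Function 2: binary search for EST(v) in [minValue, maxValue].

    feasible_at(v, t) stands in for the check built on Function 3 (can v,
    started at t, be completed on some cluster?); it must be monotone in t
    for the binary search to be valid.
    """
    lo, hi = min_value, max_value
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible_at(v, mid):
            hi = mid              # feasible: try starting earlier
        else:
            lo = mid + 1          # infeasible: EST lies later
    return lo

# Toy check with a made-up predicate: v4 cannot start before time 5.
print(est("v4", 0, 100, lambda v, t: t >= 5))   # 5
```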
For example, Table 1 is a trace of Function 3 deciding whether a given node can be finished on cluster P5 before time 12.
5 Experiments The experiments consist of two parts. In the first part, we use random graphs to compare LSOA with CPFD; for each scale of random graph, we use 10 DAGs and record the average running times. In the second part, we use traced graphs to evaluate the proposed algorithm and CPFD. The traced graphs are selected from an open benchmark [9][10]; for each traced graph, we run both algorithms 10 times and record the average running time. We implemented CPFD and LSOA in C++ on a P4 2.0 GHz machine under Linux.
5.1 Random Graphs Although the scale of problems solvable by LSOA in acceptable time is greatly limited (to fewer than 50 nodes) when CCR is larger than 2.0, LSOA shows its efficiency in running time when CCR is less than or around 1.0. Figure 2 shows the comparative results against CPFD using random graphs (CCR = 1.0): LSOA not only guarantees the optimal value, but also has better performance in terms of
running time. Experiments show that CPFD rarely obtains optimal solutions on random graphs when CCR is around 1.0.
Fig. 2. Running time of LSOA and CPFD with random graphs (CCR=1.0)
5.2 Traced Graphs The benchmarks provide six kinds of traced graphs: Cholesky, FFT, Gauss, Laplace, LU and MVA. Different traced graphs have different structures; except for Gauss and MVA, the original CCR of each is less than 1.0. The properties of the traced graphs are shown in Tables 2-7.
Experiments show that LSOA can obtain optimal values for Cholesky, FFT, Laplace and LU very quickly. However, to obtain the optimal values for Gauss and MVA, the problem scale is greatly limited. The results are shown in Figure 3(a)-(d).
Fig. 3. Running time of LSOA and CPFD (a) Traced Graphs – Cholesky; (b) Traced Graphs – FFT; (c) Traced Graphs – Laplace; (d) Traced Graphs – LU
Figures 3(a) and 3(d) show that LSOA is very suitable for obtaining optimal solutions for Cholesky and LU; the running time increases almost linearly. Figure 3(b) shows that LSOA outperforms CPFD in running time on FFT, although Figure 3(c) shows that LSOA is worse than CPFD in running time on Laplace. LSOA does not find optimal solutions in acceptable time for Gauss and MVA at the large scales in the benchmarks; this is due to the large CCRs of both DAGs, as can be seen from Table 4 and Table 7, so we omit the comparison figures.
6 Conclusion In this paper, we proposed an optimal algorithm (LSOA) for task-duplication-based cluster scheduling. LSOA searches the whole problem space; it therefore promises an optimal solution while not guaranteeing acceptable running time. LSOA has effective cutting strategies to reduce the search space. Although the running time degenerates rapidly as CCR increases, LSOA can find optimal solutions in less time than sub-optimal algorithms such as CPFD in many large-scale cases. The experimental data presented in this paper demonstrate the algorithm's effectiveness.
References
1. I. Ahmad, Y.-K. Kwok, M.-Y. Wu and W. Shu, CASCH: a tool for computer-aided scheduling, IEEE Concurrency, (Oct-Dec 2000) 21-33.
2. B. Kruatrachue and T. Lewis, Grain size determination for parallel processing, IEEE Software, (Jan. 1988) 23-32.
3. J. D. Ullman, NP-complete scheduling problems, Journal of Computing System Science, 10 (1975) 384-393.
4. S. Darbha and D. P. Agrawal, A task duplication based scalable scheduling algorithm for distributed memory systems, Journal of Parallel and Distributed Computing, 46 (1997) 15-26.
5. Y.-K. Kwok and I. Ahmad, Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors, ACM Computing Surveys, 31 (4) (Dec 1999) 407-471.
6. B. Kruatrachue and T. Lewis, Grain size determination for parallel processing, IEEE Software, (Jan. 1988) 23-32.
7. C. I. Park and T. Y. Choe, An optimal scheduling algorithm based on task duplication, IEEE Trans. Computers, 51 (4) (2002) 444-448.
8. S. Ranaweera and D. P. Agrawal, A Task Duplication Based Scheduling Algorithm for Heterogeneous Systems, Proceedings of the 4th International Parallel and Distributed Processing Symposium, (2000) 445-450.
9. Y.-K. Kwok and I. Ahmad, Benchmarking the task graph scheduling algorithms, Proc. 1st Merged IPPS/SPDP, (Mar. 1998) 531-537.
10. Task Graph Scheduling Benchmarks, http://www.eee.hku.hk/~ykwok/data/benchmarks.tar.gz
11. G.-H. Chen and J.-S. Yu, A Branch-and-Bound-with-Underestimates Algorithm for the Task Assignment Problem with Precedence Constraint, Proc. Int'l Conf. Distributed Computing Systems, (1990) 494-501.
12. R. M. Karp and Y. Zhang, Randomized Parallel Algorithms for Backtrack Search and Branch-and-Bound Computation, Journal of the ACM, 40 (3) (Jul 1993) 765-789.
13. M. A. Palis, J.-C. Liou, and D. S. L. Wei, Task clustering and scheduling for distributed memory parallel architectures, IEEE Transactions on Parallel and Distributed Systems, 7 (1) (Jan 1996) 46-55.
14. I. Ahmad and Y.-K. Kwok, Optimal and Near-Optimal Allocation of Precedence-Constrained Tasks to Parallel Processors: Defying the High Complexity Using Effective Search Techniques, Proceedings of the International Conference on Parallel Processing, Minneapolis, Minnesota, USA, (Aug. 1998) 424-431.
15. I. Ahmad and Y.-K. Kwok, On Exploiting Task Duplication in Parallel Program Scheduling, IEEE Transactions on Parallel and Distributed Systems, 9 (9) (1998) 872-892.
Towards an Interactive Architecture for Web-Based Databases Changgui Chen and Wanlei Zhou School of Information Technology Deakin University Geelong, Victoria, 3217, Australia Phone: +61-3-52272087 Fax: +61-3-52272028 {changgui, wanlei}@deakin.edu.au
Abstract. The World Wide Web has brought us many challenges, such as infinite content, resource diversity, and the maintenance and updating of content. The Web-based database (WBDB) is one answer to these challenges. Currently the most commonly used WBDB architecture is the three-tier architecture, which still somewhat lacks the flexibility to adapt to frequently changing user requirements. In this paper, we propose a hybrid interactive architecture for WBDB based on reactive system concepts. In this architecture, we use sensors to catch users' frequently changing requirements and use a decision-making manager agent to process them and generate SQL commands dynamically. Efficiency and flexibility are thus gained from this architecture, and the performance of WBDB is enhanced accordingly.
1 Introduction The World Wide Web is just about the best way ever to distribute information: it is fast, nearly ubiquitous, and depends on no particular computer platform [9]. After the Web and databases were brought together, a new term, "Web-based database" (WBDB), arose. Generally speaking, a Web-based database is a database that resides entirely on an Internet server. Access to the database is through a Web browser and usually utilizes a password system that allows restricted access for users, depending on the privileges they have been given. A Web-based database is a key component of many applications, such as applications in electronic commerce, information retrieval, and multimedia [8]. Web-based database systems are simple and convenient to use. Currently the most commonly used WBDB architecture is the three-tier architecture, but systems of this kind still suffer from a lack of flexibility [4]. They are hard to apply to different environments and hard to adapt to frequently changing user requirements. It is very common that the structure of the system and the database is fixed, so the information that can be retrieved is limited. More specifically, the Web-based system can only offer users certain fixed queries, and the tables in the database are usually fixed in both number and structure; hence the system is tuned to function on those fixed queries and tables. Access control and function authorization are usually done at a
coarse-grained level: users and functions are divided into several levels, and a user can only perform the functions allowed for his or her level. In this paper, we propose a novel approach, the reactive system approach, to solve these problems. The reactive system approach views a system as a reactor that continuously interacts with its environment by receiving and sending messages [3]. Figure 1 depicts a generic reactive system model, where DMMs represent decision-making managers, and sensors and actuators connect the DMMs with application objects by receiving inputs and sending outputs of the system. This model consists of three layers: policies, mechanisms and applications. From this model, we can see that a reactive system uses sensors and actuators to implement the mechanisms that interact with its environment or applications, while its controllers, which we call decision-making managers (DMMs), implement the policies regarding the control of the applications.
Fig. 1. The generic reactive system architecture
The major advantage of this model is the separation of policies and mechanisms: if a policy is changed, it may have no impact on the related mechanisms, and vice versa. For example, if a decision-making condition based on two sensors was "AND" and is now changed to "OR", the sensors can still be used without any changes, i.e., the mechanism level can remain unchanged, as the sketch at the end of this introduction illustrates. This advantage leads to a better software architecture and has great significance in developing fault-tolerant and distributed computing applications, since it can separate fault-tolerant computing policies and mechanisms from applications [11]. A Web-based database system allows users to interact with the system through a Web browser, and thus it can be viewed as a reactive system. Using the reactive system concepts, we can present an interactive architecture for Web-based database systems that contains a highly interactive middleware separating user requirements from the Web server. In this architecture, the middleware processes different user requests dynamically and generates SQL statements automatically to operate the databases, while itself remaining unchanged. This flexible structure lets users customize their requests through browsers and access different databases easily, which means that our system can be used with different RDBMSs without any changes. Therefore, this approach can improve system performance and greatly reduce programmers' work. The rest of the paper is organized as follows. In Section 2 we examine current architectures for WBDB. A hybrid interactive architecture for WBDB is proposed in Section 3. Section 4 addresses the design and implementation issues. Finally, we summarize our work in Section 5.
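A minimal Python sketch of this policy/mechanism separation follows; the class and function names are our own, since the paper defines the model only at the architectural level. Swapping the decision rule from AND to OR leaves the sensor mechanisms untouched.

```python
# Sketch of the separation described above: sensors (mechanism level) stay
# unchanged while the DMM's decision rule (policy level) is swapped.
class Sensor:
    """Mechanism level: reports an observation from the environment."""
    def __init__(self, read_fn):
        self.read = read_fn

class DMM:
    """Policy level: combines sensor readings under a replaceable rule."""
    def __init__(self, sensors, policy):
        self.sensors, self.policy = sensors, policy

    def decide(self):
        return self.policy(s.read() for s in self.sensors)

sensors = [Sensor(lambda: True), Sensor(lambda: False)]
dmm = DMM(sensors, policy=all)   # decision condition: sensor1 AND sensor2
print(dmm.decide())              # False
dmm.policy = any                 # change the policy to OR ...
print(dmm.decide())              # True: ... without touching the sensors
```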
2 Current Architectures of WBDB There are different WBDB frameworks corresponding to various technologies and requirements. Generally speaking, a WBDB can be viewed either as a single huge database or as multiple data sources. Current WBDB architectures can be classified into two types: the two-tier architecture and the three-tier architecture. The minimal configuration of a WBDB is the two-tier architecture, which closely resembles the traditional client-server paradigm, although there are still some differences between them. The two-tier architecture includes a client tier and a server tier. The two-tier clients are thin, lightweight applications responsible only for rendering the presentation; application logic and data reside on the server side [6]. The three-tier architecture is a popular model that generally contains a client tier, an application server tier, and a data server tier; see Figure 2. A full-fledged WBDB requires these three essential components, although they can be realized with various types of technologies. In the following, we discuss some current three-tier architectures of WBDB.
Fig. 2. Three-tier architecture of WBDB
In the three-tier model of a database gateway, the three components are the client API library, the server API library, and the glue [1]. The first component, the client API library, consists of client-side APIs, which determine the format and meaning of the requests that client applications may issue. The second component, the glue, owns the translation and mapping mechanisms: it transforms client API calls into the DBMS (Database Management System) server's API, and vice versa for the data returned to the clients. The third component, the server API library on the database server side, manages the database services available to the clients; the available services vary according to the authentication granted by the DBMS. The extended client/server model is a typical three-tier architecture. In such a model, the client Web browser sends requests to the Web server, and the Web server transfers the requests to a database server. After the database server processes the requests, the results are returned to the client Web browser along the reverse pathway. In transit, the Web server can process the results from the database [7].
In the Multi-distributed databases (MDBS) scenario, the Web server requests the MDBS to retrieve the required data [10]. The server does this by issuing a global-level SQL query to the MDBS. The MDBS then decomposes the whole query and generates local queries according to the various features of the participating database servers. These local queries are issued to the corresponding database servers, which may be managed by different DBMS servers and accessed through all sorts of database access technologies. The MDBS integrates the local results it receives from all the database servers and finally presents a global result to the Web server. In this case, the MDBS handles all the operations, including data locating, interrelating, and integrating; the Web server just forwards the requests from clients, which is different from the typical client/server model. All of these technologies can be used in the three-tier architecture according to different user requirements, and the three-tier or even n-tier models are the essential models for structuring a WBDB. However, both the two-tier and the three-tier WBDB architectures lack flexibility. They are hard to apply to different environments and hard to adapt to frequently changing user requirements, since the structures of the systems and the databases are fixed. To overcome this, we introduce the reactive system model to implement an interactive architecture for WBDB.
3 A Hybrid Interactive Architecture of WBDB
In a Web-based environment, many users, who may be dispersed all around the world and have different requirements, access the Web-based databases. At present, Web-based database applications are developed according to a limited user requirement analysis; it is impossible to include the demands of all users [5]. The database applications have to be changed even when there is a small difference in the requirement analysis. Sometimes the developers also need to modify or add programs when database objects change, such as when a new table is added or a new attribute is added to an existing table. Therefore, it is very desirable to develop a flexible database application that can cope with any changes to the users' requirements or the database objects while itself remaining unchanged [2]. In other words, it is desirable to have an interactive system architecture for WBDB in which users, applications, and database servers can communicate with each other and adapt to each other's changes. To achieve this, it is necessary to provide a method by which users can express their own requests and customize the results they want, while the middleware, i.e., the application server, processes the requests and translates them into SQL statements dynamically. There are several ways of combining various technologies with the Web or the database to enhance the performance of WBDB. The hybrid interactive architecture proposed in this paper applies reactive system concepts to building WBDBs, as depicted in Figure 3; strictly speaking, it is still a three-tier architecture. In this architecture, we design and embed a decision-making manager (DMM) agent into the middleware (i.e., the application server), which is dedicated to processing users' requirements and translating them into SQL commands. We also attach a sensor, as a Java applet, to each client and attach an actuator to the Web server. In this scenario, a client sends either data, or data and programs, to the Web server, which activates
the agent, which in turn activates the Java sensor applet sent to the browser. The sensor collects all information about the client's database-processing requirements and sends it to the DMM agent. The agent then processes the requested data using its own programs or the programs received from the client. After processing the user requirements, the agent sends the final SQL statements, generated from those requirements, to the Web server via the actuator. The Web server then communicates with the database; the database server performs the operations on the database and transfers the results to the Web server, which returns them directly to the client.
Fig. 3. Hybrid architecture of WBDB
In this architecture, the DMM agent is the core of the system. It has the following functions: first, it provides table information from the database to users, so that users know what they can do; second, it generates dynamic SQL commands according to the user requirements. The DBMS operates the back-end database according to these commands and presents the results to the users. This is the main task the DMM agent undertakes. The major advantages of this architecture are efficiency and flexibility. Since the DMM agent is dedicated to processing users' database-related requirements and translating them into SQL commands, the application server can process clients' requirements more efficiently; the Web server is then dedicated to processing users' non-database-related requirements. The flexibility is gained because different users can define different requirements in the sensor windows provided to them, and the application server can dynamically process these requirements and generate SQL commands. In contrast to applications that have fixed SQL code and can only process certain fixed queries, this system can meet different user requirements flexibly.
4 Design and Implementation Issues
To implement the above hybrid architecture for WBDB, we have to solve one important issue: how to generate SQL commands dynamically according to users' requirements. When a user accesses the Web server and issues requests
involving database operations, the DMM agent immediately dispatches a sensor, built as a Java applet, to the user. The sensor catches all the information the user inputs and reports it to the DMM. The DMM records this user's status and presents all the database information from the back-end database server to the user, so that he/she knows what he/she can do with the database. After the user defines his/her requirements and sends them to the DMM, the DMM constructs SQL commands automatically based on these requirements and then submits them to the DBMS using the actuator. The DBMS then executes the SQL commands and returns the results to the user. Therefore, the main task of the DMM agent in the system is to generate SQL commands dynamically according to users' requirements. This task can be divided into three stages: presenting the database information to users, constructing users' requirements, and generating dynamic SQL commands. We address them in turn in the following.
4.1 Presenting Database Information
First, the DMM agent must present the database information from the back-end database server to users so that they can construct their requirements based on this information. To do this, we design two system-level tables for the DMM to catch the information from the RDBMS: one is Tb_relations, which describes the tables in the database and the relations between them, and the other is Tb_attributes, which presents the attributes of the relations:

Tb_relations (TableName, PrimaryKey, ForeignKey, ForeignTable);
Tb_attributes (TableName, AttribName, DisplayLabel, DataType, Length);

Here TableName is the name of a table in the RDBMS and ForeignTable is the name of the relevant foreign table; AttribName is the name of a table attribute; DisplayLabel is the label to be displayed for the attribute; DataType is the attribute's data type; and Length is the attribute's length. The DMM presents all the table and attribute information from these two system tables to users. It extracts the database schema from the data dictionary of the DBMS, which makes the system independent of any specific DBMS.
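As a hedged illustration, the DMM might load these two system-level tables over JDBC as sketched below; the table and column names follow the paper, while the connection handling and the record types are our assumptions (modern Java is used for brevity).

```java
import java.sql.*;
import java.util.*;

// A minimal sketch of loading the Tb_relations and Tb_attributes catalog tables.
class SchemaCatalog {
    record Relation(String table, String primaryKey, String foreignKey, String foreignTable) {}
    record Attribute(String table, String name, String label, String dataType, int length) {}

    final List<Relation> relations = new ArrayList<>();
    final List<Attribute> attributes = new ArrayList<>();

    void load(Connection con) throws SQLException {
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT TableName, PrimaryKey, ForeignKey, ForeignTable FROM Tb_relations")) {
            while (rs.next()) {
                relations.add(new Relation(rs.getString(1), rs.getString(2),
                                           rs.getString(3), rs.getString(4)));
            }
        }
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT TableName, AttribName, DisplayLabel, DataType, Length FROM Tb_attributes")) {
            while (rs.next()) {
                attributes.add(new Attribute(rs.getString(1), rs.getString(2),
                                             rs.getString(3), rs.getString(4), rs.getInt(5)));
            }
        }
    }
}
```

Because only these two catalog tables are consulted, the same loading code works against any JDBC-accessible RDBMS, which mirrors the DBMS independence claimed above.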
4.2 Constructing User Requirements
For a given database system, we can express its structure as an E-R graph, comprised of vertices representing table entities and edges representing the relationships between them. The construction of a user's requirements starts from one table in the graph and then successively includes other tables and attributes the user might be interested in. Initially, we provide the user with all table names from Tb_relations and let him/her select among them. At this point, the user can only select one table, the one he/she is most interested in; in other words, the user first locates a vertex in the E-R graph. The attributes of this vertex are displayed to the user automatically so that he/she can choose among them. The user can also set filter conditions. If the user needs more information from other tables, the DMM will find all vertices that have relations with (are adjacent to) the currently chosen vertex,
by traversing the graph from this vertex, and then displays them to the user for selection. In this way, the DMM finds all the vertices the user is interested in and has selected, and they form a subset, or sub-graph, of the original E-R graph. The user can finalise his/her requirements by selecting attributes of these vertices (tables) and setting some filter conditions.
4.3 Generating SQL Commands
After the traversal we obtain a sub-graph whose vertices comprise the user's requirements for database operations and form a vertical tree (an execution tree). In the execution tree, each node contains three fields representing, respectively, a vertex in the E-R graph (i.e., a table); the set of attributes selected by the user on that vertex; and the filter conditions added by the user. The DMM can generate SELECT, INSERT, UPDATE, or DELETE commands based on this tree. The steps for the SELECT command are as follows: 1. Traverse the execution tree and use the attribute field of each node to build the SELECT clause; at the same time, use the table (vertex) field to complete the FROM clause and append the condition field to the WHERE clause. 2. Traverse the execution tree again to obtain the join conditions from the edges of the tree, and use them to complete the WHERE clause.
Other commands are generated similarly. Once the SQL commands have been generated, the DMM sends them to the RDBMS via the actuator for execution.
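As a concrete, simplified illustration of the two traversals above (folded into a single pass here), consider the following Java sketch. The node layout, field names, and the way join conditions are carried on edges are our assumptions rather than the authors' exact design.

```java
import java.util.*;

// One node of the execution tree: a table (E-R vertex), the attributes
// the user selected on it, its filter condition, and the join condition
// on the edge to its parent.
class ExecNode {
    String table;
    List<String> attributes = new ArrayList<>();
    String condition;        // user-supplied filter, may be null
    String joinCondition;    // edge to the parent, null at the root
    List<ExecNode> children = new ArrayList<>();
}

class SelectGenerator {
    static String generate(ExecNode root) {
        List<String> select = new ArrayList<>(), from = new ArrayList<>(),
                     where = new ArrayList<>();
        walk(root, select, from, where);
        String sql = "SELECT " + String.join(", ", select)
                   + " FROM " + String.join(", ", from);
        if (!where.isEmpty()) sql += " WHERE " + String.join(" AND ", where);
        return sql;
    }

    private static void walk(ExecNode n, List<String> select,
                             List<String> from, List<String> where) {
        for (String a : n.attributes) select.add(n.table + "." + a);  // step 1: SELECT
        from.add(n.table);                                            // step 1: FROM
        if (n.condition != null) where.add(n.condition);              // step 1: WHERE filters
        if (n.joinCondition != null) where.add(n.joinCondition);      // step 2: edge joins
        for (ExecNode c : n.children) walk(c, select, from, where);
    }
}
```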
4.4 Other Implementation Issues
In the hybrid WBDB architecture we embed three kinds of entities in the system: the DMM, sensors, and actuators. The basic framework of the DMM class, sensor class, and actuator class has been implemented in [5]. Sensors and actuators are simple and can be used in the WBDB with few changes, while the DMM class needs additional semantics. As mentioned before, the main function of the DMM is to generate SQL statements dynamically. It has to provide table information from the database to users, work out the subset of tables from the vertex graph, and produce the execution tree from which SQL commands are generated. First, the DMM produces the database E-R graph based on the table Tb_relations. In order to produce SQL statements, the DMM must then keep track of the vertices (tables) the user selects step by step; these form the execution tree. Notice that for each current vertex (one that has been selected), the user may not choose all of its adjacent vertices in the E-R graph. The DMM has to decide which vertices should be placed in the execution tree for each current vertex based on the user requirements.
5 Conclusion and Future Work
This paper has presented a hybrid interactive architecture for web-based database systems based on the reactive system model. With the reactive system concepts, we build a decision-making manager agent in the application server to cope with frequently changing user requirements and to adapt to different databases. The DMM agent is dedicated to processing users' database-related requirements and translating them into SQL commands; therefore, the application server can process clients' requirements more efficiently. Flexibility is also gained from this architecture: the separation of the sensors and the DMM makes the system more flexible and easier to maintain when changes are needed. In the system, the DMM agent can remain the same no matter how the sensors change, and vice versa. Future work on the performance evaluation of the system will be carried out soon.
References
1. Ashenfelter, J. P.: Database Design for the Web. WebReview (1999), http://www.webreview.com/1999/03_26/developers/03_26_99_1.shtml
2. Bouguettaya, A.: Supporting Dynamic Interactions among Web-Based Information Sources. IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 5 (2000)
3. Chen, C., Zhou, W.: Building Distributed Applications Using the Reactive Approach. Proceedings of the Australian Conference on Information Systems (ACIS-2000), Brisbane, Australia (2000)
4. Dong, X., Du, F., Ni, L. M.: DWINS: A Dynamically Configurable Web-Based Information System. Proc. of Int'l Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems (WECWIS 2000), Milpitas, USA (2000)
5. Elmasri, R., Navathe, S. B.: Fundamentals of Database Systems, 3rd ed. Addison-Wesley, New York (2000)
6. Fraternali, P.: Tools and Approaches for Developing Data-Intensive Web Applications: A Survey. ACM Computing Surveys, Vol. 31, No. 3 (1999)
7. Hightower, L.: Publishing Dynamic Data on the Internet. Dr. Dobb's Journal, Vol. 22, No. 1 (1997), 70-72
8. Ioannidis, Y.: Database and the Web: an Oxymoron or a Pleonasm? 1st HELDINET Seminar, Athens, Hellas (2000)
9. Oreizy, P., Kaiser, G.: The Web as Enabling Technology for Software Development and Distribution. IEEE Internet Computing, 1(6):84-87 (1997)
10. Ramakrishnan, R.: From Browsing to Interacting: DBMS Support for Responsive Websites. Proceedings of the 2000 ACM SIGMOD Conference on Management of Data (2000)
11. Zhou, W.: Detecting and Tolerating Failures in a Loosely Integrated Heterogeneous Database System. Computer Communications, 22 (1999), 1056-1067
Network Storage Management in Data Grid Environment
Shaofeng Yang, Zeyad Ali, Houssain Kettani, Vinti Verma, and Qutaibah Malluhi
Department of Computer Science, Jackson State University, Jackson, MS 39217
[email protected], {zeyad.f.ali, houssain.kettani, vinti.verma, qmalluhi}@jsums.edu
Abstract. This paper presents the Network Storage Manager (NSM) developed in the Distributed Computing Laboratory at Jackson State University. NSM is designed as a Java-based, high-performance, distributed storage system that can be utilized in the Grid environment. The NSM architecture presents a framework offering parallelism, scalability, crash recovery, and portability for data-intensive distributed applications. Unlike several parallel research efforts, this paper introduces an architecture that is independent of systems and protocols; therefore, the system can run in a typical heterogeneous Grid environment. We illustrate how NSM incorporates GridFTP and GSI authentication. We also provide a brief evaluation of the system performance.
1 Introduction
Nowadays, Grid technologies [9,10] are becoming more popular and have been applied to various computational fields. The Grid infrastructure can support the sharing and coordinated use of diverse resources in dynamic and distributed virtual organizations [10]. Data-intensive applications, such as experimental analysis, simulations, and visualizations, require high-rate data access to huge data sets. Moreover, since many Grid applications deal with remote data and remote users that are often geographically distributed, a major challenge in building the computational Grid is providing an efficient distributed data storage environment. A number of research projects in the scientific field have targeted enhancing the performance, security, scalability, and reliability of data-intensive distributed Grid applications. Some solutions have focused on tuning network parameters, such as setting the correct TCP buffers and using parallel streams to optimize performance [7]; however, this approach requires the implementation of a Linux kernel-specific tuning daemon. Other efforts are designing new TCP stacks, which are operating-system-related and nonstandard [12,5]. The traditional operating-system-centered solutions limit the control and management of storage resources to the kernel. Applications are limited to the policies
and implementations provided by the system. The problem with this approach is that applications have different requirements; storage policies suitable for one application may lead to poor performance and behavior for others. To avoid these limitations, two distributed storage systems are also being developed: the Armada parallel file system [18] and the Storage Resource Broker (SRB) [4]. However, Armada does not handle crash recovery, while SRB is not application-controlled and its simple backup mechanism is not cost-effective. Thus, in order to handle the high-rate data access requirements of data-intensive applications in the heterogeneous Grid environment, a general solution with the following features is desired: system independence and application control; high performance; cost-effective data recovery; and integration with core Grid services. The Network Storage Manager (NSM), a Java-based software system, has been developed for this purpose in the Distributed Computing Laboratory at Jackson State University. This paper discusses how NSM is designed and developed to meet all of the requirements above. The rest of this paper is organized as follows: Section 2 introduces the NSM architecture and its application-controlled features. Section 3 shows how NSM utilizes GridFTP and GSI in the Grid environment. Finally, Section 4 summarizes our performance observations regarding parallel TCP streams, FTP, and GridFTP, and presents concluding remarks.
2 NSM System Architecture
NSM has a unique architecture that provides many advantages, including high performance, reliability, self-healing, load balancing, and seamless access. NSM utilizes multiple parallel data streams to achieve load balancing and high data rates. NSM delivers reliable storage by encoding redundant blocks of data and distributing the generated redundant data, which gives the system the ability to restore any missing or long-delayed data and to heal any damaged or corrupted data sets automatically. The NSM approach is much more cost-effective than replication because its encoded redundant data blocks are much smaller than the original data. Applications utilizing NSM for their data storage automatically inherit all of its merits and features.
2.1 Data Layout over Storage Servers
As illustrated in Figure 1, NSM partitions a data set into a number of small data blocks. The partitioning algorithm may be a standard fixed-size algorithm or an application provided algorithm. The system distributes the blocks across multiple data servers. To enable dependable service, NSM uses coding to add redundancy to the original data. This redundancy enables applications to retrieve the original data even if a portion of the data is unavailable due to server and/or network failure. Data blocks and their corresponding parity blocks are grouped in married blocks. A married block contains one block for each data and parity server. Selecting the blocks in each married block is an application issue and
depends on its decision about the suitable data layout. To ensure load balancing, NSM distributes the blocks of a married block to distinct servers.
Fig. 1. Data Set Layout Over Storage Servers.
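As a rough sketch of the layout in Figure 1, the following fragment assigns the blocks of each married block to distinct servers using a rotating offset; the rotation scheme is our assumption, chosen only to illustrate the load-balancing constraint, since the actual assignment is left to the application.

```java
// Illustrative block-to-server mapping: married block i holds one block
// per data server plus parity blocks, and a rotating offset keeps any
// single server from always holding parity.
class MarriedBlockLayout {
    final int dataServers, parityServers;

    MarriedBlockLayout(int dataServers, int parityServers) {
        this.dataServers = dataServers;
        this.parityServers = parityServers;
    }

    /** Server index for slot j (0..dataServers+parityServers-1) of married block i. */
    int serverFor(int marriedBlock, int slot) {
        int servers = dataServers + parityServers;
        return (slot + marriedBlock) % servers;   // distinct servers within one married block
    }
}
```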
2.2 Distributing Data Sets
Figure 2 demonstrates how married blocks are buffered for uploading using NSMWriter. Efficient data service can then be achieved by using multiple concurrent streams established between the client and the distributed data servers. After all the blocks of the data source have been uploaded, the metadata, which contains the information describing the data set and its distribution configuration, is obtained from the layout algorithm and uploaded to one or more designated meta servers.
2.3 Data Retrieval
The system offers applications transparent and seamless access to the physically distributed data sets. Applications can use NSM as a high-performance random input stream. An application can open multiple data sets at a time using the same NSMReader, and each data set has its own buffer. A prefetching mechanism is utilized in an effort to have the blocks most likely to be requested in memory even before the application requests them. Application requests have higher priority than prefetching requests. A request for a single block results in requesting all the blocks in the corresponding married block. The requests are queued and served according to their priority. If the data set buffer is full and more requests are coming, a cache management algorithm is used to decide which blocks to dispose of; the standard cache replacement policy disposes of the least recently used blocks. As shown in Figure 3, the blocks are downloaded in parallel from their storage servers using asynchronous system calls.
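The "least recently used" disposal policy just mentioned maps directly onto a standard Java idiom; the following minimal sketch is only illustrative, with the block type and the fixed-capacity policy as our assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// An access-ordered LinkedHashMap evicts the least recently used block
// once the data-set buffer reaches capacity.
class BlockCache extends LinkedHashMap<Long, byte[]> {
    private final int capacity;     // maximum number of cached blocks

    BlockCache(int capacity) {
        super(16, 0.75f, /* accessOrder = */ true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
        return size() > capacity;   // dispose of the LRU block
    }
}
```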
Fig. 2. Distributing Data Sets Using NSMWriter.
The system recovers from any server failure or network delay by transparently switching to any of the available parity servers. Missing data blocks are reconstructed by decoding the corresponding parity blocks. This on-the-fly data recovery leads to high reliability without sacrificing performance. NSM is also an application-controlled framework: the data layout model, partitioning algorithm, prefetching algorithm, cache replacement policies, and metadata are fully controlled by the application. NSM allows developers to specify or plug in their own data transfer protocols and authentication mechanisms. For example, users or developers can use the built-in FTP and HTTP, or their own or other customized protocols, for traditional distributed systems; GridFTP is provided and supported in NSM for Grid systems. Two sample applications have been built on top of NSM: one displays a terrain image by reading image tiles as needed from distributed servers [16]; the other is a video player client application that plays frames from multiple parallel remote servers and provides frame reconstruction and frame skipping [17]. Generally speaking, the pure Java implementation and data-transfer-protocol independence give NSM portability and platform independence; the parallel streaming, load balancing, and buffering provide high performance for high-rate data access; and the encoding/decoding scheme makes NSM cost-effective for data recovery. In the next section, we address how NSM integrates with Grid services.
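Before moving on, the protocol plug-in point described above could take roughly the following shape; this interface is our illustration, not the actual NSM API, which the paper does not spell out. A GridFTP, FTP, HTTP, or custom module would each implement it.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical plug-in contract for NSM's data-transfer layer.
interface TransferProtocol {
    void connect(String host, int port) throws IOException;
    void authenticate(Object credential) throws IOException;  // e.g. a GSI credential
    InputStream retrieve(String remotePath) throws IOException;
    void close() throws IOException;
}
```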
Fig. 3. Handling applications requests by NSMReader.
3 NSM in the Grid Environment
To build the Grid environment, some commercial solutions are available. In addition, Globus is considered the most widely utilized open-source toolkit for building grid applications [10]. The Globus Toolkit is fully compatible with the Open Grid Services Architecture (OGSA), which is the standard that defines the Grid service and its related mechanisms, protocol bindings, and integration with native platform facilities [10]. Therefore, the Globus Toolkit 2.0 was selected as the platform for developing and testing our Grid-enabled NSM. The Globus data management architecture, one of the fundamental components of Globus, provides the GridFTP service for Grid computing environments; GridFTP is thus a basic Grid protocol for transferring data between Grid nodes. GridFTP extends FTP with new features and provides several advantages over FTP [1,2,3], such as Grid Security Infrastructure (GSI) authentication [11,8,6], a standard and secure authentication mechanism in the Grid environment, third-party control, striping, and partial file access [3]. For NSM to run in the grid environment, it has to support this standard Grid data transfer protocol. The NSM modular and programmable architecture permitted us to implement a GridFTP module as one of the application-controlled NSM plug-ins. Implementing and utilizing a GridFTP pluggable module was the first important step required for running NSM in the grid environment. Our implementation took advantage of the Java CoG Kit [14,15], which is based on the Globus Java API and provides a GridFTP client as well as mappings to commonly used Grid services including GSI and LDAP. Thus, we utilized Java CoG to implement the data transfer protocol interface of NSM.
Fig. 4. NSM Authentication Framework.
In addition to GridFTP, NSM is currently capable of utilizing FTP, HTTP, and NSM-specific data transfer protocols. The NSM pluggable architecture also supports GSI authentication. GSI authentication is implemented on top of the Generic Security Service application program interface (GSS-API), which provides authentication and authorization services using public key certificates as well as Kerberos authentication [11,8,6]. GSI is also a fundamental component of Globus and has been bound to Grid services as the standard authentication mechanism in the Grid environment. Since GSI authentication in the Java CoG Kit is not compatible with GSS-API, and GSS in Sun Java JSDK 1.4 is not yet pluggable [13], NSM adopted the Java Authentication and Authorization Service (JAAS) in Sun Java JSDK 1.4 as its authentication framework. JAAS is designed to provide a general, standard authentication and authorization framework as well as a programming interface [13]. The JAAS authentication framework implements a Java version of the Pluggable Authentication Module (PAM), allowing users and developers to plug in their own authentication mechanisms; in our case, this mechanism is GSI authentication. GSI authentication is implemented behind a JAAS interface so that GSI authentication in NSM works with other Grid services and protocols. As a result, future application developers using NSM as a network storage layer can design, implement, and plug their own authentication mechanisms into the system. Meanwhile, the NSM authentication infrastructure adds Sun Java GSS under JAAS to support Kerberos authentication.
Figure 4 illustrates the NSM authentication framework. The username and password module of JAAS works well with traditional data transfer protocols like FTP and HTTP. The GSI authentication module is protocol-independent, although it is currently used only with GridFTP. Kerberos can also be applied to a specific protocol when demanded.
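A minimal skeleton of such a pluggable module, using only the standard javax.security.auth.spi.LoginModule contract and leaving the GSI-specific calls as comments, might look as follows; the class name and the elided GSI steps are our assumptions.

```java
import java.util.Map;
import javax.security.auth.Subject;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.login.LoginException;
import javax.security.auth.spi.LoginModule;

// Hypothetical GSI module plugged into the JAAS/PAM-style framework.
public class GsiLoginModule implements LoginModule {
    private Subject subject;
    private boolean succeeded;

    @Override
    public void initialize(Subject subject, CallbackHandler handler,
                           Map<String, ?> sharedState, Map<String, ?> options) {
        this.subject = subject;   // options could carry e.g. a proxy-certificate path
    }

    @Override
    public boolean login() throws LoginException {
        // Here the module would load the user's proxy certificate and
        // perform GSI mutual authentication (details omitted).
        succeeded = true;
        return succeeded;
    }

    @Override
    public boolean commit() throws LoginException {
        // On success, attach GSI principals/credentials to the subject.
        return succeeded;
    }

    @Override
    public boolean abort() throws LoginException { succeeded = false; return true; }

    @Override
    public boolean logout() throws LoginException { return true; }
}
```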
4 Summary and Concluding Remarks
This paper shares the experiences gained from building a Grid-enabled storage system. It presents a flexible and platform-independent distributed data storage architecture utilizing GridFTP and GSI authentication with high performance and reliability; hence, this setup is suitable for data-intensive Grid applications. By utilizing the NSM storage system, applications can tune their performance by selecting and implementing storage policies that are appropriate for their specific requirements. We also performed experiments comparing the performance of NSM employing GridFTP versus FTP. These experiments indicated that as the number of parallel remote data servers increases, the time to get the sample data file decreases. Although GridFTP has higher overhead compared to FTP, running NSM in the Grid environment can still dramatically improve the performance of data-intensive applications, not to mention its other advantages, such as reliability and self-healing.
References
1. B. Allcock, J. Bester, J. Bresnahan, A. Chervenak, I. Foster, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel and S. Tuecke (2001). "Secure, Efficient Data Transport and Replica Management for High-Performance Data-Intensive Computing," Proceedings of the 18th Annual IEEE Symposium on Mass Storage Systems (MSS 2001), San Diego, California, April, 2001.
2. W. Allcock, A. Chervenak, I. Foster, C. Kesselman, C. Salisbury and S. Tuecke (2001). "The Data Grid: Towards an Architecture for the Distributed Management and Analysis of Large Scientific Datasets", Journal of Network and Computer Applications, 23:187-200, 2001.
3. W. Allcock, J. Bresnahan, I. Foster, L. Liming, J. Link and P. Plaszczac (2002). "GridFTP Update January 2002", Technical Report.
4. C. Baru, R. Moore, A. Rajasekar and M. Wan (1998). "The SDSC Storage Resource Broker", Proceedings of the 8th Annual IBM Centers for Advanced Studies Conference (CASCON 1998), December, 1998, Toronto, Canada.
5. J. J. Bunn, J. C. Doyle, S. H. Low, H. B. Newman and S. M. Yip (2002). "Ultrascale Network Protocols for Computing and Science in the 21st Century", White paper to the US Department of Energy's Ultrascale Simulation for Science (USS) initiative.
6. R. Butler, D. Engert, I. Foster, C. Kesselman, S. Tuecke, J. Volmer and V. Welch (2000). "A National-Scale Authentication Infrastructure", IEEE Computer, 33(12):60-66, 2000.
7. T. Dunigan, M. Mathis and B. Tierney (2002). "A TCP Tuning Daemon", Proceedings of the 14th Annual Supercomputing Conference (SC2002), Baltimore, Maryland, November, 2002.
8. I. Foster, N. T. Karonis, C. Kesselman and S. Tuecke (1998). "Managing Security in High-Performance Distributed Computing", Cluster Computing, 1(1):95-107, 1998.
9. I. Foster, C. Kesselman, J. Nick and S. Tuecke (2002). "Grid Services for Distributed System Integration," Computer, 35(6).
10. I. Foster, C. Kesselman, J. Nick and S. Tuecke (2002). "The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration," Proceedings of the 5th Global Grid Forum Workshop (GGF5), Edinburgh, Scotland, July, 2002.
11. I. Foster, C. Kesselman, G. Tsudik and S. Tuecke (1998). "A Security Architecture for Computational Grids", Proceedings of the 5th ACM Conference on Computer and Communications Security, San Francisco, California, November, 1998.
12. S. Floyd (2002). "HighSpeed TCP for Large Congestion Windows", Internet Engineering Task Force, June, 2002.
13. C. Lai, L. Gong, L. Koved, A. Nadalin and R. Schemers (1999). "User Authentication and Authorization in the Java(TM) Platform", Proceedings of the 15th Annual Computer Security Applications Conference, Phoenix, Arizona, December, 1999.
14. G. V. Laszewski, I. Foster, J. Gawor, W. Smith and S. Tuecke (2000). "CoG Kits: A Bridge between Commodity Distributed Computing and High-Performance Grids", Proceedings of the ACM Java Grande 2000 Conference, 97-106, San Francisco, California, June, 2000.
15. G. V. Laszewski, I. Foster, J. Gawor and P. Lane (2001). "A Java Commodity Grid Toolkit", Concurrency and Computation: Practice and Experience, 13, 2001.
16. Q. Malluhi and Z. Ali (2002). "DTViewer: A High Performance Distributed Terrain Image Viewer with Reliable Data Delivery", 2nd Annual International Workshop on Intelligent Multimedia Computing and Networking (IMMCN 2002), 927-930, Durham, North Carolina, March, 2002.
17. Q. Malluhi and O. Aldaoud (2002). "VoD System Using a Network Storage Manager", Proceedings of the 8th Annual International Conference on Distributed Multimedia Systems (DMS 2002), San Francisco, California, September, 2002.
18. R. Oldfield and D. Kotz (2001). "Armada: A Parallel File System for the Computational Grid", Proceedings of the 1st Annual IEEE International Symposium on Cluster Computing and the Grid (CCGrid2001), Brisbane, Australia, May, 2001.
Study on Data Access Technology in Information Grid*
YouQun Shi, ChunGang Yan, Feng Yue, and ChangJun Jiang
Department of Computer Science and Engineering, Tongji University, 200092, Shanghai, China; College of Information & Electrical Engineering, China University of Mining and Technology, 221008, Xuzhou, Jiangsu, China. [email protected]
Abstract. City traffic information systems are usually operated by independent IT companies in a trusteeship form, and most of these information systems use relational databases to organize and store data. As a result, masses of historical data have accumulated in independent, federated, or distributed database systems. In an information Grid, it is therefore necessary to study techniques for organizing and accessing these data: following the OGSA Grid service standard and using the data service interfaces of the Globus Toolkit 3 development and runtime platform, we build virtual database services to achieve data access and information services, and use the Grid's high-performance computing devices to process the data on a large scale.
1 Introduction
In building a city traffic information Grid system, it is important to organize and access data well. The Grid system must provide an efficient integration mechanism for data resources to reorganize the separated data and achieve resource sharing; it must also supply high-performance computing resources for storing, computing over, and analyzing large data sets, and supply an integrated service mechanism to support extensive traffic information services. Globus Toolkit 3 (GT3) [3] is a development toolkit for new-generation Grid application systems. The core of GT3 supplies an open grid service framework, APIs (service data, notification, query, soft state management), Java toolkits, and a development and runtime environment for OGSA (Open Grid Services Architecture) [4]. Some important grid functions are realized through services, such as data access, data transmission, replica management, and audit services. A Grid service is implemented through a set of interfaces. Grid services support virtual data access, shield the physical distribution and structure of databases, and form one or more virtual databases with
* This work is supported partially by projects of the National Preeminent Youth Science Foundation (No. 60125205), the National 863 Plan (2001AA413020, 2002AA4Z3430), the Excellent Ph.D. Paper Author Foundation of China (199934), the Foundation for University Key Teachers of the Ministry of Education, and the Shanghai Science & Technology Research Plan (02DJ14064, 03JC14071).
the same logical structure, supplying a unified, transparent data access interface for application systems.
2 Grid Service and Service Data of GT3
The Global Grid Forum gave an important standard, OGSA (Open Grid Services Architecture), which puts the emphasis on services. OGSA defines the concept of a Grid Service, which is a kind of web service with interfaces. The interfaces cover conventions, service discovery, dynamic service creation, lifecycle management, notification, etc., and many Grid services can be integrated to satisfy a VO's requirements. GT3 is a new-generation software toolkit for building grid applications based on OGSA; its core supplies an open grid service framework, APIs (for example: service data, notification, query, state management), Java toolkits, and the environment for developing and running OGSA. The basic grid services of GT3 supply GRAM (Resource Allocation Management), the Index Service, reliable file transfers, and other auxiliary services [5]. In GT3, each Grid service can be realized by OGSI in three parts. The first is to describe the basic service interfaces defined in OGSI using WSDL and XML Schema, and to present their Java realization, such as GridService, Factory, Registration, NotificationSource, NotificationSink, and NotificationSubscription. The second is to supply the APIs related to GSH and GSR, mentioned below, for Grid service calls. The third is to have the definition and default values of service data produced simultaneously by using the GSDL extension to WSDL. A service inquiry can thus obtain the service interface information and related service information at the same time, and calling a service becomes more convenient [6][7]. In GT3, in order to realize the OGSI web service model, ServiceData is used to express the status data of a service instance; this enables a service demander to query and update the service instance. ServiceData is realized by a ServiceDataImpl class: each Grid service has an instance object of ServiceDataImpl, which has a Hashtable to manage and store the related data elements (a minimal sketch of this idea follows the list below). The default service data in GT3 are as follows:
PortType: To support discovery and management, the Grid service interface should have a globally unique name. In WSDL, an interface is defined by a portType with a globally unique name, a qname.
ServiceDataNames: The assembly of service data elements includes the service data element qnames, which have been defined in WSDL or the portType or are dynamically added by the service instance.
GridServiceHandles (GSH): In OGSA, a Grid Service supplies discovery inquiry, service instance creation, service cancellation, etc. At the same time, it defines a kind of interface called Factory, which is exclusively used to create transient service instances. Each Grid service has a globally unique handle, the GSH (Grid Service Handle), supports HTTP, and can be identified by a GSR (Grid Service Reference). Through the handle map, a Grid service reference with information about the network address, protocol binding, and so on is obtained and returned in WSDL format [8][9].
GridServiceReferences: one of its element values must be the GSR, in the WSDL form of the Grid service instance.
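As a minimal illustration of the ServiceDataImpl idea mentioned above (a Hashtable keyed by the element's qname), consider the following sketch; the real GT3 class carries far more machinery (typing, notification hooks), so this is only a reduction with assumed method names.

```java
import java.util.Hashtable;

// Illustrative service-data container: one per Grid service instance,
// backed by a Hashtable of data elements keyed by qualified name.
class ServiceDataContainer {
    private final Hashtable<String, Object> elements = new Hashtable<>();

    void setServiceData(String qname, Object value) { elements.put(qname, value); }
    Object getServiceData(String qname) { return elements.get(qname); }
    boolean contains(String qname) { return elements.containsKey(qname); }
}
```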
3 Virtual Database Access
We have built a prototype system of virtual DB access based on GT3, as shown in Figure 1. Three traffic information systems use different database products (for example, Oracle, SQL Server, and Sybase). Only virtual DB service A is visible in the service process, and the Grid system supplies data to application systems through DB service A. A virtual DB service includes either a subordinate virtual DB service (A1) or a real physical DB service (A2). Grid service A1 is itself a virtual DB service mapping A11 and A12, two physical DB services. According to the data access query it receives, each virtual DB service can transfer the access task to its subordinate Grid services by the logical DB structure, which eventually leads to accesses to the physical DBs. Each physical Grid service maps a particular physical DB: it performs query, update, deletion, and insertion operations on the DB through its interface and returns the results.
Fig. 1. The prototype system of virtual DB
Fig. 2. The interface relation diagram
Then, each virtual data access service composes a virtual DB service system, in other words a virtual DB server, and tree-shaped DB service structures of multiple layers compose a larger, extensive virtual DB service system. The system realizes OGSI in accordance with GT3; the interface relation diagram is shown in Figure 2. DatabasePortType defines the unified data access interface of the virtual and real services and shields the heterogeneous characteristics of the databases. In this interface definition, the conventional operation methods on databases are as follows: ReadOperate(), InsertOperate(), DeleteOperate(), ModifyOperate(), etc. DatabaseAbstractImpl is an abstract class and a permanent Grid service of GT3. DatabaseAbstractImpl is in charge of initializing and mapping to the superior service structure, including the unified access methods of the DatabasePortType interface instance and the query and description of the database logical structure (getTables,
getColumn, getTablesDescription, getColumnsDescription) and the structure mapping (getRealColumnName, getPossibleValues, isPossibleValue). The abstract class VirtualDBAbstractImpl is specialized to realize the data access methods of the DatabasePortType interface, such as ReadOperate, InsertOperate, etc., to query and call the available subordinate services, and to coalesce the returned results. The abstract class RealDBAbstractImpl supplies unified database access methods, registers with the superior service, and allows other services to discover and call the service. Concrete classes such as ORADBImpl, SQLDBImpl, and SYSDBImpl inherit from RealDBAbstractImpl; together with VirtualDBAbstractImpl, they encapsulate all kinds of manipulations of the physical database, such as database connection, transaction handling, and data query, insert, and delete, thus achieving physical database access and supplying data services to the superior virtual service. For update, insert, and delete operations, the number of valid (affected) records is returned; for a query operation, a two-dimensional array of fields and values is returned. Higher-level virtual data services then map and merge the results, and all the results are returned to the next higher level.
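The class relations of Figure 2 might be skeletonized as follows; this is our reading of the prose above, with result types and the fan-out/merge logic as illustrative assumptions rather than the authors' actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Unified data access interface; method names follow the paper.
interface DatabasePortType {
    List<String[]> readOperate(String query);
    int insertOperate(String statement);
    int deleteOperate(String statement);
    int modifyOperate(String statement);
}

abstract class DatabaseAbstractImpl implements DatabasePortType {
    // Logical-structure queries shared by virtual and real services.
    abstract List<String> getTables();
    abstract List<String> getColumns(String table);
}

abstract class VirtualDBAbstractImpl extends DatabaseAbstractImpl {
    protected final List<DatabasePortType> subordinates = new ArrayList<>();

    @Override
    public List<String[]> readOperate(String query) {
        List<String[]> merged = new ArrayList<>();
        for (DatabasePortType s : subordinates) {
            merged.addAll(s.readOperate(query));   // coalesce subordinate results
        }
        return merged;
    }
}

abstract class RealDBAbstractImpl extends DatabaseAbstractImpl {
    // Concrete subclasses such as ORADBImpl, SQLDBImpl, SYSDBImpl would
    // encapsulate the connection handling and SQL dialect of one physical DB.
}
```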
4 Conclusion
Grid data access covers many concerns, such as the scheduling of computing resources, the recombination of data modules, the creation of metadata models, the transformation of data formats, and the transmission of data. An effective method should map resources into a single mapping space that shields hardware boundaries, in accordance with the OGSA and Grid Service standards and based on the GT3 platform. Following the methods described above, data access middleware can be developed to realize data format conversion, provide virtual data services, and support unified, transparent access to data resources in the Grid.
References
1. I. Foster, C. Kesselman. The Grid: Blueprint for a New Computing Infrastructure. San Francisco, CA: Morgan Kaufmann Publishers, 1999.
2. Gridforum. http://www.gridforum.org
3. http://www.globus.org
4. OGSA specification. http://www.gridforum.org/
5. I. Foster, C. Kesselman. The Physiology of the Grid: An Open Grid Services Architecture for Distributed System Integration. 2002. http://www.globus.org/reserch/papers/ogsa.pdf
6. S. Tuecke, K. Czajkowski, I. Foster. Grid Service Specification. Technical Report, Feb. 2002. http://www.globus.org/ogsi-wg/drfts/gs_Spec_draft03_2002-07-17.pdf
7. K. Czajkowski, S. Fitzgerald, I. Foster, C. Kesselman. Grid Information Services for Distributed Resource Sharing. Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing. IEEE Press, August 2001.
8. Norman W. Paton, Malcolm P. Atkinson. Database Access and Integration Services on the Grid. http://www.globus.org/research/papers/, 2002.
9. D. Pearson. Grid database requirements. Technical Report. http://www.cs.man.ac.uk/griddb/, paper for Databases and the Grid BOF, GGF4, 2002.
GridTP Services for Grid Transaction Processing*
Zhengwei Qi, Jinyuan You, Ying Jin, and Feilong Tang
Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200030, P.R. China
{qizhwei, youngkim}@sjtu.edu.cn, {you-jy, tang-fl}@cs.sjtu.edu.cn
Abstract. We propose a new service-oriented Grid Transaction Processing architecture called GridTP, based on the OGSA platform and the X/Open DTP model. GridTP services provide a consistent and effective way of making existing, autonomously managed databases available within Grid environments. In the Data Virtualization Services layer, Grid applications, via the WSDL-style TX portType, delineate global transaction boundaries among virtual organizations. In the Common Resource Model layer, GridTP services, via the X/Open-defined XA interface between the Database Managers and the Transaction Manager, manage local transactions in heterogeneous databases. GridTP thus provides a seamless mechanism for embedding the X/Open DTP model in Grid services to support Grid Transaction Processing, and offers one promising reference implementation for the future Grid Data Services.
1 Introduction
In Grid environments, dynamic, distributed, and scalable data access and management will play a significant role in e-science and e-business applications. Existing databases do not provide Grid integration, but the development of a new Grid database management system is not realistic. However, there exists a standard model for distributed transaction processing called X/Open Distributed Transaction Processing (DTP) [2]. The X/Open DTP model is a standard for distributed transaction processing software in which shared resources are located at different sites on a network. The major RDBMS (e.g., ORACLE, Sybase, etc.) support this model by providing the XA interface. Meanwhile, much transaction/message processing middleware (e.g., CICS, Encina, Tuxedo, IBM MQ Services, etc.) conforms to X/Open DTP. So it is very attractive to integrate the X/Open DTP model into Grid computing. This paper proposes a service-based Grid Transaction Processing framework called GridTP, built upon the OGSA platform [3] and the X/Open DTP model.
* This paper is supported by the Shanghai Science and Technology Development Foundation under Grant No. 03DZ15027.
OGSA supports, via standard interfaces and conventions, the creation, termination, management, and invocation of stateful, transient services as named, managed entities with
Fig. 1. The architecture of GridTP services
dynamic, managed lifetime. In [1], it is argued that a Grid database service should support: (i) consistent access to databases from Grid applications; and (ii) coordinated use of distributed databases from Grid middleware. GridTP provides one promising reference implementation for these two goals.
2 The Framework of GridTP
2.1 The Introduction of GridTP
GridTP is divided into three layers: (1) the Common Resource Model layer; (2) the OGSA Platform layer; and (3) the Data Virtualization Services layer (see Fig. 1).
2.2 The Common Resource Model Layer
In the X/Open DTP model, RM (Resource Manager) denotes a shared, recoverable resource such as an RDBMS. Because the term Resource Manager in the Grid denotes a more general concept of resource management (e.g., CPUs, disks, network adaptors, etc.), we use the term Database Managers (DMs) to denote the meaning of RM in the X/Open DTP model. DMs represent manageable data resources such as RDBMS, XML databases, and object databases. All DMs should provide the XA interface for
the coordination of distributed transactions. This requirement is easily satisfied by many popular RDBMS such as ORACLE, Sybase, and DB2. It is obvious that the TM and the DMs make a database a manageable resource, which conforms to the Common Resource Model (CRM) in OGSA [3]. According to the CRM, the TM-DMs architecture makes it possible to expose database resources as Grid services in OGSA, enabling OGSI to communicate with and manage heterogeneous database resources. The standard XA interface provides the capabilities of distributed transaction management. In a word, in the CRM layer we implement two basic functions: (1) transparent data access operations via the DM-specific API; and (2) the coordination of distributed transactions through the XA interface.
2.3 The OGSA Platform Layer
In this layer, OGSI defines fundamental mechanisms such as creation, naming, and management, on which the OGSA Platform is constructed. These interfaces also include data management, resource management, and transactions. However, transactions are not well defined in the current OGSA platform. Our GridTP services provide a solution for Grid transactions built on top of the OGSA/OGSI architecture.
2.4 The Data Virtualization Services Layer
Data Virtualization Services (DVS) provide a variety of interfaces, including data caching, data replication, data access, and mechanisms for accessing a wide range of data types, including flat files, RDBMS, and streaming media. GridTP services, as a kind of DVS, provide transparent access to "virtual databases" among Virtual Organizations (VOs); the precondition is, of course, that the underlying databases support the XA interface. The interfaces of GridTP can be divided into three categories: (1) the OGSA interfaces, such as naming and dynamic creation; (2) the distributed transaction interface, namely the TX interface; and (3) application-specific interfaces. The second category is of the greatest importance in GridTP, so we discuss it in detail. In the X/Open DTP model, the interface between an AP and a TM is the TX interface, which is defined for the C and COBOL languages. We extend TX to a WSDL-style portType; it is easy to translate the interface into WSDL definitions. We define a new port type called TX, and every tx_* function can be defined as an operation in this portType. Some operations need input parameters and output parameters, which can be defined as message descriptions in standard WSDL. In this interface, three transaction characteristics pertain to each global transaction, i.e., commit_return, transaction_control, and transaction_timeout; these characteristics can be defined as serviceData elements of the GridTP TX portType.
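For readability, the TX operations could be pictured as the following plain Java interface; the actual GridTP artifact is a WSDL portType, and the parameter and return conventions shown here are simplified assumptions. The tx_* names themselves come from the X/Open TX specification [5].

```java
// The X/Open TX verbs recast as operations of the TX portType; each
// method corresponds to one WSDL operation.
interface TxPortType {
    int txOpen();                              // tx_open
    int txClose();                             // tx_close
    int txBegin();                             // tx_begin: start a global transaction
    int txCommit();                            // tx_commit
    int txRollback();                          // tx_rollback
    int txSetCommitReturn(int when);           // tx_set_commit_return
    int txSetTransactionControl(int control);  // tx_set_transaction_control
    int txSetTransactionTimeout(long seconds); // tx_set_transaction_timeout
    Object txInfo();                           // tx_info: query transaction characteristics
}
```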
3 The Implementation Issues of GridTP
Our GridTP project, called Hong GridTP, in the Shanghai Distributed Computing Center is divided into three steps, which we briefly introduce as follows: (1) The first step is to develop the Transaction Manager in the Common Resource Model layer. Up to now, we have implemented this module, called Hong Transaction, in our lab; it supports the XA interface and provides transparent data operations. (2) The second step is to develop GridTP services based on the Globus Toolkit 3.0 in the Data Virtualization Services layer. In this step, the TX portType has been defined and we are implementing it now. (3) The third step is to adopt GridTP services to support Grid applications such as the GridPortal project in our lab, which provides a single point of access to applications and information in a unified interface for Grid users.
4 Conclusions
This paper has made a preliminary, service-oriented proposal called GridTP to deal with Grid Transaction Processing. It is worth noting that we have only discussed how to integrate the traditional X/Open DTP model into Grid services; the two-phase commit protocol adopted by GridTP is, in general, a bit too strict for Grid-style applications, so it is necessary to extend this model to accommodate the wide variety of Grid applications. In the future version of Hong GridTP, we plan to improve it in the following respects: (1) supporting the standard WS-Transaction; and (2) conforming to the future Grid Data Service Specification [4]. GridTP can act as the Agreement Provider to support WS-Agreement; that is, GridTP can be integrated into Grid Data Services as a reference implementation of Grid Transaction Processing.
References
[1] N. W. Paton, M. P. Atkinson, V. Dialani, et al., Database Access and Integration Services on the Grid, U.K. National eScience Center, 2002.
[2] X/Open CAE Specification, X/Open Distributed Transaction Processing: Reference Model, Version 3 (ISBN: 1-85912-170-5, G504), February 1996.
[3] I. Foster, C. Kesselman, J. M. Nick, S. Tuecke, The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration, http://www.globus.org/ogsa, January 2002.
[4] M. Antonioletti, M. Atkinson, et al., Grid Data Service Specification, GGF Database Access and Integration Services Working Group, June 2003.
[5] X/Open CAE Specification, Distributed Transaction Processing: The TX (Transaction Demarcation) Specification (ISBN: 1-85912-094-6, C504), April 1995.
[6] X/Open CAE Specification, X/Open Distributed Transaction Processing: The XA Specification (ISBN: 1-872630-24-3, C193 or XO/CAE/91/300), December 1991.
FTPGrid: A New Paradigm for Distributed FTP System
Liutong Xu and Bo Ai
School of Computer Science and Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China [email protected]
Information System Division of China Unicom, Beijing 100032, China [email protected]
Abstract. FTP is one of the most important applications on the Internet. This paper introduces a new paradigm for distributed FTP systems called FTPGrid, which consists of a collection of FTP servers that work cooperatively and serve all FTP clients. FTPGrid adopts a client/grid architecture: FTP clients connect to one server and can access all resources in the grid. Key issues such as the resource directory and its synchronization, the access relay and resource caching mechanisms, and security are discussed in this paper.
1 Introduction
The Internet is dominating our daily life. There is a huge amount of resources on the Internet, including computer programs, computer data, and multimedia resources, and FTP is one of the best ways of sharing them. FTP defines a File Transfer Protocol that makes resource sharing easy, reliable, and efficient over the Internet. The traditional FTP adopts the so-called client/server architecture, which leads to two problems. First, you may not know where the resources you want are. Second, a heavy load on an FTP server will dramatically increase the server's response time and cause local traffic jams. One way to solve these problems is to establish a distributed FTP system consisting of a collection of geographically distributed FTP servers that work cooperatively. Such a distributed FTP system is called an FTPGrid in this paper. FTP clients connect to one local server and can access all the resources in the grid. In this paper we introduce FTPGrid and define its architecture and some of the mechanisms to be used in it.
2 FTPGrid Architecture
FTPGrid consists of a dynamic collection of geographically distributed FTP servers that work cooperatively as a single one. Figure 1 shows a typical scenario of an FTPGrid application.
Fig. 1. FTPGrid: A new paradigm for distributed FTP system
FTPGrid adopts a client/grid architecture. FTP clients only need to connect to one local server, and then they can access all resources of all FTP servers in the FTPGrid. If the resources are on the local server, the clients access them directly from the local server. If the resources are not on the local server, the local server accesses them from other servers in the grid on behalf of the clients; the local server can send the required data to the clients while it is still receiving the data from remote servers. The key techniques in FTPGrid include the synchronized resource directory and the access relay protocol. Some related techniques are:
- File Transfer Protocol [5]: It is still the key protocol in FTPGrid.
- LDAP [6]: Directory services are necessary for locating the resources in FTPGrid.
- Domain name system [3, 4]: The caching strategies of DNS will be used in the resource caching mechanism of FTPGrid.
- Grid technology [1]: The Globus project [2] is one of the most successful grid computing projects in the world, but practical grid applications are still lacking. The FTPGrid proposed here realizes coordinated resource sharing among a dynamic collection of FTP servers, though it is not built on the Globus platform.
The benefits of FTPGrid include:
- Compatibility: traditional FTP client software can access FTPGrid.
- Improved access speed: FTP servers are normally connected to a high-speed backbone, so an FTP client can access the resources in FTPGrid at high speed through a local access point.
- Efficient resource location by looking up the resource directory.
3 Key Issues in the Implementation of FTPGrid It is not necessary for every FTP server to keep a complete copy of all resources in the grid. Each server stores only its authority resources and those most frequently accessed by its clients. However, a complete directory of all resources stored on the geographically distributed FTP servers is necessary to present a unified resource view to all FTP clients. The following are the main issues in the implementation of FTPGrid.
Authority Resources. The resources in FTPGrid are divided into two categories: authority and non-authority. Resources of important or historical value are authority resources; they are kept in the archives and cannot be deleted from the server. Resources such as cached or temporary copies are non-authority and may be deleted according to local storage policies.
FTPGrid Directory Service. LDAP can be used for the directory service in FTPGrid. All resources in FTPGrid should be registered in the LDAP server. A directory entry may include: the resource name, description, file size, etc.; the resource locations (local and/or remote addresses in the grid); resource attributes, such as authority/non-authority; and the access control level.
Communication between LDAP and FTP Servers. The FTP server receives a client's request, queries the LDAP server, and sends the requested directory information or resources back to the client. The LDAP server is therefore transparent to FTP clients.
LDAP Server Deployment. In FTPGrid, an LDAP server and an FTP server may be hosted on the same machine, and LDAP servers may be deployed in two forms: each LDAP server serves one local FTP server only, or an LDAP server serves several geographically neighboring FTP servers.
Directory Synchronization and Caching. All directory servers must be synchronized to present a unified resource view of FTPGrid. Some resource attributes may differ between servers: a resource may be local to one server but remote to others, and authority on some servers but non-authority on others. A directory caching mechanism is also used to improve server response time.
Access Relay. If the resources a client requests are not on the local server, the server first looks them up in the local LDAP server, then accesses them on some remote server on behalf of the client, and finally sends the data back to the client. This access relay mechanism is the key extension to the FTP protocol in FTPGrid, and it too is transparent to FTP clients.
Resource Caching. Resources retrieved from remote servers are stored temporarily on the local server to speed up the response to possible subsequent requests. These temporary copies are marked as non-authority, meaning they may be deleted after a certain period of time. The period depends on how frequently the resource has been accessed recently and on the local server's storage limits.
Join and Leave of FTP Servers. In the dynamic environment of FTPGrid, it is common for servers to join or leave a grid. When a server joins an FTPGrid, the newcomer should first synchronize its directory server with the neighboring directory servers. When a server leaves, all the resources stored on it, including its authority resources, become unavailable to clients; an FTPGrid should therefore keep several copies of such authority resources on other servers, subject to certain policies.
Authentication and Authorization. A single sign-on mechanism is necessary for users to access the resources in FTPGrid. The simplest approach is a global user account whose authentication and authorization information is integrated into the LDAP server. Access between FTP servers should use a special user account carrying host information during the authentication phase.
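Taken together, the directory service and the access relay admit a compact sketch. The following Java fragment is only an illustration: the JNDI calls are standard, but the directory attribute names (resourceName, resourceLocation), the base DN and the response messages are placeholders of ours, not a schema defined by FTPGrid.

import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.*;

public class FtpGridResolverSketch {
    private final DirContext ldap;
    private final String localHost;

    public FtpGridResolverSketch(String ldapUrl, String localHost) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, ldapUrl);
        this.ldap = new InitialDirContext(env);
        this.localHost = localHost;
    }

    // Look a resource up in the grid directory; return the host that stores it.
    public String locate(String resourceName) throws NamingException {
        SearchControls sc = new SearchControls();
        sc.setSearchScope(SearchControls.SUBTREE_SCOPE);
        NamingEnumeration<SearchResult> hits =
            ldap.search("ou=resources", "(resourceName=" + resourceName + ")", sc);
        if (!hits.hasMore()) return null;                    // not registered anywhere
        Attributes attrs = hits.next().getAttributes();
        return (String) attrs.get("resourceLocation").get(); // placeholder attribute
    }

    // Serve locally if possible, otherwise relay on behalf of the client.
    public void handleRetr(String resourceName) throws NamingException {
        String host = locate(resourceName);
        if (host == null) {
            System.out.println("550 resource not found in the grid directory");
        } else if (host.equals(localHost)) {
            System.out.println("serving " + resourceName + " from local storage");
        } else {
            // Access relay: fetch from the remote server, stream to the client,
            // and keep a non-authority copy in the local resource cache.
            System.out.println("relaying " + resourceName + " from " + host);
        }
    }
}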
4 Conclusions This paper presents a brief description of FTPGrid. FTPGrid makes geographically distributed FTP servers work cooperatively while keeping the interface between clients and servers unchanged; only the server side is extended so that FTP servers can work together. FTPGrid lets FTP clients reach all resources in the grid at high speed through a local server. Another advantage is that it can radically reduce network traffic, because the same resources are no longer transferred back and forth across the Internet backbone.
References
1. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International J. Supercomputer Applications 15(3), 2001. http://www.globus.org/research/papers/anatomy.pdf
2. Globus Project. http://www.globus.org
3. Mockapetris, P.: Domain Names – Concepts and Facilities. IETF, RFC 1034, November 1987. http://www.ietf.org/rfc/rfc1034.txt
4. Mockapetris, P.: Domain Names – Implementation and Specification. IETF, RFC 1035, November 1987. http://www.ietf.org/rfc/rfc1035.txt
5. Postel, J., Reynolds, J.: File Transfer Protocol (FTP). IETF, RFC 959, October 1985. http://www.ietf.org/rfc/rfc959.txt
6. Wahl, M., Howes, T., Kille, S.: Lightweight Directory Access Protocol (v3). IETF, RFC 2251, July 1997. http://www.ietf.org/rfc/rfc2251.txt
Using Data Cube for Mining of Hybrid-Dimensional Association Rules
Zhi-jie Li1, Fei-xue Huang2, Dong-qing Zhou1, and Peng Zhang3
1 Department of Computer Science, Dalian Univ. of Technology, Liaoning, China, 116024; [email protected], [email protected]
2 Software School of Dalian Univ. of Technology; [email protected]
3 The Institute of Information and Decision Technology of Dalian Univ. of Technology; [email protected]
Abstract. In this paper, the mining of single-dimensional association rules and of non-repetitive-predicate multi-dimensional association rules is integrated. We propose an algorithm for mining hybrid-dimensional association rules using a data cube structure. Preliminary results show that the algorithm is efficient when the number of tuples is large.
1 Introduction Mining association rules in transactional or relational databases is an important task in data mining. Previous studies on mining multi-dimensional association rules focused on finding non-repetitive-predicate multi-dimensional rules. We integrate single-dimensional mining and non-repetitive-predicate multi-dimensional mining, and present a method for mining hybrid-dimensional association rules using a data cube. Our study is confined to single-variable rules.
2 Preliminaries Definition 1: A rule containing more than one distinct predicate is a multi-dimensional association rule. Definition 2: A rule in which all the predicates have distinct predicate names is called a non-repetitive-predicate multi-dimensional association rule. Definition 3: A rule in which some predicate appears more than once is called a hybrid-dimensional association rule. Our study explores rule mining using a data cube structure. An n-dimensional data cube is an n-D database. Each dimension Di of the cube represents an attribute and contains mi + 1 rows, where mi is the number of distinct values in dimension Di. The first mi rows are data rows; each distinct value of Di takes one data row. The last row, the sum row, stores the sums of the counts in the corresponding columns of the rows above it. A 3-D data cube is shown in Fig. 1.
Fig. 1. A 3-dimensional data cube
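As a concrete reading of this layout, the following minimal sketch keeps a 2-D count cube with a trailing sum row and sum column. The class and method names are ours, and the structure is simplified to counts only.

// A tiny 2-D count cube in the layout described above: for each dimension,
// one row per distinct value plus a trailing sum row holding column totals.
public class CountCube2D {
    private final int[][] cells;   // (m1 + 1) x (m2 + 1): data rows plus sums
    private final int m1, m2;

    public CountCube2D(int m1, int m2) {
        this.m1 = m1;
        this.m2 = m2;
        this.cells = new int[m1 + 1][m2 + 1];
    }

    // Record one tuple whose dimension-1 value has index i, dimension-2 index j.
    public void add(int i, int j) {
        cells[i][j]++;
        cells[m1][j]++;   // sum row: total count for dimension-2 value j
        cells[i][m2]++;   // sum column: total count for dimension-1 value i
        cells[m1][m2]++;  // grand total
    }

    // Support of the pair (i, j) relative to the total number of tuples.
    public double support(int i, int j) {
        return (double) cells[i][j] / cells[m1][m2];
    }

    // Support of a single dimension-1 value, read straight from the sum column.
    public double support1(int i) {
        return (double) cells[i][m2] / cells[m1][m2];
    }
}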
3 Methods for Mining Using Data Cubes We extend Apriori to deal with repetitive predicates and propose an algorithm for mining hybrid-dimensional association rules. The key to mining hybrid-dimensional association rules is to find large itemsets and large predicate sets at the same time. We first illustrate the method with an example, then give the corresponding algorithm. Example: Suppose the mining task is the one specified in Example 1, and the data cube is the one given in Fig. 1; let min_sup = 0.2 and min_conf = 0.8. First, find the large itemsets. In this example only the predicate buys satisfies this condition; its large itemsets are {cp,sf}, {cp,pt} and {mp3,pt}. Second, find the large predicate sets. Search the 1-D planes of the data cube to find the large 1-predicate sets. Then find the large 2-predicate sets by searching the 2-D space, using the candidate sets obtained by intersecting the large 1-D planes; intersecting two large 1-D planes, for instance, leads to six candidate pairs, which are checked against the 2-D space of the cube, and the remaining large 2-predicate sets are obtained the same way. Similarly, the large 3-predicate sets are derived. Since the number of predicates in the DMQL query is 3, the search for large predicate sets then terminates, and the large predicate sets are used to generate the strong rules. Based on this method, we present the algorithm for mining single-variable hybrid-dimensional association rules as follows:
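As a rough sketch of the core step under our reading of the method, candidate pairs are pruned by the large 1-predicate values taken from the cube's sum rows, and the surviving pairs are checked against the 2-D planes. The array layout and names below are assumptions for illustration, not the algorithm's actual data structures.

import java.util.ArrayList;
import java.util.List;

public class HybridMinerSketch {
    // counts1[d][v]: count of value v in dimension d (from the cube's sum rows).
    // counts2[d1][v1][d2][v2]: pair counts read from the 2-D planes of the cube.
    // This sketch assumes both arrays were filled while loading the cube.
    static List<int[]> largePairs(int[][] counts1, int[][][][] counts2,
                                  int total, double minSup) {
        List<int[]> result = new ArrayList<>();
        int dims = counts1.length;
        for (int d1 = 0; d1 < dims; d1++)
            for (int d2 = d1 + 1; d2 < dims; d2++)
                for (int v1 = 0; v1 < counts1[d1].length; v1++) {
                    if (counts1[d1][v1] < minSup * total) continue;  // prune by 1-sets
                    for (int v2 = 0; v2 < counts1[d2].length; v2++) {
                        if (counts1[d2][v2] < minSup * total) continue;
                        // candidate pair from intersecting two large 1-D planes
                        if (counts2[d1][v1][d2][v2] >= minSup * total)
                            result.add(new int[]{d1, v1, d2, v2});
                    }
                }
        return result;
    }
}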
4 Performance Study The algorithm was implemented and tested with a synthetic dataset on a PIII with 256 MB of main memory running Windows 2000. The data cube in the simulation contains 5 dimensions, with 10 distinct values per dimension; one predicate has 1000 items. The minimum support threshold was 20%. Fig. 2 shows the running time of the algorithm, tested on data cubes representing 100 to 10,000 generalized tuples, or cells. The curve shows that the algorithm has good performance.
Fig. 2. Performance with the growth of generalized tuples in the cube
5 Conclusions In this paper we have proposed a method for mining single-variable hybrid-dimensional association rules. An efficient method for mining multi-variable hybrid-dimensional association rules is an interesting topic for future research.
References
1. Chen, M.-S., et al.: Data Mining: An Overview from a Database Perspective. IEEE Transactions on Knowledge and Data Engineering 8(6), 866–883, 1996.
2. Piatetsky-Shapiro, G.: Discovery, Analysis, and Presentation of Strong Rules. In: Knowledge Discovery in Databases, 229–238, AAAI/MIT Press, 1991.
3. Agrawal, R., et al.: Mining Association Rules between Sets of Items in Large Databases. In: Proc. of the 1993 ACM SIGMOD Int'l Conf. on Management of Data, Washington, DC, 1993, 207–216.
4. Agrawal, R., Srikant, R.: Fast Algorithms for Mining Association Rules. In: Proc. of the 20th Int'l Conf. on Very Large Databases, Santiago, Chile, September 1994, 487–499.
Knowledge Sharing by Grid Technology Bangyong Liang, Juanzi Li, and Kehong Wang Dept. of Computer Science, Tsinghua University, Beijing 100084, China [email protected], [email protected]
Abstract. Nowadays, the distributed computing mode is more and more popular; a system regularly needs to distribute its data across different places, and knowledge base systems are among the systems that work on dispersed data. The web provides a ubiquitous medium for seamlessly integrating distributed applications, formats and contents, making it well suited for enterprise knowledge management. In this article we discuss a framework for knowledge sharing based on web services and grid technology. We also describe how other knowledge bases can be integrated into this framework to share their knowledge. Finally we present a prototype system based on this framework and discuss future work.
1 Introduction Knowledge base systems are usually closed systems. A closed system may have many desirable features, such as high performance and good security, but it also brings some inconvenience: when the system needs to be distributed, how can its parts exchange knowledge in the web environment? Fortunately, web service technology provides an easy way to exchange data automatically and semantically, and it is a core technology of the OGSA (Open Grid Service Architecture) model of grid computing. In this paper we describe our work on sharing knowledge by grid technology, especially by web service technology. We construct our knowledge base from stock annual reports obtained from the Shanghai Stock Exchange.
2 Framework The framework contains a few layers. Before discussing the framework, let us consider users' needs regarding knowledge sharing:
1. End users access all knowledge as if it were local.
2. End users get uniform knowledge from different knowledge sources.
3. Distributed systems exchange knowledge automatically and semantically.
We designed our system to fit these needs; the framework is as follows:
Fig. 1. The framework of the system
The knowledge storage can be implemented with a database or file systems; considering the management of large amounts of data, a database is preferred. The knowledge query service is implemented as a web service for remote access and also provides a local API for local access. The API layer provides a uniform API for applications, hiding the difference between local and remote access from end users. A scenario of the system is as follows:
Fig. 2. A scenario of the system
The scenario shows that a query can traverse distributed knowledge bases and gather knowledge from each of them. The following sections discuss the detailed design of each layer of the framework.
3 Design of the Layers The knowledge storage layer is built on the database. The knowledge base consists of concepts, instances of concepts, and relations between instances. The knowledge storage is made up of four tables in the relational database: the concept table, the instance table, the relation type table and the relation instance table. The concept table contains the classes of the reports. The instance table contains the instances of the classes. The relation type table holds the relations between concepts; for example, a report has a part named the basic part, which carries the company name and company location. The relation instance table holds the relations between instances. The knowledge service layer is the core layer of the framework. It provides local and remote access to the knowledge base. The local access service covers querying, adding, deleting and updating knowledge and is provided through a local API. The remote access service is the core of
knowledge sharing. The remote access interface is provided by web services, and the functional interface is as follows: QueryRelation(String firstArg, String secondArg, int type). The interface is simple yet powerful. The type parameter is important: using the type, we can identify what the query wants to retrieve and what the parameters mean. A specific type actually denotes a query template; we provide a template for each such search. Templates are described in XML format; a search template is as follows:
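The template below is purely illustrative: the element and attribute names are assumptions of ours, not the system's actual schema. The sketch shows how such a template, carried as XML, can be parsed to recover the type code and the relation it stands for.

import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import java.io.ByteArrayInputStream;

public class SearchTemplateSketch {
    // Illustrative only: element and attribute names are assumptions.
    static final String TEMPLATE =
        "<template type=\"2\" relation=\"hasEnglishName\" unknown=\"object\"/>";

    public static void main(String[] args) throws Exception {
        Element t = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(TEMPLATE.getBytes()))
            .getDocumentElement();
        // The type code selects this template; the template tells QueryRelation
        // which relation to match and which argument is the unknown.
        System.out.println("type " + t.getAttribute("type")
            + " matches relation " + t.getAttribute("relation")
            + ", unknown = " + t.getAttribute("unknown"));
    }
}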
The search template can easily be extended because it is described in XML: if we want to add a new feature to the template, we simply add attributes to it. For example, a search for "Mary hasParent ?" may return no results even though the relation "Mary hasFather Jack" is in the knowledge base, where the relation type "hasFather" is inherited from "hasParent" (that is, "hasFather" is a subclass of "hasParent"). We can enable subclass search of the knowledge base by adding new search templates. The knowledge access layer provides the API for local and remote access. The local access API supports not only knowledge base queries but also knowledge base management. The remote access API targets querying remote knowledge bases: it includes an API for discovering the knowledge bases in the same group by querying the registry server, and an API for querying knowledge in a remote knowledge base. The registry separates knowledge bases by domain; knowledge bases in the same domain are put into one group. Currently, the registry data structure is implemented with LDAP (Lightweight Directory Access Protocol). An LDAP server indexes all the data in its entries, and filters may be used to select just the person or group wanted and return just the desired information; LDAP is well suited for looking up services and devices on the Internet. In the prototype system we use OpenLDAP to build the LDAP server. The registry server also exposes its query and registration services as web services.
4 Knowledge Exchange The web service interface makes it easy to exchange knowledge automatically and semantically: knowledge bases exchange knowledge by querying each other. A sample query is QueryRelation(Company1, hasEnglishName, Find_Object), which returns (SHANDONG PESTICIDE INDUSTRY CO.,LTD). Such triples carry the knowledge that the querying knowledge base wants to obtain, and such queries can be scheduled by task programs. Other knowledge bases can be integrated into this framework: the only change needed is to provide the QueryRelation interface as a web service and register it with the registry server. This is also why we try to provide as few
interfaces as possible. The ease of integrating other knowledge bases increases the amount of knowledge available in the system.
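From the calling side, the exchange can be sketched as a plain XML-RPC invocation. Using the Apache XML-RPC client here is our own choice of library, and the endpoint URL and the integer value behind the Find_Object type code are placeholders.

import org.apache.xmlrpc.XmlRpcClient;
import java.util.Vector;

public class KnowledgeExchangeSketch {
    static final int FIND_OBJECT = 2;   // assumed integer encoding of "Find_Object"

    public static void main(String[] args) throws Exception {
        // The endpoint URL is a placeholder for a registered knowledge base.
        XmlRpcClient kb = new XmlRpcClient("http://kb.example.org/rpc");

        Vector<Object> params = new Vector<>();
        params.add("Company1");          // first argument of the triple
        params.add("hasEnglishName");    // relation name
        params.add(FIND_OBJECT);         // type: the object of the triple is unknown

        // QueryRelation(firstArg, secondArg, type) is the single shared interface;
        // the result is the missing member of the triple.
        Object result = kb.execute("QueryRelation", params);
        System.out.println("Company1 hasEnglishName " + result);
    }
}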
5 Implementation Based on the framework and design above, we implemented a prototype system. The data is extracted from the stock annual reports. We constructed several knowledge base systems and a registry server, and built a web site that lets users query one of the knowledge bases. In fact, a query to one knowledge base is a query to all knowledge bases in the same group; this distributed query is hidden from end users.
6 Conclusion and Future Work Web service and grid technology are key technologies for resource sharing in the future, and they can be applied to knowledge sharing. We have presented a method for sharing knowledge in the web environment using web services and grid technology. The semantic web is the trend for the next-generation web, and we want to provide knowledge services for semantic web software, including annotation applications, agents and others. We are also considering turning our web service into a semantic web service by describing it with DAML-S (DARPA Agent Markup Language-based Web Service Ontology).
References
1. McGuinness, D.L.: Conceptual Modeling for Distributed Ontology Environments. In: Proceedings of the Eighth International Conference on Conceptual Structures: Logical, Linguistic, and Computational Issues, Darmstadt, Germany, August 14–18, 2000.
2. Studer, R., Benjamins, R., Fensel, D.: Knowledge Engineering: Principles and Methods. IEEE Trans. on Data and Knowledge Eng., vol. 25, 1998.
3. Tiwana, A., Ramesh, B.: Integrating Knowledge on the Web. IEEE Internet Computing, May/June 2001.
4. Curbera, F., Duftler, M., Khalaf, R., Nagy, W., Mukhi, N., Weerawarana, S.: Unraveling the Web Services Web: An Introduction to SOAP, WSDL, and UDDI. IEEE Internet Computing, March/April 2002.
5. Czajkowski, K., Fitzgerald, S., Foster, I., Kesselman, C.: Grid Information Services for Distributed Resource Sharing. In: Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing (HPDC-10), IEEE Press, August 2001.
6. McIlraith, S.A., Son, T.C., Zeng, H.: Semantic Web Services. IEEE Intelligent Systems, March/April 2001.
A Security Access Control Mechanism for a Multi-layer Heterogeneous Storage Structure Shiguang Ju, Héctor J. Hernández, and Lan Zhang Department of Computer Science, Texas Tech University, TX, USA {ju, hector, zhangl}@cs.ttu.edu
Abstract. This paper introduces a general multi-layer storage structure into which object instances of geographic surface objects are transferred and stored. We introduce new access control rules that modify the BLP model so that an owner can process, without restriction, an object it created. When deciding whether to execute an object method, the system computes, from the object instance, the security level and the direction of information flow. Keywords: spatial databases, object-relational data structure, discretionary access control, mandatory access control.
1 Introduction Based on the relational model, the security technology of multi-level access control has matured in theory and in practice. A spatial database, however, contains not only relational data but also object-relational data, declared as point, line, area, and volume data types. The structural differences among heterogeneous object methods make data access control more complex [1]. Moreover, a subject sometimes needs to process the data of an object it created that has a higher security level; because the traditional BLP model enforces upwards-writing and downwards-reading operations, such a subject cannot keep adding data to higher layers [2]. We design a multi-level security model for a multi-layer heterogeneous spatial data structure. Based on this storage structure, the paper describes how to implement access control in spatial databases. First, we introduce a multi-layer general storage structure into which object instances of geographic surface entities are transferred and stored. Then we modify the BLP model with new rules so that an owner can process, without any restrictions, an object it created. When deciding whether to execute an object method, the system computes, from the object instance, the security level and the direction of information flow.
2 Spatial Data Multi-layer Heterogeneous Storage Structure The object part of our spatial DBMS maintains the information of geographic surface entities. For example, an area may include rivers, houses, streets, oil wells, etc.; the map of the area thus has many classes of objects, and each class has many object instances. If an attribute of an object is another object class, we call it a nested object. It is difficult to apply traditional access control methods directly to heterogeneous nested objects [5]; the objects must first be transferred into a suitable structure, after which discretionary or mandatory access control can be applied to them. We design a two-layer storage mechanism, as shown in Fig. 1. The first layer, the relational information layer, mainly maintains the metadata of the spatial DBMS; a B-tree can be used as its index mechanism. The second layer, the spatial data layer, stores the coordinates of all geographic surface entities, graphics and images.
Fig. 1. Two-layer storage mechanism
On the relational layer, the leaf nodes of the B-tree contain file pointers, which point to the corresponding structure description files on the second layer. On the spatial layer, different objects have different data models. For example, the entity "river" has a data field represented as a series of point coordinates (x, y, z), while the entity "oil well" is represented as a single point coordinate (x, y, z) plus its depth; the two entities have different attributes and different lengths. On the spatial layer we declare two types of files, named with the postfixes ".def" and ".gra". A group of such files is divided into three levels: one stores object class information, the second stores object instance information, and the third stores pure data, as shown in Fig. 2. For all geographic entities, we transfer their different data styles into the general data structure of Fig. 2. To implement multi-layer mandatory or discretionary access control, we add security labels not only to the
first metadata layer but also to the second, spatial data layer, where the labels are applied to level 1 and level 2. Each of the "classes" and "instances" layers must be assigned a security label. To satisfy the integrity requirements, an object's security level must be higher than its ancestors' security levels [3,4].
Fig. 2. Three-level data structure for a group of spatial data files
To preserve the consistency of objects in the model and the integrity of the security labels, the security levels must meet the following requirements: any security level of an object must dominate that of its ancestors, and any extension of a security level must dominate that of its intensions. If we apply the BLP model directly in this mechanism, then whenever an object's security level cannot dominate its ancestors' security levels, a subject S with a higher security level cannot process that object, as shown in Fig. 3. Object-relational data may have many layers, and each layer's security level is higher than its predecessor's, so applying the BLP model directly brings problems. In the map example above, if a subject M wants to complete the drawing of the map, it must continually add object types and object instances to the MAP files. According to the BLP model, a safe access must satisfy the dominance relation between the subject's and the object's security levels; satisfying it here would force all objects to have the same security level, so mandatory access control would lose its meaning. Otherwise, the subject M does not even have the right to read the file map_b.def, which was created by M itself. Furthermore, the spatial entities' dependence and spatial heterogeneity leave the subject M unable to access the information of objects it created. These problems cannot be resolved by the traditional BLP model.
Fig. 3. An object and its ancestors’ security level relationship
3 A Security Access Control Mechanism Because different objects have different data structures, calling their methods differs greatly. For the RIVER example, we draw a curve by connecting consecutive points; for the OIL_WELL example, we obtain a graph by drawing the intersection of a cross and a circle symbol. The heterogeneity of entities such as RIVER and OIL_WELL thus gives different objects different operation methods, so mandatory access control cannot be uniform. Consider, however, the object instances of an object type, which store the attributes of the corresponding instances: for attributes belonging to the same object instance, the operation methods are the same. The more convenient approach is therefore to classify object data operations by the direction in which information moves. One class is the upwards-writing operations, which include the insert, update, and delete functions; the other is the downwards-reading operations, which include read, draw, and so on. Different object instances call different object methods, so we tie a method's security to its object instance's security level. By comparing object instances' security levels and the direction of information movement, a subject decides whether to call an object method using one of the following rules: if a subject's security level dominates the object's, the subject may call a downwards-reading operation; if a subject's security level is dominated by the object's, the subject may call an upwards-writing operation.
4 The Security of the Modified Model
The key to resolving the above problem is to let a subject access object information at higher security levels when the object was created by that subject. We modify the access control rules of the BLP model and introduce the following new rules:
1. The owner may access an object created by itself without restriction.
2. The owner may appoint an object created by itself as an ancestor, in order to add data at a higher level.
3. When the owner writes information that it read from an object it created, the security level of the written object must dominate the security level of the object that was the information source.
4. All non-owner accesses, and all other cases, must satisfy the BLP access control rules.
To implement a secure spatial DBMS, we combine the owner's privilege with security level comparison; this meets the access control requirements of multi-layer heterogeneous object data. To make the access control sound, we also specify some restrictions, among them: no other medium may let two subjects exchange database information without obeying the access control rules; the owner may not keep duplicates of an object's information, already known to the owner, by any other means; and no data operation may reduce an object's security level. Assume S is a subject and O an object created and owned by S, so that S knows all the information of object O. When S accesses O's information by rule 1, no information is disclosed. Now assume S1 and S2 are not owners and have the same security level. By rule 4, S1 and S2 have no permission to process an object with a higher security level, so they cannot process object O or its duplicates. Moreover, even though S1 and S2 share a security level, they cannot transmit object O's information through the owner S. This prevents the downwards information flow that illegal communication among owners could otherwise cause, as shown in Fig. 4.
Fig. 4. Modified BLP model read-write rights
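A compact sketch of the modified decision procedure follows, combining the direction-based rules of Section 3 with the owner rules above. Reducing the security lattice to integer levels is a simplification of ours.

// Decision sketch for the modified BLP rules; levels are simplified to ints.
public class AccessDecisionSketch {
    enum Op { DOWNWARD_READ, UPWARD_WRITE }  // read/draw vs. insert/update/delete

    static boolean permitted(int subjectLevel, int objectLevel,
                             boolean subjectOwnsObject, Op op) {
        // Rule 1: the owner may access an object it created without restriction.
        if (subjectOwnsObject) return true;
        // Rule 4 (non-owners): plain BLP by information-flow direction.
        switch (op) {
            case DOWNWARD_READ: return subjectLevel >= objectLevel; // no read-up
            case UPWARD_WRITE:  return subjectLevel <= objectLevel; // no write-down
            default:            return false;
        }
    }

    // Rule 3: a copy written from owner-read data must dominate the source.
    static boolean copyLevelOk(int copyLevel, int sourceLevel) {
        return copyLevel >= sourceLevel;
    }
}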
As rule 3 requires, a duplicate's security level must dominate object O's original security level. Even if the owner S makes a duplicate of object O, S1 and S2 still have no permission to access the duplicate's information, because the duplicate's security level dominates that of the source of the object's information. This keeps the data secure.
5 Conclusion Based on a discretionary spatial database system, we have implemented the above security protection mechanism for spatial information. The model has the following advantages. To meet spatial database access control requirements, heterogeneous object information is transferred into standard spatial data records using a multi-layer storage structure. Given the heterogeneous nature of spatial objects, we combine the method security level with the object instance security level, and decide method invocation according to the direction of the method's operation. The model increases the owner's access privilege: because the owner can appoint an object's ancestors in order to insert new data, it solves the problem of the continuous increase of security levels in multi-layer objects. We use the traditional access control rules to implement mandatory access control on each object layer, which prevents users from bypassing low-level access control to reach high-level object information directly. The model is also applicable to other object-relational databases based on multi-layer storage structures that must meet multi-level security requirements. In the future we will focus on increasing the efficiency of multi-level security models and on capturing covert channels.
Acknowledgements. This work was supported by the National Natural Science Foundation of China (No. 60373069) and the Jiangsu Natural Science Foundation (No. BK200204).
References
1. Ju, S.: The Storage Mechanism of Spatial Database. Systematic Engineering and Electronic Technology, 1999(6), 62–65.
2. Ferraiolo, D.F., Sandhu, R., Gavrila, S., Kuhn, D.R., Chandramouli, R.: A Proposed NIST Standard for Role-Based Access Control. ACM Transactions on Information and System Security, August 2001.
3. Sandhu, R., Munawer, Q.: How to Do Discretionary Access Control Using Roles. In: Proceedings of the ACM Workshop on Role-Based Access Control, Fairfax, Virginia, October 22–23, 1998.
4. Sandhu, R., Chen, F.: The Multilevel Relational (MLR) Data Model. ACM Transactions on Information and System Security, November 1998.
5. LaPadula, L.J.: Formal Modeling in a Generalized Framework for Access Control. In: Proc. of the IEEE Computer Security Foundations Workshop III, Los Alamitos, CA, 1990, 100–109.
Investigating the Role of Handheld Devices in the Accomplishment of Grid-Enabled Analysis Environment
Ashiq Anjum1, Arshad Ali1, Tahir Azim1, Ahsan Ikram1, Julian J. Bunn2, Harvey B. Newman3, Conrad Steenberg3, and Michael Thomas3
1 National University of Sciences and Technology, Rawalpindi, Pakistan; {ashiq.anjum, arshad.ali, tahir, ahsan.ikram}@niit.edu.pk
2 California Institute of Technology (Caltech), Pasadena, CA 91125, USA; [email protected]
3 California Institute of Technology (Caltech), Pasadena, CA 91125, USA; {Newman, Conrad, Thomas}@hep.caltech.edu
Abstract. We investigate the role of handheld devices as a potential platform for use in the Grid-Enabled Analysis Environment (GAE) by porting desktop PC-based analysis software to run on Pocket PCs and other handhelds. This will enable them to be used for the analysis of data from the Compact Muon Solenoid (CMS), which goes online in 2006 at the European Organization for Nuclear Research (CERN). The environment currently comprises client software that runs on the Pocket PC, providing interactive analysis features on the device, and a remote data server named Clarens, which functions as a portal to the Grid and ensures secure, authenticated access to the CMS data.
1 Introduction The CMS (Compact Muon Solenoid) [1] at CERN, going online in 2006, will use the Grid to store the gigabytes of data it will generate each minute. This data can only be analyzed by rendering it in the form of 2D and 3D diagrams that let scientists draw conclusions about events taking place in the CMS. Our research aims to harness the technology of handheld devices to analyze this event data stored on servers connected to the Grid. This paper describes an analysis environment built by porting popular desktop PC-based physics analysis software, including the Java Analysis Studio (JAS) [2] and the WWW Interactive Remote Event Display (WIRED) [3], to the PersonalJava environment on the Pocket PC with WinCE 3.1. A portal to the Grid is provided by the Clarens server [4], developed at the California Institute of Technology (Caltech); Clarens also provides Globus Security Infrastructure (GSI)-based [5] authentication. Either wireless or wired network connections to the Pocket PC are possible through an appropriate 802.11b-compatible plug-in card.
2 Related Work Currently there are no physics analysis applications available for the Pocket PC and other handheld devices. The main obstacles include the slow and unreliable nature of wireless connections, and handhelds' slow processors (typically 200 to 400 MHz), limited RAM and small permanent storage (usually 32 MB). A large number of desktop-based physics analysis applications are available, however, including JAS, WIRED, ROOT, GEANT and IGUANA. JAS, developed at the Stanford Linear Accelerator Center (SLAC), is used mainly for the analysis of 1D and 2D histogram data from particle accelerators; it can fit mathematical functions to a data histogram and display various statistics about the data. WIRED, developed at CERN and SLAC, renders event data and subcomponent geometry information from particle accelerator experiments; its file format, HepRep, represents event data in XML. Finally, ROOT (developed at CERN) [6] is an important tool for us because of its special format: ROOT files contain data objects in a highly efficient, quickly accessible hierarchical structure.
3 Integration Architecture of the Analysis Environment with the Grid The analysis environment is integrated with the Grid through a Grid-enabled portal developed at Caltech, named Clarens. Clarens is essentially a remote data server acting as a portal to the Grid. It provides secure access to data files through a GSI-based security protocol: users wishing to access data must authenticate themselves with Clarens before using any of its services. Once logged in, users can access the various services using XML-RPC. We are now implementing our own Java-based version of Clarens, which can integrate more effectively into this architecture. The Java-based Clarens (JClarens) will use a peer-to-peer platform for load sharing and fault tolerance, and will thus provide better performance over unreliable wireless connections.
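From the device side, the interaction with Clarens can be sketched as a pair of XML-RPC calls. Note that the method names (system.auth, file.read), the endpoint URL and the proxy-certificate helper below are placeholders of ours, not Clarens's published API.

import org.apache.xmlrpc.XmlRpcClient;
import java.util.Vector;

public class ClarensClientSketch {
    public static void main(String[] args) throws Exception {
        // URL and method names are placeholders, not Clarens's actual API.
        XmlRpcClient clarens = new XmlRpcClient("http://clarens.example.org/rpc");

        // 1. Authenticate: present the user's GSI proxy credential, get a session.
        Vector<Object> login = new Vector<>();
        login.add(readProxyCertificate());                 // hypothetical helper
        Object session = clarens.execute("system.auth", login);

        // 2. Fetch part of a ROOT file for rendering on the handheld.
        Vector<Object> read = new Vector<>();
        read.add(session);
        read.add("/cms/events/run42.root");                // placeholder path
        byte[] chunk = (byte[]) clarens.execute("file.read", read);
        System.out.println("received " + chunk.length + " bytes");
    }

    static String readProxyCertificate() { return "...proxy..."; }  // stub
}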
4 JASOnPDA JASOnPDA is the main Pocket PC-based application developed so far. It communicates as a client with the Clarens server, which enables it to fetch histogram data as ROOT files from the server, then manipulate and render the data on the Pocket PC with the stylus. Users can also view statistics and fit functions against the histograms.
Fig. 1. A snapshot of JASOnPDA running on a PDA taken through the Remote Display Control Host, showing its features of histogram plotting, function fitting, and statistics calculation
The most important issue to resolve was the incompatibility of some critical classes with PersonalJava on WinCE. First, the FreeHEP [7] classes for reading ROOT files, known as RootIO, used dynamic proxies, which are unavailable in PersonalJava; as a result, we wrote entirely our own code for carrying out RootIO. Second, the encryption/decryption features needed for authentication with Clarens were also incompatible; to solve this, we used cryptography classes from BouncyCastle [8]. These classes made our application "Grid-enabled": able to access the Grid, with Clarens providing the necessary security mechanisms. The first iteration of the application did not, however, meet the required standards of performance. We enhanced its speed by optimizing the parsing of ROOT files to fully exploit their indexed structure, and by using background threads to speed up the tree structure display and to populate hash tables with data from the histogram objects.
5 WiredOnPDA WiredOnPDA is a reduced version of WIRED using PersonalJava as the VM on WinCE. Its basic interfacing with Clarens follows the same method as JASOnPDA. The HepRep2 XML is parsed using the Piccolo parser, the fastest SAX parser we have found so far. The parsed data is used to extract the "drawables" stored in the HepRep2 files, which are then displayed on the screen. Transformations and projections can also be applied to the event displays.
The incompatibility with PersonalJava was resolved by replacing the incompatible code in the WIRED 3 classes with code from the older WIRED 1. This especially concerned the code for displaying the drawables, which was initially based on Java2D; we replaced it with the Graphics classes of Java 1.1, which gave surprisingly good results in displaying the drawables.
Fig. 2. A view of WiredOnPDA displaying an event from a HepRep2 file. The buttons on top allow users to translate, rotate or scale the diagram
6 Selection/Evaluation of Tools and Technologies for Handhelds An important problem was finding a suitable JVM for running our Java applications on the Pocket PCs. After evaluating several technologies, including IBM Device Developer, SuperWaba, Savaje, and MIDP, we finally chose the PersonalJava Runtime Environment by Insignia (Jeode) [9], which supports Java 1.1.6 as well as several security and collection classes from Java 2.
7 Conclusion Our current work already proves that resource-constrained devices such as the Pocket PC can be integrated with the Grid and can play a vital role in realizing the idea of a Grid-Enabled Analysis Environment (GAE). The completion of this project will be a milestone toward a level of maturity in Pocket PC-based applications that has so far been seen only in desktop applications.
References
1. Compact Muon Solenoid Outreach Activities. http://cmsinfo.cern.ch/
2. Java Analysis Studio. http://jas.freehep.org/
3. WWW Interactive Remote Event Display. http://wired.freehep.org/
4. Clarens Grid-enabled Web Services Framework. http://clarens.sourceforge.net/
5. Welch, V., Siebenlist, F., Foster, I., Bresnahan, J., Czajkowski, K., Gawor, J., Kesselman, C., Meder, S., Pearlman, L., Tuecke, S.: Security for Grid Services. In: Twelfth International Symposium on High Performance Distributed Computing (HPDC-12), IEEE Press, June 2003.
6. The ROOT System. http://root.cern.ch
7. The FreeHEP Library. http://java.freehep.org/
8. Legion of the Bouncy Castle. http://www.bouncycastle.org
9. Insignia Jeode Virtual Machine for Handhelds. http://www.insignia.com/
A TMO-Based Object Group Model to Structuring Replicated Real-Time Objects for Distributed Real-Time Applications Chang-Sun Shin, Su-Chong Joo, and Young-Sik Jeong School of Electrical, Electronic and Information Engineering, Wonkwang University, Korea {csshin,scjoo,ysjeong}@wonkwang.ac.kr
Abstract. We designed the TMO-based Object Group (TMOOG) model with various services for supporting distributed real-time applications, and analyzed whether the given service strategies can be adopted and made to work in this model. In this paper, we are interested in an object group model that presents a single logical view of the system, providing replication transparency and supporting real-time services. The TMOOG model is based on the object group concept recommended by the TINA-C and the OMG. Our model consists of a set of computational Time-triggered and Message-triggered Objects (TMOs) for real-time services, together with several management objects that manage these TMOs and maintain their information, such as repositories, for executing the given distributed real-time applications. The TMOs in an object group may act as copies, called replicas, of a replicated object with the same role. For distributed real-time applications we focus on the following strategies: a dynamic object selection and binding strategy to support replication transparency, and a real-time strategy to support real-time applications. To support these strategies we add the Dynamic Binder object, the Scheduler object and some related objects to our model; the strategies can be implemented flexibly as object implementations using appropriate algorithms. Our model is designed on Commercial Off-The-Shelf (COTS) middleware without requiring a special Object Request Broker (ORB) or operating system. Finally, from the numerical results, we verify that our model supports these strategies and show the feasibility of the service strategies adopted in the TMOOG model.
1 Introduction and Related Works Modern distributed systems have been physically extended toward wide-area distributed real-time object computing environments [1]. Distributed services have to present ubiquitous systems as a logically single system rather than a physical one; hence, the physical distributed environment must be logically reconfigured into appropriate service-dependent groups supporting specific applications with real-time services, replicated resource services, and/or object-oriented services and so on. In this paper, we are interested in the object group, as a logical application reconfiguration,
for using replicated resources such as server objects with real-time properties. Among the representative research efforts in this area, the Telecommunications Information Networking Architecture Consortium (TINA-C) defined TINA [2]. In TINA, distributed applications can be logically configured into groups as units of associated objects on distributed systems; however, TINA defined only the management of object groups in terms of distributed functional components and has not yet defined real-time services in a distributed environment. Later, the Real-Time Special Interest Group (RT-SIG), organized by the Object Management Group (OMG), proposed the Real-Time Common Object Request Broker Architecture (RT-CORBA), which adds real-time extensibility to the CORBA specification [3]. But by extending the ORB itself, the core of CORBA, it made distributed real-time environments depend on special systems and/or operating systems for real-time services. Following the TINA and CORBA specifications, we have researched the Real-Time Object Group (RTOG) model [4,5]. The RTOG model is a framework supporting distributed real-time services and object group management based on TINA's object group concept. The objects in the RTOG use Time-triggered and Message-triggered Objects (TMOs) in practice. The TMO, developed by the DREAM Laboratory at the University of California, Irvine (UCI) [6], is defined as an object carrying real-time properties itself; to provide real-time service, we adopted the TMO scheme in the RTOG model. The RTOG, however, cannot support a dynamic object selection and binding service for replicated TMOs, so we extended the existing RTOG model into the TMO Object Group (TMOOG) model with the services mentioned above. With this TMOOG model, we define the concepts of the TMO and the structure of our model. We also design the functions and interactions of the group components, such as the Dynamic Binder object and the Scheduler object, which implement the dynamic object selection and binding strategy and the real-time strategy. To verify the correct execution of the model, these strategies are implemented in the Dynamic Binder object and the Scheduler object, respectively, using well-known algorithms: the binding priority algorithm for the dynamic object selection and binding service, and the Earliest Deadline First (EDF) algorithm for the real-time scheduling service. Finally, from the numerical results we analyzed, we show that the TMOOG model can support the dynamic object selection and binding service for TMOs requested by clients, as well as the real-time scheduling service on the selected TMO. The paper is organized as follows: Section 2 gives overviews of the TMO scheme and the TMOOG model; Sections 3 and 4 explain the dynamic object selection and binding service and the global real-time scheduling service; Section 5 presents our conclusions and future work.
2 TMO-Based Object Group Model The TMOOG model supports a distributed object management service and a real-time scheduling service: it manages TMOs as a single logical group on COTS middleware, following TINA's object group concepts, and defines the real-time requirements for interaction among the TMOs. In this section we explain the TMO scheme and the structure of the TMOOG, and describe the functions of the components of this model and their interactions for services.
2.1 TMO Scheme The TMO is a real-time service object that carries real-time constraints itself. The TMO scheme is a syntactically simple and natural, yet semantically powerful, extension of conventional object structuring approaches. An ordinary service object cannot define real-time constraints in its own data structure; a TMO can, for instance as a Spontaneous Method (SpM) that is triggered spontaneously at defined absolute times, which clearly separates it from conventional objects. Figure 1 shows the structure of the TMO scheme, whose components, in four sections, are its own name, the ODS, the EAC, the AAC, the SpMs, and the SvMs. As stated in [6], the roles of these components are as follows.
Fig. 1. Structure of TMO Scheme
Object Data Store (ODS): a common object store holding the properties and states of the TMO, accessed by the SpMs and the SvMs, but never by both simultaneously; that is, the TMO is governed by the Basic Concurrency Constraint (BCC).
Environment Access Capability (EAC): a list of gates to objects providing efficient call paths to remote object methods, logical communication channels, and I/O device interfaces.
Autonomous Activation Condition (AAC): an activation condition for an SpM, defining the time window for the execution of that SpM.
Spontaneous Method (SpM): a time-triggered method that runs periodically in real time.
Service Method (SvM): a message-triggered method that responds to external service requests.
The TMO scheme is the subject of lively research in real-time simulation, for example in military and transportation applications. However, the TMO scheme has not dealt with checking access rights for the security of TMOs, with selecting and binding replicated TMOs that share the same service property, or with globally scheduling several TMOs in a given object group. We intend to solve these problems through the TMOOG model we propose.
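To make the scheme concrete, the following sketch mimics a TMO in plain Java: the ODS is a guarded shared store, a scheduled task plays the role of the AAC-driven SpM, and a public method plays the SvM. It uses standard library threads and is not the actual TMO Support Library API.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// A TMO-style object sketched with standard Java threads (not the TMO SL API).
public class SensorTMO {
    // ODS: shared state, guarded so the SpM and SvM never run against it
    // concurrently (a coarse stand-in for the Basic Concurrency Constraint).
    private final Object odsLock = new Object();
    private double lastReading = 0.0;

    private final ScheduledExecutorService clock =
        Executors.newSingleThreadScheduledExecutor();

    // AAC: trigger the SpM every 500 ms, emulating time-triggered activation.
    public void start() {
        clock.scheduleAtFixedRate(this::spmSample, 0, 500, TimeUnit.MILLISECONDS);
    }

    // SpM: runs periodically on the time trigger and updates the ODS.
    private void spmSample() {
        synchronized (odsLock) {
            lastReading = Math.random();  // stand-in for a device read via EAC
        }
    }

    // SvM: runs on an external service request and reads the ODS.
    public double svmGetReading() {
        synchronized (odsLock) {
            return lastReading;
        }
    }
}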
2.2 Structure of the TMO-Based Object Group Model The TMOOG model applies the object group concept on COTS middleware. The major roles of its components fall into two kinds of services: the object management service, which governs the selection of and binding among objects and realizes the dynamic object selection and binding strategy, and the real-time scheduling service, which supports distributed global real-time scheduling in our model. In designing the structure of this model, we considered the following requirements. 1) The TMOOG is defined as a unit of logical object group; all TMOs, within or across groups, communicate with each other via COTS middleware such as CORBA, and the TMOs reside on individual systems connected by physical networks within a given logical domain. 2) An object group may contain replicated or non-replicated TMOs, where replicated objects are two or more objects with the same service property. For the object management services, our model contains the Group Manager (GM) object, the Security object, the Information Repository object, and the Dynamic Binder object; for the real-time scheduling services, it contains the TMOs, the Real-Time Manager (RTM) objects, and the Scheduler objects. Depending on their names or properties, TMOs may be replicated within an object group. Our model may also nest Sub-TMO Object Groups inside its own object group; nesting allows encapsulation and hierarchical organization. Figure 2 shows the structure of the TMOOG model.
Fig. 2. Structure of TMO-based Object Group Model
3 Dynamic Object Selection and Binding Service Strategy
Let us consider the modular functionality and execution procedure of the object management service for dynamic object selection and binding. First, a client requests the reference of a desired TMO from the Group Manager (GM) object. The GM object is responsible for managing all objects contained in an object group and for
returning the reference of the requested TMO to the client at the final stage. The GM object first checks the client's access rights: the Security object checks the access rights for the requested object against the Access Control List (ACL). Next, the GM object sends the client's information to the Information Repository object to obtain the binding information. The Information Repository object stores information about all TMOs in the object group; this information is an object list with attributes such as service names and object references. The Information Repository object then sends the references of the replicated TMOs to the Dynamic Binder object so that the reference of an appropriate object can be obtained. The Dynamic Binder object, implemented with a given algorithm, selects an appropriate object after consulting each system's load information; here we show the Dynamic Binder object selecting one of the replicated TMOs using the binding priority algorithm. Finally, the GM object receives the selected TMO's reference from the Information Repository object and returns it to the client, which is then bound to the TMO through that reference. In this section we do not consider the selection and binding strategy for non-replicated TMOs, because it is trivial; we explain the strategy from the viewpoint of replicated TMOs.
3.1 Dynamic Object Selection and Binding Strategy for Replicated TMOs When two or more TMOs with the same property, called replicated objects, exist in an object group as the objects requested by a client, one of them must be bound to the client for the service. We suggest an algorithm, called the binding priority algorithm, and adopt it in our TMOOG model. This algorithm, implemented in the Dynamic Binder object, calculates the binding priority of each replicated TMO from input parameters such as the load of the system each TMO is located on and the request deadline of the client's task. With different numbers and types of input parameters, the algorithm can be exchanged for other
algorithms. In this section, we use the binding priority algorithm to verify that the Dynamic Binder object in our model works correctly, not to show improved performance of the model. Let us consider the calculation of the binding priority. The binding priority for a client's request is calculated by modifying the following expression (1), adapted from [7]; for simplicity we do not count the communication cost among systems as workload, because this cost can be ignored or varies with the given network type.
where request_deadline is the client's request deadline, CPU_utilization is the CPU utilization ratio of the host system, and c is a rate constant (0.01). The binding priority algorithm, shown in Figure 3, is divided into two parts: the first calculates the binding priorities of the replicated TMOs using expression (1); the second selects and binds one of the replicated TMOs using the binding priorities just calculated.
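The selection loop of the Dynamic Binder object can be sketched as follows. Since we do not reproduce expression (1) itself here, the body of bindingPriority below is only an illustrative stand-in combining the same inputs (request slack, CPU utilization and queue load), not expression (1).

import java.util.List;

public class DynamicBinderSketch {
    // One replica of a TMO together with its host's load information.
    static class Replica {
        String reference;        // object reference returned to the client
        double cpuUtilization;   // CPU utilization ratio of the replica's host
        int queueLength;         // requests already waiting in its ready queue
        Replica(String r, double u, int q) {
            reference = r; cpuUtilization = u; queueLength = q;
        }
    }

    static final double C = 0.01;   // rate constant from the text

    // Stand-in for expression (1): combines the same inputs, but this exact
    // form is illustrative only.
    static double bindingPriority(long requestDeadlineMs, long invocationTimeMs,
                                  Replica r) {
        double slackSec = (requestDeadlineMs - invocationTimeMs) / 1000.0;
        return C * slackSec / (r.cpuUtilization * (1 + r.queueLength));
    }

    // Select the replica with the highest binding priority for this request.
    static Replica select(List<Replica> replicas, long deadline, long invokedAt) {
        Replica best = null;
        double bestBp = Double.NEGATIVE_INFINITY;
        for (Replica r : replicas) {
            double bp = bindingPriority(deadline, invokedAt, r);
            if (bp > bestBp) { bestBp = bp; best = r; }  // highest priority wins
        }
        return best;
    }
}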
Fig. 3. The Binding Priority Algorithm
As an example, consider the binding priority algorithm when replicated TMOs (TMO1 and TMO2, replicas with the same property) exist in a TMOOG. When a client requests the replicated TMOs' service, either TMO1 or TMO2 could be bound to the client object; with this algorithm, we explain how an appropriate object is selected and bound. In the initial state, each TMO's ready queue managed by the Dynamic Binder object is empty. Let the CPU utilization of the systems hosting TMO1 and TMO2 be 10% and 11% respectively, with tasks periodic every 1 sec. When client1 (c1)'s request with Request Deadline (RD) 10:00:09 and Client's Invocation Time (CIT) 09:00:59 reaches the GM object at 10:00:00, the Dynamic Binder object
returns TMO1's reference, since the CPU utilization of TMO1's host is lower than TMO2's. Then, when client2 (c2)'s service request with binding information (RD: 10:00:11, CIT: 10:00:00) arrives at the GM object at 10:00:01, the Dynamic Binder object inserts c2's status information into each TMO's ready queue and calculates the binding priorities. As a result, the Dynamic Binder object selects TMO2's reference, since TMO2's binding priority (0.1909) is higher than TMO1's (0.1526), and deletes c2's status information from TMO1's ready queue. Thus c1 and c2 are bound to TMO1 and TMO2, respectively. Assuming that clients' requests continue to arrive at the GM object for the replicated TMOs, the binding priorities are calculated as shown in Table 1 below.
Table 1 shows each TMO's ready queue storing the status information of the 8 clients according to the incoming requests; BP denotes the result obtained from the binding priority algorithm. From Table 1, c1, c3, c5, c6 and c8 are bound to TMO1, and c2, c4 and c7 are bound to TMO2.
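The Dynamic Binder's behaviour in this example can be sketched as follows. Since expression (1) is given above only through its parameters, binding_priority() is a hypothetical stand-in that merely encodes the observable behaviour (more slack before the request deadline and lower CPU utilization yield a higher priority); it is not the paper's exact formula.

from dataclasses import dataclass, field

@dataclass
class Replica:
    name: str
    cpu_utilization: float          # utilization ratio of the hosting system
    ready_queue: list = field(default_factory=list)

C = 0.01                            # rate constant c of expression (1)

def binding_priority(replica, request_deadline, invocation_time):
    slack = request_deadline - invocation_time      # remaining time budget
    # Stand-in for expression (1): more slack, lower load => higher priority.
    return C * slack * (1.0 - replica.cpu_utilization)

def select_replica(replicas, client, request_deadline, invocation_time):
    # Tentatively enqueue the client on every replica, pick the replica with
    # the highest binding priority, then dequeue the client from the others,
    # mirroring how c2 is removed from TMO1's ready queue above.
    for r in replicas:
        r.ready_queue.append(client)
    best = max(replicas, key=lambda r: binding_priority(
        r, request_deadline, invocation_time))
    for r in replicas:
        if r is not best:
            r.ready_queue.remove(client)
    return best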
4 Real-Time Service Strategy

For supporting real-time services, the TMOOG model basically consists of the Real-Time Manager (RTM) object and the Scheduler object. The modular functionalities and execution procedures for the real-time service are as follows. In the first step, a client binds to the selected TMO with Real-time Information (RI: Client_Name, CIT, RD); this information must be queued for priority scheduling in the Scheduler object. In the next step, the TMO passes the received RI to the RTM object, and the RTM object calculates the Service Deadline (SD = RD - Transfer Time (TT)). The RTM object then invokes the Scheduler object to decide the task priority of the client request, and the Scheduler object schedules the task priority using the EDF algorithm, a scheduling algorithm appropriate for the given parameters with timing constraints. In the next step, the Scheduler object requests the service from the TMO via the RTM object. Finally, the TMO executes the real-time service and returns its execution
result to the client. At the same time, the TMO informs the RTM object of the completion of the service so that the next request from another client can be executed. In this section, we verify that the Scheduler object implemented with the EDF algorithm schedules correctly; we do not evaluate performance improvements of our model, such as the deadline miss rate. Under this algorithm, the highest-priority task is defined as the client request with the minimum SD. The following simple condition (2) decides the task priority (TP) given the SDs of two tasks: if SD_i < SD_j, then TP_i > TP_j. (2) Let us consider the clients (c1, c3, c5, c6, c8) bound to TMO1 in Table 1 of Section 3, and assume their requests arrive at TMO1 sporadically and sequentially. In the initial state, TMO1 waits for client requests. When client1 (c1)'s request arrives at TMO1, TMO1 executes it immediately. If TMO1 receives a new request from client3 (c3) during this execution, the request is passed to the Scheduler object and stored in the Scheduler object's ready queue. These situations occur repeatedly and continuously. From the Scheduler object's ready queue, the scheduling algorithm produces the task priorities using the service deadline of each task, and according to the assigned priorities the tasks are serviced non-preemptively by TMO1 in order. Table 2 shows the Scheduler object's ready queue in increasing order of TP while c1's request is executing on TMO1.
Given the TPs in Table 2, although the clients' requests arrive at TMO1 in the order c3, c5, c6, c8, they are executed on TMO1 in the order c3, c5, c8, c6, in accordance with the non-preemptive mechanism.
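This behaviour amounts to non-preemptive EDF over the Scheduler object's ready queue, which a short sketch can make concrete; the class and method names are illustrative.

import heapq

class EdfScheduler:
    def __init__(self):
        self._queue = []                    # min-heap ordered by SD

    def submit(self, client_name, rd, transfer_time):
        sd = rd - transfer_time             # Service Deadline of Sect. 4
        heapq.heappush(self._queue, (sd, client_name))

    def next_task(self):
        # Called when the TMO reports completion: the smallest SD (i.e. the
        # highest task priority TP) is served next, without preemption.
        if self._queue:
            return heapq.heappop(self._queue)[1]
        return None

# With the requests of Table 2 arriving in the order c3, c5, c6, c8, such a
# scheduler hands them out in the order c3, c5, c8, c6.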
5 Conclusions

The TMO-based Object Group (TMOOG) model proposed in this paper is a real-time object group model that adopts dynamic object selection and binding and real-time scheduling strategies to support the reconfiguration of distributed real-time applications in a distributed environment. To achieve our goals, we described the whole TMOOG model: the concepts of TMOs, the structure of the TMOOG, and the design of the functions and interactions among the components in an object group. We also implemented the Dynamic Binder object and the Scheduler object to support dynamic object selection and
binding and real-time scheduling strategies, and detailed the execution procedures of object management and real-time services in the model. To verify the execution of the designed model, the Dynamic Binder object was implemented with the binding priority algorithm as a dynamic object selection and binding strategy, and the Scheduler object was implemented with the EDF algorithm as a real-time scheduling strategy. From the numerical execution results obtained with these algorithms, we showed that our model supports not only the dynamic object selection and binding service among replicated TMOs, but also the real-time scheduling service for clients of the selected TMO. In the future, we plan to develop a prototype platform that can adopt various dynamic object selection and binding strategies and real-time scheduling strategies within a TMOOG or between TMOOGs. Acknowledgements. The authors wish to acknowledge helpful discussions of the TMO programming scheme with Professor Kane Kim of the DREAM Lab, University of California, Irvine. This paper was supported by Won-Kwang University in 2003.
References
1. M. Takemoto: Fault-Tolerant Object on Network-wide Distributed Object-Oriented Systems for Future Telecommunications Applications. In IEEE PRFTS (1997) 139-146
2. L. Kristiansen, P. Farley, R. Minetti, M. Mampaey, P.F. Hansen, C.A. Licciardi: TINA Service Architecture and Specifications. http://www.tinac.com/specifications
3. OMG Real-time Platform SIG: Real-Time CORBA A White Paper-Issue 1.0. http://www.omg.org/realtime/real-time_whitepapers.html (1996)
4. W.J. Lee, C.W. Jeong, M.H. Kim, S.C. Joo: Design and Implementation of an Object Group in Distributed Computing Environments. Journal of Electronics & Computer Science, Vol. 2, No. 1 (2000) 21-30
5. C.S. Shin, M.H. Kim, Y.S. Jeong, S.K. Han, S.C. Joo: Construction of CORBA Based Object Group Platform for Distributed Real-Time Services. In Proceedings of the 7th IEEE International Workshop on Object-Oriented Real-Time Dependable Systems (WORDS) (2002) 229-302
6. K.H. Kim: Object-Oriented Real-Time Distributed Programming and Support Middleware. In Proceedings of the 7th International Conference on Parallel & Distributed Systems (2000) 10-20
7. V. Kalogeraki, P.M. Melliar-Smith, L.E. Moser: Dynamic Scheduling for Soft Real-Time Distributed Object Systems. In Proceedings of the IEEE 3rd International Symposium on Object-Oriented Real-Time Distributed Computing (2000) 114-121
Fuzzy Synthesis Evaluation Improved Task Distribution in WfMS

Xiao-Guang Zhang, Jian Cao, and Shen-Sheng Zhang

CIT Lab, Department of Computer Science & Technology, Shanghai Jiaotong University, Shanghai, 200030
[email protected], {cao-jian, sszhang}@cs.sjtu.edu.cn
Abstract. Flexible task distribution is one of the critical enabling technologies for adaptive workflow management systems (WfMSs). Most workflow systems support role-based task distribution, which is driven by authorizations; however, quantitative aspects such as proficiency, workload and task urgency are not taken into account. In this paper, building on the role-based task distribution model, a fuzzy synthesis evaluation hierarchy is proposed covering three aspects: task, role and actor. The indexes of the hierarchy are related to the status or history of the attributes of these three aspects. When a task is offered to too many or too few workers sharing the required role, this hierarchy can be evaluated and used to select the more appropriate worker or to add less-qualified workers, improving the performance of the workflow.
1 Introduction

Workflow management systems (WfMSs) are today used in numerous application domains, including office automation, finance and banking, healthcare, and manufacturing and production, to run day-to-day applications [1]. Flexible task distribution is one of their critical enabling technologies. There are two basic mechanisms for task distribution in a workflow system, push and pull: a work item is pushed to a single person, or a person pulls work items from a common group pool [2]. Most workflow products, including the WfMC meta model [1][3-4], support role-based task distribution, i.e., a person can perform a task when he/she owns the role required for the task. Current practice neglects quantitative aspects such as proficiency, workload, task urgency, etc. As a result, a task may be offered to too many workers sharing the required role, making it hard to select appropriate individuals, or to too few workers when the qualified workers are unavailable (e.g., overloaded or on vacation), degrading workflow performance (e.g., causing long durations). Selecting suitable individuals to perform a specified task is essential for the effective utilization of human resources. The selection of workers should consider the following conditions: Numerous workers may share the same role while having different skills or capabilities. Some tasks, such as software development, may be better assigned to specialists rather than less-proficient workers, or to workers with high credit, i.e., workers who always finish their assigned tasks on time and with high quality.
Considering only worker capabilities during task distribution may result in the ratchet effect, i.e., abler workers being assigned heavier workloads. Therefore, workload should also be considered to ensure that tasks are assigned fairly. A task should not be assigned to a worker who owns the qualified role if he/she is not present or is too busy. In some cases, in order to achieve a higher workflow throughput, less-qualified workers can be allowed to execute the task. If a task is very urgent and important, it might still be assigned to a worker who already has a full workload; the worker then has to reschedule his work. The above conditions show that many quantitative factors affect the suitability of candidate workers, and task distribution should integrate these factors into role-based qualification. Some research has recognized these shortcomings. [5] defined load-balance policies and designed workload dispatching algorithms to efficiently handle user tasks in workflows. [6] proposed a multi-criteria assessment model to evaluate the suitability of individual workers for a specified task according to their capabilities, social relationships and existing tasks. [7] considered both the deadlines of work items and the workload and suitability of individual workers. All of these select influential factors such as workload, capabilities and deadlines and integrate them into task distribution; however, from the viewpoint of role-based task distribution, their factor sets are incomplete and do not cover all three aspects of role, actor and task. Some of their factors overlap existing functions of role-based distribution; for example, "existing tasks" in [6] corresponds to the concept of Separation of Roles. Furthermore, in these methods the factors have fixed influences or weights across different kinds of tasks and different workflow contexts. These weights should be changed dynamically and flexibly to match the current status of the workflow and the business objectives; e.g., if a task is very urgent and the qualified workers are not present, the weight of "Suitability between role and worker" can be decreased to let less-qualified workers perform the task. To support these requirements, in this paper, based on the typical role-based task distribution model, we propose a fuzzy synthesis evaluation index hierarchy covering three aspects: task, role and actor. Quantitative factors and their influences are translated into the indexes and weights of the hierarchy. At workflow run-time, we integrate this evaluation into the existing task distribution, which improves the quality of selection. The outline of this paper is as follows. Section 2 gives a basic model for task distribution. In Section 3, we describe the fuzzy synthesis evaluation method for task distribution. Section 4 gives a typical example to illustrate our method. The last section concludes this paper.
2 Role-Based Task Distribution Model

The workflow definition specifies the order in which activities must be executed. An activity (or task) can be assigned to a user or a role. The following are formal definitions of these three entities and of the relations involved in this kind of assignment.
Definition 1: A role is characterized by RCapS, a group of capabilities and their degrees of importance, represented as a set of pairs RCapS = {(c, w)}, where each capability c belongs to the capability set, which is a group
of descriptions of functions, such as executing some type of task or operating some tool; w represents the importance of the capability for the role, with a value in (0, 1]. If the weight of a capability is 1, that capability is the most required one for the role; the smaller the weight, the less important the capability. Role inheritance is also supported in this model: if a role r inherits a role r', then r satisfies all the capability requirements of r'. Obviously, this relation is transitive.
Definition 2: An actor is represented by m = <RS, MCapS, ProS>, where RS is the set of roles the person can play, and MCapS is a group of capabilities and their proficiencies, represented as pairs MCapS = {(c, p)}, where p represents the proficiency with a value in (0, 1); the smaller the value, the more proficient the actor is in that capability. ProS holds the other properties of the person (e.g., age, gender, department, technical position). Each actor can be assigned several roles. This assignment follows the role hierarchy, i.e., if an actor has a superior role, it can perform any task requiring an inferior role.
Definition 3: There are two kinds of activity assignments: activity_role(a, r, num) represents that an activity a is assigned to a role r, where num is the required number of workers; activity_person(a, m) means that an atomic activity a can be assigned to a specific actor m.
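A compact rendering of Definitions 1-3 in Python may help fix the notation; the field names below are hypothetical, not part of the model.

from dataclasses import dataclass, field

@dataclass
class Role:
    name: str
    rcaps: dict[str, float]        # capability -> importance weight w in (0, 1]
    parents: list["Role"] = field(default_factory=list)   # role inheritance

@dataclass
class Actor:
    name: str
    roles: set[str]                # RS: roles the person can play
    mcaps: dict[str, float]        # MCapS: capability -> proficiency in (0, 1)
    props: dict = field(default_factory=dict)   # ProS: age, department, ...

def activity_role(activity, role, num):
    # An activity is assigned to a role; num workers are required.
    return {"activity": activity, "role": role.name, "num": num}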
3 Fuzzy Synthesis Evaluation for Task Distribution

According to the above model, task distribution is related to three aspects: task, role and actor. We establish a fuzzy synthesis evaluation index hierarchy, shown in Fig. 1, from these three aspects. The hierarchy contains two kinds of indexes: atomic and composite. The atomic indexes are related to the current status or history of the attributes of the three aspects. Each atomic index has two values: a value with its own unit and a unitless fuzzy evaluation value. For example, the "workload" index has five linguistic scales: Quite Small, Small, Middle, Big, Quite Big; if a worker has been assigned tasks amounting to 6 hours/day, the value of "workload" is 6 hours/day and its evaluation value is Big. Composite indexes are composed of atomic indexes or other composite indexes; they are evaluated from their sub-indexes and have only an evaluation value, because their sub-indexes may have different units. The whole synthesis evaluation value can be computed recursively from the atomic indexes up through the parent composite indexes until the root index is reached.
3.1 Fuzzy Evaluation Value of an Atomic Index

Most indexes of the evaluation hierarchy carry a fuzzy concept, such as Importance, Suitability and Availability; for example, proficiency may be evaluated by five linguistic grades: less verdant, verdant, normal, proficient, and more proficient. Therefore, we adopt fuzzy set theory to describe these indexes. Definition 4: A triangular fuzzy number F can be expressed by a triple (l, m, r), where m is the middle value of the fuzzy number, m - l is its left span and r - m is its right span. L(u) and R(u) are the left and right pertaining (membership) functions, which indicate the degree to which u belongs to the fuzzy number F.
Fig. 1. Synthesized Evaluation Index Hierarchy of Task Distribution
The evaluation of an atomic index is calculated in three steps: calculation of the index's value, normalization of the index's value, and calculation of the index's evaluation value.
1) Calculation of the index's value. Each index has its own calculation method; the following gives our approach for each atomic index.
2) Normalization of the index's value. From the above calculation, each index's value has its own unit. For unified evaluation, we standardize all index values by converting them into a number between 0 and 1, which is called normalization. There are two types of index values, Positive and Negative. For a Positive type, the larger the index's value, the better the index, e.g., "Suitability of Role" and "Fidelity". A Negative type is the opposite: the smaller the index's value, the better, e.g., "Workload" and "Deadline". The two types have different normalization methods. Definition 5: Let the domain of an index be [I_min, I_max], where I_min and I_max are the minimal and maximal values of the index. The width is defined as L = I_max - I_min and the midpoint as M = (I_min + I_max)/2. Let x be the index's value calculated in step (1). For the Positive type, the normalization is x' = (x - I_min)/L; for the Negative type, it is x' = (I_max - x)/L. For example, "Workload" is Negative; its domain is between 0 and 8 hours a day, so for a current index value of 7 hours the normalized value is (8 - 7)/8 = 0.125.
3) Calculation of the index's evaluation value. Different indexes have different linguistic scales; for example, "workload" has five scales: Quite Heavy, Heavy, Normal, Low and Quite Low. After normalization,
the index values are standardized into numbers between 0 and 1. According to the scales of each index, we retrieve the evaluation value as in Fig. 2: locate the normalized value on the x-axis and draw a vertical line to find the intersection points with the pertaining functions of the different grades; each intersection gives a pertaining degree. For example, in Fig. 2 the pertaining degrees for the two grades "Quite Heavy" and "Heavy" are 0.25 and 0.75, respectively. Taking the grade with the maximum pertaining degree, the fuzzy evaluation value for "Workload" is (0, 0, 0.25).
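The first two steps can be illustrated in Python. The grade layout passed to pertaining_degrees() is an assumed triangular partition modelled loosely on Fig. 2, not its exact shape, which is why the degrees printed below differ from the 0.25/0.75 of the worked example.

def normalize(value, lo, hi, positive):
    """Definition 5: map a raw index value into [0, 1]; width L = hi - lo."""
    width = hi - lo
    return (value - lo) / width if positive else (hi - value) / width

def pertaining_degrees(x, grades):
    """Membership of x in each triangular grade (l, m, r)."""
    out = {}
    for name, (l, m, r) in grades.items():
        if l <= x <= m:
            out[name] = 1.0 if m == l else (x - l) / (m - l)
        elif m < x <= r:
            out[name] = (r - x) / (r - m)
        else:
            out[name] = 0.0
    return out

# "Workload" is a Negative index with domain [0, 8] hours/day:
x = normalize(7, 0, 8, positive=False)      # -> 0.125
grades = {"Quite Heavy": (0.0, 0.0, 0.25), "Heavy": (0.0, 0.25, 0.5),
          "Normal": (0.25, 0.5, 0.75), "Low": (0.5, 0.75, 1.0),
          "Quite Low": (0.75, 1.0, 1.0)}
print(pertaining_degrees(x, grades))        # Quite Heavy: 0.5, Heavy: 0.5, ...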
3.2 Fuzzy Synthesis Evaluation

Through the above processing, all atomic indexes are converted into fuzzy evaluation values; we can then calculate the synthesis evaluation value of the index hierarchy. In the following, we first define the basic arithmetic operations on fuzzy numbers used in our method and then give the algorithm for the synthesis evaluation.
Fig. 2. The triangular fuzzy numbers of the linguistic values
Definition 6: Let F1 = (l1, m1, r1) and F2 = (l2, m2, r2) be two fuzzy numbers and k a constant. The arithmetic operations are defined component-wise: addition of the two numbers is F1 + F2 = (l1 + l2, m1 + m2, r1 + r2); multiplication of two fuzzy numbers is F1 x F2 = (l1 l2, m1 m2, r1 r2); and multiplication of a constant and a fuzzy number is k x F1 = (k l1, k m1, k r1).
In Fig. 1, V(i, j, k) represents the fuzzy evaluation value of an index, where i is the layer of the index, j denotes the sequence number of its parent index in layer i-1, and k is the sequence number within layer i. W(i, j, k) is the weight of the index, and the weights of the sub-indexes under the same parent satisfy Σ(k = b..e) W(i, j, k) = 1, where b and e are the begin and end sequence numbers of the sub-indexes of that parent. The value of the parent index can then be calculated using formula (1):
V_parent = Σ(k = b..e) W(i, j, k) x V(i, j, k). (1)
For example, combining the fuzzy evaluation values of two sub-indexes with their weights by formula (1) gives a parent index value such as (0.38, 0.48, 0.68). The synthesis evaluation value of the index hierarchy is calculated layer by layer with formula (1) until the root index is reached.
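Formula (1), as reconstructed above, can be written out directly; the helper names are illustrative, and the operations follow Definition 6 component-wise.

def scale(k, f):
    l, m, r = f
    return (k * l, k * m, k * r)

def add(f1, f2):
    return tuple(a + b for a, b in zip(f1, f2))

def parent_value(children):
    """children: list of (weight, (l, m, r)) pairs whose weights sum to 1."""
    total = (0.0, 0.0, 0.0)
    for w, f in children:
        total = add(total, scale(w, f))
    return total

# The evaluation proceeds layer by layer from the atomic indexes to the root.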
3.3 Ranking of Fuzzy Evaluation Values

The result of the synthesis evaluation is a triangular fuzzy number. To compare it with other results, fuzzy numbers must be defuzzified to obtain their Best Non-fuzzy Performance values (BNP). Although various defuzzification approaches have been proposed, this paper adopts the center of area (COA) approach to rank fuzzy numbers, because this method is simple, practical and does not involve evaluator preference [8]. The rank value of a fuzzy number (l, m, r) is obtained as BNP = ((r - l) + (m - l))/3 + l = (l + m + r)/3. If more than one person shares the required role, the higher the rank value, the higher the person's chance of being allocated the task. Furthermore, we define a threshold on the rank value, which can be used to filter out improper persons. For example, if a person owns the required role but is overloaded or not present, his/her
rank value of the synthesis evaluation is less than the threshold, so the person will not be assigned the task. The threshold can also be used to let less-qualified persons execute tasks even though they do not own the required roles: when there are not enough persons sharing the required role, a less-qualified person can execute the task if his/her rank value is larger than the threshold.
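The ranking and threshold filtering reduce to a few lines; the function names are illustrative.

def rank_value(f):
    # COA defuzzification: ((r - l) + (m - l)) / 3 + l = (l + m + r) / 3.
    l, m, r = f
    return (l + m + r) / 3.0

def pick_worker(evaluations, threshold=0.5):
    """evaluations: {worker: (l, m, r)}; returns the best worker or None."""
    ranked = {w: rank_value(f) for w, f in evaluations.items()}
    eligible = {w: v for w, v in ranked.items() if v >= threshold}
    if not eligible:
        return None                  # no suitable worker: notify the manager
    return max(eligible, key=eligible.get)

# E.g. rank_value((0.53, 0.60, 0.66)) -> 0.597, matching the 0.60 reported
# in Sect. 4 after rounding.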
4 Typical Illustration

The following is a typical example of our method. Table 2 gives the weights of all indexes, and Table 3 gives the evaluation value of each atomic index; the detailed calculation process for each atomic index is omitted for brevity. We suppose that the threshold on the rank value is 0.5.
In this example, we suppose that two qualified workers, w1 and w2, share the required role for a task, together with a less-qualified worker w3. When the task is ready to be executed by the workflow engine, the worklist handler first selects qualified workers according to the activity assignment activity_role. Because the required number of workers for the task is 1, the worklist handler calls the synthesis evaluation to evaluate the two workers. In evaluations A and B, the two workers share the required role, but their evaluation values of "Suitability of Role and Person" differ. The synthesis evaluation value of w1 is (0.67, 0.75, 0.84) and its rank value is 0.53, which is larger than the threshold and than w2's rank value, so worker w1 is more suitable than worker w2, and the worklist handler adds the task to w1's work list. In evaluation C, the workload of w1 is too heavy or w1 is not present, and the evaluation value of "Workload" is (0.05, 0.075, 0.1). The synthesis evaluation value is (0.53, 0.60, 0.66), with a rank value of 0.60; comparing evaluations B and C, the worklist handler adds the task to the work list of worker w2. If the rank values of the qualified workers are both lower than the threshold, the worklist handler cannot select an appropriate worker and notifies the project manager.
The project manager can change the weights of the index "Suitability of Role" and its sub-indexes to allow the less-qualified worker to execute the urgent task. The worklist handler then evaluates the other workers and selects the worker with the highest rank value to execute the task. In evaluation D, suppose worker w3 owns the highest rank value. The evaluation value of "Suitability of Role and Task" is (0.4, 0.53, 0.66) because of the lower suitability; however, the evaluation value of "Deadline" becomes high, (0.75, 0.83, 0.88), and the weights of the corresponding indexes become 0.375 and 0.125. Finally, the rank value of this worker's synthesis evaluation is 0.74, which is larger than 0.5, so the task can be allocated to this worker temporarily.
5 Conclusion

Workflow management systems today support push and pull mechanisms for task distribution. Most of them use role-based task distribution and neglect quantitative aspects such as proficiency, workload and task urgency. In this paper, we propose a fuzzy synthesis evaluation covering the three aspects affecting task distribution. During workflow execution, this evaluation is integrated into the traditional role-based selection. The mechanism has been used in two projects from the Chinese high-technology program (863 plan). Further research is to analyze the history of task distribution and the current status of the workflow to determine the weights of the indexes automatically. Acknowledgement. This research is supported by two Chinese high-technology (863) projects, Grant Nos. 2001AA415310 and 2001AA412010.
References
1. WfMC: The Workflow Reference Model. Workflow Management Coalition, Tech. Rep. TC00-1003, 1995. http://www.wfmc.org/standards/docs/tc003v11.pdf
2. W.M.P. van der Aalst: A reference model for team-enabled workflow management systems. Data & Knowledge Engineering, 38(3): 335-363, 2001.
3. Staffware: Staffware Process Suite White Paper. Staffware PLC, Tech. Rep., 2001. http://www.staffware.com/downloads.htm
4. IBM MQSeries: MQSeries Workflow. IBM MQSeries, Tech. Rep., 2000. http://www4.ibm.com/software/ts/mqseries/workflow
5. Baoyan Song, Ge Yu, et al.: An Efficient User Task Handling Mechanism Based on Dynamic Load-Balance for Workflow Systems. APWeb 2003, LNCS 2642, pp. 483-494, 2003.
6. Minxin Shen, et al.: Multi-Criteria Task Assignment in Workflow Management Systems. In Proceedings of the 36th Hawaii International Conference on System Sciences (HICSS'03).
7. A. Kumar, W.M.P. van der Aalst, et al.: Dynamic Work Distribution in Workflow Management Systems: How to Balance Quality and Performance? Journal of Management Information Systems, Vol. 18, No. 3, pp. 157-194, 2002.
8. D. Dubois and H. Prade: Fuzzy Sets and Systems. New York: Academic Press, 1980.
A Simulation Study of Job Workflow Execution Models over the Grid

Yuhong Feng1, Wentong Cai1*, and Jiannong Cao2

1 School of Computer Engineering, Nanyang Technological University, Singapore 639798
2 Department of Computing, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

* Contact Author: [email protected]
Abstract. Job workflow application execution over the Grid presents significant challenges to existing job workflow execution models (JWEMs). In this paper, we propose providing executable codes as dynamic services, which act as part of the Grid resources. Based on the service deployment mechanism and on where the control thread resides, a classification of JWEMs is presented. Performance evaluation and comparison studies are carried out on the execution models according to this classification. Our experimental studies show that distributed job workflow execution based on dynamic services can achieve better performance than all other models. Keywords: Grid computing, job workflow execution model, mobile agent, code mobility, dynamic service
1 Introduction
Grid computing has emerged as an important new field. A Grid is a hardware and software infrastructure that provides flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions and resources (called virtual organizations (VOs)) [5]. Job workflow over the Grid includes the composition of a complete job from multiple sub-jobs whose executions are distributed, the specification of the order in which the sub-jobs are to be executed, and the rules that define the interactions between the sub-jobs. The Web Service Flow Language (WSFL) [10], proposed by IBM, is one method for describing web service workflows. However, workflow execution as defined for web services depends on the web service workflow engine to mediate each step of the job execution [6]; it uses a typical Client/Server workflow execution model, and the workflow engine relaying the data between services produces a lot of unnecessary traffic around the engine. The Grid Service Flow Language (GSFL) [11], a workflow framework for Grid services, adds extensions over web services to address Grid-specific needs. It provides a direct communication mechanism between Grid services using an
event-driven technique. It allows services to deliver messages to each other asynchronously, thus obviating the need for a centralized workflow engine to relay data between services. The control thread is the thread that handles the control flow and data flow among the sub-jobs of a job workflow. Although the peer-to-peer, direct communication mechanism improves system performance, the control flow still stays with the workflow engine. This requires control messages to be transferred over the Internet for each job execution, making the workflow engine a single point of failure for the whole system. Taking the dynamic aspects of the Grid into consideration, a distributed job workflow execution model over the Grid is promising: the control thread may migrate with the data between the services. However, with distributed job workflow execution, control thread migration may incur runtime overhead, and, to adapt to the Grid's resource heterogeneity, more code might have to be transferred over the Internet. Therefore, to characterize the cost/benefit tradeoff between the models, a simulation study was carried out to simulate job workflow execution over the Grid. This paper is organized as follows: a classification of job workflow execution models is given in Section 2. The simulation model for job workflow execution is described in Section 3. Simulation results are given and analyzed in Section 4. Finally, Section 5 concludes the paper with a discussion of future work.
2 Classification of Job Workflow Execution Models

2.1 Static vs. Dynamic Services
Services are encapsulated functionalities that hide the details of their implementations and provide APIs for usage. Services evolved from components; by incorporating XML technologies and open standards, they become what we call web services. Our concern here is service deployment, i.e., the process of instantiating a service in a computational environment. Service deployment can be location dependent or location independent: it is location dependent when the service is bound to a certain host, and location independent when the service can be deployed on a set of hosts at runtime. Services deployed in the location-dependent way are called static services, while services deployed in the location-independent manner are called dynamic services [9].
2.2 Client/Server vs. Distributed Workflow Execution Model
According to where the control thread resides, job workflow execution models (JWEMs) can be classified into the Client/Server model and the distributed model. In the Client/Server model, the control thread is fixed on a certain host, and communications between the control thread and the services are based on remote procedure call (RPC) style protocols. In the distributed model, the control thread
is not fixed on a certain host; instead, it migrates from host to host carrying the intermediate sub-job execution results. In the Client/Server model, the main control thread runs on a fixed host and relays the intermediate computation results between services, which produces a lot of unnecessary traffic around the host of the main control thread. In addition, the control flow stays with the workflow engine, so control messages must be transferred over the Internet for each sub-job execution, making the workflow engine a single point of failure for the whole system. The distributed execution model mitigates this problem by directing the communication between data-dependent services and migrating the control thread to the service host. When the sub-jobs in the job workflow require large distributed data sets as input, dynamic services can further improve system performance by deploying a service on a host near the data repositories and the previous service host. However, control thread migration entails runtime overhead: the time for stopping execution, serialization, transmission over the Internet, deserialization, and re-creation of the execution on the destination service host. To adapt to Grid resource heterogeneity, the control thread code may have to consider all possibilities, which may increase the code size. When the code size is large, its transmission time over the Internet may offset the benefit that distributed job workflow execution brings. When dynamic services are used, there are two additional overheads: the service/functional code transmission time and the time for computational resource discovery.
2.3 Mobile Agent Based Distributed Workflow Execution Model
Code mobility is defined as the movement of executable code over networks towards the location of the resources needed for execution [3]. A mobile agent (MA) is a program that represents a user in a network and is capable of migrating from node to node, performing computations on behalf of the user [1]. "Code-on-demand" (COD) is another design paradigm for code mobility: applications developed in this paradigm can download and link parts of their code on the fly from remote hosts that act as code servers [13]. Current mobile agent systems can be classified into two categories: monolithic mobile agents and light-weight mobile agents. A mobile agent implementation has two parts: the non-functional implementation, such as the code for mobile agent communication, intelligence and migration, which is common to all mobile agents; and the functional implementation, such as the code for the sub-job implementations on a certain host. When a mobile agent carries all of these implementations as it migrates from host to host, we call it a monolithic mobile agent. When a mobile agent carries only its non-functional implementation along its itinerary, or only its blueprint (the description of the functional sub-job to be executed), it is called a light-weight mobile agent [3]. For light-weight mobile agents, the functional implementations are downloaded from code servers and linked into the current computation environment on demand. Obviously, a light-weight mobile agent can reduce the
code volume transferred over the network, but it also incurs additional runtime overhead for code discovery, service/functional code transmission over the Internet, and computational resource discovery.
3 Simulation Model
The runtime overheads of the distributed job workflow execution models and of dynamic services may erode the performance benefits they bring. To characterize the cost/benefit tradeoff between the models, a simulation study was carried out to compare their makespans, where the makespan is the time between job submission and the return of the final result. Assume that a sub-job is represented as a vertex and the data flow between two sub-jobs as a directed edge between the corresponding vertices. Sequential sub-jobs can then be represented as a pipeline job workflow; more generally, job workflows can be described by a Directed Acyclic Graph (DAG), G = (V, E), where V is the set of vertices and E the set of edges.
3.1 Grid Resource Model
When executable codes are provided as dynamic services, Grid resources comprise computational resources, data repositories, code repositories and the network.
Data repositories. A Data Grid provides management services for distributed data over the Grid, including data access, metadata information, data replication, replica selection and filtering. For a certain data set, several replicas may be available, and the repositories holding the replicas are represented as a set D.
Computational resources. For a certain sub-job, multiple computational resources satisfying its computation requirements may be available; they are represented as a set C.
Code repositories. A code repository is the part of the Grid resources storing executable codes to provide "code-on-demand". It provides services for replica location retrieval, code request and code transmission. For a certain required code, multiple code repositories may be available; they are represented as a set R.
Network resources. Grid resources are shared among different organizations, which may involve communication not only within a LAN but also over a WAN; the data transferred over the WAN is one of the major factors affecting system performance.
In all, a Grid is modelled as (C, D, R, N), where N is the matrix capturing the network characteristics among the nodes in C, D and R. Job workflow applications are composed of jobs that require locating data and/or services at runtime, and there may be data dependencies among the sub-jobs. Job workflow execution maps the application model G = (V, E) onto this Grid model.
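Written out, the Grid model might look like the following; the field names are hypothetical.

from dataclasses import dataclass

@dataclass
class GridModel:
    computational: set      # C: candidate hosts for sub-job execution
    data_repos: set         # D: replicas of each required data set
    code_repos: set         # R: replicas of each executable code
    network: dict           # N: pairwise link characteristics (bandwidth, delay)

# Executing a job workflow G = (V, E) means mapping each vertex (sub-job)
# to a computational resource and each edge (data flow) onto the network.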
3.2 Assumptions and Parameters
A pipeline job workflow J is assumed, with its sub-jobs executed in sequence on selected computational resources. The simulation compares the makespan of executing the job workflow J under the different models.
Overhead cost assumptions are given in Table 1. Serialization and deserialization times increase as the agent size increases (provided the agent code's call stack size stays the same). However, compared to the data transmission time and the sub-job execution time, the times for agent serialization/deserialization, creation and destruction are very small, so they are not considered in the simulation.
3.3 Simulation of Networks
Grid resources are shared among different organizations over the Internet. Workflow execution over the Grid may involve multiple distributed resources, so the topology of the Internet may affect execution performance. The Internet can be viewed as a collection of routing domains, where a routing domain is a group of nodes (routers, switches and hosts) under a single administration sharing routing information and policy. There are two kinds of routing domains: stub domains and transit domains. Stub domains are generally campus networks or LANs, while transit domains are typically WANs or metropolitan-area networks (MANs). Existing topology models include the random graph model, the transit-stub model and the tier model [4]. Since Grid resources are constructed hierarchically, the transit-stub and tier models are preferred. Here, we chose the tool GT-ITM, created by Zegura et al. [14], to construct a transit-stub topology. In our experimental study, the generated transit-stub graph has 600 vertices and 1228 edges, with an average degree of 4.09. The Network Simulator (NS-2) is a discrete event simulator supporting the simulation of TCP, network traffic, routing, and multicast protocols over wired and wireless (local and satellite) networks [12]. In our experiments, NS-2 was used to set the bandwidth and delay of the links generated by GT-ITM and
to generate traffic on the topology to simulate a real network. Data and codes were transferred over the simulated network, and the transmission times were calculated for the different workflow execution models. sgb2hierns [12] is used to convert the topology generated by GT-ITM into the NS-2 hierarchical format. In our simulation study, one modification was made to the file generated by sgb2hierns: the bandwidth is not set identically for all links. Links on the Internet are classified into three categories: WAN links, access links and stub links. A link is a WAN link when both of its vertices belong to transit domains, an access link when one vertex belongs to a transit domain and the other to a stub domain, and a stub link when both vertices belong to stub domains. In our experiments, the bandwidth of a WAN link is set to 50 Mbit/s, that of an access link to 5 Mbit/s, and that of a stub link to 1 Mbit/s. The delay of each link is set according to the link distance. Random traffic is generated on the network described above. The probability of a node being a constant bit rate (CBR) traffic generator is set to 0.8. When a CBR traffic generator belongs to a transit domain, the packet size follows a normal distribution with a mean of 1500 bits and a standard deviation of 200 bits; when it is an access node, a mean of 1000 bits and a standard deviation of 100 bits; and when it is a stub node (excluding access nodes), a mean of 300 bits and a standard deviation of 100 bits. Each traffic generator sends a packet every second. The FTP protocol is used for data transmission in the simulation, and the TCP protocol for the lookup services, which include data repository lookup, code repository lookup, and service lookup.
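The link classification reduces to a small function; transit() is a hypothetical predicate telling whether a vertex lies in a transit domain.

def link_bandwidth_mbit(u, v, transit):
    # Classify a link by the domains of its two endpoints (Sect. 3.3).
    if transit(u) and transit(v):
        return 50          # WAN link
    if transit(u) or transit(v):
        return 5           # access link
    return 1               # stub link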
3.4 Simulation of Job Workflow Models
A pipeline workflow application with 5 sub-jobs was assumed in the experiments. The input data of the first sub-job came from a data repository as well as from the client; the input data of each of the other 4 sub-jobs came from both a data repository and the previous sub-job's execution. The final result is returned to the client. The data volume from the data repository is assumed to be 3 times the output data volume of each sub-job execution. First, for each data set required by a sub-job there are 5 replicas, so 25 different nodes were chosen as data repositories. Second, for each static service there are also 5 replicas, so another 25 different nodes were selected as static services. Third, for each sub-job code there are 10 replicas, so 10 nodes were selected as code repositories before execution. The selection was random, and all chosen nodes belong to stub domains and are not access nodes. In the following experiments, the initial input/output data volume is set to 1 Mbyte. The operations described below for each workflow model are repeated, increasing the input/output data volume by 1 Mbyte at a time until it reaches 7 Mbytes.
A Simulation Study of Job Workflow Execution Models over the Grid
941
The agent code size is set to 20 Kbytes, and the functional code size for sub-job execution is initially 20 Kbytes as well. The probability that a sub-job execution needs additional supporting software is 0.05, and the size of the package or software is set randomly between 1 Mbyte and 5 Mbytes.
Client/Server with static service model: The client looks up the services and data repositories, selecting the nearest service and, for the first sub-job, the nearest data repository for the selected service. The client sends the input data to the selected service, as does the corresponding data repository. After the sub-job executes, the result is returned to the client. At the same time, the client selects the service and data repository for the next sub-job execution. The same operations are repeated 5 times with different service hosts and data repositories.
Client/Server with dynamic service model: For each data repository, a node belonging to the same stub domain as the data repository is selected as the service host. The selected node should be neither an access node nor a data repository node, and the traffic state of the link between the data repository and the service host is considered during the selection. The code repository nearest to the selected computational resource is selected. Before each sub-job execution, the required code is downloaded from the selected code repository; this happens at the same time as the data set transfer from the data repository. The other operations are similar to the Client/Server with static service model.
Mobile Agent (MA) with static service model: The MA moves from host to host carrying the input data (in the first step) or the intermediate sub-job execution results, together with the mobile agent code. The algorithm for destination host and data repository scheduling is the same as in the Client/Server with static service model. After all sub-jobs have executed, the mobile agent returns to the client.
Monolithic mobile agent (MMA) model: The MMA moves from host to host carrying the input data (in the first step) or the intermediate sub-job execution results, together with all the executable codes. The algorithm for computational resource selection is the same as in the Client/Server with static service model. After all sub-jobs have executed, the mobile agent returns to the client.
Light-weight mobile agent (LMA) model: The LMA acts like the MMA, except that it does not carry the sub-job execution codes on its migration; the code for each sub-job is downloaded from a code repository on demand. When the client input data or intermediate results reach the destination host, the input data from the corresponding data repository and the corresponding sub-job code are transferred to the destination host simultaneously. The resource scheduling policy is similar to that of the Client/Server with dynamic service model; the other operations are similar to the MMA model.
To further evaluate the overheads of the models that use dynamic services, the experiments were repeated for these models with the service/functional code size increased exponentially from 1 Kbyte to 3125 Kbytes, with the output data volume
and the data set volume fixed at 3 Mbytes and 9 Mbytes, respectively. Note that increasing the service/functional code size does not affect the makespan of the models that use static services.
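A toy accounting of the makespan terms can reproduce the qualitative trend; the transfer() function, the rates and the cost terms below are illustrative stand-ins, not the measured NS-2 values or the paper's simulation code.

def transfer(volume_mb, bandwidth_mbit_s):
    return volume_mb * 8.0 / bandwidth_mbit_s

def makespan(stages, carries_code, code_mb, data_mb, out_mb,
             bw=5.0, exec_s=10.0):
    """carries_code=True models the monolithic mobile agent, which hauls all
    sub-job codes on every hop; otherwise code is fetched on demand."""
    total = 0.0
    for _ in range(stages):
        total += transfer(data_mb, bw)              # input from data repository
        total += transfer(out_mb, bw)               # intermediate result hop
        total += transfer(code_mb * (stages if carries_code else 1), bw)
        total += exec_s                              # sub-job execution
    return total

# As code_mb grows, the MMA variant (carries_code=True) degrades fastest,
# in line with the trend of Figure 1(b).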
Fig. 1. Makespan of Different Job Workflow Execution Models
4 Simulation Results and Analysis
The simulation results are shown in Figures 1(a) and (b). Figure 1(a) shows that the makespans of the five execution models increase as the data volume increases. It also shows that, no matter which job execution model is used, Client/Server or distributed, when the sub-job executions are provided as static services the makespan is always higher than that of the three models based on dynamic services. When the code volume is increased, as shown in Figure 1(b), the makespan of the MMA model increases faster than that of the LMA model. In our simulation, the sub-job execution code may require an additional supporting package (with probability 0.05 and a package size set randomly between 1 Mbyte and 5 Mbytes), so when the service/functional code size increases, the makespan of the MMA model increases greatly, while the makespans of the LMA model and the Client/Server with dynamic service model are much less affected. Based on the simulation results in Figures 1(a) and (b), we conclude that the distributed workflow execution model based on the LMA achieves better performance than all the other models.
5 Conclusions and Future Work
Based on the idea of providing executable codes as Grid resources, a classification of job workflow execution models over the Grid has been presented in this
paper. Considering the benefits and overheads of the different execution models, we carried out a simulation study to compare the makespans of five execution models: Client/Server based on static services, mobile agent based on static services, Client/Server based on dynamic services, monolithic mobile agent, and light-weight mobile agent. Sub-job execution over the Grid was simulated for the different models, including resource lookup, resource scheduling, and data/code transfer over the Grid. The simulation results show that when sub-job execution involves large data sets from distributed data repositories, the light-weight mobile agent achieves better performance than the other models. Based on this conclusion, our future work will focus on a distributed job workflow execution framework based on light-weight mobile agents for job workflow execution over the Grid.
References
1. Ajanta Mobile Agents Research Project. http://www.cs.umn.edu/Ajanta/
2. R. Brandt: Dynamic Adaptation of Mobile Code in Heterogeneous Environments. Technical Report, Technische University Munich, February 2001.
3. R. Brandt and H. Reiser: Dynamic Adaptation of Mobile Agents in Heterogeneous Environments. In Mobile Agents: 5th International Conference, December 2001.
4. Ken Calvert, Matt Doar, and Ellen W. Zegura: Modeling Internet Topology. IEEE Communications Magazine, June 1997.
5. Ian Foster, Carl Kesselman, and S. Tuecke: The Anatomy of the Grid. International Journal of High Performance Computing Applications, 15(3):200-222, 2001.
6. D. Gannon, R. Ananthakrishnan, S. Krishnan, M. Govindaraju, L. Ramakrishnan, and A. Slominski: Grid Web Services and Application Factories. In F. Berman, G. Fox and A. J. G. Hey (eds.), Grid Computing: Making the Global Infrastructure a Reality, Wiley, June 2003.
7. L. Ismail and D. Hagimont: A performance evaluation of the mobile agent. In Proceedings of the Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 306-313, 1999.
8. W. Jie, W. Cai, and S. J. Turner: POEMS: A Parallel Object-oriented Environment for Multi-computer Systems. The Computer Journal, 45(5):540-560, 2002.
9. M. Keidl, S. Seltzsam, and A. Kemper: Flexible and Reliable Web Service Execution. In Proceedings of the 1st Workshop on Entwicklung von Anwendungen auf der Basis der XML Web-Service Technologien, pages 17-30, July 2002.
10. Frank Leymann: Web Services Flow Language. May 2001. http://www-3.ibm.com/software/solutions/webservices/pdf/WSFL.pdf
11. S. Krishnan, P. Wagstrom, and G. von Laszewski: GSFL: A Workflow Framework for Grid Services. http://www-unix.globus.org/cog/projects/workflow/gsfl-paper.pdf
12. Network Simulator. http://www.isi.edu/nsnam/ns/
13. Jan Vitek and Christian Tschudin (eds.): Mobile Object Systems: Towards the Programmable Internet. Second International Workshop, MOS'96, July 1996.
14. Ellen W. Zegura, Ken Calvert, and S. Bhattacharjee: How to Model an Internetwork. In Proceedings of IEEE Infocom '96, San Francisco, CA, 1996.
An Approach to Distributed Collaboration Problem with Conflictive Tasks*

Jingping Bi, Qi Wu, and Zhongcheng Li

Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, P. R. China
{jpingbi, abewu, zcli}@ict.ac.cn

* This work has been supported through funding by National Natural Science Foundation of China Grant No. 90104006 and Hi-Tech Research and Development Program of China (the 863 Project) Grant Nos. 2001AA112135 and 2001AA112091.
Abstract. The distributed collaboration problem (DCP) is a kind of distributed resource allocation problem. Previous work solves DCP with non-conflictive tasks. In this paper, we present a completely distributed algorithm (CDA) to solve DCP with conflictive tasks where each task requires exactly two resources; we call this the measurement collaboration problem (MCP). We prove the liveness and correctness of CDA and evaluate its run-time performance by simulation. To provide a baseline for the performance parameters, we design a simple single-wait-state algorithm (SWSA). The simulation indicates that CDA is more efficient than SWSA, especially when the tasks are highly conflictive. CDA can also be utilized in many settings, such as disaster rescue, distributed agent collaboration, and object positioning.
1 Introduction
The distributed resource allocation problem (DRAP) is a pervasive problem. The basic model is that of many distributed processes competing for limited resources; different assumptions and restrictions lead to distinct solutions and algorithms. When the resource number is 1, DRAP becomes the distributed mutual exclusion problem (DMEP), where only one process may use the resource at any time. The solutions for DMEP can be divided into token-based [1] and non-token-based [2]; a non-token-based distributed algorithm is generally called a completely distributed algorithm. The dining philosophers' problem (DHP) by E. W. Dijkstra is another classic resource allocation problem [3]. K. M. Chandy et al. extended the problem into the drinking philosophers' problem [4]: by introducing dynamic priorities between neighboring philosophers, they present a deterministic, symmetric solution in a completely distributed environment. With the flourishing of agent technology, another kind of DRAP arises in the collaboration of agents in distributed environments, called the distributed collaboration problem (DCP). Compared with the philosophers problem, the agent is the resource (the chopsticks), and the task is the philosopher. The main difference
lies in the fact that it is the resource, not the task, that has consciousness; that is to say, the chopsticks think and negotiate about which philosopher may use them. According to the variability of tasks and the strength of conflicts, P. J. Modi et al. divide DCP into 4 classes from easy to difficult, map DCP to the dynamic distributed constraint satisfaction problem (DDCSP), and give solutions for the first 3 classes [5][6]. In the first 3 classes there are no conflicts among tasks at all, but the real world is full of conflictive tasks. The distributed collaboration problem with conflictive tasks is called the 5th class of DCP (DCP5). In this paper, we give a distributed algorithm for the special case of DCP5 in which the execution of each task needs exactly 2 nodes (agents). We call this special case the Measurement Collaboration Problem (MCP), because it originates from the measurement of one-way metrics, where a measurement task must be performed by the two ends. Our previous paper gave a solution of MCP in an arbitrator-based environment [7]. Besides one-way measurement, the solution of MCP can also be utilized in many settings, such as disaster rescue, distributed agent collaboration, and object positioning [8][9].
1.1 Measurement Collaboration Problem

Suppose that there are n nodes and that every node can act as either a requester (when the node is the sender in a measurement) or a collaborator (when it is a receiver). A measurement task must be completed collaboratively by a requester and a collaborator, so a task can be performed only after the two nodes reach agreement. A node randomly generates a task and acts as its requester; that is, the requester selects a collaborator from the other n-1 nodes and negotiates with it to perform the task. A task can be generated only when a node is neither executing a task nor negotiating with other nodes. At any time, a node cannot execute multiple tasks, but it may negotiate with more than one node to choose which task will be executed. Information exchange among nodes is performed by sending and receiving messages. We suppose that messages are transferred as follows: messages are transferred in sequence without errors or losses; messages reach their destination within a bounded time; and messages are transferred asynchronously. Other assumptions are that each node runs continuously without any interruption and that each measurement task finishes within a bounded time.
2 Algorithm Specification
2.1 Message Definitions

Req: Request message. It is used by one node to apply to another for executing a task. We term the sender of a Req message the requester and the receiver the collaborator.
SU: Speed-up message. A requester uses it to indicate that no node has requested it. An SU message cannot be sent unless a Req message has been sent.
Rst: Reset message. A requester uses this message to cancel its own request; a collaborator uses it to reject a request.
Ack: Acknowledgement message. This message can only be sent by a collaborator, to accept a request.
LD: Loop detection message. It is used by a node to report topology to its requester during loop detection.
2.2 Node States

idle: the node is neither executing a task nor negotiating with other nodes.
busy: the node is executing a task.
wait1: the node has just received one or more Req messages from other nodes.
wait2: the node has just sent a Req message.
wait3: no other node has requested the node.
wait4: the node has received one or more Req messages after sending one out.
2.3 Variables in a Node

requests: stores the received Req messages that have not yet been responded to.
priority: stores the highest priority of the nodes found in loop detection.
distance: stores the distance (in hops) between the latest found node and the highest-priority node.
nodes: stores the topology the node knows but has not reported yet.
2.4 The State Transitions of a Node

1) Public operations. Basic operations: when a node sends a Req message, it records the sequence number of the target node in the first entry of the nodes array; if the node does not reply immediately to a received Req message, the message is added to requests. Default operations: when a node receives a message that it does not know how to handle, it responds with a Rst message if the message type is Req or SU; otherwise it just keeps silent.
2) In the busy state: the node applies the default operations to all received messages. After the task finishes, the node returns to idle automatically.
3) In the idle state: a node may send a Req message to any other node, set its timer, and switch to the wait2 state.
If a node receives a Req message, it sets its timer (relative to the sending moment of the Req message) and switches to the wait1 state.
4) In the wait1 state. If a node receives a Req message before the timer times out, it applies the public operations; otherwise, it takes the default operations. If a node receives a Rst message from one of its requesters, it removes the corresponding Req from requests; if the requests array is empty after the removal, the node switches to idle. If a correct SU message is received (i.e., the sender of the SU has previously sent a Req message), the node responds with an Ack message and rejects all other requests.
5) In the wait2 state. If a Rst message from its collaborator is received, the node switches to idle. If a Req message is received, the node switches to wait4. If the node's timer times out, the node sends an SU message to its collaborator and switches to wait3.
6) In the wait3 state. If a node receives a Rst message from its collaborator, it switches to idle. If a node receives an Ack message from its collaborator, it switches to busy.
7) In the wait4 state. If a node receives a Req message before the timer times out, it applies the public operations; otherwise it takes the default operations. If a node receives a Rst message from its collaborator, it switches to wait1. If a node receives a Rst message from one of its requesters, it removes the corresponding Req from requests; if the requests array becomes empty after this, the node sends an SU to its collaborator and switches to wait3. If a node receives an SU message from one of its requesters, the node takes the following four steps: replying with an Ack message, rejecting all other requests, canceling its own request, and switching to busy. If the node's timer times out and the node has only one requester, it executes the loop detection operations (see the next subsection).
Figure 1 shows the finite state machine of a node, where the question mark "?" means sending and the exclamation mark "!" means receiving. Events that trigger state changes and the corresponding actions are separated by a slash "/". Some details that do not affect state changes (such as the timeouts in wait1 and wait4) are omitted for simplicity.
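To make the transitions above concrete, the following sketch models a subset of the node automaton. It is our illustrative reconstruction, not the authors' implementation: message delivery, timer management, wait4 message handling, and the loop-detection branch are stubbed out, and all names are our own.

```python
# Partial sketch of the CDA node automaton (Sect. 2.4); hypothetical names.
IDLE, BUSY, WAIT1, WAIT2, WAIT3, WAIT4 = "idle busy wait1 wait2 wait3 wait4".split()

class Node:
    def __init__(self, node_id, send):
        self.id = node_id
        self.send = send          # callback: send(dst, msg_type, src)
        self.state = IDLE
        self.requests = []        # pending Req senders not yet answered
        self.collaborator = None

    def request(self, dst):       # idle -> wait2: issue our own task
        assert self.state == IDLE
        self.collaborator = dst
        self.send(dst, "Req", self.id)
        self.state = WAIT2

    def on_timeout(self):
        if self.state == WAIT2:   # nobody requested us: speed up our request
            self.send(self.collaborator, "SU", self.id)
            self.state = WAIT3
        elif self.state == WAIT4 and len(self.requests) == 1:
            self.loop_detection() # Sect. 2.5, not sketched here

    def on_message(self, msg, src):
        if self.state == BUSY:    # default operations only
            if msg in ("Req", "SU"):
                self.send(src, "Rst", self.id)
        elif self.state == IDLE and msg == "Req":
            self.requests.append(src)
            self.state = WAIT1
        elif self.state == WAIT1:
            if msg == "Req":
                self.requests.append(src)
            elif msg == "Rst" and src in self.requests:
                self.requests.remove(src)
                if not self.requests:
                    self.state = IDLE
            elif msg == "SU" and src in self.requests:
                self.send(src, "Ack", self.id)   # accept src, reject the rest
                for other in self.requests:
                    if other != src:
                        self.send(other, "Rst", self.id)
                self.state = BUSY
        elif self.state == WAIT2:
            if msg == "Rst" and src == self.collaborator:
                self.state = IDLE
            elif msg == "Req":
                self.requests.append(src)
                self.state = WAIT4
        elif self.state == WAIT3:
            if msg == "Rst" and src == self.collaborator:
                self.state = IDLE
            elif msg == "Ack" and src == self.collaborator:
                self.state = BUSY

    def loop_detection(self):
        pass  # placeholder for the Sect. 2.5 logic
```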
2.5 Loop Detection and Dismantlement
Definition 1 (measurement request graph). A measurement request graph G = (V, E) is a directed graph where V is the node set and each task corresponds to a directed edge in E.
Definition 2 (isolated loop). An isolated loop is a connected subset of G that forms a directed cycle whose node and edge sets do not intersect the rest of G.
Fig. 1. State transitions of a node
If there exists an isolated loop L in G and the timers of all nodes in L have timed out, all nodes in L reject new requests and wait for their requesters' actions. Unfortunately, their requesters (which are themselves) are also waiting, which causes a deadlock. When a node is in an isolated loop, it satisfies the following three conditions: the state of the node is wait4; the timer of the node has timed out; and the length of the requests array in the node is 1. If a node meets these conditions, we call it a potential-trouble (PT) node; otherwise we call it a healthy (HE) node. A PT-node has one and only one requester, which is defined as its superior. The operations of loop detection and dismantlement are defined as follows.
1) When a node wants to send out an LD message, it creates the LD message, copying the nodes array into the message body. Next, it sends the message to its superior and empties the nodes array.
2) If an HE-node changes into a PT-node, it updates its priority and distance variables according to their definitions. Next, it checks the nodes array and executes the following operations. If the node itself appears in the nodes array, we can conclude that the node is in an isolated loop and that it is the only one that knows of the loop's existence, because it has never spread LD messages before. Therefore, the mission of breaking the deadlock falls on this node: it sends an SU message to its collaborator and a Rst message to its superior and switches to wait3, which dismantles the loop. Otherwise, the node sends an LD message to its superior.
3) If a node receives an LD message from its collaborator, it concatenates its nodes array with the array carried in the LD message and updates its priority and distance. Next, if the node is a PT-node, it does the following operations. If the node finds itself in the received LD message, which indicates that a directed loop has been found, it begins to dismantle the loop. Because the node has sent LD messages before, it is possible that other nodes in the loop have also detected the loop; in that case, these nodes must take consistent actions.
We define the matching rules so that dismantling starts from the node with the highest priority. That is, the node chooses one of the following three actions according to its position relative to the highest-priority (HP) node. If the distance variable is an odd number, the node sends a Rst message to its collaborator and an LD message to its superior and switches to wait1. If distance is an even number and the node's collaborator is not the highest-priority node, the node sends an SU message to its collaborator and a Rst message to its superior and switches to wait3. Otherwise, the node sends a Rst message to its collaborator and an LD message to its superior and switches to idle. If the node does not find itself in the received LD message, it simply sends an LD message to its superior.
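The parity rule above can be phrased as a small decision function. The following sketch is only our illustration; the message-sending mechanics and the distance bookkeeping are assumed rather than taken from the paper.

```python
# Illustrative decision rule for loop dismantlement (Sect. 2.5, step 3).
# 'distance' counts hops from this node to the highest-priority (HP) node.

def dismantle_action(distance, collaborator_is_hp):
    """Return (msg_to_collaborator, msg_to_superior, next_state)."""
    if distance % 2 == 1:
        return ("Rst", "LD", "wait1")   # odd: drop own request, pass LD on
    if not collaborator_is_hp:
        return ("SU", "Rst", "wait3")   # even, collaborator not HP: push request
    return ("Rst", "LD", "idle")        # even, collaborator is HP: step aside

# Example: two hops from the HP node, collaborator is not the HP node.
assert dismantle_action(2, False) == ("SU", "Rst", "wait3")
```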
3 Feasibility Analysis
Lemma 1. For any node s in CDA, if it is not in a loop, it will switch to idle in finite time.
Proof. We defer the proof to an extended version of this paper.
Definition 3 (the requesters' operator). Let the operator Asc(s) denote the requester set of node s. For a node set S, we let Asc(S) be the union of Asc(s) over all s in S.
Theorem 1 (Aliveness). Any node in CDA will return to the idle state in finite time.
Proof (by contradiction). Suppose that some nodes compose a directed loop and never switch to idle. If some node in the loop switched to idle in finite time, the loop would be dismantled and, by Lemma 1, the remaining nodes would switch to idle as well. Accordingly, assume no node in the loop is ever in the idle state. Let S be the node set of the loop and U = Asc(S) - S. Suppose a node in U is in a loop; because the out-degree of each node is 1, its loop would have to coincide with the loop through S, contradicting the fact that it does not belong to S. Therefore no node in U is in a loop, and by Lemma 1 all of them switch to idle in finite time, so the loop over S becomes an isolated loop. According to the state transitions of a node, these nodes will detect the isolated loop; messages are transferred correctly and losslessly, so the loop is detected by some node within a bounded time proportional to the loop length. No matter which of the three actions that node takes, either it or its superior breaks the loop. The broken-off node then switches to idle in finite time because it is no longer in a loop, which conflicts with the hypothesis. As a result, any node in CDA returns to the idle state in finite time.
Theorem 2 (Correctness). CDA is correct.
Proof. That CDA is correct means that no node takes more than one measurement task at any time. This can be divided into three cases. First, as a requester, a node cannot take collaborative measurements with two nodes simultaneously. Second, as a collaborator, a node cannot take collaborative measurements with two nodes at the same time. Third, a node cannot take a collaborative measurement with one node as a requester and with another node as a collaborator concurrently. A collaborative measurement begins with an Ack message: the receiving (or sending) of an Ack message marks the beginning of a collaborative measurement. Ack messages can only be sent in response to SU messages, so the first case is equivalent to proving that a node sends at most one SU message before it returns to idle. A node in wait3 has only two possible state transitions, to idle or to busy, so a node passes through wait3 at most once; because a node can send an SU message only when it switches to wait3, the first case holds. Because a node must switch to busy once it sends an Ack message, and a busy node does not respond with Ack to any message, the second case holds. A node must switch to wait3 once it sends an SU message, and the node does not send Ack messages along either of the two possible state transitions out of wait3, so the third case holds. As a result, CDA is correct.
4 Simulation
In this section, we evaluate the average performance of CDA by simulation. To provide a baseline for the performance parameters, we design a Single-Wait State Algorithm (SWSA).
4.1 Single-Wait State Algorithm
Message types. Req requests a collaborative measurement; Ack agrees to the collaborative measurement; NAK refuses the collaborative measurement.
States. A node has three states: busy, idle, and wait, where busy and idle have the same meaning as in CDA. A node in the wait state has sent a Req message that has not been responded to yet.
State transitions. A node generates tasks in the idle state and switches to wait as soon as it sends a Req message. If a node in the idle state receives a Req message, it responds with an Ack message. If a node in the wait or busy state receives a Req message, it responds with a NAK message. If a node receives an Ack message in the wait state, it switches to busy. If a node receives a NAK message in the wait state, it switches to idle.
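SWSA is simple enough to state in a few lines. The sketch below is our paraphrase of the three-state protocol, with message delivery and task completion left abstract; the assumption that an idle node becomes busy when it sends an Ack is ours, inferred from the measurement starting on both ends.

```python
# Baseline SWSA node (Sect. 4.1): one wait state, immediate accept/refuse.

class SWSANode:
    def __init__(self, node_id, send):
        self.id, self.send = node_id, send
        self.state = "idle"

    def request(self, dst):                  # idle -> wait
        if self.state == "idle":
            self.send(dst, "Req", self.id)
            self.state = "wait"

    def on_message(self, msg, src):
        if msg == "Req":
            if self.state == "idle":
                self.send(src, "Ack", self.id)
                self.state = "busy"          # assumed: measurement starts here
            else:                            # wait or busy: refuse
                self.send(src, "NAK", self.id)
        elif self.state == "wait":
            self.state = "busy" if msg == "Ack" else "idle"
```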
4.2 Experiment Results
In the simulation, tasks are created at random times with random requester-collaborator pairs. We implement SWSA and CDA as event-driven simulations, which strictly preserves the time order of events. The parameters of the simulation are given in Table 1. We varied the values in Table 1 and found that the results are similar.
The experiment focuses on the performance of the two algorithms under different numbers of concurrent tasks. We create 10000 tasks for each simulation. For each value of c, we run 100 simulations of each algorithm; the results are shown in Figures 2 and 3. Figure 2 compares CDA with SWSA on the proportion of executed tasks, where the task execution proportion is defined as the ratio of the number of executed tasks to the number of created tasks.
The x-axis in Figure 2 is the number of tasks created concurrently, and the y-axis is the task execution proportion. We can see that when c is small, the difference between CDA and SWSA is not large. As c increases, the proportion for both algorithms tends to decrease, but it decreases faster for SWSA than for CDA. As a result, we draw the conclusion that when the number of concurrent tasks is large, CDA has better task execution ability than SWSA.
1 The revised negative exponential distribution adds a maximum limit to the negative exponential distribution: all values greater than the maximum are taken as the maximum.
2 The mean is the mean of the negative exponential distribution (before the revision).
Figure 3 compares the efficiency of the two algorithms in handling conflictive tasks, where the conflict handling efficiency is defined as the proportion of conflictive tasks that are successfully executed.
Fig. 2. Task execution proportion
Fig. 3. Efficiency of handling conflictive tasks
An SWSA node in the wait state refuses requests from other nodes, so the conflict handling efficiency of SWSA is poor. From the figure we can see that, as both the task conflict proportion and the number of concurrent tasks increase, the conflict handling efficiency of SWSA becomes worse, while that of CDA slightly increases. Thus CDA has a good ability to handle conflictive tasks.
5 Conclusions
The measurement collaboration problem is a kind of distributed collaboration problem with conflictive tasks. In this paper, we give a completely distributed algorithm, CDA, for the measurement collaboration problem, which avoids deadlock by isolated-loop detection and dismantlement. The aliveness and correctness of CDA are proven in the paper. By simulation, we validate the efficiency of CDA and draw the following conclusions: when the number of randomly created concurrent tasks is small, the efficiency of CDA is not higher than that of SWSA; as the number of concurrent tasks increases, the efficiency advantage of CDA gradually becomes remarkable. These characteristics also indicate that CDA is more applicable to cases with higher concurrency, such as disaster rescue and object positioning.
References
1. Naimi, M., Trehel, M., Arnold, A.: A log(n) Distributed Mutual Exclusion Algorithm Based on Path Reversal. Journal of Parallel and Distributed Computing, Vol. 34, No. 1 (1996) 1-13
2. Lodha, S., Kshemkalyani, A.: A Fair Distributed Mutual Exclusion Algorithm. IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 6 (2000) 537-549
3. Dijkstra, E.W.: Two Starvation-Free Solutions to a General Exclusion Problem. EWD 625, Plataanstraat 5, 5671 AL Nuenen, The Netherlands
4. Chandy, K.M., Misra, J.: The Drinking Philosophers Problem. ACM Transactions on Programming Languages and Systems, Vol. 6, No. 4 (1984) 632-646
5. Modi, P.J., Jung, H., Tambe, M., Shen, W.M., Kulkarni, S.: Dynamic Distributed Resource Allocation: A Distributed Constraint Satisfaction Approach. Proceedings of ATAL'01 (Agent Theories, Architectures, and Languages), Seattle, WA, USA (2001) 264-276
6. Yokoo, M., Durfee, E.H., Ishida, T., Kuwabara, K.: The Distributed Constraint Satisfaction Problem: Formalization and Algorithms. IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No. 5 (1998) 673-685
7. Wu, Q., Huang, J., Bi, J.: Arbitrator-Based Algorithm for Measurement Collaboration Problem. Proceedings of the 2003 International Conference on Computer Networks and Mobile Computing (ICCNMC 2003), Shanghai, China (2003) 166-173
8. Decker, K., Li, J.: Coordinated Hospital Patient Scheduling. Proceedings of the 3rd International Conference on Multi-Agent Systems (ICMAS'98), Paris, France (1998) 104-111
9. Kitano, H.: RoboCup Rescue: A Grand Challenge for Multi-Agent Systems. Proceedings of ICMAS'00, Boston, MA, USA, July (2000) 5-12
Temporal Problems in Service-Based Workflows
Zhen Yu, Zhaohui Wu, Shuiguang Deng, and Qi Gao
Grid Computing Laboratory, College of Computer Science, Zhejiang University, 310027 Hangzhou, Zhejiang, P.R. China
[email protected], {wzh, DengSG, hyperion}@zju.edu.cn
Abstract. Time constraints are a key problem in workflow management. With the emergence of new technologies such as Web services and the grid, this problem has become more complicated in service-based, loosely coupled workflow models. This paper presents a fundamental time model for service-based workflows. Then, based on the model, it focuses on some typical temporal problems in service-based workflows.
1 Introduction
With the rapid advances of Internet technology, Web service and grid service based workflow models are drawing more and more attention, and many standards such as WSFL [1], XLANG [2], and BPEL [3] have been proposed to manage loosely coupled inter-enterprise business processes. In these new workflow models, temporal problems, a crucial issue in traditional workflows, remain of great importance. Although there has been much research on time management in workflows, most of it is limited to processes whose activities and sub-processes are all exposed to the workflow modeler. In service-based workflows, however, a process is composed of services whose inner structures are not fully publicized, so a new model and new algorithms are required to handle temporal problems in such workflows. This paper proposes a new time model for service-based workflows. Based on the model, it also presents algorithms to handle some typical time management problems, such as the checking of time constraints, the shortest execution time of a process, and the earliest/latest start times of activities. The following sections are organized as follows: Section 2 presents the time model for service-based workflows; Section 3 gives algorithms to handle some temporal problems; Section 4 concludes the work and discusses future work.
2 Basic Elements and Time Model
With reference to some Web service based and grid service based workflow models, we design a time model for service-based workflows.
2.1 Basic Elements and Time Constraints
Definition 1 (Activity). An Activity is defined as a 2-tuple <id, duration>, where id is the identity of the activity, which comes from the domain Identity, and duration is a positive real number representing the execution time of the activity.
Definition 2 (Dependency). A Dependency is defined as a 2-tuple <prev, succ>, where prev and succ are two activities. A dependency means that before the activity succ can start, the activity prev must have been completed.
Definition 3 (Lower Time Constraint). A lower time constraint LConstraint is defined as a 5-tuple <src, P1, des, P2, limitation>, where src and des are activities, P1 and P2 are from the set {b, e} (b represents the begin time of an activity and e represents its end time), and limitation is a positive real number.
Definition 4 (Upper Time Constraint). An upper time constraint UConstraint is defined as a 5-tuple <src, P1, des, P2, limitation>, where src and des are activities, P1 and P2 are from the set {b, e} (b represents the begin time of an activity and e represents its end time), and limitation is a positive real number.
2.2 Service and Service Declaration
Combining the elements and time constraints, the definition of a service can be given.
Definition 5 (Service). A Service is defined as a 5-tuple <ActSet, DepSet, In, Out, Constraints>, where ActSet is the set of activities, DepSet is the set of dependencies, In is the set of identities of the input activities, Out is the set of identities of the output activities, and Constraints is the set of time constraints.
At run time, the begin times and end times of the activities of a service may take different combinations; such a combination is called an activity time assignment of the service.
Definition 6 (Activity Time Assignment). For a Service, a map time from activity identities and {b, e} to the reals is called an activity time assignment of the Service if the following conditions are satisfied:
i. For each activity in Service.ActSet, time(activity.id, b) + activity.duration = time(activity.id, e).
ii. For each dependency in Service.DepSet, time(dependency.prev.id, e) <= time(dependency.succ.id, b).
iii. For each LConstraint in Service.Constraints, time(LConstraint.des.id, LConstraint.P2) - time(LConstraint.src.id, LConstraint.P1) >= LConstraint.limitation.
iv. For each UConstraint in Service.Constraints, time(UConstraint.des.id, UConstraint.P2) - time(UConstraint.src.id, UConstraint.P1) <= UConstraint.limitation.
Definition 7 (Interface Time Constraint Equivalence). If two services Service1 and Service2 fulfill the following conditions, Service1 and Service2 are interface time constraint equivalent:
956
Z. Yu et al.
i. Service1.In = Service2.In and Service1.Out = Service2.Out.
ii. For any activity time assignment time1 of Service1, there exists an activity time assignment time2 of Service2 such that time1 and time2 agree on the begin times of the input activities and the end times of the output activities.
iii. For any activity time assignment time2 of Service2, there exists an activity time assignment time1 of Service1 such that time1 and time2 agree on the begin times of the input activities and the end times of the output activities.
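The conditions of Definition 6 translate directly into a checking routine. The sketch below uses our own Python names for the 2-tuples and 5-tuples, represents an assignment as a dict keyed by (activity id, 'b'/'e'), and is meant only to illustrate the four conditions, not to reproduce the paper's formalism.

```python
# Illustrative check of Definition 6 (activity time assignment).
from dataclasses import dataclass

@dataclass
class Activity:
    id: str
    duration: float          # positive execution time

@dataclass
class Dependency:
    prev: Activity
    succ: Activity

@dataclass
class TimeConstraint:        # covers Definitions 3 and 4
    src: Activity
    p1: str                  # 'b' or 'e'
    des: Activity
    p2: str
    limitation: float
    lower: bool              # True: LConstraint (>=), False: UConstraint (<=)

def is_assignment(acts, deps, cons, time):
    # (i) end time = begin time + duration
    if any(abs(time[(a.id, 'b')] + a.duration - time[(a.id, 'e')]) > 1e-9
           for a in acts):
        return False
    # (ii) a dependency's predecessor ends before its successor begins
    if any(time[(d.prev.id, 'e')] > time[(d.succ.id, 'b')] for d in deps):
        return False
    # (iii)/(iv) lower and upper time constraints
    for c in cons:
        gap = time[(c.des.id, c.p2)] - time[(c.src.id, c.p1)]
        if (gap < c.limitation) if c.lower else (gap > c.limitation):
            return False
    return True
```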
Based on this concept, the definition of a service declaration can be given.
Definition 8 (Service Declaration). For a service, its service declaration is a service that is interface time constraint equivalent with it, expressed as Declare(Service). In [16], an algorithm with polynomial complexity is presented to generate the service declaration automatically for a given service.
2.3 Service-Based Process and Process Service
A service-based process should include activities, dependencies, time constraints, and some independent services that implement some functions of the process.
Definition 9 (Service-Based Process). A service-based process SProcess is defined as a 6-tuple <ServiceSet, ActSet, DepSet, begin, end, Constraints>, where:
i. ServiceSet is the set of services and ActSet is the set of activities.
ii. DepSet is the set of dependencies among the activities in ActSet and the input and output activities of the services in ServiceSet.
iii. begin and end are the input and output activities of the service-based process, with begin, end in ActSet.
iv. Constraints is the set of time constraints of the service-based process, which only refer to the begin times and end times of the activities in ActSet, the begin times of the input activities of services in ServiceSet, and the end times of the output activities of services in ServiceSet.
Obviously, using only the information visible in Figure 6, it is impossible to handle time management problems such as evaluating the execution time of the process or checking its time constraints. However, if each service publishes its service declaration, the structure of the whole process can be acquired by replacing each black box with the declaration of its service. Thus, the typical time management problems can be handled while the services still publish only a small part of their structures, and so keep their autonomy and encapsulation. In this way, a service-based process can be converted into a corresponding service that has only one input activity and one output activity. Such a service is called a process service.
Definition 10 (Process Service). For a service-based process SProcess and the service declarations Declare(Service) of its services, its corresponding process service is defined as a service whose components are assembled from SProcess and the service declarations.
3 Time Constraint Problems in Service-Based Workflows
There are two typical temporal problems for workflows [9-15]. One is how long the shortest execution time of a process is, given that all of its time constraints are fulfilled. The other is what the earliest and latest end times of each activity of a process are, given that the process is completed in its shortest execution time and all the time constraints are fulfilled. In service-based workflows, these problems are still of much significance. Nevertheless, if a service only publishes its input and output interfaces and provides no information about its inner structure, it is impossible to solve these problems; yet, because of the encapsulation and autonomy of services, it is also improper to make them publish all their inner structures. To overcome this difficulty, an appropriate method is to ask service providers to publish service declarations for their services and to compose those declarations into a process service. Two algorithms are given below to handle these problems.
3.1 Simple Service Simplification Algorithm
The problems of whether the time constraints can be fulfilled and of the shortest execution time of a service-based process are thus converted into the same problems on a process service. An algorithm with polynomial complexity is presented here, which solves the two problems by simplifying the process service. Moreover, the algorithm is applicable not only to process services but to all services that have only one input activity and one output activity; such a service is called a simple service here.
Simple Service Simplification Algorithm:
Input: a simple service, Service.
Output: if the time constraints of the service can be fulfilled, return the shortest execution time; otherwise, return "time constraints cannot be satisfied".
Step 1: Assume that the ActSet of Service is {a1, ..., an}, that the end time of each activity ai is an unknown ti, that a1 is the input activity, and that an is the output activity. Since the begin time of ai equals ti - di, where di is the duration of ai, every condition becomes a difference inequality over the end times:
i. For each dependency <ai, aj> in Service.DepSet, add the inequality ti - tj <= -dj.
ii. For each LConstraint in Service.Constraints with src = ai, des = aj, and limitation l:
If LConstraint.P1 = b and LConstraint.P2 = b, add the inequality ti - tj <= di - dj - l.
If LConstraint.P1 = b and LConstraint.P2 = e, add the inequality ti - tj <= di - l.
If LConstraint.P1 = e and LConstraint.P2 = b, add the inequality ti - tj <= -dj - l.
If LConstraint.P1 = e and LConstraint.P2 = e, add the inequality ti - tj <= -l.
iii. For each UConstraint in Service.Constraints with src = ai, des = aj, and limitation u:
If UConstraint.P1 = b and UConstraint.P2 = b, add the inequality tj - ti <= dj - di + u.
If UConstraint.P1 = b and UConstraint.P2 = e, add the inequality tj - ti <= u - di.
If UConstraint.P1 = e and UConstraint.P2 = b, add the inequality tj - ti <= dj + u.
If UConstraint.P1 = e and UConstraint.P2 = e, add the inequality tj - ti <= u.
Based on these rules, the initial inequalities can be generated; each inequality has the form tx - ty <= c.
Step 2: Use the elimination algorithm given below to convert the initial inequalities into the target inequalities, in which each inequality still has the form tx - ty <= c and only the two unknowns t1 and tn remain. If the time constraints of Service cannot be satisfied, the elimination algorithm terminates abnormally; in that case this algorithm also terminates abnormally and returns "the time constraints cannot be satisfied".
Step 3: According to the elimination algorithm, there are at most two target inequalities, one of the form t1 - tn <= c1 and the other of the form tn - t1 <= c2. If the target inequalities have no solution, the time constraints cannot be satisfied (the elimination algorithm already reports this case); otherwise, the shortest execution time is determined by the lower bound that the first inequality places on tn - t1.
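Read as code, Step 1 is a translation from constraints into difference inequalities over end times. The sketch below shows that translation under the derivation above; it is self-contained, with dict-based inputs of our own design rather than the paper's data structures.

```python
# Step 1 as code: build difference inequalities t_x - t_y <= c over end times.
# durations: {activity_id: duration}; dependencies: [(prev_id, succ_id), ...];
# constraints: [(src, p1, des, p2, limitation, lower), ...] with lower=True
# for LConstraints (>=) and lower=False for UConstraints (<=).

def initial_inequalities(durations, dependencies, constraints):
    ineqs = {}
    def add(x, y, c):                         # keep only the tightest bound
        if (x, y) not in ineqs or c < ineqs[(x, y)]:
            ineqs[(x, y)] = c
    for prev, succ in dependencies:           # end(prev) <= begin(succ)
        add(prev, succ, -durations[succ])
    for src, p1, des, p2, lim, lower in constraints:
        # time(a, 'b') = t_a - d_a and time(a, 'e') = t_a
        off_src = -durations[src] if p1 == "b" else 0.0
        off_des = -durations[des] if p2 == "b" else 0.0
        if lower:   # time(des, p2) - time(src, p1) >= lim
            add(src, des, off_des - off_src - lim)
        else:       # time(des, p2) - time(src, p1) <= lim
            add(des, src, lim - off_des + off_src)
    return ineqs
```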
Elimination Algorithm:
Input: the initial inequalities, in which every inequality has the form tx - ty <= c, and a set X of unknowns to keep.
Output: the target inequalities, in which every inequality also has the form tx - ty <= c and all the unknowns are from the set X.
Step 1: Simplify the inequalities. For a pair of inequalities tx - ty <= c1 and tx - ty <= c2, if c1 <= c2 then eliminate the latter inequality; if c1 > c2 then eliminate the former. Repeat this operation until no such pair remains. If at that point all the unknowns in the inequalities are in the set X, the algorithm is complete and the current inequalities are the target inequalities; otherwise continue with Step 2.
Step 2: Select an unknown t that is not in the set X. Find two sets of inequalities EN and EP in the current inequalities: EN consists of all the inequalities that contain t with coefficient -1 (of the form tx - t <= c1), and EP consists of all the inequalities that contain t with coefficient 1 (of the form t - ty <= c2). For each inequality in EN and each inequality in EP, if x differs from y, add the combined inequality tx - ty <= c1 + c2 to the current inequalities; otherwise, if c1 + c2 < 0, terminate abnormally and report that the time constraints cannot be satisfied in any case.
Step 3: Eliminate all the inequalities in EN and EP from the current inequalities and go to Step 1.
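The elimination procedure is essentially Fourier-Motzkin elimination specialized to difference constraints. The sketch below is our reading of it, storing inequalities as a dict mapping variable pairs to the tightest constant (which performs Step 1's simplification implicitly).

```python
# Sketch of the elimination algorithm on difference constraints x - y <= c.

class Unsatisfiable(Exception):
    pass

def add(ineqs, x, y, c):
    """Keep only the tightest bound for x - y <= c (the Step 1 simplification)."""
    if (x, y) not in ineqs or c < ineqs[(x, y)]:
        ineqs[(x, y)] = c

def eliminate(initial, keep):
    """Reduce difference inequalities to ones mentioning only vars in `keep`."""
    ineqs = {}
    for (x, y), c in initial.items():
        add(ineqs, x, y, c)
    while True:
        vars_used = {v for pair in ineqs for v in pair}
        extra = vars_used - set(keep)
        if not extra:
            return ineqs
        t = extra.pop()                       # Step 2: pick a variable to drop
        en = [(x, c) for (x, y), c in ineqs.items() if y == t]  # x - t <= c
        ep = [(y, c) for (x, y), c in ineqs.items() if x == t]  # t - y <= c
        ineqs = {p: c for p, c in ineqs.items() if t not in p}  # Step 3
        for x, c1 in en:
            for y, c2 in ep:
                if x != y:
                    add(ineqs, x, y, c1 + c2) # combine: x - y <= c1 + c2
                elif c1 + c2 < 0:             # x - x <= negative: infeasible
                    raise Unsatisfiable("time constraints cannot be satisfied")
```

With this in hand, the simplification algorithm amounts to calling eliminate with keep = {t1, tn}, and the calculation of Section 3.3 to calling it with keep = {t1, tn, ta}.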
3.2 Workflow View
In order to calculate the earliest and latest end times of every activity of a service in a service-based process, it is necessary not only to build the corresponding process service of the service-based process, but also to substitute the service itself for its counterpart in the process service. After the replacement, the process service is converted into a new service called a workflow view.
Definition 11 (Workflow View). For a workflow service WService and a service CService in its corresponding service-based process, PView(WService, CService) is the workflow view of the workflow service WService on CService. It is defined as the service obtained from WService by replacing Declare(CService) with CService itself.
3.3 Calculation Based on a Workflow View
Based on a workflow view, it is possible to calculate the earliest and latest end times of every activity in a service, given the premise that the whole service-based process is completed in its shortest execution time.
Activity Earliest/Latest End Time Calculation Algorithm:
Input: a workflow view WView and an activity a of it.
Output: the earliest and latest end times of a.
Step 1: Assume that the ActSet of WView is {a1, ..., an}, that the end time of each activity is an unknown ti, that a1 is the input activity, and that an is the output activity. Follow the rules of Step 1 of the simple service simplification algorithm and generate the initial inequalities.
Step 2: Take {t1, tn, ta} as the destination unknown set, where ta is the unknown of the activity a, and invoke the elimination algorithm. In the generated target inequalities, all the inequalities have the form tx - ty <= c, and only the three unknowns t1, tn, and ta remain.
Step 3: Use the simple service simplification algorithm to calculate the shortest execution time of WView, expressed as t. Then replace t1 with 0 and tn with t in the target inequalities. After that, only the unknown ta remains in the inequalities. Calculate the infimum and supremum of its range; they are the earliest and latest end times of the activity.
4 Conclusions and Future Work
Time management is an important topic in workflow. In the loosely coupled inter-enterprise environment in particular, new crucial and difficult problems have arisen in this area, and new methods are required to handle them. This paper presents a time model for service-based workflows. Based on the model, some algorithms are given to handle typical time constraint problems. Future work can be carried out in two directions. One is to extend the model to handle time management problems in more complicated workflows. The other is to propose mechanisms that not only handle time management problems at build time but can also perform dynamic calculations to handle problems at run time.
References
1. Leymann, F.: Web Services Flow Language. http://www4.ibm.com/software/solutions/webservice/pdf/WSFL.pdf, May 2001
2. Thatte, S.: XLANG: Web Services for Business Process Design. http://www.gotdotnet.com/team/xml wsspecs/xlang-c/default.htm, 2001
3. BEA Systems, IBM, Microsoft, SAP AG and Siebel Systems: Business Process Execution Language for Web Services, May 2003
4. Foster, I., Kesselman, C., Nick, J., Tuecke, S.: The Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration. Open Grid Service Infrastructure WG, Global Grid Forum, June 22, 2002
5. Krishnan, S., Wagstrom, P., von Laszewski, G.: GSFL: A Workflow Framework for Grid Services
6. van der Aalst, W.M.P., van Hee, K.M., van der Toorn, R.A.: Component-Based Software Architectures: A Framework Based on Inheritance of Behavior. Technical Report CU-CS-892-99, University of Colorado, Department of Computer Science, Boulder, USA, 1999
7. van der Aalst, W.M.P., Basten, T.: Inheritance of Workflows: An Approach to Tackling Problems Related to Change. Theoretical Computer Science, January 2002
8. van der Aalst, W.M.P.: Loosely Coupled Interorganizational Workflows: Modeling and Analyzing Workflows Crossing Organizational Boundaries. Information and Management, 37(2):67-75, March 2000
9. Eder, J., Panagos, E., Pozewaunig, H., et al.: Time Management in Workflow Systems. In: Abramowicz, W., Orlowska, M.E. (eds.): Proceedings of the 3rd International Conference on Business Information Systems. Springer-Verlag, Heidelberg London Berlin (1999) 265-280
10. Eder, J., Panagos, E., Rabinovich, M.: Time Constraints in Workflow Systems. In: Proceedings of the 11th Conference on Advanced Information Systems Engineering (CAiSE'99). Heidelberg (1999) 1-14
11. Eder, J., Gruber, W., Panagos, E.: Temporal Modeling of Workflows with Conditional Execution Paths. In: 11th International Conference on Database and Expert Systems Applications (DEXA 2000), Proceedings, LNCS 1873, Springer-Verlag, London, September 2000, 243-253
12. Marjanovic, O.: Dynamic Verification of Temporal Constraints in Production Workflows. Proceedings of the Australian Database Conference (ADC 2000), IEEE Press, 74-81
13. Zhuge, H.: Timed Workflow: Concept, Model, and Method. 1st International Conference on Web Information Systems Engineering (WISE 2000)
14. Bettini, C., Wang, X., Jajodia, S.: Temporal Reasoning in Workflow Systems. Distributed and Parallel Databases, 11(3):269-306, Kluwer Academic Publishers, 2002
15. Son, J.H., Kim, M.H.: Finding the Critical Path in a Time-Constrained Workflow. Seventh International Conference on Real-Time Computing Systems and Applications (RTCSA 2000), 2000
16. Yu, Z., Deng, S., Wu, Z.: A Time Model for Service-Based Workflows. To appear in The Eighth International Conference on Computer Supported Cooperative Work in Design (CSCWD 2004)
iCell: Integration Unit in Enterprise Cooperative Environment*
Ruey-Shyang Wu (1), Shyan-Ming Yuan (1), Anderson Liang (2), and Daphne Chyan (2)
(1) Dept. of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan, R.O.C. {ruey, smyuan}@cis.nctu.edu.tw
(2) W&Jsoft Inc., Unit 2, 19F, TaijungGang Rd. Sec. 3, Taichung 407, Taiwan, R.O.C. {anderson, daphne}@wnjsoft.com
Abstract. An enterprise cooperative environment is a combination of emerging technologies and methodologies on which both enterprise employees and customers can perform necessary business activities. A business activity may involve many systems of an enterprise, and integration between those systems becomes a critical issue because inter- and intra-enterprise systems are heterogeneous. Recently, many new technologies have pushed the evolution of integration toward more efficient and effective computing, and an enterprise has to choose a suitable technology as its integration platform to create value. However, only a few enterprises can really utilize those technologies, because most do not have a proper infrastructure, and the technologies therefore lose their advantages. In this paper, we design an infrastructure to provide such an integration platform: iCell, which offers a flexible and useful mechanism to achieve business operations. It is a light-weight architecture and can be adopted into an existing enterprise environment easily. With this mechanism, integration becomes possible and constructing an enterprise cooperative environment becomes easy.
1 Introduction
The enterprise cooperative environment provides the necessary elements to accomplish business activities. It brings inter- and intra-enterprise operations together to make enterprises more efficient. However, the environment is complex: intra-enterprise, it includes business processes, employees, and much enterprise software; inter-enterprise, it involves many business-to-business operations. For this reason, there must be a solid foundation for building and performing business activities. Besides, all enterprise systems have their own platforms, data formats, and specific communication interfaces. Hence, an integration architecture is necessary. It should be easy to adopt into the current enterprise environment and light-weight for building applications. Moreover, it should
* This research was supported by the Software Technology for Advanced Network Application project of the Institute for Information Industry and sponsored by MOEA, R.O.C.
cover the business operations needed to perform business activities. iCell provides such an integration platform and is suitable for the enterprise cooperative environment.
iCell, which stands for "integration Cell", provides the infrastructure for the cooperative environment. It can lower the integration effort in comparison with several open integration standards. iCell facilitates a "build to integrate" and light-weight construction strategy: it is intended to make application construction and integration a single step, while serving as the basic integration unit and keeping the overall solution light-weight. On such an architecture, the cooperative environment becomes feasible because it is easy to use and has good performance.
The paper is organized as follows: Section 2 gives the background; Section 3 presents the whole design of iCell; Section 4 shows the system performance; finally, Section 5 gives the conclusion and future work.
2 Background
2.1 Enterprise Cooperative Environment
The enterprise cooperative environment provides the environment in which to perform all inter- and intra-enterprise business. Today, interactions between enterprises become more and more frequent, especially as every enterprise focuses only on its special domain. The enterprise cooperative environment defines an enterprise-level co-work model and rounds up all enterprise components. To realize the enterprise cooperative environment, business modeling and a supporting platform are important. Business modeling relies on business managers to make decisions; the software platform is the layer that executes the decided business activities. To build up the platform, the necessary systems in the enterprise should be reviewed, because a business activity may make several systems work together. Besides processing internal enterprise activities, information should also be extended to partners, so that the business can gain the most value.
2.2 Objective
To construct the enterprise cooperative environment, integration is the key issue. Integration must not only link two systems together but also guarantee stability, and it should provide mechanisms for monitoring and management. Most enterprise integration technologies and products are hard to integrate into existing environments; moreover, they may lack some features, such as management or integration. For this reason, a good integration platform is necessary as the foundation of the enterprise cooperative environment. The following aspects should be considered:
1. Define the business process clearly.
2. Adopt into existing platforms.
3. Light-weight architecture.
4. Good performance.
5. Mechanisms for monitoring, exception handling, and notification.
6. Standard and extensible interfaces for system communication.
Fig. 1. iCell Conceptual Model
Although much software can integrate with many systems easily, some of it consumes a lot of resources and some is bound to a specific platform. The requirements for constructing the platform therefore lead us to create a new architecture.
2.3 Related Works
OpenAdapter™ is a Java/XML-based software platform for business system integration with little or no custom programming. It defines a Source-Pipe-Sink model to process information: a source provides information, pipes process the data, and a sink takes the output information. The communication between components is XML, which costs a lot of resources to process; therefore, a more efficient communication strategy is needed in the cooperative environment.
TIBCO ActiveEnterprise is a commercial product for achieving an enterprise cooperative environment. It provides many software packages: TIB/Rendezvous provides the message bus, TIB/Adapters provide system integration, and TIBCO BusinessWorks is the business modeling tool. TIBCO ActiveEnterprise offers a complete set for constructing the environment; however, the product is proprietary and is not easy to integrate with the existing IT infrastructure of an enterprise.
3 iCell Design
3.1 iCell Conceptual Model
Figure 1 shows the conceptual model of iCell. Input-Process-Output is mapped to Source-Pipe-Drain in the conceptual model. The Bootstrap component reads the iCell configuration from an XML file when the iCell starts up, and it creates the Controller, Source, Pipe, and Drain components accordingly. The Bootstrap component also allows an external Web browser to read the loaded configuration from it and, optionally, to reconfigure it. The Controller acts as a dispatcher for the events in the iCell: whenever the Source component captures an event, it delegates the event to the Controller, the Controller invokes suitable components to process the event, and eventually the Controller returns the answer to the Source component when finished.
Fig. 2. iCell Use Case
3.2 Use Case
The whole use case is shown in Figure 2. The "Base Cell" use case defines the common iCell component behaviors. The "Bootstrap" use case is the iCell initial entry, which can be started from a standalone Java program main entry or an embedded EJB initialization class. It identifies the location of the iCell configuration file through the "Load & Parse XML Configuration" use case and streamlines the iCell component initialization process. The loaded configuration is stored in a run-time repository, whose behavior is described by the "Configuration Repository" use case. The initialized iCell components are placed in the run-time component pool, whose behavior is described by the "Component Pooling" use case. The startup scenario ends on finishing all the initialization tasks described in the configuration.
Fig. 3. iCell Class Diagram
3.3 iCell Components
Figure 3 shows the key classes in iCell. The iCell core classes group consists of the BaseCell root class and the UserCell derived class; BaseCell defines the common logic and functionality for executing an iCell. The iCell configuration classes group mainly consists of the Configurator and ConfigurationService classes; the ConfigurationService provides the Web-based remote configuration service for the iCell administrator. The iCell utility classes group consists of utility classes such as a logging helper. The iCell process classes group controls business processes in the enterprise: iCell components that implement the ProcessDispatchable interface are able to join processes, can be controlled by the ProcessController, and have their onDispatch() method called when the process is used. During iCell initialization, an EAIEventListener object is
Fig. 4. iCell Event Dispatching Sequence Diagram
Fig. 5. iCell Configuration Example
registered to the corresponding EAIEventSource object. When a defined event arrives from the related event trigger, the invocation flow is as follows: (1) get the process definition of the incoming event from the Configurator object; (2) based on the process definition, the ProcessController gets the next object reference from the component pool; (3) invoke the onDispatch method of the retrieved component; (4) repeat steps (2)-(3) until all the necessary processing steps are finished; (5) return to the EAIEventListener to complete the delegation.
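The five-step flow is, in essence, a loop over a configured chain of pooled components. iCell itself is written in Java; the sketch below restates the loop in Python for brevity, with all names (lookup_process, pool, and so on) being our own illustration rather than iCell's actual API.

```python
# Illustrative restatement of the iCell event-dispatch loop (steps 1-5).

def dispatch(event, configurator, pool, listener):
    steps = configurator.lookup_process(event.type)   # (1) process definition
    result = event
    for component_id in steps:                        # (4) repeat (2)-(3)
        component = pool.get(component_id)            # (2) pooled object
        result = component.on_dispatch(result)        # (3) invoke the step
    listener.complete(result)                         # (5) finish the delegation
```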
3.4 Configuration Design
The core cell definition consists of a single root tag whose start and end tags enclose the iCell component definitions and the iCell process definitions. The root tag has four attributes that identify the general information of the iCell components. The <EAIComponent> tag describes the EAI components. Finally, the process model is the last part of the configuration.
Fig. 6. Process Modeling Cases
3.5 Process Modeling
iCell supports the five types of process modeling that are most common in enterprises:
A. Sequential process: one input, one output, no split or merge.
B. Fan-out process: one input, multiple outputs, a split occurs in the process.
C. Fan-in process: multiple inputs, one output, a merge occurs.
D. Split & merge process: one input, one output, a split and a merge occur.
E. Fan-in & split process: multiple inputs, multiple outputs, a merge and a split occur in the process.
3.6 Pooling
Pooling is a common technique to enhance performance and improve resource utilization. The design of iCell uses object pooling to prepare the necessary objects prior to iCell process execution: it implements the iCell component pool to hold typed Java objects defined in the XML configuration. When a defined event occurs, the iCell ProcessController can quickly delegate the event processing task to an iCell component in the pool, without creating an object from scratch. Therefore, the overhead of iCell process task dispatching is reduced.
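A minimal component pool of the kind described can be sketched as below. This is a generic illustration in Python (iCell's pool holds Java objects); the point being shown is eager construction at bootstrap followed by cheap dispatch-time lookup.

```python
# Generic component pool: instantiate once at bootstrap, hand out thereafter.

class ComponentPool:
    def __init__(self, definitions):
        # definitions: {component_id: factory}, parsed from the configuration.
        # All components are created eagerly, before any event arrives.
        self._components = {cid: make() for cid, make in definitions.items()}

    def get(self, component_id):
        # Dispatch-time lookup is a dict access: no per-event construction cost.
        return self._components[component_id]
```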
4 Performance
Two kinds of overhead are introduced by the iCell framework: (1) initialization overhead and (2) process control overhead. The measurement platform is Windows 2000 SP3 with a Pentium III 800 CPU and 640 MB RAM; the JVM is JRE 1.3.1_06 from Sun Microsystems. Figure 7 gives the summary result of the overhead measurement. The result shows that the initialization overhead of an iCell component is roughly constant, about 3.5 milliseconds per component, and the iCell framework overhead is about 510 milliseconds. This is relatively light-weight compared to other application frameworks such as J2EE.
Fig. 7. Initialization Overhead
Fig. 8. Result summary of the iCell process control overhead test
To measure the overhead of process control inside the iCell, this test introduces a benchmark source triggered by a timer every 1 to 2 seconds. Each time the benchmark source is triggered, it delegates the event to the process control object. The test produces 10 sets of data, controlled by the number of dummy pipes in the process, ranging from 100 to 1000 dummy pipes in a given process. Figure 8 shows the summary result of the test. Typically, for a process with fewer than 100 pipes, the process control overhead per pipe component is less than 0.19 milliseconds; it can almost be ignored compared to normal execution. This result shows that iCell is a light-weight framework for Java object assembly and process control.
5 Conclusions and Future Works
There are many aspects to be considered in the enterprise cooperative environment, which is quite complex and changeable. iCell provides a low-cost and light-weight solution. It has great flexibility and requires reasonable management effort, qualities that are not covered by other products, and its overall robustness and scalability can be guaranteed and adapted by choosing suitable technologies. iCell provides the missing parts of rapid application integration and assembly.
To make iCell applicable in real enterprise environments, more source/drain components will be provided; the more components there are, the more rapid connectivity becomes available to software developers. Besides, good data translation will give great help in the cooperative environment; the translation should also follow a light-weight and effective policy.
References
1. Everard, A. (MQSeries Business Manager, IBM): "Business Process Management Uncovered", eAI Journal, January 2001, Vol. 3, Num. 1
2. IBM: "WebSphere MQ Product Family", http://www-3.ibm.com/software/integration/wmq/
3. Momentum Software Inc.: "Service Oriented Enterprise", http://www.serviceoriented.org/service_oriented_enterprise.html
4. openadaptor.org: "openadaptor Programmer's Guide v1.5.0", http://www.openadaptor.org/, February 2003
5. Hailstone, R.: "Integration Strategies: The Start of Convergence", IDC Group, January 2003
6. Burbeck, S.: "The Tao of e-business services: The evolution of Web applications into service-oriented components with Web services", Emerging Technologies, IBM Software Group, October 1, 2000 (also available via http://www-106.ibm.com/developerworks/webservices/library/ws-ao/?dwzone=webservices)
7. Linthicum, D.S.: "Process Automation and EAI", eAI Journal, March 2000, Vol. 2, Num. 3
8. Lublinsky, B.: "Achieving the Ultimate EAI Implementation", eAI Journal, February 2001, Vol. 3, Num. 2
9. Long, J.: "Integrating the Value Chain", eAI Journal, May 2000, Vol. 2, Num. 5
10. Harris, K.: "Where Is the Value in Value Chains?", Gartner Group, http://www.gartnergroup.com, March 2001, Note Num. COM-13-1796
11. McGoveran, D.: "Enterprise Integrity: BPMS Concepts, Part 1", eAI Journal, January 2001, Vol. 3, Num. 1
12. McGoveran, D.: "Enterprise Integrity: BPMS Concepts, Part 2", eAI Journal, February 2001, Vol. 3, Num. 2
13. McGoveran, D.: "Enterprise Integrity: BPMS Concepts, Part 3", eAI Journal, March 2001, Vol. 3, Num. 3
14. McGoveran, D.: "Enterprise Integrity: BPMS Concepts, Part 4", eAI Journal, April 2001, Vol. 3, Num. 4
15. Everard, A. (MQSeries Business Manager, IBM): "Business Process Management Uncovered", eAI Journal, January 2001, Vol. 3, Num. 1
16. Butler, M.: "Workflow Beyond the Enterprise", eAI Journal, November/December 2000, Vol. 2, Num. 11
17. TIBCO Software Inc.: "TIB/Rendezvous Concepts", Release 6.7, July 2001
18. TIBCO Software Inc.: "TIBCO ActiveEnterprise", http://www.tibco.com/solutions/products/default.jsp
19. W3C: "Web Services Activity", http://www.w3c.org/2002/ws/
The Availability Semantics of Predicate Data Flow Diagram*
Xiaolei Gao (1,3), Huaikou Miao (1), Shaoying Liu (2), and Ling Liu (1)
(1) School of Computer Engineering and Science, Shanghai University, Shanghai, 200072, China. [email protected], {hkmiao, liuling}@mail.shu.edu.cn
(2) Faculty of Computer and Information Science, Hosei University, Tokyo, Japan. [email protected], http://www.k.hosei.ac.jp/~sliu
(3) Xuzhou Normal University, Xuzhou, 221116, China
Abstract. The core of SOZL (structured methodology + object-oriented methodology + Z language) is the Predicate Data Flow Diagram (PDFD). In order to eliminate the ambiguity of predicate data flow diagrams and their associated textual specifications, a formalization of the syntax and semantics of predicate data flow diagrams is necessary. In this paper we use the Z notation to define an abstract syntax and the related structural constraints for the predicate data flow diagram notation, and provide it with an axiomatic semantics based on the concept of data availability. Necessary proofs are given to establish important properties of the axiomatic semantics.
1 Introduction
We have designed a formal language, called SOZL, whose syntax has been reported in our previous publications [1][2]. The primary technique for writing specifications in SOZL is to use formalized data flow diagrams, called Predicate Data Flow Diagrams (PDFDs), to define the architectures of software systems. In this paper we define a formal semantics for PDFDs based on data availability.
2 Basic Concepts
When we use the Z notation to describe predicate data flow diagrams, the following given types are needed; their semantics can be found in [3].
[NAME, TYPE, PRED]
* This work is supported by the Natural Science Foundation of China (60173030) and the Ministry of Education, Culture, Sports, Science and Technology of Japan under a Grant-in-Aid for Scientific Research on Priority Areas (No. 15017280).
NAME denotes the set of all name labels, TYPE denotes the set of all types, and PRED denotes the set of all predicates.
Definition 1. A predicate operation PO is defined by a Z schema with components (PONAME, I, O, P), where PONAME is the name of the PO, I is a set of input variables, O is a set of output variables, and P is a predicate defining a constraint on the PO. A PO represents a transformation from input to output under a certain condition.
Definition 2. A data flow DF is defined by a Z schema with components (P, Q, x). In this schema, P and Q are two predicate operations, and x is an output of P and an input of Q. The data flow denotes that data moves from P to Q through the variable x; x is also called the label variable of the directed line. P is called a predecessor of Q, denoted P = PredPO(Q).
Definition 3. A predicate data flow diagram PDFD is defined by a Z schema with components (Vs, POset, ARCs),
where Vs and POset are the set of label variables and the set of POs of the PDFD, respectively. The input and output variables belong to Vs. ARCs is the set of DFs. A PDFD may be connected or disconnected.
Definition 4. The set of PredPOs of a PO in a PDFD is defined as the set of all its predecessors: PredPOs(PO) = {P : POset | P = PredPO(PO)}.
Definition 5. The decomposition relation is defined by the following Z axiomatic description: a PO can be decomposed into only one lower-level PDFD.
3 Some Notions
In order to define the semantics, we need the following notions. Firing(_) denotes that a PO or PDFD is fired. Term(_) denotes that Firing(PO) or Firing(PDFD) terminates. Out(PDFD) denotes the set of PDFD outputs. {C1} P {C2} denotes that if the pre-assertion C1 is true, then P is fired, executed, and terminates, after which the post-assertion C2 is true, where P denotes either a PO or a PDFD. We use these formats to describe If rules and If-and-Only-If rules.
4 Data Availability
Definition 6. VAL(x) represents that the data passing through the variable x : X is available.
Definition 7. If the data of the variables in X and the data of the variables in Y always become available together, then X and Y are correlated, which is denoted by Co-rel(X, Y). If every pair of variables in X is correlated, we say that X is correlated.
Definition 8. If X is correlated and no proper superset of X is correlated, then we say X is maximum correlated, represented by MaxCo-rel(X).
Definition 9. A subset T of a variable set A satisfying MaxCo-rel(T) is called a maximum correlated subset of A; the set of all maximum correlated subsets of A is denoted as MaxCR(A).
Definition 10. Assume T is in MaxCR(A); then define VAL(T) to mean that the data of every variable in T is available.
Definition 9. If satisfying MaxCo-rel(T); we say that T is called the maximum correlated subset, the set of all T of is denoted as Definition 10. Assume then define
Property 1. Assume
then If If
Proof. The conclusion can be got from Definition 10. Property 2. If Proof. Let Because so according to
and #MaxCR(A) = 1, then because #MaxCR(A) = 1, according to Property 1, then exist
and
we can get the conclusion
Property 3. Proof. We only proof the case of n=2, If according to Property 1 if then according to Property 1 and and then we can get the property.
5 Axiomatic Semantics of Hierarchical PDFDs
We describe an axiomatic semantics for hierarchical PDFDs. It contains a set of inference rules that can be used to verify the consistency of a hierarchical PDFD. The availability semantics is based on data availability [4][5]. First, we introduce an axiom.
Axiom 1. If a predicate operation PO receives available input data, then PO immediately becomes Firing and terminates finally.
Rule 1.
Rule 2.
Rule 1 and Rule 2 define the relation between the input data and the output data of a PO. Rule 1 indicates that once a PO becomes Firing(PO) and reaches Term(PO), the output of the PO is made available, provided the input data were available before the firing. Rule 2 states that a normal termination of a firing PO makes its output data available and consumes its input data. Rules 3 to 6 define the availability semantics of a connected PDFD; these definitions establish a basis for interpreting the behavior of PDFDs.
Rule 3. Let the PO be an input PO of a PDFD.
Once an arbitrary input predicate operation of a PDFD is fired, the PDFD is fired.
Rule 4. Let the PO be an output PO of a PDFD.
That a PDFD is fired and its execution finally terminates implies that at least one input predicate operation of the PDFD has been fired.
Rule 5.
where the PO is a part of the PDFD. The predecessor POs may be fired simultaneously, and after all of them terminate, the PO can be fired.
Rule 6.
Rule 7.
Rule 7 defines the semantics of a disconnected PDFD. Let its components be connected PDFDs; the firing of a disconnected PDFD is equivalent to the firing of some of its connected sub-PDFDs. A hierarchical PDFD is built by decomposing a PO. The following rules define the relation between the hierarchical PDFD and the PO.
Rule 8.
Rule 9.
Rule 8 states that the firing of a PO is equivalent to the firing of its decomposition PDFD, while Rule 9 states that the same output data are generated from the same input data by both the PO and its decomposition; that is, the decomposition PDFD is a detailed representation of its parent predicate operation PO.
6 The Proof of Related Theory
In order to demonstrate the application of the above rules and to understand the internal mechanism of a PDFD (for example, the relation between the available outputs of one predicate operation and the available inputs of other predicate operations, and the firing order of POs when the PDFD is fired), we present Theorem 1. In addition, for convenience in proving this theorem, we introduce the following lemmas and corollaries.
Lemma 1.
Proof. According to set theory, the conclusion can be derived step by step from the assumptions, the DF definition, and the preceding definitions; hence the lemma holds. Corollary 1 follows directly from Lemma 1.
Corollary 1.
Lemma 2.
Proof. The conclusion is derived in five steps from the assumption, Axiom 1, Rule 1, the definitions, and Property 2.
Lemma 3.
Proof. Step (1) follows from the assumption; step (2) follows from (1) and Property 3; step (3) follows from (2) and Corollary 1.
Theorem 1.
Proof. It can be proven from Lemma 2, Lemma 3, and Rule 5. Corollary 2 can then be obtained easily from Theorem 1.
7 Conclusions
We have presented an axiomatic semantics for predicate data flow diagrams and their components, which lays down fundamental rules for interpreting SOZL specifications. This semantics definition can also contribute to a transformation system supporting automatic or semi-automatic transformation from formal specifications into programs. To facilitate the use of the SOZL language, our future research will focus on building software toolkits for SOZL, based on the syntax and semantics provided in this paper.
References
1. Gao, X., Miao, H., Chen, Y.: SOZL Language: A New Software Development Methodology. 16th IFIP World Computer Congress, Beijing, China, 21-25 August 2000
2. Miao, H., Gao, X., Li, G.: The Comparison and Combination of Structured Methodology, Object-Oriented Methodology and Formal Methods. Computer Engineering & Science, Vol. 21, No. 4 (1999)
3. Spivey, J.M.: Understanding Z: A Specification Language and Its Formal Semantics. Number 3 in Cambridge Tracts in Theoretical Computer Science. Cambridge University Press (1988)
4. Liu, S.: A Formal Definition of FRSM and Applications. International Journal of Software Engineering and Knowledge Engineering, Vol. 8, No. 3 (1998) 253-281
5. Liu, S.: A Formal Requirements Specification Method Based on Data Flow Analysis. The Journal of Systems and Software, 21:141-149 (1993)
Virtual Workflow Management System in Grid Environment
ShuiGuang Deng, ZhaoHui Wu, Qi Gao, and Zhen Yu
College of Computer Science, Zhejiang University, 310027 Hangzhou, China
{dengsg, wzh, hyperion, yuzhen}@zju.edu.cn
Abstract. Building a workflow management system (WFMS) is a large undertaking. With the development of grid technology, however, it becomes much easier: in the Open Grid Services Architecture (OGSA), everything is a service, so a workflow management system can be built upon a set of services. This paper proposes a set of workflow management services and important components, presents the definition of the virtual workflow management system (VWFMS), and discusses related issues such as service registration and discovery in detail.
1 Introduction
Workflow technology has become a major approach to automating business processes that involve the exchange of documents, information, or task execution results in quite diverse domains [1]. A workflow management system (WFMS) is a system that defines, creates and manages the execution of workflows through the use of software, running on one or more workflow engines, which are able to interpret the process definition, interact with workflow participants and, where required, invoke IT tools and applications [2]. Building a WFMS for an enterprise, however, is a demanding task that consumes much time and many resources. Grid computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation [3]. In OGSA, everything is a grid service, so a workflow management system can also be viewed as a set of workflow management services. Using these services, enterprises can easily build their own workflow management systems, which we call Virtual Workflow Management Systems (VWFMS). In this paper, we propose some essential workflow management services and important components in the grid environment. We classify them into three layers, the Resource Layer, the Collective Layer and the Discovery Layer, and we use DAML-S [4] to represent each of them. Based on these workflow management services and components, the definition of the Virtual Workflow Management System is also proposed.
2 Workflow Management Services and Functional Components in the Grid Environment
A workflow management system can be built upon a set of workflow management services, such as a Workflow Engine Service, a Workflow Definition Repository Service and a Workflow Instance Repository Service, together with a set of components such as a Workflow Management Service Repository, a Workflow Meta-data Repository, and several agents. An enterprise can thus choose workflow management services and components provided by other enterprises to build its own WFMS. We assume that the essential workflow management services and functional components include a workflow engine service, a workflow process repository service, a workflow instance repository service and a workflow admin&monitor service; a workflow management service repository and a workflow meta-data repository; and a workflow management service discovery agent and a workflow meta-data discovery agent. When an enterprise wants to build a WFMS, it asks a workflow management service discovery agent to find suitable workflow management services advertised in the WFMSR and composes them into a virtual workflow management system. Because WSDL lacks semantic information, we use DAML-S to represent the workflow management services when advertising and discovering services. According to their functions, the essential workflow management services and functional components are divided into three layers: the Resource Layer, the Collective Layer and the Discovery Layer, as Figure 1 shows.
Fig. 1. Layers of Workflow Management Services & components
2.1 Resource Layer
All workflow management services reside on the Resource Layer, where they are regarded as isolated services. On this layer there are four main workflow management services: the Workflow Engine Service, the Workflow Process Repository Service, the Workflow Instance Repository Service and the Workflow Admin&Monitor Service.
Workflow Engine Service (WFES): This is the core workflow management service, responsible for the execution of process instances. The advertised information about a workflow engine service should include properties such as the provider information (name, address, email, ...), the access URL, the throughput of the engine, and so on. To advertise a workflow engine service, all that is needed is to send a message described in DAML-S to the WFMSR. An example of the advertisement message is shown in Figure 2.
Fig. 2. Example of Workflow Engine Advertise Information
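The published figure is not reproduced here; the sketch below is a hypothetical rendering of such an advertisement. The element names are illustrative only and do not follow the actual DAML-S vocabulary; they merely show which properties the advertisement carries.

<!-- Hypothetical advertisement message for a Workflow Engine Service.
     Element names are illustrative, not actual DAML-S syntax. -->
<serviceAdvertisement serviceType="WorkflowEngineService">
  <provider>
    <name>Example Software Co.</name>
    <address>123 Example Road, Shanghai</address>
    <email>contact@example.com</email>
  </provider>
  <accessURL>http://grid.example.com/services/wfes</accessURL>
  <throughput>1500</throughput>
  <degreeOfQuality>High</degreeOfQuality>
  <price>400</price>
</serviceAdvertisement>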
Workflow Process Repository Service (WFPRS): In the grid environment, workflow process definitions are also a resource that can be reused by others. Different organizations store their own business processes in a WFPRS. After authorization and authentication, other enterprises can consult the existing processes residing in the WFPRS before designing their own processes. A WFPRS provides operations such as adding, deleting, updating and querying a process and querying its current storage capacity, and it also advertises properties such as capacity, price and degree of quality. The essential advertisement information is illustrated in Figure 3.
Workflow Instance Repository Service (WFIRS): After an end user has selected a workflow engine to execute his process, he can designate a workflow instance repository service to store the instances. A WFIRS must provide operations such as adding, deleting and querying an instance, and properties such as capacity, price and degree of quality.
Workflow Admin&Monitor Service (WFAMS): The WFAMS provides administration and monitoring functions for end users to administer and monitor information about
processes, including user management, role management, audit management, resource management, process supervisory functions, etc.
Fig. 3. Example of Workflow Process Repository Service Advertise Information
2.2 Collective Layer
All the workflow management services located on the Resource Layer are aggregated on the Collective Layer. Two important components are involved on this layer: the workflow management service repository and the workflow meta-data repository. The Workflow Management Service Repository (WFMSR) stores the advertisement information about the above four types of workflow management services. It provides not only large storage capacity and high throughput but also convenient querying functions. When a workflow management service changes, it re-advertises or unadvertises itself to the WFMSR. The WFMSR also periodically pings each service advertised in it and removes the advertisement information of services that fail to respond. The interactions between workflow management services and the WFMSR are depicted in Figure 4.
Fig. 4. Interactions between Workflow Management Services and WFMSR
The Workflow Meta-data Repository (WFMR) stores the meta-data of workflows. This meta-data includes which workflow process repository service
each workflow process definition resides in, which workflow engine service each workflow process instance is running on, and so on.
2.3 Discovery Layer
To find the proper services and meta-data for end users while building a Virtual Workflow Management System, some agents must be provided in the Discovery Layer.
Workflow Management Service Discovery Agent (WFMSDA): When a user wants to find a certain workflow management service, he first describes his request in a DAML-S message and then submits the request to the WFMSDA. The WFMSDA queries the WFMSR and matches the requested service against the advertised services; the WFMSR returns the full specification of the matching workflow management service in DAML-S format to the WFMSDA. The discovery of services is shown in Figure 5.
Fig. 5. Workflow Management Services Discovery
For example, suppose a user wants to find a workflow engine with the following constraints: 1) throughput > 1000; 2) degree of quality = High; 3) price < 500. These requirements can be represented by the DAML-S message shown in Figure 6:
Fig. 6. Workflow Management Service Query Message
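As with Figure 2, the original figure is not reproduced; the sketch below is an illustrative rendering of such a query, not actual DAML-S syntax. The tag and attribute names are assumptions made for illustration.

<!-- Hypothetical query message for the WFMSDA; tags are illustrative,
     not actual DAML-S vocabulary. -->
<serviceQuery serviceType="WorkflowEngineService">
  <constraint property="throughput" operator="greaterThan" value="1000"/>
  <constraint property="degreeOfQuality" operator="equals" value="High"/>
  <constraint property="price" operator="lessThan" value="500"/>
</serviceQuery>

The WFMSDA would then match such constraints against the properties advertised by each engine, as in the advertisement sketched after Figure 2.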
The Workflow Meta-data Discovery Agent (WFMDA) is used by end users to obtain meta-data about the location of a process definition or process instance. For example, when a user wants to update his workflow process definition, he must know where the definition is located, so he consults the WFMDA to find which workflow process repository service the definition resides in.
3 Virtual Workflow Management System in Grid Environment
A Virtual Workflow Management System (VWFMS) can be regarded as a temporary set of services composed to cooperate with each other in order to execute a certain process. "Temporary" means that when the process finishes successfully or terminates abnormally, the coalition of services is dissolved. Furthermore, the elements of the service set are not fixed during the execution of the process: a service can break away from the coalition because of excessive workload or because it becomes unavailable.
3.1 Snapshot of Virtual Workflow Management System
Virtual Workflow Management Systems change constantly, but at any given time we can take a snapshot of the virtual workflow management system, as Figure 7 shows.
Fig. 7. Snapshot of Virtual Workflow Management System
The snapshot of a Virtual Workflow Management System clearly shows the relationships between the services in the coalition. The Workflow Engine Service is the core of a VWFMS, just as the workflow engine is the core of a conventional WFMS. It executes processes coming from a Workflow Process Repository Service, generates process instances and activity instances into a Workflow Instance Repository Service, and invokes grid services to execute tasks. Administrators control and monitor process execution through a Workflow Admin&Monitor Service, which interacts with the Workflow Engine Service using commands.
3.2 Build Time
To model a process, designers can either design the process entirely by themselves or consult other process definitions advertised in a workflow process repository. In the former case, they design a process with process definition tools on the client side and then ask the Workflow Management Service Discovery Agent to find one or more suitable Workflow Process Repository Services to store it. In the latter case, designers first ask the Workflow Meta-data Discovery Agent to find processes in the same category as the intended process. After the agent finds such processes in the Workflow Meta-data Repository and returns their location information, designers can access the Workflow Process Repository Services to query the processes, provided they have obtained authorization.
3.3 Run Time
To start and execute a process, users first obtain a Workflow Engine Service through the Workflow Management Service Discovery Agent. After getting a Workflow Engine Service, users designate a Workflow Instance Repository Service through the agent again. The process can then be executed by the engine, and the process instances and activity instances generated by the engine according to the process definition are stored in the selected repository. During the execution of the process, users can administer and monitor the process. To do so, users first find the Workflow Engine Service executing the process through the Workflow Meta-data Discovery Agent, which queries the Workflow Meta-data Repository. At the same time, users can find a Workflow Admin&Monitor Service via the Workflow Management Service Discovery Agent and use it to administer and monitor the processes running on the Workflow Engine Service.
4 Related Work and Conclusions
With its rapid development, grid technology has attracted more and more attention from industry and research communities. A great deal of research has
concentrated on the topic of workflow management in grids and web services, and some schemes and standards have been proposed, such as WSFL [5] and XLANG [6]. However, that research focuses only on the description of processes and pays little attention to the structure of workflow management. Workflow management systems in the grid environment are somewhat like agent-based or component-based workflow management systems [7, 8]. In this paper, we build a workflow management system based on several essential workflow management services and some important components. We believe that, as grid technology becomes more popular and everything becomes a service in the grid environment, it will not take long before virtual workflow management systems become a reality.
Acknowledgement. This work was supported by the National High Technology Development 863 Program of China under Grants No. 2001AA414320 and No. 2001AA113142, and by the Key Research Program of Zhejiang Province under Grant No. 2003C21013.
References
1. W.M.P. van der Aalst, M. Weske: Advanced Topics in Workflow Management: Issues, Requirements, and Solutions. Journal of Integrated Design and Process Science, 2003
2. David Hollingsworth, Workflow Management Coalition: The Workflow Reference Model, TC00-1003 Issue 1.1, 1995
3. Junwei Cao, Stephen A. Jarvis: GridFlow: Workflow Management for Grid Computing. 3rd IEEE International Symposium on Cluster Computing and the Grid, Tokyo, Japan, 2003
4. Anupriya Ankolekar, Mark Burstein: DAML-S: Web Service Description for the Semantic Web. The First International Semantic Web Conference (ISWC), Sardinia, Italy, June 2002
5. Frank Leymann: Web Services Flow Language (WSFL 1.0). IBM Software Group, May 2001
6. Satish Thatte: XLANG: Web Services for Business Process Design. http://www.gotdotnet.com/team/xml wsspecs/xlang-c/default.htm, 2001
7. LiangZhao Zeng, Anne Ngu, Boualem Benatallah: An Agent-Based Approach for Supporting Cross-Enterprise Workflow. Twelfth Australasian Database Conference, Queensland, Australia, 2001, 123-130
8. W.M.P. van der Aalst, K.M. van Hee, R.A. van der Toorn: Component-Based Software Architectures: A Framework Based on Inheritance of Behavior. Science of Computer Programming, 2002
Research of Online Expandability of Service Grid
Yuan Wang, Zhiwei Xu, and Yuzhong Sun
Software Division, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
{wy,zxu,yuzhongsun}@ict.ac.cn
Abstract. Traditional information systems are usually predefined, closed systems with fixed structures and thus do not support on-line expansion; such systems cannot meet rapidly changing requirements. In this paper the essence of on-line expansion is studied. Definitions concerning the service grid and basic properties such as input-output consistency, running stability, on-line expandability, connectivity, knowability, usability and substitutability are given. Finally, a theorem on on-line expansion is proposed and proved.
1 Introduction
Traditional information systems, centralized or decentralized, can hardly meet rapidly changing, complex requirements. Since they are usually predefined, closed systems with fixed structures, they do not support dynamic integration of newly developed systems with legacy systems, and the future requirements of a system are difficult to predict at design time. A system should therefore possess on-line expandability, so that newly developed functions can be added dynamically to the running system to adapt to complex requirement changes. Some concrete on-line expandable systems have been reported in recent years. Professor K. Mori proposed the autonomous decentralized systems (ADS) [1]. ADS uses a broadcast communication mechanism based on content codes: subsystems are loosely coupled, decide their own behaviour based on the broadcast content code, and are each required to have equality and locality and to be self-contained. This method does not fit large-scale grid systems, because of its broadcast communication mechanism. Universal plug and play systems [2] locate devices and services with a broadcast communication mechanism as well; such systems have good on-line expandability, but they are mainly used in small-scale networks such as home and office networks. A grid system is a large-scale system whose services are widely distributed and owned by different organizations or people. It is expected that services can join or leave the service grid freely, according to their owners' will, without stopping the running grid system. A method of dynamic resource discovery and allocation in a grid
environment is given in reference [4], and dynamic service discovery approaches are presented in references [5] and [6]. These works focus on concrete algorithms that solve the on-line expandability problem or a part of it. Here we study the essence of the on-line expandability problem: what conditions should be met by a service and by the service grid so that the service grid is on-line expandable for that service. Definitions concerning the service grid and basic properties such as input-output consistency, running stability, on-line expandability, connectivity, knowability, usability and substitutability are given in the paper, and finally a theorem on on-line expansion is proposed and proved. The rest of this paper is organized as follows. Section 2 presents the service grid and its mathematical application model; based on the model, input-output consistency and running stability are defined, and the problem we study is put forward. Section 3 defines the basic properties of connectivity, knowability, usability and substitutability. Section 4 proposes and proves a theorem on on-line expandability.
2 Service Grid and the Online Expandability Problem
Definition 1 (Service grid). A service grid is a system that provides function services and resource services and supports the dynamic creation, running, maintenance and cancellation of applications. The applications running in the service grid are considered logically independent, but physical (resource) relations may exist among applications. There are three kinds of dynamic universes in the service grid: universe FUNSERVICE is the dynamic set of all function services in the service grid; universe SERVICE0 is the dynamic set of all resource services in the service grid; universe APPLICATION is the dynamic set of all running applications in the service grid.
Definition 2 (Service grid application). A service grid application is composed of m function services,
app = {fs_1, fs_2, ..., fs_m}, where each fs_i ∈ FUNSERVICE. Applications and function services run with the support of resource services. The mathematical model of application app is

output = app(input)

where input is the input of application app and output is the output of application app.
Online expandability of the service grid is actually a requirement of service grid applications. Considering that function services and resource services can dynamically join or leave the service grid, the application model should be modified to

output = app(input) + d (4b)

where d is the disturbance to application app caused by dynamically joining or leaving services or by dynamically created or canceled applications.
Definition 3 (Input-output consistency of grid application). If reqrespmatch(input, output) = true, then inputoutputconsistent(app) := true, (5) where reqrespmatch yields true if the response (output) of the application matches its request (input), and inputoutputconsistent marks the application as input-output consistent. Equation (5) states that application app possesses input-output consistency.
Definition 4 (Running stability of service grid). If inputoutputconsistent(app) = true for every app ∈ APPLICATION, then runningstable(servicegrid) := true. That is, if all applications running in the service grid are input-output consistent, the service grid is said to have running stability.
Definition 5 (On-line expandability of service grid). If function services, resource services and/or applications can dynamically join or leave the service grid without destroying the running stability of the service grid, the service grid is said to have on-line expandability.
Definition 6 (On-line expandable state of service grid). The on-line expandable state of the service grid means that the services affected by join, leave and substitution actions are in a passive state: ongoing invocations to and from these services have finished, new invocations to and from these services are temporarily suspended, and the running contexts of these services are properly saved.
Definition 7 (On-line expandable state reachability of service grid). If the on-line expandable state of the service grid is reachable, the service grid is said to have on-line expandable state reachability.
The on-line expandability problem is: what conditions should be satisfied by a service and by the service grid so that the service grid is on-line expandable for that service?
3 Properties of Services
We define further universes and signatures that will be used in the following service properties.
Universe SERVICE is the set of services in the service grid.
Universe SERVICE’ is the set of services currently not in the service grid.
Universe PROTOCOL is a set of protocols.
Universe BOOL = {true, false}.
Universe INFOSERVICE is the set of information services in the service grid.
Universe SERVICETABLE is the set of service tables kept by information services.
Universe DESCRIPTION is a set of service descriptions; the description of a service includes the features of the service and its behaviour constraints.
Universe INPUT is a set of inputs of services.
Universe OUTPUT is a set of outputs of services.
input: gives the input of a service or the input description of a service.
output: gives the output of a service or the output description of a service.
protocol: gives the protocol that a service uses.
iomatch: yields true if one service's input matches another one's output.
servicetable: gives the service table belonging to an information service.
description: gives the description of a service.
ssmatch: yields true if the description of a service matches that of another one.
compatible: yields true if two services use compatible protocols.
connective: yields true if the service is connective.
knowable: yields true if the service is knowable.
usable: yields true if the service is usable.
substitutable: yields true if the service is substitutable.
Definition 8 (Connectivity). If compatible(protocol(service’), protocol(service)) = true for some service ∈ SERVICE, then connective(service’) := true. That is, if the protocol that service’ uses is compatible with that of a service in the service grid, then service’ is said to be connective to the service grid. Naturally, all services in the service grid possess connectivity.
Definition 9 (Knowability). If there exists infoservice ∈ INFOSERVICE such that description(service) ∈ servicetable(infoservice), then knowable(service) := true. That is, if the description of a service is saved in the service table of an information service of the service grid, the service is said to be knowable to the service grid.
Definition 10 (Join and leave). If a service becomes known to the service grid from the unknown state, the service is said to have joined the service grid; in this situation SERVICE’ becomes SERVICE’ - {service’} and SERVICE becomes SERVICE + {service’}. If a service becomes unknown to the service grid from the known state, the service is said to have left the service grid; in this situation SERVICE’ becomes SERVICE’ + {service} and SERVICE becomes SERVICE - {service}.
Definition 11 (Usability). For service1, if there exists a service2 such that iomatch(input(description(service2)), output(description(service1))) = true, then usable(service1) := true. That is, if the input description of service2 matches the output description of service1, then service1 is said to possess usability: service1 can be used by service2. Note that services are assumed to function as specified by their descriptions.
Definition 12 (Substitutability). For service1, if there exists a service2 such that ssmatch(description(service1), description(service2)) = true, then substitutable(service1, service2) := true. That is, if there exists a service2 whose description matches the description of service1, then service1 is said to possess substitutability.
4 A Theorem on Online Expandability of Service Grid System
Assumption 1. Assume that the service joining the service grid possesses connectivity.
Let us justify this assumption. If the service service’ ∈ SERVICE’ to be added does not possess connectivity, that is, compatible(protocol(service’), protocol(service)) = false, two situations may exist. (1) service’ cannot be connected to the service grid system; it is then a physically isolated island, will not have knowability according to the definition of knowability, and will not be used by the service grid. (2) service’ is connected to the service grid system, but the differing protocols make the services in the service grid and the newly joined service misunderstand each other, act wrongly, and cause much disturbance d to the service grid; such a disturbance is
not controllable. In this situation, by equation (4b), the running stability will be destroyed. Therefore, if service’ does not possess connectivity, the service grid is not on-line expandable for that unconnective service, so the assumption is reasonable: the service to be added should have a protocol compatible with that of a service in the service grid, so that it can be physically connected to the service grid.
Theorem 1. Under Assumption 1, the service grid is on-line expandable for a service if and only if the service grid possesses on-line expandable state reachability and the service possesses knowability, usability and substitutability.
Proof. First we prove the sufficient condition, by contraposition: we prove that if the service grid does not possess on-line expandable state reachability, or the service does not possess knowability, usability or substitutability, then the service grid is not on-line expandable for that service.
If the service grid does not possess on-line expandable state reachability, then a safe state for a service to join, leave or be substituted cannot be reached, and actions of service join, leave or substitution will destroy the input-output consistency of the service grid application. So the service grid is not on-line expandable.
If the service’ to be joined does not possess knowability, that is, there is no infoservice ∈ INFOSERVICE with description(service’) ∈ servicetable(infoservice), then by the definition of join, service’ has not joined the service grid. It is a logically isolated island: it will not receive any service request and will not be used in dynamic application creation. So the service grid is not on-line expandable for that unknown service.
If the service’ to be joined does not possess usability, that is, iomatch(input(description(service)), output(description(service’))) = false, then the response of service’ to a request will not be the desired one; the disturbance d caused by service’ will occur when service’ is called, input-output consistency will not be satisfied, and the running stability of the application including that service will be destroyed. So the service grid is not on-line expandable for that unusable service.
If the service1 to leave does not possess substitutability, that is, ssmatch(description(service1), description(service)) = false for every other service, then when service1 leaves the service grid, no service (including composite services) can replace it; the disturbance d will occur and cannot be removed, and the running stability of the application using the service will be destroyed. So the service grid is not on-line expandable for that unsubstitutable service.
Therefore the service grid is on-line expandable for a service if the service grid possesses on-line expandable state reachability and the service possesses knowability, usability and substitutability.
The necessary condition is proved directly as follows. We must prove that if the service grid is on-line expandable for a service, then the service grid possesses on-line expandable state reachability and the service possesses knowability, usability and substitutability.
If the service grid is on-line expandable for a service, the service can join or leave the service grid, or be substituted by another service, without destroying the running stability of the service grid application; these reconfiguration actions must be performed in an on-line expandable state in order to maintain that running stability. By the definition of on-line expandable state reachability, the service grid possesses on-line expandable state reachability.
If the service grid is on-line expandable for a service, the service is able to join the service grid, so by the definition of join the service possesses knowability. By the definition of on-line expandability, a service that has joined the application system does not destroy the input-output consistency of the application system; that is, the output of the service matches another service's input, so the service possesses usability. By the definition of on-line expandability, a service that has left the application system must not destroy the running stability of the service grid; that is, the function of that service can be replaced by another service or a composition of other services, so the service possesses substitutability.
In summary, the theorem is proved.
5 Conclusion
In this paper the essence of the on-line expandability problem has been studied. We have proposed and proved the theorem that, under Assumption 1, the service grid is on-line expandable for a service if and only if the service grid possesses on-line expandable state reachability and the service possesses knowability, usability and substitutability.
Acknowledgments. This work is supported in part by the National Natural Science Foundation of China (Grant No. 69925205), the China Ministry of Science and Technology 863 Program (Grant No. 2002AA104310), and the Chinese Academy of Sciences Overseas Distinguished Scholars Fund (Grant No. 20014010). We are grateful to many colleagues for numerous discussions on the topics discussed here, in particular Feng Baiming, Liu Xingwu, You Ganmei, and Lu Yi.
References
1. Mori, K.: Autonomous Decentralized Systems: Concept, Data Field Architecture and Future Trends. Proc. IEEE Int. Symp. on Autonomous Decentralized Systems (1993) 28-34
2. Brent A. Miller, Toby Nixon, Charlie Tai, Mark D. Wood: Home Networking with Universal Plug and Play. IEEE Communications Magazine (2001), Vol. 39, No. 12
3. Foster, I., Kesselman, C., Nick, J., Tuecke, S.: Grid Services for Distributed Systems Integration. IEEE Computer (2002), Vol. 35, No. 6, 37-46
4. Gabrielle Allen, Dave Angulo, Ian Foster, Gerd Lanfermann, Chuang Liu, Thomas Radke, Ed Seidel, John Shalf: The Cactus Worm: Experiments with Dynamic Resource Discovery and Allocation in a Grid Environment. Int. Journal of High-Performance Computing Applications (2001) Vol. 15, No. 4
5. Wolfgang Hoschek: A Unified Peer-to-Peer Database Framework and its Application for Scalable Service Discovery. Proc. of the Int'l. IEEE/ACM Workshop on Grid Computing (Grid'2002), Baltimore, USA, November (2002). Springer Verlag
6. Ratnasamy, S., Francis, P., Handley, M., Karp, R., Shenker, S.: A scalable content-addressable network. In Proc. ACM SIGCOMM, San Diego, CA, Aug. (2001), 161-172
7. Börger, E.: High Level System Design and Analysis using Abstract State Machines. In D. Hutter et al. (eds.), Current Trends in Applied Formal Methods (FM-Trends 98), LNCS 1641, Springer (1999) 1-43
8. Kramer, J., Magee, J.: The Evolving Philosophers Problem: Dynamic Change Management. IEEE Transactions on Software Engineering, SE-16, 11 (1990) 1293-1306
Modelling Cooperative Multi-agent Systems
Lijun Shan¹ and Hong Zhu²
¹ Department of Computer Science, National Univ. of Defence Technology, Changsha 410073, China
[email protected]
² Department of Computing, Oxford Brookes University, Oxford OX33 1HX, UK
[email protected]
Abstract. Cooperative computing is becoming inevitable with the emergence of service-oriented computing and with the GRID becoming a ubiquitous computing resource. It is widely recognized that agent technology can be employed to construct cooperative systems, owing to agents' autonomous and collaborative characteristics. We have devised an agent-oriented modelling language called CAMLE for the analysis and design of multi-agent systems (MAS). This paper presents the collaboration model, which captures communication between agents. The structure of the collaboration model and the notation of collaboration diagrams are presented, and uses of the modelling language are illustrated by examples.
1 Introduction
Cooperation between software systems is growing in importance as the GRID becomes a ubiquitous computing resource. Recent years have also witnessed the emergence of service-oriented computing, such as web services, where services can be dynamically discovered, negotiated, requested and provided. Agent technology has been widely recognized as a viable approach, owing to agents' autonomous and collaborative characteristics. Although cooperation is one of the key concepts in MAS, researchers have offered various definitions and typologies [1]. We consider cooperation the embodiment of agents' social ability: agents can determine, to a certain extent, when, how and with whom to interact at run-time, but they must obey certain cooperation protocols to achieve their designed objectives. The design and analysis of such protocols is one of the central problems in research on cooperative computing. This paper addresses the problem through an agent-oriented modelling approach. Researchers have investigated general problems associated with cooperation. Based on speech act theory, a number of agent communication languages (ACLs) have been proposed, including KQML [2] and FIPA ACL [3]. Recently, graphic notations have been employed to model communication in MAS; for example, AUML
describes agent communication protocols in a graphic notation that extends UML sequence diagrams [4]. However, few modelling languages have been formally defined and reported in the literature. In [5, 6, 7, 8], we developed SLABS (Specification Language for Agent-Based Systems) and CAMLE (Caste-centric Agent-oriented Modelling Language and Environment) for engineering MAS. One of the central issues in MAS development is the modelling of agents' cooperative behaviour. We address the problem at three levels. At the top level, a caste model defines the architecture of the system by grouping agents into castes, which can roughly be considered agent classes; see [6] for more details and a formal definition of the concept. At the middle level, communications between agents are specified in a collaboration model. At the lower level, a behaviour model defines the internal behaviour of the agents, so that their cooperation is realized by taking certain actions in certain scenarios. This paper focuses on the collaboration model. A collaboration model consists of a number of collaboration diagrams. Horizontally, the diagrams are organized into one general and several scenario-specific collaboration diagrams; vertically, a hierarchy of collaboration models supports collaboration modelling at different granularities. The remainder of the paper is organized as follows. Section 2 gives the background by briefly reviewing the conceptual model underlying our agent-oriented methodology. Section 3 presents the structure, notation and uses of the collaboration model. Section 4 concludes the paper with a brief summary and an outline of our related work.
2 Overview of the Conceptual Model
This section briefly reviews the conceptual model for MAS defined in SLABS and used in CAMLE. The conceptual model takes a software engineering perspective, and its basic concepts can be characterized by a set of pseudo-equations. In particular, equation (1) states that agents are real-time active computational entities that encapsulate data, operations and behaviour and are situated in their designated environments. Here, data represent an agent's state; operations are the actions that an agent can take; behaviour is a collection of sequences of state changes and operations performed by the agent in the context of its environment. By encapsulation, we mean that an agent's state can only be changed by the agent itself and that the agent has its own rules governing its behaviour in the designated environment, deciding 'when to go' and 'whether to say no'. As an extension of the notion of class in object-orientation, a caste has a set of agents as its members. As stated in equation (2), these members share a set of structural and behavioural characteristics defined by the caste. An agent can dynamically change its caste memberships during its existence by joining a caste or retreating from a current caste at run-time. A caste may inherit from a number of other castes. Fig. 1 shows the structure of the description of a caste.
Fig. 1. Structure of Caste Description in SLABS
Equation (3) states that a MAS consists of a set of agents. The environment of an agent is a subset of all agents in the system, as stated in equation (4). The environment description of an agent defines which agents are visible.
The mechanism of communication is that an agent's actions and states are divided into two parts, the visible and the invisible ones. Agents communicate with each other by taking visible actions and changing visible state variables, and by observing other agents' visible actions and state variables, as expressed in equation (5).
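The pseudo-equations themselves are not reproduced above; the following LaTeX sketch reconstructs equations (1)-(5) from the surrounding prose, with notation that is illustrative rather than the authors' exact formulation:

% Schematic reconstruction of pseudo-equations (1)-(5); notation is illustrative only.
\begin{align*}
&\textit{Agent} = \langle \textit{State},\ \textit{Operations},\ \textit{Behaviour} \rangle_{\textit{Environment}} \tag{1}\\
&\textit{Caste} = \{\textit{Agent}_i \mid \text{shared structural and behavioural characteristics}\} \tag{2}\\
&\textit{MAS} = \{\textit{Agent}_1, \ldots, \textit{Agent}_n\} \tag{3}\\
&\textit{Environment}(\textit{Agent}) \subseteq \textit{MAS} \tag{4}\\
&\textit{Communication} = \text{taking/observing visible actions and visible state changes} \tag{5}
\end{align*}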
3 The Collaboration Model
A collaboration model captures cooperation in a MAS through a collection of diagrams. Communication as defined in Section 2 is represented in collaboration diagrams with the notation shown in Fig. 2. An agent node denotes a specific agent; agents are the basic components of a system and are considered black boxes, with only their names inscribed in the nodes. A caste node denotes any agent in the caste. Interaction between agents is modelled by communication links that connect agent/caste nodes. A communication link from one node to another, labelled with a list of actions, represents that the first agent influences the second by taking those actions and the second by observing them. Actions can be numbered to denote the temporal order of their occurrence.
Fig. 2. Notation of Collaboration Diagrams
Fig. 3 shows an example of a collaboration diagram that represents the interactions between the members of a university. For instance, an undergraduate student listens to his personal tutor for academic advice on the selection of modules, attends lectures given
Fig. 3. Example of Collaboration diagram
by faculty members and practical classes given by PhD students. When he graduates, he may want to apply for a graduate course. Although the notation of our collaboration diagrams looks similar to that of collaboration diagrams in object-oriented methodologies such as UML [9], the semantics differ significantly. In the OO paradigm, when a message is passed from object A to object B, object B must execute the corresponding method; the actions annotated on the link from A to B in UML diagrams are therefore actually methods of B. In our model, however, the actions annotated on a link from A to B are visible actions of A, and agent B does not necessarily respond to agent A's action. This fits well with the autonomous nature of agents. A flat diagram representation does not scale well for complex systems, so we extend the basic collaboration diagram to a collaboration model comprising a set of diagrams, to help handle system complexity. We consider collaboration modelling from two perspectives: the agent perspective, viz. which agents are involved in each scenario of system behaviour, and the communication perspective, viz. what communication the agents undertake to meet a specific global requirement. The collaboration model is therefore organized in two dimensions: the hierarchical organization of super- and sub-diagrams makes the modelling domain explicit, and the horizontal organization of general and specific diagrams characterizes the various scenarios the agents participate in. Fig. 4 shows an example of a collaboration model's structure. The system is directly composed of agents of three castes: A, B and C. Each of them can be decomposed into components, called component agents. The process of decomposition terminates when agents are identified as atomic components. An agent that consists of a number of agents as components is called a compound agent. For each compound agent, such as the System, A, B and C, a collaboration model including one general and a number of specific diagrams is constructed to describe the collaboration between its components.
Fig. 4. Example: A Collaboration Model’s Structure
3.1 Horizontal Structure of the Collaboration Model
One complication in collaboration modelling arises from agents' varying behaviour in different scenarios during the system's execution. By scenario, we mean a typical situation in the operation of a system. Various scenarios involving various sets of communications occur in their respective temporal sequences, so it is better to describe them separately. The collaboration model supports this separation of scenarios through its general-specific diagram organization. A general collaboration diagram gives an overall picture of the communication between all the agents in a system by describing all the visible actions an agent may take and all observers of those actions. Specific collaboration diagrams group communications into separate diagrams in terms of scenarios: each specific diagram describes a specific scenario by capturing a collection of related communications between some agents. For example, Fig. 5 shows two specific collaboration diagrams for the university example. The diagrams in (a) and (b) respectively depict the scenarios of an undergraduate's study and of applying for a graduate course; they can be regarded as presentations of specific parts of Fig. 3. In each diagram, the actions are numbered to indicate their temporal order in the specific scenario. Similarly, other scenarios in the university, such as a graduate's study and faculty work, can also be described separately in specific collaboration diagrams. With the general and specific diagrams as complementary facilities for collaboration modelling, our language supports both decomposition and scenario-driven analysis approaches. The decomposition approach is a whole-dividing process that begins by identifying all the agents' actions and communications in a general diagram according to the global system requirements; then the various scenarios that may occur during the system's execution are mapped out, and the communications involved in each specific scenario are elaborated into specific diagrams. This approach suits applications with a global requirement. In contrast, the scenario-driven approach is a part-integrating process that starts with modelling specific situations and finishes with a general description.
Fig. 5. Examples of Specific Collaboration Diagram
This approach is suitable when a scenario-based representation of the application requirements has been given. It is up to the users to apply either approach, or a hybrid of the two, in a given application.
3.2 Vertical Structure of the Collaboration Model
The modelling language allows systems to be described at a coarse granularity: a system can be viewed as an agent that interacts with users and/or other systems in its external environment, and a sub-system can likewise be viewed as an agent that interacts with other sub-systems. As the analysis deepens, agents are decomposed into components. The analysis of interaction among such component agents proceeds in the same way as the analysis of the whole system; the only difference is that the environment of the components is clearer than that of the whole system, and this information can be carried over to the analysis of the components. A lower-level collaboration diagram may therefore have environment nodes, denoting the agents in the compound agent's environment, drawn on its boundary. The lower-level diagram describing communication among component agents is called a sub-diagram, and the higher-level diagram is called the sub-diagram's super-diagram. Component agents can communicate with peer component agents as well as with external agents. A communication link from a component to an environment node indicates that the component agent takes on particular tasks of its compound agent. In this way, the compound agent has its functionality decomposed through the decomposition of its structure. Fig. 6 shows an example of the decomposition of the caste DeptOffice in a lower-level collaboration diagram. The caste DeptOffice in Fig. 3 means a department office
Fig. 6. Collaboration Diagram for Decomposition of DeptOffice
in the university. The castes Undergraduate and Faculty and the agent DeptHead that interact with the caste DeptOffice in Fig. 3 are carried over to Fig. 6 as environment nodes. The DeptOffice consists of three castes: StudentManager, ModuleManagers and StaffManagers. This lower-level diagram describes the internal structure of the DeptOffice and the interactions between its component agents. Component agents can be further decomposed into sets of components if necessary, followed by analysis of their communications in lower-level diagrams. Such refinement can be carried on until the problem is specified in adequate detail. Thus a system-level collaboration diagram that specifies the boundary of the application can eventually be refined into a collaboration model comprising a hierarchy of collaboration diagrams at various levels of abstraction. Of course, the hierarchical structure of collaboration diagrams can also be used for bottom-up design and for composing existing components into a system. To obtain a meaningful collaboration model, consistency between general and specific diagrams and between models at different levels must be assured. Consistency constraints on the collaboration model, as well as other constraints for CAMLE models, are defined in [8].
4 Conclusion
This paper has presented a collaboration model that captures communications in MAS by describing agents' interconnections through the taking and observing of actions. Actions, as part of an agent's internal capability, are thus related to its external behaviour in terms of its cooperation with others. This view of communication makes the collaboration model independent of ad hoc communication languages or protocols; it therefore makes it easy to model cooperation at a rather early stage of system analysis and enables engineers to focus on the conceptual analysis and design of agent communication. Diagrams in a collaboration model are organized into a
hierarchy to represent agents at different levels. Separation of concerns in terms of the various scenarios of system behaviour helps engineers to manage complexity and to employ a decomposition or scenario-driven analysis approach in specific applications. The work reported in this paper is part of our research on modelling, formally specifying and developing MAS. An environment supporting multi-view modelling of MAS in the CAMLE language has been designed and implemented. Besides supporting model construction, the environment can check the models of an application against the consistency constraints and can transform diagrammatic models in CAMLE into formal specifications in SLABS. Work in this direction will be reported separately.
Acknowledgement. The work reported in this paper is partly supported by China High-Technology Programme (863) under the Grant 2002AA116070.
References
[1] J. E. Doran, S. Franklin, N. R. Jennings, T. J. Norman: On Cooperation in Multi-Agent Systems. Panel discussion at the First UK Workshop on Foundations of Multi-Agent Systems, University of Warwick, Oct. 23rd, 1996
[2] Y. Labrou, T. Finin: A Proposal for a New KQML Specification. Tech. Report TRCS-97-03, Computer Science and Electrical Engineering Dept., Univ. of Maryland, Baltimore County, Baltimore, Md., 1997
[3] FIPA: FIPA'99 Specification Part 2: Agent Communication Language. Available at http://www.fipa.org
[4] B. Bauer, J. P. Muller, J. Odell: Agent UML: A Formalism for Specifying Multiagent Software Systems. International Journal of Software Engineering and Knowledge Engineering, Vol. 11, No. 3, pp. 1-24, 2001
[5] H. Zhu: SLABS: A Formal Specification Language for Agent-Based Systems. International Journal of Software Engineering and Knowledge Engineering, Vol. 11, No. 5, pp. 529-558, 2001
[6] H. Zhu: Representation of Roles in Caste. Technical Report TR-DoC-03-01, Department of Computing, Oxford Brookes University, 2003
[7] L. Shan, H. Zhu: Modelling and Specification of Scenarios and Agent Behaviour. To appear in IEEE/WIC Conference on Intelligent Agent Technology (IAT'03), Halifax, Canada, Oct. 2003
[8] L. Shan, H. Zhu: Consistency Constraints on Agent-Oriented Modeling of Multi-Agent Systems. Technical Report TR-DOC-03-03, Department of Computing, Oxford Brookes University, Oxford, UK, Nov. 2003
[9] G. Booch, J. Rumbaugh, I. Jacobson: The Unified Modeling Language User Guide. Addison Wesley, 1999
GHIRS: Integration of Hotel Management Systems by Web Services
Yang Xiang, Wanlei Zhou, and Morshed Chowdhury
School of Information Technology, Deakin University, Melbourne Campus, Burwood 3125, Australia
{yxi, wanlei, muc}@deakin.edu.au
Abstract. Nowadays, web services technology is widely used to integrate heterogeneous systems and develop new applications. In this paper, an application of the integration of hotel management systems by web services technology is presented. The Group Hotel Integration Reservation System (GHIRS) integrates many hotel-industry systems, such as the Front Office system, Property Management system, Enterprise Information System (EIS), Enterprise Information Portal system (EIP), Customer Relationship Management system (CRM) and Supply Chain Management system (SCM). This integration solution can add or expand hotel software systems in hotel chains of any size.
1 Introduction
It is generally accepted that the role of web services in business is undoubtedly important, and more and more commercial software systems extend their capability and power by using web services technology. Today, e-commerce is not merely about using the internet to transfer business data or supporting people interacting with dynamic web pages; it is being fundamentally changed by web services. The World Wide Web Consortium's eXtensible Markup Language (XML) [10] and eXtensible Stylesheet Language (XSL) [11] are standards defined in the interest of multi-purpose publishing and content reuse and are increasingly being deployed in the construction of web services. Since XML serves as the canonical message format, it can tie together thousands of systems programmed in hundreds of programming languages: any program can be mapped into a web service, and any web service can be mapped into a program [8]. In this paper, we present a next-generation commercial system for the hotel industry that fully integrates the hotel Front Office system, Property Management System, Customer Relationship Management System, Quality Management system, Back Office system and Central Reservations System distributed in different locations, and we have found that this system greatly improves both the hotel customer's and the hotel officer's experiences of the hotel business workflow. Because current technologies are quite mature, it may seem easy to integrate existing systems with newly arriving ones (for example, web-based or mobile applications). However, few truly integrated systems are currently used in the hotel industry, because many heterogeneous systems already exist, and scalability, maintenance, price and security issues become huge obstacles to overcome. From our study of the Group Hotel Integration Reservation System (GHIRS),
there are still challenges in integrating the Enterprise Information System (EIS), Enterprise Information Portal system (EIP), Customer Relationship Management system (CRM) and Supply Chain Management system (SCM), because of standardization, security and scalability problems, although GHIRS is one of the few integration solutions that can add or expand hotel software systems in hotel chains of any size. We developed this system to integrate the business flow of hotel management by using web services and software integration technologies. In this paper, we first describe a hotel reservation scenario and discuss the interaction between GHIRS and people. We then analyse the design and implementation of the system in detail. The results and implications of the studies on the development of GHIRS are shown in the later part of the paper. Finally, we discuss some problems that still need to be improved and possible future directions of development.
2 Hotel Reservation: A Business Case Study
Our initial motivation in developing GHIRS was to minimize human interaction with the system. Since GHIRS is flexible and automated, it offers clear benefits for both hotel customers and hotel staff, especially for group hotel customers and group hotel companies. Group hotel companies usually have many hotels, restaurants, resorts, theme parks or casinos in different locations; for example, the Shangri-La group has hundreds of hotels in different countries all over the world. These groups have loyal customers who prefer to stay in hotels belonging to the same group because, as members of the group, they receive individual services. The first step of a hotel reservation scenario is that the consumer plans a trip, looks for a hotel according to location, price or other criteria, and decides on a hotel. He then makes a reservation by telephone, fax, internet or mail, or simply through his travel agent. When the hotel staff receive the request, they first check whether they can provide the requested services. If there are enough resources in the hotel, they prepare the room, catering and transportation for the request and send back an acknowledgement. Finally the guest arrives and checks in. The business flow is quite simple; however, accomplishing all these tasks is burdensome for both the consumer and the hotel without an efficient, integrated hotel management system. The telephone may seem a good way to make a reservation because it is not limited by time or place: guests can call hotels at any time from anywhere. However, it costs much when the hotel is far from the city where the guest lives, especially when the hotel is in a different country; moreover, for a group of four or five people making a reservation together, it takes the hotel staff a long time to record all the information they need. Making a reservation through a travel agent saves the consumer's time and cost, but leaves a great deal of work for the agent, who gathers requirements from consumers and distributes them to the destination hotels. Because these hotels do not use the same system (thousands of hotels may use hundreds of different management systems), someone, either the agent or the hotel staff, must handle information from different sources, in different hotel management systems, for different destinations. Web services become the tool that solves these problems: our web services integrate the web server and the hotel management system, and everyone benefits. Booking a room easily, anywhere and at any time, becomes possible with GHIRS.
A consumer browses websites and finds a hotel using his PC, PDA or mobile phone (WAP supported); once his identity is accepted, he can make a reservation. Two minutes later he receives the acknowledgement from the hotel by mobile phone text or multimedia message, by email, or simply on the dynamic web page if he has not yet left the website. The response may take a little longer because, in some circumstances, when the hotel receives the request the staff must first check whether a clean, vacant room is left. The web service is a standard interface through which all travel agents can handle, gather and distribute reservation information easily over the internet. When the reservation request is acknowledged, hotel staff prepare the room, catering and transportation for the guests. Since the information is already stored in the database, every part of the hotel chain can share it and work together properly. For example, staff in the front office and housekeeping department can prepare rooms for guests according to the data, staff in the back office can stock materials for catering, and the hotel manager can check business reports in the Enterprise Information Portal integrated with GHIRS from his browser. Room rent-ratio reports, room status reports, daily income reports and other real-time business reports are then generated, and managers of the group can access any report of any hotel through the system. Later in this paper we show how consumers, agents and hotel staff can work together efficiently through GHIRS. GHIRS is scalable from small to large hotel chains and management companies, and is especially well suited to hotel groups. It offers worldwide reservation access through seamless connectivity to global distribution systems, and delivers real-time, online reservations via the Internet.
3 Integration of Hotel Management Systems 3.1 Existing System GHIRS is developed on the basis of an existing hotel management system called FoxhisTM, which holds the largest share of the hotel-industry software market in China. FoxhisTM version 5 has a distributed client/server architecture in which the server runs SCO UNIX with a Sybase database and the clients run Microsoft Windows. The system includes a Front Office system, Property Management system, Quality Management system, Human Resource Management system, Enterprise Information Portal system (EIP), Customer Relationship Management system (CRM) and Supply Chain Management system (SCM). The system is largely intranet-based: most of the work is done within a single hotel by its staff, and there is no customer self-service. If a consumer wants to book a room, staff in the local hotel must record his request for him, even though the FoxhisTM system already automates much of the job. When such systems are deployed in the different hotels of a group, sharing data becomes a problem. If a group has ten hotels, there are at least ten local databases storing consumer data. Because the hotels need real-time responses from the system, they cannot all rely on a central database located outside their local networks. Thus one guest may have different records in different hotels, and the information cannot be shared. With web services as an interface, these data can be exchanged easily.
3.2 Design Recall that our motivation in deploying GHIRS is to save labour for hotel staff, travel agents and consumers; the system links all the stages of the hotel business chain. Figure 1 shows how consumers, agents and hotel staff cooperate efficiently through the system.
Fig. 1. How consumers, agents and hotel staff work together
Consumers fall into two categories. The first is members of the hotel group, who hold different classes of membership and enjoy benefits such as discounts or special offers. These consumers usually contribute a large part of the hotel's profit and are therefore treated as VIPs; the hotel keeps their profiles, preferences and membership account status. The other category is the common guest. Both kinds of guests, as well as travel agents who may trade with many other hotels, use the web-based interface to make reservations. For a common guest, the system simply requires reservation information such as guest name, contact information, arrival and departure dates, room type, number of rooms and preferences; his request is then submitted to the system, and the central processing server distributes the information to the appropriate hotel. Since web services technology is well suited to submitting documents to long-running business process flows, hotel staff can easily move this data in and out of the database management system and application server. A member of the hotel group simply enters his member ID and password, room information, and arrival and departure dates to complete the request. Because the hotels keep members' profiles, and the systems exchange profiles across all hotels of the group via web services, staff in different hotels know the guest's individual requirements and can provide better service.
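As an illustration of the two request shapes just described, here is a minimal sketch, our own and not the authors' code, with all field names hypothetical:

```python
from dataclasses import dataclass, field

# Hypothetical request records mirroring the fields listed above;
# GHIRS's actual schema is not published in the paper.
@dataclass
class CommonGuestRequest:
    guest_name: str
    contact: str
    arrival: str            # e.g. "2003-12-07"
    departure: str
    room_type: str
    num_rooms: int
    preferences: list = field(default_factory=list)

@dataclass
class MemberRequest:
    member_id: str          # the member's profile is fetched from the
    password: str           # shared group-wide store, so far fewer
    arrival: str            # fields need to be entered by the guest
    departure: str
    room_type: str
    num_rooms: int
```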
Travel agents working for consumers benefit from GHIRS as well. They too may keep consumer profiles, and since the web services interface is open to them, it is easy to bridge their systems to the hotel management system. Before GHIRS was deployed, agents had to separate, process and distribute reservation data to different hotels, an onerous job; now an agent can press one button and all the hotel reservations are sent to their destinations. Hotel staff receive requests from different sources, and policies are applied to respond to them. For example, the requests of certain very important guests are passed automatically without confirmation, so these guests receive acknowledgement very quickly; such a request triggers the whole chain of hotel business flow, and all preparation work is finished before arrival. For common customers, hotel staff check whether vacant, clean rooms are available on the anticipated dates. Because all the FoxhisTM components are integrated, staff need not switch interfaces to check room status. If the request is valid, with enough guest information, and enough rooms are left, a confirmation is sent back. If there are not enough vacant rooms, hotel staff ask whether the guest would like to wait or transfer to another hotel in the group or an allied hotel. To transfer a guest's request, data flows from the local database to the central server through the local web server, and is then passed to another hotel's database through the web services interface.
3.3 Implementation Today many platforms provide the capability to integrate different systems and offer features such as security and workload balancing. The two main commercial products are Java 2 Enterprise Edition (J2EE) and Microsoft .NET; they offer much the same list of features, albeit in different ways. We chose the .NET platform as our programming environment, though we do not advocate either platform over the other here: our target is to integrate these decentralized, distributed systems, and both platforms support the XML and SOAP standards our task requires. We use Microsoft Internet Information Services (IIS) as the web server and Sybase as the database server. Firewalls separate the local networks from the public networks, which is important from the security point of view. Each hotel of the group has a database server, an application server and a web server to deploy this multi-tier system, which comprises a user interface presentation tier, a business presentation tier, a business logic tier and a data access tier. C# is the programming language for the core executable part, and XML is the standard data exchange format.
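The paper names XML and SOAP as the exchange standards but does not publish a message schema or endpoint; the following sketch (ours, with an invented document layout and URL) illustrates how a client might submit a reservation document over HTTP:

```python
import urllib.request
import xml.etree.ElementTree as ET

def build_reservation_xml(name, arrival, departure, room_type, rooms):
    """Assemble a reservation document; the element names are invented,
    since the paper fixes only XML as the exchange format."""
    req = ET.Element("ReservationRequest")
    for tag, value in [("GuestName", name), ("Arrival", arrival),
                       ("Departure", departure), ("RoomType", room_type),
                       ("Rooms", str(rooms))]:
        ET.SubElement(req, tag).text = value
    return ET.tostring(req, encoding="utf-8")

def submit(endpoint, payload):
    """POST the document; a real GHIRS client would wrap it in a SOAP
    envelope, which is omitted here for brevity."""
    request = urllib.request.Request(endpoint, data=payload,
                                     headers={"Content-Type": "text/xml"})
    with urllib.request.urlopen(request) as resp:
        return resp.read()   # the acknowledgement document

# Example (endpoint URL is hypothetical):
# submit("http://hotel.example/ghirs/reserve",
#        build_reservation_xml("Guest", "2003-12-07", "2003-12-10",
#                              "standard", 1))
```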
4 Evaluation and Challenges 4.1 Evaluation of GHIRS The New Century Tourism group is the first enterprise in the Chinese hotel industry to deploy this system. It comprises three 5-star hotels and seven 4-star hotels located in different cities in east China. Each hotel has roughly 200 to 300 standard and luxury rooms, along with facilities such as a restaurant, bar, swimming pool,
recreation room and so on. We conducted an evaluation study focusing on staff performance in hotel reservation and on the consumer experience. In the first quarter after deployment the group made ten percent more profit than in the same period the previous year. We found that the system greatly reduced the time staff spent in front of their computers. For instance, according to observations of 69 general front-office staff in this hotel group, a front-office clerk needs about 2 minutes to complete a reservation from paper information and 4 minutes from telephone information; with GHIRS, only 30 seconds are needed to confirm an available room, and when no confirmation is required the system does everything automatically. Our aim of reducing human interaction with the system thus produced positive results. On the consumer side, the system also greatly improves the reservation experience. Computing technologies today have still not "disappeared" in the sense described by Mark Weiser, weaving "themselves into the fabric of everyday life until they are indistinguishable from it" [12]. Nevertheless, with GHIRS people need not sit in front of a PC or make an expensive long-distance phone call to book a room: sending a cheap mobile text message to a designated number, or browsing and submitting a request from a palm-size device, is enough. Although integrating wearable computing devices such as IBM's Linux smart watch [3] outside the laboratory and into our system still faces significant challenges, our approach successfully offers various ways to make a hotel reservation.
The system was compared with two other hotel management systems. System 1 is the product of a US software company that dominates the market in China's 5-star hotels; the other is a popular system holding the second-largest market share in China's 3- to 5-star hotels. We chose the latest version of each system. Compared with these current commercial hotel management systems, GHIRS has many advantages.
The greatest strength of GHIRS is integration: it seamlessly links the different systems of hotel chains.
4.2 Challenges Web services technology is developing rapidly as a practical means of integrating heterogeneous systems and developing new applications, and our GHIRS project successfully integrates the systems of the hotel business chain. Although research in this area has advanced greatly, developing and maintaining integration systems in the hotel industry still presents many challenges. First, our experience indicates that deploying a global, or even a nationwide, hotel reservation system requires IT standardization that the hotel industry is still far from achieving. Several hotel alliances exist today, but these organizations do not address data exchange standards between enterprises, so our work is limited to hotels that already deploy the FoxhisTM system. If agreements on hotel industry standards were settled, business-to-business (B2B) and business-to-customer (B2C) systems in the hotel industry would bring both enterprises and consumers more benefits. Secondly, security is one of the most important and complicated challenges in our work. Many research papers address security solutions for integrating heterogeneous computers and resources spread across multiple domains to provide services to users. We pursue three basic targets. The first is to preserve content confidentiality and integrity, ensuring that nobody tampers with or steals data transferred over public networks. The second is to control access to the web services: before using them, end users must pass an authorization procedure. Finally, but not least importantly, protecting the server from malicious attacks is a practical and imperative problem, because services opened to the Internet are never sufficiently secure. Beyond the technical issues, there are also social issues of security and privacy. A consumer may not mind a particular hotel or hotel group keeping his personal profile; for example, an acrophobe may always prefer a room on the ground floor. But if his personal information is opened through web services to other organizations, such as travel agents, it may be abused or intercepted by criminals, and the problem becomes very serious. Legislation still needs to be developed to address this.
5 Conclusion In this paper we have presented an integration of hotel management systems by web services, together with related future work. The Group Hotel Integration Reservation System (GHIRS) integrates many hotel-industry systems, including the Front Office system, Property Management system, Enterprise Information System (EIS), Enterprise Information Portal system (EIP), Customer Relationship Management system (CRM) and Supply Chain Management system (SCM). Consumer demand for ease of use pushes our system forward; enterprises therefore need an increasingly robust IT infrastructure to handle the
unpredictability and rapid growth associated with e-business ventures [5]. Smarter, more intelligent systems with more integration will be developed, so a generic web services system should be built to work with airline, restaurant and hotel web-service processors. We have made some attempts at integrating a Chinese airline service into GHIRS and, delightfully, it can now query domestic flights to any city in China, together with fares, by interacting with the system of the Civil Aviation Administration of China (CAAC). Technology has made this seemingly difficult and daunting problem easier to solve; however, more effort is still needed to build a system with high scalability and security.
References
1. B. Benatallah, M. Dumas, M.-C. Fauvet, F.A. Rabhi, Q.Z. Sheng: Overview of Some Patterns for Architecting and Managing Composite Web Services. ACM SIGecom Exchanges, vol. 3, no. 3, June 2002
2. T. Berners-Lee, J. Hendler, O. Lassila: The Semantic Web. Scientific American, May 2001
3. C. Narayanaswami, N. Kamijoh, M. Raghunath, et al.: IBM's Linux Watch: The Challenge of Miniaturization. IEEE Computer, 33-41, January 2002
4. E. Damiani, S. De Capitani di Vimercati, P. Samarati: Towards Securing XML Web Services. Proc. of the 2002 ACM Workshop on XML Security, November 2002
5. I. Foster, C. Kesselman, J.M. Nick, S. Tuecke: Grid Services for Distributed System Integration. IEEE Computer, 37-46, June 2002
6. M. Kudo, S. Hada: XML Document Security and e-Business Applications. Proc. of the 7th ACM Conference on Computer and Communication Security, Athens, Greece, November 2000
7. L. Li, I. Horrocks: E-commerce: A Software Framework for Matchmaking Based on Semantic Web Technology. Proc. of the Twelfth International Conference on World Wide Web, May 2003
8. E. Newcomer: Understanding Web Services: XML, WSDL, SOAP and UDDI. Addison-Wesley, 2002
9. D. Trastour, C. Bartolini, C. Preist: Semantic Web Support for the Business-to-Business E-commerce Lifecycle. Proc. of the Eleventh International Conference on World Wide Web, May 2002
10. W3C: Extensible Markup Language (XML), http://www.w3.org/XML, 2003
11. W3C: Extensible Stylesheet Language Family (XSL), http://www.w3.org/Style/XSL, 2003
12. M. Weiser: The Computer of the 21st Century. Scientific American, vol. 265, no. 3, 66-75, 1991
Cooperative Ants Approach for a 2D Navigational Map of 3D Virtual Scene Jiangchun Wang and Shensheng Zhang CIT Lab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, P.R.China [email protected]
Abstract. The navigation of intelligent agents in virtual environments is the subject of much recent AI research. Although many solutions have been proposed, the ever-growing complexity of virtual environments inhabited by sophisticated characters makes it necessary to further elaborate the computational models used for building navigational maps. In this paper we propose a cooperative ants approach to this problem. The novel idea of our approach is implicit collaboration, which frees us from complex communication among avatars in a Distributed Virtual Environment (DVE) system. Some collisions do occur, but our approach keeps them to a minimum and keeps the system highly efficient.
1 Introduction Recent advances in graphics engines and software tools have facilitated the development of visual interfaces based on 3D virtual environments (VEs) [1]. For embedding and navigating our own virtual objects in a virtual environment, a 2D planar map is very useful. The map is defined according to the cell decomposition method [2]: to simplify the navigation problem, the environment is subdivided into simple cells; then the so-called connectivity graph is created to represent the adjacency information for the cells; finally this graph is used for path planning. But generating the map is not easy. Wang proposed a multi-grid method for obtaining such a map [3], which records the trace of an avatar in the scene to draw a planar map. As is well known, cruising in a virtual environment is CPU-intensive and requires a lot of time for graphics processing, so constructing the map on a single computer in limited time is a heavy burden. Parallel computation is a promising way to combine the power of many computers for such a problem, and the agent is an excellent concept for organizing it. Here we propose a distributed multi-agent system to extract the planar map of a 3D virtual scene. The natural metaphor on which ant algorithms are based is that of ant colonies. Real ants are capable of finding the shortest path from a food source to their nest [4,5] without using visual cues [6], by exploiting pheromone information. While walking, ants deposit pheromone on the ground and follow, in probability, pheromone
previously deposited by other ants. The way ants exploit pheromone to find a shortest path between two points is shown in Fig. 1.
Fig. 1. How real ants find a shortest way.
When ants arrive at a decision point, they select among the roads according to the pheromone left on them by previous ants; if the amounts on both sides are equal, they choose a side at random. But pheromone accumulates at a higher rate on the shorter path, so more and more ants' decisions converge on the shortest path. In Fig. 1 the number of dashed lines is approximately proportional to the amount of pheromone deposited by ants. Our approach derives from the ant colony, but we do not care about the shortest path in the 3D virtual scene; instead we are interested in obtaining complete information about the roads in the scene. In our approach many avatars (as ants) are employed to rove the scene. Together they work to obtain a 2D roadmap of the 3D scene, while each tries its best to avoid walking where others have walked: other avatars' traces (pheromone) on the roads serve as the criterion at decision points.
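A minimal sketch of the decision rule just described (our illustration, not the authors' code): at a junction an avatar prefers the branch carrying the least trace, breaking ties at random.

```python
import random

def choose_branch(branches, trace):
    """Pick the next branch at a decision point.

    `branches` is a list of candidate grid cells; `trace[cell]` counts how
    often any avatar has already visited the cell (the "pheromone" left
    behind).  Unlike classic ant algorithms, the trace here repels: less
    visited cells are preferred, and ties are broken at random.
    """
    least = min(trace.get(b, 0) for b in branches)
    freshest = [b for b in branches if trace.get(b, 0) == least]
    return random.choice(freshest)

# Example: an unvisited branch beats one already walked twice.
# choose_branch([(3, 4), (4, 3)], {(3, 4): 2})  ->  (4, 3)
```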
2 System Design and Architecture The whole system takes a client-server architecture, and each client is a multi-agent system (MAS) [7]. The C-S structure makes the system more robust and easier to design; autonomous agents and multi-agent systems represent a new way of analyzing, designing and implementing complex software systems, and such a complex system is easier to manage and coordinate than a peer-to-peer structure. The architecture of the system and the relationships between its components are illustrated in Figure 2.
2.1 Root Server Design The server, as the root, is designed to spawn many clients as nodes, assign tasks and manage the nodes' performance. It includes a Communication Agent, a Nodes-Manage Agent, a Map Agent and a Strategy Agent. (a) Communication Agent. The communication agent is responsible for communicating with nodes and calling other agents according to the type of information it receives from nodes; it listens for connections.
(b) Nodes-Manage Agent. The nodes-manage agent maintains node information such as registered name, computational capability and workload status, along with current network status. (c) Strategy Agent. The strategy agent is the core part: it evaluates all computational results from nodes and gives feedback according to its own knowledge. It maintains an estimation matrix for the working 2D map; when a new computational result arrives, it looks up the estimation matrix, updates the matrix based on ECA rules, and notifies the map agent to draw. (d) Map Agent. The map agent draws the 2D map of the virtual scene based on the strategy agent's evaluation.
Fig. 2. The architecture of the system.
2.2 Node Design Nodes are especially useful for constructing complex reactive systems whose behavior cannot be estimated in advance. Each of our nodes can load a virtual scene and assign an avatar to search every place; stimulated by rewards from the root server, it works happily and reports its findings to the root server. A node is mainly composed of the following agents: a Communication Agent, an Avatar Agent and a VR Scene Agent. (a) Communication Agent. The communication agent here is simpler than the one in the root server, because it is only responsible for transferring messages and maintaining the network connection. (b) VR Scene Agent. The VR scene agent receives operation information from the avatar agent, updates the virtual scene according to the operation and returns feedback. We use a VRML browser as its function body; VRML is an ISO standard for modeling virtual reality [8]. (c) Avatar Agent. The avatar agent autonomously manipulates an avatar's activity in the VR scene and determines the current position in the 2D map from the feedback and its own history knowledge. When the avatar agent acquires new information about the required 2D map, it sends it to the root server, expecting a reward.
3 Key Techniques 3.1 Novel Cooperation Method among Isolated Nodes 3.1.1 Implicit Collaboration Only the root server knows every node; it maintains an estimation matrix T for the working 2D map, while no node knows or minds the others' existence. Each element position in T corresponds to a position (x, y coordinates) in the 2D planar map. When a new computational result arrives, the estimation matrix is looked up and the reward is provided. To simplify the problem, we divide the virtual scene into grids at a specified granularity; each grid is the minimal unit for distinguishing road from obstacle in the scene, so the planar map becomes a mesh, and every grid has a counterpart in the mesh, as shown in Figure 3. The size of the matrix T corresponds to the size of the planar mesh of the virtual scene. A computational result has the format of a triple R(x, y, s), where x and y denote a position in the Cartesian coordinates of the 2D map and s is the status of that position. The reward value is r = T(x, y). After providing the reward, the strategy agent updates the element T(x, y) according to the following equation:

T(x, y) <- T(x, y) - d    (1)

where d is a repulsive force. Eq. (1) means that the more often the same place is accessed, the smaller the reward value becomes. The matrix T is our "Artificial Potential Field" [9,10]: the node's agent moves in this field of force, each element of T is a pole for every node agent, and the element value gives the polarity (positive is attractive, negative is repulsive). We set the initial value of all elements of T to 1, and d = 0.5. If an element's value has dropped below its initial value, the corresponding grid has already been checked. Thus every node can know where others have already worked, and will avoid redoing that invalid work because the reward would be unsatisfactory, although the nodes never communicate with each other directly. Figure 4 illustrates this condition. 3.1.2 Feedback Mechanism Every node can simply be viewed as an avatar in the virtual world. No avatar cares about the others' existence: though they search the same virtual scene, they do not interact with each other directly, and each works hard on its own. When an avatar moves to an unknown place and reports the find to the root server, expecting a reward, the strategy agent in the root server checks whether the find is new or old and then gives the reward: a high value for a new find, a low or punitive value for an old one. After the reward is fed back, the avatar agent takes measures according to the feedback. The following steps are taken regularly: Step 1: the avatar works hard to find a fresh result and reports it to the root server. Step 2: the strategy agent in the root server evaluates the find and returns a reward. Step 3: according to the reward, the avatar takes a new measure.
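The following sketch (ours; the paper specifies only the update rule of Eq. (1), the initial value 1 and d = 0.5) shows the strategy agent's reward bookkeeping:

```python
class StrategyAgent:
    """Keeps the estimation matrix T and rewards reported results.

    Implements the implicit-collaboration rule: the reward for a cell is
    its current value in T, and each report repels later visits by
    subtracting the repulsive force d (Eq. (1)).
    """
    def __init__(self, width, height, d=0.5):
        self.d = d
        self.T = [[1.0] * width for _ in range(height)]  # initial value 1

    def reward(self, x, y, status):
        """Handle a result triple R(x, y, s); returns the reward value."""
        r = self.T[y][x]
        self.T[y][x] -= self.d          # Eq. (1): T(x, y) <- T(x, y) - d
        return r

agent = StrategyAgent(100, 70)
print(agent.reward(3, 4, "road"))   # 1.0 -> fresh cell, full reward
print(agent.reward(3, 4, "road"))   # 0.5 -> already visited, lower reward
```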
Table 1 lists some policies taken by the avatar. These policies are mapping relations defined by ECA rules. Consider the condition evaluation first: the evaluation of a condition c formally corresponds to a function f(c) that evaluates to either TRUE or FALSE (equation (2)). Usually c is recursively composed of subconditions c1, c2, ..., cn concatenated by boolean operators, and the atomic conditions are equations, inequations and boolean values:
The firing of an ECA rule is defined as follows: when the rule's event occurs and its condition evaluates to TRUE, the rule's action is executed.
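A compact sketch of this firing semantics (our illustration; the rule and the avatar policy shown are invented examples, not Table 1's actual entries):

```python
def make_rule(event_type, condition, action, rule_id):
    """An ECA rule: on `event_type`, if `condition` holds, run `action`."""
    return {"event": event_type, "cond": condition, "act": action, "id": rule_id}

def fire(rules, event, state):
    """Fire every rule whose event matches and whose condition is TRUE."""
    for rule in rules:
        if rule["event"] == event["type"] and rule["cond"](state, event):
            rule["act"](state, event)

# Hypothetical avatar policy: on a punitive reward, back off and re-plan.
rules = [make_rule(
    "reward",
    lambda s, e: e["value"] <= 0,                       # condition c
    lambda s, e: s.__setitem__("measure", "retreat"),   # new measure
    "low-reward-retreat")]

state = {"measure": "explore"}
fire(rules, {"type": "reward", "value": -0.5}, state)
print(state["measure"])   # -> "retreat"
```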
Fig. 3. The correspondence between a mesh and the estimation matrix.
Fig. 4. Avoidance policy among nodes
3.1.3 Task Suggestion When many avatars do overlapping work, it is important to avoid invalid searching inside an area that has already been searched. We designed a Task Suggestion strategy as the solution. An avatar may find that none of its border grids is un-overlapped, which means its active area is enclosed by others' regions; it would have to penetrate these bulwarks to breathe fresh air, but with such ineffective operations the performance of the whole system would drop to a minimum. So when an avatar is in that condition, it requests a suggestion from the root server and changes its operation according to the answer. The following steps are taken: Step 1: when an avatar finds itself enclosed by others' regions, it asks the root server for a suggestion. Step 2: the root server handles the request and returns an answer. Inside the root server, a few steps are executed: i. the communication agent forwards the message to the nodes-manage agent; ii. the nodes-manage agent updates its records and consults the strategy agent; iii. a suggestion instruction is returned through the communication agent. Step 3: the avatar acts according to the suggestion. The connections between communication agents are based on TCP/IP, and the communication language is KQML.
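The paper names KQML over TCP/IP but shows no messages; a plausible suggestion request might look like the following sketch, where the performative fields follow standard KQML syntax and the content ontology is our invention:

```python
import socket

# Hypothetical suggestion request in KQML syntax; the content terms
# ("enclosed", the ontology name) are invented for illustration.
request = (b'(ask-one :sender avatar-7 :receiver root-server '
           b':language Prolog :ontology map-2d '
           b':content "suggestion(enclosed, avatar-7, X)")')

def send_kqml(host, port, message):
    """Ship one KQML message over a plain TCP connection."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(message)
        return conn.recv(4096)   # e.g. a (tell ...) carrying the suggestion

# reply = send_kqml("rootserver.local", 5500, request)
```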
3.2 Formula Description for an Avatar The avatar is a speculator who decides its next step according to maximum benefit. The following definitions illustrate this explicitly. Def. 1: the i-th avatar, which is a speculator. Def. 2: R is a result, a triple (x, y, s), reported by an avatar. Def. 3: r is a reward from the root server. Def. 4: K is the accumulated knowledge of the virtual scene in an avatar. Def. 5: M is the 2D map, constructed as a mesh. Def. 6: T is the evaluation matrix corresponding to M. Def. 7: t is the time variant. Def. 8: Measure is the work strategy employed by an avatar. There are also some functions. L(R, T) is the evaluation function for a reward; here we let r = L(R, T) = T(x, y). Update(T, Event) is the update function of matrix T, activated by an event. E(R) is the expectation of the reward for a result R. Value(E(R), r) is the reactive function for a reward r and an expectation E(R), which decides the next measure for an avatar (such as those in Table 1); many path-planning algorithms, such as A*, NN, etc., can be employed here, together with the ECA rules: Value(E(R), r) -> Measure. Based on the above, we obtain the system dynamo formula of an avatar:
0. Initialize: t = 0, and the knowledge K and the evaluation matrix T take their initial values.
1. The avatar explores according to its current Measure and obtains a result R.
2. The avatar reports R to the root server.
3. The root server computes the reward r = L(R, T) and performs Update(T, Event). (Update can only partially affect the global evaluation matrix T, because there are many avatars.)
4. The avatar folds R and r into its knowledge K and forms the expectation E(R).
5. The avatar decides its next Measure = Value(E(R), r).
6. t = t + 1 (to the next iteration).
From the description above, the system dynamo formula is circular until a new measure in step 5 decides to stop. The critical step 5 contains many policies, such as path-planning algorithms and ECA rules.
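A sketch of this cycle (our reconstruction; the step formulas did not survive in this version, so the loop below simply wires together the functions L, Update and Value defined above, with step, reward and value as hypothetical callables):

```python
def avatar_loop(step, reward, value, max_steps=1000):
    """One avatar's dynamo cycle.

    Hypothetical callables standing in for the text's functions:
      step(measure)  -> a result triple R = (x, y, s)   (exploration)
      reward(R)      -> r = L(R, T), from the root server (which updates T)
      value(R, r)    -> the next Measure = Value(E(R), r)
    """
    knowledge = {}                    # Def. 4: accumulated knowledge K
    measure = "explore"               # Def. 8: the current work strategy
    for t in range(max_steps):        # Def. 7: time variant t
        R = step(measure)             # steps 1-2: explore and report
        r = reward(R)                 # step 3: collect the reward
        knowledge[R[:2]] = R[2]       # step 4: fold the result into K
        measure = value(R, r)         # step 5: decide the next measure
        if measure == "stop":         # step 5 may decide to stop
            break
    return knowledge
```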
4 Experiments In this section we select VRML 97 and the Cortona VRML Client 4.0 on an HP x4000 workstation as our test-bed to evaluate the approach discussed in the previous sections. The number of nodes is fixed, and we use different methods to generate the start position of each avatar in the virtual scene. For convenience of explanation we define some evaluation variables: the overlap ratio O, which is P over mn (the number of grids in the scene), i.e. O = P/(mn); the speed-up ratio S, which is the product of N and mn over P, i.e. S = N·mn/P; and the average overlap ratio among avatars AO, which is O over N, i.e. AO = O/N. In this experiment we set the number of avatars N = 20, and the size of the 3D scene is 100×70. We use two methods to assign the start position of each avatar. Random Distribution: let the position of an avatar be (x, y); the values of x and y are randomly distributed between 0 and the length and width dimensions of the virtual scene, respectively, according to the 2D map. Linear Uniform Distribution: select two end points in the virtual scene; let the position of an avatar be (x, y), with the values of x and y uniformly distributed between the two end points. Here we selected six kinds of typical lines for comparison: the four edges and the two diagonals. Table 2 shows the experimental results under the random and linear uniform distributions. The fluctuations of all three factors change considerably with the avatar distribution: O fluctuates by 60.23%, as does AO, and the speed-up ratio varies by 31.26%. This means our approach is obviously affected by the initial locations of the avatars. On careful consideration, we found the reason in the avatars' intelligence: our avatar is not smart enough, with a simple path-finding strategy that just tries to access an adjacent grid from the top, right, bottom and left sides in turn. With some optimization, our approach would be more stable than before.
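One plausible reading of the two placement methods (the exact formulas are garbled in this version; the code below is our interpretation):

```python
import random

LENGTH, WIDTH = 100, 70   # size of the 3D scene used in the experiment

def random_distribution(n):
    """Start positions drawn uniformly over the whole scene."""
    return [(random.uniform(0, LENGTH), random.uniform(0, WIDTH))
            for _ in range(n)]

def linear_uniform_distribution(n, p1, p2):
    """Start positions spaced evenly on the segment from p1 to p2,
    e.g. an edge or a diagonal of the scene."""
    (x1, y1), (x2, y2) = p1, p2
    return [(x1 + (x2 - x1) * i / (n - 1), y1 + (y2 - y1) * i / (n - 1))
            for i in range(n)]

# N = 20 avatars on a diagonal, one of the six lines compared:
starts = linear_uniform_distribution(20, (0, 0), (LENGTH, WIDTH))
```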
5 Conclusions and Future Work We have demonstrated a cooperative ants approach for obtaining a 2D navigational map of a 3D virtual scene, including the system architecture and the collaboration method, and we designed experiments to test the approach. The novel idea in our system is the implicit collaboration among avatars: although we do not prevent collisions outright, the system keeps them to a minimum and remains highly efficient. In the future we will optimize our avatar and make it cleverer with robotic algorithms. Acknowledgements. This work was supported by the National High Technology Plan 863/CIMS under grant No. 2001AA412010.
References
1. T. Catarci et al.: Using 3D and Ancillary Media to Train Construction Workers. Multimedia at Work, April 2002, pp. 88-92
2. J.-C. Latombe: Robot Motion Planning. Kluwer, Dordrecht, 1991
3. J. Wang, S. Zhang, J. Luo: Multiple-Level Grid Algorithm for Getting 2D Road Map in 3D Virtual Scene. Proc. of Computational Science ICCS 2003, LNCS 2659, pp. 264-274
4. R. Beckers, J.L. Deneubourg: Trails and U-turns in the Selection of the Shortest Path by the Ant Lasius Niger. Journal of Theoretical Biology, vol. 159, pp. 397-415, 1992
5. S. Goss, S. Aron, J.L. Deneubourg, J.M. Pasteels: Self-organized Shortcuts in the Argentine Ant. Naturwissenschaften, vol. 76, pp. 579-581, 1989
6. B. Hölldobler, E.O. Wilson: The Ants. Springer-Verlag, Berlin, 1990
7. M. Wooldridge, N.R. Jennings: Intelligent Agents: Theory and Practice. The Knowledge Engineering Review, 1995, 10(2): 115-152
8. VRML 2.0 Specifications, http://www.vrml.org/Specifications/
9. O. Khatib: Real-time Obstacle Avoidance for Manipulators and Mobile Robots. Int. J. Robotics Research, vol. 5, no. 1, pp. 90-98, 1986
10. OuYang Zhengzhu, He Kezhong: The Intelligent Mobile Robot Navigation Control Based on Potential Field Methods. 2001, vol. 16, pp. 128-130
Workflow Interoperability – Enabling Online Approval in E-government Hua Xin and Fu-ren Xue Dept. of Computer Science & Engineering, Beijing Institute of Technology, Beijing 100081, China [email protected]
Abstract. This paper describes the rationale for workflow interoperability in the context of electronic government, as a means of implementing value chains that operate across and between organizations. A collaborative center based on workflow technology is designed for managing and controlling the approval process.
1 Introduction Workflow is concerned with the automation of procedures in which information and tasks are passed between participants according to a defined set of rules to achieve, or contribute to, an overall business goal [1]; that is, workflow is the computerized facilitation or automation of a business process, in whole or in part. While workflow may be organized manually, in practice it is normally organized within an IT system that provides computerized support for the procedural automation [2]. A system that completely defines, manages and executes workflows through software, whose order of execution is driven by a computer representation of the workflow logic, is known as a Workflow Management System. Online approval is a typical procedural process: by applying workflow technology, the automatic control and management of the process can be realized and the efficiency of online approval rises. Workflow technology supplies graphical customization tools and rich process-handling facilities, which produce the various workflow procedures and ensure that every user's application moves accurately and promptly among the different departments.
2 The Workflow Reference Model The Workflow Reference Model has been developed from the generic workflow application structure by identifying the interfaces within this structure that enable products to interoperate at a variety of levels. All workflow systems contain a number of generic components which interact in a defined set of ways.
Fig. 1. Workflow Reference Model-Components & Interfaces
Process Definition Tools: process modelling tools allow business users to coordinate business activities, people and applications, and to model the routing of work requests within a process and across processes. Workflow Enactment Service: this may consist of one or more workflow engines that create, manage and execute workflow instances. Applications may interface to this service via the workflow application programming interface (WAPI), which supports interoperability between different workflow systems. Invoked Applications: the programs called within process instances by the workflow enactment service, or used to process the application data. Detailed information about an invoked application, such as its type and address, is included in the process definition. Workflow Client Applications: these provide the means to handle the tasks produced while a workflow instance runs. Every task is regarded as a work item, which includes requirements such as the data types to be processed. Administration and Monitoring: its function is to monitor and manage the instance tasks produced during execution, covering user management, role management, audit management, resource control, etc. Interoperability with the workflow enactment service is achieved through an interface that includes specific commands within the WAPI set for the designed administration and monitoring functions.
3 Application in Online Approval Enterprises or individuals submit applications to government administration departments through the e-approval system, and the appropriate services accept and hear each case through a single window. During the approval process, rules such as publication on the net, joint approval among different departments and completion within a time limit must be obeyed. Finally, the approval result is sent to the enterprise or
individual. There is a portal where the user's application is examined and accepted, and according to the relevant rules, codes and policies, it is delivered to the corresponding transaction services [3].
3.1 The Collaborative Center in the E-approval System According to both the Workflow Reference Model and the requirements of an online approval system, the core of the online approval system is the workflow engine and the collaborative center based on it. Typically a workflow engine provides facilities for: interpretation of the process definition; control of process instances (creation, activation, suspension, termination); navigation between process activities, which may involve sequential or parallel operations and deadline scheduling; interpretation of workflow-relevant data; identification of work items for user attention, with an interface to support user interactions; maintenance of workflow control data and workflow-relevant data, and passing workflow-relevant data to and from applications or users; an interface to invoke external applications and link any workflow-relevant data; and supervisory actions for administration and audit purposes.
3.2 The Workflow Model of the E-approval System The workflow model of the online approval system is a description of the government approval business procedure; that is, it divides the procedure into a series of activities and their dependency conditions. An activity is typically the smallest unit of work scheduled by the collaborative center during approval process enactment (e.g. using transitions and pre/post-conditions), although one activity may result in several work items being assigned to a workflow participant [4]. The attributes of an activity comprise its function input/output, resource input/output and control input/output, as well as the description of the activity. All inputs are regarded as preconditions, which govern the occurrence of the activity; all outputs are regarded as consequences, which are critical to the triggering of the following activity. Because of the variety of approval items in the e-approval system, a layered workflow description method is adopted.
Fig. 2. Workflow Modeling diagram in E-approval system
In the workflow model there is a logical separation between the process and activity administration, as well as between the process and the application tools, and between the process and the end users' tasks. Application tools and tasks are used for
carrying out their corresponding activities. As shown in Figure 4, T3 is not only an activity but also a sub-process: the completion of T3 depends on the completion of all its sub-processes, and after they finish, the relevant information is returned to T3.
3.3 Implementation The premise of applying workflow technology in the approval system is that, once the user finishes the application, its information is written into the intranet database, in a form carrying the register number and the approval event. A different template is built for each kind of approval event. Every template has a begin point and an end point; between the two points, approval nodes (roles) are added, each endowed with a participant. A workflow form is built in the template, in which the workflow configuration is stored for the workflow engine, including activity ID, activity name, activity participant, transition condition and overdue disposal method. There are two ways to endow each node with its participant. One is to write the participant information into the template in advance. The other is to activate the workflow template through the corresponding application program and, according to the user ID, produce a new instance ID; when the participant of the first activity finishes its task, the information of the next participant is written into the relevant information of the new instance through this interoperability mechanism.
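A minimal sketch (ours; the paper lists the stored fields but no schema, so all names below are hypothetical) of a template record and the second participant-assignment method, deriving a new instance ID from the user ID:

```python
import itertools
from dataclasses import dataclass

@dataclass
class Activity:
    activity_id: str
    name: str
    participant: str           # may be pre-filled in the template or
    transition_condition: str  # assigned when the instance is created
    overdue_disposal: str      # e.g. "remind", "escalate"

@dataclass
class Template:
    event_type: str            # one template per kind of approval event
    activities: list           # ordered from begin point to end point

_seq = itertools.count(1)

def instantiate(template, user_id):
    """Activate a template for one application: derive a new instance ID
    from the user ID and hand the first activity to its participant."""
    instance_id = f"{template.event_type}-{user_id}-{next(_seq)}"
    first = template.activities[0]
    return instance_id, first.participant
```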
4 Conclusion Online approval is a comprehensive job involving multiple departments and multiple units over an intranet. It must be recognized that approval information moves across the network, and that the approval process consists of approval chains involving various approval departments and the transfer of approval information. The collaborative center defines, creates and manages the execution of workflows; running on one or more workflow engines, it interprets the process definitions and interacts with workflow participants. How to design and implement the workflow collaborative center scientifically and effectively is the key point of the online approval system, given its predominant role in it.
References
1. Workflow Management Coalition: The Workflow Reference Model. Document Number TC00-1003, Issue 1.1
2. Workflow Management Coalition: Terminology & Glossary. Document Number WFMC-TC-1011, Issue 3.0, Feb. 1999
3. M. Anderson (International Computers Ltd), R. Allen (SNS/ASSURE Corp): Workflow Interoperability – Enabling E-Commerce
4. C. Prior (Maestro BPE Pty Limited, Australia): Workflow and Process Management
A Multicast Routing Algorithm for CSCW Xiongfei Li¹,², Dandan Huan¹, Yuanfang Dong¹, and Xin Zhou¹
¹ College of Computing Science and Technology, Jilin University, Changchun 130025, China
² State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210016, China
[email protected]
Abstract. CRMR, an algorithm for dynamically updating multicast trees, is presented. It constructs an initial multicast tree using KPP and simultaneously builds a virtual trunk (VT); the nodes of the VT are relatively stable, and node changes are based on the VT. The algorithm employs a trigger function (TF) that measures the QoS reduction caused by member updates in a region, while satisfying delay constraints. We compare the performance of the CRMR algorithm with others by simulation; the results indicate that CRMR provides a better balance between cost performance and the number of changes in the multicast tree after each update. We also introduce a bandwidth indication function to extend CRMR into a multiple-QoS-guaranteed algorithm that can handle both bandwidth and delay constraints.
1 Introduction Many multimedia applications, such as computer supported cooperative work (CSCW), distance education and video-on-demand (VOD) services, rely on the ability of the network to provide multicast services. Multicast routing concerns the construction of multicast trees between a source node and a group of destination nodes, and constructing multicast trees that satisfy quality of service (QoS) requirements is becoming a problem of prime importance. The multicast routing problem can be modeled as the Steiner problem in networks [1]. Since the Steiner problem is NP-complete [2], no algorithm that finds a minimum Steiner tree runs in polynomial time, and explicit solutions are prohibitively expensive [3]. The problem of updating the multicast tree after each addition and deletion is known as the on-line multicast problem in networks; the problem of group member changes was first presented by Waxman [4]. Non-rearrangement routing algorithms do not allow altering existing multicast links, which limits the disturbance caused to current group members; GA [4], WGA [4] and VTDM [5] are non-rearrangement algorithms. Others, such as EBA [6], GSDM [7], ARIES [8] and CRCDM [9], are rearrangement algorithms.
2 CRMR Algorithm The network is modeled as a weighted, connected graph G = (V, E), with node set V and edge set E. An edge connecting nodes u and v is denoted by (u, v). Two nonnegative weight functions are defined on the edges: a cost function C(e) and a delay function D(e). Let s be a source node and S a set of destination nodes in G. For a path P in the network, the delay D(P) and cost C(P) of the path are the sums of the edge delays and edge costs along P, respectively.
Given a delay tolerance Δ, the delay on the path from s to each destination v must be bounded above by Δ. A sequence of dynamic requests is a vector (r1, r2, ..., rk), where each request ri is a pair (vi, opi) with opi in {add, remove}, requesting that a destination node be added to or removed from the multicast group; a request can thus be denoted (v, add) or (v, remove). Let Si be the set of nodes that belong to the multicast group after the first i requests are made, and let T0 be an initial multicast tree. A dynamic multicast routing algorithm aims to find multicast trees T1, T2, ..., Tk such that the members of Ti are those of the tree modified by the first i requests and its cost is the minimum among all possible choices that satisfy the delay constraints of all current group members; this is the optimization problem considered here. To describe our algorithm we use the following definitions. M-Node, I-Node and M-Region are defined as in [5]. CC(R) denotes the sum of the cost differences between the current connection and the optimal connection of any two nodes in region R.
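A small sketch of the basic path measures just defined (our illustration; the edge weights in the example are arbitrary):

```python
def path_delay(path, D):
    """D(P): sum of edge delays along path P (a list of node IDs)."""
    return sum(D[(u, v)] for u, v in zip(path, path[1:]))

def path_cost(path, C):
    """C(P): sum of edge costs along path P."""
    return sum(C[(u, v)] for u, v in zip(path, path[1:]))

def satisfies_delay(path, D, tolerance):
    """Delay constraint: the s->v delay is bounded above by the tolerance."""
    return path_delay(path, D) <= tolerance

# Example on a 3-node path s-a-v:
D = {("s", "a"): 2.0, ("a", "v"): 3.5}
print(satisfies_delay(["s", "a", "v"], D, tolerance=6.0))   # True
```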
Trigger Function TF(R): the TF of an M-Region of a multicast tree T measures the quality reduction caused by member updates in that region. CC(R) measures the QoS influence of an M-Region on the overall multicast session: by influence on quality we mean how far the total cost of the current links in the region has dropped relative to the total cost of the shortest delay-constrained paths between each pair of nodes in the region; it is the key factor in evaluating the necessity of rearrangement. When the TF of a particular region of the tree drops below a threshold, a rearrangement technique is used to suitably modify the tree. The CRMR algorithm optimizes the selection of the point at which a rearrangement must be triggered; it aims to satisfy the delay constraints of all current group members while minimizing the cost of the constructed tree. 1) Initial Multicast Tree Construction: empirical studies have shown that KPP [10] produces better cost performance than other heuristics, so we adapt the KPP algorithm to construct the initial multicast tree.
Step 1: compute the distance graph, in which each edge corresponds to the constrained cheapest path between a pair of nodes in the primitive graph. Step 2: construct a constrained spanning tree of the distance graph. Step 3: expand the edges of the constrained spanning tree into the constrained cheapest paths they represent, and remove any loops that may be caused by this expansion. 2) Virtual Trunk Building, 3) Node Addition, and 4) Node Removal are as in paper [5]. 5) Rearrangement Algorithm: let R be an M-Region of a multicast tree T spanning a source node s and a multicast group S whose trigger function has dropped below the threshold; hence routing rearrangement is triggered. The first step is to remove all nodes and edges in the interior of R from T, which splits T into a number of fragments. Step 1: compute the fragments fi, i = 1, 2, ..., m, of region R. Step 2: determine the path Pi from node x to fragment fi satisfying the delay constraint. Step 3: let path Pi connect a node in the source fragment f with some node in fragment fi; note that path Pi, though intended with fi as its destination, might actually intersect fi at some node other than the leader node, so the meeting point may or may not be the leader node. 6) Dynamic Multicast Routing Algorithm with Multiple QoS Constraints: in graph G we introduce a bandwidth indication function MB(v, w) for each edge (v, w), extending the CRMR algorithm into a multiple-QoS-guaranteed algorithm that can deal with the constraints of bandwidth and delay.
When the bandwidth constraint is added, the CRMR algorithm only needs its initial multicast tree construction, node addition and rearrangement procedures to be modified to use the bandwidth indication function.
3 Experimental Results and Conclusion We simulated the CRMR algorithm and other multicast routing algorithms on 20 randomly generated, sparse, 200-node networks. Random graphs with an average node degree of 4, which resemble real networks in their connectivity, were used for the simulation. Each algorithm received 100 requests to add or delete a multicast member in each network. The CRMR algorithm has two tunable parameters, k
and a trigger threshold: k represents the number of VT nodes, and the threshold controls the frequency with which rearrangements are triggered; when TF drops below the threshold, the rearrangement algorithm is invoked. Twenty simulation experiments were executed on each random graph, with k = 0.2N, a = 0.6 and b = 0.4. The CRMR algorithm adopts the idea of triggering a rearrangement based on a threshold; it employs a trigger function (TF) that associates the usefulness of a region with the quality reduction caused by member updates in that region, while satisfying delay constraints. The simulation results indicate that CRMR provides a better balance between cost performance and the number of changes in the multicast tree after each update, and reasonable values of the parameters are given in this paper. We also introduce a bandwidth indication function to extend CRMR into a multiple-QoS-guaranteed algorithm that handles both bandwidth and delay constraints. Designing a distributed version of our algorithm with multiple QoS constraints is an area for future investigation.
Acknowledgment. This work was supported by the National Natural Science Foundation of China under Grant No. 60373097.
References
1. S.E. Dreyfus, R.A. Wagner: The Steiner Problem in Graphs. Networks, vol. 1, no. 3, pp. 195-207, 1971
2. M.L. Garey, D.S. Johnson: Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman, San Francisco, 1979
3. P. Winter: Steiner Problem in Networks: A Survey. Networks, vol. 17, no. 2, pp. 129-167, 1987
4. B. Waxman: Routing of Multipoint Connections. IEEE J. Select. Areas Commun., vol. 6, pp. 1617-1622, Dec. 1988
5. H. Lin, S. Lai: VTDM – A Dynamic Multicast Routing Algorithm. Proc. of IEEE INFOCOM'98, San Francisco, California, USA, pp. 1426-1432, Mar. 31-Apr. 2, 1998
6. M. Imase, B. Waxman: Dynamic Steiner Tree Problems. SIAM J. Disc. Math., vol. 4, no. 3, pp. 369-384, Aug. 1991
7. J. Kadirire, G. Knight: Comparison of Dynamic Multicast Routing Algorithms for Wide-area Packet Switched (Asynchronous Transfer Mode) Networks. Proc. of IEEE INFOCOM, pp. 212-219, Apr. 1995
8. F. Bauer, A. Varma: ARIES: A Rearrangeable Inexpensive Edge-Based On-Line Steiner Algorithm. IEEE J. Select. Areas Commun., vol. 15, no. 3, pp. 382-397, Apr. 1997
9. S. Raghavan, G. Manimaran, C. Siva Ram Murthy: A Rearrangeable Algorithm for the Construction of Delay-Constrained Dynamic Multicast Trees. IEEE/ACM Trans. Networking, vol. 7, no. 4, pp. 514-529, Aug. 1999
10. V.P. Kompella, J.C. Pasquale, G.C. Polyzos: Multicast Routing for Multimedia Communication. IEEE/ACM Trans. Networking, vol. 1, no. 3, pp. 286-292, June 1993
A Multi-agent System Based on ECA Rule Xiaojun Zhou, Jian Cao, and Shensheng Zhang CIT Lab, Shanghai Jiaotong University, Shanghai, 200030 {zhou-xj,cao-jian}@cs.sjtu.edu.cn, [email protected]
Abstract. The gap between theory and practice has been recognized, and many research groups focus on relating formal specification methods for agent properties to the design of practical multi-agent systems. In this paper, the design and implementation of a kind of agent called E-Agent, based on concepts from AgentSpeak(L) and ECA rules, are presented. E-Agent is first formalized, and the tools developed to help the user design and deploy agents easily and quickly are also introduced.
1 Introduction The theory and technology of agents derive from distributed artificial intelligence (DAI), and the application domains for which multi-agent technologies are especially useful include intelligent manufacturing systems, workflow management, electronic commerce, etc. [1]. Theoretical formalizations of such agents and agent implementations have proceeded in parallel, with little or no connection between them [2]. This gap between theory and practice has been recognized, and many research groups focus on relating formal specification methods for agent properties to the design of practical multi-agent systems; an example is the AgentSpeak(L) programming language introduced by Rao [3]. But AgentSpeak(L) is such an abstract language that no interpreter or compiler for it has been available until now. Although Machado and Bordini show a way of turning AgentSpeak(L) agents into running programs using Sloman's SIM AGENT toolkit [4], their approach does not deal with modeling environments or the basic actions that agents can perform. In this paper, the design and implementation of E-Agent, which is based on the concepts of AgentSpeak(L) and ECA rules, are introduced. First the formalization of E-Agent is presented. Second, to support multi-agent system design and implementation in an easy and productive way, a system containing a graphical agent design tool, an agent running environment and an agent name server is presented.
2 The Formalization of E-Agent The specification of E-Agent is based on AgentSpeak(L), ECA rules and several extensions. In this section we give the main definitions of E-Agent. Definition 1: an E-Agent is a tuple whose components, covering messages, events, atomic actions, plans and intentions, are described by the definitions below. Through receiving and sending messages, E-Agents can perceive their environment and exchange information with each other. Sender is the sender of a message and Receiver is its receiver. If !g(t) and ?g(t) (and related forms) are goals, then they can serve as a MessageBody. We refer to accepting a Message as an Event, so there are five types of Events according to the MessageBody of the Message. Definition 5: an atomic action can be performed directly by an E-Agent; it is composed of three parts, respectively the type of the action, the pre-condition of the action and the action result. Definition 6: the behavior of an agent is basically defined by specifying plans. A plan can be viewed as a process; it is denoted by a tuple containing a trigger, the condition for the plan to be instantiated, and the set of atomic actions used in the plan, with the remaining components defined as in the Process described in the next subsection. The trigger contains two parts, respectively the trigger event E and the trigger condition C. The trigger condition C can be described as C = b1 ∧ ... ∧ bm, where b1, ..., bm are beliefs: when the trigger event E happens and b1, ..., bm are all true, an intention of the plan can be instantiated and then waits to run. The set I is the set of intentions, where each intention is an instantiated plan that controls and monitors the actions of an E-Agent. A Process is the description of a series of actions performed sequentially to achieve a target. The information of a Process is composed of four parts, respectively the actions to be performed, the sequence relations between actions, the
restricted relations between actions, and the trigger mechanism of actions. The sequence relations between actions represent the execution order of actions. The restricted relations represent the execution conditions of actions. And the trigger mechanism describes how an action is activated.
Definition 9: A Process can be represented as a 5-tuple <C, A, R, T, M>, where C is the condition for the process to become active; when C is satisfied, the actions in the process are performed one by one according to the sequence relations and the restricted relations. A is the set of actions; R is the set of routes; T is the set of transition rules; and M represents the trigger mechanism of actions.
Definition 10: A Route describes a sequence relation between two actions and is represented by a 2-tuple <Join, Split>. Join is the type of input constraint and Split is the type of output constraint, where Join and Split range over {And, Xor}. A transition rule describes a restricted relation; it is represented by a 3-tuple <FrontNode, r, BehandNode>. The type of FrontNode and BehandNode can be either action or route, and r represents an ECA rule describing the execution condition for BehandNode.
Property 1: The types of FrontNode and BehandNode cannot both be action at the same time.
Definition 11: An ECA rule can be represented by a 4-tuple r = <Er, Pr, Jr, Idr>, where Er is the event of the ECA rule; Pr represents its condition; Jr is the action to be performed when the rule is satisfied; and Idr is the name of the rule.
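To make Definition 11 concrete, here is a minimal Python sketch of ECA rules and a matching evaluation loop; the class name ECARule, its fields, and the example rule are illustrative assumptions rather than the authors' implementation.

from dataclasses import dataclass
from typing import Callable

@dataclass
class ECARule:
    """An ECA rule r = <Er, Pr, Jr, Idr> as in Definition 11."""
    event: str                          # Er: the event the rule reacts to
    condition: Callable[[dict], bool]   # Pr: predicate over the agent's beliefs
    action: Callable[[dict], None]      # Jr: action performed when the rule fires
    rule_id: str                        # Idr: the name of the rule

def fire_rules(rules, event, beliefs):
    """Fire every rule whose event matches and whose condition holds."""
    for r in rules:
        if r.event == event and r.condition(beliefs):
            r.action(beliefs)

# Usage: a rule that reacts to a 'stock_low' event when belief 'auto_order' is true.
rules = [ECARule(
    event="stock_low",
    condition=lambda b: b.get("auto_order", False),
    action=lambda b: print("ordering stock"),
    rule_id="reorder")]
fire_rules(rules, "stock_low", {"auto_order": True})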
3 The Implementation of E-Agent
To support multi-agent system design and implementation in a safe, easy, and productive way, a series of tools were developed based on the formalization of E-Agent: a graphical agent design tool, an agent running environment, and an agent name server. These tools help users design and deploy agents easily and quickly. The implementations of E-Agent and E-Agency run on Windows.
4 Conclusion
In this paper, the formalizations of E-Agent and E-Agency are given, based on AgentSpeak(L), ECA rules, and several extensions. We then discuss the interpreter
loop of E-Agent and E-Agency, and present the implementations of E-Agent and E-Agency on Windows. E-Agent combines the BDI agent approach with concepts from ECA rules and allows the design of complex agent plans. To support multi-agent system design and implementation in a safe, easy, and productive way, a series of tools were developed based on the formalization of E-Agent. Future work will concentrate on the development of coordination mechanisms for multi-agent systems and on applications in the area of process simulation.
References
1. Oliveira, E., Fischer, K., Stepankova, O.: Multi-agent systems: which research for which applications. Robotics and Autonomous Systems, Vol. 27, Issue 1-2 (1999) 91-106
2. Rao, A.S., Georgeff, M.P.: Modeling rational agents within a BDI architecture. In: Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning. Morgan Kaufmann, San Mateo, CA (1991) 473-484
3. Rao, A.S.: AgentSpeak(L): BDI agents speak out in a logical computable language. In: Van de Velde, W., Perram, J. (eds.): Agents Breaking Away. Springer-Verlag, Eindhoven (1996) 42-55
4. Machado, R., Bordini, R.H.: Running AgentSpeak(L) Agents on SIM AGENT. In: Intelligent Agents VIII. Springer-Verlag, Berlin Heidelberg (2002) 158-174
A Hybrid Algorithm of n-OPT and GA to Solve Dynamic TSP
Zhao Liu¹ and Lishan Kang²
¹ Department of Information Engineering, China University of Geosciences, 430074 Wuhan, Hubei, China
[email protected]
² Department of Computer Science and Technology, China University of Geosciences, 430074 Wuhan, Hubei, China
[email protected]
Abstract. We propose the concept of the dynamic traveling salesman problem (Dynamic TSP). According to the characteristics of the Dynamic TSP, we use a hybrid algorithm of n-OPT and GA to solve it: 2-OPT and 3-OPT are used in the GA procedures of mutation and selection. The productivity and quality of solutions under dynamic conditions are evaluated experimentally.
1 Introduction
Given a graph G, the TSP is to find its shortest Hamiltonian circuit [1]. The Dynamic TSP is to find the shortest Hamiltonian circuit of a special graph G in which the weights of the edges and the number of vertices can change over time. The Dynamic TSP has wide applications in domains such as communication, robot control, vehicle route selection, and mobile computing [2]. According to the characteristics of the Dynamic TSP, we use a hybrid algorithm of 2-OPT, 3-OPT, and a genetic algorithm (GA) to solve it.
2 Characteristics of Dynamic TSP
The Dynamic TSP connects closely with the TSP, but at the same time it has characteristics of its own:
Actuality: the Dynamic TSP instance changes over time.
Continuity: the problem changes partially and gradually over time.
Robustness: unexpected situations, such as a vertex being deleted or inserted, must meet with a quick response.
Effectiveness: the Dynamic TSP requires obtaining a near-optimal tour in reasonable time.
3 Basic Algorithm
The n-OPT algorithm is based on the concept of n-optimality: a tour is said to be n-OPT if it is impossible to obtain a shorter tour by replacing any n of its links by any other set of n links [3]. In general, the larger the value of n, the more likely it is that the final tour is optimal; at the same time, the number of operations increases rapidly. As a result, the values n = 2 and n = 3 are the most commonly used. GA is an optimization algorithm modeled after the evolution of organisms. It typically includes reproduction, mutation, competition, and selection procedures. GAs are powerful tools for solving TSP instances.
Fig. 1. The pictures show an experimental run (TSPLIB instance ch130). Picture 1 shows the build-up of the initial state. In picture 2, we insert two dynamic vertices into the instance and assign their velocities. The other pictures show solutions in dynamic situations
4 A Hybrid Algorithm to Solve Dynamic TSP
The Dynamic TSP requires not only high-quality solutions but also high efficiency and rapid response. 2-OPT and 3-OPT have high efficiency and rapid response in solving the Dynamic TSP, but they output inferior solutions and are likely to degenerate, meaning that the gap between the solution and the optimum grows over time. GA can output high-quality solutions, but it has inferior efficiency and responds slowly to changes in the problem. To solve the Dynamic TSP, our algorithm therefore uses 2-OPT and 3-OPT within the GA procedures of mutation and selection.
Algorithm description
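As an illustration of the description above, the following Python sketch implements a GA whose mutation step applies a 2-opt move; the function names, population scheme, and parameters are illustrative assumptions, not the authors' exact algorithm.

import random

def tour_length(tour, dist):
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))

def two_opt_mutation(tour, dist):
    """Replace two links by two others (a 2-opt move) if it shortens the tour."""
    n = len(tour)
    i, j = sorted(random.sample(range(n), 2))
    candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
    return candidate if tour_length(candidate, dist) < tour_length(tour, dist) else tour

def hybrid_ga(dist, pop_size=20, generations=200):
    n = len(dist)
    pop = [random.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        # mutation: local improvement by a 2-opt move on each individual
        pop = [two_opt_mutation(t, dist) for t in pop]
        # selection: keep the shorter half, refill with copies of the survivors
        pop.sort(key=lambda t: tour_length(t, dist))
        pop = pop[:pop_size // 2] + [t[:] for t in pop[:pop_size // 2]]
    return pop[0]

# Usage on a tiny symmetric instance (rows of a distance matrix).
dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 8], [10, 4, 8, 0]]
print(hybrid_ga(dist))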
5 Experimental Results and Conclusion
We tested our algorithm on several instances from TSPLIB (CPU: Celeron 366 MHz; RAM: 256 MB). The productivity and quality of the solutions were evaluated experimentally. The experiments showed that the hybrid algorithm can solve the Dynamic TSP effectively.
References
1. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-Completeness (1979)
2. Burkard, R.E., Deineko, V.G., van Dal, R.: Well-Solvable Special Cases of the Traveling Salesman Problem: A Survey. SIAM Review (1998) 496-546
3. Helsgaun, K.: An Effective Implementation of the Lin-Kernighan Traveling Salesman Heuristic. Datalogiske Skrifter (1998)
4. Paletta, G.: The Period Traveling Salesman Problem: A New Heuristic Algorithm. Computers & Operations Research, Vol. 29 (2002) 1343-1352
5. Laporte, G., Semet, F.: Computational Evaluation of a Transformation Procedure for the Symmetric Generalized Traveling Salesman Problem (1999)
6. Renaud, J., Boctor, F.F., Laporte, G.: A Fast Composite Heuristic for the Symmetric Traveling Salesman Problem. INFORMS Journal on Computing (1996) 134-143
The Application Research of Role-Based Access Control Model in Workflow Management System
Baoyi Wang¹, Shaomin Zhang¹,², and Xiaodong Xia¹
¹ School of Computer, North China Electric Power University, Baoding 071003, China
² School of Computer Science and Technology, Xidian University, Xi’an 710071, China
[email protected]
Abstract. Access control between multiple users and multiple objects is the key technique in the security management of distributed workflow systems, and role-based authorization and access control is an effective way to solve the problem. Both RBAC96 and NRBAC are good role-based access control models. After introducing authorization constraints, a new role-based access control model, ARBAC, is presented to mend the above models' shortcomings. An example is then given to explain the execution of the role assignment algorithm. Finally, the ARBAC model is applied to a workflow system architecture, and the new architecture is also explained in this paper. In practice, the ARBAC model with its authorization mechanism is flexible, and it also reduces the task complexity of the security administrator.
1 Introduction
Access control between multiple users and multiple objects is always the focus of security management in a distributed workflow system. The mechanism of role-based access control can solve this problem effectively. A concept of role is introduced between users and authorizations, and the access authorization for an object is granted to a certain role. Because of the role, users are logically separated from authorizations: users obtain various authorizations when they are given the corresponding roles. Since this greatly simplifies authorization and security management in distributed systems, it has received wide attention and study. However, there are many limitations in current role-based constrained access control models. The RBAC96 model family [1] has four different models: RBAC0, RBAC1, RBAC2, and RBAC3. Role hierarchies and role constraints are added to RBAC3 on the basis of the primary model. It enlarges the original RBAC models and covers almost all aspects of role-based authorization management. But it does not give a detailed formal definition and systematic description of its constraint model (RBAC2), and the operations of security administrators are very complex. The NRBAC model [2] is a newer model, but it only integrates parts of the RBAC96 model and cannot adequately solve the management of authorizations and roles themselves. This model does not take the characteristics of WFMSs into account, either. So this paper gives a new role-based access control model that supports workflow and has been applied in a real workflow system.
2 A Role-Based Access Control Model Which Supports Workflow
The RBAC96 model [1] lacks constraint management and cannot satisfy the demands of a workflow system. In order to apply a role-based access control model to a workflow system, RBAC96 has been extended with authorization constraints. This paper presents the resulting role-based access control model, ARBAC. Authorization constraints comprise role constraints, task constraints, and authorization constraints; combining all of these constraints with role-based access control, the system can be accessed more securely. The ARBAC model is shown in Fig. 1. The role-based access control model supporting workflow = {US, RS, UA, RSH, TA, PA, TS, AS, OPS, OS}. In detail: US (User Set), where each element is an actual user; RS (Role Set), where a role's attributes include authorizations and the constraints between them; OS (Object Set), all objects that can be accessed in a workflow system; OPS (Operation Set), the set of operations that can access objects; TS (Task Set), the finite set of tasks; AS = OS × OPS, where a role holding a priority means the role can apply the operation to the object; TA (Task Assignment), a map from RS to TS; AA (Authorization Assignment), which represents granting a role an authorization; RSH (Role Set Hierarchy), a binary relation over the role set representing the hierarchical relation among the roles in an organization. The role set is partially ordered, and the role hierarchy is denoted RSH.
Fig. 1. The role-based access control model which supports workflow (ARBAC)
3 Authorization Constraints in a Workflow System
A workflow process W is a finite set of tasks. In a workflow system, there are many constraints among tasks, roles, and authorizations:
(1) Authorization constraints. CCTS [2] and CCTM [2] are two different types of authorization constraints: CCTS = {max number of assignments, states}, CCTM = {mutual exclusion, containment, dependence, inheritance}. The mutual exclusion relation of CCTM means that if two authorizations are mutually exclusive, no role can hold both priorities at the same time. This is the rule of separation of duties.
(2) Task constraints. If two tasks are conflicting, no role can execute one of them while it is executing the other.
1036
B. Wang, S. Zhang, and X. Xia
(3) Role constraints. Similar to task constraints, role constraints contain RCTM and RCTS. The mutual exclusion in RCTS means that if two tasks are conflicting, the roles that execute them are conflicting.
4 The Algorithms and Their Application Examples
Security checking, authorization assignment, and role assignment are the key algorithms in the role-based access control model. We designed these algorithms; they are similar to the algorithms of Ref. [3]. Because of the length limit of this paper, these algorithms will not all be discussed here; we only give the role assignment algorithm and its application as an example to introduce the algorithmic idea.
Algorithm name: role assignment algorithm.
Input: (1) the workflow W, where n is the number of tasks in W; (2) CB(W), the set of all constraints among authorizations, tasks, and roles.
Output: return false if no role assignment satisfies the constraints; otherwise return RAG(W).
Suppose that in a workflow system there are four tasks: the first can be executed by a market clerk (MC), the second by a market manager (MM) and the general manager (GM), the third by MC and GM, and the fourth only by MC. The constraints in the workflow are as follows: (C1) at least three roles exist in the system: MC, MM, and GM; (C2) the role who executes one of the tasks must have a higher priority than those who execute two of the others; (C3) if a user belongs to role MC and has executed one given task, this user cannot execute a second, specified task; (C4) if a user has executed one task, he or she cannot execute a designated other task; (C5) each activity of one of the tasks must be executed by a different user; (C6) if a user has executed one specified task, then he or she cannot execute another specified task. Together these constraints form CB(W).
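As a rough illustration of the role assignment algorithm, the following sketch enumerates candidate role assignments for the four tasks and returns the first one satisfying all constraint predicates; the task names, the candidate table, and the single encoded constraint are hypothetical stand-ins for CB(W).

from itertools import product

# Candidate roles per task (hypothetical, mirroring the MC/MM/GM example).
candidates = {
    "T1": ["MC"],
    "T2": ["MM", "GM"],
    "T3": ["MC", "GM"],
    "T4": ["MC"],
}

# Each constraint is a predicate over a complete assignment {task: role}.
constraints = [
    # separation of duties: the same role may not hold both T2 and T3
    lambda a: a["T2"] != a["T3"],
]

def role_assignment(candidates, constraints):
    """Return an assignment satisfying all constraints, or None (i.e. 'false')."""
    tasks = list(candidates)
    for combo in product(*(candidates[t] for t in tasks)):
        assignment = dict(zip(tasks, combo))
        if all(c(assignment) for c in constraints):
            return assignment
    return None

print(role_assignment(candidates, constraints))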
Fig. 2. A workflow system architecture
5 A Workflow System Based on the Role-Based Access Control Model
Here is the improved workflow system; its architecture is shown in Fig. 2. From Fig. 2 we can see that the new workflow system is enriched with a workflow authorization module, a constraints analysis and enforcement module, assignment constraints, and the RSH. The workflow authorization module and the constraints analysis and enforcement module are added to the workflow server. Before a workflow task is activated, the relevant constraints and the role set (RS) that can execute the task are sent to the workflow authorization module. Based on constraint analysis and history data, the authorization module decides the role assignment. After the task has been assigned to a user, the user and the task are registered in the workflow task list, and the workflow engine then sends the task to this user, who performs it. The workflow system using this model thus not only has a flexible authorization mechanism but also reduces the operational complexity for the security administrator.
6 Conclusion
The model proposed in this paper can be applied in real systems. After this model was used in a distributed workflow system, the performance of access security improved greatly and proved easy to put into practice, although the execution efficiency of the system is affected. Since a valid load-balancing policy can improve system performance, our future work is to research load-balancing policies and apply them to the workflow system.
References
1. Sandhu, R., Coyne, E., Feinstein, H., et al.: Role-Based Access Control Models. IEEE Computer, Vol. 29(2) (1996) 38-47
2. Qiao, Y., Xu, D., Dai, G.: A New Role-Based Access Control Model and Its Implementation Mechanism. Journal of Computer Research & Development, Vol. 37(1) (2000) 37-44
3. Bertino, E., Ferrari, E., Atluri, V.: The Specification and Enforcement of Authorization Constraints in Workflow Management Systems. ACM Transactions on Information and System Security, Vol. 2(1) (1999) 65-104
Research and Design of Remote Education System Based on CSCW
Chunzhi Wang, Miao Shao, Jing Xia, and Huachao Chen
Department of Computer, Hubei Polytechnic University, 430068 Wuhan, China
Abstract. Guided by cooperation theory, this paper puts forward the design of an interactive and cooperative learning environment based on the self-learning mode and the cooperative learning mode of CSCW (Computer Supported Cooperative Work). It includes research on group working methods and the technologies supporting them, as well as the development of an application system to improve the means of information communication and to remove the time and space restrictions of traditional education. Keywords: CSCW (Computer Supported Cooperative Work), Remote Education, Cooperative Learning.
1 Introduction
A new research field, CSCW (Computer Supported Cooperative Work), has emerged in this information era. As an efficient means of lifelong learning, remote education and networked learning are applications of the CSCW idea and technology in the education field. A remote education system based on CSCW not only integrates tightly with the Web as a kind of knowledge-acquisition tool, but also provides a cooperative working environment.
2 The Design Idea of the Remote Education System Based on CSCW
The role of the designer is to develop and construct a meaningful knowledge environment in which the learner can enjoy studying. The required characteristics therefore include situational context, the construction of knowledge, cooperation, and conversation. The remote education systems of the present day only afford learning methods that focus on individual learning. They cannot carry forward the advantages of network technologies; even worse, they end up as mere substitutes for face-to-face (F2F) education. However, remote education will have a vital status in the network era. If we combine cooperation technologies with it and design a networked learning environment, the poor state of remote education can be changed thoroughly.
3 The Design of the Remote Education System Based on CSCW
3.1 Principles of Design
We stand by the following principles when designing and realizing the system: (1) ensure the system is advanced and practical: it serves students first, led by teachers; (2) openness: external systems are taken into account throughout realization and application; (3) integration of the operating system with tools, and of multimedia with multiple modes; (4) expandability and adaptability: adapt to the development and changes of the network and of learning assignments; (5) reliability and security: ensure problems are detected and settled in time as far as possible.
3.2 Design Steps
a. Define the field of study. We should determine how much knowledge the learning environment covers and distinguish the different kinds of knowledge.
b. Analyze the learners' characteristics. Set up a student module according to the learners' study habits and present knowledge structure.
c. Students and teachers are the main bodies of this system and form its core. We provide study information and knowledge-acquisition tools so that they can achieve their goals by themselves.
d. System development. After a trial period, the system goes through further design and modification, consummating the whole system.
3.3 Module Partition
The intention of cooperative learning is to promote learners' independent thinking, and their mutual influence, competition, and help, through collective activities in a group with the greatest possible cooperation. Its characteristics are as follows:
a. Cooperative learning is a kind of interactive teaching activity focusing on group activities.
b. Cooperative learning emphasizes collaboration between partners.
c. Cooperative learning is a kind of goal-oriented activity.
d. Cooperative learning gives encouragement according to the total achievement of the learning fellowship.
e. In cooperative learning, teachers distribute and control the learning assignments and the teaching course.
The functional modules of the remote education system based on CSCW are divided into the teacher module, the teaching module, and the technology module. The interactive part of the student module's study activities includes the electronic class and the discussion section; feedback on study results is actualized through the mail system or the electronic whiteboard; and the exercises and feedback on study information are
accomplished through the news bulletin. Other sub-modules are the courseware, system management, and online testing sub-modules. Users enter the different sub-systems via the user identity configuration in the main interface (see Fig. 1). According to the different user roles, the system affords different views. Only legal users can browse the courseware in the courseware base; users may browse online or download materials according to their own choices. The electronic class is a real-time, interactive multimedia teaching system, and it offers video/audio on-demand programs on various subjects. The learning progress and results can be fed back to the teaching studio. The discussion section is a place where learners and teachers discuss problems: users hold real-time discussions and post their questions on the corresponding BBS board, so that others can either answer the questions on the same board or send the answers to their mailboxes.
Fig. 1. The remote education system based on the computer cooperative mode
The news bulletin offers teachers and teaching managers a place to release messages, furnishing a friendly interface for users. The courseware sub-system supplies simple and convenient tools for making courseware and for uploading the finished courseware onto the server to be browsed. The online test sub-system bears many functions, including holding tests online, creating test papers automatically, electronic timing during tests, and checking and marking papers, with the results put into the user's individual database when the examinee finishes the test and presses the submission button.
The teaching management module realizes the administration functions for the management department: student enrollment and alteration records of enrollment; teaching plan management; arrangement, selection, and alteration of courses; planning of tests; management of classrooms; assessment of teachers; grades for students; evaluation of individuals; handling and disposition of teaching feedback; booking, storage, funding, and building management of teaching materials; and timely collection and release of relevant teaching information. Students may inquire about marks through the network, hand in course-selection applications, look up information about teachers, and put questions to or answer questions from teachers or other students. Meanwhile, on the Web, teachers may hand out evaluations of students, inquire about the name lists of their classes, and raise or answer questions in the same way. Likewise, the teaching management department may obtain the situation of teachers and students and feedback information on the Web.
4 Conclusion
The remote education system based on CSCW is a learning environment built with computer-supported cooperation technologies, providing learners with various high-quality learning methods. Furthermore, cooperative learning based on the network is an effective way to change the present individual learning methods and obtain better study results.
Data and Interaction Oriented Workflow Execution*
Wan-Chun Dou, Juan Sun, Da-Gang Yang, and Shi-Jie Cai
State Key Laboratory for Novel Software Technology, Dept. of Computer Science and Technology, Nanjing University, Nanjing 210093, China
[email protected]
* This research is supported by the National Natural Science Foundation of China (No. 60303025)
Abstract. To better supervise workflow performance, the data elements engaged in workflow execution are classified into application data and process data. Taking advantage of the HyperSet and Nested HyperSet concepts, the rationale of domain-specified control and interaction is discussed, and a fashion directing hierarchical interaction is explored based on the domain-specified disciplines. Conclusions are presented at the end.
1 Introduction
In the competitive business arena, enterprises must continually strive to create new and better products faster and more efficiently than their competitors to gain and keep a competitive advantage. To bring this objective into reality, workflow systems are quickly becoming a technology of choice for enterprises to carry out their ambitions efficiently, under a closely supervised fashion [1]. By extracting the information perspective and the behavioral perspective from the data-flow and the control-flow respectively, workflow performance supervision is explored in this paper. The remainder of this paper is constructed as follows. In Section 2, workflow data elements and their execution-oriented logic are discussed. In Section 3, the rationale of performance disciplines is presented by taking advantage of HyperSets and Nested HyperSets; furthermore, a fashion is explored to direct hierarchical interaction during workflow execution. Finally, conclusions are presented in Section 4.
2 Data Integration in Workflow Execution
Data-flow is execution-related, and the uncertainty caused by dynamic requirements during execution makes it difficult to prefigure all the details at the modeling stage. Accordingly, data-flow is independent of static process definitions to some degree. S. Wu et al. [2] classified the data used in workflow systems into three kinds: control data, workflow-relevant data, and application data. The control data are determined in advance, while the other two kinds are performance-related and cannot be specified in detail at the modeling stage. C. Ellis and K. Keddara [3] merged the control data and the
workflow-relevant data into process data; they believed that process data are the local variables of a process or the data passed between processes.
Fig. 1. Execution-oriented data logic engaged in workflow execution
According to the concepts of process data and application data, the execution-oriented data logic engaged in workflow execution is illustrated in Fig. 1.
3 Interaction-Oriented Workflow Execution
Here, workflow execution is formalized as Wf_Model = <Wf_Unit, R>, where R is a relation over workflow units. A Wf_Unit denotes a workflow unit corresponding to an execution block or an individual activity; R comprises the associated relations between workflow units, which are often initiated as Routes&Rules, such as invocation disciplines or execution sequences, in modeling, and workflow execution is often supervised under those Routes&Rules. Generally, data-flow emphasizes data consistency and control-flow accentuates the independence of domain-specified applications, both of which are execution-related [4]. To better explore the rationale of domain-specified interaction from the behavioral perspective, two concepts, HyperSet and Nested HyperSet, are introduced [5].
Definition 1 (HyperSet). A HyperSet S is a set whose elements are simple elements or hyper-elements, where hyper-elements are themselves HyperSets. Notation: a HyperSet S is a flat set iff S has no hyper-element, that is, every element of S is a simple element. An element e, which may be a simple element or a hyper-element, is a sub-element of a HyperSet S iff e is an element of S or e is a sub-element of some hyper-element of S. The set of base elements of a HyperSet S, denoted base(S), is the flat set that contains all the simple sub-elements of S.
Definition 2 (Nested HyperSet). A HyperSet S is nested if any two of its sub-elements are either disjoint or one is a sub-element of the other.
By taking advantage of these two concepts, a supervision fashion will be presented for directing interaction during workflow execution based on Petri nets [6].
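As an illustration of Definition 1, the following Python sketch models hyper-elements as nested tuples and computes base(S); this representation is an assumption made for illustration only.

def base(s):
    """Flatten a HyperSet (modeled as a nested tuple) into its simple sub-elements."""
    out = set()
    for e in s:
        if isinstance(e, tuple):      # hyper-element: recurse into it
            out |= base(e)
        else:                         # simple element
            out.add(e)
    return out

# S = {a, {b, {c}}, d} modeled with tuples; base(S) = {a, b, c, d}.
S = ("a", ("b", ("c",)), "d")
print(base(S))  # {'a', 'b', 'c', 'd'}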
Definition 3. Let N = (P, T; F) be a Petri net, and let the transition set T be partitioned into subsets T1 and T2. Each Ti, together with the places and arcs connected to its transitions, induces a sub-net Ni of N. We call this the decomposition of N based on transitions. A place shared by two sub-nets is called a coupled place of those sub-nets. Fig. 2.a is a simplified model simulating a local workflow. According to Def. 3, it can be decomposed into Fig. 2.b, where the set of coupled places joins the two modules N1 and N2. Note that each individual module N1 or N2 can also be decomposed further to describe lower-level situations, which makes up a layered structure.
Fig. 2.a. Before decomposition
Fig. 2.b. After decomposition
Fig. 2. Domain-Specified Workflow Model Decomposition
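Below is a minimal sketch of the transition-based decomposition of Definition 3, assuming a Petri net is given as a place set and a flow relation of arcs; all names are illustrative.

def decompose(places, flow, t_split):
    """Split a net by a partition of its transitions.

    flow is a set of (source, target) arcs; t_split is a list of transition
    subsets. Each sub-net keeps the places adjacent to its transitions; a
    place adjacent to more than one sub-net is a coupled place.
    """
    subnets = []
    for ts in t_split:
        ps = {p for (a, b) in flow for p in (a, b)
              if p in places and (a in ts or b in ts)}
        subnets.append((ps, set(ts)))
    coupled = set.intersection(*(ps for ps, _ in subnets))
    return subnets, coupled

# Net: p1 -> t1 -> p2 -> t2 -> p3, decomposed by transitions {t1} and {t2}.
places = {"p1", "p2", "p3"}
flow = {("p1", "t1"), ("t1", "p2"), ("p2", "t2"), ("t2", "p3")}
subnets, coupled = decompose(places, flow, [{"t1"}, {"t2"}])
print(coupled)  # {'p2'}: the coupled place between N1 and N2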
Here, N1 and N2 can be treated as two Nested HyperSets derived from N. To protect the privacy of each Nested HyperSet, a three-level interaction framework is put forward, as shown in Fig. 3, in accordance with the decomposition result demonstrated in Fig. 2.b.
Fig. 3. Hierarchical Process Supervision Oriented toward Workflow Execution
Obviously, the information perspective and the behavioral perspective are integrated in Fig. 3. The double-arrowhead lines denote the control-flow and data-flow between the two modules. Low-level, execution-related data-flow can occur arbitrarily inside each isolated domain. Interaction between the domain-specified applications is realized through the coupled mid-places, which guarantees data sharing without violating data consistency or each local application's independence.
If a domain-specified application contains only one transition, it degrades into an atomic activity, and the interactive process becomes relatively simpler. Traditionally, workflow is deployed inside a given organization or enterprise. In recent years, a new trend of inter-organizational workflow systems has been promoted in the context of electronic commerce, spanning different organizational units to achieve competitive advantages. Combined with web technologies such as CORBA [7], the fashion presented in this section offers a reference method for complex workflow systems by assigning link agents to replace the mid-places. Additionally, compared with the approaches mentioned in [5], the hierarchical domain-specified control method presented in this paper indicates the organizational scheme qualitatively and leaves the quantitative minutiae to performance time. The control process proceeds through different interfaces in a hierarchical way, which reduces control complexity efficiently. The domain-specified control method also simplifies exception handling in the data-flow by limiting the exception sphere to the isolated module.
4 Conclusions
The principle presented in this paper discovers the essence of domain-specified applications. Together with the interactive discipline, it provides a way of isolating the interaction sphere. Next-generation workflow systems, such as intelligent workflow systems or Web-based workflow systems, could also benefit from this method oriented toward logic integration, although additional challenges are to be expected.
References
1. Kim, K., et al.: Performance Analytic Models and Analyses for Workflow Architectures. Information Systems Frontiers, Vol. 3, 3 (2001) 339-355
2. Wu, S., Sheth, A., et al.: Authorization and Access Control of Application Data in Workflow Systems. Journal of Intelligent Information Systems, Vol. 18, 1 (2002) 71-94
3. Ellis, C., Keddara, K.: ML-DEWS: Modeling Language to Support Dynamic Evolution Within Workflow Systems. Computer Supported Cooperative Work (CSCW), Vol. 9, 3/4 (2000) 293-334
4. Wirtz, G., Weske, M., Giese, H.: The OCoN Approach to Workflow Modeling in Object-Oriented Systems. Information Systems Frontiers, Vol. 3, 3 (2001) 357-376
5. Arpinar, B., et al.: Formalization of Workflows and Correctness Issues in the Presence of Concurrency. Distributed and Parallel Databases, Vol. 7, 2 (1999) 199-248
6. Salimifard, K., Wright, M.: Petri net-based Modelling of Workflow Systems: An Overview. European Journal of Operational Research, Vol. 134, 3 (2001) 664-676
7. Leong, H., Ho, K., Lam, W.: Web-based Workflow Framework with CORBA. Concurrent Engineering: Research and Applications, Vol. 9, 2 (2001) 120-130
XCS System: A New Architecture for Web-Based Applications*
Yijian Wu and Wenyun Zhao
Software Engineering Laboratory, Fudan University, Shanghai, 200433, China
[email protected]
Abstract. This paper puts forward the model of the Extended Client/Server (XCS) system, which is based on the traditional Client/Server architecture and borrows from the Browser/Server system such merits as integrated code maintenance at the server end and installation-free operation at the browser end. In an XCS system, the Client is dynamically configured and automatically updated, and the Server is able to perform self-health-checks and upgrades without shutting service down. The XCS system is designed as a secure application for authorized users, considers security extensions, and can be extended to a distributed application system.
1 Introduction and Assumptions
The Extended C/S System (XCS) inherits as many of the advantages of C/S and B/S technology as possible. Our goal is to build a system in which end-users need not care much about installation while the server is powerful and flexible to the maximum extent. We assume: First, the logic of the application is so complex that it is not practical to migrate all client logic and functions to the server end. Second, all of our end-users must be authenticated before they can access the server; in other words, the system is secured and only accessible to authorized users, not to anonymous requests. Third, all data transfer must be secured: data should be classified, and the sender of data should be responsible for what has been sent.
2 XCS Architecture Overview
The main architecture of XCS, depicted in Figure 1, can be divided into three layers: Client, Server, and Resource. Basically, there are two servers in the Server layer, a Web Server and an Application Server, and both are protected by a Firewall, which relays requests and responses between the Server layer and the Client layer.
* Supported by the National High Technology Research and Development Program of China (863 Program) under Grant Nos. 2002AA114010 and 2001AA113070. Also supported by the Shanghai Technology Development Foundation (No. 025115014).
Fig. 1. Overview of XCS Architecture
Note that there are logic modules in the Web Server that provide information services to clients, such as user authentication, client version checking, and customized configuration. These modules consult the Resource tier for the data they need.
Fig. 2. Logic modules in Web Server
Fig. 3. Client Program
The logic modules are encapsulated and predefined in various systems. To deal with the variations, a Resource Adapter is used to fetch data from the Resource tier, as shown in Figure 2. This can be implemented with an Adapter pattern and a Strategy pattern [GaH94]. The Application Server is an independent application that runs all business-related logic modules. The Resource layer is an organized collection of all kinds of data within the whole system, such as metadata and the up-to-date client program, ready to be sent to end-users when necessary. We divide the Client program into an Executable part and a Configuration part, as shown in Figure 3. The client subsystem maintains information about the executable version.
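One possible shape of the Resource Adapter, combining the Adapter and Strategy patterns as suggested above: logic modules program against a single interface while interchangeable strategies fetch from different backends. All class names here are illustrative assumptions.

from abc import ABC, abstractmethod

class ResourceStrategy(ABC):
    """Strategy: one concrete class per kind of resource backend."""
    @abstractmethod
    def fetch(self, key: str) -> str: ...

class DatabaseStrategy(ResourceStrategy):
    def fetch(self, key: str) -> str:
        return f"db-value-for-{key}"       # stand-in for a real query

class FileStrategy(ResourceStrategy):
    def fetch(self, key: str) -> str:
        return f"file-value-for-{key}"     # stand-in for a file read

class ResourceAdapter:
    """Adapter: the single interface the logic modules consult."""
    def __init__(self, strategy: ResourceStrategy):
        self.strategy = strategy

    def get(self, key: str) -> str:
        return self.strategy.fetch(key)

# The same logic module works against either backend.
print(ResourceAdapter(DatabaseStrategy()).get("user-auth"))
print(ResourceAdapter(FileStrategy()).get("client-version"))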
Configurations can be stored in the Resource layer of the Server subsystem and/or transferred to end-users together with the Executable part. In addition, a Metadata Editor (not presented in Fig. 1) is used to define data, such as the names of properties, needed to describe the configuration data for a specific application.
3 Logic and Detailed Structure
The primary business logic is handled between the Client Application and the Application Server, after the Web Server and the user's browser have initiated communication. The Client is updated automatically, and the Application Server can be upgraded without shutting service down.
3.1 Logic and Workflow
Before the business-specific communication protocol is activated, common communication and authentication procedures take place. A first-time user has to visit the website and pass authentication; the User Authentication Module consults the Resource layer to verify the username and password. The version check procedure checks the user's local information to decide whether the latest Client Application is correctly installed. If so, the system is ready to fetch the user configuration data; otherwise, an error page is returned. User configuration data is a kind of Customized Configuration Data (CCData) in the Resource layer. CCData contain the optional settings of specific users, including GUI style and business-specific features, which can vary greatly; a metadata configuring subsystem is used to maintain this variation efficiently. Users commit configurations to the Resource layer through a web page before using the Client Application. A list of CCData is stored in the Resource layer and sent to end-users as needed.
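The login, version-check, and configuration flow just described might be organized as in the following sketch; every back-end function here is a stub standing in for the Resource layer.

def client_startup(user, password, local_version):
    """Authenticate, verify the client version, then fetch CCData (all stubbed)."""
    if not authenticate(user, password):        # User Authentication Module
        return "login page with error"
    latest = latest_client_version()            # Version Checking Module
    if local_version != latest:
        return f"error page: please update client to {latest}"
    ccdata = fetch_ccdata(user)                 # Customized Configuration Data
    return f"ready: GUI style = {ccdata['gui_style']}"

# Placeholder back-end calls (a real system would consult the Resource layer).
def authenticate(user, password): return password == "secret"
def latest_client_version(): return "2.1"
def fetch_ccdata(user): return {"gui_style": "classic"}

print(client_startup("alice", "secret", "2.1"))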
3.2 Detailed Structure: Subsystems
The whole system can be roughly divided into the following functioning subsystems: the B/S Login Subsystem, which consists of the browser (end-user), the Web Server with login-related web pages, the User Authentication logic module, and the user authentication data; the Client Application Management Subsystem, which consists of the Version Checking module, the Release Control module, the Client Application stored in the Resource layer, and the CCData Modification module; the Metadata Management Subsystem, which consists of a user customized data definition tool and the Client Application release tools; and the Business Logic Subsystem, consisting of the Client Application, the Application Server, and the application data. The subsystems are designed to meet the following requirements:
Secured login and communication
Client Application version control
Integrated configuration control
An easy way of upgrading the client program
The mechanism that keeps the client program up-to-date and the application server working 24x7 is as follows. The Application Server with the Release Server structure is depicted in Figure 4. The server is currently deployed on computer SVR. The Release Server REL does the following to deploy a new version of the Server and Client:
S1. Fetch (a) the version, (b) the listening port, and (c) the server IP of the current server SVR from the Resource layer. If that version is newer than the candidate program's, terminate with an exception; otherwise mark the current server out-of-date in the Resource layer.
S2. Select a server DEST on which the new server shall be deployed (SVR is the default).
S3. Send an Agent to DEST to negotiate an available port P on DEST, and wait.
S4. The Agent returns to REL, carrying the result (success with P, or failure with a short reason). If it did not succeed, the release process terminates with an exception.
S5. Check the system status on DEST. If no service is running, put the new version on DEST and start it up (with port P); else create a new application working space on DEST, put the new version in it, and start it up (with port P).
S6. If DEST starts up normally, update the current system information in the Resource layer with the new information, including the version, the new IP, and the new port, and kill SVR.
An out-of-date service, marked in S1, cannot accept any new connections and will automatically shut itself down as soon as no active client is connected. SVR, DEST, and REL may be the same computer, but it is recommended that REL be independent from SVR and DEST and especially reliable. Other redundancy measures can be taken to increase the reliability of the whole system, such as a backup REL in case the primary one fails.
Fig. 4. Release Server Structure
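Steps S1 to S6 can be summarized in the following Python sketch; the registry dictionary and the agent and deployment calls are stubs standing in for the Resource layer, the Agent, and the actual server start-up.

def release(registry, candidate_version, dest):
    """Deploy a new server version without shutting service down (S1-S6, simplified)."""
    current = registry["current"]                       # S1: fetch version/port/IP
    if candidate_version <= current["version"]:
        raise RuntimeError("candidate is not newer")
    current["out_of_date"] = True                       # S1: mark the old server

    port = negotiate_port(dest)                         # S3/S4: agent finds a port
    if port is None:
        raise RuntimeError("no port available on destination")

    start_server(dest, port, candidate_version)         # S5: start the new version
    registry["current"] = {"version": candidate_version,
                           "ip": dest, "port": port}    # S6: publish the new info
    # The out-of-date server refuses new connections and exits when idle.

def negotiate_port(dest): return 9000                   # stub Agent
def start_server(dest, port, version): print(f"serving {version} on {dest}:{port}")

registry = {"current": {"version": "1.0", "ip": "SVR", "port": 8000}}
release(registry, "1.1", "DEST")
print(registry["current"])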
4 Discussion and Conclusion
The Resource layer can be either centralized or distributed. [MiK01] describes a mobile-agent-based solution for network measurement, which suggests a mobile-agent approach to improving the performance of a distributed system. Although we defined a metadata definition tool to achieve extensibility, we cannot guarantee all kinds of extensions. One solution is for the architecture to become domain-specific with only some specific logic modules, which proves more efficient and practical in engineering.
We have to take it for granted that everything behind the Firewall, where all the servers work, is trustworthy. To overcome this trust crisis, the application itself should be strong enough against security attacks [How02]. The XCS system integrates merits of both B/S and C/S applications. It automatically and intelligently delivers the new client application to out-of-date clients, thus keeping almost all clients up-to-date. The Metadata Management Subsystem offers a fail-safe mechanism to upgrade the server program without shutting service down. A secure XCS system puts emphasis not only on secure network transport, but on application security and robustness as well.
References
[GaH94] E. Gamma, R. Helm, R. Johnson, J. Vlissides: Design Patterns: Elements of Reusable Object-Oriented Software, Chinese Edition, CMP, 2000, p. 92, p. 208
[How02] M. Howard: Writing Secure Code, Simplified Chinese Edition, translated by Y.J. Cheng et al., CMP, 2002, p. 31
[MiK01] A. Michalas, T. Kotsilieris, et al.: Enhancing Performance of Mobile Agent based Network Management Applications, Sixth IEEE Symposium on Computers and Communications, 2001
A PKI-Based Scalable Security Infrastructure for Scalable Grid
Lican Huang and Zhaohui Wu
College of Computer Science and Technology, Zhejiang University, Hangzhou, PRC
{lchuang,wzh}@cs.zju.edu.cn
Abstract. Scalable security is a vitally important issue for a scalable Grid. Several issues must be solved for scalable Grid security, such as mapping from global subjects to local subjects, the centralized certificate authority center, large numbers of users, and many heterogeneous security policies. In this paper, we present a scalable Grid security infrastructure (SGSI) to solve the above problems. We describe the models and related protocols for scalable Grid authentication and authorization.
1 Introduction
Security is a very important issue for a large-scale wide-area system, and especially for a Grid [1]. Because the core issues of Grid security are very hard, they are far from solved, and when a Grid system becomes large-scale with many heterogeneous security policies, these issues become even harder. The security policies of Grid nodes may include role-based access control (RBAC), Bell-LaPadula, and so on; how to integrate different security policies is a big issue. When a Grid system is large, how to map huge numbers of global subjects to local subjects is another problem, and a centralized certificate authority center is not suitable for a scalable Grid system. When a Grid system has a huge number of users, access control for Grid services becomes a very hard issue. We have proposed a scalable Grid architecture, VDHA (virtual and dynamic hierarchical architecture) [2]. Here, we present a Scalable Grid Security Infrastructure (SGSI) to solve the above problems; it is suitable for scalable Grids, and especially for our scalable VDHA-based Grid prototype system, VDHA_Grid.
2 Scalable Grid Security Infrastructure
We mainly deal here with authentication and authorization. In our SGSI [2], there is no global-to-local mapping table, and we use Grid nodes as the CA centers for themselves and their owned users, managed totally locally. We also manage authorization, auditing, and so on autonomously and locally, and we adopt methods to avoid the problem of large numbers of user accounts.
2.1 Formal Definitions
Grid node (denoted p), entrance node (denoted ent), owner node (denoted ow), user (denoted user), and client host (denoted cli) are defined in the paper [2].
Definition 1. Grid service (denoted s): a service provided for consumers and owned by a Grid node.
Definition 2. Security policy (denoted sp): the security policy for a Grid service.
Definition 3. Grid user account (denoted gacc): a gacc means a user account for a user owned by its owner node.
Definition 4. Service lifetime management service (SLMS): it manages service instance lifetimes, checks authorization, and so on. A Grid node has only one SLMS, and each SLMS belongs to its Grid node.
Definition 5. Administrator (denoted Admin): it manages the security policies, the authorization base, and so on; an Admin belongs to a Grid node.
Definition 6. Accounting policy (denoted Accountp): the accounting policy for a Grid service.
Definition 7. Auditing policy (denoted Auditp): the auditing policy for a Grid service.
Definition 8. SGSI = {USER, CLI, P, S, GACC, SLMSSET, SP, ACCONTP, AUDITP, ADMIN, FP}, where USER is the set of users; CLI is the set of client hosts; P is the set of Grid nodes; S is the set of services; GACC is the set of Grid user accounts; SLMSSET is the set of service lifetime management services; SP is the set of security policies; ACCONTP is the set of accounting policies; AUDITP is the set of audit policies; ADMIN is the set of administrators; and FP is the set of functions, protocols, and core services related to Grid security. The elements of FP are described as follows:
Definition 9. LP, the Login Protocol, is defined over USER × CLI × P(ent) × P(own) and is used by users to log in to the Grid system. Here, P(ent) is the entrance node, P(own) is the owner node, and the result is a user with a certificate ticket.
Definition 10. SCDP is the Service Creation and Destroy Protocol, which creates and destroys service instances.
Definition 11. SCDSIP-I, the Service Creation and Destroy from Service Instance Protocol Type I, creates and destroys a service instance when the created service instance is on the same node as the requesting SLMS.
Definition 12. SCDSIP-II, the Service Creation and Destroy from Service Instance Protocol Type II, creates and destroys a service instance when the created service instance is on a different node from the requesting SLMS.
Definition 13. The Service Account Application Protocol (SAAP) is used by users to apply for the access authorization for a certain service.
Definition 14. ACP, the Access Control Protocol, controls the users' access rights to services.
Definition 15. MANAGE: ADMIN × SP × ACCONTP × AUDITP → SP × ACCONTP × AUDITP. MANAGE is the service used to manage the security policies, accounting policies, auditing services, and so on.
2.2 Login Protocol (LP)
The login protocol is based on a public key infrastructure. In VDHA_Grid, the owner node acts as the CA for its users and for itself. The owner node keeps its owned users' public keys, and also some information about the owned users, such as passwords, which is used to identify users in the ordinary way. The nodes' public keys are self-authenticated. We use a user credential to solve problems such as single sign-on. Meanwhile, because the client host's IP address is generally a LAN IP address rather than an Internet IP address, we use the entrance nodes as proxy stations to help clients connect to the Grid system. The details of the protocol are given in paper [2].
2.3 Authentication for Service Creation and Destroy Protocols
When a user requests a service, the SLMS checks the authorization and creates the service instance. In some cases, the created service (the requesting service) needs other services (the requested services) to cooperate. These requesting and requested services may be within the same node or within different nodes, so there are three cases. When a user requests a service, the Service Creation and Destroy Protocol (SCDP) is used to create the requested service instance. When a service instance wants another service to cooperate and the requested service is located on the same node as the requesting service, the Service Creation and Destroy from Service Instance Protocol Type I (SCDSIP-I) is used to create the service instance. When the requested service and the requesting service instance are within different nodes, the Service Creation and Destroy from Service Instance Protocol Type II (SCDSIP-II) is used to create the requested service. Here we describe SCDSIP-II, in which a service instance needs a service on another node:
Step 1: the requesting SLMS sends a request-creation-instance message to the SLMS of the node that owns the requested service;
Step 2: the two nodes authenticate with each other;
Step 3: if the user has paid for the requested service, the access right is checked: right = ACP(user, s).
2.4 Access Control Protocol (ACP)
One possible access control protocol is as follows:
Step 1: check whether there is a global user ID. If there is, get the access right and exit; else go to step 2.
Step 2: check whether there is a global group ID that the global user is within. If there is, get the access right and exit; else go to step 3.
Step 3: check whether there is a guest account. If there is, get the access right and exit; else go to step 4.
Step 4: refuse the access.
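The four-step fallback of the ACP maps naturally onto a short function, sketched below in Python with illustrative account tables.

def acp(user, global_users, global_groups, guest_right):
    """Access Control Protocol: global user ID, then group ID, then guest, else deny."""
    if user in global_users:                 # step 1: global user ID
        return global_users[user]
    for members, right in global_groups:     # step 2: global group ID
        if user in members:
            return right
    if guest_right is not None:              # step 3: guest account
        return guest_right
    return None                              # step 4: refuse the access

users = {"alice": "read-write"}
groups = [({"bob", "carol"}, "read-only")]
print(acp("carol", users, groups, guest_right="none"))  # -> 'read-only'
print(acp("mallory", users, groups, guest_right=None))  # -> None (refused)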
3 Conclusion
We have proposed a scalable Grid security infrastructure (SGSI) to address the issues of scalability and heterogeneous security policies. SGSI solves the problem of mapping a global entity name into a local entity name and gives account-management methods that cope with huge numbers of Grid users. Authorization and auditing are also managed locally by the service owner node. A user is managed locally by the owner node, but becomes a global user by logging in to the Grid system via an entrance node from anywhere and obtaining a certificate ticket. All of the above make the security scalable without sacrificing fulfillment of the Grid requirements.
References
1. Foster, I., Kesselman, C., Tsudik, G., Tuecke, S.: A Security Architecture for Computational Grids. In: Proceedings of the 5th ACM Conference on Computer and Communications Security. ftp://ftp.globus.org/pub/globus/papers/security.pdf
2. Huang, L., Wu, Z., Pan, Y.: Virtual and Dynamic Hierarchical Architecture for e-Science Grid. International Journal of High Performance Computing Applications, 2003, 17(3): 329-347
A Layered Grid User Expression Model in Grid User Management
Limin Liu, Zhiwei Xu, and Wei Li
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100080, China
{liulm, zxu, liwei}@ict.ac.cn
Abstract. As a main component of the grid user management infrastructure, grid user expression is very important for realizing a practical user management system. In this paper, we present a layered user expression model, the RUS model, and describe each layer's user set and expression method. We then give the model's application. Finally, we analyze the model's advantages for the convenience of system administration and of the users.
1 Introduction
Among the various research aspects of grid systems, there is as yet no grid user expression model from the GGF or the Globus [6] research groups. In order to realize effective management of users and to support authentication, authorization, and auditing of users, we present the RUS (Role-User-Session) user expression model, based on considerations of the construction of a grid system and its characteristics. This model effectively solves the grid subject expression and management problems.
2 The Grid User Expression RUS Model
The RUS model divides user expression into three layers: the role, user, and session layers.
2.1 Role Layer
During the running of a grid system, the grid users (subjects) can be divided into many roles: grid administrator, community or VO administrator, node administrator, service owner, user group administrator, and common users. The user expression of the role layer should provide a semantic expression method. The role layer is the highest among the user expression layers, and its expression is a semantic one. For example, a grid G is a set consisting of communities 1 to N:
G = {community_1, community_2, ..., community_N}
Then we can express the administrator of a community as "Community Admin". Suppose the set of all user entities in grid G is U; all users of grid G are composed of the users distributed over roles 1 to n:
U = U_role1 ∪ U_role2 ∪ ... ∪ U_rolen
where U_rolei is the user set of role i. As far as user entities are concerned (a user entity is a real user in the grid, identified by the globally exclusive ID of the session layer described below), the user sets of different roles have intersections. That is to say, the role sets form a cover of the user entities of grid G, not a partition of them. Users in different role sets are the same user iff they have the same global ID.
The users set constituted by is a partition of all user entities in the grid. At the same time, we should support a user entity to register in different communities. We can use the federation method to merge the information of the user entity among the different communities. There is a mapping relationship between role layer and user layer. The mapping is multiple to multiple, multiple users can be mapped to a role and a user can be mapped to multi-roles. The user expression method of role layer and use layer is user-oriented. The user name in user layer corresponds to the exterior name of a grid user.
2.3 Session Layer
The session layer name is used to identify a user uniquely when the user requests a service or carries out other session activities. Considering the user expression's relations to user management, accounting, and auditing, the user expression in the session layer must include the following information: the user's home community (communities can also be hierarchical), the user's home node, and the user's identity information. For instance, the DN name of a user's X.509 certificate is expressed as:
Subject: O=Grid, O=Globus, OU=linux.ict.ac.cn, CN=lk252
This name describes the hierarchical name of the home community or VO of the user "lk252". The user expression name in the session layer is unique. Similarly, there is a mapping relationship between the user layer and session layer user sets, and this mapping is one-to-one.
3 The Application and Advantages of the RUS Model
The three layers described above constitute the grid user space. At the service level there is also a local user space belonging to the service or resource. Limited by current native operating systems and runtime environments, when a grid user requests a service or resource, a local account must be mapped to the user at the service or resource level. This mapping can be one-to-one or many-to-one and is governed by the policy of the service or resource. To support various mapping policies, the user expression names of all three layers of the grid user space can be mapped to the local account at the service level simultaneously; that is, a lower layer does not strictly hide the layers above it. The mapping relationship is illustrated in Fig. 1.
Fig. 1. The mapping relationship between grid user and service local account
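A service can realize such a mapping as a policy table keyed by names from any of the three layers. The sketch below is a hypothetical illustration of a many-to-one policy; none of the account names come from the paper.

    # Grid-name -> local-account mapping at a service. Keys may come from
    # any RUS layer; several grid names may share one local account
    # (many-to-one), according to the service's policy.
    account_map = {
        "Community Admin": "admin",                                   # role layer
        "lk252": "grid007",                                           # user layer
        "O=Grid, O=Globus, OU=linux.ict.ac.cn, CN=lk252": "grid007",  # session layer
    }

    def local_account(grid_name: str, default: str = "nobody") -> str:
        return account_map.get(grid_name, default)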
The RUS model defines the user space at the different layers. To support a user entity's sessions, the data structure representing an active user entity must include the information of all three layers (a sketch is given at the end of this section). From the above discussion, the RUS model has the following advantages:
1) It supports multi-granularity user access control. The RUS model gives access control great flexibility: one can enforce fine-grained access control on a single user entity, as Globus does, and can also maintain various coarse-grained access controls. For example, access control can be based on a role-layer user name: {"Project X users", R, W, E}.
2) It presents a friendly means for user interaction. Grid users sometimes need to interact with other user entities, for example to request privileges from a resource owner or to pass messages among coordinating users
etc. Using the RUS model, such interaction between users can be realized conveniently: a user who wants to interact with other users need not remember or provide the session-layer expression name, but can use the role-layer or user-layer expression name instead.
3) It supports Single Sign-On (SSO) and the 3A (Anywhere, Anytime, and Any device) usage mode. Because every user entity has a unique ID name, and the user's information records its home community and node, when the user logs on to an arbitrary community's user management service or roams among communities, the user management service can parse the user's home node out of the user's data structure. Thus SSO can be realized both within a community and across communities. The user need not be concerned about his position; he only needs to provide his unique ID in order to use the grid. The unique ID can reside on a very portable device, such as a key that is easy to integrate into any grid access device.
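As noted above, the data structure of an active user entity carries the names of all three layers together. A minimal, hypothetical Python sketch of such a structure follows; it is our illustration, not the paper's implementation.

    from dataclasses import dataclass, field

    @dataclass
    class ActiveUserEntity:
        """Holds all three RUS layers for one active grid user."""
        roles: set = field(default_factory=set)  # role-layer names
        user_name: str = ""                      # user-layer name (per community)
        home_community: str = ""                 # from the session-layer DN
        home_node: str = ""
        global_id: str = ""                      # unique session-layer ID

        def permitted(self, acl: dict, operation: str) -> bool:
            """Coarse-grained check against role-based ACL entries such as
            {'Project X users': {'R', 'W', 'E'}}."""
            return any(operation in acl.get(r, set()) for r in self.roles)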
4 Conclusions and Future Work
As discussed above, the RUS model can support the concept of a grid process, because a user entity has a unique, scalable data structure, and it can also solve the scalability problem of user management. We are now implementing a practical user management system based on the RUS model. As the next step, we will consider the policy and context factors of a community and integrate the RUS model into policy decision and context generation.
References
1. Foster, I., Kesselman, C., Tuecke, S.: The Anatomy of the Grid: Enabling Scalable Virtual Organizations. International Journal of High Performance Computing Applications 15(3) (2001) 200-222
2. Pearlman, L., Welch, V., Foster, I., Kesselman, C.: A Community Authorization Service for Group Collaboration. IEEE Workshop on Policies for Distributed Systems and Networks (2002)
3. Privilege and Role Management Infrastructure Standards Validation: http://www.permis.org
4. Alfieri, R., Cecchini, R., Ciaschini, V., dell'Agnello, L., Gianoli, A., Spataro, F.: Managing Dynamic User Communities in a Grid of Autonomous Resources: http://www-conf.slac.stanford.edu/chep03/register/administrator/papers/papers/TUBT005.PDF
5. Xu, Zhiwei: A Model of Grid Address Space with Applications (in Chinese). Journal of Computer Research and Development, Vol. 6 (2003)
6. http://www.globus.org
7. http://www.globalgridforum.org
A QoS-Based Multicast Algorithm for CSCW in IP/DWDM Optical Internet*
Xingwei Wang¹, Hui Cheng¹, Jia Li², Min Huang², and Ludi Zheng³
¹ Computing Center, Northeastern University, Shenyang, 110004, China
[email protected]
² College of Information Science and Engineering, Northeastern University, Shenyang, 110004, China
³ Bell Labs Research China, Beijing, 100080, China
* This work was supported by the National Natural Science Foundation of China under Grant No. 60003006 (jointly supported by Bell Labs) and No. 70101006, and by the National High-Tech Research and Development Plan of China under Grant No. 2001AA121064.
Abstract. An integrated QoS-based multicast algorithm for CSCW in IP/DWDM optical Internet is proposed in this paper. Given a QoS multicast request and the delay interval required by the group users, the proposed algorithm constructs a flexible, QoS-based, cost near-optimal multicast routing tree using a genetic algorithm, and assigns wavelengths to that tree based on the wavelength graph. Simulation results show that the algorithm is feasible and effective.
1 Introduction
To support CSCW in IP/DWDM optical Internet, a feasible and efficient routing tree through which group users transmit information must be found. Finding such a tree has been proved NP-hard [1]. We propose an algorithm that generates a cost near-optimal multicast tree based on a GA (genetic algorithm) and performs wavelength assignment based on the wavelength graph proposed by Chlamtac [2]. It integrates wavelength assignment into the process of generating the multicast tree, considering the cost of the multicast tree and the user QoS satisfaction degree simultaneously.
2 Model Description
IP/DWDM optical Internet can be modeled as a directed, connected graph G(V, E), where V is the set of nodes representing optical nodes, E is the set of edges representing the optical fibers connecting them, and n = |V|. Every node has multicast capability; only some nodes are assumed to have wavelength conversion capability, and the conversion between any two different wavelengths is assumed to take the same delay t at any optical node equipped with converters (if no conversion takes place, this delay is zero). Each link e ∈ E is associated with three parameters: w(e), the set of wavelengths available on e, where W is the set of wavelengths supported by every link in the network; d(e), the transmission delay; and c(e), the cost.
Consider a multicast request R(s, D, Δ), where s is the source node and D the set of destination nodes. Define Δ = [Δ_min, Δ_max] as the delay requirement interval of the group users; the minimum and maximum of the interval depend on the specific CSCW application. The route of the multicast connection is a tree T(s, D), whose total cost C(T) is defined as the sum of all the link costs in T. The communication delay on a path consists of link transmission delays and wavelength conversion delays. The delay between s and destination d_i along T is denoted Delay(s, d_i), and the delay of T is defined as
Delay(T) = max_{d_i ∈ D} Delay(s, d_i).
If Delay(T) ≤ Δ_min, the user accepts the offered QoS at 100%; if Delay(T) > Δ_max, the user refuses it at 100%. The acceptance ratio of the user, i.e., the user QoS satisfaction degree SD(T), is accordingly defined to decrease from 1 to 0 as Delay(T) grows from Δ_min to Δ_max. The heuristic cost function is then defined over the total cost C(T) and the satisfaction degree SD(T).
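A natural realization of the satisfaction degree, consistent with the two boundary conditions above, is linear interpolation between the delay bounds. The Python sketch below assumes that linear form; the paper's exact formula is not recoverable from this copy, so treat the falloff as an illustrative assumption.

    def satisfaction_degree(delay_T: float, d_min: float, d_max: float) -> float:
        """User QoS satisfaction degree: 1 at or below d_min, 0 above d_max.
        The linear falloff in between is an assumed form, not the paper's."""
        if delay_T <= d_min:
            return 1.0
        if delay_T > d_max:
            return 0.0
        return (d_max - delay_T) / (d_max - d_min)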
3 The Algorithm for Wavelength Assignment
The proposed algorithm is based on the idea of the wavelength graph. First, a wavelength graph WG is constructed for the tree T, as follows.
1) Create N*w nodes in WG, where N is the number of nodes in T and w is the number of wavelengths. Arrange all the nodes into a matrix with w rows and N columns: row i represents wavelength λ_i and column j represents node v_j of T. One mapping table records the correspondence between i and λ_i, and another the correspondence between j and v_j; these two tables are later used to map paths in WG back to paths and wavelengths in T.
2) For i = 1, 2, ..., w, in the i-th row, add a horizontal directional link between column j and column h if there exists a link in T from node v_j to node v_h and wavelength λ_i is available on that link. Assign the link's transmission delay as its weight.
3) For j = 1, 2, ..., N, in the j-th column, add a vertical bidirectional link between each pair of rows if node v_j of T has wavelength conversion capability. Assign the conversion delay t as its weight.
A vertical link in WG thus represents a wavelength conversion at a node, and a horizontal one represents an actual link in T. Number the nodes of WG sequentially: the node in the i-th row and j-th column receives the number (i-1)*N+j. WG can then be treated as an ordinary network topology graph, on which a shortest-path algorithm is run from the source node s to each destination node. For a WG node numbered x,
i = (x-1)/N + 1 (integer division) and j = (x-1)%N + 1.
Using these two expressions and the two mapping tables created in step 1), the computed paths of sequential node numbers are conveniently mapped back to the links and wavelengths of T, which completes the wavelength assignment. If several wavelengths on one link would carry the same message, the following rule is applied: among all the destinations reachable through that input link, first select the one with the maximum end-to-end delay, and then let the input link use the wavelength used by the output link leading to that destination. The time complexity of the algorithm depends on N, w, and m, where m is the number of destination nodes.
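The node numbering and its inverse are mechanical; the sketch below (our illustration, with Dijkstra standing in for the unspecified shortest-path routine, and with a hypothetical adjacency-dict input) shows both mappings.

    import heapq

    def wg_node(i, j, N):
        """Sequential number of the WG node in row i, column j (1-based)."""
        return (i - 1) * N + j

    def wg_coords(x, N):
        """Inverse mapping: WG node number x -> (row i, column j)."""
        return (x - 1) // N + 1, (x - 1) % N + 1

    def dijkstra(adj, src):
        """Shortest paths over an adjacency dict {u: [(v, weight), ...]}."""
        dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue
            for v, w in adj.get(u, []):
                if d + w < dist.get(v, float("inf")):
                    dist[v], prev[v] = d + w, u
                    heapq.heappush(heap, (d + w, v))
        return dist, prev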
4 GA Design
The fitness of a solution is computed by a fitness function f, determined by the cost and the user QoS satisfaction degree of the corresponding tree (forest). Solutions are encoded in binary: each bit of the chromosome corresponds to one node, and every chromosome corresponds to one solution. Since chromosomes are generated randomly, some of them may be bad. For example, for the CSCW group U = {1, 2, 5, 6}, the chromosome S = 110110 is bad, because the graph corresponding to S does not contain node 6, and hence neither does its minimum cost spanning tree. To solve this problem we generate a special chromosome S_U: if node i belongs to U then bit(S_U, i) = 1, otherwise bit(S_U, i) = 0. Each randomly generated chromosome S is then combined with S_U by the logical "or" operation, S := S ∨ S_U, which guarantees that the graph or tree corresponding to any randomly generated chromosome contains all the group nodes. The initial population is composed of chromosomes generated randomly according to the population size. We adopt single-point crossover: the crossover point is chosen randomly, and the two parents exchange the bit substrings that follow it. Mutation flips bits according to the mutation probability, and any bad chromosomes produced by these operators are repaired as described above. We use roulette-wheel selection as the policy for choosing chromosomes: by generating a random number between 0 and 1 and finding its position on the roulette wheel, we decide which chromosome to choose.
5 Conclusions
The simulation study consists of two parts: evaluation of the cost of the final multicast tree and evaluation of its QoS performance. Our simulation experiments are based on the NSFNET topology. The results show that the proposed algorithm can efficiently find a routing tree, with wavelengths assigned, for CSCW applications in IP/DWDM optical Internet.
References
1. Chen, B., Wang, J.P.: Efficient Routing and Wavelength Assignment for Multicast in WDM Networks. IEEE JSAC, Vol. 20, No. 1 (2002) 97-109
2. Chlamtac, I., Farago, A., Zhang, T.: Lightpath (Wavelength) Routing in Large WDM Networks. IEEE JSAC, Vol. 14, No. 5 (1996) 909-913
An Evolutionary Constraint Satisfaction Solution for over the Cell Channel Routing
Ahmet Ünveren and Adnan Acan
Department of Computer Engineering, Eastern Mediterranean University, Famagusta, T.R.N.C. (via Mersin 10, Turkey)
{ahmet.unveren, adnan.acan}@emu.edu.tr
Abstract. A novel evolutionary approach to ordering assignments in combinatorial optimization using constraint satisfaction problem (CSP) modeling is presented. In assigning values to variables, the order of assignments is determined through evolutionary optimization exploiting problem-specific features. No a priori information on the order of assignments is available; the order is determined entirely by evolutionary optimization so as to produce the best assignment results. Experimental evaluations show that the proposed method outperforms very well-known approaches on NP-hard combinatorial optimization problems from VLSI layout design, namely channel routing and multi-layer over-the-cell channel routing.
1 Introduction
Many practical engineering problems are combinatorial optimization problems that can be modeled as constraint satisfaction problems (CSPs). Constraint satisfaction modeling and the development of efficient algorithms for the solution of CSPs are still hot research topics in Artificial Intelligence (AI), Operations Research, and Applied Mathematics. Problem-specific information representation, in which model parameters are encoded, has a great impact on the choice of solution method. In this respect, CSPs provide a general representation and solution model for a large class of combinatorial and other discrete structural problems [2]. This paper presents a novel evolutionary ordering scheme for the assignment of values to variables, so that a feasible solution is found in reasonable computation time without using any computationally complex heuristic methods. The main novel feature of the presented approach is the use of an ordering scheme whose features are initially unknown; during an evolutionary optimization procedure based on genetic algorithms (GAs), the order of assignments is optimized to lead to a feasible solution in which all constraints are satisfied [3]. Experimental evaluations demonstrate that the proposed approach is highly efficient in terms of computational speed and solution quality.
2 Evolutionary Optimization of Assignment Order
The idea behind determining an optimized order of assignments for a CSP is as follows. In any CSP, a solution can be constructed by assigning values to variables in a certain order. Estimating this assignment order from problem-specific features only, such as a constraint graph, is not easy; indeed, the heuristics proposed for this purpose are either computationally complex or do not guarantee a feasible solution in reasonable computation time. The difficulty of determining the order of assignments stems from the fact that it is a non-linear function of problem features, and the constraint requirements change dynamically as the assignments proceed. Hence, determining an assignment order before the assignments are actually made is extremely difficult. Based on these observations, an evolutionary optimization approach to assignment ordering is proposed. In this approach, values are selected from the variable domains one by one in a certain predefined order, for example in ascending order for integer-valued domains or in lexicographical order for string-valued domains. For a given value to be assigned, selecting the corresponding variable that has this value in its domain, among a number of candidates, is the most critical issue in the construction of a feasible solution. The most commonly employed heuristics, which aim to determine the candidate variable from problem-specific features only, are the following: the minimum remaining values heuristic selects the variable having the minimum number of values in its domain, while the most constrained variable heuristic assigns the current value to the variable having the maximum in-degree in the constraint graph. These heuristics, combined with backtracking search, are either computationally complex or do not guarantee a feasible solution in a reasonable amount of computation time. In our approach, no such heuristics are used. Instead, a generic assignment-ordering function is selected and optimized using a tailored GA. Assume that C is the set of candidate variables that can take the currently selected value, CS(.) is a candidate scoring function which assigns a score to each candidate based on problem-specific features, and R(.) is the assignment-ordering function to be optimized; then the maximum score over all candidate variables is defined as
max_{v ∈ C} { CS(v) + R(v) },
where the assignment-ordering function acts as a bias on the problem-specific score value. This bias value is optimized using GAs for each variable to obtain a feasible and optimal assignment for the CSP under consideration. The candidate variable with the maximum score takes the current value, and after each assignment the candidate set and candidate scores are updated. The current value is repeatedly assigned to variables until it is completely removed from the domains of the remaining variables; the assignments then proceed with the next selected value and continue until a feasible solution is constructed. Note that the novelty of the proposed approach lies in the use of the function R(.) in determining the order of variables to be assigned to the currently
selected value. There is neither a priori information on the values or shape of R(.) nor any heuristic employed for the selection of the variables: the proposed approach determines R(.) through evolutionary optimization while providing an optimal feasible assignment for the problem under consideration.
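To make the loop concrete, here is a minimal Python sketch of the value-by-value assignment described above; CS and R are placeholders for the problem-specific scoring function and the GA-evolved bias, and constraint propagation after each assignment is deliberately elided (our illustration, not the authors' code).

    def assign(values, domains, cs, r):
        """Assign each value, in the given order, to candidate variables;
        at every step the candidate maximizing CS(v) + R(v) takes the value.
        'domains' maps each variable to the set of values it may still take."""
        assignment = {}
        for value in values:
            while True:
                candidates = [v for v in domains if value in domains[v]]
                if not candidates:
                    break                 # value exhausted; take the next one
                best = max(candidates, key=lambda v: cs(v) + r(v))
                assignment[best] = value
                del domains[best]         # variable is now bound
                # Constraint-dependent domain updates for the remaining
                # variables would go here; they are problem-specific.
        return assignment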
3 Experimental Work
Two well-known NP-hard combinatorial optimization problems from VLSI layout design, namely the channel routing problem and the over-the-cell (OTC) channel routing problem, are considered for the performance tests [1,6]. Channel routing and multi-layer OTC problems are transformed into a CSP as follows: the variables are the nets to be routed, and the domain of a variable contains the horizontal tracks it can be assigned to. Constraints on variable values are extracted from the Vertical Constraint Graph (VCG) and the Horizontal Constraint Graph (HCG). Each variable (net) is assigned a problem-specific candidate score computed as a function of the maximum path length passing through it in the VCG and of its node degree in the HCG.
For channel routing, each gene holds a real value R(.); for multi-layer OTC channel routing, each gene holds three integers and a real number, which identify the routing state, determine the OTC and within-channel H-layer numbers, and give a value for R(.), respectively. For each genotype, the corresponding phenotype is a routing of the nets determined by the assignment-ordering function and the problem-specific candidate scores. Each phenotype is a feasible routing whose fitness value is the resulting channel width. The genotype-to-phenotype transformation works as follows: for each horizontal track (a value to be assigned), the set of candidate nets (variables) is determined first; these are simply the nets with no ancestors in the VCG. Then the problem-specific scores of the candidate nets are computed as explained above. For the genotype under consideration, the total score of each candidate net is obtained by adding its computed score and the corresponding R(.) value, and the net with the maximum score is assigned to the current horizontal track. After an assignment, the VCG and HCG are modified to reflect its effect on the existing constraints and on the problem-specific features of the variables. This procedure is repeated until the current horizontal track is full; the track is then removed from the remaining variable domains and the next horizontal track is considered. The process is repeated for the different layers. Table 1 (a), (b), and (c) compares the performance of the presented method with recent works on 7 benchmark problems taken from [6], where d refers to a lower bound extracted from the channel constraints, OFBGA
refers to the presented approach, and T, B, and I refer to the top, bottom, and internal channel areas, respectively.
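As a concrete reading of the decoding step, the sketch below fills one horizontal track for plain channel routing; the overlap test, data layout, and all names are our assumptions rather than the authors' implementation.

    def no_overlap(net, placed, spans):
        """HCG check: net's horizontal span (left, right) must not overlap
        the span of any net already placed on this track."""
        l, r = spans[net]
        return all(r < spans[p][0] or l > spans[p][1] for p in placed)

    def route_one_track(nets, vcg_parents, spans, cs, rr):
        """Fill one horizontal track: repeatedly pick, among the nets with
        no remaining VCG ancestors that still fit, the net maximizing
        CS(net) + R(net)."""
        placed = []
        while True:
            cands = [n for n in nets
                     if not vcg_parents[n] and no_overlap(n, placed, spans)]
            if not cands:
                return placed
            best = max(cands, key=lambda n: cs(n) + rr(n))
            placed.append(best)
            nets.remove(best)
            for n in nets:  # removing 'best' from the VCG frees its children
                vcg_parents[n].discard(best)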
4 Conclusion
The main idea behind the proposed approach is to remove all heuristics from the assignment of values to variables; instead, an assignment-ordering function is introduced and optimized by a tailored GA to provide a feasible and optimal solution. The proposed algorithm is very simple to implement and very fast: for the hardest benchmark problem, Deutsch, a feasible solution matching the best-known lower bound is found within a few seconds in all trials. Further research will be carried out on the use of polynomial and non-polynomial functions whose shapes are optimized in a goal- (i.e., fitness-) directed manner.
References
1. Goni, B.M., Arslan, T., Turton, B.: Power Driven Routing Using a Genetic Algorithm. 3rd World Multiconference on Systemics, Cybernetics and Informatics and 5th International Conference on Information Systems Analysis and Synthesis, Orlando (1999)
2. Kumar, V.: Algorithms for Constraint Satisfaction Problems: A Survey. AI Magazine (1992)
3. Liu, X., Sakamoto, A., Shimamoto, T.: Restrictive Channel Routing with Evolution Programs. Trans. IEICE, Vol. E76-A (1993)
4. Rahmani, A.T., Ono, N.: A Genetic Algorithm for Channel Routing Problem. Proceedings of the 5th International Conference on Genetic Algorithms (1993)
5. Shiraishi, Y.Y., Sakemi, J.: A Permeation Router. IEEE Transactions on Computer-Aided Design, Vol. CAD-6 (1987) 462-471
6. Yoshimura, T., Kuh, E.S.: Efficient Algorithms for Channel Routing. IEEE Trans. on Computer-Aided Design of ICAS, Vol. CAD-1 (1982) 25-35
Author Index
Acan, Adnan II-1063 Ahmad, H. Farooq I-608 Ai, Bo II-895 Ai, Ping II-191 Al-Ali, Rashid II-529 Ali Arshad I-608,II-913 Ali, Zeyad II-879 Aloisio, Giovanni I-131 Alvi, Fawaz Amin I-657 Amin, Kaizar II-464,II-529 Anjum, Ashiq II-913 Arafeh, Bassel R. II-254 Ayres, F. I-601 Azim, Tahir II-913 Bai, Haihuan I-371 Bai, Ying-Cai II-236 Berti, Guntram I-34 Bi, Jingping II-944 Bin, Wang I-875 Bouras, Christos II-344 Bragança, R.S.N. de I-601 Briggs, Ransom I-536 Bu, Jiajun I-242 Bunn, Julian J. II-913 Cafaro, Massimo I-131 Cai, Lizhi I-641 Cai, Shi-Jie I-633,II-1042 Cai, Wentong I-83,I-168,I-316,II-935 Cai, Wenyuan I-42 Cai, Xu II-207 Cai, Yue-Ru II-319 Cao, Jian I-108,I-616,I-708,I-738, I-948,II-927,II-1026 Cao, Jiannong I-266,I-316,II-97,II-935 Cao, Jing I-1017 Cao, Junwei I-34 Cao, Lei I-883,II-482 Cao, Min I-266 Casey, John II-121 Ce, Yu I-1026 Cha, Li II-521,II-541 Chang, Guiran I-813,II-549,II-698 Chang, Weng-Long II-97
Chau, Siu-Cheung I-75 Chen, Bo I-732 Chen, Changgui II-871 Chen, Changjia I-388,I-446 Chen, Chun I-242 Chen, Daoxu I-412 Chen, Dehui I-213 Chen, Deren II-105 Chen, Dongfeng I-821,II-426 Chen, G.L. I-1067 Chen, Guihai I-412 Chen, GuoLiang I-645,II-715 Chen, Haitao I-379 Chen, Haopeng II-608 Chen, Honghui I-560,II-240 Chen, Huachao II-1038 Chen, HuaJun II-727,II-752 Chen, Huaping I-645,II-715 Chen, Jianhua II-430 Chen, Kang I-396 Chen, Li I-19,II-669 Chen, Luo II-581 Chen, Mei II-706 Chen, Ming I-552,II-612 Chen, Ping I-677,I-799 Chen, Rong I-568,I-716,I-1079 Chen, Shudong I-404 Chen, Shuoying II-521 Chen, Song I-833 Chen, Tian II-223 Chen, Wei-dong II-677 Chen, Wenguang I-839 Chen, Xin II-400,II-715 Chen, Xue II-787 Chen, Yihai I-956 Chen, Yingjian II-279 Chen, YunTao II-507 Chen, Zhang-long II-604 Cheng, Haiying I-233 Cheng, Hui II-1059 Chi, Xuebin I-732 Chi, Yi II-219 Chien, Andrew A. I-9 Choi, Wan-Kyoo II-723 Chowdhury, Morshed II-1002
Chung, Il-Yong II-723 Chung, Ilyong I-972,II-259 Chyan, Daphne II-962 Costa, M.C.A. I-601 Costa, S.R.R. I-601 Cui, Wei I-724 Cui, Xianguo I-867
Dai, Bo I-813 Dai, Guanzhong I-247 Dai, Yafei I-456,I-464 Dai, Yunping I-1083 Dehestani, Alireza II-450 Den, Qianni I-948 Deng, Hong I-19 Deng, Kun II-815 Deng, Qianni I-11, I-19, I-108, I-164, I-259,I-503,I-616,I-669,I-738,II-620, II-669 Deng, ShuiGuang II-954,II-978 Deng, Zhiqun I-247 Di, Rui-Hua II-145 Ding, Jingbo I-641 Ding, Lianhong II-787 Ding, Peng II-803 Ding, Ying II-183 Dong, Shoubin I-229,II-89,II-644 Dong, Wei I-1034 Dong, Xingchang II-573 Dong, Yuanfang II-1022 Dong, Yuguo I-176 Dong, Zheng I-964 Dou, Wan-Chun I-633,II-1042 Du, Wei II-604 Du, Zeng-Kai II-303 Du, Zhihui I-1043,II-10,II-706 Duan, Hai-Xin I-859 Ebecken, N.F.F. I-601 Eom, Young Ik I-420 Epicoco, Italo I-131 Fan, Beibei I-1012 Fan, Weiwei II-597 Fan, Yushun I-653 Fang, Cheng I-560 Fang, Juan II-145 Fang, Jun II-778 Fang, Minglun II-137 Fang, Xiangming I-529
Feng, Boqin I-159,I-1039 Feng, Haolin II-113 Feng, Yuhong II-935 Feng, Zheng I-821,II-219 Ferreira, L.V. I-601 Fingberg, Jochen I-34 Fu, Ada Wai-Chee I-75 Fu, Cheng II-628 Fu, Qianfei II-227 Fu, Wei I-519 Fu, Yonggang II-710 Fu, You II-73 Gandour, F. I-601 Gao, Hui I-180 Gao, Ji I-568,I-1079,II-161 Gao, Qi II-727,II-954,II-978 Gao, Wen I-576 Gao, Xiaolei I-956,II-970 Gao, Yang II-65 Gentzsch, Wolfgang I-7 Ghose, Supratip I-139 Goel, Sushant II-847 Gong, Bin I-172 Gong, Yili I-685 Gong, Zhenghu I-379 Gordon, John II-811 Gu, Guo-chang I-980 Gu, Ning I-770 Guan, Jianbo I-932 Guan, Xin II-807 Gui, Chunmei II-597 Gui, Xiao-Lin I-891 Gui, Yadong I-568,I-616,I-1079 Gulati, Ved P II-360 Guo, Deke II-240 Guo, Dianchun I-988,I-1047 Guo, Leitao I-529,I-821 Guo, Minyi II-97 Guo, Qiang I-813 Guo, Yike II-811 Guo, Yuan-ni I-992 Guo, Yunfei I-176 Han, Guangjie II-549 Han, Jian-Jun II-141 Han, Jun I-59 Han, Sung-Kook I-754 Han, Weihong I-932 Han, Weili II-327,II-418
Author Index Han, Yanbo I-99,II-778 Han, Yaojun II-73 Hao, Ping II-207 He, Chuan II-10 He, Ge I-778 He, Hui II-57 He, Kaitao I-196,II-581 He, Kejing I-229 He, Tao I-428,I-536 He, Xiaojian I-363,II-513 Helian, Na II-811 Hernández, Héctor J. II-907 Ho, Quoc-Thuan I-83 Hönig, Udo II-18 Hong, Seok Won II-455 Hong, Yu I-67 Hou, Fangyong I-907 Hou, Huawei II-215 Hou, Yafei II-195 Hou, Zhiqiang I-825 Hu, Haitao I-99,II-778 Hu, Huaping I-379 Hu, Jinfeng I-292 Hu, Jun II-161 Hu, Mingzeng II-57 Hu, Tao II-149 Hu, Yunfa I-984, I-1004, II-690 Huan, Dandan II-1022 Huang, Bin I-262,I-519 Huang, Chang II-855 Huang, Changqin II-105 Huang, Daoyin II-404 Huang, Fei-xue II-899 Huang, He II-145 Huang, Jianhua II-404 Huang, Joshua II-65 Huang, L.S. I-1067 Huang, Lican II-1051 Huang, Lili I-428 Huang, Linpeng I-746,I-1071 Huang, Min II-1059 Huang, Shangteng II-422 Huang, Shuangxi I-653 Huang, Tao II-294 Huang, Yong-zhong II-400 Huang, Zunguo I-379 Huo, Zhigang I-724 Ikram, Ahsan II-913 Iqbal, Kashif I-608
Jabeen, Zohra I-657 Jalili-Kharaajoo, Mahdi II-450,II-459 Jang, Hyuk Soo II-455 Jeong, Chang-Won I-754 Jeong, Jongil II-557 Jeong, Young-Sik II-918 Ji, Peng II-211 Jia, Yan I-932 Jian, Xiao I-1026 Jiang, Changjun I-209,I-475,I-616, I-1008,II-73,II-887 Jiang, Jinlei II-636,II-660 Jiang, Jixiang I-794 Jiang, Junjie I-371 Jiang, Sheng II-490 Jiang, Shui I-616 Jiang, Tingyao I-452 Jie, Yan-Rong II-396 Jin, Feng I-984 Jin, Hai I-487,I-700,II-48,II-573,II-830 Jin, Hairong I-356 Jin, Liang I-829 Jin, Tian II-335 Jin, Wei I-616,I-883,II-482 Jin, Ying II-891 Jing, Ning I-196,II-581 Jizhou, Sun I-1026 Jo, Geun-Sik I-115,I-139 Jong, Chu J. II-l Joo, Su-Chong I-754,II-918 Jou, Wou Seok II-455 Ju, Jiu-bin I-1000,II-303 Ju, Shiguang I-988,I-1047,II-907 Jung, Jason J. I-115 Kan, Haibin II-446 Kang, Dazhou II-736 Kang, Lishan II-1030 Kang, Moon Seol I-924,I-1091,I-1095 Kang, SeokHoon I-273,I-1099 Kang, Yong-hyeog I-420 Kesselman, Carl I-2 Kettani, Houssain II-879 Khoja, Shakeel A. I-657 Kim, Backhyun I-1099 Kim, Boon-Hee I-285 Kim, Cheolhyun I-972 Kim, Dong-Kyoo I-1087 Kim, Hak Du II-34
Kim, Hyun-Jun I-115 Kim, Iksoo I-1099 Kim, In-suk I-420 Kim, Jai-Hoon II-565,II-838 Kim, Jin Suk II-34 Kim, Myung-Joon I-269 Kim, Wonil I-1087,II-565 Kim, Young-Chan I-285 Kirstein, Peter II-490 Knosp, Boyd M. I-428,I-536 Ko, Young-Bae II-838 Kong, Xianglong I-172 Konidaris, Agisilaos II-344 Kostoulas, Dionysios II-344 Kwong, Oscar M.K. I-316 Lai, Zhuo II-166 Lan, M. I-584 Lao, Song-Yang II-170 Laszewski, Gregor von II-464,II-529 Le, Jiajin II-203 Lee, Chung Ki II-455 Lee, Do Hyung II-311 Lee, Gunhee I-1087 Lee, Ho-Kyoung I-139 Lee, Jaejin II-442 Lee, Jong Sik II-250 Lee, Minji II-565 Lee, Okbin II-259 Lee, Sang Jun I-924,I-1091,I-1095 Lee, Sangho II-259 Lee, Yeijin I-972 Lei, Lianhong I-388 Lei, Yongmei I-225 Lezzi, Daniele I-131 Li, Baiyan I-786,I-899 Li, Bingchen I-762 Li, Changyun I-817 Li, Chunjiang I-151,II-26 Li, Donglai II-778 Li, Dongsheng I-324,I-519 Li, Feng I-1021 Li, Gang I-99 Li, Gansheng I-817 Li, Guoqing I-1034 Li, Hui I-649 Li, Jia II-1059 Li, Jian I-980 Li, Jianzhong I-348 Li, Juan-Zi II-319,II-903
Li, Keqiu II-263 Li, Li II-97 Li, Lin I-778 Li, Luqun I-867 Li, Maosheng II-227 Li, Ming I-292 Li, Minglu I-108,I-616,I-708,I-738,I786,I-867,I-883,I-940,I-948,II-482, II669,II-710 Li, Qing-Hua II-141,II-694 Li, Qinghu I-340 Li, Ren-fa I-992 Li, San-li I-661,II-10 Li, Shanping I-356,I-916 Li, Shaolong I-446 Li, Wei I-480,I-544,I-685,I-762,I-825, II-521,II-541,II-1055 Li, Xiang II-795 Li, Xiao I-300 Li, Xiaobin I-188,II-187 Li, Xiaolin I-649 Li, Xiaoming I-464 Li, Xiong-fei II-807,II-1022 Li, Xuan I-1043 Li, Ya-wei II-174 Li, Yin I-308,I-817 Li, Ying I-883,I-940,II-482 Li, Yong I-242 Li, Zhi-jie II-899 Li, Zhongcheng II-944 Li, ZiTang II-149 Li, Zupeng II-404 Liang, Anderson II-962 Liang, Bangyong II-903 Liang, Zhengyou II-89,II-644 Lin, Chinglong I-428 Lin, Lin I-1071 Lin, Peijun I-1012 Lin, WeiMing II-727 Lin, Xinhua I-503 Liu, Bo I-262,I-653 Liu, Chengfei II-97 Liu, Dingsheng I-1034 Liu, Donghua I-778,I-825 Liu, Donglin II-413 Liu, Hong I-11 Liu, Hui I-616,I-883,II-482 Liu, Jie II-787,II-795 Liu, Jinde I-460,I-803,I-1055 Liu, Lilan II-137,II-500
Author Index Liu, Limin II-1055 Liu, Ling II-970 Liu, Linlan II-244 Liu, Peng I-661 Liu, Qin II-352 Liu, Shaoying II-970 Liu, Tao II-438 Liu, Wei II-604 Liu, Wenyin I-475 Liu, Wu I-859 Liu, Xiangjun I-237 Liu, Xingwu I-692 Liu, Xinyu I-576 Liu, Xu I-217 Liu, Xuezheng I-552,II-612 Liu, Yan I-536 Liu, Yin I-475 Liu, Yong I-259,II-677 Liu, Yuan I-716 Liu, Yujun II-327,II-418 Liu, Yun I-907 Liu, Yunhao I-300 Liu, Yunhuai I-300 Liu, Yunsheng I-200,II-590 Liu, Zhao II-1030 Liu, Zhen I-907 Long, Shanjiu I-821 Lou, BingLiang I-1017 Lu, Efeng I-205 Lu, Jian II-335 Lu, Jianjiang II-736 Lu, Li-Hua I-859 Lu, Xiaofeng I-229 Lu, Xiaolin I-91 Lu, Xicheng I-324,I-519 Lu, Xinda I-11,I-164,I-259,I-503,I-669, II-40,II-620,II-669 Luan, Xi-Dao II-170 Luan, Zhongzhi II-438 Luo, Junzhou I-637,I-1021,II-81,II-211 Luo, Xuemei II-73 Luo, Xueshan I-560,II-240 Luo, Yingwei I-624 Luo, Yong-Jun II-236 Luo, Zongwei II-65 Lv, ZhiHui I-221 Ma, Dan II-153,II-157,II-507 Ma, Fanyuan I-308,I-404,II-279, II-380,II-744
Ma, Jie I-724 Ma, Tianchi I-356,I-916 Ma, Zhaofeng I-1039 Maccabe, Arthur B. II-1 Malluhi, Qutaibah II-879 Mao, Hongyan I-746 Mao, Yuxin II-752 Matsuoka, Satoshi I-8 Mendonça, C.E. I-601 Meng, Fanrong I-996 Meng, Qingchun I-471 Meng, Xiangxu I-172 Miao, Huaikou I-956,II-970 Miao, YuQing II-179 Mikler, Armin R. II-464 Min, Fan II-652 Min, Kyong Hoon II-455 Mirto, Maria I-131 Mocavero, Silvia I-131 Moon, Kiyoung I-849 Motallebpour, Hassan II-450 Nanya, Takashi I-576 Neo, Hoon Kang I-495 Neves, L.G. I-601 Newman, Harvey B. II-913 Ni, Guangnan I-716 Ni, Jun I-428,I-536 Ni, Lionel M. I-300 Niu, Liping I-168 O’Hanlon, Piers II-490 Onel, Yasar I-536 Ong, Yew-Soon I-83 Padmanabhan, Anand I-536 Pan, LeYun II-380,II-744 Pan, Yun-he II-677 Parhami, Behrooz II-408 Park, Cheehang I-849 Park, Chong-Won I-269 Park, Jin-Won I-269 Park, Namje I-849 Park, Sang-Min II-838 Park, Seung Bae I-924,I-1091,I-1095 Pegueroles, Josep I-875 Peng, Jian I-803 Peng, Jun I-511 Peng, Liang I-495 Peng, Xiaoning I-262
Peng, Xin I-592 Pujari, Arun K II-360 Qi, Cong I-964 Qi, Zhengwei I-738,II-891 Qian, Depei II-438 Qian, Liang I-356 Qian, Qi I-616,I-883,II-482 Qian, Weining I-42,I-277 Qiang, Weizhong I-487,II-573,II-830 Qiao, Li’an I-237 Qiao, Wei-guang I-192 Qin, Huaifeng I-1075 Qin, Zhiguang I-460,I-1055 Qing, Yang I-251 Qiu, Jie II-521,II-541 Qu, Min I-456 Qu, Weifen I-471 Qu, Xiangli II-597 Qu, Yuzhong I-1063,II-768 Radha, V. II-360 Rana, Omer II-529 Rao, Jinghai II-760 Rao, Ruonan I-616,I-786,II-207 Rao, Weixiong II-279 Ren, Aihua I-809 Ren, Ping I-859 Ren, Yan I-560 Ren, Yi I-932 Rico-Novella, Francisco I-875 Rong, Chunming I-1083 Rong, Hongqiang II-65 Ryu, Yeonseung II-455 Schiffmann, Wolfram II-18 Schmidt, Jens Georg I-34 See, Simon I-495 Shan, Hongwei II-166 Shan, J.L. I-1067 Shan, Jiulong II-715 Shan, Lijun II-994 Shang, Erfan II-706 Shao, Miao II-1038 Shao, Weimin II-430 Sharda, Hema II-847 Shen, Fuxiang I-440 Shen, Haiying I-412 Shen, Hao I-184 Shen, Hong II-263,II-446
Shen, Liping II-710 Shen, Meiming I-237,I-396 Shen, Ruimin I-475,II-710 Shen, Yunfu I-233 Sheng, Huanye II-803 Sheng, Qiujian II-231 Sheng, XiangZhi II-335 Shi, Hongyi I-809 Shi, Meilin II-636,II-660 Shi, Ming-Hong II-236 Shi, Shengfei I-348 Shi, Shuming I-396 Shi, Wei I-26 Shi, Xuanhua I-487,II-573,II-830 Shi, Yao I-661 Shi, YouQun II-887 Shi, Zhanbei II-137,II-500 Shi, Zhongzhi II-231 Shin, Chang-Sun II-918 Shin, Dongil II-557 Shin, Dongkyoo II-557 Shu, Jian II-244 Shuai, Dianxun II-327,II-413,II-418 Sohn, Sungwon I-849 Song, Anping I-233 Song, Guanghua I-51,II-113 Song, Jie I-495 Song, Shaowen II-244 Soriano, Miguel I-875 Steenberg, Conrad II-913 Stoelwinder, Appie I-495 Su, Wei-ji I-155 Su, Xiaomeng II-760 Su, Yu I-155 Suguri, Hiroki I-608 Sun, Jiaguang I-340 Sun, Jinfei I-188,II-187 Sun, Jizhou I-205,II-215 Sun, Juan I-633,II-1042 Sun, Xi-li I-11,I-164 Sun, Yongqiang I-184,I-746,I-1071 Sun, Yugeng II-719 Sun, Yunchuan II-795 Sun, Yuzhong II-986 Sung, Mee Young II-311 Tan, Enhua I-544 Tan, Liansheng II-352 Tang, Feilong I-108,I-708,I-738,II-891 Tang, Jianquan I-641,I-829
Author Index Tang, Jie II-319 Tang, Xin-huai II-513,II-702 Tang, Yu I-196,II-581 Taniar, David II-847 Tao, Xiaofeng I-1008 Thomas, Michael II-913 Thompson, Steve II-811 Tian, H.T. I-1067 Tian, Haitao II-715 Tian, Jing I-464 Tong, Frank II-65 Tong, Weiqin I-616,I-641,I-829,I-1059 Tsujita, Yuichi II-129 Tu, Shi-liang II-604 Ünveren, Ahmet Verma, Vinti
II-1063 Verma, Vinti II-879
Wang, Shoubin II-549 Wang, Shuqing I-629 Wang, Wei I-770 Wang, Weinong I-371 Wang, WenRui I-645 Wang, Xianbing I-316 Wang, Xiaolin I-624 Wang, Xiaozhi II-81 Wang, Xingwei II-549,II-1059 Wang, Xue II-778 Wang, Yan-Yan I-266 Wang, Yanlin II-719 Wang, Yijie I-324 Wang, Yu I-544,I-984,II-690 Wang, Yuan II-986 Wang, Zhaofu II-815 Wang, Zhijian II-191 Wang, Zhiying I-907 Wei, Chengbing I-471 Wei, Guiyi I-51,I-123 Wei, Wenguo II-89,II-644 Wen, Jiyue I-813,II-698 Weng, Chuliang I-669,II-620 Wirt, Richard I-4 Wladawsky-Berger, Irving I-3 Wu, Geng-Feng I-266 Wu, Guoqing I-147 Wu, Hong I-732 Wu, Jian-Ping I-436,I-859 Wu, Jiangxing I-176 Wu, Jie I-332 Wu, Ling-Da II-170 Wu, Qi II-944 Wu, Quanyuan I-932 Wu, Ruey-Shyang II-962 Wu, Shaomei I-1043 Wu, Weiguo II-438 Wu, Xiaojun II-855 Wu, Yijian II-1046 Wu, Yinghui I-292 Wu, Yongwei I-255 Wu, Yu-jin I-770 Wu, Zengde I-404 Wu, Zhaohui II-677,II-727,II-752, II-855,II-954,II-978,II-1051 Wu, Zhenyu I-19 Xia, Jing II-1038 Xia, Jun II-113 Xia, Xiaodong II-1034
1073
1074
Author Index
Xiang, Hui I-172 Xiang, Yang II-1002 Xiang, Zhen I-196 Xiao, Mingzhong I-456 Xiao, Nong I-151,I-262,I-324,I-519, II-26 Xiao, Wenjun II-408 Xie, Bing I-891 Xie, Bo I-475 Xie, Hongxia I-996 Xie, Junyuan I-1063 Xie, Lun-Guo II-170 Xie, Xiao II-822,II-863 Xie, Yu-Xiang II-170 Xin, Hua II-1018 Xing, Jianguo II-434 Xu, Baowen I-794,II-736 Xu, Chengzhong I-412 Xu, Chuanfu I-379 Xu, Cong-fu II-677 Xu, Guoshi I-799 Xu, Lei I-794,II-183 Xu, Linhao I-42,I-277 Xu, Liutong II-895 Xu, Shengliang I-778 Xu, Weimin I-225 Xu, Xianghua I-242 Xu, Xiao-fei I-964 Xu, Zhihong I-205 Xu, Zhiwei I-1,I-480,I-544,I-649, I-685,I-692,I-762,I-778,I-825,II-521, II-541,II-986,II-1055 Xu, Zhuoqun I-624,I-677,I-799 Xue, Fu-ren II-1018 Xue, Guangtao I-363 Xue, Tao I-159 Yan, ChunGang II-887 Yan, He II-474 Yan, Xinfang II-719 Yang, Chunlin I-332 Yang, Da-Gang I-633,II-1042 Yang, Feng I-440 Yang, Geng I-1083 Yang, Guangwen I-237,I-255,I-396, I-552,I-839,II-612 Yang, Guo-wei II-652 Yang, Jian I-221 Yang, Jiangang II-166 Yang, Jie I-147
Yang, Ke-xin I-1000 Yang, Ning I-649 Yang, Shaofeng II-879 Yang, Shoubao I-67,I-529,I-645,I-821, II-219,II-227,II-426 Yang, Tao I-803 Yang, Xuejun I-151,II-26,II-597 Yang, Xuesheng I-213 Yang, Yun II-271 Yang, Zhen I-988,I-1047 Yao, Wensheng I-899,II-822,II-863 Ye, En I-592 Ye, Hua II-388 Ye, Lin II-388 Yeh, Hongjin I-1087 Yin, Zhaolin I-188,II-187 Yoe, Hyun II-442 Yong, Jianming II-271 Yoon, Won-Sik II-838 You, Jinyuan I-363,I-786,I-899,II-207, II-513,II-628,II-702,II-822,II-863,II-891 Young, Chao II-203 Yu, Haiyan II-521,II-541 Yu, Jiadi I-738,I-883,I-940,II-482 Yu, Liping I-984 Yu, Shui II-121,II-380 Yu, Song II-396 Yu, Song-Nian II-474 Yu, Tao II-137,II-500 Yu, Xiang I-67 Yu, Xiangning II-294 Yu, Xueli I-217 Yu, Yijiao II-352 Yu, Young-Hoon I-139 Yu, Zhen II-954,II-978 Yuan, Pingpeng I-700 Yuan, Shijin I-1004 Yuan, Shyan-Ming II-962 Yuan, Xiaojie I-168 Yue, Feng II-887 Zang, Xue-bai II-807 Zeng, Bin II-149 Zeng, Guosun I-192,I-209 Zeng, Wandan I-813,II-549 Zeng, Yi I-1034 Zha, Yabing I-200,II-590 Zhan, Shouyi I-440 Zhang, Aijuan I-188,II-187 Zhang, Baowen II-608
Author Index Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, II-1026 Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang, Zhang,
Chuanfu I-200,II-590 Dehua I-468 Feng I-1055 Gongxuan I-1030 Guo-yin I-980 Hao I-487 Hong-jun II-157 Hongli II-57 Jian II-40 Jiang-ling I-251 Jianjun II-287 Jiawan II-215 Jifa I-51,I-123 Jun II-507 Jun-yan II-652 Kai II-690 Lan II-907 Lei I-529 Li I-180 Liang I-308,II-744 Liang-Jie I-10 Ling I-229,II-89,II-644 Liqin I-653 Peng II-899 Shao-hua I-770 Shaomin I-1051,II-1034 Shensheng I-948, II-927, II-1010, ShiYong I-221, II-195 Shu I-446 Shu-Jie II-145 Tong I-200, II-590 Wanjun I-1034 Wei I-404,II-153,II-157,II-507 Wei-Yong II-223 Weiming I-213 Weishi II-183 Weizhe II-57 Wensong II-815 Wu I-233,II-430 Xianfeng I-1055 Xiao-dan I-155 Xiao-Guang II-927 Xinquan II-279 Yanzhe II-541 Yaying II-702 Yingzhou II-736 Yongbo I-196 Yongxiang I-428 Yue-zhuo II-400
Zhang, Yunfei I-388,I-446 Zhang, Yuqing I-468 Zhang, Zhaohui I-209 Zhang, Zhihuan I-629 Zhang, Zhixiang II-694 Zhao, Dazhe I-653 Zhao, Dong II-174 Zhao, Hai I-155 Zhao, Keping I-277 Zhao, Kun II-807 Zhao, Wenyun I-592, II-1046 Zhao, Xiubin II-404 Zhao, Zhuofeng I-99 Zheng, Guozhou II-855 Zheng, Ludi II-1059 Zheng, Ran II-48 Zheng, Shi-jue I-251 Zheng, Weimin I-237, I-292, I-396, I-839 Zheng, Weiming I-255 Zheng, Wen-yi I-988 Zheng, Yao I-51, I-123, II-105, II-113 Zhi, Qing I-209 Zhi, Xiaoli I-1059 Zhong, Aling I-452 Zhong, Jidong II-422 Zhong, YiPing I-221,II-195 Zhou, Aoying I-42,I-277,I-616 Zhou, Dong-qing II-899 Zhou, Haifang II-581 Zhou, Lihua I-1051 Zhou, Ming-Tian I-833, II-174 Zhou, Mingquan II-287 Zhou, Shijie I-460 Zhou, Shuigeng I-42,I-277 Zhou, Songnian I-5 Zhou, Wanlei I-26, I-584, II-121, II-388, II-871, II-1002 Zhou, Xiaofeng II-191 Zhou, Xiaojun II-1026 Zhou, Xin II-1022 Zhou, Xingshe I-1075 Zhou, Xubo II-549 Zhu, Guohun II-179 Zhu, Hao I-616 Zhu, Hong II-446, II-994 Zhu, Jianmin I-716 Zhu, Jing I-237 Zhu, Lejun II-803 Zhu, Luis I-147 Zhu, Shisheng I-1012
Zhu, Tao I-237 Zhu, Tieying II-199 Zhu, Ye I-637 Zhuge, Hai I-6, II-787, II-795 Zhun, Junmao II-227 Zong, Yuwei I-829
Zou, Deqing I-487, II-573, II-830 Zou, Futai I-308, II-380 Zou, Shengrong II-683 Zuo, Guowei I-1030 Zuo, Xiaolu II-372